当前位置:网站首页>Hnsw introduction and some reference articles in lucene9
Hnsw introduction and some reference articles in lucene9
2022-07-03 07:30:00 【chuanyangwang】
NSW Search for :
K-NNSearch(object q, integer:m, k)
Definition :TreeSet[object] tempRes, candidates, visitedset, result
for(i=0;i<m;i++) do:
Pick one at random entry point Put in candidates in
tempRes = null
repeat:
stay candidates Select distance from q Some recent c
take c from candidates Delete in
If c Than all in result The elements in are all separated q far
then break repeat
for e: c.friends
If e be not in visitedSet Middle principle
hold e Join in visitedSet, candidates, tempRes
end repeat
hold tempRes Add result in
end for
from result Before returning in k results
NSW Of Insert :
Nearest_Neighbor_Insert(object:new_object, integer:f,w)
SET[object]: neighbors = k-NNSearch(new_object, w, f)
for(i = 0; i<f; i++) do:
neighbors[i].connect(new_object)
new_object.connect(neighbors[i])Lucene in hsnw The index structure of is as follows
Meta data and index part: +--------------------------------------------------+ | meta data | +--------+-----------------------------------------+ | doc id | offset to first friend list for the doc | +--------+-----------------------------------------+ | doc id | offset to first friend list for the doc | +--------+-----------------------------------------+ | ...... | +--------+-----------------------------------------+ Graph data part: +-------------------------+---------------------------+---------+-------------------------+ | friends list at layer N | friends list at layer N-1 | ...... | friends list at level 0 | <- friends lists for doc 0 +-------------------------+---------------------------+---------+-------------------------+ | friends list at layer N | friends list at layer N-1 | ...... | friends list at level 0 | <- friends lists for doc 1 +-------------------------+---------------------------+---------+-------------------------+ | ...... | <- and so on +-----------------------------------------------------------------------------------------+ Vector data part: +----------------------+ | encoded vector value | <- vector value for doc 0 +----------------------+ | encoded vector value | <- vector value for doc 1 +----------------------+ | ...... | <- and so on +----------------------+
The logic of writing indexes
org.apache.lucene.codecs.lucene90.Lucene90HnswVectorsWriter#writeField
1. Write vector
2. Write graph
3. Write meta
@Override
public void writeField(FieldInfo fieldInfo, VectorValues vectors) throws IOException {
long pos = vectorData.getFilePointer();
// write floats aligned at 4 bytes. This will not survive CFS, but it shows a small benefit when
// CFS is not used, eg for larger indexes
long padding = (4 - (pos & 0x3)) & 0x3;
long vectorDataOffset = pos + padding;
for (int i = 0; i < padding; i++) {
vectorData.writeByte((byte) 0);
}
// TODO - use a better data structure; a bitset? DocsWithFieldSet is p.p. in o.a.l.index
int[] docIds = new int[vectors.size()];
int count = 0;
for (int docV = vectors.nextDoc(); docV != NO_MORE_DOCS; docV = vectors.nextDoc(), count++) {
// write vector
writeVectorValue(vectors);
docIds[count] = docV;
}
// count may be < vectors.size() e,g, if some documents were deleted
long[] offsets = new long[count];
long vectorDataLength = vectorData.getFilePointer() - vectorDataOffset;
long vectorIndexOffset = vectorIndex.getFilePointer();
if (vectors instanceof RandomAccessVectorValuesProducer) {
writeGraph(
vectorIndex,
(RandomAccessVectorValuesProducer) vectors,
fieldInfo.getVectorSimilarityFunction(),
vectorIndexOffset,
offsets,
count,
maxConn,
beamWidth);
} else {
throw new IllegalArgumentException(
"Indexing an HNSW graph requires a random access vector values, got " + vectors);
}
long vectorIndexLength = vectorIndex.getFilePointer() - vectorIndexOffset;
writeMeta(
fieldInfo,
vectorDataOffset,
vectorDataLength,
vectorIndexOffset,
vectorIndexLength,
count,
docIds);
writeGraphOffsets(meta, offsets);
}HNSW Introduce – d0evi1 The blog of
http://d0evi1.com/hnsw/
Delaunay Triangulation - luoru - Blog Garden
https://www.cnblogs.com/zfluo/p/5131851.html
HNSW Learning notes - You know NN Nearest neighbor search is widely used in all kinds of search 、 In the classification task , On a very large data set, it turns into ANN, Common algorithms are KD Trees 、LSH、IVFPQ And the HNSW. HNSW(Hierarchical Navigable Small World) yes ANN Graph based algorithms in search domain …
https://zhuanlan.zhihu.com/p/80552211
increase hnsw Discussion post for
https://issues.apache.org/jira/browse/LUCENE-9004
https://issues.apache.org/jira/browse/LUCENE-9004
Realize layered hnsw Discussion post for
hnsw Connectivity issues
Here is the definition of small world
The following is about connectivity
边栏推荐
- LeetCode
- Topic | synchronous asynchronous
- The difference between typescript let and VaR
- 【已解决】Unknown error 1146
- New stills of Lord of the rings: the ring of strength: the caster of the ring of strength appears
- How long is the fastest time you can develop data API? One minute is enough for me
- 在 4EVERLAND 上存储 WordPress 媒体内容,完成去中心化存储
- Win 2008 R2 crashed at the final installation stage
- Realize the reuse of components with different routing parameters and monitor the changes of routing parameters
- II. D3.js draw a simple figure -- circle
猜你喜欢

1. E-commerce tool cefsharp autojs MySQL Alibaba cloud react C RPA automated script, open source log

项目经验分享:实现一个昇思MindSpore 图层 IR 融合优化 pass

Interview questions about producers and consumers (important)
![PdfWriter. GetInstance throws system Nullreferenceexception [en] pdfwriter GetInstance throws System. NullRef](/img/65/1f28071fc15e76abb37f1b128e1d90.jpg)
PdfWriter. GetInstance throws system Nullreferenceexception [en] pdfwriter GetInstance throws System. NullRef

Image recognition and detection -- Notes

为什么说数据服务化是下一代数据中台的方向?

带你全流程,全方位的了解属于测试的软件事故

URL programming

Recursion, Fibonacci sequence

IO stream system and FileReader, filewriter
随机推荐
TypeScript let与var的区别
Hisat2 - stringtie - deseq2 pipeline for bulk RNA seq
Web router of vertx
Common APIs
Summary of Arduino serial functions related to print read
High concurrency memory pool
IO stream system and FileReader, filewriter
1. E-commerce tool cefsharp autojs MySQL Alibaba cloud react C RPA automated script, open source log
PgSQL converts string to double type (to_number())
The babbage industrial policy forum
7.2刷题两个
Chapter VI - Containers
图像识别与检测--笔记
C代码生产YUV420 planar格式文件
【已解决】SQLException: Invalid value for getInt() - ‘田鹏‘
VMware network mode - bridge, host only, NAT network
Logging log configuration of vertx
Jeecg data button permission settings
Dora (discover offer request recognition) process of obtaining IP address
Beginners use Minio