当前位置:网站首页>Hnsw introduction and some reference articles in lucene9
Hnsw introduction and some reference articles in lucene9
2022-07-03 07:30:00 【chuanyangwang】
NSW Search for :
K-NNSearch(object q, integer:m, k)
Definition :TreeSet[object] tempRes, candidates, visitedset, result
for(i=0;i<m;i++) do:
Pick one at random entry point Put in candidates in
tempRes = null
repeat:
stay candidates Select distance from q Some recent c
take c from candidates Delete in
If c Than all in result The elements in are all separated q far
then break repeat
for e: c.friends
If e be not in visitedSet Middle principle
hold e Join in visitedSet, candidates, tempRes
end repeat
hold tempRes Add result in
end for
from result Before returning in k results
NSW Of Insert :
Nearest_Neighbor_Insert(object:new_object, integer:f,w)
SET[object]: neighbors = k-NNSearch(new_object, w, f)
for(i = 0; i<f; i++) do:
neighbors[i].connect(new_object)
new_object.connect(neighbors[i])
Lucene in hsnw The index structure of is as follows
Meta data and index part: +--------------------------------------------------+ | meta data | +--------+-----------------------------------------+ | doc id | offset to first friend list for the doc | +--------+-----------------------------------------+ | doc id | offset to first friend list for the doc | +--------+-----------------------------------------+ | ...... | +--------+-----------------------------------------+ Graph data part: +-------------------------+---------------------------+---------+-------------------------+ | friends list at layer N | friends list at layer N-1 | ...... | friends list at level 0 | <- friends lists for doc 0 +-------------------------+---------------------------+---------+-------------------------+ | friends list at layer N | friends list at layer N-1 | ...... | friends list at level 0 | <- friends lists for doc 1 +-------------------------+---------------------------+---------+-------------------------+ | ...... | <- and so on +-----------------------------------------------------------------------------------------+ Vector data part: +----------------------+ | encoded vector value | <- vector value for doc 0 +----------------------+ | encoded vector value | <- vector value for doc 1 +----------------------+ | ...... | <- and so on +----------------------+
The logic of writing indexes
org.apache.lucene.codecs.lucene90.Lucene90HnswVectorsWriter#writeField
1. Write vector
2. Write graph
3. Write meta
@Override
public void writeField(FieldInfo fieldInfo, VectorValues vectors) throws IOException {
long pos = vectorData.getFilePointer();
// write floats aligned at 4 bytes. This will not survive CFS, but it shows a small benefit when
// CFS is not used, eg for larger indexes
long padding = (4 - (pos & 0x3)) & 0x3;
long vectorDataOffset = pos + padding;
for (int i = 0; i < padding; i++) {
vectorData.writeByte((byte) 0);
}
// TODO - use a better data structure; a bitset? DocsWithFieldSet is p.p. in o.a.l.index
int[] docIds = new int[vectors.size()];
int count = 0;
for (int docV = vectors.nextDoc(); docV != NO_MORE_DOCS; docV = vectors.nextDoc(), count++) {
// write vector
writeVectorValue(vectors);
docIds[count] = docV;
}
// count may be < vectors.size() e,g, if some documents were deleted
long[] offsets = new long[count];
long vectorDataLength = vectorData.getFilePointer() - vectorDataOffset;
long vectorIndexOffset = vectorIndex.getFilePointer();
if (vectors instanceof RandomAccessVectorValuesProducer) {
writeGraph(
vectorIndex,
(RandomAccessVectorValuesProducer) vectors,
fieldInfo.getVectorSimilarityFunction(),
vectorIndexOffset,
offsets,
count,
maxConn,
beamWidth);
} else {
throw new IllegalArgumentException(
"Indexing an HNSW graph requires a random access vector values, got " + vectors);
}
long vectorIndexLength = vectorIndex.getFilePointer() - vectorIndexOffset;
writeMeta(
fieldInfo,
vectorDataOffset,
vectorDataLength,
vectorIndexOffset,
vectorIndexLength,
count,
docIds);
writeGraphOffsets(meta, offsets);
}
HNSW Introduce – d0evi1 The blog of http://d0evi1.com/hnsw/
Delaunay Triangulation - luoru - Blog Garden https://www.cnblogs.com/zfluo/p/5131851.html
HNSW Learning notes - You know NN Nearest neighbor search is widely used in all kinds of search 、 In the classification task , On a very large data set, it turns into ANN, Common algorithms are KD Trees 、LSH、IVFPQ And the HNSW. HNSW(Hierarchical Navigable Small World) yes ANN Graph based algorithms in search domain …https://zhuanlan.zhihu.com/p/80552211
increase hnsw Discussion post for
https://issues.apache.org/jira/browse/LUCENE-9004https://issues.apache.org/jira/browse/LUCENE-9004
Realize layered hnsw Discussion post for
hnsw Connectivity issues
Here is the definition of small world
The following is about connectivity
边栏推荐
- Comparison of advantages and disadvantages between most complete SQL and NoSQL
- Final, override, polymorphism, abstraction, interface
- Take you through the whole process and comprehensively understand the software accidents that belong to testing
- Recursion, Fibonacci sequence
- [set theory] order relation (partial order relation | partial order set | example of partial order set)
- Vertx's responsive redis client
- SecureCRT password to cancel session recording
- Realize the reuse of components with different routing parameters and monitor the changes of routing parameters
- [most detailed] latest and complete redis interview book (50)
- HISAT2 - StringTie - DESeq2 pipeline 进行bulk RNA-seq
猜你喜欢
Margin left: -100% understanding in the Grail layout
HCIA notes
Introduction of transformation flow
Use of generics
SecureCRT password to cancel session recording
"Baidu Cup" CTF game 2017 February, Web: blast-1
Common architectures of IO streams
Le Seigneur des anneaux: l'anneau du pouvoir
Comparison of advantages and disadvantages between most complete SQL and NoSQL
Qtip2 solves the problem of too many texts
随机推荐
Summary of Arduino serial functions related to print read
VMware network mode - bridge, host only, NAT network
docker建立mysql:5.7版本指定路径挂载不上。
昇思MindSpore再升级,深度科学计算的极致创新
[untitled]
Introduction of buffer flow
The babbage industrial policy forum
Lombok cooperates with @slf4j and logback to realize logging
Circuit, packet and message exchange
PdfWriter. GetInstance throws system Nullreferenceexception [en] pdfwriter GetInstance throws System. NullRef
2. E-commerce tool cefsharp autojs MySQL Alibaba cloud react C RPA automated script, open source log
项目经验分享:实现一个昇思MindSpore 图层 IR 融合优化 pass
Why is data service the direction of the next generation data center?
【最詳細】最新最全Redis面試大全(50道)
List exercises after class
II. D3.js draw a simple figure -- circle
3311. Longest arithmetic
20220319
HISAT2 - StringTie - DESeq2 pipeline 进行bulk RNA-seq
Use of generics