HNSW introduction and some reference articles for Lucene 9
2022-07-03 07:30:00 【chuanyangwang】
NSW search:
K-NNSearch(object q, integer: m, k)
Definitions: TreeSet[object] tempRes, candidates, visitedSet, result
for (i = 0; i < m; i++) do:
    put a random entry point into candidates
    tempRes = null
    repeat:
        select c, the element of candidates closest to q
        remove c from candidates
        if c is farther from q than every element in result
            then break repeat
        for each e in c.friends:
            if e is not in visitedSet then
                add e to visitedSet, candidates, tempRes
    end repeat
    add tempRes to result
end for
return the first k results from result
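The search pseudocode above can be sketched in Java roughly as follows. This is a minimal in-memory illustration, not Lucene code: the class name NswSketch, the double[] vectors, and the squared Euclidean distance are all assumptions. The sketch also makes three choices the pseudocode leaves open: the entry point is added to visitedSet and tempRes as well as candidates, the repeat loop ends when candidates runs empty, and the stop test compares c with the k-th nearest element of result (and only once result holds at least k elements). An insert method is included only so the graph can be built; it follows the Nearest_Neighbor_Insert routine listed next.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative in-memory NSW graph (hypothetical class); node ids are insertion order. */
class NswSketch {
  private final List<double[]> points = new ArrayList<>();
  private final List<Set<Integer>> friends = new ArrayList<>();
  private final Random random;

  NswSketch(long seed) {
    this.random = new Random(seed);
  }

  private static double squaredDistance(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /** Returns the k-th element (1-based) of a sorted set. */
  private static int kth(TreeSet<Integer> sorted, int k) {
    int seen = 0;
    for (int id : sorted) {
      if (++seen == k) return id;
    }
    throw new IllegalArgumentException("fewer than k elements");
  }

  /** K-NNSearch(q, m, k): m restarts; returns up to k node ids, nearest first. */
  List<Integer> search(double[] q, int m, int k) {
    List<Integer> best = new ArrayList<>();
    if (points.isEmpty() || k <= 0) return best;
    // Order node ids by distance to q; ties are broken by id.
    Comparator<Integer> byDistance =
        Comparator.<Integer>comparingDouble(id -> squaredDistance(points.get(id), q))
            .thenComparingInt(id -> id);
    Set<Integer> visited = new HashSet<>(); // shared across all m restarts
    TreeSet<Integer> result = new TreeSet<>(byDistance);
    for (int i = 0; i < m; i++) {
      TreeSet<Integer> candidates = new TreeSet<>(byDistance);
      TreeSet<Integer> tempRes = new TreeSet<>(byDistance);
      int entry = random.nextInt(points.size());
      candidates.add(entry);
      if (visited.add(entry)) tempRes.add(entry); // assumption: entry point counts as found
      while (!candidates.isEmpty()) {
        int c = candidates.pollFirst(); // closest remaining candidate
        // Stop once c is farther than the k-th nearest element already in result.
        if (result.size() >= k && byDistance.compare(c, kth(result, k)) > 0) break;
        for (int e : friends.get(c)) {
          if (visited.add(e)) {
            candidates.add(e);
            tempRes.add(e);
          }
        }
      }
      result.addAll(tempRes);
    }
    for (int id : result) {
      if (best.size() == k) break;
      best.add(id);
    }
    return best;
  }

  /** Nearest_Neighbor_Insert(newObject, f, w): link the new node to the f nearest nodes found. */
  int insert(double[] vector, int f, int w) {
    List<Integer> neighbors = search(vector, w, f); // search runs before the node is added
    int id = points.size();
    points.add(vector);
    friends.add(new HashSet<>());
    for (int n : neighbors) {
      friends.get(n).add(id);
      friends.get(id).add(n);
    }
    return id;
  }
}
```

For example, after inserting the one-dimensional points 0 through 9 with f = 3 and w = 2, calling search(new double[] {4.2}, 1, 3) walks the graph from a random entry point and returns the ids of the three closest points.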
NSW insert:
Nearest_Neighbor_Insert(object: new_object, integer: f, w)
SET[object]: neighbors = k-NNSearch(new_object, w, f)
for (i = 0; i < f; i++) do:
    neighbors[i].connect(new_object)
    new_object.connect(neighbors[i])

The HNSW index structure in Lucene is as follows:
Meta data and index part:
+--------------------------------------------------+
| meta data                                        |
+--------+-----------------------------------------+
| doc id | offset to first friend list for the doc |
+--------+-----------------------------------------+
| doc id | offset to first friend list for the doc |
+--------+-----------------------------------------+
| ......                                           |
+--------+-----------------------------------------+

Graph data part:
+-------------------------+---------------------------+---------+-------------------------+
| friends list at layer N | friends list at layer N-1 | ......  | friends list at level 0 | <- friends lists for doc 0
+-------------------------+---------------------------+---------+-------------------------+
| friends list at layer N | friends list at layer N-1 | ......  | friends list at level 0 | <- friends lists for doc 1
+-------------------------+---------------------------+---------+-------------------------+
| ......                                                                                  | <- and so on
+-----------------------------------------------------------------------------------------+

Vector data part:
+----------------------+
| encoded vector value | <- vector value for doc 0
+----------------------+
| encoded vector value | <- vector value for doc 1
+----------------------+
| ......               | <- and so on
+----------------------+
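To make the role of the "offset to first friend list for the doc" entries concrete, here is a small Java sketch of the idea. The byte encoding is entirely hypothetical (each friend list stored as an int count followed by int ids, top layer first) and is not Lucene's actual file format. The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;

/** Hypothetical encoding of the "graph data part"; not Lucene's real byte format. */
class FriendListLayoutSketch {
  /**
   * friends[doc][i] is the friend list of the i-th stored layer of doc, top layer first.
   * Fills offsets[doc] with the position of the doc's first friend list.
   */
  static ByteBuffer encode(int[][][] friends, int[] offsets) {
    int size = 0;
    for (int[][] layers : friends) {
      for (int[] list : layers) {
        size += Integer.BYTES * (1 + list.length); // count + ids
      }
    }
    ByteBuffer graph = ByteBuffer.allocate(size);
    for (int doc = 0; doc < friends.length; doc++) {
      offsets[doc] = graph.position(); // "offset to first friend list for the doc"
      for (int[] list : friends[doc]) {
        graph.putInt(list.length);
        for (int friend : list) {
          graph.putInt(friend);
        }
      }
    }
    return graph;
  }

  /** Seeks to the doc's first list, skips layerIndex lists, then reads one list. */
  static int[] readFriends(ByteBuffer graph, int[] offsets, int doc, int layerIndex) {
    int pos = offsets[doc];
    for (int i = 0; i < layerIndex; i++) {
      pos += Integer.BYTES * (1 + graph.getInt(pos)); // skip count + ids of one list
    }
    int[] list = new int[graph.getInt(pos)];
    for (int i = 0; i < list.length; i++) {
      list[i] = graph.getInt(pos + Integer.BYTES * (1 + i));
    }
    return list;
  }
}
```

The point of the offset table is that a reader can jump straight to one document's friend lists without scanning the lists of the documents stored before it.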
Index-writing logic
org.apache.lucene.codecs.lucene90.Lucene90HnswVectorsWriter#writeField
1. Write the vectors
2. Write the graph
3. Write the metadata
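One detail of step 1 is worth checking in isolation: writeField pads the vector data so that floats start at a 4-byte boundary, using the expression (4 - (pos & 0x3)) & 0x3. The sketch below wraps that expression in a hypothetical helper (the class and method names are not from Lucene) and prints the padding for a few file positions.

```java
/** Standalone check of the 4-byte alignment arithmetic; names are illustrative only. */
class AlignmentSketch {
  /** Number of zero bytes to write so that pos + padding is a multiple of 4. */
  static long paddingTo4(long pos) {
    // pos & 0x3 is pos mod 4; the outer mask turns a padding of 4 into 0.
    return (4 - (pos & 0x3)) & 0x3;
  }

  public static void main(String[] args) {
    for (long pos = 8; pos <= 11; pos++) {
      long padding = paddingTo4(pos);
      System.out.println("pos " + pos + ": padding " + padding + ", aligned offset " + (pos + padding));
    }
  }
}
```

Positions 8, 9, 10, and 11 need 0, 3, 2, and 1 padding bytes respectively, so an already aligned position is left untouched and every other position is rounded up to the next multiple of 4.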
@Override
public void writeField(FieldInfo fieldInfo, VectorValues vectors) throws IOException {
  long pos = vectorData.getFilePointer();
  // write floats aligned at 4 bytes. This will not survive CFS, but it shows a small benefit when
  // CFS is not used, eg for larger indexes
  long padding = (4 - (pos & 0x3)) & 0x3;
  long vectorDataOffset = pos + padding;
  for (int i = 0; i < padding; i++) {
    vectorData.writeByte((byte) 0);
  }
  // TODO - use a better data structure; a bitset? DocsWithFieldSet is p.p. in o.a.l.index
  int[] docIds = new int[vectors.size()];
  int count = 0;
  for (int docV = vectors.nextDoc(); docV != NO_MORE_DOCS; docV = vectors.nextDoc(), count++) {
    // write vector
    writeVectorValue(vectors);
    docIds[count] = docV;
  }
  // count may be < vectors.size() e.g. if some documents were deleted
  long[] offsets = new long[count];
  long vectorDataLength = vectorData.getFilePointer() - vectorDataOffset;
  long vectorIndexOffset = vectorIndex.getFilePointer();
  if (vectors instanceof RandomAccessVectorValuesProducer) {
    writeGraph(
        vectorIndex,
        (RandomAccessVectorValuesProducer) vectors,
        fieldInfo.getVectorSimilarityFunction(),
        vectorIndexOffset,
        offsets,
        count,
        maxConn,
        beamWidth);
  } else {
    throw new IllegalArgumentException(
        "Indexing an HNSW graph requires a random access vector values, got " + vectors);
  }
  long vectorIndexLength = vectorIndex.getFilePointer() - vectorIndexOffset;
  writeMeta(
      fieldInfo,
      vectorDataOffset,
      vectorDataLength,
      vectorIndexOffset,
      vectorIndexLength,
      count,
      docIds);
  writeGraphOffsets(meta, offsets);
}

HNSW introduction - d0evi1's blog
http://d0evi1.com/hnsw/
Delaunay triangulation - luoru - cnblogs
https://www.cnblogs.com/zfluo/p/5131851.html
HNSW study notes - Zhihu: Nearest neighbor (NN) search is widely used in all kinds of search and classification tasks. On very large data sets it becomes approximate nearest neighbor (ANN) search. Common algorithms are KD-trees, LSH, IVFPQ, and HNSW. HNSW (Hierarchical Navigable Small World) is a graph-based algorithm in the ANN search domain …
https://zhuanlan.zhihu.com/p/80552211
Discussion thread on adding HNSW
https://issues.apache.org/jira/browse/LUCENE-9004
Discussion thread on implementing hierarchical HNSW
HNSW connectivity issues
Here is the definition of a small world
The following concerns connectivity