HNSW introduction and some reference articles in Lucene 9
2022-07-03 07:30:00 【chuanyangwang】
NSW search:
K-NNSearch(object q, integer: m, k)
Definitions: TreeSet[object] tempRes, candidates, visitedSet, result
for (i = 0; i < m; i++) do:
    put a random entry point into candidates
    tempRes = null
    repeat:
        select the element c in candidates that is closest to q
        remove c from candidates
        if c is farther from q than every element in result
            then break repeat
        for e in c.friends:
            if e is not in visitedSet then
                add e to visitedSet, candidates, tempRes
    end repeat
    add tempRes to result
end for
return the first k results from result
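The search pseudocode can be sketched in Java roughly as follows. This is a minimal illustration, not Lucene's implementation: the class name NswSearchSketch, the helpers addNode and connect, and the squared Euclidean distance are all assumptions made for the example. Two small deviations from the pseudocode are also assumptions: the entry point is recorded as visited and as a result, and the stop test is only applied once result already holds at least k elements (with an empty result, "farther than every element" would be trivially true).

```java
import java.util.*;

/** Minimal NSW k-NN search sketch over an in-memory graph (all names are illustrative). */
class NswSearchSketch {
    final List<float[]> vectors = new ArrayList<>();        // node id -> vector
    final List<List<Integer>> friends = new ArrayList<>();  // node id -> neighbor ids

    /** Squared Euclidean distance (an assumed metric for this sketch). */
    static double dist(float[] a, float[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    int addNode(float[] v) {
        vectors.add(v);
        friends.add(new ArrayList<>());
        return vectors.size() - 1;
    }

    void connect(int a, int b) {
        friends.get(a).add(b);
        friends.get(b).add(a);
    }

    /** Runs m restarts from random entry points; returns up to k node ids, closest first. */
    List<Integer> kNNSearch(float[] q, int m, int k, Random rnd) {
        if (vectors.isEmpty()) {
            return new ArrayList<>();
        }
        // Order node ids by distance to q; tie-break on id so distinct nodes never compare equal.
        Comparator<Integer> byDist = Comparator
            .comparingDouble((Integer id) -> dist(vectors.get(id), q))
            .thenComparingInt(id -> id);
        TreeSet<Integer> result = new TreeSet<>(byDist);
        Set<Integer> visitedSet = new HashSet<>();
        for (int i = 0; i < m; i++) {
            TreeSet<Integer> candidates = new TreeSet<>(byDist);
            TreeSet<Integer> tempRes = new TreeSet<>(byDist);
            int entry = rnd.nextInt(vectors.size());
            candidates.add(entry);
            if (visitedSet.add(entry)) {
                tempRes.add(entry);
            }
            while (!candidates.isEmpty()) {
                int c = candidates.pollFirst();  // closest candidate to q, removed from the set
                // Stop test, guarded so it only fires once k results exist (assumption).
                if (result.size() >= k
                        && dist(vectors.get(c), q) > dist(vectors.get(result.last()), q)) {
                    break;
                }
                for (int e : friends.get(c)) {
                    if (visitedSet.add(e)) {     // true only the first time e is seen
                        candidates.add(e);
                        tempRes.add(e);
                    }
                }
            }
            result.addAll(tempRes);
        }
        List<Integer> out = new ArrayList<>();
        for (int id : result) {
            if (out.size() == k) break;
            out.add(id);
        }
        return out;
    }

    public static void main(String[] args) {
        NswSearchSketch g = new NswSearchSketch();
        for (int i = 0; i < 10; i++) g.addNode(new float[] {i});  // points 0..9 on a line
        for (int i = 0; i + 1 < 10; i++) g.connect(i, i + 1);     // chain them together
        System.out.println(g.kNNSearch(new float[] {3.2f}, 3, 2, new Random(42)));
    }
}
```

Because result stays empty during the first restart, that restart walks the whole connected component, which is what the pseudocode implies when read literally; practical implementations usually bound this with a result queue of fixed size.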
NSW insert:
Nearest_Neighbor_Insert(object: new_object, integer: f, w)
SET[object]: neighbors = k-NNSearch(new_object, w, f)
for (i = 0; i < f; i++) do:
    neighbors[i].connect(new_object)
    new_object.connect(neighbors[i])
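A minimal Java sketch of the insert step is shown below. The class name NswInsertSketch and its fields are hypothetical, and to keep the example self-contained the neighbor search is an exhaustive scan standing in for k-NNSearch, so the restart count w is accepted but not used. The part that mirrors the pseudocode is insert: find the f nearest existing nodes, then link them to the new node in both directions.

```java
import java.util.*;

/** Minimal sketch of NSW insertion; the neighbor search is a brute-force stand-in. */
class NswInsertSketch {
    final List<float[]> vectors = new ArrayList<>();       // node id -> vector
    final List<Set<Integer>> friends = new ArrayList<>();  // node id -> neighbor ids

    /** Squared Euclidean distance (an assumed metric for this sketch). */
    static double dist(float[] a, float[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    /** Stand-in for k-NNSearch(q, w, f): exhaustive scan, so w (restarts) is unused here. */
    List<Integer> kNNSearch(float[] q, int w, int f) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < vectors.size(); i++) ids.add(i);
        ids.sort(Comparator.comparingDouble((Integer id) -> dist(vectors.get(id), q)));
        return new ArrayList<>(ids.subList(0, Math.min(f, ids.size())));
    }

    /** Nearest_Neighbor_Insert: link the new node to its f nearest neighbors, both ways. */
    int insert(float[] v, int f, int w) {
        List<Integer> neighbors = kNNSearch(v, w, f);  // search before the node itself is added
        int id = vectors.size();
        vectors.add(v);
        friends.add(new LinkedHashSet<>());
        for (int n : neighbors) {
            friends.get(n).add(id);   // neighbors[i].connect(new_object)
            friends.get(id).add(n);   // new_object.connect(neighbors[i])
        }
        return id;
    }

    public static void main(String[] args) {
        NswInsertSketch g = new NswInsertSketch();
        float[] xs = {0f, 10f, 1f, 9f, 5f};
        for (float x : xs) g.insert(new float[] {x}, 2, 1);
        System.out.println(g.friends);
    }
}
```

Early inserts see few candidates, so they end up linked to distant nodes (here 0 and 10 are connected). Those long-range links created by early insertions are what give the graph its navigable small-world character.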
The HNSW index structure in Lucene is as follows:
Meta data and index part:
+--------------------------------------------------+
| meta data                                        |
+--------+-----------------------------------------+
| doc id | offset to first friend list for the doc |
+--------+-----------------------------------------+
| doc id | offset to first friend list for the doc |
+--------+-----------------------------------------+
|                      ......                      |
+--------+-----------------------------------------+

Graph data part:
+-------------------------+---------------------------+---------+-------------------------+
| friends list at layer N | friends list at layer N-1 | ......  | friends list at level 0 | <- friends lists for doc 0
+-------------------------+---------------------------+---------+-------------------------+
| friends list at layer N | friends list at layer N-1 | ......  | friends list at level 0 | <- friends lists for doc 1
+-------------------------+---------------------------+---------+-------------------------+
|                                         ......                                          | <- and so on
+-----------------------------------------------------------------------------------------+

Vector data part:
+----------------------+
| encoded vector value | <- vector value for doc 0
+----------------------+
| encoded vector value | <- vector value for doc 1
+----------------------+
|        ......        | <- and so on
+----------------------+
Index writing logic
org.apache.lucene.codecs.lucene90.Lucene90HnswVectorsWriter#writeField
1. Write the vector data
2. Write the graph
3. Write the metadata
@Override
public void writeField(FieldInfo fieldInfo, VectorValues vectors) throws IOException {
  long pos = vectorData.getFilePointer();
  // write floats aligned at 4 bytes. This will not survive CFS, but it shows a small benefit when
  // CFS is not used, eg for larger indexes
  long padding = (4 - (pos & 0x3)) & 0x3;
  long vectorDataOffset = pos + padding;
  for (int i = 0; i < padding; i++) {
    vectorData.writeByte((byte) 0);
  }
  // TODO - use a better data structure; a bitset? DocsWithFieldSet is p.p. in o.a.l.index
  int[] docIds = new int[vectors.size()];
  int count = 0;
  for (int docV = vectors.nextDoc(); docV != NO_MORE_DOCS; docV = vectors.nextDoc(), count++) {
    // write vector
    writeVectorValue(vectors);
    docIds[count] = docV;
  }
  // count may be < vectors.size() e,g, if some documents were deleted
  long[] offsets = new long[count];
  long vectorDataLength = vectorData.getFilePointer() - vectorDataOffset;
  long vectorIndexOffset = vectorIndex.getFilePointer();
  if (vectors instanceof RandomAccessVectorValuesProducer) {
    writeGraph(
        vectorIndex,
        (RandomAccessVectorValuesProducer) vectors,
        fieldInfo.getVectorSimilarityFunction(),
        vectorIndexOffset,
        offsets,
        count,
        maxConn,
        beamWidth);
  } else {
    throw new IllegalArgumentException(
        "Indexing an HNSW graph requires a random access vector values, got " + vectors);
  }
  long vectorIndexLength = vectorIndex.getFilePointer() - vectorIndexOffset;
  writeMeta(
      fieldInfo,
      vectorDataOffset,
      vectorDataLength,
      vectorIndexOffset,
      vectorIndexLength,
      count,
      docIds);
  writeGraphOffsets(meta, offsets);
}
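The alignment arithmetic at the top of writeField is easy to misread, so here is a small standalone sketch of it. The class AlignmentSketch and the method padding are hypothetical names introduced only for this illustration; the expression itself is copied from the method above. It computes how many zero bytes must be written so that the next write starts at a file position that is a multiple of 4.

```java
/** Sketch of the 4-byte alignment arithmetic used before the vector data is written. */
class AlignmentSketch {
    /** Zero bytes needed to advance pos to the next multiple of 4 (0 if already aligned). */
    static long padding(long pos) {
        // pos & 0x3 is pos mod 4; the outer mask turns a result of 4 back into 0.
        return (4 - (pos & 0x3)) & 0x3;
    }

    public static void main(String[] args) {
        for (long pos = 0; pos < 8; pos++) {
            long pad = padding(pos);
            System.out.println("pos=" + pos + " padding=" + pad + " alignedOffset=" + (pos + pad));
        }
    }
}
```

For example, a file pointer of 13 needs 3 padding bytes, so the vector data starts at offset 16, while a pointer of 12 needs none.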
Introduction to HNSW – d0evi1's blog http://d0evi1.com/hnsw/
Delaunay Triangulation - luoru - cnblogs https://www.cnblogs.com/zfluo/p/5131851.html
HNSW study notes - Zhihu: Nearest neighbor (NN) search is widely used in all kinds of search and classification tasks; on very large data sets it becomes approximate nearest neighbor (ANN) search. Common algorithms include KD-trees, LSH, IVFPQ, and HNSW. HNSW (Hierarchical Navigable Small World) is a graph-based algorithm in the ANN search field … https://zhuanlan.zhihu.com/p/80552211
Discussion thread on adding HNSW
https://issues.apache.org/jira/browse/LUCENE-9004
Discussion thread on implementing hierarchical HNSW
HNSW connectivity issues
Here is the definition of small world
The following is about connectivity