HNSW introduction and some reference articles in Lucene 9
2022-07-03 07:30:00 【chuanyangwang】
NSW search:
K-NNSearch(object q, integer: m, k)
  Definitions: TreeSet[object] tempRes, candidates, visitedSet, result
  for (i = 0; i < m; i++) do:
    put a randomly chosen entry point into candidates
    tempRes = null
    repeat:
      select the element c in candidates that is closest to q
      remove c from candidates
      if c is farther from q than every element in result
        then break repeat
      for each e in c.friends:
        if e is not in visitedSet then
          add e to visitedSet, candidates, tempRes
    end repeat
    add tempRes to result
  end for
  return the first k elements of result
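A minimal Java sketch of the search above, assuming squared Euclidean distance and integer node ids. The class `NswSketch` and all of its members are hypothetical names for illustration, not Lucene code. Two details are assumptions rather than a literal transcription of the pseudocode: `tempRes` is folded into a `result` set capped at k elements, and the stop test fires once k results exist and c is farther from q than the worst of them. The entry point is also marked as visited. The `insert` method corresponds to the NSW insertion routine that follows: search for f neighbors using w restarts, then connect in both directions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory NSW graph; names are illustrative only.
final class NswSketch {
    final List<float[]> vectors = new ArrayList<>();      // vectors.get(id) is node id's vector
    final List<Set<Integer>> friends = new ArrayList<>(); // friends.get(id) is node id's neighbor set
    private final Random random;

    NswSketch(long seed) {
        this.random = new Random(seed);
    }

    // Squared Euclidean distance (an assumption; any metric would do).
    static double dist(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // k-NN search with m random restarts; returns up to k node ids, closest first.
    List<Integer> search(float[] q, int m, int k) {
        if (vectors.isEmpty() || k <= 0) {
            return new ArrayList<>();
        }
        // Order node ids by distance to q; ties are broken by id so the sets stay consistent.
        Comparator<Integer> byDistance = Comparator
            .comparingDouble((Integer id) -> dist(vectors.get(id), q))
            .thenComparingInt(Integer::intValue);
        TreeSet<Integer> result = new TreeSet<>(byDistance);
        Set<Integer> visited = new HashSet<>();
        for (int i = 0; i < m; i++) {
            TreeSet<Integer> candidates = new TreeSet<>(byDistance);
            int entry = random.nextInt(vectors.size());
            candidates.add(entry);
            if (visited.add(entry)) {
                result.add(entry);
                if (result.size() > k) result.pollLast();
            }
            while (!candidates.isEmpty()) {
                int c = candidates.pollFirst();
                // Stop once k results exist and c is farther than the worst of them.
                if (result.size() >= k
                    && dist(vectors.get(c), q) > dist(vectors.get(result.last()), q)) {
                    break;
                }
                for (int e : friends.get(c)) {
                    if (visited.add(e)) {
                        candidates.add(e);
                        result.add(e);
                        if (result.size() > k) result.pollLast();
                    }
                }
            }
        }
        return new ArrayList<>(result);
    }

    // Insert a vector: find up to f neighbors using w restarts, then link both ways.
    int insert(float[] v, int f, int w) {
        List<Integer> neighbors = search(v, w, f);
        int id = vectors.size();
        vectors.add(v);
        friends.add(new HashSet<>());
        for (int n : neighbors) {
            friends.get(n).add(id);
            friends.get(id).add(n);
        }
        return id;
    }
}
```

For example, inserting the one-dimensional points 0 through 9 in order with `insert(new float[] {i}, 2, 3)` and then calling `search(new float[] {4.2f}, 5, 1)` walks the graph toward node 4. Recomputing distances inside the comparator keeps the sketch short; a real implementation would cache them.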
NSW insert:
Nearest_Neighbor_Insert(object: new_object, integer: f, w)
  SET[object]: neighbors = k-NNSearch(new_object, w, f)
  for (i = 0; i < f; i++) do:
    neighbors[i].connect(new_object)
    new_object.connect(neighbors[i])

The HNSW index structure in Lucene is as follows:
Meta data and index part:
+--------------------------------------------------+
| meta data                                        |
+--------+-----------------------------------------+
| doc id | offset to first friend list for the doc |
+--------+-----------------------------------------+
| doc id | offset to first friend list for the doc |
+--------+-----------------------------------------+
|                      ......                      |
+--------+-----------------------------------------+

Graph data part:
+-------------------------+---------------------------+---------+-------------------------+
| friends list at layer N | friends list at layer N-1 | ......  | friends list at level 0 | <- friends lists for doc 0
+-------------------------+---------------------------+---------+-------------------------+
| friends list at layer N | friends list at layer N-1 | ......  | friends list at level 0 | <- friends lists for doc 1
+-------------------------+---------------------------+---------+-------------------------+
|                                         ......                                          | <- and so on
+-----------------------------------------------------------------------------------------+

Vector data part:
+----------------------+
| encoded vector value | <- vector value for doc 0
+----------------------+
| encoded vector value | <- vector value for doc 1
+----------------------+
|        ......        | <- and so on
+----------------------+
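
To make the layout more concrete, here is a hedged Java sketch of how a reader could locate one document's vector in the vector data part. It assumes fixed-width float32 values stored back to back in little-endian order and a sorted array of doc ids; neither detail is stated in the diagram. `VectorLayoutSketch` is a hypothetical class for illustration, not Lucene's reader.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical model of the "vector data part": one fixed-width record per document.
final class VectorLayoutSketch {
    final int dimension;
    final int[] docIds;          // sorted doc ids that have a vector (assumption)
    final ByteBuffer vectorData; // vectors stored consecutively, float32 (assumption)

    VectorLayoutSketch(int[] docIds, float[][] vectors) {
        this.dimension = vectors[0].length;
        this.docIds = docIds.clone();
        this.vectorData = ByteBuffer
            .allocate(vectors.length * dimension * Float.BYTES)
            .order(ByteOrder.LITTLE_ENDIAN);
        for (float[] v : vectors) {
            for (float x : v) {
                vectorData.putFloat(x);
            }
        }
    }

    // Byte offset of the vector with the given ordinal (position among docs with vectors).
    long vectorOffset(int ordinal) {
        return (long) ordinal * dimension * Float.BYTES;
    }

    // Returns the vector stored for docId, or null if the doc has no vector.
    float[] vectorForDoc(int docId) {
        int ordinal = Arrays.binarySearch(docIds, docId);
        if (ordinal < 0) {
            return null;
        }
        int base = (int) vectorOffset(ordinal);
        float[] out = new float[dimension];
        for (int i = 0; i < dimension; i++) {
            out[i] = vectorData.getFloat(base + i * Float.BYTES);
        }
        return out;
    }
}
```

Because every record has the same width, the offset is a simple multiplication. The graph data part cannot be addressed this way, since friend lists vary in length, which is why the meta part stores an explicit offset per document.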
Index-writing logic
org.apache.lucene.codecs.lucene90.Lucene90HnswVectorsWriter#writeField
1. Write the vectors
2. Write the graph
3. Write the metadata
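
Before writing any vectors, the method listed next pads the vector data file so that floats start on a 4-byte boundary, using the expression `(4 - (pos & 0x3)) & 0x3`. The small Java sketch below isolates that arithmetic so it can be checked by hand; `AlignmentSketch` and `padding` are hypothetical names, and only the expression itself comes from the Lucene source.

```java
// Hypothetical helper that isolates the alignment arithmetic used by writeField.
final class AlignmentSketch {
    // Number of zero bytes needed to advance pos to the next multiple of 4.
    static long padding(long pos) {
        // pos & 0x3 is pos mod 4; the outer mask turns a result of 4 into 0
        // for positions that are already aligned.
        return (4 - (pos & 0x3)) & 0x3;
    }

    public static void main(String[] args) {
        for (long pos = 0; pos < 8; pos++) {
            long pad = padding(pos);
            System.out.println("pos=" + pos + " padding=" + pad + " aligned=" + (pos + pad));
        }
    }
}
```

Without the outer `& 0x3`, an already aligned position would receive four unnecessary padding bytes.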
@Override
public void writeField(FieldInfo fieldInfo, VectorValues vectors) throws IOException {
  long pos = vectorData.getFilePointer();
  // write floats aligned at 4 bytes. This will not survive CFS, but it shows a small benefit when
  // CFS is not used, eg for larger indexes
  long padding = (4 - (pos & 0x3)) & 0x3;
  long vectorDataOffset = pos + padding;
  for (int i = 0; i < padding; i++) {
    vectorData.writeByte((byte) 0);
  }
  // TODO - use a better data structure; a bitset? DocsWithFieldSet is p.p. in o.a.l.index
  int[] docIds = new int[vectors.size()];
  int count = 0;
  for (int docV = vectors.nextDoc(); docV != NO_MORE_DOCS; docV = vectors.nextDoc(), count++) {
    // write vector
    writeVectorValue(vectors);
    docIds[count] = docV;
  }
  // count may be < vectors.size() e.g. if some documents were deleted
  long[] offsets = new long[count];
  long vectorDataLength = vectorData.getFilePointer() - vectorDataOffset;
  long vectorIndexOffset = vectorIndex.getFilePointer();
  if (vectors instanceof RandomAccessVectorValuesProducer) {
    writeGraph(
        vectorIndex,
        (RandomAccessVectorValuesProducer) vectors,
        fieldInfo.getVectorSimilarityFunction(),
        vectorIndexOffset,
        offsets,
        count,
        maxConn,
        beamWidth);
  } else {
    throw new IllegalArgumentException(
        "Indexing an HNSW graph requires a random access vector values, got " + vectors);
  }
  long vectorIndexLength = vectorIndex.getFilePointer() - vectorIndexOffset;
  writeMeta(
      fieldInfo,
      vectorDataOffset,
      vectorDataLength,
      vectorIndexOffset,
      vectorIndexLength,
      count,
      docIds);
  writeGraphOffsets(meta, offsets);
}

HNSW introduction - d0evi1's blog
http://d0evi1.com/hnsw/
Delaunay triangulation - luoru - cnblogs
https://www.cnblogs.com/zfluo/p/5131851.html
HNSW study notes - Zhihu: Nearest neighbor (NN) search is widely used in all kinds of search and classification tasks. On very large datasets it becomes approximate nearest neighbor (ANN) search; common algorithms include KD-trees, LSH, IVFPQ, and HNSW. HNSW (Hierarchical Navigable Small World) is a graph-based algorithm in the ANN search domain …
https://zhuanlan.zhihu.com/p/80552211
Discussion thread on adding HNSW
https://issues.apache.org/jira/browse/LUCENE-9004
Discussion thread on implementing hierarchical HNSW
HNSW connectivity issues
Here is the definition of small world
The following is about connectivity