[NLP] Putting vector retrieval models into production: bottlenecks and solutions!
2022-06-26 07:34:00 【Demeanor 78】
Author | Maple Xiao Qi
Edited by | NewBeeNLP
The huge memory consumption of dense vector retrieval has always been a bottleneck limiting its deployment. In fact, the 768-dimensional dense vectors generated by DPR contain a great deal of redundant information, and we can use compression methods to trade a small loss in accuracy for a significant reduction in memory usage.
Today I share a paper from EMNLP 2021 that examines three simple and effective compression methods:
Unsupervised PCA dimensionality reduction
Supervised fine-tuning with dimensionality reduction
Product quantization
The experiments show that simple PCA dimensionality reduction is highly cost-effective: it achieves a 48x compression ratio with less than 3% loss in top-100 accuracy, and a 96x compression ratio with less than 4% loss.

Simple and Effective Unsupervised Redundancy Elimination to Compress Dense Vectors for Passage Retrieval
Introduction
Over the past two years, dense retrieval models typified by DPR have been widely used in open-domain question answering and related fields. Although DPR provides more accurate retrieval results, the vector index it generates consumes a very large amount of memory.
For example, when indexing Wikipedia, a BM25 inverted index occupies only 2.4GB of memory, while the 768-dimensional dense vectors generated by DPR require 61GB, more than 24 times as much. Across multiple datasets, this extra 24x memory buys only a 2.5% average improvement in top-100 accuracy.
One can guess that the 768-dimensional dense vectors generated by DPR may be larger than necessary and contain a lot of redundancy, so we can try to trade a small loss of accuracy for a significant reduction in memory usage. To this end, the paper explores three simple and effective dense vector compression methods: principal component analysis (PCA), product quantization (PQ), and supervised dimensionality reduction.
Quantifying Redundancy
First, let's verify whether DPR's dense vectors really contain redundancy. The authors use two common measures: PCA's 「explained variance ratio」 and 「mutual information」.
PCA uses eigendecomposition to transform a set of possibly correlated vectors into a coordinate system of linearly independent eigenvectors, keeping the directions with large variance and discarding those with small variance. The explained variance ratio measures how well a PCA reduction works:

$$\mathrm{EVR}(k) = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{d} \lambda_i}$$

where $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d$ are the eigenvalues (variances) sorted from largest to smallest, and $k$ and $d$ are the dimensions after and before the PCA reduction, respectively. This ratio tells us what fraction of the original vectors' variance is retained by keeping the top $k$ eigenvectors.
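As a minimal sketch (not from the paper; the function name and the synthetic data are illustrative), the explained variance ratio can be computed from the eigenvalues of the covariance matrix with numpy:

```python
import numpy as np

def explained_variance_ratio(vectors: np.ndarray, k: int) -> float:
    """Fraction of total variance kept by the top-k principal components."""
    centered = vectors - vectors.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # sorted largest -> smallest
    return eigvals[:k].sum() / eigvals.sum()

# Toy example: 1000 "768-dim" vectors that really live in a 200-dim subspace
rng = np.random.default_rng(0)
base = rng.normal(size=(1000, 200))
mixing = rng.normal(size=(200, 768))   # embed 200 latent dims into 768
vecs = base @ mixing
ratio = explained_variance_ratio(vecs, k=200)
print(round(ratio, 3))  # close to 1.0: 200 components explain nearly all variance
```

Because the toy vectors have only 200 truly independent directions, the top 200 components capture essentially all the variance, which mirrors the redundancy argument the paper makes about DPR's 768-dimensional vectors.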
Another way to measure vector redundancy is to compute the mutual information between the query vectors $h_q$ and the passage vectors $h_p$:

$$I(h_q; h_p) = \mathbb{E}_{p(h_q, h_p)}\left[\log \frac{p(h_q, h_p)}{p(h_q)\,p(h_p)}\right]$$

This quantity is intractable to compute exactly, but DPR's contrastive training objective (an InfoNCE-style loss) gives a lower-bound estimate of it. The estimate is upper-bounded by $\log N$, where $N$ is the number of passages scored against each query, so to compare it with PCA's explained variance ratio we can normalize the mutual information to $\hat{I} = I(h_q; h_p) / \log N$.
After PCA reductions of varying strength, the explained variance ratio and normalized mutual information vary with vector dimension as shown in the figure below. We can see that around 200 dimensions is a sweet spot: 「reducing the 768-dimensional vectors to 200 dimensions retains 90% of the variance and 99% of the mutual information, while further reduction causes the amount of information to drop rapidly.」

Dense Vector Compression
Next, we try three simple ways of compressing the dense vectors:
「Supervised Approach:」 We can simply add two linear layers on top of the bi-encoder, one for queries and one for passages, to perform the dimensionality reduction. During training we freeze the lower (encoder) parameters and fine-tune only the linear layers. We can also add an orthogonalization loss that encourages the two projection matrices to be orthogonal, which keeps the scale of the dot-product similarity after reduction consistent with that before reduction.

「Unsupervised Approach:」 We can pool the query vectors and passage vectors together and fit a PCA transformation on this vector set. At inference time, the fitted PCA transformation is applied to reduce the dimensionality of the vectors generated by DPR.
「Product Quantization:」 We can also use product quantization to further compress the vectors. Its basic principle is to split a $d$-dimensional vector into $M$ sub-vectors, quantize each sub-vector with $k$-means, and store each one as a $b$-bit centroid index. For example, a 768-dimensional float32 vector occupies $768 \times 32$ bits; decomposing it into $M$ sub-vectors each stored in $b$ bits compresses it to $M \cdot b$ bits, so the average number of bits per dimension drops from 32 to $M \cdot b / 768$.
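To make the mechanism concrete, here is a toy product quantizer in numpy. This is illustrative only: the paper provides no code, the function names are my own, and a production system would use a library such as faiss rather than this tiny hand-rolled k-means.

```python
import numpy as np

def pq_train(vectors, n_sub, n_centroids=4, n_iter=20, seed=0):
    """Train a product quantizer: independent k-means per sub-vector block
    (a minimal Lloyd's loop, for illustration only)."""
    rng = np.random.default_rng(seed)
    sub_d = vectors.shape[1] // n_sub
    codebooks = []
    for m in range(n_sub):
        block = vectors[:, m * sub_d:(m + 1) * sub_d]
        centroids = block[rng.choice(len(block), n_centroids, replace=False)].copy()
        for _ in range(n_iter):
            # Assign each sub-vector to its nearest centroid, then recenter
            assign = ((block[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(1)
            for c in range(n_centroids):
                members = block[assign == c]
                if len(members):
                    centroids[c] = members.mean(0)
        codebooks.append(centroids)
    return codebooks

def pq_encode(vectors, codebooks):
    """Store each sub-vector as the index of its nearest centroid."""
    sub_d = vectors.shape[1] // len(codebooks)
    codes = np.empty((len(vectors), len(codebooks)), dtype=np.uint8)
    for m, cb in enumerate(codebooks):
        block = vectors[:, m * sub_d:(m + 1) * sub_d]
        codes[:, m] = ((block[:, None, :] - cb[None]) ** 2).sum(-1).argmin(1)
    return codes

rng = np.random.default_rng(1)
data = rng.normal(size=(200, 8)).astype(np.float32)
codebooks = pq_train(data, n_sub=4, n_centroids=4)  # 4 centroids -> 2-bit codes
codes = pq_encode(data, codebooks)
print(codes.shape)  # (200, 4): each 8-dim vector stored as four 2-bit indices
```

In this toy setup, each 8-dimensional float32 vector (256 bits) shrinks to 4 sub-vectors x 2 bits = 8 bits, a 32x compression, ignoring the small shared codebooks.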
Experiment & Results
The authors test DPR's top-$k$ accuracy (the proportion of queries for which at least one of the top $k$ retrieved passages is correct) on NQ, TriviaQA, WQ, CuratedTREC, and SQuAD; see the original paper for the experimental details.
Dimensionality Reduction
Let's first compare supervised and unsupervised dimensionality reduction. Here PCA-* denotes unsupervised PCA reduction, Linear-* denotes supervised fine-tuning with the lower parameters frozen (only the linear layers are tuned), and DPR-* denotes joint fine-tuning of the linear layers with the lower parameters unfrozen. We can see that when the reduced dimension is relatively large, unsupervised PCA works better; when the dimension is small, supervised fine-tuning performs better, but by then the model's performance has also dropped significantly. So overall, unsupervised PCA is the more practical choice.
Although in theory Linear-* could learn the same linear mapping that PCA-* fits, it is not easy to make its parameters converge to such a good solution. In addition, freezing the lower parameters (Linear-*) yields better results than not freezing them (DPR-*), which is also caused by insufficient training. To sum up, in most cases a simple linear PCA transformation is enough to obtain a very good compression ratio.
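The unsupervised recipe above, fitting PCA on the pooled vectors and projecting at inference time, can be sketched in a few lines of numpy (the function names and synthetic data are my own, not the paper's code):

```python
import numpy as np

def fit_pca(vectors: np.ndarray, k: int):
    """Fit a k-dim PCA projection on the pooled query + passage vectors."""
    mean = vectors.mean(axis=0)
    cov = np.cov(vectors - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)     # ascending eigenvalue order
    components = eigvecs[:, ::-1][:, :k]       # top-k eigenvectors as columns
    return mean, components

def pca_reduce(vectors: np.ndarray, mean, components):
    """Project new DPR vectors into the reduced space at inference time."""
    return (vectors - mean) @ components

rng = np.random.default_rng(0)
pooled = rng.normal(size=(500, 768))   # stand-in for pooled query/passage vectors
mean, comps = fit_pca(pooled, k=200)
reduced = pca_reduce(pooled[:10], mean, comps)
print(reduced.shape)  # (10, 200)
```

Because the projection is linear, the reduced dot-product similarity approximates the original one, which is why this works as a drop-in index compression step.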

Product Quantization
Product quantization is a very effective compression method. Building on the results above, the authors further apply product quantization; the results are shown in the table below, where PQ-$b$ means each dimension occupies $b$ bits after quantization. As the table shows, PQ-1 compresses too aggressively: although its compression ratio is twice that of PQ-2, its accuracy drops by more than twice as much, which is very uneconomical.
To sum up, we consider PCA dimensionality reduction plus product quantization the best compression scheme. If we cap the average accuracy drop at 4%, we can compress the dense vectors 96x with PCA-128 + PQ-2, shrinking the memory footprint of the Wikipedia vector index from 61GB to 642MB and cutting retrieval time from 7570ms to 416ms.
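The 96x figure can be sanity-checked with simple arithmetic, assuming the original vectors are stored as float32 (the index sizes are from the paper; the byte math below is my own):

```python
# Original DPR vector: 768 dims x 32 bits = 3072 bytes per passage
orig_bits = 768 * 32
# PCA-128 + PQ-2: 128 remaining dims at 2 bits per dimension = 32 bytes
compressed_bits = 128 * 2
ratio = orig_bits // compressed_bits
print(ratio)  # 96
# Index level: 61 GB / 96 is about 651 MB, close to the reported 642 MB
print(round(61 * 1024 / ratio))  # 651
```

The small gap between 651MB and the reported 642MB is expected, since the raw index size is itself a rounded figure.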

Hybrid Search
A large body of work has shown that combining sparse retrieval (BM25) with dense retrieval improves performance. The simplest way is a linear weighted sum of the two scores:

$$s(q, p) = s_{\text{dense}}(q, p) + \alpha \cdot s_{\text{sparse}}(q, p)$$

Here we simply give the dense and sparse scores equal weight.
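As a minimal sketch of this score fusion (the function is hypothetical, and real systems typically normalize the two score distributions before summing):

```python
def hybrid_rank(dense, sparse, alpha=1.0, k=3):
    """Fuse two score dicts (doc_id -> score) by s = s_dense + alpha * s_sparse.
    Docs missing from one retriever contribute 0 from that side."""
    ids = set(dense) | set(sparse)
    fused = {i: dense.get(i, 0.0) + alpha * sparse.get(i, 0.0) for i in ids}
    return sorted(fused, key=fused.get, reverse=True)[:k]

# d2 is mediocre in both lists but wins once the scores are summed
print(hybrid_rank({"d1": 0.9, "d2": 0.5}, {"d2": 0.7, "d3": 0.6}))
# ['d2', 'd1', 'd3']
```

This illustrates why hybrid search helps: documents that both retrievers find moderately relevant are promoted above documents that only one retriever likes.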
Adding hybrid retrieval further improves performance. The figure below plots retrieval accuracy against index size for the different compression methods; on each curve the points from left to right are PQ-1, PQ-2, and w/o PQ, and the black dotted line is the Pareto frontier. The original 768-dimensional DPR vectors do not lie on the Pareto frontier, which shows there is room for improvement. Specifically, 「the PCA-256 + PQ-2 + hybrid search strategy reduces the 61GB index to 3.7GB, and its top-100 accuracy is even better than the original DPR's (+0.2%).」

Discussion
Among the bottlenecks restricting the deployment of dense retrieval models are inference latency and memory consumption. This paper shows experimentally that simple principal component analysis plus product quantization, supplemented by sparse retrieval, can greatly reduce memory usage and speed up retrieval while preserving accuracy, which makes it quite practical.