The Strongest Build for the Two-Tower Model: Is Google Playing with "Antiques" Again?
2022-07-07 21:55:00 【Zhiyuan community】
The two-tower model has proven to be a very effective modeling approach for search and question-answering tasks, and both its theory and its industrial use are quite mature. Depending on how many parameters the two towers share, it usually falls into two categories: the Siamese dual encoder (SDE), whose towers are fully symmetric in parameters and structure, and the asymmetric dual encoder (ADE), whose towers are not fully symmetric.
With this paper, after a long silence around the two-tower model, Google pushes it back to center stage and unveils its "strongest build". The paper examines the differences and connections between the two variants in detail and offers further empirical conclusions about two-tower architectures through experiments. It suits veterans who want to revisit a classic as well as newcomers who want a deep, systematic introduction.
Paper title:
Exploring Dual Encoder Architectures for Question Answering
Paper link:
https://arxiv.org/abs/2204.07120
Background
First, a quick primer: what are SDE and ADE? A dual encoder network encodes text1 and text2 separately into vector representations, then measures their similarity with a distance function such as cosine. SDE is a Siamese network that fully shares parameters: although it has two towers, query/user and doc/item actually use one set of parameters. ADE shares only some parameters or none at all, so it is effectively two networks with separate parameters. What the two have in common is that the inputs never interact deeply; by contrast, BERT is a typical interaction-based network. A typical application of the two-tower structure is recall or coarse ranking, scenarios with strict requirements on computation speed.
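The difference between the two setups can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the "encoder" below simply averages random token vectors and stands in for the transformer encoder, and every name (`make_params`, `encode`, the vocabulary, the seeds) is hypothetical rather than taken from the paper.

```python
import math
import random

def make_params(vocab, dim, seed):
    """Create toy parameters: one random embedding vector per token."""
    rng = random.Random(seed)
    return {tok: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for tok in vocab}

def encode(text, params):
    """Toy tower: average the token embeddings (stand-in for a real encoder)."""
    vecs = [params[tok] for tok in text.split()]
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

vocab = ["what", "is", "a", "dual", "encoder", "two", "towers"]

# SDE: the query tower and the doc tower use the very same parameters.
shared = make_params(vocab, dim=8, seed=0)
# ADE: each tower owns an independent set of parameters.
ade_query_params = make_params(vocab, dim=8, seed=1)
ade_doc_params = make_params(vocab, dim=8, seed=2)

query = "what is a dual encoder"
doc = "a dual encoder is two towers"

# The two texts are encoded separately and only meet in the final cosine.
sde_score = cosine(encode(query, shared), encode(doc, shared))
ade_score = cosine(encode(query, ade_query_params), encode(doc, ade_doc_params))
# With shared parameters, identical text on both sides scores (almost exactly) 1.
same_text_sde = cosine(encode(query, shared), encode(query, shared))
```

Because the two inputs are never processed together, document vectors can be computed offline and only the query needs encoding at serving time, which is what makes this structure attractive for recall and coarse ranking.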

The modeling idea behind the two-tower model is simple and easy to follow. The paper is short and to the point; its highlight is that it provides more general conclusions for two-tower application scenarios and clearly answers several questions:
- Which works better on QA tasks, ADE or SDE?
- Why does ADE perform poorly, and what can be done about it?
The authors reach reliable conclusions through sound and detailed experiments, so newcomers can also quickly pick up how to do scientific research through experiments (read: how to report to one's advisor).

Experiments
The authors run five groups of experiments on QA retrieval tasks, computing the similarity between a query and candidate answers (docs or passages). The evaluation tasks are MS MARCO and MultiReQA. The encoders are transformer-based, with cosine as the distance function. The goal is to explore how the degree of parameter sharing affects modeling quality. The five networks are the standard SDE and ADE of Figure 1, plus three variants:
- ADE with shared token embedder (ADE-STE)
- ADE with frozen token embedder (ADE-FTE)
- ADE with shared projection layer (ADE-SPL)
The experimental results are as follows:

Experimental conclusions:
- ADE performs significantly worse than SDE on multiple tasks. The authors' plausible explanation is that ADE is in essence two networks with different parameters, so query and doc are mapped into two entirely different vector spaces. Stronger evidence for this comes later.
- ADE-SPL performs comparably to SDE. The last three experiments use structures the authors propose to probe the degree of parameter sharing, and they also reveal which part of the network is key to limiting ADE's effectiveness. Merely sharing or freezing the parameters of the bottom token embedder brings no obvious improvement, but when the two towers share a fully connected layer at the very top, results close to SDE can be obtained. Why? The authors' guess is that the final MLP pulls the two towers' representations back into the same vector space.
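The five configurations compared above can be summarized as a small table of what the two towers have in common. This is a hedged sketch: the split of each tower into a token embedder, a transformer body, and a projection layer is an assumption made for illustration, the names `SharingConfig`, `VARIANTS`, and `tied_components` are hypothetical, and the flags reflect this article's description of the variants rather than the authors' code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SharingConfig:
    """What the query tower and the answer tower have in common (assumed layout)."""
    share_token_embedder: bool
    freeze_token_embedder: bool
    share_transformer: bool
    share_projection: bool

VARIANTS = {
    # SDE: everything is shared, the two towers are one network.
    "SDE": SharingConfig(share_token_embedder=True, freeze_token_embedder=False,
                         share_transformer=True, share_projection=True),
    # ADE: nothing is shared.
    "ADE": SharingConfig(share_token_embedder=False, freeze_token_embedder=False,
                         share_transformer=False, share_projection=False),
    # ADE-STE: only the bottom token embedder is shared.
    "ADE-STE": SharingConfig(share_token_embedder=True, freeze_token_embedder=False,
                             share_transformer=False, share_projection=False),
    # ADE-FTE: the token embedder is frozen instead of trained.
    "ADE-FTE": SharingConfig(share_token_embedder=False, freeze_token_embedder=True,
                             share_transformer=False, share_projection=False),
    # ADE-SPL: only the top projection layer is shared.
    "ADE-SPL": SharingConfig(share_token_embedder=False, freeze_token_embedder=False,
                             share_transformer=False, share_projection=True),
}

def tied_components(name):
    """List the components whose trainable parameters are tied across towers."""
    cfg = VARIANTS[name]
    parts = []
    if cfg.share_token_embedder:
        parts.append("token_embedder")
    if cfg.share_transformer:
        parts.append("transformer")
    if cfg.share_projection:
        parts.append("projection")
    return parts

for variant in VARIANTS:
    print(variant, tied_components(variant))
```

Read this way, ADE-SPL differs from plain ADE only in the last component, which is what makes its closeness to SDE notable.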
To illustrate this further, the authors ran another experiment: they computed embeddings for the queries and answers of the NaturalQuestions test set in advance, then used t-SNE to project and cluster them in a two-dimensional space. The striking finding is that a dual encoder's performance depends on whether the final query and answer embeddings lie in a comparable vector space.
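The intuition about comparable vector spaces can be illustrated with a toy numeric check. Note that this is not the authors' t-SNE analysis: it swaps in a much simpler proxy, the mean cosine similarity between paired vectors after the final projection, and all data here is synthetic. The random matrices, the noise level of 0.1, and the helper names are assumptions chosen purely for illustration.

```python
import math
import random

def rand_matrix(rng, dim):
    """A random dim x dim projection matrix (toy stand-in for a trained layer)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim)]

def project(matrix, vec):
    """Apply a linear projection: matrix times vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_pair_cosine(query_proj, answer_proj, pairs):
    """Average cosine between each projected query and its projected answer."""
    scores = [cosine(project(query_proj, q), project(answer_proj, a))
              for q, a in pairs]
    return sum(scores) / len(scores)

rng = random.Random(0)
dim = 16

# Synthetic "matching" pairs: each answer feature is its query feature plus noise.
pairs = []
for _ in range(50):
    q = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    a = [x + 0.1 * rng.gauss(0.0, 1.0) for x in q]
    pairs.append((q, a))

# Shared top layer: both sides go through the same projection.
shared = rand_matrix(rng, dim)
shared_mean = mean_pair_cosine(shared, shared, pairs)

# Separate top layers: each side has its own unrelated projection.
separate_mean = mean_pair_cosine(rand_matrix(rng, dim), rand_matrix(rng, dim), pairs)

print(f"shared projection:    {shared_mean:.3f}")
print(f"separate projections: {separate_mean:.3f}")
```

In this toy setting, matching pairs stay close when both sides pass through the same final layer, while unrelated final layers scatter them. In a trained ADE the two projections are learned jointly rather than random, so this only conveys the direction of the effect, not its size.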
