Text matching: [NAACL 2021] AugSBERT
2022-06-30 14:48:00 [User 1621453]
Background and challenges
Paper: https://arxiv.org/abs/2010.08240
Currently, state-of-the-art NLP architectures often build on a BERT model pre-trained on large text corpora such as Wikipedia and the Toronto Books Corpus. By fine-tuning deeply pre-trained BERT, many alternative architectures have been derived, for example DeBERTa, RetriBERT, RoBERTa, and others, which substantially improve the benchmarks on various language-understanding tasks. Among common NLP tasks, pairwise sentence scoring has wide applications in information retrieval, question answering, duplicate question detection, and clustering. Typically, two approaches are used: bi-encoders and cross-encoders.
- Cross-encoders: perform full (cross) self-attention over a given input and label candidate, and often achieve higher accuracy than their bi-encoder counterparts. However, they must recompute the encoding for every input and label; as a result, they produce no independent representation of the input, cannot support end-to-end retrieval, and are very slow at test time. For example, clustering 10,000 sentences has quadratic complexity and takes about 65 hours.
- Bi-encoders: perform self-attention over the input and the candidate label separately, map each into a dense vector space, and combine them only at the end to obtain the final score. Bi-encoders can therefore index the encoded candidates and compare these cached representations against each input, which greatly speeds up prediction. For the same 10,000-sentence clustering task, the time drops from about 65 hours to roughly 5 seconds. A strong bi-encoder BERT model, called Sentence-BERT (SBERT), was proposed by the Ubiquitous Knowledge Processing Lab (UKP-TUDA).
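The cost difference between the two bullets above can be sketched in a few lines. The "encoder" here is a deliberately toy character-frequency featurizer, not a real model; the point is only that a bi-encoder encodes each sentence once and scores pairs with a cheap vector comparison, whereas a cross-encoder would need one full forward pass per pair.

```python
import math

def toy_encode(sentence):
    """Stand-in for a bi-encoder: maps a sentence to a small dense vector
    (character-frequency features, purely illustrative)."""
    vec = [0.0] * 26
    for ch in sentence.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(u, v):
    # Vectors are already L2-normalized, so the dot product is the cosine.
    return sum(a * b for a, b in zip(u, v))

sentences = ["a new phone", "a brand new phone", "weather today"]

# Bi-encoder style: encode each sentence ONCE (linear cost); any pair can
# then be scored by a cheap vector comparison against the cached index.
cache = {s: toy_encode(s) for s in sentences}
score = cosine(cache["a new phone"], cache["a brand new phone"])

# Cross-encoder style would instead need one full model forward pass per
# pair: quadratic cost for all-pairs clustering, and no reusable
# per-sentence representation to index.
```

With a real model the encoding step is the expensive part, which is exactly why caching it (bi-encoder) turns 65 hours into seconds.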
On the other hand, no method is perfect in every respect, and bi-encoders are no exception. Compared with cross-encoders, bi-encoders usually perform worse and require large amounts of training data. The reason is that a cross-encoder can attend over both inputs simultaneously, whereas a bi-encoder must map each input independently into a meaningful vector space, which requires a sufficient number of training examples for fine-tuning.
To address this, poly-encoders were introduced. Poly-encoders use two separate transformers (similar to cross-encoders) but apply attention only between the top-level representations of the two inputs, yielding better performance than bi-encoders and much greater speed than cross-encoders. However, poly-encoders still have shortcomings: because of their asymmetric score function, they cannot be applied to tasks with symmetric similarity, and their representations cannot be efficiently indexed, which causes problems for retrieval over large corpora.
In this article, I would like to introduce a new method, data augmentation, that effectively combines cross-encoders and bi-encoders. The strategy, called Augmented SBERT (AugSBERT), uses a BERT cross-encoder to label a larger set of input pairs and thereby augment the training data of the SBERT bi-encoder. The SBERT bi-encoder is then fine-tuned on this larger augmented training set, which significantly improves performance. The idea is very similar to "Self-Supervised Learning by Relational Reasoning" in computer vision; simply put, it can be seen as a form of self-supervised learning for natural language processing. The details are introduced in the next section.
Technical highlights
The Augmented SBERT method covers three main scenarios for pairwise sentence regression or classification tasks.
Scenario 1: Fully annotated dataset (all sentence pairs are labeled)
In this case, a direct data-augmentation strategy is applied to prepare and extend the labeled dataset. There are three common levels of augmentation: character, word, and sentence.
However, the word level is the most appropriate for sentence-pair tasks. Based on the performance of trained bi-encoders, a few methods are recommended: insertion or replacement via contextual word embeddings (BERT, DistilBERT, RoBERTa, or XLNet), or replacement with synonyms (WordNet, PPDB). After creating the augmented text data, combine it with the original data and feed it to the bi-encoder.
However, in rare or exceptional labeled datasets, such simple word-replacement or augmentation strategies do not help data augmentation in sentence-pair tasks, and can even perform worse than a model without augmentation.
In short, the direct data-augmentation strategy involves three steps:
- Step 1: Prepare the fully labeled semantic textual similarity dataset (gold data)
- Step 2: Replace synonyms within the sentence pairs (silver data)
- Step 3: Train the bi-encoder (SBERT) on the extended (gold + silver) training dataset
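The three steps above can be sketched as follows. The synonym table here is a hypothetical miniature stand-in (the article suggests WordNet or PPDB), and the "training" step is represented only by assembling the gold + silver set.

```python
import random

# Hypothetical miniature synonym table; in practice this would come
# from WordNet or PPDB as suggested above.
SYNONYMS = {"quick": ["fast", "rapid"], "car": ["automobile"]}

def augment(sentence, rng):
    """Word-level synonym replacement for one sentence (silver data)."""
    out = []
    for w in sentence.split():
        out.append(rng.choice(SYNONYMS[w]) if w in SYNONYMS else w)
    return " ".join(out)

# Step 1: fully labeled gold data, (sentence_a, sentence_b, similarity).
gold = [("a quick car", "a fast automobile", 0.9)]

# Step 2: synonym replacement produces silver pairs with the SAME label.
rng = random.Random(0)
silver = [(augment(s1, rng), augment(s2, rng), label)
          for s1, s2, label in gold]

# Step 3: the bi-encoder (SBERT) would then be trained on gold + silver.
train_set = gold + silver
```

Note that the silver label is simply inherited from the gold pair, which is why this only works when the replacements preserve meaning.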
Scenario 2: Limited or small annotated dataset (few labeled sentence pairs)
In this case, since the labeled dataset (gold data) is limited, a pre-trained cross-encoder is used to weakly label unlabeled data from the same domain. However, randomly selecting two sentences usually yields a dissimilar (negative) pair; positive pairs are extremely rare. This skews the label distribution of the silver dataset heavily toward negative pairs. Therefore, two suitable sampling methods are recommended:
BM25 sampling (BM25): an algorithm based on lexical overlap, commonly used as a scoring function by many search engines. For each query sentence, the top k most similar sentences are retrieved from a uniquely indexed collection.
Semantic search sampling (SS): a bi-encoder (SBERT) trained on the gold data is used to retrieve the top k most similar sentences in the collection. For large collections, an approximate nearest-neighbor library such as Faiss can be used to retrieve the top k most similar sentences quickly. This addresses BM25's weakness on synonymous sentences with little or no lexical overlap.
Afterwards, the sampled sentence pairs are weakly labeled by the pre-trained cross-encoder and merged with the gold dataset. A bi-encoder is then trained on this extended training dataset. This model is called Augmented SBERT (AugSBERT). AugSBERT can improve existing bi-encoders and narrow the performance gap with cross-encoders.
In summary, AugSBERT for limited datasets involves three steps:
- Step 1: Fine-tune a cross-encoder (BERT) on the small gold dataset
- Step 2.1: Create new pairs by recombination, reduced via BM25 or semantic search
- Step 2.2: Weakly label the new pairs with the cross-encoder (silver dataset)
- Step 3: Train the bi-encoder (SBERT) on the extended (gold + silver) training dataset
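The steps above can be sketched as a runnable flow. The real pipeline uses a fine-tuned BERT cross-encoder; `cross_encoder_score` below is a hypothetical token-overlap stand-in so the data flow can be shown end to end.

```python
def cross_encoder_score(a, b):
    # Stand-in scorer: Jaccard token overlap. A real cross-encoder would
    # run joint self-attention over the concatenated pair instead.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)

# Small gold dataset (Step 1 would fine-tune the cross-encoder on this).
gold = [("how old are you", "what is your age", 1.0)]

# Step 2.1: candidate pairs from recombination, normally reduced by
# BM25 or semantic-search sampling (here just listed directly).
unlabeled = [("a new phone is out", "a new phone launch"),
             ("a new phone is out", "rain tomorrow")]

# Step 2.2: weakly label the sampled pairs -> silver dataset.
silver = [(a, b, cross_encoder_score(a, b)) for a, b in unlabeled]

# Step 3: the bi-encoder (SBERT) is then trained on gold + silver.
train_set = gold + silver
```

The key design point is that the (slow, accurate) cross-encoder is only run offline to produce labels, while the (fast) bi-encoder trained on its output is what serves at inference time.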
Scenario 3: No annotated dataset (only unlabeled sentence pairs)
This scenario arises when we want SBERT to perform well on data from a different (unannotated) domain. Essentially, SBERT cannot map sentences containing unseen terminology into a reasonable vector space. Therefore, a data-augmentation strategy for domain adaptation is proposed:
- Step 1: Train a cross-encoder (BERT) from scratch on the source dataset
- Step 2: Use this cross-encoder (BERT) to label the target dataset, i.e., the unlabeled sentence pairs
- Step 3: Finally, train a bi-encoder (SBERT) on the labeled target dataset

In general, AugSBERT benefits greatly when the source domain is fairly generic and the target domain is fairly specific. Conversely, when moving from a specific source domain to a generic target domain, performance improves only slightly.
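The domain-adaptation recipe can be sketched the same way. The trainer and scorer below are hypothetical stubs standing in for the BERT cross-encoder and SBERT; only the shape of the data flow matches the three steps, not the actual models.

```python
def train_cross_encoder(source_pairs):
    """Step 1 stand-in: 'fit' a cross-encoder on the labeled SOURCE domain.
    The stub ignores the data and returns a simple overlap scorer."""
    def score(a, b):
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / max(len(ta | tb), 1)
    return score

# Generic source domain with labels; specific target domain without.
source = [("good movie", "great film", 1.0)]
target_unlabeled = [("install the driver", "driver installation guide"),
                    ("install the driver", "baking a cake")]

scorer = train_cross_encoder(source)

# Step 2: label the unlabeled TARGET-domain pairs with the cross-encoder.
target_labeled = [(a, b, scorer(a, b)) for a, b in target_unlabeled]

# Step 3: the bi-encoder (SBERT) is then trained on `target_labeled`,
# adapting it to the target domain's vocabulary.
```

This mirrors the observation above: the cross-encoder trained on a generic source transfers its pairwise judgments onto specific target-domain vocabulary that the bi-encoder has never seen labeled.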