Analysis of ESIM short text matching model
2022-06-24 01:48:00 【Goose】
ESIM combines BiLSTM with an attention mechanism, and it performs very well on text matching.
Text matching asks whether two sentences stand in a certain relation. For example, given a question and a candidate answer, we need to decide whether the answer matches the question, so the task can be treated as a binary classification problem (output: yes or no). Work in this area is mainly based on the SNLI and MultiNLI corpora, where each example contains two sentences, a premise and a hypothesis, plus a label describing the relation between them. This article explains how ESIM approaches the problem.
1. Introduction
ESIM was designed for natural language inference: given a premise p, infer a hypothesis h. The training objective is to judge whether p and h are related, that is, whether h can be inferred from p. The model can therefore also be used for text matching, where the objective instead becomes whether the two sequences are paraphrases of each other.
2. Model structure
In the ESIM paper, the authors propose two structures, shown in the figure below: on the left, the sequential natural-language-inference model ESIM; on the right, HIM, which incorporates syntactic parse-tree information. This article focuses on the ESIM structure; if you are interested in HIM, please read the original paper.
ESIM consists of four parts: Input Encoding, Local Inference Modeling, Inference Composition, and Prediction.
2.1 Input Encoding
The input to this layer is typically pre-trained word vectors, or word indices passed through an embedding layer. The inputs then go through a bidirectional LSTM, whose main role is to encode them, which can also be understood as feature extraction. We keep the hidden state at every position, written as \bar{a}_i and \bar{b}_j, where i and j index positions (time steps), and a and b denote the premise p and hypothesis h mentioned above.
\begin{array}{l} \bar{a}_{i}=\operatorname{BiLSTM}(a, i) \\ \bar{b}_{j}=\operatorname{BiLSTM}(b, j) \end{array}
2.2 Local Inference Modeling
The next step is to analyze the relation between the two sentences. How? The first thing to notice is that we now have representation vectors for each word that encode both the current context and the interactions between words. If two words are more closely related, the distance and angle between their vectors are smaller: for example, (1,0) and (0,1) have a dot product of 0 (no connection), whereas (0.5,0.5) and (0.5,0.5) have a larger dot product (a stronger connection). With this in mind, let's see how ESIM performs the analysis.
First, compute the dot product between every pair of word vectors across the two sentences:

e_{ij}=\bar{a}_{i}^{\top} \bar{b}_{j}

As noted above, the more related two word vectors are, the larger their product. A softmax then turns these scores into weights:

\begin{array}{l} \tilde{a}_{i}=\sum_{j=1}^{\ell_{b}} \frac{\exp \left(e_{i j}\right)}{\sum_{k=1}^{\ell_{b}} \exp \left(e_{i k}\right)} \bar{b}_{j} \\ \tilde{b}_{j}=\sum_{i=1}^{\ell_{a}} \frac{\exp \left(e_{i j}\right)}{\sum_{k=1}^{\ell_{a}} \exp \left(e_{k j}\right)} \bar{a}_{i} \end{array}
Informally, the formulas above can be understood like this: take, say, the word "good" in the premise. We first score its relation to every word in the other sentence; the scores e_{ij} are normalized into weights, and the weighted sum of the other sentence's word vectors becomes a new, aligned representation of "good". Repeating this comparison word by word yields a new sequence.
The operation above is an attention mechanism; in \tilde{a}_{i} and \tilde{b}_{j}, the normalized fraction is the attention weight. Note that \tilde{a}_{i} is computed as a weighted sum over \bar{b}_{j}, not over \bar{a}_{j}; the same holds symmetrically for \tilde{b}_{j}.
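The alignment step above can be sketched in plain NumPy. This is a minimal illustration with toy dimensions; the arrays `a_bar` and `b_bar` stand in for the BiLSTM hidden states, and all names are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# toy encoded states: premise a (3 words), hypothesis b (4 words), hidden size 5
rng = np.random.default_rng(0)
a_bar = rng.normal(size=(3, 5))
b_bar = rng.normal(size=(4, 5))

# similarity matrix e_ij = <a_bar_i, b_bar_j>, shape (3, 4)
e = a_bar @ b_bar.T

# a_tilde_i: weighted sum of b_bar, weights normalized over j
a_tilde = softmax(e, axis=1) @ b_bar     # shape (3, 5)
# b_tilde_j: weighted sum of a_bar, weights normalized over i
b_tilde = softmax(e, axis=0).T @ a_bar   # shape (4, 5)
```

Note the asymmetry: the same score matrix `e` is normalized along different axes, so each sentence is re-expressed in terms of the other.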
The next step is to analyze the differences, so as to judge whether the connection between the two sentences is strong enough. ESIM computes the difference and the elementwise product between the aligned and original sequences, then concatenates all of this information into one sequence:

\begin{array}{l} m_{a}=\left[\bar{a} ; \tilde{a} ; \bar{a}-\tilde{a} ; \bar{a} \odot \tilde{a}\right] \\ m_{b}=\left[\bar{b} ; \tilde{b} ; \bar{b}-\tilde{b} ; \bar{b} \odot \tilde{b}\right] \end{array}
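The enhancement step is a simple concatenation. A NumPy sketch, continuing with toy stand-in arrays for the encoded and aligned sequences:

```python
import numpy as np

rng = np.random.default_rng(1)
a_bar = rng.normal(size=(3, 5))    # encoded premise, 3 words, hidden size 5
a_tilde = rng.normal(size=(3, 5))  # stand-in for the attention-aligned output

# concatenate original, aligned, difference, and elementwise product
m_a = np.concatenate([a_bar, a_tilde, a_bar - a_tilde, a_bar * a_tilde],
                     axis=-1)      # shape (3, 4*5) = (3, 20)
```

The per-word dimension grows by a factor of four, which is why the next stage starts with a projection layer F to bring it back down.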
2.3 Inference Composition
All the information was gathered into one sequence above because ESIM finally needs to integrate it and perform a global analysis. This is again done by running a BiLSTM over the two sequences:

\begin{array}{l} v_{a, t}=\operatorname{BiLSTM}\left(F\left(m_{a}\right), t\right) \\ v_{b, t}=\operatorname{BiLSTM}\left(F\left(m_{b}\right), t\right) \end{array}
Note that F is a single-layer feed-forward network (with ReLU as the activation function), used mainly to reduce the number of parameters and avoid overfitting. In addition, t above denotes the BiLSTM output at time step t.
Because different sentences yield output sequences v of different lengths, the BiLSTM outputs are pooled into a fixed-length vector to make the final analysis easier. Note that a plain summation is sensitive to sequence length, which would reduce the robustness of the model, so ESIM applies both average pooling and max pooling to each sequence and concatenates the results into one vector:

\begin{array}{l} v_{a, \text {ave}}=\sum_{i=1}^{\ell_{a}} \frac{v_{a, i}}{\ell_{a}}, \quad v_{a, \max }=\max _{i=1}^{\ell_{a}} v_{a, i} \\ v_{b, \text {ave}}=\sum_{j=1}^{\ell_{b}} \frac{v_{b, j}}{\ell_{b}}, \quad v_{b, \max }=\max _{j=1}^{\ell_{b}} v_{b, j} \\ v=\left[v_{a, \text {ave}} ; v_{a, \max } ; v_{b, \text {ave}} ; v_{b, \max }\right] \end{array}
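The pooling step can be sketched as follows. The two composition outputs deliberately have different lengths here, to show that the pooled vector is fixed-size regardless:

```python
import numpy as np

rng = np.random.default_rng(2)
# composition-layer outputs for two variable-length sequences, hidden size 8
v_a = rng.normal(size=(3, 8))   # premise: 3 time steps
v_b = rng.normal(size=(5, 8))   # hypothesis: 5 time steps

# average and max pooling over the time axis give fixed-size summaries,
# which are concatenated into one vector of length 4 * 8 = 32
v = np.concatenate([v_a.mean(axis=0), v_a.max(axis=0),
                    v_b.mean(axis=0), v_b.max(axis=0)])
```

Whatever the sentence lengths, `v` always has length four times the hidden size, which is what lets a fixed-size classifier sit on top.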
2.4 Prediction
Finally, the last step: the vector v is fed into a multi-layer perceptron classifier, with a softmax function at the output layer.
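A minimal sketch of the final classifier, with randomly initialized weights and hypothetical layer sizes (in practice the weights are learned and the hidden size is a tuned hyperparameter; the tanh hidden activation follows the ESIM paper's MLP):

```python
import numpy as np

def softmax(x):
    # numerically stable softmax for a 1-D vector
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

rng = np.random.default_rng(3)
v = rng.normal(size=32)                 # pooled fixed-length vector
W1, b1 = rng.normal(size=(32, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 2)), np.zeros(2)  # 2 classes: match / no match

h = np.tanh(v @ W1 + b1)                # hidden layer
probs = softmax(h @ W2 + b2)            # class probabilities, sum to 1
```

Training would minimize the cross-entropy between `probs` and the gold label, as described in the summary below.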
Summary
ESIM first feeds word embeddings of the two sentences (or pre-trained word vectors directly) into a BiLSTM. The BiLSTM outputs go through an attention computation: each word vector in p is represented by a weighted sum of all word vectors in h, and likewise each word vector in h by a weighted sum of all word vectors in p. The differences and elementwise products are then computed, and the two enhanced matrices are fed into a second BiLSTM. Its outputs are average- and max-pooled, the results concatenated, and the pooled vector is finally sent to a multi-layer perceptron classifier with a softmax output.
For text matching, the objective is to judge whether the two sentences are semantically matched (1 for a match, 0 otherwise), so the cross-entropy loss function is used.