The Attention Model (AM) in NLP
2022-07-29 06:11:00 【Quinn-ntmy】
1. The Encoder-Decoder framework
In the vast majority of the literature, AM models are attached to the Encoder-Decoder framework. However, the AM model itself does not depend on the Encoder-Decoder framework.
The Encoder-Decoder framework can be thought of as a general processing model for turning one sentence (or document) into another sentence (or document).
- Encoder: encodes the input sentence X, converting it through a nonlinear transformation into an intermediate semantic representation C: C = F(x1, x2, …, xm).
- Decoder: generates the word yi at step i from the intermediate semantic representation C of sentence X and the previously generated history y1, y2, …, yi-1: yi = g(C, y1, y2, …, yi-1).
Each yi is produced in turn, so the whole system looks like it generates the target sentence Y from the input sentence X.
When generating the words of the target sentence, no matter which word is being generated (y1, y2, or y3), the semantic encoding C of sentence X that is used is always the same one. In other words, every word in sentence X has the same influence on whichever target word yi is being generated. This amounts to a "distracted" model with no focus of attention.
【However, if the Encoder is an RNN, then in theory the later a word is fed in, the larger its influence, so the influence is not actually equal. This is why Google, when proposing the seq2seq model, found that feeding the input sentence in reverse order gives better translation results.】
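As an illustration of the fixed-C setup described above, here is a minimal sketch assuming PyTorch with GRU-based encoder and decoder modules; all class and variable names are illustrative and not from the original post.

```python
# Minimal sketch of a fixed-C Encoder-Decoder (illustrative; PyTorch assumed).
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hid_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, src):                   # src: (batch, src_len) token ids
        emb = self.embed(src)                 # (batch, src_len, emb_dim)
        outputs, h_n = self.rnn(emb)          # h_n: (1, batch, hid_dim)
        return h_n                            # final hidden state plays the role of C

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hid_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, prev_token, hidden):    # prev_token: (batch, 1); hidden starts as C
        emb = self.embed(prev_token)
        output, hidden = self.rnn(emb, hidden)
        logits = self.out(output.squeeze(1))  # scores for the next word y_i
        return logits, hidden
```

Every decoding step here sees the same C (passed in as the initial hidden state), which is exactly the "no focus of attention" behaviour described above.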
2. The Attention Model (AM)
Core idea: when translating the current target word, the attention model assigns each source (English) word a probability that represents how much attention is allocated to it.【It can be understood as each English word having a different degree of influence on, i.e. a different correlation with, the target word being translated.】
So when generating each target word yi, instead of always using the same intermediate semantic representation C, the model uses a Ci that changes with the word currently being generated.
Key point: the fixed intermediate semantic representation C is replaced by Ci, which incorporates the attention model and keeps changing according to the current output word.
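Written as formulas (a standard reconstruction of the change described above; the formula image from the original post is not reproduced, so take this as an assumption about its content):

```latex
% Without attention: every target word is generated from the same fixed C
y_i = g(C,\; y_1, y_2, \dots, y_{i-1})
% With attention: each target word y_i gets its own context C_i
y_i = g(C_i,\; y_1, y_2, \dots, y_{i-1})
```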
Example: the source sentence "Tom chase Jerry."
- C(Tom) = g(0.6 * f2("Tom"), 0.2 * f2("chase"), 0.2 * f2("Jerry"))
- C(chase) = g(0.2 * f2("Tom"), 0.7 * f2("chase"), 0.1 * f2("Jerry"))
- C(Jerry) = g(0.3 * f2("Tom"), 0.2 * f2("chase"), 0.5 * f2("Jerry"))
Each C(·) is the context used when generating the corresponding target word.
Here, f2 is the Encoder's transformation function for an input word. For example, if the Encoder is an RNN, the result of f2 is usually the hidden-state value of the node after the input xi at that time step has been read.【Role of the hidden layer: it abstracts features of the input data so that they become easier to separate linearly.】
g is the function with which the Encoder composes the intermediate representations of the individual words into the intermediate semantic representation of the whole sentence. In common practice, g is a weighted sum of the component elements; the formula often seen in papers is the weighted sum given below.
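The original post shows this formula as an image; a reconstruction of the standard weighted-sum form, consistent with the example that follows, is:

```latex
C_i = \sum_{j=1}^{T_x} a_{ij} \, h_j
```

where T_x is the length of the input sentence, h_j = f2(x_j) is the Encoder's representation of the j-th input word, and a_{ij} is the attention weight assigned to input word j when generating target word i.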
Suppose Ci is the context for the target word "Tom" (i.e. i is the step at which "Tom" is generated). Then Tx is 3, the length of the input sentence, h1 = f2("Tom"), h2 = f2("chase"), h3 = f2("Jerry"), and the corresponding attention-model weights are 0.6, 0.2, 0.2.
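A small numpy sketch of this weighted sum, with made-up hidden vectors (the numbers are only for illustration, not real model outputs):

```python
# Illustrative weighted sum C_i = sum_j a_ij * h_j for the target word "Tom".
import numpy as np

# h_j = f2(x_j): encoder hidden state for each input word (made-up values)
h = np.array([
    [0.1, 0.3, 0.5],   # h1 = f2("Tom")
    [0.7, 0.2, 0.4],   # h2 = f2("chase")
    [0.6, 0.9, 0.1],   # h3 = f2("Jerry")
])

# attention weights a_ij when generating the target word "Tom"
a = np.array([0.6, 0.2, 0.2])

C_tom = a @ h          # one context vector for this target word
print(C_tom)           # [0.32 0.4  0.4 ]
```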
3. How the attention probability distribution over words is obtained
How are the values above, (Tom, 0.6), (chase, 0.2), (Jerry, 0.2), actually obtained?
Suppose that in the framework above the Encoder uses an RNN model and the Decoder also uses an RNN model.
Refined model (the figure from the original post is not reproduced here).
The attention distribution probability is computed as follows:
- For an RNN Decoder, when generating the word yi at time i, we already know Hi, the output value of the hidden-layer node at time i, before yi is produced.
- We can then compare Hi, the hidden-layer state at time i, one by one with the RNN hidden-layer state hj corresponding to each word of the input sentence, i.e. compute a function F(hj, Hi) to obtain the alignment possibility between the target word yi and each input word. (Different papers choose this F function differently.)
- Finally, the outputs of F are normalized with Softmax to obtain an attention probability distribution with values between 0 and 1.
Most AM models follow the calculation framework above; they differ only in the definition of F. A small sketch of this computation is given below.
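A minimal sketch of the score-then-softmax computation just described, assuming numpy and a simple dot product as the F function (one of many possible choices, not necessarily the one used in any particular paper):

```python
# Compute attention weights a_ij = softmax_j( F(h_j, H_i) ) with F = dot product.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_weights(H_i, h_enc):
    """H_i: decoder hidden state at step i, shape (d,)
    h_enc: encoder hidden states h_j for every input word, shape (Tx, d)
    Returns the attention probability distribution over the input words."""
    scores = h_enc @ H_i          # F(h_j, H_i): here a simple dot product
    return softmax(scores)        # normalize to a 0~1 distribution summing to 1

# toy example with Tx = 3 input words and hidden size d = 4
h_enc = np.random.randn(3, 4)     # h1, h2, h3 from the encoder RNN
H_i = np.random.randn(4)          # decoder hidden state when generating y_i
a_i = attention_weights(H_i, h_enc)
print(a_i, a_i.sum())             # three probabilities that sum to 1.0
```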
The AM model is usually regarded as a word alignment model.
The attention probability distribution over the input-sentence words for each generated target word can be understood as the alignment probability between the input words and that target word.