The AM (Attention Model) in NLP
2022-07-29 06:11:00 【Quinn-ntmy】
1. The Encoder-Decoder Framework
In the vast majority of the literature, AM models are presented within the Encoder-Decoder framework. However, the AM model itself does not depend on the Encoder-Decoder framework.
Encoder-Decoder framework: it can be thought of as a general processing model for turning one sentence (or passage) into another sentence (or passage).
- Encoder: encodes the input sentence X, converting it through a nonlinear transformation into an intermediate semantic representation C: C = F(x1, x2, …, xm).
- Decoder: generates the word yi at step i from the intermediate semantic representation C of sentence X and the previously generated history y1, y2, …, yi-1: yi = g(C, y1, y2, …, yi-1).
Each yi is produced in turn, so the whole system generates the target sentence Y from the input sentence X.
When generating the words of the target sentence, no matter which word is being generated (y1, y2, or y3), they all use the same semantic encoding C of sentence X, with no difference between them. In other words, every word of the input sentence X has the same influence on generating any target word yi; this amounts to an attention model without any focus.
【Note: if the Encoder is an RNN, then in theory words entered later have a greater influence, so the input words are not actually treated equally. This is also why, when Google later proposed the seq2seq model, they found that feeding the input sentence in reverse order produces better translations.】
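Below is a minimal sketch of this plain Encoder-Decoder framework, assuming PyTorch GRUs on both sides; the class and variable names (Encoder, Decoder, prev_word, ...) are illustrative rather than taken from any particular paper. The final encoder hidden state plays the role of the fixed semantic code C, and every decoding step is conditioned on that same C, which is exactly the attention-less behaviour described above.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def forward(self, src):                # src: (batch, src_len) word ids of sentence X
        outputs, h_n = self.rnn(self.embed(src))
        return h_n                         # final hidden state, used as the fixed semantic code C

class Decoder(nn.Module):
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, prev_word, hidden):  # prev_word: (batch, 1) previous yi-1; hidden starts as C
        output, hidden = self.rnn(self.embed(prev_word), hidden)
        return self.out(output.squeeze(1)), hidden   # logits for yi, updated decoder state
```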
2. The AM (Attention Model)
The core idea, expressed as a formula: when generating each target word yi, the decoder uses its own context vector Ci instead of the fixed C, i.e. yi = g(Ci, y1, y2, …, yi-1).
The probability attached to each word represents how much attention the attention-allocation model assigns to the different English words when translating the current word. 【It can be understood as each English word having a different degree of influence, or relevance, when a given target word is translated.】
So when generating each word yi, the decoder no longer relies on one and the same intermediate semantic representation C. The key change is that the fixed intermediate semantic representation C is replaced by a Ci that incorporates the attention model and varies with the word currently being generated.
Example: “Tom chase Jerry.”
- C_Tom = g(0.6 * f2(“Tom”), 0.2 * f2(“chase”), 0.2 * f2(“Jerry”))
- C_Chase = g(0.2 * f2(“Tom”), 0.7 * f2(“chase”), 0.1 * f2(“Jerry”))
- C_Jerry = g(0.3 * f2(“Tom”), 0.2 * f2(“chase”), 0.5 * f2(“Jerry”))
Here the function f2 represents some transformation the Encoder applies to an input word. For example, if the Encoder is an RNN model, the result of f2 is usually the hidden-state value of the node after the input xi at the corresponding time step. 【The role of the hidden layer: abstract features from the input data so that they become easier to separate linearly.】
The function g represents how the Encoder combines the intermediate representations of the individual words into the intermediate semantic representation of the whole sentence. In practice, g is usually a weighted sum of its components, and the formula commonly seen in papers is: Ci = Σ_{j=1..Tx} aij * hj.
Suppose the i in Ci corresponds to the target word for “Tom”; then Tx is 3, the length of the input sentence; h1 = f2(“Tom”), h2 = f2(“chase”), h3 = f2(“Jerry”); and the corresponding attention-model weights are 0.6, 0.2, 0.2.
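As a quick numeric illustration of this weighted sum, here is a minimal sketch in PyTorch; the 4-dimensional hj vectors are made-up random values, and only the 0.6 / 0.2 / 0.2 weights come from the example above.

```python
import torch

h = torch.randn(3, 4)                    # stand-ins for h1 = f2("Tom"), h2 = f2("chase"), h3 = f2("Jerry")
a_tom = torch.tensor([0.6, 0.2, 0.2])    # attention weights when generating the target word for "Tom"

c_tom = (a_tom.unsqueeze(1) * h).sum(dim=0)   # Ci = sum_j aij * hj
# equivalently: c_tom = a_tom @ h
print(c_tom.shape)                       # torch.Size([4]): one context vector Ci per target word
```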
3. The Probability Distribution Values of Word Attention Allocation
How are the values above, (Tom, 0.6) (chase, 0.2) (Jerry, 0.2), obtained?
Suppose that, in the framework above, the Encoder uses an RNN model and the Decoder also uses an RNN model.
(Figure: the refined model, an RNN-based Encoder-Decoder with attention.)
The attention-distribution probabilities are computed as follows:
- For an RNN Decoder, when the word yi is about to be generated at time i, the output value Hi of the decoder's hidden-layer node is already known before yi is produced.
- The hidden-layer state Hi at time i can then be compared, one by one, with the RNN hidden-layer state hj corresponding to each word of the input sentence; that is, a function F(hj, Hi) is used to obtain the alignment likelihood between the target word yi and each input word. (Different papers implement F in different ways.)
- Finally, the outputs of F are normalized with Softmax to obtain attention-distribution probability values between 0 and 1.
Most AM models adopt the calculation framework above; only the choice of F differs (see the sketch below).
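As a minimal sketch of this calculation process, assuming a plain dot product as the alignment function F (again, different papers use additive, bilinear, or other scoring functions instead); the function name attention_weights is illustrative only.

```python
import torch
import torch.nn.functional as nnF

def attention_weights(H_i, encoder_states):
    # H_i: (hidden,) decoder hidden state; encoder_states: (src_len, hidden), rows are the hj
    scores = encoder_states @ H_i        # F(hj, Hi) as a dot product: one alignment score per input word
    return nnF.softmax(scores, dim=0)    # normalize to 0~1 probabilities that sum to 1

weights = attention_weights(torch.randn(8), torch.randn(3, 8))
print(weights)   # three probabilities over "Tom", "chase", "Jerry"; after training, something like (0.6, 0.2, 0.2)
```

The resulting weights are the aij used in the weighted sum of section 2 to build the context vector Ci.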
The AM model is usually regarded as a word-alignment model.
The probability distribution over input-sentence words associated with each generated target word can be understood as the alignment probability between the input words and that generated target word.