ABSA1: Attentional Encoder Network for Targeted Sentiment Classification
2022-07-29 06:12:00 【Quinn-ntmy】
Paper: Attentional Encoder Network for Targeted Sentiment Classification (click to download the PDF)
Source code: an ABSA model collection (PyTorch edition)
I. Introduction
Previously, most models built for the ABSA problem followed the RNN + Attention paradigm.
Problems with existing approaches:
- RNN-family models (e.g. LSTM, the all-purpose workhorse of NLP tasks) are very expressive but hard to parallelize, and backpropagation through time demands a great deal of memory and computation. In practice almost every RNN training algorithm uses truncated BPTT, which hurts the model's ability to capture dependencies over longer spans. LSTM alleviates the vanishing-gradient problem to some extent, but it usually needs a lot of training data.
- Most previous work ignores the label unreliability issue: the neutral label is a fuzzy expression of sentiment, so training samples with neutral sentiment labels are not trustworthy.
II. Solution
- Propose an attention-based model that uses attention to draw out the introspective semantics within the target and context words and the interactive semantics between them.
- For the label unreliability issue, add an effective label smoothing regularization (LSR) term to the loss function, which encourages the model to learn from fuzzy labels.
For background on label smoothing regularization (LSR), see:
https://zhuanlan.zhihu.com/p/64970719
III. Model Structure: AEN
AEN consists of an embedding layer, an attentional encoder layer, a target-specific attention layer, and an output layer.
1. Embedding Layer
There are two embedding options:
(1) GloVe embedding;
(2) BERT embedding: the given context and target need to be converted to "[CLS] + context + [SEP]" and "[CLS] + target + [SEP]" respectively.
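The snippet below is a minimal sketch of preparing these BERT inputs, assuming the HuggingFace transformers tokenizer; the example sentence and target are illustrative.

```python
# Sketch: building "[CLS] + context + [SEP]" and "[CLS] + target + [SEP]"
# inputs for AEN-BERT. Assumes the HuggingFace `transformers` package.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

context = "the food was great but the service was slow"  # illustrative example
target = "service"

# encode() prepends [CLS] and appends [SEP] automatically.
context_ids = tokenizer.encode(context)  # [CLS] context [SEP]
target_ids = tokenizer.encode(target)    # [CLS] target [SEP]
```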
2. Attentional Encoder Layer
The attentional encoder layer is a parallelizable, interactive alternative to LSTM that computes the hidden states of the input embeddings. It consists of two submodules: Multi-Head Attention (MHA) and the Point-wise Convolution Transformation (PCT). In effect, feature extraction is MHA followed by PCT.
(1) MHA (Multi-Head Attention)
- Given the context embedding e^c, Intra-MHA, i.e. self-attention, models the introspective semantics of the context words: c^intra = MHA(e^c, e^c).
- Given the context embedding e^c and the target embedding e^t, Inter-MHA, i.e. conventional attention, models the context-aware target words: t^inter = MHA(e^c, e^t), where the first argument supplies the keys and the second the queries (see the sketch after this list).
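Below is a minimal sketch of the two attention variants using PyTorch's built-in nn.MultiheadAttention; sharing one module instance for both calls is a simplification (the paper learns separate parameters), and all shapes are illustrative.

```python
# Sketch: Intra-MHA (self-attention over the context) and Inter-MHA
# (target attends to the context). Illustrative, not the paper's exact code.
import torch
import torch.nn as nn

embed_dim, n_heads = 300, 6
mha = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)

e_c = torch.randn(8, 20, embed_dim)  # context embeddings [batch, ctx_len, dim]
e_t = torch.randn(8, 4, embed_dim)   # target embeddings  [batch, tgt_len, dim]

# Intra-MHA: query, key, and value all come from the context.
c_intra, _ = mha(e_c, e_c, e_c)      # introspective context representation

# Inter-MHA: query = target, key/value = context.
t_inter, _ = mha(e_t, e_c, e_c)      # context-aware target representation
```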
(2) PCT (Point-wise Convolution Transformation)
PCT transforms the contextual information collected by MHA. "Point-wise" means the convolution kernel size is 1. Applying PCT to the two attention outputs above, PCT(h) = ELU(h ∗ W1 + b1) ∗ W2 + b2 (where ∗ denotes the point-wise convolution), gives the output hidden states of the attentional encoder layer: h^c = PCT(c^intra) and h^t = PCT(t^inter).
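A minimal PCT sketch with two kernel-size-1 Conv1d layers and an ELU in between; the dimensions are illustrative.

```python
# Sketch: Point-wise Convolution Transformation (kernel size 1).
import torch
import torch.nn as nn

class PointwiseConvTransform(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.conv1 = nn.Conv1d(dim, dim, kernel_size=1)
        self.conv2 = nn.Conv1d(dim, dim, kernel_size=1)
        self.act = nn.ELU()

    def forward(self, h):          # h: [batch, seq_len, dim]
        x = h.transpose(1, 2)      # Conv1d expects [batch, dim, seq_len]
        x = self.conv2(self.act(self.conv1(x)))
        return x.transpose(1, 2)   # back to [batch, seq_len, dim]

pct = PointwiseConvTransform(300)
h_c = pct(torch.randn(8, 20, 300))  # e.g. h^c = PCT(c_intra)
```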
3. Target-specific Attention Layer
After obtaining the introspective context representation h^c and the context-aware target representation h^t, another MHA is applied to obtain the target-specific context representation: h^tsc = MHA(h^c, h^t).
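A short sketch of this step, again with nn.MultiheadAttention (query = h^t, key/value = h^c); the module instance and shapes are illustrative.

```python
# Sketch: target-specific attention, h^tsc = MHA(h^c, h^t).
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(300, 6, batch_first=True)
h_c = torch.randn(8, 20, 300)  # introspective context representation
h_t = torch.randn(8, 4, 300)   # context-aware target representation

h_tsc, _ = mha(h_t, h_c, h_c)  # target-specific context representation
```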
4. Output Layer
The representations produced by the previous steps are average-pooled, concatenated into the final representation, and projected by a fully connected layer into the space of the C target classes (see the sketch below).
【The role of pooling: it shrinks the feature map, which reduces computation and memory, i.e. feature dimensionality reduction. Average pooling preserves the overall characteristics of the data well and highlights background information; max pooling better preserves texture features.】
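A minimal sketch of the output layer; the dimensions, three-way concatenation, and class count are illustrative assumptions.

```python
# Sketch: average pooling -> concatenation -> linear projection -> softmax.
import torch
import torch.nn as nn

dim, num_classes = 300, 3
fc = nn.Linear(3 * dim, num_classes)

h_c = torch.randn(8, 20, dim)
h_t = torch.randn(8, 4, dim)
h_tsc = torch.randn(8, 4, dim)

# Average-pool each representation over its sequence length, then concatenate.
o = torch.cat([h.mean(dim=1) for h in (h_c, h_t, h_tsc)], dim=-1)  # [8, 3*dim]
probs = torch.softmax(fc(o), dim=-1)  # class probabilities
```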
5. Loss Function
To deal with unreliable labels, LSR is introduced into the loss.
A quick review of LSR (Label Smoothing Regularization):
LSR adds noise to the output label y, which constrains the model and reduces overfitting; it is used in classification problems.
In a classification problem, p(y|x) is the predicted probability distribution and q(y|x) is the ground-truth distribution over the classes, usually in one-hot form: the true class is marked 1 and all others 0. One-hot targets have two problems:
- they easily cause overfitting;
- the model becomes over-confident in its predictions, which can make them deviate severely from the facts.
LSR solves both problems by introducing a prior distribution u(y), usually the uniform distribution 1/k, where k is the number of classes, and a smoothing factor ϵ ∈ [0, 1]. The smoothed target distribution is
q'(y|x) = (1 − ϵ) q(y|x) + ϵ u(y).
This formula effectively adds noise to the label y, preventing the model from concentrating its predicted probability on the high-probability class and assigning some probability mass to the low-probability classes.
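A minimal sketch of a label-smoothed cross-entropy loss under a uniform prior u(y) = 1/k; the function name and parameters are illustrative.

```python
# Sketch: cross-entropy against smoothed targets q' = (1-eps)*one_hot + eps/k.
import torch
import torch.nn.functional as F

def lsr_cross_entropy(logits, labels, eps=0.1):
    k = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    one_hot = F.one_hot(labels, k).float()
    q_smooth = (1.0 - eps) * one_hot + eps / k
    return -(q_smooth * log_probs).sum(dim=-1).mean()

logits = torch.randn(8, 3)            # e.g. 3 sentiment classes
labels = torch.randint(0, 3, (8,))
loss = lsr_cross_entropy(logits, labels)
```

In recent PyTorch versions the same effect is available directly via F.cross_entropy(logits, labels, label_smoothing=eps).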