ICLR 2022 | Pre-trained language model based on an adversarial self-attention mechanism
2022-07-06 22:45:00 [Zhiyuan Community]

Paper title:
Adversarial Self-Attention For Language Understanding
ICLR 2022
https://arxiv.org/pdf/2206.12608.pdf
There is a great deal of evidence that self-attention can benefit from allowing bias, that is, from adding a certain degree of prior (such as masking or smoothing of the attention distribution) to the original attention structure. Such priors can help the model learn useful knowledge from a smaller corpus. However, they are generally task-specific, which makes it difficult to extend the model to a wide range of tasks. Adversarial training improves model robustness by adding perturbations to the input. The authors found that adding perturbations only to the input embeddings hardly confuses the attention maps: the model's attention does not change before and after the perturbation.
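To make "allowing bias" concrete, here is a minimal pure-Python sketch, assuming the prior is expressed as an additive bias matrix on the scaled dot-product attention logits (a common way to implement masking). It is an illustration only, not the paper's code; the function name `biased_attention` and the toy matrices are hypothetical.

```python
import math

NEG_INF = float("-inf")


def softmax(row):
    """Numerically stable softmax; entries equal to -inf get probability 0."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]  # math.exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def biased_attention(q, k, v, bias):
    """Scaled dot-product attention with an additive prior on the logits.

    bias[i][j] = 0 keeps the pair (i, j); bias[i][j] = -inf masks it out.
    A smoothing-style prior could use finite values instead.
    """
    d = len(q[0])
    scores = matmul(q, transpose(k))
    logits = [[s / math.sqrt(d) + b for s, b in zip(srow, brow)]
              for srow, brow in zip(scores, bias)]
    weights = [softmax(row) for row in logits]
    return weights, matmul(weights, v)


# Hypothetical toy input: 3 tokens, 2-dimensional queries/keys/values.
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# Prior: no token may attend to token 2.
bias = [[0.0, 0.0, NEG_INF] for _ in range(3)]
weights, out = biased_attention(q, k, v, bias)
```

Every row of `weights` still sums to 1, but the masked column receives exactly zero weight. A hand-written prior like this is fixed in advance, which is the task-specificity the paragraph above points out.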
Adversarial Self-Attention (ASA) instead maximizes the empirical training risk and learns a biased (or adversarial) structure, automating the process of building the prior. Because the adversarial structure is learned from the input data, ASA differs from both conventional adversarial training and existing self-attention variants. A gradient reversal layer combines the model and the adversary into a single network. ASA is also interpretable by nature.
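The role of the gradient reversal layer can be illustrated with a toy scalar example. This is a sketch under stated assumptions: the loss `toy_loss` and the parameters `theta` (model) and `eta` (adversary) are hypothetical stand-ins with hand-derived gradients, and the toy does not model ASA's attention masks. It only shows the mechanism: the layer is the identity in the forward pass and flips the gradient sign in the backward pass, so one ordinary descent step minimizes the loss for the model and maximizes it for the adversary.

```python
def grad_reverse_backward(upstream_grad, lam=1.0):
    """Backward rule of a gradient reversal layer (GRL).

    Forward pass: identity. Backward pass: multiply the incoming
    gradient by -lam before it reaches the adversary's parameters.
    """
    return -lam * upstream_grad


def toy_loss(theta, eta):
    # Hypothetical stand-in for the training risk: the model (theta)
    # tries to match eta, while the adversary (eta) is pushed to raise it.
    return (theta - eta) ** 2


def joint_step(theta, eta, lr=0.1, lam=1.0):
    """One plain gradient-descent step on both parameters at once.

    eta's gradient passes through the GRL, so "descending" on the shared
    loss is gradient ascent for the adversary.
    """
    d_theta = 2.0 * (theta - eta)   # dL/dtheta
    d_eta = -2.0 * (theta - eta)    # dL/deta
    theta_new = theta - lr * d_theta
    eta_new = eta - lr * grad_reverse_backward(d_eta, lam)
    return theta_new, eta_new


theta, eta = 0.0, 1.0
base = toy_loss(theta, eta)
theta_new, eta_new = joint_step(theta, eta)
```

Evaluated separately, the model's update lowers the loss (`toy_loss(theta_new, eta)` is 0.64 versus 1.0), while the adversary's update raises it (`toy_loss(theta, eta_new)` is 1.44). This is why a single optimizer, with one GRL inserted, is enough to train the min-max game.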