ICLR 2022 | Pre-trained language models based on an adversarial self-attention mechanism
2022-07-06 22:45:00 【Zhiyuan community】

Paper title:
Adversarial Self-Attention For Language Understanding
ICLR 2022
https://arxiv.org/pdf/2206.12608.pdf
There is a great deal of evidence that self-attention can benefit from allowing bias, that is, from adding certain priors (such as masking or smoothing of the attention distribution) to the original attention structure. Such priors help the model learn useful knowledge from smaller corpora. However, they are generally task-specific, which makes it difficult to extend the model to a wide range of tasks. Adversarial training improves the robustness of a model by adding perturbations to its input. The authors found that adding perturbations to the input embeddings alone hardly confuses the attention maps: the model's attention does not change before and after the perturbation.
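As a rough illustration of what "adding a prior to the attention structure" can mean, the sketch below applies a hand-written mask to a row of attention scores before the softmax. The function names `softmax` and `masked_attention` and the toy scores are assumptions made for this example; they are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_attention(scores, mask):
    """Apply a masking prior to raw attention scores.

    mask[i] is True when token i should be hidden from attention.
    At least one token must stay visible, otherwise the softmax is undefined.
    """
    biased = [float("-inf") if hidden else s for s, hidden in zip(scores, mask)]
    return softmax(biased)


# Toy example (hypothetical values): hide the second token.
weights = masked_attention([1.0, 2.0, 3.0], [False, True, False])
print(weights)
```

The masked token receives zero weight and the remaining weights are renormalized. A hand-crafted mask like this has to be chosen per task, which is the limitation the summary points out.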
To address this, the authors propose Adversarial Self-Attention (ASA), which maximizes the empirical training risk and learns biased (or adversarial) structures by automating the process of building prior knowledge. The adversarial structures are learned from the input data, which distinguishes ASA from conventional adversarial training and from other variants of self-attention. A gradient reversal layer is used to combine the model and the adversary into a single whole. ASA is also naturally interpretable.
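The gradient reversal (or gradient inversion) idea can be sketched on a toy scalar problem. This is not the ASA implementation: the class `GradientReversal`, the toy loss, and the parameters `w` (model) and `a` (adversary) are all hypothetical and only show how a single descent step can make one parameter minimize a loss while another maximizes it.

```python
class GradientReversal:
    """Identity in the forward pass; flips the gradient sign in the backward pass."""

    def __init__(self, scale=1.0):
        self.scale = scale

    def forward(self, x):
        return x

    def backward(self, grad_output):
        return -self.scale * grad_output


def loss(w, a):
    # Toy loss (assumption): the model weight w tries to output 1.0
    # for an input of 1.0 that the adversary shifts by a.
    return (w * (1.0 + a) - 1.0) ** 2


def joint_step(w, a, grl, lr=0.1):
    """One gradient-descent step applied to both parameters at once."""
    residual = w * (1.0 + a) - 1.0
    grad_w = 2.0 * residual * (1.0 + a)  # d loss / d w
    grad_a = 2.0 * residual * w          # d loss / d a
    # The adversary's gradient passes through the reversal layer, so the
    # same descent update pushes a in the direction that increases the loss.
    grad_a = grl.backward(grad_a)
    return w - lr * grad_w, a - lr * grad_a


w0, a0 = 0.5, 0.0
w1, a1 = joint_step(w0, a0, GradientReversal())
print(loss(w0, a0), loss(w1, a0), loss(w0, a1))
```

With these starting values, the model's update alone lowers the loss while the adversary's update alone raises it, even though both come from one ordinary descent step. That is the sense in which a reversal layer lets the model and the adversary be trained as a whole.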