ICLR 2022 | Pre-trained language models based on an adversarial self-attention mechanism
2022-07-06 22:45:00 【Zhiyuan community】
Paper title:
Adversarial Self-Attention For Language Understanding
ICLR 2022
https://arxiv.org/pdf/2206.12608.pdf
There is substantial evidence that self-attention benefits from "allowing bias", that is, from adding a certain amount of prior (such as masking or smoothing of the attention distribution) to the original attention structure. Such priors let the model learn useful knowledge from a smaller corpus. However, they are generally task-specific, which makes it hard to extend the model to a wide range of tasks. Adversarial training, for its part, improves model robustness by adding perturbations to the input. The authors found that perturbing only the input embeddings hardly confuses the attention maps: the model's attention stays essentially unchanged before and after the perturbation.
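To make "adding a prior to the attention structure" concrete, here is a minimal sketch in plain Python of attention weights computed as softmax(QK^T / sqrt(d) + bias), where the additive bias plays the role of a masking prior. The function names, the toy vectors, and the use of -1e9 as a stand-in for "masked out" are illustrative assumptions, not code from the paper.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys, bias=None):
    """Return softmax(Q K^T / sqrt(d) + bias), computed row by row.

    `bias` is an optional matrix added to the attention logits; a large
    negative entry acts as a mask that removes the corresponding key.
    """
    d = len(queries[0])
    weights = []
    for i, q in enumerate(queries):
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        if bias is not None:
            logits = [l + b for l, b in zip(logits, bias[i])]
        weights.append(softmax(logits))
    return weights


# Hypothetical toy inputs: two tokens with 2-dimensional queries and keys.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]

# Hand-written prior: token 0 is not allowed to attend to token 1.
MASKED = -1e9
mask_bias = [[0.0, MASKED], [0.0, 0.0]]

plain = attention_weights(queries, keys)
masked = attention_weights(queries, keys, mask_bias)
```

In this sketch the bias is fixed by hand, which is exactly the task-specific kind of prior the paragraph above describes.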
Adversarial self-attention (ASA) maximizes the empirical training risk and learns a biased (or adversarial) structure by automating the process of building prior knowledge. The adversarial structure is learned from the input data, which distinguishes ASA from traditional adversarial training and from other self-attention variants. A gradient reversal layer is used to combine the model and the adversary into a single whole. ASA is also naturally interpretable.
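The role of a gradient reversal layer can be sketched on a toy problem: the layer is the identity in the forward pass and flips the sign of the gradient in the backward pass, so one ordinary gradient-descent step minimizes the loss for the model while maximizing it for the adversary. The class name, the scalar parameters `w` and `a`, and the toy loss below are hypothetical and chosen only for illustration; they are not the ASA objective from the paper.

```python
class GradientReversal:
    """Identity in the forward pass, sign-flipped gradient in the backward pass."""

    def __init__(self, scale=1.0):
        self.scale = scale

    def forward(self, x):
        return x

    def backward(self, grad_output):
        return -self.scale * grad_output


def loss_grads(w, a):
    """Gradients of the toy loss L(w, a) = (w - 1)**2 + a*w - 0.5*a**2.

    `w` stands in for the model (wants L small), `a` for the adversary
    (wants L large). Both are plain scalars in this sketch.
    """
    grad_w = 2.0 * (w - 1.0) + a
    grad_a = w - a
    return grad_w, grad_a


def train(steps=500, lr=0.1):
    grl = GradientReversal()
    w, a = 0.0, 0.0
    for _ in range(steps):
        grad_w, grad_a = loss_grads(w, a)
        # The adversary's gradient flows back through the reversal layer,
        # so the shared descent step below becomes ascent for `a`.
        grad_a = grl.backward(grad_a)
        w -= lr * grad_w
        a -= lr * grad_a
    return w, a


w, a = train()
```

For this toy loss the single update rule settles at the saddle point w = a = 2/3, where the model can no longer lower the loss and the adversary can no longer raise it.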