ICLR 2022 | Pre-trained Language Models Based on Adversarial Self-Attention
2022-07-06 22:45:00 [Zhiyuan Community]

Paper title:
Adversarial Self-Attention For Language Understanding
ICLR 2022
https://arxiv.org/pdf/2206.12608.pdf
There is abundant evidence that self-attention benefits from allowing bias: priors of some kind (such as masking, or smoothing of the attention distribution) are added on top of the raw attention structure. Such prior knowledge lets the model learn useful patterns from smaller corpora. However, these priors are usually task-specific, which makes it hard to extend the model to a rich set of tasks. Adversarial training, by contrast, improves model robustness by adding perturbations to the input. The authors found that perturbing only the input embeddings can hardly confuse the attention maps: the model's attention barely changes before and after the perturbation.
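To make the "allowing bias" idea concrete, here is a minimal NumPy sketch (not from the paper) of scaled dot-product attention where a 0/1 prior mask is imposed on the attention logits before the softmax; the names `attention_with_prior` and `prior_mask` are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_prior(q, k, v, prior_mask=None):
    """Scaled dot-product attention with an optional structural prior.

    prior_mask: 0/1 matrix of shape (len_q, len_k). Positions with 0 are
    forbidden: their logits are pushed toward -inf before the softmax.
    Returns (output, attention_weights)."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)          # raw attention scores
    if prior_mask is not None:
        logits = np.where(prior_mask.astype(bool), logits, -1e9)
    weights = softmax(logits, axis=-1)     # biased attention distribution
    return weights @ v, weights
```

A task-specific masking prior like this is exactly the kind of hand-built bias the paper argues is hard to generalize across tasks.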
Adversarial Self-Attention (ASA) maximizes the empirical training risk, learning a biased (adversarial) structure by automating the construction of prior knowledge. Because this adversarial structure is learned from the input data itself, ASA differs both from conventional adversarial training and from earlier self-attention variants. A gradient reversal layer is used to combine the model and the adversary into a single network. ASA is also naturally interpretable.
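The gradient reversal trick mentioned above can be sketched in a few lines: the layer is the identity on the forward pass, and flips (and optionally scales) the gradient on the backward pass, so that one set of parameters ascends the loss that the rest of the network descends. This is a toy NumPy illustration of the mechanism, not the paper's implementation; `GradReverse` and `lam` are illustrative names.

```python
import numpy as np

class GradReverse:
    """Gradient reversal layer: identity on the forward pass; multiplies
    the incoming gradient by -lam on the backward pass, so the adversary
    is updated to *maximize* the loss the main model minimizes."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x                      # pass activations through unchanged

    def backward(self, grad_out):
        return -self.lam * grad_out   # flip (and scale) the gradient sign
```

In an autograd framework the same effect is usually obtained with a custom backward function inserted between the adversary and the shared encoder.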