ICLR 2022 | Pre-trained language models based on an adversarial self-attention mechanism
2022-07-06 22:45:00 【Zhiyuan community】
Paper title:
Adversarial Self-Attention For Language Understanding
ICLR 2022
https://arxiv.org/pdf/2206.12608.pdf
There is substantial evidence that self-attention can benefit from allowing bias, where a certain degree of prior knowledge (such as masking or smoothing of the attention distribution) is added to the original attention structure. Such priors let the model learn useful knowledge from smaller corpora. However, they are generally task-specific, which makes it hard to extend the model to a wide range of tasks. Adversarial training improves model robustness by adding perturbations to the input. The authors found that perturbations added only to the input embeddings can hardly confuse the attention maps: the model's attention does not change before and after the perturbation.
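As a rough illustration of what "adding perturbations to the input embeddings" can look like, here is a minimal sketch in plain Python. It shifts an embedding vector by a small step along the L2-normalized loss gradient. The function name `perturb_embedding`, the step size `epsilon`, and the toy vectors are assumptions made for this example; the paper's own perturbation scheme may differ.

```python
import math


def perturb_embedding(embedding, grad, epsilon=0.1):
    """Shift an embedding by epsilon along the L2-normalized loss gradient.

    `embedding` and `grad` are equal-length lists of floats; `grad` stands in
    for the gradient of the training loss with respect to the embedding.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # No gradient signal: return an unperturbed copy.
        return list(embedding)
    return [e + epsilon * g / norm for e, g in zip(embedding, grad)]


# Toy usage: the perturbation has L2 norm epsilon and points along the gradient.
perturbed = perturb_embedding([1.0, 2.0], [3.0, 4.0], epsilon=0.5)
```

In adversarial training the model would then be trained on the perturbed embedding as well as the clean one; that training loop is omitted here.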
The proposed Adversarial Self-Attention (ASA) maximizes the empirical training risk and learns a biased (or adversarial) structure by automating the process of building prior knowledge. The adversarial structure is learned from the input data, which distinguishes ASA from conventional adversarial training and from other self-attention variants. A gradient reversal layer combines the model and the adversary into a single whole. ASA is also inherently interpretable.
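The gradient reversal idea mentioned above can be sketched in a few lines of plain Python. A gradient reversal layer is the identity in the forward pass and negates the gradient in the backward pass, so a single descent step minimizes the loss for the model while maximizing it for the adversary. Everything below (the `GradReverse` class, the toy saddle-shaped loss, the scalar parameters `w` and `a`) is a hypothetical illustration and not the ASA objective from the paper.

```python
class GradReverse:
    """Identity forward; multiplies the incoming gradient by -scale backward."""

    def __init__(self, scale=1.0):
        self.scale = scale

    def forward(self, x):
        return x

    def backward(self, grad_output):
        return -self.scale * grad_output


def toy_loss(w, a):
    # Convex in the model parameter w, concave in the adversary parameter a.
    return (w - 1.0) ** 2 + w * a - a ** 2


def train(steps=200, lr=0.1):
    grl = GradReverse()
    w, a = 0.0, 0.0
    for _ in range(steps):
        a_seen = grl.forward(a)                 # the loss sees a unchanged
        grad_w = 2.0 * (w - 1.0) + a_seen       # d toy_loss / d w
        grad_a_seen = w - 2.0 * a_seen          # d toy_loss / d a_seen
        grad_a = grl.backward(grad_a_seen)      # reversed before reaching a
        # One plain gradient-descent step on both parameters:
        # w descends the loss, a effectively ascends it.
        w -= lr * grad_w
        a -= lr * grad_a
    return w, a


w, a = train()  # approaches the saddle point w = 0.8, a = 0.4
```

Because the reversal sits between the adversary and the loss, both parts can be trained with one optimizer and one backward pass, which is the sense in which the layer joins model and adversary into a whole.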