
ICLR 2022 | Pre-trained Language Models Based on an Adversarial Self-Attention Mechanism

2022-07-06 22:45:00 [Zhiyuan Community]

Paper title:

Adversarial Self-Attention For Language Understanding

Venue:

ICLR 2022

Paper link:

https://arxiv.org/pdf/2206.12608.pdf

This paper proposes the Adversarial Self-Attention mechanism (ASA), which uses adversarial training to reconstruct the attention of the Transformer, so that the model is trained on a contaminated (adversarially perturbed) model structure.
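One way to write this idea down is as a min-max problem. The notation below is an assumption for illustration: an adversary with parameters phi outputs an additive mask M_phi(x) over the attention logits, with entries 0 (keep) or -infinity (drop), and the Transformer f_theta is trained against it. This is a generic sketch, not the paper's exact objective:

```latex
\mathrm{Attn}(Q, K, V; \mathcal{M}) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}} + \mathcal{M}\right) V,
\qquad \mathcal{M}_{ij} \in \{0, -\infty\}

\min_{\theta} \max_{\phi} \; \mathbb{E}_{(x, y)}\Big[\mathcal{L}\big(f_{\theta}(x; \mathcal{M}_{\phi}(x)),\, y\big)\Big]
```

Under this reading, the adversary searches for the attention structure that hurts the model most, and the model learns to stay accurate under that structure.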

Problems addressed:

There is substantial evidence that self-attention can benefit from allowing bias, that is, injecting a certain degree of prior knowledge (such as masking or smoothing of the attention distribution) into the original attention structure. Such priors let the model learn useful knowledge from smaller corpora. However, these priors are generally task-specific, which makes it hard to extend the model to a wide range of tasks.
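To make "allowing bias" concrete, here is a small pure-Python sketch of scaled dot-product attention weights with two generic priors: a hard mask added to the logits (0 to keep, -inf to block) and smoothing that mixes each attention row with a uniform distribution. The function names and the causal-mask example are illustrative assumptions, not taken from the paper.

```python
import math


def softmax(row):
    # Numerically stable softmax; entries equal to -inf receive exactly zero weight.
    # Assumes at least one entry in the row is finite.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(q, k, bias=None):
    """Return softmax(q k^T / sqrt(d) + bias) row by row.

    q, k: lists of d-dimensional query / key vectors.
    bias: optional n_q x n_k matrix added to the logits (the "prior").
    """
    d = len(q[0])
    weights = []
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        if bias is not None:
            logits = [z + b for z, b in zip(logits, bias[i])]
        weights.append(softmax(logits))
    return weights


def mask_bias(allowed):
    # Hard masking prior: 0 where attention is allowed, -inf where it is blocked.
    return [[0.0 if ok else float("-inf") for ok in row] for row in allowed]


def smooth(weights, alpha):
    # Smoothing prior: mix each attention row with a uniform distribution.
    return [[(1 - alpha) * w + alpha / len(row) for w in row] for row in weights]
```

For example, a causal mask over three tokens (`allowed = [[j <= i for j in range(3)] for i in range(3)]`) forces the first token to attend only to itself, and `smooth(weights, 0.1)` then spreads 10% of every row's mass uniformly. Both priors are fixed by hand, which is exactly the task-specific knowledge the paragraph above describes.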
Adversarial training improves model robustness by adding perturbations to the input. The authors found that perturbing only the input embeddings can hardly confuse the attention maps: the model's attention barely changes before and after the perturbation.
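For contrast, conventional embedding-level adversarial training perturbs the inputs rather than the attention. Below is a minimal sketch in the style of the fast gradient method (FGM), assuming the gradient of the loss with respect to the embeddings is already available; `fgm_perturb` is a hypothetical helper name, and taking the L2 norm over the whole sequence is one common choice among several.

```python
import math


def fgm_perturb(embeddings, grads, epsilon=1.0):
    """Perturb input embeddings along the normalized loss gradient (FGM-style).

    embeddings, grads: lists of token vectors with identical shapes.
    The perturbation is r = epsilon * g / ||g||_2, with the norm taken
    over the whole sequence.
    """
    norm = math.sqrt(sum(g * g for row in grads for g in row))
    if norm == 0.0:
        # No gradient signal: return an unperturbed copy.
        return [list(row) for row in embeddings]
    return [
        [e + epsilon * g / norm for e, g in zip(erow, grow)]
        for erow, grow in zip(embeddings, grads)
    ]
```

With `embeddings = [[1.0, 2.0], [0.0, 0.0]]`, `grads = [[3.0, 4.0], [0.0, 0.0]]` and `epsilon = 0.5`, the first token moves to `[1.3, 2.4]` and the second stays put. Because only the embeddings move, any effect on attention is indirect, which is consistent with the observation above that the attention maps barely change.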

To address these problems, the authors propose ASA, which has the following advantages:

It maximizes the empirical training risk, learning a biased (or adversarial) structure by automating the process of building prior knowledge.
The adversarial structure is learned from the input data, which distinguishes ASA from conventional adversarial training and from variants of self-attention.
It uses a gradient reversal layer to combine the model and the adversary into a single network.
ASA is inherently interpretable.
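The gradient reversal trick can be sketched on a toy scalar problem. The layer is the identity in the forward pass and multiplies the gradient by -lam in the backward pass, so a single gradient-descent step lowers the loss for the model while raising it for the adversary. Everything here (the `GradientReversal` class, the toy model p = theta * x + phi, the squared loss, `joint_step`) is a hypothetical illustration, not the authors' code.

```python
class GradientReversal:
    """Gradient reversal layer: identity forward, gradient scaled by -lam backward."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x  # the value passes through unchanged

    def backward(self, grad_output):
        return -self.lam * grad_output  # flip (and scale) the incoming gradient


def joint_step(theta, phi, x, y, lr=0.1, lam=1.0):
    """One plain gradient-descent step on a toy model and adversary together.

    Toy setup (hypothetical): prediction p = theta * x + a, where a = phi is the
    adversary's output routed through the reversal layer; loss = (p - y) ** 2.
    Returns (new_theta, new_phi, loss_before_step).
    """
    grl = GradientReversal(lam)
    a = grl.forward(phi)
    p = theta * x + a
    loss = (p - y) ** 2

    # Manual backward pass.
    d_p = 2.0 * (p - y)
    d_theta = d_p * x          # ordinary gradient for the model
    d_phi = grl.backward(d_p)  # reversed gradient for the adversary

    # The same descent rule for both parameters: the model lowers the loss,
    # while the reversed gradient makes the adversary raise it.
    return theta - lr * d_theta, phi - lr * d_phi, loss
```

With theta = 1.0, phi = 0.0, x = 1.0 and y = 2.0, one step moves theta to 1.2 (holding phi fixed, the loss drops from 1.0 to 0.64) and phi to -0.2 (holding theta fixed, the loss rises to 1.44). In an autograd framework the same layer is usually written as a custom function whose backward negates the incoming gradient, for example via `torch.autograd.Function` in PyTorch.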


Copyright notice
This article was created by [Zhiyuan community]; please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/187/202207061532493398.html
