
ICLR 2022 | A Pre-trained Language Model Based on an Adversarial Self-Attention Mechanism

2022-07-06 22:45:00 Zhiyuan community

Paper title:

Adversarial Self-Attention For Language Understanding

Venue:

ICLR 2022

Paper link:

https://arxiv.org/pdf/2206.12608.pdf

 

This paper proposes the Adversarial Self-Attention mechanism (ASA), which uses adversarial training to reconstruct the Transformer's attention, so that the model is trained under a corrupted model structure.
Problems it tries to solve:
  1. There is ample evidence that self-attention benefits from allowing bias, i.e., adding a certain amount of prior knowledge (such as masking or smoothing of the attention distribution) to the vanilla attention structure. Such priors let the model learn useful knowledge from smaller corpora, but they are generally task-specific, which makes it hard to extend the model to a rich set of tasks.
  2. Adversarial training improves model robustness by adding perturbations to the input. The authors found that perturbing only the input embeddings hardly confuses the attention maps: the model's attention is nearly unchanged before and after the perturbation (see the sketch after this list).
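To make the contrast concrete, here is a minimal PyTorch sketch of the conventional embedding-level adversarial training the authors argue against, in the style of FGM. The `model.get_input_embeddings()` accessor is assumed (as in Hugging Face Transformers); the function name and `epsilon` are illustrative and not from the paper.

```python
import torch

def fgm_perturb_embeddings(model, loss, epsilon=1.0):
    """Conventional (FGM-style) adversarial training: add an L2-normalized
    gradient perturbation to the word-embedding matrix. This is the kind of
    input-level perturbation the authors argue barely changes attention maps."""
    emb = model.get_input_embeddings().weight              # assumed HF-style accessor
    grad, = torch.autograd.grad(loss, emb, retain_graph=True)
    delta = torch.zeros_like(emb)
    norm = grad.norm()
    if norm > 0 and not torch.isnan(norm):
        delta = epsilon * grad / norm                      # perturbation lives in embedding space only
    emb.data.add_(delta)                                   # apply; caller restores with emb.data.sub_(delta)
    return delta
```

A second forward/backward pass on the perturbed embeddings then gives the adversarial loss; the observation in point 2 is that the attention maps computed before and after this perturbation stay nearly identical.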

To address the above problems, the authors propose ASA, which has the following advantages:
  1. It maximizes the empirical training risk, learning the biased (or adversarial) structure while automating the process of constructing the prior.
  2. The adversarial structure is learned from the input data, which distinguishes ASA from conventional adversarial training and from other self-attention variants.
  3. A gradient reversal layer is used to combine the model and the adversary into a single whole (see the sketch after this list).
  4. ASA is naturally interpretable.
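As a rough illustration of point 3, and of how an adversary can act on the attention structure rather than on the input, here is a minimal PyTorch sketch of a gradient reversal layer feeding an adversarially learned attention bias. The class names, shapes, and the soft per-token gate are assumptions made for illustration; the paper's actual adversary and its constraints may differ.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal: identity in the forward pass, flips the gradient
    sign in the backward pass, so the adversary below is trained to
    maximize the task loss while the main model minimizes it."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class AdversarialAttentionBias(nn.Module):
    """Hypothetical adversary: predicts from the layer input which tokens'
    attention links to suppress, and returns an additive bias for the
    attention logits (a large negative value acts like a mask)."""
    def __init__(self, hidden_size):
        super().__init__()
        self.scorer = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, hidden)
        h = GradReverse.apply(hidden_states, 1.0)            # gradients into the adversary are reversed
        gate = torch.sigmoid(self.scorer(h)).squeeze(-1)     # (batch, seq_len), soft "keep" probability
        # broadcast over heads and query positions: (batch, 1, 1, seq_len)
        bias = (gate - 1.0).unsqueeze(1).unsqueeze(1) * 1e4
        return bias                                          # add to q·k^T / sqrt(d) before softmax
```

In a Transformer block the returned bias would be added to the raw attention scores before the softmax; because of the gradient reversal, a single backward pass updates the main model to minimize the task loss while the adversary learns to corrupt the most useful attention links.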
