
How much computing power does a transformer really have?

2022-07-04 05:38:00 Oriental Golden wood

https://jishuin.proginn.com/p/763bfbd4ca4f

I ran into this question in my recent research, and after some digging it turns out people have indeed made this point.
The paper indirectly suggests that the attention block sitting inside the residual path may not be necessary.
So I built two models: one transformer in which that part was replaced by a plain linear layer (keeping the parts where the two streams decode each other), and another designed to use only that mutual decoding, with everything else a direct linear layer and no residual connection after the decoding step.
The result: the latter works as well or better, and it is at least as efficient as a plain MLP on the same task.
In other words, the residual connections contribute little; the key is still the MLP.
Also, a double output works better than a single output.
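As a rough illustration of the two variants, here is a minimal PyTorch sketch. This is not the actual experiment code; the class names, layer sizes, and layer choices are my own placeholders.

```python
import torch
import torch.nn as nn

class LinearResidualBlock(nn.Module):
    """Variant 1 (sketch): the attention inside the residual path is
    replaced by a plain linear layer; the residual and MLP stay."""
    def __init__(self, d_model=256, d_ff=1024):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)   # stands in for self-attention
        self.norm1 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        x = self.norm1(x + self.proj(x))   # residual kept, attention gone
        x = self.norm2(x + self.mlp(x))
        return x

class CrossOnlyBlock(nn.Module):
    """Variant 2 (sketch): only the mutual decoding (cross-attention) is
    kept; everything else is a linear layer, with no residual afterwards."""
    def __init__(self, d_model=256, n_heads=4, d_ff=1024):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.proj = nn.Linear(d_model, d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))

    def forward(self, x, memory):
        y, _ = self.cross_attn(x, memory, memory)  # the two streams decode each other
        y = self.proj(y)                           # no residual added back
        return self.mlp(y)
```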
Also, softmax is of little use. Self-attention is essentially a relational dictionary, like the Xinhua Dictionary: keys are the entry words, values are the definitions, and queries look entries up.
You can refer to the following code (it's a bit messy):
https://blog.csdn.net/weixin_32759777/category_11446474.html
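To make the "relational dictionary" reading concrete, here is a small illustrative sketch of attention with softmax simply dropped. This is not the code from the link above; the function name and shapes are my own.

```python
import torch

def dictionary_attention(Q, K, V):
    """Attention read as a dictionary lookup: K are the entry words,
    V the definitions, Q the lookup. Softmax is dropped here to
    illustrate the claim that it is not essential; raw scaled scores
    weight the lookup instead."""
    scores = (Q @ K.transpose(-2, -1)) / K.shape[-1] ** 0.5
    return scores @ V   # weighted combination of the "definitions"

# Tiny usage example
Q = torch.randn(2, 5, 8)   # (batch, query positions, dim)
K = torch.randn(2, 7, 8)   # (batch, dictionary entries, dim)
V = torch.randn(2, 7, 8)   # (batch, dictionary entries, dim)
out = dictionary_attention(Q, K, V)
print(out.shape)           # torch.Size([2, 5, 8])
```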
Mask one side at inference time; that way the model can also be used for translation.
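A rough sketch of what "masking one side" could look like at inference. The shapes, the zero placeholder for the unknown target stream, and the use of a key padding mask are my own assumptions for illustration.

```python
import torch
import torch.nn as nn

# Sketch: the two streams are concatenated and normally decode each other.
# At inference the target side is shielded with a padding mask, so the
# output is computed from the source side only.
d_model, n_heads, L = 256, 4, 10
attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

src = torch.randn(1, L, d_model)     # source-language stream
tgt = torch.zeros(1, L, d_model)     # target side unknown at test time (placeholder zeros)
both = torch.cat([src, tgt], dim=1)  # (1, 2L, d_model)

# True = this key/value position is masked out; shield the target half.
key_padding_mask = torch.cat([
    torch.zeros(1, L, dtype=torch.bool),   # source half visible
    torch.ones(1, L, dtype=torch.bool),    # target half shielded
], dim=1)

out, _ = attn(both, both, both, key_padding_mask=key_padding_mask)
print(out.shape)   # torch.Size([1, 20, 256])
```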



Copyright notice
This article was created by [Oriental Golden wood]; when reposting, please include a link to the original. Thank you.
https://yzsam.com/2022/185/202207040513488791.html