How much computing power does transformer have
2022-07-04 05:38:00 【Oriental Golden wood】
https://jishuin.proginn.com/p/763bfbd4ca4f
I ran into this question in my recent research, and after looking around I found that someone has indeed discussed it (see the link above).
This paper indirectly suggests that the attention inside the residual branch may not be necessary.
So I built two models.
The first is a transformer that replaces this part with a linear layer (keeping the part where the two sides decode each other).
The second uses only the mutual-decoding part; everything else is decoded directly with linear layers, with no residual connections.
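As a rough illustration of the difference between the two variants, here is a minimal sketch in plain Python. It is not the author's code: the function names `linear`, `residual_linear_block`, and `plain_linear_block` are hypothetical, the weights are made up, and the mutual-decoding part is left out entirely. It only shows what "a linear layer inside a residual branch" versus "a linear layer with no residual" means for a single vector.

```python
def linear(x, weight, bias):
    """Apply y = W x + b to one input vector x (lists of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def residual_linear_block(x, weight, bias):
    """Variant 1 (assumed form): a linear layer sits where attention
    used to be, and its output is added back to the input."""
    return [xi + yi for xi, yi in zip(x, linear(x, weight, bias))]


def plain_linear_block(x, weight, bias):
    """Variant 2 (assumed form): the same linear layer, no residual."""
    return linear(x, weight, bias)


# Made-up weights and input, for illustration only.
W = [[0.5, 0.0],
     [0.0, 0.5]]
b = [0.0, 1.0]
x = [2.0, 4.0]

with_residual = residual_linear_block(x, W, b)
without_residual = plain_linear_block(x, W, b)
print(with_residual, without_residual)
```

The only difference between the two blocks is the `xi + yi` term, which is the skip connection the second model drops.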
The result: the latter performs as well as or better than the former, although on the same task it is still not as efficient as a plain MLP.
In other words, the residual connections have little effect.
The key component is still the MLP.
Also, two outputs work better than a single output.
Softmax also turned out to be of no use. Self-attention is, in essence, a relational dictionary, much like the Xinhua Dictionary.
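The dictionary analogy can be made concrete with a small sketch. This is an illustration under my own assumptions, not the code from the linked blog: `lookup` is a hypothetical name, the keys and values are made up, and the softmax is simply dropped so that the raw query-key scores weight the values directly.

```python
def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lookup(queries, keys, values):
    """Softmax-free attention: scores = Q K^T, output = scores V.
    Each query is matched against every key, and the match strengths
    are used directly as weights over the stored values."""
    scores = matmul(queries, transpose(keys))
    return matmul(scores, values)


# Two dictionary "entries": key i points at value i.
keys = [[1.0, 0.0],
        [0.0, 1.0]]
values = [[10.0, 20.0],
          [30.0, 40.0]]

# The first query matches key 0 exactly; the second is halfway between both keys.
queries = [[1.0, 0.0],
           [0.5, 0.5]]

result = lookup(queries, keys, values)
print(result)  # [[10.0, 20.0], [20.0, 30.0]]
```

An exact match returns the stored entry, and a query that lies between two keys returns a blend of their values, which is the sense in which attention behaves like a soft dictionary.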
You can refer to the following code (it is a little messy):
https://blog.csdn.net/weixin_32759777/category_11446474.html
At inference time, mask one side; that way the model can be used to translate between the two sides.