How much computing power does a transformer have?
2022-07-04 05:38:00 【Oriental Golden wood】
https://jishuin.proginn.com/p/763bfbd4ca4f
I ran into this question in my recent research, and after looking around I found that someone had in fact already written about it.
That paper indirectly suggests that the attention sitting inside the residual branch may not be necessary.
So I built two models.
The first is a transformer in which linear layers replace that part (the part where the two sides decode each other is kept).
The second uses only the mutual-decoding part; everything else is a plain linear layer, with no residual connection after decoding.
The second model performed as well as or better than the first, although it was not as efficient as a pure MLP on the same task.
In other words, the residual connections contribute little.
The key component is still the MLP.
Double output also works better than single output.
softmax does not help either. Self-attention is essentially a relational dictionary, like the Xinhua Dictionary.
The code is here (it is a little messy):
https://blog.csdn.net/weixin_32759777/category_11446474.html
At inference time, mask one side; that way the model can be used for translation.
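To make the "relational dictionary" view of self-attention more concrete, here is a minimal sketch in plain Python. It is not the code from the link above. The function names (`softmax`, `dot`, `lookup`), the `normalize` flag, and the toy keys and values are all made up for illustration. The sketch only shows that attention scores a query against every key and mixes the values by those scores, and that the same lookup can be computed with or without softmax. It says nothing about which variant trains better.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def lookup(query, keys, values, normalize=True):
    """Soft dictionary lookup (hypothetical helper, for illustration only).

    Scores the query against every key with a scaled dot product,
    then returns the score-weighted sum of the values.
    With normalize=False the raw scores are used as weights (no softmax).
    """
    scale = math.sqrt(len(query))
    scores = [dot(query, k) / scale for k in keys]
    weights = softmax(scores) if normalize else scores
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


# Toy "dictionary": two entries, each a key vector with an associated value vector.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 20.0]]

# A query that points toward the first key.
query = [4.0, 0.0]

soft = lookup(query, keys, values)                   # softmax-weighted mix
raw = lookup(query, keys, values, normalize=False)   # raw-score mix

print("with softmax:   ", soft)
print("without softmax:", raw)
```

In both variants the result is dominated by the value stored under the key that best matches the query, which is the dictionary-like behaviour. Without softmax the weights are unbounded and need not sum to 1, so the output scale depends on the size of the query.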