How much computing power does a transformer have?
2022-07-04 05:38:00 【Oriental Golden wood】
https://jishuin.proginn.com/p/763bfbd4ca4f
I ran into this question in my recent research, and after checking I found that someone has indeed written about it.
The paper indirectly suggests that the attention sitting inside the residual branch may not be necessary.
So I built two models.
The first is a transformer in which a linear layer replaces this part (the parts where the two sides decode each other are kept).
The second uses only the mutual-decoding part; everything else is decoded directly by linear layers, with no residual connection after decoding.
The result: the latter performs better than or the same as the former, although it is still less efficient than an MLP-only model on the same task.
In other words, the residual connections have little effect.
The key component is still the MLP.
In addition, a model with two outputs performs better than one with a single output.
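To make the with/without-residual comparison concrete, here is a minimal Python sketch. It is not the author's code: the helper names (`linear`, `with_residual`, `without_residual`) and the toy weights are hypothetical, chosen only for illustration. It also shows one well-known fact that may be relevant here: a residual wrapped around a single linear map, x + (Wx + b), equals the plain linear map (W + I)x + b, so in that narrow case the residual adds no expressive power.

```python
def linear(x, weight, bias):
    """Apply y = W x + b to one vector; weight is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def with_residual(x, weight, bias):
    """Sublayer output added back onto its input: x + (W x + b)."""
    return [xi + yi for xi, yi in zip(x, linear(x, weight, bias))]

def without_residual(x, weight, bias):
    """Sublayer output used directly: W x + b."""
    return linear(x, weight, bias)

# Toy values (hypothetical, for illustration only).
x = [2.0, 4.0]
W = [[0.5, 0.0],
     [0.0, 0.5]]
b = [0.0, 0.0]

# W + I: the single linear map equivalent to the residual version.
W_plus_I = [[w + (1.0 if i == j else 0.0) for j, w in enumerate(row)]
            for i, row in enumerate(W)]

print(without_residual(x, W, b))   # [1.0, 2.0]
print(with_residual(x, W, b))      # [3.0, 6.0]
print(linear(x, W_plus_I, b))      # [3.0, 6.0], same as the residual version
```

This equivalence only holds for a purely linear sublayer; once a nonlinearity or attention sits inside the branch, the residual is no longer redundant in this sense, so the sketch should not be read as an explanation of the experimental result above.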
Softmax also turned out not to help. Self-attention is, in essence, a relational dictionary, much like the Xinhua Dictionary.
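The dictionary analogy can be sketched in a few lines of Python. This is an illustrative sketch rather than the author's implementation: the function names, the `scale` parameter, and the toy keys and values are all assumptions. It uses the standard softmax weighting only to show the lookup behaviour; the post's own claim is that softmax is not what matters.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_lookup(query, keys, values, scale=1.0):
    """Soft dictionary lookup: each value is weighted by how well
    its key matches the query."""
    weights = softmax([scale * dot(query, k) for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(dim)]

# A toy "dictionary" with two entries (hypothetical data).
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 20.0]]
query = [1.0, 0.0]  # matches the first key

soft = attention_lookup(query, keys, values, scale=1.0)    # blend of both values
sharp = attention_lookup(query, keys, values, scale=50.0)  # almost exactly values[0]
print(soft, sharp)
```

With a small `scale` the result is a blend of all stored values; with a large `scale` it approaches an exact dictionary lookup that returns the value stored under the best-matching key.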
For reference, see the following code (it is a little messy):
https://blog.csdn.net/weixin_32759777/category_11446474.html
At inference time, mask one side; that way the model can be used for translation.
