Transformer T5 Model: A Slow Read
2022-07-03 18:34:00, Dongxuan
Code: https://github.com/google-research/text-to-text-transfer-transformer
Other references:
I started with the paper, but it has a lot of pages...
Paper: https://arxiv.org/abs/1910.10683
I. A rough understanding of the model
T5 grew out of the prompt idea: every downstream task is converted into one and the same task, so all tasks share a single solution, and the answer is produced generatively as text. This greatly reduces the number of parameters. With 2 tasks that each need 100,000 parameters, the generative approach still needs only 100,000, whereas traditional per-task downstream models need 200,000. The loss function depends on the design described later. (The figure below shows the same idea.)
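The parameter arithmetic above can be written out as a tiny sketch. The function name `params_needed` is hypothetical, and the counts are the illustrative numbers from the example, not T5's real model sizes. It assumes that, without the shared text-to-text framing, every task would get its own full copy of the model.

```python
def params_needed(num_tasks: int, params_per_model: int, shared: bool) -> int:
    """Total parameters to store for a set of downstream tasks.

    shared=True  -> one text-to-text model serves every task (T5-style framing)
    shared=False -> a separate model per downstream task (hypothetical baseline)
    """
    if num_tasks < 1 or params_per_model < 0:
        raise ValueError("need at least one task and a non-negative model size")
    return params_per_model if shared else num_tasks * params_per_model


if __name__ == "__main__":
    # Illustrative numbers from the example: 2 tasks, 100,000 parameters per model.
    print(params_needed(2, 100_000, shared=True))   # 100000
    print(params_needed(2, 100_000, shared=False))  # 200000
```

The saving grows linearly with the number of tasks, because the shared model's size does not depend on how many tasks it serves.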
II. Some basic settings of the model
The dataset is C4 (Colossal Clean Crawled Corpus), a large dataset built from web-crawl data. The overall model is a Transformer-based generative model.
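To make "clean" in C4 more concrete, here is a simplified sketch of a few of the page-cleaning heuristics the T5 paper describes for C4. It is not the actual C4 pipeline. The thresholds are assumptions paraphrased from memory of the paper and should be checked against it. The real pipeline also drops pages containing words from a bad-words list, keeps only pages detected as English, and deduplicates repeated three-sentence spans, all of which are omitted here. The function name `clean_page` is hypothetical.

```python
import re
from typing import Optional

# Assumed thresholds and rules, paraphrased from the paper's description of C4;
# verify them against the paper before relying on them.
MIN_WORDS_PER_LINE = 5
MIN_SENTENCES_PER_PAGE = 3
TERMINAL_PUNCT = (".", "!", "?", '"', "\u201d")  # period, !, ?, closing quote


def clean_page(text: str) -> Optional[str]:
    """Apply a few C4-style filters; return the cleaned page, or None if it is dropped."""
    lowered = text.lower()
    # Page-level filters: placeholder text and likely source code.
    if "lorem ipsum" in lowered or "{" in text:
        return None

    kept = []
    for line in text.splitlines():
        line = line.strip()
        if not line.endswith(TERMINAL_PUNCT):
            continue  # keep only lines that end in terminal punctuation
        if len(line.split()) < MIN_WORDS_PER_LINE:
            continue  # drop very short lines (menus, buttons, ...)
        if "javascript" in line.lower():
            continue  # drop "please enable Javascript" style warnings
        kept.append(line)

    cleaned = "\n".join(kept)
    # Crude sentence count: split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", cleaned) if s.strip()]
    if len(sentences) < MIN_SENTENCES_PER_PAGE:
        return None
    return cleaned
```

Page-level rules (placeholder text, curly brackets) discard the whole page, while line-level rules only trim lines. A page can therefore survive the first checks and still be dropped at the end if too little prose remains.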
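1. Model framework:
It differs from the classic Transformer model in 3 points: ① the bias term of Layer Norm is removed; ② layer normalization is placed outside the residual path, i.e. it is applied to the input of each sub-layer; ③ a different position embedding scheme is used (relative position embeddings instead of the original absolute ones).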
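The three architectural differences from the classic Transformer can be sketched in NumPy. This is a minimal illustration, not the official Mesh TensorFlow code. The function names are hypothetical. The scale-only normalization follows the RMS-style form used by common T5 ports. The bucketing, with 32 buckets and a maximum distance of 128, follows the relative-position scheme described in the paper and, as far as I know, mirrors the logic of the Hugging Face Transformers port.

```python
import math

import numpy as np


def t5_layer_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Difference 1: rescale only. No mean subtraction and no additive bias."""
    variance = np.mean(x * x, axis=-1, keepdims=True)  # mean of squares (RMS^2)
    return weight * x / np.sqrt(variance + eps)


def prenorm_residual(x: np.ndarray, sublayer, weight: np.ndarray) -> np.ndarray:
    """Difference 2: normalize the sub-layer input; the residual path itself is left alone."""
    return x + sublayer(t5_layer_norm(x, weight))


def relative_position_bucket(rel_pos: int, bidirectional: bool = True,
                             num_buckets: int = 32, max_distance: int = 128) -> int:
    """Difference 3: map an offset (key position - query position) to a bucket index.

    Each bucket indexes a learned scalar that is added to the attention logit.
    """
    bucket = 0
    if bidirectional:
        # Encoder: half of the buckets for positive offsets, half for negative ones.
        num_buckets //= 2
        if rel_pos > 0:
            bucket += num_buckets
        n = abs(rel_pos)
    else:
        # Causal decoder: only offsets to earlier positions matter.
        n = max(-rel_pos, 0)
    max_exact = num_buckets // 2
    if n < max_exact:
        return bucket + n  # small offsets each get their own bucket
    # Larger offsets share buckets whose width grows logarithmically up to max_distance;
    # everything beyond that lands in the last bucket.
    large = max_exact + int(
        math.log(n / max_exact) / math.log(max_distance / max_exact) * (num_buckets - max_exact)
    )
    return bucket + min(large, num_buckets - 1)


if __name__ == "__main__":
    x = np.array([1.0, 2.0, 3.0, 4.0])
    w = np.ones(4)  # learned scale, initialised to 1
    print(prenorm_residual(x, lambda h: 0.5 * h, w))  # toy "sub-layer"
    print([relative_position_bucket(d) for d in (-200, -3, 0, 3, 200)])  # [15, 3, 0, 19, 31]
```

In the full model, the bucket index selects a per-head scalar (shared across layers) that is added to the attention logits. The sketch stops at computing the index.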
2. Input format:
① Each downstream task can be fine-tuned separately; the downstream tasks do not all have to be fine-tuned together. ② A prefix on the input indicates which task is being learned.
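A sketch of how a prefix turns different tasks into the same text-in, text-out format. The prefix strings follow the style of the examples in the T5 paper. The helper `to_text_to_text`, the input field names and the CoLA label words are hypothetical. The STS-B regression score is turned into text by rounding to the nearest 0.2, as the paper describes.

```python
from typing import Dict, Tuple

# Hypothetical label words for a binary acceptability (CoLA-style) task.
COLA_LABELS = {0: "unacceptable", 1: "acceptable"}


def to_text_to_text(task: str, example: Dict) -> Tuple[str, str]:
    """Turn a task-specific example into an (input text, target text) pair.

    The task is signalled only by the prefix on the input, so one model
    (trained with one loss on the target tokens) can serve every task.
    """
    if task == "translate_en_de":
        return "translate English to German: " + example["en"], example["de"]
    if task == "summarize":
        return "summarize: " + example["document"], example["summary"]
    if task == "cola":
        return "cola sentence: " + example["sentence"], COLA_LABELS[example["label"]]
    if task == "stsb":
        # Regression target as text: round to the nearest 0.2, then stringify.
        # (Python's round() breaks exact ties to the even integer.)
        score = round(example["score"] * 5) / 5
        return (f"stsb sentence1: {example['s1']} sentence2: {example['s2']}",
                f"{score:.1f}")
    raise ValueError(f"unknown task: {task}")


if __name__ == "__main__":
    print(to_text_to_text("cola", {"sentence": "The book read quickly the boy.", "label": 0}))
    print(to_text_to_text("stsb", {"s1": "A dog runs.", "s2": "A dog is running.", "score": 3.77}))
```

Because every task becomes a pair of strings, the tasks can be mixed into one training stream or fine-tuned one at a time without changing the model.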