A Slow Read of the Transformer T5 Model
2022-07-03 18:34:00 【Dongxuan】
Code: https://github.com/google-research/text-to-text-transfer-transformer
Other references:
I started with the paper, but it runs to many pages...
Paper: https://arxiv.org/abs/1910.10683

I. A rough understanding of the model
T5 grows out of the prompt idea: convert every downstream task into a single task, so that all tasks share one solution, and generate the answers generatively. This greatly reduces the number of parameters: with 2 tasks at 100,000 parameters each, the generative approach still needs only 100,000 parameters, while traditional per-task downstream models need 200,000 in total. The loss function depends on the later design. (The figure below shows the same idea.)
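The parameter arithmetic above can be checked with a few lines of Python. This is only a sketch of the counting argument in the text; the function name `total_parameters` and its arguments are made up for illustration and do not come from the T5 code base.

```python
def total_parameters(num_tasks: int, params_per_model: int, shared: bool) -> int:
    """Count parameters for `num_tasks` downstream tasks.

    shared=True  -> one generative model serves every task.
    shared=False -> each task gets its own model of the same size.
    """
    if shared:
        return params_per_model
    return num_tasks * params_per_model


# The example from the text: 2 tasks, 100,000 parameters per model.
shared_total = total_parameters(2, 100_000, shared=True)
separate_total = total_parameters(2, 100_000, shared=False)
print(shared_total, separate_total)
```

The shared count stays constant as tasks are added, while the per-task count grows linearly with the number of tasks.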

II. Some basic settings of the model
The dataset is C4, a large web-crawled dataset. The overall model is a Transformer-based generative model.
1. Model framework:
It is similar to the classic Transformer model, differing from it in 3 points.
2. Input form:
① Each downstream task can be fine-tuned separately; the downstream tasks do not all need to be fine-tuned together. ② A prefix prompt is used to indicate which learning task is being performed.
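As a minimal sketch of the prefix idea in ②, the snippet below casts a labelled example into an (input text, target text) pair by prepending a task prefix. The prefix strings are modeled on the examples in the T5 paper; the task keys, the `TASK_PREFIXES` table, and the helper `to_text_to_text` are assumptions for illustration, not the API of the official repository.

```python
# Hypothetical task keys mapped to prefix strings (illustrative only).
TASK_PREFIXES = {
    "translate_en_de": "translate English to German: ",
    "summarize": "summarize: ",
}


def to_text_to_text(task: str, text: str, target: str) -> dict:
    """Turn one example into plain input/target strings for a single model."""
    return {"inputs": TASK_PREFIXES[task] + text, "targets": target}


example = to_text_to_text("translate_en_de", "That is good.", "Das ist gut.")
print(example["inputs"])
print(example["targets"])
```

Because the task is encoded in the input string itself, the same model and the same text-generation interface can be reused for every task; only the prefix changes.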