Transformer T5 model: a slow read
2022-07-03 18:34:00 【Dongxuan】
Code: https://github.com/google-research/text-to-text-transfer-transformer
Other references:
Start with the paper, although it runs to many pages...
Paper: https://arxiv.org/abs/1910.10683

I. A rough understanding of the model
T5 can be seen as part of the evolution of the prompt idea: every downstream task is converted into one text-to-text task, so all tasks share a single solution and the answer is generated as text. This greatly reduces the number of parameters. With 2 tasks that each need 100,000 parameters, the shared generative model still needs only 100,000, whereas separate task-specific downstream models need 200,000 in total. The choice of loss function depends on the later design. (The figure that accompanied this paragraph in the original post illustrates the same idea.)
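Because every task produces text, labels and scores must be written as text on the way in and read back from generated text on the way out. The sketch below illustrates that idea. It assumes label words such as "not acceptable"/"acceptable" and follows the paper's description of turning the STS-B similarity score into text by rounding it to the nearest 0.2. The function names are hypothetical, and this is not code from the T5 repository.

```python
from typing import Optional, Sequence


def class_label_to_text(label: int, names: Sequence[str]) -> str:
    """Write a class index as its label word, e.g. 0 -> "not acceptable"."""
    return names[label]


def stsb_score_to_text(score: float) -> str:
    """Round an STS-B similarity score to the nearest 0.2 and write it as text."""
    return f"{round(score * 5) / 5:.1f}"


def text_to_class_label(generated: str, names: Sequence[str]) -> Optional[int]:
    """Map generated text back to a class index.

    Output that matches no label word counts as wrong; here it returns None.
    """
    cleaned = generated.strip()
    for index, name in enumerate(names):
        if cleaned == name:
            return index
    return None


if __name__ == "__main__":
    # Illustrative label words for a binary acceptability task (hypothetical ordering).
    cola_names = ("not acceptable", "acceptable")
    print(class_label_to_text(0, cola_names))
    print(stsb_score_to_text(3.77))
    print(text_to_class_label("acceptable", cola_names))
```

Since the same model reads and writes plain text for every task, adding another task changes the data, not the parameter count.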

II. Some basic settings of the model
The dataset is C4 (Colossal Clean Crawled Corpus), a large dataset built from web-crawled text. The overall model is a Transformer-based generative model.
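C4 was produced by filtering Common Crawl pages with simple heuristics. The function below is a simplified sketch of a subset of the rules described in the paper. It keeps only lines that end in terminal punctuation and have at least 5 words. It drops lines that mention Javascript. It drops pages that contain "lorem ipsum" or a curly bracket, and pages left with fewer than 3 sentences. It is not the official pipeline. The bad-word filter, language detection and three-sentence deduplication are omitted, and sentence counting uses a naive regex. `clean_page` is a hypothetical name, and the thresholds are parameters so they can be checked against the paper.

```python
import re
from typing import Optional

# Lines must end with one of these marks to be kept.
END_MARKS = (".", "?", "!", '"')


def clean_page(text: str, min_words_per_line: int = 5, min_sentences: int = 3) -> Optional[str]:
    """Return the cleaned page text, or None if the whole page is discarded."""
    lowered = text.lower()
    # Page-level filters: placeholder text and likely source code.
    if "lorem ipsum" in lowered or "{" in text:
        return None
    kept = []
    for line in text.splitlines():
        line = line.strip()
        if not line.endswith(END_MARKS):
            continue  # no terminal punctuation: likely a menu item or fragment
        if len(line.split()) < min_words_per_line:
            continue  # too short to be natural prose
        if "javascript" in line.lower():
            continue  # "please enable Javascript" style warnings
        kept.append(line)
    page = "\n".join(kept)
    # Naive sentence split on ., ? or ! followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.?!])\s+", page) if s.strip()]
    if len(sentences) < min_sentences:
        return None
    return page
```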
1. Model framework:
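It differs from the classic Transformer model in 3 points: ① layer normalization is simplified so that activations are only rescaled, with no additive bias; ② layer normalization is placed outside the residual path, i.e. it is applied to the input of each sub-block; ③ absolute position embeddings are replaced by a relative position embedding scheme (a learned scalar added to the attention logits).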
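The three architectural differences from the original Transformer can be sketched in plain Python on single vectors. This is an illustration written for this post, not code from the repository. `t5_layer_norm` rescales by the root mean square with a learned scale and uses no bias (and, as in common T5 implementations, no mean subtraction). `pre_norm_residual` normalizes only the sub-block input, so the residual path itself is never normalized. `relative_position_bucket` maps a key-minus-query offset to one of 32 buckets, exact for small offsets and growing logarithmically up to a distance of 128, as the paper describes. All function names are hypothetical. A real model keeps one learned bias table per attention head.

```python
import math
from typing import Callable, List

Vector = List[float]


def t5_layer_norm(x: Vector, scale: Vector, eps: float = 1e-6) -> Vector:
    """Rescale-only layer norm: divide by the RMS, multiply by a learned scale.

    No additive bias and no mean subtraction.
    """
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    return [v * inv_rms * g for v, g in zip(x, scale)]


def pre_norm_residual(x: Vector, sublayer: Callable[[Vector], Vector], scale: Vector) -> Vector:
    """Normalize only the sub-block input; the residual adds the raw x back."""
    y = sublayer(t5_layer_norm(x, scale))
    return [a + b for a, b in zip(x, y)]


def relative_position_bucket(relative_position: int, bidirectional: bool = True,
                             num_buckets: int = 32, max_distance: int = 128) -> int:
    """Map relative_position = key_pos - query_pos to a bucket index."""
    bucket = 0
    n = -relative_position  # positive when the key lies before the query
    if bidirectional:
        num_buckets //= 2          # half of the buckets for each direction
        if n < 0:
            bucket += num_buckets  # keys after the query use the upper half
        n = abs(n)
    else:
        n = max(n, 0)              # a causal decoder never looks ahead
    max_exact = num_buckets // 2
    if n < max_exact:              # small offsets get one bucket each
        return bucket + n
    # Larger offsets share buckets whose width grows logarithmically.
    large = max_exact + int(
        math.log(n / max_exact) / math.log(max_distance / max_exact) * (num_buckets - max_exact)
    )
    return bucket + min(large, num_buckets - 1)


def relative_bias_matrix(q_len: int, k_len: int, bias_table: List[float]) -> List[List[float]]:
    """Scalar bias added to each attention logit (one head, one learned value per bucket)."""
    return [[bias_table[relative_position_bucket(k - q)] for k in range(k_len)]
            for q in range(q_len)]
```

Bucketing offsets instead of embedding absolute positions lets all offsets beyond 128 share one bucket, so the model is not tied to a fixed maximum sequence length.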
2. Input form:
① Each downstream task can be fine-tuned separately; there is no need to fine-tune all downstream tasks together. ② A task prefix is added to the input to indicate which task is being learned.
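Both points can be shown with a small sketch that builds a separate fine-tuning set for each task, each input marked by its task prefix. The prefix strings follow examples given in the T5 paper. The `TASK_PREFIXES` table and the helper names are hypothetical, and the actual model and training loop are left out.

```python
from typing import Dict, List, Tuple

# Hypothetical prefix table; the prefix strings follow examples in the T5 paper.
TASK_PREFIXES: Dict[str, str] = {
    "translate_en_de": "translate English to German: ",
    "cola": "cola sentence: ",
    "summarize": "summarize: ",
}


def add_prefix(task: str, text: str) -> str:
    """Prepend the task prefix so a shared model knows which task to perform."""
    return TASK_PREFIXES[task] + text


def build_finetune_set(task: str, examples: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Build the (input, target) pairs used to fine-tune on a single task."""
    return [(add_prefix(task, source), target) for source, target in examples]


if __name__ == "__main__":
    # Each task gets its own fine-tuning set; the tasks need not be combined.
    translation = build_finetune_set("translate_en_de", [("That is good.", "Das ist gut.")])
    acceptability = build_finetune_set("cola", [("The course is jumping well.", "not acceptable")])
    for source, target in translation + acceptability:
        print(f"{source!r} -> {target!r}")
```

The prefix is ordinary input text, so switching tasks requires no task-specific output layer.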