Transformer T5 model read slowly
2022-07-03 18:34:00 【Dongxuan】
Code: https://github.com/google-research/text-to-text-transfer-transformer
Other references:
Start with the paper, although it runs to many pages...
Paper: https://arxiv.org/abs/1910.10683

I. A rough understanding of the model
T5 grew out of the prompt idea: convert every downstream task into a single task so that all tasks share one solution, and generate the answer as text. This greatly reduces the number of parameters. For example, with 2 tasks at 100,000 parameters each, the generative approach still needs only 100,000 parameters, whereas the traditional approach of one downstream model per task needs 200,000. The loss function depends on design choices made later. (The figure below illustrates the same idea.)
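The parameter comparison above can be checked with a few lines of Python. This is only a sketch of the arithmetic in the example; the function name `total_parameters` and its arguments are made up for illustration, and the 100,000 figure is the toy number from the text, not a real T5 size.

```python
def total_parameters(num_tasks: int, params_per_model: int, shared: bool) -> int:
    """Count parameters for one shared model versus one model per task."""
    if shared:
        # One generative model serves every task.
        return params_per_model
    # The traditional setup trains a separate downstream model per task.
    return num_tasks * params_per_model


# Toy numbers from the text: 2 tasks, 100,000 parameters per model.
separate = total_parameters(num_tasks=2, params_per_model=100_000, shared=False)
unified = total_parameters(num_tasks=2, params_per_model=100_000, shared=True)
print(f"separate models: {separate}, shared model: {unified}")
```

The gap grows linearly with the number of tasks, which is why a single shared model becomes more attractive as tasks are added.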

II. Some basic settings of the model
The dataset is C4, a large web-crawl dataset, and the overall model is a Transformer-based generative model.
1. Model framework:
It is similar to the classic Transformer model, with 3 differences.
2. Input form:
① Each downstream task can be fine-tuned separately; the downstream tasks do not all need to be fine-tuned together. ② A prefix prompt indicates which task is being learned.
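To make the prefix idea concrete, here is a minimal sketch of how examples from different tasks might be turned into (input text, target text) pairs. The prefix strings are modeled on the illustrative examples in the T5 paper; the dictionary keys, the `TASK_PREFIXES` name, and the `to_text_to_text` helper are hypothetical and are not taken from the official repository.

```python
from typing import Dict, Tuple

# Assumed prefixes, modeled on the paper's illustrative examples.
# The dictionary keys are hypothetical task names chosen for this sketch.
TASK_PREFIXES: Dict[str, str] = {
    "translate_en_de": "translate English to German: ",
    "summarize": "summarize: ",
    "cola": "cola sentence: ",
}


def to_text_to_text(task: str, source: str, target: str) -> Tuple[str, str]:
    """Prepend the task prefix so one model can tell the tasks apart."""
    return TASK_PREFIXES[task] + source, target


examples = [
    to_text_to_text("translate_en_de", "That is good.", "Das ist gut."),
    to_text_to_text("cola", "The course is jumping well.", "not acceptable"),
]

for model_input, model_target in examples:
    print(repr(model_input), "->", repr(model_target))
```

Because both the input and the target are plain strings, a classification label such as "not acceptable" is generated as text in the same way as a translation.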