A slow read of the Transformer T5 model
2022-07-03 18:34:00 【Dongxuan】
Code: https://github.com/google-research/text-to-text-transfer-transformer
Other references:
Start with the paper, although it runs to many pages ...
Paper: https://arxiv.org/abs/1910.10683
I. A rough understanding of the model
T5 grew out of the evolution of the prompt idea: convert all downstream tasks into a single task so that every task shares one solution, with answers produced generatively. This greatly reduces the parameter count: given 2 tasks at 100,000 parameters each, the generative approach still needs only 100,000 parameters, whereas traditional per-task downstream models need 200,000. The loss function depends on the later design. (The figure below expresses the same idea.)
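The parameter arithmetic above can be checked with a few lines of Python. This is only a sketch of the counting argument: the function name `total_parameters` is hypothetical, and the figure of 100,000 parameters per model is the illustrative number used in the text, not a real T5 size.

```python
def total_parameters(num_tasks: int, params_per_model: int, shared: bool) -> int:
    """Count the parameters needed to serve `num_tasks` tasks.

    shared=True  -> one generative model handles every task (the T5 idea).
    shared=False -> each task gets its own downstream model.
    """
    if shared:
        return params_per_model
    return num_tasks * params_per_model


# Illustrative numbers from the text: 2 tasks, 100,000 parameters each.
separate = total_parameters(num_tasks=2, params_per_model=100_000, shared=False)
unified = total_parameters(num_tasks=2, params_per_model=100_000, shared=True)

print(separate)  # 200000
print(unified)   # 100000
```

With a shared model the total stays constant as tasks are added, while the per-task total grows linearly with the number of tasks.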
II. Some basic settings of the model
The dataset is C4, a large web-crawled dataset, and the overall model is a Transformer-based generative model.
1. Model framework:
It differs from the classic Transformer model in 3 points,
2. Input form:
① Each downstream task can be fine-tuned separately; the tasks do not all have to be fine-tuned together. ② A prefix prompt indicates which task is being learned.
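The prefix idea in ② can be sketched in plain Python. The helper `to_text_to_text` and the dictionary keys are hypothetical names chosen for illustration; the prefix strings follow the examples shown in the T5 paper, and the actual preprocessing in the repository may differ.

```python
# Task prefixes in the style of the T5 paper's examples.
# The dictionary keys are hypothetical labels, not identifiers from the T5 code.
TASK_PREFIXES = {
    "translate_en_de": "translate English to German: ",
    "summarize": "summarize: ",
    "cola": "cola sentence: ",
}


def to_text_to_text(task: str, text: str) -> str:
    """Prepend the task prefix so one model can tell the tasks apart."""
    return TASK_PREFIXES[task] + text


inputs = [
    to_text_to_text("translate_en_de", "That is good."),
    to_text_to_text("cola", "The course is jumping well."),
]

for line in inputs:
    print(line)
# translate English to German: That is good.
# cola sentence: The course is jumping well.
```

Because every task becomes a plain string in and a plain string out, the same model and the same training objective can be reused, and each downstream task can still be fine-tuned on its own.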