A Slow Read of the Transformer T5 Model
2022-07-03 18:34:00 【Dongxuan】
Code: https://github.com/google-research/text-to-text-transfer-transformer
Other references:
Start with the paper, though it runs to many pages ...
Paper: https://arxiv.org/abs/1910.10683

I. A rough understanding of the model
T5 grew out of the evolution of the prompt idea: convert all downstream tasks into a single text-generation task, so that every task shares one solution and answers are produced generatively. This greatly reduces the number of parameters. With 2 tasks of 100,000 parameters each, the generative approach still needs only 100,000 parameters, whereas traditional per-task downstream models need 200,000. The loss function depends on later design choices. (The figure in the original post illustrates the same idea.)
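The parameter comparison above can be written out as a few lines of arithmetic. This is only an illustrative sketch: the function name `total_parameters` is made up for this example, and the numbers are the ones from the paragraph above (2 tasks, 100,000 parameters per model), not real T5 sizes.

```python
def total_parameters(num_tasks: int, params_per_model: int, shared: bool) -> int:
    """Parameters needed to cover `num_tasks` downstream tasks.

    shared=True  : one text-to-text model serves every task.
    shared=False : every task gets its own downstream model.
    """
    if shared:
        # A single generative model is reused, so the count does not grow.
        return params_per_model
    # One separate model per task, so the count grows linearly.
    return num_tasks * params_per_model


print(total_parameters(2, 100_000, shared=True))   # 100000
print(total_parameters(2, 100_000, shared=False))  # 200000
```

The shared count stays constant as tasks are added, which is the point of casting everything as one generation task.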

II. Some basic settings of the model
The dataset is C4, a large web-crawled dataset, and the overall model is a Transformer-based generative model.
1. Model framework:
The model is similar to the classic Transformer, differing from it in 3 points.
2. Input form:
① Each downstream task can be fine-tuned separately; there is no need to fine-tune all downstream tasks together. ② A prefix prompt is used to indicate which task is being learned.
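As a rough illustration of the prefix idea, the sketch below prepends a task prefix to the raw input so that every task becomes plain "text in, text out". The prefix strings follow the examples shown in the T5 paper; the dictionary `TASK_PREFIXES`, its keys, and the helper `to_text_to_text` are hypothetical names invented for this sketch and are not part of the official codebase.

```python
# Hypothetical lookup table: task key -> prefix string.
# The prefix wording follows the examples in the T5 paper.
TASK_PREFIXES = {
    "translate_en_de": "translate English to German: ",
    "summarize": "summarize: ",
    "cola": "cola sentence: ",
}


def to_text_to_text(task: str, text: str) -> str:
    """Turn a raw input into a prefixed string for a text-to-text model."""
    if task not in TASK_PREFIXES:
        raise KeyError(f"unknown task: {task}")
    return TASK_PREFIXES[task] + text.strip()


# Different tasks end up in the same input format.
examples = [
    ("translate_en_de", "That is good."),
    ("cola", "The course is jumping well."),
]
for task, text in examples:
    print(to_text_to_text(task, text))
```

Because the task is encoded in the input text itself, the same model and the same training objective can be reused whether the tasks are fine-tuned one at a time or mixed.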