Reading the Transformer T5 Model Slowly
2022-07-03 18:34:00 【Dongxuan】
Code: https://github.com/google-research/text-to-text-transfer-transformer
Other references:
Start with the paper, though it runs to many pages...
Paper: https://arxiv.org/abs/1910.10683

I. A rough understanding of the model
T5 grows out of the prompt idea: every downstream task is converted into one common task, so all tasks share a single solution and the answer is generated as text. This greatly reduces the parameter count: with 2 tasks of 100,000 parameters each, the generative approach still needs only 100,000 parameters, whereas traditional per-task downstream models need 200,000. The loss function depends on the design covered later. (The figure below illustrates the same idea.)
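The parameter arithmetic above can be written out as a tiny sketch. It is a deliberately simplified accounting (it assumes each per-task model is the same size as the shared model and ignores any small task-specific heads or embeddings); `total_params` is a hypothetical helper, not part of the T5 codebase.

```python
def total_params(num_tasks, params_per_model, shared):
    """Parameters needed to serve num_tasks tasks (simplified accounting).

    shared=True:  one text-to-text model handles every task.
    shared=False: one separate model of the same size per task.
    """
    return params_per_model if shared else num_tasks * params_per_model


print(total_params(2, 100_000, shared=True))   # 100000
print(total_params(2, 100_000, shared=False))  # 200000
```

The shared count stays flat as tasks are added, while the per-task count grows linearly with the number of tasks.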

II. Basic settings of the model
The dataset is C4 (Colossal Clean Crawled Corpus), a large web-crawl dataset, and the overall model is a generative Transformer (encoder-decoder) model.
1. Model framework:
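It differs from the classic Transformer in 3 respects:
① layer normalization is simplified: activations are only rescaled, with no additive bias;
② layer normalization is placed outside the residual path, applied to the input of each sub-block;
③ a different position embedding scheme (relative position embeddings) replaces the original one.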
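The T5 paper describes its layer normalization as a simplified form that only rescales activations and adds no bias, applied to each sub-block's input while the skip connection itself stays un-normalized. The sketch below assumes the RMS-style variant found in common open-source T5 implementations (no mean subtraction either); the names `t5_style_layer_norm` and `pre_norm_residual` are hypothetical, and plain Python lists stand in for tensors.

```python
import math


def t5_style_layer_norm(x, weight, eps=1e-6):
    """Scale-only normalization (assumed RMS-style variant).

    Divides by the root-mean-square of x and multiplies by a learned weight.
    There is no mean subtraction and no additive bias term.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def pre_norm_residual(x, sublayer, weight):
    """Normalize only the sub-block input; the residual path carries raw x."""
    y = sublayer(t5_style_layer_norm(x, weight))
    return [a + b for a, b in zip(x, y)]


x = [3.0, 4.0]
w = [1.0, 1.0]
print(t5_style_layer_norm(x, w))
# With an identity sub-block, the output is x plus its normalized copy.
print(pre_norm_residual(x, lambda h: h, w))
```

Keeping the normalization off the residual path means the raw input always reaches the next layer unchanged, which is the "outside the residual path" placement listed above.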
2. Input format:
① Downstream tasks can be fine-tuned separately; they do not all have to be fine-tuned together. ② A task prefix in the input text tells the model which task to perform.
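A minimal sketch of prefix-based input formatting, assuming each example is turned into a plain (input text, target text) pair. The prefix strings are modeled on the examples shown in the T5 paper, but the `TASK_PREFIXES` table and the `to_text_to_text` helper are hypothetical illustrations, not the official preprocessing code.

```python
# Hypothetical prefix table, modeled on the style of the T5 paper's examples.
TASK_PREFIXES = {
    "translate_en_de": "translate English to German: ",
    "summarize": "summarize: ",
    "cola": "cola sentence: ",
}


def to_text_to_text(task, text, target):
    """Return an (input, target) pair of plain strings for one example."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task: {task}")
    return TASK_PREFIXES[task] + text, target


examples = [
    to_text_to_text("translate_en_de", "That is good.", "Das ist gut."),
    to_text_to_text("cola", "The course is jumping well.", "not acceptable"),
]
for src, tgt in examples:
    print(f"{src!r} -> {tgt!r}")
```

Because even a classification label such as "not acceptable" is emitted as text, one model and one decoding procedure can serve every task; only the prefix changes.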