Prefix-Tuning: Optimizing Continuous Prompts for Generation
2022-07-28 03:41:00 【HDU-Dade】
Reference resources
Prefix-Tuning: Optimizing Continuous Prompts for Generation — an explanation by the paper's author
In-context Learning

Advantages
- Just write a different prompt for each task; no task-specific training is required.
Disadvantages
- It cannot exploit very large training sets: GPT-3 has a bounded context window that holds only a limited number of tokens, so when the training set is longer than the context window, in-context learning cannot make full use of it.
- Prompts have to be written by hand, and hand-written prompts may not be optimal.
- It does not extend well from GPT-3 to smaller models.
Prefix-tuning

- Freeze the pretrained language model and optimize only the prefix; each task stores just its own very small prefix, so the cost stays low as the number of tasks grows (see the rough parameter counts after this list).
- The prefix is trained rather than manually specified.
- In-context learning only works with very large models; prefix-tuning extends prompting to smaller models.
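To make the storage argument concrete, here is a rough back-of-the-envelope count with assumed numbers (prefix length 10 and a GPT-2-medium-sized model; the exact figures in the paper differ, but the task-specific share is on the order of 0.1%):

```python
# Rough, assumed numbers (not taken from the paper's tables) to illustrate per-task storage:
# a prefix stores one key and one value vector per layer and prefix position.
n_layers, hidden, prefix_len = 24, 1024, 10            # roughly GPT-2 medium sized
prefix_params = 2 * n_layers * prefix_len * hidden     # 491,520 parameters per task
full_model_params = 345_000_000                        # approximate GPT-2 medium size
print(f"one task: {prefix_params:,} prefix parameters "
      f"({prefix_params / full_model_params:.2%} of the full model)")
print(f"100 tasks: {100 * prefix_params:,} prefix parameters "
      f"vs {100 * full_model_params:,} for 100 fine-tuned copies")
```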
Related Work

Tuning the top k layers
Tuning the top k layers is a common practice when fine-tuning large models; usually k equals 1 or 2. Even so, roughly 20% of the parameters end up being tuned, because the language-model head, which contains many parameters, must also be tuned. A minimal sketch follows.
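A minimal sketch of top-k tuning, assuming the HuggingFace transformers GPT-2 classes (illustrative only, not the paper's code):

```python
# Freeze everything, then unfreeze only the last k Transformer blocks and the LM head.
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
k = 2

for param in model.parameters():
    param.requires_grad = False

for block in model.transformer.h[-k:]:          # top k Transformer blocks
    for param in block.parameters():
        param.requires_grad = True

# GPT-2 ties lm_head.weight to the input embedding matrix, so unfreezing the head
# also makes the (large) embedding matrix trainable.
for param in model.lm_head.parameters():
    param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} of {total:,} ({trainable / total:.1%})")
```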
Adapter-tuning (also known as lightweight fine-tuning)
Another effective way to adapt the language model to downstream tasks: freeze the pretrained parameters and insert small trainable MLP (adapter) layers between the layers of the LM, as sketched below.
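A minimal adapter-block sketch, assuming a PyTorch bottleneck MLP with a residual connection (this illustrates the general idea of adapter-tuning; the dimensions are illustrative):

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Small trainable bottleneck MLP inserted after a frozen Transformer sub-layer."""
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)   # project down
        self.up = nn.Linear(bottleneck_dim, hidden_dim)     # project back up
        self.act = nn.ReLU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The residual connection keeps the frozen model's behaviour when the adapter output is small.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

adapter = Adapter(hidden_dim=768)
x = torch.randn(2, 16, 768)        # (batch, seq_len, hidden)
print(adapter(x).shape)            # torch.Size([2, 16, 768])
```

Only the adapter parameters are trained; the surrounding LM layers stay frozen.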
Prefix-tuning: intuition

Three increasingly expressive options:
- Optimize discrete instructions (prompt tokens)
- Optimize continuous word embeddings
- Optimize the prefix activations of all layers
Fine-tuning

Concatenate $x$ and $y$ to obtain $z = [x; y]$. An autoregressive LM computes the activation vector $h_i$ at each time step, so $h_i$ is a function of $z_i$ and the activations in its left context.
The training objective is to maximize the sum of the log-probabilities of the tokens in the output $y$ (written out below).
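Written out, this is the paper's log-likelihood objective, where $\phi$ are the pretrained LM parameters and $Y_{idx}$ denotes the positions of $y$ within $z$:

```latex
\max_{\phi} \; \log p_{\phi}(y \mid x) \;=\; \sum_{i \in Y_{idx}} \log p_{\phi}(z_i \mid h_{<i})
```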

Tasks:
- Table-to-text: the input $x$ is a linearized table and the output $y$ is a short textual description (a sketch of such a linearization follows this list);
- Autoregressive model: at time step $i$, the hidden state vectors of all Transformer layers are concatenated as $h_i$ and used to predict the next token;
- An encoder-decoder framework can also be adopted overall (e.g., for summarization).
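For illustration, a hypothetical linearization helper (the function name and record format are assumptions, loosely in the style of the E2E table-to-text data) might look like this:

```python
def linearize_table(table: dict) -> str:
    """Flatten attribute-value pairs into the single string fed to the LM as x."""
    return " | ".join(f"{key}: {value}" for key, value in table.items())

x = linearize_table({"name": "Starbucks", "type": "coffee shop", "area": "city centre"})
# x == "name: Starbucks | type: coffee shop | area: city centre"
y = "Starbucks is a coffee shop in the city centre."   # one possible target description
```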
Instead of optimizing discrete tokens, we can optimize the prompt as continuous word embeddings, whose effect propagates upward to all Transformer activation layers and rightward to subsequent tokens. This is more expressive than a discrete prompt, which must match the embedding of some real word. At the same time, it is less expressive than intervening on the activations of all layers, which avoids long-range dependencies and includes more tunable parameters. Prefix-Tuning therefore optimizes the activations of all layers corresponding to the prefix.
With a prefix added, the autoregressive model becomes $z = [\text{PREFIX}; x; y]$, and the encoder-decoder model becomes $z = [\text{PREFIX}; x; \text{PREFIX}'; y]$;
The position indices of the prefix, $x$, and $y$ parts are denoted $P_{idx}$, $X_{idx}$, and $Y_{idx}$, respectively;
Prefix-tuning initializes a trainable matrix $P_\theta \in \mathbb{R}^{|P_{idx}| \times \dim(h_i)}$, i.e., prefix length by the dimension of the activation vector $h_i$, which stores the prefix parameters:
- For tokens in the prefix part, the activation is taken directly from the trainable matrix: $h_i = P_\theta[i,:]$ for $i \in P_{idx}$;
- For all other tokens, $h_i$ is computed by the pretrained language model, whose parameters remain fixed.
A minimal sketch of this setup is given below.
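Below is a minimal illustrative sketch, assuming HuggingFace transformers and GPT-2, of how a trainable prefix can be fed to a frozen LM as key/value activations via past_key_values. This is not the authors' released code, and the exact past_key_values format may differ across library versions:

```python
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer

class PrefixEncoder(nn.Module):
    """Trainable matrix P_theta: one key and one value vector per layer and prefix position."""
    def __init__(self, prefix_len, n_layers, n_heads, head_dim):
        super().__init__()
        self.n_layers = n_layers
        # 2 * n_layers because each layer needs both a key prefix and a value prefix.
        self.prefix = nn.Parameter(
            torch.randn(2 * n_layers, n_heads, prefix_len, head_dim) * 0.02)

    def forward(self, batch_size):
        # Broadcast over the batch: each entry is (batch, n_heads, prefix_len, head_dim).
        p = self.prefix.unsqueeze(1).expand(-1, batch_size, -1, -1, -1)
        return tuple((p[2 * i], p[2 * i + 1]) for i in range(self.n_layers))

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
for param in model.parameters():              # freeze the pretrained LM
    param.requires_grad = False

cfg = model.config
prefix_len = 10
prefix_encoder = PrefixEncoder(prefix_len, cfg.n_layer, cfg.n_head, cfg.n_embd // cfg.n_head)

inputs = tokenizer("name: Starbucks | type: coffee shop", return_tensors="pt")
past = prefix_encoder(batch_size=inputs["input_ids"].size(0))
# The attention mask must also cover the prefix positions.
mask = torch.cat([torch.ones(1, prefix_len, dtype=torch.long), inputs["attention_mask"]], dim=1)
out = model(input_ids=inputs["input_ids"], attention_mask=mask,
            past_key_values=past, labels=inputs["input_ids"])
out.loss.backward()                           # gradients flow only into prefix_encoder.prefix
```

In practice the paper also reparametrizes $P_\theta$ through a smaller matrix and an MLP for training stability, which this sketch omits.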
Results (table-to-text)

table-to-text
Prefix-tuning performs better than adapter-tuning and fine-tuning.





Application: Personalization

