Prefix-Tuning: Optimizing Continuous Prompts for Generation
2022-07-28 03:41:00 【HDU-Dade】
References
Prefix-Tuning: Optimizing Continuous Prompts for Generation — video explanation by the paper's author
In-context Learning

Advantages
- Different tasks only need different prompts; no task-specific training is required.
Disadvantages
- It cannot exploit large training sets: GPT-3 has a bounded context window that holds only a limited number of tokens, so when the training set is longer than the context window, in-context learning cannot make full use of it.
- Prompts must be written by hand, and hand-written prompts may not be optimal (a hypothetical few-shot prompt is sketched after this list).
- The GPT-3-style in-context approach does not extend well to smaller models.
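For illustration, here is a hypothetical few-shot prompt for a sentiment task (the task and examples are invented): the "training set" is pasted directly into the input, so only as many demonstrations as fit in the context window can be used, which is exactly the limitation above.

```python
# Hypothetical few-shot prompt: the demonstrations are part of the input,
# so the usable "training set" is bounded by the model's context window.
prompt = (
    "Review: The food was great.  Sentiment: positive\n"
    "Review: Terrible service.    Sentiment: negative\n"
    "Review: I loved the music.   Sentiment:"   # the model completes this line
)
print(prompt)
```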
Prefix-tuning

- Freeze the pretrained language model and optimize only the prefix; each task stores just this very small prefix, so the cost stays low as the number of tasks grows (a minimal sketch of this setup follows the list).
- The prefix is trainable rather than manually specified.
- In-context learning is a framework that only applies to very large models; prefix-tuning extends prompting to smaller models.
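A minimal sketch of the storage argument, assuming PyTorch (the decoder layer is a stand-in for a real pretrained LM, and all dimensions are illustrative): the LM is frozen and shared across tasks, while only the tiny prefix tensor is trained and saved per task.

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained LM; a real model would have billions of parameters.
lm = nn.TransformerDecoderLayer(d_model=768, nhead=12)
for p in lm.parameters():
    p.requires_grad = False            # the pretrained LM is frozen and shared

prefix = nn.Parameter(torch.randn(10, 768))   # per-task trainable prefix

frozen = sum(p.numel() for p in lm.parameters())
print(f"frozen LM params: {frozen:,}; stored per task: {prefix.numel():,}")
```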
Related Work

Tuning the top k layers
Tuning the top k layers is a common practice when fine-tuning large models; usually k equals 1 or 2. Even then, roughly 20% of the parameters must be tuned, because the language-model head that has to be adjusted contains many parameters.
Adapter-tuning (also known as lightweight fine-tuning)
Another effective way to adapt a language model's parameters to downstream tasks: freeze the pretrained parameters and insert small trainable MLP (adapter) layers between the layers of the LM, as sketched below.
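A minimal sketch of such an adapter layer in PyTorch, assuming the common bottleneck design with a residual connection (dimensions are illustrative assumptions):

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck MLP inserted after a frozen Transformer sub-layer."""
    def __init__(self, dim: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)   # project down to a small dim
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck, dim)     # project back up

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Residual connection: the frozen LM's representation passes through
        # unchanged, and only the small bottleneck correction is learned.
        return h + self.up(self.act(self.down(h)))
```

Only the adapter parameters are trained, so the per-task overhead is roughly 2 × dim × bottleneck values per inserted layer.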
Prefix-tuning: intuition

- optimizing discrete instructions (prompt words);
- optimizing continuous word embeddings;
- optimizing the prefix activations of all layers.
Fine-tuning

Concatenate $x$ and $y$ to obtain $z = [x; y]$. An autoregressive LM computes an activation vector $h_i$ at each time step, so $h_i$ is a function of $z_i$ and the activations in its left context: $h_i = \mathrm{LM}_\phi(z_i, h_{<i})$.
The training objective is to maximize the sum of the log-probabilities of the tokens in the generation $y$: $\max_\phi \sum_{i \in Y_{idx}} \log p_\phi(z_i \mid h_{<i})$. A sketch of this objective follows.
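A minimal sketch of this objective in PyTorch (tensor names are my own): the log-probabilities are summed only over the positions that belong to $y$, masking out the input $x$.

```python
import torch
import torch.nn.functional as F

def lm_objective(logits: torch.Tensor, z_ids: torch.Tensor,
                 y_mask: torch.Tensor) -> torch.Tensor:
    """Negative sum of log p(z_i | h_{<i}) over the positions of y.

    logits: (batch, seq, vocab) LM outputs for the sequence z = [x; y]
    z_ids:  (batch, seq) token ids of z
    y_mask: (batch, seq) 1 where the token belongs to y, else 0
    """
    log_probs = F.log_softmax(logits[:, :-1], dim=-1)    # predicts z_{i+1}
    target = z_ids[:, 1:].unsqueeze(-1)
    token_lp = log_probs.gather(-1, target).squeeze(-1)  # log p(z_i | h_{<i})
    return -(token_lp * y_mask[:, 1:]).sum()             # loss to minimize
```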

Tasks:
- table-to-text: the input $X$ is a linearized table and the output $Y$ is a short textual description (a hypothetical input/output pair is sketched after this list);
- autoregressive model: at time step $i$, the hidden state vectors of all Transformer layers are concatenated into $h_i$ and used to predict the next token;
- an encoder-decoder framework is also adopted (used for the summarization task);
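For concreteness, here is a hypothetical table-to-text pair in the linearized style the paper uses for the E2E data (the content is invented for illustration):

```python
# X: the table flattened into a token sequence; Y: the target description.
X = "name : Starbucks | food type : coffee shop | area : city centre"
Y = "Starbucks is a coffee shop in the city centre."
```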
Tokens can be optimized as continuous word embeddings rather than as discrete tokens; their effect propagates upward to all Transformer activation layers and then rightward to subsequent tokens. This is more expressive than discrete prompts, which must match the embedding of a real word. At the same time, it is less expressive than intervening directly in the activations of all layers, which avoids long-range dependencies and provides more tunable parameters. Therefore, Prefix-Tuning optimizes the parameters of all layers corresponding to the prefix positions.
With a prefix prepended, the autoregressive model's input becomes $z=[\mathrm{PREFIX};x;y]$, and the encoder-decoder model's input becomes $z=[\mathrm{PREFIX};x;\mathrm{PREFIX}';y]$.
The position indices of the prefix, $x$, and $y$ are denoted $P_{idx}$, $X_{idx}$, and $Y_{idx}$, respectively.
Prefix-tuning initializes a trainable matrix, denoted $P_\theta\in\mathbb{R}^{|P_{idx}|\times \dim(h_i)}$, to store the prefix parameters; its dimensions are the prefix length times the dimension of the activation vector $h_i$:

$$h_i = \begin{cases} P_\theta[i,:] & \text{if } i \in P_{idx}\\ \mathrm{LM}_\phi(z_i, h_{<i}) & \text{otherwise} \end{cases}$$

For tokens in the prefix, the activations are read from the trainable matrix $P_\theta$; for all other tokens, the activations are computed by the pretrained language model, whose parameters stay fixed. A sketch of this mechanism follows.
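A minimal sketch of this case split, assuming PyTorch and HuggingFace transformers with GPT-2 (class and variable names are my own, and the legacy tuple format for `past_key_values` is assumed): the trainable $P_\theta$ is reshaped into per-layer key/value activations and fed as `past_key_values`, so every layer attends to the prefix positions while the LM itself stays frozen.

```python
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel

class PrefixTuning(nn.Module):
    """Sketch: frozen GPT-2 plus a trainable prefix for every layer."""
    def __init__(self, prefix_len: int = 10):
        super().__init__()
        self.lm = GPT2LMHeadModel.from_pretrained("gpt2")
        for p in self.lm.parameters():
            p.requires_grad = False                 # LM_phi stays fixed
        cfg = self.lm.config
        self.n_layer, self.n_head = cfg.n_layer, cfg.n_head
        self.head_dim = cfg.n_embd // cfg.n_head
        self.prefix_len = prefix_len
        # P_theta: a key and a value vector per layer per prefix position
        self.P_theta = nn.Parameter(
            0.02 * torch.randn(prefix_len, 2 * self.n_layer * cfg.n_embd))

    def forward(self, input_ids, labels=None):
        bsz, seq_len = input_ids.shape
        # Reshape P_theta into the (key, value) tensors GPT-2 expects,
        # each of shape (batch, n_head, prefix_len, head_dim).
        past = self.P_theta.view(self.prefix_len, 2 * self.n_layer,
                                 self.n_head, self.head_dim)
        past = past.permute(1, 2, 0, 3).unsqueeze(1).expand(
            -1, bsz, -1, -1, -1)
        past_key_values = tuple(
            (past[2 * i], past[2 * i + 1]) for i in range(self.n_layer))
        # The attention mask must also cover the prefix positions.
        attn = torch.ones(bsz, self.prefix_len + seq_len,
                          dtype=torch.long, device=input_ids.device)
        return self.lm(input_ids=input_ids,
                       past_key_values=past_key_values,
                       attention_mask=attn, labels=labels)
```

Because only `self.P_theta` requires gradients, an optimizer built from `filter(lambda p: p.requires_grad, model.parameters())` updates just $|P_{idx}| \times 2 \times n_{layer} \times n_{embd}$ values per task, which is what makes the per-task storage so small.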
Results (table-to-text)

table-to-text
Prefix-tuning performs better than adapter-tuning and fine-tuning.
Application: Personalization

