NLP four paradigms: Paradigm 1: fully supervised learning in the non-neural-network era (feature engineering); Paradigm 2: fully supervised learning based on neural networks (architecture engineering); Paradigm 3: pre-train and fine-tune (objective engineering); Paradigm 4: pre-train, prompt, and predict (prompt engineering)
2022-07-03 16:35:00 【u013250861】
Natural Language Processing (NLP) is a field at the intersection of computer science, artificial intelligence, and linguistics that studies the interaction between computers and human natural language; it is an important research direction in computer science and artificial intelligence.
Natural language processing has a long history. As early as 1949, the American scientist Weaver proposed a design scheme for machine translation, which can be regarded as the beginning of the field. The field has kept developing since then, but in the last century its methods were mainly rule-based and statistical; such methods were inefficient, labor-intensive, and unable to handle large data sets, so for a long time natural language processing remained lukewarm.
Since 2008, with the success of deep learning in speech recognition and image processing, researchers have begun to apply deep learning to natural language processing problems: from early word vectors, to word2vec in 2013, to BERT in 2018, and now to prompt-based methods, NLP technology has developed rapidly over the past decade.
Dr. Pengfei Liu of CMU summarized four paradigms in the development of NLP technology, each representing one type of NLP approach. This article uses some simple examples to summarize the four paradigms of NLP.
I. Paradigm 1: Fully supervised learning in the non-neural-network era (feature engineering)
The first paradigm refers to how the NLP field handled tasks before the introduction of neural networks: extract features from the natural-language corpus, use specific rules or mathematical and statistical models to match and exploit those features, and thereby complete a specific NLP task. Common methods for sequence classification, sequence labeling, and similar tasks include the following (a minimal sketch follows the list):
- Bayesian classifiers (e.g., Naive Bayes)
- the Viterbi algorithm
- hidden Markov models (HMMs)
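As a small illustration of this paradigm, the sketch below classifies toy sentences with hand-picked bag-of-words features and a Naive Bayes model. This is a minimal sketch assuming scikit-learn is available; the data and feature choices are made up purely for illustration.

```python
# A minimal sketch of paradigm 1: hand-crafted features + a Naive Bayes classifier.
# Assumes scikit-learn is installed; the toy data and feature choices are illustrative only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = ["the food was great", "terrible service", "loved the ambience", "awful experience"]
train_labels = ["pos", "neg", "pos", "neg"]

# Bag-of-words unigrams/bigrams stand in for manually engineered features.
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(train_texts, train_labels)

print(model.predict(["the service was great"]))  # -> ['pos']
```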
II. Paradigm 2: Fully supervised learning based on neural networks (architecture engineering)
The second paradigm refers to the research methods used in NLP after the introduction of neural networks but before the appearance of pre-trained models.
Such methods no longer require manually designed features and rules, which saves a great deal of human effort; however, a suitable neural network architecture still has to be designed by hand and trained on the data set. Common examples include CNNs, RNNs, and the Seq2Seq models used in machine translation, as sketched below.
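The sketch below shows the kind of architecture engineering this paradigm involves: the features are learned, but the network itself (embedding, LSTM, classification head) still has to be designed by hand. A minimal sketch assuming PyTorch; all sizes and the random input are illustrative.

```python
# A minimal sketch of paradigm 2: learned features, hand-designed architecture.
import torch
import torch.nn as nn

class RNNClassifier(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):            # token_ids: (batch, seq_len)
        embedded = self.embedding(token_ids) # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.rnn(embedded)     # h_n: (1, batch, hidden_dim)
        return self.head(h_n[-1])            # (batch, num_classes)

logits = RNNClassifier()(torch.randint(0, 10000, (4, 20)))
print(logits.shape)  # torch.Size([4, 2])
```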
III. Paradigm 3: Pre-train and fine-tune (objective engineering)
The third paradigm refers to pre-training on a large unsupervised data set to learn general syntactic and semantic features, and then fine-tuning the pre-trained model on the task-specific data set of the downstream task so that the model better fits that task.
Models such as GPT, BERT, and XLNet belong to the third paradigm. Its characteristic is that it does not need large amounts of supervised downstream-task data: the model is mainly trained on large unsupervised corpora, and only a small amount of downstream data is needed to fine-tune a small number of network layers (a minimal sketch follows).
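A minimal sketch of the pre-train/fine-tune recipe, assuming the Hugging Face transformers library; the checkpoint name bert-base-uncased, the toy inputs, and the learning rate are illustrative choices only.

```python
# A minimal sketch of paradigm 3: load a pre-trained model, fine-tune on a
# small amount of labelled downstream data.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

batch = tokenizer(["great toy", "broke after one day"], padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = model(**batch, labels=labels).loss  # cross-entropy over the new classification head
loss.backward()
optimizer.step()
```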
IV. Paradigm 4: Pre-train, prompt, and predict (prompt engineering)
The fourth paradigm redefines how downstream tasks are modeled: with an appropriate prompt (a cue or clue phrase), the downstream task is solved directly on the pre-trained model. This approach needs very little (or even no) downstream-task data, which makes few-shot and zero-shot learning possible.
The fine-tuning process of the third paradigm adjusts the pre-trained model so that it better fits the downstream task; the fourth paradigm is just the opposite: the prompting process adjusts the downstream task so that it better matches the pre-trained model. In other words, in the third paradigm the pre-trained model accommodates the downstream task, while in the fourth paradigm the downstream task accommodates the pre-trained model.
So how should a downstream task be reformulated to better accommodate the pre-trained model? Take BERT as an example. BERT has two pre-training tasks: Masked LM (a cloze task) and Next Sentence Prediction. Corresponding to these two pre-training tasks, prompts can also be divided into two categories (see the sketch after this list):
- Cloze prompts
  - Fill-in-the-blank prompt: for example, in a sentiment classification task with input 【This toy is good】, the prompt can be "This toy is good, so 【Z】", and the output for Z is "great".
  - Classification, matching, and multiple-choice tasks usually use cloze prompts (auto-encoding models).
- Prefix prompts
  - Prefix prompt: for example, in a machine translation task with input 【study hard】, the prompt can be "study hard. Translated into English: 【Z】", and the output for Z is "good good english".
  - Generation tasks usually use prefix prompts (auto-regressive models).
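As a quick illustration of the cloze style, the sketch below rewrites the toy-review input as a fill-in-the-blank prompt and lets a masked language model fill the blank. This is a minimal sketch assuming the Hugging Face transformers library and the bert-base-uncased checkpoint; the English prompt wording is only an illustrative assumption.

```python
# A minimal sketch of a cloze prompt: the sentiment input is rewritten so that the
# pre-trained masked language model can answer it directly by filling in the blank.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
prompt = "This toy is good. It is really [MASK]."
for candidate in fill_mask(prompt, top_k=3):
    print(candidate["token_str"], candidate["score"])
```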
1. Classification task prompt example
Suppose there is a sentiment classification task: the input is a sentence X, and the output is its sentiment label Y ∈ {positive, negative}.
For example, to classify the sentiment of the sentence "The service in this restaurant is really good.":
- First construct the prompt: "_____ satisfied. The service in this restaurant is really good.", i.e. turn the input sentence into this prompt;
- Then map the label set Y to {very, not} ("very satisfied" represents positive, "not satisfied" represents negative).
Since BERT's normal MLM pre-training task predicts over the whole vocabulary, while the prompt above only needs to predict the two words {very, not}, the output vocabulary has to be restricted accordingly, and the model is then trained with a cross-entropy loss.
Therefore, the whole process of the fourth paradigm is (a minimal sketch follows the list):
- first construct the prompt from the input;
- then construct the labels corresponding to the prompt;
- then map the input sentence into the prompt and map the original labels of the downstream task to the new labels;
- finally, fine-tune the model with the mapped prompt inputs and the new labels.
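The sketch below walks through these steps for the restaurant example, assuming Hugging Face transformers and PyTorch. The model name, the English prompt template, and the verbalizer words "not"/"very" are illustrative assumptions, not the article's exact Chinese prompt.

```python
# A minimal sketch of the four steps above: build the prompt, map the labels, and
# train with a cross-entropy loss restricted to the two verbalizer tokens.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

sentence, label = "The service in this restaurant is really good.", 1  # 1 = positive

# Steps 1-3: wrap the input in the prompt and map {negative, positive} -> {"not", "very"}.
prompt = f"I am [MASK] satisfied. {sentence}"
verbalizer_ids = tokenizer.convert_tokens_to_ids(["not", "very"])

inputs = tokenizer(prompt, return_tensors="pt")
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()

# Step 4: take the MLM logits at the mask position, keep only the two answer tokens,
# and fine-tune with an ordinary cross-entropy loss.
logits = model(**inputs).logits[0, mask_pos, verbalizer_ids]
loss = torch.nn.functional.cross_entropy(logits.unsqueeze(0), torch.tensor([label]))
loss.backward()
```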
2. Matching task prompt example
For example, to judge the relationship between the two sentences "I went to Beijing" and "I went to Shanghai", a cloze-style prompt can be constructed:
- I went to Beijing? _____, I went to Shanghai.
Map the two input sentences into the prompt, map the labels from {match, mismatch} to {yes, no}, and train with the MLM model (see the sketch below).
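Compared with the classification example above, only the template and the verbalizer change; the masked-LM machinery stays the same. The template wording and the answer words {"yes", "no"} below are illustrative assumptions.

```python
# A sketch of the matching variant: a different template and verbalizer, same MLM setup.
def build_matching_prompt(sentence_a: str, sentence_b: str) -> str:
    # The blank between the two sentences is the cloze slot the MLM has to fill.
    return f"{sentence_a}? [MASK], {sentence_b}."

matching_verbalizer = {"match": "yes", "mismatch": "no"}
print(build_matching_prompt("I went to Beijing", "I went to Shanghai"))
# -> I went to Beijing? [MASK], I went to Shanghai.
```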
3. Three main problems in fourth-paradigm research
Research on the fourth paradigm mainly revolves around three problems:
- For the input: how to construct the prompt so that it models the downstream task well and stimulates the potential of the pre-trained model;
- For the output: how to map the original labels to the new labels corresponding to the prompt;
- For the model: how to fine-tune the pre-trained model.
3.1 Constructing the prompt
First, a prompt mainly refers to "a supplementary description attached to the original input; this supplementary statement reformulates the task and allows it to be solved, and together with the original input it forms a semantically reasonable sentence that serves as the prompt input".
In other words, for an input text x, the prompt can be generated in two steps (see the sketch after this list):
- Step 1: use a template (a natural-language fragment) containing two empty slots, one to be filled with x and one where the answer text z will be generated;
- Step 2: fill the input into the x slot.
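A minimal sketch of this two-step construction is shown below. The template carries an input slot [X] and an answer slot [Z]; its wording is an illustrative assumption, not a template from the article.

```python
# A minimal sketch of the two-step prompt construction described above.
def apply_template(template: str, x: str, mask_token: str = "[MASK]") -> str:
    # Step 1: the template already contains the two empty positions [X] and [Z].
    # Step 2: fill the input text into [X]; [Z] is left as the mask to be predicted.
    return template.replace("[X]", x).replace("[Z]", mask_token)

template = "[X] Overall, it was a [Z] experience."
print(apply_template(template, "The service in this restaurant is really good."))
# -> The service in this restaurant is really good. Overall, it was a [MASK] experience.
```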
References:
- The four paradigms of NLP (NLP四范式)