NLP four paradigms: Paradigm 1: fully supervised learning in the non-neural-network era (Feature Engineering); Paradigm 2: fully supervised learning based on neural networks (Architecture Engineering); Paradigm 3: pre-train, fine-tune (Objective Engineering); Paradigm 4: pre-train, prompt, predict (Prompt Engineering)
2022-07-03 16:35:00 【u013250861】

Natural Language Processing (NLP) is a field at the intersection of computer science, artificial intelligence, and linguistics that studies the interaction between computers and human natural language. It is an important direction within computer science and artificial intelligence.
NLP has a long history. As early as 1949, the American scientist Warren Weaver proposed a design scheme for machine translation, which can be regarded as the beginning of the field. Throughout the last century, the dominant approaches were rule-based and statistical methods, which were inefficient, labor-intensive, and unable to handle large data sets, so the field remained tepid.
Since 2008, as deep learning achieved success in speech recognition and image processing, researchers began applying it to NLP: from early word vectors, to word2vec in 2013, to BERT in 2018, and now to prompting, NLP technology has developed rapidly over the past decade.
Pengfei Liu of CMU summarized four paradigms in the development of NLP technology, each representing one way of approaching NLP tasks. This article summarizes the four NLP paradigms with some simple examples.
I. Paradigm 1: Fully supervised learning in the non-neural-network era (Feature Engineering)
The first paradigm refers to how NLP was done before neural networks were introduced: extract features from a natural-language corpus, then use hand-written rules or mathematical and statistical models to match and exploit those features in order to complete a specific NLP task. Common methods for sequence classification, sequence labeling, and similar tasks include (a decoding sketch follows the list):
- Naive Bayes
- the Viterbi algorithm
- Hidden Markov Models (HMMs)
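As a concrete illustration of this paradigm, here is a minimal sketch of Viterbi decoding over a toy HMM for sequence labeling. The states, transition probabilities, and emission probabilities below are invented for illustration; in practice they would be estimated from an annotated corpus.

```python
# Minimal Viterbi decoder for a toy HMM (all probabilities are made up).
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    # V[t][s] = (best probability of reaching state s at step t, predecessor)
    V = [{s: (start_p[s] * emit_p[s].get(obs[0], 1e-12), None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s].get(obs[t], 1e-12), p)
                for p in states
            )
            V[t][s] = (prob, prev)
    # Backtrack from the best final state.
    best = max(states, key=lambda s: V[-1][s][0])
    path = [best]
    for t in range(len(obs) - 1, 0, -1):
        best = V[t][best][1]
        path.append(best)
    return list(reversed(path))

states = ("Noun", "Verb")
start_p = {"Noun": 0.6, "Verb": 0.4}
trans_p = {"Noun": {"Noun": 0.3, "Verb": 0.7}, "Verb": {"Noun": 0.8, "Verb": 0.2}}
emit_p = {"Noun": {"dogs": 0.4, "cats": 0.4}, "Verb": {"chase": 0.5}}
print(viterbi(["dogs", "chase", "cats"], states, start_p, trans_p, emit_p))
# -> ['Noun', 'Verb', 'Noun']
```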

II. Paradigm 2: Fully supervised learning based on neural networks (Architecture Engineering)
The second paradigm refers to the research methods used in NLP after neural networks were introduced but before pre-trained models appeared.
Such methods no longer require hand-crafted features and rules, which saves a great deal of human effort, but a suitable neural network architecture still has to be designed manually and trained on the task's data set. Common methods include CNNs, RNNs, and the Seq2Seq models used in machine translation; a minimal sketch follows.
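To make "architecture engineering" concrete, here is a minimal sketch of an LSTM text classifier in PyTorch. The vocabulary size, dimensions, and class count are arbitrary placeholders; a real system would add tokenization, padding masks, and a training loop.

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    """A supervised text classifier whose architecture is designed by hand."""
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):          # (batch, seq_len)
        x = self.embed(token_ids)          # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(x)         # h_n: (1, batch, hidden_dim)
        return self.fc(h_n[-1])            # (batch, num_classes)

model = LSTMClassifier()
dummy_batch = torch.randint(0, 10000, (4, 20))  # 4 fake sentences of 20 token ids
print(model(dummy_batch).shape)                 # torch.Size([4, 2])
```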
III. Paradigm 3: Pre-train, fine-tune (Objective Engineering)
The third paradigm first pre-trains on a large unsupervised data set to learn general syntactic and semantic features, and then fine-tunes the pre-trained model on the specific data set of a downstream task, so that the model better fits that task.
GPT, BERT, XLNet, and similar models belong to the third paradigm. Its hallmark is that large amounts of supervised downstream-task data are no longer needed: the model is mainly trained on large unsupervised corpora, and only a small amount of downstream data is required to fine-tune a few network layers; a fine-tuning sketch follows.
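Below is a minimal fine-tuning sketch using the Hugging Face transformers library. The bert-base-chinese checkpoint, the two example sentences, and the learning rate are illustrative assumptions, not details from the original article.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load a pre-trained BERT and attach a fresh 2-class classification head.
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=2
)

# A tiny fake supervised batch (1 = positive, 0 = negative).
batch = tokenizer(["这家餐厅的服务真好", "这个玩具太差了"],
                  padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

# One fine-tuning step: cross-entropy loss over the 2 labels, then update.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
```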
IV. Paradigm 4: Pre-train, prompt, predict (Prompt Engineering)
The fourth paradigm redefines how downstream tasks are modeled: with a suitable prompt (a cue or hint phrase), the downstream task is solved directly on the pre-trained model. This approach needs very little (or even no) downstream-task data, which makes few-shot and zero-shot learning possible.
In the third paradigm described above, fine-tuning adjusts the pre-trained model so that it better fits the downstream task; the fourth paradigm does the opposite: prompting adjusts the downstream task so that it better matches the pre-trained model. In other words, in the third paradigm the pre-trained model accommodates the downstream task, while in the fourth the downstream task accommodates the pre-trained model.
So how do we transform a downstream task to better fit the pre-trained model? Take BERT as an example. BERT has two pre-training tasks: Masked LM (a cloze task) and Next Sentence Prediction. Corresponding to these two pre-training tasks, prompts also fall into two categories (a runnable cloze sketch follows the list):
- Cloze prompts
  - Fill-in-the-blank prompt: e.g., for sentiment classification, the input 【This toy is fun】 can be rewritten as the prompt "This toy is fun, it is so 【Z】", where Z is predicted as "great".
  - Classification, matching, and multiple-choice tasks typically use cloze prompts (autoencoding models).
- Prefix prompts
  - Prefix prompt: e.g., for machine translation, the input 【study hard】 can be rewritten as the prompt "study hard. Translated into English: 【Z】", where Z is generated as "good good english".
  - Generation tasks typically use prefix prompts (autoregressive models).
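As a hedged sketch of a cloze prompt in practice, the snippet below asks a pre-trained masked LM to fill the blank directly via the transformers fill-mask pipeline. The bert-base-chinese checkpoint and the Chinese wording of the prompt are illustrative choices, not the article's exact setup.

```python
from transformers import pipeline

# Ask the pre-trained masked LM to fill the cloze slot directly;
# the predicted token is then mapped back to a task label.
fill = pipeline("fill-mask", model="bert-base-chinese")

# Roughly: "This toy is fun, it is so [MASK]." A prediction like "棒" (great)
# at the mask position maps to the positive sentiment label.
for pred in fill("这个玩具很好玩，太[MASK]了。")[:3]:
    print(pred["token_str"], round(pred["score"], 3))
```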
1. Classification task prompt example
Suppose we have a sentiment classification task: given an input sentence X, output its sentiment label Y ∈ {positive, negative}.
For example, to classify the sentiment of the sentence "The service in this restaurant is really good":
- First construct the prompt: "_____ satisfied. The service in this restaurant is really good.", turning the input sentence into this prompt;
- Then map the labels Y to {很 (very), 不 (not)}: "very satisfied" represents positive, and "not satisfied" represents negative.
Since BERT's normal MLM pre-training task predicts over the entire vocabulary, while the prompt above only needs to predict the two words {very, not}, the vocabulary has to be restricted accordingly, and the model is then trained with a cross-entropy loss over the label words; a sketch follows.
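Here is a minimal sketch, assuming a bert-base-chinese checkpoint and a Chinese rendering of the template above ("[MASK]满意。" prepended to the sentence), of restricting the MLM prediction to the two label words 很/不 and computing cross-entropy over just those logits.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForMaskedLM.from_pretrained("bert-base-chinese")

prompt = "[MASK]满意。这家餐厅的服务真好。"  # "____ satisfied. The service ... is really good."
gold = 0  # 0 -> 很 ("very", positive), 1 -> 不 ("not", negative)

inputs = tokenizer(prompt, return_tensors="pt")
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]

logits = model(**inputs).logits[0, mask_pos]       # scores over the full vocabulary
label_ids = tokenizer.convert_tokens_to_ids(["很", "不"])
label_logits = logits[label_ids].unsqueeze(0)      # keep only the two label words

# Cross-entropy over just the label words, as described above.
loss = torch.nn.functional.cross_entropy(label_logits, torch.tensor([gold]))
loss.backward()
```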
Therefore, the overall workflow of the fourth paradigm is:
- first construct the input as a prompt;
- then construct the labels corresponding to the prompt;
- then map the input sentences and map the downstream task's original labels;
- finally fine-tune the model using the mapped prompt inputs and the new labels.
2. Matching task prompt example

For example, to judge the relationship between the two sentences "I went to Beijing" and "I went to Shanghai", a cloze-style prompt can be constructed:
- I went to Beijing? _____, I went to Shanghai.

Map the two input sentences into this prompt, map the labels from {matching, mismatch} to {yes, no}, and train with the MLM objective; a scoring sketch follows.
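A minimal sketch, assuming 是/否 ("yes"/"no") as the Chinese label words and bert-base-chinese as the backbone: the fill-mask pipeline's targets argument scores only those two words at the masked position.

```python
from transformers import pipeline

# Score only the two label words at the [MASK] position;
# 是 ("yes") maps to "matching", 否 ("no") maps to "mismatch".
fill = pipeline("fill-mask", model="bert-base-chinese")

prompt = "我去了北京？[MASK]，我去了上海。"  # "I went to Beijing? ____, I went to Shanghai."
for pred in fill(prompt, targets=["是", "否"]):
    print(pred["token_str"], round(pred["score"], 4))
```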
3. Three main problems in fourth-paradigm research
Research on the fourth paradigm centers on three main problems:
- For the input: how to construct prompts that model downstream tasks well enough to unlock the potential of the pre-trained model;
- For the output: how to map the original labels to the new labels corresponding to the prompt;
- For the model: how to fine-tune the pre-trained model.
3.1 Constructing the prompt
First, a prompt essentially means "attaching a supplementary description to the original input; this supplementary statement enables the task to be transformed and solved, and together with the original input it forms a semantically coherent sentence that serves as the prompt input."
In other words, for an input text x, a prompt can be generated in two steps (sketched below):
- Step 1: choose a template (a natural-language fragment) that contains two empty slots, one for filling in x and one for generating the answer text z;
- Step 2: fill the input into the x slot.
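A minimal sketch of this two-step construction; the [X] and [Z] slot markers and the template string are illustrative conventions, not a fixed API.

```python
def apply_template(template: str, x: str, mask_token: str = "[MASK]") -> str:
    """Step 1: pick a template with [X] and [Z] slots.
    Step 2: fill the input text into [X]; [Z] becomes the LM's mask."""
    return template.replace("[X]", x).replace("[Z]", mask_token)

template = "[Z]满意。[X]"  # illustrative sentiment template: "[Z] satisfied. [X]"
print(apply_template(template, "这家餐厅的服务真好。"))
# -> "[MASK]满意。这家餐厅的服务真好。"
```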
References:
- NLP 四范式 (The four paradigms of NLP)