NLP's four paradigms — Paradigm 1: fully supervised learning in the non-neural-network era (Feature Engineering); Paradigm 2: fully supervised learning based on neural networks (Architecture Engineering); Paradigm 3: pre-train, fine-tune (Objective Engineering); Paradigm 4: pre-train, prompt, predict (Prompt Engineering)
2022-07-03 16:35:00 【u013250861】
Natural Language Processing (NLP) is a field at the intersection of computer science, artificial intelligence, and linguistics concerned with the interaction between computers and human natural language. It is an important direction within both computer science and artificial intelligence.
Natural language processing has a long history. As early as 1949, the American scientist Warren Weaver proposed a design scheme for machine translation, which can be regarded as the beginning of the field. NLP has been developing ever since, but during the last century its methods were mainly rule-based and statistical. Such methods were inefficient and labor-intensive, and could not handle large datasets, so the field remained relatively lukewarm.
Since 2008, following deep learning's success in speech recognition and image processing, researchers have applied deep learning methods to natural language processing problems. From the earliest word vectors, to word2vec in 2013, then BERT in 2018, and now prompt-based methods, NLP technology has developed rapidly over the past decade.
Dr. Pengfei Liu of CMU summarized four paradigms in the development of NLP technology, each representing one type of approach. This article combines some simple examples to summarize the four paradigms of NLP.
1. Paradigm 1: Fully supervised learning in the non-neural-network era (Feature Engineering)
The first paradigm refers to how the NLP field worked before neural networks were introduced: extract features from a natural language corpus, use specific rules or mathematical and statistical models to match and exploit those features, and thereby complete a specific NLP task. Common methods for sequence classification, sequence labeling, and similar tasks include the following (a small illustrative sketch follows the list):
- Naive Bayes
- Viterbi algorithm
- Hidden Markov Model (HMM)
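To make the first paradigm concrete, here is a minimal sketch of Viterbi decoding over a toy HMM for part-of-speech tagging. The states, observations, and all probability tables are invented for illustration; a real system of that era would estimate them from a hand-annotated corpus.

```python
# Minimal Viterbi decoding over a toy HMM (all probabilities are invented
# for illustration; a real system would estimate them from labeled data).
states = ["NOUN", "VERB"]
start_p = {"NOUN": 0.6, "VERB": 0.4}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
           "VERB": {"NOUN": 0.8, "VERB": 0.2}}
emit_p = {"NOUN": {"dogs": 0.5, "cats": 0.4, "run": 0.1},
          "VERB": {"dogs": 0.1, "cats": 0.1, "run": 0.8}}

def viterbi(obs):
    # V[t][s] = best probability of any state path ending in state s at step t
    V = [{s: start_p[s] * emit_p[s].get(obs[0], 1e-8) for s in states}]
    path = {s: [s] for s in states}
    for t in range(1, len(obs)):
        V.append({})
        new_path = {}
        for s in states:
            # Pick the best previous state for each current state
            prob, prev = max(
                (V[t - 1][p] * trans_p[p][s] * emit_p[s].get(obs[t], 1e-8), p)
                for p in states)
            V[t][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path
    best = max(states, key=lambda s: V[-1][s])
    return path[best]

print(viterbi(["dogs", "run"]))  # ['NOUN', 'VERB']
```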
2. Paradigm 2: Fully supervised learning based on neural networks (Architecture Engineering)
The second paradigm refers to the research methods used in NLP after neural networks were introduced but before pre-trained models appeared.
Such methods do not require manually designed features and rules, which saves a great deal of human effort, but a suitable neural network architecture still has to be designed by hand and trained on the task's dataset. Common examples include CNNs, RNNs, and the Seq2Seq models used in machine translation. A minimal sketch follows.
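As a hedged illustration of this paradigm, here is a minimal PyTorch sketch of an RNN-based sentence classifier. The framework choice, vocabulary size, and dimensions are all assumptions for the example; the original article names no specific toolkit.

```python
import torch
import torch.nn as nn

class RNNClassifier(nn.Module):
    """A hand-designed architecture, trained end-to-end on labeled data."""
    def __init__(self, vocab_size=10000, embed_dim=128,
                 hidden_dim=256, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        x = self.embed(token_ids)      # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.rnn(x)      # h_n: (1, batch, hidden_dim)
        return self.fc(h_n[-1])        # (batch, num_classes)

model = RNNClassifier()
dummy_batch = torch.randint(0, 10000, (4, 20))  # 4 sentences, 20 tokens each
logits = model(dummy_batch)
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 1, 1, 0]))
loss.backward()  # fully supervised: gradients come from labels alone
```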
3. Paradigm 3: Pre-train, fine-tune paradigm (Objective Engineering)
The third paradigm refers to pre-training on a large unsupervised dataset to learn general syntactic and semantic features, and then fine-tuning the pre-trained model on the specific dataset of a downstream task so that the model fits that task better.
GPT, BERT, XLNet, and similar models belong to the third paradigm. Its characteristic is that it does not need a large amount of supervised downstream-task data: the model is mainly trained on large unsupervised data, and only a small amount of downstream data is needed to fine-tune a small number of network layers. A sketch follows.
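A minimal sketch of the pre-train/fine-tune workflow using the Hugging Face transformers library; the library choice, the checkpoint name, and the tiny dataset are assumptions made for illustration, not something specified in the original article.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load a pre-trained checkpoint; a fresh classification head is attached on top.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

# A tiny supervised downstream dataset (invented for the example).
texts = ["the service in this restaurant is really good",
         "the food was cold and the staff were rude"]
labels = torch.tensor([1, 0])  # 1 = positive, 0 = negative

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)  # cross-entropy loss computed internally
outputs.loss.backward()                  # gradients update the pre-trained weights
optimizer.step()
```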
4. Paradigm 4: Pre-train, prompt, predict paradigm (Prompt Engineering)
The fourth paradigm redefines how downstream tasks are modeled: through a suitable prompt (a hint or cue phrase), downstream tasks are solved directly on the pre-trained model. This paradigm requires very little (or even no) downstream-task data, making few-shot and zero-shot learning possible.
In the third paradigm described above, fine-tuning adjusts the pre-trained model to make it fit the downstream task better; the fourth paradigm is just the opposite: prompting adjusts the downstream task so that it better matches the pre-trained model. In other words, in the third paradigm the pre-trained model accommodates the downstream task, while in the fourth the downstream task accommodates the pre-trained model.
So how do we transform downstream tasks to better fit the pre-trained model? Take BERT as an example. BERT has two pre-training tasks: Masked LM (a cloze task) and Next Sentence Prediction. Corresponding to these two different pre-training tasks, prompts can also be divided into two categories:
- Cloze prompt (fill-in-the-blank): for example, in a sentiment classification task with input 【This toy is good】, the prompt can be: "This toy is good, and very 【Z】", where the output for Z is "great".
  - Classification, matching, and multiple-choice tasks typically use cloze prompts (auto-encoding models).
- Prefix prompt: for example, in a machine translation task with input 【study hard】, the prompt can be: "study hard. Translated into English: 【Z】", where the output for Z is "good good english".
  - Generation tasks typically use prefix prompts (autoregressive models).
1. Prompt example for a classification task
Suppose there is a sentiment classification task with input sentence X and output sentiment label Y ∈ {positive, negative}.
For example, to classify the sentiment of the sentence "The service in this restaurant is really good":
- First construct the prompt: "_____ satisfied. The service in this restaurant is really good." The input sentence is thus rewritten into this prompt.
- Then map the label set Y to {very, not} ("very satisfied" represents positive, "not satisfied" represents negative).
Since BERT's normal MLM pre-training task predicts over the whole vocabulary, while the prompt above only needs to predict two words {very, not}, the vocabulary must be restricted accordingly, and a cross-entropy loss is then used for modeling.
Therefore, the whole process of the fourth paradigm is (see the sketch after this list):
- first construct the input as a prompt,
- then construct the labels corresponding to the prompt,
- then map the input sentences and map the downstream task's original labels,
- and finally fine-tune the model with the mapped prompt inputs and the new labels.
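A minimal sketch of this process with a masked language model, again using the Hugging Face transformers library as an assumed toolchain. The prompt template, the verbalizer words, and the helper name score_verbalizers are all illustrative choices; prediction is shown zero-shot here, while the article's training step would apply cross-entropy over these restricted logits.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def score_verbalizers(prompt, verbalizers):
    """Fill the [MASK] slot and compare only the verbalizer words' logits."""
    inputs = tokenizer(prompt, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos.item()]
    # Restrict the prediction to the label words instead of the full vocabulary.
    ids = {w: tokenizer.convert_tokens_to_ids(w) for w in verbalizers}
    return max(verbalizers, key=lambda w: logits[ids[w]].item())

verbalizers = {"very": "positive", "not": "negative"}
prompt = "[MASK] satisfied. the service in this restaurant is really good."
word = score_verbalizers(prompt, verbalizers)
print(word, "->", verbalizers[word])
```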
2. Prompt example for a matching task
For example, to judge the relationship between the two sentences "I went to Beijing" and "I went to Shanghai", a cloze-style prompt can be constructed:
- I went to Beijing? _____, I went to Shanghai.
Map the two input sentences into the prompt, then map the labels from {match, mismatch} to {yes, no}, and train with the help of the MLM model, as sketched below.
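As a usage note, the hypothetical score_verbalizers helper from the classification sketch above can score this matching prompt as well, with "yes"/"no" as the mapped label words:

```python
# Reuses the hypothetical score_verbalizers helper defined in the
# classification sketch above; "yes"/"no" are the mapped label words.
prompt = "i went to beijing? [MASK], i went to shanghai."
word = score_verbalizers(prompt, {"yes": "match", "no": "mismatch"})
print(word)  # "yes" -> match, "no" -> mismatch
```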
3. Three main problems in fourth-paradigm research
There are three main problems in the research field of the fourth paradigm:
- For the input: how to construct prompts that model the downstream task well and stimulate the potential of the pre-trained model;
- For the output: how to map the original labels to the new labels corresponding to the prompt;
- For the model: how to fine-tune the pre-trained model.
3.1 Constructing prompts
First, a prompt mainly refers to "a supplementary description attached to the original input; through this supplementary statement, the task is transformed and solved, and the description together with the original input forms a semantically coherent statement that serves as the prompt's input".
In other words, for input text x, a prompt can be generated in two steps (a small sketch follows):
- Step 1: use a template (a natural language fragment) containing two empty slots, one to fill in x and one for generating the answer text z.
- Step 2: fill the input into the x slot.
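A tiny sketch of these two steps; the template string and the slot markers [X] and [Z] are illustrative conventions, not a fixed standard:

```python
# Step 1: a template with two slots, [X] for the input and [Z] for the answer.
TEMPLATE = "[Z] satisfied. [X]"

def build_prompt(x, mask_token="[MASK]"):
    """Step 2: fill the input into the [X] slot; [Z] stays as the LM's mask."""
    return TEMPLATE.replace("[X]", x).replace("[Z]", mask_token)

print(build_prompt("the service in this restaurant is really good"))
# -> "[MASK] satisfied. the service in this restaurant is really good."
```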
References:
- The four paradigms of NLP (NLP四种范式)