
[NLP] A recommended full learning path for NLP

2022-06-10 13:06:00 Demeanor 78

I chatted with a few experts from BAT (Baidu, Alibaba, Tencent) about a full learning path for NLP and summarized it below. It covers:
  • What foundations you need before studying NLP

  • Projects for learning each task along the full NLP path

01

Foundations needed to study NLP

01  Machine learning

Be familiar with simple machine learning models, for example logistic regression, decision trees, naive Bayes, hidden Markov models, K-Means, and regularization methods. Some more advanced machine learning background is a plus, for example ensemble learning (random forests, GBDT, XGBoost, stacking), conditional random fields (CRF), Bayesian networks, support vector machines, and topic models.
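As a concrete refresher on one of the simple models listed above, here is a minimal multinomial naive Bayes text classifier with add-one (Laplace) smoothing in plain Python. It is only an illustrative sketch: the function names `train_naive_bayes` and `predict` and the toy data are hypothetical, and in real work you would more likely reach for a library such as scikit-learn.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Train a multinomial naive Bayes model.

    docs: list of token lists; labels: one class label per document.
    Returns (model, vocab); model maps label -> log prior, token counts, smoothing denominator.
    """
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> Counter of tokens seen in that class
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    model = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        model[label] = {
            "log_prior": math.log(n_docs / len(docs)),
            "counts": word_counts[label],
            # add-one smoothing: every vocabulary word gets one pseudo-count
            "denom": total_words + len(vocab),
        }
    return model, vocab


def predict(model, vocab, tokens):
    """Return the label with the highest log posterior for a token list."""
    best_label, best_score = None, -math.inf
    for label, params in model.items():
        score = params["log_prior"]
        for tok in tokens:
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((params["counts"][tok] + 1) / params["denom"])
        if score > best_score:
            best_label, best_score = label, score
    return best_label


docs = [["team", "match", "goal"], ["stock", "rise", "market"],
        ["match", "champion"], ["market", "fund"]]
labels = ["sports", "finance", "sports", "finance"]
model, vocab = train_naive_bayes(docs, labels)
print(predict(model, vocab, ["match", "goal"]))  # sports
```

Working in log space avoids underflow when many small probabilities are multiplied together.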

02  Deep learning

Be familiar with the basics of simple neural networks, for example the neuron model, multilayer perceptrons, the backpropagation algorithm, activation functions, Word2Vec, RNN, and CNN. More advanced deep learning background is a plus, for example LSTM, BiLSTM-CRF, TextCNN, the attention mechanism, Transformer, and BERT.
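The attention mechanism at the core of Transformer and BERT can be summarized as softmax(QKᵀ/√d)·V. The sketch below implements single-head scaled dot-product attention over plain Python lists, with no masking, batching, or learned projections; it only shows the arithmetic, and the function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """queries: n x d, keys: m x d, values: m x d_v (lists of lists).

    Returns an n x d_v list: each output row is a weighted average of the value rows,
    with weights softmax(q . k / sqrt(d)) over all keys.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


# Two identical keys get equal weight, so the output is the mean of the two value rows.
print(scaled_dot_product_attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]],
                                   [[2.0, 0.0], [0.0, 4.0]]))  # [[1.0, 2.0]]
```

Dividing by √d keeps the dot products from growing with the vector dimension, which would otherwise push the softmax toward a one-hot distribution.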

03  Languages and frameworks

Proficiency in the Python programming language and a basic understanding of how to use PyTorch.

04  NLP-specific foundations

Understand the architecture of BERT-based models

Understand the concept of model distillation

Understand the concepts behind extractive and abstractive (generative) summarization models

Understand GPT-2

Understand the principles of text classification and sequence labeling models

Understand graph neural networks and representation learning

02

Basic projects

01  Chinese word segmentation

Chinese word segmentation is a basic step in Chinese text processing. Unlike English, Chinese sentences have no explicit word boundaries, so Chinese NLP pipelines usually segment the text into words first, and the quality of segmentation directly affects downstream modules such as part-of-speech tagging and syntactic parsing. Although ready-made segmenters are available, it is still worthwhile to understand how they work internally and to practice building an industrial-grade segmenter.


Recommended project: Chinese word segmentation in a Sohu News-like scenario    Difficulty: **

The project covers the word segmentation requirements common in industrial scenarios: better accuracy, higher speed, quick fixes for bad cases, a system that gets better the more it is used, and fast migration to new scenarios.

After completing the project, you will be familiar, from scratch, with how word segmentation is implemented across the industry, along with its usage scenarios, methods, and techniques.
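To see why understanding a segmenter's internals matters, consider forward maximum matching, one of the simplest dictionary-based segmentation methods. The sketch below is an illustration under stated assumptions, not the method used in the recommended project: the dictionary is a toy one and the function name is hypothetical. The example deliberately shows a classic failure case of greedy longest matching.

```python
def forward_max_match(text, dictionary, max_word_len=None):
    """Greedy left-to-right segmentation: at each position take the longest dictionary word.

    Falls back to a single character when no dictionary word matches.
    Assumes a non-empty dictionary when max_word_len is not given.
    """
    if max_word_len is None:
        max_word_len = max(len(w) for w in dictionary)
    words = []
    i = 0
    while i < len(text):
        # try the longest candidate first, then shrink it one character at a time
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


dictionary = {"研究", "研究生", "生命", "起源", "的"}
print(forward_max_match("研究生命的起源", dictionary))
# ['研究生', '命', '的', '起源']
```

Here "研究生命的起源" ("studying the origin of life") is split as 研究生 / 命 / 的 / 起源 instead of 研究 / 生命 / 的 / 起源. This is exactly the kind of bad case that statistical or neural segmenters, together with a fast way to patch the dictionary, are meant to fix.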

02  Keyword extraction

In natural language processing, the key to handling massive amounts of text is extracting what users care about most. For both long and short texts, a few keywords are often enough to reveal the topic of the whole text. Moreover, both text-based recommendation and text-based search depend heavily on keywords, so the accuracy of keyword extraction directly affects the final performance of a recommendation or search system.

Recommended project: Keyword extraction in a Sina portal-like scenario    Difficulty: **

Implement a keyword extractor for a news corpus that meets the following requirements: in a single process, an accuracy of at least 0.80 and a throughput of at least 100 QPS; it must also be able to recognize new words.

After completing the project, you will be familiar, from scratch, with how keyword extraction methods have evolved and with the overall approach to unsupervised keyword extraction.
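A classic unsupervised baseline for keyword extraction is TF-IDF: words that are frequent in the current document but rare across the corpus rank higher. The sketch below assumes the text is already segmented into tokens and uses a smoothed IDF, log((1 + N) / (1 + df)) + 1. The function name, the `stopwords` parameter, and the toy corpus are hypothetical, and the sketch does not address the new-word recognition requirement.

```python
import math
from collections import Counter


def tfidf_keywords(doc_tokens, corpus, top_k=3, stopwords=frozenset()):
    """Rank the words of one segmented document by TF-IDF.

    doc_tokens: tokens of the target document.
    corpus: list of token lists used to compute document frequencies.
    """
    n_docs = len(corpus)
    doc_freq = Counter()
    for tokens in corpus:
        doc_freq.update(set(tokens))  # count each word at most once per document
    tokens = [t for t in doc_tokens if t not in stopwords]
    if not tokens:
        return []
    tf = Counter(tokens)
    scores = {}
    for word, count in tf.items():
        idf = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1  # smoothed IDF, always >= 1
        scores[word] = (count / len(tokens)) * idf
    # highest score first; ties broken alphabetically for deterministic output
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


corpus = [["the", "goal", "goal", "the", "team"],
          ["the", "market", "rose"],
          ["the", "team", "won"]]
print(tfidf_keywords(corpus[0], corpus, top_k=2, stopwords={"the"}))  # ['goal', 'team']
```

Without the stopword list, "the" would still score fairly high here because its term frequency is large; in practice stopword filtering or part-of-speech filtering usually accompanies TF-IDF.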

03

Advanced projects

03  Entity recognition

Entity recognition is a very common task in NLP and almost an essential skill. It extracts important entities from text, such as person names, place names, organization names, times, and proper nouns, and it can be extended to any entity type you care about, such as license plate numbers or nationalities.

Recommended project: Entity recognition in a Sina Weibo-like scenario    Difficulty: ****

Implement an entity recognition system based on multiple models that meets the following requirements: with traditional machine learning, an accuracy of at least 80% and at least 10 QPS; with deep learning, an accuracy of at least 90%, at least 1 QPS without a GPU, and at least 10 QPS with a GPU.

After completing the project, you will be familiar, from scratch, with the history of entity recognition methods and will have implemented the common models.
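Sequence labeling models for entity recognition usually output one tag per token in a BIO scheme (B- begins an entity, I- continues it, O marks tokens outside any entity), and those tags then have to be decoded into entity spans. The sketch below assumes character-level tokens (common for Chinese) and treats a stray I- tag leniently as the start of a new entity; both choices, and the function name, are assumptions for illustration.

```python
def bio_to_spans(tokens, tags):
    """Decode BIO tags into (entity_text, entity_type, start, end) tuples; end is exclusive."""
    spans = []
    start, ent_type = None, None
    for i, tag in enumerate(tags + ["O"]):  # a sentinel "O" flushes the last open entity
        continues = tag.startswith("I-") and tag[2:] == ent_type
        if not continues:
            # close the currently open entity, if any
            if start is not None:
                spans.append(("".join(tokens[start:i]), ent_type, start, i))
                start, ent_type = None, None
            # B- starts an entity; an I- without a matching open entity is treated as a start too
            if tag.startswith("B-") or tag.startswith("I-"):
                start, ent_type = i, tag[2:]
    return spans


tokens = list("张三在北京工作")
tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "O"]
print(bio_to_spans(tokens, tags))
# [('张三', 'PER', 0, 2), ('北京', 'LOC', 3, 5)]
```

Joining tokens with an empty string suits character-level Chinese input; for word-level or English tokens you would join with a space or keep the offsets instead.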

04  Text classification

Text classification is the most basic and most common task in NLP, and in practice you will run into all kinds of text classification problems. With the development of BERT-based models, the baseline for text classification in real-world work has improved greatly. In actual model development, however, two kinds of problems come up often: (1) the text data is messy, with a lot of noise, class imbalance, and so on; (2) the text data lacks annotations. Both problems make it much harder to develop a stable text classification model.

Recommended project: BERT classifier training, optimization, and distillation in a headline classification scenario    Difficulty: ***

Goals:

1、 A BERT-based classifier that reaches a prediction F1 of at least 0.9 on a medium-difficulty task with noisy data;

2、 A distilled classifier based on BERT whose performance drops by no more than 0.05 compared with the model above, with a response speed of at least 10 QPS;

After completing the project, you will understand the basic ideas and methods of developing text classification models.
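The original does not say which distillation method the project uses. One common choice is soft-target knowledge distillation in the style of Hinton et al. (2015): the student is trained on a weighted mix of the KL divergence to the teacher's temperature-softened distribution and the usual cross-entropy on the hard label. Below is a per-example sketch of that loss in plain Python; the temperature, `alpha`, and the function names are illustrative assumptions, and a real implementation would compute this over batches in PyTorch.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a temperature (higher temperature gives softer probabilities)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]


def distillation_loss(student_logits, teacher_logits, label, temperature=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(student, hard label)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student_t = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student_t) if pt > 0)
    ce = -math.log(softmax(student_logits)[label])
    # T^2 keeps the soft-target gradients on a scale comparable to the hard-label term
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


# A student that matches the teacher pays only the hard-label cross-entropy term.
print(distillation_loss([2.0, 0.0], [2.0, 0.0], label=0))
```

Raising the temperature exposes the teacher's relative preferences among wrong classes, which is the extra signal a smaller student learns from.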

05  Text summarization

Text summarization is an advanced NLP task: it compresses long text so that users can read more efficiently. It addresses a deep-seated business need and is common in news publishing and analysis at internet and financial companies; meeting such needs often requires senior algorithm engineers.

Recommended project: Implement a GPT-based generative summarization model    Difficulty: ****

Implement a GPT-based generative summarization model that meets the following conditions:

1、 High response effectiveness: the effectiveness of the top-10 summaries is at least 90%;

2、 A prediction speed of at least 1 QPS on a single GPU;

After completing the project, you will understand the history of summarization models, know the classic models, and understand the basic ideas and methods of developing and optimizing models.
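A GPT-style generative summarizer produces the summary one token at a time, feeding each generated token back in as input. The sketch below shows greedy decoding with a hypothetical `next_token_scores` callback standing in for the model. The toy bigram table and the `<sep>`/`<eos>` prompt format are assumptions for illustration only, and practical systems often use beam search or sampling instead of pure greedy search.

```python
from typing import Callable, Dict, List


def greedy_decode(next_token_scores: Callable[[List[str]], Dict[str, float]],
                  prompt: List[str], max_new_tokens: int = 20, eos: str = "<eos>") -> List[str]:
    """Autoregressive greedy decoding: repeatedly append the highest-scoring next token."""
    tokens = list(prompt)
    generated = []
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)  # the model scores every candidate next token
        best = max(scores, key=scores.get)
        if best == eos:
            break
        tokens.append(best)  # the chosen token becomes part of the next step's input
        generated.append(best)
    return generated


# Hypothetical stand-in for a language model: next-token scores depend only on the last token.
BIGRAMS = {
    "<sep>": {"stocks": 0.6, "markets": 0.4},
    "stocks": {"rose": 0.7, "fell": 0.3},
    "rose": {"<eos>": 0.9, "sharply": 0.1},
}


def toy_model(tokens: List[str]) -> Dict[str, float]:
    return BIGRAMS.get(tokens[-1], {"<eos>": 1.0})


# The article text goes before <sep>; the summary is generated after it.
print(greedy_decode(toy_model, ["article", "text", "<sep>"]))  # ['stocks', 'rose']
```

The `max_new_tokens` cap bounds the summary length and the latency per request, which matters for throughput targets like the 1 QPS requirement above.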

06  Dialogue systems

Intelligent dialogue is an advanced NLP task. It usually covers chit-chat (generative), QA (retrieval-based), and task-oriented dialogue. In practice the most common intelligent dialogue system is the QA system, followed by the task-oriented system; chit-chat is often attached to the other two as an auxiliary component to make the system feel friendlier.

Recommended project: Build an industrial dialogue system (retrieval-based / task-oriented / chit-chat)    Difficulty: *****

Build an industrial dialogue system with:

1、 Retrieval-based dialogue: an FAQ system implemented with Learning to Rank;

2、 Task-oriented dialogue: simple multi-task dialogues built on the open-source Rasa framework;

3、 Chit-chat dialogue: dialogue generation based on a GPT model;

After completing the project, you will understand how the different types of dialogue systems have developed and the basic ideas and methods for building each of them.
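The retrieval-based part of such a system typically first recalls FAQ questions similar to the user's query and then reranks them, which is where Learning to Rank comes in. The sketch below shows only a simple recall step using bag-of-words cosine similarity; it is not Learning to Rank, and the function names and FAQ data are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_faq(query_tokens, faq, top_k=1):
    """faq: list of (question_tokens, answer) pairs. Returns the top_k (score, index) pairs."""
    q = Counter(query_tokens)
    scored = [(cosine(q, Counter(question)), i) for i, (question, _) in enumerate(faq)]
    scored.sort(key=lambda x: (-x[0], x[1]))  # best score first, ties by FAQ order
    return scored[:top_k]


faq = [(["how", "to", "reset", "password"], "Click 'Forgot password' on the login page."),
       (["how", "to", "change", "email"], "Go to account settings.")]
score, idx = retrieve_faq(["reset", "my", "password"], faq)[0]
print(idx, faq[idx][1])  # 0 Click 'Forgot password' on the login page.
```

In a full system, the top candidates from a recall step like this would be passed to a learned ranking model that can use richer features than word overlap.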

07  Knowledge graph

The knowledge graph is a semantic representation specification built on the Semantic Web and proposed by Google in 2012; it is essentially an ontology-based semantic network. As knowledge graphs have matured, they have been used more and more widely in NLP, and today e-commerce, search engines, chatbots, and other businesses all rely on them.

Recommended project: Build a knowledge graph    Difficulty: ****

You will be able to:

1) Independently develop the information extraction components used to construct a knowledge graph, including NER and relation extraction;

2) Independently develop a question answering system based on a knowledge base;

3) Independently develop a recommendation system based on a knowledge base.

After completing the project, you will understand how knowledge graphs have evolved, as well as their application scenarios and construction methods.
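At its simplest, a knowledge graph is a set of (subject, relation, object) triples, and a knowledge-base QA system answers a question by mapping it to lookups over those triples. The sketch below is a tiny in-memory triple store with two indexes; the class name, relation names, and facts are illustrative assumptions, and a production system would typically use a dedicated graph database.

```python
from collections import defaultdict


class TripleStore:
    """A tiny in-memory knowledge graph of (subject, relation, object) triples."""

    def __init__(self):
        self.by_subject_relation = defaultdict(set)  # (subject, relation) -> objects
        self.by_object_relation = defaultdict(set)   # (object, relation) -> subjects

    def add(self, subject, relation, obj):
        self.by_subject_relation[(subject, relation)].add(obj)
        self.by_object_relation[(obj, relation)].add(subject)

    def objects(self, subject, relation):
        """All objects o such that (subject, relation, o) is stored, sorted."""
        return sorted(self.by_subject_relation.get((subject, relation), ()))

    def subjects(self, relation, obj):
        """All subjects s such that (s, relation, obj) is stored, sorted."""
        return sorted(self.by_object_relation.get((obj, relation), ()))


kg = TripleStore()
kg.add("Beijing", "capital_of", "China")
kg.add("Yao Ming", "born_in", "Shanghai")
kg.add("Shanghai", "located_in", "China")

# "What is the capital of China?" -> find subjects of capital_of with object China
print(kg.subjects("capital_of", "China"))  # ['Beijing']
# "Which country is Yao Ming's birthplace in?" -> two hops: born_in, then located_in
city = kg.objects("Yao Ming", "born_in")[0]
print(kg.objects(city, "located_in"))  # ['China']
```

The hard parts of a real KBQA system are upstream of this lookup: recognizing the entity in the question and mapping the wording to the right relation, which is where the NER and relation extraction components come in.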


Copyright notice
This article was written by [Demeanor 78]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/161/202206101240310783.html