[NLP] NLP full path learning recommendation
2022-06-10 13:06:00 【Demeanor 78】
I chatted with some colleagues at BAT about the full NLP learning path and summarized the discussion below. It covers:
The foundations needed to study NLP
A project for each task along the full NLP learning path
01
Foundations needed to study NLP
01 Machine learning
Be familiar with simple machine learning models, for example: logistic regression, decision trees, naive Bayes, hidden Markov models, K-Means, regularization methods, etc. Some more advanced machine learning background is a plus, for example: ensemble learning (random forests, GBDT, XGBoost, stacking, etc.), conditional random fields (CRF), Bayesian networks, support vector machines, topic models, etc.
02 Deep learning
Be familiar with the basics of simple neural networks, for example: the neuron model, multilayer perceptrons, the backpropagation algorithm, the use of activation functions, Word2Vec, RNN, CNN, etc. Some more advanced deep learning background is a plus, for example: LSTM, BiLSTM-CRF, TextCNN, the attention mechanism, Transformer, BERT, etc.
03 Languages and frameworks
Be proficient in the Python programming language and understand basic PyTorch usage.
04 NLP fundamentals
Understand the structure of BERT-based models
Understand the concept of model distillation
Understand the concepts of summarization and generative summarization models
Understand GPT-2
Understand the principles of text classification and sequence labeling models
Understand graph neural networks and representation learning
02
Basic projects
01 Chinese word segmentation
Chinese word segmentation is a basic step in Chinese text processing. Unlike English, Chinese sentences have no explicit word boundaries, so Chinese natural language processing usually starts with word segmentation, and segmentation quality directly affects downstream modules such as part-of-speech tagging and syntactic parsing. Although ready-made segmenters are available, it is still worthwhile to understand how they work internally and to build an industrial-grade segmenter yourself.
Recommended project: Chinese word segmentation for a Sohu News-like scenario. Project difficulty: **
The project combines the common word segmentation requirements of industrial scenarios: good accuracy, high speed, quick fixes for bad cases, a system that improves the more it is used, fast migration to new scenarios, etc.
After completing the project, you will be familiar, from scratch, with how word segmentation is implemented in industry, along with its usage scenarios, methods, and techniques.
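As a rough illustration of what a dictionary-based segmenter does internally, here is a minimal sketch of forward maximum matching. This is one classic baseline and is not necessarily the method used in the recommended project; the dictionary `VOCAB` and the `max_len` parameter are assumptions made up for this example.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy dictionary-based segmentation (forward maximum matching)."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until the dictionary matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy dictionary, for illustration only.
VOCAB = {"自然", "语言", "自然语言", "处理", "中文", "分词"}

print(forward_max_match("中文自然语言处理", VOCAB))
```

With a segmenter of this kind, adding a missing word to the dictionary is one quick way to fix a bad case, which is part of why dictionary methods remain useful alongside statistical models.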
02 Keyword extraction
In natural language processing, the key to handling massive volumes of text is to extract what users care about most. For both long and short texts, a few keywords are often enough to glimpse the theme of the whole text. Text-based recommendation and text-based search also rely heavily on keywords, so the accuracy of keyword extraction directly affects the final performance of a recommendation or search system.
Recommended project: keyword extraction for a Sina portal-like scenario. Project difficulty: **
Implement a keyword extractor for a news corpus that, running in a single process, achieves an accuracy of at least 0.80 and a throughput of at least 100 QPS, and can also recognize new words.
After completing the project, you will be familiar, from scratch, with the development of keyword extraction methods and the overall approach to unsupervised keyword extraction.
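One common unsupervised baseline is TF-IDF ranking. The sketch below assumes the documents are already tokenized (for Chinese, by a word segmenter) and uses a smoothed IDF; the corpus `DOCS` and the function name `tfidf_keywords` are hypothetical and only serve the example.

```python
import math
from collections import Counter


def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the tokens of one document by TF-IDF against a small corpus."""
    n_docs = len(docs)
    # Document frequency: the number of documents each token appears in.
    df = Counter(token for doc in docs for token in set(doc))
    tf = Counter(docs[doc_index])
    total = len(docs[doc_index])
    scores = {
        token: (count / total) * math.log((1 + n_docs) / (1 + df[token]))
        for token, count in tf.items()
    }
    # Sort by score (descending), then alphabetically so ties are stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [token for token, _ in ranked[:top_k]]


# Hypothetical pre-tokenized news snippets, for illustration only.
DOCS = [
    ["stock", "market", "falls", "as", "market", "fears", "grow"],
    ["team", "wins", "the", "final", "as", "fans", "cheer"],
    ["new", "phone", "launch", "as", "the", "crowd", "watches"],
]

print(tfidf_keywords(DOCS, 0, top_k=2))
```

A token such as "as" that occurs in every document gets an IDF of zero and drops to the bottom of the ranking, while a token repeated within one document only rises to the top.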
03
Advanced projects
03 Entity recognition
Entity recognition is a very common task in NLP and almost a required skill. It extracts important entities from text, such as person names, place names, organization names, times, and proper nouns, and it can be extended to any entity you care about, such as license plate numbers or nationalities.
Recommended project: entity recognition for a Sina Weibo-like scenario. Project difficulty: ****
Implement an entity recognition system based on multiple models that meets the following requirements: with traditional machine learning, at least 80% accuracy and at least 10 QPS; with deep learning, at least 90% accuracy, no less than 1 QPS without a GPU, and no less than 10 QPS with a GPU.
After completing the project, you will be familiar, from scratch, with the historical development of entity recognition methods and will have implemented the common models.
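Entity recognition models such as CRF or BiLSTM-CRF typically emit one tag per token, and a small decoding step turns those tags into entities. The sketch below assumes the widely used BIO tagging scheme; the function name `decode_bio` and the sample sentence are made up for illustration.

```python
def decode_bio(tokens, tags):
    """Collect (entity_text, entity_type) pairs from BIO-tagged tokens."""
    entities = []
    span, span_type = [], None
    for token, tag in zip(tokens, tags):
        # An I- tag extends the open span only if the entity type matches.
        if tag.startswith("I-") and tag[2:] == span_type:
            span.append(token)
            continue
        # Any other tag closes the open span, if there is one.
        if span:
            entities.append((" ".join(span), span_type))
        if tag.startswith("B-"):
            span, span_type = [token], tag[2:]
        else:
            # "O" tags and stray I- tags without a matching B- are dropped.
            span, span_type = [], None
    if span:
        entities.append((" ".join(span), span_type))
    return entities


tokens = ["Li", "Lei", "joined", "Sina", "Weibo", "in", "Beijing"]
tags = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "O", "B-LOC"]
print(decode_bio(tokens, tags))
```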
04 Text classification
Text classification is the most basic and most common task in NLP, and a wide variety of text classification tasks come up in practical work. With the development of BERT-based models, the baseline for text classification models in practice has improved greatly. However, two kinds of problems often arise during actual model development: (1) the text data is not standardized, containing a lot of noise, class imbalance, etc.; (2) the text data lacks annotation. Both greatly increase the difficulty of developing a stable text classification model.
Recommended project: BERT classifier training, optimization, and distillation for a headline classification scenario. Project difficulty: ***
Implement:
1. A BERT-based classifier with a prediction F1 of no less than 0.9 on a medium-difficulty task with noisy data;
2. A distilled BERT-based classifier whose performance drops by no more than 0.05 compared with the model above and whose throughput is no lower than 10 QPS.
After completing the project, you will understand the basic ideas and methods of developing text classification models.
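To make the idea of distillation concrete, the following sketch computes one commonly used distillation objective for a single example: a temperature-softened KL term between teacher and student plus a hard-label cross-entropy term. It is written in plain Python for readability; the `temperature` and `alpha` values are illustrative assumptions, and a real project would compute this over batches in a framework such as PyTorch.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with optional temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Soft-target term: how far the student is from the softened teacher.
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    # Hard-label term: ordinary cross-entropy at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard


teacher = [2.0, 0.0, -1.0]
print(distillation_loss([2.0, 0.0, -1.0], teacher, label=0))
print(distillation_loss([0.0, 2.0, -1.0], teacher, label=0))
```

A student that reproduces the teacher's logits incurs only the hard-label term, so the second call, where the student disagrees with the teacher, yields the larger loss.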
05 Text summarization
Text summarization is an advanced NLP task. It compresses long text to improve users' reading efficiency and is a deep-seated business need for enterprises, commonly seen in news publishing and analysis at internet or financial companies. Meeting such needs often requires senior algorithm engineers.
Recommended project: implement a GPT-based generative summarization model. Project difficulty: ****
Implement a GPT-based generative summarization model that meets the following conditions:
1. High output quality: the effectiveness of the top-10 summaries is no less than 90%;
2. Prediction speed on a single GPU is no lower than 1 QPS.
After completing the project, you will understand the technical history of summarization models, know the classic models, and understand the basic ideas and methods of model development and optimization.
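The QPS targets in this and the earlier projects can be checked with a small timing harness. The sketch below measures sequential, single-process throughput; `toy_summarize` is a hypothetical stand-in for a real model's predict function, and the `warmup` parameter is an assumption of this example.

```python
import time


def measure_qps(predict, inputs, warmup=1):
    """Return requests per second for calling `predict` once per input."""
    for item in inputs[:warmup]:
        predict(item)  # warm-up calls are excluded from the timing
    start = time.perf_counter()
    for item in inputs:
        predict(item)
    elapsed = time.perf_counter() - start
    return len(inputs) / elapsed if elapsed > 0 else float("inf")


def toy_summarize(text):
    # Hypothetical stand-in for a real model: keep only the first sentence.
    return text.split(".")[0].strip() + "."


requests = ["First point. Second point. Third point."] * 1000
print(f"{measure_qps(toy_summarize, requests):.0f} QPS")
```

For a real model, the measured number depends heavily on hardware, batch size, and input length, so those conditions should be fixed when comparing against a target.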
06 Dialogue system
Intelligent dialogue is an advanced NLP task. It usually includes chit-chat (generative), QA (retrieval-based), and task-oriented dialogue. In practice, the most common intelligent dialogue system is the QA dialogue system, followed by the task-oriented dialogue system; the chit-chat system usually serves as an adjunct to the first two, making the overall system more approachable.
Recommended project: build an industrial dialogue system (retrieval-based / task-oriented / chit-chat). Project difficulty: *****
Build an industrial dialogue system:
1. Retrieval-based dialogue: implement FAQ answering with a Learning to Rank system;
2. Task-oriented dialogue: complete simple multi-task dialogues based on the open-source Rasa framework;
3. Chit-chat dialogue: dialogue generation based on a GPT model.
After completing the project, you will understand how different types of dialogue systems have developed and the basic ideas and methods for building each type.
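The retrieval-based part can be pictured as "find the stored question closest to the user's query". The sketch below uses plain bag-of-words cosine similarity, which is much simpler than the Learning to Rank system the project asks for; the `FAQ` entries, the `threshold` value, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def answer(query, faq, threshold=0.3):
    """Return the answer of the most similar FAQ question, or None."""
    q_vec = Counter(query.lower().split())
    scored = [(cosine(q_vec, Counter(q.lower().split())), q) for q in faq]
    best_score, best_question = max(scored)
    # Below the threshold we decline to answer instead of guessing.
    return faq[best_question] if best_score >= threshold else None


# Hypothetical FAQ entries, for illustration only.
FAQ = {
    "how do i reset my password": "Use the 'Forgot password' link on the login page.",
    "how do i change my email address": "Open Settings and edit your profile.",
}

print(answer("reset password please", FAQ))
print(answer("what is the weather", FAQ))
```

In a fuller system, a scorer like this would typically only produce candidates, and a learned ranking model would reorder them using richer features.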
07 Knowledge graph
The knowledge graph is a semantic representation built on the Semantic Web that Google proposed in 2012; it is an ontology-based semantic network. As knowledge graphs have developed, they have been used more and more widely in NLP. Today, e-commerce, search engines, dialogue bots, and other businesses all rely on knowledge graphs.
Recommended project: build a knowledge graph. Project difficulty: ****
You should be able to:
1) Independently develop the information extraction components for knowledge graph construction, including NER, relation extraction, etc.
2) Independently develop a question answering system based on a knowledge base
3) Independently develop a recommendation system based on a knowledge base
After completing the project, you will understand the development of knowledge graphs, along with their application scenarios and construction methods.
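At its simplest, a knowledge base is a set of (subject, relation, object) triples, and question answering maps a question onto a lookup. The sketch below is a toy version of that idea; the `TripleStore` class, the relation names, and the question templates are all assumptions made for this example, and a real system would use entity linking and a graph database instead of string prefixes.

```python
from collections import defaultdict


class TripleStore:
    """A tiny in-memory store of (subject, relation, object) triples."""

    def __init__(self):
        self._objects = defaultdict(set)

    def add(self, subject, relation, obj):
        self._objects[(subject, relation)].add(obj)

    def query(self, subject, relation):
        return sorted(self._objects.get((subject, relation), set()))


# Hypothetical question templates mapping a prefix to a relation.
TEMPLATES = {"who founded": "founded_by", "where is": "located_in"}


def answer_question(store, question):
    """Answer a templated question by looking up matching triples."""
    q = question.lower().rstrip("?").strip()
    for prefix, relation in TEMPLATES.items():
        if q.startswith(prefix + " "):
            subject = q[len(prefix) + 1:]
            return store.query(subject, relation)
    return []


store = TripleStore()
store.add("google", "founded_by", "larry page")
store.add("google", "founded_by", "sergey brin")
store.add("google", "located_in", "mountain view")

print(answer_question(store, "Who founded Google?"))
```

The triples themselves would normally come from the information extraction components listed above (NER to find the entities, relation extraction to link them).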