NLP data augmentation techniques
2022-06-26 01:37:00 【Green Lantern swordsman】
Yesterday I ran into an old friend who asked me what I knew about NLP data augmentation. It caught me off guard. Augmentation techniques originated in image processing, and I only later read a detailed explanation of them in the book Baimian Machine Learning (《百面机器学习》). So what is data augmentation used for in NLP? In fact, I had already used it.
In a voice assistant project, I first augmented the hand-defined data that served as the input corpus. For fastText, our main model, we also used data augmentation, and I wrote several exploratory summaries about it.
After getting back, I read some material on NLP data augmentation and decided to look into it.
This article mainly draws on a well-known post on Zhihu [1].
1. Word substitution
1.1 Dictionary-based substitution
(1) Verbs can be collected into a small set, and interjections and personal pronouns each have their own dictionary, so words in these classes can be substituted freely.
(2) Replacing resource names is harder. Take the children's education skill as an example: some resources belong exclusively to children's education and others do not, so substitution needs care, and counter-example sentences should be added as well. For this case we also rely on word embeddings.
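Point (1) can be sketched as a lookup in a mapping from each word to its interchangeable alternatives, where every word that has alternatives is swapped with probability p. The SUBSTITUTES entries and the helper name dict_substitute are hypothetical illustrations, not the author's actual dictionaries.

```python
import random

# Hypothetical substitution dictionaries (verbs, interjections, pronouns);
# a real project would load its own hand-built lists.
SUBSTITUTES = {
    "play": ["put on", "start"],
    "please": ["kindly"],
    "me": ["us"],
}


def dict_substitute(tokens, substitutes, p=0.5, rng=None):
    """Replace each token that has dictionary entries with probability p."""
    rng = rng or random.Random()
    out = []
    for tok in tokens:
        candidates = substitutes.get(tok)
        if candidates and rng.random() < p:
            out.append(rng.choice(candidates))
        else:
            out.append(tok)
    return out


print(dict_substitute("please play a song for me".split(), SUBSTITUTES, p=1.0))
```

Point (2) is not covered by this sketch, since deciding whether a resource name may be swapped depends on skill-specific knowledge.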
1.2 Word-vector-based substitution
I have never used this. The pretrained embedding models are too large, so it is probably not easy to put into practice.
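Once vectors are available, the mechanism itself is simple: find the nearest neighbours of a word by cosine similarity and swap one of them in. The three-dimensional EMBEDDINGS table below is a made-up stand-in for a pretrained model such as word2vec, GloVe or fastText vectors, and nearest_words / vector_substitute are hypothetical helper names.

```python
import math
import random

# Made-up 3-d vectors standing in for a pretrained embedding model.
EMBEDDINGS = {
    "song": [0.90, 0.10, 0.00],
    "music": [0.85, 0.15, 0.05],
    "tune": [0.80, 0.20, 0.10],
    "movie": [0.10, 0.90, 0.20],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


def nearest_words(word, embeddings, k=2):
    """The k other words whose vectors are most cosine-similar to `word`."""
    target = embeddings[word]
    scored = [(w, cosine(target, vec)) for w, vec in embeddings.items() if w != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [w for w, _ in scored[:k]]


def vector_substitute(tokens, embeddings, k=2, rng=None):
    """Replace each in-vocabulary token with one of its k nearest neighbours."""
    rng = rng or random.Random()
    return [rng.choice(nearest_words(t, embeddings, k)) if t in embeddings else t
            for t in tokens]


print(nearest_words("song", EMBEDDINGS))  # → ['music', 'tune']
```

A brute-force scan like this is fine for a toy table; with a real vocabulary an approximate nearest-neighbour index would be needed.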
1.3 Masked language model (MLM) augmentation
This is difficult to operate, and the output is hard to control.
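The procedure itself is short: mask one position, ask the masked language model for ranked candidates, and keep the best one that differs from the original word. In this sketch, fill_mask is an assumed callable that returns candidate tokens best-first, and toy_fill_mask is a hypothetical stub standing in for a real model so the code runs offline. Nothing guarantees that a candidate preserves the sentence's label, which is the control problem noted above.

```python
import random
from typing import Callable, List

MASK = "[MASK]"


def mlm_augment(tokens: List[str],
                fill_mask: Callable[[str], List[str]],
                rng: random.Random) -> List[str]:
    """Mask one random position and fill it with the model's best *different* guess."""
    pos = rng.randrange(len(tokens))
    masked = tokens[:pos] + [MASK] + tokens[pos + 1:]
    for candidate in fill_mask(" ".join(masked)):  # assumed ranked best-first
        if candidate != tokens[pos]:
            return tokens[:pos] + [candidate] + tokens[pos + 1:]
    return list(tokens)  # every candidate equalled the original word


def toy_fill_mask(text: str) -> List[str]:
    """Hypothetical stub: a real MLM would condition on `text`."""
    return ["great", "good", "fine"]


print(mlm_augment("the food was good".split(), toy_fill_mask, random.Random(0)))
```

With Hugging Face transformers, one untested option is unmasker = pipeline("fill-mask", model="bert-base-uncased") together with fill_mask = lambda text: [r["token_str"] for r in unmasker(text)]; that model's mask token is [MASK], matching MASK here.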
1.4 TF-IDF-based word replacement
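A minimal sketch of one simple variant: score each word of a sentence by TF-IDF against a small corpus, treat the lowest-scoring fraction (ratio) of distinct words as unimportant, and replace those words with words drawn uniformly from the corpus vocabulary. The smoothed IDF formula, the uniform sampling and the helper names tfidf_scores / tfidf_replace are assumptions for illustration; other variants choose the replacement words differently.

```python
import math
import random
from collections import Counter


def tfidf_scores(doc, corpus):
    """TF-IDF of each word in `doc` (a token list) against `corpus` (a list of token lists)."""
    tf = Counter(doc)
    n_docs = len(corpus)
    scores = {}
    for word, count in tf.items():
        df = sum(1 for d in corpus if word in d)  # documents containing the word
        idf = math.log(n_docs / (1 + df)) + 1     # smoothed IDF (one common choice)
        scores[word] = count / len(doc) * idf
    return scores


def tfidf_replace(doc, corpus, ratio=0.3, rng=None):
    """Replace the lowest-TF-IDF fraction of distinct words with random vocabulary words."""
    rng = rng or random.Random()
    scores = tfidf_scores(doc, corpus)
    vocab = sorted({w for d in corpus for w in d})
    n_replace = max(1, int(len(scores) * ratio))
    unimportant = set(sorted(scores, key=scores.get)[:n_replace])
    return [rng.choice(vocab) if w in unimportant else w for w in doc]


corpus = [
    "the movie was great".split(),
    "the food was bad".split(),
    "the service was slow".split(),
]
# "the" and "was" appear in every document, so they are the ones replaced.
print(tfidf_replace(corpus[0], corpus, ratio=0.5, rng=random.Random(0)))
```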
TF is high for words that occur frequently in a document, and IDF is high for words that distinguish strongly between documents. The idea behind this technique is to replace unimportant words, that is, words with low TF-IDF scores.
2. Back translation
This method is very useful for augmenting text-similarity datasets.
3. Text form transformation
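Expanding and contracting words amounts to a two-way lookup table applied at word boundaries. The CONTRACTIONS table below is a hypothetical English example, and expand / contract are illustrative helper names.

```python
import re

# Hypothetical two-way table; a real one would be far larger.
CONTRACTIONS = {
    "I'm": "I am",
    "don't": "do not",
    "it's": "it is",
}
EXPANSIONS = {full: short for short, full in CONTRACTIONS.items()}


def _swap(text, table):
    """Replace every whole-word occurrence of a table key with its value."""
    keys = sorted(table, key=len, reverse=True)  # prefer longer matches
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b")
    return pattern.sub(lambda m: table[m.group(0)], text)


def expand(text):
    return _swap(text, CONTRACTIONS)


def contract(text):
    return _swap(text, EXPANSIONS)


print(expand("I'm sure it's fine"))  # → I am sure it is fine
print(contract("I do not know"))     # → I don't know
```

Contracting is not always grammatical ("Yes, I am." cannot become "Yes, I'm."), so generated contractions may need a filter.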
Text form transformation specifically targets expanding or abbreviating words.
4. Random noise injection
This rests on one assumption: if a sample is perturbed slightly, the model's prediction should stay the same.
4.1 Introducing spelling errors
We have done this: we saved the erroneous results produced by speech recognition and used them as aliases of the formal resource names.
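The alias approach can be sketched as a table from each canonical resource name to the speech-recognition mistakes collected for it, used in two directions: to expand training sentences with the noisy forms, and to map a noisy form back to its canonical name at lookup time. The ASR_ALIASES entries and both helper names are hypothetical.

```python
# Hypothetical alias table: canonical resource name -> ASR mis-recognitions
# harvested from logs.
ASR_ALIASES = {
    "peppa pig": ["pepper pig", "peppa big"],
}
# Reverse index used at lookup time.
ALIAS_TO_CANONICAL = {
    alias: canonical
    for canonical, aliases in ASR_ALIASES.items()
    for alias in aliases
}


def expand_with_aliases(sentence, aliases=ASR_ALIASES):
    """Return the sentence plus one noisy variant per recorded mis-recognition."""
    variants = [sentence]
    for canonical, wrong_forms in aliases.items():
        if canonical in sentence:
            variants.extend(sentence.replace(canonical, wrong) for wrong in wrong_forms)
    return variants


def resolve(name):
    """Map a (possibly mis-recognised) name back to the canonical resource name."""
    return ALIAS_TO_CANONICAL.get(name, name)


print(expand_with_aliases("play peppa pig"))
# → ['play peppa pig', 'play pepper pig', 'play peppa big']
print(resolve("pepper pig"))  # → peppa pig
```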
4.2 Unigram noise and blank noise
The former adds useless high-frequency words; the latter adds a fixed symbol.
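A minimal sketch, assuming the common formulation in which each token is replaced (rather than inserted) with probability p. Unigram noise draws the replacement in proportion to corpus frequency, so frequent, low-information words dominate, while blank noise always uses one fixed placeholder. The BLANK symbol "_" and the helper names are assumptions.

```python
import random
from collections import Counter

BLANK = "_"  # assumed placeholder symbol for blank noise


def unigram_noise(tokens, corpus_tokens, p=0.1, rng=None):
    """Replace each token, with probability p, by a word sampled by corpus frequency."""
    rng = rng or random.Random()
    words, weights = zip(*Counter(corpus_tokens).items())
    return [rng.choices(words, weights=weights)[0] if rng.random() < p else t
            for t in tokens]


def blank_noise(tokens, p=0.1, rng=None):
    """Replace each token, with probability p, by the fixed BLANK symbol."""
    rng = rng or random.Random()
    return [BLANK if rng.random() < p else t for t in tokens]


corpus_tokens = "the cat sat on the mat and the dog sat too".split()
rng = random.Random(0)
print(unigram_noise("a cat on a mat".split(), corpus_tokens, p=0.3, rng=rng))
print(blank_noise("a cat on a mat".split(), p=0.3, rng=rng))
```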
4.3 Other simple methods
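The four operations listed in this subsection are each only a few lines of Python. This is an illustration, not the author's code: the helper names are hypothetical, random_insert takes the items to insert from a caller-supplied candidates list (words or whole sentences), and random_delete always keeps at least one item so the sample never becomes empty.

```python
import random


def shuffle_sentences(sentences, rng):
    """(1) Shuffle the order of the sentences in a multi-sentence sample."""
    out = list(sentences)
    rng.shuffle(out)
    return out


def random_swap(tokens, rng):
    """(2) Exchange the positions of two randomly chosen words."""
    out = list(tokens)
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_insert(tokens, candidates, rng):
    """(3) Insert one item from `candidates` at a random position."""
    out = list(tokens)
    out.insert(rng.randint(0, len(out)), rng.choice(candidates))
    return out


def random_delete(tokens, p, rng):
    """(4) Drop each word with probability p, but never return an empty sample."""
    kept = [t for t in tokens if rng.random() >= p]
    return kept or [rng.choice(tokens)]


rng = random.Random(42)
words = "turn on the living room light".split()
print(random_swap(words, rng))
print(random_insert(words, ["please"], rng))
print(random_delete(words, 0.2, rng))
```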
(1) Shuffle the order of sentences
(2) Randomly exchange the order of two words
(3) Insert words or sentences randomly
(4) Random deletion
5. Instance crossover augmentation