Li Mu [Practical Machine Learning] 1.4 Data Annotation
2022-06-12 20:54:00 【A summer of swans】
Preface
Data annotation: mind map
I. Semi-supervised learning
Only a small part of the data is labeled; most of it has no labels.
For example: on a website, a small number of visitors have clear labels, but most users give no feedback and leave no comments, so nothing is known about what they do. The question is how to use the small labeled set and the large unlabeled set together.
Assumptions:
1. Continuity assumption: if two samples have similar features, they are likely to have the same label.
2. Cluster assumption: groups of users behave similarly. If the data has a good cluster structure, assume that samples in the same cluster share a label.
3. Manifold assumption: the data is intrinsically low-dimensional, so dimensionality reduction can yield cleaner data.
Important algorithm: self-training
Train a model on the labeled data, predict labels for the unlabeled data, add the confident predictions to the labeled set, and repeat.
1. The key question is how to choose the samples the model is confident about.
2. A more expensive model (for example a deep neural network) can be used, because it only labels data and will never be deployed online. This makes the labels more accurate.
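The self-training loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nearest-centroid classifier, the softmax-over-distance confidence, the 0.9 threshold, and all function names are hypothetical stand-ins chosen to keep the example dependency-free. In practice the model would be something much stronger, such as a deep network.

```python
import math

def fit_centroids(X, y):
    """Fit a nearest-centroid classifier: return {label: mean feature vector}."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        s = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in s] for label, s in sums.items()}

def predict_proba(centroids, x):
    """Softmax over negative distances (shifted by the nearest distance for stability)."""
    dists = {label: math.dist(x, c) for label, c in centroids.items()}
    nearest = min(dists.values())
    scores = {label: math.exp(nearest - d) for label, d in dists.items()}
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}

def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=10):
    """Repeatedly pseudo-label confident samples and add them to the training set."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        centroids = fit_centroids(X_lab, y_lab)
        remaining, added = [], 0
        for x in pool:
            proba = predict_proba(centroids, x)
            label, conf = max(proba.items(), key=lambda kv: kv[1])
            if conf >= threshold:      # confident: keep the pseudo-label
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:                      # not confident: leave unlabeled for now
                remaining.append(x)
        pool = remaining
        if added == 0 or not pool:     # nothing new learned, or nothing left
            break
    return X_lab, y_lab, pool

# Two labeled points, five unlabeled ones; (5, 5) sits exactly between the classes.
X_all, y_all, still_unlabeled = self_train(
    [(0, 0), (10, 10)], [0, 1],
    [(1, 0), (0, 1), (9, 10), (10, 9), (5, 5)],
)
```

The threshold is the knob behind point 1: set it too low and wrong pseudo-labels pollute the training set, set it too high and almost nothing gets added.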
II. Crowdsourced labeling
Recruit a large number of people over the Internet to label data by hand.
The ImageNet dataset was built this way, with millions of labeled images.
Many data companies also offer data labeling as a service.
Things to consider:
1. Tasks need to be relatively simple, because labelers have different educational backgrounds.
2. Cost: estimate how many tasks the data will generate and how long each task takes, multiply the two, and work out how much it will cost.
3. Label quality.
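The cost estimate in point 2 is simple arithmetic. A minimal sketch follows; the function name, the hourly rate, and the `labelers_per_task` redundancy factor are assumptions added for illustration, and the numbers are made up.

```python
def labeling_cost(num_tasks, seconds_per_task, hourly_rate, labelers_per_task=1):
    """Total cost = total labeling hours x hourly rate."""
    hours = num_tasks * labelers_per_task * seconds_per_task / 3600
    return hours * hourly_rate

# Hypothetical numbers: 36,000 tasks, 10 seconds each, 15 currency units per hour.
single_pass = labeling_cost(36_000, 10, 15)                        # 100 hours of work
triple_pass = labeling_cost(36_000, 10, 15, labelers_per_task=3)   # each task labeled 3 times
```

Sending every task to three labelers triples the bill, which is why redundancy for quality control has to be budgeted up front.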
Solutions:
1. In task design, reduce the complexity of the task.
2. Some simple images do not need to be labeled by people.
Active learning
People intervene in the labeling loop.
The unlabeled samples that matter most are selected and sent for labeling.
Algorithms:
1. Uncertainty sampling: train a model on the labeled data, then pick the samples the model is most unsure about and have people label them.
2. Query by committee: train multiple models and let them vote on which samples are hardest, then select those samples for labeling.
Self-training and active learning can be combined.
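Both selection strategies reduce to ranking unlabeled samples by a score. The sketch below assumes the model outputs (class probabilities, or one predicted label per committee member) are already available. Least confidence and vote entropy are common choices for the score but not the only ones, and the function names are hypothetical.

```python
import math
from collections import Counter

def least_confidence(proba):
    """Uncertainty of one sample: 1 minus the probability of the top class."""
    return 1.0 - max(proba)

def select_uncertain(probas, k):
    """Indices of the k samples a single model is least sure about."""
    ranked = sorted(range(len(probas)),
                    key=lambda i: least_confidence(probas[i]), reverse=True)
    return ranked[:k]

def vote_entropy(votes):
    """Disagreement among committee members on one sample (0 means unanimous)."""
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())

def select_by_committee(committee_votes, k):
    """Indices of the k samples the committee disagrees on most."""
    ranked = sorted(range(len(committee_votes)),
                    key=lambda i: vote_entropy(committee_votes[i]), reverse=True)
    return ranked[:k]

# Made-up predictions for three unlabeled samples.
probas = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]]
votes = [["cat", "cat", "cat"], ["cat", "dog", "cat"], ["cat", "dog", "bird"]]
to_label_uncertain = select_uncertain(probas, 2)
to_label_committee = select_by_committee(votes, 2)
```

The selected indices go to human labelers, while the samples the model is already confident about are natural candidates for self-training, which is one way the two methods combine.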
3. Quality control
People make mistakes.
1. Send each image or task to multiple labelers. The drawback is that the amount of work, and therefore the cost, is multiplied.
2. To limit the cost, send a task to more labelers only when the result is uncertain.
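A minimal sketch of this quality-control idea: aggregate the labels for one task by majority vote and flag the task for extra labelers when agreement is low. The 0.7 agreement threshold and the function name are assumptions for illustration.

```python
from collections import Counter

def aggregate(labels, min_agreement=0.7):
    """Majority vote over one task's labels.

    Returns (winning label, share of votes it received, whether to ask more labelers).
    Ties are broken in favor of the label seen first.
    """
    label, top = Counter(labels).most_common(1)[0]
    agreement = top / len(labels)
    return label, agreement, agreement < min_agreement

unanimous = aggregate(["cat", "cat", "cat"])
disputed = aggregate(["cat", "dog", "cat"])
```

Starting with a small number of labelers and escalating only the disputed tasks keeps the redundancy cost concentrated where people actually disagree.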
Weakly supervised learning
Labels are generated semi-automatically. They are less accurate than manual labels, but good enough to train a model.
Data programming: use heuristics to label data.
For example, summarize some labeling rules, encode them in a program, and let the program assign labels according to these rules.
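As an illustration of data programming, the sketch below encodes three hand-written rules for a hypothetical spam-labeling task and combines them by majority vote. The rules, the label constants, and the function names are all invented for this example. Real systems typically weight the rules by estimated accuracy rather than counting votes equally.

```python
from collections import Counter

ABSTAIN, HAM, SPAM = -1, 0, 1

def lf_contains_link(text):
    """Rule: messages with a URL are probably spam."""
    return SPAM if "http://" in text or "https://" in text else ABSTAIN

def lf_mentions_prize(text):
    """Rule: 'free prize' is a typical spam phrase."""
    return SPAM if "free prize" in text.lower() else ABSTAIN

def lf_very_short(text):
    """Rule: very short messages are usually ordinary chat."""
    return HAM if len(text.split()) <= 3 else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_very_short]

def weak_label(text, lfs=LABELING_FUNCTIONS):
    """Majority vote over the rules that fire; ABSTAIN if none does."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]

labels = [weak_label(t) for t in [
    "Claim your FREE PRIZE at http://example.com",
    "see you tomorrow",
    "The quarterly report is attached for your review",
]]
```

Samples on which every rule abstains simply stay unlabeled. The noisy labels produced for the rest are what the model is then trained on.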
Summary
Ways to get more labels:
1. Self-training (for simple data)
2. Crowdsourcing: have people label the data (for difficult data)
3. Weakly supervised learning (find the general rules people use to decide labels, and let the machine apply them)