Li Mu [practical machine learning] 1.4 data annotation
2022-06-12 20:54:00 【A summer of swans】
Preface
Data annotation: mind map
One. Semi-supervised learning
Only a small part of the data is labeled; most of it has no labels.
For example: on a website, a small number of visitors have clear labels, but for most users we do not know what they did, because they left no feedback and no comments. The question is how to use a small amount of labeled data together with a large amount of unlabeled data.
Assumptions:
1. Continuity (smoothness) assumption: samples with similar features are likely to have the same label.
2. Cluster assumption: users in the same group behave similarly. If the data has a good cluster structure, samples in the same cluster are assumed to share a label.
3. Manifold assumption: the data is intrinsically low-dimensional, so dimensionality reduction can yield cleaner data.
Important algorithm: self-training
1. The key question is how to choose the confident samples: only predictions the model is confident about are added to the labeled data as pseudo-labels.
2. A more expensive model (for example a deep neural network) can be used here, because it only labels data and is never deployed online. That makes the labels more accurate.
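The self-training loop can be sketched roughly as follows. This is a minimal illustration, not the course's implementation: a nearest-centroid classifier stands in for the expensive model, and the names `fit_centroids`, `predict_proba`, `self_train` and the 0.9 confidence threshold are all assumptions made for this example.

```python
import math

def fit_centroids(X, y):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict_proba(centroids, x):
    """Softmax over negative distances, used as a rough confidence score."""
    scores = {c: -math.dist(x, mu) for c, mu in centroids.items()}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=10):
    """Repeatedly pseudo-label confident samples and retrain on the enlarged set."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        model = fit_centroids(X_lab, y_lab)
        confident, rest = [], []
        for x in pool:
            proba = predict_proba(model, x)
            label = max(proba, key=proba.get)
            target = confident if proba[label] >= threshold else rest
            target.append((x, label))
        if not confident:  # nothing passed the threshold, so stop early
            break
        for x, label in confident:
            X_lab.append(x)
            y_lab.append(label)
        pool = [x for x, _ in rest]
    return fit_centroids(X_lab, y_lab), X_lab, y_lab, pool
```

The threshold is the knob behind "how to choose the confident samples": set it too low and wrong pseudo-labels pollute the training set, set it too high and almost nothing is added. Samples that never pass the threshold stay in the returned pool and can be handed to human labelers.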
Two. Crowdsourced labeling
Recruit a large number of people over the Internet to label the data by hand.
The ImageNet dataset was labeled this way, with millions of images.
Many data companies also offer data labeling as a service.
Things to consider
1. Tasks need to be relatively simple by design, because labelers have different educational backgrounds.
2. Cost: estimate how many tasks the data will generate and how long each task takes, then multiply the two to figure out how much it will cost.
3. Labeling quality
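The cost estimate in item 2 is simple arithmetic, sketched below. The function name and all parameters are made up for illustration; in particular the hourly rate and the number of labels per item are assumptions, since the notes only mention the number of tasks and the time per task.

```python
def estimate_labeling_cost(num_items, labels_per_item, seconds_per_task, hourly_rate):
    """Return (number of tasks, total labeler hours, total cost)."""
    num_tasks = num_items * labels_per_item          # each item may be labeled several times
    total_hours = num_tasks * seconds_per_task / 3600
    total_cost = total_hours * hourly_rate
    return num_tasks, total_hours, total_cost

# Hypothetical numbers: 100,000 images, 3 labels each, 10 seconds per task, 12 per hour.
tasks, hours, cost = estimate_labeling_cost(100_000, 3, 10, 12.0)
```

With these made-up numbers there are 300,000 tasks, roughly 833 labeler hours, and a cost of about 10,000 in whatever currency the hourly rate uses.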
Solutions
1. In task design, reduce the complexity of the task.
2. Some easy images do not need to be labeled by people at all.
Active learning
A human stays in the loop.
The most informative unlabeled data is picked out and sent to people for labeling.
Algorithms:
1. Train a model on the labeled data, then choose the samples the model is most unsure about and have people label them (commonly called uncertainty sampling).
2. Train multiple models and let them vote on which samples are hard, then select those samples for labeling (commonly called query by committee).
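Both selection rules can be sketched in a few lines. This is only an illustration under assumptions: entropy is used as the uncertainty measure (other measures exist), the models are assumed to have already produced the probabilities or votes passed in, and the function names are invented for this example.

```python
import math
from collections import Counter

def entropy(probs):
    """Shannon entropy of a probability distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def uncertainty_sampling(probas, k):
    """probas[i] is one model's class probabilities for unlabeled item i.
    Return the indices of the k items with the highest predictive entropy."""
    ranked = sorted(range(len(probas)), key=lambda i: entropy(probas[i]), reverse=True)
    return ranked[:k]

def query_by_committee(votes, k):
    """votes[i] is the list of labels predicted for item i by each committee model.
    Return the indices of the k items the committee disagrees on most."""
    def vote_entropy(item_votes):
        counts = Counter(item_votes)
        return entropy([c / len(item_votes) for c in counts.values()])
    ranked = sorted(range(len(votes)), key=lambda i: vote_entropy(votes[i]), reverse=True)
    return ranked[:k]

# Item 1 (probabilities 0.5 / 0.5) is the most uncertain, item 0 the least.
print(uncertainty_sampling([[0.99, 0.01], [0.5, 0.5], [0.8, 0.2]], k=2))  # [1, 2]
```

The selected indices are the items to send to human labelers; after they are labeled, the model is retrained and the selection repeats.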
Combining self-training and active learning

3. Quality control
Labelers make mistakes.
1. Send each image or task to multiple labelers. This works, but it multiplies the number of tasks.
2. Send an item to more labelers only when the result so far is uncertain.
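A rough sketch of these two ideas: aggregate several labels by majority vote, and ask for extra labels only while agreement is low. The 0.7 agreement threshold, the limits of 3 and 7 votes, and the `ask_labeler` callback are hypothetical choices for this example, not something the course specifies.

```python
from collections import Counter

def aggregate_votes(votes, min_agreement=0.7):
    """Return (majority label, True if agreement is too low to trust it)."""
    label, top_count = Counter(votes).most_common(1)[0]
    agreement = top_count / len(votes)
    return label, agreement < min_agreement

def label_with_escalation(item, ask_labeler, start=3, max_votes=7, min_agreement=0.7):
    """Collect `start` labels, then keep asking for one more label while unsure.
    `ask_labeler(item)` is a hypothetical callback that returns one human label."""
    votes = [ask_labeler(item) for _ in range(start)]
    label, unsure = aggregate_votes(votes, min_agreement)
    while unsure and len(votes) < max_votes:
        votes.append(ask_labeler(item))
        label, unsure = aggregate_votes(votes, min_agreement)
    return label, len(votes)
```

Easy items stop after the first three labels, so the extra cost is only paid for the controversial ones.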
Weakly supervised learning
Labels are generated semi-automatically. They are somewhat worse than true labels, but good enough to train a model.
Data programming: use heuristics to label the data.
For example, summarize some labeling rules, write them into a program, and let the program label the data according to these rules.
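As a minimal sketch of data programming: each rule becomes a small labeling function that either votes for a label or abstains, and the votes are combined. The spam/ham task, the three rules, and the plain majority vote are all assumptions made for this example; real data programming systems usually combine the rules in a more careful way than a simple vote.

```python
from collections import Counter

ABSTAIN = None  # a rule returns this when it has no opinion

def lf_contains_free(text):
    """Hypothetical rule: messages mentioning 'free' look like spam."""
    return "spam" if "free" in text.lower() else ABSTAIN

def lf_many_exclamations(text):
    """Hypothetical rule: three or more exclamation marks look like spam."""
    return "spam" if text.count("!") >= 3 else ABSTAIN

def lf_short_greeting(text):
    """Hypothetical rule: a short message starting with a greeting looks normal."""
    is_greeting = text.lower().startswith(("hi", "hello"))
    return "ham" if is_greeting and len(text) < 40 else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_free, lf_many_exclamations, lf_short_greeting]

def weak_label(text, lfs=LABELING_FUNCTIONS):
    """Majority vote over the rules that did not abstain; ABSTAIN if none fired."""
    votes = [v for v in (lf(text) for lf in lfs) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]

print(weak_label("Get a FREE prize now!!!"))   # spam
print(weak_label("Hello, see you at lunch"))   # ham
print(weak_label("Meeting moved to 3pm"))      # None
```

The resulting labels are noisy, and some items get no label at all, but a large set labeled this way can still be good enough to train a model.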
Summary
Ways to get more labels
1. Self-training (for easy data)
2. Crowdsourcing: have people label the data (for hard data)
3. Weakly supervised learning (find the general rules people use to judge labels and let the machine apply them)