Structuring Machine Learning Projects (II) - Machine Learning Strategy (2)
2022-06-27 22:30:00 【997and】
These study notes record what I learned while studying deep learning, drawing on Andrew Ng's (Wu Enda's) video lectures and the "Flower Book" (Deep Learning by Goodfellow et al.). My ability is limited, so if you find errors, please contact me and I will correct them. Thank you very much!
First edition, 2022-06-01: first draft
I. Carrying out error analysis

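Suppose you are debugging a cat classifier that reaches 90% accuracy, i.e. 10% error.
As shown in the figure, two dog images are misclassified as cats. One option is to target dogs specifically: collect more dog pictures, or design parts of the algorithm to handle dogs.
Recommended procedure:
First, collect a sample of misclassified examples, say 100, and check them by hand. Count how many of them are dogs: that fraction is a ceiling on how much fixing the dog problem can help. Manual work like this is sometimes looked down on in machine learning, but this simple count can save a lot of time.
Error analysis can also evaluate several ideas in parallel. Tag each misclassified sample with the error categories that apply to it. Partway through, you may notice a new category, for example image filters that confuse the classifier, and add it to the tally.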
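The tally described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `error_analysis`, the category names, and the counts in `samples` are hypothetical and do not come from the course material.

```python
from collections import Counter

def error_analysis(tagged_errors, dev_error):
    """Tally error categories over manually inspected misclassified samples.

    tagged_errors: one set of category tags per misclassified sample
                   (a sample may carry several tags, or none).
    dev_error:     overall development-set error rate, e.g. 0.10.
    Returns {category: (share_of_errors, best_case_dev_error_if_fixed)}.
    """
    if not tagged_errors:
        raise ValueError("need at least one inspected sample")
    n = len(tagged_errors)
    counts = Counter(tag for tags in tagged_errors for tag in tags)
    report = {}
    for category, count in counts.items():
        share = count / n
        # Fixing a category completely removes at most its share of the errors.
        report[category] = (share, dev_error * (1 - share))
    return report

# Hypothetical tally of 100 inspected errors (made-up numbers).
samples = (
    [{"dog"}] * 5
    + [{"blurry"}] * 38
    + [{"filter"}] * 10
    + [{"blurry", "filter"}] * 2
    + [set()] * 45
)
report = error_analysis(samples, dev_error=0.10)
for category, (share, best_case) in sorted(report.items()):
    print(f"{category}: {share:.0%} of errors, best-case dev error {best_case:.2%}")
```

With these made-up counts, dogs account for only a small share of the errors, so working on blurry images would be the better use of time.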
II. Cleaning up incorrectly labeled data

In the figure, the second-to-last sample is mislabeled.
Deep learning algorithms are quite robust to random errors in the training set, but much less robust to systematic errors.
The question is whether the 6% attributed to mislabeled samples is worth correcting.
First, whatever correction process you use, apply it to the development set and the test set at the same time, so that the two continue to come from the same distribution.
Second, consider examining the samples the algorithm got right as well as the ones it got wrong.
Finally, you may decide to fix the labels only in the development and test sets, since they are much smaller than the training set.
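A short worked calculation can help decide whether relabeling is worthwhile. The sketch below reads the 6% as the share of development-set errors caused by wrong labels, and assumes an overall development error of 10% that later drops to 2%; both error rates and the helper name `split_dev_error` are assumptions made for illustration.

```python
def split_dev_error(dev_error, label_share_of_errors):
    """Split the dev error into the part due to wrong labels and the rest."""
    due_to_labels = dev_error * label_share_of_errors
    return due_to_labels, dev_error - due_to_labels

# Assumed scenario 1: 10% overall error, 6% of those errors are label errors.
labels_part, other_part = split_dev_error(0.10, 0.06)
print(f"due to labels: {labels_part:.2%}, other causes: {other_part:.2%}")

# Assumed scenario 2: the model improves to 2% error while the label errors
# stay where they were, so they now make up a much larger share of the total.
share_after_improvement = labels_part / 0.02
print(f"label errors as a share of a 2% error: {share_after_improvement:.0%}")
```

In the first scenario the label errors are a small part of the total and other causes deserve attention first. In the second they are large enough to blur comparisons between models, which is when fixing development-set labels starts to pay off.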
III. Build your first system quickly, then iterate

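Take speech recognition as an example: there are many directions you could work on, such as robustness to noisy backgrounds. Build a first system quickly, then use error analysis to decide which direction to prioritize.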
IV. Training and testing on different distributions

One data source is photos uploaded from mobile phones, which are blurry to varying degrees; the other is images collected by a web crawler.
The purpose of the development set is to tell the team what target to aim at.
Option 1 merges both sources and splits them randomly, so the development set is mostly web images and the team ends up optimizing mainly for images downloaded from the web. This is not recommended.
Option 2 uses the 200,000 web images plus 5,000 photos uploaded from mobile phones as the training set, while the development and test sets contain only mobile phone photos. This gives better performance in the long run.
The same applies to speech recognition: all the speech data you have can go into the training set.
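The second option can be sketched as a small splitting routine. This is a minimal sketch, not a reference implementation: the helper `split_option_two` is hypothetical, the example sizes are scaled down by 100 from the ones in the notes, and dividing the remaining mobile photos evenly between development and test sets is an assumption.

```python
import random

def split_option_two(web_images, mobile_images, mobile_in_train, seed=0):
    """Train on web + some mobile data; dev and test use mobile data only."""
    rng = random.Random(seed)
    mobile = list(mobile_images)
    rng.shuffle(mobile)

    # Training set: all web images plus a slice of the mobile photos.
    train = list(web_images) + mobile[:mobile_in_train]
    rng.shuffle(train)

    # Dev and test sets: the remaining mobile photos, split evenly (assumption).
    held_out = mobile[mobile_in_train:]
    half = len(held_out) // 2
    dev, test = held_out[:half], held_out[half:]
    return train, dev, test

# Scaled-down stand-ins for the two sources (2,000 web, 100 mobile).
web = [("web", i) for i in range(2000)]
mobile = [("mobile", i) for i in range(100)]
train, dev, test = split_option_two(web, mobile, mobile_in_train=50)
print(len(train), len(dev), len(test))
```

The point of the design is that the development and test sets reflect only the distribution you care about, even though most of the training data comes from elsewhere.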
V. Bias and variance with mismatched data distributions

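When the training set and the development set come from different distributions, a gap between training error and development error can have two causes: first, the algorithm has seen the training data but never the development data; second, the development data comes from a different distribution.
The development and test sets come from the same distribution, but the training set comes from a different one. To separate the two effects, randomly shuffle the training set and set aside a portion of it as a training-development (train-dev) set, which has the same distribution as the training set but is not used for training.
The lower right of the figure shows three examples; the second has high bias (underfitting).
The gaps between successive error rates measure, in order: avoidable bias (human level vs. training error), variance (training vs. train-dev error), data mismatch (train-dev vs. development error), and overfitting to the development set (development vs. test error; a large gap means you have overfit the development set).
In the example on the right, the algorithm performs better on the development and test sets than on the training data.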
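The comparison of human-level, training, train-dev, development, and test error can be written as a small diagnostic. The function names `diagnose` and `largest_gap` and the error rates below are hypothetical; the sketch only shows how each gap between successive error rates maps to one problem (avoidable bias, variance, data mismatch, overfitting to the development set).

```python
def diagnose(human, train, train_dev, dev, test):
    """Return the gap attributed to each problem (all inputs are error rates)."""
    return {
        "avoidable_bias": train - human,      # human level -> training error
        "variance": train_dev - train,        # training -> train-dev error
        "data_mismatch": dev - train_dev,     # train-dev -> development error
        "overfit_to_dev": test - dev,         # development -> test error
    }

def largest_gap(gaps):
    """Name of the problem with the biggest gap."""
    return max(gaps, key=gaps.get)

# Hypothetical case A: train-dev error stays close to training error, but the
# development error jumps, which points to data mismatch.
case_a = diagnose(human=0.00, train=0.01, train_dev=0.015, dev=0.10, test=0.10)

# Hypothetical case B: the error already jumps on train-dev data, which comes
# from the training distribution, so the problem is variance.
case_b = diagnose(human=0.00, train=0.01, train_dev=0.09, dev=0.10, test=0.10)

print(largest_gap(case_a), largest_gap(case_b))
```

A negative gap is possible, for example when the development data is easier than the training data, so the values should be read as signed differences rather than clipped at zero.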
Horizontal axis of the table: the different datasets, e.g. general speech recognition data and the speech data collected for the rear-view mirror.
Vertical axis: the different ways of evaluating on that data, namely human-level error, error on data the neural network was trained on, and error on data it was not trained on.
How do you deal with data mismatch?
Even when it comes from a different distribution, extra data still gives you much more training data to work with. Ways to address the mismatch itself are covered in 2.6.
VI. Addressing data mismatch

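Carry out manual error analysis, and to avoid overfitting the test set, look at the development set rather than the test set.
1. Find out how the development set differs from the training set
2. Make the training set more like the development set
One way to bring the training set closer to the development set is artificial data synthesis, for example mixing clean speech with background noise. Be careful: if you synthesize from only a small pool of noise, the network may overfit to that pool.
In short, if you think you have a data mismatch problem, do error analysis to understand where the two distributions differ.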
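Artificial data synthesis for speech can be sketched as adding a noise recording to a clean one. The sketch below is a simplified illustration: audio is represented as plain lists of float samples, and the names `add_noise`, `synthesize`, and the `gain` value are assumptions rather than part of any library.

```python
import random

def add_noise(clean, noise, gain=0.3, rng=None):
    """Overlay a random window of `noise` onto `clean`, scaled by `gain`."""
    if len(noise) < len(clean):
        raise ValueError("noise clip must be at least as long as the clean clip")
    rng = rng or random.Random()
    start = rng.randrange(len(noise) - len(clean) + 1)
    window = noise[start:start + len(clean)]
    return [c + gain * n for c, n in zip(clean, window)]

def synthesize(clean_clips, noise_clips, gain=0.3, seed=0):
    """Pair every clean clip with a randomly chosen noise clip."""
    rng = random.Random(seed)
    return [add_noise(c, rng.choice(noise_clips), gain, rng) for c in clean_clips]

# Toy stand-ins for audio: a real pipeline would load waveforms from files.
clean_clips = [[0.0, 0.5, -0.5, 0.25], [0.1, 0.1, 0.1, 0.1]]
noise_clips = [[0.2] * 8, [-0.2] * 8]
noisy = synthesize(clean_clips, noise_clips)
print(noisy)
```

Drawing from many different noise clips and random offsets is deliberate: reusing one short noise recording for every clean clip would let the network memorize that recording.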
VII. Transfer learning

Sometimes a neural network can learn knowledge from one task and apply it to a separate task.
(Blue) Concretely, in the first stage you train the network on an image recognition task, training all the usual parameters: all the weights in all the layers. This gives you a network that can make image recognition predictions. To apply transfer learning, you then swap in a new dataset of (x, y) pairs, where x is now a radiology image and y is the diagnosis you want to predict, and randomly initialize the weights of the last layer, w[L] and b[L].
(Purple) Then train on the new radiology dataset:
If the dataset is small, retrain only the weights of the last layer and keep the other parameters unchanged.
If you have enough data, retrain all the layers of the network. In that case the initial training on image recognition is called pre-training, and the subsequent training on the radiology data, which updates all the weights, is called fine-tuning.
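The two retraining choices can be sketched without any deep learning framework. This is a rough illustration only: the network is represented as a hypothetical list of layer dictionaries, and `prepare_transfer`, the 0.01 standard deviation, and the toy weights are assumptions. A real project would use the freezing mechanism of its framework instead.

```python
import copy
import random

def prepare_transfer(pretrained, new_outputs, small_dataset, seed=0):
    """Reuse pretrained layers, reinitialize the last one, pick what to train.

    pretrained: list of layers, each {"W": rows of floats, "b": list of floats},
                where each row of W holds the incoming weights of one unit.
    Returns (params, trainable_layer_indices).
    """
    rng = random.Random(seed)
    params = copy.deepcopy(pretrained)

    # Replace the last layer with a randomly initialized one for the new task.
    fan_in = len(params[-1]["W"][0])
    params[-1] = {
        "W": [[rng.gauss(0.0, 0.01) for _ in range(fan_in)] for _ in range(new_outputs)],
        "b": [rng.gauss(0.0, 0.01) for _ in range(new_outputs)],
    }

    last = len(params) - 1
    # Small dataset: train only the new last layer. Enough data: fine-tune all.
    trainable = [last] if small_dataset else list(range(len(params)))
    return params, trainable

# Toy "pretrained" network: 2 inputs -> 3 hidden units -> 4 classes.
pretrained = [
    {"W": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], "b": [0.0, 0.0, 0.0]},
    {"W": [[0.1, 0.2, 0.3]] * 4, "b": [0.0] * 4},
]
params_small, train_small = prepare_transfer(pretrained, new_outputs=1, small_dataset=True)
params_large, train_large = prepare_transfer(pretrained, new_outputs=1, small_dataset=False)
print(train_small, train_large)
```

An optimizer would then update only the layers listed in the returned indices, leaving the copied earlier layers untouched in the small-dataset case.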
(Bottom) Here is another example. Suppose you have trained a speech recognition system, where x is an audio clip and y is the transcript, so the system outputs transcribed text. Now suppose you want to build a "wake word" or "trigger word" detection system. A wake word or trigger word is a phrase you say to wake up a voice-controlled device at home: "Alexa" wakes up an Amazon Echo, "OK Google" wakes up a Google device, "Hey Siri" wakes up an Apple device, and "Hello Baidu" wakes up a Baidu device. To do this, you might remove the last layer of the neural network and add a new output node. Sometimes you add more than one new node, or even several new layers, and then train on the labels y of the wake word detection problem. Again, depending on how much data you have, you may retrain only the new layers of the network, or you may retrain more layers.
(Top, red) Suppose the image recognition task has 1 million samples, enough to learn useful low-level features, while the radiology task has only 100 samples. Much of the knowledge learned from image recognition can then be transferred, even though there is very little radiology data.
When does transfer learning make sense?
1. You want to transfer knowledge from task A to task B, and A and B have the same input x;
2. You have much more data for task A than for task B;
3. Low-level features from A can help with task B.
VIII. Multi-task learning

In transfer learning the steps are sequential: you learn task A and then transfer to task B. In multi-task learning, one network learns several tasks at the same time.
For example, a self-driving car needs to detect pedestrians, vehicles, stop signs, and traffic lights at the same time.
To train the network you define a loss function. Softmax regression assigns a single label to each sample, whereas here one image can carry several labels, because multiple objects may appear in the same picture. The loss therefore sums a logistic loss over each object type for every image.
Instead of multi-task learning, you could also train four separate neural networks. Multi-task learning still works even if some images are only partially labeled: the loss sums only over the labels that are present.
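A multi-label loss that tolerates missing labels can be sketched as follows. The function name `multitask_loss`, the use of `None` for a missing label, and the example numbers are assumptions for illustration; the four outputs stand for pedestrian, vehicle, stop sign, and traffic light.

```python
import math

def multitask_loss(predictions, labels, eps=1e-12):
    """Mean over samples of the summed per-task logistic loss.

    predictions: per sample, one probability per task.
    labels:      per sample, 0, 1, or None (label missing) per task.
    """
    total = 0.0
    for y_hat, y in zip(predictions, labels):
        for p, t in zip(y_hat, y):
            if t is None:
                continue  # unlabeled task: contributes nothing to the loss
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(predictions)

# One image: pedestrian present, no vehicle, stop sign unlabeled, no light.
predictions = [[0.9, 0.2, 0.5, 0.1]]
labels = [[1, 0, None, 0]]
loss = multitask_loss(predictions, labels)
print(loss)
```

Unlike softmax, each output is scored independently, so several objects can be marked present in the same image.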
When does multi-task learning make sense?
1. The tasks you are training can benefit from shared low-level features.
2. (Not a hard rule) The amount of data for each task is quite similar.
3. You can train a neural network big enough to do well on all the tasks. The alternative to multi-task learning is to train a separate neural network for each task; multi-task learning only hurts performance compared with separate networks when the network is not big enough.
In practice, transfer learning is used more often than multi-task learning: when your dataset is relatively small, transfer learning can help.
IX. What is end-to-end deep learning

Earlier data processing or learning systems required multiple stages of processing. End-to-end deep learning skips all these stages and replaces them with a single neural network.
Take speech recognition as an example. First you extract some hand-designed audio features. You may have heard of MFCC, an algorithm for extracting a specific set of hand-designed features from audio. After extracting these low-level features, you apply a machine learning algorithm to find the phonemes in the audio clip. A phoneme is a basic unit of sound: for instance, the word "cat" is made up of three phonemes, Cu-, Ah-, and Tu-. The algorithm extracts these three phonemes, then you string phonemes together to form individual words, and string the words together to form the transcript of the audio clip.
In contrast to the pipeline above, end-to-end deep learning is shown in the bottom row of the figure. One of its biggest challenges is that it needs a lot of data to perform well.
Another example is a face recognition access control system.
The best approach so far seems to be a multi-step one. First, you run software to detect the face, so the first detector finds where the face is. Then you zoom in on that part of the image and crop it so the face is centered (the red-framed photo in the figure), and feed the crop into a neural network that learns to estimate the person's identity.
Why the two-step approach works better:
1. Each of the two problems you solve is much simpler.
2. There is a lot of training data for each of the two subtasks.
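Machine translation: end-to-end learning works well here, because large datasets of (x, y) sentence pairs are available.
Estimating a child's age from an X-ray of the hand: a multi-step approach works better here, because there is not enough data to learn the mapping end-to-end.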
X. Whether to use end-to-end deep learning

Advantages:
1. It lets the data speak
2. Fewer hand-designed components are needed
Disadvantages:
1. It may need a large amount of data
2. It excludes hand-designed components that might be useful
When deciding whether to apply end-to-end deep learning, the key question is whether you have enough data to learn a function of the complexity needed to map x directly to y.
Self-driving example: detect what is around the car, plan the route, then produce the precise steering angle and throttle commands.