
Structured machine learning project (II) - machine learning strategy (2)

2022-06-27 22:30:00 【997and】

These study notes record what I learned while studying deep learning, mainly from Andrew Ng's (Wu Enda) video lectures and the "Flower Book" (Goodfellow, Bengio and Courville, Deep Learning). My ability is limited, so if you find any errors, please contact me and I will correct them. Thank you very much!


First edition: 2022-06-01, first draft

One 、 Conduct error analysis (Carrying out error analysis)

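Suppose you are debugging a cat classifier that reaches 90% accuracy (10% dev error).
As shown in the figure, two dog images were misclassified as cats. One option is to focus on dogs, by collecting more dog images or designing features that handle dogs better. Before spending months on that, check whether it is worth it.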

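Recommended approach:
First, collect around 100 misclassified dev set examples and examine them by hand, counting how many are dogs. If only 5 of the 100 are dogs, completely solving the dog problem lowers the error from 10% to at best 9.5%; if 50 are dogs, the error could drop to 5%. Machine learning practitioners sometimes look down on this kind of manual work, but a few minutes of inspection can save months of effort spent in the wrong direction.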
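The "ceiling" reasoning behind this manual check can be sketched in a few lines of Python. This is a minimal illustration, assuming that fixing a category removes every error in it (the best case); the function name `error_ceiling` is hypothetical.

```python
def error_ceiling(overall_error: float, n_examined: int, n_in_category: int) -> float:
    """Best-case dev error if every mistake in one category were eliminated.

    overall_error -- current dev-set error rate (e.g. 0.10 for 90% accuracy)
    n_examined    -- number of misclassified dev examples checked by hand
    n_in_category -- how many of those fell into the category (e.g. dogs)
    """
    if n_examined <= 0 or not 0 <= n_in_category <= n_examined:
        raise ValueError("need n_examined > 0 and 0 <= n_in_category <= n_examined")
    fraction = n_in_category / n_examined
    # Fixing the category can remove at most this fraction of the errors.
    return overall_error * (1 - fraction)


if __name__ == "__main__":
    print(error_ceiling(0.10, 100, 5))   # 5 dogs among 100 examined errors
    print(error_ceiling(0.10, 100, 50))  # 50 dogs among 100 examined errors
```

The two calls correspond to a 10% dev error where 5 or 50 of 100 examined errors are dogs.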
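During error analysis you can evaluate several ideas in parallel, for example: fix dogs recognized as cats, fix great cats (lions, panthers) recognized as house cats, and improve performance on blurry images. Use a spreadsheet with one row per examined image, one column per idea, and a comments column, and mark which categories each error falls into. Halfway through, you may discover a new category, such as Instagram filters interfering with the classifier, and simply add a column for it. The resulting percentages show which directions are most worth pursuing.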
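A minimal sketch of the spreadsheet tally in Python, assuming each hand-examined image is recorded with a set of tags. The tag names and rows below are made-up illustrations, not data from the lecture. Because one image can carry several tags, the percentages need not add up to 100%.

```python
from collections import Counter

# Illustrative hand-review rows: one per misclassified dev image.
# An image may have several tags (e.g. a blurry photo of a dog).
rows = [
    {"image": 1, "tags": {"dog"}, "comment": "pitbull"},
    {"image": 2, "tags": {"blurry", "instagram_filter"}, "comment": ""},
    {"image": 3, "tags": {"great_cat"}, "comment": "lion, rainy day"},
    {"image": 4, "tags": {"dog", "blurry"}, "comment": ""},
]


def tally(rows):
    """Return the fraction of examined errors that carry each tag."""
    counts = Counter(tag for row in rows for tag in row["tags"])
    n = len(rows)
    return {tag: count / n for tag, count in counts.items()}


# Print categories from most to least frequent.
for tag, share in sorted(tally(rows).items(), key=lambda kv: -kv[1]):
    print(f"{tag:18s} {share:.0%}")
```

Adding a column halfway through, as with the Instagram-filter case, just means starting to use a new tag string.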

Two 、 Clean up incorrectly labeled data (Cleaning up incorrectly labeled data)

In the figure, the second-to-last example is incorrectly labeled.
Deep learning algorithms are quite robust to random errors in the training set, but much less robust to systematic errors, for example a labeler who consistently labels white dogs as cats.

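Is it worth fixing the incorrect labels? Add an "incorrectly labeled" column to the error-analysis spreadsheet. Suppose 6% of the misclassified dev examples turn out to be mislabeled. With a 10% overall dev error, incorrect labels account for only 0.6 percentage points and other causes for 9.4, so it is usually better to work on the other causes. If the dev error later falls to 2% while 0.6 points are still due to incorrect labels, they make up 30% of the errors, and fixing them becomes worthwhile, since the dev set must reliably tell you which of two classifiers is better.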
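The comparison is simple arithmetic, sketched here as a small Python helper. The name `label_error_breakdown` is hypothetical, and the 2% case is an assumed later stage of the project in which label errors stay at 0.6 percentage points.

```python
def label_error_breakdown(overall_error: float, label_error: float) -> dict:
    """Split dev error into the part caused by incorrect labels and the rest.

    overall_error -- total dev-set error rate
    label_error   -- dev error attributable to incorrectly labeled examples
    """
    if not 0 <= label_error <= overall_error or overall_error == 0:
        raise ValueError("need 0 <= label_error <= overall_error and overall_error > 0")
    return {
        "due_to_labels": label_error,
        "due_to_other": overall_error - label_error,
        # Share of all errors that fixing the labels could address.
        "share_from_labels": label_error / overall_error,
    }


# 10% dev error, 6% of the errors are label problems -> 0.6 percentage points.
early = label_error_breakdown(0.10, 0.006)
# Assumed later stage: the model improves to 2% error, label errors unchanged.
later = label_error_breakdown(0.02, 0.006)
print(early)
print(later)
```

The same absolute label error matters far more once the other sources of error have shrunk.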

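If you do decide to correct labels:
First, whatever correction process you use, apply it to the dev set and the test set at the same time, so that the two continue to come from the same distribution.
Second, consider examining the examples your algorithm got right as well as those it got wrong; otherwise your estimate of the error may be biased.
Last, you may decide to correct labels only in the dev and test sets, which are relatively small, and not in the much larger training set. The training set then comes from a slightly different distribution than the dev/test sets, which is usually acceptable.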

Three 、 Build your first system quickly and iterate (Build your first system quickly, then iterate)

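Speech recognition example: there are many possible directions to work on, such as noisy background, accented speech, speech far from the microphone, young children's speech, and stuttering. Rather than overthinking which to pursue first, set up a dev/test set and a metric, build an initial system quickly, then use bias/variance analysis and error analysis to prioritize the next steps.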

Four 、 Use data from different distributions for training and testing (Training and testing on different distributions)

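One data source is photos uploaded from a mobile app, which are unprofessional and often blurry (this is the distribution you care about); the other is images crawled from the web, which are plentiful and professionally shot.
The purpose of setting up the dev set is to tell the team where to aim.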

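Option 1: mix all the data, shuffle it, and split it into training, dev, and test sets. Most of the dev set then consists of web images, so the team ends up optimizing mostly for web-downloaded images; not recommended.
Option 2: the training set consists of 200,000 web-downloaded images plus 5,000 images uploaded from the mobile app, while the dev and test sets consist entirely of mobile images. This aims the team at the distribution you actually care about and gives better performance in the long run.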
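Option 2 can be sketched as a small splitting function in Python. This is a minimal illustration, assuming 10,000 mobile images in total, split into 2,500 dev, 2,500 test, and 5,000 training images; the function name and the `(source, index)` tuples standing in for images are hypothetical.

```python
import random


def build_splits(web, mobile, n_dev=2500, n_test=2500, n_mobile_train=5000, seed=0):
    """Option 2: dev and test come only from the target (mobile) distribution.

    All web images go to training; the mobile images are shuffled and split
    into dev, test, and an extra slice that is added to training.
    """
    rng = random.Random(seed)
    mobile = list(mobile)
    if len(mobile) < n_dev + n_test + n_mobile_train:
        raise ValueError("not enough mobile images for the requested split")
    rng.shuffle(mobile)
    dev = mobile[:n_dev]
    test = mobile[n_dev:n_dev + n_test]
    extra = mobile[n_dev + n_test:n_dev + n_test + n_mobile_train]
    train = list(web) + extra
    rng.shuffle(train)  # mix both sources so mini-batches contain each
    return train, dev, test


web = [("web", i) for i in range(200_000)]
mobile = [("mobile", i) for i in range(10_000)]
train, dev, test = build_splits(web, mobile)
print(len(train), len(dev), len(test))
```

Dev and test are drawn from the same shuffled pool, so they share one distribution, which is the property the section asks for.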
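Speech recognition example (a voice-activated rear-view mirror): all the general speech data you have can go into the training set, while the dev and test sets are built from rear-view-mirror speech; some of the rear-view-mirror data can also be added to the training set.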

Five 、 Bias and variance with mismatched data distributions (Bias and Variance with mismatched data distributions)

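Suppose the cat classifier has 1% training error and 10% dev error. If training and dev data came from the same distribution, this would indicate high variance. When they come from different distributions you can no longer draw that conclusion, because two things changed at once: first, the algorithm saw the training data but never the dev data; second, the dev data comes from a different distribution.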

To separate these effects, keep the dev and test sets from the same distribution, and define a training-dev set that has the same distribution as the training set but is not used for training: randomly shuffle the training set and hold out a portion of it as the training-dev set.
Of the three examples on the lower right, the second has high bias (underfitting).
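Comparing human-level error, training error, training-dev error, dev error, and test error exposes, in turn, avoidable bias, variance, data mismatch, and overfitting to the dev set (if the gap between dev and test error is large, you have overfit the dev set and may need a bigger one).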
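Reading off these gaps is simple subtraction. A minimal Python sketch, with error rates given in percent; the example numbers (1% training, 1.5% training-dev, 10% dev) are an illustrative data-mismatch case, and picking the single largest gap is a simplification of the judgment call described above. Gaps can also be negative, as in the example where dev/test data is easier than training data.

```python
def diagnose(human, train, train_dev, dev, test):
    """Return each gap (in percentage points) between consecutive error levels."""
    return {
        "avoidable_bias": train - human,
        "variance": train_dev - train,
        "data_mismatch": dev - train_dev,
        "overfitting_to_dev": test - dev,
    }


def biggest_problem(gaps):
    """Name of the largest gap: a rough hint for what to work on next."""
    return max(gaps, key=gaps.get)


gaps = diagnose(human=0.0, train=1.0, train_dev=1.5, dev=10.0, test=10.0)
print(gaps, biggest_problem(gaps))
```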

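In the example on the right, the algorithm actually performs better on the dev and test sets than on the training and training-dev sets; this can happen when the dev/test distribution is easier.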
Horizontal axis: the data distribution, e.g. general speech recognition data versus speech data specific to the rear-view mirror.
Vertical axis: human-level error, error on examples the neural network was trained on, and error on examples it was not trained on.

How do you deal with data mismatch, especially when you want to use lots of training data from a different distribution? This is covered in lecture 2.6 (section Six below).

Six 、 Addressing data mismatch (Addressing data mismatch)

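Carry out manual error analysis to understand how the training and dev sets differ. To avoid overfitting the test set, look only at the dev set, not the test set.
1. Find out how the dev set differs from the training set (for example, the dev audio may contain much more car noise).
2. Make the training data more similar to the dev set, or collect more data similar to the dev/test sets.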

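One way to make the training set closer to the dev set is artificial data synthesis, for example combining clean speech recordings with recorded car noise to simulate in-car audio.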
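A minimal NumPy sketch of this kind of synthesis, assuming both signals are 1-D float arrays at the same sample rate. The function name `mix_at_snr` and the use of a signal-to-noise ratio (SNR, in dB) to control the noise level are illustrative choices, not details from the lecture; the sine wave and Gaussian noise stand in for real recordings.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db, rng=None):
    """Add a random segment of `noise` to `clean` at the requested SNR (in dB)."""
    rng = np.random.default_rng() if rng is None else rng
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if len(noise) < len(clean):
        # Looping a short noise clip works mechanically, but it is exactly the
        # situation in which the model may overfit to that one clip.
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    start = rng.integers(0, len(noise) - len(clean) + 1)
    segment = noise[start:start + len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(segment ** 2)
    # Scale the noise so that 10*log10(p_clean / p_scaled_noise) == snr_db.
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * segment


sr = 16_000
t = np.arange(sr) / sr
speech = 0.5 * np.sin(2 * np.pi * 220 * t)  # stand-in for a clean utterance
car_noise = np.random.default_rng(1).normal(size=3 * sr)  # stand-in for car noise
noisy = mix_at_snr(speech, car_noise, snr_db=10, rng=np.random.default_rng(0))
```

Drawing a random segment from a long, varied noise collection, rather than reusing one short clip, is what keeps the synthesized data from covering only a tiny corner of real in-car conditions.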
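In short, if you believe you have a data mismatch problem, do error analysis to understand the differences, then try to collect or synthesize training data that looks like the dev set. Be careful when synthesizing: if you sample only a small part of the possible space (for example, one hour of car noise repeated over 10,000 hours of speech), the network may overfit to that small part.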

Seven 、 Transfer learning (Transfer learning)

Sometimes a neural network can take knowledge learned from one task and apply it to a different task.

(Blue) Concretely, in the first stage you train a neural network on an image recognition task, learning all of its parameters, i.e. all the weights of all the layers, which gives a network that can make image recognition predictions. To apply transfer learning, swap in a new dataset of (x, y) pairs, where x are now radiology images and y is the diagnosis you want to predict. Delete the last output layer and the weights feeding into it, create a new output layer, and randomly initialize its weights W^[L] and b^[L].

(Purple) Training on the new radiology dataset:
If the dataset is small, retrain only the last layer's weights W^[L] and b^[L] and keep all other parameters fixed.
If you have enough data, retrain all the layers of the network. The initial training on image recognition is called pre-training, and the subsequent training on radiology data, which updates all the weights, is called fine-tuning.
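The small-dataset option, retraining only a freshly initialized last layer while the earlier layers stay frozen, can be sketched in NumPy. Everything here is an assumption for illustration: the "pre-trained" weights `W1`, `b1` are random stand-ins for weights learned on the image task, the 100-example dataset is synthetic, and the new head is a single sigmoid unit trained with plain gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for weights learned during pre-training on task A (frozen below).
W1 = rng.normal(scale=1 / np.sqrt(20), size=(20, 16))
b1 = np.zeros(16)


def features(X):
    """Forward pass through the frozen, pre-trained layer (ReLU activations)."""
    return np.maximum(0.0, X @ W1 + b1)


def log_loss(H, y, w, b):
    """Mean logistic loss of a sigmoid output layer on features H."""
    p = 1 / (1 + np.exp(-(H @ w + b)))
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def train_new_head(X, y, lr=0.1, steps=500, seed=1):
    """Randomly initialise W^[L], b^[L] and train only them; W1, b1 never change."""
    H = features(X)  # computed once, because the frozen layer does not change
    w = np.random.default_rng(seed).normal(scale=0.01, size=H.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(H @ w + b)))
        err = p - y  # per-example gradient of the log loss w.r.t. the logit
        w -= lr * H.T @ err / len(y)
        b -= lr * float(err.mean())
    return w, b


# Small synthetic "radiology" set: 100 examples with a binary label.
X_small = rng.normal(size=(100, 20))
y_small = (X_small[:, 0] > 0).astype(float)
w_L, b_L = train_new_head(X_small, y_small)
```

With more data, fine-tuning would instead also update `W1` and `b1`, starting from their pre-trained values rather than from a random initialization.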

(Below) Here is another example. Suppose you have trained a speech recognition system, where x is an audio clip and y is the transcript. Now you want to build a "wake word" or "trigger word" detection system. A wake word is a phrase that wakes up a voice-controlled device at home: saying "Alexa" wakes up an Amazon Echo, "OK Google" wakes up a Google device, "Hey Siri" wakes up an Apple device, and "Hello Baidu" wakes up a Baidu device. To do this, you may remove the last layer of the network and add a new output node; sometimes you add several new nodes, or even several new layers. Then feed in the wake-word labels y and train. Again, depending on how much data you have, you may retrain only the new layers, or more layers of the network.

(Red, top) Pre-training on image recognition may use around 1,000,000 examples, enough to learn many useful low-level features. The radiology task may have only 100 examples, so much of the knowledge learned from image recognition can be transferred even though radiology data is scarce.
When does transfer learning from task A to task B make sense?
1. Tasks A and B have the same type of input x (e.g. both images, or both audio);
2. You have much more data for task A than for task B;
3. Low-level features learned from A could help with learning B.

Eight 、 Multi-task learning (Multi-task learning)

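In transfer learning the steps are sequential: learn task A, then transfer to task B. In multi-task learning, several tasks are learned simultaneously.
Example: a self-driving car must detect pedestrians, cars, stop signs, and traffic lights at the same time, so each image has a 4-dimensional label vector.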

To define the network's loss function, note the difference from softmax regression: softmax assigns a single label to each example, whereas here one image can have several labels, since multiple objects can appear in the same picture. The loss therefore sums a logistic loss over the four labels of each image and averages over the training examples.

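You could instead train four separate neural networks, but if the tasks share low-level features, one network doing all four tasks usually performs better. Multi-task learning also works when some images have only some of their labels annotated (the rest marked "?"): the loss is summed only over the labels that are present.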
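The multi-task cost, J = (1/m) Σ_i Σ_j L(ŷ_j^(i), y_j^(i)) with L the logistic loss, can be sketched in NumPy. As an assumption for illustration, missing labels (the "?" entries) are encoded as `np.nan` and skipped in the inner sum; each column gets an independent sigmoid output, unlike softmax.

```python
import numpy as np


def multitask_loss(y_hat, y):
    """Mean over examples of the summed logistic loss over k binary tasks.

    y_hat -- (m, k) predicted probabilities, one column per task
             (e.g. pedestrian, car, stop sign, traffic light)
    y     -- (m, k) labels in {0, 1}, with np.nan where a label is missing
    """
    y_hat = np.clip(np.asarray(y_hat, dtype=float), 1e-12, 1 - 1e-12)
    y = np.asarray(y, dtype=float)
    labeled = ~np.isnan(y)
    y0 = np.where(labeled, y, 0.0)  # placeholder value, masked out below
    per_entry = -(y0 * np.log(y_hat) + (1 - y0) * np.log(1 - y_hat))
    return float(np.sum(per_entry * labeled) / y.shape[0])


y = np.array([[1, 0, np.nan, 1],
              [0, 1, 1, np.nan]])
y_hat = np.array([[0.9, 0.2, 0.5, 0.8],
                  [0.1, 0.7, 0.6, 0.3]])
print(multitask_loss(y_hat, y))
```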
When does multi-task learning make sense?
1. You are training on a set of tasks that can benefit from shared low-level features.
2. Usually, though not always: the amount of data for each task is quite similar, so that for any single task the other tasks together provide much more data.
3. You can train a neural network big enough to do well on all the tasks. The alternative to multi-task learning is to train a separate neural network for each task; the only case in which multi-task learning performs worse than separate networks is when the network is not big enough.

In practice, transfer learning is used much more often than multi-task learning; when the dataset for your target task is relatively small, transfer learning can help.

Nine 、 What is end-to-end deep learning (What is end-to-end deep learning)

Earlier data processing or learning systems required multiple stages of processing; end-to-end deep learning skips these stages and replaces them all with a single neural network.

For example, speech recognition , First you will extract some features , Some hand designed audio features , Maybe you've heard MFCC, This algorithm is used to extract a specific set of artificially designed features from audio . After extracting some low-level features , You can use machine learning algorithms to find phonemes in audio clips , So phoneme is the basic unit of sound , for instance ”cat” The word is made up of three syllables ,Cu-、Ah- And u-, The algorithm extracts these three phonemes , Then you string phonemes together to form separate words , Then you string the words together to form the dictation text of the audio clip .

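Compared with this pipeline, end-to-end deep learning (bottom row of the figure) maps the audio directly to the transcript. One of its biggest challenges is that it needs a lot of data to perform well: with a moderate amount of data (a few thousand hours of audio), the traditional pipeline may work as well or better, while with much larger datasets (tens of thousands of hours or more) the end-to-end approach starts to outperform it.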
Example: a face recognition access-control (turnstile) system.
The best approach so far seems to be a multi-step one. First, run software that detects faces: the detector finds where the face is. Then zoom in on that part of the image and crop it so the face is centered (the photo in the red box), and feed it into a neural network that learns to estimate the person's identity.
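The two-step structure can be sketched as a small NumPy pipeline. `detect_face` and `recognize` are hypothetical stand-ins for the two trained networks; only the crop-and-resize step between them is implemented, using nearest-neighbour resampling for simplicity. The box format (top, left, height, width) is also an assumption.

```python
import numpy as np


def crop_and_center(image, box, out_size):
    """Crop the detected face box and resize it to out_size x out_size."""
    top, left, h, w = box
    face = image[top:top + h, left:left + w]
    # Nearest-neighbour resampling: pick evenly spaced source rows/columns.
    rows = np.arange(out_size) * h // out_size
    cols = np.arange(out_size) * w // out_size
    return face[rows[:, None], cols[None, :]]


def two_step_identify(image, detect_face, recognize, out_size=64):
    box = detect_face(image)  # step 1: where is the face?
    if box is None:
        return None
    face = crop_and_center(image, box, out_size)
    return recognize(face)  # step 2: who is it?


image = np.zeros((120, 160))
image[20:80, 50:110] = 1.0  # bright square standing in for a face


def fake_detector(img):
    return (20, 50, 60, 60)


def fake_recognizer(face):
    return "employee_42" if face.mean() > 0.5 else "unknown"


who = two_step_identify(image, fake_detector, fake_recognizer)
```

Keeping the two steps separate means each network can be trained on its own large dataset, which is the advantage discussed next.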

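Why the two-step approach works better:
1. It splits the problem into two subproblems, each of which is much simpler;
2. There is plenty of training data for each subtask, whereas there is relatively little data for the direct mapping from a raw camera image to a person's identity.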
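Machine translation: large numbers of (English, French) sentence pairs are available, so end-to-end deep learning works well here.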

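Estimating a child's age from an X-ray image of the hand: data is limited, so a multi-step approach (first segment the bones, then estimate age from the bone lengths) currently works better than an end-to-end one.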

Ten 、 Whether to use end-to-end deep learning (Whether to use end-to-end learning)

Advantages:
1. Lets the data speak;
2. Requires fewer hand-designed components.

Disadvantages:
1. May require a large amount of data;
2. Excludes potentially useful hand-designed components.
When deciding whether to apply end-to-end deep learning, the key question is: do you have enough data to learn a function of the complexity needed (complexity needed) to map x directly to y?

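Self-driving example: use deep learning to detect cars and pedestrians around the car, then a motion-planning step plans the route, then control algorithms compute the precise steering angle and throttle/brake force. Rather than one end-to-end network, deep learning is used for the components where it works best.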

Reference: Deep Learning course, Andrew Ng (Wu Enda)

Original site

Copyright notice
This article was created by [997and]; please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/178/202206271951331435.html