
CS231n notes (part 2) - suitable for beginners

2022-07-05 16:43:00 Small margin, rush

Contents

Transfer learning and object localization & detection

Transfer learning

Object localization & detection

Sliding window - OverFeat

R-CNN

Fast R-CNN

Faster R-CNN

SSD

Object segmentation & semantic segmentation

Object segmentation - Mask R-CNN

Transfer learning and object localization & detection

Can object detection be treated as a regression problem? Different images contain different numbers of objects, so the network might have to output 4 bounding boxes for one image and 8 for another. Because the number of outputs is not fixed, detection cannot be framed directly as a regression problem.

Transfer learning

In practice, few datasets are large enough, so people rarely train a network from scratch. The common approach is to pre-train a CNN on a very large dataset and then either use the pre-trained network as an initialization for fine-tuning or use it as a feature extractor.

  • Convolutional neural network as a feature extractor: take a CNN pre-trained on ImageNet, remove the last fully connected layer (the layer used for classification), and use the rest of the network as a fixed feature extractor.
  • Fine-tuning the CNN: replace the input data with the new dataset and continue training the network (a minimal sketch of both options follows this list).
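
Here is a minimal PyTorch sketch of the two options, assuming an ImageNet-pretrained ResNet-18 and a new task with num_classes categories; the model choice and hyperparameters are illustrative assumptions, not part of the original notes.

```python
# A minimal sketch of "feature extractor" vs "fine-tuning" (assumed setup).
import torch
import torch.nn as nn
import torchvision

num_classes = 10  # assumed number of classes in the new task

# Option 1: fixed feature extractor - freeze the pre-trained weights and
# train only a new linear classifier on top.
feature_extractor = torchvision.models.resnet18(weights="IMAGENET1K_V1")
for p in feature_extractor.parameters():
    p.requires_grad = False                          # keep pre-trained weights fixed
feature_extractor.fc = nn.Linear(512, num_classes)   # new head is trainable by default

# Option 2: fine-tuning - replace the last layer but keep all weights trainable,
# typically with a small learning rate so the pre-trained features change slowly.
finetune_net = torchvision.models.resnet18(weights="IMAGENET1K_V1")
finetune_net.fc = nn.Linear(512, num_classes)
optimizer = torch.optim.SGD(finetune_net.parameters(), lr=1e-3, momentum=0.9)
```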

Transfer learning scenarios:

  • The new dataset is small and very similar to the original dataset. Fine-tuning the CNN is not recommended (with little data it would overfit); instead, use the pre-trained CNN as a feature extractor and train a linear classifier on top of it.
  • The new dataset is large and very similar to the original dataset. Because there is enough data, the whole network can be fine-tuned.
  • The new dataset is small and very different from the original dataset. Because the dataset is small, it is still best to train only a linear classifier; and because the data differ from the original, it is better to take features from a shallower layer of the pre-trained network rather than from the top.
  • The new dataset is large and very different from the original dataset. Because there is enough data, the network can be retrained.

Object localization & detection

Besides labeling the image with a class, we also have to draw a box around the object. In classification + localization, the number of output boxes is known in advance; in object detection the number of objects is uncertain, so the regression framework cannot be used directly.

Sliding window - OverFeat

Select windows of different sizes at different positions in the image and run the CNN classifier on each window.
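
A minimal sketch of this idea is below; the window sizes, stride, crop size and the classifier model are illustrative assumptions.

```python
# Sliding-window classification sketch (window sizes/stride/classifier assumed).
import torch
import torch.nn.functional as F

def sliding_window_scores(image, classifier, window_sizes=(128, 192, 256), stride=32):
    """image: float tensor (3, H, W); classifier: CNN scoring a fixed-size crop."""
    _, H, W = image.shape
    results = []
    for win in window_sizes:
        for top in range(0, H - win + 1, stride):
            for left in range(0, W - win + 1, stride):
                crop = image[:, top:top + win, left:left + win].unsqueeze(0)
                # Resize the crop to the classifier's assumed 224x224 input size.
                crop = F.interpolate(crop, size=(224, 224),
                                     mode="bilinear", align_corners=False)
                scores = classifier(crop)            # class scores for this window
                results.append(((top, left, win), scores))
    return results
```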

R-CNN

  1. Pre-train a CNN.
  2. Build a training set: run the Selective Search algorithm to propose roughly 2000-3000 candidate boxes per image.
  3. Preprocess each candidate region and feed it to the CNN to extract image features. The features are sent to an SVM classifier to compute the classification loss; the same features also go to a regressor, which computes an L2 loss on the box offsets (see the sketch after this list).
  4. Train with backpropagation.
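
The per-region part of this pipeline can be sketched as follows. The AlexNet backbone, the head sizes, and the assumption that `proposals` is an existing list of Selective Search boxes are illustrative choices, not the exact R-CNN implementation.

```python
# Per-region feature extraction + heads, in the spirit of R-CNN (illustrative).
import torch
import torch.nn.functional as F
import torchvision

cnn = torchvision.models.alexnet(weights=None).features   # region feature extractor
svm_like_head = torch.nn.Linear(256 * 6 * 6, 21)           # 20 classes + background (assumed)
bbox_head = torch.nn.Linear(256 * 6 * 6, 4)                # box-offset regressor (L2 loss)

def region_predictions(image, proposals):
    """image: float tensor (3, H, W); proposals: list of integer (x1, y1, x2, y2) boxes."""
    outputs = []
    for (x1, y1, x2, y2) in proposals:
        crop = image[:, y1:y2, x1:x2].unsqueeze(0)
        # Warp each candidate region to a fixed size before the CNN.
        crop = F.interpolate(crop, size=(227, 227), mode="bilinear", align_corners=False)
        feat = cnn(crop).flatten(1)          # (1, 256*6*6) conv features for this region
        scores = svm_like_head(feat)         # classification scores (SVM in the paper)
        offsets = bbox_head(feat)            # predicted bounding-box offsets
        outputs.append((scores, offsets))
    return outputs
```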

Fast R-CNN

Fast R-CNN addresses R-CNN's slow training and prediction: CNN features are extracted once over the whole image, and the candidate regions are then taken from that shared feature map.

  1. As in R-CNN, use Selective Search to generate about 2000 candidate boxes.
  2. Feed the whole image into the CNN once and extract a feature map.
  3. Project the 2000+ candidate boxes onto the feature map produced by the last convolutional layer, then pool each projected region to a fixed size (see the RoI-pooling sketch below).
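
The projection step can be sketched with torchvision's roi_pool; the VGG-16 backbone, the box coordinates and the spatial_scale value here are illustrative assumptions.

```python
# Mapping image-space boxes onto a shared feature map with RoI pooling.
import torch
import torchvision
from torchvision.ops import roi_pool

backbone = torchvision.models.vgg16(weights=None).features   # conv feature extractor
image = torch.randn(1, 3, 512, 512)                           # one input image
feature_map = backbone(image)                                 # (1, 512, 16, 16): stride 32

# Candidate boxes in image coordinates, as (batch_index, x1, y1, x2, y2).
boxes = torch.tensor([[0., 32., 32., 224., 224.],
                      [0., 100., 60., 400., 380.]])

# spatial_scale = 1/32 maps image coordinates onto the stride-32 feature map;
# each proposal is pooled to the same 7x7 feature so a shared head can process it.
pooled = roi_pool(feature_map, boxes, output_size=(7, 7), spatial_scale=1.0 / 32)
print(pooled.shape)  # torch.Size([2, 512, 7, 7])
```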

Faster R-CNN

Faster R-CNN replaces Selective Search with a Region Proposal Network (RPN) that generates the candidate boxes directly from the shared feature map; the rest of the pipeline follows Fast R-CNN.

SSD

The idea of SSD is to divide the image into many grid cells; from the center of each cell several base boxes are derived. A neural network classifies all of these cells in one pass and, at the same time, regresses offsets for the base boxes.
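
A minimal sketch of deriving base boxes from the grid-cell centres; the grid size, scales and aspect ratios below are illustrative assumptions.

```python
# Generate SSD-style base boxes centred on each grid cell (assumed hyperparameters).
import torch

def make_base_boxes(grid_size=8, image_size=256,
                    scales=(0.2, 0.4), aspect_ratios=(1.0, 2.0, 0.5)):
    """Return (grid*grid*len(scales)*len(aspect_ratios), 4) boxes as (cx, cy, w, h)."""
    step = image_size / grid_size
    boxes = []
    for i in range(grid_size):
        for j in range(grid_size):
            cx, cy = (j + 0.5) * step, (i + 0.5) * step   # centre of this grid cell
            for s in scales:
                for ar in aspect_ratios:
                    w = image_size * s * (ar ** 0.5)
                    h = image_size * s / (ar ** 0.5)
                    boxes.append([cx, cy, w, h])
    return torch.tensor(boxes)

base_boxes = make_base_boxes()
print(base_boxes.shape)  # torch.Size([384, 4]); the network scores each box and
                         # regresses offsets relative to it in a single forward pass
```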

Object segmentation & semantic segmentation

Semantic segmentation classifies every pixel in the image. It does not distinguish between object instances; it only cares about pixels. Doing this at full resolution is often costly, so a common design is a downsampling-then-upsampling framework.

Downsampling can be done with convolution layers and pooling; upsampling uses unpooling or transposed convolution.
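
A minimal sketch of such a downsample-then-upsample network; the channel sizes, layer counts and 21 output classes are illustrative assumptions.

```python
# Downsample with conv + pooling, upsample with transposed convolutions.
import torch
import torch.nn as nn

num_classes = 21  # assumed number of segmentation classes
net = nn.Sequential(
    # Downsampling path: convolutions and pooling shrink the spatial size.
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                                                # H/2 x W/2
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                                                # H/4 x W/4
    # Upsampling path: transposed convolutions restore the full resolution.
    nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2), nn.ReLU(),  # H/2 x W/2
    nn.ConvTranspose2d(32, num_classes, kernel_size=2, stride=2),    # H x W scores
)

x = torch.randn(1, 3, 128, 128)
print(net(x).shape)  # torch.Size([1, 21, 128, 128]) -> one class score per pixel
```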

There is another method called max unpooling: record the indices of the maxima chosen by the earlier max-pooling layer, and during unpooling place each value back at its recorded index, filling the other positions with 0:
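
A minimal sketch with PyTorch's MaxPool2d/MaxUnpool2d on a small example input:

```python
# Max unpooling: pooling remembers where each maximum came from, and unpooling
# puts the value back at that position, with zeros everywhere else.
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)  # keep argmax indices
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.arange(16.).reshape(1, 1, 4, 4)
pooled, indices = pool(x)           # pooled: (1, 1, 2, 2) maxima + their indices
restored = unpool(pooled, indices)  # maxima return to their recorded positions
print(restored.squeeze())
# tensor([[ 0.,  0.,  0.,  0.],
#         [ 0.,  5.,  0.,  7.],
#         [ 0.,  0.,  0.,  0.],
#         [ 0., 13.,  0., 15.]])
```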

Object segmentation - Mask R-CNN

The goal is to go one step further than object detection and segment each object at the pixel level.

The image is processed by a CNN to produce features, then an RPN generates candidate regions, which are projected onto the feature map, exactly as in Faster R-CNN. After that there are two branches: one branch is identical to Faster R-CNN and predicts the class and bounding-box values of each candidate box; the other branch is similar to semantic segmentation and classifies every pixel inside the region.
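
For reference, torchvision ships a pre-built Mask R-CNN with this structure (backbone + RPN + box branch + mask branch). The sketch below only shows the expected input and output format, using a random image and untrained weights.

```python
# Inference-format sketch with torchvision's Mask R-CNN (random input, no weights).
import torch
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights=None)
model.eval()

images = [torch.rand(3, 480, 640)]        # list of (3, H, W) tensors in [0, 1]
with torch.no_grad():
    outputs = model(images)

out = outputs[0]
print(out["boxes"].shape)   # (N, 4)  boxes from the detection branch
print(out["labels"].shape)  # (N,)    predicted class labels
print(out["masks"].shape)   # (N, 1, 480, 640) per-instance pixel masks
```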

 
