History of object recognition
2022-07-06 09:53:00 【zyw2002】
I came across a very good summary diagram on GitHub (original image address), so I am saving it here first.
Overview of object recognition
Development history:
Image classification (Image Classification)
Task: Classify an image according to the dominant object in it.
Datasets: MNIST, CIFAR, ImageNet
Object localization (Object Localization)
Task: Predict the image region that contains the dominant object, then use image classification to recognize the object in that region.
Datasets: ImageNet
Object recognition (Object Recognition)
Task: Localize and classify all objects in the image. This task usually consists of proposing regions and then classifying the objects in them.
Datasets: PASCAL, COCO
Semantic segmentation (Semantic Segmentation)
Task: Label each pixel of the image with the class of the object it belongs to, for example the person, sheep, and grass in this example.
Datasets: PASCAL, COCO
Instance segmentation (Instance Segmentation)
Task: Label each pixel of the image with both the object class and the object instance it belongs to.
Datasets: PASCAL, COCO
Keypoint detection (Keypoint Detection)
Task: Detect the positions of a set of predefined keypoints of an object, such as human body keypoints or facial keypoints.
Datasets: COCO
Convolutional network concepts
Feature (feature)
Pattern, activation of a neuron, feature detector
A feature is a hidden neuron that is activated when a particular pattern (feature) appears in its input region (receptive field).
The pattern that a neuron detects can be visualized by:
(1) Optimizing the input region to maximize the neuron's activation (deep dream)
(2) Visualizing the gradient, or guided gradient, of the neuron's activation with respect to the input pixels (backpropagation and guided backpropagation)
(3) Visualizing the set of image regions in the training dataset that activate the neuron the most
Receptive field (Receptive Field)
Input region of a feature
The receptive field is the region of the input image that affects the activation of a feature. In other words, it is the region the feature looks at.
Generally, features in higher layers have larger receptive fields, which lets them learn to capture more complex and abstract patterns. The architecture of the convolutional network determines how the receptive field changes from layer to layer.
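How the receptive field grows from layer to layer can be sketched with a small calculation. This is a minimal sketch, assuming a plain chain of convolution or pooling layers described only by kernel size and stride (no dilation); the function name and the example layer stack are hypothetical and not taken from the original diagram.

```python
def receptive_field_sizes(layers):
    """Return the receptive field size (in input pixels) after each layer.

    `layers` is a list of (kernel_size, stride) pairs, ordered from input to output.
    """
    size = 1  # a raw input pixel only sees itself
    jump = 1  # distance, in input pixels, between two neighbouring features
    sizes = []
    for kernel_size, stride in layers:
        # Each extra kernel tap widens the field by the current jump.
        size += (kernel_size - 1) * jump
        # Striding spreads neighbouring features further apart on the input.
        jump *= stride
        sizes.append(size)
    return sizes


# Hypothetical stack: conv 3x3 stride 1, conv 3x3 stride 1, pool 2x2 stride 2, conv 3x3 stride 1
print(receptive_field_sizes([(3, 1), (3, 1), (2, 2), (3, 1)]))  # [3, 5, 6, 10]
```

The last layer adds 4 pixels rather than 2 because the preceding stride-2 pooling doubles the spacing between neighbouring features.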
Feature map (Feature Map)
A channel of a hidden layer
A feature map is a set of features created by applying the same feature detector (filter) to the input in a sliding-window fashion (that is, by convolution).
Features in the same feature map have the same receptive field size and look for the same pattern at different locations. This gives convolutional networks their spatial invariance.
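The sliding-window idea can be illustrated in a few lines. This is a minimal sketch in plain Python, assuming stride 1, no padding, and the cross-correlation convention used by most deep learning libraries; the edge detector and the tiny images are made up for illustration.

```python
def feature_map(image, kernel):
    """Slide `kernel` over `image` (both 2-D lists) with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical detector for a vertical edge (dark on the left, bright on the right).
edge_detector = [[-1, 1],
                 [-1, 1]]

image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]

# Same pattern, moved one pixel to the left.
shifted = [[0, 1, 1, 1],
           [0, 1, 1, 1],
           [0, 1, 1, 1]]

print(feature_map(image, edge_detector))    # [[0, 2, 0], [0, 2, 0]]
print(feature_map(shifted, edge_detector))  # [[2, 0, 0], [2, 0, 0]]
```

The same detector fires wherever the edge is, and the position of the response simply moves with the pattern.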
Feature volume (Feature Volume)
A hidden layer in a convolutional neural network
A feature volume is a set of feature maps, each of which searches for a particular feature at a fixed set of locations on the input image. All features have the same receptive field size.
Fully connected layer as a feature volume (Fully connected layer as Feature Volume)
A fully connected layer with k hidden nodes (fc layer, usually attached to the end of a convolutional network for classification) can be seen as a 1x1xk feature volume.
This feature volume has one feature in each feature map, and its receptive field covers the entire image. Convolving a 1x1xk filter kernel with a 1x1xd feature volume creates a 1x1xk feature volume. Replacing fully connected layers with convolutional layers lets us apply the network to images of any size.
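The equivalence can be checked on a toy example. This is a minimal sketch, assuming a bias-free layer and a volume indexed as volume[y][x][channel]; the weights and inputs are arbitrary illustrative values.

```python
def fully_connected(vector, weights):
    """Dense layer without bias: `weights` has k rows of d values each."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def conv1x1(volume, weights):
    """Apply the same weights as a 1x1 convolution at every spatial position."""
    return [[fully_connected(pixel, weights) for pixel in row] for row in volume]


# k = 2 hidden nodes, d = 3 input channels
weights = [[1, 0, -1],
           [2, 1, 0]]

single = [[[1, 2, 3]]]                    # 1x1x3 feature volume
larger = [[[1, 2, 3], [0, 1, 0]],
          [[4, 0, 4], [1, 1, 1]]]         # 2x2x3 feature volume

print(fully_connected([1, 2, 3], weights))  # [-2, 4]
print(conv1x1(single, weights))             # [[[-2, 4]]]
print(conv1x1(larger, weights))             # [[[-2, 4], [0, 1]], [[0, 8], [0, 3]]]
```

On the 1x1x3 input the convolution reproduces the fully connected output exactly; on the larger input the same weights yield one such output per spatial position, which is what makes arbitrary image sizes possible.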
Transposed convolution (Transposed Convolution)
The operation that backpropagates the gradient of a convolution; in other words, it is the backward pass of a convolution. A transposed convolution can be implemented as a normal convolution with zeros inserted between the input features. A convolution with filter size k, stride s, and zero padding p has an associated transposed convolution with filter size k' = k, stride s' = 1, and zero padding p' = k - p - 1, with s - 1 zeros inserted between adjacent input units.
In the left part of the figure above, the red input unit contributes to the activation of 4 output units (through the 4 colored squares), so it receives gradients from those output units. This gradient backpropagation can be implemented by the transposed convolution shown on the right.
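The relationship between the two views can be verified numerically in one dimension. This is a minimal sketch, assuming the forward pass is y[o] = sum over a of kernel[a] * x_padded[o * s + a] (cross-correlation) and that n + 2p - k is divisible by s. Under that convention the kernel is flipped in the stride-1 convolution, a detail not spelled out in the text above. All function names are hypothetical.

```python
def input_gradient_by_scatter(grad_out, kernel, stride, padding, n):
    """Backward pass of a 1-D convolution: each output sends its gradient
    back to every (padded) input unit that helped activate it."""
    grad_padded = [0] * (n + 2 * padding)
    for o, g in enumerate(grad_out):
        for a, w in enumerate(kernel):
            grad_padded[o * stride + a] += w * g
    return grad_padded[padding:padding + n]


def insert_zeros(values, count):
    """Insert `count` zeros between adjacent elements of `values`."""
    out = []
    for index, value in enumerate(values):
        if index > 0:
            out.extend([0] * count)
        out.append(value)
    return out


def transposed_conv(grad_out, kernel, stride, padding):
    """Same result as a stride-1 convolution: k' = k, s' = 1, p' = k - p - 1,
    with s - 1 zeros inserted between the incoming units."""
    k = len(kernel)
    pad = k - padding - 1
    padded = [0] * pad + insert_zeros(grad_out, stride - 1) + [0] * pad
    flipped = kernel[::-1]
    return [
        sum(flipped[a] * padded[i + a] for a in range(k))
        for i in range(len(padded) - k + 1)
    ]


# Forward setting: input length n = 5, k = 3, s = 2, p = 1, so the output has 3 units.
kernel = [1, 2, 3]
grad_out = [1, 10, 100]

print(input_gradient_by_scatter(grad_out, kernel, stride=2, padding=1, n=5))
# [2, 13, 20, 130, 200]
print(transposed_conv(grad_out, kernel, stride=2, padding=1))
# [2, 13, 20, 130, 200]
```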
End-to-end object recognition pipeline (End-To-End object recognition pipeline)
An object recognition pipeline in which all stages (preprocessing, region proposal generation, proposal classification, post-processing) are trained by optimizing a single objective function that is differentiable with respect to the variables of every stage. Such an end-to-end pipeline is the opposite of a traditional object recognition pipeline, which connects its stages in a non-differentiable way. In those systems we do not know how changing a variable in one stage affects overall performance, so each stage must be trained independently or alternately, or programmed heuristically.
Object recognition concepts (Object Recognition Concepts)
Bounding box proposal (Bounding box proposal)
Region of interest, region proposal, box proposal
A rectangular region of the input image that may contain an object. Proposals can be generated by heuristic search (objectness, selective search) or by a region proposal network (RPN). A bounding box can be represented as a 4-element vector that stores either its two corner coordinates (x0, y0, x1, y1) or, more commonly, its center position together with its width and height (x, y, w, h). A bounding box is usually accompanied by a confidence score, which indicates how likely the box is to contain an object. The difference between two bounding boxes is usually measured by the L2 distance between their vectors; w and h can be log-transformed before the distance is computed.
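The two box representations and the distance measure can be written down directly. This is a minimal sketch, with boxes as plain tuples and hypothetical function names; it assumes widths and heights are strictly positive when the log transform is used.

```python
import math


def corners_to_center(box):
    """(x0, y0, x1, y1) -> (x, y, w, h) with (x, y) the box center."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0)


def center_to_corners(box):
    """(x, y, w, h) -> (x0, y0, x1, y1)."""
    x, y, w, h = box
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


def box_distance(a, b, log_size=False):
    """L2 distance between two (x, y, w, h) boxes.

    With log_size=True, w and h are log-transformed first, so size
    differences are compared as ratios rather than absolute amounts.
    """
    if log_size:
        a = (a[0], a[1], math.log(a[2]), math.log(a[3]))
        b = (b[0], b[1], math.log(b[2]), math.log(b[3]))
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


print(corners_to_center((0, 0, 4, 2)))               # (2.0, 1.0, 4, 2)
print(center_to_corners((2, 1, 4, 2)))               # (0.0, 0.0, 4.0, 2.0)
print(box_distance((0, 0, 2, 2), (3, 4, 2, 2)))      # 5.0
```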
Intersection over Union (Intersection over Union, IoU)
Measures the similarity between the ground truth box and the detected box: the area of their intersection divided by the area of their union.
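As a concrete illustration, IoU for axis-aligned boxes takes only a few lines. This is a minimal sketch, assuming boxes in corner format (x0, y0, x1, y1) with x0 < x1 and y0 < y1; the function name is hypothetical.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x0, y0, x1, y1)."""
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # 1.0 (identical boxes)
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.143
print(iou((0, 0, 2, 2), (5, 5, 6, 6)))  # 0.0 (no overlap)
```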
Non-maximum suppression (Non-Maximum Suppression, NMS)
Any detection box that overlaps significantly (IoU > IoU_threshold) with a detection box of higher confidence is suppressed (removed).
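The rule above is commonly implemented greedily: visit boxes in order of decreasing confidence and keep a box only if it does not overlap too much with any box already kept. The following is a minimal sketch of that greedy variant, assuming corner-format boxes and a default threshold of 0.5 chosen purely for illustration.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x0, y0, x1, y1)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - intersection)
    return intersection / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes that survive, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Suppress box i if it overlaps a kept, higher-scoring box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box has an IoU of about 0.68 with the first, so it is removed; the third box overlaps nothing and is kept.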
Bounding box regression (Bounding box regression)
By looking at an input region, we can infer the bounding box that better fits the object inside it, even if the object is only partially visible. The example above shows that the ground truth box can be inferred by looking at only part of the object.
Therefore, a regressor can be trained to look at an input region and predict the offset ∆(x, y, w, h) between the input region box and the ground truth box. If there is one regressor per object class, it is called class-specific regression; otherwise it is called class-agnostic regression (one regressor for all classes).
A bounding box regressor is usually accompanied by a bounding box classifier (confidence scorer) that estimates the confidence that an object exists in the box. The classifier can also be class-specific or class-agnostic. If no prior box is defined, the input region box plays the role of the prior box.
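The text does not say how the offset ∆(x, y, w, h) is parameterized. One widely used choice, assumed here purely for illustration, is the R-CNN-style encoding: the center shift is scaled by the reference box size, and the size change is expressed as a log ratio. The sketch below shows the regression target and how a predicted offset is turned back into a box; the function names are hypothetical.

```python
import math


def encode_offset(reference, truth):
    """Regression target for a reference box, both in (x, y, w, h) format.

    Assumed R-CNN-style parameterization, not specified by the original text.
    """
    rx, ry, rw, rh = reference
    gx, gy, gw, gh = truth
    return ((gx - rx) / rw, (gy - ry) / rh, math.log(gw / rw), math.log(gh / rh))


def decode_offset(reference, delta):
    """Apply a predicted offset to the reference box to get the final box."""
    rx, ry, rw, rh = reference
    dx, dy, dw, dh = delta
    return (rx + dx * rw, ry + dy * rh, rw * math.exp(dw), rh * math.exp(dh))


reference = (10.0, 10.0, 4.0, 4.0)   # input region box (or prior box)
truth = (12.0, 9.0, 8.0, 2.0)        # ground truth box

delta = encode_offset(reference, truth)
print(delta[:2])                      # (0.5, -0.25)
print(decode_offset(reference, delta))  # recovers the ground truth box up to rounding
```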
Prior box (Prior box)
Instead of using the input region as the only prior box, we can train multiple bounding box regressors, each looking at the same input region but having a different prior box and learning to predict the offset between its own prior box and the ground truth box. In this way, regressors with different prior boxes can learn to predict bounding boxes with different properties (aspect ratio, scale, location). Prior boxes can be predefined relative to the input region or learned by clustering. A proper box matching strategy is crucial for making training converge.
Box matching strategy (Box matching Strategy)
We cannot expect a bounding box regressor to predict the bounding box of an object that is too far from its input region or, more commonly, from its prior box. Therefore, we need a matching strategy to decide which prior box is matched with which ground truth box. Each match is a training example for regression. One possible strategy (MultiBox): match each ground truth box with the prior box that has the highest IoU.
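The highest-IoU strategy can be sketched as follows. This is a minimal sketch, assuming corner-format boxes and a small hand-made set of prior boxes; the function names are hypothetical, and ties are resolved in favor of the first prior box.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x0, y0, x1, y1)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - intersection)
    return intersection / union if union > 0 else 0.0


def match_highest_iou(truth_boxes, prior_boxes):
    """Map each ground truth index to the index of its best-overlapping prior box."""
    matches = {}
    for t, truth in enumerate(truth_boxes):
        matches[t] = max(range(len(prior_boxes)),
                         key=lambda p: iou(truth, prior_boxes[p]))
    return matches


prior_boxes = [(0, 0, 4, 4), (4, 0, 8, 4), (0, 4, 4, 8)]
truth_boxes = [(1, 0, 5, 4), (0, 5, 3, 8)]

print(match_highest_iou(truth_boxes, prior_boxes))  # {0: 0, 1: 2}
```

Each (ground truth, prior box) pair returned here would become one training example for the regressor attached to that prior box.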
Full picture ~