Summary of semantic segmentation learning (I) -- basic concepts

2022-06-12 07:30:00 【Nidiya is trying】

As the saying goes, "when the foundation is not solid, the ground shakes and the mountains sway." When learning a new field, understanding the basic concepts comes first. This post will be updated over time.

1、 Upsampling

A technique for increasing image resolution. In segmentation networks, upsampling is commonly used to restore a feature map to the resolution of the original image. Several ways to implement upsampling:

（1）unpooling: the reverse of max pooling. During max pooling, record the location of the maximum item in each window. During unpooling, write each maximum value back to its recorded location and fill every other position with 0.
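The record-then-restore idea can be sketched in NumPy as below. This is a minimal illustration that assumes non-overlapping 2×2 windows on a single 2D array; the function names `max_pool_with_indices` and `max_unpool` are made up for this example and do not come from any particular framework.

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """Non-overlapping k x k max pooling on a 2D array.

    Returns the pooled values and, for each window, the flat index
    (into x) of the position where the maximum was found.
    """
    h, w = x.shape
    assert h % k == 0 and w % k == 0, "sketch assumes sizes divisible by k"
    pooled = np.zeros((h // k, w // k), dtype=x.dtype)
    indices = np.zeros((h // k, w // k), dtype=np.int64)
    for i in range(h // k):
        for j in range(w // k):
            window = x[i * k:(i + 1) * k, j * k:(j + 1) * k]
            r, c = np.unravel_index(np.argmax(window), window.shape)
            pooled[i, j] = window[r, c]
            indices[i, j] = (i * k + r) * w + (j * k + c)
    return pooled, indices

def max_unpool(pooled, indices, out_shape):
    """Put each maximum back at its recorded location; all other positions are 0."""
    out = np.zeros(out_shape, dtype=pooled.dtype)
    out.flat[indices.ravel()] = pooled.ravel()
    return out

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 1, 2, 3],
              [0, 4, 1, 5]])
pooled, idx = max_pool_with_indices(x)
restored = max_unpool(pooled, idx, x.shape)
print(pooled)
print(restored)
```

Note that unpooling restores the resolution and the positions of the maxima, but the non-maximum values are lost.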

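（2）interpolation: for example nearest-neighbor, bilinear, or bicubic interpolation. Taking bilinear interpolation as the example (the original figure, taken from the Internet, is not reproduced here), the value at a point $p=(x, y)$ is estimated from its four neighboring grid points. First interpolate linearly in the x direction to obtain $f(R_{1})$ and $f(R_{2})$ at the intermediate points $R_{1}=(x, y_{1})$ and $R_{2}=(x, y_{2})$.

Then interpolate linearly in the y direction to obtain: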

$f(p)\approx \frac{y_{2}-y}{y_{2}-y_{1}}f(R_{1})+\frac{y-y_{1}}{y_{2}-y_{1}}f(R_{2})$
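The two-step procedure can be sketched in NumPy as follows. In the x-direction step the sketch assumes the four neighbors sit at the corners $(x_1,y_1)$, $(x_2,y_1)$, $(x_1,y_2)$, $(x_2,y_2)$ of a unit grid cell, so that $f(R_1)\approx\frac{x_2-x}{x_2-x_1}f(x_1,y_1)+\frac{x-x_1}{x_2-x_1}f(x_2,y_1)$ and similarly for $f(R_2)$; this corner notation is an assumption, since the original figure is missing. The helper names and the "align corners" sampling grid in `upsample_bilinear` are also choices made for this illustration only.

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Value of a 2D array at a real-valued location (x = column, y = row)."""
    h, w = img.shape
    # Top-left corner of the grid cell containing (x, y), clamped to the image.
    x1 = min(max(int(np.floor(x)), 0), w - 2)
    y1 = min(max(int(np.floor(y)), 0), h - 2)
    x2, y2 = x1 + 1, y1 + 1
    # Step 1: linear interpolation in x along the rows y1 and y2.
    # The denominators x2 - x1 and y2 - y1 equal 1 on a pixel grid.
    f_r1 = (x2 - x) * img[y1, x1] + (x - x1) * img[y1, x2]
    f_r2 = (x2 - x) * img[y2, x1] + (x - x1) * img[y2, x2]
    # Step 2: linear interpolation in y between R1 and R2.
    return (y2 - y) * f_r1 + (y - y1) * f_r2

def upsample_bilinear(img, out_h, out_w):
    """Resample img on an out_h x out_w grid whose corners match the input corners."""
    ys = np.linspace(0, img.shape[0] - 1, out_h)
    xs = np.linspace(0, img.shape[1] - 1, out_w)
    return np.array([[bilinear_sample(img, x, y) for x in xs] for y in ys])

img = np.array([[0.0, 10.0],
                [20.0, 30.0]])
center = bilinear_sample(img, 0.5, 0.5)
up = upsample_bilinear(img, 3, 3)
print(center)
print(up)
```

Deep learning frameworks differ in how they map output pixels to input coordinates, so their results can differ slightly from this corner-aligned sketch.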

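（3）deconvolution (transposed convolution): write the convolution as a multiplication by a convolution matrix, then multiply by the transpose of that matrix, which maps a small feature map back to a larger one. Transposed convolution is easiest to understand from this matrix perspective (the annotated figure, taken from the Internet, is not reproduced here).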
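The matrix view can be made concrete with a small NumPy sketch. It assumes a stride-1 convolution without padding (implemented as cross-correlation, as deep learning frameworks usually do) on a 4×4 input with a 3×3 kernel; the helper `conv_matrix` is a name made up for this example.

```python
import numpy as np

def conv_matrix(kernel, in_h, in_w):
    """Matrix C such that C @ x.ravel() is the flattened stride-1,
    no-padding cross-correlation of a (in_h, in_w) input with kernel."""
    kh, kw = kernel.shape
    out_h, out_w = in_h - kh + 1, in_w - kw + 1
    C = np.zeros((out_h * out_w, in_h * in_w))
    for i in range(out_h):
        for j in range(out_w):
            for a in range(kh):
                for b in range(kw):
                    C[i * out_w + j, (i + a) * in_w + (j + b)] = kernel[a, b]
    return C

kernel = np.arange(1.0, 10.0).reshape(3, 3)
x = np.arange(16.0).reshape(4, 4)

C = conv_matrix(kernel, 4, 4)          # shape (4, 16)
y = (C @ x.ravel()).reshape(2, 2)      # convolution: 4x4 -> 2x2

z = np.ones((2, 2))
up = (C.T @ z.ravel()).reshape(4, 4)   # transposed convolution: 2x2 -> 4x4
print(C.shape, y.shape, up.shape)
```

Multiplying by the transpose restores the spatial shape, not the original values: `C.T @ (C @ x.ravel())` is generally not equal to `x.ravel()`. This is why "transposed convolution" is the more accurate name than "deconvolution".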

2、 Context information

In the real world, a target never exists in isolation; it always has some relationship with the objects around it, and this relationship is what is usually called context information. The same holds inside an image: no pixel is isolated, and each pixel is related to the pixels around it. The context information of an image is therefore the relationship between one pixel and the other pixels, or the influence that other pixels have on that pixel.

3、 Semantic information

Semantic information is an abstraction of features, namely the characteristics that humans can define. It can be understood as the texture or color of an image, or the category of a target. Some people also divide image semantics into a visual layer, an object layer, and a concept layer. The visual layer is usually understood as the bottom layer: color, texture, shape, and so on. These features are called low-level feature semantics. The object layer is the middle layer; it usually covers attributes and characteristics, that is, the state of an object at a given moment. The concept layer is the high level: it is what the image expresses that is closest to human understanding.

Semantic segmentation separates out the highest-level semantics. For example, when labeling passing cars in autonomous driving, a car has many parts (rearview mirrors, tires, windows, and so on). These parts belong to the middle-layer semantics, and the network needs to label all of them as "car". In other words, segmentation targets the broadest content that the image is meant to express. Which level counts as "the semantics" is decided by people through the labels and the task: if the task is to segment the parts of a car, then the rearview mirrors, tires, windows, and so on become the semantics of the image.

4、 Location information

The position of a pixel in the image. Location information is important for segmentation because semantic segmentation requires the class labels to align with the original image pixel by pixel, so the position of each pixel has to be preserved or reintroduced.

5、 Translation invariance

No matter where the object is moved within the image, the result should be the same. For example, an image classifier should output the same class wherever the object appears.

6、 Translation equivariance

The system works the same way at every position, but its response changes with the location of the target. Instance segmentation, for example, needs translation equivariance: if the target is translated, the output instance mask should move accordingly. As the FCIS paper points out, a pixel may be foreground in one instance but background in an adjacent instance. In other words, the same pixel has different semantics at different relative positions and should produce different responses, which is also a form of translation equivariance.
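The difference between the two properties can be checked numerically. The sketch below is an illustration under simplifying assumptions: it uses a single convolution-like filter with wrap-around borders (so that shifts are exact) as the equivariant operation, and a global maximum over the feature map as the invariant one. The function name `circular_correlate` is made up for this example.

```python
import numpy as np

def circular_correlate(x, kernel):
    """Cross-correlation with wrap-around borders:
    out[i, j] = sum over (a, b) of kernel[a, b] * x[i + a, j + b] (indices modulo the size)."""
    out = np.zeros_like(x, dtype=float)
    kh, kw = kernel.shape
    for a in range(kh):
        for b in range(kw):
            out += kernel[a, b] * np.roll(x, shift=(-a, -b), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
x = rng.random((6, 6))
kernel = rng.random((3, 3))

shift = (2, 1)                                   # move the content 2 rows down, 1 column right
x_shifted = np.roll(x, shift, axis=(0, 1))

feat = circular_correlate(x, kernel)
feat_shifted = circular_correlate(x_shifted, kernel)

# Equivariance: shifting the input shifts the feature map by the same amount.
equivariant = np.allclose(feat_shifted, np.roll(feat, shift, axis=(0, 1)))
# Invariance: a global pooling of the feature map does not change at all.
invariant = np.isclose(feat.max(), feat_shifted.max())
print(equivariant, invariant)
```

Real networks only approximate these properties, because zero padding, strides, and pooling windows break the exact symmetry that the wrap-around border provides here.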

7、 Latent space

A compressed representation of the data. The latent-space representation contains all the important information needed to represent the original data points; for an image, its most important features are stored in the latent space. Some papers describe the latent space as the "latent feature space", so the latent space is what we usually call the "feature space".

8、 End to end

The input is the raw data and the output is the predicted result. There is no need for a separate, hand-designed feature extraction stage as in traditional machine learning.

9、 Conditional random field (CRF)

（1） What it is: a model of the conditional probability distribution of a set of output sequences given a set of input sequences.

（2） When to use it: when the state at each position of the output sequence needs to take the states of adjacent positions into account.

（3） Random process: a collection of random variables. A random process indexed by a spatial variable is called a random field.

（4） Markov random field: if the value assigned to a position depends only on the values of its adjacent positions and is independent of the values of non-adjacent positions, the random field is a Markov random field. A CRF is a Markov random field conditioned on a given set of observations; that is, a CRF takes the observed state as a precondition. The observations can be understood as the evidence used to make the decision.

（5） Principle: a conditional random field follows a Gibbs distribution

$P(\textup{X}=\textup{x}|\textup{I}) = \frac{1}{Z(\textup{I})}\exp(-E(\textup{x}|\textup{I}))$ ,

where the energy function is

$E(\textup{x}|\textup{I})=\sum_{i}{\psi}_{u}(x_{i})+\sum_{i<j}{\psi}_{p}(x_{i},x_{j})$ .

The first term of the energy function is the unary potential. It measures how likely pixel $i$, whose observed value is $y_{i}$, is to take the class label $x_{i}$. The unary potential can be taken directly from the output of a CNN: once trained, the CNN outputs for every pixel the probability of belonging to each class.

The second term is the pairwise (binary) potential. It describes the relationship between pixels: it encourages similar pixels to take the same label and lets pixels that differ strongly take different labels. The pairwise potential consists of a label compatibility term and a feature function. The label compatibility term is 0 when the two pixels carry the same label and 1 otherwise, so a pair of pixels only adds energy when its labels differ. The feature function measures the similarity between the two pixels, so the penalty is largest for similar pixels with different labels. The position and color information used in the pairwise potential come from the original image.

The smaller the energy $E(\textup{x})$, the more accurate the predicted class labels. The final result is obtained by iteratively minimizing the energy function, which amounts to inference over the CRF's latent variable $\textup{X}$.
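The energy function can be evaluated on a toy example to see why a smoother labeling is preferred. The sketch below rests on several assumptions that are not stated in the text above: the unary potential is taken as the negative log of the CNN probability, the feature function is a single Gaussian kernel over per-pixel feature vectors, and the pairs are enumerated by brute force. It only evaluates $E$; it does not perform the iterative minimization. The name `crf_energy` and the parameters `weight` and `theta` are made up for this illustration.

```python
import numpy as np

def crf_energy(labels, probs, features, weight=1.0, theta=0.3):
    """Energy E(x | I) = sum_i psi_u(x_i) + sum_{i<j} psi_p(x_i, x_j).

    labels:   (N,) integer class label per pixel
    probs:    (N, K) per-pixel class probabilities (e.g. a CNN softmax output)
    features: (N, D) per-pixel features (e.g. position and color)
    """
    labels = np.asarray(labels)
    n = len(labels)
    # Unary term (assumed form): negative log-probability of the chosen label.
    unary = -np.log(probs[np.arange(n), labels]).sum()
    pairwise = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # Label compatibility: 0 for equal labels, 1 otherwise.
            compat = 0.0 if labels[i] == labels[j] else 1.0
            # Feature function (assumed form): Gaussian similarity of the two pixels.
            d2 = np.sum((features[i] - features[j]) ** 2)
            pairwise += compat * weight * np.exp(-d2 / (2 * theta ** 2))
    return unary + pairwise

# Four pixels in a row: two dark, two bright. The CNN is unsure about pixel 1.
features = np.array([[0.1], [0.2], [0.9], [1.0]])   # intensity only
probs = np.array([[0.90, 0.10],
                  [0.45, 0.55],
                  [0.10, 0.90],
                  [0.10, 0.90]])

labels_cnn = probs.argmax(axis=1)        # [0, 1, 1, 1]: pixel 1 follows the CNN
labels_smooth = np.array([0, 0, 1, 1])   # pixel 1 agrees with its similar neighbor

e_cnn = crf_energy(labels_cnn, probs, features)
e_smooth = crf_energy(labels_smooth, probs, features)
print(e_cnn, e_smooth)
```

With these numbers the labeling that groups the two dark pixels together has the lower energy, even though it disagrees with the CNN's per-pixel argmax at pixel 1. For full-size images the pairwise sum is not enumerated like this; efficient approximate inference is used instead.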

———————————————————————————————————————————

References:

https://blog.csdn.net/weixin_44350541/article/details/105171196

https://zhuanlan.zhihu.com/p/52407509

Conditional random field (CRF) - Jianshu

If you have any questions or spot any mistakes, please send me a private message.
