Semantic segmentation study notes (I): basic concepts
2022-06-12 07:30:00 【Nidiya is trying】
“If the foundation is not solid, everything built on it shakes.” When learning any field, understanding its basic concepts is the most important step. This post will keep being updated ~~~
1、 Upsampling
Upsampling is a technique for increasing image resolution. Many upsampling methods are now used to restore the resolution of a feature map to that of the original image. Common ways to implement upsampling:
(1) Unpooling: the reverse of max pooling. During max pooling, record the position of each maximum; during unpooling, put each maximum back at its recorded position and fill all other positions with 0.

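To make the record-then-restore idea concrete, here is a minimal sketch in plain Python lists. It assumes a 2×2 window with stride 2 and input sides divisible by the window size; the function names `max_pool_with_indices` and `max_unpool` are hypothetical helpers, not a library API (frameworks ship their own versions; PyTorch, for example, pairs `nn.MaxPool2d(..., return_indices=True)` with `nn.MaxUnpool2d`).

```python
def max_pool_with_indices(x, k=2):
    """k x k max pooling with stride k that also records where each maximum came from."""
    h, w = len(x), len(x[0])
    pooled, indices = [], []
    for top in range(0, h, k):
        p_row, i_row = [], []
        for left in range(0, w, k):
            window = [(top + i, left + j) for i in range(k) for j in range(k)]
            r, c = max(window, key=lambda pos: x[pos[0]][pos[1]])  # position of the max
            p_row.append(x[r][c])
            i_row.append((r, c))
        pooled.append(p_row)
        indices.append(i_row)
    return pooled, indices


def max_unpool(pooled, indices, out_h, out_w):
    """Put each pooled value back at its recorded position and fill every other position with 0."""
    out = [[0] * out_w for _ in range(out_h)]
    for p_row, i_row in zip(pooled, indices):
        for value, (r, c) in zip(p_row, i_row):
            out[r][c] = value
    return out


x = [[1, 3, 2, 0],
     [4, 2, 1, 5],
     [0, 1, 3, 2],
     [2, 6, 1, 1]]
pooled, idx = max_pool_with_indices(x)
print(pooled)                         # [[4, 5], [6, 3]]
print(max_unpool(pooled, idx, 4, 4))
# [[0, 0, 0, 0], [4, 0, 0, 5], [0, 0, 3, 0], [0, 6, 0, 0]]
```

The unpooled map has the original size, but only the maxima survive; everything else is lost, which is why unpooling is usually followed by further convolutions.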
(2) Interpolation: for example, nearest-neighbor interpolation, bilinear interpolation, and bicubic interpolation. The figure shows an example of bilinear interpolation (source: the Internet).

Bilinear interpolation first interpolates linearly in the x direction and then interpolates linearly in the y direction to obtain the final value.

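The two-step procedure (x direction first, then y direction) can be sketched as follows. This is an illustration in plain Python, not any library's resize routine; it assumes the "align corners" convention, where the corner pixels of input and output map exactly onto each other. Many libraries default to a half-pixel mapping instead, so their numbers can differ slightly. The helper names are hypothetical.

```python
def bilinear_sample(img, y, x):
    """Value at a fractional position (y, x): interpolate along x on two rows, then along y."""
    h, w = len(img), len(img[0])
    y0, x0 = int(y), int(x)                          # top-left neighbour (y, x >= 0 assumed)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)  # clamp at the border
    dy, dx = y - y0, x - x0
    top = (1 - dx) * img[y0][x0] + dx * img[y0][x1]      # x direction, upper row
    bottom = (1 - dx) * img[y1][x0] + dx * img[y1][x1]   # x direction, lower row
    return (1 - dy) * top + dy * bottom                  # y direction


def upsample_bilinear(img, out_h, out_w):
    """Resize so that the corner pixels of input and output line up (align-corners convention)."""
    h, w = len(img), len(img[0])
    sy = (h - 1) / (out_h - 1) if out_h > 1 else 0.0
    sx = (w - 1) / (out_w - 1) if out_w > 1 else 0.0
    return [[bilinear_sample(img, i * sy, j * sx) for j in range(out_w)]
            for i in range(out_h)]


print(upsample_bilinear([[0, 10], [20, 30]], 3, 3))
# [[0.0, 5.0, 10.0], [10.0, 15.0, 20.0], [20.0, 25.0, 30.0]]
```

Each new pixel is a weighted average of its four nearest input pixels, with weights that shrink linearly with distance; no parameters are learned.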
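(3) Deconvolution (more precisely, transposed convolution): an ordinary convolution can be written as a matrix product y = Cx, where C is a sparse matrix built from the kernel weights. Transposed convolution multiplies by the transposed matrix C^T instead, which maps the smaller output back to the size of the original input; it restores the shape, not the original values. I annotated a figure from the Internet to explain transposed convolution from this matrix perspective.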

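The matrix view can be checked numerically in one dimension. The sketch below assumes no padding and CNN-style convolution (the kernel is not flipped); it builds the matrix C explicitly, computes C^T y, and compares that with a direct "scatter-add" form of transposed convolution. All function names are hypothetical helpers written for this illustration.

```python
def conv_matrix(kernel, in_len, stride=1):
    """Matrix C with conv1d(x) == C @ x (no padding; CNN-style, kernel not flipped)."""
    k = len(kernel)
    out_len = (in_len - k) // stride + 1
    C = [[0] * in_len for _ in range(out_len)]
    for r in range(out_len):
        for j, wgt in enumerate(kernel):
            C[r][r * stride + j] = wgt
    return C


def matvec(M, v):
    return [sum(m * e for m, e in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def transposed_conv1d(y, kernel, stride=1):
    """Direct form: scatter each input value, scaled by the kernel, into a longer output."""
    k = len(kernel)
    out = [0] * ((len(y) - 1) * stride + k)
    for r, v in enumerate(y):
        for j, wgt in enumerate(kernel):
            out[r * stride + j] += v * wgt
    return out


kernel, x = [1, 2, 3], [1, 2, 3, 4]
C = conv_matrix(kernel, len(x))        # 2 x 4
y = matvec(C, x)                       # ordinary convolution: length 4 -> 2
print(y)                               # [14, 20]
print(matvec(transpose(C), y))         # C^T y: length 2 -> 4, [14, 48, 82, 60]
print(transposed_conv1d(y, kernel))    # same result: [14, 48, 82, 60]
```

The output has the length of x again, but its values are not x; in a network the kernel weights of the transposed convolution are learned, so it acts as a trainable upsampling layer.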
2、 Context information
In the real world, objects do not exist in isolation; each is related, more or less, to the objects around it. This is what is commonly called context information. Likewise, no pixel in an image is isolated: it has some relationship with the surrounding pixels. Context information in an image is therefore the relationship between a pixel and other pixels, or the influence that other pixels have on that pixel.
3、 Semantic information
Semantic information refers to characteristics that humans can define, that is, an abstraction of features. It can be understood as an image's texture, color, or the category of a target. Some people divide image semantics into a visual layer, an object layer, and a concept layer. The visual layer is the bottom layer: color, texture, shape, and so on, which are called low-level feature semantics. The object layer is the middle layer; it usually includes attributes and characteristics and describes the state of an object at a certain moment. The concept layer is the top layer and is the closest to what the image expresses in human understanding.
Semantic segmentation separates out the highest-level semantics. For example, when labeling passing cars in autonomous driving, a car has many parts (rearview mirrors, tires, windows, and so on); these all belong to the middle-layer semantics, and the network must label all of these parts as "car". In other words, it segments the content that the image expresses at the largest scope. That scope, however, is set by people: if the labels and task specify segmenting the parts of a car, then the rearview mirrors, tires, windows, and so on become the semantics of the image.
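The point that the semantic scope is set by the task can be shown with a label mask. The sketch below is purely illustrative: the part IDs, the mapping `PART_TO_OBJECT`, and the helper `remap_mask` are hypothetical. The same part-level annotation becomes an object-level "car" mask once the task only distinguishes car from background.

```python
# Hypothetical part-level IDs: 0 background, 1 rearview mirror, 2 tire, 3 window, 4 car body
# Hypothetical task definition: an object-level task only distinguishes "car" from background
PART_TO_OBJECT = {0: 0, 1: 1, 2: 1, 3: 1, 4: 1}   # 0 = background, 1 = car


def remap_mask(mask, mapping):
    """Relabel every pixel of a 2-D label mask according to the task's label definition."""
    return [[mapping[label] for label in row] for row in mask]


part_mask = [[0, 3, 3, 0],
             [1, 4, 4, 0],
             [0, 2, 2, 0]]
print(remap_mask(part_mask, PART_TO_OBJECT))
# [[0, 1, 1, 0], [1, 1, 1, 0], [0, 1, 1, 0]]
```

For a part segmentation task the original `part_mask` would itself be the target; only the label definition changes, not the image.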
4、 Location information
The position of each pixel in the image. Location information is important for segmentation: semantic segmentation requires the predicted category labels to be aligned pixel by pixel with the original image, so the position information of pixels must be introduced and preserved.
5、 Translation invariance
No matter where an object is moved within the image, the result should stay the same. For example, a classifier should output the same category wherever the object appears.
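A small sketch of where invariance comes from, assuming plain-Python 2-D lists: global max pooling collapses a whole feature map into one number and discards where the maximum was, so its output does not change when the object moves (as long as the object stays inside the image). The helpers `translate` and `global_max_pool` and the 6×6 toy image are hypothetical.

```python
def translate(image, dy, dx):
    """Shift a 2-D image by (dy, dx) pixels, filling the uncovered area with 0."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = image[sy][sx]
    return out


def global_max_pool(feature_map):
    """Collapse a whole map to one number; the position of the maximum is discarded."""
    return max(max(row) for row in feature_map)


img = [[0] * 6 for _ in range(6)]
img[1][1], img[1][2] = 7, 3            # a small "object" near the top-left corner
moved = translate(img, 3, 2)           # the same object, moved down 3 and right 2
print(global_max_pool(img), global_max_pool(moved))  # 7 7
```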
6、 Translation equivariance
The system works the same way at every position, but its response changes with the target's location. For example, instance segmentation requires translation equivariance: if the target is translated, the output instance mask should move accordingly. As mentioned in the FCIS paper, a pixel may be foreground in one instance but background in an adjacent instance. In other words, the same pixel has different semantics at different relative positions and should produce different responses, which is also a form of translation equivariance.
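Convolution is the standard example of an equivariant operation: convolving a shifted image gives the same result as shifting the convolved image, so the response map moves together with the object. The sketch assumes zero padding, a CNN-style kernel (cross-correlation, not flipped), and an object far enough from the border that nothing is clipped; `translate` and `conv2d_same` are hypothetical helpers.

```python
def translate(image, dy, dx):
    """Shift a 2-D image by (dy, dx) pixels, filling the uncovered area with 0."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = image[sy][sx]
    return out


def conv2d_same(image, kernel):
    """CNN-style 2-D convolution (cross-correlation) with zero padding; output keeps the input size."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    sy, sx = y + i - ph, x + j - pw
                    if 0 <= sy < h and 0 <= sx < w:
                        acc += kernel[i][j] * image[sy][sx]
            out[y][x] = acc
    return out


kernel = [[1, 2, 1],
          [0, 1, 0],
          [1, 0, 1]]
img = [[0] * 7 for _ in range(7)]
img[2][2], img[2][3] = 1, 2                 # small object away from the border

# Equivariance: convolving the moved image == moving the convolved image
a = conv2d_same(translate(img, 1, 1), kernel)
b = translate(conv2d_same(img, kernel), 1, 1)
print(a == b)                               # True
```

Compared with the global max pooling case, the output here is a map rather than a single number, so the translation is carried through to the output instead of being discarded.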
7、 Latent space
Latent space is a representation of compressed data. The latent representation of the data contains all the important information needed to represent the original data points, so the most important features of an image are stored in the latent space. Some papers describe it as a "latent feature space", so the latent space is essentially what we usually call the "feature space".
8、 End-to-end
The input is raw data and the output is the prediction; unlike traditional machine learning, there is no need for a separate series of hand-crafted feature-extraction steps.
9、 CRF: Conditional Random Fields
(1) What it is: a model of the conditional probability distribution of one set of output random variables (for example, an output sequence) given a set of input random variables.
(2) When to use it: when the state at each position of the output must take the states of neighboring positions into account.
(3) Random process: a collection of random variables. A random process indexed by a spatial variable is called a random field.
(4) Markov random field: if the value assigned to a position depends only on the values at its adjacent positions and is independent of the values at non-adjacent positions, the random field is a Markov random field. A CRF is a Markov random field conditioned on a set of observed states; in other words, a CRF takes the observed states as prior conditions. The observed states can be understood as a set of candidate options used to make decisions.
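(5) Principle: a conditional random field follows the Gibbs distribution
$$P(X = x \mid Y) = \frac{1}{Z(Y)} \exp\big(-E(x)\big),$$
where $Y$ denotes the observed pixel values $y_i$, $Z(Y)$ is the normalizing constant (partition function), and $E(x)$ is the energy function
$$E(x) = \sum_{i} \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j).$$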
The first term of the energy function is the unary potential. It measures how likely pixel $i$, whose value is $y_i$, is to belong to category label $x_i$ (in practice it is usually the negative log of that probability, so a more likely label gives lower energy). The unary potential can be taken directly from the CNN output: after training, the CNN outputs, for every pixel, the probability of each category. The second term is the pairwise potential, which describes the relationship between pixels: it encourages similar pixels to take the same label and lets very different pixels take different labels at little cost. The pairwise potential is the product of a label compatibility term and feature functions. The label compatibility term controls how energy passes between pixels: it is 0 when the two pixels have the same label and 1 otherwise, so the pairwise cost is paid only when two pixels receive different labels. The feature functions measure how similar two pixels are, using position and color information taken from the original image. The smaller the energy $E(x)$, the more accurate the predicted labels; the final result is obtained by iteratively minimizing the energy function, which carries out inference of the CRF's hidden variable $X$.
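A toy sketch of the energy function described above, assuming a 1-D grayscale "image" of six pixels and made-up CNN probabilities. It follows the fully connected form in the text (unary term = -log probability, Potts label compatibility, one Gaussian feature kernel on position and color), but the kernel parameters `w`, `theta_alpha`, `theta_beta` are illustrative assumptions, and the minimizer is iterated conditional modes (ICM), a simple greedy method swapped in for readability. It is not the mean-field inference typically used for dense CRFs.

```python
import math


def unary_potentials(probs):
    """psi_u(x_i) = -log P(x_i): a label the CNN finds likely gets low energy."""
    return [[-math.log(p) for p in row] for row in probs]


def feature_kernel(i, j, colors, w=1.0, theta_alpha=2.0, theta_beta=10.0):
    """Similarity of pixels i and j from position and colour (illustrative, untuned values)."""
    d_pos = (i - j) ** 2
    d_col = (colors[i] - colors[j]) ** 2
    return w * math.exp(-d_pos / (2 * theta_alpha ** 2) - d_col / (2 * theta_beta ** 2))


def energy(labels, unaries, colors):
    """E(x) = sum_i psi_u(x_i) + sum_{i<j} mu(x_i, x_j) * k(f_i, f_j), fully connected."""
    n = len(labels)
    e = sum(unaries[i][labels[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] != labels[j]:  # Potts compatibility: 1 if labels differ, else 0
                e += feature_kernel(i, j, colors)
    return e


def icm(labels, unaries, colors, num_labels, max_iters=10):
    """Iterated conditional modes: greedily give each pixel its lowest-energy label."""
    labels = list(labels)
    for _ in range(max_iters):
        changed = False
        for i in range(len(labels)):
            best = min(range(num_labels),
                       key=lambda lab: energy(labels[:i] + [lab] + labels[i + 1:], unaries, colors))
            if best != labels[i]:
                labels[i], changed = best, True
        if not changed:
            break
    return labels


# A 1-D toy "image": three dark pixels followed by three bright ones
colors = [10, 10, 10, 200, 200, 200]
# Hypothetical CNN output P(label | pixel) for labels 0 and 1; pixel 1 is noisy
probs = [[0.9, 0.1], [0.4, 0.6], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9], [0.1, 0.9]]

unaries = unary_potentials(probs)
start = [max(range(2), key=lambda lab: p[lab]) for p in probs]   # per-pixel argmax
result = icm(start, unaries, colors, num_labels=2)
print(start)    # [0, 1, 0, 1, 1, 1]
print(result)   # [0, 0, 0, 1, 1, 1]
print(energy(result, unaries, colors) < energy(start, unaries, colors))  # True
```

The noisy pixel 1 is relabeled because its two dark neighbors are close in both position and color, so disagreeing with them costs more than its slightly higher CNN probability saves; the bright pixels differ strongly in color, so the boundary between the two groups costs almost nothing.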
———————————————————————————————————————————
References:
https://blog.csdn.net/weixin_44350541/article/details/105171196
https://zhuanlan.zhihu.com/p/52407509
Conditional random vector field CRF - Jianshu
If you have any questions or spot any mistakes, please send me a private message ~