Machine learning notes - Convolutional neural network cheat sheet

2022-06-11 09:08:00 Sit and watch the clouds rise

One、Overview

A traditional convolutional neural network, also known as a CNN, is a specific type of neural network that is generally composed of the following layers:

Two、The main types of layers

1、Convolution layer (CONV)

The convolution layer (CONV) uses filters that perform convolution operations while scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and the stride S. The resulting output O is called a feature map or activation map.

Remark: the convolution step can also be generalized to the 1D and 3D cases.
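
As a rough illustration (a naive single-channel numpy sketch, not how deep learning frameworks actually implement convolutions), the scanning of the input with a filter of size F and stride S can be written as:

import numpy as np

def conv2d(I, K, stride=1):
    """Naive 2D convolution (single channel, no padding)."""
    F = K.shape[0]                       # filter size (F x F)
    O = (I.shape[0] - F) // stride + 1   # output size along one dimension
    out = np.zeros((O, O))
    for i in range(O):
        for j in range(O):
            patch = I[i*stride:i*stride+F, j*stride:j*stride+F]
            out[i, j] = np.sum(patch * K)  # one convolution operation per position
    return out

I = np.random.rand(5, 5)   # 5x5 input
K = np.ones((3, 3))        # 3x3 filter
print(conv2d(I, K).shape)  # (3, 3) feature map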

2、Pooling (POOL)

The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which provides some spatial invariance. In particular, max pooling and average pooling are special kinds of pooling, where the maximum and average values are taken, respectively.

Max pooling: each pooling operation selects the maximum value of the current view.

Average pooling: each pooling operation averages the values of the current view.
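
A minimal numpy sketch of both kinds of pooling (single channel, with a hypothetical window size and stride of 2):

import numpy as np

def pool2d(I, size=2, stride=2, mode="max"):
    """Naive 2D pooling (single channel): 'max' or 'average'."""
    O = (I.shape[0] - size) // stride + 1
    out = np.zeros((O, O))
    for i in range(O):
        for j in range(O):
            patch = I[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = patch.max() if mode == "max" else patch.mean()
    return out

I = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(I, mode="max"))      # maximum of each 2x2 view
print(pool2d(I, mode="average"))  # average of each 2x2 view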

3、Fully Connected (FC)

The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.

Three、Filter hyperparameters

The convolution layer contains filters, and it is important to understand the meaning behind their hyperparameters.

1、Dimensions of a filter

A filter of size F\times F applied to an input containing C channels is a volume of size F \times F \times C that performs convolutions on an input of size I \times I \times C and produces an output feature map (also called activation map) of size O \times O \times 1.

Remark: the application of K filters of size F\times F results in an output feature map of size O \times O \times K.

2、Stride

For a convolution or pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.

3、Zero-padding

Zero-padding denotes the process of adding P zeros to each side of the boundaries of the input. This value can either be specified manually or set automatically through one of the commonly used padding modes (such as 'valid', 'same' or 'full').

Four、Tuning hyperparameters

1、 Parameter compatibility in convolution layer

By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, and S the stride, the output size O of the feature map along that dimension is given by the following formula:
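
Assuming P zeros are added on each side of the input (the convention defined above), the standard relation is:

        \boxed{O=\frac{I-F+2P}{S}+1}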

2、Understanding the complexity of the model

In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, this can be done as sketched below.
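
A minimal sketch, assuming the usual convention that each filter and each output neuron carries one bias term (POOL layers have no parameters to learn):

def conv_params(F, C_in, K):
    """CONV layer: K filters of size F x F x C_in, plus one bias per filter."""
    return (F * F * C_in + 1) * K

def fc_params(n_in, n_out):
    """FC layer: one weight per input-output pair, plus one bias per output neuron."""
    return (n_in + 1) * n_out

def pool_params():
    """POOL layer: a pure downsampling operation, no learnable parameters."""
    return 0

print(conv_params(F=3, C_in=3, K=64))    # 1792
print(fc_params(n_in=4096, n_out=1000))  # 4097000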

3、Receptive field

The receptive field at layer k is the area, denoted R_k \times R_k, of the input that each pixel of the k-th activation map can "see". By calling F_j the filter size of layer j and S_i the stride of layer i, and using the convention S_0 = 1, the receptive field of layer k can be computed with the following formula:
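
The expression below is the standard one and is consistent with the worked example that follows:

        \boxed{R_k=1+\sum_{j=1}^{k}(F_j-1)\prod_{i=0}^{j-1}S_i}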

For example, with F_1 = F_2=3 and S_1 = S_2=1, we get R_2 =1+2\cdot 1+2\cdot 1=5.

Five、Common activation functions

(1)Rectified Linear Unit

The rectified linear unit layer (ReLU) is an activation function g that is applied elementwise to the volume. It aims at introducing non-linearities into the network. Common variants such as the Leaky ReLU and the ELU keep a small non-zero output for negative inputs, as sketched below.
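
A minimal numpy sketch (the small constants eps and alpha below are hypothetical choices):

import numpy as np

def relu(z):
    return np.maximum(z, 0)                              # g(z) = max(z, 0)

def leaky_relu(z, eps=0.01):
    return np.where(z > 0, z, eps * z)                   # small linear slope for z < 0

def elu(z, alpha=1.0):
    return np.where(z > 0, z, alpha * (np.exp(z) - 1))   # smooth saturation for z < 0

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z), leaky_relu(z), elu(z))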

(2)Softmax

The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x\in\mathbb{R}^n and outputs a probability vector p\in\mathbb{R}^n through a softmax function at the end of the architecture. It is defined as follows:

        \boxed{p=\begin{pmatrix}p_1\\\vdots\\p_n\end{pmatrix}}\quad\textrm{where}\quad\boxed{p_i=\frac{e^{ x_i}}{\displaystyle\sum_{j=1}^ne^{x_j}}}
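
A direct numpy transcription of this definition (with the usual subtraction of the maximum score, which does not change the result but avoids overflow):

import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # shift by the max for numerical stability
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p, p.sum())  # a probability vector that sums to 1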

Six、Object detection

(1) Model type

There are 3 main types of object recognition algorithms, which differ in the nature of what is predicted: image classification, classification with localization, and detection.

(2)Detection

In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are bounding box detection, which locates the part of the image where the object is, and landmark detection, which detects a set of characteristic points of a shape.

(3)Intersection over Union (IoU)

Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box B_p is over the actual bounding box B_a. It is defined as:
        \boxed{\textrm{IoU}(B_p,B_a)=\frac{B_p\cap B_a}{B_p\cup B_a}}

Remark: we always have IoU\in[0,1]. By convention, a predicted bounding box B_p is considered reasonably good if \textrm{IoU}(B_p,B_a)\geqslant0.5.
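
A minimal sketch, assuming boxes are given as (x1, y1, x2, y2) corner coordinates:

def iou(box_p, box_a):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_p[0], box_a[0]), max(box_p[1], box_a[1])
    x2, y2 = min(box_p[2], box_a[2]), min(box_p[3], box_a[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    union = area_p + area_a - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 / 7 ≈ 0.14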

 (4)Anchor boxes

Anchor boxes are a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box at the same time, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can be a rectangular box of a given shape, while the second can be another rectangular box of a different geometrical form.

(5)Non-max suppression

The non-max suppression technique aims at removing duplicate overlapping bounding boxes of the same object by selecting the most representative ones. After removing all boxes with a predicted probability lower than 0.6, the following steps are repeated while boxes remain:

For a given class,
        • Step 1: pick the box with the largest prediction probability.
        • Step 2: discard any box having an IoU ⩾ 0.5 with the previous box.
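
A minimal sketch of these steps for one class, reusing the iou helper from the sketch above (the 0.6 and 0.5 thresholds are the ones stated in the text):

def non_max_suppression(boxes, prob_threshold=0.6, iou_threshold=0.5):
    """boxes: list of (probability, (x1, y1, x2, y2)) predictions for one class."""
    # remove all boxes with a probability prediction lower than the threshold
    boxes = [b for b in boxes if b[0] >= prob_threshold]
    kept = []
    while boxes:
        # Step 1: pick the box with the largest prediction probability
        best = max(boxes, key=lambda b: b[0])
        kept.append(best)
        # Step 2: discard any box with IoU >= 0.5 with the selected box
        boxes = [b for b in boxes
                 if b is not best and iou(b[1], best[1]) < iou_threshold]
    return kept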

(6)YOLO

You Only Look Once (YOLO) is an object detection algorithm that performs the following steps:

Step 1: divide the input image into a G×G grid.

Step 2: for each grid cell, run a CNN that predicts y of the following form:

        \boxed{y=\big[\underbrace{p_c,b_x,b_y,b_h,b_w,c_1,c_2,...,c_p}_{\textrm{repeated }k\textrm{ times}},...\big]^T\in\mathbb{R}^{G\times G\times k\times(5+p)}}

where p_c is the probability of detecting an object, b_x, b_y, b_h, b_w are the properties of the detected bounding box, c_1,...,c_p is a one-hot representation of which of the p classes was detected, and k is the number of anchor boxes.

Step 3: run the non-max suppression algorithm to remove any potential duplicate overlapping bounding boxes.
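
As an illustration of the shape of the prediction volume only (the grid size, number of anchor boxes and number of classes below are hypothetical values, not fixed by YOLO itself):

import numpy as np

G, k, p = 19, 5, 80              # hypothetical grid size, anchor boxes and classes
y = np.zeros((G, G, k, 5 + p))   # p_c, b_x, b_y, b_h, b_w and p class scores per anchor

cell = y[3, 7, 0]                # prediction of anchor box 0 in grid cell (3, 7)
p_c, box, classes = cell[0], cell[1:5], cell[5:]
print(y.shape)                   # (19, 19, 5, 85)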

(7)R-CNN

Regions with Convolutional Neural Networks (R-CNN) is an object detection algorithm that first segments the image to find potential relevant bounding boxes, and then runs a detection algorithm to find the most probable objects in those bounding boxes.

Remark: although the original algorithm is computationally expensive and slow, newer architectures such as Fast R-CNN and Faster R-CNN allow the algorithm to run faster.

Seven、Face verification and recognition

(1) Model type

The two main model types are face verification (checking whether an image corresponds to a claimed identity, a one-to-one lookup) and face recognition (finding which of the known identities an image corresponds to, a one-to-many lookup).

(2)One Shot Learning

One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(\textrm{image 1}, \textrm{image 2}).

(3)Siamese Network

Siamese Networks aim at learning how to encode images and then quantify how different two images are. For a given input image x^{(i)}, the encoded output is often noted f(x^{(i)}).

(4)Triplet loss

The triplet loss \ell is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to the same class, while the negative example belongs to another one. By calling \alpha\in\mathbb{R}^+ the margin parameter, this loss is defined as follows:
        \boxed{\ell(A,P,N)=\max\left(d(A,P)-d(A,N)+\alpha,0\right)}
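
A minimal numpy sketch, assuming the embeddings are already computed and that d is the squared Euclidean distance (the margin alpha = 0.2 is a hypothetical choice):

import numpy as np

def d(u, v):
    """Squared Euclidean distance between two embeddings."""
    return np.sum((u - v) ** 2)

def triplet_loss(f_A, f_P, f_N, alpha=0.2):
    """f_A, f_P, f_N: embeddings of the anchor, positive and negative images."""
    return max(d(f_A, f_P) - d(f_A, f_N) + alpha, 0.0)

f_A = np.array([0.0, 1.0])
f_P = np.array([0.1, 0.9])   # same identity as the anchor
f_N = np.array([1.0, 0.0])   # different identity
print(triplet_loss(f_A, f_P, f_N))  # 0.0: the negative is already far enough away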

Eight、Neural style transfer

(1)Motivation

The goal of neural style transfer is to generate an image G based on a given content C and a given style S.

(2)Activation

In a given layer l, the activation is noted a^{[l]} and is of dimensions n_H\times n_w\times n_c.

(3)Content cost function

The content cost function J_{\textrm{content}}(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:

        \boxed{J_{\textrm{content}}(C,G)=\frac{1}{2}||a^{[l](C)}-a^{[l](G)}||^2}

(4)Style matrix

The style matrix G^{[l]} of a given layer l is a Gram matrix, where each of its elements G_{kk'}^{[l]} quantifies how correlated the channels k and k' are. It is defined with respect to the activations a^{[l]} as follows:
        \boxed{G_{kk'}^{[l]}=\sum_{i=1}^{n_H^{[l]}}\sum_{j=1}^{n_w^{[l]}}a_{ijk}^{[l]}a_{ijk'}^{[l]}}

Remark: the style matrices of the style image and of the generated image are noted G^{[l](S)} and G^{[l](G)} respectively.

(5)Style cost function

The style cost function J_{\textrm{style}}(S,G) is used to determine how the generated image G differs from the style S. It is defined as follows:

\boxed{J_{\textrm{style}}^{[l]}(S,G)=\frac{1}{(2n_Hn_wn_c)^2}||G^{[l](S)}-G^{[l](G)}||_F^2=\frac{1}{(2n_Hn_wn_c)^2}\sum_{k,k'=1}^{n_c}\Big(G_{kk'}^{[l](S)}-G_{kk'}^{[l](G)}\Big)^2}
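
A minimal numpy sketch of the per-layer style cost, computing the Gram matrices exactly as defined above (the toy activation shapes are arbitrary):

import numpy as np

def gram_matrix(a):
    """a: activation of shape (n_H, n_w, n_c); returns the (n_c, n_c) Gram matrix."""
    n_H, n_w, n_c = a.shape
    flat = a.reshape(n_H * n_w, n_c)   # one row per spatial position (i, j)
    return flat.T @ flat               # G[k, k'] = sum_{i,j} a_ijk * a_ijk'

def style_cost_layer(a_S, a_G):
    """Per-layer style cost between the style image and the generated image."""
    n_H, n_w, n_c = a_S.shape
    G_S, G_G = gram_matrix(a_S), gram_matrix(a_G)
    return np.sum((G_S - G_G) ** 2) / (2 * n_H * n_w * n_c) ** 2

a_S = np.random.rand(4, 4, 3)   # toy activations of the style image at layer l
a_G = np.random.rand(4, 4, 3)   # toy activations of the generated image at layer l
print(style_cost_layer(a_S, a_G))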

(6)Overall cost function

The overall cost function is defined as a combination of the content and style cost functions, weighted by parameters \alpha,\beta, as shown below:

        \boxed{J(G)=\alpha J_{\textrm{content}}(C,G)+\beta J_{\textrm{style}}(S,G)}

Remark: a higher value of \alpha makes the model care more about the content, while a higher value of \beta makes it care more about the style.

Nine、Architectures using computational tricks

(1)Generative Adversarial Network

Generative adversarial networks, also known as GANs, are composed of a generative model and a discriminative model: the generative model aims at generating the most realistic output possible, which is then fed into the discriminative model, whose goal is to differentiate the generated image from a real one.

Remark: use cases of GAN variants include text-to-image generation, music generation and synthesis.

(2)ResNet

The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers and is designed to decrease the training error. The residual block has the following characterizing equation:

        \boxed{a^{[l+2]}=g(a^{[l]}+z^{[l+2]})}
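
A minimal numpy sketch of the skip connection (the two dense layers below stand in for whatever layers sit inside the block; their weights are random placeholders):

import numpy as np

def relu(z):
    return np.maximum(z, 0)

def residual_block(a_l, W1, b1, W2, b2):
    """Returns a[l+2] = g(a[l] + z[l+2]) for a two-layer block with activation g = ReLU."""
    a_l1 = relu(W1 @ a_l + b1)   # first layer inside the block
    z_l2 = W2 @ a_l1 + b2        # pre-activation of the second layer
    return relu(a_l + z_l2)      # the shortcut adds a[l] before the activation

n = 4
rng = np.random.default_rng(0)
a_l = rng.normal(size=n)
print(residual_block(a_l, rng.normal(size=(n, n)), np.zeros(n),
                     rng.normal(size=(n, n)), np.zeros(n)))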

(3)Inception Network

This architecture uses inception modules and aims at trying out different convolutions in order to increase its performance through feature diversification. In particular, it uses the 1\times1 convolution trick to limit the computational burden.


Copyright notice: this article was written by [Sit and watch the clouds rise]; please include a link to the original article when reposting.
https://yzsam.com/2022/162/202206110859382613.html