Deep learning - CV, CNN, RNN
2022-07-26 06:31:00 【laluneX】
I. Computer Vision
1. Edge detection
In most images, the edges carry the bulk of the information, and edge extraction removes a great deal of interfering information, improving the efficiency of data processing. Edge detection includes vertical edge detection and horizontal edge detection, among others.
Vertical edge detection convolves the input image with a particular kernel to obtain a new image in which the vertical edges are displayed. Horizontal edge detection convolves with a different kernel, and the output image shows the horizontal edges.
A positive versus a negative filter response is what distinguishes a light-to-dark transition from a dark-to-light one.
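As a concrete illustration, here is a minimal sketch of both detectors in Python. The 3x3 Sobel-style kernels and the toy half-bright image are illustrative assumptions; the article does not specify particular kernels.

```python
# Minimal sketch: vertical and horizontal edge detection by convolution.
# Kernels and test image are illustrative assumptions, not from the article.
import numpy as np
from scipy.signal import convolve2d

# Vertical-edge kernel: responds to left-right changes in intensity.
vertical_kernel = np.array([[1, 0, -1],
                            [2, 0, -2],
                            [1, 0, -1]], dtype=float)
horizontal_kernel = vertical_kernel.T  # responds to top-bottom changes

# Toy image: left half bright, right half dark (one vertical edge).
image = np.hstack([np.full((6, 3), 10.0), np.zeros((6, 3))])

v_edges = convolve2d(image, vertical_kernel, mode="valid")
h_edges = convolve2d(image, horizontal_kernel, mode="valid")
print(v_edges)  # large-magnitude responses along the light-to-dark edge
print(h_edges)  # near zero: this image contains no horizontal edges
```

The sign of the response flips if the edge goes from dark to light instead, which is exactly the light-to-dark versus dark-to-light distinction mentioned above.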
2. Padding
Its purposes are:
- to prevent the image from becoming very small after many convolutions
- to preserve the information at the edges of the original image
Padding comes in three modes: full, same, valid.
① full
Full mode means the convolution starts as soon as the filter and the image barely intersect; the region outside the image is filled with 0.
② same
In same mode, the feature map output by the convolution keeps the same size as the input image; here the filter's range of movement is smaller than in full mode.
③ valid
Valid means the original image is not padded and the convolution is applied directly, so the filter's range of movement is smaller than in same mode.
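The three modes map directly onto the `mode` argument of `scipy.signal.convolve2d`. A minimal sketch follows; the 5x5 input and 3x3 filter are arbitrary illustrative choices.

```python
# Minimal sketch: output sizes under the full / same / valid padding modes.
import numpy as np
from scipy.signal import convolve2d

image = np.ones((5, 5))
kernel = np.ones((3, 3))

for mode in ("full", "same", "valid"):
    out = convolve2d(image, kernel, mode=mode)
    print(mode, out.shape)
# full  (7, 7)  filter starts at the first overlap, output grows
# same  (5, 5)  output matches the input size
# valid (3, 3)  no padding, output shrinks
```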
3. Strided convolution
The stride is the distance the filter moves at each step.
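For an $n \times n$ input, an $f \times f$ filter, padding $p$, and stride $s$, the output size follows the standard convolution arithmetic (a well-known textbook formula, not stated explicitly in the article):

$$n_{\text{out}} = \left\lfloor \frac{n + 2p - f}{s} \right\rfloor + 1$$

For example, a $7 \times 7$ input convolved with a $3 \times 3$ filter at stride 2 and no padding gives $\lfloor (7-3)/2 \rfloor + 1 = 3$, i.e. a $3 \times 3$ output.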
4. CNN layer types
A convolutional neural network (CNN) generally includes the following layers (a minimal sketch follows the list below):
Input layer: used for data input.
Convolutional layer: uses convolution kernels for feature extraction and feature mapping.
Pooling layer: downsamples, making the feature maps sparser and reducing the amount of computation. It includes max pooling and average pooling: max pooling takes the maximum of the matrix inside the pooling window, and average pooling takes its average.
Fully connected layer: usually placed at the tail of the CNN to refit the features and reduce the loss of feature information.
Output layer: used to output the results.
Of course, some other functional layers can be used in between:
Batch Normalization layer: normalizes the features within the CNN.
Slicing layer: lets separate regions of the (image) data be learned independently.
Fusion layer: fuses the branches that performed independent feature learning.
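Here is a minimal PyTorch sketch of such a stack. The channel counts, kernel sizes, 28x28 input, and 10-class output are illustrative assumptions; the article specifies no concrete architecture.

```python
# Minimal sketch: input -> convolution -> batch norm -> max pooling
# -> fully connected -> output, with illustrative sizes.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # convolutional layer ("same" padding)
    nn.BatchNorm2d(8),                          # batch normalization layer
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),                # max pooling over 2x2 windows
    nn.Flatten(),
    nn.Linear(8 * 14 * 14, 10),                 # fully connected layer -> output
)

x = torch.randn(1, 1, 28, 28)  # one 28x28 single-channel image
print(model(x).shape)          # torch.Size([1, 10])
```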
II. Recurrent Neural Networks (RNN)
1. Why RNNs?
An ordinary neural network can only process one input at a time, in isolation: the previous input and the next input are completely unrelated. However, some tasks need to handle sequence information properly, that is, cases where earlier inputs are related to the inputs that follow.
2. RNN structure

At time $t$ this network receives the input $x_t$; the value of the hidden layer is $s_t$, and the output value is $o_t$. The key point is that $s_t$ depends not only on $x_t$ but also on $s_{t-1}$.
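In the standard textbook formulation (assumed here, since the article presents the structure only through a figure and names no weight matrices), this dependency is written as

$$s_t = f(U x_t + W s_{t-1} + b_s), \qquad o_t = g(V s_t + b_o)$$

where $U$, $W$, and $V$ are the input-to-hidden, hidden-to-hidden, and hidden-to-output weight matrices, $f$ is typically tanh, and $g$ is typically softmax.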
3. Types of RNN structure
One-to-one, one-to-many, many-to-one, many-to-many (with the input length equal or unequal to the output length), and attention structures.
4. The exploding gradient problem
Exploding gradients are easier to deal with, because when the gradient explodes our program receives NaN errors. We can set a gradient threshold and clip the gradient directly whenever it exceeds that threshold.
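A minimal sketch of this thresholding (gradient clipping) in PyTorch; the model, the dummy loss, and the threshold of 1.0 are illustrative assumptions.

```python
# Minimal sketch: clip gradients to a threshold before the optimizer step.
import torch
import torch.nn as nn

model = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(2, 5, 4)      # batch of 2 sequences of length 5
output, h_n = model(x)
loss = output.pow(2).mean()   # dummy loss, purely for illustration
loss.backward()
# Rescale gradients so their global norm never exceeds the threshold.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```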
5. The vanishing gradient problem
- Initialize the weight values sensibly, steering clear of the regions where the gradient vanishes.
- Use ReLU instead of sigmoid and tanh as the activation function.
- Use Long Short-Term Memory networks (LSTM) and Gated Recurrent Units (GRU); this is the most popular approach.
① Long Short-Term Memory network (LSTM)
An LSTM uses two gates to control the content of the cell state $c$. One is the forget gate, which determines how much of the cell state $c_{t-1}$ from the previous time step is kept in the current $c_t$; the other is the update gate, which determines how much of the current candidate $\tilde{c}_t$ computed from the network input is stored into the cell state $c_t$. An LSTM also uses an output gate to control how much of the cell state $c_t$ is emitted as the LSTM's current output value $a_t$.
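In the notation above, the gates take the following standard form (a textbook formulation, assumed here because the article states the gates only in words; $\sigma$ is the sigmoid function, $\odot$ is element-wise multiplication, and $[a_{t-1}, x_t]$ concatenates the previous output with the current input):

$$
\begin{aligned}
\tilde{c}_t &= \tanh(W_c [a_{t-1}, x_t] + b_c) \\
\Gamma_u &= \sigma(W_u [a_{t-1}, x_t] + b_u) \quad \text{(update gate)} \\
\Gamma_f &= \sigma(W_f [a_{t-1}, x_t] + b_f) \quad \text{(forget gate)} \\
\Gamma_o &= \sigma(W_o [a_{t-1}, x_t] + b_o) \quad \text{(output gate)} \\
c_t &= \Gamma_u \odot \tilde{c}_t + \Gamma_f \odot c_{t-1} \\
a_t &= \Gamma_o \odot \tanh(c_t)
\end{aligned}
$$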
② Gated Recurrent Unit (GRU)
The GRU (Gated Recurrent Unit) is a variant of the LSTM, perhaps the most successful one. It simplifies the LSTM considerably while maintaining the same effect.
A GRU has two gates, namely a reset gate and an update gate. Intuitively, the reset gate determines how the new input is combined with the previous memory, while the update gate defines how much of the previous memory is kept at the current time step.
In the figure below, $z$ is the update gate and $r$ is the reset gate; $r_t$ represents the correlation between $h_{t-1}$ and $\tilde{h}_t$.
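These gates correspond to the standard GRU update equations (again a textbook formulation, assumed here since the article presents them only through figures):

$$
\begin{aligned}
z_t &= \sigma(W_z [h_{t-1}, x_t]) \\
r_t &= \sigma(W_r [h_{t-1}, x_t]) \\
\tilde{h}_t &= \tanh(W [r_t \odot h_{t-1}, x_t]) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
$$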

The figure below shows a simplified GRU (with a single gate):

III. Bidirectional Recurrent Neural Networks (BRNN)
Because standard recurrent neural networks (RNNs) process a sequence in temporal order, they tend to ignore future context information. A very obvious solution is to add future context to the network, for example by delaying the target by M time steps. In theory M can be made very large to capture all the available future information, but in practice it turns out that if M is too large, the prediction results get worse.
The basic idea of a bidirectional recurrent neural network (BRNN) is to run two recurrent neural networks (RNNs) over each training sequence, one forward and one backward, with both connected to the same output layer. This structure provides the output layer with the complete past and future context at every point in the input sequence.
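A minimal sketch of a bidirectional RNN in PyTorch; all sizes are illustrative assumptions.

```python
# Minimal sketch: a bidirectional RNN concatenates forward and backward
# hidden states, so every position sees both past and future context.
import torch
import torch.nn as nn

brnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True,
              bidirectional=True)

x = torch.randn(2, 5, 4)   # batch of 2 sequences of length 5
output, h_n = brnn(x)
print(output.shape)        # torch.Size([2, 5, 16])  (2 * hidden_size)
print(h_n.shape)           # torch.Size([2, 2, 8])   (directions, batch, hidden)
```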
IV. Deep Recurrent Neural Networks (Deep RNN)
To strengthen the expressive power of the model, a deep recurrent neural network stacks multiple recurrent layers in the network.
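In PyTorch, stacking recurrent layers is the `num_layers` argument; a minimal sketch with illustrative sizes:

```python
# Minimal sketch: a deep (stacked) GRU; each layer's output sequence
# becomes the input sequence of the layer above it.
import torch
import torch.nn as nn

deep_rnn = nn.GRU(input_size=4, hidden_size=8, num_layers=3,
                  batch_first=True)

x = torch.randn(2, 5, 4)
output, h_n = deep_rnn(x)
print(output.shape)  # torch.Size([2, 5, 8])  (sequence output of the top layer)
print(h_n.shape)     # torch.Size([3, 2, 8])  (one final hidden state per layer)
```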
This article is only a record of personal study; it will be deleted in case of infringement.