CV Learning Notes: AlexNet
2022-07-03 10:09:00 【Moresweet cat】
AlexNet
1. Background
AlexNet was the winner of the 2012 ImageNet competition, designed by Hinton and his student Alex Krizhevsky. In the years that followed, increasingly deep neural networks were proposed.
2. Network structure
Original network structure:
The original paper explains that the top and bottom halves of the network run on different GPUs, so we simplify it into the following single-path structure and trace how the intermediate feature maps are computed.
Detailed walkthrough:
The input is a three-channel $224 \times 224$ matrix, so the original image must first be preprocessed and resized to (224, 224, 3) before being fed into the network.
The image is convolved with kernels of size $11 \times 11$ and stride 4; the output feature map has 96 layers (that is, the output has 96 channels).
The detailed calculation method was introduced in the author's previous article; here is a brief derivation.
The number of output channels equals the number of convolution kernels (each kernel has 3 channels, matching the number of input channels), so the output channel count can be chosen freely. Here 96 kernels of size $11 \times 11$ are used, and the output is $55 \times 55 \times 96$. Where does the 55 come from? Using the formula $N = (W - F + 2P)/S + 1$, where $W$ is the input size, $F$ the kernel size, $P$ the padding, and $S$ the stride, we get $N = (227 - 11 + 2 \times 0)/4 + 1 = 55$. (With the paper's stated input of 224 the division does not come out evenly, which is why implementations commonly use a $227 \times 227$ input, or pad accordingly.) Several layers are also followed by an LRN operation; see 《The Controversial Local Response Normalization (LRN) of Deep Learning Explained》, which will not be covered here.
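The size formula above can be sketched as a small Python helper (`conv_out` is my own name for it, not something from this post or any framework):

```python
def conv_out(w, f, p=0, s=1):
    """Output spatial size of a convolution: N = (W - F + 2P) / S + 1.

    w: input size, f: kernel size, p: padding, s: stride.
    Floor division matches how frameworks discard a last partial window.
    """
    return (w - f + 2 * p) // s + 1

# First AlexNet layer: 11x11 kernel, stride 4, no padding.
# A 227x227 input gives the 55x55 map derived in the text;
# the paper's stated 224 would give 54 (with a discarded remainder).
print(conv_out(227, 11, p=0, s=4))  # -> 55
print(conv_out(224, 11, p=0, s=4))  # -> 54
```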
Next comes a max-pooling operation. Pooling does not change the number of channels; with a pool_size of $3 \times 3$ and a stride of 2, the output size is $(55-3)/2+1=27$, so the final output is $27 \times 27 \times 96$.
Then, after `same` padding, a $5 \times 5$ convolution is applied with 256 output channels. Under `same` padding the output size is $\lceil \frac{27}{1} \rceil = 27$, i.e. unchanged, so the final output is $27 \times 27 \times 256$.
General deep-learning frameworks support two padding modes, `same` and `valid`. In `same` mode the framework tries to keep the output the same size as the input (channel count aside): it solves the formula above for the $P$ value, i.e. decides how many rings of zeros to add around the input, so that the output size is $N = \lceil \frac{W}{S} \rceil$.
`valid` mode sets $P = 0$, and the output size is $N = \lceil \frac{W - F + 1}{S} \rceil$.
Contrast: `valid` means only "valid" convolutions are performed, with no processing of border data; `same` keeps the convolution results at the border, and (with stride 1) the output shape equals the input shape.
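The two padding modes can be expressed directly as the ceiling formulas above (the function names `same_out` and `valid_out` are mine, for illustration only):

```python
import math

def same_out(w, s=1):
    # "same" padding: N = ceil(W / S); the framework chooses P to make this hold.
    return math.ceil(w / s)

def valid_out(w, f, s=1):
    # "valid" padding: P = 0, so N = ceil((W - F + 1) / S).
    return math.ceil((w - f + 1) / s)

print(same_out(27, 1))       # -> 27 (5x5 conv, stride 1: size preserved)
print(valid_out(55, 3, 2))   # -> 27 (3x3 max pool, stride 2)
```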
Next, a max-pooling operation with a $3 \times 3$ window and a stride of 2 is applied. Pooling does not change the channel count; the output is $(27-3+0)/2 + 1 = 13$, so the output size is $13 \times 13 \times 256$.
Then, after `same` padding, the output size is $\lceil \frac{13}{1} \rceil = 13$. The output channel count is set to 384, i.e. 384 kernels of size $3 \times 3$ (the number of kernels equals the number of output channels), so the final output is $13 \times 13 \times 384$.
Keeping the 384 output channels unchanged, one ring of padding is added (i.e. $P = 1$) and a $3 \times 3$ convolution is applied. The output is $(13-3+2)/1 + 1 = 13$, so the final output is $13 \times 13 \times 384$.
Then the output channel count is set to 256, one ring of padding is added (i.e. $P = 1$), and a $3 \times 3$ convolution is applied. The output is $(13-3+2)/1 + 1 = 13$, so the final output is $13 \times 13 \times 256$.
Next, a max-pooling operation with a $3 \times 3$ window and a stride of 2 is applied. Pooling does not change the channel count, which remains 256; the output is $(13-3+0)/2 + 1 = 6$, so the final output is $6 \times 6 \times 256$.
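The whole feature-extraction pipeline above can be traced shape-by-shape in plain Python. This is only a size-checking sketch using the numbers from this walkthrough (and a 227 input, so the stated 55 comes out exactly), not a real implementation:

```python
import math

def conv(w, f, p, s):        # N = (W - F + 2P) / S + 1
    return (w - f + 2 * p) // s + 1

def same(w, s=1):            # "same" padding: N = ceil(W / S)
    return math.ceil(w / s)

w, c = 227, 3
w, c = conv(w, 11, 0, 4), 96    # conv1, 11x11 stride 4 -> 55x55x96
w = conv(w, 3, 0, 2)            # max pool, 3x3 stride 2 -> 27x27x96
w, c = same(w), 256             # conv2, 5x5 "same"      -> 27x27x256
w = conv(w, 3, 0, 2)            # max pool, 3x3 stride 2 -> 13x13x256
w, c = same(w), 384             # conv3, 3x3 "same"      -> 13x13x384
w, c = conv(w, 3, 1, 1), 384    # conv4, 3x3, P=1        -> 13x13x384
w, c = conv(w, 3, 1, 1), 256    # conv5, 3x3, P=1        -> 13x13x256
w = conv(w, 3, 0, 2)            # max pool, 3x3 stride 2 -> 6x6x256
print(w, c, w * w * c)          # -> 6 256 9216
```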
Because an FC (fully connected) layer accepts only a one-dimensional vector, the $6 \times 6 \times 256$ output must be converted into a $1 \times 1 \times 9216$ vector, giving 9216 input values. This process is called flattening; in principle it can be implemented by convolving the feature map with kernels of the same size as the feature map itself, using as many kernels as output channels. The result then passes through three FC layers and finally a softmax classifier, whose number of outputs equals the number of target classes. The computation inside an FC layer is equivalent to a convolution with $1 \times 1$ kernels.
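The flatten-plus-FC head can likewise be tracked as pure dimensions. The 4096/4096/1000 layer widths below come from the original AlexNet paper, not from this post, which only derives the 9216 input size:

```python
# Flattening and the fully connected head, tracked as vector sizes only.
shape = (6, 6, 256)
flat = shape[0] * shape[1] * shape[2]   # 6*6*256 = 9216, the FC input size
fc_widths = [4096, 4096, 1000]          # three FC layers; the last feeds softmax

dims = [flat]
for out in fc_widths:
    # Each FC layer is a (dims[-1] x out) matrix product, which is
    # equivalent to a 1x1 convolution once the spatial size is 1x1.
    dims.append(out)
print(dims)  # -> [9216, 4096, 4096, 1000]
```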
These are personal study notes, shared for learning purposes only. Please credit the source when reposting!