Deep learning plus
2022-07-05 16:44:00 【Small margin, rush】
Continuously updated
1. Residual Dense Network (RDN)
Paper link: https://arxiv.org/abs/1802.08797
The essence: an image super-resolution network that exploits all hierarchical features. Single image super-resolution (SISR) aims to generate a visually pleasing high-resolution (HR) image from a low-resolution (LR) input.
2. Cross-entropy loss
Cross entropy describes the distance between two probability distributions: the smaller the cross entropy, the closer the two distributions are.
For classification, use one-hot labels with a cross-entropy loss.
During training, use cross-entropy loss for classification and mean squared error for regression.
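As a minimal sketch (plain NumPy; the function and variable names are illustrative, not from this article), cross entropy with one-hot labels reduces to the negative log-probability of the true class:

```python
import numpy as np

def cross_entropy(probs, one_hot_labels, eps=1e-12):
    """Mean cross entropy H(p, q) = -sum_x p(x) log q(x) over a batch."""
    return -np.mean(np.sum(one_hot_labels * np.log(probs + eps), axis=1))

# Example: 2 samples, 3 classes; rows of `probs` are softmax outputs.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([[1, 0, 0],
                   [0, 1, 0]])
print(cross_entropy(probs, labels))  # = -(log 0.7 + log 0.8) / 2 ≈ 0.290
```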
3. Batch gradient descent
Every gradient update uses the entire training set.
Each step computes the gradient params_grad of the loss function loss_function over all training samples, then uses the learning rate learning_rate to update each model parameter params in the direction opposite to the gradient.
Because batch gradient descent uses the whole training set for every step, its advantage is that each update moves in the right direction, and convergence to an extremum is guaranteed (the global extremum for convex functions; possibly a local extremum for non-convex functions). Its disadvantages are that each step takes too long, a large enough training set consumes a lot of memory, and full-batch gradient descent cannot update the model parameters online.
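A minimal NumPy sketch of one full-batch update, matching the names used above (grad_fn and the linear-regression example are illustrative assumptions, not from this article):

```python
import numpy as np

def batch_gradient_descent_step(params, X, y, grad_fn, learning_rate=0.01):
    """One update using the gradient over the *entire* training set (X, y)."""
    params_grad = grad_fn(params, X, y)   # gradient of the loss on all samples
    return params - learning_rate * params_grad

# Example: least-squares linear regression, grad = (2/n) * X^T (X w - y).
def grad_fn(w, X, y):
    return 2.0 / len(X) * X.T @ (X @ w - y)

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # bias column + feature
y = np.array([2.0, 3.0, 4.0])                        # y = 1 + x
w = np.zeros(2)
for _ in range(1000):
    w = batch_gradient_descent_step(w, X, y, grad_fn, learning_rate=0.05)
print(w)  # ≈ [1.0, 1.0]
```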
4. Stochastic gradient descent
The stochastic gradient descent algorithm randomly selects one sample from the training set for each update, so learning is very fast and the model can be updated online.
Its biggest disadvantage is that an individual update may not move in the right direction, which causes the optimization to fluctuate (oscillate). (A sketch covering both this and the mini-batch variant follows section 5.)
5. Mini-batch gradient descent
Mini-batch gradient descent is a compromise between batch gradient descent and stochastic gradient descent: it balances update speed against the number of updates. Each step randomly selects m samples (m < n) from the training set to learn from.
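A minimal sketch of the mini-batch loop, reusing the illustrative grad_fn from the sketch above. Setting batch_size=1 recovers stochastic gradient descent, and batch_size=len(X) recovers full-batch gradient descent:

```python
import numpy as np

def minibatch_sgd(params, X, y, grad_fn, learning_rate=0.01,
                  batch_size=32, epochs=10, seed=0):
    """Update params on randomly drawn mini-batches of `batch_size` samples."""
    rng = np.random.default_rng(seed)
    n = len(X)
    for _ in range(epochs):
        idx = rng.permutation(n)                   # reshuffle every epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]  # the m randomly chosen samples
            params = params - learning_rate * grad_fn(params, X[batch], y[batch])
    return params
```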
6. Optimization methods
Momentum: simulates the inertia of an object in motion. Each update keeps the previous update direction to some extent while using the current batch's gradient to fine-tune the final direction. This adds a degree of stability, allows faster learning, and gives some ability to escape local optima.
Adagrad: constrains the learning rate, adapting it per parameter.
RMSprop: can be viewed as a special case of Adadelta, and it still depends on a global learning rate.
Adam (Adaptive Moment Estimation) is essentially RMSprop with a momentum term: it uses first-order and second-order moment estimates of the gradient to dynamically adjust the learning rate of each parameter. Adam's main advantage is that, after bias correction, each iteration's learning rate stays within a fixed range, which makes the parameter updates more stable.
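A minimal NumPy sketch of the Adam update as usually published (hyperparameter names follow the original paper; this is not code from this article), showing the momentum-style first moment and RMSprop-style second moment described above:

```python
import numpy as np

def adam_step(params, grad, m, v, t, lr=1e-3,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad       # first moment: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad**2    # second moment: running mean of grad^2
    m_hat = m / (1 - beta1**t)               # bias correction for the warm-up phase
    v_hat = v / (1 - beta2**t)
    params = params - lr * m_hat / (np.sqrt(v_hat) + eps)
    return params, m, v
```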
7. The role of Batch Normalization
Through standardization, it pulls an increasingly skewed activation distribution back to a normalized one, so that the inputs to the activation function fall in the region where the function is sensitive to its input. This makes the gradients larger, speeds up learning convergence, and avoids the vanishing-gradient problem.
Without it, in a deep neural network the front hidden layers learn more slowly than the back hidden layers; that is, as the number of hidden layers increases, classification accuracy declines.
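A minimal sketch of the batch-norm forward pass at training time (gamma and beta are the learnable scale and shift; a real layer also tracks running statistics for inference, which is omitted here):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Standardize each feature over the batch, then rescale and shift."""
    mean = x.mean(axis=0)                  # per-feature batch mean
    var = x.var(axis=0)                    # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta            # pull inputs back to a sensitive region
```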
8. 1×1 convolutions
They enable cross-channel interaction and information integration, reduce or increase the number of channels, realize linear combinations of multiple feature maps, and can achieve an effect equivalent to a fully connected layer.
Dimensionality reduction: when the input is 6×6×32, a 1×1 convolution kernel has shape 1×1×32; with a single 1×1 kernel, the output is 6×6×1.
A 1×1 convolution generally only changes the number of output channels and leaves the output width and height unchanged.
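A minimal sketch using PyTorch (a library choice assumed here, not made in the original) reproducing the 6×6×32 → 6×6×1 example:

```python
import torch
import torch.nn as nn

# One 1x1 kernel over 32 input channels: each output pixel is a linear
# combination of the 32 channel values at that spatial position.
conv = nn.Conv2d(in_channels=32, out_channels=1, kernel_size=1)

x = torch.randn(1, 32, 6, 6)   # (batch, channels, height, width)
y = conv(x)
print(y.shape)                 # torch.Size([1, 1, 6, 6]) -- width/height unchanged
```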
9. Understanding channels in depth
In tensorflow, channels refers to the channels of an input sample. An ordinary RGB picture has 3 channels (red, green, blue); a monochrome picture has 1.
In mxnet, channels generally means the number of convolution kernels in each convolution layer.
Suppose we have a 6×6×3 sample picture and convolve it with a 3×3×3 convolution kernel (filter). The picture's channels is then 3, and the kernel's in_channels must agree with the channels of the data being convolved.
Conventions:
- The channels of the original input image sample depend on the picture type, e.g. RGB.
- The out_channels of a convolution's output depend on the number of convolution kernels; this out_channels also serves as the in_channels of the kernels in the next convolution.
- A kernel's in_channels, as the previous point already says, equals the out_channels of the preceding convolution; for the first convolution, it equals the channels of the sample picture, as in the first point.
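A minimal PyTorch sketch (again an assumed library) of these conventions: the first layer's in_channels equals the sample's channels, and each layer's out_channels becomes the next layer's in_channels:

```python
import torch
import torch.nn as nn

# First conv: in_channels = 3 (an RGB sample), 16 kernels -> out_channels = 16.
conv1 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
# The next conv's kernels must have in_channels equal to the previous out_channels.
conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1)

x = torch.randn(1, 3, 6, 6)    # a 6x6 RGB sample, channels-first layout
print(conv2(conv1(x)).shape)   # torch.Size([1, 32, 6, 6])
```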
10. GAN
In 2014, Goodfellow proposed the GAN. A GAN's main structure consists of a generator G (Generator) and a discriminator D (Discriminator). During training, the generator network G tries to generate pictures realistic enough to fool the discriminator network D, while D tries its best to tell G's generated images apart from real ones. G and D thus form a dynamic "game process".
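A minimal PyTorch sketch of the two adversarial objectives (the model definitions are omitted; D is assumed to end in a sigmoid, and all names here are illustrative):

```python
import torch
import torch.nn.functional as F

def gan_losses(D, G, real_images, z):
    """Standard GAN losses: D and G play opposite sides of the same game."""
    fake_images = G(z)
    real_scores = D(real_images)            # D(x) in (0, 1), e.g. after a sigmoid
    fake_scores = D(fake_images.detach())   # detach: don't backprop into G here

    # D tries to score real images as 1 and generated images as 0.
    d_loss = (F.binary_cross_entropy(real_scores, torch.ones_like(real_scores))
              + F.binary_cross_entropy(fake_scores, torch.zeros_like(fake_scores)))

    # G tries to make D score its images as 1 (non-saturating generator loss).
    gen_scores = D(fake_images)
    g_loss = F.binary_cross_entropy(gen_scores, torch.ones_like(gen_scores))
    return d_loss, g_loss
```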
11. DCGAN
DCGAN uses an operation called transposed convolution, i.e., deconvolution. Transposed convolutions can scale feature maps up: they help us transform low-resolution images into high-resolution ones.
DCGAN changes the structure of the convolutional neural network to improve sample quality and convergence speed. These changes are:
- Remove all pooling layers. The G network uses transposed convolutions (transposed convolutional layers) for upsampling; the D network uses strided convolutions instead of pooling.
- Use batch normalization in both D and G.
- Remove the FC layers, making the network fully convolutional.
- The G network uses ReLU as the activation function, with tanh in the last layer.
- The D network uses LeakyReLU as the activation function.
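A minimal PyTorch sketch (assumed library) of the upsampling step in a DCGAN-style generator: a stride-2 transposed convolution that doubles the spatial resolution (the channel counts are illustrative):

```python
import torch
import torch.nn as nn

# stride=2 transposed convolution: 8x8 feature map -> 16x16 (upsampling).
up = nn.ConvTranspose2d(in_channels=64, out_channels=32,
                        kernel_size=4, stride=2, padding=1)

x = torch.randn(1, 64, 8, 8)
print(up(x).shape)   # torch.Size([1, 32, 16, 16])
```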
12. StyleGAN
StyleGAN does not focus on creating more realistic images; rather, it improves GANs' ability to finely control the generated images.
StyleGAN's contribution is not a new loss function but a reworked generator architecture.
13. L1 and L2 regularization
A regularization term added after the loss function to adjust the loss output and prevent overfitting: L1 adds λ·Σ|w| and tends to produce sparse weights, while L2 adds λ·Σw² and shrinks weights smoothly.
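A minimal sketch of both penalties added to a base loss (the lambda coefficients are illustrative hyperparameters):

```python
import numpy as np

def regularized_loss(base_loss, weights, l1_lambda=0.0, l2_lambda=0.0):
    """base_loss + L1 penalty (lambda * sum |w|) + L2 penalty (lambda * sum w^2)."""
    l1 = l1_lambda * np.sum(np.abs(weights))
    l2 = l2_lambda * np.sum(weights ** 2)
    return base_loss + l1 + l2
```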