Prof. Hung-yi Lee (NTU): Tips for Training DNN
2022-06-11 22:53:00 【Learning uncle】
Recipe of Deep Learning

In deep learning, we first check whether the model performs well on the training data, and only then look at how it performs on the testing data. If it does not do well on the training data, go back and adjust the model until it does.
So when a 20-layer network does better than a 50-layer network on the testing data, we should not jump to the conclusion that the 50-layer network is overfitting. First look at their results on the training data. If the 20-layer network is also better than the 50-layer one on the training data, the problem is not overfitting: the 50-layer network simply was not trained well. There are many possible reasons, for example the 50-layer network got stuck at a local minimum that is higher than the one the 20-layer network reached.
This also means that not every method fits every situation. For example, when the results on the training data are poor, dropout is the wrong tool; it should only be used when the model already does well on the training data but poorly on the testing data.
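The recipe above boils down to a two-step check. Here is a minimal Python sketch of it; the function name `recipe_step` is hypothetical, and the two booleans stand in for your own judgment of whether the training and testing results are good enough. The remedies listed are the ones this lecture covers.

```python
def recipe_step(good_on_training: bool, good_on_testing: bool) -> str:
    """Suggest where to look next, following the recipe of deep learning."""
    if not good_on_training:
        # Poor training results: this is not overfitting, so dropout will not help.
        return ("improve training: new activation (ReLU, Maxout), "
                "adaptive learning rate (Adagrad, RMSprop, Adam)")
    if not good_on_testing:
        # Good training results but poor testing results: now fight overfitting.
        return "reduce overfitting: early stopping, regularization, dropout"
    return "done"


print(recipe_step(False, False))
print(recipe_step(True, False))
```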
Vanishing Gradient Problem
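In the MNIST example, why does accuracy get worse as more layers are added?
The activation function used is the sigmoid function, which squashes any input, however large, into the range (0, 1). A large change to a weight in an early layer therefore produces only a small change in that layer's output, and the change shrinks again at every later sigmoid it passes through. The more layers there are, the smaller the impact of the early-layer parameters on the final output, so their gradients are tiny. As training proceeds, the layers near the output change a lot and quickly converge, while the layers near the input have barely changed and are still close to their random initial values.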
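To see the effect numerically, here is a toy sketch under a simplifying assumption: a chain of single sigmoid units that all share the hypothetical weight `w = 1` (a real network has many units per layer). The sigmoid's derivative σ(z)(1 − σ(z)) is at most 0.25, so the gradient reaching the input shrinks by at least a factor of 4 per layer.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(depth: int, w: float = 1.0, x: float = 0.5) -> float:
    """d(output)/d(x) for a chain of `depth` scalar units a_k = sigmoid(w * a_{k-1})."""
    a, grad = x, 1.0
    for _ in range(depth):
        s = sigmoid(w * a)
        grad *= w * s * (1.0 - s)  # chain rule: d sigmoid(w*a) / da = w * s * (1 - s)
        a = s
    return grad


for depth in (1, 5, 10, 20):
    print(depth, input_gradient(depth))  # shrinks roughly geometrically with depth
```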
ReLU



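Because of how ReLU works (a neuron outputs either 0 or its input unchanged), for a given input the neurons that output 0 contribute nothing and can be removed, which leaves a thinner, linear network. Gradients flow through this linear network without being squashed, so they do not vanish. The network as a whole is still nonlinear, though (my own understanding): which neurons are active depends on the input, so different inputs pass through different thin linear networks.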
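A quick way to check the "thin linear network" view is a sketch like the one below, assuming a hypothetical two-layer ReLU network with random NumPy weights: for one particular input, keeping only the neurons whose pre-activation is positive and dropping the ReLU gives the same output.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical network: 3 inputs -> 4 ReLU neurons -> 2 linear outputs.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)


def relu_net(x):
    """Ordinary forward pass with ReLU in the hidden layer."""
    return W2 @ np.maximum(0.0, W1 @ x + b1) + b2


def thin_linear_net(x):
    """Same output, computed by the linear network made of the active neurons only."""
    active = (W1 @ x + b1) > 0  # neurons whose ReLU output is non-zero for this x
    return W2[:, active] @ (W1[active] @ x + b1[active]) + b2


x = rng.normal(size=3)
print(relu_net(x), thin_linear_net(x))  # equal up to floating-point rounding
print((W1 @ x + b1) > 0)                # which neurons survive depends on x
```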
Maxout

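A Maxout network is roughly max pooling applied inside a neural network: the pre-activation values of a layer are split into groups, and each group outputs only its maximum.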


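In Maxout, only the maximum element of each group is used for a given input, so the other elements effectively disappear (much like the zero outputs in ReLU). But which elements disappear differs from one training example to another, so every element still gets trained, and the activation function each neuron ends up with has a different shape. Maxout is therefore a learnable activation function.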
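A minimal NumPy sketch of a Maxout layer, assuming the pre-activations `z` have already been computed and that consecutive elements form the groups (the group size is a free choice). The second function shows ReLU as a special case: pair every element with a constant 0, i.e. an element whose weights and bias are all 0.

```python
import numpy as np


def maxout(z, group_size):
    """Maxout activation: split the last axis of z into groups of `group_size`
    consecutive elements and keep only the maximum of each group."""
    if z.shape[-1] % group_size != 0:
        raise ValueError("last dimension must be divisible by group_size")
    return z.reshape(*z.shape[:-1], -1, group_size).max(axis=-1)


def relu_via_maxout(z):
    """ReLU as a special case: each group is (z_i, 0), where the 0 comes from an
    element whose weights and bias are fixed at 0."""
    paired = np.stack([z, np.zeros_like(z)], axis=-1).reshape(*z.shape[:-1], -1)
    return maxout(paired, 2)


z = np.array([1.0, -2.0, 3.0, 0.5])
print(maxout(z, 2))        # groups (1, -2) and (3, 0.5) keep 1 and 3
print(relu_via_maxout(z))  # same values as np.maximum(z, 0)
```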

Adagrad & RMSprop



Local Minimum

Yann LeCun said in 2007 that a point is a local minimum only if it is a valley bottom along every dimension. If the probability of being a valley bottom along one dimension is p, then with 1000 dimensions the probability is p^1000, which is tiny. So we do not need to worry much about local minima: the minimum we end up finding is most likely the global minimum, or close to it.
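The arithmetic behind this argument, as a small sketch. It assumes, as the argument implicitly does, that the dimensions are independent and share the same p; the values of p tried here are arbitrary.

```python
def prob_local_minimum(p: float, dims: int) -> float:
    """Probability of being a valley bottom along all `dims` dimensions, assuming
    each dimension independently has probability p."""
    return p ** dims


for p in (0.5, 0.9, 0.99):
    print(p, prob_local_minimum(p, 1000))  # tiny even when p is close to 1
```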
Adam

Early Stopping

The "testing set" here is really a validation set: training stops at the point where the loss on this set is lowest.
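A minimal sketch of early stopping driven by a list of validation losses. The `patience` parameter (how many non-improving epochs to tolerate before stopping) is an assumption not mentioned in these notes; a patience of 1 would stop at the very first increase.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch whose parameters early stopping would keep.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; the best epoch seen so far is returned.
    """
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation loss stopped improving: stop training here
    return best_epoch


# Hypothetical validation losses: they fall, then rise as the model starts to overfit.
print(early_stopping_epoch([1.0, 0.8, 0.6, 0.65, 0.7, 0.75, 0.5]))
```

The final 0.5 is never seen: with `patience=3` training has already stopped, which is exactly the trade-off `patience` controls.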
Regularization

When doing regularization we usually leave the biases out. The main purpose of regularization is to make the function smoother, and the bias has nothing to do with smoothness: it only shifts the function up and down.
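A small sketch of an L2 penalty that skips the biases. It assumes a hypothetical convention in which parameters are stored in a dict keyed by name and every bias name ends in `"bias"`; the factor 1/2 is just a common convention that simplifies the derivative.

```python
def l2_penalty(params, lam):
    """(lam / 2) * sum of squared weights, leaving bias parameters out."""
    return 0.5 * lam * sum(
        v * v
        for name, values in params.items()
        if not name.endswith("bias")  # biases only shift the function, so skip them
        for v in values
    )


params = {
    "layer1.weight": [1.0, 2.0],
    "layer1.bias": [100.0],  # large, but not penalized
}
print(l2_penalty(params, lam=0.1))
```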
Looking at the update formula for the L2 norm, each step multiplies w by (1 − η·λ), where η is the learning rate and λ the regularization strength. This factor is slightly less than 1, so w keeps moving toward 0 whether it is positive or negative. Because the gradient of the original loss is also subtracted, w does not simply collapse to 0; it settles where the two terms balance. In practice, L2 regularization does not help neural networks as much as it helps SVMs. This update is also known as weight decay.
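The L2 update can be sketched for a single scalar weight as follows. The function name and the toy loss (w − 3)²/2 are hypothetical; they only illustrate the two behaviours described above: with the regularizer alone, w shrinks by the factor (1 − lr·lam) every step, and with a data loss present, w settles at a balance point instead of at 0.

```python
def l2_update(w: float, grad: float, lr: float, lam: float) -> float:
    """One gradient step on L(w) + (lam / 2) * w**2.

    w - lr * (grad + lam * w) == (1 - lr * lam) * w - lr * grad
    """
    return (1.0 - lr * lam) * w - lr * grad


# Regularizer only: w is multiplied by 0.99 each step, whatever its sign.
w = 1_000_000.0
for _ in range(100):
    w = l2_update(w, grad=0.0, lr=0.1, lam=0.1)
print(w)

# With a data loss (w - 3)**2 / 2, whose gradient is w - 3,
# w settles at 3 / (1 + lam) rather than collapsing to 0.
v = 0.0
for _ in range(2000):
    v = l2_update(v, grad=v - 3.0, lr=0.1, lam=0.1)
print(v)
```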
The L1 norm is the sum of absolute values. Differentiating the absolute value gives sgn(w): the derivative is 1 when w is positive and −1 when w is negative.
From the L1 update we can see that every step subtracts the same fixed amount in the direction of sgn(w) (the part marked with the blue line). L2, by contrast, always shrinks w by multiplying it by a factor slightly less than 1.
So take a large w, such as w = 1000000. Under L1 it decreases by the same fixed amount each step, which is slow, so it may still be large at the end. Under L2 it drops quickly, because multiplying by a constant factor removes an amount proportional to w. For a small w the situation reverses: L2 removes only a tiny amount, while L1 still subtracts the full fixed amount and pushes w to 0. That is why L1 regularization makes the parameters sparse: some w stay large while many go to 0. L2 regularization does not produce this effect; it leaves many small but non-zero weights.
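The contrast between L1 and L2 can be reproduced with a toy sketch that applies only the regularization part of each update to a single weight (no data loss), using hypothetical values lr = 0.1 and lam = 0.1. It uses the plain sgn update from the text, so near 0 the L1 weight jitters within ±lr·lam instead of landing exactly on 0.

```python
def sgn(w: float) -> int:
    return (w > 0) - (w < 0)


def l1_step(w: float, lr: float, lam: float) -> float:
    # L1: subtract the fixed amount lr * lam in the direction of sgn(w)
    return w - lr * lam * sgn(w)


def l2_step(w: float, lr: float, lam: float) -> float:
    # L2: multiply by a factor slightly below 1
    return (1.0 - lr * lam) * w


def run(step, w, steps=100, lr=0.1, lam=0.1):
    for _ in range(steps):
        w = step(w, lr, lam)
    return w


for w0 in (1_000_000.0, 0.05):
    print(w0, "L1:", run(l1_step, w0), "L2:", run(l2_step, w0))
```

For the large weight, L1 has barely moved while L2 has shrunk it to about a third; for the small weight, L1 has driven it to (about) 0 while L2 leaves a small non-zero value.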
Weight Decay

Dropout

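For each mini-batch, every neuron is dropped with probability p, so a different set of neurons is dropped for each mini-batch and the network being trained is a different, thinner one each time.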
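A minimal NumPy sketch of the training-time side, following the description above: one mask over the neurons is drawn per mini-batch and shared by all its examples. The function name is hypothetical, and frameworks commonly sample the mask per element instead (PyTorch's `nn.Dropout`, for example).

```python
import numpy as np


def dropout_train(h, p, rng):
    """Training-time dropout for one mini-batch of activations h (batch x neurons).

    One keep/drop decision per neuron, shared by every example in the mini-batch.
    """
    keep = rng.random(h.shape[-1]) >= p  # True = neuron survives (probability 1 - p)
    return h * keep                      # broadcasts the mask over the batch rows


rng = np.random.default_rng(0)
batch = np.ones((3, 8))  # hypothetical activations: 3 examples, 8 neurons
print(dropout_train(batch, p=0.5, rng=rng))
print(dropout_train(batch, p=0.5, rng=rng))  # next mini-batch: a different set is dropped
```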

At testing time, no neurons are dropped: all of them are used, and every weight w is multiplied by (1 − p).
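Why multiply by (1 − p)? Each input is only present a fraction (1 − p) of the time during training, so scaling the weights by (1 − p) at test time matches the expected training output. The Monte Carlo sketch below checks this for one linear neuron; the weights, input, p = 0.5 and sample count are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                              # dropout rate (assumed)
w = np.array([0.5, -1.0, 2.0, 1.5])  # hypothetical weights of one linear neuron
x = np.array([1.0, 2.0, -1.0, 0.5])  # hypothetical input

# Training: each input neuron is kept with probability 1 - p, fresh mask per sample.
masks = rng.random((20_000, x.size)) >= p
train_outputs = (masks * x) @ w

# Testing: use every neuron, but multiply the weights by 1 - p.
test_output = ((1 - p) * w) @ x

print(train_outputs.mean(), test_output)  # the two are close
```

Frameworks usually implement the equivalent "inverted dropout" instead, dividing the kept activations by (1 − p) during training so that nothing has to change at test time.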
Intuitive explanation:


Theoretical explanation:
Dropout is a kind of ultimate ensemble (ensemble learning).

Each time dropout is applied we train a different network; with M neurons there are 2^M possible sub-networks, all sharing the same parameters. At testing time we use all the neurons instead, and the weights must then be multiplied by (1 − p).
The average of the outputs y of all these sub-networks (the ensemble) is approximately equal to the output y of the full network with the scaled weights.
For example, with a linear activation function the two y values are exactly equal; with a nonlinear activation they are only approximately equal.
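The claim can be checked exhaustively on a tiny example, as sketched below: a single neuron with two inputs, where dropout can produce 2² = 4 sub-networks. It assumes p = 0.5, so all sub-networks are equally likely and a plain average is the ensemble; the weights and input are hypothetical.

```python
import itertools
import math

import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ensemble_average(w, x, activation=lambda z: z):
    """Average output over all 2**n dropout sub-networks (p = 0.5, all equally likely)."""
    outputs = [
        activation(float(w @ (x * np.array(mask))))
        for mask in itertools.product([0, 1], repeat=len(x))
    ]
    return sum(outputs) / len(outputs)


def full_network_scaled(w, x, p=0.5, activation=lambda z: z):
    """Output of the full network with every weight multiplied by 1 - p."""
    return activation(float(((1 - p) * w) @ x))


w = np.array([1.0, 1.0])
x = np.array([2.0, 2.0])
# Linear activation: the ensemble average and the scaled full network agree exactly.
print(ensemble_average(w, x), full_network_scaled(w, x))
# Sigmoid activation: the two are only close.
print(ensemble_average(w, x, sigmoid), full_network_scaled(w, x, activation=sigmoid))
```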