Teacher Li Hongyi (Hung-yi Lee), NTU -- tips for tuning DNNs
2022-06-11 22:53:00 【Learning uncle】
Recipe of Deep Learning

In deep learning, we first check whether the model performs well on the training data, and only then look at its performance on the testing data. If it does not do well on the training data, first adjust the model until it does.
So when we see a 20-layer network perform better than a 50-layer network on the testing data, we should not immediately say the 50-layer network is overfitting. First look at their results on the training data. Those results show that the 20-layer network also does better than the 50-layer network on the training data, so this is not overfitting. There are many possible reasons, for example the 50-layer network may have got stuck at a local minimum that is higher than the one the 20-layer network reached.
So not every method can be used in every situation. For example, when the results on the training data are poor, you should not use dropout. Dropout is only appropriate when the model does well on the training data but poorly on the testing data.
# Vanishing Problem
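In the MNIST example, why does accuracy get lower as more layers are added?
Because the activation function is the sigmoid, which squashes large inputs into the range (0, 1), a large change in its input produces only a small change in its output, and this attenuation happens again at every layer. So the more layers there are, the smaller the effect a parameter in the early layers has on the final output, and the smaller its gradient. As a result, the layers near the output learn quickly and soon converge, while the layers near the input barely change and stay close to their random initialization.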
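To see the attenuation numerically, here is a minimal sketch assuming a hypothetical chain of single sigmoid units with every weight fixed to 1 (the names `sigmoid` and `input_gradient` are illustrative, not from the lecture). By the chain rule, the gradient that reaches the bottom of the chain is a product of one sigmoid derivative per layer, and each factor is at most 0.25.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(depth: int, w: float = 1.0, x: float = 0.0) -> float:
    """d(output)/d(x) for a chain of `depth` units, each computing a = sigmoid(w * a_prev).

    Each layer contributes a factor w * s * (1 - s), and s * (1 - s) <= 0.25,
    so with w = 1 the gradient shrinks at least 4x per layer.
    """
    a, grad = x, 1.0
    for _ in range(depth):
        s = sigmoid(w * a)
        grad *= w * s * (1.0 - s)  # chain rule: multiply by this layer's local derivative
        a = s
    return grad


for d in (1, 5, 10):
    print(d, input_gradient(d))
```

With more layers the product of small factors quickly approaches 0, which is why the early layers receive almost no update.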
ReLU



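Because of ReLU's properties, for a given input the neurons whose output is 0 have no effect and can be removed, which leaves a very thin linear network. The network can still represent non-linear functions, because it has many layers and different inputs switch on different neurons, so each input region gets its own linear network (my understanding). In addition, an active ReLU has derivative 1, so the gradient does not shrink layer by layer as it does with sigmoid.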
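A minimal sketch of the "thin linear network" idea, assuming a hypothetical one-input network with two ReLU hidden units and hand-picked weights (not taken from the lecture): for any single input one unit outputs 0, so the remaining network is linear, yet the overall function is |x|, which is non-linear.

```python
def relu(z: float) -> float:
    return max(0.0, z)


def tiny_relu_net(x: float) -> float:
    """Hypothetical net: h1 = relu(x), h2 = relu(-x), y = h1 + h2 (= |x|)."""
    h1 = relu(1.0 * x)
    h2 = relu(-1.0 * x)
    return h1 + h2


def active_units(x: float) -> list[int]:
    """Indices of hidden units with non-zero output; the others can be removed for this x."""
    return [i for i, h in enumerate((relu(x), relu(-x))) if h > 0]


print(tiny_relu_net(2.0), active_units(2.0))    # only unit 0 is active: y = x
print(tiny_relu_net(-3.0), active_units(-3.0))  # only unit 1 is active: y = -x
```

Each input sees a linear network, but different inputs see different ones, so f(1) + f(-1) differs from 2·f(0).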
Maxout

A Maxout network amounts to applying max pooling inside a neural network: the linear outputs of a layer are split into groups, and each group outputs only its maximum.


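In Maxout, the elements that are not the maximum of their group effectively disappear for a given input, but which ones disappear differs from one training example to another, so all the weights still get trained. The shape of each unit's activation function is therefore determined by its weights: it is a piecewise linear convex function whose number of pieces depends on the group size. In other words, Maxout is a learnable activation function (ReLU is the special case where one element of a group of two has weight and bias 0).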
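A minimal sketch of a single Maxout unit with a scalar input, assuming a hypothetical `maxout_unit(x, weights, biases)` helper in which each `(w, b)` pair is one linear piece of the group. Choosing the weights changes the activation function: `(1, 0)` with zero biases reproduces ReLU, `(1, -1)` gives |x|, and a group of three gives a three-piece convex function.

```python
def maxout_unit(x: float, weights: list[float], biases: list[float]) -> float:
    """Max over the group of linear pieces z_k = w_k * x + b_k (max pooling over the z's)."""
    return max(w * x + b for w, b in zip(weights, biases))


def winning_piece(x: float, weights: list[float], biases: list[float]) -> int:
    """Index of the piece that survives the max; the other pieces 'disappear' for this x."""
    pieces = [w * x + b for w, b in zip(weights, biases)]
    return max(range(len(pieces)), key=pieces.__getitem__)


relu_like = ([1.0, 0.0], [0.0, 0.0])        # max(x, 0)
abs_like = ([1.0, -1.0], [0.0, 0.0])        # max(x, -x)
three_piece = ([1.0, -1.0, 0.0], [0.0, 0.0, 1.0])  # max(x, -x, 1)

print(maxout_unit(-2.0, *relu_like), maxout_unit(3.0, *relu_like))
print(winning_piece(3.0, *abs_like), winning_piece(-3.0, *abs_like))
```

Because a different piece wins for different inputs, gradients reach different weights for different training examples.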

Adagrad & RMSprop
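Adagrad divides the learning rate by the root of the sum of all past squared gradients; RMSprop replaces that sum with an exponentially decaying average, so old gradients are gradually forgotten. A minimal sketch of both updates on the toy loss f(w) = w² (the loss, `eta`, `alpha`, `eps` and the function names are assumptions for illustration, not values from the lecture):

```python
import math


def grad(w: float) -> float:
    # toy loss f(w) = w**2, so df/dw = 2w
    return 2.0 * w


def adagrad(w: float, eta: float = 0.1, steps: int = 200, eps: float = 1e-8) -> float:
    sum_sq = 0.0  # running sum of ALL past squared gradients
    for _ in range(steps):
        g = grad(w)
        sum_sq += g * g
        w -= eta * g / (math.sqrt(sum_sq) + eps)
    return w


def rmsprop(w: float, eta: float = 0.1, alpha: float = 0.9,
            steps: int = 200, eps: float = 1e-8) -> float:
    sigma_sq = None  # exponentially decaying average of squared gradients
    for _ in range(steps):
        g = grad(w)
        if sigma_sq is None:
            sigma_sq = g * g  # first step: sigma^0 = |g^0|
        else:
            sigma_sq = alpha * sigma_sq + (1 - alpha) * g * g
        w -= eta * g / (math.sqrt(sigma_sq) + eps)
    return w


print(adagrad(5.0), rmsprop(5.0))
```

Adagrad's steps keep shrinking because the accumulated sum only grows, whereas RMSprop's decaying average lets the step size stay roughly constant.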



Local Minimum

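Yann LeCun said in 2007 that you do not need to worry much about local minima. For a point to be a local minimum, it has to be a valley bottom along every dimension. If the probability of being a valley bottom along one dimension is p, then with 1000 dimensions (parameters) the probability is p^1000, which is extremely small. So true local minima are rare, and the point you find is most likely the global minimum or close to it.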
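A quick check of the arithmetic, assuming (as the argument does) that the dimensions are independent, with an illustrative p = 0.5 that is not from the lecture. Working in log10 avoids floating-point underflow when the number of dimensions grows further.

```python
import math


def prob_local_min(p: float, dims: int) -> float:
    # probability that every one of `dims` independent dimensions is a valley bottom
    return p ** dims


def log10_prob_local_min(p: float, dims: int) -> float:
    # the same quantity on a log10 scale, which does not underflow for huge dims
    return dims * math.log10(p)


print(prob_local_min(0.5, 1000))        # about 9.3e-302
print(log10_prob_local_min(0.5, 1000))  # about -301.03
```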
Adam

Early Stopping

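The "testing set" here is actually a validation set: training stops at the point where the loss on the validation set stops decreasing, even if the training loss is still going down.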
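A minimal sketch of that stopping rule, assuming a precomputed list of per-epoch validation losses and a hypothetical `patience` parameter (how many epochs without improvement to tolerate); neither comes from the lecture.

```python
def early_stopping_epoch(val_losses: list[float], patience: int) -> int:
    """Return the epoch whose weights to keep: the lowest validation loss seen
    before the loss failed to improve for `patience` consecutive epochs."""
    best_epoch, best_loss, bad_epochs = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, bad_epochs = epoch, loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # stop training; later epochs are never run
    return best_epoch


losses = [1.0, 0.8, 0.6, 0.65, 0.7, 0.75, 0.5]
print(early_stopping_epoch(losses, patience=3))
print(early_stopping_epoch(losses, patience=5))
```

A small `patience` stops early and may miss a later improvement; a larger one trains longer before giving up.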
Regularization

When we do regularization, we usually leave out the bias, because the main purpose of regularization is to make the function smoother, and the bias has nothing to do with smoothness: it only shifts the function up and down.
Looking at the update formula with the L2 norm, w is multiplied by a factor slightly less than 1 at every step, so whether w is positive or negative, it keeps moving toward 0. Because the gradient of the original loss is also subtracted, w does not go all the way to 0; it settles where the two terms balance. In practice, regularization helps neural networks less than it helps SVMs. L2 regularization is equivalent to weight decay.
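Written out, assuming the lecture's usual form with a ½ factor on the L2 term (λ is the regularization weight, η the learning rate, and the bias is left out of θ's norm):

$$
L'(\theta) = L(\theta) + \lambda \frac{1}{2} \lVert \theta \rVert_2^2, \qquad \lVert \theta \rVert_2^2 = \sum_i w_i^2
$$

$$
w^{t+1} = w^t - \eta\left(\frac{\partial L}{\partial w} + \lambda w^t\right) = (1 - \eta\lambda)\, w^t - \eta \frac{\partial L}{\partial w}
$$

Since ηλ is small, 1 - ηλ is the factor slightly less than 1 that shrinks w at every step.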
The L1 norm is the sum of absolute values. Its derivative uses sgn(w): if w is positive, the derivative is 1; if w is negative, it is -1.
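With the L1 norm (λ is the regularization weight, η the learning rate), the objective and update become:

$$
L'(\theta) = L(\theta) + \lambda \lVert \theta \rVert_1, \qquad \lVert \theta \rVert_1 = \sum_i |w_i|
$$

$$
w^{t+1} = w^t - \eta \frac{\partial L}{\partial w} - \eta \lambda \, \operatorname{sgn}(w^t)
$$

The last term always has magnitude ηλ, whatever the size of w.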
From the L1 update we can see that it always subtracts a fixed amount (the blue-highlighted part of the slide), whereas L2 always shrinks w by multiplying it by a value slightly less than 1.
So suppose we have a large w, for example w = 1000000. Under L1, w decreases by the same fixed amount every step, which is slow, so it may still be large at the end. Under L2 it drops quickly, because it is multiplied by a constant factor each step. Conversely, for a small w, L1 still subtracts the same amount and can push it all the way to 0, while L2 barely moves it. That is why L1 regularization makes the parameters sparse: some w stay large while others go to 0. L2 regularization does not have this effect; it leaves many small but non-zero values.
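A small numeric comparison, assuming we isolate the regularization term (ignoring the loss gradient) and pick η·λ = 0.01 for 100 steps. These values, the function names, and the clipping at 0 in the L1 step (used so the weight does not oscillate around 0 once it reaches it) are assumptions for the demo, not part of the lecture.

```python
def sign(x: float) -> int:
    return (x > 0) - (x < 0)


def l2_step(w: float, eta: float, lam: float) -> float:
    # weight decay: multiply by a factor slightly less than 1
    return (1 - eta * lam) * w


def l1_step(w: float, eta: float, lam: float) -> float:
    # subtract a fixed amount eta * lam toward 0; clip at 0 instead of overshooting
    new_w = w - eta * lam * sign(w)
    return 0.0 if sign(new_w) != sign(w) else new_w


def shrink(w: float, step, eta: float = 0.1, lam: float = 0.1, steps: int = 100) -> float:
    for _ in range(steps):
        w = step(w, eta, lam)
    return w


for w0 in (1_000_000.0, 0.5):
    print(w0, "L1:", shrink(w0, l1_step), "L2:", shrink(w0, l2_step))
```

The large weight barely moves under L1 but loses about two thirds of its value under L2, while the small weight is driven exactly to 0 by L1 and only scaled down by L2.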
Weight Decay

Dropout

During training, every neuron is dropped with probability p, and this is resampled for each mini-batch, so the set of dropped neurons is different each time.

At testing time, all neurons are used, and all weights are multiplied by 1-p.
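A minimal sketch of the two phases for one linear neuron, assuming dropout is applied to the neuron's inputs (the outputs of the previous layer); the function name and the Monte Carlo check are illustrative. Real frameworks usually implement the equivalent "inverted dropout", which scales by 1/(1-p) during training instead of scaling the weights at test time.

```python
import random


def dropout_layer(inputs, weights, p, training, rng=random):
    """One linear neuron z = sum(w_i * a_i) with dropout on its inputs.

    Training: each input is dropped (set to 0) with probability p, resampled
    on every call, i.e. for every mini-batch.
    Testing: all inputs are kept and every weight is multiplied by (1 - p).
    """
    if training:
        mask = [0 if rng.random() < p else 1 for _ in inputs]
        return sum(m * w * a for m, w, a in zip(mask, weights, inputs))
    return sum((1 - p) * w * a for w, a in zip(weights, inputs))


rng = random.Random(0)
W, X = [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]
train_avg = sum(dropout_layer(X, W, 0.5, True, rng) for _ in range(20000)) / 20000
print(train_avg, dropout_layer(X, W, 0.5, training=False))
```

Multiplying by 1-p makes the test-time output match the average training-time output of this linear neuron.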
Intuitive explanation:


Explanation of the principle:
Dropout is like the ultimate ensemble (ensemble learning).

Each time, dropout samples a different network. At testing time, however, all neurons are used, so the weights must be multiplied by 1-p.
Averaging the outputs y of all the sampled networks gives approximately the same y as the full network with its weights multiplied by 1-p.
For example, if the activation function is linear, the two y values are exactly equal. If the activation is non-linear, the result is only an approximation.
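A small exact check of this claim, assuming a hypothetical single neuron with three inputs, each dropped with probability p = 0.5: enumerating all 2³ dropout masks gives the ensemble average, which is compared with the weight-scaled network for a linear and a sigmoid activation.

```python
import itertools
import math


def expected_output(weights, inputs, p, activation):
    """Exact ensemble average over every dropout mask, weighted by its probability."""
    total = 0.0
    for mask in itertools.product([0, 1], repeat=len(weights)):  # 1 = kept, 0 = dropped
        prob = 1.0
        for m in mask:
            prob *= (1 - p) if m else p
        z = sum(m * w * x for m, w, x in zip(mask, weights, inputs))
        total += prob * activation(z)
    return total


def scaled_output(weights, inputs, p, activation):
    """Full network with every weight multiplied by (1 - p)."""
    z = sum((1 - p) * w * x for w, x in zip(weights, inputs))
    return activation(z)


def linear(z):
    return z


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


W, X = [2.0, -1.0, 0.5], [1.0, 3.0, 4.0]
for act in (linear, sigmoid):
    print(act.__name__, expected_output(W, X, 0.5, act), scaled_output(W, X, 0.5, act))
```

For the linear activation the two numbers coincide; for the sigmoid they differ slightly, matching the "approximately equal" statement.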