Deep learning network regularization
2022-07-04 14:56:00 【Falling flowers and rain】
When designing a machine learning algorithm, we want not only a small error on the training set but also strong generalization to new samples. Many machine learning algorithms use strategies to reduce the test error; these strategies are collectively called regularization. Because neural networks have strong representational capacity, they often overfit, so different forms of regularization are needed.
Regularization reduces the generalization error by modifying the learning algorithm. The most commonly used strategies in deep learning are parameter norm penalties, early stopping, Dropout, and so on. They are introduced in detail below.
1. L1 and L2 Regularization (Review)
L1 and L2 are the most common regularization methods. They add a regularization term to the loss (cost) function. Because of this extra term, the values in the weight matrix decrease, based on the assumption that a neural network with smaller weights yields a simpler model, which reduces overfitting to some extent. The regularization term, however, differs between L1 and L2.
L2 Regularization
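In one common convention (the exact scaling factor varies between references), the L2-penalized cost has the form, with λ the regularization coefficient and m the number of samples:

$$\text{Cost} = \text{Loss} + \frac{\lambda}{2m}\sum \lVert w \rVert_2^2$$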
Here λ is the regularization parameter, a hyperparameter that needs to be tuned. L2 regularization is also called weight decay, because it drives the weights toward 0 (but not exactly to 0).
L1 Regularization
Here we penalize the absolute values of the weights. Again, λ is the regularization parameter, a hyperparameter. Unlike L2, L1 can shrink weights all the way to 0, so L1 is useful for compressing models. In other cases, L2 regularization is generally preferred.
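Analogously, one common form of the L1-penalized cost is:

$$\text{Cost} = \text{Loss} + \frac{\lambda}{2m}\sum \lVert w \rVert_1$$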
The corresponding APIs in tf.keras are:
- L1 Regularization
tf.keras.regularizers.L1(l1=0.01)
- L2 Regularization
tf.keras.regularizers.L2(l2=0.01)
- L1L2 Regularization
tf.keras.regularizers.L1L2(
l1=0.0, l2=0.0
)
We specify the regularization type and its hyperparameters directly on a layer:
# Import the corresponding toolkit
import tensorflow as tf
from tensorflow.keras import regularizers
# Create the model
model = tf.keras.models.Sequential()
# L2 regularization, lambda = 0.001
model.add(tf.keras.layers.Dense(16, kernel_regularizer=regularizers.l2(0.001),
                                activation='relu', input_shape=(10,)))
# L1 regularization, lambda = 0.001
model.add(tf.keras.layers.Dense(16, kernel_regularizer=regularizers.l1(0.001),
                                activation='relu'))
# L1L2 regularization, lambda1 = 0.001, lambda2 = 0.01
model.add(tf.keras.layers.Dense(16, kernel_regularizer=regularizers.L1L2(0.001, 0.01),
                                activation='relu'))
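As a quick check (a minimal sketch reusing the model defined above; the dummy batch is only illustrative), the per-layer penalties that Keras adds to the training loss can be inspected through model.losses:

# The regularization penalties are collected in model.losses and are added
# to the training loss automatically during fit()
x = tf.random.normal((4, 10))   # dummy batch, only for illustration
_ = model(x)                    # forward pass; the model is already built here
print(model.losses)             # one penalty tensor per regularized layer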
2. Dropout Regularization
Dropout is one of the most commonly used regularization techniques in deep learning, and the idea is very simple: during each training iteration, a random subset of nodes is selected and temporarily removed, together with their forward and backward connections.
As a result, each iteration uses a different combination of nodes and therefore produces different outputs. This can be viewed as an ensemble technique in machine learning: ensembles of models usually perform better than a single model because they capture more randomness. Similarly, dropout makes a neural network generalize better than the same network trained normally.
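To make the mechanism concrete, here is a minimal NumPy sketch of inverted dropout, the variant tf.keras uses: a random binary mask drops each activation with probability rate, and the surviving activations are scaled by 1 / (1 - rate) so that their expected value is unchanged (the array values below are made up for illustration).

import numpy as np

rate = 0.2                                    # probability of dropping a unit
activations = np.array([[1., 2.], [3., 4.]])  # hypothetical layer outputs

# Training time: random mask plus rescaling (inverted dropout)
mask = (np.random.rand(*activations.shape) >= rate).astype(np.float32)
print(activations * mask / (1.0 - rate))

# Test time: no mask, the activations are used as-is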
The dropout API in tf.keras is:
tf.keras.layers.Dropout(rate)
Parameters:
rate: the probability that each neuron is dropped
Example:
# Import the corresponding libraries
import numpy as np
import tensorflow as tf
# Define the dropout layer: each neuron is dropped with probability 0.2,
# and the inputs that are kept are scaled up by 1 / (1 - rate)
layer = tf.keras.layers.Dropout(0.2, input_shape=(2,))
# Define the data: five rows of two values
data = np.arange(1, 11).reshape(5, 2).astype(np.float32)
# Print the raw data
print(data)
# Apply dropout: in training mode the output after dropout is returned;
# in non-training mode the input is returned unchanged (no dropout)
outputs = layer(data, training=True)
# Print the result after dropout
print(outputs)
The result is:
[[ 1. 2.]
[ 3. 4.]
[ 5. 6.]
[ 7. 8.]
[ 9. 10.]]
tf.Tensor(
[[ 1.25 2.5 ]
[ 0. 5. ]
[ 6.25 7.5 ]
[ 8.75 10. ]
[ 0. 12.5 ]], shape=(5, 2), dtype=float32)
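For comparison, calling the same layer outside training mode returns the input unchanged (reusing layer and data from the example above):

# In inference mode dropout is a no-op and the data passes through unchanged
print(layer(data, training=False))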
3. Early Stopping
Early stopping means setting aside part of the training set as a validation set. When performance on the validation set gets worse, or stops improving, training of the model is stopped immediately. This is called early stopping.
In the figure above, training is stopped at the dotted line, because beyond that point the model starts to overfit the training data.
In tf.keras, early stopping is implemented with a callback:
tf.keras.callbacks.EarlyStopping(
monitor='val_loss', patience=5
)
Above, the monitor argument specifies the quantity being monitored; here val_loss means the validation loss. The patience argument is a number of epochs: training stops when there is no improvement over that many epochs. To understand this intuitively, look at the figure above again. After the dotted line, each epoch leads to a higher validation error, so patience epochs after the dotted line the model stops training, because no further improvement occurs.
# Import the corresponding toolkit
import tensorflow as tf
import numpy as np
# Stop training when the loss does not decrease for 3 consecutive epochs
callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=3)
# Define a neural network with only one layer
model = tf.keras.models.Sequential([tf.keras.layers.Dense(10)])
# Set the loss function and the gradient descent algorithm
model.compile(tf.keras.optimizers.SGD(), loss='mse')
# Train the model
history = model.fit(np.arange(100).reshape(5, 20), np.array([0, 1, 2, 1, 2]),
                    epochs=10, batch_size=1, callbacks=[callback],
                    verbose=1)
# Print the number of epochs that actually ran
len(history.history['loss'])
Output:
Epoch 1/10
5/5 [==============================] - 0s 600us/step - loss: 145774557280600064.0000
Epoch 2/10
5/5 [==============================] - 0s 522us/step - loss: 10077891596456623723194184833695744.0000
Epoch 3/10
5/5 [==============================] - 0s 1ms/step - loss: inf
Epoch 4/10
5/5 [==============================] - 0s 1ms/step - loss: inf
# Only 4 epochs were run
4
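In practice the callback usually monitors the validation loss rather than the training loss. The following is a minimal sketch under assumed settings; the dummy data, the validation_split value, and the restore_best_weights choice are illustrative, not part of the original example:

import numpy as np
import tensorflow as tf

# Dummy regression data, only to make the sketch runnable
x_train = np.random.rand(100, 20).astype(np.float32)
y_train = np.random.rand(100, 1).astype(np.float32)

model = tf.keras.models.Sequential([tf.keras.layers.Dense(10, activation='relu'),
                                    tf.keras.layers.Dense(1)])
model.compile(tf.keras.optimizers.SGD(), loss='mse')

# Stop when the validation loss has not improved for 5 consecutive epochs
# and roll the weights back to those of the best epoch
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5,
                                            restore_best_weights=True)
model.fit(x_train, y_train, validation_split=0.2,
          epochs=100, callbacks=[callback], verbose=0)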
4. Batch Normalization
Batch Normalization (the BN layer) is a method proposed in 2015 and is used in most deep network training today. Like the fully connected layer, the BN layer is just another layer in the network.
The BN layer works on individual neurons: it uses the mini-batch seen during training to compute the mean and variance of a neuron's input x_i, then normalizes and reconstructs it, hence the name Batch Normalization. Before each layer's input, the data is batch-normalized and then passed on to the rest of the network:
First, the outputs of the neurons within a batch of data are standardized.
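For a mini-batch B = {x_1, ..., x_m}, the standard standardization step is:

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i,\qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_i-\mu_B\right)^2,\qquad \hat{x}_i = \frac{x_i-\mu_B}{\sqrt{\sigma_B^2+\varepsilon}}$$

where ε is a small constant for numerical stability (the epsilon argument in the code below).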
Then a transform-and-reconstruct step is applied by introducing the learnable parameters γ and β. If the inputs of every hidden layer were kept close to 0, i.e. in the linear region of the activation function, training a nonlinear neural network would be hampered and the resulting model would perform poorly. Therefore γ and β are used to further process the standardized result:
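The standard scale-and-shift step is:

$$y_i = \gamma\,\hat{x}_i + \beta$$

where γ and β are learned together with the other network parameters.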
This is the final output of the BN layer. The overall procedure is shown in the figure below:
Implementation and usage in tf.keras:
# The layer can be placed directly into the network structure
tf.keras.layers.BatchNormalization(
epsilon=0.001, center=True, scale=True,
beta_initializer='zeros', gamma_initializer='ones',
)
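A minimal usage sketch (the layer sizes and the placement right after the Dense layer are illustrative assumptions): the BN layer is simply added to the model like any other layer:

import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(64, input_shape=(10,)),
    tf.keras.layers.BatchNormalization(),  # normalize this layer's outputs per mini-batch
    tf.keras.layers.Activation('relu'),
    tf.keras.layers.Dense(1)
])
model.summary()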
Summary
- L1 and L2 regularization: add a regularization term to the loss (cost) function; this shrinks the weight values, based on the assumption that a neural network with smaller weights is a simpler model.
- Dropout: during each training iteration, randomly select some nodes and remove them together with their forward and backward connections.
- Early stopping: when the validation performance gets worse or stops improving, stop training the model immediately.
- The BN layer: use the mini-batch seen during training to compute the mean and variance of each neuron's input x_i, then normalize and reconstruct; hence the name Batch Normalization.