Machine Learning Notes - Deep Learning Tips Checklist
2022-06-11 09:08:00 【Sit and watch the clouds rise】
I. Data processing
1. Data augmentation
Deep learning models usually need a lot of data to be properly trained. It is often useful to get more data from the existing ones using data augmentation techniques. Given an input image, common techniques include horizontal flips, rotations, random crops, color/brightness shifts, and noise addition; a minimal sketch follows below.
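For instance, a minimal NumPy sketch of a few of these transformations, assuming `image` is an `(H, W, 3)` uint8 array (the 0.9 crop ratio and noise scale are arbitrary illustration values):

```python
import numpy as np

def augment(image, rng=np.random.default_rng()):
    """Return a randomly augmented copy of an (H, W, 3) uint8 image."""
    out = image.copy()
    # Random horizontal flip
    if rng.random() < 0.5:
        out = out[:, ::-1, :]
    # Random crop covering 90% of the original size
    h, w = out.shape[:2]
    ch, cw = int(0.9 * h), int(0.9 * w)
    top, left = rng.integers(0, h - ch + 1), rng.integers(0, w - cw + 1)
    out = out[top:top + ch, left:left + cw, :]
    # Random brightness shift
    out = np.clip(out.astype(np.float32) * rng.uniform(0.8, 1.2), 0, 255)
    # Additive Gaussian noise
    out = np.clip(out + rng.normal(0, 5, out.shape), 0, 255)
    return out.astype(np.uint8)
```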

2. Batch normalization
Batch normalization is a step of hyperparameters $\gamma, \beta$ that normalizes the batch $\{x_i\}$. By noting $\mu_B$ and $\sigma_B^2$ the mean and variance of the batch that we want to correct, the normalization is done as follows:

$$x_i \leftarrow \gamma\,\frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta$$

It is usually applied after a fully connected or convolutional layer and before a nonlinearity, and aims at allowing higher learning rates and reducing the strong dependence on initialization.
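A minimal NumPy sketch of this forward pass (training-time batch statistics only; running averages for inference are omitted):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize a batch x of shape (batch_size, features) with learnable gamma, beta."""
    mu = x.mean(axis=0)                      # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalize
    return gamma * x_hat + beta              # scale and shift
```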
II. Training a neural network
1. Definitions
Epoch: In the context of training a model, an epoch is a term used to refer to one iteration in which the model sees the entire training set to update its weights.
Mini-batch gradient descent: During the training phase, updating the weights is usually not based on the whole training set at once or on a single data point, due to computational complexity or noise issues. Instead, the update step is done on mini-batches, where the number of data points in a batch is a hyperparameter that we can tune.
Loss function: In order to quantify how a given model performs, the loss function $L$ is usually used to evaluate to what extent the actual outputs $y$ are correctly predicted by the model outputs $z$.
Cross-entropy loss: In the context of binary classification in neural networks, the cross-entropy loss $L(z,y)$ is commonly used and defined as follows:

$$L(z,y) = -\Big[y\log(z) + (1-y)\log(1-z)\Big]$$
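A small NumPy sketch of this loss; the clipping constant is an added assumption for numerical stability, not part of the definition above:

```python
import numpy as np

def binary_cross_entropy(z, y, eps=1e-12):
    """z: predicted probabilities in (0, 1); y: true labels in {0, 1}."""
    z = np.clip(z, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(z) + (1 - y) * np.log(1 - z))

# Example: confident correct predictions yield a small loss
print(binary_cross_entropy(np.array([0.9, 0.1]), np.array([1, 0])))
```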
2. Finding optimal weights
Backpropagation: Backpropagation is a method to update the weights in a neural network by taking into account the actual output and the desired output. The derivative of the loss with respect to each weight $w$ is computed using the chain rule, propagating through the model output $z$:

$$\frac{\partial L(z,y)}{\partial w} = \frac{\partial L(z,y)}{\partial z}\times\frac{\partial z}{\partial w}$$

Using this method, each weight is updated with the following rule:

$$w \leftarrow w - \alpha\,\frac{\partial L(z,y)}{\partial w}$$
Updating weights: In a neural network, weights are updated as follows:
Step 1: Take a batch of training data and perform forward propagation to compute the loss.
Step 2: Backpropagate the loss to get the gradient of the loss with respect to each weight.
Step 3: Use the gradients to update the weights of the network.
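To make the three steps concrete, here is a toy NumPy sketch of the loop for a logistic-regression model (a stand-in for a real network; data and hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 10))          # one mini-batch of 32 examples, 10 features
y = (X[:, 0] > 0).astype(float)        # toy binary labels
w, b, alpha = np.zeros(10), 0.0, 0.1   # weights, bias, learning rate

for epoch in range(100):
    # Step 1: forward propagation and loss
    z = 1.0 / (1.0 + np.exp(-(X @ w + b)))                 # sigmoid output
    loss = -np.mean(y * np.log(z + 1e-12) + (1 - y) * np.log(1 - z + 1e-12))
    # Step 2: backpropagate to get the gradient of the loss w.r.t. each weight
    dz = (z - y) / len(y)
    dw, db = X.T @ dz, dz.sum()
    # Step 3: update the weights with the gradients
    w -= alpha * dw
    b -= alpha * db
```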

III. Parameter tuning
1. Weight initialization
Xavier initialization: Instead of initializing the weights in a purely random manner, Xavier initialization makes it possible to have initial weights that take into account characteristics that are unique to the architecture.
Transfer learning: Training a deep learning model requires a lot of data and, more importantly, a lot of time. It is often useful to take advantage of pre-trained weights on huge datasets that took days or weeks to train, and leverage them for our own use case. Depending on how much data we have at hand, the pre-trained network can be reused in different ways, from freezing all layers and retraining only the final classifier (little data) to fine-tuning the whole network starting from the pre-trained weights (lots of data); a minimal sketch of the layer-freezing case follows below.
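For example, a minimal PyTorch sketch of the low-data case, freezing a pre-trained backbone and training only a new final layer (assumes torchvision is installed; the exact `weights` argument depends on the torchvision version, and `num_classes` is a placeholder):

```python
import torch.nn as nn
import torch.optim as optim
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")   # pre-trained backbone
for param in model.parameters():
    param.requires_grad = False                    # freeze all pre-trained layers

num_classes = 5                                    # hypothetical number of classes
model.fc = nn.Linear(model.fc.in_features, num_classes)  # new trainable head

# Only the parameters of the new head are updated
optimizer = optim.Adam(model.fc.parameters(), lr=1e-3)
```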

2. Optimizing convergence
Learning rate: The learning rate, often noted $\alpha$ or sometimes $\eta$, indicates at which pace the weights get updated. It can be fixed or adaptively changed. The current most popular method in practice is called Adam, which adapts the learning rate.
Adaptive learning rates: Letting the learning rate vary while training the model can reduce the training time and improve the numerical optimal solution. While the Adam optimizer is the most commonly used technique, others can also be useful. They are summarized in the table below:
| Method | Explanation | Update of $w$ | Update of $b$ |
| --- | --- | --- | --- |
| Momentum | Dampens oscillations; improvement over SGD; 2 parameters to tune | $v_{dw} \leftarrow \beta v_{dw} + (1-\beta)\,dw$, then $w \leftarrow w - \alpha v_{dw}$ | $v_{db} \leftarrow \beta v_{db} + (1-\beta)\,db$, then $b \leftarrow b - \alpha v_{db}$ |
| RMSprop | Root Mean Square propagation; speeds up the learning algorithm by controlling oscillations | $s_{dw} \leftarrow \beta s_{dw} + (1-\beta)\,dw^2$, then $w \leftarrow w - \alpha \frac{dw}{\sqrt{s_{dw}}}$ | $s_{db} \leftarrow \beta s_{db} + (1-\beta)\,db^2$, then $b \leftarrow b - \alpha \frac{db}{\sqrt{s_{db}}}$ |
| Adam | Adaptive Moment estimation; most popular method; 4 parameters to tune | $w \leftarrow w - \alpha \frac{v_{dw}}{\sqrt{s_{dw}}+\epsilon}$ (with bias-corrected $v_{dw}$, $s_{dw}$) | $b \leftarrow b - \alpha \frac{v_{db}}{\sqrt{s_{db}}+\epsilon}$ (with bias-corrected $v_{db}$, $s_{db}$) |
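A minimal NumPy sketch of one Adam step following the table above, with the usual bias correction (hyperparameter defaults are the commonly used ones):

```python
import numpy as np

def adam_step(w, dw, v, s, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. v, s are running first/second moment estimates, t is the step count (>= 1)."""
    v = beta1 * v + (1 - beta1) * dw            # momentum-like first moment
    s = beta2 * s + (1 - beta2) * dw ** 2       # RMSprop-like second moment
    v_hat = v / (1 - beta1 ** t)                # bias correction
    s_hat = s / (1 - beta2 ** t)
    w = w - alpha * v_hat / (np.sqrt(s_hat) + eps)
    return w, v, s
```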
IV. Regularization
Dropout: Dropout is a technique used in neural networks to prevent overfitting the training data by dropping out neurons with probability $p > 0$. It forces the model to avoid relying too much on particular sets of features.
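A minimal NumPy sketch of (inverted) dropout at training time, where `p` is the probability of dropping a unit as described above:

```python
import numpy as np

def dropout(x, p=0.5, rng=np.random.default_rng()):
    """Inverted dropout: zero units with probability p and rescale the rest."""
    mask = (rng.random(x.shape) >= p).astype(x.dtype)
    return x * mask / (1.0 - p)   # rescaling keeps the expected activation unchanged
# At test time, dropout is simply not applied.
```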

Weight regularization: In order to make sure that the weights are not too large and that the model is not overfitting the training set, regularization techniques are usually performed on the model weights. The main ones are summarized in the table below:
| LASSO | Ridge | Elastic Net |
| --- | --- | --- |
| Shrinks coefficients to 0; good for variable selection | Makes coefficients smaller | Tradeoff between variable selection and small coefficients |
| $... + \lambda\lVert\theta\rVert_1$, $\lambda\in\mathbb{R}$ | $... + \lambda\lVert\theta\rVert_2^2$, $\lambda\in\mathbb{R}$ | $... + \lambda\big[(1-\alpha)\lVert\theta\rVert_1 + \alpha\lVert\theta\rVert_2^2\big]$, $\lambda\in\mathbb{R},\ \alpha\in[0,1]$ |
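As an illustration, a small NumPy sketch that adds one of these penalties to an arbitrary data loss (`data_loss`, `theta` and the default $\lambda$, $\alpha$ values are placeholders):

```python
import numpy as np

def regularized_loss(data_loss, theta, lam=0.01, alpha=0.5, kind="elastic_net"):
    """Add a LASSO, Ridge or Elastic Net penalty on the weights theta."""
    if kind == "lasso":
        penalty = lam * np.sum(np.abs(theta))
    elif kind == "ridge":
        penalty = lam * np.sum(theta ** 2)
    else:  # elastic net: tradeoff controlled by alpha in [0, 1]
        penalty = lam * ((1 - alpha) * np.sum(np.abs(theta)) + alpha * np.sum(theta ** 2))
    return data_loss + penalty
```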
Early stopping: This regularization technique stops the training process as soon as the validation loss reaches a plateau or starts to increase.
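A minimal sketch of patience-based early stopping; `train_one_epoch` and `validation_loss` are hypothetical callables standing in for a real training and evaluation step:

```python
def train_with_early_stopping(train_one_epoch, validation_loss, max_epochs=1000, patience=5):
    """Stop training once the validation loss stops improving for `patience` epochs."""
    best_loss, wait = float("inf"), 0
    for _ in range(max_epochs):
        train_one_epoch()                  # hypothetical: one epoch of training
        val_loss = validation_loss()       # hypothetical: evaluate on the validation set
        if val_loss < best_loss:
            best_loss, wait = val_loss, 0  # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                break                      # validation loss has plateaued or increased
    return best_loss
```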

V. Good practices
Overfitting a small batch: When debugging a model, it is often useful to make quick tests to see if there is any major issue with the architecture of the model itself. In particular, in order to make sure that the model can be properly trained, a single mini-batch is passed through the network to see whether the model can overfit it. If it cannot, it means that the model is either too complex or not complex enough to even overfit a small batch, let alone a normal-sized training set; a minimal sketch of this check follows below.
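A minimal sketch of this sanity check; `train_step` and `compute_loss` are hypothetical callables that repeatedly update and evaluate the model on the same fixed mini-batch:

```python
def can_overfit_one_batch(train_step, compute_loss, steps=500, tol=1e-2):
    """Sanity check: repeatedly train on one fixed mini-batch; the loss should go to ~0."""
    for _ in range(steps):
        train_step()              # hypothetical: one gradient update on the same batch
    return compute_loss() < tol   # True if the model managed to overfit the batch
```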
Gradient checking: Gradient checking is a method used during the implementation of the backward pass of a neural network. It compares the value of the analytical gradient to the numerical gradient at given points and plays the role of a sanity check for correctness.
| Type | Numerical gradient | Analytical gradient |
| --- | --- | --- |
| Formula | $\frac{df}{dx}(x) \approx \frac{f(x+h) - f(x-h)}{2h}$ | $\frac{df}{dx}(x) = f'(x)$ |
| Comments | Expensive; the loss has to be computed two times per dimension. Used to verify the correctness of the analytical implementation. Trade-off in choosing $h$: not too small (numerical instability) nor too large (poor gradient approximation) | 'Exact' result. Direct computation. Used in the final implementation |
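A minimal sketch of this check on a toy scalar function:

```python
def f(x):
    return x ** 3                  # toy function

def analytical_grad(x):
    return 3 * x ** 2              # hand-derived derivative of f

def numerical_grad(f, x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2 * h)   # centered finite difference

x = 2.0
num, ana = numerical_grad(f, x), analytical_grad(x)
print(abs(num - ana) / max(abs(num), abs(ana)))   # relative error should be tiny
```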