Introduction to the Gradient Descent Method - Dark Horse Programmer Machine Learning Handout
2022-06-24 04:36:00 【Dark Horse Programmer Official】
Learning goals

- Know the principle of the full gradient descent algorithm
- Know the principle of the stochastic gradient descent algorithm
- Know the principle of the stochastic average gradient descent algorithm
- Know the principle of the mini-batch gradient descent algorithm
In the previous section we introduced the most basic implementation of the gradient descent method. The common gradient descent algorithms are:

- Full gradient descent (FG)
- Stochastic gradient descent (SG)
- Mini-batch gradient descent (mini-batch)
- Stochastic average gradient descent (SAG)

All of them adjust the weight vector by computing a gradient for each weight and updating the weights so as to minimize the objective function. They differ only in how the training samples are used.
1 Full gradient descent algorithm (FG)
The errors of all samples in the training set are computed, summed, and averaged to form the objective function.

The weight vector is moved in the direction opposite to its gradient, which reduces the current objective function the most.

Because every update requires computing the gradients over the entire data set, batch gradient descent is very slow, and it cannot handle data sets that exceed memory capacity.

Batch gradient descent also cannot update the model online, i.e., no new samples can be added while it is running.
It computes the gradient of the loss function with respect to the parameters θ over the whole training data set:

$$\theta_{t+1} = \theta_t - \eta \cdot \frac{1}{n}\sum_{i=1}^{n}\nabla_\theta J\bigl(\theta_t;\, x^{(i)}, y^{(i)}\bigr)$$

where η is the learning rate and n is the number of training samples.
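As a concrete illustration, here is a minimal NumPy sketch of FG on a synthetic least-squares problem. The data, learning rate, and function names are assumptions made for this example, not part of the handout:

```python
import numpy as np

# Hypothetical synthetic data: linear regression with squared loss.
rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=n)

def full_gradient_descent(X, y, lr=0.1, n_iters=200):
    """FG: every update averages the gradient over ALL n samples."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        grad = X.T @ (X @ w - y) / n   # gradient of J(w) = mean((Xw - y)^2) / 2
        w -= lr * grad                 # step against the gradient
    return w

print(full_gradient_descent(X, y))     # should approach true_w
```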
2 Stochastic gradient descent algorithm (SG)
Because FG computes the errors of all samples at every iteration to update the weights, and practical problems often involve hundreds of millions of training samples, it is inefficient and prone to getting stuck in local optima. The stochastic gradient descent algorithm was therefore proposed.

The objective function of each round is no longer the error over all samples but the error of a single sample: compute the gradient of the objective function on one sample to update the weights, then draw another sample and repeat, until the loss value stops decreasing or falls below a tolerable threshold.
This process is simple and efficient, and it can generally avoid converging to a local optimum. Its iterative form is

$$\theta_{t+1} = \theta_t - \eta \cdot \nabla_\theta J\bigl(\theta_t;\, x^{(i)}, y^{(i)}\bigr)$$
where x^{(i)} denotes the feature values of one training sample and y^{(i)} its label.

However, because SG uses only one sample per iteration, it can easily fall into a local optimum when the data are noisy.
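Under the same assumed setup as the FG sketch, a minimal SG sketch looks like this; note that each weight update touches exactly one sample:

```python
import numpy as np

# Hypothetical synthetic data, as in the FG sketch above.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

def stochastic_gradient_descent(X, y, lr=0.01, n_epochs=50):
    """SG: each update uses the gradient of ONE sample only."""
    n, d = X.shape
    w = np.zeros(d)
    order = np.random.default_rng(1)
    for _ in range(n_epochs):
        for i in order.permutation(n):          # visit samples in random order
            grad_i = X[i] * (X[i] @ w - y[i])   # gradient of sample i: (x_i·w - y_i) * x_i
            w -= lr * grad_i
    return w

print(stochastic_gradient_descent(X, y))
```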
3 Mini-batch gradient descent algorithm (mini-batch)
The mini-batch gradient descent algorithm is a compromise between FG and SG, and to some extent it combines the advantages of both.

Each time, a small batch of samples is drawn at random from the training set, and an FG-style iteration is performed on that batch to update the weights.

The number of samples in the batch is called the batch_size, usually set to a power of 2, which is more amenable to GPU acceleration.
In particular, if batch_size = 1 it reduces to SG; if batch_size = n it reduces to FG. Its iterative form is

$$\theta_{t+1} = \theta_t - \eta \cdot \frac{1}{b}\sum_{k=i}^{i+b-1}\nabla_\theta J\bigl(\theta_t;\, x^{(k)}, y^{(k)}\bigr)$$

where b is the batch_size.
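A minimal mini-batch sketch under the same assumed setup; setting batch_size to 1 or to n recovers SG or FG respectively, as noted above:

```python
import numpy as np

# Hypothetical synthetic data, as in the earlier sketches.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

def minibatch_gradient_descent(X, y, lr=0.05, batch_size=16, n_epochs=50):
    """mini-batch: each update averages the gradient over a small random batch."""
    n, d = X.shape
    w = np.zeros(d)
    order = np.random.default_rng(2)
    for _ in range(n_epochs):
        idx = order.permutation(n)                      # reshuffle once per epoch
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]           # indices of the current batch
            grad = X[b].T @ (X[b] @ w - y[b]) / len(b)  # averaged batch gradient
            w -= lr * grad
    return w

print(minibatch_gradient_descent(X, y))
```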
4 Stochastic average gradient descent algorithm (SAG)
Although the SG method avoids the high computational cost per iteration, its results on large-scale training data are often unsatisfactory, because each round's gradient update is completely independent of the data and gradients of previous rounds.

The stochastic average gradient algorithm overcomes this problem: it maintains an old gradient in memory for every sample. At each step it randomly selects the i-th sample and refreshes the stored gradient of that sample, while the stored gradients of all other samples stay unchanged; it then takes the average of all stored gradients and uses it to update the parameters.

In this way, each round only needs to compute the gradient of one sample, so the computational cost is equivalent to SG, but convergence is much faster.
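The bookkeeping described above can be expressed in a short sketch. Again the data and hyperparameters are illustrative assumptions; the key point is the table of per-sample gradients and the O(d) update of their running sum:

```python
import numpy as np

# Hypothetical synthetic data, as in the earlier sketches.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

def sag(X, y, lr=0.1, n_iters=3000):
    """SAG: refresh the stored gradient of one random sample per step,
    then move along the average of all stored gradients."""
    n, d = X.shape
    w = np.zeros(d)
    grad_table = np.zeros((n, d))   # last computed gradient of each sample
    grad_sum = np.zeros(d)          # running sum of the rows of grad_table
    pick = np.random.default_rng(3)
    for _ in range(n_iters):
        i = pick.integers(n)                 # choose one sample at random
        g_new = X[i] * (X[i] @ w - y[i])     # fresh gradient of sample i only
        grad_sum += g_new - grad_table[i]    # keep the sum current in O(d)
        grad_table[i] = g_new
        w -= lr * grad_sum / n               # average over all n stored gradients
    return w

print(sag(X, y))
```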
5 Summary
- Full gradient descent algorithm (FG)【know】
  - Computes the average error over all samples and uses it as the objective function
- Stochastic gradient descent algorithm (SG)【know】
  - Selects only one sample per update
- Mini-batch gradient descent algorithm (mini-batch)【know】
  - Selects a subset of the samples per update
- Stochastic average gradient descent algorithm (SAG)【know】
  - Maintains a stored gradient for each sample and refers to their average in later updates