Introduction to gradient descent methods - Dark Horse Programmer machine learning handout
2022-06-24 04:36:00 【Dark horse programmer official】
Learning goals
- Know the principle of the full gradient descent algorithm
- Know the principle of the stochastic gradient descent algorithm
- Know the principle of the stochastic average gradient descent algorithm
- Know the principle of the mini-batch gradient descent algorithm
In the previous section we introduced the most basic implementation of gradient descent. Common gradient descent algorithms include:
- Full gradient descent algorithm (Full gradient descent)
- Stochastic gradient descent algorithm (Stochastic gradient descent)
- Mini-batch gradient descent algorithm (Mini-batch gradient descent)
- Stochastic average gradient descent algorithm (Stochastic average gradient descent)
All of them adjust the weight vector by computing a gradient for each weight and updating the weights accordingly, aiming to minimize the objective function as far as possible. They differ only in how the training samples are used.
1 Full gradient descent algorithm (FG)
It computes the error of every sample in the training set, sums them and takes the average as the objective function.
The weight vector is moved in the direction opposite to its gradient, which reduces the current objective function the most.
Because every update requires computing the gradients over the entire dataset, batch gradient descent is very slow, and it cannot handle datasets that exceed the memory limit.
Batch gradient descent also cannot update the model online, i.e., no new samples can be added while it is running.
It computes the gradient of the loss function with respect to the parameters θ over the whole training dataset:

θ = θ − η · ∇θ J(θ)

where η is the learning rate.
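As an illustration, here is a minimal NumPy sketch of full gradient descent, assuming a simple mean-squared-error linear model on synthetic data (the data, learning rate and iteration count are illustrative assumptions, not values from the handout):

```python
import numpy as np

# Synthetic linear-regression data for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

def full_gradient_descent(X, y, lr=0.1, n_iters=200):
    """Full (batch) gradient descent for a mean-squared-error linear model."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        # every single update uses all m samples of the training set
        grad = X.T @ (X @ theta - y) / m
        theta -= lr * grad
    return theta

print(full_gradient_descent(X, y))  # should be close to [1.5, -2.0, 0.5]
```

Note how the whole matrix X appears in every update; this is exactly why FG becomes slow and memory-bound on large datasets.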
2 Stochastic gradient descent algorithm (SG)
Because FG computes the errors of all samples in every iteration to update the weights, and practical problems often involve hundreds of millions of training samples, it is inefficient and easily falls into a local optimum. The stochastic gradient descent algorithm was therefore proposed.
The objective function computed in each round is no longer the error over all samples but the error of a single sample: the gradient of one sample's objective function is computed to update the weights, then another sample is drawn and the process is repeated, until the loss value stops decreasing or drops below a tolerable threshold.
This process is simple and efficient, and it can generally avoid the update iterations converging to a local optimum. Its iterative form is

θ = θ − η · ∇θ J(θ; x(i), y(i))

where x(i) denotes the feature values of one training sample and y(i) denotes its label value.
However, because SG uses only one sample per iteration, it easily falls into a local optimum when the data are noisy.
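A minimal sketch of SG under the same assumptions as the FG example (synthetic linear-regression data and squared-error loss; learning rate and epoch count are illustrative):

```python
import numpy as np

# Same style of synthetic data as in the FG sketch (illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

def stochastic_gradient_descent(X, y, lr=0.01, n_epochs=50):
    """SG: each weight update uses the gradient of a single sample."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for i in rng.permutation(len(y)):      # visit samples in random order
            xi, yi = X[i], y[i]
            grad = xi * (xi @ theta - yi)      # gradient of one sample's squared error
            theta -= lr * grad
    return theta

print(stochastic_gradient_descent(X, y))
```

Each update touches only one row of X, so a single step is cheap, but the gradient estimate is noisy, which is the trade-off described above.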
3 Mini-batch gradient descent algorithm (mini-batch)
The mini-batch gradient descent algorithm is a compromise between FG and SG that, to some extent, combines the advantages of both methods.
Each time, a small set of samples is drawn at random from the training set, and an FG-style weight update is performed on this small set.
The number of samples in the extracted set is called batch_size; it is usually set to a power of 2, which is more convenient for GPU-accelerated processing.
In particular, if batch_size=1 it degenerates into SG, and if batch_size=n (the size of the training set) it becomes FG. Its iterative form is

θ = θ − η · ∇θ J(θ; x(i:i+batch_size), y(i:i+batch_size))
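A minimal sketch of mini-batch gradient descent under the same illustrative assumptions (synthetic data, squared-error loss; batch_size, learning rate and epoch count chosen arbitrarily):

```python
import numpy as np

# Same style of synthetic data as in the earlier sketches (illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

def minibatch_gradient_descent(X, y, lr=0.05, n_epochs=50, batch_size=16):
    """Mini-batch GD: each update averages the gradient over batch_size samples."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        idx = rng.permutation(m)                       # reshuffle every epoch
        for start in range(0, m, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            grad = Xb.T @ (Xb @ theta - yb) / len(batch)
            theta -= lr * grad
    return theta

print(minibatch_gradient_descent(X, y))
```

Setting batch_size=1 in this sketch reproduces SG, and batch_size=len(y) reproduces FG, mirroring the special cases mentioned above.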
4 Stochastic average gradient descent algorithm (SAG)
Although the SG method avoids the high computational cost, its performance on large-scale training data is often unsatisfactory, because each round's gradient update is completely independent of the data and gradients of previous rounds.
The stochastic average gradient algorithm overcomes this problem by keeping an old gradient in memory for every sample: at each step it randomly selects the i-th sample, updates that sample's stored gradient while leaving the others unchanged, then takes the average of all stored gradients and uses it to update the parameters.
In this way, each update only needs to compute the gradient of one sample, so the computational cost is equivalent to SG, but convergence is much faster.
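A minimal sketch of the SAG idea under the same illustrative assumptions (synthetic linear-regression data, squared-error loss; step size and iteration count chosen arbitrarily):

```python
import numpy as np

# Same style of synthetic data as in the earlier sketches (illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

def stochastic_average_gradient(X, y, lr=0.05, n_iters=3000):
    """SAG: store one gradient per sample, refresh a single entry each step,
    and step along the average of all stored gradients."""
    m, d = X.shape
    theta = np.zeros(d)
    grad_table = np.zeros((m, d))   # remembered (old) gradient of every sample
    grad_sum = np.zeros(d)          # running sum of the remembered gradients
    for _ in range(n_iters):
        i = rng.integers(m)                        # pick one sample at random
        new_grad = X[i] * (X[i] @ theta - y[i])    # fresh gradient of sample i
        grad_sum += new_grad - grad_table[i]       # replace sample i's old gradient
        grad_table[i] = new_grad
        theta -= lr * grad_sum / m                 # step along the average gradient
    return theta

print(stochastic_average_gradient(X, y))
```

Only one fresh gradient is computed per step (the same per-step cost as SG), while the averaged memory of old gradients reduces the variance of the update direction.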
5 Summary
- Full gradient descent algorithm (FG)【know】
  - Computes the average error over all samples and uses it as the objective function
- Stochastic gradient descent algorithm (SG)【know】
  - Selects only one sample at a time to compute the update
- Mini-batch gradient descent algorithm (mini-batch)【know】
  - Selects a subset of the samples to compute the update
- Stochastic average gradient descent algorithm (SAG)【know】
  - Keeps a stored (old) gradient for each sample; later updates refer to the average of these stored values