Working principle of gradient descent algorithm in machine learning
2020-11-06 01:14:00 【Artificial intelligence meets pioneer】
Author: NIKIL_REDDY | Compiled by: VK | Source: Analytics Vidhya
Introduction
Gradient descent is one of the most commonly used machine learning algorithms in industry, yet it confuses many newcomers.
If you are new to machine learning, the math behind gradient descent is not easy. In this article, my goal is to help you understand the intuition behind it.
We will briefly cover the role of the cost function, how gradient descent works, and how to choose the learning rate.
What is the cost function
It is a function that measures the performance of a model on any given data. The cost function quantifies the error between the predicted values and the expected values and expresses it as a single real number.
After making a hypothesis with initial parameters, we calculate the cost function. With the goal of reducing the cost, we then modify the parameters by running the gradient descent algorithm over the given data.
[Image: mathematical representation of the cost function]
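The text does not spell out a specific cost function. As an assumption for illustration, the sketch below uses half the mean squared error of a linear model, a common choice; the names `predict` and `cost` and the data are hypothetical.

```python
def predict(theta0, theta1, x):
    """Linear hypothesis h(x) = theta0 + theta1 * x (an assumed model)."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Half the mean squared error between predictions and expected values."""
    m = len(xs)
    total = sum((predict(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)


# Hypothetical data generated by y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

print(cost(0.0, 2.0, xs, ys))  # parameters that fit the data exactly
print(cost(0.0, 1.0, xs, ys))  # a worse fit gives a larger cost
```

The cost collapses all prediction errors into a single real number, which is what lets an optimizer compare one set of parameters against another.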
What is gradient descent
Suppose you are playing a game in which the players stand at the top of a mountain and are asked to reach its lowest point, where there is a lake. They are also blindfolded. So how do you think you could get to the lake?
Before you read on, take a moment to think about it.
The best way is to feel the ground around you and find where it slopes downward. From that position, take a step downhill, and repeat the process until you reach the lowest point.
Gradient descent is an iterative optimization algorithm for finding a local minimum of a function.
To find a local minimum with gradient descent, we must step in the direction of the negative gradient of the function at the current point (away from the gradient). If we instead step in the direction of the positive gradient, we approach a local maximum of the function; that procedure is called gradient ascent.
Gradient descent was originally proposed by Cauchy in 1847. It is also known as steepest descent.
The goal of the gradient descent algorithm is to minimize a given function (for example, the cost function). To achieve this, it iteratively performs two steps:
- Compute the gradient (slope), the first-order derivative of the function at the current point
- Take a step (move) in the direction opposite to the gradient
The size of each move is scaled by alpha (α), the learning rate, a tuning parameter of the optimization process. It determines the step size.
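The two steps can be sketched in a few lines. This is a minimal illustration rather than a definitive implementation: the function f(θ) = (θ − 3)², the helper names `gradient_descent` and `grad_f`, and the values of `alpha` and `n_steps` are all assumptions chosen for the example.

```python
def gradient_descent(grad, theta, alpha, n_steps):
    """Repeat the two steps: evaluate the gradient, then move against it."""
    for _ in range(n_steps):
        g = grad(theta)            # step 1: gradient at the current point
        theta = theta - alpha * g  # step 2: move opposite to the gradient
    return theta


# Hypothetical example function f(theta) = (theta - 3)**2, minimum at theta = 3.
def grad_f(theta):
    return 2 * (theta - 3)


theta_min = gradient_descent(grad_f, theta=0.0, alpha=0.1, n_steps=100)
print(theta_min)  # close to 3
```

Flipping the sign in the update (adding `alpha * g` instead of subtracting it) would turn the same loop into gradient ascent.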
Plotting the gradient descent algorithm
When we have a single parameter (θ), we can plot the cost, the dependent variable, on the y-axis and θ on the x-axis. With two parameters, we can make a three-dimensional plot, with the cost on one axis and the two parameters (θ) on the other two.
It can also be visualized with contours. A contour plot shows a three-dimensional surface in two dimensions, with the parameters along the two axes and the response value as contour lines. The response value increases away from the center, growing with each successive ring.
α: the learning rate
We now know the direction to move in; next we must decide the size of the step to take.
It has to be chosen carefully to reach a local minimum.
- If the learning rate is too high, we may overshoot the minimum and never reach it
- If the learning rate is too low, training may take too long
a) Optimal learning rate: the model converges to the minimum
b) Learning rate too low: it takes more time, but still converges to the minimum
c) Learning rate higher than the optimal value: convergence is slower (1/c < η < 2/c)
d) Learning rate very high: the steps overshoot by too much, the parameters move away from the minimum, and learning performance degrades
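The four regimes can be reproduced on a toy problem. The sketch below assumes that c denotes the curvature (second derivative) of a quadratic cost f(θ) = (c/2)·θ², which the text does not define; the function `run`, the value c = 2, and the chosen learning rates are illustrative assumptions.

```python
def run(eta, c=2.0, theta=1.0, n_steps=20):
    """Gradient descent on f(theta) = (c / 2) * theta**2.

    Returns the distance |theta| from the minimum (at 0) after each step.
    """
    history = [abs(theta)]
    for _ in range(n_steps):
        theta = theta - eta * c * theta  # the gradient of f is c * theta
        history.append(abs(theta))
    return history


c = 2.0
cases = {
    "a) eta = 1/c (optimal)": 1.0 / c,
    "b) eta too small": 0.05,
    "c) 1/c < eta < 2/c": 0.9,
    "d) eta > 2/c": 1.1,
}
for label, eta in cases.items():
    print(label, "-> distance from minimum after 20 steps:", run(eta, c)[-1])
```

On this quadratic each update multiplies θ by (1 − ηc), so the iterate lands on the minimum in one step when η = 1/c, shrinks slowly for small η, flips sign while shrinking for 1/c < η < 2/c, and grows without bound once η > 2/c.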
Note: as we move toward the local minimum, the gradient shrinks, and so does the step size. The learning rate (alpha) can therefore stay constant throughout the optimization and does not need to be changed from one iteration to the next.
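A short sketch of this effect, assuming the toy function f(θ) = θ² and arbitrary values for `alpha` and the starting point: the learning rate never changes, yet each step is smaller than the last because the gradient shrinks.

```python
alpha = 0.1   # kept constant for the whole run
theta = 5.0   # arbitrary starting point
step_sizes = []

for _ in range(10):
    gradient = 2 * theta       # derivative of f(theta) = theta**2
    step = alpha * gradient    # the step is proportional to the gradient
    step_sizes.append(abs(step))
    theta -= step

print(step_sizes)  # each entry is smaller than the previous one
```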
Local minimum
The cost function can have many minima. Gradient descent can settle in any of them, depending on the starting point (that is, the initial parameters θ) and the learning rate. As a result, optimization can converge to different points for different starting points and learning rates.
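To make the dependence on the starting point concrete, the sketch below runs the same descent from two different initial values on a function with two minima. The function f(θ) = θ⁴ − 3θ² + θ, the helper names, and the settings are hypothetical choices for illustration.

```python
def grad(theta):
    """Derivative of the assumed non-convex function theta**4 - 3*theta**2 + theta."""
    return 4 * theta ** 3 - 6 * theta + 1


def descend(theta, alpha=0.01, n_steps=1000):
    """Plain gradient descent with a fixed learning rate."""
    for _ in range(n_steps):
        theta -= alpha * grad(theta)
    return theta


left = descend(-2.0)   # starts on the left slope
right = descend(2.0)   # starts on the right slope
print(left, right)     # two different local minima
```

Both runs use the same learning rate and the same number of steps; only the initial θ differs, and each run ends in the minimum of the basin it started in.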
Gradient descent Python code implementation
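As a minimal sketch (not the original article's code), the following fits a straight line by batch gradient descent on half the mean squared error. The data, the function name `gradient_descent_linear`, and the values of `alpha` and `n_iters` are illustrative assumptions.

```python
def gradient_descent_linear(xs, ys, alpha=0.05, n_iters=2000):
    """Fit y ~ theta0 + theta1 * x by batch gradient descent on half the MSE."""
    theta0, theta1 = 0.0, 0.0  # initial parameters
    m = len(xs)
    for _ in range(n_iters):
        # Prediction errors under the current parameters.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                             # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m  # dJ/dtheta1
        # Both gradients are computed before either parameter changes,
        # so the update is simultaneous.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


# Hypothetical data generated by y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

theta0, theta1 = gradient_descent_linear(xs, ys)
print(theta0, theta1)  # approaches 1 and 2
```

Pure Python lists keep the example dependency-free; with larger datasets, a vectorized version using an array library would be the more usual choice.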
Conclusion
Once we have tuned the learning parameter (alpha) and obtained the optimal learning rate, we iterate until we converge to a local minimum.
Link to the original text: https://www.analyticsvidhya.com/blog/2020/10/how-does-the-gradient-descent-algorithm-work-in-machine-learning/