The principle of the normal equation method and how it differs from gradient descent
2022-07-26 22:01:00 【TranSad】
The normal equation method is an alternative to gradient descent for solving multiple linear regression problems. Whereas gradient descent needs repeated iterative updates, the normal equation method obtains the optimal parameters by solving a system of equations once. This article briefly introduces its principle and how it differs from gradient descent.
Let's start with a picture that sets up the problem. In the figure below, J(θ1) is the loss function (cost function). Suppose it has only one parameter, θ1. Using the update formula shown in the figure (gradient descent), we can find a local or global optimum.

Gradient descent is not our topic here, though. What is worth noting in the figure above is that, at the optimum, the derivative of J(θ1) with respect to θ1 is 0. Now move to higher dimensions (figure below), with two parameters θ0 and θ1. The optimal solution is at the position indicated by the arrow in the figure. Similarly, at this point the partial derivative of J(θ0, θ1) with respect to every parameter is 0.

First, a little background:
Suppose we currently have only two parameters to optimize, θ0 and θ1. The fitted line (the hypothesis) is then:

The loss function is:

Now let's extend the parameters to θ0 through θn. Since the derivative with respect to every parameter is 0 at the optimum, we can write these conditions down first:

Isn't this just a system of equations? There are n+1 unknowns (θ0 to θn) and n+1 equations. Solving the system gives the optimal θ. After solving and rearranging, θ takes the final form:

That is the whole idea behind the normal equation method.
Wait a minute! Having seen the form of the final result, do you feel we took a long detour? Indeed, θ is not easy to solve for in the way above (even though I have written down the solution). There is another, equivalent and more direct route to the same result.
To show how θ is actually computed, here is a concrete house-price prediction example. It makes the shapes of X, y, and θ and the relationship among them clearer, and it leads to the other way of reaching the normal equation. In this example we want to use an existing data set to fit a regression model with 5 parameters, θ0 to θ4. See the figure:

The essence of this idea is: to find the optimal θ, we simply assume Xθ = y and solve backwards for θ. This is not rigorous (in general no θ satisfies Xθ = y exactly, and what we obtain is the least-squares solution), but it is concise.
In step ①, to isolate θ we would like to multiply both sides by the inverse of X. But X is usually not square, so it has no inverse. We therefore left-multiply both sides by the transpose of X, which turns the coefficient of θ into the square matrix XᵀX. In step ②, we left-multiply both sides by the inverse of that square matrix (assuming XᵀX is invertible), which moves it to the right-hand side and gives the final solution θ = (XᵀX)⁻¹Xᵀy.
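The two steps can be sketched in NumPy as follows. The data set is hypothetical (it is not the one in the figure): 8 houses with 4 made-up features each, so that θ has the 5 components θ0 to θ4. The feature names and all numbers are assumptions for illustration only.

```python
import numpy as np

# Hypothetical features per house (illustrative values, not from the article):
# area (in 100 m^2), bedrooms, floors, age (in decades)
features = np.array([
    [2.1, 5.0, 1.0, 4.5],
    [1.4, 3.0, 2.0, 4.0],
    [1.5, 3.0, 2.0, 3.0],
    [0.9, 2.0, 1.0, 3.6],
    [1.9, 4.0, 1.0, 1.2],
    [1.1, 2.0, 2.0, 2.0],
    [1.7, 4.0, 2.0, 0.5],
    [1.2, 3.0, 1.0, 2.5],
])
# Hypothetical prices (in thousands)
y = np.array([460.0, 232.0, 315.0, 178.0, 405.0, 240.0, 390.0, 255.0])

# Prepend a column of ones so that theta0 acts as the intercept.
X = np.hstack([np.ones((features.shape[0], 1)), features])

# Step 1: left-multiply both sides of X theta = y by X^T,
# giving the square system (X^T X) theta = X^T y.
XtX = X.T @ X
Xty = X.T @ y

# Step 2: left-multiply by the inverse of X^T X to isolate theta.
theta = np.linalg.inv(XtX) @ Xty

print("theta0..theta4:", theta)
```

The explicit inverse mirrors the derivation above. In practice, `np.linalg.solve(XtX, Xty)` or `np.linalg.lstsq(X, y, rcond=None)` is usually the safer choice numerically, and `lstsq` also copes with a singular XᵀX.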
The difference between gradient descent and the normal equation method
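1. Gradient descent requires choosing a learning rate α; the normal equation method does not.
2. Gradient descent requires many iterations; the normal equation method solves for θ directly.
3. When the number of features n is large, gradient descent is more suitable, because the matrix operations in the normal equation method become very slow (inverting the (n+1)×(n+1) matrix XᵀX costs roughly O(n³)).
4. On a non-convex loss, gradient descent can get stuck in a local optimum; the normal equation method yields the optimum directly. For linear regression with a squared-error loss the cost is convex, so gradient descent with a suitable learning rate also converges to the global optimum.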
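The comparison above can be tried out with a small sketch on synthetic data. Everything here is an illustrative assumption: the data is randomly generated, and the learning rate and iteration count were picked by hand. The normal equation needs one linear solve, while gradient descent needs α and a loop, yet both end up at essentially the same θ because the squared-error cost is convex.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression problem: m samples, n features (all values are made up).
m, n = 200, 3
features = rng.normal(size=(m, n))
X = np.hstack([np.ones((m, 1)), features])   # leading column of ones for theta0
true_theta = np.array([4.0, 2.0, -1.0, 0.5])
y = X @ true_theta + 0.1 * rng.normal(size=m)

# Normal equation: no learning rate, no iterations, one linear solve.
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent: needs a learning rate alpha and many iterations.
alpha = 0.1
theta_gd = np.zeros(n + 1)
for _ in range(2000):
    gradient = X.T @ (X @ theta_gd - y) / m   # gradient of (1/2m) * ||X theta - y||^2
    theta_gd -= alpha * gradient

print("normal equation :", theta_ne)
print("gradient descent:", theta_gd)
```

With many more features, the solve on the (n+1)×(n+1) matrix becomes the bottleneck, which is where the iterative method starts to pay off.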