Estimating the coefficients of a linear regression equation by the least squares method
2022-08-02 15:31:00 【Yang Laotou Soft Worker】
I. Description of the problem
Simple (univariate) linear regression is one of the most basic regression models. It describes the linear trend between two variables and can then be used to predict values at points that were not observed.
Regression analysis chooses a regression function (equation) according to the trend of the known data. The coefficients of that function are unknown, so numerical or statistical methods are used to estimate them.
Univariate linear regression estimates the coefficients k and b in the equation y = kx + b. Common approaches include the least squares method from computational mathematics, maximum likelihood estimation from statistics, and machine learning methods such as the perceptron. The coefficients can also be obtained directly with matrix operations, which in fact solve the same minimization problem.
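The matrix route mentioned above can be sketched as follows. This is a minimal illustration rather than the article's own program: the sample data and the names A, coef and coefNormal are assumptions chosen for the example. It stacks the equations y_i = k*x_i + b into A*[k; b] = y and lets the backslash operator return the least-squares solution.

```matlab
% Sketch: least-squares line fit via matrix operations (illustrative data).
x = [0; 1; 2; 3];            % sample inputs (assumed for this example)
y = [1.1; 2.9; 5.2; 6.8];    % sample outputs, roughly y = 2x + 1

A = [x, ones(size(x))];      % design matrix: one column for k, one for b
coef = A \ y;                % backslash returns the least-squares solution
k = coef(1);
b = coef(2);

% The same coefficients from the normal equations (A'*A)*coef = A'*y
coefNormal = (A' * A) \ (A' * y);

fprintf('k = %.4f, b = %.4f\n', k, b);
```

For a two-parameter line both routes agree closely; the backslash form is usually preferred because it does not form A'*A explicitly.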
This article fits data generated from y = 2x + 1 and y = -2x + 5 as examples, describes how the least squares method estimates the regression coefficients, and implements it in MATLAB.
II. Mathematical derivation

[Figure: scatter plot of the data points with a red fitted line and a black segment joining each point to the line]
Problem description:
Suppose the data points (xi, yi), i = 1, ..., n, are known and their scatter plot roughly follows a linear trend, as in the figure. The goal is to find the equation of the red straight line.
The least squares method estimates the undetermined coefficients of the regression function by minimizing the sum of the squared distances, measured in the y direction, from the known points to the regression line. These distances are the black line segments in the figure.
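The formulas are derived as follows.
Substituting each point (xi, yi) into y = kx + b gives the fitted value and its residual:
$$\hat{y}_i = k x_i + b, \qquad e_i = y_i - (k x_i + b), \qquad i = 1, \dots, n$$
Construct the least squares function (the sum of squared distances):
$$J(k, b) = \sum_{i=1}^{n} (y_i - k x_i - b)^2$$
Take the partial derivatives with respect to k and b and set them to zero:
$$\frac{\partial J}{\partial k} = -2 \sum_{i=1}^{n} x_i (y_i - k x_i - b) = 0$$
$$\frac{\partial J}{\partial b} = -2 \sum_{i=1}^{n} (y_i - k x_i - b) = 0$$
Dividing both sides of these equations by the number of data points n (and by the constant factor -2) gives:
$$\frac{1}{n} \sum_{i=1}^{n} x_i y_i - k \, \frac{1}{n} \sum_{i=1}^{n} x_i^2 - b \, \bar{x} = 0 \tag{1}$$
$$\bar{y} - k \, \bar{x} - b = 0 \tag{2}$$
Equation (2) can be rewritten as
$$b = \bar{y} - k \, \bar{x}$$
where
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$
Substituting $b = \bar{y} - k \, \bar{x}$ into (1) gives
$$\frac{1}{n} \sum_{i=1}^{n} x_i y_i - k \, \frac{1}{n} \sum_{i=1}^{n} x_i^2 - (\bar{y} - k \, \bar{x}) \, \bar{x} = 0$$
and therefore
$$k = \frac{\sum_{i=1}^{n} x_i y_i - n \, \bar{x} \, \bar{y}}{\sum_{i=1}^{n} x_i^2 - n \, \bar{x}^2}$$
Substituting k into $b = \bar{y} - k \, \bar{x}$ gives the coefficient b. Both coefficients of the regression equation are now determined.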
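A quick way to check the result of this derivation is to apply it to noise-free data, where the answer is known in advance. The sketch below uses the closed-form expressions k = (sum(x.*y) - n*xm*ym) / (sum(x.^2) - n*xm^2) and b = ym - k*xm, which are the expressions the program in the next section evaluates. The data and the variable names xm, ym, r, dk and db are assumptions made for the illustration.

```matlab
% Sketch: check the closed-form formulas on noise-free data from y = 2x + 1.
x = [0, 1, 2, 3, 4];         % illustrative inputs (assumed)
y = 2 * x + 1;               % exact line, so the fit should recover k = 2, b = 1

n  = numel(x);
xm = sum(x) / n;             % mean of x
ym = sum(y) / n;             % mean of y

k = (sum(x .* y) - n * xm * ym) / (sum(x .^ 2) - n * xm ^ 2);
b = ym - k * xm;

% Both partial derivatives of the sum of squared residuals vanish at (k, b)
r  = y - (k * x + b);        % residuals
dk = -2 * sum(x .* r);       % derivative with respect to k
db = -2 * sum(r);            % derivative with respect to b

fprintf('k = %g, b = %g, dJ/dk = %g, dJ/db = %g\n', k, b, dk, db);
```

With exact data the residuals are all zero, so both derivatives are zero and the recovered coefficients match the generating line.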
III. MATLAB program
1. Judge the trend of the data from a scatter plot
    % Generate 50 noisy samples of y = 2x + 1 on [0, 2]
    trainX = linspace( 0, 2, 50 );
    trainY = 2 * trainX + 1 + randn( size( trainX ) )*0.4;
    % Draw the samples as blue dots
    plot( trainX, trainY, 'b.', 'markersize', 20 )
This draws a scatter plot of the data.
The distribution of the points shows a roughly linear growth trend, so y = kx + b is a reasonable model for fitting this data set.
2. The regression coefficient is calculated as follows:
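    n = length( trainX );
    xu = sum( trainX ) / n;                        % mean of x
    yu = sum( trainY ) / n;                        % mean of y
    k1 = sum( trainX .* trainY ) - n * xu * yu;    % numerator of k
    k2 = sum( trainX .* trainX ) - n * xu * xu;    % denominator of k
    k = k1 / k2;
    b = yu - k * xu;
The result of one run is:
k = 1.8467, b = 1.2669
Because the noise is random, the exact values differ from run to run.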
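One way to gain confidence in the calculation above is to compare it with MATLAB's built-in polyfit, which fits a polynomial of a given degree in the least-squares sense. The sketch below repeats the calculation on freshly generated data and fits a degree-1 polynomial to the same data. The variable p is a name introduced for this illustration.

```matlab
% Sketch: cross-check the closed-form estimate against polyfit.
trainX = linspace(0, 2, 50);
trainY = 2 * trainX + 1 + randn(size(trainX)) * 0.4;   % same model as above

n  = length(trainX);
xu = sum(trainX) / n;
yu = sum(trainY) / n;
k  = (sum(trainX .* trainY) - n * xu * yu) / (sum(trainX .* trainX) - n * xu * xu);
b  = yu - k * xu;

p = polyfit(trainX, trainY, 1);   % p(1) is the slope, p(2) the intercept

fprintf('closed form: k = %.4f, b = %.4f\n', k, b);
fprintf('polyfit    : k = %.4f, b = %.4f\n', p(1), p(2));
```

The two lines of output should agree to within rounding error, since both solve the same minimization problem.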
3. The complete code is as follows:
    % Use the least squares method to estimate the coefficients k and b
    % in the linear regression function y = kx + b
    clear all
    clc
    % Generate training data
    trainX = linspace( 0, 2, 50 );
    trainY = 2 * trainX + 1 + randn( size( trainX ) )*0.4;
    % Draw a scatter plot
    plot( trainX, trainY, 'b.', 'markersize', 20 )
    % Estimate the regression coefficients
    n = length( trainX );
    xu = sum( trainX ) / n;
    yu = sum( trainY ) / n;
    k1 = sum( trainX .* trainY ) - n * xu * yu;
    k2 = sum( trainX .* trainX ) - n * xu * xu;
    k = k1 / k2;
    b = yu - k * xu;
    % Draw the regression function (straight line)
    hold on
    x = [ -0.5 ; 2.5 ];
    y = k * x + b; % regression equation
    plot( x, y, 'r', 'LineWidth', 2 );
    title( 'LSM : y = 2x + 1' )
    axis( [ -0.5, 2.5, -1, 7 ] )
Running the program plots the scatter points together with the fitted line in red.
Changing the statement
    trainY = 2 * trainX + 1 + randn( size( trainX ) )*0.4;
to a different functional relationship yields a different regression line. For example, changing it to
    trainY = -2 * trainX + 5 + randn( size( trainX ) )*0.4;
produces the fitted line for the second example, y = -2x + 5.
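Instead of editing the data-generation statement by hand, the estimate can be wrapped in small helpers and applied to several data sets. The sketch below does this for the two example relationships named in the introduction, y = 2x + 1 and y = -2x + 5, and then predicts a value outside the sampled range. The helpers slope and intercept are hypothetical names introduced here, not part of the original program.

```matlab
% Sketch: reuse the closed-form estimate for several data sets.
% slope and intercept are hypothetical helpers introduced for this example.
slope     = @(x, y) (sum(x .* y) - numel(x) * mean(x) * mean(y)) / ...
                    (sum(x .^ 2) - numel(x) * mean(x) ^ 2);
intercept = @(x, y, k) mean(y) - k * mean(x);

x     = linspace(0, 2, 50);
noise = 0.4 * randn(size(x));        % one noise sample shared by both data sets

yUp   =  2 * x + 1 + noise;          % data from y =  2x + 1
yDown = -2 * x + 5 + noise;          % data from y = -2x + 5

kUp   = slope(x, yUp);
bUp   = intercept(x, yUp, kUp);
kDown = slope(x, yDown);
bDown = intercept(x, yDown, kDown);

% Predict at x = 3, a point outside the sampled interval [0, 2]
yPred = kUp * 3 + bUp;

fprintf('y =  2x + 1 data: k = %.4f, b = %.4f\n', kUp, bUp);
fprintf('y = -2x + 5 data: k = %.4f, b = %.4f\n', kDown, bDown);
fprintf('prediction at x = 3 from the first fit: %.4f\n', yPred);
```

Because both data sets share the same noise, the two estimates deviate from their true coefficients by the same amount, which makes the effect of the noise easy to see.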
IV. Supplementary note
The least squares method estimates the parameters of a regression equation by finding the extremum of a function. Maximum likelihood estimation uses a different objective function, but when the errors are assumed to be independent and normally distributed with constant variance, its estimates of the regression coefficients are the same as those obtained directly by least squares.
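The agreement between the two methods can be illustrated numerically. The sketch below is built on stated assumptions rather than taken from the original program: the errors are modeled as independent Gaussian with a known standard deviation sigma, a deterministic sine term stands in for random noise so that the run is repeatable, and the negative log-likelihood is minimized with fminsearch from a rough starting guess.

```matlab
% Sketch: maximum likelihood under Gaussian noise versus least squares.
x = linspace(0, 2, 50);
y = 2 * x + 1 + 0.4 * sin(7 * x);    % deterministic stand-in for noise (assumed)

% Least-squares estimate (closed form)
n   = numel(x);
xm  = mean(x);
ym  = mean(y);
kLS = (sum(x .* y) - n * xm * ym) / (sum(x .^ 2) - n * xm ^ 2);
bLS = ym - kLS * xm;

% Negative log-likelihood for y_i ~ N(k*x_i + b, sigma^2), sigma assumed known
sigma = 0.4;
negLogLik = @(p) n / 2 * log(2 * pi * sigma ^ 2) + ...
                 sum((y - (p(1) * x + p(2))) .^ 2) / (2 * sigma ^ 2);

% Numerical minimization starting from k = 0, b = 0
pML = fminsearch(negLogLik, [0, 0]);

fprintf('least squares     : k = %.4f, b = %.4f\n', kLS, bLS);
fprintf('maximum likelihood: k = %.4f, b = %.4f\n', pML(1), pML(2));
```

The only part of the negative log-likelihood that depends on k and b is the sum of squared residuals, so the two estimates coincide up to the tolerance of the numerical optimizer.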