Linear Regression
2022-06-10 09:01:00 【Dreamer DBA】
Linear regression is a method for modeling the relationship between one or more independent variables and a dependent variable. It is a staple of statistics and is often considered a good introductory machine learning method. In this tutorial, you will discover the matrix formulation of linear regression and how to solve it using direct and matrix factorization methods. After completing this tutorial, you will know:
- Linear regression and the matrix reformulation with the normal equations.
- How to solve linear regression using a QR matrix decomposition.
- How to solve linear regression using SVD and the pseudoinverse.
1.1 Tutorial Overview
This tutorial is divided into seven parts; they are:
1. What is Linear Regression
2. Matrix Formulation of Linear Regression
3. Linear Regression Dataset
4. Solve via Inverse
5. Solve via QR Decomposition
6. Solve via SVD and Pseudoinverse
7. Solve via Convenience Function
1.2 What is Linear Regression
Linear regression is a method for modeling the relationship between scalar values: the input variable x and the output variable y. The model assumes that y is a linear function or a weighted sum of the input variables.

y = f(x)

Or, stated with the coefficients:

y = b0 + b1 * x1

The model can also be used to model an output variable given multiple input variables, called multivariate linear regression:

y = b0 + (b1 * x1) + (b2 * x2) + ...

The objective of creating a linear regression model is to find the values for the coefficients (b) that minimize the error in the prediction of the output variable y.
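As a minimal sketch of the weighted-sum form above (the coefficient values here are made up for illustration, not fitted from data):

```python
# Hypothetical coefficients: intercept b0 and weights b1, b2 (made-up values)
b0, b1, b2 = 0.1, 0.5, 2.0

def predict(x1, x2):
    # the model output is a weighted sum of the inputs plus the intercept
    return b0 + b1 * x1 + b2 * x2

print(predict(1.0, 2.0))  # 0.1 + 0.5*1.0 + 2.0*2.0 = 4.6
```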
1.3 Matrix Formulation of Linear Regression
Linear regression can be restated using matrix notation, where the problem becomes y = X · b: X is the matrix of input data (one row per example, one column per input variable), b is the vector of coefficients, and y is the vector of output values. Solving the normal equations for b gives the direct least squares solution:

b = (X^T · X)^-1 · X^T · y

This is the formulation that the direct solution below calculates.
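Note that the worked examples in this tutorial fit a single coefficient with no intercept. As a hedged sketch, the same matrix formulation also covers the intercept b0 by prepending a column of ones to X (the values here mirror the dataset used below):

```python
from numpy import array, hstack, ones

# the tutorial's single input column
X = array([[0.05], [0.18], [0.31], [0.42], [0.5]])
# prepend a column of ones so the first fitted coefficient acts as the intercept b0
Xb = hstack([ones((len(X), 1)), X])
print(Xb.shape)  # (5, 2)
```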
1.4 Linear Regression Dataset
# Example of defining a linear regression dataset
# linear regression dataset
from numpy import array
from matplotlib import pyplot
# define dataset
data = array([
[0.05, 0.12],
[0.18, 0.22],
[0.31, 0.35],
[0.42, 0.38],
[0.5, 0.49]
])
print(data)
# split into inputs and outputs
X,y = data[:,0],data[:,1]
X = X.reshape(len(X),1)
# scatter plot
pyplot.scatter(X, y)
pyplot.show()

Running the example first prints the defined dataset.

1.5 Solve via Inverse
The first approach is to attempt to solve the regression problem directly using the matrix inverse. That is, given X, what is the set of coefficients b that, when multiplied by X, will give y? As we saw in a previous section, the normal equations define how to calculate b directly:

b = (X^T · X)^-1 · X^T · y
# Example of calculating a linear regression solution directly
# direct solution to linear least squares
from numpy import array
from numpy.linalg import inv
from matplotlib import pyplot
# define dataset
data = array([
[0.05, 0.12],
[0.18, 0.22],
[0.31, 0.35],
[0.42, 0.38],
[0.5, 0.49]])
# split into inputs and outputs
X,y = data[:,0],data[:,1]
X = X.reshape((len(X), 1))
# linear least squares
b = inv(X.T.dot(X)).dot(X.T).dot(y)
print(b)
# predict using coefficients
yhat = X.dot(b)
# plot data and predictions
pyplot.scatter(X, y)
pyplot.plot(X, yhat, color='red')
pyplot.show()

Running the example performs the calculation and prints the coefficient vector b.
[1.00233226]
A scatter plot of the dataset is then created with a line plot for the model, showing a reasonable fit to the data.

A problem with this approach is the matrix inverse, which is both computationally expensive and numerically unstable. An alternative is to use a matrix decomposition to avoid this operation. We will look at two examples in the following sections.
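One common alternative, sketched here as an aside rather than part of the tutorial, is to hand the normal equations to numpy.linalg.solve, which solves the linear system without ever forming the inverse explicitly:

```python
from numpy import array
from numpy.linalg import solve

data = array([
    [0.05, 0.12],
    [0.18, 0.22],
    [0.31, 0.35],
    [0.42, 0.38],
    [0.5, 0.49]])
X, y = data[:, 0].reshape((-1, 1)), data[:, 1]
# solve the normal equations (X^T X) b = X^T y without computing the inverse
b = solve(X.T.dot(X), X.T.dot(y))
print(b)  # matches the direct solution
```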
1.6 Solve via QR Decomposition

The QR decomposition is an approach that avoids computing the matrix inverse of X^T · X. The data matrix is first factorized as X = Q · R, where Q has orthonormal columns and R is upper triangular; the coefficients are then calculated as:

b = R^-1 · Q^T · y

In NumPy, this approach looks as follows:

# Example of calculating coefficients via a QR decomposition
from numpy.linalg import qr
Q, R = qr(X)
b = inv(R).dot(Q.T).dot(y)

The complete example is listed below.

# Example of calculating a linear regression solution using a QR decomposition
# QR decomposition solution to linear least squares
from numpy import array
from numpy.linalg import inv
from numpy.linalg import qr
from matplotlib import pyplot
# define dataset
data = array([
[0.05, 0.12],
[0.18, 0.22],
[0.31, 0.35],
[0.42, 0.38],
[0.5, 0.49]
])
# split into inputs and outputs
X,y = data[:,0],data[:,1]
X = X.reshape((len(X),1))
# factorize
Q,R = qr(X)
b = inv(R).dot(Q.T).dot(y)
print(b)
# predict using coefficients
yhat = X.dot(b)
# plot data and predictions
pyplot.scatter(X,y)
pyplot.plot(X,yhat,color='red')
pyplot.show()

The QR decomposition approach is more computationally efficient and more numerically stable than calculating the normal equations directly, but it does not work for all data matrices.
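As a quick sanity check (a sketch, not part of the original tutorial), the factors returned by qr() satisfy the properties the solution above relies on: Q has orthonormal columns and the product Q · R reconstructs X:

```python
from numpy import array, allclose, eye
from numpy.linalg import qr

X = array([[0.05], [0.18], [0.31], [0.42], [0.5]])
Q, R = qr(X)
# Q has orthonormal columns: Q^T Q is the identity
print(allclose(Q.T.dot(Q), eye(Q.shape[1])))  # True
# the factorization reconstructs X
print(allclose(Q.dot(R), X))  # True
```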
1.7 Solve via SVD and Pseudoinverse

The singular value decomposition (SVD) offers a solution via the pseudoinverse, which is defined even for matrices where the QR approach fails:

b = X^+ · y

where X^+ is the pseudoinverse of X (the + is a superscript), computed from the SVD X = U · Σ · V^T as X^+ = V · D^+ · U^T. Here D^+ is the pseudoinverse of the diagonal matrix Σ, formed by taking the reciprocal of each nonzero singular value, and V^T is the transpose of V. NumPy provides the function pinv() to calculate the pseudoinverse directly. The complete example is listed below.
# Example of calculating a linear regression solution using an SVD
# SVD solution via pseudoinverse to linear least squares
from numpy import array
from numpy.linalg import pinv
from matplotlib import pyplot
# define dataset
data = array([
[0.05, 0.12],
[0.18, 0.22],
[0.31, 0.35],
[0.42, 0.38],
[0.5, 0.49]
])
# split into inputs and outputs
X,y = data[:,0],data[:,1]
X = X.reshape((len(X),1))
# calculate coefficients
b = pinv(X).dot(y)
print(b)
# predict using coefficients
yhat = X.dot(b)
# plot data and predictions
pyplot.scatter(X,y)
pyplot.plot(X, yhat,color='red')
pyplot.show()
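To connect pinv() back to the formula above, here is a hedged sketch that builds the pseudoinverse manually from the SVD factors, reciprocating only the nonzero singular values:

```python
from numpy import array, diag, zeros
from numpy.linalg import pinv, svd

X = array([[0.05], [0.18], [0.31], [0.42], [0.5]])
U, s, VT = svd(X)
# build D+ by placing reciprocals of the nonzero singular values in an n x m matrix
Dplus = zeros(X.T.shape)
Dplus[:len(s), :len(s)] = diag(1.0 / s)
# X+ = V . D+ . U^T
Xplus = VT.T.dot(Dplus).dot(U.T)
# agrees with NumPy's built-in pseudoinverse
print(abs(Xplus - pinv(X)).max())
```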

1.8 Solve via Convenience Function

NumPy wraps the least squares solution in the convenience function lstsq(), which returns the coefficients, the residuals, the rank of X, and its singular values. The complete example is listed below.

# Example of linear least squares via the lstsq() convenience function
# least squares via convenience function
from numpy import array
from numpy.linalg import lstsq
from matplotlib import pyplot
# define dataset
data = array([
[0.05, 0.12],
[0.18, 0.22],
[0.31, 0.35],
[0.42, 0.38],
[0.5, 0.49]
])
# split into inputs and outputs
X,y = data[:,0],data[:,1]
X = X.reshape((len(X),1))
# calculate coefficients
b, residuals, rank, s = lstsq(X, y, rcond=None)
print(b)
# predict using coefficients
yhat = X.dot(b)
#plot data and predictions
pyplot.scatter(X,y)
pyplot.plot(X,yhat,color='red')
pyplot.show()
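All of the approaches in this tutorial solve the same least squares problem and should agree. As a final sketch, comparing lstsq() against the direct normal-equation solution on the same dataset:

```python
from numpy import array, allclose
from numpy.linalg import inv, lstsq

data = array([
    [0.05, 0.12],
    [0.18, 0.22],
    [0.31, 0.35],
    [0.42, 0.38],
    [0.5, 0.49]])
X, y = data[:, 0].reshape((-1, 1)), data[:, 1]
# convenience function vs. direct normal-equation solution
b_lstsq, _, _, _ = lstsq(X, y, rcond=None)
b_direct = inv(X.T.dot(X)).dot(X.T).dot(y)
print(allclose(b_lstsq, b_direct))  # True
```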