Machine Learning [Series] Chapter 1: Linear Regression Model
2022-06-11 06:02:00 【Forward ing】
Chapter 1: A linear regression model for getting started with Python machine learning
Table of contents
- Machine Learning [Series] Chapter 1: Linear Regression Model
- Preface
- I. Linear regression algorithm
- II. Code implementation of linear regression
- 1. Univariate linear regression code implementation
- (1) Draw a scatter plot
- (2) Build the model with the Scikit-Learn library
- (3) Model prediction
- (4) Model visualization
- (5) Constructing the linear regression equation
- 2. Hands-on case: linear regression model of working years and salary in different industries
- (1) Reading data
- (2) Model building + data visualization
- (3) Constructing the linear regression equation
- (4) Model optimization
- III. Linear regression model evaluation
- Summary
Preface
Machine learning covers a wide range of knowledge, but broadly there are two main threads: one is the problem, the other is the model. Machine learning may look sprawling, but it is mainly divided into two categories: supervised learning and unsupervised learning.
In linear regression, we use feature variables (also called independent variables) to predict the response variable (also called the dependent variable). Depending on the number of feature variables, linear regression models are divided into univariate linear regression and multiple linear regression. For example, predicting "salary" from the single feature variable "working years" is univariate linear regression, while predicting "salary" from several feature variables such as "working years", "industry", and "city" is multiple linear regression.
I. Linear regression algorithm
1. Univariate linear regression
In a linear regression problem, the "mystery equation" we are looking for is a linear equation, y = ax + b, and the corresponding data points should lie roughly along a straight line. All we have to do is adjust the two parameters of the linear equation, the slope a and the intercept b, so that the line passes as close as possible to all of the points. This process is called fitting, and the line that best fits the data points is the "mystery equation" we are looking for.
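As a minimal sketch of what fitting means here, the slope and intercept that minimize the sum of squared errors can be computed directly from the textbook least squares formulas. The function name fit_line is hypothetical, and the four sample points are the same ones used in the code section of this article; Scikit-Learn's own implementation may work differently internally.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    a = s_xy / s_xx
    # Intercept: the fitted line passes through the point of means
    b = mean_y - a * mean_x
    return a, b

a, b = fit_line([1, 2, 4, 5], [2, 4, 6, 8])
print(round(a, 6), round(b, 6))  # 1.4 0.8
```

The result agrees with the coefficient and intercept that the Scikit-Learn model reports for the same points.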
II. Code implementation of linear regression
1. Univariate linear regression code implementation
(1) Draw a scatter plot
import matplotlib.pyplot as plt
X = [[1],[2],[4],[5]]
Y = [2,4,6,8]
plt.scatter(X,Y)
plt.show()
(2) Build the model with the Scikit-Learn library
# Import the LinearRegression class from Scikit-Learn's linear_model module
from sklearn.linear_model import LinearRegression
# Create an initial linear regression model and name it regr
regr = LinearRegression()
# Call fit() to train the model; regr is now a fitted linear regression model
regr.fit(X,Y)
(3) Model prediction
# Use predict() to predict the dependent variable y for new values of x
y = regr.predict([[1.5],[2.5],[4.5]])
print(y)
# The prediction results are as follows:
[2.9 4.3 7.1]
(4) Model visualization
plt.scatter(X,Y)
# Draw the fitted line; the fit is computed with the least squares method
plt.plot(X,regr.predict(X))
plt.show()

(5) Constructing the linear regression equation
print("coefficient a: "+str(regr.coef_[0]))
print("intercept b: "+str(regr.intercept_))
# The results are as follows:
coefficient a: 1.4
intercept b: 0.8
# The univariate linear regression equation obtained by fitting is y = 1.4x + 0.8.
2. Hands-on case: linear regression model of working years and salary in different industries
The code is as follows (example):
(1) Reading data
from sklearn import linear_model
from matplotlib import pyplot as plt
import pandas as pd
# Read the income table (pandas needs the openpyxl package to read .xlsx files)
df = pd.read_excel('IT Industry income statement.xlsx')
X = df[["Working years"]]  # feature variable, kept two-dimensional
Y = df[["salary"]]  # target variable
plt.rcParams["font.sans-serif"] = ["SimHei"] # Font used to display Chinese labels
plt.scatter(X,Y)
plt.xlabel("Working years")
plt.ylabel("salary")
plt.show()

(2) Model building + data visualization
model = linear_model.LinearRegression()
model.fit(X,Y)
plt.scatter(X,Y)
plt.plot(X,model.predict(X),color="red") # color="red" draws the trend line in red
plt.xlabel("Working years")
plt.ylabel("salary")
plt.show()

(3) Constructing the linear regression equation
print("coefficient a: ",str(model.coef_[0]))
print("intercept b: ",str(model.intercept_))
# The results are as follows:
coefficient a: [ 2497.1513476]
intercept b: [ 10143.13196687]
# The fitted univariate linear regression equation is approximately y = 2497.15x + 10143.13.
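Once the coefficient and intercept are known, a prediction is just a substitution into y = ax + b. The sketch below plugs in the values printed above; the helper name predict_salary and the input of 5 working years are made up for illustration.

```python
# Coefficient and intercept copied from the printed results above
A = 2497.1513476
B = 10143.13196687

def predict_salary(years):
    """Evaluate the fitted line y = A * years + B."""
    return A * years + B

print(round(predict_salary(5), 2))  # 22628.89
```

Calling model.predict() on the same input should give the same number, up to the rounding of the printed coefficients.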
(4) Model optimization
The univariate linear regression model also has an advanced version: the univariate polynomial regression model. The most common is the univariate quadratic regression model, which can be written as: y = ax^2 + bx + c
# Import PolynomialFeatures, the class used to add polynomial terms
from sklearn.preprocessing import PolynomialFeatures
# Set the highest degree to 2, in preparation for generating the quadratic term (x^2)
poly_reg = PolynomialFeatures(degree=2)
# Generate a new two-dimensional array X_ with the columns 1, x, and x^2
X_ = poly_reg.fit_transform(X)
# Fit an ordinary linear regression on the expanded features
regr = linear_model.LinearRegression()
regr.fit(X_,Y)
plt.scatter(X,Y)
plt.plot(X,regr.predict(X_),color="red")
plt.xlabel("Working years")
plt.ylabel("salary")
plt.show()
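To see what a quadratic fit recovers, here is a small sketch on made-up data generated exactly from y = 2x^2 + 3x + 1 (the data and variable names are hypothetical, not the salary table). It uses numpy.polyfit rather than the PolynomialFeatures approach above, as an independent way to obtain the same kind of least squares quadratic.

```python
import numpy as np

# Synthetic data that follows y = 2x^2 + 3x + 1 exactly (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 * x**2 + 3 * x + 1

# Least squares fit of a degree-2 polynomial; the coefficients come back
# ordered from the highest power to the constant term: [a, b, c]
a, b, c = np.polyfit(x, y, 2)
print(np.round([a, b, c], 6))
```

Because the data contain no noise, the fitted a, b, and c should match 2, 3, and 1 up to floating-point error; with real data they would only be the best compromise.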

III. Linear regression model evaluation
1. Implementing model evaluation in code
After the model is built, it also needs to be evaluated. Three values are mainly used as criteria: R-squared (the R squared of statistics), Adj. R-squared (the adjusted R squared), and the P-value. R-squared and Adj. R-squared measure the goodness of the linear fit, while the P-value measures the significance of the feature variables.
import statsmodels.api as sm
X2 = sm.add_constant(X)
est = sm.OLS(Y,X2).fit()
print(est.summary())

R-squared and Adj. R-squared usually take values between 0 and 1; the closer they are to 1, the better the model fits. The P-value is essentially a probability, so it also ranges from 0 to 1. The closer the P-value is to 0, the more significant the feature variable, that is, the more likely the feature variable is truly related to the target variable.
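The two goodness-of-fit values can also be computed by hand, which makes their meaning concrete. The sketch below assumes the toy data and the fitted line y = 1.4x + 0.8 from the univariate example; the function name r_squared is hypothetical, and the adjusted value uses the common formula 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of samples and p the number of feature variables.

```python
def r_squared(ys, preds, n_features):
    """Return (R-squared, adjusted R-squared) for the given predictions."""
    n = len(ys)
    mean_y = sum(ys) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))  # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in ys)            # total variation
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    return r2, adj_r2

xs = [1, 2, 4, 5]
ys = [2, 4, 6, 8]
preds = [1.4 * x + 0.8 for x in xs]
r2, adj_r2 = r_squared(ys, preds, n_features=1)
print(round(r2, 4), round(adj_r2, 4))  # 0.98 0.97
```

The adjusted value is lower because it penalizes the number of feature variables relative to the number of samples.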
2. Hands-on case: customer value prediction model
import pandas as pd
df2 = pd.read_excel('Customer value data sheet.xlsx')
df2.head()
# Five feature variables and one target variable
X = df2[["Historical loan amount","Number of loans","Education","Monthly income","Gender"]]
Y = df2["Customer value"]
from sklearn.linear_model import LinearRegression
regr = LinearRegression()
regr.fit(X,Y)
print("Coefficients: "+str(regr.coef_))
print("Constant term k0: "+str(regr.intercept_))
# Evaluate the model with statsmodels
import statsmodels.api as sm
X2 = sm.add_constant(X)
est = sm.OLS(Y,X2).fit()
print(est.summary())
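Since the customer value spreadsheet is not reproduced here, the following sketch illustrates multiple linear regression on a small made-up data set instead. All numbers and names are hypothetical; the target is generated exactly from y = 3*x1 + 2*x2 + 10, so the least squares solution should recover those values. It uses numpy.linalg.lstsq with an explicit column of ones for the intercept, which plays the same role as the constant column added by sm.add_constant.

```python
import numpy as np

# Two made-up feature columns (illustrative only)
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0]])
y = 3 * X[:, 0] + 2 * X[:, 1] + 10

# Prepend a column of ones so the first coefficient is the constant term k0
X_const = np.column_stack([np.ones(len(X)), X])
coefs, *_ = np.linalg.lstsq(X_const, y, rcond=None)
k0, k1, k2 = coefs
print(np.round([k0, k1, k2], 6))
```

Each of k1 and k2 is read as the change in the target when that feature increases by one unit while the other is held fixed.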

Summary
That is all for today. This article briefly introduced the linear regression model and worked through a few hands-on cases.
Reference book: 《Python Big Data Analysis and Machine Learning Business Case Practice》