Linear Regression in Machine Learning (Handwritten Least Squares + sklearn Implementation)
2022-07-29 09:43:00 【Cyril-KI】
Preface
This material has already been covered on my CSDN blog and belongs to the basics. Now that I am moving it to my official account, I will not keep it as terse as it was on CSDN, and I will try to make it easy to understand.
One 、 Basic form
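We describe an example $x = (x_1; x_2; \dots; x_d)$ using $d$ attributes, where $x_i$ is the value of $x$ on the $i$-th attribute. A linear model (Linear Model) tries to use a linear combination of the attributes to describe some kind of comprehensive score of the example:
$$f(x) = w_1 x_1 + w_2 x_2 + \dots + w_d x_d + b$$
Generally, we write it in vector form:
$$f(x) = w^T x + b$$
where $w = (w_1; w_2; \dots; w_d)$.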
For example, suppose we use five attributes to describe whether a person is a good spouse: (gender, personality, age, appearance, wealth).
Then each person can be represented by a vector:
$$x = (x_{\text{gender}}; x_{\text{personality}}; x_{\text{age}}; x_{\text{appearance}}; x_{\text{wealth}})$$
To judge whether a person is a good spouse, we can define the following linear model:
$$f(x) = w_1 x_{\text{gender}} + w_2 x_{\text{personality}} + w_3 x_{\text{age}} + w_4 x_{\text{appearance}} + w_5 x_{\text{wealth}} + b$$
In the end, the higher the score $f(x)$, the more likely this person is to be a good spouse.
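As a rough illustration of the scoring model above, the sketch below evaluates f(x) = w^T x + b with NumPy. The numeric attribute encodings, the weights and the bias are made-up values chosen only for demonstration, and the helper name linear_score is hypothetical; none of them come from the article.

```python
import numpy as np


def linear_score(x, w, b):
    """Return the linear-model score f(x) = w^T x + b."""
    return float(np.dot(w, x) + b)


# Hypothetical numeric encoding of (gender, personality, age, appearance, wealth)
x = np.array([1.0, 0.8, 0.3, 0.6, 0.5])
# Hypothetical weights and bias, for illustration only
w = np.array([0.0, 2.0, -1.0, 1.5, 1.0])
b = 0.5

score = linear_score(x, w, b)
print('score:', score)
```

A larger weight means the corresponding attribute contributes more to the score, and a weight of zero means the attribute is ignored.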
Two 、 Linear regression
Linear regression tries to learn a linear model that predicts real-valued output labels as accurately as possible. We define a cost function (loss function) on the data set and determine the model parameters by optimizing it, so that the model can then be used for prediction.
Imagine a scenario: we have a batch of data containing the values of the five attributes above for 10000 people, together with their scores. We need to find $w$ and $b$, and then predict the scores of other people, for whom only the attribute values are known.
So our ultimate goal is to find parameters $w$ and $b$ such that, for these 10000 people, the mean squared error between the predicted values $f(x_i)$ and the true regression targets $y_i$ (which are given) is as small as possible. So we define:
$$J(w, b) = \sum_{i=1}^{m} \left(f(x_i) - y_i\right)^2 = \sum_{i=1}^{m} \left(w^T x_i + b - y_i\right)^2$$
where $x_i$ and $y_i$ denote the attribute values and the label (score) of the $i$-th sample, and $m$ is the number of samples (here $m = 10000$). Minimizing this sum is equivalent to minimizing the mean squared error $J(w, b)/m$.
The goal is to find the $w$ and $b$ that minimize $J(w, b)$. We can then consider these $w$ and $b$ to be the best fit for this batch of data, and if the samples are representative, we can use them to predict other people's scores.
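The objective described above can be sketched in code. Here the cost is taken to be the sum of squared errors between predictions and labels (an assumption; dividing by the number of samples gives the mean squared error and does not change the minimizer). The function name squared_error_cost and the tiny data set are invented for illustration.

```python
import numpy as np


def squared_error_cost(X, y, w, b):
    """Sum of squared errors: sum_i (w^T x_i + b - y_i)^2."""
    residuals = X @ w + b - y
    return float(residuals @ residuals)


# Tiny synthetic data set: 4 samples, 2 attributes (illustrative values)
X = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [0.0, 1.0]])
true_w = np.array([2.0, -1.0])
true_b = 3.0
y = X @ true_w + true_b  # labels generated without noise

cost_at_truth = squared_error_cost(X, y, true_w, true_b)
cost_elsewhere = squared_error_cost(X, y, np.zeros(2), 0.0)
print(cost_at_truth, cost_elsewhere)
```

The cost is zero at the parameters that generated the labels and larger everywhere else, which is why minimizing it recovers a good w and b.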
Three 、 Solution
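To simplify the derivation, we absorb $b$ into the weight vector. Let:
$$\hat{w} = (w; b), \qquad \hat{x}_i = (x_i; 1)$$
Further, let $X$ be the $m \times (d+1)$ matrix whose $i$-th row is $\hat{x}_i^T$, and let $y = (y_1; y_2; \dots; y_m)$, where $m$ is the number of samples and $d$ is the number of attributes. The squared-error cost function can then be written as:
$$J(\hat{w}) = (y - X\hat{w})^T (y - X\hat{w})$$
Then we take the derivative of $J$ with respect to $\hat{w}$ and set it to 0:
$$\frac{\partial J}{\partial \hat{w}} = 2 X^T (X\hat{w} - y) = 0$$
Assuming $X^T X$ is invertible, we get:
$$\hat{w}^* = (X^T X)^{-1} X^T y$$
So in the end, for a new example $\hat{x} = (x; 1)$:
$$f(\hat{x}) = \hat{x}^T (X^T X)^{-1} X^T y$$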
If you give us a person's attribute values , We can use the above formula to calculate the score of this person .
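A minimal sketch of the closed-form solution, assuming X^T X is invertible. The names fit_normal_equation and predict, the synthetic data, and the "true" parameters are all hypothetical. The sketch solves the linear system with np.linalg.solve instead of forming the inverse explicitly, which gives the same result and is usually more stable numerically.

```python
import numpy as np


def fit_normal_equation(X, y):
    """Solve (X_hat^T X_hat) w_hat = X_hat^T y, where X_hat = [X, 1]."""
    # Append a column of ones so the bias b becomes the last entry of w_hat
    X_hat = np.hstack([X, np.ones((X.shape[0], 1))])
    w_hat = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)
    return w_hat[:-1], w_hat[-1]  # (w, b)


def predict(X, w, b):
    """Score every row of X with f(x) = w^T x + b."""
    return X @ w + b


# Synthetic data: 200 samples, 5 attributes, small Gaussian noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
true_b = 4.0
y = X @ true_w + true_b + rng.normal(scale=0.01, size=200)

w, b = fit_normal_equation(X, y)
print('w:', w, 'b:', b)
print('score of the first sample:', predict(X[:1], w, b)[0])
```

With little noise, the recovered w and b should land very close to the values used to generate the data.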
Four 、 Code implementation
1. Import package
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
sklearn provides a dedicated linear model package, linear_model; matplotlib is used for plotting, and pandas is used to read the data file. In addition, three evaluation metrics are imported: MSE, R-squared and MAE.
2. Construct data set
The data can be generated automatically, or you can use an existing data set. The data below comes from the assignment file Salary_Data.csv; each sample has only one feature.
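If Salary_Data.csv is not at hand, one-feature data can be generated instead. The sketch below is only an illustration of that option: the function name make_synthetic_data and the slope, intercept and noise level are arbitrary assumptions, not values taken from the article's data set. It returns lists split half and half, the same way the article splits its data.

```python
import numpy as np


def make_synthetic_data(n_samples=30, w=9000.0, b=25000.0, noise=3000.0, seed=0):
    """Generate one-feature data y = w * x + b + Gaussian noise."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(1.0, 10.0, size=n_samples)
    y = w * x + b + rng.normal(scale=noise, size=n_samples)
    return x.tolist(), y.tolist()


xs, ys = make_synthetic_data()
# First half for training, second half for testing
half = len(xs) // 2
train_x, train_y = xs[:half], ys[:half]
test_x, test_y = xs[half:], ys[half:]
print(len(train_x), len(test_x))
```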
def load_data():
    data = pd.read_csv('Salary_Data.csv', encoding='gbk')
    data = data.values.tolist()
    train_x = []; train_y = []
    test_x = []; test_y = []
    # The first half is the training set, the second half is the test set
    for i in range(len(data)):
        if i < len(data) / 2:
            train_x.append(data[i][0])
            train_y.append(data[i][1])
        else:
            test_x.append(data[i][0])
            test_y.append(data[i][1])
    return train_x, train_y, test_x, test_y
3. Model training + prediction
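sklearn expects 2-D feature arrays, so the single feature is reshaped into a column before fitting.
train_x, train_y, test_x, test_y = load_data()
train_x = np.array(train_x).reshape(-1, 1)
test_x = np.array(test_x).reshape(-1, 1)
lr = linear_model.LinearRegression()
lr.fit(train_x, train_y)
y_pred = lr.predict(test_x)
4. Output parameters and evaluation metrics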
# Output coefficient and intercept
print('w:', lr.coef_, 'b:', lr.intercept_)
# Output evaluation metrics
print('MSE:', mean_squared_error(test_y, y_pred))
print('MAE:', mean_absolute_error(test_y, y_pred))
print('R2_Squared:', r2_score(test_y, y_pred))
5. Plotting
# Plot the test points and the fitted line
plt.scatter(test_x, test_y, color='black')
plt.plot(test_x, y_pred, color='blue', linewidth=3)
plt.show()
Complete code:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model
import numpy as np
from sklearn.metrics import mean_squared_error,r2_score,mean_absolute_error
def load_data():
    data = pd.read_csv('Salary_Data.csv', encoding='gbk')
    data = data.values.tolist()
    train_x = []; train_y = []
    test_x = []; test_y = []
    # The first half is the training set, the second half is the test set
    for i in range(len(data)):
        if i < len(data) / 2:
            train_x.append(data[i][0])
            train_y.append(data[i][1])
        else:
            test_x.append(data[i][0])
            test_y.append(data[i][1])
    return train_x, train_y, test_x, test_y
def model():
    print('Handwritten implementation:')
    train_x, train_y, test_x, test_y = load_data()
    # Obtain the parameters by the least squares method
    # (sum_x is used instead of sum to avoid shadowing the built-in)
    sum_x = 0.0; sum_square = 0.0; sum_2 = 0.0; sum_b = 0.0
    for i in range(len(train_x)):
        sum_x = sum_x + train_x[i]
        sum_square = sum_square + train_x[i] ** 2
    ave_x = sum_x / len(train_x)
    for i in range(len(train_x)):
        sum_2 = sum_2 + (train_y[i] * (train_x[i] - ave_x))
    w = sum_2 / (sum_square - sum_x ** 2 / len(train_x))
    for i in range(len(train_x)):
        sum_b = sum_b + (train_y[i] - w * train_x[i])
    b = sum_b / len(train_x)
    print('w=', w, 'b=', b)
    # Predict on the test set
    pred_y = []
    for i in range(len(test_x)):
        pred_y.append(w * test_x[i] + b)
    # Compute MSE, MAE and the R-squared score
    sum_mse = 0.0; sum_mae = 0.0
    sum1 = 0.0; sum2 = 0.0
    for i in range(len(pred_y)):
        sum_mae = sum_mae + np.abs(pred_y[i] - test_y[i])
        sum_mse = sum_mse + (pred_y[i] - test_y[i]) ** 2
    sum_y = 0.0
    for i in range(len(test_y)):
        sum_y = sum_y + test_y[i]
    ave_y = sum_y / len(test_y)
    for i in range(len(pred_y)):
        sum1 = sum1 + (pred_y[i] - test_y[i]) ** 2
        sum2 = sum2 + (ave_y - test_y[i]) ** 2
    print('MSE:', sum_mse / len(pred_y))
    print('MAE:', sum_mae / len(pred_y))
    print('R2_Squared:', 1 - sum1 / sum2)
    # Plot the test points and the fitted line
    plt.scatter(test_x, test_y, color='black')
    plt.plot(test_x, pred_y, color='blue', linewidth=3)
    plt.show()
    print('\n')
# Call sklearn
def sklearn_linearmodel():
    print('sklearn implementation:')
    train_x, train_y, test_x, test_y = load_data()
    train_x = np.array(train_x).reshape(-1, 1)
    train_y = np.array(train_y).reshape(-1, 1)
    test_x = np.array(test_x).reshape(-1, 1)
    test_y = np.array(test_y).reshape(-1, 1)
    # Training + prediction
    lr = linear_model.LinearRegression()
    lr.fit(train_x, train_y)
    y_pred = lr.predict(test_x)
    # Output coefficient and intercept
    print('w:', lr.coef_, 'b:', lr.intercept_)
    # Output evaluation metrics
    print('MSE:', mean_squared_error(test_y, y_pred))
    print('MAE:', mean_absolute_error(test_y, y_pred))
    print('R2_Squared:', r2_score(test_y, y_pred))
    # Plot the test points and the fitted line
    plt.scatter(test_x, test_y, color='black')
    plt.plot(test_x, y_pred, color='blue', linewidth=3)
    plt.show()
if __name__ == '__main__':
    model()
    sklearn_linearmodel()
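The loop-based least-squares computation in model() can also be expressed with NumPy array operations. The sketch below is one possible vectorized version, not the article's code: the function names fit_simple_least_squares and regression_metrics and the five data points are made up for illustration, and np.polyfit is used only as an independent cross-check of the slope and intercept.

```python
import numpy as np


def fit_simple_least_squares(x, y):
    """Closed-form w and b for one feature, same formulas as the loop version."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = x.size
    w = np.sum(y * (x - x.mean())) / (np.sum(x ** 2) - np.sum(x) ** 2 / m)
    b = np.mean(y - w * x)
    return w, b


def regression_metrics(y_true, y_pred):
    """Return (MSE, MAE, R^2) computed from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_pred - y_true
    mse = np.mean(errors ** 2)
    mae = np.mean(np.abs(errors))
    r2 = 1 - np.sum(errors ** 2) / np.sum((y_true.mean() - y_true) ** 2)
    return mse, mae, r2


# Illustrative data, roughly following y = 2x + 1
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

w, b = fit_simple_least_squares(x, y)
slope, intercept = np.polyfit(x, y, 1)  # degree-1 fit as a cross-check
mse, mae, r2 = regression_metrics(y, w * np.asarray(x) + b)
print('w=', w, 'b=', b)
print('polyfit:', slope, intercept)
print('MSE:', mse, 'MAE:', mae, 'R2:', r2)
```

Both routes should agree on w and b up to floating-point error, which is a quick way to check a handwritten implementation.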