Linear Regression in Machine Learning (Least Squares by Hand + sklearn Implementation)
2022-07-29 09:43:00 【Cyril-KI】
Preface
This material already appeared on my CSDN blog and covers the basics. I am now moving it to my official account. It is no longer as terse as the CSDN version; I have tried to make it easy to understand.
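I. Basic form
Suppose an example $\boldsymbol{x} = (x_1; x_2; \dots; x_d)$ is described by $d$ attributes, where $x_i$ is the value of $\boldsymbol{x}$ on the $i$-th attribute. A linear model tries to describe some overall score of the example with a linear combination of its attributes:
$$f(\boldsymbol{x}) = w_1 x_1 + w_2 x_2 + \dots + w_d x_d + b$$
It is usually written in vector form:
$$f(\boldsymbol{x}) = \boldsymbol{w}^{\mathsf T}\boldsymbol{x} + b$$
where $\boldsymbol{w} = (w_1; w_2; \dots; w_d)$.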
For example, suppose we use five attributes to describe whether a person is a good spouse: (gender, personality, age, appearance, wealth).
Each person can then be represented by a vector:
$$\boldsymbol{x} = (x_{\text{gender}}; x_{\text{personality}}; x_{\text{age}}; x_{\text{appearance}}; x_{\text{wealth}})$$
To judge whether a person is a good spouse, we can define the following linear model:
$$f(\boldsymbol{x}) = w_1 x_{\text{gender}} + w_2 x_{\text{personality}} + w_3 x_{\text{age}} + w_4 x_{\text{appearance}} + w_5 x_{\text{wealth}} + b$$
The higher the final score $f(\boldsymbol{x})$, the more likely this person is to be a good spouse.
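As a minimal sketch of the scoring model above, the snippet below evaluates $f(\boldsymbol{x}) = \boldsymbol{w}^{\mathsf T}\boldsymbol{x} + b$ for one person. The numeric encoding of the attributes, the weights `w`, the bias `b`, and the helper name `linear_score` are all made-up values for illustration; none of them come from the original post.

```python
def linear_score(x, w, b):
    """Return f(x) = w^T x + b for a single example."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return sum(w_i * x_i for w_i, x_i in zip(w, x)) + b


# Hypothetical numeric encoding of (gender, personality, age, appearance, wealth)
x = [1.0, 8.0, 30.0, 7.0, 5.0]
# Hypothetical weights and bias (assumed values, not learned from any data)
w = [0.0, 0.5, -0.1, 0.3, 0.2]
b = 1.0

score = linear_score(x, w, b)
print(score)
```

In practice the weights are not chosen by hand like this; learning them from data is exactly what linear regression does.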
II. Linear regression
Linear regression tries to learn a linear model that predicts real-valued output labels as accurately as possible. A cost function (loss function) is defined on the data set, and the model parameters are chosen to optimize it; the resulting model can then be used for prediction.
Imagine a scenario: we have a data set containing the values of the five attributes above for 10000 people, together with their scores $y$. We need to find $\boldsymbol{w}$ and $b$, and then predict the score $f(\boldsymbol{x})$ of other people for whom only the attribute values are known.
So our ultimate goal is to find parameters $\boldsymbol{w}$ and $b$ such that the mean squared error between the predictions $f(\boldsymbol{x}_i)$ for these 10000 people and their true regression targets $y_i$ (which are given) is as small as possible. We therefore define:
$$J(\boldsymbol{w}, b) = \frac{1}{m}\sum_{i=1}^{m}\bigl(f(\boldsymbol{x}_i) - y_i\bigr)^2 = \frac{1}{m}\sum_{i=1}^{m}\bigl(\boldsymbol{w}^{\mathsf T}\boldsymbol{x}_i + b - y_i\bigr)^2$$
where $\boldsymbol{x}_i$ and $y_i$ denote the attribute values and the label (score) of the $i$-th sample, and $m = 10000$ is the number of samples.
The goal is to find the $\boldsymbol{w}$ and $b$ that minimize $J(\boldsymbol{w}, b)$. These parameters fit this batch of data best; if the batch is representative, we can use them to predict other people's scores.
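To make the cost function concrete, here is a small sketch that computes the mean squared error for two candidate parameter settings. The toy data set, the generating rule $y = 2x_1 + 3x_2 + 1$, and the function names `predict` and `mse_cost` are assumptions made for this illustration only.

```python
def predict(x, w, b):
    """f(x) = w^T x + b for a single example."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x)) + b


def mse_cost(X, y, w, b):
    """J(w, b) = (1/m) * sum_i (f(x_i) - y_i)^2."""
    m = len(X)
    return sum((predict(x_i, w, b) - y_i) ** 2 for x_i, y_i in zip(X, y)) / m


# Toy data generated from y = 2*x1 + 3*x2 + 1 (assumed, for illustration)
X = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [0.0, 4.0]]
y = [2 * x1 + 3 * x2 + 1 for x1, x2 in X]

# Parameters that generated the data versus a deliberately wrong first weight
cost_true = mse_cost(X, y, w=[2.0, 3.0], b=1.0)
cost_off = mse_cost(X, y, w=[1.0, 3.0], b=1.0)
print(cost_true, cost_off)
```

The parameters that generated the data give a lower cost than the perturbed ones, which is the property the minimization relies on.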
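III. Solving
To simplify the derivation, we absorb $b$ into the weight vector. Let
$$\hat{\boldsymbol{w}} = (\boldsymbol{w}; b), \qquad X = \begin{pmatrix} \boldsymbol{x}_1^{\mathsf T} & 1 \\ \boldsymbol{x}_2^{\mathsf T} & 1 \\ \vdots & \vdots \\ \boldsymbol{x}_m^{\mathsf T} & 1 \end{pmatrix}$$
and let $\boldsymbol{y} = (y_1; y_2; \dots; y_m)$. The cost then becomes:
$$J(\hat{\boldsymbol{w}}) = \frac{1}{m}(\boldsymbol{y} - X\hat{\boldsymbol{w}})^{\mathsf T}(\boldsymbol{y} - X\hat{\boldsymbol{w}})$$
Differentiating $J$ with respect to $\hat{\boldsymbol{w}}$ and setting the result to zero:
$$\frac{\partial J}{\partial \hat{\boldsymbol{w}}} = \frac{2}{m} X^{\mathsf T}(X\hat{\boldsymbol{w}} - \boldsymbol{y}) = 0$$
which gives, when $X^{\mathsf T}X$ is invertible:
$$\hat{\boldsymbol{w}}^{*} = (X^{\mathsf T}X)^{-1}X^{\mathsf T}\boldsymbol{y}$$
So the final model is:
$$f(\hat{\boldsymbol{x}}_i) = \hat{\boldsymbol{x}}_i^{\mathsf T}(X^{\mathsf T}X)^{-1}X^{\mathsf T}\boldsymbol{y}, \qquad \hat{\boldsymbol{x}}_i = (\boldsymbol{x}_i; 1)$$
Given a person's attribute values, this formula computes that person's score.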
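The closed-form solution can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the post's own code: the synthetic data, `true_w`, `true_b`, and the helper names `fit_normal_equation` and `predict` are hypothetical. The sketch solves the linear system $X^{\mathsf T}X\hat{\boldsymbol{w}} = X^{\mathsf T}\boldsymbol{y}$ with `np.linalg.solve` rather than forming the inverse explicitly, which gives the same result when $X^{\mathsf T}X$ is invertible.

```python
import numpy as np


def fit_normal_equation(X, y):
    """Solve (X_aug^T X_aug) w_hat = X_aug^T y, where X_aug appends a column of ones."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    # Raises numpy.linalg.LinAlgError if X_aug^T X_aug is exactly singular
    w_hat = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)
    return w_hat[:-1], w_hat[-1]  # (w, b)


def predict(X, w, b):
    """f(x) = w^T x + b for every row of X."""
    return np.asarray(X, dtype=float) @ w + b


# Synthetic, noise-free data with five attributes (assumed values for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([0.5, -1.0, 2.0, 0.0, 3.0])
true_b = 4.0
y = X @ true_w + true_b

w, b = fit_normal_equation(X, y)
print(np.round(w, 6), round(float(b), 6))
```

Because the synthetic labels contain no noise, the recovered `w` and `b` should match `true_w` and `true_b` up to floating-point error.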
IV. Code implementation
1. Import packages
from sklearn import linear_model
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
import pandas as pd
sklearn provides a dedicated linear-model package, linear_model; matplotlib is used for plotting; pandas reads the CSV file; and the three evaluation metrics MSE, R² (R_Square) and MAE are imported from sklearn.metrics.
2. Build the data set
The data can be generated automatically or taken from an existing source. The data used here is Salary_Data.csv from the assignment; each sample has only one feature.
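If Salary_Data.csv is not at hand, the "generated automatically" option might look like the sketch below. Everything in it is an assumption made for illustration: the function name `make_synthetic_data`, the slope, intercept and noise level, and the value range of the single feature. It only mirrors the shape of the real data (one feature, one target) and the first-half/second-half split.

```python
import random


def make_synthetic_data(n=30, w=9000.0, b=25000.0, noise=3000.0, seed=0):
    """Generate n samples of y = w*x + b + Gaussian noise and split them in half.

    All default values are hypothetical; they are not taken from Salary_Data.csv.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(1.0, 10.0) for _ in range(n)]
    ys = [w * x + b + rng.gauss(0.0, noise) for x in xs]
    half = (n + 1) // 2  # same split rule as `i < len(data) / 2`
    return xs[:half], ys[:half], xs[half:], ys[half:]


train_x, train_y, test_x, test_y = make_synthetic_data()
print(len(train_x), len(test_x))
```

It returns the same four lists as the post's `load_data`, so it can be swapped in wherever `load_data()` is called.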
def load_data():
    data = pd.read_csv('Salary_Data.csv', encoding='gbk')
    data = data.values.tolist()
    train_x = []; train_y = []
    test_x = []; test_y = []
    # The first half is the training set, the second half is the test set
    for i in range(len(data)):
        if i < len(data) / 2:
            train_x.append(data[i][0])
            train_y.append(data[i][1])
        else:
            test_x.append(data[i][0])
            test_y.append(data[i][1])
    return train_x, train_y, test_x, test_y
3. Model training + prediction
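train_x, train_y, test_x, test_y = load_data()
# sklearn expects 2-D feature arrays of shape (n_samples, n_features)
lr = linear_model.LinearRegression()
lr.fit(np.array(train_x).reshape(-1, 1), train_y)
test_x = np.array(test_x).reshape(-1, 1)
y_pred = lr.predict(test_x)
4. Output parameters and evaluation metrics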
# Print the coefficient and intercept
print('w:', lr.coef_, 'b:', lr.intercept_)
# Print the evaluation metrics
print('MSE:', mean_squared_error(test_y, y_pred))
print('MAE:', mean_absolute_error(test_y, y_pred))
print('R2_Squared:', r2_score(test_y, y_pred))
5. Plotting
# Plot the test points and the fitted line
plt.scatter(test_x, test_y, color='black')
plt.plot(test_x, y_pred, color='blue', linewidth=3)
plt.show()
Complete code:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

def load_data():
    data = pd.read_csv('Salary_Data.csv', encoding='gbk')
    data = data.values.tolist()
    train_x = []; train_y = []
    test_x = []; test_y = []
    # The first half is the training set, the second half is the test set
    for i in range(len(data)):
        if i < len(data) / 2:
            train_x.append(data[i][0])
            train_y.append(data[i][1])
        else:
            test_x.append(data[i][0])
            test_y.append(data[i][1])
    return train_x, train_y, test_x, test_y

def model():
    print('Handwritten:')
    train_x, train_y, test_x, test_y = load_data()
    # Solve for the parameters with the least squares method
    # (sum_x instead of sum, to avoid shadowing the built-in)
    sum_x = 0.0; sum_square = 0.0; sum_2 = 0.0; sum_b = 0.0
    for i in range(len(train_x)):
        sum_x = sum_x + train_x[i]
        sum_square = sum_square + train_x[i] ** 2
    ave_x = sum_x / len(train_x)
    for i in range(len(train_x)):
        sum_2 = sum_2 + (train_y[i] * (train_x[i] - ave_x))
    w = sum_2 / (sum_square - sum_x ** 2 / len(train_x))
    for i in range(len(train_x)):
        sum_b = sum_b + (train_y[i] - w * train_x[i])
    b = sum_b / len(train_x)
    print('w=', w, 'b=', b)
    # Predict on the test set
    pred_y = []
    for i in range(len(test_x)):
        pred_y.append(w * test_x[i] + b)
    # Compute MSE, MAE and the R2 score
    sum_mse = 0.0; sum_mae = 0.0
    sum1 = 0.0; sum2 = 0.0
    for i in range(len(pred_y)):
        sum_mae = sum_mae + np.abs(pred_y[i] - test_y[i])
        sum_mse = sum_mse + (pred_y[i] - test_y[i]) ** 2
    sum_y = 0.0
    for i in range(len(test_y)):
        sum_y = sum_y + test_y[i]
    ave_y = sum_y / len(test_y)
    for i in range(len(pred_y)):
        sum1 = sum1 + (pred_y[i] - test_y[i]) ** 2   # residual sum of squares
        sum2 = sum2 + (ave_y - test_y[i]) ** 2       # total sum of squares
    print('MSE:', sum_mse / len(pred_y))
    print('MAE:', sum_mae / len(pred_y))
    print('R2_Squared:', 1 - sum1 / sum2)
    # Plot the test points and the fitted line
    plt.scatter(test_x, test_y, color='black')
    plt.plot(test_x, pred_y, color='blue', linewidth=3)
    plt.show()
    print('\n')

# Library call (sklearn)
def sklearn_linearmodel():
    print('Library call:')
    train_x, train_y, test_x, test_y = load_data()
    # sklearn expects 2-D arrays of shape (n_samples, n_features)
    train_x = np.array(train_x).reshape(-1, 1)
    train_y = np.array(train_y).reshape(-1, 1)
    test_x = np.array(test_x).reshape(-1, 1)
    test_y = np.array(test_y).reshape(-1, 1)
    # Training + testing
    lr = linear_model.LinearRegression()
    lr.fit(train_x, train_y)
    y_pred = lr.predict(test_x)
    # Print the coefficient and intercept
    print('w:', lr.coef_, 'b:', lr.intercept_)
    # Print the evaluation metrics
    print('MSE:', mean_squared_error(test_y, y_pred))
    print('MAE:', mean_absolute_error(test_y, y_pred))
    print('R2_Squared:', r2_score(test_y, y_pred))
    # Plot the test points and the fitted line
    plt.scatter(test_x, test_y, color='black')
    plt.plot(test_x, y_pred, color='blue', linewidth=3)
    plt.show()

if __name__ == '__main__':
    model()
    sklearn_linearmodel()
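The explicit loops in the handwritten version can also be written with NumPy array operations. The sketch below is one possible vectorized form of the same single-feature formulas and the same three metrics; the function names `fit_simple` and `regression_metrics` and the tiny noise-free data set are assumptions for illustration, chosen so the expected parameters are known in advance.

```python
import numpy as np


def fit_simple(x, y):
    """Single-feature least squares, same formulas as the handwritten loops."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = x.size
    w = np.sum(y * (x - x.mean())) / (np.sum(x ** 2) - np.sum(x) ** 2 / m)
    b = np.mean(y - w * x)
    return w, b


def regression_metrics(y_true, y_pred):
    """Return (MSE, MAE, R^2) for the given targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_pred - y_true
    mse = np.mean(residual ** 2)
    mae = np.mean(np.abs(residual))
    r2 = 1.0 - np.sum(residual ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return mse, mae, r2


# Noise-free toy data generated from y = 3x + 2 (assumed, for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 3.0 * x + 2.0

w, b = fit_simple(x, y)
mse, mae, r2 = regression_metrics(y, w * x + b)
print(w, b, mse, mae, r2)
```

On data that lies exactly on a line, the fit should recover the slope and intercept, with errors near zero and R² near one; on real data such as Salary_Data.csv the values will of course differ.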