Machine Learning: Linear Regression (Handwritten Least Squares + sklearn Implementation)
2022-07-29 09:43:00 【Cyril-KI】
Foreword

This material has already appeared on my CSDN blog and is fairly basic. Now that it has moved to my official account, I will not keep it as terse as the CSDN version; I will try to make it easy to understand.
One. Basic form
We describe an example by $d$ attributes, $\boldsymbol{x} = (x_1; x_2; \ldots; x_d)$, where $x_i$ is the value of $\boldsymbol{x}$ on the $i$-th attribute. A linear model (Linear Model) tries to describe some kind of overall score of an example with a linear combination of its attributes:

$$f(\boldsymbol{x}) = w_1 x_1 + w_2 x_2 + \cdots + w_d x_d + b$$

Generally, we write it in vector form:

$$f(\boldsymbol{x}) = \boldsymbol{w}^T \boldsymbol{x} + b$$

where $\boldsymbol{w} = (w_1; w_2; \ldots; w_d)$.
For example, we might describe whether a person is a good spouse with five attributes: (gender, character, age, appearance, wealth).

Every person can then be expressed as a vector:

$$\boldsymbol{x} = (x_{\text{gender}}; x_{\text{character}}; x_{\text{age}}; x_{\text{appearance}}; x_{\text{wealth}})$$

To judge whether a person is a good spouse, we can define the following linear model:

$$f(\boldsymbol{x}) = w_{\text{gender}} x_{\text{gender}} + w_{\text{character}} x_{\text{character}} + w_{\text{age}} x_{\text{age}} + w_{\text{appearance}} x_{\text{appearance}} + w_{\text{wealth}} x_{\text{wealth}} + b$$

Finally, the higher the score $f(\boldsymbol{x})$, the more likely this person is to be a good spouse.
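To make this concrete, here is a toy sketch of the spouse-scoring model in Python. The weights and the person below are made-up values for illustration only; in practice they would be learned from data, which is exactly what the rest of this post is about:

# made-up weights for illustration; not learned from any data
weights = {'gender': 0.1, 'character': 0.35, 'age': 0.1, 'appearance': 0.2, 'wealth': 0.25}
b = 0.5

def score(person):
    # linear combination of the attribute values plus the intercept b
    return sum(weights[k] * person[k] for k in weights) + b

# a hypothetical person, each attribute rated on some numeric scale
person = {'gender': 1, 'character': 8, 'age': 5, 'appearance': 7, 'wealth': 6}
print(score(person))  # the higher the score, the better the spouse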
Two. Linear regression
Linear regression tries to learn a linear model that predicts real-valued output labels as accurately as possible. We set up a cost function (loss function) over the data set and determine the model parameters by optimizing it; the fitted model can then be used for subsequent prediction.
Imagine a scenario: we have a pile of data containing the values of the above five attributes for 10000 people, together with their scores $y$. We now need to find $\boldsymbol{w}$ and $b$, and then predict the scores $f(\boldsymbol{x})$ of other people for whom only the attribute values are known.

Our ultimate goal is therefore: find parameters $\boldsymbol{w}^*$ and $b^*$ such that the mean squared error between the predicted values $f(\boldsymbol{x}_i)$ and the true regression targets $y_i$ (which are given) over these 10000 people is smallest. So we define:

$$(\boldsymbol{w}^*, b^*) = \arg\min_{(\boldsymbol{w},\, b)} \sum_{i=1}^{m} \left( f(\boldsymbol{x}_i) - y_i \right)^2 = \arg\min_{(\boldsymbol{w},\, b)} \sum_{i=1}^{m} \left( y_i - \boldsymbol{w}^T \boldsymbol{x}_i - b \right)^2$$

where $\boldsymbol{x}_i$ and $y_i$ denote the attribute values and the label (score) of the $i$-th sample, and $m = 10000$.

The goal is simply to find the $\boldsymbol{w}^*$ and $b^*$ that minimize the squared error $E_{(\boldsymbol{w},b)} = \sum_{i=1}^{m} (y_i - \boldsymbol{w}^T \boldsymbol{x}_i - b)^2$. We can then consider the $\boldsymbol{w}$ and $b$ we found to fit this batch of data best; if this batch of samples generalizes well, we can use $\boldsymbol{w}$ and $b$ to predict other people's scores.
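As a small illustration (with made-up numbers), the cost for a candidate $(w, b)$ on a one-feature data set can be computed directly from the definition:

# squared-error cost E(w, b) from the definition above; the data is made up
def cost(w, b, xs, ys):
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]
print(cost(2.0, 0.0, xs, ys))     # cost of one candidate (w, b)
print(cost(2.05, -0.05, xs, ys))  # a nearby candidate; smaller is better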
Three. Solving
To make the derivation convenient, we simplify $f$ further. Let

$$\hat{\boldsymbol{w}} = (\boldsymbol{w};\, b), \qquad \mathbf{X} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1d} & 1 \\ x_{21} & x_{22} & \cdots & x_{2d} & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{md} & 1 \end{pmatrix}$$

that is, we absorb the intercept $b$ into $\hat{\boldsymbol{w}}$ and append a column of ones to the data matrix. Further let $\boldsymbol{y} = (y_1; y_2; \ldots; y_m)$, so finally:

$$\hat{\boldsymbol{w}}^* = \arg\min_{\hat{\boldsymbol{w}}} \, (\boldsymbol{y} - \mathbf{X}\hat{\boldsymbol{w}})^T (\boldsymbol{y} - \mathbf{X}\hat{\boldsymbol{w}})$$

Then we take the derivative of $E_{\hat{\boldsymbol{w}}} = (\boldsymbol{y} - \mathbf{X}\hat{\boldsymbol{w}})^T (\boldsymbol{y} - \mathbf{X}\hat{\boldsymbol{w}})$ with respect to $\hat{\boldsymbol{w}}$ and set it to zero:

$$\frac{\partial E_{\hat{\boldsymbol{w}}}}{\partial \hat{\boldsymbol{w}}} = 2 \mathbf{X}^T (\mathbf{X}\hat{\boldsymbol{w}} - \boldsymbol{y}) = 0$$

And then we get (when $\mathbf{X}^T\mathbf{X}$ is invertible):

$$\hat{\boldsymbol{w}}^* = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \boldsymbol{y}$$

So in the end:

$$f(\hat{\boldsymbol{x}}_i) = \hat{\boldsymbol{x}}_i^T (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \boldsymbol{y}$$

where $\hat{\boldsymbol{x}}_i = (\boldsymbol{x}_i;\, 1)$. If we are given a person's attribute values, we can use the above formula to compute that person's score.
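The closed-form solution above translates directly into NumPy. The following is a minimal sketch on made-up multivariate data (in practice np.linalg.lstsq is preferred over explicitly inverting $\mathbf{X}^T\mathbf{X}$, for numerical stability):

import numpy as np

# made-up data: m = 4 samples, d = 2 attributes
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([3.5, 3.0, 8.0, 7.5])

# append a column of ones so that b is absorbed into w_hat = (w; b)
X_hat = np.hstack([X, np.ones((X.shape[0], 1))])

# solve the least-squares problem min ||y - X_hat @ w_hat||^2
w_hat, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
w, b = w_hat[:-1], w_hat[-1]
print('w =', w, 'b =', b)

# score a new example with f(x) = w^T x + b
x_new = np.array([2.5, 2.5])
print('prediction:', x_new @ w + b)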
Four. Code implementation
1. Import packages

from sklearn import linear_model
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

sklearn has a dedicated linear-model package, linear_model; pandas is needed to read the data file below; matplotlib is used for plotting. In addition we import three evaluation metrics: MSE, R squared (r2_score) and MAE.
2. Construct the data set

The data can be generated automatically, or you can use existing data. The data below comes from the assignment file Salary_Data.csv; each sample has only one feature.
def load_data():
    data = pd.read_csv('Salary_Data.csv', encoding='gbk')
    data = data.values.tolist()
    train_x = []; train_y = []
    test_x = []; test_y = []
    # the first half is the training set, the second half the test set
    for i in range(len(data)):
        if i < len(data) / 2:
            train_x.append(data[i][0])
            train_y.append(data[i][1])
        else:
            test_x.append(data[i][0])
            test_y.append(data[i][1])
    return train_x, train_y, test_x, test_y
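As an aside, the manual half-and-half split above can also be done with sklearn's train_test_split. A sketch under the same assumptions (file name, column layout) as load_data:

import pandas as pd
from sklearn.model_selection import train_test_split

def load_data_sklearn():
    data = pd.read_csv('Salary_Data.csv', encoding='gbk')
    x = data.iloc[:, 0].tolist()
    y = data.iloc[:, 1].tolist()
    # shuffle=False keeps the original order, matching the manual split
    # note the return order: train_x, test_x, train_y, test_y
    return train_test_split(x, y, test_size=0.5, shuffle=False)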
3. Model training + prediction

# sklearn expects a 2-D feature matrix, so reshape the 1-D lists first
lr = linear_model.LinearRegression()
lr.fit(np.array(train_x).reshape(-1, 1), train_y)
y_pred = lr.predict(np.array(test_x).reshape(-1, 1))

4. Output parameters and evaluation metrics
# output the coefficient and intercept
print('w:', lr.coef_, 'b:', lr.intercept_)
# output the evaluation metrics
print('MSE:', mean_squared_error(test_y, y_pred))
print('MAE:', mean_absolute_error(test_y, y_pred))
print('R2_Squared:', r2_score(test_y, y_pred))
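For reference, the three metrics printed above are the standard definitions implemented by mean_squared_error, mean_absolute_error and r2_score:

$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}(y_i - \hat{y}_i)^2, \qquad \mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|y_i - \hat{y}_i\right|, \qquad R^2 = 1 - \frac{\sum_{i=1}^{m}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{m}(y_i - \bar{y})^2}$$

The handwritten version in the complete code below computes exactly these quantities by hand.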
5. Plotting

# show
plt.scatter(test_x, test_y, color='black')
plt.plot(test_x, y_pred, color='blue', linewidth=3)
plt.show()

Complete code:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error


def load_data():
    data = pd.read_csv('Salary_Data.csv', encoding='gbk')
    data = data.values.tolist()
    train_x = []; train_y = []
    test_x = []; test_y = []
    # the first half is the training set, the second half the test set
    for i in range(len(data)):
        if i < len(data) / 2:
            train_x.append(data[i][0])
            train_y.append(data[i][1])
        else:
            test_x.append(data[i][0])
            test_y.append(data[i][1])
    return train_x, train_y, test_x, test_y


def model():
    print('Handwritten:')
    train_x, train_y, test_x, test_y = load_data()
    # obtain the parameters by least squares
    sum_x = 0.0; sum_square = 0.0; sum_2 = 0.0; sum_b = 0.0
    for i in range(len(train_x)):
        sum_x = sum_x + train_x[i]
        sum_square = sum_square + train_x[i] ** 2
    ave_x = sum_x / len(train_x)
    for i in range(len(train_x)):
        sum_2 = sum_2 + (train_y[i] * (train_x[i] - ave_x))
    w = sum_2 / (sum_square - sum_x ** 2 / len(train_x))
    for i in range(len(train_x)):
        sum_b = sum_b + (train_y[i] - w * train_x[i])
    b = sum_b / len(train_x)
    print('w=', w, 'b=', b)
    # test
    pred_y = []
    for i in range(len(test_x)):
        pred_y.append(w * test_x[i] + b)
    # compute MSE, MAE and r2_score by hand
    sum_mse = 0.0; sum_mae = 0.0
    sum1 = 0.0; sum2 = 0.0
    for i in range(len(pred_y)):
        sum_mae = sum_mae + np.abs(pred_y[i] - test_y[i])
        sum_mse = sum_mse + (pred_y[i] - test_y[i]) ** 2
    sum_y = 0.0
    for i in range(len(test_y)):
        sum_y = sum_y + test_y[i]
    ave_y = sum_y / len(test_y)
    for i in range(len(pred_y)):
        sum1 = sum1 + (pred_y[i] - test_y[i]) ** 2
        sum2 = sum2 + (ave_y - test_y[i]) ** 2
    print('MSE:', sum_mse / len(pred_y))
    print('MAE:', sum_mae / len(pred_y))
    print('R2_Squared:', 1 - sum1 / sum2)
    # show
    plt.scatter(test_x, test_y, color='black')
    plt.plot(test_x, pred_y, color='blue', linewidth=3)
    plt.show()
    print('\n')


# using the sklearn library
def sklearn_linearmodel():
    print('Using sklearn:')
    train_x, train_y, test_x, test_y = load_data()
    train_x = np.array(train_x).reshape(-1, 1)
    train_y = np.array(train_y).reshape(-1, 1)
    test_x = np.array(test_x).reshape(-1, 1)
    test_y = np.array(test_y).reshape(-1, 1)
    # train + test
    lr = linear_model.LinearRegression()
    lr.fit(train_x, train_y)
    y_pred = lr.predict(test_x)
    # output the coefficient and intercept
    print('w:', lr.coef_, 'b:', lr.intercept_)
    # output the evaluation metrics
    print('MSE:', mean_squared_error(test_y, y_pred))
    print('MAE:', mean_absolute_error(test_y, y_pred))
    print('R2_Squared:', r2_score(test_y, y_pred))
    # show
    plt.scatter(test_x, test_y, color='black')
    plt.plot(test_x, y_pred, color='blue', linewidth=3)
    plt.show()


if __name__ == '__main__':
    model()
    sklearn_linearmodel()
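As a quick sanity check (not part of the original post), the handwritten formulas and sklearn should agree on w and b when run on the same data, since both minimize the same squared error. A minimal sketch on synthetic data:

import numpy as np
from sklearn import linear_model

# synthetic 1-D data: y is roughly 2x + 1 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, 50)

# handwritten closed form for a single feature
w = np.sum(y * (x - x.mean())) / (np.sum(x ** 2) - x.sum() ** 2 / len(x))
b = np.mean(y - w * x)

# sklearn solves the same least-squares problem
lr = linear_model.LinearRegression().fit(x.reshape(-1, 1), y)
print(np.allclose([w, b], [lr.coef_[0], lr.intercept_]))  # expected: True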