
Linear Regression 02---Boston Housing Price Prediction

2022-08-04 06:05:00 【I'm fine please go away thank you】

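Contents

1. Get the Data
2. Data Analysis
3. Data Processing
4. Feature Engineering: Standardization
5. Create the Machine Learning Model
6. Model Evaluation
7. Full Code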

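Foreword:

These are my notes from another blogger's post, which explains the topic clearly and in great detail.
This case study uses linear regression as the prediction model. The goal is to find a linear function, that is, the weight (coefficient) the model assigns to each feature, and then to evaluate the fitted model.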
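Written out, the linear function is y_hat = w0 + w1*x1 + ... + w13*x13, and ordinary least squares picks the weights that minimize the sum of squared errors on the training data. A minimal NumPy sketch of that idea, using made-up synthetic data (3 features with known true weights) instead of the housing data so the recovered weights can be checked:

```python
import numpy as np

# Synthetic data (an assumption for illustration): 200 samples, 3 features,
# generated from known weights so the recovered values can be checked.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
true_b = 3.0
y = X @ true_w + true_b + rng.normal(scale=0.1, size=200)

# Prepend a column of ones so the intercept w0 is learned like any other weight.
X_design = np.column_stack([np.ones(len(X)), X])

# Least-squares solution of X_design @ w ~= y.
w, *_ = np.linalg.lstsq(X_design, y, rcond=None)

intercept, coefs = w[0], w[1:]
print("intercept:", round(intercept, 2))
print("coefficients:", np.round(coefs, 2))
```

np.linalg.lstsq solves the same problem as the normal equation w = (X^T X)^(-1) X^T y, but without forming the inverse explicitly, which is the numerically safer choice.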

1. Get the Data

The data is read from a local CSV file with pandas; see the full code in section 7.
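The full code in section 7 loads the data with pd.read_csv('./data/boshidun.csv'). It also imports load_boston, but that loader was removed in scikit-learn 1.2, so reading a local CSV is the route that still works. Below is a small sketch of a loader that also checks the layout the later steps rely on (13 feature columns followed by target). The helper name load_housing_csv, the column names other than CRIM and target, and the placeholder values are all assumptions for illustration. In-memory text stands in for the real file so the sketch runs on its own.

```python
import io
import pandas as pd

EXPECTED_COLUMNS = 14  # 13 features + 'target', as in the post


def load_housing_csv(source):
    """Hypothetical helper: read the housing CSV and sanity-check its layout.

    `source` may be a path such as './data/boshidun.csv' or any
    file-like object accepted by pd.read_csv.
    """
    data = pd.read_csv(source)
    if data.shape[1] != EXPECTED_COLUMNS:
        raise ValueError(f"expected {EXPECTED_COLUMNS} columns, got {data.shape[1]}")
    if data.columns[-1] != "target":
        raise ValueError("the last column should be 'target'")
    return data


# Stand-in for the real file: assumed column names plus two rows of placeholder values.
columns = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
           "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "target"]
fake_csv = io.StringIO(
    ",".join(columns) + "\n"
    + ",".join(["1.0"] * 14) + "\n"
    + ",".join(["2.0"] * 14) + "\n"
)
data = load_housing_csv(fake_csv)
print(data.shape)
```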

2. Data Analysis

2.1 Descriptive Statistics

# 2.1 Descriptive statistics
# describe() returns count, mean, std, min, max and the quartiles of each column; .T transposes the table
data.describe().T

Conclusion: the dataset has 506 rows and 14 variables, and each of the 14 variables has 506 non-null float64 values, i.e. no variable has missing values.
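describe() reports how many non-null values each column has, but not the column types. The float64 part of the conclusion comes from data.info() or data.dtypes. The sketch below puts type, non-null count and null count side by side. summarize_missing is a hypothetical helper, and the demo frame is made up, with one deliberate NaN, just to show the output:

```python
import numpy as np
import pandas as pd


def summarize_missing(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per-column dtype, non-null count and null count."""
    return pd.DataFrame({
        "dtype": data.dtypes.astype(str),
        "non_null": data.notna().sum(),
        "nulls": data.isna().sum(),
    })


# Made-up demo frame with one missing value in CRIM.
demo = pd.DataFrame({
    "CRIM": [0.1, 0.2, np.nan],
    "target": [1.0, 2.0, 3.0],
})
report = summarize_missing(demo)
print(report)
print("any missing:", bool(report["nulls"].any()))
```

On the real data, summarize_missing(data) should show float64, 506 and 0 for every row if the conclusion above holds.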

2.2 Scatter Plot Analysis

1. First, plot a single feature, CRIM (per capita crime rate), against the target


2. Then plot the remaining 12 features the same way, in a 3x4 grid

plt.figure(figsize=(15,10.5))   # figure size
plot_count = 1
for feature in list(data.columns)[1:13]:   # loop over the remaining 12 features (columns 1 to 12; CRIM was plotted above)
    plt.subplot(3,4,plot_count)  # 3 rows x 4 columns; plot_count is the position of this scatter plot
    plt.scatter(data[feature],data['target'])
    plt.xlabel(feature.replace('_',' ').title())
    plt.ylabel('target')
    plot_count += 1
plt.show()

Plots: a 3x4 grid of scatter plots, each showing one feature (x-axis) against target (y-axis).
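The scatter plots are read by eye. One way to put a number on how linear each relationship looks (not part of the original post) is the Pearson correlation of each feature with target. Here is a sketch with a hypothetical helper, target_correlations. The synthetic columns a, b and c stand in for the real features.

```python
import numpy as np
import pandas as pd


def target_correlations(data: pd.DataFrame, target: str = "target") -> pd.Series:
    """Hypothetical helper: Pearson correlation of every feature with the target."""
    corr = data.corr()[target].drop(target)
    # Sort by absolute value so strong negative relationships rank high too.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Synthetic example (assumption): 'a' rises with the target, 'b' falls, 'c' is noise.
rng = np.random.default_rng(1)
t = rng.normal(size=300)
demo = pd.DataFrame({
    "a": t + rng.normal(scale=0.3, size=300),
    "b": -t + rng.normal(scale=0.3, size=300),
    "c": rng.normal(size=300),
    "target": t,
})
print(target_correlations(demo).round(2))
```

On the real data this would be target_correlations(data). Correlation only measures linear association, so a curved pattern in a scatter plot can still get a modest value.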

3. Data Processing

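x = data.iloc[:,0:13]   # slice the DataFrame: the first 13 columns are the features (i.e. drop the last column, target)
y = data.iloc[:,13:14]  # the last column, target, kept as a one-column DataFrame
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2,random_state=5 )   # hold out 20% as the test set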
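When test_size=0.2, train_test_split rounds the test share up. So 506 rows give ceil(0.2 * 506) = 102 test rows and 404 training rows, and random_state=5 makes the shuffle reproducible. A quick check on random stand-in data with the same 506 x 14 shape (the values and the column names f0 to f12 are made up):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in data with the same shape as the post's dataset: 506 rows,
# 13 feature columns plus 'target' (random values, not real prices).
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(506, 14)),
                    columns=[f"f{i}" for i in range(13)] + ["target"])

x = data.iloc[:, 0:13]    # first 13 columns: features
y = data.iloc[:, 13:14]   # last column as a one-column DataFrame

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=5)

# scikit-learn rounds the test share up: ceil(0.2 * 506) = 102 test rows.
print(len(x_train), len(x_test))
```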

4. Feature Engineering: Standardization

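transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)  # learn mean and std from the training set, then scale it
x_test = transfer.transform(x_test)        # scale the test set with the training-set statistics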
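fit_transform learns the mean and standard deviation from the training set. transform then applies those same statistics to the test set, so no information from the test data leaks into preprocessing. The sketch below checks this against the formula z = (x - mean_train) / std_train on synthetic data (an assumption). StandardScaler uses the population standard deviation, which matches NumPy's default std:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic features (assumption): same distribution for train and test.
rng = np.random.default_rng(0)
x_train = rng.normal(loc=50.0, scale=10.0, size=(400, 3))
x_test = rng.normal(loc=50.0, scale=10.0, size=(100, 3))

transfer = StandardScaler()
x_train_std = transfer.fit_transform(x_train)  # learns mean_ and scale_ from training data only
x_test_std = transfer.transform(x_test)        # reuses the training statistics

# Same computation by hand: z = (x - mean_train) / std_train
manual = (x_test - x_train.mean(axis=0)) / x_train.std(axis=0)
print(np.allclose(x_test_std, manual))
```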

5. Create the Machine Learning Model

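The model is a LinearRegression estimator fitted on the standardized training data; the full code in section 7 also prints the coefficient of each feature.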
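y is taken as data.iloc[:,13:14], which is a one-column DataFrame. LinearRegression therefore treats it as a 2-D target and returns coef_ with shape (1, n_features), which is why the full code indexes estimator.coef_[0]. The sketch below shows this on synthetic data with known coefficients (2.0, -1.0, 0.5 and intercept 1.5 are made up) and builds the same feature/coefficient table:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic training data (assumption) with known true coefficients.
rng = np.random.default_rng(2)
x_train = pd.DataFrame(rng.normal(size=(300, 3)), columns=["f0", "f1", "f2"])
true_coef = np.array([2.0, -1.0, 0.5])
target = 1.5 + x_train.to_numpy() @ true_coef + rng.normal(scale=0.1, size=300)
# y as a one-column DataFrame, mirroring y = data.iloc[:, 13:14] in the post
y_train = pd.DataFrame({"target": target})

estimator = LinearRegression()
estimator.fit(x_train, y_train)

# With a 2-D y the coefficients have shape (n_targets, n_features).
print(estimator.coef_.shape)
coefficients = pd.DataFrame({"feature": x_train.columns,
                             "coefficient": estimator.coef_[0]})
print(coefficients.round(2))
print("intercept:", np.round(estimator.intercept_, 2))
```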

6. Model Evaluation

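The model is evaluated on the test set with the mean squared error (MSE) and R-squared, and a scatter plot compares actual and predicted prices; see the full code in section 7.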
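For a regressor, estimator.score(x_test, y_test) returns R-squared, defined as 1 - SS_res / SS_tot. mean_squared_error averages the squared residuals. The sketch below computes both by hand on a tiny made-up example and compares them with scikit-learn's mean_squared_error and r2_score:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Tiny made-up example of true and predicted values.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# MSE: mean of the squared residuals.
mse_manual = np.mean((y_true - y_pred) ** 2)

# R-squared: 1 - residual sum of squares / total sum of squares.
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(mse_manual, mean_squared_error(y_true, y_pred))
print(round(r2_manual, 4), round(r2_score(y_true, y_pred), 4))
```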

7. Full Code

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression  # ordinary least squares, closed-form solution (normal equation)
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import SGDRegressor  # linear regression fitted by stochastic gradient descent
from sklearn.linear_model import Ridge  # ridge regression (L2-regularized linear regression)
import matplotlib.pyplot as plt
import pandas as pd

# 1. Get the data
data = pd.read_csv('./data/boshidun.csv')
print(data.head())

# 2.1 Descriptive statistics
# describe() returns count, mean, std, min, max and the quartiles of each column; .T transposes the table
print(data.describe().T)

# 2.2 Scatter plot analysis
def drawing(x,y,xlabel):
    plt.scatter(x,y)
    plt.title('%s - House Prices'% xlabel)
    plt.xlabel(xlabel)
    plt.ylabel('House Prices')
    plt.yticks(range(0,60,5))
    plt.grid()
    plt.show()

# Scatter plot of CRIM against the target (the dependent variable)
drawing(data['CRIM'],data['target'],'Per Capita Crime Rate')

# Plot the remaining 12 features (columns 1 to 12) against target in a 3x4 grid
plt.figure(figsize=(15,10.5))
plot_count = 1
for feature in list(data.columns)[1:13]:
    plt.subplot(3,4,plot_count)
    plt.scatter(data[feature],data['target'])
    plt.xlabel(feature.replace('_',' ').title())
    plt.ylabel('target')
    plot_count += 1
plt.show()

# 3. Data processing: the first 13 columns are the features, the last column is the target
x = data.iloc[:,0:13]
y = data.iloc[:,13:14]
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2,random_state=5 )

# 4. Standardize the features: fit the scaler on the training set only
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)

# 5.1 Create the model and fit it on the training set
estimator = LinearRegression()
estimator.fit(x_train, y_train)

# 5.2 Print the estimated coefficient of each feature
print(estimator.coef_[0])
coefficients = pd.DataFrame({'feature': x.columns, 'coefficient': estimator.coef_[0]})
print(coefficients)



# 6.1 Get the predictions on the test set
y_predict = estimator.predict(x_test)
# 6.2 Compute the MSE and R-squared
print('MSE: %.4f' % mean_squared_error(y_test, y_predict))
print('R-Squared: %.4f' % estimator.score(x_test, y_test))

# 6.3 Plot actual vs. predicted prices (actual on the x-axis, predicted on the y-axis)
plt.figure()
plt.scatter(y_test, y_predict)
plt.xlabel('Actual Prices')
plt.ylabel('Predicted Prices')
plt.title('Actual Prices vs Predicted Prices')
plt.show()
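The full code imports SGDRegressor and Ridge but never uses them. If the goal was to compare them with the closed-form LinearRegression, the comparison could look like the sketch below. It uses synthetic data rather than the housing CSV, and the hyperparameters (max_iter=1000, tol=1e-3, alpha=1.0) are assumed defaults, not tuned values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (assumption): 506 rows, 5 features, noise std 1.
rng = np.random.default_rng(3)
X = rng.normal(size=(506, 5))
y = X @ np.array([3.0, -2.0, 0.0, 1.0, 0.5]) + 20.0 + rng.normal(scale=1.0, size=506)

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5)
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)  # fit on training data only
x_test = scaler.transform(x_test)

models = {
    "LinearRegression": LinearRegression(),
    "SGDRegressor": SGDRegressor(max_iter=1000, tol=1e-3, random_state=0),
    "Ridge": Ridge(alpha=1.0),
}
results = {}
for name, model in models.items():
    model.fit(x_train, y_train)  # y is 1-D here, as SGDRegressor expects
    results[name] = mean_squared_error(y_test, model.predict(x_test))
    print(f"{name}: test MSE = {results[name]:.3f}")
```

With the post's one-column DataFrame y_train, pass y_train.values.ravel() to SGDRegressor.fit. The method expects a 1-D target and emits a DataConversionWarning when given a column vector.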