Sklearn machine learning foundation (linear regression, underfitting, overfitting, ridge regression, model loading and saving)
2022-07-26 08:48:00 【Natural color】
Catalog
1. Linear model
1.1 Loss function
1.2 The normal equation of least squares
1.3 Gradient descent for least squares (universal)
1.4 Normal equation predicts Boston house prices
1.5 Gradient descent predicts Boston house prices
1.6 Regression performance evaluation
2. Underfitting and overfitting
3. Ridge regression (linear regression with regularization)
4. Model loading and saving
1. Linear model

Matrix multiplication satisfies the requirements of the linear regression operation: a prediction is a weighted sum of the input features plus a bias.
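Written out in the standard form, with weights w and bias b:

\[
h(w) = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + b = \mathbf{w}^{\top}\mathbf{x} + b
\]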
1.1 Loss function

Optimization is an iterative process of searching for the most suitable weights.
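For least squares, the loss (also called the cost function) is the total squared error between the predictions and the true target values over the m training samples; training seeks the weights w that minimize it:

\[
J(w) = \sum_{i=1}^{m} \bigl( h_w(x_i) - y_i \bigr)^2
\]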
1.2 The normal equation of the least squares method
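The normal equation gives the minimizing weights in closed form, where X is the feature matrix and y the target vector:

\[
w = (X^{\top} X)^{-1} X^{\top} y
\]

It requires X^T X to be invertible, and the matrix inversion becomes expensive when the number of features is large, which is why the gradient descent approach below is labeled universal.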

1.3 Gradient descent for least squares (universal)
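Gradient descent starts from initial weights and repeatedly steps each weight against the gradient of the loss, with the learning rate \(\alpha\) controlling the step size:

\[
w_j := w_j - \alpha \frac{\partial J(w)}{\partial w_j}
\]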

1.4 Normal equation predicts Boston house prices
from sklearn.datasets import load_boston  # note: load_boston was removed in scikit-learn 1.2
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def myliner():
    '''Linear regression (normal equation) prediction of house prices.'''
    # Get the data
    lb = load_boston()
    # Split the data
    x_train, x_test, y_train, y_test = train_test_split(lb.data, lb.target, test_size=0.2)
    # Standardization: the target values need to be standardized as well!
    # Feature standardization
    std_x = StandardScaler()
    x_train = std_x.fit_transform(x_train)
    x_test = std_x.transform(x_test)
    # Target standardization (reshape the targets to 2-D, as StandardScaler expects)
    std_y = StandardScaler()
    y_train = std_y.fit_transform(y_train.reshape(-1, 1))
    y_test = std_y.transform(y_test.reshape(-1, 1))
    # Fit and predict: normal equation solution
    lr = LinearRegression()
    lr.fit(x_train, y_train)
    print(lr.coef_)
    # Predictions come out standardized; inverse_transform converts them back to prices
    y_predict = std_y.inverse_transform(lr.predict(x_test))
    print("Predicted price of each house in the test set:", y_predict)
    return None

if __name__ == '__main__':
    myliner()

1.5 Gradient descent predicts Boston house prices
from sklearn.datasets import load_boston  # note: load_boston was removed in scikit-learn 1.2
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def myliner():
    '''Linear regression (gradient descent) prediction of house prices.'''
    # Get the data
    lb = load_boston()
    # Split the data
    x_train, x_test, y_train, y_test = train_test_split(lb.data, lb.target, test_size=0.2)
    # Standardization: the target values need to be standardized as well!
    # Feature standardization
    std_x = StandardScaler()
    x_train = std_x.fit_transform(x_train)
    x_test = std_x.transform(x_test)
    # Target standardization (reshape the targets to 2-D, as StandardScaler expects)
    std_y = StandardScaler()
    y_train = std_y.fit_transform(y_train.reshape(-1, 1))
    y_test = std_y.transform(y_test.reshape(-1, 1))
    # Fit and predict: gradient descent solution
    sgd = SGDRegressor()
    sgd.fit(x_train, y_train.ravel())  # SGDRegressor expects a 1-D target array
    print(sgd.coef_)
    # predict() returns a 1-D array; reshape it before inverting the standardization
    y_predict = std_y.inverse_transform(sgd.predict(x_test).reshape(-1, 1))
    print("Predicted price of each house in the test set:", y_predict)
    return None

if __name__ == '__main__':
    myliner()

1.6 Regression performance evaluation
Small datasets: LinearRegression (cannot address fitting problems) and others
Large-scale data: SGDRegressor
Evaluation: sklearn.metrics.mean_squared_error
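Mean squared error averages the squared differences between the true values y_i and the predictions \(\hat{y}_i\) over the m test samples:

\[
MSE = \frac{1}{m} \sum_{i=1}^{m} \bigl( y_i - \hat{y}_i \bigr)^2
\]

A minimal usage sketch (the numbers are made-up toy values, just to show the call):

from sklearn.metrics import mean_squared_error

# Average of the squared residuals between true and predicted values
print(mean_squared_error([3.0, 2.5, 4.0], [2.8, 2.9, 4.2]))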


2. Underfitting and overfitting

2.1 Solutions
Underfitting:
The model learns too few features of the data; the fix is to increase the number of features.
Overfitting:
There are too many original features, some of them noisy, and the model becomes too complex because it tries to account for every data point in the training set.
Solutions:
Feature selection: eliminate highly correlated features (hard to do)
Cross validation (so all of the data gets trained on), used for detection
L2 regularization (understand this): reduce the weights of the high-order terms, as in the penalized loss below
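L2 regularization adds a penalty on the squared weights to the least-squares loss, so large weights, and with them overly complex models, are discouraged; the hyperparameter \(\lambda\) controls the penalty strength:

\[
J(w) = \sum_{i=1}^{m} \bigl( h_w(x_i) - y_i \bigr)^2 + \lambda \sum_{j=1}^{n} w_j^2
\]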
3. Ridge regression (linear regression with regularization)
sklearn.linear_model.Ridge

The larger the regularization strength, the smaller the weight values become, approaching 0.
The regression coefficients obtained by ridge regression are more realistic and more reliable. In addition, it makes the estimated parameters fluctuate less and become more stable, which is of great practical value when studying ill-conditioned data. A small sketch of the shrinkage effect follows.
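A minimal sketch of that shrinkage, on synthetic data (the true weights 3.0, -2.0, 0.5 and the alpha values are made up for illustration):

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(100, 3)                                       # 100 samples, 3 features
y = X @ np.array([3.0, -2.0, 0.5]) + 0.1 * rng.randn(100)   # known weights plus noise

for alpha in [0.01, 1.0, 100.0]:
    rd = Ridge(alpha=alpha).fit(X, y)
    print(alpha, rd.coef_)  # the coefficients shrink toward 0 as alpha grows

The full Boston example with Ridge: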
from sklearn.datasets import load_boston  # note: load_boston was removed in scikit-learn 1.2
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

def myliner():
    '''Ridge regression prediction of house prices.'''
    # Get the data
    lb = load_boston()
    # Split the data
    x_train, x_test, y_train, y_test = train_test_split(lb.data, lb.target, test_size=0.2)
    # Standardization: the target values need to be standardized as well!
    # Feature standardization
    std_x = StandardScaler()
    x_train = std_x.fit_transform(x_train)
    x_test = std_x.transform(x_test)
    # Target standardization (reshape the targets to 2-D, as StandardScaler expects)
    std_y = StandardScaler()
    y_train = std_y.fit_transform(y_train.reshape(-1, 1))
    y_test = std_y.transform(y_test.reshape(-1, 1))
    # Fit and predict: ridge regression solution
    rd = Ridge(alpha=1.0)
    rd.fit(x_train, y_train)
    print(rd.coef_)
    # Predictions come out standardized; inverse_transform converts them back to prices
    y_predict = std_y.inverse_transform(rd.predict(x_test))
    print("Predicted price of each house in the test set:", y_predict)
    print("Mean squared error of ridge regression:", mean_squared_error(std_y.inverse_transform(y_test), y_predict))
    return None

if __name__ == '__main__':
    myliner()

4. Model loading and saving
# Save the trained model
with open('rd.pickle', 'wb') as fw:
    pickle.dump(rd, fw)
# Load the model
with open('rd.pickle', 'rb') as fr:
    new_rd = pickle.load(fr)

The full script:

from sklearn.datasets import load_boston  # note: load_boston was removed in scikit-learn 1.2
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
import pickle

def myliner():
    '''Ridge regression prediction of house prices, with model persistence.'''
    # Get the data
    lb = load_boston()
    # Split the data
    x_train, x_test, y_train, y_test = train_test_split(lb.data, lb.target, test_size=0.2)
    # Standardization: the target values need to be standardized as well!
    # Feature standardization
    std_x = StandardScaler()
    x_train = std_x.fit_transform(x_train)
    x_test = std_x.transform(x_test)
    # Target standardization (reshape the targets to 2-D, as StandardScaler expects)
    std_y = StandardScaler()
    y_train = std_y.fit_transform(y_train.reshape(-1, 1))
    y_test = std_y.transform(y_test.reshape(-1, 1))
    # Fit and predict: ridge regression solution
    rd = Ridge(alpha=1.0)
    rd.fit(x_train, y_train)
    print(rd.coef_)
    # Save the trained model
    with open('rd.pickle', 'wb') as fw:
        pickle.dump(rd, fw)
    # Load the model
    with open('rd.pickle', 'rb') as fr:
        new_rd = pickle.load(fr)
    # Predictions come out standardized; inverse_transform converts them back to prices
    y_predict = std_y.inverse_transform(new_rd.predict(x_test))
    print("Predicted price of each house in the test set:", y_predict)
    print("Mean squared error of ridge regression:", mean_squared_error(std_y.inverse_transform(y_test), y_predict))
    return None

if __name__ == '__main__':
    myliner()
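As an alternative to pickle, a minimal sketch using joblib, which is commonly used for persisting scikit-learn models (the toy data and the filename rd.pkl are made up for illustration):

import joblib
from sklearn.linear_model import Ridge

rd = Ridge(alpha=1.0).fit([[0.0], [1.0], [2.0]], [0.0, 1.0, 2.0])  # tiny toy fit
joblib.dump(rd, 'rd.pkl')        # save the trained model to disk
new_rd = joblib.load('rd.pkl')   # load it back
print(new_rd.predict([[3.0]]))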