Machine learning summary (I): linear regression, ridge regression, Lasso regression
Linear regression is a regression-analysis technique: the dependent variable it models is continuous (if the dependent variable is discrete, the task becomes a classification problem). Regression analysis is a supervised learning problem. This post reviews the key points of standard linear regression, briefly discusses the problems that can arise, introduces two variants of linear regression, ridge regression and Lasso regression, and finally walks through the whole regression process with the sklearn library.
Contents
- General form of linear regression
- Possible problems in linear regression
- Overfitting and how to address it
- Linear regression code example
- Ridge regression and Lasso regression
- Ridge regression and Lasso regression code implementation
General form of linear regression
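The formula images from the original post are not reproduced here; in one standard convention, the hypothesis of linear regression and its squared-error loss are:

h_\theta(x) = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n = \theta^{T} x

J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

where m is the number of training samples and \theta is the parameter (weight) vector learned by minimizing J(\theta).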
Possible problems in linear regression
- There are two ways to minimize the loss function: gradient descent and the normal equation. Briefly, gradient descent needs a learning rate and many iterations but scales to a large number of features, while the normal equation gives a closed-form solution in one step but requires inverting XᵀX, which becomes expensive when there are many features (a short code sketch contrasting the two follows this list).
- Feature scaling: normalize the feature data. This has two benefits. First, it speeds up convergence: if the value ranges of two features differ greatly, the contour plot of the loss over those two parameters is a long, flat ellipse, and gradient descent ends up taking a zigzag path perpendicular to the contours, so iteration is slow; after normalization the contours become roughly circular, the gradient points toward the centre, and convergence is much faster. Second, it can improve the accuracy of the model.
- Choice of the learning rate α: if α is too small, many iterations are needed and convergence slows down; if α is too large, the updates may overshoot the optimum and the algorithm may never converge at all.
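As a concrete illustration of the two solution methods and of feature scaling, here is a minimal sketch. It is not from the original post; the use of SGDRegressor with eta0=0.01 and the diabetes dataset are illustrative assumptions.

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDRegressor

X, y = load_diabetes(return_X_y=True)

# Normal equation: theta = (X^T X)^(-1) X^T y, with a bias column prepended
Xb = np.hstack([np.ones((X.shape[0], 1)), X])
theta = np.linalg.pinv(Xb.T @ Xb) @ Xb.T @ y
print("normal-equation intercept:", theta[0])

# Gradient descent converges faster when the features are scaled first
X_scaled = StandardScaler().fit_transform(X)
sgd = SGDRegressor(eta0=0.01, max_iter=1000)
sgd.fit(X_scaled, y)
print("SGD intercept:", sgd.intercept_)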
Overfitting and how to address it
- Problem: the model fits the training data too closely and generalizes poorly (the original post illustrates this with a figure of fitted curves, not reproduced here).
- Solutions: (1) discard features that contribute little to the final prediction, for example by reducing dimensionality with PCA (see the sketch below); (2) use regularization: keep all the features but shrink the parameters θ in front of them, which amounts to modifying the loss function of linear regression. This is exactly what ridge regression and Lasso regression do.
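As a rough sketch of the first option (not from the original post; keeping 5 principal components is an arbitrary illustrative choice), PCA can be used to retain only the strongest directions in the data before fitting:

from sklearn.datasets import load_diabetes
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

X, y = load_diabetes(return_X_y=True)

# Keep only the first 5 principal components, then fit an ordinary linear model
model = make_pipeline(PCA(n_components=5), LinearRegression())
model.fit(X, y)
print("R^2 on the training data:", model.score(X, y))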
Linear regression code example
import numpy as np
from sklearn import datasets, linear_model
# train_test_split now lives in model_selection; the old cross_validation module was removed
from sklearn.model_selection import train_test_split

def load_data():
    # Split the diabetes dataset into 75% training and 25% test data
    diabetes = datasets.load_diabetes()
    return train_test_split(diabetes.data, diabetes.target, test_size=0.25, random_state=0)

def test_LinearRegression(*data):
    X_train, X_test, y_train, y_test = data
    # Create a linear regression object via sklearn's linear_model
    linearRegression = linear_model.LinearRegression()
    # Train
    linearRegression.fit(X_train, y_train)
    # coef_ holds the weight vector, intercept_ holds the bias b
    print("Weight vector: %s, b: %.2f" % (linearRegression.coef_, linearRegression.intercept_))
    # Value of the loss function: mean squared error on the test set
    print("Loss: %.2f" % np.mean((linearRegression.predict(X_test) - y_test) ** 2))
    # Prediction performance score (R^2)
    print("Score: %.2f" % linearRegression.score(X_test, y_test))

if __name__ == '__main__':
    # Load the dataset
    X_train, X_test, y_train, y_test = load_data()
    # Train and print the results
    test_LinearRegression(X_train, X_test, y_train, y_test)

Linear regression example output
Weight vector: [ -43.26774487 -208.67053951  593.39797213  302.89814903 -560.27689824
  261.47657106   -8.83343952  135.93715156  703.22658427   28.34844354], b: 153.07
Loss: 3180.20
Score: 0.36

Ridge regression and Lasso regression
Ridge regression and Lasso regression were introduced to deal with overfitting in linear regression and with the fact that XᵀX may not be invertible when solving for θ with the normal equation. Both achieve this by adding a regularization term to the loss function; the loss functions of the three models are compared below:
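The comparison figure from the original post is not reproduced here; in one common convention (the scaling of the penalty term varies between texts) the three loss functions are:

Linear regression: J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

Ridge regression: J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2

Lasso regression: J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} |\theta_j|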
Here λ is called the regularization parameter. If λ is chosen too large, all the parameters θ are shrunk towards zero and the model underfits; if λ is too small, the overfitting problem is not really addressed, so choosing λ takes some care. The biggest difference between ridge regression and Lasso regression is that ridge regression adds an L2-norm penalty while Lasso regression adds an L1-norm penalty. Lasso regression can drive many of the θ values in the loss function to exactly 0, which ridge regression cannot do, since ridge keeps all the θ values nonzero; the resulting sparse model makes the amount of computation for Lasso much smaller than for ridge.
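Since choosing λ (called alpha in sklearn) by hand is tricky, a common approach, sketched below under the assumption of the same diabetes data and an illustrative candidate grid, is to select it by cross-validation with RidgeCV and LassoCV:

from sklearn.datasets import load_diabetes
from sklearn.linear_model import RidgeCV, LassoCV

X, y = load_diabetes(return_X_y=True)
alphas = [0.01, 0.1, 1, 10, 100]

# Each estimator evaluates the candidate alphas by cross-validation and keeps the best one
print("best ridge alpha:", RidgeCV(alphas=alphas).fit(X, y).alpha_)
print("best lasso alpha:", LassoCV(alphas=alphas, cv=5).fit(X, y).alpha_)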
In the resulting plots, the Lasso fit eventually tends towards a straight line, because many of the θ values have been driven to 0, whereas ridge regression keeps a certain smoothness, because all of the θ values remain nonzero.
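The sparsity claim can be checked directly; the following minimal sketch (not from the original post) counts the zero coefficients produced by each model on the same diabetes data with the default alpha of 1.0:

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge, Lasso

X, y = load_diabetes(return_X_y=True)
ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

# Lasso typically zeroes out several coefficients; ridge leaves them all nonzero
print("zero coefficients (ridge):", int(np.sum(ridge.coef_ == 0)))
print("zero coefficients (lasso):", int(np.sum(lasso.coef_ == 0)))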
Ridge regression and Lasso Regression code implementation
Ridge regression code example
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split

def load_data():
    diabetes = datasets.load_diabetes()
    return train_test_split(diabetes.data, diabetes.target, test_size=0.25, random_state=0)

def test_ridge(*data):
    X_train, X_test, y_train, y_test = data
    ridgeRegression = linear_model.Ridge()
    ridgeRegression.fit(X_train, y_train)
    print("Weight vector: %s, b: %.2f" % (ridgeRegression.coef_, ridgeRegression.intercept_))
    print("Loss: %.2f" % np.mean((ridgeRegression.predict(X_test) - y_test) ** 2))
    print("Score: %.2f" % ridgeRegression.score(X_test, y_test))

# Test how different alpha values affect prediction performance
def test_ridge_alpha(*data):
    X_train, X_test, y_train, y_test = data
    alphas = [0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000]
    scores = []
    for alpha in alphas:
        ridgeRegression = linear_model.Ridge(alpha=alpha)
        ridgeRegression.fit(X_train, y_train)
        scores.append(ridgeRegression.score(X_test, y_test))
    return alphas, scores

def show_plot(alphas, scores):
    figure = plt.figure()
    ax = figure.add_subplot(1, 1, 1)
    ax.plot(alphas, scores)
    ax.set_xlabel(r"$\alpha$")
    ax.set_ylabel("score")
    ax.set_xscale("log")
    ax.set_title("Ridge")
    plt.show()

if __name__ == '__main__':
    # With the default alpha:
    # X_train, X_test, y_train, y_test = load_data()
    # test_ridge(X_train, X_test, y_train, y_test)
    # With a range of alpha values:
    X_train, X_test, y_train, y_test = load_data()
    alphas, scores = test_ridge_alpha(X_train, X_test, y_train, y_test)
    show_plot(alphas, scores)

Lasso regression code example
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split

def load_data():
    diabetes = datasets.load_diabetes()
    return train_test_split(diabetes.data, diabetes.target, test_size=0.25, random_state=0)

def test_lasso(*data):
    X_train, X_test, y_train, y_test = data
    lassoRegression = linear_model.Lasso()
    lassoRegression.fit(X_train, y_train)
    print("Weight vector: %s, b: %.2f" % (lassoRegression.coef_, lassoRegression.intercept_))
    print("Loss: %.2f" % np.mean((lassoRegression.predict(X_test) - y_test) ** 2))
    print("Score: %.2f" % lassoRegression.score(X_test, y_test))

# Test how different alpha values affect prediction performance
def test_lasso_alpha(*data):
    X_train, X_test, y_train, y_test = data
    alphas = [0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000]
    scores = []
    for alpha in alphas:
        lassoRegression = linear_model.Lasso(alpha=alpha)
        lassoRegression.fit(X_train, y_train)
        scores.append(lassoRegression.score(X_test, y_test))
    return alphas, scores

def show_plot(alphas, scores):
    figure = plt.figure()
    ax = figure.add_subplot(1, 1, 1)
    ax.plot(alphas, scores)
    ax.set_xlabel(r"$\alpha$")
    ax.set_ylabel("score")
    ax.set_xscale("log")
    ax.set_title("Lasso")
    plt.show()

if __name__ == '__main__':
    X_train, X_test, y_train, y_test = load_data()
    # With the default alpha:
    # test_lasso(X_train, X_test, y_train, y_test)
    # With a range of alpha values:
    alphas, scores = test_lasso_alpha(X_train, X_test, y_train, y_test)
    show_plot(alphas, scores)