[Deep Learning]: 《PyTorch Introduction to Project Practice》 Day 2: Implementing Linear Regression from Scratch (with detailed code)
2022-07-28 16:57:00 【JOJO's data analysis Adventure】
【Deep Learning】: Linear Regression from Scratch with PyTorch
- This article is part of the 【Deep Learning】:《PyTorch Introduction to Project Practice》 column, which records my notes on implementing deep learning with PyTorch. I try to update every week, and you are welcome to subscribe!
- Personal homepage: JoJo's Data Analysis Adventure
- About me: I am a senior majoring in statistics and have been recommended for postgraduate study in statistics at a top-3 statistics program.
- If this article helps you, please follow, like, bookmark, and subscribe to the column.

Reference material: This column mainly follows Mu Li's (沐神) Dive into Deep Learning (《动手学深度学习》) and records my study notes. My ability is limited, so if there are mistakes, corrections are welcome. Mu Li has also made teaching videos and the textbook available, which you can use for further study.
- Video: Dive into Deep Learning
- Textbook: Dive into Deep Learning

1. Basic concept of linear regression
Linear regression is one of the most commonly used models in machine learning, especially for studying quantitative data: it can analyze the relationships between variables and gives results that are easy to interpret. It is also a good starting point for new methods, since many interesting statistical learning methods can be viewed as generalizations or extensions of linear regression, for example Lasso regression, ridge regression, logistic regression, and softmax regression. For a detailed theoretical introduction, see my article: Detailed Explanation of Linear Regression (Statistical Learning Methods).
The linear regression model can be written in the following form:
$$y = w_1x_1 + w_2x_2 + \cdots + w_nx_n + b$$
In vector form:
$$Y = W^TX + b$$
Once we have the data, our task is to find the best parameters $W$ and $b$. How do we find them? Before answering that, let's first introduce the loss function and gradient descent.
2. Loss function
The loss function is an important measure of how well a model fits the data; it quantifies the difference between the observed values and the fitted values. In linear regression the loss function is also called the squared error function. We use the sum of squared errors because the error should be treated as non-negative, the absolute value of the error is inconvenient to differentiate, and for most problems, especially regression problems, the squared error loss is a reasonable choice. It is defined as follows:
$$L(\hat{W},\hat{b})=\frac{1}{2m}\sum_{i=1}^{m}\left(y^{(i)} - W^Tx^{(i)}-b\right)^2$$
Our goal is to choose the model parameters that minimize the loss function. Because linear regression is a very simple problem, in most cases there is an analytical solution: we look for the point where the gradient of $L$ is 0. To simplify the notation, we absorb $b$ into $W$, so that:
$$Y = W^TX$$
where $w_0 = b$ and $x_0 = 1$. Setting the gradient of the loss above to 0 then gives the normal equation solution:
$$W = (X^TX)^{-1}X^Ty$$
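Here is a minimal sketch (my addition, not from the original post) of computing this closed-form solution directly in PyTorch. The dataset, variable names, and parameter values below are made up purely for illustration; a column of ones is prepended to the design matrix so that $w_0$ plays the role of the intercept $b$.

```python
# Hypothetical example, not from the original post: the normal-equation solution in PyTorch.
import torch

num_samples = 100
X = torch.normal(0, 1, (num_samples, 2))                      # two features
X_aug = torch.cat([torch.ones(num_samples, 1), X], dim=1)     # prepend x0 = 1 for the intercept
w_true = torch.tensor([[2.0], [-1.0], [1.0]])                 # [b, w1, w2]
y = X_aug @ w_true + torch.normal(0, 0.1, (num_samples, 1))   # add Gaussian noise

# W = (X^T X)^{-1} X^T y, computed by solving the linear system rather than inverting explicitly
W_hat = torch.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)
print(W_hat)  # should be close to [2, -1, 1]
```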
3. Gradient descent
Gradient descent is an optimization algorithm. Other optimization methods, such as SGD and Adam, will be introduced later; in this chapter we simply use gradient descent to compute the parameters. The idea behind gradient descent is: start from a randomly chosen combination of parameters, compute the loss function, and then look for the next parameter combination that reduces the loss the most.
The update formula is as follows:
$$w_i := w_i - \eta\frac{dL}{dw_i}$$
- 1. Initialize a set of parameter values.
- 2. Repeatedly update the parameters in the direction of the negative gradient. Here $\eta$ is the learning rate, a hyperparameter that determines how large a step we take in the direction that decreases the loss function the most; in gradient descent, all parameters are updated simultaneously by subtracting the learning rate times the derivative of the loss function with respect to each parameter. The learning rate must be specified in advance.

We keep doing this until we reach a local minimum. Because we have not tried every combination of parameters, we cannot be sure that the local minimum we find is the global minimum; different initial parameter values may lead to different local minima. Choosing an appropriate value of $\eta$ is also something we need to think about: if $\eta$ is too large, we may overshoot the minimum and fail to converge; if $\eta$ is too small, convergence is too slow. These details will be discussed later; the short sketch below gives a feel for the effect of the learning rate. After that, let's see how to implement a simple linear regression model.
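To build some intuition, here is a tiny sketch (my addition, not from the original post) that runs gradient descent on the one-dimensional function $f(w) = (w-3)^2$, whose minimum is at $w = 3$, for a few different learning rates:

```python
# Hypothetical illustration, not from the original post: 1-D gradient descent on f(w) = (w - 3)^2.
def gradient_descent_1d(eta, steps=20, w0=0.0):
    w = w0
    for _ in range(steps):
        grad = 2 * (w - 3)   # derivative of (w - 3)^2
        w = w - eta * grad   # gradient descent update
    return w

for eta in (0.01, 0.1, 1.1):
    print(f"eta={eta}: w after 20 steps = {gradient_descent_1d(eta):.4f}")
# A small eta converges slowly, a moderate eta gets close to 3 quickly,
# and an eta that is too large makes the iterates diverge.
```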
4. PyTorch linear regression
Import related libraries
```python
import random
import torch
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
```
4.1 Generate data set
For convenience, we demonstrate with a simulated dataset. Suppose the samples come from a standard normal distribution and each sample has two features. We generate a dataset of 1000 samples with $w = [-1, 1]^T$ and $b = 2$, where $\epsilon$ is a noise term drawn from a normal distribution with mean 0 and standard deviation 0.1.
```python
def simulation_data(w, b, n):
    X = torch.normal(0, 1, (n, len(w)))   # draw features from the standard normal distribution
    y = torch.matmul(X, w) + b            # compute the regression values
    y += torch.normal(0, 0.1, y.shape)    # add the random noise term
    return X, y.reshape((-1, 1))

true_w = torch.tensor([-1, 1], dtype=torch.float32)
true_b = 2
features, target = simulation_data(true_w, true_b, 1000)
```
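As a quick check (my addition, not from the original post), we can print the shapes and the first sample of the generated data:

```python
# Hypothetical sanity check, not part of the original post.
print(features.shape, target.shape)   # torch.Size([1000, 2]) torch.Size([1000, 1])
print(features[0], target[0])         # the first sample and its response
```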
The simulated dataset has now been generated; let's plot it and take a look:
```python
plt.scatter(features[:, 1].detach().numpy(), target.detach().numpy(), 2)
```

As can be seen from the figure above, there is a clear linear relationship between $y$ and this feature.
4.2 Initialize parameters
Now let's initialize the parameters we want to estimate. We usually initialize $w$ from a normal distribution with mean 0 and set $b$ to a zero vector.
```python
w = torch.normal(0, 0.1, (2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)
```
4.3 Define the regression model
With the parameters initialized, the next step is to define our linear regression model:
```python
def reg(X, w, b):
    return torch.matmul(X, w) + b
```
4.4 Calculate the loss function
```python
def loss_fun(y_hat, y):
    return (y_hat - y)**2 / 2 / len(y)   # element-wise squared error, scaled by 1/(2m)
```
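Before training, it can be helpful to look at the loss under the freshly initialized parameters (this check is my addition, not from the original post):

```python
# Hypothetical check, not in the original post: the loss before any training.
initial_loss = loss_fun(reg(features, w, b), target).sum()
print(float(initial_loss))   # should be noticeably larger than the post-training loss below
```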
4.5 Solve for the parameters with gradient descent
```python
def gd(params, n):
    with torch.no_grad():            # no gradient tracking is needed while updating the parameters
        for param in params:
            param -= n * param.grad  # gradient descent step
            param.grad.zero_()       # reset the gradient to 0 so it does not carry over to the next step
```
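For reference (my addition, not part of the original post), this hand-written update does essentially what PyTorch's built-in optimizer would do. A minimal sketch, assuming the same full-batch setting and learning rate as below:

```python
# Hypothetical alternative using torch.optim.SGD instead of the manual gd() above.
import torch.optim as optim

optimizer = optim.SGD([w, b], lr=0.01)
# Inside the training loop you would then call:
#   optimizer.step()        # replaces `param -= n * param.grad`
#   optimizer.zero_grad()   # replaces `param.grad.zero_()`
```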
```python
n = 0.01        # learning rate
num_epochs = 3  # number of training epochs
net = reg
loss = loss_fun
X = features
y = target
for epoch in range(num_epochs):
    l = loss(reg(X, w, b), y)   # compute the loss
    l.sum().backward()          # backpropagate the gradients
    gd([w, b], n)               # update the parameters
    with torch.no_grad():
        train = loss(net(features, w, b), target)
        print(f'epoch{epoch + 1},loss:{float(train.mean()):f}')
```
```
epoch1,loss:0.001797
epoch2,loss:0.001763
epoch3,loss:0.001729
```
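The loop above uses the full dataset for every update (full-batch gradient descent). As a sketch of a mini-batch stochastic gradient descent variant (my addition, not from the original post; the `import random` at the top is otherwise unused, which is why it is used here), assuming a batch size of 10 and the hypothetical helper name `data_iter`; running it would of course change the parameter errors reported below:

```python
# Hypothetical mini-batch variant, not from the original post (assumes batch_size = 10).
def data_iter(batch_size, features, target):
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)   # visit the samples in random order
    for i in range(0, num_examples, batch_size):
        batch_idx = torch.tensor(indices[i:i + batch_size])
        yield features[batch_idx], target[batch_idx]

batch_size = 10
for epoch in range(num_epochs):
    for X_batch, y_batch in data_iter(batch_size, features, target):
        l = loss_fun(reg(X_batch, w, b), y_batch)   # loss on one mini-batch
        l.sum().backward()
        gd([w, b], n)                               # update using the mini-batch gradient
```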
Since this example uses a simulated dataset, we know the true parameters, so we can compute the estimation error:
```python
print(f'error in w: {true_w - w.reshape(true_w.shape)}')
print(f'error in b: {true_b - b}')
```
```
error in w: tensor([-0.8461, 0.8311], grad_fn=<SubBackward0>)
error in b: tensor([1.4857], grad_fn=<RsubBackward1>)
```
The errors are still fairly large here because we only performed three full-batch updates; training for more epochs, or using a larger learning rate or the mini-batch version sketched above, brings the estimates much closer to the true values. You can also try other learning rates and compare the results; commonly used values are 0.01, 0.03, 0.1, 0.3, and 1.
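A small sketch of how such a comparison could be run (my addition, not from the original post); it simply re-initializes the parameters and retrains for each candidate learning rate:

```python
# Hypothetical experiment, not in the original post: retrain from scratch for several learning rates.
for lr in (0.01, 0.03, 0.1, 0.3, 1):
    w = torch.normal(0, 0.1, (2, 1), requires_grad=True)   # re-initialize the parameters
    b = torch.zeros(1, requires_grad=True)
    for epoch in range(num_epochs):
        l = loss_fun(reg(features, w, b), target)
        l.sum().backward()
        gd([w, b], lr)
    with torch.no_grad():
        final_loss = loss_fun(reg(features, w, b), target).mean()
    print(f'lr={lr}, loss:{float(final_loss):f}')
```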
That's all for this chapter. If it helped you, please like, bookmark, comment, and follow to show your support!