[Deep Learning]: 《PyTorch Introduction to Project Practice》 Day 2: Implementing Linear Regression from Scratch (with detailed code)
2022-07-28 16:57:00 【JOJO's data analysis Adventure】
【Deep Learning】: Linear Regression from Scratch with PyTorch
- This article is part of the 【Deep Learning】:《PyTorch Introduction to Project Practice》 column, which records my notes on implementing deep learning with PyTorch. I try to update every week, and you are welcome to subscribe!
- Personal homepage: JoJo's Data Analysis Adventure
- About me: I am a senior majoring in statistics and have been recommended for postgraduate study in statistics at a top-3 statistics program.
- If this article helps you, please follow, like, bookmark, and subscribe to the column.

Reference material: This column mainly follows Mu Li's (沐神) Dive into Deep Learning (《动手学深度学习》) and records my study notes. My ability is limited, so if there are mistakes, corrections are welcome. Mu Li has also made teaching videos and the textbook available, which you can use for further study.
- Video: Dive into Deep Learning
- Textbook: Dive into Deep Learning

1. Basic concept of linear regression
Linear regression is one of the most commonly used models in machine learning, especially for studying quantitative data: it can analyze the relationships between variables and gives results that are easy to interpret. It is also a good starting point for new methods, since many interesting statistical learning methods can be viewed as generalizations or extensions of linear regression, for example Lasso regression, ridge regression, logistic regression, and softmax regression. For a detailed theoretical introduction, see my article: Detailed Explanation of Linear Regression (Statistical Learning Methods).
The linear regression model can be written in the following form:
$$y = w_1x_1 + w_2x_2 + \cdots + w_nx_n + b$$
In vector form:
$$Y = W^TX + b$$
Once we have the data, our task is to find the best parameters $W$ and $b$. How do we find them? Before answering that, let's first introduce the loss function and gradient descent.
2. Loss function
The loss function is an important measure of how well a model fits the data; it quantifies the difference between the observed values and the fitted values. In linear regression the loss function is also called the squared error function. We use the sum of squared errors because the error should be treated as non-negative, the absolute value of the error is inconvenient to differentiate, and for most problems, especially regression problems, the squared error loss is a reasonable choice. It is defined as follows:
$$L(\hat{W},\hat{b})=\frac{1}{2m}\sum_{i=1}^{m}\left(y^{(i)} - W^Tx^{(i)}-b\right)^2$$
Our goal is to choose the model parameters that minimize the loss function. Because linear regression is a very simple problem, in most cases there is an analytical solution: we look for the point where the gradient of $L$ is 0. To simplify the notation, we absorb $b$ into $W$, so that:
$$Y = W^TX$$
where $w_0 = b$ and $x_0 = 1$. Setting the gradient of the loss above to 0 then gives the normal equation solution:
$$W = (X^TX)^{-1}X^Ty$$
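Here is a minimal sketch (my addition, not from the original post) of computing this closed-form solution directly in PyTorch. The dataset, variable names, and parameter values below are made up purely for illustration; a column of ones is prepended to the design matrix so that $w_0$ plays the role of the intercept $b$.

```python
# Hypothetical example, not from the original post: the normal-equation solution in PyTorch.
import torch

num_samples = 100
X = torch.normal(0, 1, (num_samples, 2))                      # two features
X_aug = torch.cat([torch.ones(num_samples, 1), X], dim=1)     # prepend x0 = 1 for the intercept
w_true = torch.tensor([[2.0], [-1.0], [1.0]])                 # [b, w1, w2]
y = X_aug @ w_true + torch.normal(0, 0.1, (num_samples, 1))   # add Gaussian noise

# W = (X^T X)^{-1} X^T y, computed by solving the linear system rather than inverting explicitly
W_hat = torch.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)
print(W_hat)  # should be close to [2, -1, 1]
```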
3. Gradient descent
Gradient descent is an optimization algorithm. Other optimization methods, such as SGD and Adam, will be introduced later; in this chapter we simply use gradient descent to compute the parameters. The idea behind gradient descent is: start from a randomly chosen combination of parameters, compute the loss function, and then look for the next parameter combination that reduces the loss the most.
The update formula is as follows:
$$w_i := w_i - \eta\frac{dL}{dw_i}$$
- 1. Initialize a set of parameter values.
- 2. Repeatedly update the parameters in the direction of the negative gradient. Here $\eta$ is the learning rate, a hyperparameter that determines how large a step we take in the direction that decreases the loss function the most; in gradient descent, all parameters are updated simultaneously by subtracting the learning rate times the derivative of the loss function with respect to each parameter. The learning rate must be specified in advance.

We keep doing this until we reach a local minimum. Because we have not tried every combination of parameters, we cannot be sure that the local minimum we find is the global minimum; different initial parameter values may lead to different local minima. Choosing an appropriate value of $\eta$ is also something we need to think about: if $\eta$ is too large, we may overshoot the minimum and fail to converge; if $\eta$ is too small, convergence is too slow. These details will be discussed later; the short sketch below gives a feel for the effect of the learning rate. After that, let's see how to implement a simple linear regression model.
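To build some intuition, here is a tiny sketch (my addition, not from the original post) that runs gradient descent on the one-dimensional function $f(w) = (w-3)^2$, whose minimum is at $w = 3$, for a few different learning rates:

```python
# Hypothetical illustration, not from the original post: 1-D gradient descent on f(w) = (w - 3)^2.
def gradient_descent_1d(eta, steps=20, w0=0.0):
    w = w0
    for _ in range(steps):
        grad = 2 * (w - 3)   # derivative of (w - 3)^2
        w = w - eta * grad   # gradient descent update
    return w

for eta in (0.01, 0.1, 1.1):
    print(f"eta={eta}: w after 20 steps = {gradient_descent_1d(eta):.4f}")
# A small eta converges slowly, a moderate eta gets close to 3 quickly,
# and an eta that is too large makes the iterates diverge.
```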
4. PyTorch linear regression
Import related libraries
```python
import random
import torch
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
```
4.1 Generate data set
For convenience, we demonstrate with a simulated dataset. Suppose the samples come from a standard normal distribution and each sample has two features. We generate a dataset of 1000 samples with $w = [-1, 1]^T$ and $b = 2$, where $\epsilon$ is a noise term drawn from a normal distribution with mean 0 and standard deviation 0.1.
```python
def simulation_data(w, b, n):
    X = torch.normal(0, 1, (n, len(w)))   # draw features from the standard normal distribution
    y = torch.matmul(X, w) + b            # compute the regression values
    y += torch.normal(0, 0.1, y.shape)    # add the random noise term
    return X, y.reshape((-1, 1))

true_w = torch.tensor([-1, 1], dtype=torch.float32)
true_b = 2
features, target = simulation_data(true_w, true_b, 1000)
```
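As a quick check (my addition, not from the original post), we can print the shapes and the first sample of the generated data:

```python
# Hypothetical sanity check, not part of the original post.
print(features.shape, target.shape)   # torch.Size([1000, 2]) torch.Size([1000, 1])
print(features[0], target[0])         # the first sample and its response
```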
The simulated dataset has now been generated; let's plot it and take a look:
```python
plt.scatter(features[:, 1].detach().numpy(), target.detach().numpy(), 2)
```

As can be seen from the figure above, there is a clear linear relationship between $y$ and this feature.
4.2 Initialize parameters
Now let's initialize the parameters we want to estimate. We usually initialize $w$ from a normal distribution with mean 0 and set $b$ to a zero vector.
```python
w = torch.normal(0, 0.1, (2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)
```
4.3 Define the regression model
With the parameters initialized, the next step is to define our linear regression model:
```python
def reg(X, w, b):
    return torch.matmul(X, w) + b
```
4.4 Calculate the loss function
```python
def loss_fun(y_hat, y):
    return (y_hat - y)**2 / 2 / len(y)   # element-wise squared error, scaled by 1/(2m)
```
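Before training, it can be helpful to look at the loss under the freshly initialized parameters (this check is my addition, not from the original post):

```python
# Hypothetical check, not in the original post: the loss before any training.
initial_loss = loss_fun(reg(features, w, b), target).sum()
print(float(initial_loss))   # should be noticeably larger than the post-training loss below
```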
4.5 Solve for the parameters with gradient descent
```python
def gd(params, n):
    with torch.no_grad():            # no gradient tracking is needed while updating the parameters
        for param in params:
            param -= n * param.grad  # gradient descent step
            param.grad.zero_()       # reset the gradient to 0 so it does not carry over to the next step
```
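For reference (my addition, not part of the original post), this hand-written update does essentially what PyTorch's built-in optimizer would do. A minimal sketch, assuming the same full-batch setting and learning rate as below:

```python
# Hypothetical alternative using torch.optim.SGD instead of the manual gd() above.
import torch.optim as optim

optimizer = optim.SGD([w, b], lr=0.01)
# Inside the training loop you would then call:
#   optimizer.step()        # replaces `param -= n * param.grad`
#   optimizer.zero_grad()   # replaces `param.grad.zero_()`
```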
```python
n = 0.01        # learning rate
num_epochs = 3  # number of training epochs
net = reg
loss = loss_fun
X = features
y = target
for epoch in range(num_epochs):
    l = loss(reg(X, w, b), y)   # compute the loss
    l.sum().backward()          # backpropagate the gradients
    gd([w, b], n)               # update the parameters
    with torch.no_grad():
        train = loss(net(features, w, b), target)
        print(f'epoch{epoch + 1},loss:{float(train.mean()):f}')
```
```
epoch1,loss:0.001797
epoch2,loss:0.001763
epoch3,loss:0.001729
```
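The loop above uses the full dataset for every update (full-batch gradient descent). As a sketch of a mini-batch stochastic gradient descent variant (my addition, not from the original post; the `import random` at the top is otherwise unused, which is why it is used here), assuming a batch size of 10 and the hypothetical helper name `data_iter`; running it would of course change the parameter errors reported below:

```python
# Hypothetical mini-batch variant, not from the original post (assumes batch_size = 10).
def data_iter(batch_size, features, target):
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)   # visit the samples in random order
    for i in range(0, num_examples, batch_size):
        batch_idx = torch.tensor(indices[i:i + batch_size])
        yield features[batch_idx], target[batch_idx]

batch_size = 10
for epoch in range(num_epochs):
    for X_batch, y_batch in data_iter(batch_size, features, target):
        l = loss_fun(reg(X_batch, w, b), y_batch)   # loss on one mini-batch
        l.sum().backward()
        gd([w, b], n)                               # update using the mini-batch gradient
```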
Since this example uses a simulated dataset, we know the true parameters, so we can compute the estimation error:
```python
print(f'error in w: {true_w - w.reshape(true_w.shape)}')
print(f'error in b: {true_b - b}')
```
```
error in w: tensor([-0.8461, 0.8311], grad_fn=<SubBackward0>)
error in b: tensor([1.4857], grad_fn=<RsubBackward1>)
```
The errors are still fairly large here because we only performed three full-batch updates; training for more epochs, or using a larger learning rate or the mini-batch version sketched above, brings the estimates much closer to the true values. You can also try other learning rates and compare the results; commonly used values are 0.01, 0.03, 0.1, 0.3, and 1.
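A small sketch of how such a comparison could be run (my addition, not from the original post); it simply re-initializes the parameters and retrains for each candidate learning rate:

```python
# Hypothetical experiment, not in the original post: retrain from scratch for several learning rates.
for lr in (0.01, 0.03, 0.1, 0.3, 1):
    w = torch.normal(0, 0.1, (2, 1), requires_grad=True)   # re-initialize the parameters
    b = torch.zeros(1, requires_grad=True)
    for epoch in range(num_epochs):
        l = loss_fun(reg(features, w, b), target)
        l.sum().backward()
        gd([w, b], lr)
    with torch.no_grad():
        final_loss = loss_fun(reg(features, w, b), target).mean()
    print(f'lr={lr}, loss:{float(final_loss):f}')
```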
That's all for this chapter. If it helped you, please like, bookmark, comment, and follow to show your support!