Summary
Linear regression is the basis of logistic regression. Logistic regression, in turn, is a building block of neural networks and is used to solve binary classification problems.
Linear regression is also a foundation for many other algorithms.
Linear and nonlinear relationships
Concept:
- A linear relationship means the relationship between the variables is a first-degree (linear) function: the relationship between one independent variable x and the dependent variable y is represented by a straight line, and the relationship between two independent variables and the dependent variable y is represented by a plane.
- A nonlinear relationship means the relationship between an independent variable x and the dependent variable y is represented by a curve, and the relationship between two independent variables and the dependent variable y is represented by a surface.
An independent variable x corresponds to one feature. With a single feature the fit is a line; with multiple features the fit is a plane (or hyperplane).
A linear relationship can be understood as a first-degree function, no matter how many independent variables there are.
Examples (visualized in the sketch after this list):
- Linear relationship: $$y = a \times x + b$$
- Nonlinear relationship: $$y = x^2$$
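A minimal plotting sketch of these two examples (the coefficient values here are illustrative, not from the original text):

import numpy as np
from matplotlib import pyplot as plt

x = np.linspace(-3, 3, 50)
y_linear = 2 * x + 1      # a first-degree function: a straight line
y_nonlinear = x ** 2      # a second-degree function: a curve (parabola)

plt.plot(x, y_linear, label='y = 2x + 1 (linear)')
plt.plot(x, y_nonlinear, label='y = x^2 (nonlinear)')
plt.legend()
plt.show()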
Regression problems
Concept: predicting a continuous numerical value.
Linear regression is mainly used for regression problems; only in a few cases is it used for classification problems.
Univariate linear regression
Concept: a regression model with a single independent variable and a single dependent variable (a single independent variable is what "univariate" means), where the relationship between the independent variable and the dependent variable is linear (a first-degree function).
Representation: $$y = a \times x + b$$
There is only one independent variable x; y is the dependent variable, a is the slope (also called the weight of x), and b is the intercept.
Purpose: the univariate linear regression model finds a suitable straight line that best fits the relationship between the independent variable x and the dependent variable y, so that given a new value of x, the fitted line yields the most likely value of y.
Learning a univariate linear model means finding suitable values of a and b from the training data; once a new test data point arrives, the trained model can make a prediction for it.
How to evaluate the quality of the model
Goal: the smaller the difference (distance) between the predicted values and the true values, the better the model.
A natural idea: for each point, compute the difference between its true value and its predicted value, \(y^{(i)} - y_{predict}^{(i)}\),
then add all these differences up and divide by the number of samples, so that the result does not depend on how many samples there are.
The formula:
$$\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)} - y_{predict}^{(i)}\right)$$
The problem with this formula: a prediction can be either larger or smaller than the true value, so positive and negative errors cancel each other out, which weakens the error measure and can drive the accumulated error close to 0 even when the individual errors are large.
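A tiny numerical illustration of this cancellation (the numbers are made up for illustration):

import numpy as np

y_true = np.array([5.0, 5.0])
y_predict = np.array([8.0, 2.0])  # one prediction too high, one too low

errors = y_true - y_predict       # [-3.,  3.]
print(np.mean(errors))            # 0.0 -- the signed errors cancel out
print(np.mean(np.abs(errors)))    # 3.0 -- the absolute errors reveal the real gap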
Improvement: take the absolute value of each point's error, i.e. \(|y^{(i)} - y_{predict}^{(i)}|\), and then sum these up.
Problem: the absolute value causes trouble later, when the error has to be differentiated.
The absolute value function, for example $$y = |x|$$, is continuous at x = 0, but its left derivative there is -1 while its right derivative is 1. Since the two one-sided derivatives are not equal, and a differentiable function must have equal one-sided derivatives, the function is not differentiable at x = 0.
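Written out, the two one-sided derivatives at x = 0 are:
$$\lim_{h \to 0^-} \frac{|0+h| - |0|}{h} = \frac{-h}{h} = -1, \qquad \lim_{h \to 0^+} \frac{|0+h| - |0|}{h} = \frac{h}{h} = 1$$
Because the two limits differ, the derivative does not exist at x = 0.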
Further improvement: square the error of each point, and, to remove the influence of the number of samples, take the average.
The formula:
$$\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)} - y_{predict}^{(i)}\right)^2$$
Least squares method
Because
$$y_{predict}^{(i)} = a \times x^{(i)} + b$$
substituting this into the formula from the previous section gives the quantity to minimize:
$$\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)} - a \times x^{(i)} - b\right)^2$$
The least squares method finds the optimal parameters a and b that make this expression as small as possible.
Concept: the least squares method is a mathematical optimization technique that finds the optimal parameters by minimizing the sum of squared errors.
The formula (a brief derivation sketch follows below):
$$a = \frac{\sum_{i=1}^{m}\left(x^{(i)} - \bar{x}\right)\left(y^{(i)} - \bar{y}\right)}{\sum_{i=1}^{m}\left(x^{(i)} - \bar{x}\right)^2}, \qquad b = \bar{y} - a\bar{x}$$
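As a sketch of where these expressions come from (a standard least-squares derivation, not spelled out in the original): set the partial derivatives of the squared-error sum with respect to b and a to zero and solve.
$$\frac{\partial}{\partial b}\sum_{i=1}^{m}\left(y^{(i)} - a x^{(i)} - b\right)^2 = -2\sum_{i=1}^{m}\left(y^{(i)} - a x^{(i)} - b\right) = 0 \;\;\Rightarrow\;\; b = \bar{y} - a\bar{x}$$
$$\frac{\partial}{\partial a}\sum_{i=1}^{m}\left(y^{(i)} - a x^{(i)} - b\right)^2 = -2\sum_{i=1}^{m} x^{(i)}\left(y^{(i)} - a x^{(i)} - b\right) = 0 \;\;\Rightarrow\;\; a = \frac{\sum_{i=1}^{m}\left(x^{(i)} - \bar{x}\right)\left(y^{(i)} - \bar{y}\right)}{\sum_{i=1}^{m}\left(x^{(i)} - \bar{x}\right)^2}$$
The expression for a follows after substituting \(b = \bar{y} - a\bar{x}\) into the second equation and simplifying.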
Code implementation
import numpy as np
from matplotlib import pyplot as plt
if __name__ == '__main__':
    # Prepare the data
    x = np.array([1, 2, 4, 6, 8])  # the univariate model works on vectors, not matrices
    y = np.array([2, 5, 7, 8, 9])
    x_mean = np.mean(x)
    y_mean = np.mean(y)
    # Compute a and b with the least squares formulas
    denominator = 0.0  # denominator
    numerator = 0.0    # numerator
    for x_i, y_i in zip(x, y):  # zip x and y into pairs: (1, 2), (2, 5), ...
        numerator += (x_i - x_mean) * (y_i - y_mean)
        denominator += (x_i - x_mean) ** 2
    a = numerator / denominator
    b = y_mean - a * x_mean
    # Build the linear function from a and b; the predictions are stored in y_predict
    y_predict = a * x + b  # predicted values for the training inputs x
    # Plot the fitted line together with the training data
    plt.scatter(x, y, color='b')
    plt.plot(x, y_predict, color='r')
    plt.xlabel('x', fontsize=15)
    plt.ylabel('y', fontsize=15)
    plt.show()
    # Feed in a test value and get back a single prediction
    x_test = 7
    y_predict_test = a * x_test + b
    print(y_predict_test)
The univariate linear regression model works on vectors, not matrices.
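As an optional cross-check (not part of the original code, and it assumes scikit-learn is installed), the same slope and intercept can be recovered with scikit-learn's LinearRegression, which expects a 2-D feature matrix rather than a 1-D vector:

import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([1, 2, 4, 6, 8]).reshape(-1, 1)  # shape (n_samples, 1): a matrix, not a vector
y = np.array([2, 5, 7, 8, 9])

model = LinearRegression()
model.fit(x, y)
print(model.coef_[0], model.intercept_)  # should match the a and b computed above
print(model.predict([[7]]))              # prediction for x = 7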
After wrapping the logic in a class:
import numpy as np
import matplotlib.pyplot as plt
class SimpleLinearRegressionSelf:

    def __init__(self):
        """Initialize the simple linear regression model."""
        self.a_ = None  # learned slope; internal attribute, not supplied by the user
        self.b_ = None  # learned intercept; internal attribute, not supplied by the user

    def fit(self, x_train, y_train):
        """Train the model using the least squares formulas."""
        assert x_train.ndim == 1
        x_mean = np.mean(x_train)
        y_mean = np.mean(y_train)
        denominator = 0.0
        numerator = 0.0
        for x_i, y_i in zip(x_train, y_train):
            numerator += (x_i - x_mean) * (y_i - y_mean)
            denominator += (x_i - x_mean) ** 2
        self.a_ = numerator / denominator
        self.b_ = y_mean - self.a_ * x_mean
        return self

    def predict(self, x_test_group):
        """Predict for a group of inputs; the per-sample work is done in _predict."""
        return np.array([self._predict(x_test) for x_test in x_test_group])

    def _predict(self, x_test):
        # Apply the fitted line to a single input value
        return self.a_ * x_test + self.b_

    def mean_squared_error(self, y_true, y_predict):
        """Mean squared error between the true and predicted values."""
        return np.sum((y_true - y_predict) ** 2) / len(y_true)

    def r_square(self, y_true, y_predict):
        """R^2 score: 1 minus the MSE divided by the variance of the true values."""
        return 1 - (self.mean_squared_error(y_true, y_predict) / np.var(y_true))


if __name__ == '__main__':
    x = np.array([1, 2, 4, 6, 8])
    y = np.array([2, 5, 7, 8, 9])
    lr = SimpleLinearRegressionSelf()
    lr.fit(x, y)
    print(lr.predict([7]))
    print(lr.r_square(np.array([8, 9]), lr.predict([6, 8])))
Here, the formula used to score the model is the coefficient of determination R²:
$$R^2 = 1 - \frac{MSE(y, y_{predict})}{Var(y)} = 1 - \frac{\sum_{i=1}^{m}\left(y^{(i)} - y_{predict}^{(i)}\right)^2}{\sum_{i=1}^{m}\left(y^{(i)} - \bar{y}\right)^2}$$
The closer R² is to 1, the better the model fits the data.
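The same score can be checked against scikit-learn's r2_score, which implements the same definition (again assuming scikit-learn is available; the arrays below are illustrative):

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([8, 9])
y_predict = np.array([7.6, 8.8])    # illustrative predictions
print(r2_score(y_true, y_predict))  # 1 - SSE / SST, same definition as r_square above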