Machine Learning - Notes and Implementation of Linear Regression, Logistic Regression Problems
2022-07-31 07:45:00 【Miracle Fan】
Regression Problems
Overview:
A regression problem is about predicting a continuous value, for example ….
Building on such a regression problem, the Sigmoid function (logistic regression) can turn the predicted value into a probability for a yes/no judgment: the continuous output of the regression is mapped into a probability in (0,1), which can then be used to handle binary classification problems.
Simple Linear Regression (one variable)
The linear regression equation is:
$\hat{y} = ax + b$
For example, given the following data, we can draw a scatter plot:
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 2, 4, 6, 8])
y = np.array([2, 5, 7, 8, 9])

Linear regression amounts to fitting a straight line that passes as close as possible to the samples in the figure above. In general a perfect fit cannot be achieved; we only hope for something like the figure below, where the green segments represent the errors between the predicted points and the true points. We want these errors to be as small as possible, which corresponds to a better fit.
# a and b are the slope and intercept obtained from the fit (Linear_Regression, defined below)
y_pred = lambda x: a * x + b
plt.scatter(x, y, color='b')                     # true points
plt.plot(x, y_pred(x), color='r')                # fitted line
plt.plot([x, x], [y, y_pred(x)], color='g')      # green segments: prediction errors
plt.show()

To measure this error, a loss function can be defined:
$L=\frac{1}{n}\sum^n_{i=1}(y^i-y_{pred}^i)$
However, with this function the predicted value is sometimes larger and sometimes smaller than the true value, so $y-y_{pred}$ can be positive or negative and the terms cancel each other when summed. Instead, the mean squared error loss is used:
$L=\frac{1}{n}\sum^n_{i=1}(y^i-y_{pred}^i)^2$
Substituting the fitted line $\hat{y}=ax+b$ gives:
$L=\frac{1}{n}\sum^n_{i=1}(y^i-ax^i-b)^2$
Minimizing this loss with the least squares method gives:
$a=\frac{\sum_{i=1}^n(x^i-\bar{x})(y^i-\bar{y})}{\sum_{i=1}^n(x^i-\bar{x})^2},\qquad b=\bar{y}-a\bar{x}$
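For completeness, here is the standard derivation step (not spelled out in the original notes): set the partial derivatives of $L$ with respect to $b$ and $a$ to zero,
$\frac{\partial L}{\partial b}=-\frac{2}{n}\sum_{i=1}^n(y^i-ax^i-b)=0 \;\Rightarrow\; b=\bar{y}-a\bar{x}$
$\frac{\partial L}{\partial a}=-\frac{2}{n}\sum_{i=1}^n x^i(y^i-ax^i-b)=0$
Substituting $b=\bar{y}-a\bar{x}$ into the second equation and rearranging yields the expression for $a$ above.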
def Linear_Regression(x, y):
    # closed-form least squares solution for a single feature
    x_mean = np.mean(x)
    y_mean = np.mean(y)
    # num = np.sum((x - np.tile(x_mean, x.shape)) * (y - np.tile(y_mean, y.shape)))
    num = np.sum((x - x_mean) * (y - y_mean))
    den = np.sum((x - x_mean) ** 2)
    a = num / den
    b = y_mean - a * x_mean
    return a, b
Thanks to NumPy's broadcasting, there is no need to adjust the dimensions of x_mean here; the commented-out np.tile version does the same thing explicitly.
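A quick usage sketch with the sample points above (the printed values are simply whatever the fit returns, not figures from the original notes):

a, b = Linear_Regression(x, y)
print('a =', a, 'b =', b)   # slope and intercept used by y_pred in the plotting code above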
Multiple Linear Regression
For multiple linear regression, the general expression is:
$y=\theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p$
In matrix form this can be written as:
$Y=X\theta$
$X=\begin{pmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{pmatrix}$
$\theta=\begin{pmatrix} \theta_0\\ \theta_1\\ \theta_2\\ \vdots \\ \theta_p \end{pmatrix}$
Solving for $\theta$ with the same least squares idea gives the normal equation:
$\theta=(X^TX)^{-1}X^Ty$
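A one-line justification, stated here for context (standard result, assuming $X^TX$ is invertible): the loss $\|X\theta-y\|^2$ has gradient $2X^T(X\theta-y)$, and setting it to zero gives
$\nabla_\theta\|X\theta-y\|^2=2X^T(X\theta-y)=0 \;\Rightarrow\; X^TX\theta=X^Ty$
which rearranges to the expression above.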
# add a column of ones so the intercept can be estimated together with the coefficients
ones = np.ones((X_train.shape[0], 1))
# stack horizontally: X_b is X_train with a first column of all 1s
X_b = np.hstack((ones, X_train))
# normal equation
theta = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y_train)
intercept = theta[0]
coef = theta[1:]
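X_train and y_train above are placeholders for your own data. A minimal sanity check on synthetic data (made up here purely for illustration) might look like this:

np.random.seed(0)
X_train = np.random.rand(100, 2)                             # 100 samples, 2 features
y_train = 3.0 + 1.5 * X_train[:, 0] - 2.0 * X_train[:, 1]    # known coefficients, no noise
X_b = np.hstack((np.ones((X_train.shape[0], 1)), X_train))
theta = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y_train)
print(theta)   # should be close to [3.0, 1.5, -2.0]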
Logistic Regression
Simple logistic regression adds the Sigmoid function on top of linear regression, compressing the prediction into (0,1) so that it can be read as a probability. Usually 0.5 is taken as the decision boundary: a probability greater than 0.5 means class 1, otherwise class 0.
In other words, the result obtained by linear regression is passed through the sigmoid function to output a probability, turning the problem into a classification one.
$p=\frac{1}{1+e^{-z}}$
- z: usually the output of the linear regression equation
- p: the predicted probability, compared against the decision boundary to determine the class (a small numeric sketch follows)
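A minimal numeric sketch of this mapping (the z values below are made up for illustration):

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])   # example linear-regression outputs
p = 1.0 / (1.0 + np.exp(-z))                 # sigmoid squashes them into (0, 1)
labels = (p > 0.5).astype(int)               # threshold at 0.5 -> class 0 or 1
print(p)
print(labels)                                # [0 0 0 1 1]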

Gradient Descent
Gradient descent is mainly used to optimize the loss function, i.e. to find the parameter value that minimizes the loss.
For example, suppose a loss function is
$L=(x-2.5)^2-1$
Then define this loss function and its derivative:
def J(theta):
    try:
        return (theta - 2.5) ** 2 - 1
    except:
        return float('inf')

def dJ(theta):
    return 2 * (theta - 2.5)
Each iteration updates:
$\theta=\theta-\eta\frac{dJ}{d\theta}$
def CalGradient(eta):
    theta = 0.0
    theta_history = [theta]
    epsilon = 1e-8  # used to decide when to stop the gradient descent
    while True:
        gradient = dJ(theta)
        last_theta = theta
        theta = theta - eta * gradient
        theta_history.append(theta)
        if abs(J(theta) - J(last_theta)) < epsilon:
            break
    x = np.linspace(-1, 6, 141)  # plotting range for the loss curve (assumed here; defined globally in the original)
    plt.title('lr:' + str(eta))
    plt.plot(x, J(x), color='r')
    plt.plot(np.array(theta_history), J(np.array(theta_history)), color='b', marker='x')
    plt.show()
    print(len(theta_history))
The descent path depends on the learning rate, as shown in the figures below. The learning rate is generally chosen between 0 and 1: with a learning rate of 1 the iterates oscillate and never converge, and with a learning rate greater than 1 they diverge.
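The comparison can be reproduced with something like the following (the specific learning rates are just examples). Note that with a learning rate greater than 1 this particular loop would never terminate because the loss keeps growing, so reproducing the divergent figure would require adding an iteration limit:

for eta in (0.01, 0.1, 0.8, 1.0):
    CalGradient(eta)   # plots the loss curve and the descent path for each learning rate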



Loss Function of Logistic Regression
Plugging the linear regression output into the sigmoid, the logistic regression model is:
$p=\frac{1}{1+e^{-X\theta}}$
For logistic regression, the log loss (cross-entropy) is generally used to fit the parameters:
$cost=\begin{cases} -\log(p_{pred}) & \text{if } y=1\\ -\log(1-p_{pred}) & \text{if } y=0 \end{cases}$
The two cases can be combined into a single loss function:
$cost=-y\log(p_{pred})-(1-y)\log(1-p_{pred})$
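Averaging this cost over the training set and differentiating with respect to $\theta$ gives the gradient used in the code below (a standard result, stated here to connect the loss to the dJ function):
$J(\theta)=-\frac{1}{n}\sum_{i=1}^n\left[y^i\log(\sigma(X^i\theta))+(1-y^i)\log(1-\sigma(X^i\theta))\right],\qquad \nabla_\theta J=\frac{1}{n}X^T(\sigma(X\theta)-y)$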
import numpy as np

class LogisticRegression:
    def __init__(self):
        self.coef_ = None
        self.intercept_ = None
        self._theta = None

    def _sigmoid(self, x):
        return 1.0 / (1.0 + np.exp(-x))

    def fit(self, x_train, y_train, eta=0.01, n_iters=1e4):
        assert x_train.shape[0] == y_train.shape[0], \
            'the number of training samples must match the number of labels'

        def J(theta, x, y):
            # cross-entropy loss averaged over the samples
            p_pred = self._sigmoid(x.dot(theta))
            try:
                return -np.sum(y * np.log(p_pred) + (1 - y) * np.log(1 - p_pred)) / len(y)
            except:
                return float('inf')

        def dJ(theta, x, y):
            # gradient of the cross-entropy loss: X^T (sigmoid(X theta) - y) / n
            p_pred = self._sigmoid(x.dot(theta))
            return x.T.dot(p_pred - y) / len(x)

        # batch gradient descent
        def gradient_descent(X_b, y, initial_theta, eta, n_iters=1e4, epsilon=1e-8):
            theta = initial_theta
            i_iter = 0
            while i_iter < n_iters:
                gradient = dJ(theta, X_b, y)
                last_theta = theta
                theta = theta - eta * gradient
                i_iter += 1
                if abs(J(theta, X_b, y) - J(last_theta, X_b, y)) < epsilon:
                    break
            return theta

        X_b = np.hstack([np.ones((len(x_train), 1)), x_train])
        initial_theta = np.zeros(X_b.shape[1])
        self._theta = gradient_descent(X_b, y_train, initial_theta, eta, n_iters)
        self.intercept_ = self._theta[0]   # intercept
        self.coef_ = self._theta[1:]       # feature coefficients
        return self

    def predict_proba(self, X_predict):
        X_b = np.hstack([np.ones((len(X_predict), 1)), X_predict])
        return self._sigmoid(X_b.dot(self._theta))

    def predict(self, X_predict):
        proba = self.predict_proba(X_predict)
        return np.array(proba > 0.5, dtype='int')
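A small end-to-end check of the class above. The toy data below is made up purely for illustration; the expected predictions follow only from how the data is constructed:

np.random.seed(0)
x_train = np.vstack([np.random.randn(50, 1) - 2.0,    # 50 points around -2 -> class 0
                     np.random.randn(50, 1) + 2.0])   # 50 points around +2 -> class 1
y_train = np.concatenate([np.zeros(50), np.ones(50)])

clf = LogisticRegression().fit(x_train, y_train)
print(clf.intercept_, clf.coef_)
print(clf.predict(np.array([[-3.0], [3.0]])))   # expected: [0 1]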