Notes on Andrew Ng's Machine Learning Course: 07 Regularization
2022-07-29 12:06:00 【3077491278】
7 Regularization
7.1 The Problem of Overfitting
Meaning of overfitting
Underfitting occurs when the learning algorithm has high bias and fits the data poorly.
Overfitting occurs when the learning algorithm has high variance: it can fit the training data almost perfectly, but the hypothesis has so many parameters that the data cannot constrain them all, so it fails to generalize to new samples.
If there are very many features but only very little training data, the learned model may fit the training set very well (the cost function may be almost 0) and yet fail to generalize to new data.
Ways to address overfitting
- Reduce the number of features: manually keep the more important features, or use a model-selection algorithm (described later) to choose automatically which features to retain. The drawback is that discarding features also discards some of the information they carry.
- Regularization: keep all the features, but reduce the magnitude of the parameters.
- Plot the hypothesis and choose an appropriate polynomial degree based on whether the curve is contorted. Most of the time, however, the problem has many features and cannot be visualized, so this method fails.
Summary
Overfitting means the learning algorithm has high variance: it can fit all of the training data but generalizes poorly.
Overfitting can generally be addressed by reducing the number of features or by regularization.
7.2 Cost Function
Regularization
Overfitting arises when the hypothesis contains polynomial terms of too high a degree. If the coefficients of those higher-order terms are driven close to 0, the hypothesis effectively becomes a lower-degree, smoother function, which resolves the overfitting.
When there are many features, it is not known in advance which of them correspond to high-order terms, so the cost function is modified to penalize all the parameters. Concretely, a regularization term is added to the original cost function: $J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)^{2}+\lambda\sum_{j=1}^{n}\theta_{j}^{2}\right]$. Note that $\theta_0$ is not penalized; this is purely a convention, and in practice whether it is included makes little difference to the result.
$\lambda\sum_{j=1}^{n}\theta_{j}^{2}$ is the regularization term, and $\lambda$ is the regularization parameter, which controls the trade-off between fitting the training set well and keeping the parameters small (to avoid overfitting).
The optimization software that minimizes the cost function then decides how strongly the parameters are penalized. The result is a simpler hypothesis that is less prone to overfitting.
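To make this concrete, here is a minimal NumPy sketch of the regularized cost; the function name, the array shapes, and the leading column of ones in `X` are assumptions made for this illustration, not part of the course.

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear-regression cost J(theta).

    X     : (m, n+1) design matrix whose first column is all ones (x_0 = 1)
    y     : (m,) targets
    theta : (n+1,) parameters
    lam   : regularization parameter lambda
    """
    m = len(y)
    residuals = X @ theta - y               # h_theta(x^(i)) - y^(i) for every example
    penalty = lam * np.sum(theta[1:] ** 2)  # theta_0 is excluded from the penalty
    return (np.sum(residuals ** 2) + penalty) / (2 * m)
```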
Discussion of the regularization parameter
If $\lambda$ is too large, then in order to keep the cost function as small as possible every $\theta_i$ (excluding $\theta_0$) is driven toward 0. The resulting hypothesis is a straight line parallel to the $x$-axis, i.e. underfitting.
So to apply regularization well, a reasonable value of $\lambda$ must be chosen.
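One quick way to see this trade-off empirically is the small sketch below (assuming scikit-learn is available; its `Ridge` estimator performs regularized linear regression and its `alpha` parameter plays the role of $\lambda$; the degree-8 polynomial data are made up purely for illustration):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10).reshape(-1, 1)
y = np.sin(2 * np.pi * x).ravel() + 0.1 * rng.standard_normal(10)

# Degree-8 polynomial features on only 10 points: prone to overfitting.
X_poly = PolynomialFeatures(degree=8, include_bias=False).fit_transform(x)

# Larger alpha (lambda) shrinks the coefficients toward zero; an extremely
# large value drives them all to ~0 and the model underfits.
for alpha in (1e-3, 1.0, 1e4):
    model = Ridge(alpha=alpha).fit(X_poly, y)
    print(f"alpha={alpha:g}  coefficients={np.round(model.coef_, 3)}")
```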
Summary
The regularization term keeps the parameters small, which mitigates the overfitting problem.
7.3 Regularized Linear Regression
Gradient Descent for Regularized Linear Regression
The cost function of regularized linear regression is $J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)^{2}+\lambda\sum_{j=1}^{n}\theta_{j}^{2}\right]$.
The gradient-descent updates are
$$\begin{aligned}\theta_{0}&:=\theta_{0}-\alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_{0}^{(i)}\\ \theta_{j}&:=\theta_{j}-\alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_{j}^{(i)}+\frac{\lambda}{m}\theta_{j}\right]\end{aligned}$$
for $j=1,2,\dots,n$.
Rearranging the second formula gives $\theta_{j}:=\theta_{j}\left(1-\alpha\frac{\lambda}{m}\right)-\alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_{j}^{(i)}$, where $1-\alpha\frac{\lambda}{m}$ is a value slightly smaller than 1. In other words, gradient descent for regularized linear regression differs only in that each iteration first shrinks $\theta_j$ by a small extra amount before applying the usual update rule.
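Below is a minimal NumPy sketch of one vectorized step of this update rule; the function name and array shapes are assumptions for illustration, and $\theta_0$ is the first entry of `theta`.

```python
import numpy as np

def gradient_descent_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent update for linear regression.

    X has a leading column of ones, so theta[0] is theta_0 and is not regularized.
    """
    m = len(y)
    error = X @ theta - y              # h_theta(x^(i)) - y^(i), shape (m,)
    grad = X.T @ error / m             # unregularized gradient for all parameters
    grad[1:] += (lam / m) * theta[1:]  # add (lambda/m) * theta_j for j >= 1 only
    return theta - alpha * grad        # equivalently: shrink theta_j, then the usual update
```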
The normal equation for regularized linear regression
For ordinary linear regression, the normal equation gives the solution $\theta=\left(X^{T}X\right)^{-1}X^{T}y$.
For regularized linear regression, the normal-equation solution is $\theta=\left(X^{T}X+\lambda\begin{bmatrix}0 & & & & \\ & 1 & & & \\ & & 1 & & \\ & & & \ddots & \\ & & & & 1\end{bmatrix}\right)^{-1}X^{T}y$, where the matrix is $(n{+}1)\times(n{+}1)$ and its first diagonal entry is 0 so that $\theta_0$ is not penalized.
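The same closed-form solution written with NumPy (a sketch: it assumes `X` already contains the leading column of ones and uses `np.linalg.solve` rather than forming the inverse explicitly):

```python
import numpy as np

def normal_equation_regularized(X, y, lam):
    """Closed-form solution of regularized linear regression.

    Solves (X^T X + lambda * L) theta = X^T y, where L is the identity
    matrix with its first diagonal entry zeroed so theta_0 is not penalized.
    """
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```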
Summary
This section generalizes the gradient descent method and the normal equation method to regularized linear regression.
7.4 Regularized Logistic Regression
Analogously to regularized linear regression, for logistic regression a regularization term is added to the cost function, giving $J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log\left(h_{\theta}(x^{(i)})\right)-\left(1-y^{(i)}\right)\log\left(1-h_{\theta}(x^{(i)})\right)\right]+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_{j}^{2}$.
The gradient-descent updates are:
$$\theta_{0}:=\theta_{0}-\alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_{0}^{(i)}$$
$$\theta_{j}:=\theta_{j}-\alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_{j}^{(i)}+\frac{\lambda}{m}\theta_{j}\right]$$
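A corresponding NumPy sketch of one gradient step for regularized logistic regression; the updates have the same form as for linear regression, but the hypothesis is the sigmoid $h_\theta(x)=\frac{1}{1+e^{-\theta^{T}x}}$ (the function names and shapes are illustrative assumptions).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent update for logistic regression.

    y contains 0/1 labels; X has a leading column of ones; theta[0] is not penalized.
    """
    m = len(y)
    error = sigmoid(X @ theta) - y     # h_theta(x^(i)) - y^(i) with the sigmoid hypothesis
    grad = X.T @ error / m
    grad[1:] += (lam / m) * theta[1:]
    return theta - alpha * grad
```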
Summary
This section extends gradient descent to regularized logistic regression.