19. Support Vector Machines - Optimization Objective and Large Margin Intuition
2022-07-31 02:19:00 【WuJiaYFN】
Main contents
- Optimization objective
- Intuitive understanding of large margins
- Optimizations made by the SVM
1. Optimization Objective
- The Support Vector Machine (SVM) is a powerful algorithm that is widely used in industry and academia. Compared with logistic regression and neural networks, the SVM offers a clearer and more powerful way to learn complex nonlinear functions.
Steps for converting logistic regression into an SVM:
1. Review the hypothesis function of logistic regression:
- If we want the hypothesis to output a value close to 1, then θ^T x should be much greater than 0 (">>" means "much greater than"). When θ^T x >> 0, i.e., on the right side of the figure, the output of logistic regression approaches 1.
- If we want the hypothesis to output a value close to 0, then θ^T x should be much less than 0 ("<<" means "much less than"). When θ^T x << 0, i.e., on the left side of the figure, the output of logistic regression approaches 0.
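For reference, since the original figure is not reproduced here, the logistic regression hypothesis discussed above is the standard sigmoid:

```latex
h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}
```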
2. Review the cost function of logistic regression:
- When y = 1, as z increases, h(x) = 1/(1 + e^{-z}) approaches 1 and the cost decreases; in the logistic regression cost function only the first term remains (giving the grey curve in the left figure above).
- When y = 0, as z decreases, h(x) = 1/(1 + e^{-z}) approaches 0 and the cost decreases; in the logistic regression cost function only the second term remains (giving the grey curve in the right figure above).
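The per-example cost these two curves come from is the standard logistic regression cost, written out here because the original formula image is not included; with z = θ^T x:

```latex
\text{cost}\big(h_\theta(x), y\big) = -y \log\frac{1}{1 + e^{-z}} - (1 - y)\log\Big(1 - \frac{1}{1 + e^{-z}}\Big), \qquad z = \theta^T x
```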
3. To build the SVM, the cost function of logistic regression needs to be modified:
- Replace the cost terms of logistic regression with new cost functions. The new cost function is the magenta (rose-colored) curve in the figure above; it consists of two parts, a flat segment and a sloped straight-line segment.
- We call the function on the left cost1(z) and the function on the right cost0(z); one common way to write them is sketched after this step.
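In the lecture these two costs are defined only by their plots. A common piecewise-linear (hinge-style) form that matches the flat-plus-sloped shape described above is the following; this explicit formula is an assumption on my part, not something given in the original post:

```latex
\text{cost}_1(z) = \max(0,\ 1 - z), \qquad \text{cost}_0(z) = \max(0,\ 1 + z)
```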
4. Building the SVM
- As shown in the figure above, the logistic regression cost function is split into two parts, A and B, and the following operations are performed:
- (1) Replace the corresponding terms in the formula with the cost1() and cost0() defined in step 3.
- (2) Following the SVM convention, drop the 1/m coefficient (1/m is just a constant, so removing it yields the same optimal θ).
- (3) For logistic regression the cost function is A + λ × B, and optimization is tuned by choosing different values of λ. For the SVM we drop λ and introduce a constant C, changing the cost function to C × A + B, tuned by choosing different values of C. (During optimization its meaning is the same as in logistic regression; you can think of C = 1/λ.)
5. The resulting SVM cost function (written out after this step):
- Note: logistic regression outputs a probability value, whereas the SVM directly predicts y = 1 or y = 0.
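Since the original formula image is not reproduced, here is the SVM objective in the notation used above, with C multiplying the data term A and B being the regularization term; this is the standard form the post is following:

```latex
\min_{\theta}\; C \sum_{i=1}^{m} \Big[ y^{(i)}\,\text{cost}_1\big(\theta^T x^{(i)}\big) + \big(1 - y^{(i)}\big)\,\text{cost}_0\big(\theta^T x^{(i)}\big) \Big] + \frac{1}{2} \sum_{j=1}^{n} \theta_j^{2}
```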
6. SVM prediction rule:
- When θ^T x ≥ 0, the SVM predicts 1; otherwise it predicts 0.
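As a minimal sketch tying the pieces above together (the NumPy implementation and function names are mine, not from the original post; X is assumed to already contain a leading column of ones for the intercept):

```python
import numpy as np

def cost1(z):
    # Cost for y = 1: zero once z >= 1, linear penalty otherwise.
    return np.maximum(0.0, 1.0 - z)

def cost0(z):
    # Cost for y = 0: zero once z <= -1, linear penalty otherwise.
    return np.maximum(0.0, 1.0 + z)

def svm_cost(theta, X, y, C):
    # C * A + B: the data term scaled by C plus L2 regularization.
    # theta[0] is the intercept and is not regularized.
    z = X @ theta
    data_term = np.sum(y * cost1(z) + (1 - y) * cost0(z))
    reg_term = 0.5 * np.sum(theta[1:] ** 2)
    return C * data_term + reg_term

def svm_predict(theta, X):
    # Predict 1 when theta^T x >= 0, otherwise 0.
    return (X @ theta >= 0).astype(int)
```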
2. Intuitive Understanding of Large Margins
Support vector machines are sometimes called large margin classifiers: they look for the separator with the largest distance to both the positive and the negative class (the black line in the figure below).
- The two blue lines drawn in the figure mark this distance: the black boundary keeps a larger minimum distance from the training samples, while the other candidate boundaries run very close to the training samples and separate them worse than the black line does.
- We call this distance the margin of the support vector machine. This is why support vector machines are robust: they try to separate the samples with as large a margin as possible.
- Robustness means a system's ability to keep certain properties unchanged under uncertain disturbances; it is also the system's ability to survive abnormal and dangerous situations.
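As a small supplement not spelled out in the original post, the distance underlying this margin intuition is the usual point-to-hyperplane distance; writing the decision boundary as w^T x + b = 0 (a change of notation relative to θ above), the distance of a sample x^(i) from the boundary is:

```latex
d^{(i)} = \frac{\lvert w^T x^{(i)} + b \rvert}{\lVert w \rVert}
```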
3. Optimizations Made by the SVM
- To improve robustness and get better results (avoiding underfitting and overfitting, and coping with linearly inseparable data), the SVM does more than maximize the margin; it also makes the following optimizations:
3.1 The SVM's classification requirements are stricter
In logistic regression, by the earlier definition, θ^T x ≥ 0 is classified as positive and θ^T x < 0 as negative. In fact, the SVM is stricter: it requires θ^T x ≥ 1 for positive classifications and θ^T x ≤ -1 for negative ones.
This is equivalent to building an extra safety factor, or safety margin, into the SVM.
3.2 About the parameter C
When C is very large, we want the first term of the cost function to be 0, which requires θ^T x ≥ 1 whenever y = 1 and θ^T x ≤ -1 whenever y = 0.
Once the first term is 0, the goal becomes minimizing the second term.
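Written out explicitly (this constrained form is the standard large-C reading of the statement above, not a formula shown in the original post), the problem becomes:

```latex
\min_{\theta}\ \frac{1}{2}\sum_{j=1}^{n}\theta_j^{2}
\quad \text{s.t.}\quad
\theta^T x^{(i)} \ge 1 \ \text{if } y^{(i)} = 1,
\qquad
\theta^T x^{(i)} \le -1 \ \text{if } y^{(i)} = 0
```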
If you try to separate the samples with the largest possible margin, i.e., you set C very large, then a single outlier can flip the decision boundary from the black line in the figure below to the pink line, which is unwise.
If C is set a bit smaller, you end up with the black line. It can ignore the influence of a few outliers and, even when the data is not linearly separable, still separate it reasonably well and produce a better decision boundary.
Because C = 1/λ:
- When C is small (equivalent to a large λ), the model may underfit: high bias.
- When C is large (equivalent to a small λ), the model may overfit: high variance.
Choosing C is the core issue for the SVM: pick a C that is not too large, so that the boundary keeps a large margin while remaining insensitive to a few abnormal data points.
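As a hedged illustration of this trade-off using scikit-learn's SVC (a library not mentioned in the original post; the toy dataset and C values below are made up for demonstration), one can compare a small and a large C on the same data:

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: two well-separated clusters plus one positive outlier among the negatives.
rng = np.random.default_rng(0)
X_neg = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(20, 2))
X_pos = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(20, 2))
outlier = np.array([[-1.5, -1.5]])
X = np.vstack([X_neg, X_pos, outlier])
y = np.array([0] * 20 + [1] * 21)

for C in (0.1, 1000.0):
    clf = SVC(kernel="linear", C=C)  # large C ~ small lambda: tries hard to classify every point
    clf.fit(X, y)
    print(f"C={C}: train accuracy = {clf.score(X, y):.2f}, "
          f"support vectors = {len(clf.support_)}")
```

A very large C pushes the boundary toward accommodating the single outlier, while a small C tolerates that misclassification and keeps a wider margin, mirroring the bias/variance discussion above.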
If you found this article helpful, please give it a like to encourage me, and feel free to bookmark it.
Follow me, and let's learn and improve together!