22. Support Vector Machines (SVM): The Gaussian Kernel Function
2022-08-02 00:15:00 【WuJiaYFN】
- Introducing the Gaussian kernel function
- The Gaussian kernel function in detail
I. Introducing the Gaussian kernel function
1.1 Nonlinear decision boundaries: the conventional approach (high-order polynomial models)
For a problem with a nonlinear decision boundary, we can use a high-order polynomial model to classify data that a straight line cannot separate:
To obtain the decision boundary shown in the figure above, our model might take the form $\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1 x_2 + \theta_4 x_1^2 + \theta_5 x_2^2 + \cdots$.
We can replace each term in the model with a new feature $f$. For example, let:
$f_1 = x_1,\ f_2 = x_2,\ f_3 = x_1 x_2,\ f_4 = x_1^2,\ f_5 = x_2^2$, which gives $h_\theta(x) = \theta_0 + \theta_1 f_1 + \theta_2 f_2 + \cdots + \theta_n f_n$.
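As a minimal sketch of this substitution, the second-order features and the resulting linear model can be computed as below. The function names `polynomial_features` and `hypothesis` and the parameter values are illustrative assumptions, not part of the original; an intercept `theta[0]` is included, as in the polynomial model above.

```python
import numpy as np

def polynomial_features(x1, x2):
    """Map (x1, x2) to the features f1..f5 listed above."""
    return np.array([x1, x2, x1 * x2, x1 ** 2, x2 ** 2], dtype=float)

def hypothesis(theta, f):
    """theta[0] is the intercept; theta[1:] weights the features f."""
    return theta[0] + theta[1:] @ f

f = polynomial_features(2.0, 3.0)
# Made-up parameters: h = x1^2 + x2^2 - 1, whose zero set is the unit circle.
theta = np.array([-1.0, 0.0, 0.0, 0.0, 1.0, 1.0])
h = hypothesis(theta, f)
```

With these made-up parameters the boundary $h_\theta(x) = 0$ is a circle, which a model that is linear in $x_1, x_2$ alone could not express.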
1.2 Nonlinear decision boundaries: an improved approach (the Gaussian kernel function)
- Given a training example $x$, we compute new features $f_1, f_2, f_3$ from how close $x$ is to a set of pre-selected landmarks $l^{(1)}, l^{(2)}, l^{(3)}$.
How the new features are obtained, for example: $f_1 = \mathrm{similarity}(x, l^{(1)}) = \exp\left(-\frac{\|x - l^{(1)}\|^2}{2\sigma^2}\right)$
where $\|x - l^{(1)}\|^2 = \sum_{j=1}^{n}(x_j - l_j^{(1)})^2$ is the sum of the squared differences between each feature of the example $x$ and the landmark $l^{(1)}$, that is, their squared distance. The function $\mathrm{similarity}(x, l^{(1)})$ in this example is a kernel function; specifically, it is a Gaussian kernel.
The role of landmarks:
If the distance between a training example $x$ and a landmark $l$ is approximately 0, the new feature $f$ is approximately $e^{-0} = 1$.
If the training example $x$ is far from the landmark $l$, then $f$ is approximately $e^{-(\text{a large number})} \approx 0$.
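The two cases above can be checked numerically. This is a minimal sketch of the Gaussian kernel formula; the function name `gaussian_kernel` and the landmark and sample coordinates are hypothetical values chosen for illustration.

```python
import numpy as np

def gaussian_kernel(x, landmark, sigma=1.0):
    """Similarity exp(-||x - l||^2 / (2 * sigma^2)) between x and a landmark."""
    x = np.asarray(x, dtype=float)
    landmark = np.asarray(landmark, dtype=float)
    squared_distance = np.sum((x - landmark) ** 2)
    return float(np.exp(-squared_distance / (2.0 * sigma ** 2)))

l1 = [3.0, 5.0]                              # hypothetical landmark
f_same = gaussian_kernel([3.0, 5.0], l1)     # x coincides with the landmark
f_far = gaussian_kernel([10.0, -4.0], l1)    # x is far from the landmark
```

Here `f_same` is exactly 1 because the squared distance is 0, while `f_far` is vanishingly small.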
Predicting with the Gaussian kernel:
In the figure above, suppose training has produced the parameters $\theta_0 = -0.5,\ \theta_1 = 1,\ \theta_2 = 1,\ \theta_3 = 0$.
When a sample sits at the magenta point, it is close to $l^{(1)}$ but far from $l^{(2)}$ and $l^{(3)}$, so $f_1$ is close to 1 while $f_2$ and $f_3$ are close to 0. Therefore $h_\theta(x) = \theta_0 + \theta_1 f_1 + \theta_2 f_2 + \theta_3 f_3 \approx -0.5 + 1 = 0.5 > 0$, and we predict $y = 1$.
In the same way, for the green point close to $l^{(2)}$, the calculation gives $h_\theta(x) > 0$, so we also predict $y = 1$.
For the cyan point, however, which is far from all three landmarks, the calculation gives $h_\theta(x) \approx \theta_0 = -0.5 < 0$, so we predict $y = 0$.
In this way, the decision boundary follows from how close a sample is to the landmarks we chose: it is the red closed curve in the figure. Inside the red curve we predict $y = 1$; outside it we predict $y = 0$. At prediction time, the features we use are not the sample's own features but the new features $f_1, f_2, f_3$ computed by the kernel function.
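The worked example can be reproduced with a short sketch. The parameters are the ones given in the text; the landmark coordinates, the test points, and the names `kernel_features` and `predict` are assumptions made for illustration, since the figure's actual coordinates are not given.

```python
import numpy as np

theta = np.array([-0.5, 1.0, 1.0, 0.0])      # theta_0..theta_3 from the text
landmarks = np.array([[1.0, 1.0],            # l1 (hypothetical coordinates)
                      [4.0, 1.0],            # l2
                      [2.5, 4.0]])           # l3

def kernel_features(x, landmarks, sigma=1.0):
    """Return f_1..f_k, the Gaussian similarity of x to each landmark."""
    squared = np.sum((landmarks - np.asarray(x, dtype=float)) ** 2, axis=1)
    return np.exp(-squared / (2.0 * sigma ** 2))

def predict(x, theta, landmarks, sigma=1.0):
    """Predict 1 when theta_0 + sum_j theta_j * f_j >= 0, else 0."""
    h = theta[0] + theta[1:] @ kernel_features(x, landmarks, sigma)
    return int(h >= 0)

near_l1 = predict([1.1, 0.9], theta, landmarks)   # close to l1
near_l2 = predict([4.0, 1.2], theta, landmarks)   # close to l2
near_l3 = predict([2.5, 4.0], theta, landmarks)   # on l3, but theta_3 = 0
far_away = predict([9.0, 9.0], theta, landmarks)  # far from every landmark
```

Because $\theta_3 = 0$, being close to $l^{(3)}$ contributes nothing, so only points near $l^{(1)}$ or $l^{(2)}$ are predicted as $y = 1$.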
II. The Gaussian kernel function in detail
2.1 How the landmark $l^{(1)}$ and different values of $\sigma$ affect the function $f$
Suppose our training examples have two features $[x_1, x_2]$. Given a landmark $l^{(1)}$ and different values of $\sigma$, see the figure below:
- Note: the coordinates of the horizontal plane are $x_1$ and $x_2$, and the vertical axis represents $f$.
As the figure shows, $f$ reaches its maximum only when $x$ coincides with $l^{(1)}$.
The rate at which $f$ changes as $x$ changes is controlled by $\sigma^2$.
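- The larger $\sigma^2$ is, the smoother the Gaussian kernel becomes: the features change slowly as the input $x$ changes, so the model has higher bias and lower variance and tends to underfit.
- The smaller $\sigma^2$ is, the more sharply the Gaussian kernel changes: the curve is narrower and falls off faster, so the model has lower bias and higher variance, is sensitive to noisy samples, and tends to overfit.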
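To make the effect of $\sigma^2$ concrete, the sketch below evaluates the kernel at a fixed distance of 1 from a landmark for three values of $\sigma^2$. The landmark, the sample point, and the chosen values of $\sigma^2$ are assumptions for illustration.

```python
import numpy as np

def gaussian_kernel(x, landmark, sigma_sq):
    """Gaussian similarity, parameterised directly by sigma squared."""
    squared_distance = np.sum((np.asarray(x) - np.asarray(landmark)) ** 2)
    return float(np.exp(-squared_distance / (2.0 * sigma_sq)))

landmark = np.array([3.0, 5.0])   # hypothetical landmark
x = np.array([4.0, 5.0])          # a point at distance 1 from the landmark

# Kernel value at the same point for a narrow, a medium and a wide kernel.
values = {s: gaussian_kernel(x, landmark, s) for s in (0.5, 1.0, 3.0)}
```

At the same distance, a larger $\sigma^2$ leaves $f$ closer to 1 (a flatter surface), while a smaller $\sigma^2$ pushes it toward 0 (a narrower peak).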
2.2 Choosing landmarks for the Gaussian kernel
- The number of landmarks is usually chosen from the size of the training set: if the training set has $m$ examples, we choose $m$ landmarks and set $l^{(1)} = x^{(1)},\ l^{(2)} = x^{(2)},\ \ldots,\ l^{(m)} = x^{(m)}$.
- Benefit: the new features of each example are based on its distance to every other example in the training set.
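For an example $(x^{(i)}, y^{(i)})$, the new features are $f_j^{(i)} = \mathrm{similarity}(x^{(i)}, l^{(j)})$ for $j = 1, \ldots, m$; in particular $f_i^{(i)} = 1$, because $l^{(i)} = x^{(i)}$.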
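When every training example is also a landmark, all the new features can be computed at once as an $m \times m$ matrix. The following is a minimal sketch under that assumption; the name `kernel_feature_matrix` and the three training points are made up for illustration.

```python
import numpy as np

def kernel_feature_matrix(X, sigma=1.0):
    """Row i holds the features of example i against every landmark.

    Landmarks are the training examples themselves, so the result is m x m.
    """
    X = np.asarray(X, dtype=float)
    differences = X[:, None, :] - X[None, :, :]      # shape (m, m, n)
    squared = np.sum(differences ** 2, axis=2)       # pairwise squared distances
    return np.exp(-squared / (2.0 * sigma ** 2))

X_train = np.array([[0.0, 0.0],
                    [1.0, 0.0],
                    [5.0, 5.0]])                     # made-up training set
F = kernel_feature_matrix(X_train)
```

The diagonal of `F` is all ones, since each example coincides with its own landmark, and the matrix is symmetric because distance is symmetric.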
2.3 Applying the Gaussian kernel to a support vector machine (SVM)
Substituting the Gaussian kernel into the SVM cost function:
- The difference from the earlier cost function is that the kernel features $f$ replace $x$.
- To predict the result for an example $x$: given $x$, compute the new features $f$; predict $y = 1$ when $\theta^T f \ge 0$, and $y = 0$ otherwise.
To simplify the computation, when calculating the regularization term $\theta^T\theta$ we use $\theta^T M \theta$ instead, where $M$ is a matrix that depends on the kernel function chosen.
(Note: in theory, kernel functions can also be used with logistic regression, but the trick of using $M$ to simplify the computation does not carry over to logistic regression, so the computation would be very time-consuming.)
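The cost function and prediction rule of this section can be sketched as follows. Several things here are assumptions, because the original formula is not shown: the per-example penalties are taken to be the hinge-style losses $\max(0, 1 - z)$ for $y = 1$ and $\max(0, 1 + z)$ for $y = 0$, the regularizer is the plain $\frac{1}{2}\sum_j \theta_j^2$ rather than the $\theta^T M \theta$ variant, and the names `svm_cost` and `predict` and all numbers are illustrative.

```python
import numpy as np

def svm_cost(theta, F, y, C):
    """Kernelised SVM cost; F has a leading column of ones (f_0 = 1)."""
    z = F @ theta
    cost_1 = np.maximum(0.0, 1.0 - z)    # assumed penalty when y = 1
    cost_0 = np.maximum(0.0, 1.0 + z)    # assumed penalty when y = 0
    data_term = C * np.sum(y * cost_1 + (1 - y) * cost_0)
    regularizer = 0.5 * np.sum(theta[1:] ** 2)   # intercept is not regularised
    return float(data_term + regularizer)

def predict(theta, F):
    """Predict y = 1 where theta^T f >= 0, and y = 0 otherwise."""
    return (F @ theta >= 0).astype(int)

# Made-up kernel features for two examples: [f_0, f_1, f_2].
F = np.array([[1.0, 1.0, 0.1],
              [1.0, 0.1, 1.0]])
y = np.array([1, 0])
theta = np.array([0.0, 2.0, -2.0])

cost_fitted = svm_cost(theta, F, y, C=1.0)
cost_zero = svm_cost(np.zeros(3), F, y, C=1.5)
labels = predict(theta, F)
```

With `theta` as chosen, both examples lie beyond the margin, so only the regularizer contributes to the cost; with all-zero parameters every example pays a penalty of 1, scaled by `C`.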
2.4 Parameters of the SVM
An SVM with a Gaussian kernel has two parameters, $C$ and $\sigma$. Their effect on the predictions is as follows:
(1) Parameter $C$
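- When $C$ is large, which is equivalent to a small $\lambda$, the model may overfit: high variance.
- When $C$ is small, which is equivalent to a large $\lambda$, the model may underfit: high bias.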
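(2) Parameter $\sigma$
When $\sigma$ is large, the kernel surface is flat, which may lead to low variance and high bias.
When $\sigma$ is small, the kernel surface is narrow and sharply peaked, which may lead to low bias and high variance.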
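In practice these two parameters are usually passed to a library. The sketch below uses scikit-learn's `SVC`, which the original does not mention, so treat it as one possible illustration. Its RBF kernel is written as $\exp(-\gamma\|x - x'\|^2)$, so $\gamma = 1/(2\sigma^2)$; the helper names `gamma_from_sigma` and `make_gaussian_svm` and the two-ring dataset are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def gamma_from_sigma(sigma):
    """Convert the sigma used in the text to scikit-learn's gamma."""
    return 1.0 / (2.0 * sigma ** 2)

def make_gaussian_svm(C, sigma):
    """Build an SVM with a Gaussian (RBF) kernel for the given C and sigma."""
    return SVC(kernel="rbf", C=C, gamma=gamma_from_sigma(sigma))

# Made-up data: an inner ring (y = 1) surrounded by an outer ring (y = 0),
# which no straight line can separate.
angles = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
inner = np.column_stack([np.cos(angles), np.sin(angles)])
outer = 3.0 * inner
X = np.vstack([inner, outer])
y = np.array([1] * 20 + [0] * 20)

clf = make_gaussian_svm(C=1.0, sigma=1.0).fit(X, y)
train_accuracy = clf.score(X, y)
```

A common way to choose `C` and `sigma` is to try several combinations and compare them on a held-out validation set, rather than on training accuracy as computed here.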
2.5 Linear kernel
- Using no kernel function is also called using a linear kernel. An SVM with a linear kernel suits cases where the target function is simple, or where there are very many features and very few examples.