
Support Vector Machine


Support vector machine (SVM)

Introduction to support vector machines (SVM)

What is a support vector machine?

The "support vectors" are the training sample points that lie on the edge of the margin region; "machine" refers to the algorithm.

The idea is to find the separating hyperplane with the largest margin; in essence, this solves the problem of optimal classifier design.

The key to where the hyperplane lies (i.e., to its equation) is therefore a small number of sample points sitting at the edge of the sample set.

Problem analysis :

  • Goal: find an optimal classifier, that is, a hyperplane that maximizes the classification margin
  • Objective function to optimize: the classification margin, which is to be maximized
  • Object of optimization: the classification hyperplane (the decision plane); the margin is maximized by adjusting its position, achieving the optimization goal

Related concepts of support vector machines

Hyperplane

A hyperplane is a linear subspace of codimension 1 in an n-dimensional Euclidean space: it is a straight line in two-dimensional space and an ordinary two-dimensional plane in three-dimensional space.

Put more simply: in an n-dimensional space, a hyperplane is an (n-1)-dimensional subspace.
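
In the notation used later in this post (with ω as the normal vector and γ as the bias; the post never writes the equation out, so this is the standard convention rather than a quote), a hyperplane in an n-dimensional space is the solution set of one linear equation:

```latex
H = \left\{\, x \in \mathbb{R}^{n} \;:\; \omega^{T}x + \gamma = 0 \,\right\}
```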

Margin

The margin is twice the perpendicular distance d from a support vector to the classification hyperplane, that is: W = 2d
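
In symbols (a standard derivation, assuming the hyperplane ωᵀx + γ = 0 with ω and γ scaled so that support vectors satisfy ωᵀx + γ = ±1): the distance from a point x to the hyperplane, and hence the margin, are

```latex
d = \frac{\lvert \omega^{T}x + \gamma \rvert}{\lVert \omega \rVert}
\qquad \Longrightarrow \qquad
W = 2d = \frac{2}{\lVert \omega \rVert}
```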

The optimization problem

We look for the support vectors (the sample points lying on the dashed margin boundaries) and require the distance d from the support vectors to the classification decision surface to be as large as possible; the hyperplane achieving this is the optimal classifier (for linearly separable data).

Find a set of parameters ω, γ that maximizes the margin W = 2d.
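
Written out, this is the standard hard-margin primal problem (assuming labels yᵢ ∈ {−1, +1}, which the post does not state explicitly):

```latex
\max_{\omega,\,\gamma}\ \frac{2}{\lVert \omega \rVert}
\;\Longleftrightarrow\;
\min_{\omega,\,\gamma}\ \frac{1}{2}\lVert \omega \rVert^{2}
\quad \text{s.t.} \quad
y_{i}\left(\omega^{T}x_{i} + \gamma\right) \ge 1,\ \ i = 1, \dots, N
```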

Kernel functions and slack variables

The linearly non-separable case

When the data are linearly non-separable, the separating hyperplane obtained in the linearly separable case can no longer be used directly. Instead, the samples are mapped to a higher-dimensional space in the hope that they become linearly separable there; the machinery of the linearly separable case can then be used to construct a classification hyperplane.

[Figure: linearly non-separable samples mapped to a higher-dimensional space (image unavailable)]

[Figure: linearly non-separable mapping, second illustration (image unavailable)]

(In theory) if the original space is finite-dimensional, i.e., the number of attributes is finite, there must exist a higher-dimensional feature space in which the samples are linearly separable.

Kernel function

A support vector machine maps the input space to a high-dimensional feature space through a nonlinear transformation Φ(x). Solving the SVM only ever requires inner products. If there exists a function K(x, x') defined on the low-dimensional input space that happens to equal that inner product in the high-dimensional space, i.e. K(x, x') = <Φ(x), Φ(x')>, then the complicated nonlinear transformation never needs to be computed explicitly: the inner product after the nonlinear mapping is obtained directly from K(x, x'), which simplifies the computation. Such a function K(x, x') is called a kernel function.
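
As a concrete illustration (a minimal NumPy sketch, not from the original post): for the 2-dimensional homogeneous polynomial kernel K(x, x') = (xᵀx')², the explicit map Φ(x) = (x₁², √2·x₁x₂, x₂²) satisfies K(x, x') = <Φ(x), Φ(x')>, so the kernel yields the high-dimensional inner product without ever constructing Φ:

```python
import numpy as np

def phi(x):
    """Explicit feature map for the 2-D homogeneous polynomial kernel (x . x')^2."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def poly_kernel(x, xp):
    """Kernel trick: the same inner product, computed without building phi."""
    return np.dot(x, xp) ** 2

x, xp = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(np.dot(phi(x), phi(xp)))  # inner product via the explicit mapping -> 16.0
print(poly_kernel(x, xp))       # same value via the kernel function     -> 16.0
```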

Kernel function types and selection

Common kernel function types (standard definitions are sketched after the list):

  • Linear kernel
  • Polynomial kernel
  • Gaussian radial basis function (RBF) kernel
  • Sigmoid kernel
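
Their standard definitions (following scikit-learn's conventions; γ, r, and d here are kernel hyperparameters, not the bias γ used earlier):

```latex
\begin{aligned}
\text{Linear:}     \quad & K(x, x') = x^{T}x' \\
\text{Polynomial:} \quad & K(x, x') = \left(\gamma\, x^{T}x' + r\right)^{d} \\
\text{RBF:}        \quad & K(x, x') = \exp\!\left(-\gamma \lVert x - x' \rVert^{2}\right) \\
\text{Sigmoid:}    \quad & K(x, x') = \tanh\!\left(\gamma\, x^{T}x' + r\right)
\end{aligned}
```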

Selecting a kernel function (a cross-validation sketch follows this list):

  • There are no clear, universally applicable guidelines
  • Choose a kernel based on prior knowledge
  • Use cross-validation: try different kernels and choose the one with the smallest error
  • Mixed-kernel methods: combine several kernel functions
  • The most common choice is the RBF kernel, followed by the linear kernel
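
A hedged scikit-learn sketch of the cross-validation approach (the dataset and hyperparameter values are placeholders for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Score each common kernel with 5-fold cross-validation and keep the best one.
scores = {}
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    clf = SVC(kernel=kernel, C=1.0, gamma="scale")
    scores[kernel] = cross_val_score(clf, X, y, cv=5).mean()

best = max(scores, key=scores.get)
print(scores)
print("best kernel:", best)
```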

Linear non-separability caused by outliers

Mapping linearly non-separable samples to a high-dimensional space greatly increases the chance of finding a classification hyperplane. Some situations remain hard to handle, however: the data may not be nonlinear in structure at all, yet noise pushes a few points (outliers) far from their normal positions, and these can strongly affect the model. (Outliers distort the classification hyperplane.)

Slack variables (handling outliers that affect the classification hyperplane)

Because outliers must be taken into account, the constraints should be appropriately relaxed.
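
Concretely, in the standard soft-margin formulation (stated here as an assumption consistent with the notation above, since the post does not write it out): each sample i receives a slack variable ξᵢ ≥ 0 that lets it violate the margin, and a penalty weight C trades off a wide margin against few violations:

```latex
\min_{\omega,\,\gamma,\,\xi}\ \frac{1}{2}\lVert \omega \rVert^{2} + C \sum_{i=1}^{N} \xi_{i}
\quad \text{s.t.} \quad
y_{i}\left(\omega^{T}x_{i} + \gamma\right) \ge 1 - \xi_{i},
\qquad \xi_{i} \ge 0,\ \ i = 1, \dots, N
```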

Multiclass classification with SVM

SVM is designed for binary classification, but multiclass problems can be handled by constructing a suitable combination of classifiers (a scikit-learn sketch follows this list):

  • Direct method: modify the objective function directly, merging the parameter solutions of multiple decision surfaces into a single optimization problem.

  • Indirect method: combine multiple binary SVM classifiers to realize multiclass classification.

    • One-vs-rest (one-to-many): separate each class's samples from all remaining samples, training K SVMs in total; assign an unknown sample to the class whose decision function value is largest

      • Advantage: only K SVMs need to be trained, a small number, so classification is fast
      • Shortcoming: every training run uses all samples, negative samples far outnumber positive ones, and adding a new class requires retraining all models
    • One-vs-one (one-on-one): train one SVM for every pair of classes, requiring k(k-1)/2 classifiers; when classifying an unknown sample, take the class that receives the most votes among all classifiers.

      • Advantage: adding a new class does not require retraining all models, only those involving the new class
      • Shortcoming: many classifiers, so training and prediction take rather long.
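
A brief illustrative sketch of both strategies (hypothetical usage, not from the original post; note that scikit-learn's SVC already applies one-vs-one internally, so the explicit wrappers below serve only to make the two schemes visible):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One-vs-rest: K binary SVMs, each separating one class from the rest.
ovr = OneVsRestClassifier(SVC(kernel="rbf", gamma="scale")).fit(X_train, y_train)

# One-vs-one: k(k-1)/2 binary SVMs, one per pair of classes, combined by voting.
ovo = OneVsOneClassifier(SVC(kernel="rbf", gamma="scale")).fit(X_train, y_train)

print("one-vs-rest accuracy:", ovr.score(X_test, y_test))
print("one-vs-one accuracy:", ovo.score(X_test, y_test))
```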
