Machine learning: support vector machine (SVM)
2022-07-03 20:01:00 【Tc. Xiaohao】
SVM is a very elegant algorithm with a complete mathematical theory. Although it is not used much in industry nowadays, I decided to spend some time writing an article to sort it out.
1. Support vector machines
1.0 Brief introduction
A support vector machine (SVM) is a binary classification model. Its basic form is the linear classifier with the largest margin defined in the feature space; the maximum margin is what distinguishes it from the perceptron. Combined with the kernel trick, the SVM becomes, in essence, a non-linear classifier. The SVM learning strategy is margin maximization, which can be formalized as a convex quadratic programming problem and is also equivalent to minimizing a regularized hinge loss. The SVM learning algorithm is an optimization algorithm for solving this convex quadratic program.
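For reference, the regularized hinge-loss formulation mentioned above can be written out explicitly (a standard form, added here for completeness):

$\min_{w,b}\ \sum_{i=1}^{N} \max\bigl(0,\ 1 - y_i(w \cdot x_i + b)\bigr) + \lambda \lVert w \rVert^2$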
1.1 The idea of the algorithm
Here is a story to build the intuition:
A hero, on his way to rescue the princess, meets the demon king (the boss). The demon king sets him a test: pass it, and the princess will be returned. On a table lie some red balls and blue balls, and the hero must separate them with a stick, with one requirement: after more balls are placed later, the separation should still hold.
The hero solves this by repeatedly adjusting the stick's position so that the gap between the stick and the nearest balls on both sides is as large as possible.
The demon king then upgrades the challenge: he scatters the two kinds of balls so that no straight stick can possibly separate them.
This does not stump our hero. He slaps the table hard, sending the balls into the air, then swiftly slides a sheet of paper between them, separating the two colors.
The demon king is stunned: "So that move exists too? I concede." And the princess is returned to the hero.
Later, bored people called the balls data, the stick a classifier, the trick of finding the biggest gap optimization, slapping the table kernelling, and the sheet of paper a hyperplane.
This story illustrates what an SVM does. If the data are linearly separable, a single straight line can split them apart; we then only need to maximize the distance from the line to the nearest balls on each side, which is the optimization step.
When we run into linearly inseparable data, we use the table-slapping trick, which corresponds to a kernel transformation (kernel), and then classify the data with a hyperplane in the higher-dimensional space.
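As a minimal sketch of the kernel idea in scikit-learn (my illustration; the dataset and hyperparameters here are assumptions, not from the original post):

from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-moons: impossible to separate with a straight line
X, y = make_moons(n_samples=200, noise=0.15, random_state=42)
# An RBF kernel implicitly maps the points to a higher-dimensional space,
# where a separating hyperplane exists (the "slap the table" trick)
rbf_clf = SVC(kernel='rbf', gamma=2, C=1)
rbf_clf.fit(X, y)
print(rbf_clf.score(X, y))  # training accuracy, close to 1.0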
Decision boundary: the line in the middle of the figure, used to decide the classification
Support vectors: the data points on either side that lie closest to the decision boundary
Maximum margin: pick the boundary whose support vectors are farthest from it; that is the optimal solution we are looking for
First consider two-dimensional space. The decision function is defined as $y = wx + b$, which corresponds to a hyperplane $S$ in $R^n$, where $w$ is the normal vector of the hyperplane and $b$ is the intercept.
From the point-to-plane distance formula we get

$distance = \frac{1}{\lVert w \rVert}\,\lvert wx_i + b \rvert$
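A quick numerical check of this distance formula (a sketch; the vectors below are made-up values):

import numpy as np

w = np.array([2.0, 1.0])    # hyperplane normal vector
b = -3.0                    # intercept
x_i = np.array([1.0, 4.0])  # a sample point
# distance = |w . x_i + b| / ||w||
dist = abs(np.dot(w, x_i) + b) / np.linalg.norm(w)
print(dist)  # 3.0 / sqrt(5) ≈ 1.342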
Before the derivation, let's start with some definitions. Suppose we are given a training data set on the feature space

$T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$

where $x_i$ is the $i$-th feature vector and $y_i$ is the class label: a sample is positive when $y_i = +1$ and negative when $y_i = -1$. Assume the training data set is linearly separable.
Geometric margin: for a given data set $T$ and hyperplane $w \cdot x + b = 0$, the geometric margin of the hyperplane with respect to a sample point $(x_i, y_i)$ is defined as

$\gamma_i = y_i \left( \frac{w}{\lVert w \rVert} \cdot x_i + \frac{b}{\lVert w \rVert} \right)$
Since we want the maximum margin, the optimization objective is to find a line ($w$ and $b$) that makes the points closest to the line as far from it as possible:

$\arg\max_{w,b} \left[ \frac{1}{\lVert w \rVert} \min_i \bigl( y_i (w x_i + b) \bigr) \right]$

The outer argmax maximizes the distance; the inner min picks out the nearest points, the support vectors. Since $w$ and $b$ can be rescaled so that $y_i(wx_i + b) = 1$ on the support vectors, all that is left is
the current goal:

$\max_{w,b} \frac{1}{\lVert w \rVert}$

Because learning algorithms conventionally minimize a loss, we transform the problem, using the routine trick of turning a maximization into a minimization. For ease of differentiation we multiply by $\frac{1}{2}$ and square the norm; the squaring is also for differentiation, since $\lVert w \rVert$ is the 2-norm and carries a square root:

$\min_{w,b} \frac{1}{2}\lVert w \rVert^2$
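Putting the objective and the support-vector constraint together gives the primal problem (standard form, written out here for completeness):

$\min_{w,b}\ \frac{1}{2}\lVert w \rVert^2 \quad \text{s.t.}\ \ y_i(w \cdot x_i + b) \ge 1,\quad i = 1, 2, \ldots, N$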
Solving: apply the Lagrangian
The problem has been converted into a convex quadratic program, so it can be solved with the Lagrangian dual method. The precondition is that the objective function is convex, because only for convex problems is a global optimum guaranteed; this connects to convex optimization, and our constraint set here is convex. Optimization problems that seek an optimal solution usually fall into the following types:
1) Unconstrained: minimize $f(x)$ with no constraints; by Fermat's lemma, set the derivative of the objective to zero and solve for the extrema directly.
2) Equality constraints: with equality constraints $h_j(x) = 0,\ j = 1, 2, \ldots, m$, use the method of Lagrange multipliers: introduce a multiplier vector, combine it with $f(x)$ into a new objective function, and solve $\min f(x)$ (a tiny worked example follows this list).
3) Inequality constraints: with inequality constraints $g_i(x) \le 0,\ i = 1, 2, \ldots, n$, and possibly equality constraints $h_j(x) = 0,\ j = 1, 2, \ldots, m$ as well, we again construct a new objective function from the constraints and multiplier vectors; the KKT conditions then give the necessary conditions for the optimum.
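A tiny worked example of case 2 (my illustration, not from the original): minimize $f(x, y) = x^2 + y^2$ subject to $h(x, y) = x + y - 1 = 0$. The Lagrangian is $L = x^2 + y^2 + \lambda(x + y - 1)$; setting $\partial L/\partial x = 2x + \lambda = 0$, $\partial L/\partial y = 2y + \lambda = 0$ and enforcing $x + y = 1$ gives $x = y = \frac{1}{2}$.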
[KKT conditions]
1) Stationarity: the derivative of the Lagrangian with respect to $x$ is 0
2) Primal feasibility: $h_i(x) = 0$ and $g_j(x) \le 0$
3) Dual feasibility: the multipliers $\mu_j$ on the inequality constraints satisfy $\mu_j \ge 0$
4) Complementary slackness: $\mu_j\, g_j(x) = 0$
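For the hard-margin SVM itself, the Lagrangian and its dual take the standard form below (written out for completeness; the $\alpha_i \ge 0$ are the multipliers):

$L(w, b, \alpha) = \frac{1}{2}\lVert w \rVert^2 - \sum_{i=1}^{N} \alpha_i \bigl[ y_i(w \cdot x_i + b) - 1 \bigr]$

Setting $\partial L/\partial w = 0$ and $\partial L/\partial b = 0$ gives $w = \sum_i \alpha_i y_i x_i$ and $\sum_i \alpha_i y_i = 0$; substituting back yields the dual problem

$\max_{\alpha}\ \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \quad \text{s.t.}\ \ \alpha_i \ge 0,\ \ \sum_{i=1}^{N} \alpha_i y_i = 0$

Only the points with $\alpha_i > 0$ survive in the solution; these are exactly the support vectors.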
Soft margin
Everything above assumed the data are perfectly linearly separable. When they are not, for instance because of noise, we introduce slack variables (a relaxation factor).
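With slack variables $\xi_i \ge 0$, the standard soft-margin objective becomes (written out for completeness):

$\min_{w,b,\xi}\ \frac{1}{2}\lVert w \rVert^2 + C\sum_{i=1}^{N} \xi_i \quad \text{s.t.}\ \ y_i(w \cdot x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0$

The hyperparameter $C$ trades margin width against violations; it is the same C passed to sklearn's SVC in the experiment below, where C=float('inf') reproduces the hard margin.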
Experiment
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn import datasets

iris = datasets.load_iris()
# Keep all samples, but only the features at index 2 and 3
# (petal length and petal width)
X = iris['data'][:, (2, 3)]
y = iris['target']
# Keep two classes only (setosa and versicolor)
setosa_or_versicolor = (y == 0) | (y == 1)
X = X[setosa_or_versicolor]
y = y[setosa_or_versicolor]
# C=inf forbids any margin violation, i.e. a hard margin
svm_clf = SVC(kernel='linear', C=float('inf'))
svm_clf.fit(X, y)

# 200 evenly spaced points from 0 to 5.5
x0 = np.linspace(0, 5.5, 200)
# three arbitrary candidate lines
pred_1 = 5 * x0 - 20
pred_2 = x0 - 1.8
pred_3 = 0.1 * x0 + 0.5

def plot_svc_decision_boundary(svm_clf, xmin, xmax, sv=True):
    # weights
    w = svm_clf.coef_[0]
    # bias
    b = svm_clf.intercept_[0]
    x0 = np.linspace(xmin, xmax, 200)
    # on the boundary: w0*x0 + w1*x1 + b = 0  =>  x1 = -w0/w1 * x0 - b/w1
    decision_boundary = -w[0] / w[1] * x0 - b / w[1]
    margin = 1 / w[1]
    # upper margin line
    gutter_up = decision_boundary + margin
    # lower margin line
    gutter_down = decision_boundary - margin
    if sv:
        # highlight the support vectors
        svs = svm_clf.support_vectors_
        plt.scatter(svs[:, 0], svs[:, 1], s=180, facecolors='#FFAAAA')
    plt.plot(x0, decision_boundary, 'k-', linewidth=2)
    plt.plot(x0, gutter_up, 'k--', linewidth=2)
    plt.plot(x0, gutter_down, 'k--', linewidth=2)

plt.figure(figsize=(14, 4))
plt.subplot(121)
# blue and yellow data points
plt.plot(X[:, 0][y == 1], X[:, 1][y == 1], 'bs')
plt.plot(X[:, 0][y == 0], X[:, 1][y == 0], 'ys')
# the three candidate lines
plt.plot(x0, pred_1, 'g--', linewidth=2)
plt.plot(x0, pred_2, 'm--', linewidth=2)
plt.plot(x0, pred_3, 'r--', linewidth=2)
plt.axis([0, 5.5, 0, 2])

plt.subplot(122)
plot_svc_decision_boundary(svm_clf, 0, 5.5)
plt.plot(X[:, 0][y == 1], X[:, 1][y == 1], 'bs')
plt.plot(X[:, 0][y == 0], X[:, 1][y == 0], 'ys')
plt.axis([0, 5.5, 0, 2])
plt.show()
[Figure: left panel, the two classes with the three arbitrary candidate lines; right panel, the SVM decision boundary with its margins and highlighted support vectors]