Machine learning: support vector machines (SVM)
2022-07-03 20:01:00 【Tc. Xiaohao】
The SVM is a very elegant algorithm with a complete mathematical theory behind it. Although it is not used much in industry these days, I decided to spend some time writing an article to sort it out.
1. Support vector machines
1.0 Brief introduction
A support vector machine (SVM) is a binary classification model. Its basic form is the linear classifier with the largest margin in feature space; maximizing the margin is what distinguishes it from the perceptron. Combined with the kernel trick, the SVM becomes, in effect, a non-linear classifier. The SVM's learning strategy is margin maximization, which can be formalized as a convex quadratic programming problem and is also equivalent to minimizing a regularized hinge loss. The SVM's learning algorithm is therefore an optimization algorithm for convex quadratic programming.
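As a compact statement of the hinge-loss view just mentioned, the standard regularized hinge-loss objective (textbook form, quoted here for reference rather than taken from the original text) is

$\min_{w,b} \; \sum_{i=1}^{N} \max\bigl(0, \, 1 - y_i (w \cdot x_i + b)\bigr) + \lambda \lVert w \rVert^2$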
1.1 Algorithmic idea
Here is a story to help understand it:
A brave man, in order to save the princess, confronted the demon king (the boss). The demon king set him a test: pass it, and the princess would be returned. On the table lie some red balls and some blue balls, and the brave man must separate them with a stick, with one requirement: even after more balls are added later, the separation must still work.
This can be solved by continually adjusting the position of the stick, keeping the gap between the balls on both sides as large as possible.
Then the demon king raised the difficulty: he scattered the two kinds of balls so that no straight stick could separate them.
This did not trouble our brave man. He slapped the table hard, sending the balls flying into the air, then quickly grabbed a sheet of paper and slid it between them, separating the two colors.
The demon king was stunned; he had never seen such a move. He conceded and returned the princess to the brave man.
Later, bored people gave these things names: the balls are the data, the stick is the classifier, finding the biggest gap is the optimization, slapping the table is the kernelling, and the paper is the hyperplane.
This example shows what an SVM does. If the data are linearly separable, a single straight line can separate them, and we only need to maximize the distance between the line and the nearest balls on each side; that is the optimization process.
When the data are not linearly separable, we use the table-slapping trick, i.e. the kernel transformation (kernel), mapping the data into a higher-dimensional space and separating it there with a hyperplane.
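A minimal sketch of this idea with scikit-learn; the make_moons data set, the RBF kernel, and the gamma value are my own illustrative choices, not from the original text:

from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-moons: not separable by a straight line in 2D
X, y = make_moons(n_samples=200, noise=0.1, random_state=42)

# A linear SVM underfits this data, while an RBF-kernel SVM separates it
linear_clf = SVC(kernel='linear').fit(X, y)
rbf_clf = SVC(kernel='rbf', gamma=2).fit(X, y)

print('linear accuracy:', linear_clf.score(X, y))
print('rbf accuracy:   ', rbf_clf.score(X, y))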

Decision boundary: the line in the middle of the figure, used to decide the classification
Support vectors: the data points on either side that lie closest to the decision boundary
Maximum margin: among all candidate boundaries, choose the one whose distance to its support vectors is largest; that is the optimal solution we are looking for
First consider two-dimensional space. Define the decision equation as $y = w \cdot x + b$, which corresponds to a hyperplane in $\mathbb{R}^n$, where $w$ is the normal vector of the hyperplane and $b$ is the intercept.
From the point-to-plane distance formula we get $\mathrm{distance} = \dfrac{1}{\lVert w \rVert} \lvert w \cdot x_i + b \rvert$
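A tiny numeric check of this formula (the normal vector, intercept, and point are made-up values for illustration):

import numpy as np

w = np.array([3.0, 4.0])    # normal vector of the hyperplane
b = -12.0                   # intercept
x_i = np.array([2.0, 1.0])  # a sample point

# distance = |w·x_i + b| / ||w||
distance = abs(w @ x_i + b) / np.linalg.norm(w)
print(distance)  # |3*2 + 4*1 - 12| / 5 = 0.4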

Before the derivation, some definitions. Suppose we are given a training data set on the feature space

$T = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$

where $x_i \in \mathbb{R}^n$ is the $i$-th feature vector and $y_i \in \{+1, -1\}$ is the class label: $+1$ marks a positive example and $-1$ a negative one. Assume the training data set is linearly separable.
Geometric margin: for a given data set $T$ and hyperplane $w \cdot x + b = 0$, the geometric margin of the hyperplane with respect to a sample point $(x_i, y_i)$ is defined as

$\gamma_i = y_i \left( \dfrac{w}{\lVert w \rVert} \cdot x_i + \dfrac{b}{\lVert w \rVert} \right)$
Because we want the maximum margin, the optimization objective is: find a line (its $w$ and $b$) such that the points closest to the line are as far from it as possible,

$\arg\max_{w,b} \left[ \dfrac{1}{\lVert w \rVert} \min_i \, y_i (w \cdot x_i + b) \right]$

Here the outer argmax maximizes the distance, and the inner min picks out the nearest points, i.e. the support vectors. Since $w$ and $b$ can be rescaled freely without changing the hyperplane, we may fix the functional margin of the support vectors to $y_i (w \cdot x_i + b) = 1$, so all that is left is
The current objective:

$\max_{w,b} \dfrac{1}{\lVert w \rVert}$

Because learning algorithms conventionally minimize a loss, we convert the problem (the routine trick: a maximization problem becomes a minimization problem). For the convenience of differentiation we multiply by $\frac{1}{2}$ and square the norm; squaring also serves differentiation, since the 2-norm $\lVert w \rVert$ carries a square root:

$\min_{w,b} \dfrac{1}{2} \lVert w \rVert^2$
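Writing out the constraints as well, namely that every training point must lie on the correct side with functional margin at least 1, gives the standard hard-margin primal problem (textbook form, spelled out here for completeness):

$\min_{w,b} \dfrac{1}{2} \lVert w \rVert^2$
s.t. $y_i (w \cdot x_i + b) \ge 1, \quad i = 1, 2, \dots, N$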
Solution: apply the Lagrange method
The problem has been transformed into a convex quadratic program, so it can be solved by the Lagrange dual method. The premise is that our objective function must be convex, because only convex functions guarantee that a global optimum exists; this is the connection to convex optimization, and our objective here is indeed convex. Optimization problems are usually one of the following types:
1) Unconstrained: the objective is $\min f(x)$ with no constraints; apply Fermat's lemma directly, setting the derivative of the objective to zero and solving for the extremum.
2) Equality constraints: when there are equality constraints $h_j(x) = 0,\ j = 1, 2, \dots, m$, use the Lagrange multiplier method: introduce a multiplier vector and combine it with $f(x)$ to construct a new objective function to solve.
3) Inequality constraints: when there are inequality constraints $g_i(x) \le 0,\ i = 1, 2, \dots, n$, and possibly equality constraints $h_j(x) = 0,\ j = 1, 2, \dots, m$ as well, again construct a new objective from these constraints and multiplier vectors; the KKT conditions then give necessary conditions for the optimum.
【KKT conditions】
The derivative of the Lagrangian with respect to $x$ is 0
$h_j(x) = 0$
$g_i(x) \le 0$
and, for each inequality constraint, a multiplier $\mu_i \ge 0$ with complementary slackness $\mu_i \, g_i(x) = 0$
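Applied to the hard-margin primal above, the Lagrangian and the resulting dual take their standard textbook form (reproduced here for completeness; the $\alpha_i$ are the Lagrange multipliers):

$L(w, b, \alpha) = \dfrac{1}{2} \lVert w \rVert^2 - \sum_{i=1}^{N} \alpha_i \bigl[ y_i (w \cdot x_i + b) - 1 \bigr], \quad \alpha_i \ge 0$

Setting $\partial L / \partial w = 0$ gives $w = \sum_{i=1}^{N} \alpha_i y_i x_i$, and $\partial L / \partial b = 0$ gives $\sum_{i=1}^{N} \alpha_i y_i = 0$. Substituting these back yields the dual problem

$\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i - \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$
s.t. $\alpha_i \ge 0, \quad \sum_{i=1}^{N} \alpha_i y_i = 0$

Replacing the inner product $x_i \cdot x_j$ with a kernel $K(x_i, x_j)$ is exactly where the kernel trick from section 1.1 enters.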

Soft margin
Everything above assumes the data are completely linearly separable. When they are not, for example because of noise, we introduce a relaxation (slack) factor for each sample.
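With slack variables $\xi_i \ge 0$ and a penalty coefficient $C$, the standard soft-margin primal (textbook form) is:

$\min_{w,b,\xi} \dfrac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{N} \xi_i$
s.t. $y_i (w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, 2, \dots, N$

The larger $C$ is, the more harshly margin violations are punished; letting $C \to \infty$ recovers the hard margin, which is why the experiment below passes C=float('inf').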
Experimental part
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn import datasets

iris = datasets.load_iris()
# All samples, keeping only petal length and petal width (columns 2 and 3)
X = iris['data'][:, (2, 3)]
y = iris['target']
# Keep only two classes: setosa (0) and versicolor (1)
setosa_or_versicolor = (y == 0) | (y == 1)
X = X[setosa_or_versicolor]
y = y[setosa_or_versicolor]
# C=inf forbids any margin violation, i.e. a hard margin
svm_clf = SVC(kernel='linear', C=float('inf'))
svm_clf.fit(X, y)

# 200 evenly spaced x-values from 0 to 5.5
x0 = np.linspace(0, 5.5, 200)
# Three hand-picked candidate separating lines for comparison
pred_1 = 5 * x0 - 20
pred_2 = x0 - 1.8
pred_3 = 0.1 * x0 + 0.5

def plot_svc_decision_boundary(svm_clf, xmin, xmax, sv=True):
    # Weights and bias of the fitted classifier
    w = svm_clf.coef_[0]
    b = svm_clf.intercept_[0]
    # On the boundary w0*x0 + w1*x1 + b = 0, so x1 = -w0/w1*x0 - b/w1
    x0 = np.linspace(xmin, xmax, 200)
    decision_boundary = -w[0] / w[1] * x0 - b / w[1]
    # Vertical offset from the boundary to each margin gutter
    margin = 1 / w[1]
    gutter_up = decision_boundary + margin
    gutter_down = decision_boundary - margin
    if sv:
        # Highlight the support vectors
        svs = svm_clf.support_vectors_
        plt.scatter(svs[:, 0], svs[:, 1], s=180, facecolors='#FFAAAA')
    plt.plot(x0, decision_boundary, 'k-', linewidth=2)
    plt.plot(x0, gutter_up, 'k--', linewidth=2)
    plt.plot(x0, gutter_down, 'k--', linewidth=2)

plt.figure(figsize=(14, 4))

plt.subplot(121)
# Blue and yellow squares for the two classes
plt.plot(X[:, 0][y == 1], X[:, 1][y == 1], 'bs')
plt.plot(X[:, 0][y == 0], X[:, 1][y == 0], 'ys')
# The three candidate lines
plt.plot(x0, pred_1, 'g--', linewidth=2)
plt.plot(x0, pred_2, 'm--', linewidth=2)
plt.plot(x0, pred_3, 'r--', linewidth=2)
plt.axis([0, 5.5, 0, 2])

plt.subplot(122)
plot_svc_decision_boundary(svm_clf, 0, 5.5)
plt.plot(X[:, 0][y == 1], X[:, 1][y == 1], 'bs')
plt.plot(X[:, 0][y == 0], X[:, 1][y == 0], 'ys')
plt.axis([0, 5.5, 0, 2])

plt.show()
The effect is as follows: the left panel shows the three hand-picked candidate lines, and the right panel shows the SVM decision boundary with its margin gutters and highlighted support vectors.
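As a quick sanity check of the fitted model (the sample values are made up for illustration):

# Petal length 5.0 and petal width 1.5 -> predicted class label
print(svm_clf.predict([[5.0, 1.5]]))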