Machine learning: support vector machine (SVM)
2022-07-03 20:01:00 【Tc. Xiaohao】
SVM is a very elegant algorithm with a complete mathematical theory. Although it is not used much in industry nowadays, I decided to spend some time writing an article to sort it out.
1. Support vector machines
1.0 Brief introduction
A support vector machine (SVM) is a binary classification model. Its basic form is the linear classifier with the largest margin defined in the feature space; the maximum margin is what distinguishes it from the perceptron. Combined with the kernel trick, the SVM becomes, in essence, a non-linear classifier. The SVM learning strategy is margin maximization, which can be formalized as a convex quadratic programming problem and is also equivalent to minimizing a regularized hinge loss. The SVM learning algorithm is an optimization algorithm for solving this convex quadratic program.
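For reference, the regularized hinge-loss formulation mentioned above can be written out explicitly (a standard form, added here for completeness):

$\min_{w,b}\ \sum_{i=1}^{N} \max\bigl(0,\ 1 - y_i(w \cdot x_i + b)\bigr) + \lambda \lVert w \rVert^2$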
1.1 The idea of the algorithm
Here is a story to build the intuition:
A hero, on his way to rescue the princess, meets the demon king (the boss). The demon king sets him a test: pass it, and the princess will be returned. On a table lie some red balls and blue balls, and the hero must separate them with a stick, with one requirement: after more balls are placed later, the separation should still hold.
The hero solves this by repeatedly adjusting the stick's position so that the gap between the stick and the nearest balls on both sides is as large as possible.
The demon king then upgrades the challenge: he scatters the two kinds of balls so that no straight stick can possibly separate them.
This does not stump our hero. He slaps the table hard, sending the balls into the air, then swiftly slides a sheet of paper between them, separating the two colors.
The demon king is stunned: "So that move exists too? I concede." And the princess is returned to the hero.
Later, bored people called the balls data, the stick a classifier, the trick of finding the biggest gap optimization, slapping the table kernelling, and the sheet of paper a hyperplane.
This story illustrates what an SVM does. If the data are linearly separable, a single straight line can split them apart; we then only need to maximize the distance from the line to the nearest balls on each side, which is the optimization step.
When we run into linearly inseparable data, we use the table-slapping trick, which corresponds to a kernel transformation (kernel), and then classify the data with a hyperplane in the higher-dimensional space.
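As a minimal sketch of the kernel idea in scikit-learn (my illustration; the dataset and hyperparameters here are assumptions, not from the original post):

from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-moons: impossible to separate with a straight line
X, y = make_moons(n_samples=200, noise=0.15, random_state=42)
# An RBF kernel implicitly maps the points to a higher-dimensional space,
# where a separating hyperplane exists (the "slap the table" trick)
rbf_clf = SVC(kernel='rbf', gamma=2, C=1)
rbf_clf.fit(X, y)
print(rbf_clf.score(X, y))  # training accuracy, close to 1.0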
Decision boundary: the line in the middle of the figure, used to decide the classification
Support vectors: the data points on either side that lie closest to the decision boundary
Maximum margin: pick the boundary whose support vectors are farthest from it; that is the optimal solution we are looking for
First consider two-dimensional space. The decision function is defined as $y = wx + b$, which corresponds to a hyperplane $S$ in $R^n$, where $w$ is the normal vector of the hyperplane and $b$ is the intercept.
From the point-to-plane distance formula we get

$distance = \frac{1}{\lVert w \rVert}\,\lvert wx_i + b \rvert$
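A quick numerical check of this distance formula (a sketch; the vectors below are made-up values):

import numpy as np

w = np.array([2.0, 1.0])    # hyperplane normal vector
b = -3.0                    # intercept
x_i = np.array([1.0, 4.0])  # a sample point
# distance = |w . x_i + b| / ||w||
dist = abs(np.dot(w, x_i) + b) / np.linalg.norm(w)
print(dist)  # 3.0 / sqrt(5) ≈ 1.342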
Before the derivation, let's start with some definitions. Suppose we are given a training data set on the feature space

$T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$

where $x_i$ is the $i$-th feature vector and $y_i$ is the class label: a sample is positive when $y_i = +1$ and negative when $y_i = -1$. Assume the training data set is linearly separable.
Geometric margin: for a given data set $T$ and hyperplane $w \cdot x + b = 0$, the geometric margin of the hyperplane with respect to a sample point $(x_i, y_i)$ is defined as

$\gamma_i = y_i \left( \frac{w}{\lVert w \rVert} \cdot x_i + \frac{b}{\lVert w \rVert} \right)$
Since we want the maximum margin, the optimization objective is to find a line ($w$ and $b$) that makes the points closest to the line as far from it as possible:

$\arg\max_{w,b} \left[ \frac{1}{\lVert w \rVert} \min_i \bigl( y_i (w x_i + b) \bigr) \right]$

The outer argmax maximizes the distance; the inner min picks out the nearest points, the support vectors. Since $w$ and $b$ can be rescaled so that $y_i(wx_i + b) = 1$ on the support vectors, all that is left is
the current goal:

$\max_{w,b} \frac{1}{\lVert w \rVert}$

Because learning algorithms conventionally minimize a loss, we transform the problem, using the routine trick of turning a maximization into a minimization. For ease of differentiation we multiply by $\frac{1}{2}$ and square the norm; the squaring is also for differentiation, since $\lVert w \rVert$ is the 2-norm and carries a square root:

$\min_{w,b} \frac{1}{2}\lVert w \rVert^2$
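Putting the objective and the support-vector constraint together gives the primal problem (standard form, written out here for completeness):

$\min_{w,b}\ \frac{1}{2}\lVert w \rVert^2 \quad \text{s.t.}\ \ y_i(w \cdot x_i + b) \ge 1,\quad i = 1, 2, \ldots, N$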
Solving: apply the Lagrangian
The problem has been converted into a convex quadratic program, so it can be solved with the Lagrangian dual method. The precondition is that the objective function is convex, because only for convex problems is a global optimum guaranteed; this connects to convex optimization, and our constraint set here is convex. Optimization problems that seek an optimal solution usually fall into the following types:
1) Unconstrained: minimize $f(x)$ with no constraints; by Fermat's lemma, set the derivative of the objective to zero and solve for the extrema directly.
2) Equality constraints: with equality constraints $h_j(x) = 0,\ j = 1, 2, \ldots, m$, use the method of Lagrange multipliers: introduce a multiplier vector, combine it with $f(x)$ into a new objective function, and solve $\min f(x)$ (a tiny worked example follows this list).
3) Inequality constraints: with inequality constraints $g_i(x) \le 0,\ i = 1, 2, \ldots, n$, and possibly equality constraints $h_j(x) = 0,\ j = 1, 2, \ldots, m$ as well, we again construct a new objective function from the constraints and multiplier vectors; the KKT conditions then give the necessary conditions for the optimum.
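A tiny worked example of case 2 (my illustration, not from the original): minimize $f(x, y) = x^2 + y^2$ subject to $h(x, y) = x + y - 1 = 0$. The Lagrangian is $L = x^2 + y^2 + \lambda(x + y - 1)$; setting $\partial L/\partial x = 2x + \lambda = 0$, $\partial L/\partial y = 2y + \lambda = 0$ and enforcing $x + y = 1$ gives $x = y = \frac{1}{2}$.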
[KKT conditions]
1) Stationarity: the derivative of the Lagrangian with respect to $x$ is 0
2) Primal feasibility: $h_i(x) = 0$ and $g_j(x) \le 0$
3) Dual feasibility: the multipliers $\mu_j$ on the inequality constraints satisfy $\mu_j \ge 0$
4) Complementary slackness: $\mu_j\, g_j(x) = 0$
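For the hard-margin SVM itself, the Lagrangian and its dual take the standard form below (written out for completeness; the $\alpha_i \ge 0$ are the multipliers):

$L(w, b, \alpha) = \frac{1}{2}\lVert w \rVert^2 - \sum_{i=1}^{N} \alpha_i \bigl[ y_i(w \cdot x_i + b) - 1 \bigr]$

Setting $\partial L/\partial w = 0$ and $\partial L/\partial b = 0$ gives $w = \sum_i \alpha_i y_i x_i$ and $\sum_i \alpha_i y_i = 0$; substituting back yields the dual problem

$\max_{\alpha}\ \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \quad \text{s.t.}\ \ \alpha_i \ge 0,\ \ \sum_{i=1}^{N} \alpha_i y_i = 0$

Only the points with $\alpha_i > 0$ survive in the solution; these are exactly the support vectors.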
Soft margin
Everything above assumed the data are perfectly linearly separable. When they are not, for instance because of noise, we introduce slack variables (a relaxation factor).
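With slack variables $\xi_i \ge 0$, the standard soft-margin objective becomes (written out for completeness):

$\min_{w,b,\xi}\ \frac{1}{2}\lVert w \rVert^2 + C\sum_{i=1}^{N} \xi_i \quad \text{s.t.}\ \ y_i(w \cdot x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0$

The hyperparameter $C$ trades margin width against violations; it is the same C passed to sklearn's SVC in the experiment below, where C=float('inf') reproduces the hard margin.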
Experiment
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn import datasets

iris = datasets.load_iris()
# Keep all samples, but only the features at index 2 and 3
# (petal length and petal width)
X = iris['data'][:, (2, 3)]
y = iris['target']
# Keep two classes only (setosa and versicolor)
setosa_or_versicolor = (y == 0) | (y == 1)
X = X[setosa_or_versicolor]
y = y[setosa_or_versicolor]
# C=inf forbids any margin violation, i.e. a hard margin
svm_clf = SVC(kernel='linear', C=float('inf'))
svm_clf.fit(X, y)

# 200 evenly spaced points from 0 to 5.5
x0 = np.linspace(0, 5.5, 200)
# three arbitrary candidate lines
pred_1 = 5 * x0 - 20
pred_2 = x0 - 1.8
pred_3 = 0.1 * x0 + 0.5

def plot_svc_decision_boundary(svm_clf, xmin, xmax, sv=True):
    # weights
    w = svm_clf.coef_[0]
    # bias
    b = svm_clf.intercept_[0]
    x0 = np.linspace(xmin, xmax, 200)
    # on the boundary: w0*x0 + w1*x1 + b = 0  =>  x1 = -w0/w1 * x0 - b/w1
    decision_boundary = -w[0] / w[1] * x0 - b / w[1]
    margin = 1 / w[1]
    # upper margin line
    gutter_up = decision_boundary + margin
    # lower margin line
    gutter_down = decision_boundary - margin
    if sv:
        # highlight the support vectors
        svs = svm_clf.support_vectors_
        plt.scatter(svs[:, 0], svs[:, 1], s=180, facecolors='#FFAAAA')
    plt.plot(x0, decision_boundary, 'k-', linewidth=2)
    plt.plot(x0, gutter_up, 'k--', linewidth=2)
    plt.plot(x0, gutter_down, 'k--', linewidth=2)

plt.figure(figsize=(14, 4))
plt.subplot(121)
# blue and yellow data points
plt.plot(X[:, 0][y == 1], X[:, 1][y == 1], 'bs')
plt.plot(X[:, 0][y == 0], X[:, 1][y == 0], 'ys')
# the three candidate lines
plt.plot(x0, pred_1, 'g--', linewidth=2)
plt.plot(x0, pred_2, 'm--', linewidth=2)
plt.plot(x0, pred_3, 'r--', linewidth=2)
plt.axis([0, 5.5, 0, 2])

plt.subplot(122)
plot_svc_decision_boundary(svm_clf, 0, 5.5)
plt.plot(X[:, 0][y == 1], X[:, 1][y == 1], 'bs')
plt.plot(X[:, 0][y == 0], X[:, 1][y == 0], 'ys')
plt.axis([0, 5.5, 0, 2])
plt.show()
[Figure: left panel, the two classes with the three arbitrary candidate lines; right panel, the SVM decision boundary with its margins and highlighted support vectors]