Support Vector Machine (SVM) + Source Code
2022-06-11 04:48:00 【Panbohhhhh】
Disclaimer: this is a personal study note, limited by my time and ability, and is for reference only. Discussion is welcome.
The support vector machine (SVM) is a powerful discriminative model for classification and regression.
First we revisit mapping features to higher-dimensional spaces.
Then we discuss how the support vector machine mitigates the computational and generalization problems that arise when learning from data mapped to a high-dimensional space.
1.1 Kernels and the kernel trick
The perceptron uses a hyperplane as its decision boundary to separate positive instances from negative instances.
The decision boundary is given by:
$f(x) = \langle w, x \rangle + b$
Predictions are made with:
$h(x) = \mathrm{sign}(f(x))$
This is called the primal form.
In support vector machines we instead work with the dual form:
$f(x) = \langle w, x \rangle + b = \sum_i a_i y_i \langle x_i, x \rangle + b$
The key difference between the primal and dual forms is that the primal form computes the inner product of the model parameters with the test instance's feature vector, whereas the dual form computes the inner products of the training instances' feature vectors with the test instance's feature vector.
It is exactly this property that lets us handle data that is not linearly separable.
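To make the difference concrete, here is a minimal sketch (my own toy data and variable names, not from the original post) that fits a linear SVM with scikit-learn and checks that the primal decision function $\langle w, x \rangle + b$ agrees with the dual form $\sum_i a_i y_i \langle x_i, x \rangle + b$; scikit-learn stores the products $a_i y_i$ for the support vectors in dual_coef_.
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (illustrative only)
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],
              [6.0, 5.0], [7.0, 8.0], [8.0, 6.0]])
y = np.array([-1, -1, -1, 1, 1, 1])
clf = SVC(kernel='linear', C=1.0).fit(X, y)

x_new = np.array([4.0, 4.0])

# Primal form: f(x) = <w, x> + b
f_primal = clf.coef_[0] @ x_new + clf.intercept_[0]

# Dual form: f(x) = sum_i a_i * y_i * <x_i, x> + b, summed over the support vectors
f_dual = clf.dual_coef_[0] @ (clf.support_vectors_ @ x_new) + clf.intercept_[0]

print(f_primal, f_dual)                 # the two values agree
print(clf.decision_function([x_new]))   # and match scikit-learn's own output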
First, let us formalize the idea of mapping features to a higher-dimensional space.
We map the features into a space in which they are linearly related to the response variable. The mapping increases the number of features, for example by creating quadratic terms from the original features, and these synthetic features allow a linear model to represent a nonlinear relationship.
In general, a mapping is written as:
$x \rightarrow \phi(x), \qquad \phi: \mathbb{R}^{d} \rightarrow \mathbb{R}^{D}$
The effect is illustrated in the figures below [1]:

In the second picture, the green plane perfectly separates the purple points from the red ones: the data has become separable.
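As an illustrative sketch (my own toy example, not from the source post), one-dimensional data that cannot be split by a single threshold becomes linearly separable after the simple mapping $\phi(x) = (x, x^2)$:
import numpy as np
from sklearn.svm import SVC

# 1-D data: negative instances in the middle, positive instances on both sides,
# so no single threshold on x can separate the classes
x = np.array([-3.0, -2.5, -2.0, -0.5, 0.0, 0.5, 2.0, 2.5, 3.0]).reshape(-1, 1)
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1])

# Explicit mapping phi(x) = (x, x^2): in 2-D the classes can be split by a line
X_mapped = np.hstack([x, x ** 2])

clf = SVC(kernel='linear', C=1000.0).fit(X_mapped, y)
print(clf.score(X_mapped, y))   # 1.0 -- perfectly separable in the mapped space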
Returning to the dual form of the decision boundary, notice that the feature vectors only ever appear inside dot products. We can map the data to a higher-dimensional space by applying the mapping to the feature vectors:
$f(x) = \sum_i a_i y_i \langle x_i, x \rangle + b$
$f(x) = \sum_i a_i y_i \langle \phi(x_i), \phi(x) \rangle + b$
As noted above, the mapping lets us represent more complex models, but it also introduces computational and generalization problems.
In particular, mapping the feature vectors and then computing their dot products requires a great deal of processing power.
Note, however, that even after mapping to a higher-dimensional space, the feature vectors still appear only inside dot products. The result of a dot product is a scalar, and once that scalar has been computed we no longer need the mapped feature vectors.
If we could obtain that same scalar by a different route, without ever computing the mapped vectors, the problem would become much easier.
The kernel trick
A kernel is exactly such a function: given only the original feature vectors, it returns the same value as the dot product of their mapped counterparts. The kernel does not explicitly map the feature vectors to the higher-dimensional space, nor does it compute the dot product of the mapped vectors.
Instead, the kernel produces the same value through a different sequence of operations that can be computed efficiently.
A kernel is defined as follows:
$K(x, z) = \langle \phi(x), \phi(z) \rangle$
Let us verify this with a concrete mapping.
Suppose we have two feature vectors $x$ and $z$:
$x = (x_1, x_2), \qquad z = (z_1, z_2)$
Suppose we want to map the feature vectors into a higher-dimensional space via:
$\phi(x) = (x_1^2,\; x_2^2,\; \sqrt{2}\,x_1 x_2)$
As described earlier, the dot product of the mapped feature vectors is:
$\langle \phi(x), \phi(z) \rangle = \langle (x_1^2,\; x_2^2,\; \sqrt{2}\,x_1 x_2),\; (z_1^2,\; z_2^2,\; \sqrt{2}\,z_1 z_2) \rangle$
The following kernel function produces a value equal to the dot product of the mapped feature vectors:
$K(x, z) = \langle x, z \rangle^{2} = (x_1 z_1 + x_2 z_2)^{2} = x_1^2 z_1^2 + 2 x_1 z_1 x_2 z_2 + x_2^2 z_2^2$
$K(x, z) = \langle \phi(x), \phi(z) \rangle$
To make the formula concrete, here is a small worked example of my own (written in plain arithmetic rather than with the formula editor, for convenience):
x = (4, 9), z = (3, 3)
$K(x, z) = \langle x, z \rangle^{2} = (4 \cdot 3 + 9 \cdot 3)^{2} = 4^2 \cdot 3^2 + 2 \cdot 4 \cdot 3 \cdot 9 \cdot 3 + 9^2 \cdot 3^2 = 144 + 648 + 729 = 1521$
$\langle \phi(x), \phi(z) \rangle = \langle (4^2,\; 9^2,\; \sqrt{2} \cdot 4 \cdot 9),\; (3^2,\; 3^2,\; \sqrt{2} \cdot 3 \cdot 3) \rangle = 144 + 729 + 648 = 1521$
The kernel function $K(x, z)$ produces the same value as the dot product of the mapped vectors $\langle \phi(x), \phi(z) \rangle$, yet it never explicitly maps the feature vectors into the high-dimensional space, and it requires far fewer arithmetic operations.
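The same check can be done in a few lines of NumPy (a minimal sketch using the toy vectors from above):
import numpy as np

x = np.array([4.0, 9.0])
z = np.array([3.0, 3.0])

# Explicit mapping phi(v) = (v1^2, v2^2, sqrt(2) * v1 * v2)
def phi(v):
    return np.array([v[0] ** 2, v[1] ** 2, np.sqrt(2) * v[0] * v[1]])

explicit = phi(x) @ phi(z)    # dot product computed in the mapped 3-D space
kernel = (x @ z) ** 2         # kernel trick: K(x, z) = <x, z>^2, no mapping needed

print(explicit, kernel)       # both print 1521.0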
In this example only two-dimensional vectors were involved, yet even datasets with a small number of features can generate mapped feature spaces of enormous dimension.
The scikit-learn library provides several commonly used kernel functions:
Polynomial kernel: $K(x, z) = (\gamma \langle x, z \rangle + \delta)^{k}$
The quadratic kernel, that is, the polynomial kernel with $k$ equal to 2, is commonly used in natural language processing.
Sigmoid kernel: $K(x, z) = \tanh(\gamma \langle x, z \rangle + \delta)$, where $\gamma$ and $\delta$ are hyperparameters that can be tuned with cross-validation.
For problems that call for a nonlinear model, the Gaussian kernel is a good first choice. The Gaussian kernel is a radial basis function; a decision boundary that is a hyperplane in the mapped feature space corresponds to a decision boundary resembling a hypersphere in the original space.
The feature space produced by the Gaussian kernel can have infinitely many dimensions, something no explicit mapping can achieve.
The Gaussian kernel is defined as:
$K(x, z) = \exp(-\gamma \lVert x - z \rVert^{2})$
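As a rough correspondence between these formulas and scikit-learn's SVC parameters (gamma plays the role of $\gamma$, coef0 of $\delta$, degree of $k$; the hyperparameter values below are arbitrary placeholders of my own):
from sklearn.svm import SVC

# Polynomial kernel: K(x, z) = (gamma * <x, z> + coef0) ** degree
poly_svm = SVC(kernel='poly', degree=2, gamma=1.0, coef0=1.0)

# Sigmoid kernel: K(x, z) = tanh(gamma * <x, z> + coef0)
sigmoid_svm = SVC(kernel='sigmoid', gamma=0.1, coef0=0.0)

# Gaussian (RBF) kernel: K(x, z) = exp(-gamma * ||x - z||^2)
rbf_svm = SVC(kernel='rbf', gamma=0.1)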
Feature scaling is important whenever you use an SVM, and it matters especially when using the Gaussian kernel.
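For example, a minimal sketch that puts a scaler in front of an RBF-kernel SVM (my own illustration; the dataset and hyperparameters are arbitrary):
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=11)

# Standardize features to zero mean and unit variance before the RBF-kernel SVM
model = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', SVC(kernel='rbf', gamma='scale', C=1.0)),
])
model.fit(X_train, y_train)
print(model.score(X_test, y_test))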
Choosing a kernel function is difficult.
Ideally, a kernel should measure the similarity between instances in a way that is useful for the task at hand.
Although kernels are most often used with SVMs, they can be combined with any model that can be expressed in terms of the dot product of two feature vectors,
including but not limited to logistic regression, the perceptron, and principal component analysis (PCA). (I will follow up on these later.)
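For instance, scikit-learn's KernelPCA applies the same idea to PCA; a brief sketch of my own (this topic is deferred to a later post):
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Two concentric circles: not linearly separable in the original 2-D space
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Kernel PCA with an RBF kernel projects the data into a space where
# the two rings can be separated by a straight line
kpca = KernelPCA(n_components=2, kernel='rbf', gamma=10)
X_kpca = kpca.fit_transform(X)
print(X_kpca.shape)   # (400, 2)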
1.2 Maximum-margin classification and support vectors
Again, let us start from a concrete example.
The three lines in the figure are three classifiers, each of which separates the two classes of samples.
Intuitively, the red line looks best.
But as engineers we want rigor and an argument, not just intuition.

Suppose the line in the figure is the decision boundary of a logistic regression classifier. The instance labeled A is far from the decision boundary, so it is predicted to belong to the positive class with high probability. The instance labeled B is also predicted to be positive, but the closer an instance lies to the decision boundary, the lower the probability that it is predicted positive.
Finally, the instance labeled C is predicted positive with very low probability; even a tiny change in the training set could flip its predicted class.
How do we address this?
The most confident predictions are those for training instances far from the decision boundary, so we can use the functional margin to assess the confidence of a prediction.
The functional margin of the training set is defined as:
$\mathrm{funct} = \min_i\, y_i f(x_i), \qquad f(x) = \langle w, x \rangle + b$
Instance A has a large functional margin, while instance C's functional margin is very small; if C were misclassified, its functional margin would be negative.
Instances whose functional margin equals 1 are called support vectors. (Why? A question I still need to work out.)
Related to the functional margin is the geometric margin, the maximum width of the band that separates the support vectors. The geometric margin equals the normalized functional margin: since the functional margin can be scaled arbitrarily through $w$,
it has to be normalized.
When $w$ is a unit vector, the geometric margin equals the functional margin.
The parameters that maximize the geometric margin can be found by solving the following constrained optimization problem:
$\min_{w, b}\ \frac{1}{2}\langle w, w \rangle \quad \text{subject to} \quad y_i(\langle w, x_i \rangle + b) \geq 1$
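A minimal sketch (my own toy example) that fits an approximately hard-margin linear SVM and inspects the functional margins $y_i(\langle w, x_i \rangle + b)$; the support vectors are exactly the instances whose functional margin is about 1, and the geometric margin is $1 / \lVert w \rVert$:
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.5],
              [5.0, 5.0], [6.0, 4.5], [5.5, 6.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel='linear', C=1e6).fit(X, y)   # very large C approximates a hard margin
w, b = clf.coef_[0], clf.intercept_[0]

functional_margins = y * (X @ w + b)
print(functional_margins)          # support vectors have values close to 1
print(clf.support_)                # indices of the support vectors
print(1.0 / np.linalg.norm(w))     # geometric margin: distance from the boundary to the closest instance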
(Open questions to work out:)
One property of the SVM: the constrained optimization problem is convex, so a local minimum is also the global minimum. (How is this proved?)
Finding the parameters that maximize the geometric margin is a quadratic programming problem, usually solved with the sequential minimal optimization (SMO) algorithm. (Not yet studied.)
A worked example: classifying handwritten digits
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from sklearn import datasets

# Load the MNIST dataset (note: fetch_mldata has since been removed from
# scikit-learn; see the fetch_openml sketch after this block)
mnist = datasets.fetch_mldata("mnist-original", data_home='/home/tianchi/myspace/datapanbo/mnist')

# Display 15 sample images from the (sorted) dataset in a 3 x 5 grid
counter = 1
for i in range(1, 4):
    for j in range(1, 6):
        plt.subplot(3, 5, counter)
        plt.imshow(mnist.data[(i - 1) * 8000 + j].reshape((28, 28)), cmap=cm.Greys_r)
        plt.axis('off')
        counter += 1
plt.show()
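Note that fetch_mldata was removed from recent scikit-learn releases. If the call above fails, the following sketch loads the same data through fetch_openml (assuming a scikit-learn version recent enough to provide it; 'mnist_784' is OpenML's name for this dataset):
from sklearn.datasets import fetch_openml

# The same 70,000 x 784 digit data, served from openml.org instead of mldata.org
mnist = fetch_openml('mnist_784', version=1, as_frame=False,
                     data_home='/home/tianchi/myspace/datapanbo/mnist')
# mnist.data and mnist.target can then be used exactly as in the code below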

from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import classification_report

X, y = mnist.data, mnist.target
X = X / 255.0 * 2 - 1    # rescale pixel values from [0, 255] to [-1, 1]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=11)

pipeline = Pipeline([
    ('clf', SVC(kernel='rbf', gamma=0.01, C=100)),
])
parameters = {
    'clf__gamma': (0.01, 0.03, 0.1, 0.3, 1),
    'clf__C': (0.1, 0.3, 1, 3, 10, 30),
}
# Grid search over gamma and C, scored by accuracy
grid_search = GridSearchCV(pipeline, parameters, n_jobs=2, verbose=1, scoring='accuracy')
grid_search.fit(X_train[:100], y_train[:100])

A note here: since the goal was only to verify that the code runs end to end, I used just the first 100 training samples.
print('Best score: %0.3f' % grid_search.best_score_)
print('Best parameter set:')
best_parameters = grid_search.best_estimator_.get_params()
for param_name in sorted(parameters.keys()):
    print('\t%s: %r' % (param_name, best_parameters[param_name]))

predictions = grid_search.predict(X_test)
print(classification_report(y_test, predictions))

For reference, running this on 10,000 samples took about an hour (63 minutes) on my machine.
I will upload the code and the dataset to Git.
GIT Code address
References
[1] Kernel functions: https://blog.csdn.net/zb1165048017/article/details/49280979