Machine learning from entry to re-entry: a re-understanding of SVM
2022-06-12 07:19:00 【Nidiya is trying】
Let me start with some rambling. I've wanted to start something like a tech blog for a long time, but I felt I had no technology worth writing down, and on top of that came wave after wave of procrastination, so I never began. Recently it occurred to me that blogging might help me organize my knowledge and keep a record of my learning; add to that an intermittent burst of AI enthusiasm, and I finally decided to open a blog to record what I've been learning lately: AI techniques, or whatever else catches my interest. Updates will probably be irregular, but I'll try to push myself to post in time. (Pep talk to self: pull yourself together! Stand up straight!) Of course, as a rookie I'm bound to have plenty of gaps and mistakes; I hope you'll point them out gently, and I'll take your suggestions to heart~~~
Here we go ~~~
The support vector machine is a common machine learning interview question, and I've been grilled on it in interviews more than once!!! I had studied it, but not deeply enough, and my foundations weren't solid, so now, as a first-year graduate student, I'm taking the course again. A few days ago I worked through several points I never understood before, combining what the teacher covered in class with what other experts have written up (this is mainly my personal understanding; I won't cover the formula derivations here). And so this little rookie's first blog post was born.
1. Development background
1963: the original SVM was proposed;
1992: the kernel function was introduced, solving the nonlinear classification problem;
1995: soft-margin maximization was proposed, addressing the noise problem (misclassified points);
2001: support vector clustering (SVC) was proposed as a way to perform unsupervised learning;
2005: a variety of multi-class SVM methods were proposed and compared.
2. Core idea
While ensuring correct classification, make the points closest to the hyperplane (these points are called support vectors) as far from the hyperplane as possible. In other words, maximize the margin between the hyperplane and the support vectors; this gives the best guarantee on classification accuracy.
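To make this concrete, here is a minimal sketch with scikit-learn (toy data of my own choosing; a large C approximates the hard margin): the points closest to the separating hyperplane come out as the fitted model's support vectors, and the margin width is 2/||w||.

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data: two clusters in 2-D.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=-2, size=(20, 2)), rng.normal(loc=2, size=(20, 2))])
y = np.array([-1] * 20 + [1] * 20)

# A very large C approximates the hard-margin SVM: maximize the gap
# while classifying every training point correctly.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# The points closest to the hyperplane are the support vectors;
# the margin width is 2 / ||w||.
w = clf.coef_[0]
print("support vectors:\n", clf.support_vectors_)
print("margin width:", 2 / np.linalg.norm(w))
```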
3. Comparison with the perceptron model
We all know that the perceptron is a somewhat dated (I don't mean it's bad) binary linear classification model, and it is the foundation of both neural networks and support vector machines. Its idea: try to find a hyperplane that separates the dataset. One thing to note: its loss function involves only the misclassified points!!! Its optimization objective is to minimize the distance from the misclassified points to the hyperplane, while the SVM's objective is to maximize the distance from the support vectors to the hyperplane. (Keep these notions of "maximum" and "minimum" straight; once you do, it isn't hard to understand.)
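To see the difference concretely, here is a tiny sketch using the standard definitions: perceptron loss max(0, -m) and hinge loss max(0, 1 - m), where m = y(w·x+b) is the margin of a point.

```python
import numpy as np

def perceptron_loss(margin):
    # Perceptron: only misclassified points (margin < 0) incur loss.
    return np.maximum(0.0, -margin)

def hinge_loss(margin):
    # SVM: even correctly classified points inside the margin
    # (0 <= margin < 1) still incur loss.
    return np.maximum(0.0, 1.0 - margin)

for m in [-0.5, 0.2, 1.5]:  # margin = y * (w . x + b)
    print(m, perceptron_loss(m), hinge_loss(m))
# -0.5 -> both penalize; 0.2 -> only the hinge penalizes; 1.5 -> neither does
```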
4. The SVM optimization objective
When I studied SVM as an undergraduate, I saw many different explanations of its optimization objective and was thoroughly confused. Since the final exam didn't cover it, I muddled through the course, never dug into it afterwards, and so stayed confused right up to graduation. The main source of confusion: the objective function sometimes appears in the form

$$\min_{w,b}\ \frac{1}{2}\lVert w\rVert^2 \quad \text{s.t.}\quad y_i(w\cdot x_i+b)\ge 1,\ i=1,\dots,N,$$

and other times in the form

$$\min_{w,b}\ \sum_{i=1}^{N}\big[1-y_i(w\cdot x_i+b)\big]_+ + \lambda\lVert w\rVert^2.$$

Recently the confusion finally lifted: what those explanations are doing, quietly, is introducing empirical risk and structural risk.
1. Empirical risk: the empirical risk is the average of the loss function over all sample points in the training set, and empirical risk minimization makes it as small as possible. If only the empirical risk is considered, overfitting can occur, as in neural networks.
The loss function used in the support vector machine has the form $[1-y_i(w\cdot x_i+b)]_+$, where $[z]_+=\max(0,z)$ is called the hinge loss function. When a sample point $(x_i,y_i)$ is correctly classified and $y_i(w\cdot x_i+b)\ge 1$, the loss is 0.
If you want to know how well a model $f$ predicts all the samples in the training set, just compute the loss of every sample point once and add them up:

$$R_{\mathrm{emp}}(f)=\sum_{i=1}^{N}\big[1-y_i(w\cdot x_i+b)\big]_+,$$

where $[\cdot]_+$ is the hinge loss function and $R_{\mathrm{emp}}$ is the empirical risk. The smaller the empirical risk, the better the model $f$ fits the training set. The so-called empirical risk minimization is to minimize this expression.
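A minimal sketch of this formula (hypothetical toy numbers of my own): only the points whose margin falls below 1 contribute to the empirical risk.

```python
import numpy as np

def empirical_hinge_risk(w, b, X, y):
    """Sum of hinge losses over the training set, as in the formula above."""
    margins = y * (X @ w + b)
    return np.sum(np.maximum(0.0, 1.0 - margins))

# Hypothetical toy values, just to exercise the formula.
X = np.array([[2.0, 1.0], [-1.0, -1.5], [0.2, 0.1]])
y = np.array([1, -1, 1])
w, b = np.array([1.0, 1.0]), 0.0
print(empirical_hinge_risk(w, b, X, y))  # 0.7: only the near-margin point contributes
```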
2. Structural risk: adding a regularization term (penalty term) to the empirical risk gives the structural risk. (We need to strike a balance between training error and model complexity, to prevent overfitting and reduce the generalization error; hence the regularization term.) The optimization objective then becomes:

$$\min_{w,b}\ \sum_{i=1}^{N}\big[1-y_i(w\cdot x_i+b)\big]_+ + \lambda\lVert w\rVert^2.$$

This structural risk is equivalent to the original SVM optimization objective

$$\min_{w,b,\xi}\ \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{N}\xi_i \quad \text{s.t.}\quad y_i(w\cdot x_i+b)\ge 1-\xi_i,\ \xi_i\ge 0,$$

with $\lambda = 1/(2C)$.
(For the proof, see Li Hang's Statistical Learning Methods, p. 131.)
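The equivalence is easy to check numerically: with $\lambda = 1/(2C)$ and each $\xi_i$ taken to be the hinge loss (its value at the optimum), the two objectives differ only by the positive constant factor $C$, so they share the same minimizer. A quick sketch with toy values of my own:

```python
import numpy as np

# Hypothetical toy data and parameters, just to illustrate the identity.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
w, b, C = np.array([0.8, -0.3]), 0.1, 2.0

margins = y * (X @ w + b)
hinge = np.maximum(0.0, 1.0 - margins)      # xi_i at the optimum equals the hinge loss

primal = 0.5 * (w @ w) + C * hinge.sum()    # (1/2)||w||^2 + C * sum(xi)
lam = 1.0 / (2.0 * C)
structural = hinge.sum() + lam * (w @ w)    # sum of hinge losses + lambda * ||w||^2

# The two objectives differ only by the positive factor C,
# so they are minimized by the same (w, b).
print(primal, C * structural, np.isclose(primal, C * structural))
```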
【Addendum: after consulting more material, I learned that statistical learning theory shows a learning machine's actual risk is made up of two parts: the empirical risk on the training samples and a confidence interval. As I said above, the idea of the SVM is to find the separating hyperplane that correctly classifies the training data and has the largest geometric margin. The reason is that a correctly separating hyperplane keeps the empirical risk at its minimum, while the maximum margin keeps the confidence interval at its minimum; together they yield a smaller actual risk and thus the optimal structural risk. This trade-off between empirical risk and the confidence interval gives good generalization ability. Therefore, the support vector machine is a classifier designed under the structural risk minimization principle.】
(On the relationship between empirical risk, structural risk, and the confidence interval, see "The way of machine learning (7): support vector machines" on asd2479745295's CSDN blog.)
5. Why it's called a support vector machine
We know that after introducing the Lagrange multipliers $\alpha_i$, the weight vector can be expressed as

$$w=\sum_{i=1}^{N}\alpha_i y_i x_i.$$

From the KKT conditions, when $y_i(w\cdot x_i+b)>1$ we have $\alpha_i=0$. So for the samples that are correctly classified with room to spare (the sample points in the blue circles in pic1 below, lying outside the margin boundaries on both sides), the weight they contribute to $w$ is 0. That is, non-support-vector samples do not affect the value of $w$; $w$ depends only on the support vectors. Hence the name support vector machine.
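This is directly visible in scikit-learn (a sketch on toy data of my own): `dual_coef_` stores $\alpha_i y_i$ for the support vectors only, and $w$ reconstructed from the support vectors alone matches the fitted `coef_`.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (30, 2)), rng.normal(2, 1, (30, 2))])
y = np.array([-1] * 30 + [1] * 30)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# dual_coef_ holds alpha_i * y_i for the support vectors only;
# every other sample has alpha_i = 0 and is simply not stored.
print("number of training points:", len(X))
print("number of support vectors:", len(clf.support_))

# Reconstruct w from the support vectors alone: w = sum(alpha_i * y_i * x_i).
w_from_sv = clf.dual_coef_ @ clf.support_vectors_
print(np.allclose(w_from_sv, clf.coef_))  # True: w depends only on the SVs
```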

6. Kernel functions
1. Purpose: everyone is familiar with what the kernel function does: it "maps" samples into a high-dimensional space (this is not an actual explicit mapping; point 2 below explains), so that samples which are not linearly separable in the low-dimensional feature space become linearly separable in the high-dimensional feature space.
2. Principle: so how does it work? Notice that both the SVM optimization function and the prediction function contain terms of the form $x_i\cdot x_j$, the inner product of two vectors. If $\phi(\cdot)$ denotes a mapping from the low-dimensional space to a high-dimensional space, then a sample $x$ becomes $\phi(x)$ after mapping, and the inner products in the optimization and prediction functions become $\phi(x_i)\cdot\phi(x_j)$. But computing such inner products directly in the high-dimensional space would be far too expensive, and so the kernel function was born. Define the kernel function

$$K(x_i,x_j)=\phi(x_i)\cdot\phi(x_j).$$

Its meaning: the inner product of $x_i$ and $x_j$ in the high-dimensional feature space equals the result of evaluating the function $K$ on them in the original sample space. In other words, the kernel function is simply a cheap way of computing the inner product after mapping to the high-dimensional space. That is why the kernel is computed in the low dimension, while the classification effect shows up in the high dimension.
I had always thought of the kernel function as the mapping that sends features into the high-dimensional space; only here did I realize that its essence is a shortcut for computing, directly in the low-dimensional space, the inner product of vectors in the high-dimensional space.
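A minimal sketch of that point, using the polynomial kernel $K(x,z)=(x\cdot z)^2$ in 2-D, whose explicit feature map is $\phi(x)=(x_1^2,\ \sqrt{2}\,x_1x_2,\ x_2^2)$: the two computation routes give the same number.

```python
import numpy as np

def phi(x):
    """Explicit feature map of the degree-2 polynomial kernel into 3-D space."""
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def poly_kernel(x, z):
    """The same inner product, evaluated entirely in the original 2-D space."""
    return (x @ z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

# Same number, two routes: explicit high-dimensional inner product
# vs. a single cheap low-dimensional computation.
print(phi(x) @ phi(z))    # 16.0
print(poly_kernel(x, z))  # 16.0
```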
That's all for my partial understanding of support vector machines. As for the other concepts, such as the functional margin, the geometric margin, the dual problem, and why the number 1 keeps appearing in these formulas, the watermelon book explains them in detail, so I won't repeat them. All in all, there is nothing technically novel in this article; the main goal is to be easy to understand~
Suggestions and comments are very welcome!!!!!
Malicious criticism not accepted; fine, you're right!
See you in the next article~~~
------------------------ Updated 2022-01-09 ------------------------
7. Semi-supervised support vector machines
1. Semi-supervised learning: let the learner automatically exploit unlabeled samples on its own to improve learning performance. It can be understood as adding unlabeled samples to the supervised SVM classification algorithm to achieve semi-supervised classification.
2. Algorithm idea: iteratively search for the solution of the optimization objective

$$\min_{w,b,\hat{y},\xi}\ \frac{1}{2}\lVert w\rVert^2 + C_l\sum_{i=1}^{l}\xi_i + C_u\sum_{i=l+1}^{m}\xi_i$$

$$\text{s.t.}\quad y_i(w\cdot x_i+b)\ge 1-\xi_i\ (i=1,\dots,l),\quad \hat{y}_i(w\cdot x_i+b)\ge 1-\xi_i\ (i=l+1,\dots,m),\quad \xi_i\ge 0,$$

where the first group of slack variables corresponds to the labeled samples, the second group (together with the pseudo-labels $\hat{y}_i$) corresponds to the unlabeled samples, and $C_l$, $C_u$ are parameters that balance the influence of labeled and unlabeled data on the result. (To give the labeled samples a larger say in solving for the hyperplane, since the predicted pseudo-labels can deviate substantially from the true values, $C_u$ should be initialized to a value much smaller than $C_l$.)
3. Algorithm steps: use the labeled samples to train an initial SVM → use this SVM to predict the unlabeled data, taking the predictions as pseudo-labels → plug them into the objective above and solve for the hyperplane and the slack vector → find two unlabeled samples that are assigned to different classes (the product of their labels is negative) and are both likely to be wrong (the sum of their slack variables is greater than 2), and swap their labels → solve the objective again for the separating hyperplane and the slack vector → gradually increase $C_u$ until $C_u = C_l$, then stop.
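Below is a rough sketch of this loop (my own simplification, not a reference implementation): labels are assumed to be in {-1, +1}, and scikit-learn's SVC with per-sample weights stands in for the $C_l$/$C_u$-weighted objective, which a real TSVM would solve directly.

```python
import numpy as np
from sklearn.svm import SVC

def tsvm_sketch(X_l, y_l, X_u, Cl=1.0, Cu=0.0001):
    """Minimal sketch of the TSVM label-swapping loop described above.
    Assumes binary labels in {-1, +1}; sample_weight plays the role
    of the C_l / C_u weights in the objective."""
    clf = SVC(C=1.0, kernel="linear")
    clf.fit(X_l, y_l)                # initial SVM from labeled data only
    y_u = clf.predict(X_u)           # predictions become pseudo-labels
    X = np.vstack([X_l, X_u])
    while True:
        weights = np.r_[np.full(len(X_l), Cl), np.full(len(X_u), Cu)]
        clf.fit(X, np.r_[y_l, y_u], sample_weight=weights)
        while True:
            # Slack of each unlabeled point: xi_i = max(0, 1 - y_i * f(x_i)).
            xi = np.maximum(0.0, 1.0 - y_u * clf.decision_function(X_u))
            # Find a pair with opposite pseudo-labels, both probably wrong
            # (xi_i > 0, xi_j > 0, xi_i + xi_j > 2), and swap their labels.
            pairs = [(i, j) for i in np.where((y_u > 0) & (xi > 0))[0]
                            for j in np.where((y_u < 0) & (xi > 0))[0]
                            if xi[i] + xi[j] > 2.0]
            if not pairs:
                break
            i, j = pairs[0]
            y_u[i], y_u[j] = -y_u[i], -y_u[j]  # exchange the two suspect labels
            clf.fit(X, np.r_[y_l, y_u], sample_weight=weights)
        if Cu >= Cl:                 # stop once unlabeled weight reaches C_l
            break
        Cu = min(2.0 * Cu, Cl)       # gradually increase C_u
    return clf, y_u
```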
4. Possible improvements: because the unlabeled samples have to be adjusted pair by pair, the amount of computation is very large, so there is also a line of research on optimizing this step.