Summary of Andrew Ng's Machine Learning Course (11): Support Vector Machines

2022-06-28 00:11:00 【51CTO】

12.1 Optimization objective

(1) Below are the logistic regression hypothesis and its cost function for a single sample.

[Figure: logistic regression hypothesis and single-sample cost function]

(2) First, replace the two logistic cost curves with the purple piecewise-linear lines in the figure above (called cost1 and cost0). Next, drop the 1/m factor that comes from the number of samples m. Finally, use C in place of 1/λ (a useful way to think about C, although the two are not exactly equivalent). These steps turn the cost function of logistic regression into that of the SVM.

[Figure: SVM cost function]
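As a rough sketch of item (2), the two replacement costs and the resulting objective fit in a few lines of Python. The hinge forms `max(0, 1 - z)` and `max(0, 1 + z)` are an assumption (the course draws the purple lines without fixing their slope), and the function names are illustrative rather than taken from the course.

```python
def cost1(z):
    """Cost for a positive example (y = 1): zero once z >= 1."""
    return max(0.0, 1.0 - z)


def cost0(z):
    """Cost for a negative example (y = 0): zero once z <= -1."""
    return max(0.0, 1.0 + z)


def svm_objective(theta, X, y, C):
    """C * sum of per-sample costs + 1/2 * sum of theta_j^2 for j >= 1.

    theta[0] is the intercept and is not regularized; each row of X
    is a feature vector without the leading 1.
    """
    data_term = 0.0
    for x_i, y_i in zip(X, y):
        z = theta[0] + sum(t * v for t, v in zip(theta[1:], x_i))
        data_term += y_i * cost1(z) + (1 - y_i) * cost0(z)
    reg_term = 0.5 * sum(t * t for t in theta[1:])
    return C * data_term + reg_term
```

Note that there is no 1/m factor and that C multiplies the data term instead of λ multiplying the regularization term, which mirrors the steps described above.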

(3) The SVM no longer outputs a probability the way logistic regression does. Its output is simply 0 or 1:

[Figure: SVM hypothesis]
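In the course's usual notation, items (2) and (3) are commonly written as follows. This is a reconstruction from the surrounding text, since the original figures are not available; the symbols cost_1, cost_0, C and θ are the ones used above.

```latex
\min_{\theta}\; C \sum_{i=1}^{m} \left[ y^{(i)}\,\mathrm{cost}_1\!\left(\theta^{T} x^{(i)}\right) + \left(1 - y^{(i)}\right)\mathrm{cost}_0\!\left(\theta^{T} x^{(i)}\right) \right] + \frac{1}{2} \sum_{j=1}^{n} \theta_j^{2}

h_{\theta}(x) =
\begin{cases}
1 & \text{if } \theta^{T} x \ge 0 \\
0 & \text{otherwise}
\end{cases}
```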

12.2 Large margin intuition

(1) First, the requirement on z is stricter. Logistic regression only requires z to be greater than or less than zero, whereas the SVM requires z to be greater than or equal to 1 (for y = 1) or less than or equal to -1 (for y = 0).

[Figure: cost1 and cost0 plotted against z]

(2) Suppose C is very large. The optimization will then try to make the first term zero. Assuming parameters that achieve this can be found, the cost function reduces to:

[Figure: simplified objective with its constraints]

That is, minimize the regularization term subject to the constraints on z from item (1).
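Written out, the reduced problem from item (2) is the following constrained minimization. This is a reconstruction in the course's usual notation, not a copy of the original figure.

```latex
\min_{\theta}\; \frac{1}{2} \sum_{j=1}^{n} \theta_j^{2}
\quad \text{s.t.} \quad
\begin{cases}
\theta^{T} x^{(i)} \ge 1 & \text{if } y^{(i)} = 1 \\
\theta^{T} x^{(i)} \le -1 & \text{if } y^{(i)} = 0
\end{cases}
```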

(3) When C is very large (that is, λ is very small), the optimizer tries hard to satisfy the constraints above, which makes the result very sensitive to outliers (overfitting), as shown below:

[Figure: decision boundaries with and without an outlier]

With a very large C you get the purple line. If C is reduced appropriately, you get the more satisfactory black line. In other words, when C is not too large, some outliers can be ignored.

C is the penalty factor. It can be understood as the weight that balances two goals in the optimization (margin size and classification accuracy), that is, the tolerance for errors. The larger C is, the less errors are tolerated and the easier it is to overfit; the smaller C is, the easier it is to underfit. If C is too large or too small, generalization suffers.

(4) Support vector machines are often called large margin classifiers. This holds when C is very large; when C is not so large it does not, as the example above shows. Still, the large margin picture is helpful for understanding the SVM.

(5) A larger C is equivalent to a smaller λ, which leads to overfitting; conversely, a smaller C leads to underfitting.
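The effect of C on outliers can be checked by hand on a toy problem. The sketch below compares only two hand-picked candidate boundaries rather than solving the optimization, and it assumes hinge-style costs `max(0, 1 - z)` and `max(0, 1 + z)`; the data and parameter values are made up for illustration.

```python
def objective(theta0, theta1, xs, ys, C):
    """SVM objective for 1-D inputs with hinge-style cost1/cost0 (assumed form)."""
    data_term = 0.0
    for x, y in zip(xs, ys):
        z = theta0 + theta1 * x
        if y == 1:
            data_term += max(0.0, 1.0 - z)
        else:
            data_term += max(0.0, 1.0 + z)
    return C * data_term + 0.5 * theta1 ** 2


# Toy data: negatives at -3 and -2, positives at 2 and 3,
# plus one positive outlier at -1 sitting next to the negatives.
xs = [-3.0, -2.0, 2.0, 3.0, -1.0]
ys = [0, 0, 1, 1, 1]

wide = (0.0, 0.5)   # boundary at x = 0, wide margin, misses the outlier
tight = (3.0, 2.0)  # boundary at x = -1.5, squeezed in to fit the outlier


def preferred(C):
    """Return which of the two candidate boundaries has the lower objective."""
    j_wide = objective(wide[0], wide[1], xs, ys, C)
    j_tight = objective(tight[0], tight[1], xs, ys, C)
    return "wide" if j_wide < j_tight else "tight"


small_c_choice = preferred(0.1)    # tolerates the outlier, keeps the wide margin
large_c_choice = preferred(100.0)  # pays any margin price to fit the outlier
```

With a small C the misclassified outlier costs little, so the wide boundary wins; with a large C the same error dominates the objective and the boundary that bends around the outlier wins.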

12.3 The mathematics behind large margin classification (optional)

(1) Inner product of vectors: the inner product of u and v is the signed length of the projection of v onto u multiplied by the norm of u. Equivalently, multiply the corresponding coordinates and add them up.

[Figure: vector inner product as a projection]
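The two views of the inner product in item (1) can be compared numerically. This is a small illustration with arbitrary example vectors; the names `u`, `v` and `p` are chosen here for convenience.

```python
import math

u = (3.0, 4.0)
v = (2.0, 1.0)

# Coordinate form: multiply matching coordinates and add them up.
dot = sum(a * b for a, b in zip(u, v))

# Projection form: p is the signed length of v projected onto u,
# computed from the angle between the two vectors.
norm_u = math.hypot(*u)
norm_v = math.hypot(*v)
angle = math.atan2(v[1], v[0]) - math.atan2(u[1], u[0])
p = norm_v * math.cos(angle)

same = math.isclose(p * norm_u, dot)  # both forms agree up to rounding
```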

(2) The objective is to make the norm of θ as small as possible. Since θ^T x equals the projection of x onto θ times the norm of θ, making the projections of the samples onto θ as large as possible lets the constraints be satisfied with a smaller θ. This is the mathematics behind the SVM's large margin.

[Figure: projections of the samples onto θ]

(3) θ is perpendicular to the decision boundary (they meet at 90°). In addition, when θ₀ is zero the boundary passes through the origin; otherwise it does not.
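The argument in this section can be condensed into the following rewrite of the constraints, where p^(i) denotes the signed projection of x^(i) onto θ. This is a reconstruction in the course's usual notation, taking θ₀ = 0 for simplicity, not a copy of the original figures.

```latex
\theta^{T} x^{(i)} = p^{(i)} \,\lVert \theta \rVert
\quad\Longrightarrow\quad
\begin{cases}
p^{(i)} \,\lVert \theta \rVert \ge 1 & \text{if } y^{(i)} = 1 \\
p^{(i)} \,\lVert \theta \rVert \le -1 & \text{if } y^{(i)} = 0
\end{cases}
\qquad
\frac{1}{2} \sum_{j=1}^{n} \theta_j^{2} = \frac{1}{2} \lVert \theta \rVert^{2}
```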

12.4 Kernels 1

(1) If a polynomial is used directly to fit a boundary like the one below, it may need to be of very high degree, which means a large number of features.

[Figure: nonlinear decision boundary]

(2) Instead, pre-select landmarks l^(1), l^(2), l^(3) and use the similarity between x and each landmark as the new features f₁, f₂, f₃.

[Figure: Gaussian similarity between x and a landmark]

The similarity function above is the Gaussian kernel. Note: this function has nothing to do with the normal distribution; it just looks like it.
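A minimal sketch of the Gaussian kernel, assuming the standard form exp(-‖x - l‖² / (2σ²)) used in the course. The function name and the sample points are illustrative.

```python
import math


def gaussian_kernel(x, landmark, sigma):
    """Similarity exp(-||x - l||^2 / (2 * sigma^2)) between x and a landmark."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, landmark))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


landmark = (3.0, 5.0)
f_at_landmark = gaussian_kernel((3.0, 5.0), landmark, sigma=1.0)  # exactly 1.0
f_far_away = gaussian_kernel((10.0, -4.0), landmark, sigma=1.0)   # very close to 0
```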

(3) The closer x is to a landmark, the closer the corresponding f is to 1; the farther away, the closer f is to 0.

[Figure: f as a function of x around a landmark]

(4) With these features, classification becomes easy using the following formula:

[Figure: prediction rule based on f₁, f₂, f₃]

(5) The values computed by the kernel function are the new features.
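To make the prediction rule concrete, here is a small sketch that predicts 1 when θ₀ + θ₁f₁ + θ₂f₂ + θ₃f₃ ≥ 0. The landmark positions and the θ values are hypothetical, picked only so that points near the first two landmarks are classified as positive.

```python
import math


def similarity(x, landmark, sigma=1.0):
    """Gaussian similarity between a point and a landmark."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, landmark))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# Hypothetical landmarks and parameters, chosen only for illustration.
landmarks = [(1.0, 1.0), (4.0, 4.0), (8.0, 0.0)]
theta = [-0.5, 1.0, 1.0, 0.0]  # theta_0, theta_1, theta_2, theta_3


def predict(x):
    """Predict 1 when theta_0 + theta_1*f1 + theta_2*f2 + theta_3*f3 >= 0."""
    features = [similarity(x, lm) for lm in landmarks]
    z = theta[0] + sum(t * f for t, f in zip(theta[1:], features))
    return 1 if z >= 0 else 0


near_first = predict((1.0, 1.0))      # close to landmark 1
near_third = predict((8.0, 0.0))      # close to landmark 3, whose weight is 0
far_from_all = predict((20.0, 20.0))  # far from every landmark
```

Points close to a landmark with a positive weight push z above zero, while points far from all landmarks are left with only θ₀, which is negative here.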

12.5 Kernels 2

(1) The number of landmarks is set to the number of samples m; that is, the location of each training sample is used as a landmark:

[Figure: landmarks placed at the training samples]
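Using every training sample as a landmark turns each input into a feature vector of length m + 1. The sketch below assumes the Gaussian kernel and the usual convention of a constant feature f₀ = 1 for the intercept; the training points are hypothetical.

```python
import math


def gaussian_kernel(x, landmark, sigma):
    """Gaussian similarity between a point and a landmark."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, landmark))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_features(x, landmarks, sigma):
    """Map x to [f_0, f_1, ..., f_m] with f_0 = 1 and f_i = similarity(x, l^(i))."""
    return [1.0] + [gaussian_kernel(x, lm, sigma) for lm in landmarks]


# Hypothetical training inputs; each one doubles as a landmark.
X_train = [(0.0, 0.0), (1.0, 2.0), (3.0, 1.0)]
F_train = [kernel_features(x, X_train, sigma=1.0) for x in X_train]
```

Each sample has similarity 1 with its own landmark, so row i of `F_train` has a 1.0 in position i + 1.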

(2) Applying kernels to the support vector machine:

Given x, compute the new features f. When θ^Tf > 0, predict y = 1; otherwise predict y = 0.

The cost function is modified accordingly:

[Figure: SVM cost function in terms of the features f]
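In the course's usual notation, the modified cost function replaces x^(i) with f^(i), and the regularization sum runs over m parameters because there are now m features (n = m). This is a reconstruction, not a copy of the original figure.

```latex
\min_{\theta}\; C \sum_{i=1}^{m} \left[ y^{(i)}\,\mathrm{cost}_1\!\left(\theta^{T} f^{(i)}\right) + \left(1 - y^{(i)}\right)\mathrm{cost}_0\!\left(\theta^{T} f^{(i)}\right) \right] + \frac{1}{2} \sum_{j=1}^{m} \theta_j^{2}
```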

In a concrete implementation, the regularization term also needs a small adjustment: when computing the sum of θ_j², θ^TMθ is used instead of θ^Tθ, where M depends on the chosen kernel. In practice, use an existing library to train an SVM with a kernel.

An SVM without a kernel function is said to use a linear kernel.

(3) The influence of the two SVM parameters C and σ:

C = 1/λ;

When C is large (equivalent to a small λ), the model may overfit: high variance;

When C is small (equivalent to a large λ), the model may underfit: high bias;

When σ is large: possibly low variance, high bias.

When σ is small: possibly low bias, high variance.
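The role of σ is easy to see by evaluating the Gaussian kernel at a fixed distance from a landmark. This is an illustrative sketch assuming the form exp(-d² / (2σ²)); the distance and σ values are arbitrary.

```python
import math


def gaussian_kernel_1d(distance, sigma):
    """Gaussian similarity as a function of the distance to the landmark."""
    return math.exp(-distance ** 2 / (2.0 * sigma ** 2))


distance = 2.0
f_small_sigma = gaussian_kernel_1d(distance, sigma=0.5)  # drops off quickly
f_large_sigma = gaussian_kernel_1d(distance, sigma=3.0)  # drops off slowly
```

A large σ makes the features vary smoothly, so nearby and distant points look alike (higher bias); a small σ makes each landmark influence only its immediate neighborhood (higher variance).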

12.6 Using support vector machines

(1) You don't have to write the SVM solver yourself and can use an existing library directly, but a few things still need to be done:

1. Choose the parameter C. Its effect on bias and variance was discussed in the previous video.

2. Choose the kernel (similarity function) you want to use, along with its parameters.
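As one possible illustration of these two choices, the sketch below translates the course's (C, σ) into keyword arguments for scikit-learn's `SVC`. Picking scikit-learn is an assumption made here (the text only refers to existing libraries), and `rbf_svc_params`, `X_train` and `y_train` are hypothetical names.

```python
def rbf_svc_params(C, sigma):
    """Translate the course's (C, sigma) into keyword arguments for sklearn.svm.SVC.

    scikit-learn's RBF kernel is exp(-gamma * ||x - l||^2),
    so gamma = 1 / (2 * sigma^2) matches the Gaussian kernel above.
    """
    return {"C": C, "kernel": "rbf", "gamma": 1.0 / (2.0 * sigma ** 2)}


params = rbf_svc_params(C=1.0, sigma=0.5)

# With scikit-learn installed, these would be passed on like this:
#   from sklearn.svm import SVC
#   model = SVC(**params).fit(X_train, y_train)
```

For an SVM without a kernel, the same library accepts `kernel="linear"`, in which case no σ is needed.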

(2) Guidelines for choosing between logistic regression and support vector machines:

1. If the number of features n is much larger than the number of samples m, there is not enough data to train a very complex model; consider logistic regression or an SVM without a kernel.

2. If n is small and m is of medium size, for example n between 1 and 1000 and m between 10 and 10,000, use an SVM with a Gaussian kernel.

3. If n is small and m is large, for example n between 1 and 1000 and m greater than 50,000, an SVM with a Gaussian kernel is very slow. The solution is to create and add more features, then use logistic regression or an SVM without a kernel.

A neural network is likely to perform well in all three cases, but it can be very slow to train. A main reason for choosing a support vector machine is that its cost function is convex, so it has no local minima other than the global one.
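The rules of thumb above can be written as a small helper. This is only a rough encoding: treating "n much larger than m" as `n >= m`, and treating everything up to 50,000 samples as medium-sized, are simplifying assumptions made here, and `suggest_model` is a hypothetical name.

```python
def suggest_model(n_features, n_samples, large_m=50_000):
    """Rough encoding of the three rules of thumb; cutoffs are illustrative."""
    if n_features >= n_samples:
        # Case 1: many features relative to the data available.
        return "linear model: logistic regression or SVM without a kernel"
    if n_samples > large_m:
        # Case 3: few features, lots of data; a Gaussian kernel would be slow.
        return "add features, then logistic regression or SVM without a kernel"
    # Case 2: few features, moderate amount of data.
    return "SVM with Gaussian kernel"


choice_many_features = suggest_model(10_000, 500)
choice_medium_data = suggest_model(100, 5_000)
choice_big_data = suggest_model(100, 100_000)
```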

Author: Your Rego

The copyright of this article belongs to the author. Reprints are welcome, but unless the author agrees otherwise, a link to the original must be given on the article page; otherwise the author reserves the right to pursue legal liability.

Copyright notice
This article was created by [51CTO]. Please include a link to the original when reprinting.
https://yzsam.com/2022/179/202206272133224337.html