Machine learning notes - supervised learning cheat sheet
2022-06-13 06:36:00 【Sit and watch the clouds rise】
One 、 Introduction to supervised learning
Given a set of data points $\{x^{(1)},...,x^{(m)}\}$ associated with a set of outcomes $\{y^{(1)},...,y^{(m)}\}$, we want to build a classifier that learns how to predict $y$ from $x$.
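1、 Type of prediction
The different types of predictive models are summarized in the table below:
| | Regression | Classification |
| --- | --- | --- |
| Outcome | Continuous | Class |
| Examples | Linear regression | Logistic regression, SVM, Naive Bayes |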
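2、 Type of model
The different models are summarized in the table below:
| | Discriminative model | Generative model |
| --- | --- | --- |
| Goal | Directly estimate $P(y \mid x)$ | Estimate $P(x \mid y)$ to then deduce $P(y \mid x)$ |
| What's learned | Decision boundary | Probability distributions of the data |
| Examples | Regressions, SVMs | GDA, Naive Bayes |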
Two 、 Notations and general concepts
1、Hypothesis
The hypothesis is noted $h_\theta$ and is the model that we choose. For a given input data $x^{(i)}$, the model prediction output is $h_\theta(x^{(i)})$.
2、Loss function
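A loss function is a function $L:(z,y)\in\mathbb{R}\times Y\longmapsto L(z,y)\in\mathbb{R}$ that takes as inputs the predicted value $z$ corresponding to the real data value $y$ and outputs how different they are. The common loss functions are summarized in the table below:
| Loss | $L(z,y)$ | Typical model |
| --- | --- | --- |
| Least squared | $\frac{1}{2}(y-z)^2$ | Linear regression |
| Logistic | $\log(1+\exp(-yz))$ | Logistic regression |
| Hinge | $\max(0,1-yz)$ | SVM |
| Cross-entropy | $-\left[y\log(z)+(1-y)\log(1-z)\right]$ | Neural network |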
3、Cost function
The cost function $J$ is commonly used to assess the performance of a model, and is defined with the loss function $L$ as follows:
$$J(\theta)=\sum_{i=1}^mL(h_\theta(x^{(i)}),y^{(i)})$$
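A minimal Python sketch of the four losses in the table and of the cost $J$ as a sum of per-example losses. It assumes labels $y\in\{-1,1\}$ for the logistic and hinge losses and $y\in\{0,1\}$ with $z\in(0,1)$ for the cross-entropy; the function names and the toy data are illustrative.

```python
import math

# Losses L(z, y): z is the prediction, y the true value.
def least_squared(z, y):
    return 0.5 * (y - z) ** 2

def logistic(z, y):          # assumes y in {-1, 1}
    return math.log(1 + math.exp(-y * z))

def hinge(z, y):             # assumes y in {-1, 1}
    return max(0.0, 1 - y * z)

def cross_entropy(z, y):     # assumes y in {0, 1} and z in (0, 1)
    return -(y * math.log(z) + (1 - y) * math.log(1 - z))

def cost(h, xs, ys, loss):
    """J(theta) = sum_i L(h_theta(x_i), y_i) for a fixed hypothesis h."""
    return sum(loss(h(x), y) for x, y in zip(xs, ys))

# Hypothesis h(x) = 2x with the least-squared loss: only the last point is off by 1.
print(cost(lambda x: 2 * x, [1, 2, 3], [2, 4, 7], least_squared))  # 0.5
```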
4、Gradient descent
By noting $\alpha\in\mathbb{R}$ the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function $J$ as follows:
$$\theta\longleftarrow\theta-\alpha\nabla J(\theta)$$
Remark: stochastic gradient descent (SGD) updates the parameters based on each training example, whereas batch gradient descent updates them based on a batch of training examples.
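A small sketch contrasting the two update schedules on a one-parameter least-squares problem. The noiseless data generated by $y=3x$, the hypothesis $h_\theta(x)=\theta x$, the step size and the iteration counts are all assumptions chosen so both variants converge.

```python
import random

# Toy data from y = 3x (no noise); hypothesis h_theta(x) = theta * x.
xs = [0.5, 1.0, 1.5, 2.0]
ys = [3 * x for x in xs]

def grad(theta, batch):
    """Gradient of J(theta) = sum over batch of 1/2 (theta*x - y)^2."""
    return sum((theta * x - y) * x for x, y in batch)

def batch_gd(alpha=0.05, steps=200):
    theta, data = 0.0, list(zip(xs, ys))
    for _ in range(steps):
        theta -= alpha * grad(theta, data)         # one update per pass over all examples
    return theta

def sgd(alpha=0.05, epochs=200, seed=0):
    theta, data = 0.0, list(zip(xs, ys))
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(data)
        for example in data:
            theta -= alpha * grad(theta, [example])  # one update per training example
    return theta
```

Both functions should end close to $\theta=3$; SGD makes four updates per epoch instead of one.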
5、Likelihood
The likelihood of a model $L(\theta)$ given parameters $\theta$ is used to find the optimal parameters $\theta$ through likelihood maximization. We have:
$$\theta^{\text{opt}}=\underset{\theta}{\arg\max}\,L(\theta)$$
Remark: in practice, we use the log-likelihood $\ell(\theta)=\log(L(\theta))$, which is easier to optimize.
6、Newton's algorithm
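Newton's algorithm is a numerical method that finds $\theta$ such that $\ell'(\theta)=0$. Its update rule is as follows:
$$\theta\leftarrow\theta-\frac{\ell'(\theta)}{\ell''(\theta)}$$
Remark: the multidimensional generalization, also known as the Newton-Raphson method, has the following update rule:
$$\theta\leftarrow\theta-\left(\nabla_\theta^2\ell(\theta)\right)^{-1}\nabla_\theta\ell(\theta)$$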
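As a sketch of the one-dimensional rule, assume $k$ successes in $n$ Bernoulli trials, whose log-likelihood is $\ell(\theta)=k\log\theta+(n-k)\log(1-\theta)$ and whose maximizer is known to be $k/n$. The function name, starting point and tolerance are illustrative.

```python
def newton_bernoulli(k, n, theta=0.5, tol=1e-12, max_iter=50):
    """Maximize l(theta) = k log(theta) + (n - k) log(1 - theta) with Newton's method."""
    for _ in range(max_iter):
        d1 = k / theta - (n - k) / (1 - theta)            # l'(theta)
        d2 = -k / theta**2 - (n - k) / (1 - theta) ** 2   # l''(theta)
        step = d1 / d2
        theta -= step                                     # theta <- theta - l'/l''
        if abs(step) < tol:
            break
    return theta

print(newton_bernoulli(7, 10))  # close to the closed-form MLE k/n = 0.7
```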
Three 、 Linear models
1、 Linear regression
We assume here that $y|x;\theta\sim\mathcal{N}(\mu,\sigma^2)$.
(1)Normal equations
By noting $X$ the design matrix, the value of $\theta$ that minimizes the cost function has a closed-form solution such that:
$$\theta=(X^TX)^{-1}X^Ty$$
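A minimal sketch for a single feature plus an intercept, assuming the design matrix $X$ has rows $[1, x_i]$; the $2\times2$ matrix $X^TX$ is inverted by hand so no linear-algebra library is needed.

```python
def normal_equations(xs, ys):
    """theta = (X^T X)^{-1} X^T y for the design matrix X with rows [1, x_i]."""
    m = len(xs)
    sx, sxx = sum(xs), sum(x * x for x in xs)
    sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
    # X^T X = [[m, sx], [sx, sxx]] and X^T y = [sy, sxy].
    det = m * sxx - sx * sx
    theta0 = (sxx * sy - sx * sxy) / det
    theta1 = (m * sxy - sx * sy) / det
    return theta0, theta1

print(normal_equations([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0): the data lie on y = 1 + 2x
```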
(2)LMS algorithm
By noting $\alpha$ the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of $m$ data points, also known as the Widrow-Hoff learning rule, is as follows:
$$\forall j,\quad\theta_j\leftarrow\theta_j+\alpha\sum_{i=1}^m\left[y^{(i)}-h_\theta(x^{(i)})\right]x_j^{(i)}$$
Remark: the update rule is a particular case of gradient ascent.
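A sketch of the batch LMS rule above for $h_\theta(x)=\theta_0+\theta_1x$, treating $x_0^{(i)}=1$ as the intercept feature. The learning rate and iteration count are assumptions picked to be stable on the toy data, which lie exactly on $y=1+2x$.

```python
def lms(xs, ys, alpha=0.01, iters=5000):
    """Batch LMS (Widrow-Hoff) for h_theta(x) = theta0 + theta1 * x."""
    theta = [0.0, 0.0]
    for _ in range(iters):
        # Errors are computed once so both parameters are updated simultaneously.
        errs = [y - (theta[0] + theta[1] * x) for x, y in zip(xs, ys)]
        theta[0] += alpha * sum(errs)                           # x_0 = 1
        theta[1] += alpha * sum(e * x for e, x in zip(errs, xs))
    return theta
```

On `lms([0, 1, 2, 3], [1, 3, 5, 7])` it should approach the closed-form solution $(1, 2)$ given by the normal equations.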
(3)LWR
Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by $w^{(i)}(x)$, which is defined with parameter $\tau\in\mathbb{R}$ as:
$$w^{(i)}(x)=\exp\left(-\frac{(x^{(i)}-x)^2}{2\tau^2}\right)$$
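A minimal sketch of an LWR prediction at one query point, assuming 1-D inputs and a line $\theta_0+\theta_1x$ fitted by weighted least squares with the weights $w^{(i)}(x)$ above. A new fit is solved for every query point, which is what makes the method local.

```python
import math

def lwr_predict(x_query, xs, ys, tau=1.0):
    """Locally weighted linear regression at a single query point (1-D input)."""
    w = [math.exp(-((xi - x_query) ** 2) / (2 * tau**2)) for xi in xs]
    # Weighted normal equations for h(x) = theta0 + theta1 * x.
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, xs))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, xs))
    swy = sum(wi * yi for wi, yi in zip(w, ys))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, xs, ys))
    det = sw * swxx - swx * swx
    theta0 = (swxx * swy - swx * swxy) / det
    theta1 = (sw * swxy - swx * swy) / det
    return theta0 + theta1 * x_query
```

A smaller `tau` concentrates the weights on the nearest training points; on data that are exactly linear, every `tau` recovers the same line.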
2、 Classification and logistic regression
(1)Sigmoid function
The sigmoid function $g$, also known as the logistic function, is defined as follows:
$$\forall z\in\mathbb{R},\quad g(z)=\frac{1}{1+e^{-z}}\in]0,1[$$
(2)Logistic regression
We assume here that $y|x;\theta\sim\text{Bernoulli}(\phi)$. We have the following form:
$$\phi=p(y=1|x;\theta)=\frac{1}{1+\exp(-\theta^Tx)}=g(\theta^Tx)$$
Remark: there is no closed-form solution for logistic regression.
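Since there is no closed form, the parameters are found iteratively. This sketch uses gradient ascent on the log-likelihood, whose update has the same shape as the LMS rule with $h_\theta(x)=g(\theta_0+\theta_1x)$; the step size, iteration count and toy data are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, alpha=0.1, iters=2000):
    """Gradient ascent on the log-likelihood; labels y in {0, 1}, h(x) = g(theta0 + theta1 x)."""
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        errs = [y - sigmoid(theta0 + theta1 * x) for x, y in zip(xs, ys)]
        theta0 += alpha * sum(errs)
        theta1 += alpha * sum(e * x for e, x in zip(errs, xs))
    return theta0, theta1

xs = [-2, -1, -0.5, 0.5, 1, 2]
ys = [0, 0, 0, 1, 1, 1]
theta0, theta1 = fit_logistic(xs, ys)
```

A point is assigned to class 1 when $g(\theta^Tx)>0.5$, i.e. when $\theta^Tx>0$.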
(3)Softmax regression
A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set $\theta_K=0$, which makes the Bernoulli parameter $\phi_i$ of each class $i$ equal to:
$$\phi_i=\frac{\exp(\theta_i^Tx)}{\displaystyle\sum_{j=1}^K\exp(\theta_j^Tx)}$$
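A sketch of the class probabilities $\phi_i$ for $K=3$ classes, with the last parameter vector fixed to zero as in the convention above. Subtracting the largest score before exponentiating is a standard numerical-stability step that does not change the result; the parameter values are made up.

```python
import math

def softmax_probs(thetas, x):
    """phi_i = exp(theta_i^T x) / sum_j exp(theta_j^T x); thetas is a list of K weight vectors."""
    scores = [sum(t * xi for t, xi in zip(theta, x)) for theta in thetas]
    top = max(scores)                      # shift by the max to avoid overflow in exp
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# K = 3 classes; theta_K = 0 by convention.
thetas = [[1.0, 2.0], [0.5, -1.0], [0.0, 0.0]]
probs = softmax_probs(thetas, [1.0, 1.0])   # scores 3, -0.5 and 0
```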
3、 Generalized linear model
(1)Exponential family
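A class of distributions is said to belong to the exponential family if it can be written in terms of a natural parameter $\eta$ (also called the canonical parameter or link function), a sufficient statistic $T(y)$ and a log-partition function $a(\eta)$ as follows:
$$p(y;\eta)=b(y)\exp(\eta T(y)-a(\eta))$$
Remark: we will often have $T(y)=y$. Also, $\exp(-a(\eta))$ can be seen as a normalization parameter that makes sure the probabilities sum to 1.
The following table summarizes the most common exponential distributions:
| Distribution | $\eta$ | $T(y)$ | $a(\eta)$ | $b(y)$ |
| --- | --- | --- | --- | --- |
| Bernoulli | $\log\left(\frac{\phi}{1-\phi}\right)$ | $y$ | $\log(1+\exp(\eta))$ | $1$ |
| Gaussian | $\mu$ | $y$ | $\frac{\eta^2}{2}$ | $\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{y^2}{2}\right)$ |
| Poisson | $\log(\lambda)$ | $y$ | $e^{\eta}$ | $\frac{1}{y!}$ |
| Geometric | $\log(1-\phi)$ | $y$ | $\log\left(\frac{e^{\eta}}{1-e^{\eta}}\right)$ | $1$ |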
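A quick numerical check of the Bernoulli row: writing $p(y;\eta)=b(y)\exp(\eta T(y)-a(\eta))$ with the table's $\eta$, $T$, $a$ and $b$ should reproduce the usual form $\phi^y(1-\phi)^{1-y}$. The function names are illustrative.

```python
import math

def bernoulli_pmf(y, phi):
    return phi**y * (1 - phi) ** (1 - y)

def bernoulli_expfamily(y, phi):
    """b(y) exp(eta T(y) - a(eta)) with eta = log(phi/(1-phi)), T(y) = y,
    a(eta) = log(1 + e^eta) and b(y) = 1."""
    eta = math.log(phi / (1 - phi))
    a = math.log(1 + math.exp(eta))
    return 1.0 * math.exp(eta * y - a)

for phi in (0.1, 0.3, 0.8):
    for y in (0, 1):
        print(phi, y, bernoulli_pmf(y, phi), bernoulli_expfamily(y, phi))  # the two columns agree
```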
(2)Assumptions of GLMs
Generalized linear models (GLM) aim at predicting a random variable $y$ as a function of $x\in\mathbb{R}^{n+1}$ and rely on the following 3 assumptions:
(1) $y|x;\theta\sim\text{ExpFamily}(\eta)$
(2) $h_\theta(x)=E[y|x;\theta]$
(3) $\eta=\theta^Tx$
Remark: ordinary least squares and logistic regression are special cases of generalized linear models.
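A sketch of how the three assumptions produce a hypothesis. It relies on the standard exponential-family fact that, when $T(y)=y$, the mean is $E[y]=a'(\eta)$; combining it with $\eta=\theta^Tx$ gives the identity (linear regression), the sigmoid (logistic regression) and the exponential (Poisson regression). The dictionaries and `glm_predict` are illustrative names; the check compares each response with a numerical derivative of $a$.

```python
import math

# Log-partition functions a(eta), taken from the exponential-family table.
LOG_PARTITION = {
    "gaussian": lambda eta: eta**2 / 2,
    "bernoulli": lambda eta: math.log(1 + math.exp(eta)),
    "poisson": lambda eta: math.exp(eta),
}

# h_theta(x) = E[y | x; theta] = a'(eta) for each family.
RESPONSE = {
    "gaussian": lambda eta: eta,                        # linear regression
    "bernoulli": lambda eta: 1 / (1 + math.exp(-eta)),  # logistic regression
    "poisson": lambda eta: math.exp(eta),               # Poisson regression
}

def glm_predict(family, theta, x):
    eta = sum(t * xi for t, xi in zip(theta, x))        # assumption (3): eta = theta^T x
    return RESPONSE[family](eta)

# Central-difference check that each response really is a'(eta).
for fam in RESPONSE:
    h, eta = 1e-5, 0.3
    numeric = (LOG_PARTITION[fam](eta + h) - LOG_PARTITION[fam](eta - h)) / (2 * h)
    print(fam, numeric, RESPONSE[fam](eta))
```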
Four 、 Support vector machine
The goal of support vector machines is to find the line that maximizes the minimum distance from the data points to the line.
1、Optimal margin classifier
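The optimal margin classifier $h$ is such that:
$$h(x)=\text{sign}(w^Tx-b)$$
where $(w,b)\in\mathbb{R}^n\times\mathbb{R}$ is the solution of the following optimization problem:
$$\min\frac{1}{2}||w||^2\quad\text{such that}\quad y^{(i)}(w^Tx^{(i)}-b)\geqslant1$$
Remark: the decision boundary is defined as $w^Tx-b=0$.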
2、Hinge loss
The hinge loss is used in the setting of SVMs and is defined as follows:
$$L(z,y)=[1-yz]_+=\max(0,1-yz)$$
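Solving the optimal margin problem exactly needs a quadratic-programming solver. As a simpler, swapped-in technique, this sketch minimizes a soft-margin objective, $\frac{\lambda}{2}||w||^2+\frac{1}{m}\sum_i\max(0,1-y^{(i)}(w^Tx^{(i)}-b))$, by subgradient descent, and keeps the $\text{sign}(w^Tx-b)$ convention from above. The regularization weight `lam`, step size `lr`, epoch count and toy data are assumptions.

```python
def train_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on lam/2 ||w||^2 + (1/m) sum_i max(0, 1 - y_i (w.x_i - b))."""
    m, n = len(X), len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]   # gradient of the regularizer
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) - b)
            if margin < 1:            # hinge term is active for this example
                gw = [g - yi * xj / m for g, xj in zip(gw, xi)]
                gb += yi / m          # d/db of 1 - y(w.x - b) is +y
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) - b >= 0 else -1

X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -1], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_svm(X, y)
```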
3、Kernel
Given a feature mapping $\phi$, we define the kernel $K$ as follows:
$$K(x,z)=\phi(x)^T\phi(z)$$
In practice, the kernel $K$ defined by $K(x,z)=\exp\left(-\frac{||x-z||^2}{2\sigma^2}\right)$ is called the Gaussian kernel and is commonly used.
Remark: we say that we use the "kernel trick" to compute the cost function using the kernel, because we do not actually need to know the explicit mapping $\phi$, which is often very complicated. Instead, only the values $K(x,z)$ are needed.
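The degree-2 polynomial kernel $K(x,z)=(x^Tz)^2$ on $\mathbb{R}^2$ is used here as an assumed example because its feature map is small enough to write out: $\phi(x)=(x_1^2,\sqrt{2}x_1x_2,x_2^2)$. The sketch checks that the kernel gives the same value as the explicit inner product without ever building $\phi$, and also defines the Gaussian kernel from the text.

```python
import math

def phi(x):
    """Explicit feature map of the degree-2 polynomial kernel in 2-D."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

def poly_kernel(x, z):
    return (x[0] * z[0] + x[1] * z[1]) ** 2           # K(x, z) = (x^T z)^2, no phi needed

def gaussian_kernel(x, z, sigma=1.0):
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2 * sigma**2))

x, z = (1.0, 2.0), (3.0, -1.0)
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))  # equals poly_kernel(x, z) up to rounding
```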
4、Lagrangian
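For the problem of minimizing $f(w)$ subject to the equality constraints $h_i(w)=0$, $i=1,...,l$, we define the Lagrangian $\mathcal{L}(w,\beta)$ as follows:
$$\mathcal{L}(w,\beta)=f(w)+\sum_{i=1}^l\beta_ih_i(w)$$
Remark: the coefficients $\beta_i$ are called the Lagrange multipliers.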
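A small worked example, chosen only for illustration: minimize $f(w)=w_1^2+w_2^2$ subject to the single constraint $h_1(w)=w_1+w_2-1=0$ (so $l=1$). Setting the partial derivatives of the Lagrangian to zero and using the constraint gives the multiplier and the optimum.

```latex
\begin{aligned}
\mathcal{L}(w,\beta) &= w_1^2 + w_2^2 + \beta\,(w_1 + w_2 - 1)\\
\frac{\partial\mathcal{L}}{\partial w_1} &= 2w_1 + \beta = 0,\qquad
\frac{\partial\mathcal{L}}{\partial w_2} = 2w_2 + \beta = 0
\;\Rightarrow\; w_1 = w_2 = -\frac{\beta}{2}\\
w_1 + w_2 - 1 &= 0 \;\Rightarrow\; -\beta - 1 = 0 \;\Rightarrow\; \beta = -1,\quad w_1 = w_2 = \frac{1}{2}
\end{aligned}
```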
Five 、 Generative learning
A generative model first tries to learn how the data is generated by estimating $P(x|y)$, which we can then use to estimate $P(y|x)$ with Bayes' rule.
1、Gaussian Discriminant Analysis
(1)Setting
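Gaussian Discriminant Analysis assumes that $y$, $x|y=0$ and $x|y=1$ are such that:
(1) $y\sim\text{Bernoulli}(\phi)$
(2) $x|y=0\sim\mathcal{N}(\mu_0,\Sigma)$
(3) $x|y=1\sim\mathcal{N}(\mu_1,\Sigma)$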
(2)Estimation
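The following table summarizes the estimates we find when maximizing the likelihood:
| $\widehat{\phi}$ | $\widehat{\mu}_j$ ($j=0,1$) | $\widehat{\Sigma}$ |
| --- | --- | --- |
| $\frac{1}{m}\sum_{i=1}^m1_{\{y^{(i)}=1\}}$ | $\frac{\sum_{i=1}^m1_{\{y^{(i)}=j\}}x^{(i)}}{\sum_{i=1}^m1_{\{y^{(i)}=j\}}}$ | $\frac{1}{m}\sum_{i=1}^m(x^{(i)}-\mu_{y^{(i)}})(x^{(i)}-\mu_{y^{(i)}})^T$ |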
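A minimal sketch of the maximum-likelihood estimates for 1-D inputs, where the shared covariance $\Sigma$ reduces to a single variance. The function name and the toy data are illustrative.

```python
def gda_fit(xs, ys):
    """Maximum-likelihood GDA estimates for 1-D inputs and labels y in {0, 1}."""
    m = len(xs)
    phi = sum(ys) / m                                  # fraction of examples with y = 1
    mu = {}
    for j in (0, 1):
        xj = [x for x, y in zip(xs, ys) if y == j]
        mu[j] = sum(xj) / len(xj)                      # class-conditional mean
    # Shared variance: squared deviation of each point from its own class mean.
    sigma2 = sum((x - mu[y]) ** 2 for x, y in zip(xs, ys)) / m
    return phi, mu[0], mu[1], sigma2

print(gda_fit([1, 3, 10, 12], [0, 0, 1, 1]))  # (0.5, 2.0, 11.0, 1.0)
```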
2、Naive Bayes
(1)Assumption
The Naive Bayes model assumes that the features of each data point are all independent given the class:
$$P(x|y)=P(x_1,x_2,...|y)=P(x_1|y)P(x_2|y)...=\prod_{i=1}^nP(x_i|y)$$
(2)Solutions
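Maximizing the log-likelihood gives the following solutions, with $k\in\{0,1\}$ and $l\in[\![1,L]\!]$:
$$P(y=k)=\frac{1}{m}\times\#\{j|y^{(j)}=k\}\quad\text{and}\quad P(x_i=l|y=k)=\frac{\#\{j|y^{(j)}=k\text{ and }x_i^{(j)}=l\}}{\#\{j|y^{(j)}=k\}}$$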
Remark: Naive Bayes is widely used for text classification and spam detection.
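A sketch of the counting solutions above for discrete features, followed by the prediction $\arg\max_kP(y=k)\prod_iP(x_i|y=k)$. No smoothing is applied, so a (feature, value, class) combination never seen in training gets probability 0; the function names and toy data are illustrative.

```python
from collections import Counter

def naive_bayes_fit(X, y):
    """MLE for discrete features: P(y=k) and P(x_i = l | y = k) by counting (no smoothing)."""
    m = len(y)
    class_counts = Counter(y)
    prior = {k: c / m for k, c in class_counts.items()}
    counts = {}                                  # counts[(i, l, k)] = #{j : y_j = k and x_ij = l}
    for xj, yj in zip(X, y):
        for i, l in enumerate(xj):
            counts[(i, l, yj)] = counts.get((i, l, yj), 0) + 1
    cond = {key: c / class_counts[key[2]] for key, c in counts.items()}
    return prior, cond

def naive_bayes_predict(prior, cond, x):
    """argmax_k P(y=k) * prod_i P(x_i | y=k)."""
    def score(k):
        p = prior[k]
        for i, l in enumerate(x):
            p *= cond.get((i, l, k), 0.0)
        return p
    return max(prior, key=score)

X = [(1, 0), (1, 1), (0, 0), (0, 1), (1, 0)]
y = [1, 1, 0, 0, 1]
prior, cond = naive_bayes_fit(X, y)
```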
Six 、 Tree-based and ensemble methods
These methods can be used for both regression and classification problems.
1、CART
Classification and regression trees (CART), commonly known as decision trees, can be represented as binary trees. Their advantage is that they are very interpretable.
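A sketch of the binary-tree representation: each internal node tests one feature against a threshold and each leaf holds a prediction, so every root-to-leaf path reads as a plain rule. The tree is built by hand; the features (age, income) and thresholds are hypothetical, and no splitting algorithm is shown.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class Node:
    """Internal node: go left if x[feature] <= threshold, else right. Leaf: value is set."""
    feature: int = 0
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: Optional[Union[int, float]] = None   # only set on leaves

def predict(node, x):
    while node.value is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value

# Hypothetical depth-2 tree: x[0] = age, x[1] = income.
tree = Node(feature=0, threshold=30,
            left=Node(value=0),
            right=Node(feature=1, threshold=50_000,
                       left=Node(value=0), right=Node(value=1)))
```

The tree reads as "predict 1 only if age > 30 and income > 50 000", which is the kind of interpretability the paragraph refers to.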
2、Random forest
It is a tree-based technique that uses a large number of decision trees built on randomly selected sets of features. Unlike a single decision tree, it is hard to interpret, but its generally good performance makes it a popular algorithm.
Remark: random forests are a type of ensemble method.
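A deliberately tiny sketch of the idea, assuming depth-1 trees (decision stumps) as the base learners: each tree is fitted on a bootstrap sample using one randomly chosen feature, and the forest predicts by majority vote. Real random forests grow deeper trees and sample feature subsets at every split; the tree count, seed and toy data here are assumptions.

```python
import random
from collections import Counter

def fit_stump(X, y, feature):
    """Best single split on one feature, with thresholds at midpoints between observed values."""
    majority = Counter(y).most_common(1)[0][0]
    best = (sum(lab != majority for lab in y), float("inf"), majority, majority)
    values = sorted({row[feature] for row in X})
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [lab for row, lab in zip(X, y) if row[feature] <= t]
        right = [lab for row, lab in zip(X, y) if row[feature] > t]
        left_lab = Counter(left).most_common(1)[0][0]
        right_lab = Counter(right).most_common(1)[0][0]
        errors = sum(lab != left_lab for lab in left) + sum(lab != right_lab for lab in right)
        if errors < best[0]:
            best = (errors, t, left_lab, right_lab)
    return (feature,) + best[1:]                     # (feature, threshold, left label, right label)

def predict_stump(stump, x):
    feature, t, left_lab, right_lab = stump
    return left_lab if x[feature] <= t else right_lab

def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]   # bootstrap sample
        feature = rng.randrange(len(X[0]))                      # randomly selected feature
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest

def predict_forest(forest, x):
    return Counter(predict_stump(s, x) for s in forest).most_common(1)[0][0]  # majority vote

X = [(1, 1), (2, 1), (1, 2), (8, 9), (9, 8), (9, 9)]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y)
```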
3、Boosting
The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summarized below:
| Adaptive boosting | Gradient boosting |
| --- | --- |
| High weights are put on errors to improve at the next boosting step. Known as AdaBoost. | Weak learners are trained on residuals. Examples include XGBoost. |
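A sketch of the gradient boosting column for regression with the squared loss, where "training on residuals" is literal because the residuals $y-F(x)$ are the negative gradient of $\frac{1}{2}(y-F(x))^2$. The 1-D regression stumps, the shrinkage factor of 0.5 and the number of rounds are assumptions; libraries such as XGBoost add much more on top.

```python
def fit_regression_stump(xs, residuals):
    """Weak learner: one split on x, predicting the mean residual on each side."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean

def gradient_boost(xs, ys, n_rounds=50, shrinkage=0.5):
    base = sum(ys) / len(ys)                  # start from a constant prediction
    learners = []

    def predict(x):
        return base + shrinkage * sum(h(x) for h in learners)

    for _ in range(n_rounds):
        # For the squared loss 1/2 (y - F)^2, the residuals are the negative gradient.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        learners.append(fit_regression_stump(xs, residuals))
    return predict

model = gradient_boost([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5])
```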
Seven 、 Other non-parametric approaches
1、k-nearest neighbors
The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach in which the response of a data point is determined by the nature of its k nearest neighbors in the training set. It can be used in both classification and regression settings.
Remark: the higher the parameter k, the higher the bias; the lower the parameter k, the higher the variance.
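A minimal k-NN sketch assuming Euclidean distance: majority vote among the k nearest training points for classification, their mean response for regression. The function name, the `task` switch and the toy data are illustrative.

```python
from collections import Counter

def knn_predict(train_X, train_y, x, k=3, task="classification"):
    """Predict from the k training points closest to x (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(xi, x)), yi) for xi, yi in zip(train_X, train_y)
    )
    neighbors = [yi for _, yi in dists[:k]]
    if task == "classification":
        return Counter(neighbors).most_common(1)[0][0]   # majority vote
    return sum(neighbors) / k                            # regression: mean response

train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = [0, 0, 0, 1, 1, 1]
targets = [1, 2, 3, 10, 11, 12]
```

With `k=1` the prediction follows single points closely (high variance); letting `k` approach the training-set size makes every prediction the overall majority or mean (high bias).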
Eight 、 Learning theory
1、Union bound
Let $A_1,...,A_k$ be $k$ events. We have:
$$P(A_1\cup...\cup A_k)\leqslant P(A_1)+...+P(A_k)$$
2、Hoeffding inequality
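Let $Z_1,...,Z_m$ be $m$ iid variables drawn from a Bernoulli distribution of parameter $\phi$. Let $\widehat{\phi}$ be their sample mean and $\gamma>0$ fixed. We have:
$$P(|\phi-\widehat{\phi}|>\gamma)\leqslant2\exp(-2\gamma^2m)$$
Remark: this inequality is also known as the Chernoff bound.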
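A Monte Carlo sketch comparing the empirical frequency of a deviation larger than $\gamma$ with the Hoeffding bound. The values $\phi=0.5$, $m=100$, $\gamma=0.1$, the number of trials and the seed are assumptions; the bound is loose here, which is typical.

```python
import math
import random

def hoeffding_check(phi=0.5, m=100, gamma=0.1, trials=2000, seed=0):
    """Return (empirical P(|phi - phi_hat| > gamma), bound 2 exp(-2 gamma^2 m))."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        phi_hat = sum(rng.random() < phi for _ in range(m)) / m   # sample mean of m Bernoulli draws
        if abs(phi - phi_hat) > gamma:
            exceed += 1
    return exceed / trials, 2 * math.exp(-2 * gamma**2 * m)

freq, bound = hoeffding_check()
```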
3、Training error
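For a given classifier $h$, we define the training error $\widehat{\epsilon}(h)$, also known as the empirical risk or empirical error, as follows:
$$\widehat{\epsilon}(h)=\frac{1}{m}\sum_{i=1}^m1_{\{h(x^{(i)})\neq y^{(i)}\}}$$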
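The training error is just the fraction of misclassified training examples. A one-function sketch, with a hypothetical threshold classifier $h(x)=1\{x>0\}$ and four made-up points:

```python
def training_error(h, xs, ys):
    """Empirical risk: fraction of training examples that h misclassifies."""
    return sum(1 for x, y in zip(xs, ys) if h(x) != y) / len(xs)

# h(x) = 1{x > 0} gets x = -1 wrong and the other three points right.
err = training_error(lambda x: int(x > 0), [-2, -1, 1, 2], [0, 1, 1, 1])
print(err)  # 0.25
```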
4、Probably Approximately Correct (PAC)
PAC is a framework under which numerous results in learning theory were proved. It rests on the following set of assumptions:
The training set and the test set follow the same distribution
Training samples are drawn independently
5、Shattering
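Given a set $S=\{x^{(1)},...,x^{(d)}\}$ and a set of classifiers $\mathcal{H}$, we say that $\mathcal{H}$ shatters $S$ if for any set of labels $\{y^{(1)},...,y^{(d)}\}$, we have:
$$\exists h\in\mathcal{H},\quad\forall i\in[\![1,d]\!],\quad h(x^{(i)})=y^{(i)}$$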
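A brute-force sketch of the definition for an assumed example class: 1-D "rays" $h(x)=s$ if $x>t$, else $-s$, with $s\in\{1,-1\}$. Only thresholds below, between and above the points matter, so a finite candidate set covers every behaviour; the code then tries every labelling.

```python
from itertools import product

def rays(points):
    """Finite set of ray classifiers (t, s) covering every distinct behaviour on the points."""
    pts = sorted(points)
    thresholds = [pts[0] - 1] + [(a + b) / 2 for a, b in zip(pts, pts[1:])] + [pts[-1] + 1]
    return [(t, s) for t in thresholds for s in (1, -1)]

def shatters(points):
    """True if for every labelling some ray classifier reproduces it exactly."""
    hyps = rays(points)
    for labels in product((1, -1), repeat=len(points)):
        if not any(all((s if x > t else -s) == y for x, y in zip(points, labels))
                   for t, s in hyps):
            return False
    return True

print(shatters([0.0, 1.0]))        # True
print(shatters([0.0, 1.0, 2.0]))   # False: the labelling (+1, -1, +1) cannot be produced
```

Because any 2 distinct points are shattered but no 3 points are, this class of rays has VC dimension 2.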
6、Upper bound theorem
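Let $\mathcal{H}$ be a finite hypothesis class such that $|\mathcal{H}|=k$, and let $\delta$ and the sample size $m$ be fixed. Then, with probability at least $1-\delta$, we have:
$$\epsilon(\widehat{h})\leqslant\left(\min_{h\in\mathcal{H}}\epsilon(h)\right)+2\sqrt{\frac{1}{2m}\log\left(\frac{2k}{\delta}\right)}$$
where $\widehat{h}$ is the hypothesis in $\mathcal{H}$ with the lowest training error and $\epsilon(h)$ is the generalization error of $h$.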
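A sketch that evaluates only the gap term of the bound, to show how it scales: it grows with $\log k$ and $\log(1/\delta)$ but shrinks as $1/\sqrt{m}$, so quadrupling the sample size halves it. The values $k=100$ and $\delta=0.05$ are arbitrary.

```python
import math

def finite_class_gap(m, k, delta):
    """Gap term 2 * sqrt(log(2k / delta) / (2m)) for a finite class with |H| = k."""
    return 2 * math.sqrt(math.log(2 * k / delta) / (2 * m))

gaps = {m: finite_class_gap(m, k=100, delta=0.05) for m in (100, 1_000, 10_000)}
```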
7、VC dimension
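The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class $\mathcal{H}$, noted $VC(\mathcal{H})$, is the size of the largest set that is shattered by $\mathcal{H}$.
Remark: the VC dimension of $\mathcal{H}=\{\text{set of linear classifiers in 2 dimensions}\}$ is 3.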
8、Theorem (Vapnik)
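Let $\mathcal{H}$ be given, with $VC(\mathcal{H})=d$, and let $m$ be the number of training examples. With probability at least $1-\delta$, we have:
$$\epsilon(\widehat{h})\leqslant\left(\min_{h\in\mathcal{H}}\epsilon(h)\right)+O\left(\sqrt{\frac{d}{m}\log\left(\frac{m}{d}\right)+\frac{1}{m}\log\left(\frac{1}{\delta}\right)}\right)$$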