Machine learning notes: supervised learning cheat sheet
2022-06-13 06:36:00 【Sit and watch the clouds rise】
I. Introduction to supervised learning
Given a set of data points $\{x^{(1)},\dots,x^{(m)}\}$ associated with a set of outcomes $\{y^{(1)},\dots,y^{(m)}\}$, we want to build a classifier that learns how to predict $y$ from $x$.
1. Prediction types
The following table summarizes the different types of prediction models:

| Type | Outcome | Examples |
| --- | --- | --- |
| Regression | Continuous | Linear regression |
| Classification | Class | Logistic regression, SVM, Naive Bayes |
2. Model types
The following table summarizes the different models:

| | Discriminative model | Generative model |
| --- | --- | --- |
| Goal | Directly estimate $P(y\mid x)$ | Estimate $P(x\mid y)$, then deduce $P(y\mid x)$ |
| What is learned | Decision boundary | Probability distribution of the data |
| Examples | Regressions, SVMs | GDA, Naive Bayes |
II. Notation and general concepts
1. Hypothesis
The hypothesis, noted $h_\theta$, is the model that we choose. For a given input data $x^{(i)}$, the model's predicted output is $h_\theta(x^{(i)})$.
2. Loss function
A loss function is a function $L:(z,y)\in\mathbb{R}\times Y\mapsto L(z,y)\in\mathbb{R}$ that takes as inputs the predicted value $z$ corresponding to the real data value $y$, and outputs how different they are. The following table summarizes the common loss functions:

| Least squared error | Logistic loss | Hinge loss |
| --- | --- | --- |
| $\frac{1}{2}(y-z)^2$ | $\log(1+\exp(-yz))$ | $\max(0,1-yz)$ |
| Linear regression | Logistic regression | SVM |
3. Cost function
The cost function $J$ is commonly used to assess the performance of a model, and is defined with the loss function $L$ as follows:

$$J(\theta)=\sum_{i=1}^m L\left(h_\theta(x^{(i)}),\,y^{(i)}\right)$$
4. Gradient descent
With $\alpha\in\mathbb{R}$ the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function $J$ as follows:

$$\theta\leftarrow\theta-\alpha\nabla J(\theta)$$

Remark: Stochastic gradient descent (SGD) updates the parameters based on each training example, while batch gradient descent does so on a batch of training examples.
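The update rule above can be sketched in code. This is an illustrative example only (not part of the original notes): it runs batch gradient descent on a least-squares cost, whose gradient is $X^T(X\theta-y)/m$; the toy data, `alpha`, and `n_iters` are assumed inputs.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=1000):
    """Batch gradient descent on the least-squares cost."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - y) / m  # gradient of J(theta)
        theta = theta - alpha * grad      # theta <- theta - alpha * grad J(theta)
    return theta

# Toy data: y = 2x, so the fitted coefficient should approach 2
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
print(gradient_descent(X, y))  # close to [2.]
```

A stochastic variant would replace the full-batch gradient with the gradient of a single randomly chosen example at each step.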
5. Likelihood
The likelihood of a model $L(\theta)$, given parameters $\theta$, is used to find the optimal parameters $\theta$ through likelihood maximization. We have:

$$\theta^{\mathrm{opt}}=\underset{\theta}{\operatorname{arg\,max}}\ L(\theta)$$

Remark: In practice, we use the log-likelihood $\ell(\theta)=\log L(\theta)$, which is easier to optimize.
6. Newton's algorithm
Newton's algorithm is a numerical method that finds $\theta$ such that $\ell'(\theta)=0$. Its update rule is as follows:

$$\theta\leftarrow\theta-\frac{\ell'(\theta)}{\ell''(\theta)}$$

Remark: The multidimensional generalization, also known as the Newton-Raphson method, has the following update rule:

$$\theta\leftarrow\theta-\left(\nabla_\theta^2\,\ell(\theta)\right)^{-1}\nabla_\theta\,\ell(\theta)$$
III. Linear models
1. Linear regression
Here, we assume that $y|x;\theta\sim\mathcal{N}(\mu,\sigma^2)$.
(1) Normal equations
With $X$ the design matrix, the value of $\theta$ that minimizes the cost function has a closed-form solution, such that:

$$\theta=(X^TX)^{-1}X^Ty$$
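A minimal NumPy sketch of the closed-form solution (the toy design matrix and targets are made up for illustration). Solving the linear system is preferred over forming the inverse explicitly for numerical stability.

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form least squares: theta = (X^T X)^{-1} X^T y."""
    # Solving the system X^T X theta = X^T y is more stable than inverting.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Toy data: y = 1 + 2x, with a column of ones for the intercept
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
print(normal_equation(X, y))  # close to [1. 2.]
```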
(2) LMS algorithm
With $\alpha$ the learning rate, the update rule of the least mean squares (LMS) algorithm for a training set of $m$ data points, also known as the Widrow-Hoff learning rule, is as follows:

$$\forall j,\quad \theta_j \leftarrow \theta_j+\alpha\sum_{i=1}^m\left[y^{(i)}-h_\theta(x^{(i)})\right]x_j^{(i)}$$

Remark: The update rule is a particular case of gradient ascent.
(3) LWR
Locally weighted regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by $w^{(i)}(x)$, defined with the parameter $\tau\in\mathbb{R}$ as:

$$w^{(i)}(x)=\exp\left(-\frac{(x^{(i)}-x)^2}{2\tau^2}\right)$$
2. Classification and logistic regression
(1) Sigmoid function
The sigmoid function $g$, also known as the logistic function, is defined as follows:

$$\forall z\in\mathbb{R},\quad g(z)=\frac{1}{1+e^{-z}}\in(0,1)$$
(2) Logistic regression
Here, we assume that $y|x;\theta\sim\mathrm{Bernoulli}(\phi)$. We have the following form:

$$\phi=p(y=1|x;\theta)=\frac{1}{1+\exp(-\theta^Tx)}=g(\theta^Tx)$$

Remark: There is no closed-form solution for logistic regression.
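Since there is no closed form, the parameters are typically found iteratively. A minimal sketch using gradient ascent on the log-likelihood (the toy data is made up; a real implementation would add a convergence check and, usually, regularization):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_fit(X, y, alpha=0.1, n_iters=5000):
    """Fit logistic regression by gradient ascent on the log-likelihood."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (y - sigmoid(X @ theta))  # gradient of the log-likelihood
        theta = theta + alpha * grad           # ascent step
    return theta

# Toy data: intercept column plus one feature; label is 1 when the feature > 0
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = logistic_fit(X, y)
print((sigmoid(X @ theta) > 0.5).astype(int))  # [0 0 1 1]
```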
(3) Softmax regression
Softmax regression, also called multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set $\theta_K=0$, which makes the Bernoulli parameter $\phi_i$ of each class $i$:

$$\phi_i=\frac{\exp(\theta_i^Tx)}{\displaystyle\sum_{j=1}^K\exp(\theta_j^Tx)}$$
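A small sketch of the class-probability formula above. Subtracting the maximum score before exponentiating is a standard numerical-stability trick, and it leaves the result unchanged because the shift cancels in the ratio.

```python
import numpy as np

def softmax(scores):
    """Class probabilities phi_i = exp(s_i) / sum_j exp(s_j)."""
    z = scores - np.max(scores)  # stability shift; cancels in the ratio
    e = np.exp(z)
    return e / e.sum()

p = softmax(np.array([1.0, 2.0, 3.0]))
print(p)  # probabilities summing to 1, largest for the highest score
```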
3. Generalized linear models
(1) Exponential family
A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter $\eta$ (also called the canonical parameter or link function), a sufficient statistic $T(y)$, and a log-partition function $a(\eta)$, as follows:

$$p(y;\eta)=b(y)\exp\left(\eta T(y)-a(\eta)\right)$$

Remark: We will often have $T(y)=y$. Also, $\exp(-a(\eta))$ can be seen as a normalization parameter that makes sure the probabilities sum to one.
The following table summarizes the most common exponential distributions:

| Distribution | $\eta$ | $T(y)$ | $a(\eta)$ | $b(y)$ |
| --- | --- | --- | --- | --- |
| Bernoulli | $\log\frac{\phi}{1-\phi}$ | $y$ | $\log(1+e^\eta)$ | $1$ |
| Gaussian | $\mu$ | $y$ | $\frac{\eta^2}{2}$ | $\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{y^2}{2}\right)$ |
| Poisson | $\log\lambda$ | $y$ | $e^\eta$ | $\frac{1}{y!}$ |
| Geometric | $\log(1-\phi)$ | $y$ | $\log\frac{e^\eta}{1-e^\eta}$ | $1$ |
(2) Assumptions of GLMs
Generalized linear models (GLM) aim at predicting a random variable $y$ as a function of $x\in\mathbb{R}^{n+1}$, and rely on the following 3 assumptions:
(1) $y|x;\theta\sim\mathrm{ExpFamily}(\eta)$
(2) $h_\theta(x)=E[y|x;\theta]$
(3) $\eta=\theta^Tx$
Remark: Ordinary least squares and logistic regression are special cases of generalized linear models.
IV. Support vector machines
The goal of support vector machines is to find the line that maximizes the minimum distance to the line.
1. Optimal margin classifier
The optimal margin classifier $h$ is such that:

$$h(x)=\operatorname{sign}(w^Tx-b)$$

where $(w,b)\in\mathbb{R}^n\times\mathbb{R}$ is the solution of the following optimization problem:

$$\min\ \frac{1}{2}\|w\|^2\quad\text{such that}\quad y^{(i)}(w^Tx^{(i)}-b)\geqslant1$$

Remark: The decision boundary is defined as $w^Tx-b=0$.
2. Hinge loss
The hinge loss is used in the setting of SVMs and is defined as follows:

$$L(z,y)=[1-yz]_+=\max(0,1-yz)$$
3. Kernel
Given a feature mapping $\phi$, we define the kernel $K$ as follows:

$$K(x,z)=\phi(x)^T\phi(z)$$

In practice, the kernel $K$ defined by $K(x,z)=\exp\left(-\frac{\|x-z\|^2}{2\sigma^2}\right)$ is called the Gaussian kernel and is commonly used.
Remark: We say that we use the "kernel trick" to compute the cost function with the kernel, because we actually don't need to know the explicit mapping $\phi$, which is often very complicated. Instead, only the values $K(x,z)$ are needed.
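An illustrative sketch of the Gaussian kernel, and of a full Gram matrix computed without ever forming an explicit mapping $\phi$, which is the point of the kernel trick:

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    diff = np.asarray(x) - np.asarray(z)
    return np.exp(-diff @ diff / (2 * sigma ** 2))

def gram_matrix(X, sigma=1.0):
    """All pairwise kernel values K[i, j] = K(x_i, x_j), with no explicit phi."""
    sq = np.sum(X ** 2, axis=1)
    # Squared pairwise distances, clipped at 0 to absorb float rounding
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    return np.exp(-d2 / (2 * sigma ** 2))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = gram_matrix(X)
print(K[0, 0])  # 1.0 on the diagonal; off-diagonal entries lie in (0, 1)
```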
4. Lagrangian
We define the Lagrangian $\mathcal{L}(w,b)$ as follows:

$$\mathcal{L}(w,b)=f(w)+\sum_{i=1}^l\beta_ih_i(w)$$

Remark: The coefficients $\beta_i$ are called the Lagrange multipliers.
V. Generative learning
A generative model first tries to learn how the data is generated by estimating $P(x|y)$, which we can then use to estimate $P(y|x)$ with Bayes' rule.
1. Gaussian Discriminant Analysis
(1) Setting
Gaussian discriminant analysis assumes that $y$, $x|y=0$ and $x|y=1$ are such that:
(1) $y\sim\mathrm{Bernoulli}(\phi)$
(2) $x|y=0\sim\mathcal{N}(\mu_0,\Sigma)$
(3) $x|y=1\sim\mathcal{N}(\mu_1,\Sigma)$
(2) Estimation
The following table summarizes the estimates that we find when maximizing the likelihood:

| $\widehat{\phi}$ | $\widehat{\mu_j}\ (j=0,1)$ | $\widehat{\Sigma}$ |
| --- | --- | --- |
| $\frac{1}{m}\sum_{i=1}^m 1_{\{y^{(i)}=1\}}$ | $\frac{\sum_{i=1}^m 1_{\{y^{(i)}=j\}}x^{(i)}}{\sum_{i=1}^m 1_{\{y^{(i)}=j\}}}$ | $\frac{1}{m}\sum_{i=1}^m(x^{(i)}-\mu_{y^{(i)}})(x^{(i)}-\mu_{y^{(i)}})^T$ |
2. Naive Bayes
(1) Assumption
The Naive Bayes model assumes that the features of each data point are all independent:

$$P(x|y)=\prod_{i=1}^nP(x_i|y)$$

(2) Solutions
Maximizing the log-likelihood gives the following solutions:

$$P(y=k)=\frac{1}{m}\times\#\{j\mid y^{(j)}=k\}\qquad P(x_i=l|y=k)=\frac{\#\{j\mid y^{(j)}=k\ \text{and}\ x_i^{(j)}=l\}}{\#\{j\mid y^{(j)}=k\}}$$

with $k\in\{0,1\}$ and $l\in[\![1,L]\!]$.
Remark: Naive Bayes is widely used for text classification and spam detection.
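A count-based sketch of the estimates above for binary labels and binary features. The tiny "spam" data set is made up for illustration; real implementations add Laplace smoothing and work in log space to avoid underflow.

```python
import numpy as np

def naive_bayes_fit(X, y):
    """Estimate P(y=k) and P(x_i=1 | y=k) by counting (binary labels/features)."""
    priors = {k: np.mean(y == k) for k in (0, 1)}
    cond = {k: X[y == k].mean(axis=0) for k in (0, 1)}  # P(x_i = 1 | y = k)
    return priors, cond

def naive_bayes_predict(x, priors, cond):
    """Pick the class k maximizing P(y=k) * prod_i P(x_i | y=k)."""
    scores = {}
    for k in (0, 1):
        p = cond[k]
        scores[k] = priors[k] * np.prod(np.where(x == 1, p, 1 - p))
    return max(scores, key=scores.get)

# Toy data: feature 0 = contains "free", feature 1 = contains "meeting"; y = 1 is spam
X = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
y = np.array([1, 1, 0, 0])
priors, cond = naive_bayes_fit(X, y)
print(naive_bayes_predict(np.array([1, 0]), priors, cond))  # 1
```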
VI. Tree-based and ensemble methods
These methods can be used for both regression and classification problems.
1. CART
Classification and regression trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage of being easy to interpret.
2. Random forest
It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to a simple decision tree, it is highly uninterpretable, but its generally good performance makes it a popular algorithm.
Remark: Random forests are a type of ensemble method.
3. Boosting
The idea of boosting methods is to combine several weak learners into a stronger one. The main ones are summarized below:

| Adaptive boosting | Gradient boosting |
| --- | --- |
| • High weights are put on errors to improve at the next boosting step • Known as Adaboost | • Weak learners are trained on residuals • Examples include XGBoost |
VII. Other non-parametric approaches
1. k-nearest neighbors
The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.
Remark: The higher the parameter k, the higher the bias; the lower the parameter k, the higher the variance.
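A minimal k-NN classification sketch (the toy clusters are made up for illustration): the label of a query point is the majority vote among its k nearest training points.

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]  # indices of the k closest points
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]

# Toy 1-D data: two well-separated clusters
X_train = np.array([[0.0], [0.5], [1.0], [5.0], [5.5], [6.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_classify(X_train, y_train, np.array([0.7])))  # 0
print(knn_classify(X_train, y_train, np.array([5.2])))  # 1
```

Increasing `k` smooths the decision (more bias), while `k = 1` follows individual points closely (more variance), matching the remark above.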
VIII. Learning theory
1. Union bound
Let $A_1,\dots,A_k$ be $k$ events. We have:

$$P(A_1\cup\dots\cup A_k)\leqslant P(A_1)+\dots+P(A_k)$$
2. Hoeffding inequality
Let $Z_1,\dots,Z_m$ be $m$ iid variables drawn from a Bernoulli distribution of parameter $\phi$. Let $\widehat{\phi}$ be their sample mean and $\gamma>0$ fixed. We have:

$$P(|\phi-\widehat{\phi}|>\gamma)\leqslant2\exp(-2\gamma^2m)$$

Remark: This inequality is also known as the Chernoff bound.
3. Training error
For a given classifier $h$, we define the training error $\widehat{\epsilon}(h)$, also known as the empirical risk or empirical error, as follows:

$$\widehat{\epsilon}(h)=\frac{1}{m}\sum_{i=1}^m 1_{\{h(x^{(i)})\neq y^{(i)}\}}$$
4. Probably Approximately Correct (PAC)
PAC is a framework under which numerous results on learning theory were proved, and it has the following set of assumptions:
(1) The training and testing sets follow the same distribution
(2) The training examples are drawn independently
5. Shattering
Given a set $S=\{x^{(1)},\dots,x^{(d)}\}$ and a set of classifiers $\mathcal{H}$, we say that $\mathcal{H}$ shatters $S$ if for any set of labels $\{y^{(1)},\dots,y^{(d)}\}$, we have:

$$\exists h\in\mathcal{H},\quad \forall i\in[\![1,d]\!],\quad h(x^{(i)})=y^{(i)}$$
6. Upper bound theorem
Let $\mathcal{H}$ be a finite hypothesis class such that $|\mathcal{H}|=k$, and let $\delta$ and the sample size $m$ be fixed. Then, with probability of at least $1-\delta$, we have:

$$\epsilon(\widehat{h})\leqslant\left(\min_{h\in\mathcal{H}}\epsilon(h)\right)+2\sqrt{\frac{1}{2m}\log\left(\frac{2k}{\delta}\right)}$$
7. VC dimension
The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class $\mathcal{H}$, noted $\mathrm{VC}(\mathcal{H})$, is the size of the largest set that is shattered by $\mathcal{H}$.
Remark: The VC dimension of $\mathcal{H}=\{\text{set of linear classifiers in 2 dimensions}\}$ is 3.
8. Theorem (Vapnik)
Let $\mathcal{H}$ be given, with $\mathrm{VC}(\mathcal{H})=d$ and $m$ the number of training examples. With probability at least $1-\delta$, we have:

$$\epsilon(\widehat{h})\leqslant\left(\min_{h\in\mathcal{H}}\epsilon(h)\right)+O\left(\sqrt{\frac{d}{m}\log\left(\frac{m}{d}\right)+\frac{1}{m}\log\left(\frac{1}{\delta}\right)}\right)$$