A Detailed Walkthrough to Reproduce Statistical Learning Methods Again -- Chapter 2, The Perceptron Model
2022-07-02 09:11:00 【Qigui】
I. The Perceptron Model
The perceptron is a linear classification model for binary classification based on the feature vector $x$ of an input instance; it is a discriminative model. The hypothesis space of the perceptron model is the set of all linear classification models (linear classifiers) defined on the feature space, i.e., the function set

$$\{ f \mid f(x) = w \cdot x + b \}.$$
General form of the perceptron model:

$$f(x) = \operatorname{sign}(w \cdot x + b)$$

where $x$ denotes the feature vector, and $w \in \mathbb{R}^n$ and $b \in \mathbb{R}$ are the perceptron model parameters: $w$ is called the weight or weight vector, $b$ is called the bias, $w \cdot x$ denotes the inner product of $w$ and $x$, and $\operatorname{sign}$ is the sign function, namely

$$\operatorname{sign}(z) = \begin{cases} +1, & z \geq 0 \\ -1, & z < 0 \end{cases}$$
The perceptron model corresponds to a separating hyperplane $S$ in the feature space, which divides instances into positive and negative classes:

$$w \cdot x + b = 0$$

where $w$ is the normal vector of the hyperplane and $b$ is its intercept. This hyperplane divides the feature space into two parts, and the points (feature vectors) in the two parts are called positive and negative respectively. The input is the feature vector of an instance and the output is the class of the instance, taking the values +1 and -1. This is why the hyperplane $S$ is called the separating hyperplane.
The perceptron uses a hyperplane to divide instances into positive and negative classes. However, some data sets are not linearly separable, in which case no hyperplane can classify all instances correctly.

In general, we classify instance points by training a perceptron model. For example, after red beans and mung beans are mixed together, we need a hyperplane that divides the two classes and labels them +1 and -1.
import numpy as np

def perceptron(x1, x2):
    x = np.array([x1, x2])    # feature vector
    w = np.array([0.3, 0.7])  # weights
    b = -0.3                  # bias
    f = np.sum(w * x) + b     # general form of the perceptron model
    # f = np.dot(w, x) + b    # equivalent inner-product form
    # classify the instance with the model
    if f >= 0:
        return 1              # positive class
    else:
        return -1             # negative class

# input feature vectors x
for x in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    y = perceptron(x[0], x[1])
    print(str(x) + '-->' + str(y))

Output:

(0, 0)-->-1
(1, 0)-->1
(0, 1)-->1
(1, 1)-->1

II. Perceptron Learning Strategies
1. Linear Separability of Data Sets
If there exists some hyperplane $S$: $w \cdot x + b = 0$ that can completely and correctly divide the positive and negative instance points of the data set onto the two sides of the hyperplane, that is, for all instances $i$ with $y_i = +1$ we have $w \cdot x_i + b > 0$, and for all instances with $y_i = -1$ we have $w \cdot x_i + b < 0$, then the data set is called a linearly separable data set; otherwise, the data set is called linearly inseparable. In reality such data sets are ideal and rarely encountered.
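As a quick check of this definition, here is a minimal sketch (an illustration added here, not code from the book) that tests whether a given hyperplane $(w, b)$ puts every labelled point of a small data set on its correct side; the helper name separates and the sample data are assumptions for this example:

import numpy as np

def separates(w, b, X, y):
    # every point must lie strictly on its own side: y_i (w·x_i + b) > 0
    return all(yi * (np.dot(w, xi) + b) > 0 for xi, yi in zip(X, y))

X = np.array([(0, 0), (1, 0), (0, 1), (1, 1)], dtype=float)
y_and = np.array([-1, -1, -1, 1])   # AND-style labels: linearly separable
y_xor = np.array([-1, 1, 1, -1])    # XOR-style labels: not linearly separable

print(separates(np.array([1.0, 1.0]), -1.5, X, y_and))  # True: this hyperplane works
print(separates(np.array([1.0, 1.0]), -0.5, X, y_xor))  # False

The AND-style labels admit a separating hyperplane, while for the XOR-style labels no choice of $(w, b)$ can satisfy the condition; XOR is the classic example of a linearly inseparable data set.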
2. Perceptron Learning Strategy
To find such a hyperplane, that is, to determine the perceptron model parameters $w$ and $b$, we need to choose a learning strategy: define a loss function and minimize it. For the perceptron, a natural choice of loss function is the total number of misclassified points; another choice is the total distance from the misclassified points to the hyperplane $S$. The former is not a continuous differentiable function of the parameters $w$ and $b$ and is difficult to optimize. So, letting $M$ denote the set of misclassified points of hyperplane $S$, we take the total distance from all misclassified points to $S$, which yields the loss function of perceptron learning. Given a training data set, the loss function

$$L(w, b) = -\sum_{x_i \in M} y_i (w \cdot x_i + b)$$

is a continuous differentiable function of $w$ and $b$.
Minimize the loss function:

$$\min_{w, b} L(w, b) = -\sum_{x_i \in M} y_i (w \cdot x_i + b)$$

where $M$ is the set of misclassified points. This loss function is the empirical risk function of perceptron learning. The strategy of perceptron learning is to select, in the hypothesis space, the model parameters $w$ and $b$ that minimize the loss function, i.e., the perceptron model; the loss corresponds to the total distance from the misclassified points to the separating hyperplane.

The loss function is nonnegative. If there are no misclassified points, its value is 0. The fewer the misclassified points, and the closer the misclassified points are to the hyperplane, the smaller the value of the loss function. For a particular sample point, the loss is a linear function of the parameters $w$ and $b$ when the point is misclassified, and 0 when it is classified correctly.
The learning strategy of the perceptron is thus to minimize the loss function. To classify correctly, we need to find a separating hyperplane that divides the instance points completely and correctly, which means solving for the parameters $w$ and $b$ in $w \cdot x + b = 0$, where $x$ is the input feature vector. Since the classification cannot be guaranteed to be completely correct, there will be some error; the learning strategy, i.e., the loss function, measures that error, and minimizing the error means minimizing the loss function.
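To make the strategy concrete, below is a minimal sketch (an illustration, not code from the book) that evaluates the empirical loss $L(w, b)$ over the misclassified points of a data set; the function name perceptron_loss is an assumption for this example:

import numpy as np

def perceptron_loss(w, b, X, y):
    margins = y * (X @ w + b)     # y_i (w·x_i + b) for every sample
    mis = margins <= 0            # misclassified points (the set M); ties count as errors
    return -np.sum(margins[mis])  # nonnegative; 0 when nothing is misclassified

Treating points with margin $y_i(w \cdot x_i + b) \leq 0$ as misclassified matches the update condition used in the algorithm below.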
III. Perceptron Learning Algorithm
The learning algorithm finds appropriate parameter values within their ranges so that, for a given input feature vector, the output computed by the model (the predicted value) is as correct as possible; such an algorithm is the learning algorithm of the perceptron model.

The perceptron is an error-driven learning algorithm. If a prediction is correct, the algorithm goes on to predict the next instance; if a prediction is wrong, the algorithm updates the weights, i.e., $w$ and $b$ are updated.
1. The Original Form of the Perceptron Learning Algorithm
The perceptron learning algorithm is misclassification-driven and uses stochastic gradient descent. First, an initial hyperplane $(w_0, b_0)$ is chosen arbitrarily; then gradient descent is used to keep minimizing the loss function until it reaches its minimum. The minimization does not perform gradient descent on all misclassified points in $M$ at once; instead, one misclassified point is selected at random each time and its gradient is descended.

Randomly select a misclassified point $(x_i, y_i)$ and update $w$ and $b$:

$$w \leftarrow w + \eta y_i x_i$$
$$b \leftarrow b + \eta y_i$$
import numpy as np

# initialize the parameters w, b
w = np.array([0.3, 0.7])   # weight vector w
b = -0.3                   # bias b
# set the learning rate η
learning_rate = 0.6

# update w, b on a misclassified point (x, y)
def update_weights(x, y, w, b):
    w = w + learning_rate * y * x
    b = b + learning_rate * y
    return w, b
where $\eta$ ($0 < \eta \leq 1$) is the step size, also known as the learning rate. The learning algorithm controls how far the parameters move on each update by setting the learning rate. Through iteration we expect the loss function to keep decreasing, until it reaches 0.
Explanation: when an instance point is misclassified, i.e., it lies on the wrong side of the separating hyperplane, $w$ and $b$ are adjusted to move the separating hyperplane toward that misclassified point. This reduces the distance between the misclassified point and the hyperplane, until the hyperplane passes beyond the point so that it is classified correctly.
To minimize the loss function, the method used is gradient descent: updating the parameters on misclassified points changes the values of $w$ and $b$ until a separating hyperplane is found. During the updates, a learning rate is set to limit how much the values of $w$ and $b$ change at each step.
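Putting the pieces together, here is a minimal sketch of the original form of the algorithm. It assumes X is an (n, d) array and y holds labels +1/-1; the names train_perceptron, eta, and max_epochs are choices made for this illustration, and for simplicity it sweeps the data in order rather than sampling misclassified points at random:

import numpy as np

def train_perceptron(X, y, eta=1.0, max_epochs=100):
    w = np.zeros(X.shape[1])                    # initial hyperplane: w0 = 0
    b = 0.0                                     #                     b0 = 0
    for _ in range(max_epochs):
        updated = False
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:   # misclassified point
                w = w + eta * yi * xi           # w <- w + η y_i x_i
                b = b + eta * yi                # b <- b + η y_i
                updated = True
        if not updated:                         # a full pass with no errors: converged
            break
    return w, b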
2. Convergence of the Algorithm
Each pass over all the training instances is called a training epoch. If the learning algorithm classifies every training instance correctly within one epoch, it has reached the converged state. (A learning algorithm is not guaranteed to converge, so it needs a hyperparameter specifying the maximum number of epochs it may run before terminating.)

When the training data set is linearly separable, the original form of the perceptron learning algorithm converges: after a finite number of iterations we obtain a separating hyperplane and a perceptron model that completely and correctly divide the training data set. When the training data set is not linearly separable, the algorithm does not converge and the iteration results oscillate. Moreover, with different initial values or different choices of misclassified points, the solution can differ; that is, the perceptron learning algorithm has many solutions, which depend both on the choice of initial values and on the order in which misclassified points are selected during iteration. To obtain a unique hyperplane, constraints must be added to the separating hyperplane.
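Using the train_perceptron sketch above, both behaviours are easy to observe (the data and labels here are assumptions for illustration):

X = np.array([(0, 0), (1, 0), (0, 1), (1, 1)], dtype=float)

# Linearly separable labels: the loop exits early once an epoch has no errors.
w, b = train_perceptron(X, np.array([-1, -1, -1, 1]))
print(w, b)   # a separating hyperplane for this data

# XOR labels are not linearly separable: the loop runs until max_epochs and
# the returned (w, b) still misclassifies at least one point.
w, b = train_perceptron(X, np.array([-1, 1, 1, -1]))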
3. The Dual Form of the Perceptron Learning Algorithm
The basic idea: represent $w$ and $b$ as linear combinations of the instances $x_i$ and labels $y_i$, and solve for their coefficients $\alpha_i$:

$$w = \sum_{i=1}^{N} \alpha_i y_i x_i, \qquad b = \sum_{i=1}^{N} \alpha_i y_i$$

The more times an instance point is updated, the closer it is to the separating hyperplane and the harder it is to classify correctly; such instances have the greatest influence on the learning result. As with the original form, the dual form of the perceptron learning algorithm converges when the training data are linearly separable, and it has multiple solutions.
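As a companion to the original form, here is a minimal sketch of the dual form under the same assumptions (X is (n, d), y holds ±1 labels; the names alpha and gram are choices for this illustration). The Gram matrix of pairwise inner products is precomputed, which is the practical convenience of the dual form:

import numpy as np

def train_perceptron_dual(X, y, eta=1.0, max_epochs=100):
    n = X.shape[0]
    alpha = np.zeros(n)            # alpha_i accumulates the updates of instance i
    b = 0.0
    gram = X @ X.T                 # Gram matrix: gram[j, i] = x_j · x_i
    for _ in range(max_epochs):
        updated = False
        for i in range(n):
            # f(x_i) = sum_j alpha_j y_j (x_j · x_i) + b
            if y[i] * (np.sum(alpha * y * gram[:, i]) + b) <= 0:
                alpha[i] += eta    # alpha_i <- alpha_i + η
                b += eta * y[i]    # b <- b + η y_i
                updated = True
        if not updated:
            break
    w = (alpha * y) @ X            # recover w = sum_i alpha_i y_i x_i if needed
    return alpha, w, b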
References
1. 《统计学习方法》 (Statistical Learning Methods), 2nd edition
2. 《scikit-learn机器学习》 (scikit-learn Machine Learning), 2nd edition, by Gavin Hackeling, translated by Zhang Haoran
3. Stanford machine learning slides (PPT)