A detailed walkthrough to reproduce "Statistical Learning Methods" again: Chapter 2, the perceptron model
2022-07-02 09:11:00 【Qigui】
Personal signature: The most important part of any building is its foundation; if the foundation is unstable, the earth trembles and the mountains sway.
Learning technology is the same: lay a solid foundation first. Follow me, and let's firm up the fundamentals of every topic.
Blog home page: Qigui's blog
Column: 《统计学习方法》, 2nd edition: personal notes
I. The perceptron model
The perceptron is a linear classification model for binary classification: based on the feature vector x of an input instance, it outputs the instance's class. It is a discriminant model. The hypothesis space of the perceptron model is the set of all linear classification models (linear classifiers) defined on the feature space, that is, the set of functions $\{f \mid f(x) = w \cdot x + b\}$.
The general form of the perceptron model is

$$f(x) = \operatorname{sign}(w \cdot x + b)$$

where x represents the feature vector, and w and b are the perceptron model parameters: $w \in \mathbb{R}^n$ is called the weight or weight vector, $b \in \mathbb{R}$ is called the bias, $w \cdot x$ denotes the inner product of w and x, and sign is the sign function, namely

$$\operatorname{sign}(x) = \begin{cases} +1, & x \ge 0 \\ -1, & x < 0 \end{cases}$$

The perceptron model corresponds to a separating hyperplane in the feature space:

$$w \cdot x + b = 0$$
The perceptron thus corresponds to a separating hyperplane S that divides instances into positive and negative classes in the feature space, where w is the normal vector of the hyperplane and b is its intercept. This hyperplane splits the feature space into two parts; the points (feature vectors) in the two parts are called positive and negative respectively. The input is the feature vector of an instance, and the output is the instance's class, taking the values +1 and -1. For this reason the hyperplane S is called a separating hyperplane.
The perceptron uses a hyperplane to divide instances into positive and negative classes, but some data sets are not linearly separable, in which case no hyperplane can classify every instance correctly. In general, instance points are classified by training the perceptron model. For example, after red beans and mung beans are mixed together, a hyperplane is needed to divide the two kinds and label them +1 and -1. The following toy model, with hand-picked parameters, shows the general form in action:
import numpy as np

def perceptron(x1, x2):
    x = np.array([x1, x2])    # feature vector
    w = np.array([0.3, 0.7])  # weights
    b = -0.3                  # bias
    f = np.sum(w * x) + b     # general form of the perceptron model
    # f = np.dot(w, x) + b    # equivalent inner-product form
    # classify the instance by the sign of f
    if f >= 0:
        return 1              # positive class
    else:
        return -1             # negative class

# feed feature vectors x into the model
for x in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    y = perceptron(x[0], x[1])
    print(str(x) + '-->' + str(y))
Output:
(0, 0)-->-1
(1, 0)-->1
(0, 1)-->1
(1, 1)-->1
II. Perceptron learning strategy
1. Linear separability of data sets
If there exists some hyperplane S,

$$w \cdot x + b = 0,$$

that can divide the positive and negative instance points of the data set completely and correctly onto the two sides of the hyperplane, that is, for every instance i with $y_i = +1$ we have $w \cdot x_i + b > 0$, and for every instance with $y_i = -1$ we have $w \cdot x_i + b < 0$, then the data set is called linearly separable; otherwise, the data set is linearly inseparable. In reality such data sets are idealized and rarely encountered.
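As a quick check of this definition, here is a minimal sketch (the toy data set and the candidate hyperplanes below are made up for illustration) that verifies $y_i (w \cdot x_i + b) > 0$ for every instance:

import numpy as np

def is_separated_by(X, y, w, b):
    # the hyperplane w·x + b = 0 separates the data iff
    # y_i * (w·x_i + b) > 0 holds for every instance i
    margins = y * (X @ w + b)
    return bool(np.all(margins > 0))

# hypothetical toy data: two positive points, one negative point
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])

print(is_separated_by(X, y, w=np.array([1.0, 1.0]), b=-3.0))  # True
print(is_separated_by(X, y, w=np.array([1.0, 1.0]), b=0.0))   # False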
2. The perceptron learning strategy
To find such a hyperplane, that is, to determine the perceptron model parameters w and b, we need a learning strategy, which means defining a loss function and minimizing it. A natural choice of loss function is the total number of misclassified points; the other choice is the total distance from the misclassified points to the hyperplane S. The former is not a continuously differentiable function of the parameters w and b and is therefore hard to optimize, so the perceptron adopts the latter. Let M be the set of misclassified points of hyperplane S; summing the distances from all points in M to S, and dropping the constant factor $1/\|w\|$, yields the loss function of perceptron learning. Given a training data set, the loss function

$$L(w, b) = -\sum_{x_i \in M} y_i (w \cdot x_i + b)$$

is a continuously differentiable function of w and b. Perceptron learning minimizes this loss:

$$\min_{w, b} L(w, b) = -\sum_{x_i \in M} y_i (w \cdot x_i + b)$$

where M is the set of misclassified points. This loss function is in fact the empirical risk function of perceptron learning. The strategy of perceptron learning is to select, in the hypothesis space, the model parameters w and b that minimize the loss function, which corresponds to minimizing the total distance from the misclassified points to the separating hyperplane.
The loss function is non-negative. If there are no misclassified points, its value is 0. The fewer the misclassified points, and the closer they are to the hyperplane, the smaller the value of the loss function. For a single sample point, the loss is a linear function of the parameters w and b when the point is misclassified, and 0 when it is classified correctly.
The learning strategy of the perceptron is to minimize the loss function. To classify correctly, we must find a separating hyperplane that divides the instance points completely and correctly, which means solving for the parameters w and b in $w \cdot x + b = 0$, where x is the input feature vector. Since completely correct classification cannot be guaranteed, some error remains; the learning strategy, embodied in the loss function, minimizes that error by minimizing the loss, as the sketch below illustrates.
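Here is a minimal sketch of the loss computation, reusing the same made-up toy data as above; misclassified points are those with $y_i (w \cdot x_i + b) \le 0$:

import numpy as np

def perceptron_loss(X, y, w, b):
    # empirical risk L(w, b) = -sum over misclassified points of y_i (w·x_i + b)
    margins = y * (X @ w + b)
    misclassified = margins <= 0           # the set M of misclassified points
    return np.sum(-margins[misclassified])

# hypothetical toy data, as before
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])

print(perceptron_loss(X, y, np.array([1.0, 1.0]), 0.0))   # 2.0: one point in M
print(perceptron_loss(X, y, np.array([1.0, 1.0]), -3.0))  # 0.0: M is empty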
III. The perceptron learning algorithm
A learning algorithm for the perceptron model learns appropriate parameter values within their ranges so that, for a given input feature vector, the model's output, i.e. the predicted value, is as correct as possible. The perceptron is an error-driven learning algorithm: if a prediction is correct, the algorithm moves on to predict the next instance; if a prediction is wrong, it updates the weights w and b.
1. The original form of the perceptron learning algorithm
The perceptron learning algorithm is driven by misclassification and uses stochastic gradient descent. First an initial hyperplane is chosen arbitrarily; then gradient descent is used to keep decreasing the loss function toward its minimum. The minimization does not take the gradient over all misclassified points in M at once; instead, one misclassified point is randomly selected at a time and the parameters descend along its gradient. The gradients of the loss function are

$$\nabla_w L(w, b) = -\sum_{x_i \in M} y_i x_i, \qquad \nabla_b L(w, b) = -\sum_{x_i \in M} y_i$$

Randomly select a misclassified point $(x_i, y_i)$ and update w and b:

$$w \leftarrow w + \eta y_i x_i, \qquad b \leftarrow b + \eta y_i$$
import numpy as np

# initialize the parameters w, b
w = np.array([0.3, 0.7])  # weights w
b = -0.3                  # bias b

# set the learning rate η
learning_rate = 0.6

# update w, b with a misclassified point (x, y)
def update_weights(x, y, w, b):
    w = w + learning_rate * y * x  # w <- w + η y x
    b = b + learning_rate * y      # b <- b + η y
    return w, b
where $\eta$ ($0 < \eta \le 1$) is the step size, also called the learning rate. The learning algorithm controls how far the parameters move in each update by setting the learning rate. Through repeated iteration we can expect the loss function to keep decreasing, until it reaches 0.
Explanation: when an instance point is misclassified, i.e. it lies on the wrong side of the separating hyperplane, the values of w and b are adjusted to move the separating hyperplane toward that misclassified point. This reduces the distance between the misclassified point and the hyperplane, until the hyperplane passes beyond the point and classifies it correctly.
To minimize the loss function, the method used is gradient descent: updating on misclassified points changes the values of the parameters w and b until a separating hyperplane is found. During the updates, a learning rate is set to limit and adjust the magnitude of the change in w and b.
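Putting the update rule and the learning rate together, here is a minimal sketch of the original form of the algorithm, again under the made-up toy-data assumption: scan the instances, update w and b on every misclassified point, and stop after a clean pass or after a maximum number of epochs:

import numpy as np

def train_perceptron(X, y, eta=1.0, max_epochs=100):
    # original form: stochastic gradient descent driven by misclassified points
    w = np.zeros(X.shape[1])
    b = 0.0
    for epoch in range(max_epochs):
        errors = 0
        for x_i, y_i in zip(X, y):
            if y_i * (np.dot(w, x_i) + b) <= 0:  # misclassified
                w = w + eta * y_i * x_i          # w <- w + η y_i x_i
                b = b + eta * y_i                # b <- b + η y_i
                errors += 1
        if errors == 0:  # a clean pass over the data: converged
            break
    return w, b

# hypothetical toy data, as before
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])
w, b = train_perceptron(X, y)
print(w, b)  # [1. 1.] -3.0 for this data and visiting order

Note that this sketch visits the points in a fixed order rather than sampling them randomly; as discussed next, the solution depends on that order and on the initial values.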
2. Convergence of the algorithm
Each full pass over all the training instances is called a training epoch. If the learning algorithm classifies all training instances correctly within one epoch, it has reached convergence. (A learning algorithm is not guaranteed to converge, so it needs a hyperparameter specifying the maximum number of epochs it may run before terminating.)
When the training data set is linearly separable, the original form of the perceptron learning algorithm converges: after a finite number of iterations it produces a separating hyperplane, and a perceptron model, that divides the training data completely and correctly. When the training data set is not linearly separable, the algorithm does not converge and the iterations oscillate. Moreover, with different initial values or a different order of selecting misclassified points, the solutions differ; that is, the perceptron learning algorithm has many solutions, which depend both on the choice of initial values and on the order in which misclassified points are visited during iteration. To obtain a unique hyperplane, constraints must be imposed on the separating hyperplane.
3. The dual form of the perceptron learning algorithm
The basic idea: express w and b as linear combinations of the instances $x_i$ and labels $y_i$, and obtain w and b by solving for their coefficients $\alpha_i$:

$$w = \sum_{i=1}^{N} \alpha_i y_i x_i, \qquad b = \sum_{i=1}^{N} \alpha_i y_i$$

The more times an instance point is updated, the closer it is to the separating hyperplane and the harder it is to classify correctly; such instances have the greatest influence on the learning result. As with the original form, the dual form of the perceptron learning algorithm converges when the training data are linearly separable, and it has multiple solutions.
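A minimal sketch of the dual form, under the same toy-data assumption: w is kept implicitly as $w = \sum_i \alpha_i y_i x_i$, the Gram matrix of inner products $x_i \cdot x_j$ is precomputed, and each update merely increments one coefficient $\alpha_i$ by η:

import numpy as np

def train_perceptron_dual(X, y, eta=1.0, max_epochs=100):
    # dual form: learn coefficients alpha_i with w = sum_i alpha_i y_i x_i
    n = len(X)
    alpha = np.zeros(n)
    b = 0.0
    gram = X @ X.T  # Gram matrix: gram[i, j] = x_i · x_j
    for epoch in range(max_epochs):
        errors = 0
        for i in range(n):
            # the margin needs only inner products, read from the Gram matrix
            if y[i] * (np.sum(alpha * y * gram[:, i]) + b) <= 0:
                alpha[i] += eta   # alpha_i <- alpha_i + η
                b += eta * y[i]   # b <- b + η y_i
                errors += 1
        if errors == 0:
            break
    w = (alpha * y) @ X  # recover w = Σ alpha_i y_i x_i
    return w, b, alpha

# hypothetical toy data, as before
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])
print(train_perceptron_dual(X, y))  # (array([1., 1.]), -3.0, array([2., 0., 5.]))

The point of the dual form is that the instances enter only through inner products, so the Gram matrix can be computed once and reused across all updates.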
Next in this series: 《统计学习方法》, Chapter 3, the k-nearest neighbor method: http://t.csdn.cn/wBQab
References
1. 《统计学习方法》 (Statistical Learning Methods), 2nd edition, Li Hang
2. 《scikit-learn机器学习》 (Mastering Machine Learning with scikit-learn), 2nd edition, Gavin Hackeling; translated by Zhang Haoran
3. Stanford machine learning lecture slides