A Detailed Walkthrough to Help You Reproduce Statistical Learning Methods -- Chapter 2: The Perceptron Model
2022-07-02 09:11:00 【Qigui】
Personal signature: The most important part of any building is its foundation; if the foundation is unstable, the whole structure shakes. Learning technology is the same: lay the foundation well. Follow me, and I will help you firm up the fundamentals of every topic.
Blog home page: Qigui's blog
Column: Statistical Learning Methods (2nd edition) -- personal notes
Creating content is not easy; if you pass by, don't forget the triple combo (comment, like, and favorite), and then follow!
I. The Perceptron Model
The perceptron is a linear model for binary classification: its input is the feature vector x of an instance and its output is the class of the instance. It is a discriminative model. The hypothesis space of the perceptron is the set of all linear classification models (linear classifiers) defined on the feature space, i.e. the set of functions

    { f | f(x) = w·x + b }
General form of the perceptron model:

    f(x) = sign(w·x + b)

where x represents the feature vector of an instance; w and b are the perceptron model parameters, with w ∈ R^n called the weight or weight vector and b ∈ R called the bias; w·x denotes the inner product of w and x; and sign is the sign function, namely

    sign(z) = +1 if z ≥ 0,  -1 if z < 0
The perceptron model corresponds to a separating hyperplane S in the feature space:

    w·x + b = 0

where w is the normal vector of the hyperplane and b is its intercept. This hyperplane divides the feature space into two parts, and the points (feature vectors) in the two parts are called positive and negative respectively. The input is the feature vector of an instance; the output is the class of the instance, taking the values +1 and -1. Hence the hyperplane S is called a separating hyperplane.
The perceptron uses a hyperplane to divide instances into positive and negative classes, but some data sets are not linearly separable, in which case no hyperplane can classify all instances correctly.
In general, we classify instance points by training a perceptron model. For example, to separate red beans and mung beans after they have been mixed, we need a hyperplane that divides the two classes, labeling them +1 and -1. The code below evaluates the perceptron's general form for a fixed w and b:
import numpy as np

def perceptron(x1, x2):
    x = np.array([x1, x2])    # feature vector
    w = np.array([0.3, 0.7])  # weights
    b = -0.3                  # bias
    f = np.sum(w * x) + b     # general form of the perceptron model
    # f = np.dot(w, x) + b    # equivalent inner-product form
    # classify the instance with the model
    if f >= 0:
        return 1              # positive class
    else:
        return -1             # negative class

# input feature vectors x
for x in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    y = perceptron(x[0], x[1])
    print(str(x) + '-->' + str(y))

Output:

(0, 0)-->-1
(1, 0)-->1
(0, 1)-->1
(1, 1)-->1

II. Perceptron Learning Strategies
1. Linear separability of data sets
If there exists some hyperplane S

    w·x + b = 0

that can divide the positive and negative instance points of a data set completely and correctly onto its two sides, i.e. for every instance i with y_i = +1 we have w·x_i + b > 0, and for every instance i with y_i = -1 we have w·x_i + b < 0, then the data set is called linearly separable; otherwise, the data set is linearly inseparable. In reality such data sets are ideal and may rarely exist.
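As a hedged illustration (the brute-force grid search and the AND/XOR labelings below are my own toy construction, not from the book): the four points {(0,0), (1,0), (0,1), (1,1)} with AND-style labels are linearly separable, while the XOR labeling is not; no single hyperplane classifies all four XOR points correctly.

import numpy as np

# Toy check: brute-force search over a small grid of (w1, w2, b)
# for a hyperplane that separates the data.
def linearly_separable(X, y, grid=np.linspace(-2, 2, 41)):
    for w1 in grid:
        for w2 in grid:
            for b in grid:
                f = np.sign(X @ np.array([w1, w2]) + b)
                f[f == 0] = 1            # treat points on the plane as positive
                if np.all(f == y):
                    return True
    return False

X = np.array([(0, 0), (1, 0), (0, 1), (1, 1)])
y_and = np.array([-1, -1, -1, 1])        # AND labels: linearly separable
y_xor = np.array([-1, 1, 1, -1])         # XOR labels: not linearly separable

print(linearly_separable(X, y_and))      # True
print(linearly_separable(X, y_xor))      # False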
2. Perceptron learning strategies
To find such a hyperplane, that is, to determine the perceptron model parameters w and b, we need a learning strategy: define a loss function and minimize it. A natural choice of loss function is the total number of misclassified points; another choice is the total distance from the misclassified points to the hyperplane S. The former is not a continuously differentiable function of the parameters w and b and is therefore hard to optimize. So, letting M denote the set of points misclassified by hyperplane S, we take the total distance from all misclassified points to S and drop the constant factor 1/||w||, which yields the loss function of perceptron learning. Given a training data set, the loss function

    L(w, b) = -Σ_{x_i ∈ M} y_i (w·x_i + b)

is a continuously differentiable function of w and b.
Minimize the loss function:

    min_{w,b} L(w, b) = -Σ_{x_i ∈ M} y_i (w·x_i + b)

where M is the set of misclassified points. This loss function is in fact the empirical risk function of perceptron learning. The strategy of perceptron learning is to select, in the hypothesis space, the model parameters w and b that minimize the loss function, which (up to the dropped factor) corresponds to the total distance from the misclassified points to the separating hyperplane.
The loss function is nonnegative: if there are no misclassified points, its value is 0, and the fewer the misclassified points and the closer they lie to the hyperplane, the smaller its value. For a particular sample point, the loss is a linear function of the parameters w and b when the point is misclassified, and 0 when it is correctly classified.
In short, the learning strategy of the perceptron is to minimize the loss function. To classify correctly, we need to find a separating hyperplane that divides the instance points completely and correctly, which means solving for the parameters w and b in w·x + b = 0, where x is the input feature vector. Since classification cannot be guaranteed to be completely correct from the start, there are errors; the learning strategy, i.e. the loss function, is what lets us minimize those errors by minimizing the loss.
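A minimal sketch of computing this loss (the data and parameter values below are my own illustrative assumptions, continuing the toy example above):

import numpy as np

# Empirical loss: L(w, b) = -sum over misclassified points of y_i * (w·x_i + b)
def perceptron_loss(X, y, w, b):
    margins = y * (X @ w + b)      # y_i * (w·x_i + b); non-positive when misclassified
    misclassified = margins <= 0   # points on the wrong side of (or on) the hyperplane
    return -np.sum(margins[misclassified])

X = np.array([(0, 0), (1, 0), (0, 1), (1, 1)], dtype=float)
y = np.array([-1, 1, 1, 1])
print(perceptron_loss(X, y, w=np.array([0.3, 0.7]), b=-0.3))   # 0.0 (no point strictly on the wrong side)
print(perceptron_loss(X, y, w=np.array([-0.3, -0.7]), b=0.3))  # 1.4 (every point misclassified)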
III. Perceptron Learning Algorithm
A learning algorithm for the perceptron model learns appropriate parameter values within their ranges so that, for a given input feature vector, the model's computed output (the predicted value) is as correct as possible.
The perceptron is an error-driven learning algorithm. If a prediction is correct, the algorithm goes on to predict the next instance; if the prediction is wrong, it updates the weights, i.e. w and b are updated.
1. The original form of the perceptron learning algorithm
The perceptron learning algorithm is driven by misclassification and uses stochastic gradient descent. First a hyperplane w_0, b_0 is selected arbitrarily; then gradient descent is used to keep minimizing the loss function toward its minimum. The minimization does not take the gradient over all the misclassified points in M at once; instead, a single misclassified point is randomly selected at a time and the parameters descend along its gradient. For a randomly selected misclassified point (x_i, y_i), w and b are updated as:

    w ← w + η y_i x_i
    b ← b + η y_i
import numpy as np

# initialize the parameters w, b
w = np.array([0.3, 0.7])  # weight w
b = -0.3                  # bias b

# set the learning rate η
learning_rate = 0.6

# update w, b for a misclassified point (x, y)
def update_weights(x, y, w, b):
    w = w + learning_rate * y * x
    b = b + learning_rate * y
    return w, b

Here η is the step size, also known as the learning rate. Usually, the learning algorithm controls how far the parameters move at each update by setting the learning rate. Through iteration we can expect the loss function to keep decreasing, until it reaches 0.
Explanation: when an instance point is misclassified, i.e. it lies on the wrong side of the separating hyperplane, the values of w and b are adjusted so that the separating hyperplane moves toward the misclassified point, reducing the distance between the point and the hyperplane, until the hyperplane passes the point and classifies it correctly.
To minimize the loss function, the method used is gradient descent: updating on misclassified points changes the parameter values w and b and thereby searches for the separating hyperplane. During the updates, a learning rate is set to limit how much the values of w and b change at each step.
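For example (a hypothetical misclassified point, continuing the toy values above), applying update_weights once moves the hyperplane toward the point:

# Suppose (1, 1) carries label y = -1 but the current model predicts +1,
# since 0.3*1 + 0.7*1 - 0.3 = 0.7 >= 0: the point is misclassified.
x, y = np.array([1.0, 1.0]), -1
w, b = update_weights(x, y, w, b)
print(w, b)  # [-0.3  0.1] -0.9; now w·x + b = -1.1 < 0, so (1, 1) is classified -1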
2. Convergence of the algorithm
Each full pass over all the training instances is called a training epoch. If the learning algorithm classifies all training instances correctly within one epoch, it has reached the convergence state. (A learning algorithm does not necessarily converge, so it needs a hyperparameter that specifies the maximum number of epochs it may run before terminating.)
When the training data set is linearly separable, the original form of the perceptron learning algorithm converges: after a finite number of iterations we obtain a separating hyperplane, and hence a perceptron model, that divides the training data completely and correctly. When the training data set is not linearly separable, the algorithm does not converge and the iterations oscillate. Moreover, with different initial values or a different order of visiting the misclassified points, the solution can differ; that is, the perceptron learning algorithm has many solutions, which depend both on the choice of initial values and on the order in which misclassified points are selected during iteration. To obtain a unique hyperplane, constraints must be added to the separating hyperplane.
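Putting the pieces together, here is a minimal sketch of the original form with a max_epochs hyperparameter guarding against non-convergence (the data, initialization, and stopping rule are my own illustrative choices):

import numpy as np

def train_perceptron(X, y, eta=0.6, max_epochs=100):
    w, b = np.zeros(X.shape[1]), 0.0
    for epoch in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:  # misclassified point
                w = w + eta * yi * xi          # w <- w + η y_i x_i
                b = b + eta * yi               # b <- b + η y_i
                errors += 1
        if errors == 0:                        # one clean epoch: converged
            return w, b
    return w, b                                # may not have converged

X = np.array([(0, 0), (1, 0), (0, 1), (1, 1)], dtype=float)
y = np.array([-1, 1, 1, 1])                    # linearly separable labels from above
w, b = train_perceptron(X, y)
print(w, b, np.sign(X @ w + b))                # predictions match the labels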
3. The dual form of the perceptron learning algorithm
The basic idea: represent w and b as linear combinations of the instances x_i and the labels y_i, and obtain w and b by solving for the coefficients α_i:

    w = Σ_i α_i y_i x_i,    b = Σ_i α_i y_i

The more times an instance point is updated, the closer it lies to the separating hyperplane and the harder it is to classify correctly; such instances have the greatest influence on the learning result. As with the original form, the dual form of the perceptron learning algorithm converges and admits multiple solutions.
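A hedged sketch of the dual form (a minimal implementation of my own under these update rules, precomputing the Gram matrix of inner products so that instances enter only through x_j·x_i):

import numpy as np

def train_perceptron_dual(X, y, eta=0.6, max_epochs=100):
    n = len(X)
    alpha, b = np.zeros(n), 0.0
    G = X @ X.T                                 # Gram matrix: G[j, i] = x_j · x_i
    for _ in range(max_epochs):
        errors = 0
        for i in range(n):
            # decision value: Σ_j α_j y_j (x_j · x_i) + b
            if y[i] * (np.sum(alpha * y * G[:, i]) + b) <= 0:
                alpha[i] += eta                 # α_i <- α_i + η
                b += eta * y[i]                 # b <- b + η y_i
                errors += 1
        if errors == 0:
            break
    w = (alpha * y) @ X                         # recover w = Σ α_i y_i x_i
    return w, b

X = np.array([(0, 0), (1, 0), (0, 1), (1, 1)], dtype=float)
y = np.array([-1, 1, 1, 1])
w, b = train_perceptron_dual(X, y)
print(w, b, np.sign(X @ w + b))                 # predictions match the labels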
Next in this series: Statistical Learning Methods -- Chapter 3, the k-nearest neighbor method: http://t.csdn.cn/wBQab
Data structures: 1800 practice questions, PDF (C language version): https://download.csdn.net/download/weixin_64215932/85253966
References
1. Statistical Learning Methods, 2nd edition, by Li Hang
2. scikit-learn Machine Learning, 2nd edition, by Gavin Hackeling, translated by Zhang Haoran
3. Stanford machine learning course slides (PPT)