当前位置：网站首页>Statistical learning method -- perceptron

Statistical learning method -- perceptron

2022-07-07 16:14:00 【_ Spring_】

Catalog

Keywords of perceptron
The principle of perceptron
Why can't we solve XOR (XOR) problem
Supplementary knowledge

Perceptron is one of the most basic models of machine learning , It's the basis of neural networks and support vector machines .

Keywords of perceptron

Two classification
Discriminant model 、 Linear model
Data sets are linearly separable , There are infinitely many solutions
Unable to resolve XOR problem

The principle of perceptron

The perceptron is based on the feature vector of the data instance x A linear classification model for its second class classification , Output is +1 or -1. The function of the perceptron is ：
$f (x) = s i g n (w \cdot x + b)$
among $w$ Is the weight , $b$ It's bias , $w \cdot x$ yes $w$ and $x$ Inner product , $s i g n$ It's a symbolic function , namely ,
$\begin{cases} +1 & x\ge 0\\ -1 & x <0 \end{cases}$
When wx+b Greater than 0 when , according to sign function , Output is 1, Corresponding to positive class ; When wx+b Less than 0, Output is -1, Corresponding negative class .
Geometric interpretation of perceptron ： linear equation $w \cdot x + b = 0$ Corresponding to a hyperplane in the feature space $S$ , This hyperplane divides the feature space into two parts , The point in both parts （ Eigenvector ） They're divided into positive 、 Negative two types of . hyperplane $S$ It is also called separating hyperplane .
A linear equation divides the feature space into two parts . In the two-dimensional feature space ,wx+b=0 That is, between one , Divide the plane into two parts , The point above the line is brought in wx+b The calculated value is greater than 0, Is a positive class , Corresponding y The value is +1, The point below the line is less than 0, Is a negative class , Corresponding y The value is -1.
The strategy of perceptron learning is to minimize the loss function Count ：
$min_w, _bL(w,b)=-\sum y_i(w·x_i+b), x_i \in M$
The loss function corresponds to the total distance from the misclassification point to the separation hyperplane .
The loss function here focuses on misclassification points , Not all points . The minimum loss function is zero , That is, all points are classified correctly . Therefore, there are infinite solutions .
Perceptron learning algorithm is an optimization algorithm of loss function based on random gradient descent method . In the original form , First select a hyperplane , Then the gradient descent method is used to continuously minimize the objective function , In the process , Randomly select one misclassification point at a time to make its gradient drop .
When the training data set is linearly separable [ Add 1], The perceptron learning algorithm is convergent , But there are infinite solutions , These solutions depend on the choice of local values , It also depends on the selection order of misclassification points in the iterative process .
If you want to get a unique hyperplane , We need to add constraints to the separation hyperplane . Refer to support vector machine

Why can't we solve XOR (XOR) problem

XOR problem is in binary operation , The same value is 0, The difference is 1.
Map the XOR problem to two-dimensional space , Can be expressed as ：
Insert picture description here

picture source ： https://www.jianshu.com/p/853ebc9e69f6

In this two-dimensional space , We can't find a straight line to divide it into two categories . That is to say, it is impossible to use the perceptron model to X To the side of the straight line , At the same time O To the other side of the line . So the perceptron can't solve the XOR problem .

Supplementary knowledge

Add 1： Linear separability of data sets
Given a dataset
$T=\{(x_1, y_1), (x_2, y_2),…, (x_N, y_N)\}$
among , $x_i \in X = R^n$ , $y_i \in Y=\{+1,-1\}$ , $i = 1, 2, \dots, N$ , If I have some hyperplane $S$
$w \cdot x + b = 0$ The ability to partition the positive and negative instance points of a data set exactly to either side of a hyperplane , For all $y_i=+1$ Example $i$ , Yes $w·x_i+b>0$ , For all $y_i=-1$ Example $i$ , Yes $w·x_i+b<0$ , Is called a data set $T$ Is a linear fractional data set （Linearly separable data set）; otherwise , According to the data set $T$ The line shape is inseparable .

Recommended reading :
15 Minutes to thoroughly understand the perceptron （ Explain in detail + Code implementation ）
be based on sklearn Our perceptron python Code implementation

原网站

版权声明
本文为[_ Spring_]所创，转载请带上原文链接，感谢
https://yzsam.com/2022/188/202207071353002375.html