
Statistical Learning Methods -- The Perceptron

2022-07-07 16:14:00 _ Spring_


The perceptron is one of the most basic models in machine learning; it is the foundation of neural networks and support vector machines.

Keywords of the perceptron

  • Binary classification
  • Discriminative model, linear model
  • When the data set is linearly separable, there are infinitely many solutions
  • Cannot solve the XOR problem

The principle of the perceptron

  1. The perceptron is a linear classification model that performs binary classification on the feature vector $x$ of a data instance; its output is +1 or -1. The perceptron function is:
    $f(x) = \mathrm{sign}(w \cdot x + b)$
    where $w$ is the weight vector, $b$ is the bias, $w \cdot x$ is the inner product of $w$ and $x$, and $\mathrm{sign}$ is the sign function, i.e.,
    $\mathrm{sign}(x) = \begin{cases} +1 & x \ge 0 \\ -1 & x < 0 \end{cases}$
    When $w \cdot x + b$ is greater than 0, the sign function outputs +1, corresponding to the positive class; when $w \cdot x + b$ is less than 0, the output is -1, corresponding to the negative class. A minimal sketch of this decision function is given below.
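
A minimal sketch of this decision function in Python (the variable and function names here are illustrative, not from the original text):

```python
import numpy as np

def perceptron_predict(x, w, b):
    """Perceptron decision function f(x) = sign(w·x + b).

    Returns +1 when w·x + b >= 0 and -1 otherwise,
    following the convention sign(0) = +1 used above.
    """
    return 1 if np.dot(w, x) + b >= 0 else -1
```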

  2. Geometric interpretation of the perceptron: the linear equation $w \cdot x + b = 0$ corresponds to a hyperplane $S$ in the feature space. This hyperplane divides the feature space into two parts, and the points (feature vectors) in the two parts are classified as positive and negative respectively. The hyperplane $S$ is also called the separating hyperplane.
    A linear equation splits the feature space into two parts. In a two-dimensional feature space, $w \cdot x + b = 0$ is a straight line that divides the plane in two: a point above the line, substituted into $w \cdot x + b$, gives a value greater than 0 and is therefore a positive example with $y = +1$; a point below the line gives a value less than 0 and is a negative example with $y = -1$. A concrete check is sketched below.
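
As a concrete check (the line $x_1 + x_2 - 1 = 0$, i.e. $w = (1, 1)$, $b = -1$, and the two test points are assumed here purely for illustration):

```python
import numpy as np

# Illustrative line in 2D: x1 + x2 - 1 = 0, i.e. w = (1, 1), b = -1.
w, b = np.array([1.0, 1.0]), -1.0

points = np.array([[2.0, 2.0],   # lies above the line
                   [0.0, 0.0]])  # lies below the line

for x in points:
    score = np.dot(w, x) + b
    label = 1 if score >= 0 else -1
    print(x, "w·x + b =", score, "-> class", label)   # 3.0 -> +1, -1.0 -> -1
```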

  3. The perceptron learning strategy is to minimize the loss function:
    $\min_{w,b} L(w,b) = -\sum_{x_i \in M} y_i (w \cdot x_i + b)$, where $M$ is the set of misclassified points.
    The loss function corresponds to the total distance from the misclassified points to the separating hyperplane (with the factor $1/\|w\|$ of the geometric distance dropped).
    The loss function is computed only over the misclassified points, not over all points. Its minimum value is zero, attained when every point is classified correctly; since any separating hyperplane achieves zero loss, there are infinitely many solutions. A small sketch of this loss is given below.
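
A small sketch of this loss (the helper name is hypothetical; a point counts as misclassified when $y_i (w \cdot x_i + b) \le 0$):

```python
import numpy as np

def perceptron_loss(X, y, w, b):
    """L(w, b) = -sum over misclassified points of y_i * (w·x_i + b),
    where a point is misclassified when y_i * (w·x_i + b) <= 0."""
    margins = y * (X @ w + b)      # functional margin of every point
    mis = margins <= 0             # mask of misclassified points
    return -np.sum(margins[mis])   # zero when every point is classified correctly
```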

  4. The perceptron learning algorithm optimizes this loss function by stochastic gradient descent. In the original form, an initial hyperplane is chosen first (typically $w_0 = 0$, $b_0 = 0$), and the objective is then minimized by gradient descent: at each step one misclassified point is selected at random and the parameters are moved along its negative gradient, $w \leftarrow w + \eta\, y_i x_i$, $b \leftarrow b + \eta\, y_i$, where $\eta$ is the learning rate. A sketch of this loop is given below.
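
A sketch of the original-form algorithm under the assumptions above (learning rate $\eta = 1$ by default; the function name is illustrative):

```python
import numpy as np

def perceptron_train(X, y, eta=1.0, max_iter=1000, seed=0):
    """Original form of the perceptron algorithm: start from w = 0, b = 0 and,
    while misclassified points remain, pick one at random and update
    w <- w + eta * y_i * x_i,  b <- b + eta * y_i."""
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(max_iter):
        mis = np.flatnonzero(y * (X @ w + b) <= 0)   # indices of misclassified points
        if mis.size == 0:                            # all points separated: done
            break
        i = rng.choice(mis)                          # random misclassified point
        w = w + eta * y[i] * X[i]
        b = b + eta * y[i]
    return w, b
```

For a linearly separable data set this loop terminates with some separating hyperplane; different seeds (i.e. different orders of misclassified points) can return different hyperplanes, which is exactly the non-uniqueness noted in item 5 below.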

  5. When the training data set is linearly separable [Note 1], the perceptron learning algorithm converges, but it has infinitely many solutions; which solution is obtained depends on the choice of the initial values and on the order in which misclassified points are selected during the iterations.
    To obtain a unique separating hyperplane, additional constraints must be imposed on it; see support vector machines.

Why the perceptron cannot solve the XOR problem

XOR is a binary operation: when the two inputs are equal the result is 0, and when they differ the result is 1.
Mapping the XOR problem into two-dimensional space, it can be represented as:
[Figure: the four XOR inputs plotted in the plane, with (0,0) and (1,1) in one class and (0,1) and (1,0) in the other]

Image source: https://www.jianshu.com/p/853ebc9e69f6

In this two-dimensional space, no straight line can split the points into the two classes; that is, no perceptron model can put the X points on one side of a line and the O points on the other at the same time. Therefore the perceptron cannot solve the XOR problem. A quick numerical check follows.
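
As a quick numerical check (using scikit-learn's Perceptron class here; any linear perceptron implementation would show the same thing):

```python
import numpy as np
from sklearn.linear_model import Perceptron

# The four XOR points: label 1 when the two inputs differ, 0 when they are equal.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

clf = Perceptron(max_iter=1000, tol=None, random_state=0)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))   # never reaches 1.0: no line separates XOR
```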

Supplementary knowledge

  • Note 1: Linear separability of data sets
    Given a data set
    $T = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$
    where $x_i \in \mathcal{X} = \mathbb{R}^n$, $y_i \in \mathcal{Y} = \{+1, -1\}$, $i = 1, 2, \dots, N$, if there exists a hyperplane $S$
    $w \cdot x + b = 0$
    that divides the positive and negative instance points of the data set exactly onto the two sides of the hyperplane, that is, $w \cdot x_i + b > 0$ for every instance $i$ with $y_i = +1$ and $w \cdot x_i + b < 0$ for every instance $i$ with $y_i = -1$, then $T$ is called a linearly separable data set; otherwise, $T$ is linearly inseparable.
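
A small sketch of this definition (the helper name and the example data are hypothetical): a candidate hyperplane $(w, b)$ separates the data set exactly when every point has a strictly positive functional margin.

```python
import numpy as np

def separates(X, y, w, b):
    """True if the hyperplane w·x + b = 0 puts every positive point (y_i = +1) on its
    positive side and every negative point (y_i = -1) on its negative side,
    i.e. y_i * (w·x_i + b) > 0 for all i."""
    return bool(np.all(y * (X @ w + b) > 0))

# Example: an AND-like data set is linearly separable by w = (1, 1), b = -1.5.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, +1])
print(separates(X, y, np.array([1.0, 1.0]), -1.5))   # True
```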

Recommended reading:

  1. Understand the perceptron thoroughly in 15 minutes (detailed explanation + code implementation)
  2. Python implementation of the perceptron based on sklearn

Copyright notice
This article was written by [_ Spring_]; please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/188/202207071353002375.html