Statistical learning method -- perceptron
2022-07-07 16:14:00 【_ Spring_】
The perceptron is one of the most basic models in machine learning. It is the foundation of neural networks and support vector machines.
Key points about the perceptron
- Binary classification
- Discriminative model, linear model
- When the data set is linearly separable, there are infinitely many solutions
- Cannot solve the XOR problem
The principle of the perceptron
The perceptron is a linear classification model that assigns a data instance to one of two classes based on its feature vector $x$. The output is +1 or -1. The perceptron function is:
$$f(x) = \mathrm{sign}(w \cdot x + b)$$
where $w$ is the weight vector, $b$ is the bias, $w \cdot x$ is the inner product of $w$ and $x$, and $\mathrm{sign}$ is the sign function, namely
$$\mathrm{sign}(x) = \begin{cases} +1 & x \ge 0 \\ -1 & x < 0 \end{cases}$$
When $w \cdot x + b \ge 0$, the sign function outputs +1, which corresponds to the positive class. When $w \cdot x + b < 0$, the output is -1, which corresponds to the negative class.
Geometric interpretation of the perceptron: the linear equation $w \cdot x + b = 0$ corresponds to a hyperplane $S$ in the feature space. This hyperplane divides the feature space into two parts, and the points (feature vectors) in the two parts are classified as positive and negative respectively. The hyperplane $S$ is therefore also called the separating hyperplane.
In a two-dimensional feature space, for example, $w \cdot x + b = 0$ is a straight line that divides the plane into two parts. Substituting a point on one side of the line (say, above it) into $w \cdot x + b$ gives a value greater than 0, so the point belongs to the positive class and its $y$ value is +1. A point on the other side gives a value less than 0, so it belongs to the negative class and its $y$ value is -1.
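The decision function described above can be sketched in a few lines of NumPy. This is only an illustration: the hyperplane $x_1 + x_2 - 3 = 0$ and the three sample points are made-up values, and the names `sign` and `predict` are chosen here for readability.

```python
import numpy as np

def sign(z):
    """Sign function as defined in the text: +1 when z >= 0, otherwise -1."""
    return np.where(z >= 0, 1, -1)

def predict(X, w, b):
    """Perceptron decision function f(x) = sign(w·x + b), applied to each row of X."""
    return sign(X @ w + b)

# Hypothetical hyperplane x1 + x2 - 3 = 0 in a two-dimensional feature space.
w = np.array([1.0, 1.0])
b = -3.0

# Three illustrative points: two on the positive side, one on the negative side.
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
labels = predict(X, w, b)  # w·x + b is 3, 4 and -1, so the labels are +1, +1, -1
```

A point lying exactly on the hyperplane is assigned +1 here, because the sign function in the text maps 0 to +1.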
The learning strategy of the perceptron is to minimize the loss function:
$$\min_{w,b} L(w,b) = -\sum_{x_i \in M} y_i (w \cdot x_i + b)$$
where $M$ is the set of misclassified points.
The loss function corresponds to the total distance from the misclassified points to the separating hyperplane (the constant factor $1/\|w\|$ is dropped). It only involves misclassified points, not all points. Its minimum value is zero, which is reached when every point is classified correctly. Because many different hyperplanes can reach this minimum, there are infinitely many solutions.
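To make the loss concrete, here is a minimal sketch that evaluates $L(w,b)$ for a given hyperplane. The function name `perceptron_loss`, the toy data and the two candidate hyperplanes are assumptions for illustration. A point is treated as misclassified when $y_i(w \cdot x_i + b) \le 0$.

```python
import numpy as np

def perceptron_loss(X, y, w, b):
    """L(w, b) = -sum of y_i (w·x_i + b) over the misclassified points only."""
    margins = y * (X @ w + b)
    misclassified = margins <= 0
    return float(np.sum(-margins[misclassified]))

# Hypothetical toy data: two positive points and one negative point.
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])

# A hyperplane that separates the data: no misclassified points, so the loss is 0.
loss_good = perceptron_loss(X, y, np.array([1.0, 1.0]), -3.0)

# The same hyperplane with flipped orientation misclassifies every point:
# the margins are -3, -4 and -1, so the loss is 3 + 4 + 1 = 8.
loss_bad = perceptron_loss(X, y, np.array([-1.0, -1.0]), 3.0)
```

Correctly classified points contribute nothing, which is why the loss reaches zero as soon as every point is on the right side, no matter how far from the hyperplane the points are.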
The perceptron learning algorithm optimizes the loss function with stochastic gradient descent. In the original form, an initial hyperplane is chosen arbitrarily, and gradient descent is then used to keep reducing the objective function. In each step, one misclassified point is selected at random and the parameters are updated along the negative gradient for that point.
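The original form of the algorithm can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the learning rate `eta`, the zero initialization, the cap `max_updates`, the random seed and the toy data are all choices made here. The update for a misclassified point $(x_i, y_i)$ is $w \leftarrow w + \eta y_i x_i$, $b \leftarrow b + \eta y_i$, which follows from the gradient of the loss with respect to $w$ and $b$.

```python
import numpy as np

def train_perceptron(X, y, eta=1.0, max_updates=1000, seed=0):
    """Original-form perceptron: update on one randomly chosen misclassified point at a time."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])  # initial hyperplane (assumed: all zeros)
    b = 0.0
    for _ in range(max_updates):
        # A point is misclassified when y_i (w·x_i + b) <= 0.
        misclassified = np.flatnonzero(y * (X @ w + b) <= 0)
        if misclassified.size == 0:
            return w, b  # every point is on the correct side
        i = rng.choice(misclassified)
        # Stochastic gradient step for the selected point.
        w = w + eta * y[i] * X[i]
        b = b + eta * y[i]
    raise RuntimeError("no separating hyperplane found within max_updates")

# Hypothetical linearly separable toy data.
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])
w, b = train_perceptron(X, y)
```

The cap on the number of updates is only a safeguard: on data that is not linearly separable the loop would otherwise never stop.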
When the training data set is linearly separable [Add 1], the perceptron learning algorithm converges. However, there are infinitely many solutions, and which one is found depends on the choice of initial values and on the order in which misclassified points are selected during the iterations.
To obtain a unique hyperplane, additional constraints must be imposed on the separating hyperplane. See support vector machines.
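The dependence on the selection order can be seen on a small example. The sketch below replaces random selection with a fixed priority order so that the run is reproducible; the function name `train_with_order`, the toy data, the zero initialization and `eta = 1` are assumptions for illustration.

```python
import numpy as np

def train_with_order(X, y, order, eta=1.0, max_updates=1000):
    """Perceptron that always updates on the first misclassified point in `order`."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_updates):
        # Misclassified indices, listed in the given priority order.
        misclassified = [i for i in order if y[i] * (X[i] @ w + b) <= 0]
        if not misclassified:
            return w, b
        i = misclassified[0]
        w = w + eta * y[i] * X[i]
        b = b + eta * y[i]
    raise RuntimeError("no separating hyperplane found within max_updates")

# Hypothetical toy data: two positive points and one negative point.
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])

w_a, b_a = train_with_order(X, y, order=[0, 1, 2])  # ends at w = (1, 1), b = -3
w_b, b_b = train_with_order(X, y, order=[2, 1, 0])  # ends at w = (1, 0), b = -2
```

Both results separate the three points, yet they are different hyperplanes: $x_1 + x_2 - 3 = 0$ in the first run and $x_1 - 2 = 0$ in the second.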
Why the perceptron cannot solve the XOR problem
XOR is a binary operation: the result is 0 when the two inputs are the same and 1 when they differ.
Mapped onto a two-dimensional plane, the XOR problem can be drawn as follows:
[Figure: the four XOR input points in the plane, with the two classes marked X and O]
Picture source: https://www.jianshu.com/p/853ebc9e69f6
In this two-dimensional space, no straight line can divide the points into the two classes. In other words, no perceptron model can put the X points on one side of a line and the O points on the other side at the same time. So the perceptron cannot solve the XOR problem.
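The claim can also be checked algebraically. In the short derivation below, the assignment of XOR output 1 to the positive class and output 0 to the negative class is an assumption made for illustration; swapping the two classes reverses every inequality and leads to the same contradiction. Suppose a line $w_1 x_1 + w_2 x_2 + b = 0$ separated the four points. Then:

$$
\begin{aligned}
(0,0) \mapsto -1 &: \quad b < 0 \\
(1,1) \mapsto -1 &: \quad w_1 + w_2 + b < 0 \\
(0,1) \mapsto +1 &: \quad w_2 + b > 0 \\
(1,0) \mapsto +1 &: \quad w_1 + b > 0
\end{aligned}
$$

Adding the first two inequalities gives $w_1 + w_2 + 2b < 0$, while adding the last two gives $w_1 + w_2 + 2b > 0$. Both cannot hold, so no such $w_1$, $w_2$, $b$ exist.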
Supplementary knowledge
- Add 1: Linear separability of data sets
Given a data set
$$T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$$
where $x_i \in \mathcal{X} = \mathbb{R}^n$, $y_i \in \mathcal{Y} = \{+1, -1\}$, $i = 1, 2, \ldots, N$, if there exists a hyperplane $S$
$$w \cdot x + b = 0$$
that places the positive and negative instance points of the data set entirely on opposite sides of the hyperplane, that is, $w \cdot x_i + b > 0$ for every instance $i$ with $y_i = +1$ and $w \cdot x_i + b < 0$ for every instance $i$ with $y_i = -1$, then the data set $T$ is called a linearly separable data set. Otherwise, $T$ is said to be linearly inseparable.
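The definition translates directly into a check for one candidate hyperplane. The sketch below is illustrative only: the function name `separates`, the toy data and the two candidate hyperplanes are assumptions. It tests a single given $(w, b)$; it does not decide whether some separating hyperplane exists.

```python
import numpy as np

def separates(X, y, w, b):
    """True if w·x_i + b > 0 for every y_i = +1 and w·x_i + b < 0 for every y_i = -1."""
    return bool(np.all(y * (X @ w + b) > 0))

# Hypothetical toy data: two positive points and one negative point.
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])

ok = separates(X, y, np.array([1.0, 1.0]), -3.0)      # True: all three points are on the correct side
not_ok = separates(X, y, np.array([1.0, 0.0]), -5.0)  # False: both positive points get negative values
```

The two conditions of the definition collapse into the single test $y_i (w \cdot x_i + b) > 0$, because multiplying by $y_i = -1$ reverses the inequality for the negative class.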