AI zhetianchuan DL regression and classification
2022-07-26 17:48:00 【Teacher, I forgot my homework】
This post mainly introduces logistic regression and softmax regression.

I. Review of regression and classification

Given a set of data points $\{x_1, \dots, x_N\}$ and the corresponding labels $\{t_1, \dots, t_N\}$, we want to predict the label of a new data point $x$; the goal is to find a mapping $f: X \to T$:

- If $T$ is a continuous set, the task is called regression.
- If $T$ is a discrete set, the task is called classification.

Polynomial regression

Consider a regression problem in which the input $x$ and the output $y$ are both scalars. We look for a function

$f(x) = w_0 + w_1 x + w_2 x^2 + \dots + w_M x^M$

to fit the data. Whether the regression is linear or nonlinear, we usually choose some cost function, such as the mean squared error (MSE), as the loss function, and minimize it to determine the parameters of $f$.
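As a minimal sketch of the above (assuming NumPy; the degree $M = 3$ and the toy data are purely illustrative), a polynomial can be fit by minimizing the MSE with `np.polyfit`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: noisy samples of y = 1 + 2x - 3x^3 (illustrative).
x = np.linspace(-1.0, 1.0, 50)
y = 1.0 + 2.0 * x - 3.0 * x**3 + 0.05 * rng.standard_normal(50)

# Fit f(x) = w0 + w1 x + w2 x^2 + w3 x^3 by least squares,
# i.e. by minimizing the mean squared error.
w = np.polyfit(x, y, deg=3)       # coefficients, highest degree first
f = np.poly1d(w)

mse = np.mean((f(x) - y) ** 2)    # small training error on this toy data
```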
Linear regression

$f$ is linear:

$f(x) = w^T x + b$

where $b$ (the bias / error term) can be absorbed into $w$ by appending a constant feature $1$ to $x$, giving $f(x) = w^T x$.

- Take the mean squared error (MSE) as the cost function:

$E(w) = \frac{1}{2} \sum_{n=1}^{N} \left( f(x_n) - t_n \right)^2$

- Find the best $w$ and $b$ by minimizing the cost function, e.g. with the least squares method or gradient descent.
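A sketch of the closed-form least-squares solution, with the bias absorbed into $w$ via a constant feature (NumPy assumed; the toy data and true coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: t = 2*x1 - x2 + 0.5 plus a little noise (illustrative).
X = rng.standard_normal((100, 2))
t = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 + 0.01 * rng.standard_normal(100)

# Absorb the bias b into w by appending a constant-1 feature to x.
Xb = np.hstack([X, np.ones((100, 1))])

# Least squares: w = argmin_w ||Xb w - t||^2 (equivalently, minimize MSE).
w, *_ = np.linalg.lstsq(Xb, t, rcond=None)
```

The recovered `w` is approximately `[2.0, -1.0, 0.5]`, the last entry being the absorbed bias.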
(See also: AI zhetianchuan ML - Introduction to regression analysis.)
Binary classification by regression

In feature space, a linear classifier corresponds to a hyperplane:

$w^T x + b = 0$

Two typical linear classifiers:

- the perceptron
- the SVM (see: AI zhetianchuan ML - SVM introduction)

Recall the distinction:

- Regression predicts a continuous target.
- Classification predicts a discrete label.
Binary classification using linear regression:

- Assume $t \in \{0, 1\}$ and consider the case of one-dimensional features: fit $f(x) = wx + b$ to the labels, then classify by thresholding, e.g. predict class 1 when $f(x) \ge 0.5$.
- Assume $t \in \{0, 1\}$ and consider the case of high-dimensional features: fit $f(x) = w^T x$ and threshold in the same way.
Binary classification using nonlinear regression

$f$ can be a nonlinear function, for example built from the logistic sigmoid function:

$h(z) = \frac{1}{1 + e^{-z}}$

We can train this nonlinear regression model just as we train a linear regression model; the original $f(x) = w^T x$ simply becomes $f(x) = h(w^T x)$.
Note: here $h$ is a function such as the logistic sigmoid.
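A one-line sketch of the logistic sigmoid (NumPy assumed):

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid h(z) = 1 / (1 + exp(-z)), mapping any score into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# f(x) = h(w^T x): the linear score w^T x is squashed into (0, 1).
```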
Looking at the problem from a probabilistic perspective

Suppose the label follows a normal distribution with mean $f(x)$; then its maximum likelihood estimation is equivalent to minimizing

$E(w) = \frac{1}{2} \sum_{n=1}^{N} \left( f(x_n) - t_n \right)^2$

- For regression problems ($t$ is continuous), the normal-distribution assumption is natural.
- For classification problems ($t$ is discrete), the normal-distribution assumption would be strange.
- There is a more suitable assumption for the data distribution of the binary classification problem: the Bernoulli distribution.

Why is the Bernoulli distribution more suitable for binary classification problems?
II. Logistic Regression

For a binary task, a single 0-1 unit is enough to represent a label: $t \in \{0, 1\}$.

We try to learn the conditional probability (with $b$ already absorbed into $w$; $x$ is the input and $t$ the label):

$p(t = 1 \mid x) = h(w^T x) = \frac{1}{1 + e^{-w^T x}}$

Our goal is to find a value of $w$ such that the probability $p(t = 1 \mid x)$

- takes a large value, such as 0.99999, when $x$ belongs to class 1;
- takes a small value, such as 0.00001, when $x$ belongs to class 2 (so that $p(t = 0 \mid x)$ takes a large value).

In essence, we are using another continuous function $h$ to "regress" a discrete mapping ($x \to t$).
Cross-entropy error function

For the Bernoulli distribution, maximizing the conditional data likelihood is equivalent to minimizing

$E(w) = -\sum_{n=1}^{N} \left[ t_n \ln h_n + (1 - t_n) \ln (1 - h_n) \right], \quad h_n = h(w^T x_n)$

This gives a new loss function, the cross-entropy error. Taking out a single term $E_n = -\left[ t \ln h + (1 - t) \ln (1 - h) \right]$:

- if $t = 1$, then $E = -\ln(h)$, which is small when $h$ is near 1 and grows without bound as $h \to 0$;
- if $t = 0$, then $E = -\ln(1 - h)$, which behaves symmetrically.

Both cases heavily penalize confident wrong predictions.
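The per-term behavior above can be checked with a small sketch (NumPy assumed; the clipping constant is an implementation detail added here to guard against $\log 0$):

```python
import numpy as np

def binary_cross_entropy(h, t):
    """E = -[t ln h + (1 - t) ln(1 - h)], averaged over the samples."""
    h = np.clip(h, 1e-12, 1.0 - 1e-12)   # guard against log(0)
    return -np.mean(t * np.log(h) + (1.0 - t) * np.log(1.0 - h))
```

For a single sample with $t = 1$ this reduces to $-\ln(h)$, so a confident wrong prediction ($h = 0.01$) costs far more than a confident right one ($h = 0.99$).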
Training and testing

Summary of classification problems

III. Softmax Regression

We discussed binary classification with one-dimensional and multi-dimensional features above. For multi-class classification, we simply use one function per class, i.e. as many functions as there are classes.

As in the figure above, for an input $x$ the three functions might output 1.2, 4.1 and 1.9; these scores can then be used for regression or classification in subsequent steps. The functions may be linear or nonlinear, as in logistic regression.

Choosing the mean squared error (MSE) as the loss function, we can use the least squares method or gradient descent to compute the parameters.
Representation of label categories

For a classification problem, i.e. a mapping $f$ whose output lies in a discrete set, we have two ways to represent labels:

1. an integer index, $t \in \{1, 2, \dots, K\}$;
2. a one-hot vector, e.g. $t = (0, 0, 1, 0)$ for class 3 when $K = 4$.

With the first method there is an artificial distance relationship between categories, so we usually use the second representation, in which each dimension takes only the two values 0 and 1.

To classify, we only need to see which class's one-hot vector is closest to the output.
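A sketch of the one-hot representation and the nearest-vector rule (NumPy assumed; note that "closest one-hot vector" reduces to an argmax, as the comment explains):

```python
import numpy as np

def one_hot(t, K):
    """Map integer labels to one-hot rows, e.g. 2 -> [0, 0, 1, 0] for K = 4."""
    t = np.asarray(t)
    T = np.zeros((t.size, K))
    T[np.arange(t.size), t] = 1.0
    return T

def classify(outputs):
    """Pick, for each output row y, the class whose one-hot vector is closest.
    Minimizing ||y - e_k||^2 = ||y||^2 - 2*y_k + 1 over k is the same as
    maximizing y_k, so this is simply an argmax over the output dimensions."""
    return np.argmax(outputs, axis=1)
```

For the example scores above, `classify(np.array([[1.2, 4.1, 1.9]]))` picks class index 1.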
From the perspective of probability:

As mentioned above, the Bernoulli distribution is more suitable for binary tasks, which is why we introduced logistic regression. When facing multi-class tasks ($K > 2$), we choose the multinoulli (categorical) distribution.

Review: the multinoulli / categorical distribution assigns probability $p(t_k = 1) = \mu_k$ to class $k$, with $\mu_k \ge 0$ and $\sum_{k=1}^{K} \mu_k = 1$.
Learning the distribution:

- Let $p(t_k = 1 \mid x)$ take the form $h_k(x)$. Clearly, $h_k(x) \ge 0$, and $\sum_{k=1}^{K} h_k(x) = 1$.
- Given a test input $x$, for each $k = 1, 2, \dots, K$, estimate $p(t_k = 1 \mid x)$:
  - when $x$ belongs to the $k$-th class, it should take a large value;
  - when $x$ belongs to another class, it should take a small value.
- Because $p(t_k = 1 \mid x)$ is a (continuous) probability, we need to convert it into discrete values that match the classification.
Softmax function

The following function is called the softmax function:

$h_k(x) = \frac{\exp(w_k^T x)}{\sum_{j=1}^{K} \exp(w_j^T x)}$

- If $w_k^T x \gg w_j^T x$ holds for all $j \ne k$, then $h_k(x) \approx 1$, although its value is still less than 1.
- If $w_k^T x \ll w_j^T x$ holds for all $j \ne k$, then $h_k(x) \approx 0$.
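A sketch of the softmax (NumPy assumed; subtracting the row maximum first is a standard numerical-stability trick not mentioned in the text, and it does not change the output):

```python
import numpy as np

def softmax(z):
    """Softmax along the last axis.  The max-subtraction is only for
    numerical stability; the result is mathematically unchanged."""
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

p = softmax(np.array([1.2, 4.1, 1.9]))   # the example scores from above
```

The outputs sum to 1, the largest score wins, and every entry stays strictly below 1.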
Again, maximizing the conditional likelihood yields the cross-entropy error function:

$E(W) = -\sum_{n=1}^{N} \sum_{k=1}^{K} t_{nk} \ln h_k(x_n)$

Note: for each sample $n$, only one term of the inner sum is nonzero, because $t_n$ is one-hot, e.g. $(0, 0, 0, 1, 0, 0)$.
Computing the gradient

$\frac{\partial E}{\partial w_j} = \sum_{n=1}^{N} \left( h_j(x_n) - t_{nj} \right) x_n$

Vector-matrix form

$\nabla_W E = X^T (H - T)$

where the rows of $X$, $H$ and $T$ are the inputs $x_n^T$, the predictions $h(x_n)^T$ and the one-hot labels $t_n^T$.
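The matrix-form gradient can be checked against a finite-difference approximation; a sketch (NumPy assumed; the sizes are illustrative):

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)   # numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def loss(W, X, T):
    """Cross-entropy E(W) = -sum_n sum_k t_nk ln h_k(x_n)."""
    return -np.sum(T * np.log(softmax(X @ W)))

def grad(W, X, T):
    """Matrix-form gradient: X^T (H - T)."""
    return X.T @ (softmax(X @ W) - T)

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 3))
T = np.eye(4)[rng.integers(0, 4, size=5)]  # one-hot labels, K = 4
W = rng.standard_normal((3, 4))

G = grad(W, X, T)

# Central finite difference on one entry of W.
eps, i, j = 1e-6, 1, 2
Wp, Wm = W.copy(), W.copy()
Wp[i, j] += eps
Wm[i, j] -= eps
num = (loss(Wp, X, T) - loss(Wm, X, T)) / (2 * eps)
```

The analytic entry `G[1, 2]` and the numerical estimate `num` agree to several decimal places.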
Training and testing

Stochastic gradient descent

Minimizing the cost function over the whole training set is computationally expensive, so we usually divide the training set into smaller subsets, or minibatches, optimize the cost function on a single minibatch $(x_i, y_i)$ at a time, and take the average.
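A sketch of minibatch SGD for softmax regression on toy Gaussian-blob data (NumPy assumed; the blob means, learning rate, batch size and epoch count are illustrative choices):

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)   # numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

rng = np.random.default_rng(3)

# Toy 3-class data: one Gaussian blob per class (illustrative means).
N, K = 300, 3
means = np.array([[0.0, 4.0], [4.0, 0.0], [-4.0, -4.0]])
labels = rng.integers(0, K, size=N)
X = means[labels] + rng.standard_normal((N, 2))
Xb = np.hstack([X, np.ones((N, 1))])       # absorb the bias as a constant feature
T = np.eye(K)[labels]                      # one-hot targets

W = np.zeros((3, K))
lr, batch = 0.1, 32
for epoch in range(50):
    order = rng.permutation(N)             # reshuffle each epoch
    for s in range(0, N, batch):
        idx = order[s:s + batch]
        H = softmax(Xb[idx] @ W)
        # Average minibatch gradient: X^T (H - T) / |minibatch|
        W -= lr * Xb[idx].T @ (H - T[idx]) / len(idx)

acc = np.mean(np.argmax(Xb @ W, axis=1) == labels)
```

On these well-separated blobs the training accuracy ends up close to 1.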

Introducing the bias

So far we have assumed $f_k(x) = w_k^T x$, where a constant feature $1$ has been appended to $x$. Sometimes the bias term is instead written explicitly,

$f_k(x) = w_k^T x + b_k$

so that the parameters become $\{W, b\}$. Regularization is usually applied only to $w$, not to the bias $b$.
Softmax over-parameterization

Suppose we have parameters $\{w_k\}$. The new parameters $\{w_k - \psi\}$, obtained by subtracting the same vector $\psi$ from every $w_k$, give exactly the same predictions, because

$\frac{\exp\left( (w_k - \psi)^T x \right)}{\sum_{j=1}^{K} \exp\left( (w_j - \psi)^T x \right)} = \frac{\exp(w_k^T x)\, \exp(-\psi^T x)}{\sum_{j=1}^{K} \exp(w_j^T x)\, \exp(-\psi^T x)} = h_k(x)$

Hence minimizing the cross-entropy function has infinitely many solutions.
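This invariance can be verified numerically; a sketch (NumPy assumed; $\psi$ is an arbitrary random shift):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

rng = np.random.default_rng(4)
W = rng.standard_normal((3, 4))    # column k holds the class weights w_k
x = rng.standard_normal(3)
psi = rng.standard_normal(3)       # arbitrary shift vector

h_old = softmax(x @ W)
h_new = softmax(x @ (W - psi[:, None]))   # subtract psi from every w_k
```

The two probability vectors coincide, confirming that the shifted parameters are an equally good solution.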
IV. The relationship between softmax regression and logistic regression

In softmax regression, set $K = 2$:

$h_1(x) = \frac{\exp(w_1^T x)}{\exp(w_1^T x) + \exp(w_2^T x)} = \frac{1}{1 + \exp\left( -(w_1 - w_2)^T x \right)} = g\left( (w_1 - w_2)^T x \right)$

where $h$ is the softmax function and $g$ is the logistic function. If we define the new variable $w = w_1 - w_2$, this is exactly logistic regression.
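The $K = 2$ equivalence can be checked numerically (NumPy assumed; the weight vectors and input are random):

```python
import numpy as np

rng = np.random.default_rng(5)
w1, w2, x = (rng.standard_normal(3) for _ in range(3))

# Softmax with K = 2, probability of class 1:
z1, z2 = w1 @ x, w2 @ x
h1 = np.exp(z1) / (np.exp(z1) + np.exp(z2))

# Logistic regression with the single weight vector w = w1 - w2:
g = 1.0 / (1.0 + np.exp(-(w1 - w2) @ x))
```

`h1` and `g` agree to floating-point precision.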
V. Summary

Cross-entropy in the general sense:

$H(p, q) = -\sum_{x} p(x) \ln q(x)$

where $p$ is the true distribution and $q$ the predicted one; the loss functions above are the special cases in which $p$ is a Bernoulli or one-hot label distribution.