当前位置:网站首页>Logistic regression: the most basic neural network
Logistic regression: the most basic neural network
2022-07-05 07:34:00 【sukhoi27smk】
One 、 What is? logictic regression
The picture below is Andrew Ng One provided with logistic regression Schematic diagram of algorithm structure for identifying master and child pictures :
「 On the left 」 Of 「x0 To x12287「 It's input (input), We call it 」 features (feather)」, Often use 「 Column vector x(i)「 To express ( there i On behalf of the i Training samples , Next, when only one sample is discussed , Just omit this mark for the time being , So as not to faint -_-|||), In picture recognition , The feature is usually the pixel value of the picture , Putting all the pixel values in a sequence is the input feature , Each feature has its own 」 The weight (weight)」, It's on the line in the figure 「w0 To w12287」, Usually, we also combine the left and right weights into one 「 Column vector W」.
「 The middle circle 」, We can call it a neuron , It receives input from the left and multiplies it by the corresponding weight , Plus an offset term b( A real number ), So the total input finally received is :
But this is not the final output , Just like neurons , There will be one. 「 Activation function (activation function)「 To process the input , To decide whether to output or how much .Logistic Regression The activation function of is 」sigmoid function 」, Be situated between 0 and 1 Between , The slope in the middle is relatively large , The slope on both sides is very small and tends to zero in the distance . Long like this ( Remember function expressions ):
We use it to represent the output of this neuron ,σ() The function represents sigmoid, Then we can see :
This can be seen as a prediction made by our small model according to the input , In the case corresponding to the initial figure , It is to predict whether the picture is a cat according to the pixels of the picture . With the corresponding , Every sample x Each has its own real label , The representative picture is a cat , It means not a cat . We hope that the output of the model can be as close to the real label as possible , such , This model can be used to predict whether a new picture is a cat . therefore , Our task is to find a group W,b, So that our model can be based on the given , Predict correctly . Here, , We can argue that , As long as the calculated value is greater than 0.5, that y' It's closer to 1, So it can be predicted that “ It's a cat. ”, whereas “ It's not a cat ”.
That's all Logistic Regression The basic structure of .
Two 、 How to learn W and b
In fact, I mentioned earlier , We 「 Need to learn W and b It can make the predicted value of the model y' With real labels y As close as possible to , That is to say y' and y Try to narrow the gap 」. therefore , We can define one 「 Loss function (Loss function)」, To measure and y The gap between :
actually , This is the cross entropy loss function ,Cross-entropy loss. Cross entropy measures the difference between two different distributions , ad locum , That is to measure the gap between our predicted distribution and the official distribution .
How to explain that this formula is suitable as a loss function ? Let's see :
When y=1 when ,, To make L Minimum , The maximum , be =1;
When y=0 when ,, To make L Minimum , Minimum , be =0.
such , Then we know that it meets our expectations for the loss function , Therefore, it is suitable as a loss function .
We know ,x Represents a set of inputs , It is equivalent to the characteristics of a sample . But when we train a model, there will be many training samples , That is, there are many x, There will be x(1),x(2),...,x(m) common m Samples (m Column vectors ), They can be written as a X matrix :
Correspondingly, we also have m A label ,:
There will also be calculated by our model m individual :
The loss function we wrote earlier , Calculate the loss of only one sample . But we need to consider the loss of all training samples , Then the total loss can be calculated in this way :
With the total loss function , Our learning task can be expressed in one sentence :
“ seek w and b, Minimize the loss function ”
To minimize the ... Easier said than done , Fortunately, we have computers , It can help us do a lot of repeated operations , So in neural networks , We usually use 「 Gradient descent method (Gradient Decent)」:
This method is more popular , First, find a random point on the curve , Then calculate the slope of the point , Also known as gradient , Then follow the gradient one step down , After reaching a new point , Repeat the above steps , Until we reach the lowest point ( Or reach a certain condition we meet ). Such as , Yes w Make a gradient descent , Is to repeat the steps ( Repeat once is called a 「 iteration 」):
among := representative “ Update with the following values ”,α representative 「 Learning rate (learning rate)」,dJ/dw Namely J Yes w Finding partial derivatives .
Back to our Logistic Regression problem , Is to initialize (initializing) A group of W and b, And give a learning rate , Specified to 「 The number of iterations 」( Is how many steps you want the dot to go down ), Then, in each iteration, find w and b Gradient of , And update the w and b. The final W and b Is what we learned W and b, hold W and b Put it into our model , It's the model we learned , It can be used to predict !
It should be noted that , The loss we use here is the loss of all training samples . actually , It will be too slow to update with the loss of all samples , But use a sample to update , The error will be very big . therefore , We often choose 「 Batches of a certain size 」(batch), And then calculate a batch Loss within , Then update the parameters .
To sum up :
Logistic Regression Model :, Remember that the activation function used is sigmoid function .
Loss function : Measure the difference between the predicted value and the real value , The smaller the better. .
We usually calculate the total loss of a batch of samples , Then use the gradient descent method to update .
「 The steps of training the model 」:
initialization W and b
Appoint learning rate And the number of iterations
Every iteration , Based on the current W and b Calculate the corresponding gradient (J Yes W,b Partial derivative of ), And then update W and b
End of the iteration , Learning W and b, Bring in the model to predict , Test the accuracy of training set test set separately , To evaluate the model
It's so clear (▰˘◡˘▰)
边栏推荐
- Typecho adds Baidu collection (automatic API submission plug-in and crawler protocol)
- [vscode] prohibit the pylance plug-in from automatically adding import
- C learning notes
- Charles- unable to grab bags and surf the Internet
- UE5热更新-远端服务器自动下载和版本检测(SimpleHotUpdate)
- golang定时器使用踩的坑:定时器每天执行一次
- I implement queue with C I
- arcgis_ spatialjoin
- 借助 Navicat for MySQL 软件 把 不同或者相同数据库链接中的某数据库表数据 复制到 另一个数据库表中
- Install deeptools in CONDA mode
猜你喜欢
Close of office 365 reading
Idea common settings
Oracle-触发器和程序包
行测--资料分析--fb--高照老师
And play the little chestnut of dynamic agent
Daily Practice:Codeforces Round #794 (Div. 2)(A~D)
Unforgettable summary of 2021
The number of occurrences of numbers in the offer 56 array (XOR)
玩转gRPC—深入概念与原理
Shadowless cloud desktop - online computer
随机推荐
UE5热更新-远端服务器自动下载和版本检测(SimpleHotUpdate)
目标检测系列——Faster R-CNN原理详解
ORACLE CREATE SEQUENCE,ALTER SEQUENCE,DROP SEQUENCE
list. files: List the Files in a Directory/Folder
Differences between pycharm and idle and process -- join() in vs Code
CADD课程学习(6)-- 获得已有的虚拟化合物库(Drugbank、ZINC)
玩转gRPC—深入概念与原理
Efficiency difference: the add method used by the set directly and the add method used by the set after judgment
[untitled]
Eclipse project recompile, clear cache
Deepin get file (folder) list
Tshydro tool
Don't confuse the use difference between series / and / *
And let's play dynamic proxy (extreme depth version)
repo. conda. An example of COM path error
Apple modify system shortcut key
[idea] efficient plug-in save actions to improve your work efficiency
Rough notes of C language (1)
(top) pretty girl binary color code portal
Cookie operation