[Go through 7] Notes from the first section of the fully connected neural network video

2022-08-05 05:25:00 【Mosu playing computer】

Article table of contents

Today is dp
Fully Connected Neural Network
Send

Today is dp

June 25, 2022, check-in 7

Fully Connected Neural Network

A fully connected neural network cascades multiple linear classifiers.

Each row of a weight matrix acts as a template.

In a linear classifier, the number of rows of W is fixed: it equals the number of classes. So there is only one template for horses, and it ends up learning a horse with two heads (obviously wrong).
In a fully connected network, only the number of rows of W2 is fixed (by the number of classes). W1 can have as many rows as you like: create 100 rows, that is 100 templates, and for example 10 of them can be devoted to horses, each learning a horse with a single head. The activation function then selects among them. That is the right behavior.
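To make the shapes concrete, here is a minimal sketch of a two-layer fully connected forward pass in plain Python. The sizes (8 inputs, 100 templates, 10 classes), the random weights, and the choice of ReLU as the activation are all assumptions for illustration; the notes do not say which activation the lecture used.

```python
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    """Element-wise max(0, v): only templates with a positive response pass through."""
    return [max(0.0, a) for a in v]

def two_layer_scores(x, W1, b1, W2, b2):
    """Compute f = W2 * relu(W1 * x + b1) + b2."""
    h = relu([a + b for a, b in zip(matvec(W1, x), b1)])  # template responses
    return [a + b for a, b in zip(matvec(W2, h), b2)]     # one score per class

random.seed(0)
D, H, C = 8, 100, 10  # input size, number of templates, number of classes (illustrative)
W1 = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(H)]  # H rows: free choice
b1 = [0.0] * H
W2 = [[random.gauss(0, 0.1) for _ in range(H)] for _ in range(C)]  # C rows: fixed by the classes
b2 = [0.0] * C
x = [random.random() for _ in range(D)]

scores = two_layer_scores(x, W1, b1, W2, b2)
print(len(W1), "templates ->", len(scores), "class scores")
```

Only the row count of W2 is tied to the number of classes; changing H changes how many templates the first layer can learn.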

Naming

Of the two naming conventions, the former is generally used.

Activation function

The output of sigmoid lies between 0 and 1.
The output of tanh lies between -1 and 1, and it is symmetric about the origin.
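A quick way to check these ranges is to evaluate both functions on a few inputs. This is only a sketch; the sample inputs are arbitrary.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

values = [-5.0, -1.0, 0.0, 1.0, 5.0]  # arbitrary sample inputs
sig = [sigmoid(v) for v in values]
tnh = [math.tanh(v) for v in values]

for v, s, t in zip(values, sig, tnh):
    print(f"x={v:+.1f}  sigmoid={s:.4f}  tanh={t:+.4f}")
```

The sigmoid values stay strictly inside (0, 1) and are centered on 0.5, while tanh is odd: tanh(-x) = -tanh(x).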

softmax

We can pick the output class simply by taking the largest of the final scores, but if we also need to know the predicted probability of each class, we need softmax.
Exponentiate each score (raise e to the power of the score), then normalize so that the results sum to 1.
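The two steps (exponentiate, then normalize) can be sketched as follows. The scores are made-up numbers, and subtracting the maximum before exponentiating is a common numerical-stability trick that I added; it does not change the result.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)                            # shift for numerical stability (my addition)
    exps = [math.exp(s - m) for s in scores]   # step 1: e to the power of each score
    total = sum(exps)
    return [e / total for e in exps]           # step 2: normalize

scores = [3.2, 5.1, -1.7]  # hypothetical scores for three classes
probs = softmax(scores)
predicted = probs.index(max(probs))
print(probs, predicted)
```

Softmax keeps the ordering of the scores, so the predicted class is the same as taking the maximum score directly; what it adds is a probability for each class.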

Cross-entropy loss

Cross-entropy loss is used to measure the difference between two distributions. It cannot be called a distance: a distance is symmetric (the distance from A to B equals the distance from B to A), but cross entropy is not necessarily symmetric.
H[p] is the entropy of the ground-truth distribution p. When p is one-hot, the information it carries has no uncertainty at all, so its entropy is 0. Cross entropy equals entropy plus relative entropy, so here it is 0 + relative entropy. After simplification, the loss is the -log of the predicted probability of the correct class.
But when the ground-truth distribution p is not one-hot encoded, H[p] is no longer 0, and the relative entropy (KL divergence) has to be used as is.
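The relation "cross entropy = entropy + relative entropy" can be checked numerically. In this sketch the distributions p and q are made-up values, and natural logarithms are used.

```python
import math

def entropy(p):
    """H[p] = -sum p_i log p_i (terms with p_i = 0 contribute 0)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H[p, q] = -sum p_i log q_i."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL(p || q) = sum p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

q = [0.2, 0.7, 0.1]         # hypothetical predicted distribution
p_onehot = [0.0, 1.0, 0.0]  # one-hot ground truth, class 1 is correct
p_soft = [0.1, 0.8, 0.1]    # a ground truth that is not one-hot

print(entropy(p_onehot), cross_entropy(p_onehot, q), -math.log(q[1]))
print(cross_entropy(p_soft, q), entropy(p_soft) + kl_divergence(p_soft, q))
print(kl_divergence(p_soft, q), kl_divergence(q, p_soft))
```

With the one-hot target the cross entropy collapses to -log of the probability of the correct class. With the soft target the entropy term is no longer zero, and swapping the arguments of the KL divergence gives a different number, which is why it is not a distance.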
The teacher mentioned here that during training you may see a situation where "the loss has hardly decreased, but the accuracy has increased". Take the following example (assuming the third column is the correct class):
0.35 0.33 0.32 (the prediction is the first class, so it is wrong)
0.333 0.332 0.334 (the prediction is the third class, so it is correct)
For the correct class, -log 0.32 (about 1.14) and -log 0.334 (about 1.10) are not much different, so the loss barely moves. But the probability of the correct class has grown a little, and that small improvement is enough for it to come out on top (0.334 > 0.333), so the prediction becomes correct.
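The example above can be reproduced directly. This sketch uses the two probability rows from the notes, with the third class (index 2) as the correct one; the helper names are my own.

```python
import math

def nll(probs, correct):
    """Cross-entropy loss for a one-hot target: -log of the correct class probability."""
    return -math.log(probs[correct])

def predict(probs):
    """Index of the most probable class."""
    return probs.index(max(probs))

correct = 2                      # third column is the correct class
before = [0.35, 0.33, 0.32]      # first row from the notes
after = [0.333, 0.332, 0.334]    # second row from the notes

loss_before, loss_after = nll(before, correct), nll(after, correct)
print(f"before: loss={loss_before:.4f} predicted={predict(before)}")
print(f"after:  loss={loss_after:.4f} predicted={predict(after)}")
```

The loss changes by only about 0.04, yet the predicted class flips from wrong to right, which is exactly the "loss barely changes, accuracy goes up" effect.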

Computational graph

The forward direction computes the values, and the backward direction computes the gradients. By the chain rule, the local gradients along a path are multiplied together.
Each node of the computational graph stores its forward-propagation value and its local gradient (in general a Jacobian matrix), which are used for forward and back-propagation respectively.
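As a small illustration of forward values and backward gradients, here is a hand-written pass through the graph f = (x + y) * z. The function and the input values are my own example, not necessarily the one on the lecture slide.

```python
# Forward pass: compute and keep the value at every node.
x, y, z = -2.0, 5.0, -4.0
q = x + y          # add node
f = q * z          # multiply node

# Backward pass: start from df/df = 1 and multiply local gradients (chain rule).
df_df = 1.0
df_dq = z * df_df  # local gradient of q*z with respect to q is z
df_dz = q * df_df  # local gradient of q*z with respect to z is q
df_dx = 1.0 * df_dq  # local gradient of x+y with respect to x is 1
df_dy = 1.0 * df_dq  # local gradient of x+y with respect to y is 1

print(f, df_dx, df_dy, df_dz)
```

Note that the backward pass reuses the forward values q and z, which is why each node has to store them.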

Granularity

A series of small gates can be connected together to form a single function gate, such as sigmoid. Such a gate has a coarser granularity, but it needs fewer calculation steps and runs faster.
In Caffe, people wrote these functions by hand as large gates, so it is fast. TensorFlow uses small gates, so the gradient has to be passed back through many nodes, which is slow (this was improved later).
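The trade-off can be seen by writing sigmoid both ways. This is only a sketch with hypothetical function names: one version treats sigmoid as a single large gate using the known derivative s * (1 - s), the other chains four small gates (negate, exp, add 1, reciprocal) and backpropagates through each.

```python
import math

def sigmoid_big_gate(x, upstream=1.0):
    """One large gate: a single forward step and a single backward step."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s, upstream * s * (1.0 - s)

def sigmoid_small_gates(x, upstream=1.0):
    """The same function as a chain of four small gates."""
    # forward
    a = -x             # negate
    b = math.exp(a)    # exp
    c = 1.0 + b        # add constant
    s = 1.0 / c        # reciprocal
    # backward, one local gradient per gate
    dc = upstream * (-1.0 / (c * c))  # d(1/c)/dc = -1/c^2
    db = dc * 1.0                     # d(1+b)/db = 1
    da = db * b                       # d(exp(a))/da = exp(a)
    dx = da * -1.0                    # d(-x)/dx = -1
    return s, dx

print(sigmoid_big_gate(1.0))
print(sigmoid_small_gates(1.0))
```

Both versions give the same value and gradient; the large gate simply gets there in fewer steps.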

Common gate units

The max gate outputs the larger of its inputs, and in the backward pass the gradient is passed to whichever input was larger.
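A minimal sketch of the backward rule for a few gates. Only the max gate is described in these notes; the add and multiply gates are included as the standard companions and are my addition, as are the function names.

```python
def add_backward(x, y, upstream):
    """Add gate: the gradient is passed unchanged to both inputs."""
    return upstream, upstream

def mul_backward(x, y, upstream):
    """Multiply gate: each input receives the upstream gradient times the other input."""
    return upstream * y, upstream * x

def max_backward(x, y, upstream):
    """Max gate: the whole gradient goes to the larger input, the other gets 0."""
    return (upstream, 0.0) if x >= y else (0.0, upstream)

print(add_backward(3.0, 4.0, 2.0))
print(mul_backward(3.0, 4.0, 2.0))
print(max_backward(3.0, 4.0, 2.0))
```

In this sketch, ties in the max gate are sent to the first input; that tie-breaking choice is arbitrary.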

Today I watched the third section of the video. Unfortunately the notes I had made in the PDF were lost and I did not keep a copy; otherwise I could compare against them and add more here.

Send

I didn't sleep well this morning, got up with a golden shovel, then ate, rushed into the study room in the rain, was happy here, did whatever I wanted, and left at night after an hour of fast study. Hurry up.

Original site

Copyright notice
This article was written by [Mosu playing computer]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/217/202208050512158796.html