[Go through 7] Notes from the first section of the fully connected neural network video
2022-08-05 05:25:00 【Mosu playing computer】
Today is dp
June 25, 2022 Check out 7
Fully Connected Neural Network
Cascade multiple linear classifiers
The weights are templates
In a linear classifier the number of rows of W is fixed: it equals the number of classes, so there is only one template per class. The horse template therefore ends up as a horse with two heads (obviously wrong).
In a fully connected network, only the number of rows of W2 is fixed (the number of classes); W1 can have as many rows as you like. With 100 rows you get 100 templates, and, say, 10 of them can be devoted to horses, each learning a single-headed horse, with the activation function then selecting among them. That works.
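The two-layer idea above can be sketched in a few lines of NumPy. This is only an illustration under assumptions that are not in the notes: ReLU as the activation, a 3072-dimensional input, random weights, and the helper name `two_layer_scores`.

```python
import numpy as np

def two_layer_scores(x, W1, b1, W2, b2):
    """Forward pass f = W2 * max(0, W1 x + b1) + b2 (ReLU is an assumed activation)."""
    h = np.maximum(0.0, W1 @ x + b1)  # responses of the 100 templates, filtered by the activation
    return W2 @ h + b2                # one score per class

rng = np.random.default_rng(0)
d, hidden, classes = 3072, 100, 10    # assumed sizes: input dimension, 100 templates, 10 classes
x = rng.standard_normal(d)
W1 = rng.standard_normal((hidden, d)) * 0.01   # free choice: any number of rows
b1 = np.zeros(hidden)
W2 = rng.standard_normal((classes, hidden)) * 0.01  # rows fixed by the number of classes
b2 = np.zeros(classes)

scores = two_layer_scores(x, W1, b1, W2, b2)
print(scores.shape)
```

Only the row count of `W2` is tied to the number of classes; changing `hidden` changes how many templates the first layer can learn.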
Name
Generally speaking, the former
Activation function
sigmoid outputs values between 0 and 1
tanh outputs values between -1 and 1 and is symmetric about the origin
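A small sketch to check the two ranges numerically. The helper `sigmoid` and the sample points are chosen here for illustration only.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)); outputs lie strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5.0, 5.0, 11)  # sample points, symmetric around 0
s = sigmoid(z)
t = np.tanh(z)

print(s.min(), s.max())  # stays inside (0, 1)
print(t.min(), t.max())  # stays inside (-1, 1)
```

The symmetry of tanh means `np.tanh(-z)` equals `-np.tanh(z)`, whereas sigmoid is centered at 0.5 rather than at 0.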
softmax
We can pick the output class by taking the largest of the final scores, but if we also need to know the predicted probability of each class, we need softmax.
Exponentiate each score (raise e to the power of the score), then normalize so the results sum to 1.
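The "exponentiate, then normalize" recipe can be written as follows. The scores are made-up numbers, and subtracting the maximum first is a common numerical-stability trick that is not mentioned in the notes; it does not change the result.

```python
import numpy as np

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # stability trick: the output is unchanged
    exps = np.exp(shifted)             # e to the power of each score
    return exps / np.sum(exps)         # normalize

scores = np.array([3.2, 5.1, -1.7])   # hypothetical scores for three classes
probs = softmax(scores)
print(probs, probs.sum())
```

The largest score still gets the largest probability, so the predicted class is the same as with a plain argmax over the scores.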
Cross-entropy loss
Cross-entropy loss is used to measure the difference between two distributions. It cannot be called a distance: a distance is symmetric (A to B is the same as B to A), but cross-entropy is not necessarily symmetric.
Here p is the ground truth. When it is one-hot, it carries no uncertainty at all, so its entropy H[p] is 0, and the cross-entropy is 0 + relative entropy. After simplification, it is the -log of the predicted probability of the correct class.
But when the ground truth p is not one-hot encoded, H[p] is no longer 0 and the relative entropy (KL divergence) has to be used explicitly.
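The relation cross-entropy = entropy + relative entropy can be checked numerically. The distributions below are invented for illustration, and the three helper functions are written here, not taken from the lecture.

```python
import numpy as np

def entropy(p):
    nz = p > 0                      # skip zero entries, treating 0 * log 0 as 0
    return -np.sum(p[nz] * np.log(p[nz]))

def cross_entropy(p, q):
    nz = p > 0
    return -np.sum(p[nz] * np.log(q[nz]))

def kl_divergence(p, q):
    nz = p > 0
    return np.sum(p[nz] * np.log(p[nz] / q[nz]))

q = np.array([0.2, 0.3, 0.5])         # hypothetical predicted distribution
p_onehot = np.array([0.0, 0.0, 1.0])  # one-hot ground truth, class 2 is correct
p_soft = np.array([0.1, 0.2, 0.7])    # hypothetical non-one-hot ground truth

print(entropy(p_onehot))                        # 0 for a one-hot target
print(cross_entropy(p_onehot, q), -np.log(q[2]))
print(cross_entropy(p_soft, q), entropy(p_soft) + kl_divergence(p_soft, q))
print(kl_divergence(p_soft, q), kl_divergence(q, p_soft))  # not symmetric
```

With the one-hot target the cross-entropy reduces to the -log of the probability assigned to the correct class; with the soft target the entropy term no longer vanishes.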
The teacher mentioned here that during training you may see a situation where the loss has barely decreased but the accuracy has increased, as in the example in the lower right corner of the figure above (assuming the third column is the correct class):
0.35 0.33 0.32 (obviously not correct)
0.333 0.332 0.334 (correct)
The loss on the correct class, -log 0.32 versus -log 0.334, is hardly different, but the correct class's probability has grown just enough to stand out (0.334 > 0.333), so the prediction becomes right.
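A quick check of this example, taking the third column as the correct class as the note assumes. The variable names are chosen here for illustration.

```python
import numpy as np

before = np.array([0.35, 0.33, 0.32])    # first row of the example
after = np.array([0.333, 0.332, 0.334])  # second row of the example
correct = 2                               # third column is the correct class

loss_before = -np.log(before[correct])
loss_after = -np.log(after[correct])
pred_before = int(np.argmax(before))
pred_after = int(np.argmax(after))

print(loss_before, loss_after)  # the two losses are close
print(pred_before == correct, pred_after == correct)
```

The loss moves only slightly, while the predicted class flips from wrong to right, which is why accuracy can jump even when the loss curve looks flat.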
Calculation graph
The forward direction computes values, the backward direction computes gradients, and by the chain rule the local gradients along a path are multiplied together.
Each node of the computational graph stores its forward-propagation value and a local Jacobian matrix, used for forward and back-propagation respectively.
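As a minimal sketch of such a graph, take f = (x + y) * z with scalar inputs, so each local Jacobian is just a number. The gate classes and the input values are hypothetical, written only to show the forward/backward pattern.

```python
class AddGate:
    def forward(self, a, b):
        return a + b

    def backward(self, upstream):
        # Local gradient of a + b is 1 for both inputs.
        return upstream, upstream


class MulGate:
    def forward(self, a, b):
        self.a, self.b = a, b  # cache forward values for the backward pass
        return a * b

    def backward(self, upstream):
        # Chain rule: upstream gradient times local gradient.
        return upstream * self.b, upstream * self.a


add, mul = AddGate(), MulGate()
x, y, z = -2.0, 5.0, -4.0

q = add.forward(x, y)       # forward: q = x + y
f = mul.forward(q, z)       # forward: f = q * z

dq, dz = mul.backward(1.0)  # backward starts with df/df = 1
dx, dy = add.backward(dq)   # gradients keep multiplying along the chain
print(f, dx, dy, dz)
```

Each gate only needs its own cached inputs and the gradient arriving from above; it never has to know the rest of the graph.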
Granularity
A series of gates can be combined into a single function gate such as sigmoid. Such a gate has coarse granularity, but it needs fewer computation steps and runs fast.
In Caffe, people wrote these functions by hand as coarse gates, so it is fast; TensorFlow uses small gates, so the gradient is passed back through many steps and it is slow (this was later improved).
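The granularity trade-off can be illustrated by computing the sigmoid gradient two ways: as one coarse gate with a closed-form derivative, and as a chain of small gates. Both functions are sketches written for this note, not code from Caffe or TensorFlow.

```python
import math

def sigmoid_gate_grad(x):
    """One coarse gate: a single closed-form step, s * (1 - s)."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def small_gates_grad(x):
    """The same gradient through four small gates: negate, exp, add 1, reciprocal."""
    # Forward pass, one small gate at a time.
    a = -x
    b = math.exp(a)
    c = 1.0 + b
    # The output would be 1 / c; only its gradient is needed here.
    # Backward pass, multiplying local gradients in reverse order.
    dc = 1.0 * (-1.0 / (c * c))  # reciprocal gate
    db = dc * 1.0                # add-constant gate
    da = db * b                  # exp gate
    dx = da * -1.0               # negate gate
    return dx

print(sigmoid_gate_grad(1.0), small_gates_grad(1.0))
```

The results agree, but the coarse gate does it in one step while the fine-grained version walks back through every small gate.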
Common gate units
max gate: the output is the larger input, and the gradient is passed back only to that input.
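A tiny sketch of the max gate's routing behavior. The function name and the tie-breaking rule (ties go to the first input) are assumptions made for illustration.

```python
def max_gate_backward(a, b, upstream):
    """Route the upstream gradient to whichever input was larger; the other gets 0."""
    if a >= b:  # assumed tie-breaking: the first input wins
        return upstream, 0.0
    return 0.0, upstream

print(max_gate_backward(2.0, -1.0, 3.0))  # gradient goes to the first input
print(max_gate_backward(-1.0, 2.0, 3.0))  # gradient goes to the second input
```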
Today I also watched the video of the third section, but unfortunately the notes I had made in the PDF were lost. I did not keep them; otherwise I could compare them and add to these notes.
Send
I didn't sleep well this morning, got up with a golden shovel, then ate, rushed into the study room in the rain, was happy here, did whatever I wanted, and left at night for an hour of fast study. Hurry up.