[Go through 7] Notes from the first section of the fully connected neural network video
2022-08-05 05:25:00 【Mosu playing computer】
Today is dp
June 25, 2022: Go through 7
Fully Connected Neural Network
A fully connected network cascades multiple linear classifiers.
Each row of the weights acts as a template.

In a linear classifier, the number of rows of W is fixed: it equals the number of categories. So there is only one template for "horse", and it ends up learning a horse with two heads (obviously wrong).
In a fully connected network, only the number of rows of W2 is fixed; W1 can have as many rows as you like. Create 100 rows, that is, 100 templates, and perhaps 10 of them are used to learn horses, each one a single-headed horse, and then the activation function selects among them. That works.
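The two-layer idea above can be sketched as follows. This is only a minimal illustration, not the lecture's code: the sizes (8 inputs, 100 templates, 10 classes), the random weights, the helper names, and the choice of ReLU as the activation are all assumptions made for the example.

```python
import random

def linear(W, b, x):
    """Compute W x + b, where W is a list of rows (one row per template)."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(v):
    """Assumed activation: keep positive template responses, zero out the rest."""
    return [max(0.0, a) for a in v]

def two_layer_scores(x, W1, b1, W2, b2):
    hidden = relu(linear(W1, b1, x))  # one response per template in W1
    return linear(W2, b2, hidden)     # one score per class, combined by W2

random.seed(0)
num_inputs, num_templates, num_classes = 8, 100, 10  # hypothetical sizes

# W1 may have any number of rows; W2 must have exactly num_classes rows.
W1 = [[random.gauss(0, 0.1) for _ in range(num_inputs)] for _ in range(num_templates)]
b1 = [0.0] * num_templates
W2 = [[random.gauss(0, 0.1) for _ in range(num_templates)] for _ in range(num_classes)]
b2 = [0.0] * num_classes

x = [random.random() for _ in range(num_inputs)]
scores = two_layer_scores(x, W1, b1, W2, b2)
print(len(scores))  # number of class scores
```

Only the row count of W2 is tied to the number of classes; changing `num_templates` changes how many templates the first layer can learn without touching the output size.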
Name

Generally speaking, the former
Activation function

sigmoid outputs values between 0 and 1
tanh outputs values between -1 and 1, and it is symmetric about the origin
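A quick way to check these two ranges is to evaluate both functions on a few inputs. The sample points below are arbitrary and chosen only for illustration.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x)), with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

samples = [-5.0, -1.0, 0.0, 1.0, 5.0]  # arbitrary sample inputs
sigmoid_values = [sigmoid(x) for x in samples]
tanh_values = [math.tanh(x) for x in samples]

for x, s, t in zip(samples, sigmoid_values, tanh_values):
    print(f"x={x:+.1f}  sigmoid={s:.4f}  tanh={t:+.4f}")
```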
softmax
We can pick the output class by taking the largest of the final scores, but if we need to know the predicted probability of each class, we need softmax.
Raise e to the power of each score, then normalize so that the results sum to 1.
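The softmax step can be sketched in a few lines. The scores are made-up numbers, and subtracting the maximum score before exponentiating is a common numerical-stability trick that is not mentioned in the notes; it does not change the result.

```python
import math

def softmax(scores):
    """Exponentiate each score and normalize so the outputs sum to 1."""
    shift = max(scores)  # assumption: shift by the max to avoid overflow
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]  # hypothetical final-layer scores
probs = softmax(scores)

# The predicted class is the same whether we use scores or probabilities.
predicted_class = max(range(len(scores)), key=lambda i: scores[i])
print(probs, predicted_class)
```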
Cross-entropy loss
Cross-entropy loss is used to measure the difference between two distributions. It cannot be called a distance: a distance is symmetric (A to B is the same as B to A), but cross entropy is not necessarily symmetric.
H[p] is the entropy of the ground truth. When the ground truth is one-hot, the information it carries has no uncertainty at all, so its entropy is 0, and the cross entropy is 0 + relative entropy. After simplification, it is the -log of the predicted probability of the correct class.
But when the ground truth p is not one-hot encoded, you have to honestly use the relative entropy (KL divergence).
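The relation described above, cross entropy = entropy + relative entropy, can be checked numerically. The distributions below are invented for illustration, and the function names are this sketch's own.

```python
import math

def entropy(p):
    """H[p] = -sum p_i log p_i (terms with p_i = 0 contribute nothing)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H[p, q] = -sum p_i log q_i."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL(p || q) = sum p_i log(p_i / q_i), the relative entropy."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

q = [0.2, 0.3, 0.5]        # hypothetical predicted distribution
one_hot = [0.0, 0.0, 1.0]  # ground truth with no uncertainty
soft = [0.1, 0.2, 0.7]     # ground truth that is not one-hot

# One-hot case: entropy is 0, so cross entropy reduces to -log q[correct].
print(entropy(one_hot), cross_entropy(one_hot, q), -math.log(q[2]))

# General case: cross entropy = entropy + KL divergence.
print(cross_entropy(soft, q), entropy(soft) + kl_divergence(soft, q))

# Not symmetric, so it is not a distance.
print(kl_divergence(soft, q), kl_divergence(q, soft))
```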
The teacher mentioned here that during training the loss may barely decrease while the accuracy goes up. Take the example in the lower right corner of the figure above (assuming the third column is the correct class):
0.35 0.33 0.32 (prediction is wrong)
0.333 0.332 0.334 (prediction is correct)
For the correct class, -log 0.32 and -log 0.334 are not much different, but its probability has grown slightly, and that small improvement is enough for it to come out on top (0.334 > 0.333).
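The two rows of probabilities can be plugged in directly. As the notes assume, the third column (index 2) is taken as the correct class; the helper names are this sketch's own.

```python
import math

correct_class = 2  # third column, as assumed in the notes

before = [0.35, 0.33, 0.32]    # correct class has the lowest probability
after = [0.333, 0.332, 0.334]  # correct class is now narrowly the largest

def loss(probs):
    """Cross-entropy loss for a one-hot label: -log of the correct-class probability."""
    return -math.log(probs[correct_class])

def is_correct(probs):
    """A prediction is correct when the largest probability is on the correct class."""
    return probs.index(max(probs)) == correct_class

for name, probs in (("before", before), ("after", after)):
    print(name, round(loss(probs), 4), is_correct(probs))
```

The loss changes by only a few hundredths, while the prediction flips from wrong to right.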
Calculation graph
The forward direction computes values and the backward direction computes gradients; by the chain rule, the local gradients along a path are multiplied together.
Each node of the computational graph stores its forward-propagation value and its local Jacobian matrix, which are used for forward and back-propagation respectively.
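A tiny computational graph makes the forward and backward passes concrete. The function f = (x + y) * z and the input values are chosen only for illustration; they do not come from the notes.

```python
# Forward pass: compute and keep the value at every node.
x, y, z = -2.0, 5.0, -4.0  # hypothetical inputs
q = x + y                  # add node
f = q * z                  # multiply node

# Backward pass: start from df/df = 1 and multiply local gradients (chain rule).
df_df = 1.0
df_dq = z * df_df    # local gradient of q * z with respect to q is z
df_dz = q * df_df    # local gradient of q * z with respect to z is q
df_dx = 1.0 * df_dq  # local gradient of x + y with respect to x is 1
df_dy = 1.0 * df_dq  # local gradient of x + y with respect to y is 1

print(f, df_dx, df_dy, df_dz)
```

The backward pass reuses the forward values `q` and `z`, which is why each node has to store them.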
Granularity

A series of small gates can be combined into a single function gate, such as sigmoid. Such a gate has coarse granularity, but it needs fewer calculation steps and runs fast.
In Caffe, someone hand-wrote these functions as whole gates, so it is fast; TensorFlow builds them out of small gates, so back-propagation is slow (this was later improved).
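The granularity trade-off can be illustrated with sigmoid itself: one coarse gate with a closed-form gradient versus a chain of four small gates. This is a sketch of the idea only, not how Caffe or TensorFlow implement it; both function names are hypothetical.

```python
import math

def sigmoid_gate_backward(x, upstream=1.0):
    """Coarse gate: one step, using d sigmoid/dx = s * (1 - s)."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s) * upstream

def small_gates_backward(x, upstream=1.0):
    """Fine-grained gates: negate -> exp -> add 1 -> reciprocal, then backprop through each."""
    a = -x               # negate gate
    b = math.exp(a)      # exp gate
    c = 1.0 + b          # add-constant gate (the output would be 1.0 / c)
    grad_c = (-1.0 / (c * c)) * upstream  # reciprocal gate: d(1/c)/dc = -1/c^2
    grad_b = 1.0 * grad_c                 # add gate passes the gradient through
    grad_a = b * grad_b                   # exp gate: d(e^a)/da = e^a
    grad_x = -1.0 * grad_a                # negate gate flips the sign
    return grad_x

x = 1.0
print(sigmoid_gate_backward(x), small_gates_backward(x))
```

Both give the same gradient; the coarse gate simply gets there in fewer steps.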
Common gate units

The max gate outputs the larger input, and in the backward pass the gradient is passed to whichever input was larger.
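The backward behaviour of the max gate can be sketched next to the add and multiply gates, which are the other gates usually listed alongside it. The notes only describe max, so the add and multiply rules here are standard calculus rather than something taken from the source, and the tie-breaking rule for max is an assumption.

```python
def add_backward(x, y, upstream):
    """Add gate: both inputs receive the upstream gradient unchanged."""
    return upstream, upstream

def mul_backward(x, y, upstream):
    """Multiply gate: each input receives upstream times the other input."""
    return y * upstream, x * upstream

def max_backward(x, y, upstream):
    """Max gate: the larger input receives the whole gradient, the other gets 0.
    Assumption: ties are routed to x."""
    if x >= y:
        return upstream, 0.0
    return 0.0, upstream

print(add_backward(3.0, 1.0, 2.0))
print(mul_backward(3.0, 1.0, 2.0))
print(max_backward(3.0, 1.0, 2.0))
```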
Today I watched the video of the third section, but unfortunately the notes I had written in the PDF were lost. I didn't keep them; otherwise I could have compared against them and added more here.
Send
I didn't sleep well this morning, got up with a golden shovel, then ate, rushed to the study room in the rain, was happy here, did whatever I wanted, and left at night after an hour of fast study. Hurry up.