[Go through 8] Fully Connected Neural Network Video Notes

2022-08-05 05:25:00 【Mosu playing computer】

Go through 8: finished the videos.

It's been two days

Fifth video: the teacher first reviewed the earlier concepts. On cross-entropy versus relative entropy (KL divergence): the former is simpler because it has no denominator inside the log, and with a one-hot label it finally simplifies to -log of the probability predicted for the true class.
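To check the "-log" simplification for myself, here is a minimal sketch in plain Python. The label `p` and the predicted probabilities `q` are made-up values, not from the video.

```python
import math

def cross_entropy(p, q):
    # H(p, q) = -sum_i p_i * log(q_i); terms with p_i == 0 contribute nothing
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i); the ratio is the "denominator"
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.0, 1.0, 0.0]   # one-hot label: the true class is index 1 (hypothetical)
q = [0.2, 0.7, 0.1]   # hypothetical predicted probabilities

ce = cross_entropy(p, q)
kl = kl_divergence(p, q)
neg_log = -math.log(q[1])   # -log of the probability given to the true class

print(ce, kl, neg_log)      # all three agree for a one-hot label
```

With a one-hot label the entropy of `p` is zero, so cross-entropy and relative entropy coincide, and both reduce to `-log(q[true_class])`.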
(2022-06-28 08:38:38: 1/38 of the remaining videos watched so far. Keep going!)
Vanishing gradient (the backward pass multiplies by 0)
Exploding gradient (the parameters fly off and it's all over)
[image]
Clipping: bound the step size
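A small sketch of what "bounding the step" could look like, assuming clipping by the gradient's norm (the video may have meant clipping each value instead). `clip_by_norm` and `max_norm` are names I made up for illustration.

```python
import math

def clip_by_norm(grad, max_norm):
    # Rescale the gradient so its L2 norm never exceeds max_norm;
    # the direction is kept, only the length is bounded.
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]

clipped = clip_by_norm([30.0, 40.0], max_norm=5.0)  # norm 50, gets rescaled
small = clip_by_norm([0.3, 0.4], max_norm=5.0)      # norm 0.5, left alone
```

An exploding gradient then still points the same way, but it can no longer throw the parameters out of the valley in a single step.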

It was widely used in the earlier era, before these two problems had come to light. Today it is useful in the output layer, when the result needs to lie between 0 and 1. It is not used in hidden layers.

[image]

Problems with gradient descent

[image]

Setting it (the momentum coefficient) to 1 is like having no friction: it never stops, v = v.

It sprints on the flat stretches.
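To see the "no friction" point, here is a minimal sketch of the momentum update on a toy objective f(x) = x^2 / 2 (my own choice, its gradient is just x). The names `momentum_step`, `mu` and `lr` are assumptions for illustration.

```python
def momentum_step(x, v, grad, lr, mu):
    # mu plays the role of friction: v keeps a fraction mu of its old value
    v = mu * v - lr * grad
    x = x + v
    return x, v

def run(mu, steps=200, lr=0.1):
    x, v = 1.0, 0.0
    xs = []
    for _ in range(steps):
        # gradient of the toy objective f(x) = x^2 / 2 is x itself
        x, v = momentum_step(x, v, grad=x, lr=lr, mu=mu)
        xs.append(x)
    return xs

damped = run(mu=0.9)         # oscillation dies out, x settles near 0
frictionless = run(mu=1.0)   # keeps swinging back and forth forever
```

With `mu=0.9` the swings shrink every step. With `mu=1.0` nothing is ever lost, so the ball keeps rolling through the minimum and back.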

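In the oscillating direction r is large, so the step is small.
But as it accumulates, r grows larger and larger, until the step becomes tiny and it can no longer move.
(This amounts to mindlessly accumulating every previous record.)
[image]
This one is fine: ρ controls how much of the earlier training history is kept.
r = 0.999 r + 0.001 (g*g): each round, the weight of an old record is multiplied by ρ again, so after enough rounds it becomes very small.
This ensures that only the recent training experience is effectively kept (roughly the last 1/(1 - ρ) rounds), so r does not grow without bound.
If you want to keep more, set ρ larger, but not to 1 (that keeps everything).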
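A quick sketch comparing the two ways of accumulating r, assuming a constant gradient of 1.0 just to make the numbers easy to follow. The function names are mine, not from the video.

```python
def accumulate_all(r, g):
    # "mindless" accumulation: every squared gradient stays forever
    return r + g * g

def accumulate_decayed(r, g, rho):
    # decayed accumulation: old records fade by a factor rho each round
    return rho * r + (1.0 - rho) * g * g

def step_size(lr, r, eps=1e-8):
    # the step shrinks as the accumulated r grows
    return lr / (r ** 0.5 + eps)

r_all, r_decayed = 0.0, 0.0
for _ in range(10000):
    g = 1.0  # hypothetical constant gradient
    r_all = accumulate_all(r_all, g)
    r_decayed = accumulate_decayed(r_decayed, g, rho=0.999)

print(step_size(0.1, r_all))      # has shrunk to about 0.001
print(step_size(0.1, r_decayed))  # stays close to 0.1

old_weight = 0.999 ** 1000        # weight left on a record from 1000 rounds ago
```

The first accumulator grows without bound, so the step keeps shrinking. The second levels off, because each old record's weight is multiplied by ρ every round.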

Momentum method: opposing swings cancel each other out
Adaptive method: different step sizes in different directions
Adam: combines the two
[image]
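A scalar sketch of how Adam combines the two ideas, written from the commonly published form of the update, with the usual default hyperparameters. The toy objective f(x) = x^2 is my own assumption.

```python
import math

def adam_step(x, m, v, g, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g        # momentum part: running mean of g
    v = beta2 * v + (1 - beta2) * g * g    # adaptive part: running mean of g*g
    m_hat = m / (1 - beta1 ** t)           # bias correction for the early steps
    v_hat = v / (1 - beta2 ** t)
    x = x - lr * m_hat / (math.sqrt(v_hat) + eps)
    return x, m, v

# First step on the toy objective f(x) = x^2 (gradient 2x), starting at x = 5
x_first, _, _ = adam_step(5.0, 0.0, 0.0, g=10.0, t=1, lr=0.05)

x, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(x, m, v, g=2 * x, t=t, lr=0.05)
```

Thanks to the bias correction, the very first step has a size of about `lr` regardless of how large the gradient is.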

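(2022-06-28 09:14:35: 2/38 watched)

[image]
You can first use Adam to get quickly to a roughly good solution, then switch to SGD with momentum and refine it slowly.
Some people also do it the other way round: SGD with momentum first, then Adam.

Parameter initialization

[image]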

The game can be played

Basically normal
[image]

Mostly concentrated at 0
[image]

Much more even
[image]

If weight initialization is ignored (for example, every weight gets the same value), every neuron has the same parameters, which is equivalent to having a single neuron.
If an unsuitable combination of initialization method and activation function is used, either the distribution becomes uneven or the network simply performs poorly.
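To convince myself of the "equivalent to a single neuron" point, here is a tiny sketch with a hand-written one-hidden-layer network (tanh hidden units, squared error). The data, sizes and learning rate are all made up for illustration.

```python
import math
import random

def forward(W, u, x):
    # hidden activations h_j = tanh(w_j . x), output y = sum_j u_j * h_j
    h = [math.tanh(sum(wi * xi for wi, xi in zip(w, x))) for w in W]
    y = sum(uj * hj for uj, hj in zip(u, h))
    return h, y

def sgd_step(W, u, x, target, lr=0.1):
    h, y = forward(W, u, x)
    err = y - target  # derivative of 0.5 * (y - target)^2 with respect to y
    new_W = [[wi - lr * err * uj * (1 - hj * hj) * xi for wi, xi in zip(w, x)]
             for w, uj, hj in zip(W, u, h)]
    new_u = [uj - lr * err * hj for uj, hj in zip(u, h)]
    return new_W, new_u

def train(W, u, steps=50):
    data = [([1.0, 2.0], 1.0), ([-1.0, 0.5], -0.5)]  # hypothetical samples
    for i in range(steps):
        x, t = data[i % 2]
        W, u = sgd_step(W, u, x, t)
    return W, u

# Same value everywhere: the three hidden neurons can never become different
W_same, u_same = train([[0.5, 0.5] for _ in range(3)], [0.5] * 3)

# Random values: each neuron starts, and stays, different
rng = random.Random(0)
W_rand, u_rand = train([[rng.gauss(0, 0.5) for _ in range(2)] for _ in range(3)],
                       [rng.gauss(0, 0.5) for _ in range(3)])
```

Identical neurons receive identical gradients, so after any number of steps the rows of `W_same` are still copies of each other.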

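Batch normalization

[image]
Set weight initialization aside for now.
Think backwards from the goal: I go straight for y.
What you want is just a y with zero mean and unit variance, right?
So I subtract the mean and then scale it down (normalize it), and use the result as y.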

[image]
The original idea was to put it after the activation function, but in practice placing it between the FC layer and the activation function works better.
It can pull values that would have landed in a dead zone (where values keep shrinking and there is no gradient) back to a good region.
[image]
x1…xm are the original y values
y1…ym are the ones highlighted in yellow above
If it stopped here, it would be plain normalization.
An improvement was added: shift and scale.
Let the neural network decide the mean and variance by itself (those two parameters are learned as well).

The forward pass goes more smoothly, and the backward pass also gets gradients.
Ensures a smooth flow of information => trains well
(2022-06-28 09:59:20: 3/38 watched)
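A minimal sketch of the forward pass for one feature over a batch, training-time behavior only (no running statistics). `gamma` and `beta` are the usual names for the learned scale and shift. The input values are made up.

```python
def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    # xs: values x1..xm of one feature across the batch
    m = len(xs)
    mean = sum(xs) / m
    var = sum((x - mean) ** 2 for x in xs) / m
    # plain normalization: zero mean, unit variance
    x_hat = [(x - mean) / (var + eps) ** 0.5 for x in xs]
    # the improvement: a learned scale (gamma) and shift (beta)
    return [gamma * xh + beta for xh in x_hat]

def mean_and_std(ys):
    mu = sum(ys) / len(ys)
    return mu, (sum((y - mu) ** 2 for y in ys) / len(ys)) ** 0.5

plain = mean_and_std(batch_norm([1.0, 2.0, 3.0, 4.0, 5.0]))
shifted = mean_and_std(batch_norm([1.0, 2.0, 3.0, 4.0, 5.0], gamma=2.0, beta=3.0))
```

With `gamma=1, beta=0` the output has mean 0 and standard deviation close to 1. With other values the network gets the mean `beta` and standard deviation `gamma` it asked for.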
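(2022-06-28 09:59:20: 3/38 watched)

Overfitting and underfitting

[image]
Overfitting: the model has simply memorized the data (designs are often pushed in this direction first)
Underfitting: weak learning ability, it cannot learn the task (usually solvable)

L: loss, E: error
Training set: optimization
Validation and test sets: generalization (model accuracy)
(2022-06-28 10:13:25: 4/38 watched)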

Dealing with overfitting

[image]
Add more training data: high cost

Adjust the model size: change 9 layers to 8, change 500 neurons to 300
[image]
Force the neural network not to rely on a few large weights, but to take the whole picture into account and spread the weights out.
This makes the decision boundary simpler and smoother.

Dropout (random deactivation)

[image]

On explanation 2
It feels a bit like the multiverse film Everything Everywhere All at Once.
To fight the final boss you may have to draw power from the other universes and give it everything. Things might be going well in one universe when the boss suddenly comes over and kills it (dropout). So for the final fight, all the other universes have to work hard and get stronger; you cannot depend on a single one.
Why does everyone have to work hard (averaging) instead of raising one big champion (all in one)? Because you don't know which one will be dropped out. If all the others are useless (little information stored), that is even worse, and you definitely can't beat the boss.
Explanation 3
It is equivalent to taking a vote among several networks (network A, network B, ...).
Even if a single network is very strong and right most of the time, the moment it makes a mistake you are finished. That is when you need the "three cobblers" (several modest minds that together beat one mastermind).

When using it
[image]
During testing all neurons are switched on; none are randomly dropped.

[image]
Multiply by p once more at the end. Otherwise training always sees an expectation of E/2 (with p = 1/2) while testing sees E, a mismatch by a factor of two.

Alternatively, divide by p directly during training so that the value (the expectation) stays the same.
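A small sketch of the "divide by p during training" variant, assuming p is the probability of keeping a neuron. The function names and the all-ones activations are made up so the expectation is easy to read off.

```python
import random

def dropout_train(h, p_keep, rng):
    # keep each activation with probability p_keep and divide the survivors
    # by p_keep, so the expected value matches the test-time value
    return [x / p_keep if rng.random() < p_keep else 0.0 for x in h]

def dropout_test(h):
    # at test time every neuron is on and nothing needs rescaling
    return list(h)

rng = random.Random(0)
h = [1.0] * 100000                     # hypothetical activations, all equal to 1
out = dropout_train(h, p_keep=0.5, rng=rng)
train_mean = sum(out) / len(out)       # close to 1.0, same as at test time
test_mean = sum(dropout_test(h)) / len(h)
```

Each surviving activation is doubled when `p_keep=0.5`, so the average over many neurons matches what the next layer sees at test time.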

(2022-06-28 10:38:59: 5/38 watched)

Parameters

Parameters: learned by the neural network itself
Hyperparameters: set by me
[image]

Clever! The learning rate is compared to the length of a stick: if it is too long, it gets stuck across the top of the valley.
Generally speaking, it never quite touches the bottom of the valley.

The strategy is shown in the upper right corner:
Divide by e^t, so it keeps declining
Or train for a while, and once it gets stuck, drop to the next level and tune again, in a loop
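A sketch of the two strategies. The decay constant `k`, the `factor`, the `patience` and the loss values are all numbers I made up; the video only gave the general idea.

```python
import math

def exponential_decay(lr0, t, k=0.1):
    # divide the initial rate by e^(k*t): it keeps declining as t grows
    return lr0 / math.exp(k * t)

def reduce_when_stuck(losses, lr0, factor=0.1, patience=3, min_delta=1e-4):
    # train at one level until the loss stops improving for `patience`
    # rounds, then drop the learning rate to the next level and go on
    lr, best, wait = lr0, float("inf"), 0
    lrs = []
    for loss in losses:
        if loss < best - min_delta:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                lr, wait = lr * factor, 0
        lrs.append(lr)
    return lrs

decayed = [exponential_decay(1.0, t) for t in range(0, 30, 10)]
lrs = reduce_when_stuck([1.0, 0.5, 0.5, 0.5, 0.5, 0.4], lr0=0.1)
```

In the second schedule the rate stays at 0.1 while the loss is improving and drops to about 0.01 after three rounds without progress.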
[image]
(Slipping off to rest.)