PyTorch weight decay and dropout
2022-07-05 11:42:00 【My abyss, my abyss】
There are two common ways to reduce overfitting:
1. Weight decay
Common methods: L1 and L2 regularization
L2 regularization:
When a neural network is trained until the loss converges, many different combinations of w and b can fit the training data. If the values of w are too large, noise in the input is amplified and the predictions become less accurate, so we want to keep w small. Regularization does this by adding a penalty term to the model's loss function, which pushes the learned parameters toward smaller values. For L2 regularization the penalty is (λ/2)‖w‖², where λ controls how strongly large weights are penalized.
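As a minimal sketch of the idea above, the snippet below applies L2 regularization in two ways: by adding the penalty to the loss by hand, and by passing `weight_decay` to `torch.optim.SGD`. The tiny linear model, the random data, and the values of `lambd` and `lr` are assumptions chosen only for illustration; they are not from the original post.

```python
import copy
import torch
from torch import nn

def l2_penalty(w):
    # Squared L2 norm divided by 2, so that its gradient is simply w
    return torch.sum(w.pow(2)) / 2

torch.manual_seed(0)
X, y = torch.randn(8, 4), torch.randn(8, 1)  # toy data (illustrative)
lambd, lr = 0.1, 0.05                        # illustrative hyperparameters

net_a = nn.Linear(4, 1)
net_b = copy.deepcopy(net_a)  # same starting parameters as net_a

# Option 1: add the penalty term to the loss explicitly
opt_a = torch.optim.SGD(net_a.parameters(), lr=lr)
loss_a = nn.functional.mse_loss(net_a(X), y) + lambd * l2_penalty(net_a.weight)
opt_a.zero_grad()
loss_a.backward()
opt_a.step()

# Option 2: let the optimizer decay the weights (but not the bias)
opt_b = torch.optim.SGD([
    {"params": [net_b.weight], "weight_decay": lambd},
    {"params": [net_b.bias]},
], lr=lr)
loss_b = nn.functional.mse_loss(net_b(X), y)
opt_b.zero_grad()
loss_b.backward()
opt_b.step()

# After one step the two models should hold matching weights
print(torch.allclose(net_a.weight, net_b.weight, atol=1e-6))
```

For plain SGD the two options produce the same update, because `weight_decay` adds `lambd * w` to the gradient, which is exactly the gradient of the hand-written penalty. Passing `weight_decay` to the optimizer is usually the more convenient choice.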
2. Dropout (typically applied to the outputs of fully connected layers)
Dropout does not change the expected value of its input, and it is used only during model training.
With probability p, h_i is set to zero.
With probability 1-p, h_i is divided by 1-p to scale it up.
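The rule above can be sketched from scratch as follows. The function name `dropout_layer` and the test tensor are assumptions made for illustration; in practice `nn.Dropout` does this for you.

```python
import torch

def dropout_layer(X, p):
    """Zero each element of X with probability p and rescale the rest."""
    assert 0 <= p <= 1
    if p == 1:
        # Every element is dropped
        return torch.zeros_like(X)
    if p == 0:
        # Nothing is dropped
        return X
    # mask is 1 with probability 1-p and 0 with probability p
    mask = (torch.rand(X.shape) > p).float()
    # Dividing by 1-p keeps the expected value of the output equal to X
    return mask * X / (1.0 - p)

torch.manual_seed(0)
X = torch.ones(1000, 100)
out = dropout_layer(X, 0.2)
print(out.mean())  # should be close to 1, the mean of the input
```

Each surviving element becomes 1 / (1 - 0.2) = 1.25, and about 20% of the elements become 0, so the average stays near the original value of 1.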
import torch
from torch import nn
from d2l import torch as d2l

# Dropout probabilities for the two hidden layers
dropout1, dropout2 = 0.2, 0.2

net = nn.Sequential(nn.Flatten(),
                    nn.Linear(784, 256),
                    nn.ReLU(),
                    # Add a dropout layer after the first fully connected layer
                    nn.Dropout(dropout1),
                    nn.Linear(256, 256),
                    nn.ReLU(),
                    # Add a dropout layer after the second fully connected layer
                    nn.Dropout(dropout2),
                    nn.Linear(256, 10))

def init_weights(m):
    # Initialize the weights of every linear layer from N(0, 0.01^2)
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, std=0.01)

net.apply(init_weights)

num_epochs, lr, batch_size = 10, 0.5, 256
loss = nn.CrossEntropyLoss(reduction='none')
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
trainer = torch.optim.SGD(net.parameters(), lr=lr)
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)
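Because dropout should only be active during training, the model's mode matters at evaluation time. The short sketch below shows that `nn.Dropout` drops elements in training mode and passes its input through unchanged in evaluation mode. The input tensor of ones is an assumption chosen to make the effect easy to see.

```python
import torch
from torch import nn

torch.manual_seed(0)
drop = nn.Dropout(0.2)
x = torch.ones(10000)

drop.train()         # training mode: dropout is active
y_train = drop(x)    # about 20% zeros, the rest scaled to 1 / 0.8

drop.eval()          # evaluation mode: dropout is disabled
y_eval = drop(x)     # identical to the input

print((y_train == 0).float().mean())  # fraction of dropped elements
print(torch.equal(y_eval, x))
```

Calling `net.eval()` on a whole `nn.Sequential` model switches all of its dropout layers to evaluation mode in the same way, and `net.train()` switches them back.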