
PyTorch weight decay and dropout

2022-07-05 11:42:00 【My abyss, my abyss】

There are two common methods for reducing overfitting:

1. Weight decay

Common methods: L1 and L2 regularization

L2 regularization:
When a neural network is trained until the loss converges, many different values of w and b can satisfy the training objective. If w is too large, noise in the input is amplified and the predictions become inaccurate, so we want to keep the values of w small. Regularization makes the learned model parameters smaller by adding a penalty term to the model's loss function; for L2 regularization the penalty is proportional to the squared norm of w.
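
To make the effect of the penalty term concrete, here is a minimal sketch in plain Python (no PyTorch required). It assumes the penalty has the form (lam / 2) * sum(w^2), where `lam` is a hypothetical name for the regularization strength; the helper functions are illustrative and not taken from any library. It shows that one SGD step on the penalized loss is the same as first shrinking every weight by a factor of (1 - lr * lam) and then taking an ordinary gradient step, which is where the name "weight decay" comes from.

```python
def l2_penalty(weights, lam):
    """Return (lam / 2) * sum(w^2), the assumed L2 penalty term."""
    return 0.5 * lam * sum(w * w for w in weights)


def sgd_step_with_l2(weights, grads, lr, lam):
    """One SGD step on loss + penalty; the penalty contributes lam * w to the gradient."""
    return [w - lr * (g + lam * w) for w, g in zip(weights, grads)]


def sgd_step_with_decay(weights, grads, lr, lam):
    """Equivalent form: shrink each weight by (1 - lr * lam), then take a plain step."""
    return [(1 - lr * lam) * w - lr * g for w, g in zip(weights, grads)]


# Illustrative values only; grads stands in for the gradient of the data loss.
weights = [2.0, -1.0, 0.5]
grads = [0.1, -0.2, 0.0]

stepped_l2 = sgd_step_with_l2(weights, grads, lr=0.5, lam=0.01)
stepped_decay = sgd_step_with_decay(weights, grads, lr=0.5, lam=0.01)
```

The third weight has a zero data gradient, so it shrinks purely because of the penalty. In PyTorch the same kind of L2 penalty can be requested through the `weight_decay` argument of `torch.optim.SGD`, instead of adding the term to the loss by hand.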

2. Dropout (usually applied to the outputs of fully connected layers)

Dropout does not change the expected value of its input, and it is only applied during model training:
With probability p, h_i is set to zero.
With probability 1-p, h_i is divided by 1-p to rescale it.
The expected value is therefore p * 0 + (1-p) * h_i / (1-p) = h_i.
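
The rule above can be sketched from scratch in plain Python. This is only an illustration of the idea, not how `nn.Dropout` is implemented internally; the function name `dropout` and the `training` and `rng` parameters are assumptions made for this example.

```python
import random


def dropout(h, p, training=True, rng=random):
    """Zero each element of h with probability p and scale survivors by 1 / (1 - p).

    When training is False the input is returned unchanged, because dropout
    is only applied during training.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if not training or p == 0.0:
        return list(h)
    if p == 1.0:
        # Every element is dropped.
        return [0.0] * len(h)
    keep = 1.0 - p
    # rng.random() is uniform on [0, 1), so an element is dropped with probability p.
    return [x / keep if rng.random() >= p else 0.0 for x in h]


# Illustrative check: the mean of many dropped-out ones should stay close to 1.
rng = random.Random(0)
out = dropout([1.0] * 100000, p=0.2, rng=rng)
mean_out = sum(out) / len(out)
```

With p = 0.2, every surviving element becomes 1 / 0.8 = 1.25 and roughly 20% of the elements become 0, so the average stays close to the original value of 1.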

import torch
from torch import nn
from d2l import torch as d2l
dropout1, dropout2 = 0.2, 0.2
net = nn.Sequential(nn.Flatten(),
        nn.Linear(784, 256),
        nn.ReLU(),
        # Add a dropout layer after the first fully connected layer
        nn.Dropout(dropout1),
        nn.Linear(256, 256),
        nn.ReLU(),
        # Add a dropout layer after the second fully connected layer
        nn.Dropout(dropout2),
        nn.Linear(256, 10))

def init_weights(m):
    # Initialize the weights of every linear layer from a normal distribution with std 0.01
    if isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, std=0.01)

net.apply(init_weights)

num_epochs, lr, batch_size = 10, 0.5, 256
loss = nn.CrossEntropyLoss(reduction='none')
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
trainer = torch.optim.SGD(net.parameters(), lr=lr)
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)