PyTorch weight decay and dropout
2022-07-05 11:42:00 【My abyss, my abyss】
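There are two common techniques for reducing overfitting:
1. Weight decay
Common penalties: L1 and L2 regularization. Weight decay usually refers to the L2 form.
L2 regularization:
When a neural network is trained until the loss converges, many different combinations of w and b can fit the training data about equally well. If the values of w are too large, noise in the input is amplified and the predictions become less accurate, so we want to keep w small. Regularization does this by adding a penalty term to the model's loss function, which pushes the learned parameters toward smaller values. For L2 regularization the penalty is (λ/2)·‖w‖², where the hyperparameter λ controls how strongly large weights are penalized.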
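
As a minimal sketch of the penalty described above, the snippet below takes one SGD step in two ways: once with the L2 term added to the loss by hand, and once using the `weight_decay` argument of `torch.optim.SGD`. The toy loss, the starting weights, and the values of `lr` and `wd` are illustrative assumptions, not taken from the original post.

```python
import torch


def l2_penalty(w):
    """Half the squared L2 norm of w, i.e. ||w||^2 / 2."""
    return (w ** 2).sum() / 2


lr, wd = 0.1, 0.01                      # illustrative values (assumptions)
w0 = torch.tensor([1.0, -2.0, 3.0])     # toy starting weights (assumption)

# Option A: add the penalty term to the loss explicitly.
w_a = w0.clone().requires_grad_(True)
opt_a = torch.optim.SGD([w_a], lr=lr)
data_loss_a = (w_a.sum() - 1.0) ** 2    # toy data loss (assumption)
(data_loss_a + wd * l2_penalty(w_a)).backward()
opt_a.step()

# Option B: let the optimizer add wd * w to the gradient via weight_decay.
w_b = w0.clone().requires_grad_(True)
opt_b = torch.optim.SGD([w_b], lr=lr, weight_decay=wd)
data_loss_b = (w_b.sum() - 1.0) ** 2
data_loss_b.backward()
opt_b.step()

# Both options produce the same updated weights for plain SGD.
print(torch.allclose(w_a, w_b))
```

For plain SGD the two options coincide because the gradient of (wd/2)·‖w‖² is wd·w. In practice `weight_decay` is the more convenient choice, since the penalty does not have to be threaded through the loss computation.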
2. Dropout (typically applied to the outputs of fully connected hidden layers)

Dropout does not change the expected value of its input, and it is applied only during training, not at inference.
With probability p, h_i is set to zero.
With probability 1-p, h_i is divided by 1-p, i.e. scaled up.
The expectation is therefore unchanged: E[h_i'] = p·0 + (1-p)·h_i/(1-p) = h_i.
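
The rule above can be sketched from scratch in a few lines. The helper name `dropout_layer` and the demo tensor are hypothetical choices for illustration; in real models the built-in `nn.Dropout` does this job.

```python
import torch


def dropout_layer(X, p):
    """Zero each element of X with probability p and rescale the rest by 1/(1-p)."""
    assert 0 <= p <= 1
    if p == 1:
        # Everything is dropped.
        return torch.zeros_like(X)
    if p == 0:
        # Nothing is dropped.
        return X
    # mask is 1 with probability 1-p and 0 with probability p.
    mask = (torch.rand(X.shape) > p).float()
    return mask * X / (1.0 - p)


X = torch.ones(2, 8)                 # demo input (assumption)
print(dropout_layer(X, 0.5))         # each entry is either 0 or 2
```

Dividing the surviving activations by 1-p is what keeps the expected value equal to the input, so nothing needs to be rescaled at inference time.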

import torch
from torch import nn
from d2l import torch as d2l

# Dropout probabilities for the two hidden layers
dropout1, dropout2 = 0.2, 0.2

net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 256),
    nn.ReLU(),
    # Add a dropout layer after the first fully connected layer
    nn.Dropout(dropout1),
    nn.Linear(256, 256),
    nn.ReLU(),
    # Add a dropout layer after the second fully connected layer
    nn.Dropout(dropout2),
    nn.Linear(256, 10))


def init_weights(m):
    # Draw the weights of every linear layer from a normal distribution with std 0.01
    if isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, std=0.01)


net.apply(init_weights)

num_epochs, lr, batch_size = 10, 0.5, 256
# reduction='none' keeps one loss value per example instead of averaging
loss = nn.CrossEntropyLoss(reduction='none')
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
trainer = torch.optim.SGD(net.parameters(), lr=lr)
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)
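
Since dropout should be active only during training, the layer has to be switched between modes. The following is a small sketch of how `nn.Dropout` behaves under `train()` and `eval()`; the input tensor and the seed are illustrative assumptions.

```python
import torch
from torch import nn

torch.manual_seed(0)                 # seed chosen only for repeatability
layer = nn.Dropout(p=0.2)
x = torch.ones(4, 8)                 # demo input (assumption)

layer.train()                        # training mode: dropout is active
y_train = layer(x)                   # entries are 0 or 1 / (1 - 0.2) = 1.25

layer.eval()                         # evaluation mode: dropout is a no-op
y_eval = layer(x)                    # identical to x

print(y_train)
print(torch.equal(y_eval, x))
```

Calling `net.train()` or `net.eval()` on a whole model sets the mode of every submodule, including its dropout layers. A hand-written training loop would need to make these calls itself before training and before evaluation.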