PyTorch: weight decay and dropout
2022-07-05 11:34:00 【我渊啊我渊啊】
Two common ways to reduce overfitting:
1. Weight decay
Common forms: L1 and L2 regularization.
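For reference, here is a minimal sketch of the two penalty terms for a weight tensor w. The helper names l1_penalty and l2_penalty are hypothetical and are not part of PyTorch.

- The L1 penalty sums absolute values and tends to favor sparse weights.
- The L2 penalty sums squares, so it punishes large weights most. It conventionally carries a factor of 1/2, which makes its gradient exactly w.

```python
import torch


def l1_penalty(w: torch.Tensor) -> torch.Tensor:
    # L1 penalty: sum of absolute values (hypothetical helper)
    return w.abs().sum()


def l2_penalty(w: torch.Tensor) -> torch.Tensor:
    # L2 penalty: half the squared L2 norm; the 1/2 makes its gradient equal to w
    return (w ** 2).sum() / 2


w = torch.tensor([3.0, -4.0], requires_grad=True)
print(l1_penalty(w).item())  # 7.0
print(l2_penalty(w).item())  # 12.5

l2_penalty(w).backward()
print(w.grad)  # the gradient of the L2 penalty equals w itself
```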
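L2 regularization:
After a neural network has been trained until the loss converges, many different values of w and b can satisfy the objective. If w is large, noise in the input is amplified and the output becomes inaccurate, so we want to keep w as small as possible. Regularization does this by adding a penalty term to the model's loss function, which pushes the learned parameters toward small values. For L2 regularization the penalty is (λ/2)·‖w‖². The hyperparameter λ ≥ 0 sets how strongly large weights are penalized.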
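This sketch shows why the L2 penalty is called weight decay. It uses a hypothetical linear model on random data, with arbitrary values λ = 0.1 and η = 0.05.

The penalty (λ/2)‖w‖² adds λw to the gradient, so an SGD step becomes w ← (1 − ηλ)w − η∇L. In other words, the weight is shrunk by a constant factor before the usual update.

For plain SGD, `torch.optim.SGD` applies the same update when given `weight_decay=λ`. The two versions below should therefore end up with the same parameters. Following common practice, the decay is applied only to the weight, not the bias, via a per-parameter option group.

```python
import torch
from torch import nn

torch.manual_seed(0)
X = torch.randn(32, 5)
y = torch.randn(32, 1)
lambd, lr = 0.1, 0.05  # hypothetical regularization strength and learning rate


def make_model():
    torch.manual_seed(1)  # identical initial parameters for both versions
    return nn.Linear(5, 1)


# Version A: add the penalty (lambda/2) * ||w||^2 to the loss by hand
model_a = make_model()
opt_a = torch.optim.SGD(model_a.parameters(), lr=lr)
loss_a = nn.functional.mse_loss(model_a(X), y) + lambd / 2 * (model_a.weight ** 2).sum()
opt_a.zero_grad()
loss_a.backward()
opt_a.step()

# Version B: let the optimizer apply weight decay, to the weight only
model_b = make_model()
opt_b = torch.optim.SGD([
    {"params": model_b.weight, "weight_decay": lambd},
    {"params": model_b.bias},  # no weight decay on the bias
], lr=lr)
loss_b = nn.functional.mse_loss(model_b(X), y)
opt_b.zero_grad()
loss_b.backward()
opt_b.step()

print(torch.allclose(model_a.weight, model_b.weight))  # True
print(torch.allclose(model_a.bias, model_b.bias))      # True
```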
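2. Dropout (usually applied to the outputs of fully connected hidden layers)
Dropout does not change the expected value of its input, and it is only used while training the model. At inference time the layer passes its input through unchanged.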
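The train/inference distinction can be checked directly on `nn.Dropout`. In this small sketch, the input of ones and p = 0.5 are arbitrary choices.

- In training mode, PyTorch zeroes each element with probability p and scales the survivors by 1/(1 − p), so every output here is 0 or 2.
- In eval mode, the layer returns its input unchanged.

```python
import torch
from torch import nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(2, 8)

drop.train()  # training mode: random zeroing, survivors scaled by 1/(1-p)
print(drop(x))  # every entry is 0.0 or 2.0

drop.eval()  # evaluation mode: dropout is the identity
print(torch.equal(drop(x), x))  # True
```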
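With probability p, h_i is set to 0.
With probability 1 − p, h_i is divided by 1 − p, which scales it up.
The expectation is therefore p·0 + (1 − p)·h_i/(1 − p) = h_i, i.e. unchanged.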
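The same rule can be written from scratch as a minimal sketch. Here `dropout_layer` is a hypothetical helper name; in practice `nn.Dropout` already provides this.

The function works in three steps:

1. Draw a uniform random number in [0, 1) for each element.
2. Keep the element if the number exceeds p, which happens with probability 1 − p.
3. Divide the kept values by 1 − p.

```python
import torch


def dropout_layer(h: torch.Tensor, p: float) -> torch.Tensor:
    """Inverted dropout (hypothetical helper): zero each element with
    probability p and scale the survivors by 1 / (1 - p)."""
    assert 0.0 <= p <= 1.0
    if p == 1.0:
        return torch.zeros_like(h)  # everything is dropped
    if p == 0.0:
        return h  # nothing is dropped
    # keep-mask: 1.0 with probability 1 - p, 0.0 with probability p
    mask = (torch.rand_like(h) > p).float()
    return mask * h / (1.0 - p)


torch.manual_seed(0)
h = torch.arange(8, dtype=torch.float32).reshape(2, 4)
print(dropout_layer(h, 0.0))  # unchanged
print(dropout_layer(h, 0.5))  # each entry is 0 or twice the original
print(dropout_layer(h, 1.0))  # all zeros

# Numerical check that the mean is preserved: should be close to 1.0
print(dropout_layer(torch.ones(100000), 0.3).mean().item())
```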
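A concise implementation with nn.Dropout, trained on Fashion-MNIST using the d2l helper library:
import torch
from torch import nn
from d2l import torch as d2l

# Dropout probabilities for the two hidden layers
dropout1, dropout2 = 0.2, 0.2

net = nn.Sequential(nn.Flatten(),
                    nn.Linear(784, 256),
                    nn.ReLU(),
                    # Add a dropout layer after the first fully connected layer
                    nn.Dropout(dropout1),
                    nn.Linear(256, 256),
                    nn.ReLU(),
                    # Add a dropout layer after the second fully connected layer
                    nn.Dropout(dropout2),
                    nn.Linear(256, 10))

def init_weights(m):
    # Draw the weights of every linear layer from N(0, 0.01^2)
    if isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, std=0.01)

net.apply(init_weights)

num_epochs, lr, batch_size = 10, 0.5, 256
loss = nn.CrossEntropyLoss(reduction='none')
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
trainer = torch.optim.SGD(net.parameters(), lr=lr)
# The Dropout layers are only active while net is in train mode (net.train());
# in eval mode (net.eval()) they pass inputs through unchanged
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)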
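If the d2l package or the dataset download is unavailable, the same idea can be sketched in plain PyTorch. This is not a reproduction of `d2l.train_ch3`; it is a minimal loop under stated assumptions:

- Random tensors stand in for Fashion-MNIST, so the learned model is meaningless.
- `lr=0.1` and `weight_decay=1e-4` are hypothetical values.

The loop uses both regularizers together: dropout layers in the network, and weight decay through SGD's `weight_decay` argument. Unlike a per-group setting, that argument applies to the biases too. `net.train()` enables dropout for training, and `net.eval()` disables it for evaluation.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
# Synthetic stand-in for Fashion-MNIST: 512 random 28x28 "images", 10 classes
X = torch.randn(512, 1, 28, 28)
y = torch.randint(0, 10, (512,))
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

net = nn.Sequential(nn.Flatten(),
                    nn.Linear(784, 256), nn.ReLU(), nn.Dropout(0.2),
                    nn.Linear(256, 256), nn.ReLU(), nn.Dropout(0.2),
                    nn.Linear(256, 10))
loss_fn = nn.CrossEntropyLoss()
# weight_decay adds weight_decay * w to every parameter's gradient (hypothetical value)
trainer = torch.optim.SGD(net.parameters(), lr=0.1, weight_decay=1e-4)

for epoch in range(3):
    net.train()  # dropout active during training
    for xb, yb in loader:
        trainer.zero_grad()
        l = loss_fn(net(xb), yb)
        l.backward()
        trainer.step()

net.eval()  # dropout disabled for evaluation
with torch.no_grad():
    out1 = net(X)
    out2 = net(X)
print(torch.equal(out1, out2))  # True: eval-mode outputs are deterministic
```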