The four most common errors when using PyTorch
2022-07-04 15:43:00 【Xiaobai learns vision】
Reading guide
These 4 mistakes are ones I'm sure most people have made at some point; I hope this serves as a small reminder.
The most common neural network errors: 1) you didn't try to overfit a single batch first; 2) you forgot to toggle train/eval mode for the network; 3) you forgot to call .zero_grad() (in PyTorch) before .backward(); 4) you passed softmaxed outputs to a loss that expects raw logits. Anything else?
This article walks through each of these errors with PyTorch code examples. Code: https://github.com/missinglinkai/common-nn-mistakes
Common mistake 1: You didn't try to overfit a single batch first
Andrej says we should overfit a single batch. Why? Well, when you overfit a single batch, you are really making sure the model works at all. I don't want to waste hours of training on a huge dataset only to discover that a small bug left me with 50% accuracy. When your model can fully memorize the input, what you get is a good estimate of its best possible performance.
Maybe that best performance is zero, because an exception is thrown during execution. But that's fine: we find the problem quickly and fix it. To sum up, here is why you should start by overfitting a small subset of the dataset:
- Find bugs
- Estimate the best possible loss and accuracy
- Fast iteration
With a PyTorch dataset, you usually iterate over a DataLoader. Your first attempt might be to index train_loader directly.
# TypeError: 'DataLoader' object does not support indexing
first_batch = train_loader[0]
You will see an error right away, because DataLoaders want to support streaming and other scenarios that don't require indexing. So there is no __getitem__ method, which is why the [0] operation fails. Next, you might try converting it to a list, which does support indexing.
# slow, wasteful
first_batch = list(train_loader)[0]
But that means evaluating the entire dataset, which costs both time and memory. So what else can we try?
In a Python for loop, when you write the following:
for item in iterable:
    do_stuff(item)

you effectively get this:
iterator = iter(iterable)
try:
    while True:
        item = next(iterator)
        do_stuff(item)
except StopIteration:
    pass
Calling the iter function creates an iterator, and calling next on that iterator returns the next item, until StopIteration is raised once we run out. In this loop we just keep calling next, next, next... To mimic this behavior but stop after the first item, we can use this:
first = next(iter(iterable))
We call iter to get the iterator, but we only call next on it once. Note that, for clarity, I assign the result of next to a variable named first. I call this the "next-iter" trick. In the following code, you can see the full example with the train data loader:
for batch_idx, (data, target) in enumerate(train_loader):
    # training code here
Here's how to modify the loop to use the next-iter trick:
first_batch = next(iter(train_loader))
for batch_idx, (data, target) in enumerate([first_batch] * 50):
    # training code here
You can see that I repeat first_batch 50 times to make sure I overfit.
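To make the idea concrete, here is a self-contained sketch of the whole overfitting check, using a toy model and made-up tensor shapes rather than the repo's code; if the setup is healthy, the printed loss should end up close to zero:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-in for one batch pulled from a real DataLoader
first_batch = (torch.randn(32, 20), torch.randint(0, 5, (32,)))

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 5))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

for batch_idx, (data, target) in enumerate([first_batch] * 50):
    optimizer.zero_grad()
    loss = F.cross_entropy(model(data), target)
    loss.backward()
    optimizer.step()

print(loss.item())  # should approach zero once the model memorizes the batch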
Common mistake 2: Forgetting to set train/eval mode for the network
Why does PyTorch care whether we are training or evaluating the model? The biggest reason is dropout, a technique that randomly removes neurons during training.
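As a quick illustration (not from the article's repo), here is how nn.Dropout behaves differently depending on the mode:

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()
print(drop(x))  # roughly half the entries are zeroed, the rest are scaled by 1 / (1 - p)

drop.eval()
print(drop(x))  # identity: all ones, dropout is disabled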
Imagine that the red neuron on the right is the only one that contributes to the correct result. Once we remove it, the other neurons are forced to learn how to be accurate without it. Dropout like this improves final test performance, but it hurts performance during training because the network is incomplete. Run the script and watch the accuracy in the MissingLink dashboard, and keep that in mind.
In this particular case, it looks like accuracy drops every 50 iterations.
If we check the code, we see that train mode is set inside the train function.
def train(model, optimizer, epoch, train_loader, validation_loader):
    model.train()  # <-- train mode is set here, once, outside the batch loop
    for batch_idx, (data, target) in experiment.batch_loop(iterable=train_loader):
        data, target = Variable(data), Variable(target)
        # Inference
        output = model(data)
        loss_t = F.nll_loss(output, target)
        # The iconic grad-back-step trio
        optimizer.zero_grad()
        loss_t.backward()
        optimizer.step()
        if batch_idx % args.log_interval == 0:
            train_loss = loss_t.item()
            train_accuracy = get_correct_count(output, target) * 100.0 / len(target)
            experiment.add_metric(LOSS_METRIC, train_loss)
            experiment.add_metric(ACC_METRIC, train_accuracy)
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx, len(train_loader),
                100. * batch_idx / len(train_loader), train_loss))
            with experiment.validation():
                val_loss, val_accuracy = test(model, validation_loader)  # <-- test() is called inside the training loop
                experiment.add_metric(LOSS_METRIC, val_loss)
                experiment.add_metric(ACC_METRIC, val_accuracy)

This problem is not easy to notice: inside the loop, we call the test function.
def test(model, test_loader):
    model.eval()
    # ...

Inside the test function we set the mode to eval! That means that if we call test during training, we stay in eval mode until the next time train is called. As a result, in every epoch only one batch actually uses dropout, which causes the performance degradation we see.
The fix is easy: we move model.train() one line down, into the training loop. Ideally, the mode should be set as close as possible to the inference step, so you never forget to set it. After the fix, our training curve looks much more reasonable, without the spikes in the middle. Note that, because of dropout, training accuracy will be lower than validation accuracy.
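A minimal sketch of the corrected loop, keeping the structure of the snippet above (experiment, args, Variable, and the metric helpers are assumed from the original repo):

def train(model, optimizer, epoch, train_loader, validation_loader):
    for batch_idx, (data, target) in experiment.batch_loop(iterable=train_loader):
        model.train()  # moved inside the loop, right before inference
        data, target = Variable(data), Variable(target)
        output = model(data)
        loss_t = F.nll_loss(output, target)
        optimizer.zero_grad()
        loss_t.backward()
        optimizer.step()
        if batch_idx % args.log_interval == 0:
            with experiment.validation():
                # test() still switches to eval mode, but model.train() above
                # restores train mode at the start of the next iteration
                val_loss, val_accuracy = test(model, validation_loader)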
Common mistake 3: Forgetting to call .zero_grad() before .backward()
When you call backward on the loss tensor, you tell PyTorch to go backwards from the loss and compute how much each weight affects the loss, that is, the gradient of each node in the computation graph. Using these gradients, we can update the weights optimally.
Here is how it looks in PyTorch code. The final step method updates the weights based on the result of the backward step. What may not be obvious from this code is that if we keep doing this batch after batch, the gradients keep accumulating and blow up, and the step we take keeps getting bigger.
output = model(input)  # forward-pass
loss_fn.backward()     # backward-pass
optimizer.step()       # update weights by an ever growing gradient
To keep the step from getting too big, we use the zero_grad method.
output = model(input)  # forward-pass
optimizer.zero_grad()  # reset gradient
loss_fn.backward()     # backward-pass
optimizer.step()       # update weights using a reasonably sized gradient
This may feel a little too explicit, but it gives precise control over the gradients. One way to make sure you don't get confused is to keep these three functions together:
zero_grad
backward
step
In our code example, with zero_grad not used at all, the network starts out getting better because it is improving, but the gradients eventually blow up, all the updates become more and more garbage, and the network finally becomes useless.
Calling zero_grad after backward instead: nothing happens, because we erased the gradients, so the weights are never updated. The only variation left comes from dropout.
I think it would make sense to automatically reset the gradients every time the step method is called.
One reason you would not want to zero the gradients on every backward is when you call backward more than once per step(), for example if you can only fit one sample per batch in memory: a single gradient would be too noisy, so you want to accumulate the gradients of several batches for each step. Another reason might be calling backward on different parts of the graph, but in that case you can also add up the losses and call backward on the sum. An accumulation sketch follows below.
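A minimal sketch of gradient accumulation, reusing the model / optimizer / train_loader names from earlier; accumulation_steps is a name chosen here purely for illustration:

accumulation_steps = 8  # hypothetical: accumulate gradients over 8 small batches

optimizer.zero_grad()
for batch_idx, (data, target) in enumerate(train_loader):
    output = model(data)
    loss = F.nll_loss(output, target) / accumulation_steps  # scale so the summed gradient matches one big batch
    loss.backward()                                          # gradients add up across iterations
    if (batch_idx + 1) % accumulation_steps == 0:
        optimizer.step()       # one update built from several batches' gradients
        optimizer.zero_grad()  # reset only after the update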
Common mistake 4: You fed softmaxed outputs to a loss function that expects raw logits
Logits are the activation values of the last fully connected layer. Softmax produces the same activations, but normalized. Looking at the logits, you can see that some values are positive and some are negative, whereas after log_softmax all values are negative. If you look at the bar chart, the distributions look the same; the only difference is the scale. But that subtle difference makes the downstream math completely different. So why is this a common mistake? In PyTorch's official MNIST example, look at the forward method: at the end you can see the last fully connected layer self.fc2, followed by log_softmax.
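A tiny sketch with made-up numbers shows the sign and scale difference described above:

import torch
import torch.nn.functional as F

logits = torch.tensor([[1.5, -0.3, 0.8]])   # raw outputs of a last fully connected layer: mixed signs
print(F.log_softmax(logits, dim=1))         # all values negative
print(F.softmax(logits, dim=1).sum())       # normalized: probabilities summing to 1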
But when you look at the official PyTorch ResNet or AlexNet models, you'll see that they don't end with a softmax layer; the final output is the fully connected layer itself, i.e., the logits.
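You can check this yourself, assuming torchvision is installed:

import torch
from torchvision import models

model = models.resnet18()                 # no pretrained weights needed for this check
out = model(torch.randn(1, 3, 224, 224))
print(out.shape)                          # torch.Size([1, 1000]); raw logits, no softmax applied
print((out < 0).any().item())             # some values are negative, as expected for logits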
The documentation doesn't make this difference clear. If you check the nll_loss function, it never says whether the input should be logits or softmax; your only hope is to discover that nll_loss expects log_softmax as its input.
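To make the relationship concrete, here is a small sketch (random tensors, not the MNIST model) of the pairings that work and the mistake described above:

import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)                  # pretend these are raw outputs of the last linear layer
target = torch.randint(0, 10, (4,))

loss_nll = F.nll_loss(F.log_softmax(logits, dim=1), target)  # nll_loss expects log-probabilities
loss_ce = F.cross_entropy(logits, target)                    # cross_entropy applies log_softmax internally
print(torch.allclose(loss_nll, loss_ce))                     # True: the two are equivalent

# The mistake: softmax first, then a loss that expects raw logits
loss_bad = F.cross_entropy(F.softmax(logits, dim=1), target) # runs without error, but the loss values are wrong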