The four most common errors when using PyTorch
2022-07-04 15:43:00 【Xiaobai learns vision】
Reading guide
These are 4 mistakes that I'm sure most people have made at some point; I hope this serves as a small reminder.
The most common neural network mistakes: 1) you didn't try to overfit a single batch first. 2) you forgot to toggle train/eval mode for the network. 3) you forgot to call .zero_grad() (in PyTorch) before .backward(). 4) you passed softmaxed outputs to a loss that expects raw logits. Anything else?
This article goes through each of these errors point by point with PyTorch code examples. Code: https://github.com/missinglinkai/common-nn-mistakes
Common mistake 1: You didn't try to overfit a single batch first
Andrej says we should overfit a single batch. Why? Well, when you overfit a single batch, you are really making sure the model works at all. I don't want to waste hours of training on a huge dataset only to find out that a small mistake left me at 50% accuracy. When your model can fully memorize the input, you also get a good estimate of its best possible performance.
Maybe that best performance is zero, because an exception gets thrown during execution. But that's fine, because we find the problem quickly and can fix it. To sum up, here's why you should start by overfitting a small subset of the dataset:
- Find bugs
- Estimate the best possible loss and accuracy
- Fast iteration
In PyTorch, you typically iterate over a DataLoader to get batches from your dataset. Your first attempt might be to index train_loader directly.
# TypeError: 'DataLoader' object does not support indexing
first_batch = train_loader[0]
You will immediately see an error, because DataLoaders want to support streaming and other use cases where indexing isn't possible, so they don't implement a __getitem__ method and the [0] operation fails. Your next attempt might be to convert the loader to a list, which does support indexing.
# slow, wasteful
first_batch = list(train_loader)[0]
But that means evaluating the entire dataset, which wastes both time and memory. So what else can we try?
In a Python for loop, when you write the following:
for item in iterable:
    do_stuff(item)
You effectively get this:
iterator = iter(iterable)
try:
    while True:
        item = next(iterator)
        do_stuff(item)
except StopIteration:
    pass
The “iter” function is called to create an iterator, and then “next” is called on it to get each item, until StopIteration is raised when the iterable is exhausted. Inside the loop we just keep calling next, next, next... To get the same behavior but stop after the first item, we can use this:
first = next(iter(iterable))
We call “iter” to get an iterator, but we call “next” only once. Note that, for clarity, I assign the result of next to a variable named “first”. I call this the “next-iter” trick. The following code shows the full example with a train data loader:
for batch_idx, (data, target) in enumerate(train_loader):
    # training code here
Here's how to modify the loop to use the next-iter trick:
first_batch = next(iter(train_loader))
for batch_idx, (data, target) in enumerate([first_batch] * 50):
    # training code here
You can see that I repeat “first_batch” 50 times to make sure I overfit it.
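To make this concrete, here is a minimal, self-contained sketch of the single-batch overfitting check. The tiny model, the fake dataset and the hyperparameters are all made up for illustration; the point is only that the loss on the repeated batch should drop steadily toward zero, and if it doesn't, something is wrong with the model or the training step.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical tiny model and fake dataset, only to make the sketch self-contained.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

dataset = TensorDataset(torch.randn(320, 20), torch.randint(0, 2, (320,)))
train_loader = DataLoader(dataset, batch_size=32, shuffle=True)

first_batch = next(iter(train_loader))   # the "next-iter" trick
for batch_idx, (data, target) in enumerate([first_batch] * 50):
    optimizer.zero_grad()
    output = model(data)
    loss = loss_fn(output, target)
    loss.backward()
    optimizer.step()
    if batch_idx % 10 == 0:
        print(f"batch {batch_idx}: loss = {loss.item():.4f}")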
Common mistake 2: Forgetting to set the network to train/eval mode
Why does PyTorch care whether we are training or evaluating the model? The biggest reason is dropout. This technique randomly removes neurons during training.
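As a quick standalone illustration of why the mode matters (not part of the article's code): given the same input, a dropout layer zeroes random activations in train mode but becomes a no-op in eval mode.

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.train()     # training mode: about half the values are zeroed, survivors are scaled by 1 / (1 - p)
print(drop(x))   # e.g. tensor([2., 0., 2., 2., 0., 0., 2., 0.])

drop.eval()      # eval mode: dropout is a no-op, the input passes through unchanged
print(drop(x))   # tensor([1., 1., 1., 1., 1., 1., 1., 1.])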
Imagine that the red neuron on the right is the only one contributing to the correct result. Once we remove the red neuron, it forces the other neurons to train and learn how to be accurate without it. Dropout like this improves performance at final test time, but it hurts performance during training, because the network is incomplete. Run the script and watch the accuracy in the MissingLink dashboard with that in mind.
In this particular case, it looks like the accuracy drops every 50 iterations.
If we check the code, we see that training mode is set inside the train function.
def train(model, optimizer, epoch, train_loader, validation_loader):
    model.train()  # <-- training mode is set here, once per epoch
    for batch_idx, (data, target) in experiment.batch_loop(iterable=train_loader):
        data, target = Variable(data), Variable(target)

        # Inference
        output = model(data)
        loss_t = F.nll_loss(output, target)

        # The iconic grad-back-step trio
        optimizer.zero_grad()
        loss_t.backward()
        optimizer.step()

        if batch_idx % args.log_interval == 0:
            train_loss = loss_t.item()
            train_accuracy = get_correct_count(output, target) * 100.0 / len(target)
            experiment.add_metric(LOSS_METRIC, train_loss)
            experiment.add_metric(ACC_METRIC, train_accuracy)
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx, len(train_loader),
                100. * batch_idx / len(train_loader), train_loss))
            with experiment.validation():
                val_loss, val_accuracy = test(model, validation_loader)  # <-- test() switches the model to eval mode
                experiment.add_metric(LOSS_METRIC, val_loss)
                experiment.add_metric(ACC_METRIC, val_accuracy)
This problem is not easy to notice: inside the loop, we call the test function.
def test(model, test_loader):
    model.eval()
    # ...
Inside the test function, we set the mode to eval! That means that if we call test during training, the model stays in eval mode until the next time the train function is called. The result is that only one batch per epoch is trained with dropout active, which leads to the performance degradation we see.
The fix is easy: we move model.train() one line down, into the training loop. Ideally, the mode should be set as close as possible to the inference step, so it's hard to forget. After the fix, our training curve looks much more reasonable, without the peak in the middle. Note that, because of dropout, training accuracy ends up lower than validation accuracy.
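Here is an abbreviated sketch of the corrected train function (metric logging and the Variable wrapping omitted; experiment, args, F and test are the same helpers used in the snippet above):

def train(model, optimizer, epoch, train_loader, validation_loader):
    # model.train() no longer lives up here: once per epoch is not enough
    for batch_idx, (data, target) in experiment.batch_loop(iterable=train_loader):
        model.train()   # <-- moved here: switch back to train mode before every forward pass
        output = model(data)
        loss_t = F.nll_loss(output, target)

        optimizer.zero_grad()
        loss_t.backward()
        optimizer.step()

        if batch_idx % args.log_interval == 0:
            # test() puts the model into eval mode; the next iteration's
            # model.train() switches it back before training continues
            val_loss, val_accuracy = test(model, validation_loader)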
Common mistake 3: Forgetting to call .zero_grad() before .backward()
When you call “backward” on the “loss” tensor, you are telling PyTorch to go backwards from the loss and compute how much each weight affects it, which is the gradient at every node of the graph. With this gradient we can then update the weights.
Here's what that looks like in PyTorch code. The final “step” method updates the weights based on the result of the “backward” step. What might not be obvious from this code is that if we keep doing this batch after batch, the gradients keep accumulating and explode, and the steps we take keep getting bigger.
output = model(input)           # forward pass
loss = loss_fn(output, target)  # compute the loss
loss.backward()                 # backward pass
optimizer.step()                # update weights using an ever-growing gradient
To keep the step from growing out of control, we use the zero_grad method.
output = model(input)           # forward pass
optimizer.zero_grad()           # reset the gradients
loss = loss_fn(output, target)  # compute the loss
loss.backward()                 # backward pass
optimizer.step()                # update weights using a reasonably sized gradient
This might feel a little too explicit, but it gives you precise control over the gradients. One way to make sure you don't get confused is to always keep these three calls together:
zero_grad
backward
step
In our code example, with zero_grad never called at all, the neural network does start to get better, because it is still learning, but the gradients eventually explode, all the updates become more and more garbage, and the network ends up useless.
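You can see the accumulation directly on a single parameter (a standalone sketch, not from the article's repo): calling backward twice without zeroing in between doubles the stored gradient.

import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

loss = (w ** 2).sum()
loss.backward()
print(w.grad)      # tensor([2., 4.])

loss = (w ** 2).sum()
loss.backward()    # no reset in between, so the new gradient is added to the old one
print(w.grad)      # tensor([4., 8.])

w.grad.zero_()     # roughly what optimizer.zero_grad() does for every parameter
print(w.grad)      # tensor([0., 0.])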
If instead you call zero_grad right after backward (before step), nothing happens at all: we erase the gradients before step can use them, so the weights never get updated. The only variation left comes from dropout.
I think it would make sense for the gradients to be reset automatically every time the step method is called.
One reason not to call zero_grad before every backward is if you call backward several times per call to step(), for example if you can only fit one sample per batch into memory: a single gradient would be too noisy, so you want to accumulate the gradients of a few batches for each step. Another reason might be calling backward on different parts of the computation graph, but in that case you could also just sum the losses and call backward on the sum.
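Here is a hedged sketch of the first case, gradient accumulation: call backward on every mini-batch but only step (and zero the gradients) every accumulation_steps batches. The names reuse model, loss_fn, optimizer and train_loader from the single-batch sketch above, and accumulation_steps is an illustrative choice.

accumulation_steps = 4   # hypothetical: accumulate gradients over 4 small batches

optimizer.zero_grad()
for batch_idx, (data, target) in enumerate(train_loader):
    output = model(data)
    loss = loss_fn(output, target) / accumulation_steps  # scale so the accumulated gradient matches one big batch
    loss.backward()                                      # gradients keep accumulating across iterations

    if (batch_idx + 1) % accumulation_steps == 0:
        optimizer.step()        # one update from the accumulated gradient
        optimizer.zero_grad()   # reset only after the update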
Common mistake 4: Passing softmaxed outputs to a loss that expects raw logits
Logits are the activations of the last fully connected layer. Softmax gives the same activations, but normalized. Looking at the logit values, some are positive and some are negative, while the values after log_softmax are all negative. If you plot them as histograms, the distributions look similar and the only visible difference is the scale, but that subtle difference makes the downstream math completely different. So why is this such a common mistake? In the official PyTorch MNIST example, if you look at the forward method, you can see the last fully connected layer self.fc2 followed by log_softmax.
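Roughly, the end of that forward method looks like this (paraphrased, so treat the exact layer names as illustrative; only the last two lines matter here):

def forward(self, x):
    # ... convolution, pooling and earlier fully connected layers omitted ...
    x = F.relu(self.fc1(x))
    x = self.fc2(x)                  # output of the last fully connected layer: raw logits
    return F.log_softmax(x, dim=1)   # normalized log-probabilities, ready for nll_loss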
But when you look at the official PyTorch ResNet or AlexNet models, you'll find that they don't end with a softmax layer at all: the final output comes straight from a fully connected layer, i.e. logits.
This difference is not made clear in the documentation. If you check the nll_loss function, there is no mention of whether the input should be logits or softmaxed; your only hint is that the example code feeds the output of log_softmax into nll_loss.
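Here is a standalone sketch of the safe pairings (the tensors are made up): nll_loss expects log-probabilities, cross_entropy expects raw logits and applies log_softmax internally, and mixing them up raises no error but optimizes the wrong quantity.

import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)           # fake output of a final fully connected layer
target = torch.randint(0, 10, (4,))

# Correct pairings: these two losses are identical.
loss_a = F.nll_loss(F.log_softmax(logits, dim=1), target)
loss_b = F.cross_entropy(logits, target)
print(loss_a.item(), loss_b.item())

# Silent mistake: cross_entropy applies log_softmax again on already-normalized values.
# No exception is raised, but the value and the gradients are wrong.
loss_wrong = F.cross_entropy(F.log_softmax(logits, dim=1), target)
print(loss_wrong.item())

In short: if your model already ends with log_softmax (like the MNIST example), use nll_loss; if it ends with raw logits (like ResNet or AlexNet), use cross_entropy.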