PyTorch training process was interrupted
2022-07-05 11:17:00 【IMQYT】
I was scared to death. With my own hands I killed a training process that had been running for 3 days, and I almost cried. Had a week of server rental money been wasted? Had the time been wasted? Could it be salvaged? This was the first time it had happened to me. My code also runs very slowly (on an RTX A5000, which by rights should not be slow, but there is a lot of data). To cut the time lost to log I/O I was not writing any logs; only the model was being saved. My hands were shaking.
Enough talk. How can it be salvaged?
The code saved only the model, with torch.save. No other parameters were saved, not even the epoch. After searching through a lot of other people's experience, I finally found a remedy.
Reload the model
import torch
# `model` must already be built with the same architecture it had when it was saved.
path='autodl-tmp/GraphDTA-master/model_GINConvNet_kiba.model'
model.load_state_dict(torch.load(path))
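This brings back what the model learned: the saved weights are restored, so the loss and everything that depends on them pick up from where training left off instead of starting over.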
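Because only the model was saved, the epoch counter and the optimizer state cannot be recovered from the file. As a minimal sketch of what a fuller checkpoint might look like, the example below stores all three in one dictionary. The helper names `save_checkpoint` and `load_checkpoint`, the dictionary keys, the file name `checkpoint.pt`, and the stand-in `nn.Linear` model with an SGD optimizer are all assumptions made for illustration; they are not taken from the GraphDTA code.

```python
import os
import tempfile

import torch
from torch import nn


def save_checkpoint(path, model, optimizer, epoch):
    """Store everything needed to resume: weights, optimizer state, epoch."""
    torch.save(
        {
            "epoch": epoch,
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
        },
        path,
    )


def load_checkpoint(path, model, optimizer):
    """Restore weights and optimizer state; return the next epoch to run."""
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model_state"])
    optimizer.load_state_dict(checkpoint["optimizer_state"])
    return checkpoint["epoch"] + 1


# Stand-in model and optimizer, used only to make the sketch runnable.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Hypothetical checkpoint location in a temporary directory.
checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.pt")
save_checkpoint(checkpoint_path, model, optimizer, epoch=294)

# A freshly built model and optimizer are filled in from the checkpoint.
restored_model = nn.Linear(4, 1)
restored_optimizer = torch.optim.SGD(restored_model.parameters(), lr=0.01)
start_epoch = load_checkpoint(checkpoint_path, restored_model, restored_optimizer)
print(f"resume from epoch {start_epoch}")
```

Calling something like `save_checkpoint` at the end of every epoch would let a killed run restart with the right epoch number instead of counting from 1 again.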
From the training output I could see that the loss really did continue from where it stood after 294 epochs, and the predicted values likewise matched the results after 294 epochs. Fortunately, I got it back. There was one problem, though: the epoch counter seemed to start from 1 again. Would I have to train another 600 epochs? No, so remember to change the total number of epochs: 600 - 294 = 306. Although the console shows epoch 1 after the interruption, training will end after another 306 epochs. Done.
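The epoch arithmetic can be sketched in a few lines of plain Python. The figures 600 and 294 come from the post; the function names `remaining_epochs` and `resumed_epoch_numbers` are hypothetical helpers added here for illustration, not part of the training script.

```python
TOTAL_EPOCHS = 600      # originally planned number of epochs (from the post)
COMPLETED_EPOCHS = 294  # epochs finished before the process was killed


def remaining_epochs(total, completed):
    """Number of epochs still to run after resuming."""
    if not 0 <= completed <= total:
        raise ValueError("completed must be between 0 and total")
    return total - completed


def resumed_epoch_numbers(total, completed):
    """Map the restarted counter (1, 2, ...) to the true epoch number."""
    return [
        (local_epoch, completed + local_epoch)
        for local_epoch in range(1, remaining_epochs(total, completed) + 1)
    ]


epochs_left = remaining_epochs(TOTAL_EPOCHS, COMPLETED_EPOCHS)
schedule = resumed_epoch_numbers(TOTAL_EPOCHS, COMPLETED_EPOCHS)
print(f"epochs left: {epochs_left}")
print(f"first resumed epoch: counter {schedule[0][0]} -> true epoch {schedule[0][1]}")
print(f"last resumed epoch: counter {schedule[-1][0]} -> true epoch {schedule[-1][1]}")
```

With the numbers from the post, the resumed run needs 306 epochs: the counter that reads 1 on the console corresponds to true epoch 295, and the counter that reads 306 corresponds to epoch 600.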