[Help] MindSpore training on Ascend 910 cannot reproduce the model quality obtained on GPU
2022-07-25 00:12:00 【Xiaole happy】
The model is a very simple convolutional downsampling network: two convolution layers + ReLU.

【Training environment】
1. Ascend 910
2. MindSpore 1.1.1
【Problem description】
Using the same training data and the same training procedure, the convolutional downsampling model trained on Ascend has never matched the quality of the model trained on GPU. On the same test set, MS-SSIM differs by 0.015 (0.945 vs. 0.96).
I also observed the following. The loss is an MS-SSIM loss. On GPU, training with lr = 0.00001 eventually converges. On Ascend 910 with the same lr of 0.00001, the loss starts to rise after a little over ten epochs and training fails to converge. Lowering lr to 0.0000001 lets training converge, but the final model still does not reach the quality of the GPU model.
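The post does not say which precision the Ascend run used. As a rough illustration of why precision can matter once the learning rate is pushed very low, the sketch below assumes (hypothetically) that a weight and its plain SGD update are both rounded to IEEE 754 half precision (fp16). Near 0.5, adjacent fp16 values are about 2.4e-4 apart, so an update such as lr × grad = 1e-5 × 0.01 = 1e-7 is rounded away and the weight does not change at all. The helper names are illustrative only, and this shows a mechanism, not a diagnosis of the poster's run.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a float to the nearest IEEE 754 half-precision (fp16) value.

    struct's "e" format packs half precision; values too large to represent
    raise OverflowError, which this helper maps to +/-inf.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return float("inf") if x > 0 else float("-inf")


def sgd_step_fp16(weight: float, grad: float, lr: float) -> float:
    """One plain SGD update with every intermediate rounded to fp16.

    Hypothetical setup for illustration only: it assumes the weight and the
    update are both held in fp16, which the post does not state.
    """
    update = to_fp16(to_fp16(lr) * to_fp16(grad))
    return to_fp16(to_fp16(weight) - update)


if __name__ == "__main__":
    w, g = 0.5, 0.01
    for lr in (1e-5, 1e-7):
        # Compare a float64 update with the same update done in fp16.
        print(lr, w - lr * g, sgd_step_fp16(w, g, lr))
```

With these inputs both fp16 updates leave the weight at exactly 0.5, while the float64 updates do move it slightly.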
Logically, when training such a simple model, Ascend and GPU should not show such a large gap. I have a few questions:
1. Are there any tuning tricks for training with MindSpore?
2. The loss is my own implementation built on MindSpore's built-in mindspore.nn.MSSSIM. Is there a problem with this approach? Does the built-in mindspore.nn.MSSSIM compute MS-SSIM differently from the MS-SSIM used on GPU? Here is the custom MS-SSIM loss implementation:

3. Can the difference in numerical precision affect how well the loss works? For example, is the MS-SSIM loss less effective under fp16 precision?
4. What could cause MindSpore training on Ascend to fail to reproduce the results obtained on GPU (PyTorch)?
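Regarding question 2: the standard MS-SSIM formulation (Wang et al., 2003) combines a contrast-structure term cs_j from each of M scales with the luminance term l_M from the coarsest scale as MS-SSIM = l_M^{w_M} × cs_1^{w_1} × ... × cs_M^{w_M}, and a training loss is commonly taken as 1 - MS-SSIM. The framework-independent sketch below shows only that combination step. The helper names are hypothetical; it is not the poster's code and is not taken from MindSpore or any PyTorch package. The default weights are those published with the original paper, assumed rather than read from either framework. One place where two implementations can legitimately diverge is a negative cs_j at some scale: a negative base with a fractional exponent has no real value, so each implementation must clamp, normalize, or otherwise handle it, and different choices give different losses and gradients.

```python
# Per-scale exponents published with the original MS-SSIM paper; assumed
# here, not read from MindSpore or any PyTorch package.
DEFAULT_WEIGHTS = (0.0448, 0.2856, 0.3001, 0.2363, 0.1333)


def combine_msssim(cs, l_last, weights=DEFAULT_WEIGHTS, clamp_negative=True):
    """Combine per-scale contrast-structure values cs[0..M-1] (finest first)
    and the coarsest-scale luminance value l_last into one MS-SSIM score."""
    if len(cs) != len(weights):
        raise ValueError("expected one cs value per weight")
    if clamp_negative:
        # Hypothetical choice for this sketch: clamp negatives to 0 so every
        # power below stays real-valued.
        cs = [max(c, 0.0) for c in cs]
        l_last = max(l_last, 0.0)
    # The coarsest scale also contributes its luminance term.
    score = l_last ** weights[-1]
    for c, w in zip(cs, weights):
        # Without clamping, a negative c gives a complex result in Python
        # (NumPy would give NaN), i.e. the combination breaks down.
        score *= c ** w
    return score


def msssim_loss(cs, l_last, **kwargs):
    """Loss to minimise: 1 - MS-SSIM (0 when every term equals 1)."""
    return 1.0 - combine_msssim(cs, l_last, **kwargs)


if __name__ == "__main__":
    print(msssim_loss([0.99, 0.98, 0.97, 0.96, 0.95], 0.99))
    print(combine_msssim([-0.05, 0.9, 0.9, 0.9, 0.9], 0.99, clamp_negative=False))
```

Comparing the per-scale cs values from both frameworks on the same image pair is a simple way to see whether a discrepancy comes from the per-scale statistics or from this final combination step.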
From your description, if you are using fp16 or mixed precision, some operators may overflow. Consider adding loss scaling (see https://www.mindspore.cn/doc/api_python/zh-CN/r1.1/mindspore/mindspore.html?highlight=lossscale#mindspore.DynamicLossScaleManager), or refer to the "Enable Automatic Mixed Precision" page of the MindSpore r1.1 documentation to use automatic or manual mixed precision. Note that operators such as exp overflow easily; check whether any such operator runs in fp16, and if so, use manual mixed precision to switch it to fp32.
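The reply's two points, that exp overflows easily in fp16 and that loss scaling guards against overflow, can be illustrated with the pure-Python sketch below. It is not MindSpore code: ToyDynamicLossScaler and overflows_fp16 are hypothetical names, and the update rule (on overflow, skip the step and divide the scale by a factor; after a window of overflow-free steps, multiply it by the factor) is the general dynamic-loss-scaling idea rather than a reproduction of DynamicLossScaleManager. The defaults 2**24, 2 and 2000 are assumed to mirror that class's documented defaults; check the linked r1.1 page before relying on them.

```python
import math

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def overflows_fp16(x: float) -> bool:
    """True if x is inf/NaN or larger in magnitude than fp16 can hold.

    Simplification: ignores the narrow band just above FP16_MAX that
    would round down to FP16_MAX rather than overflow.
    """
    return not math.isfinite(x) or abs(x) > FP16_MAX


class ToyDynamicLossScaler:
    """Hypothetical, simplified dynamic loss scaler (not MindSpore's class)."""

    def __init__(self, init_scale: float = 2.0 ** 24, factor: float = 2.0,
                 window: int = 2000):
        self.scale = init_scale
        self.factor = factor
        self.window = window
        self.good_steps = 0

    def step_ok(self, scaled_grads) -> bool:
        """Inspect gradients of (loss * scale); return True if the optimizer
        step may be applied, False if it must be skipped."""
        if any(overflows_fp16(g) for g in scaled_grads):
            # Overflow: skip this step and shrink the scale (floored at 1,
            # a choice made in this sketch).
            self.scale = max(self.scale / self.factor, 1.0)
            self.good_steps = 0
            return False
        self.good_steps += 1
        if self.good_steps >= self.window:
            # A full window without overflow: try a larger scale again.
            self.scale *= self.factor
            self.good_steps = 0
        return True

    def unscale(self, scaled_grads):
        """Divide the scale back out before the optimizer uses the gradients."""
        return [g / self.scale for g in scaled_grads]


if __name__ == "__main__":
    # exp() leaves the fp16 range once its input exceeds ln(65504), about 11.09.
    print(overflows_fp16(math.exp(11.0)), overflows_fp16(math.exp(11.2)))
    scaler = ToyDynamicLossScaler(init_scale=1024.0, window=2)
    for grads in ([1e5], [1.0], [1.0]):  # first batch of scaled grads overflows
        print(scaler.step_ok(grads), scaler.scale)
```

Because the scale only shrinks when an overflow is actually observed and grows back after a quiet window, it tends to settle near the largest value that keeps the scaled gradients inside fp16 range; a few skipped steps early in training are expected.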