Implementing multi-GPU distributed training with Horovod in Amazon SageMaker Pipe mode
2020-11-07 20:15:00 【InfoQ】
Today, a variety of techniques let us train deep learning models with only a small amount of data, including transfer learning for image classification, few-shot learning, and even one-shot learning. Language models can likewise be fine-tuned from pretrained BERT or GPT-2 models. In some applications, however, a large amount of training data is still required. For example, if your images are completely different from those in the ImageNet dataset, or your language corpus covers a specific domain rather than general text, transfer learning is unlikely to deliver the desired model performance. As a deep learning researcher, you may also need to try new ideas or approaches from scratch. In these cases we have to train large deep learning models on large datasets, and without an efficient training method the whole process can take days, weeks, or even months.
In this article, we look at how to run multi-GPU training on a single Amazon SageMaker instance, and discuss how to achieve more efficient multi-GPU and multi-node distributed training on Amazon SageMaker.
Link to the original article: https://www.infoq.cn/article/0867pYEmzviBfvZxW37k. Reproduction without the author's permission is prohibited.
Copyright notice
This article was created by [InfoQ]. Please include a link to the original when reprinting. Thank you.