当前位置:网站首页>Implementation of multi GPU distributed training with horovod in Amazon sagemaker pipeline mode
Implementation of multi GPU distributed training with horovod in Amazon sagemaker pipeline mode
2020-11-07 20:15:00 【InfoQ】
At present , We can use a variety of techniques to train deep learning models with a small amount of data , It includes transfer learning for image classification tasks 、 Small sample learning and even one-time learning , It can also be based on pre training BERT or GPT2 Models fine tune language models . however , In some application cases, we still need to introduce a lot of training data . for example , If the current image and ImageNet The images in the dataset are completely different , Or is the current language corpus only for specific areas 、 It's not a generic type , So it's very difficult for transfer learning to bring about the ideal model performance . As a deep learning researcher , You may need to try new ideas or approaches from scratch . under these circumstances , We have to use large datasets to train large deep learning models ; Without finding the best way to train , The whole process can take a few days 、 Weeks, even months .
In this paper , We'll learn how to do it together Amazon SageMaker Run many on a single instance of GPU Training , And discuss how to do it in Amazon SageMaker On the implementation of more efficient GPU And multi node distributed training .
Link to the original text :【https://www.infoq.cn/article/0867pYEmzviBfvZxW37k】. Without the permission of the author , Prohibited reproduced .
版权声明
本文为[InfoQ]所创,转载请带上原文链接,感谢
边栏推荐
- 想要忘记以前连接到Mac的WiFi网络,试试这个方法!
- 11.Service更新
- 滴滴的分布式ID生成器(Tinyid),好用的一批
- 【笔记】Error while loading PyV8 binary: exit code 1解决方法
- Using thread communication to solve the problem of cache penetrating database avalanche
- Technical debt is a lack of real understanding of business functions- daverupert.com
- Code Review最佳实践
- Don't treat exceptions as business logic, which you can't afford
- 是时候结束 BERTology了
- Using LWA and lync to simulate external test edge free single front end environment
猜你喜欢

嘉宾专访|2020 PostgreSQL亚洲大会中文分论坛:岳彩波

Web API series (3) unified exception handling

Web API系列(三)统一异常处理

C enumerates the differences between permissions |, and |

OpenCV計算機視覺學習(10)——影象變換(傅立葉變換,高通濾波,低通濾波)

统计文本中字母的频次(不区分大小写)

DOM节点操作

In simple terms, the large front-end framework angular6 practical course (angular6 node.js 、keystonejs、

websocket+probuf.原理篇
![[note] error while loading pyv8 binary: exit code 1 solution](/img/3b/00bc81122d330c9d59909994e61027.jpg)
[note] error while loading pyv8 binary: exit code 1 solution
随机推荐
How did I lose control of the team?
DOM node operation
【笔记】Error while loading PyV8 binary: exit code 1解决方法
Code Review最佳实践
The most hard core of the whole network explains the computer startup process
Using LWA and lync to simulate external test edge free single front end environment
Deep into web workers (1)
抽絲剝繭——門面和調停者設計模式
分享几个我日常使用的VS Code插件
Ac86u KX Online
课堂练习
What kind of technical ability should a programmer who has worked for 1-3 years? How to improve?
聊一聊数据库中的锁
快進來!花幾分鐘看一下 ReentrantReadWriteLock 的原理!
DOM节点操作
使用 Xunit.DependencyInjection 改造测试项目
Didi's distributed ID generator (tinyid), easy to use
Why do we need software engineering -- looking at a simple project
Tips for Mac novices
使用RabbitMQ实现分布式事务