Implementing multi-GPU distributed training with Horovod in Amazon SageMaker Pipe mode
2020-11-07 20:15:00 【InfoQ】
Today, a range of techniques lets us train deep learning models with only a small amount of data: transfer learning for image classification, few-shot learning, even one-shot learning, and fine-tuning language models from pre-trained BERT or GPT-2 models. Some use cases, however, still call for large amounts of training data. For example, if your images are completely different from those in the ImageNet dataset, or your language corpus is domain-specific rather than general-purpose, transfer learning is unlikely to deliver the desired model performance. As a deep learning researcher, you may also need to try new ideas or approaches from scratch. In these cases, you have to train large deep learning models on large datasets, and without an efficient way to train, the process can take days, weeks, or even months.
In this post, we learn how to run multi-GPU training on a single instance in Amazon SageMaker, and discuss how to carry out efficient multi-GPU and multi-node distributed training on Amazon SageMaker.
Original article: https://www.infoq.cn/article/0867pYEmzviBfvZxW37k (reproduction without the author's permission is prohibited).
Copyright notice
This article was created by [InfoQ]. Please include a link to the original when reposting. Thank you.