Notes on configuring NCCL and Horovod over the past two days (original)
2022-07-05 06:16:00 【JNash】
Installation Guide :: NVIDIA Deep Learning NCCL Documentation
NVIDIA Collective Communications Library (NCCL) Download Page | NVIDIA Developer
The two links above are the official NCCL installation guides.
Part 1: Install following the official guide (this did not work for me)
1.
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
2.
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
(Change ubuntu2004 in the URL to match your own operating system version.)
3.
sudo apt update
4.
(1) Install the latest version
sudo apt install libnccl2 libnccl-dev
(2) Or pin the version that matches your CUDA version
sudo apt install libnccl2=2.4.8-1+cuda10.0 libnccl-dev=2.4.8-1+cuda10.0
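However, this only installs prebuilt packages; nothing is compiled. The official guide is not detailed on this point, and I could not find nccl under /usr/local/ (apt packages typically place the library under /usr/lib/x86_64-linux-gnu and the headers under /usr/include instead).
Part 2: Clone from GitHub and build from source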
1.
git clone https://github.com/NVIDIA/nccl.git
cd nccl
2.
make -j12 src.build BUILDDIR=/home/yourname/nccl CUDA_HOME=/usr/local/cuda NVCC_GENCODE="-gencode=arch=compute_86,code=sm_86"
(NVCC_GENCODE is optional. If you omit it, NCCL is compiled for all supported architectures by default. Setting it speeds up compilation and reduces the binary size. compute_86 and sm_86 must match the compute capability of your GPU; look it up at https://developer.nvidia.com/cuda-gpus)
- -j12: use 12 parallel build jobs. Run nproc to see the total number of cores and adjust accordingly;
- BUILDDIR: where the build output is stored. The default is nccl/build; if you are root, you can point it at /usr/local/nccl/ instead;
- CUDA_HOME: the CUDA installation directory. The default is /usr/local/cuda, so it can be omitted; add it if the build fails to find CUDA.
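The relationship between compute capability and the NVCC_GENCODE value can be sketched in a few lines of Python. This is only an illustration: the helper names gencode_flag and nccl_make_command are made up for this example and are not part of NCCL, and the paths are the placeholders used in this post.

```python
def gencode_flag(compute_capability: str) -> str:
    """Turn a compute capability such as "8.6" into an NVCC_GENCODE value."""
    major, minor = compute_capability.strip().split(".")
    arch = f"{int(major)}{int(minor)}"  # "8.6" -> "86"
    return f"-gencode=arch=compute_{arch},code=sm_{arch}"


def nccl_make_command(compute_capability, build_dir,
                      cuda_home="/usr/local/cuda", jobs=12):
    """Assemble the make invocation from step 2 as an argument list."""
    return [
        "make", f"-j{jobs}", "src.build",
        f"BUILDDIR={build_dir}",
        f"CUDA_HOME={cuda_home}",
        f"NVCC_GENCODE={gencode_flag(compute_capability)}",
    ]


if __name__ == "__main__":
    # Placeholder values: compute capability 8.6 and the build path from this post.
    print(" ".join(nccl_make_command("8.6", "/home/yourname/nccl")))
```

Passing the list to subprocess.run from inside the nccl directory would start the build; printing it first makes it easy to check the flags before committing to a long compile.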
3.
Finally, the compiled files are in the path given by BUILDDIR. You need to add it to your environment variables:
vim ~/.bashrc
(This file is located in /home/yourname.)
Add the following lines:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/yourname/nccl/lib
export PATH=$PATH:/home/yourname/nccl/bin
Press ESC, type :wq to save and quit, then run the following command:
source ~/.bashrc
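To confirm that the new entries are actually visible in the current shell, a small check along these lines may help. It is a sketch: the helper has_entry is hypothetical, and /home/yourname/nccl is the placeholder build path used in this post.

```python
import os
import posixpath


def has_entry(path_value: str, directory: str) -> bool:
    """Return True if `directory` is one of the colon-separated entries."""
    wanted = posixpath.normpath(directory)
    entries = [e for e in path_value.split(":") if e]
    return any(posixpath.normpath(e) == wanted for e in entries)


if __name__ == "__main__":
    nccl_root = "/home/yourname/nccl"  # placeholder: your BUILDDIR
    for var, sub in (("LD_LIBRARY_PATH", "lib"), ("PATH", "bin")):
        target = posixpath.join(nccl_root, sub)
        state = "ok" if has_entry(os.environ.get(var, ""), target) else "missing"
        print(f"{var}: {target} is {state}")
```

If an entry shows as missing, the most likely cause is that source ~/.bashrc was not run in the shell you are testing from.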
4. Verify that NCCL was installed successfully:
git clone https://github.com/NVIDIA/nccl-tests.git
cd nccl-tests
make -j12 CUDA_HOME=/usr/local/cuda
./build/all_reduce_perf -b 8 -e 256M -f 2 -g 4
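(Replace the value after -g with the number of GPUs in your machine; I have 4 GPUs, so I pass 4. If make cannot find nccl.h, also pass NCCL_HOME=/home/yourname/nccl to the make command.)
Reference: Compiling and installing NCCL from source on Linux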
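Rather than counting GPUs by hand, the -g value can be derived from the output of nvidia-smi -L, which prints one "GPU n: ..." line per device. The following is a sketch under that assumption; count_gpus and all_reduce_command are illustrative names, not part of nccl-tests.

```python
import shutil
import subprocess


def count_gpus(listing: str) -> int:
    """Count the 'GPU n: ...' lines in the output of `nvidia-smi -L`."""
    return sum(1 for line in listing.splitlines() if line.startswith("GPU "))


def all_reduce_command(num_gpus: int, binary: str = "./build/all_reduce_perf"):
    """Build the all_reduce_perf invocation used above for `num_gpus` GPUs."""
    return [binary, "-b", "8", "-e", "256M", "-f", "2", "-g", str(num_gpus)]


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; is the NVIDIA driver installed?")
    else:
        result = subprocess.run(["nvidia-smi", "-L"],
                                capture_output=True, text=True)
        gpus = count_gpus(result.stdout)
        if gpus == 0:
            print("no GPUs reported by nvidia-smi")
        else:
            print(" ".join(all_reduce_command(gpus)))
```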
Next, install Horovod. (This also requires Open MPI, g++ 5 or above (required by the Horovod documentation), and tensorflow>=1.15.0.)
The right way to install Horovod (environment setup)
Getting started with Open MPI, part 1: installation and testing
Horovod installation and usage
How to install Horovod?
Installing Horovod on Ubuntu 18.04
1. Install g++
- sudo apt-get install g++
- g++ --version (check the g++ version)
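The "g++ 5 or above" requirement can also be checked programmatically. The sketch below assumes the usual layout of the first line of g++ --version, where the compiler version is the last dotted number on that line; gxx_major is a hypothetical helper written for this post.

```python
import re
import shutil
import subprocess


def gxx_major(version_output: str) -> int:
    """Extract the major version from the first line of `g++ --version`."""
    lines = version_output.splitlines()
    if not lines:
        raise ValueError("empty version output")
    # Take the last dotted number on the first line, e.g. "9.4.0".
    matches = re.findall(r"\d+\.\d+(?:\.\d+)?", lines[0])
    if not matches:
        raise ValueError(f"no version number found in: {lines[0]!r}")
    return int(matches[-1].split(".")[0])


if __name__ == "__main__":
    if shutil.which("g++") is None:
        print("g++ not found; install it with: sudo apt-get install g++")
    else:
        out = subprocess.run(["g++", "--version"],
                             capture_output=True, text=True).stdout
        major = gxx_major(out)
        verdict = "ok" if major >= 5 else "too old for Horovod"
        print(f"g++ major version {major}: {verdict}")
```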
2. Check the Open MPI version
- ompi_info (or mpiexec --version, mpirun --version, or mpicxx --showme:version)
3. Install TensorFlow
pip install tensorflow-gpu -i https://pypi.tuna.tsinghua.edu.cn/simple
Test TensorFlow:
import tensorflow as tf
This raised an error:
RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd
Solution 1:
- pip install numpy --upgrade (upgrades NumPy; mine was already the latest version, so this had no effect)
Solution 2:
- Uninstall NumPy and reinstall it. (After uninstalling, the reinstall reported that NumPy was already present in my environment, which means there had been two copies of NumPy installed. import tensorflow as tf then failed with No module named 'numpy.core._multiarray_umath'. After upgrading NumPy, TensorFlow imported successfully.)
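One way to spot the "two copies of NumPy" situation before uninstalling anything is to scan the interpreter's search path for package directories. This is a rough sketch, and find_package_copies is a hypothetical helper: it only detects copies that live in different directories on sys.path (for example a system-wide and a per-user site-packages), not two installers writing into the same directory.

```python
import os
import sys


def find_package_copies(package, search_paths=None):
    """Return every distinct directory on the search path that holds `package`."""
    paths = sys.path if search_paths is None else search_paths
    found = []
    for entry in paths:
        # An empty sys.path entry means the current directory.
        candidate = os.path.join(entry or ".", package, "__init__.py")
        if os.path.isfile(candidate):
            location = os.path.realpath(os.path.dirname(candidate))
            if location not in found:
                found.append(location)
    return found


if __name__ == "__main__":
    copies = find_package_copies("numpy")
    for location in copies:
        print(location)
    if len(copies) > 1:
        print("More than one NumPy found; the first one listed is the one imported.")
```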
4. Install Horovod
HOROVOD_GPU_OPERATIONS=NCCL pip install --no-cache-dir horovod
5. Test
import tensorflow as tf
import horovod.tensorflow as hvd
(If no error is reported, the installation succeeded.)
6. Try some of the Horovod examples
The linked page also explains how to install Horovod.
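A clean import shows that Horovod is installed, but not that it was built with NCCL. A slightly fuller check, sketched here, initializes Horovod and asks which backends were compiled in. The wrapper horovod_report is a hypothetical helper; hvd.init, hvd.nccl_built, hvd.mpi_built, hvd.size and hvd.rank are functions exported by horovod.tensorflow. Running horovodrun --check-build on the command line gives a similar summary.

```python
def horovod_report():
    """Collect basic facts about the Horovod build, or the reason it failed."""
    try:
        import horovod.tensorflow as hvd

        hvd.init()  # run as a plain script this is a single-process job
        return {
            "available": True,
            "nccl_built": bool(hvd.nccl_built()),
            "mpi_built": bool(hvd.mpi_built()),
            "size": hvd.size(),
            "rank": hvd.rank(),
        }
    except Exception as exc:  # broad on purpose: import and init can fail in many ways
        return {"available": False, "error": f"{type(exc).__name__}: {exc}"}


if __name__ == "__main__":
    for key, value in horovod_report().items():
        print(f"{key}: {value}")
```

If nccl_built comes back False after installing with HOROVOD_GPU_OPERATIONS=NCCL, the build most likely did not find the NCCL headers or library.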