Notes on configuring NCCL and Horovod over the past two days (original)
2022-07-05 06:16:00 【JNash】
Installation Guide :: NVIDIA Deep Learning NCCL Documentation
NVIDIA Collective Communications Library (NCCL) Download Page | NVIDIA Developer
The two links above are the official NCCL installation guides.
One. Install following the official guide (this did not work for me)
1.
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
2.
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
(Adjust the URL to match your operating system version.)
3.
sudo apt update
4.
(1) Install the latest version
sudo apt install libnccl2 libnccl-dev
(2) Install a specific version that matches your CUDA version
sudo apt install libnccl2=2.4.8-1+cuda10.0 libnccl-dev=2.4.8-1+cuda10.0
However, this only installs the prebuilt packages; nothing is compiled. The official guide is not detailed on this point, and I could not find nccl under /usr/local/.
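One way to check whether the apt packages actually put a usable library on the system is to ask the dynamic loader for it. The following is a minimal sketch, assuming libnccl is on the loader's search path; the helper names decode_nccl_version and installed_nccl_version are made up for illustration, while ncclGetVersion is the NCCL C function that reports the version as a single integer.

```python
import ctypes
import ctypes.util


def decode_nccl_version(code):
    """Turn the integer from ncclGetVersion into (major, minor, patch)."""
    if code < 10000:
        # Releases up to 2.8 encode the version as major*1000 + minor*100 + patch.
        return code // 1000, (code % 1000) // 100, code % 100
    # Later releases encode it as major*10000 + minor*100 + patch.
    return code // 10000, (code % 10000) // 100, code % 100


def installed_nccl_version():
    """Return (library name, version tuple), or None if libnccl cannot be used."""
    name = ctypes.util.find_library("nccl")  # hypothetical helper around the loader lookup
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
        version = ctypes.c_int(0)
        status = lib.ncclGetVersion(ctypes.byref(version))
    except (OSError, AttributeError):
        return None
    if status != 0:  # 0 is ncclSuccess
        return None
    return name, decode_nccl_version(version.value)


print(installed_nccl_version())
```

If this prints None, the library is not visible to the loader, which is a different problem from the directory simply not being /usr/local/.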
Two. Clone from GitHub and build from source
1.
git clone https://github.com/NVIDIA/nccl.git
cd nccl
2.
make -j12 src.build BUILDDIR=/home/yourname/nccl CUDA_HOME=/usr/local/cuda NVCC_GENCODE="-gencode=arch=compute_86,code=sm_86"
(NVCC_GENCODE is optional. Without it, NCCL is compiled for all architectures by default. Setting it speeds up compilation and reduces the binary size. The values compute_86 and sm_86 must match the compute capability of your GPU; look it up at https://developer.nvidia.com/cuda-gpus)
- -j12: run 12 parallel jobs. Use nproc to check the total number of cores and adjust accordingly;
- BUILDDIR: where the build output is stored. The default is nccl/build; the root user can of course point it to /usr/local/nccl/;
- CUDA_HOME: the CUDA installation directory. The default is /usr/local/cuda, so it can be omitted; add it if the build reports an error.
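The parameters above can be assembled programmatically, which avoids typos in the NVCC_GENCODE string. This is only a sketch: gencode_flag and nccl_make_command are hypothetical helper names, and the compute capability ("8.6" here) still has to be looked up for your own card.

```python
import os
import shlex


def gencode_flag(compute_capability):
    """Map a compute capability such as "8.6" to an NVCC_GENCODE value."""
    major, minor = compute_capability.split(".")
    arch = f"{int(major)}{int(minor)}"
    return f"-gencode=arch=compute_{arch},code=sm_{arch}"


def nccl_make_command(build_dir, compute_capability,
                      cuda_home="/usr/local/cuda", jobs=None):
    """Build the argument list for the make invocation described above."""
    jobs = jobs or os.cpu_count() or 1  # same idea as checking nproc
    return [
        "make", f"-j{jobs}", "src.build",
        f"BUILDDIR={build_dir}",
        f"CUDA_HOME={cuda_home}",
        f"NVCC_GENCODE={gencode_flag(compute_capability)}",
    ]


# Print the command for review; run it from inside the nccl checkout.
print(shlex.join(nccl_make_command("/home/yourname/nccl", "8.6", jobs=12)))
```

Passing the list to subprocess.run instead of printing it would sidestep shell quoting entirely.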
3.
Finally, the compiled files are in the path given by BUILDDIR. Add it to your environment variables:
vim ~/.bashrc
This file is located in /home/yourname.
Add the following:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/yourname/nccl/lib
export PATH=$PATH:/home/yourname/nccl/bin
Press ESC, type :wq to save and quit, then run the following command
source ~/.bashrc
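After reloading the shell configuration, it can be worth confirming that the build directory really contains the library and that its lib directory ended up on LD_LIBRARY_PATH. The check below is a rough sketch; check_nccl_env is a hypothetical helper, and /home/yourname/nccl is the placeholder build directory used in this post.

```python
import glob
import os


def check_nccl_env(build_dir, environ=None):
    """Report whether build_dir/lib holds libnccl and is on LD_LIBRARY_PATH."""
    environ = os.environ if environ is None else environ
    lib_dir = os.path.join(build_dir, "lib")
    entries = environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    normalized = [entry.rstrip("/") for entry in entries if entry]
    return {
        # True if any libnccl.so* file exists in the lib directory.
        "library_built": bool(glob.glob(os.path.join(lib_dir, "libnccl.so*"))),
        # True if the lib directory is one of the LD_LIBRARY_PATH entries.
        "on_ld_library_path": lib_dir in normalized,
    }


print(check_nccl_env("/home/yourname/nccl"))
```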
4. Verify that NCCL was installed successfully:
git clone https://github.com/NVIDIA/nccl-tests.git
cd nccl-tests
make -j12 CUDA_HOME=/usr/local/cuda
./build/all_reduce_perf -b 8 -e 256M -f 2 -g 4
(Replace the value of -g with the number of GPUs in your machine. I have 4 graphics cards, so I specify 4.)
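The GPU count passed to -g can also be read from the machine instead of being typed by hand. The sketch below assumes nvidia-smi is installed and that nvidia-smi -L prints one line per GPU starting with "GPU "; the function names are hypothetical.

```python
import shutil
import subprocess


def count_gpus(listing):
    """Count the GPUs in the text printed by `nvidia-smi -L` (one line per GPU)."""
    return sum(1 for line in listing.splitlines() if line.startswith("GPU "))


def detect_gpu_count():
    """Return the number of visible GPUs, or 0 if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return 0
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    return count_gpus(result.stdout) if result.returncode == 0 else 0


def all_reduce_perf_command(gpus, binary="./build/all_reduce_perf"):
    """Build the nccl-tests command line used above for a given GPU count."""
    # -b/-e: smallest and largest message size, -f: size multiplier, -g: GPU count.
    return [binary, "-b", "8", "-e", "256M", "-f", "2", "-g", str(gpus)]


# Fall back to a single GPU when none can be detected.
print(all_reduce_perf_command(detect_gpu_count() or 1))
```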
Compiling and installing NCCL from source on Linux
Next, install Horovod. It also requires Open MPI, g++-5 or later (as required by the Horovod documentation), and tensorflow>=1.15.0.
The right way to install Horovod (environment setup)
Getting started with Open MPI 1: installation and testing
Installing and using Horovod
How to install Horovod?
Installing Horovod on Ubuntu 18.04
1. Install g++
- sudo apt-get install g++
- g++ --version shows the g++ version
2. Check the Open MPI version
- ompi_info (or mpiexec --version, mpirun --version, or mpicxx --showme:version)
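The two checks above can be combined into one small script that reports which tools are on PATH and whether g++ meets the version 5 minimum mentioned earlier. This is a sketch only; check_toolchain and major_version are hypothetical names, and it relies on g++ -dumpversion printing a version such as "9" or "9.4.0".

```python
import shutil
import subprocess


def major_version(version_text):
    """Extract the leading major number from text such as "9.4.0" or "9"."""
    return int(version_text.strip().split(".")[0])


def check_toolchain(min_gxx_major=5):
    """Report where g++ and the Open MPI tools live and whether g++ is new enough."""
    report = {tool: shutil.which(tool) for tool in ("g++", "mpirun", "mpicxx")}
    if report["g++"]:
        out = subprocess.run(["g++", "-dumpversion"],
                             capture_output=True, text=True).stdout
        if out.strip():
            report["g++ new enough"] = major_version(out) >= min_gxx_major
    return report


print(check_toolchain())
```

A value of None for mpirun or mpicxx means Open MPI is not on PATH and needs to be installed or added before building Horovod.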
3. Install tensorflow
pip install tensorflow-gpu -i https://pypi.tuna.tsinghua.edu.cn/simple
Test tensorflow
import tensorflow as tf
This reported an error:
RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd
Solution one:
- pip install numpy --upgrade (upgrade NumPy; mine was already the latest version, so this had no effect)
Solution two:
- Uninstall NumPy and reinstall it. After uninstalling, the reinstall reported that NumPy was already present in my environment, which shows there had been two copies of NumPy. At that point import tensorflow as tf failed with No module named 'numpy.core._multiarray_umath'. After upgrading NumPy, tensorflow imported successfully.
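Since the root cause here was two copies of NumPy in one environment, a quick scan of the import path can reveal the duplicates before uninstalling anything. The sketch below only looks for a numpy/__init__.py under each sys.path entry; find_package_copies is a hypothetical helper, not part of NumPy or pip.

```python
import os
import sys


def find_package_copies(package, search_paths=None):
    """List every directory on the search path that holds its own copy of package."""
    search_paths = sys.path if search_paths is None else search_paths
    copies = []
    for entry in search_paths:
        # An empty sys.path entry stands for the current directory.
        candidate = os.path.join(entry or ".", package, "__init__.py")
        if os.path.isfile(candidate) and entry not in copies:
            copies.append(entry)
    return copies


# More than one directory in this list means Python can see several copies.
print(find_package_copies("numpy"))
```

The first directory in the list is the copy that import numpy would actually load.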
4. Install Horovod
HOROVOD_GPU_OPERATIONS=NCCL pip install --no-cache-dir horovod
5. Test
import tensorflow as tf
import horovod.tensorflow as hvd
(No error is reported, so the installation succeeded.)
6. Try some of the Horovod examples
The links above describe how to install Horovod.
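A successful import does not prove that the processes can actually talk to each other, so a small allreduce is a more convincing test. The script below is a minimal sketch, not one of the official Horovod examples: it assumes TensorFlow 2.x with eager execution and a Horovod release recent enough to provide hvd.nccl_built(), and the file name hvd_check.py and the helper expected_rank_sum are made up for illustration.

```python
def expected_rank_sum(size):
    """Sum of the ranks 0..size-1, which a Sum allreduce of every rank should return."""
    return size * (size - 1) // 2


def main():
    try:
        import tensorflow as tf
        import horovod.tensorflow as hvd
    except ImportError as exc:
        print(f"Import failed: {exc}")
        return False

    hvd.init()
    print(f"rank {hvd.rank()} of {hvd.size()}, NCCL built: {bool(hvd.nccl_built())}")

    # Pin each process to one GPU, if any are visible (assumes TensorFlow 2.x).
    gpus = tf.config.list_physical_devices("GPU")
    if gpus:
        tf.config.set_visible_devices(gpus[hvd.local_rank() % len(gpus)], "GPU")

    # Every process contributes its own rank; the sum must match the closed form.
    total = hvd.allreduce(tf.constant(float(hvd.rank())), op=hvd.Sum)
    ok = int(total.numpy()) == expected_rank_sum(hvd.size())
    print(f"rank {hvd.rank()}: allreduce sum = {float(total.numpy())}, correct = {ok}")
    return ok


if __name__ == "__main__":
    main()
```

Saved as hvd_check.py, it can be launched on the 4 GPUs used above with horovodrun -np 4 python hvd_check.py.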