
[tke] GPU node NVIDIA Tesla driver reinstallation

2022-06-24 12:26:00 jokey

Use case

By default, when you add a GPU node in TKE, a specific GPU driver version is preinstalled automatically. This default version is currently fixed, and you cannot yet choose which driver version to install. If you need a different driver version, you have to reinstall the driver on the node yourself. This article describes how to reinstall the GPU driver on a TKE node.

Steps

1. Uninstall the original driver

First, uninstall the original driver by running the following command on the node:

nvidia-uninstall

The uninstallation process is shown in the figures below:

No related configuration is in use, so choose not to back it up

A prompt reporting that the original driver has been uninstalled indicates that the uninstallation succeeded:

Uninstallation complete
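If you want to keep a record of the driver version being replaced (for example, to roll back later), you can read it from /proc/driver/nvidia/version while the old module is still loaded, ideally before running nvidia-uninstall. The sketch below is a hypothetical helper, driver_version_from_proc, that extracts the version number from the file's first line. It assumes the usual "NVRM version: ... Kernel Module <version> <build date>" layout and reads the text on stdin so it can be tried without a GPU.

```shell
#!/bin/sh
# Hypothetical helper: print the NVIDIA kernel module version found in the
# text of /proc/driver/nvidia/version (passed on stdin).
driver_version_from_proc() {
  # Assumed first-line layout:
  # "NVRM version: NVIDIA UNIX x86_64 Kernel Module  440.95.01  <build date>"
  # Print the first field on that line that looks like a dotted version number.
  awk '/^NVRM version:/ {
    for (i = 1; i <= NF; i++)
      if ($i ~ /^[0-9]+\.[0-9]+/) { print $i; exit }
  }'
}

# Usage on the node, while the old driver is still loaded:
#   driver_version_from_proc < /proc/driver/nvidia/version > old-driver-version.txt
```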

2. Restart the node

Because the driver is built as a kernel module and loaded into the running kernel, you need to restart the node after uninstalling the original driver. If you do not restart, the installation of the new driver fails because the original driver module is still loaded.
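After the node comes back up, you can confirm that no NVIDIA kernel module is still loaded before starting the new installation. This is a minimal sketch assuming the driver's modules are named nvidia or nvidia_<suffix> (for example nvidia_uvm, nvidia_drm, nvidia_modeset). The helper name nvidia_module_loaded is hypothetical, and it reads lsmod output on stdin so it can be tested without a GPU.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the lsmod output on stdin lists an NVIDIA
# driver module (nvidia, nvidia_uvm, nvidia_drm, ...). The pattern requires
# whitespace after the name, so unrelated modules such as nvidiafb do not match.
nvidia_module_loaded() {
  # lsmod prints one module per line with the module name in the first column
  grep -Eq '^nvidia(_[a-z]+)?[[:space:]]'
}

# Usage on the node after the restart, before installing the new driver:
#   if lsmod | nvidia_module_loaded; then
#     echo "An NVIDIA module is still loaded; restart the node first" >&2
#   fi
```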

3. Download and install the new driver

Go to the NVIDIA driver download page and select the Linux 64-bit shell installer (.run file), as shown in the figure below:

Download the new driver installation file

Here we install the NVIDIA Tesla driver for CUDA 10.2 (version 440.95.01). Download the shell installer to the node with a command like the following, then run it:

# Download the driver installer to the node
wget https://us.download.nvidia.com/tesla/440.95.01/NVIDIA-Linux-x86_64-440.95.01.run
# Make the installer executable and run it (as root)
chmod +x NVIDIA-Linux-x86_64-440.95.01.run
./NVIDIA-Linux-x86_64-440.95.01.run
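The download and install commands above can be wrapped in a small function for reuse with other driver versions. This is a sketch under the assumption that other Tesla driver versions are published under the same URL pattern as the 440.95.01 link; check the actual link on the NVIDIA download page before relying on it. The function names driver_url and install_driver are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the installer URL for a Tesla driver version,
# ASSUMING the same URL pattern as the 440.95.01 download link.
driver_url() {
  version="$1"
  echo "https://us.download.nvidia.com/tesla/${version}/NVIDIA-Linux-x86_64-${version}.run"
}

# Hypothetical helper: download the installer for a version and run it
# interactively (answer the prompts as in the figures). Run as root.
install_driver() {
  version="$1"
  file="NVIDIA-Linux-x86_64-${version}.run"
  wget -O "$file" "$(driver_url "$version")" || return 1
  chmod +x "$file"
  ./"$file"
}

# Usage on the node, as root:
#   install_driver 440.95.01
```

Keeping the URL construction in its own function makes it easy to print and check the link before anything is downloaded.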

The installation process of the new driver is shown in the figures below:

Choose YES

Wait for the installation of the new driver to complete:

Installation complete

4. Verify the new driver

  • Run nvidia-smi on the node to check the GPUs. The output should list the GPU information and show the new driver version:
View GPU information
  • Check whether Kubernetes recognizes the node's GPU capacity by running:
kubectl describe node <NodeName>

Check that the GPU resources Kubernetes reports for the node match the GPUs actually installed, as shown in the figure below:

k8s node resources
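Both checks can also be scripted. The sketch below assumes the node's GPUs are advertised under the standard nvidia.com/gpu resource name used by the NVIDIA device plugin; the helper names all_gpus_on_version and node_gpu_capacity are hypothetical. all_gpus_on_version reads the output of nvidia-smi --query-gpu=driver_version --format=csv,noheader (one line per GPU) on stdin and succeeds only if there is at least one GPU and every line matches the expected version.

```shell
#!/bin/sh
# Hypothetical helper: $1 is the expected driver version; stdin is one
# driver version per GPU. Fails on any mismatch or if no GPU is listed.
all_gpus_on_version() {
  expected="$1"
  count=0
  # "|| [ -n "$line" ]" also handles a last line without a trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    # Drop stray spaces or carriage returns before comparing
    line=$(printf '%s' "$line" | tr -d ' \r')
    count=$((count + 1))
    [ "$line" = "$expected" ] || return 1
  done
  [ "$count" -gt 0 ]
}

# Hypothetical helper: print the GPU capacity Kubernetes records for node $1,
# ASSUMING the resource is named nvidia.com/gpu (dot escaped for JSONPath).
node_gpu_capacity() {
  kubectl get node "$1" -o jsonpath='{.status.capacity.nvidia\.com/gpu}'
}

# Usage:
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader \
#     | all_gpus_on_version 440.95.01 && echo "driver OK"
#   node_gpu_capacity <NodeName>
```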

Summary

This article briefly described how to reinstall the GPU driver on a TKE node. If you have a similar requirement, you can follow the steps above.

Reference: https://cloud.tencent.com/document/product/560/8048


Copyright notice
This article was written by [jokey]. Please include a link to the original article when reposting. Thanks.
https://yzsam.com/2021/06/20210601234420899D.html