Tag Archives: Failed to initialize NVML: driver/library version mismatch

[Solved] nvidia-smi Error: Failed to initialize NVML: Driver/library version mismatch

This post records the problems and solutions I ran into while configuring a CUDA environment, so that I can check them again later:

Problem

Running nvidia-smi in a terminal reports the following error:

Failed to initialize NVML: Driver/library version mismatch

This error means that the version of the NVIDIA kernel module currently loaded does not match the version of the NVIDIA user-space driver libraries installed on the system.

In most cases it appears after a system update: the kernel or the driver packages are upgraded, and the loaded NVIDIA kernel module no longer matches the rest of the driver. A typical example is an old driver that cannot be used with the newly installed kernel.

In my case, while setting up an older CUDA version I accidentally downgraded the graphics driver, which left it inconsistent with the rest of the system. To get back to the previous version, I solved the problem by reinstalling the matching version of the graphics driver.

Installing the graphics card driver

This step follows method 2 in this blog. Many thanks to its author!

Step 1: add the graphics card driver PPA

The graphics-drivers PPA page is https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa, where you can view all of the available graphics card drivers.

lspci | grep -i nvidia  # List the NVIDIA cards on the PCIe bus (optional if you already know your card)
sudo add-apt-repository ppa:graphics-drivers/ppa  # Add the graphics driver package source

Adding the PPA asks for confirmation partway through; press Enter to continue. (The problem was already solved by the time I wrote this post, so I have no screenshots of the process. For details, please refer to the blog in the link.)

Once the PPA has been added, update the apt package cache:

sudo apt update
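To confirm that Step 1 took effect, it can help to check that an apt source file actually mentions the PPA. The following is a minimal sketch, not part of the original procedure: the function name ppa_configured is made up for illustration, and it assumes the usual Ubuntu location /etc/apt/sources.list.d for PPA entries.

```shell
#!/bin/sh
# Hypothetical helper: succeed if any apt source file under the given
# directory mentions the graphics-drivers PPA.
# Assumption: PPA entries live in /etc/apt/sources.list.d (the Ubuntu
# default); pass a different directory as $1 to check elsewhere.
ppa_configured() {
    grep -rqs 'graphics-drivers' "${1:-/etc/apt/sources.list.d}"
}

# Example:
#   ppa_configured && echo "PPA present" || echo "PPA missing"
```

If the function reports that the PPA is missing, re-running the add-apt-repository command above is the first thing to try.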

Step 2: find the appropriate graphics card driver

Open the graphics card driver query website:

My graphics card is a Titan Xp, so I select the corresponding options and click Search.

As of 2022-01-14, the driver version listed for my graphics card is 470.94. First check whether this driver version can be installed on your system; you can visit the link and look it up by system type.

Use the following command to search the PPA for version 470 of the graphics card driver (the digits after the decimal point can be ignored):

apt search nvidia-driver-470

With luck, it is there. If it is not found, work through the other driver versions listed on the web page above, from newest to oldest, and use the first one that is available.
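The "newest to oldest" search can be sketched as a small loop. This is only an illustration under stated assumptions: first_available and driver_package_exists are hypothetical names, the package naming nvidia-driver-<version> is taken from Step 4 of this post, and the version numbers in the example are placeholders rather than a list read from the driver page.

```shell
#!/bin/sh
# Print the first version for which the check command succeeds.
# $1 is the name of a check command; the remaining arguments are
# candidate versions, given newest first.
first_available() {
    check_cmd=$1
    shift
    for version in "$@"; do
        if "$check_cmd" "$version" >/dev/null 2>&1; then
            printf '%s\n' "$version"
            return 0
        fi
    done
    return 1
}

# Hypothetical check: does apt know a package called nvidia-driver-<version>?
driver_package_exists() {
    apt-cache show "nvidia-driver-$1"
}

# Example (placeholder versions, newest first):
#   first_available driver_package_exists 495 470 460
```

Passing the check as a separate command keeps the loop independent of apt, so the same function works with any other way of testing availability.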

Step 3: remove the previously installed graphics card driver on the system

sudo apt purge 'nvidia*'  # Quoted so that the shell does not expand the pattern against local file names

This command removes all installed NVIDIA graphics card driver packages, together with the CUDA packages that depend on them. If the graphics card driver is to stay unchanged, it is enough to upgrade CUDA and delete the previous CUDA version instead.
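Because the purge is broad, it may be worth listing what is currently installed before running it. The sketch below filters the output of dpkg -l; the function name installed_gpu_packages is made up for illustration, and matching on "nvidia" or "cuda" in the package name is an assumption about how the packages are named on your system.

```shell
#!/bin/sh
# Read `dpkg -l` output on stdin and print the name and version of every
# fully installed package (status "ii") whose name contains "nvidia" or
# "cuda". Packages in other states (for example "rc") are skipped.
installed_gpu_packages() {
    awk '$1 == "ii" && $2 ~ /nvidia|cuda/ { print $2, $3 }'
}

# Example:
#   dpkg -l | installed_gpu_packages
```

Saving this list somewhere also makes it easier to reinstall the same versions later if the new driver does not work out.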

Step 4: install the graphics card driver

In this step, use the following commands to install:

sudo apt install nvidia-driver-470

Step 5: restart the computer and verify

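sudo reboot
# After the reboot:
nvidia-smi  # Should now print the driver version and the GPU table instead of the error
lsmod | grep nvidia  # Confirms that the NVIDIA kernel modules are loaded
nvcc --version  # Shows the CUDA compiler version (only if the CUDA toolkit is installed)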

 

Solution to NVIDIA driver problem: Failed to initialize NVML: Driver/library version mismatch


Enter the command on the Rongtian server scs4850:

dpkg --list | grep nvidia

The display is as follows:

As you can see, the installed NVIDIA driver version is 384.111.

Enter the command again:

cat /proc/driver/nvidia/version

The NVRM version line shows that the NVIDIA UNIX x86_64 Kernel Module is still the old version, 384.98.

The situation on scs4450 is the same as on scs4850.
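When checking several servers, it can be handy to pull just the version number out of /proc/driver/nvidia/version. The following is a rough sketch: loaded_module_version is a hypothetical name, and it assumes the version is the first dotted number on the line that starts with "NVRM version:", which matches the output described above but may not hold for every driver release.

```shell
#!/bin/sh
# Print the driver version found on the "NVRM version:" line of stdin.
# Assumption: the version is the first token of the form N.N or N.N.N
# on that line.
loaded_module_version() {
    grep '^NVRM version:' | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Example:
#   loaded_module_version < /proc/driver/nvidia/version
```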

Then look at the log of Rongtian server scs4850 and enter the command:

cat /proc/driver/nvidia/version

The display is as follows:

Then look at the log of Rongtian server scs4450 and enter the command:

cat /proc/driver/nvidia/version

The display is as follows:

The logs show that the driver packages on both servers were automatically updated to 384.111, while the loaded NVIDIA kernel module is still the old version, 384.98. That is why entering the following command:

nvidia-smi

reports that the versions do not match.
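The comparison between the installed package version and the loaded module version can be written down as a small check. This is a sketch under assumptions: check_driver_match is a hypothetical name, and the "-0ubuntu1" suffix in the example is an illustrative Debian-style revision, not a value taken from the servers in this post.

```shell
#!/bin/sh
# Compare the installed driver package version ($1) with the loaded
# kernel module version ($2). Anything from the first "-" onward in the
# package version (a Debian-style revision suffix) is stripped first.
check_driver_match() {
    pkg_version=${1%%-*}
    module_version=$2
    if [ "$pkg_version" = "$module_version" ]; then
        echo "match: $module_version"
    else
        echo "mismatch: package $pkg_version, loaded module $module_version"
        return 1
    fi
}

# Example with the two versions seen on the servers (suffix is illustrative):
#   check_driver_match 384.111-0ubuntu1 384.98
```

A non-zero return status makes the function easy to use in a monitoring script that should alert when the two versions drift apart.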

Restart the computer so that the kernel module matching the updated driver is loaded.

Go to /etc/apt/apt.conf.d/ and open the file 50unattended-upgrades

Disable automatic updates on Rongtian server scs4850:

Disable automatic updates on Rongtian server scs4450:
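One way to double-check the result is to ask apt whether periodic unattended upgrades are still switched on. The sketch below is an assumption-laden illustration and not necessarily how the servers above were configured: it inspects the APT::Periodic::Unattended-Upgrade setting as printed by apt-config dump, and unattended_upgrades_enabled is a hypothetical function name.

```shell
#!/bin/sh
# Read `apt-config dump` output on stdin and succeed if the
# APT::Periodic::Unattended-Upgrade setting has a non-zero numeric value.
# Assumption: a value of "0", or no such line at all, means that periodic
# unattended upgrades are off.
unattended_upgrades_enabled() {
    grep -Eq '^APT::Periodic::Unattended-Upgrade "[1-9][0-9]*";'
}

# Example:
#   apt-config dump | unattended_upgrades_enabled && echo "still enabled"
```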

Then, enter the command again, and the display result of server Rongtian scs4850 is as follows:

The display results of Rongtian server scs4450 are as follows:

Then use this command:

ps -eo pid,user,group,euser,egroup,cmd

The output lists each process with its PID, user, and group, so you can look up by PID which user is running which process.