This post records a problem I ran into while configuring a CUDA environment, together with its solution, for future reference.
Problem
Running nvidia-smi in the terminal reports the following error:
Failed to initialize NVML: Driver/library version mismatch
This error means that the NVIDIA kernel module currently loaded and the NVIDIA driver libraries installed on the system are different versions.
Most often this follows a system update: the NVIDIA driver was installed at an older version, the update upgrades the kernel or the driver packages, and the driver module that is actually loaded no longer matches what is installed on the system.
In my case, I accidentally downgraded the graphics driver while setting up an older version of CUDA, which left it inconsistent with the rest of the system. To get back to the previous version, I solved the problem by reinstalling the matching graphics driver.
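One way to confirm the mismatch before reinstalling anything is to compare the version of the kernel module that is loaded with the version installed on disk. The following is only a minimal sketch, not the method used in this post: the helper names kernel_module_version and report_mismatch are hypothetical, and the sample line is made up to imitate the usual layout of /proc/driver/nvidia/version.

```shell
# Extract the driver version from text laid out like /proc/driver/nvidia/version (read from stdin).
kernel_module_version() {
    grep '^NVRM version' | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Hypothetical helper: print a verdict for a loaded/installed version pair.
report_mismatch() {
    if [ "$1" = "$2" ]; then
        echo "match: $1"
    else
        echo "mismatch: loaded=$1 installed=$2"
    fi
}

# Illustrative sample only (assumed format, not captured from a real machine).
sample='NVRM version: NVIDIA UNIX x86_64 Kernel Module  470.94  Mon Dec  6 22:42:02 UTC 2021'
loaded="$(printf '%s\n' "$sample" | kernel_module_version)"
report_mismatch "$loaded" "470.86"
```

On a real machine, the two values could come from `kernel_module_version < /proc/driver/nvidia/version` and `modinfo -F version nvidia`, assuming the module is loaded and modinfo is available.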
Installing the graphics card driver
This step follows method 2 in this blog. Many thanks to its author!
Step 1: add the graphics driver PPA
The graphics-drivers PPA is hosted at https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa, where you can browse all available graphics drivers.
lspci | grep -i nvidia # List the NVIDIA cards on the PCIe bus (optional; you probably already know what is in your own machine)
sudo add-apt-repository ppa:graphics-drivers/ppa # Add the graphics card installation source
Adding the PPA pauses partway through to ask for confirmation; press Enter to continue. (I had already solved the problem by the time I wrote this post, so I have no screenshots of the process. For details, please refer to the linked blog.)
Once the PPA has been added, update the apt package cache:
sudo apt update
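To double-check that the PPA was really added before searching it, one option is to look for it in the apt source files. This is a rough sketch under two assumptions: that the sources live under /etc/apt/sources.list.d, and that the PPA entry contains the string graphics-drivers/ppa. The function name ppa_configured is hypothetical.

```shell
# Succeed if any apt source file under the given directory mentions the PPA.
# The directory defaults to the usual location on Ubuntu (an assumption).
ppa_configured() {
    grep -rqs 'graphics-drivers/ppa' "${1:-/etc/apt/sources.list.d}"
}

if ppa_configured; then
    echo "graphics-drivers PPA is configured"
else
    echo "graphics-drivers PPA not found; add it with add-apt-repository first"
fi
```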
Step 2: find the appropriate graphics card driver
Open the graphics card driver query website:
My graphics card is a Titan Xp, so I selected the matching options and clicked Search.
As of 2022.1.14, the driver version listed for my graphics card is 470.94. To check up front whether this driver version can be installed, visit the link and look it up for your system type.
Use the following command to search the PPA for version 470 of the graphics driver (the digits after the decimal point can be ignored):
apt search nvidia-470
Luckily, it was there. If it is not found, work through the driver versions listed on the web page above, from highest to lowest, until you find one that is available.
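Since only the major version matters for the package name, the mapping from a full version such as 470.94 to a package can be written down explicitly. A small sketch, assuming the nvidia-driver-NNN naming used in the install step; driver_package_for is a hypothetical helper name.

```shell
# Hypothetical helper: drop everything after the first dot and build the package name.
driver_package_for() {
    echo "nvidia-driver-${1%%.*}"
}

driver_package_for "470.94"   # prints nvidia-driver-470
```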
Step 3: remove the previously installed graphics card driver on the system
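sudo apt purge 'nvidia*'
This command removes all installed graphics drivers together with the CUDA packages in use. (The quotes keep the shell from expanding the * against file names in the current directory.) If the graphics driver is staying the same and you only want to upgrade CUDA, it is enough to delete the previous CUDA version instead.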
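Because the purge is broad, it can help to look at which NVIDIA packages are currently installed before running it. The sketch below filters `dpkg -l` style output for fully installed packages (status "ii") whose names start with nvidia; the function name installed_nvidia_packages is hypothetical, and the list is only an approximation of what apt will actually select.

```shell
# Print names of installed packages (status "ii") starting with "nvidia",
# given `dpkg -l` style lines on stdin.
installed_nvidia_packages() {
    awk '$1 == "ii" && $2 ~ /^nvidia/ { print $2 }'
}

# Only query dpkg where it exists (Debian/Ubuntu systems).
if command -v dpkg >/dev/null 2>&1; then
    dpkg -l | installed_nvidia_packages
fi
```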
Step 4: install the graphics card driver
Install the driver with the following command:
sudo apt install nvidia-driver-470
Step 5: restart the computer and verify
sudo reboot
nvidia-smi
lsmod | grep nvidia
nvcc --version
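After the reboot, the three checks above can also be run together so that one failure does not hide the others. This is just a convenience sketch; run_check is a hypothetical helper, and nvcc is expected to fail if CUDA was removed by the purge and has not been reinstalled yet.

```shell
# Hypothetical helper: run a command quietly and report whether it succeeded.
run_check() {
    name="$1"
    shift
    if "$@" >/dev/null 2>&1; then
        echo "ok: $name"
    else
        echo "FAILED: $name"
    fi
}

run_check "nvidia-smi runs" nvidia-smi
run_check "nvidia kernel module loaded" sh -c 'lsmod | grep -q nvidia'
run_check "nvcc available" nvcc --version
```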