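After the NVIDIA driver is updated on a server, nvidia-smi often fails with the following error:
Failed to initialize NVML: Driver/library version mismatch
The cause is that the nvidia kernel module still loaded in the running kernel is the old version. It was not replaced when the driver was updated, so it no longer matches the new user-space libraries.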
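To confirm the diagnosis, one option is to compare the version of the loaded module with the version of the module file installed on disk. The sketch below assumes that the loaded version is exposed at /sys/module/nvidia/version and that the installed module file carries the same version as the new user-space libraries. The function names are made up for illustration.

```shell
# versions_match LOADED INSTALLED: succeeds only if both are non-empty and equal.
versions_match() {
    [ -n "$1" ] && [ -n "$2" ] && [ "$1" = "$2" ]
}

# Version of the nvidia module currently loaded in the kernel
# (assumption: exposed through sysfs; empty if the module is not loaded).
loaded_version() {
    cat /sys/module/nvidia/version 2>/dev/null
}

# Version of the nvidia module file that would be loaded next
# (empty if no such module is installed for the running kernel).
installed_version() {
    modinfo -F version nvidia 2>/dev/null
}

# Print whether the loaded module and the installed module agree.
report_mismatch() {
    loaded=$(loaded_version)
    installed=$(installed_version)
    if versions_match "$loaded" "$installed"; then
        echo "match: $loaded"
    else
        echo "mismatch: loaded='$loaded' installed='$installed'"
    fi
}
```

After sourcing these functions, calling report_mismatch prints either a "match" or a "mismatch" line. A mismatch is consistent with the error above, but it is only a hint, not proof.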
1. Generally, rebooting the machine solves the problem.
2. If the machine cannot be rebooted for some reason, you can reload the kernel module instead.
In short, there are only two steps:
unload the nvidia kernel module
reload the nvidia kernel module
In terms of commands, that is:
sudo rmmod nvidia
sudo nvidia-smi
When nvidia-smi finds that the kernel module is not loaded, it loads it automatically.
But things are rarely that simple. In general, unloading the module fails:
$ sudo rmmod nvidia
rmmod: ERROR: Module nvidia is in use by: nvidia_modeset nvidia_uvm
In that case, the driver has to be unloaded piece by piece. First, we need to know the dependencies between the kernel modules. The error message already tells us that the two modules nvidia_modeset and nvidia_uvm depend on nvidia, so they have to be unloaded first.
$ lsmod | grep nvidia
nvidia_uvm 647168 0
nvidia_drm 53248 0
nvidia_modeset 790528 1 nvidia_drm
nvidia 12144640 152 nvidia_modeset,nvidia_uvm
As you can see, nvidia has a use count of 152 and is used by nvidia_modeset and nvidia_uvm, so those two modules have to be unloaded before nvidia itself.
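Reading the use count and the "used by" column by eye works, but it can also be scripted. A minimal sketch follows. The function name used_by is hypothetical, and it assumes the usual lsmod column layout (module, size, use count, holders).

```shell
# used_by MODULE: read lsmod-style text on stdin and print the use count
# of MODULE followed by the modules holding it ("-" if there are none).
used_by() {
    awk -v m="$1" '$1 == m { print $3, ($4 == "" ? "-" : $4) }'
}
```

It is meant to be fed from lsmod, for example: lsmod | used_by nvidia. With the listing above as input, it prints 152 nvidia_modeset,nvidia_uvm.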
First, let's see which processes are using the /dev/nvidia* devices:
sudo lsof -n -w /dev/nvidia*
Take note of these processes. If unloading fails later, remember to stop the ones involved.
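The lsof listing usually contains several lines per process, one for each open device file. To reduce it to one line per process, something like the following sketch can be used. The function name nvidia_pids is made up, and it assumes the default lsof column layout (COMMAND first, PID second) and command names without spaces.

```shell
# nvidia_pids: read default lsof output on stdin and print one
# "PID COMMAND" line per process, without duplicates.
nvidia_pids() {
    # Skip the header line; column 1 is COMMAND, column 2 is PID.
    awk 'NR > 1 { print $2, $1 }' | sort -u
}
```

For example: sudo lsof -n -w /dev/nvidia* | nvidia_pids. If only the PIDs are needed, lsof's own -t option prints them directly.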
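Unload nvidia_uvm, nvidia_drm and nvidia_modeset. nvidia_drm has to go before nvidia_modeset, because the lsmod output shows that nvidia_modeset is used by it:
sudo rmmod nvidia_uvm
sudo rmmod nvidia_drm
sudo rmmod nvidia_modeset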
Then run lsmod again. If the use count of nvidia has not dropped to 0, kill the remaining processes reported by lsof, and then retry the unload.
Finally:
sudo rmmod nvidia
nvidia-smi
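The manual procedure amounts to unloading every module that holds nvidia before nvidia itself. As a rough sketch, the order can be derived from the lsmod output. The function names below are hypothetical, the usual lsmod column layout is assumed, and so is a shell that supports local (such as bash). The code only prints a plan and does not unload anything.

```shell
# holders_of MODULE TEXT: print the modules listed in the "used by"
# column of MODULE, one per line, given lsmod-style TEXT.
holders_of() {
    printf '%s\n' "$2" | awk -v m="$1" '$1 == m { print $4 }' | tr ',' '\n'
}

# unload_order MODULE TEXT: print the holders of MODULE (recursively)
# before MODULE itself, so every module comes after the ones that use it.
unload_order() {
    local mod="$1" text="$2" h
    for h in $(holders_of "$mod" "$text"); do
        unload_order "$h" "$text"
    done
    echo "$mod"
}

# unload_plan MODULE TEXT: same order, with repeated names removed
# (the first occurrence is kept, which preserves a valid order).
unload_plan() {
    unload_order "$1" "$2" | awk '!seen[$0]++'
}
```

Called as unload_plan nvidia "$(lsmod)", it prints one module name per line. For the lsmod output shown earlier the result is nvidia_drm, nvidia_modeset, nvidia_uvm, nvidia. Each name can then be passed to sudo rmmod in that order, once the processes found with lsof have been stopped.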