Tags: driver, nvidia, singularity-container

nvidia-smi not working after installing the driver in a Singularity container

Published on 2020-03-27 10:18:09

I use Singularity, and I need to install an NVIDIA driver in my Singularity container to do some deep learning with a GTX 1080. The image was created from the NVIDIA Docker image at https://ngc.nvidia.com/catalog/containers/nvidia:kaldi and converted to a Singularity container. I believe no NVIDIA driver was present, because nvidia-smi was not found before I installed the driver.

I ran the following commands:

    add-apt-repository ppa:graphics-drivers/ppa
    sudo apt update
    apt install nvidia-418
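
As an aside, you can list which driver packages actually ended up installed (a quick check, assuming a Debian/Ubuntu-style system):

    # List installed NVIDIA-related packages
    dpkg -l | grep -i nvidia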

After that, I wanted to check whether the driver was installed correctly, so I ran:

nvidia-smi

which returned: Failed to initialize NVML: Driver/library version mismatch
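
This error usually means that the NVIDIA kernel module loaded on the host and the userspace NVML library come from different driver versions. A minimal way to compare the two, run on the host rather than in the container (a sketch, assuming a standard Linux setup with the module loaded):

    # Version of the NVIDIA kernel module currently loaded
    cat /proc/driver/nvidia/version

    # Version of the nvidia module as installed on disk
    modinfo nvidia | grep '^version'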

I searched for how to solve this error and found this topic: NVIDIA NVML Driver/library version mismatch

One answer says to run:

lsmod | grep nvidia

and then to rmmod each of the listed modules except nvidia, finishing with rmmod nvidia itself. I started with:

rmmod drm
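
For reference, the modules usually have to be removed in dependency order. A minimal sketch, assuming the standard NVIDIA module set (your lsmod output may list a different set):

    # Remove dependents first, then the nvidia module itself
    sudo rmmod nvidia_drm
    sudo rmmod nvidia_modeset
    sudo rmmod nvidia_uvm
    sudo rmmod nvidia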

But when I try to remove the modules, as the topic anticipated, I get the error: rmmod: ERROR: Module nvidia is in use.

The topic says to run lsof /dev/nvidia* and kill the processes that are using the module, but I see nothing with drm in the output, and killing those processes (Xorg, gnome-shell) seems like a very bad idea.

Here is the output of lsof /dev/nvidia*, followed by lsmod | grep nvidia, and then rmmod drm (screenshot in the original post). Rebooting the computer also didn't work.
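
A less destructive alternative to killing Xorg and gnome-shell by hand (a sketch, assuming a systemd-based host; the display manager may be gdm, lightdm, or sddm depending on the distribution) is to stop the display manager service from a text console, which releases /dev/nvidia* cleanly:

    # Switch to a text console first (e.g. Ctrl+Alt+F3), then:
    sudo systemctl stop gdm    # or lightdm / sddm
    sudo rmmod nvidia_drm nvidia_modeset nvidia_uvm nvidia
    sudo systemctl start gdm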

What should I do to get nvidia-smi working and to be able to use my GPU from inside the Singularity container?

Thank you

Questioner: Antoine V
Viewed: 99

Antoine V · 2019-09-13 20:18

Thank you for your answer. I wanted to install the GPU driver in the Singularity container because, from inside the container, I was not able to use the GPU (nvidia-smi: command not found), while outside the container nvidia-smi worked.

You are right, the driver should be installed outside the container; I only tried installing it inside to work around not having access to the driver from within the container.

Now I have found the solution: to use the GPU from inside the Singularity container, you must add --nv when invoking the container. For example:

singularity exec --nv singularity_container.simg ~/test_gpu.sh 

or

singularity shell --nv singularity_container.simg

When you add --nv, the container gets access to the NVIDIA driver on the host and nvidia-smi works. Without it, you cannot use the GPU and nvidia-smi will not work.
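
A quick way to verify, reusing the container name from the examples above, is to run nvidia-smi directly through the container:

    singularity exec --nv singularity_container.simg nvidia-smi

If this prints the usual GPU table, the host driver libraries are correctly bound into the container.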