Tensorflow-GPU in multi-user environment
This post is a guide for setting up tensorflow-gpu in a multi-user setting. It is written for GPU users at the WAVES research group, Ghent University, Belgium, but the steps apply to any linux multi-user environment with GPU-based jobs.
Installing Tensorflow in conda
conda installation
Anaconda is a popular python distribution in the AI/ML community. It can be downloaded from here. Follow the instructions here to properly install it into your user account.
Once you have installed anaconda into your user account, you can create a conda environment using
$ conda create -n <name-of-your-environment>
Then you can activate that environment using:
$ conda activate <name-of-your-environment>
Once you are in the environment, you can install whatever python packages you want. Anaconda already comes with numpy, scipy, and many other useful python libraries. If you need a specific library, google for conda install <the-library-you-need>. When you are done, you can leave the environment using:
$ conda deactivate
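As a minimal sketch of such a session (the environment name tf-gpu and the pandas package are just placeholders, not part of the official setup):
$ conda activate tf-gpu      # enter the environment
$ conda install pandas       # install a library into it
$ python -c "import pandas; print(pandas.__version__)"   # quick import check
$ conda deactivate           # leave the environment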
conda tensorflow
Anaconda also offers tensorflow and keras installations, among many other libraries. In order to install them into your environment, follow the steps below:
- Activate your conda environment
- Install keras
$ conda install -c conda-forge keras
- Install the tensorflow GPU version
$ conda install tensorflow-gpu
This should install the other libraries that are required by keras and tensorflow. I found that it is better to install keras before installing tensorflow, since keras also installs a tensorflow build that may not be compatible with the GPU (I am not 100% sure about this).
Apart from conda install, we can also use pip install <the-library-you-need> in the same environment for installing libraries, but I recommend using conda. You can use conda list to see all the installed libraries in your environment. conda env list will list all the conda environments on your system.
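To quickly confirm which versions ended up in the environment and that both packages import correctly (just a sanity check, not part of the original steps), you can run:
$ python -c "import keras, tensorflow as tf; print(keras.__version__, tf.__version__)"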
Testing Tensorflow Installation
You can test whether the tensorflow installation actually uses the GPU with either of the following options.
$ python -c "import tensorflow as tf; sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))"
OR
$ python -c "import tensorflow as tf; tf.test.is_gpu_available()"
If it prints a message like Adding visible gpu devices:, then tensorflow indeed uses the GPU. If it only mentions the CPU, then you will need to correct the installation. Often it helps to install keras first and then install tensorflow, or to use conda install tensorflow-gpu instead of conda install -c conda-forge tensorflow-gpu.
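A third option (assuming the 1.x-style tensorflow used throughout this post; device_lib is an internal but widely used module) is to list every device tensorflow can see. Working GPUs show up with device_type: "GPU":
$ python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"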
Admin only
Follow the steps below if the nvidia-smi command does not list any GPUs (meaning the system does not see the GPUs anymore).
Installing the cuda compiler and nvidia drivers
These steps are adapted from here. Ignore the $ sign at the beginning of the commands.
Install the kernel headers for the current Ubuntu installation.
$ sudo apt-get install linux-headers-$(uname -r)
For other linux flavors, this step is different (refer here for other linux distributions).
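For example (my own assumption, not covered by the linked guide), on CentOS/RHEL the equivalent would be something like:
$ sudo yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)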
Download the runfile from the cuda downloads page.
Disable the nouveau driver. The instructions are given on this page. For ubuntu, create a new file /etc/modprobe.d/blacklist-nouveau.conf with the following contents:
blacklist nouveau
options nouveau modeset=0
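One way to create that file from the command line (just a sketch; you can of course also open it in any editor with sudo):
$ echo "blacklist nouveau" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
$ echo "options nouveau modeset=0" | sudo tee -a /etc/modprobe.d/blacklist-nouveau.conf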
Regenerate the kernel initramfs:
$ sudo update-initramfs -u
Disable the lightdm service to stop the X server from running:
$ sudo service lightdm stop
Also kill any vncserver sessions if they exist (e.g., vncserver -kill :1 to kill the first vncserver, and so on). Also remove the .X0-lock or other .lock files present in the /tmp folder.
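For instance, to remove a stale lock file left behind by display :0 (the display number is only an illustration):
$ sudo rm -f /tmp/.X0-lock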
Go to the Downloads folder where the downloaded runfile is stored. Make the file executable:
$ chmod +x cuda<version>.linux.run
Install the driver and compiler:
$ sudo ./cuda<version>.linux.run --no-opengl-libs
The option
--no-opengl-libs is important to avoid login problems. You will then be asked the following questions; the required responses are given after each one.
- Accept license agreement? yes (you can press Ctrl+C to skip to the end of the license)
- Install NVIDIA driver? yes
- Should NVIDIA modify the x-config? no
- Install CUDA? yes
- Path where cuda installations should be put: choose default or provide a path of your choice
- Install symbolic link? yes
- Install samples? yes
- Choose samples location: choose default or enter your choice
This should install both the cuda compiler and the nvidia drivers on the machine.
Perform the post-installation actions, such as adding the cuda installation to your PATH and LD_LIBRARY_PATH. Follow the instructions here. You can also edit the ~/.bashrc file to modify these variables.
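As an example of what those ~/.bashrc additions could look like (the cuda-10.0 path is an assumption; use the path and version your installer actually reports):
export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}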
Try nvcc -V to check the nvidia compiler version and nvidia-smi to see the status of the GPUs on your machine. Finally, restart the lightdm service:
$ sudo service lightdm restart
The machine will have its GUI back after the lightdm service is restarted. You will need to launch new vnc sessions in order to use the remote desktop.
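For example, to start a fresh vnc session on display :1 (the display number and any geometry options are up to you):
$ vncserver :1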