We will install the drivers required to use GPUs for training deep models. But before we begin, remember:

When you are not using an instance - stop it

Now that we are done with this, let’s SSH into our machine like we learned in the previous guide.
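
As a quick reminder, this can be done from your local terminal with the gcloud CLI (a minimal sketch; the instance name instance-01 and zone us-west1-b are assumptions, substitute your own):

gcloud compute ssh instance-01 --zone us-west1-b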

Setting Up a GPU-Instance

Our instances should at this point be GPU-less. This saves a big chunk of money and allows you to keep the credits Google was nice enough to give you for the final project.

A Word on GPUs

For those of you not familiar - GPUs (Graphics Processing Units - sometimes in this context also called GPGPUs, where the extra GP stands for General Purpose) require dedicated NVIDIA drivers before we can use them for computation. Installing them is a lot of fun.

Types of GPUs

Not all GPUs are created equal. We are using us-west1-b since that zone contains K80 GPUs - long gone from other zones - which are powerful enough for our needs but relatively cheap. The newer P100 and V100 GPUs are far stronger but also more expensive. Experiment with them when doing time-critical tasks such as training very deep models from scratch.
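
If you want to check which accelerators a given zone actually offers, the gcloud CLI can list them (a quick sketch, assuming you have gcloud set up locally):

gcloud compute accelerator-types list | grep us-west1

Each row shows an accelerator type (e.g. nvidia-tesla-k80) and the zone it is available in.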

Adding a GPU to Your Instances

To do this, we will head into the dashboard and stop our running instance as below:

[Screenshot: stopping the instance from the dashboard]

Editing an Instance

A nice feature of GCP is that it allows us to edit instances. By hitting Edit we get into the same menu we used to define our instance when creating it.

[Screenshot: the Edit button on the instance page]

Inside the Edit menu, hit Customize

[Screenshot: the Customize option in the machine type menu]

…and inside that menu, change Number of GPUs from None to 1, and change GPU type to K80 as below. Note that you have other options there - such as P100 and V100 as discussed before.

[Screenshots: setting Number of GPUs to 1 and GPU type to K80]

Finally, scroll all the way down and hit Save. Back at the instances dashboard, choose your instance and hit Start.

[Screenshot: starting the instance from the dashboard]

Moving Forward

Don’t forget to change it back to a CPU-only instance by reversing this process once we’re done installing everything. Deep learning frameworks installed with GPU support will work flawlessly even when the GPUs are removed from the instance. The opposite isn’t true.
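
If you prefer the command line over the dashboard, stopping and starting can also be done with gcloud (a sketch, again assuming the instance is named instance-01 in zone us-west1-b):

gcloud compute instances stop instance-01 --zone us-west1-b
gcloud compute instances start instance-01 --zone us-west1-b

Detaching the GPU itself is easiest through the Edit menu as shown above.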

Install CUDA and cuDNN

Installing CUDA

Let’s first make sure we don’t have an older version installed (the patterns are quoted so the shell doesn’t expand them):

sudo apt-get purge 'nvidia*' 'cuda*'

Also do this just in case:

sudo apt-get autoremove 

Some prerequisites:

sudo apt-get update -y && \
sudo apt-get install -y build-essential git gcc-multilib dkms && \
sudo apt-get install -y linux-headers-$(uname -r)

Install NVIDIA Drivers

CUDA and NVIDIA drivers are actually two different things. CUDA packages usually contain an older version of the drivers, so it’s best to install the two separately. Note that we will be installing version 410. Installing NVIDIA drivers can be done like this:

wget http://us.download.nvidia.com/XFree86/Linux-x86_64/410.73/NVIDIA-Linux-x86_64-410.73.run && \
chmod +x NVIDIA-Linux-x86_64-*.run && \
sudo ./NVIDIA-Linux-x86_64-*.run -s && \
rm -rf ./NVIDIA-Linux-x86_64-*.run

To test this, list all available devices:

nvidia-smi -q | head && ls -la /dev | grep nvidia 

If for some reason this fails, do

sudo modprobe nvidia

and try again. Now let’s install CUDA:

Install CUDA 9.2

For this guide we will use the stable CUDA-9.2 which works with the latest TF/PyTorch versions. CUDA-10 is out now but isn’t fully supported by the latest frameworks. Once it is, installing a newer version should be straightforward from this guide.

Again, we don’t use the .deb installation but rather the runfile one. It simply extracts files into /usr/local/cuda-9.2 with a symlink from /usr/local/cuda.
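
Once the installation below finishes, you can sanity-check this layout with:

ls -l /usr/local/cuda

This should show a symlink pointing at /usr/local/cuda-9.2.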

The CUDA runfile installer can be downloaded from NVIDIA’s website. But what you download is a package containing the following three components:

  1. an NVIDIA driver installer, but usually of stale version;
  2. the actual CUDA installer;
  3. the CUDA samples installer.

Download the runfile, extract the three components, and run just the CUDA installer:
wget https://developer.nvidia.com/compute/cuda/9.2/Prod2/local_installers/cuda_9.2.148_396.37_linux && \
chmod +x cuda_* && \
mkdir nvidia && \
./cuda_*linux -extract=`pwd`/nvidia && \
cd nvidia && \
sudo ./cuda-linux*.run -noprompt && \
cd .. && rm -rf nvidia && \
rm cuda_*linux

For this type of installation it’s best to manually add the required binaries to PATH by:

echo "export PATH=\$PATH:/usr/local/cuda-9.2/bin" >> ~/.bashrc && \
echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:/usr/local/cuda-9.2/lib64" >> ~/.bashrc && \
source ~/.bashrc

We should be done now! To check that this worked, let’s invoke:

nvcc --version

Which should output:

>>> nvcc: NVIDIA (R) Cuda compiler driver
>>> Copyright (c) 2005-2018 NVIDIA Corporation
>>> Built on Tue_Jun_12_23:07:04_CDT_2018
>>> Cuda compilation tools, release 9.2, V9.2.148

Install cuDNN

(New!) Download cuDNN

Download this file and place it in the home directory of your instance:

wget https://storage.googleapis.com/cs231n-il/cudnn-9.2-linux-x64-v7.4.1.5.tgz

(Conditional) Register with the NVIDIA Developer Program

cuDNN, unlike CUDA, requires us to register with NVIDIA’s developer center. Once done, download the correct cuDNN distribution. Note that for this guide, we want to get version 7.4.1 and nothing earlier or later. Archived versions can be found on the cuDNN Archive page.

We want to download Download cuDNN v7.4.1 (Nov 8, 2018), for CUDA 9.2. (Again, the direct download from the previous step might spare you this detour.) The file we want is always called cuDNN Library for Linux (and not the ones with Deb in the name). Download it and place it in your home directory on the machine. For version 7.4.1 it will be called cudnn-9.2-linux-x64-v7.4.1.5.tgz. Since you must register to download it, you will most likely have to copy it from your computer to the remote machine:

gcloud compute scp \
	Downloads/cudnn-9.2-linux-x64-v7.4.1.5.tgz instance-01:~/

Now Install

From your home directory invoke the following. This extracts the cuDNN binaries into the cuda-9.2 directory.

sudo tar xvf cudnn-9.2-linux-x64-v7.4.1.5.tgz --strip-components=1 -C /usr/local/cuda-9.2/
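
To verify the copy worked, you can read the version macros straight out of the header (a quick check, not an official installer step):

cat /usr/local/cuda-9.2/include/cudnn.h | grep CUDNN_MAJOR -A 2

For version 7.4.1 this should print CUDNN_MAJOR 7, CUDNN_MINOR 4 and CUDNN_PATCHLEVEL 1.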

Installing Deep Learning Libraries

It is up to you to decide what libraries you might want to use in this course. We direct you to follow the instructions on both websites for PyTorch and TensorFlow, but read the next part first.

Anaconda and Miniconda

Starting with Assignment 2, we believe it’s better to use Anaconda for managing Python environments and packages. For Assignment 1 we used a simple virtualenv, but that’s not a recommended option for heavier projects.

Unlike virtualenv, Anaconda is a package manager and a virtual environment manager packed into one. This means it has a very rich collection of pre-compiled Python libraries, and that it allows you to run standalone environments, possibly with different versions of those libraries. It has become the de-facto standard in the ML/DS industry.

We direct you to the Anaconda website, where you can follow instructions to get your system up and running. Follow the instructions for Ubuntu systems. Once done, we recommend installing PyTorch or TensorFlow using the Anaconda installation method.

We will actually be installing Miniconda, which ships with a smaller number of pre-installed packages.

After reading the instructions you can actually follow ours :-)

wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3 && \
echo "export PATH=$HOME/miniconda3/bin:\$PATH" >> ~/.bashrc && \
source ~/.bashrc

Once conda is actually installed, we can create a new environment using:

conda create --yes -n cs231n python=3.6 anaconda

To activate or exit/deactivate an environment, we will use

source activate cs231n

…and

source deactivate

respectively.
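
To see which environments exist and which one is active (the active one is marked with a *), you can use:

conda env list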

To install a specific TF or PyTorch version we can specify it in the installer once inside an environment. Make sure you are inside the environment!

conda install --yes -c pytorch pytorch=0.4.1

Note that the pytorch following -c directs Anaconda to use a specific channel.
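
If you are unsure which versions a channel carries, you can query it first:

conda search -c pytorch pytorch

This lists the pytorch package versions available on the pytorch channel.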

For TF, use:

conda install --yes -c anaconda tensorflow-gpu=1.8.0

(on a non-GPU machine you’ll have to drop the -gpu)

Test Installation

With TF, running the following test code inside a Python shell should confirm that everything is fine.

import tensorflow as tf
 
 
class SquareTest(tf.test.TestCase):
  def testSquare(self):
    with self.test_session():
      x = tf.square([2, 3])
      self.assertAllEqual(x.eval(), [4, 9])
 
 
if __name__ == '__main__':
  tf.test.main()

For testing GPU TF, do:

import tensorflow as tf
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

with tf.Session() as sess:
    print(sess.run(c))
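
If you want to confirm the matmul actually ran on the GPU, TF 1.x can log device placement. A minimal variant of the session above (run in the same shell, so c is still defined):

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(c))

The log printed to the console should show a, b and MatMul mapped to /device:GPU:0.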

For PyTorch, try this:

from __future__ import print_function
import torch
x = torch.rand(5, 3)
print(x)

To test that PyTorch is seeing the installed GPU, you can simply check (the first call should return True):

torch.cuda.is_available()
torch.cuda.get_device_name(0)

Assuming this worked, our environment is now ready!

Working with Conda

During the course, especially the project part, there’s a high chance you’ll need all sorts of external libraries. Those are almost always available as pip packages. pip is Python’s native package installer (it pulls packages from the PyPI index).

Anaconda, as we mentioned, has its own package management system, which has better tracking of version dependencies. Always look for a conda version before using pip install <...>. In most cases simply hitting conda install instead will work out of the box.
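
For example (tqdm here is just an illustrative package, not a course requirement):

conda search tqdm

If conda comes up empty for a package, fall back to pip install inside the active environment.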

Some Extra Packages

I’d recommend also installing OpenCV. It’s the industry standard for working with images (although assignment 1 actually depended on the simpler, older PIL). To install:

conda install --yes -c conda-forge opencv==3.4.3
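
To confirm the install, a one-liner inside a Python shell:

import cv2
print(cv2.__version__)

This should print 3.4.3.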

Getting the hang of it?

Now, Shut Down Your Instance!