Setting up CUDA 10.0 for mxnet on Google Colaboratory

Preface

Recently, I was trying to train a model on Google Colaboratory with mxnet. However, I found that the CUDA version pre-installed on Colab is 10.2, while at the time of writing mxnet only supports CUDA up to 10.1. So I started wondering whether I could set up an environment that mxnet does support. So far CUDA 10.1 has not worked for me, but I did successfully install CUDA 10.0 and trained a LeNet on Colaboratory. As a Mac user, I have always envied people with an NVIDIA GPU. Until I found Google Colaboratory….
I’d like to share my journey with you in case you run into the same problems setting up the environment.
Google Colaboratory provides free access to an NVIDIA Tesla K80 GPU (24GB RAM, 4992 CUDA cores). It is a great gift for anyone who is eager to learn deep learning but does not have the hardware to train deeper models.
I assume that you already have some background in Python and are familiar with tools like Jupyter Notebook.
The goal of this article is to guide you through setting up the environment for mxnet GPU model training on Google Colaboratory.

Change runtime setting

Open a new **Colaboratory notebook**. We are going to set up a Python 3 and GPU environment.
Click the menu **Runtime>Change runtime type** and select **Python 3** and **GPU** as the notebook settings.

Check CUDA version

Insert a code cell and type in the command below.
This command shows the installed CUDA version.
# Check cuda version
!nvcc --version
The version I see is 10.0, because I have just installed it, and that is fine for mxnet. You will probably see a newer version such as 10.2 instead.
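If you would rather read the version programmatically than by eye, here is a minimal sketch. It assumes the usual nvcc banner, which contains a line like `Cuda compilation tools, release 10.0, V10.0.130`. The helper names `parse_nvcc_release` and `installed_cuda_release` are my own, not part of any library.

```python
import re
import shutil
import subprocess

def parse_nvcc_release(text):
    """Return the CUDA release (e.g. '10.0') found in `nvcc --version` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None

def installed_cuda_release():
    """Run nvcc if it is on PATH and return its release string, otherwise None."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)

# Prints the release string, or None when nvcc is not installed.
print(installed_cuda_release())
```

Any value other than 10.0 or 10.1 means the steps in the following sections are needed.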

Uninstall CUDA 10.2

If you see a version higher than 10.1, you need to uninstall it and install an older one. Let’s first uninstall CUDA 10.2.
#Uninstall the current CUDA version
!apt-get --purge remove cuda nvidia* libnvidia-*
!dpkg -l | grep cuda- | awk '{print $2}' | xargs -n1 dpkg --purge
!apt-get remove cuda-*
!apt autoremove
!apt-get update
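To confirm that the removal commands above really cleared out the old toolkit, you can list the CUDA packages that dpkg still reports as installed. This is only a sketch: `cuda_packages` is a hypothetical helper, and it assumes the standard `dpkg -l` layout where installed packages start with the status `ii`.

```python
import shutil
import subprocess

def cuda_packages(dpkg_listing):
    """Return names of installed packages (status 'ii') whose name starts with 'cuda'."""
    names = []
    for line in dpkg_listing.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ii" and fields[1].startswith("cuda"):
            names.append(fields[1])
    return names

# Only query dpkg when it exists, so the cell also runs on non-Debian systems.
if shutil.which("dpkg") is not None:
    listing = subprocess.run(["dpkg", "-l"], capture_output=True, text=True).stdout
    print(cuda_packages(listing))
```

An empty list means no CUDA packages are left and the older version can be installed cleanly.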

Download and install CUDA 10.0

After the uninstall finishes, we are going to install the older version 10.0 for mxnet.
#Download CUDA 10.0
!wget  --no-clobber https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
#Install the CUDA repository package with dpkg
!dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
!sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
!apt-get update
!apt-get install cuda-10-0
If CUDA is not installed properly, mxnet fails with a missing libcurand.so.10 error:
OSError: libcurand.so.10: cannot open shared object file: No such file or directory
I found that the following commands need to be executed to solve it.
#Solve libcurand.so.10 error
!wget --no-clobber http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
#-nc, --no-clobber: skip downloads that would download to existing files.
!apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
!apt-get update
The command !apt-get install **cuda-10-0** forces the system to install version 10.0. Otherwise, the system would install the newest CUDA version for you (which mxnet does not support yet).
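Since the error above is about a shared library that cannot be opened, it can help to check where libcurand actually ended up on disk. Here is a minimal sketch with a hypothetical helper `find_shared_libs`. The two search directories are assumptions about where CUDA and system libraries usually live, so adjust them to your runtime.

```python
import glob
import os

def find_shared_libs(pattern, roots):
    """Return sorted paths under any of `roots` whose file name matches `pattern`."""
    hits = []
    for root in roots:
        # '**' with recursive=True also searches all subdirectories of root.
        hits.extend(glob.glob(os.path.join(root, "**", pattern), recursive=True))
    return sorted(hits)

# Assumed locations; a missing directory simply contributes no results.
search_roots = ["/usr/local/cuda-10.0", "/usr/lib/x86_64-linux-gnu"]
print(find_shared_libs("libcurand.so*", search_roots))
```

If the list is empty, the library is not in those directories, which would be consistent with the OSError.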

Check CUDA version

After setting up the CUDA 10.0 environment, let’s check the CUDA version again:
# Check cuda version
!nvcc --version
You should see that version 10.0 is installed.
Now let’s install the mxnet-related packages. Some of the packages mxnet depends on also rely on older versions. The spacy, folium, and imgaug versions below are the ones suggested by the error messages I got when I tried to install the newest versions.
!pip install spacy==2.0.18 folium==0.2.1 imgaug==0.2.7
!pip install numpy
#Let's install all the packages needed
!pip install mxnet-cu100mkl
!pip install gluoncv
!pip install d2l
The command !pip install **mxnet-cu100mkl** installs the mxnet build for CUDA 10.0 with MKL support.
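The package name encodes the CUDA version: cu100 stands for CUDA 10.0, and the mkl suffix marks the MKL build. Here is a small sketch of that naming pattern, using a hypothetical helper `mxnet_package`. It only builds the string, so whether a package for a given CUDA version actually exists on PyPI is something you still need to check.

```python
def mxnet_package(cuda_version, mkl=True):
    """Build the pip package name for an mxnet CUDA build, e.g. '10.0' -> 'mxnet-cu100mkl'."""
    major, minor = cuda_version.split(".")[:2]
    name = f"mxnet-cu{major}{minor}"
    return name + "mkl" if mkl else name

print(mxnet_package("10.0"))  # mxnet-cu100mkl
```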

Give it a try

Let’s try some code that uses mxnet GPU computing.
from mxnet import nd, gpu, gluon, autograd
from mxnet.gluon import nn
from mxnet.gluon.data.vision import datasets, transforms
import time
y = nd.random.uniform(shape=(3,4), ctx=gpu())
print(y)
The line y = nd.random.uniform(shape=(3,4), **ctx=gpu()**) places the nd array in GPU memory. If you see **@gpu** in the printed output, the code has successfully executed on the GPU.
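If you want the same notebook to run on a CPU-only runtime as well, you can pick the context at run time instead of hard-coding gpu(). This is only a sketch: `pick_context` is a hypothetical helper, and it assumes an mxnet 1.x release that provides `mx.context.num_gpus()`.

```python
def pick_context():
    """Return mx.gpu(0) when mxnet sees a GPU, mx.cpu() otherwise, or None if mxnet cannot be loaded."""
    try:
        import mxnet as mx
    except (ImportError, OSError):
        # ImportError: mxnet is not installed.
        # OSError: a CUDA library such as libcurand could not be opened.
        return None
    return mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu()

ctx = pick_context()
print("mxnet context:", ctx)
```

The returned context can then be passed as ctx=ctx wherever the example above uses ctx=gpu().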
You can now play around with some mxnet sample code and have fun 🙂
Please follow this guide for more information on mxnet GPU computing:
mxnet-Use GPUs
