Preface
Recently, I was trying to train a model on Google Colaboratory with mxnet. However, I found that the CUDA version pre-installed on Colab is 10.2, while mxnet so far only supports up to CUDA 10.1.
Therefore, I started to wonder whether it was possible to set up an environment that mxnet supports. So far, CUDA 10.1 hasn’t worked for me, but I did successfully install CUDA 10.0 and train a LeNet on Colaboratory. As a Mac user, I have always envied people with NVIDIA GPUs, until I found Google Colaboratory. I’d like to share my journey with you in case you also run into problems setting up the environment.
Google Colaboratory provides free access to an NVIDIA Tesla K80 GPU (the K80 board has 24 GB of memory and 4,992 CUDA cores in total, split across its two GPU chips). It is a great gift for people who are eager to learn deep learning but don’t have the hardware to train deeper models.
NVIDIA Tesla K80 GPU
I assume that you already have a background in Python and are familiar with tools like Jupyter Notebook.
The goal of this article is to guide you through setting up an environment for GPU model training with mxnet on Google Colaboratory.
Change runtime setting
Open a new **Colaboratory notebook**. We are going to set up a Python 3 and GPU environment. Click on the menu **Runtime > Change runtime type**, then select **Python 3** and **GPU** as the notebook settings.
Check CUDA version
Insert a code cell and type in the command below. It prints the installed CUDA version:
# Check cuda version
!nvcc --version
The version shown in my notebook is 10.0 because I had just installed it, and it works fine with mxnet. However, unlike me, you will probably see a higher version such as 10.2.
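If you’d rather check this programmatically, here is a minimal Python sketch that pulls the "release X.Y" part out of the nvcc --version output and compares it with 10.1, the highest version this article treats as usable with mxnet. The helper names are hypothetical, and the sample string assumes nvcc prints its usual "Cuda compilation tools, release X.Y, ..." line.

```python
import re
from typing import Tuple

# Highest CUDA version this article treats as supported by mxnet (taken from the text above).
MAX_SUPPORTED = (10, 1)


def parse_nvcc_version(output: str) -> Tuple[int, int]:
    """Return (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def needs_downgrade(output: str) -> bool:
    """True when the installed CUDA is newer than MAX_SUPPORTED."""
    return parse_nvcc_version(output) > MAX_SUPPORTED


# In Colab you could capture the real output with:
#   import subprocess
#   output = subprocess.check_output(["nvcc", "--version"], universal_newlines=True)
sample = "Cuda compilation tools, release 10.2, V10.2.89"
print(parse_nvcc_version(sample), needs_downgrade(sample))  # (10, 2) True
```

Comparing (major, minor) tuples rather than floats keeps the check correct even for a hypothetical two-digit minor version such as 10.10.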
Uninstall CUDA 10.2
If you see a version higher than 10.1, you need to uninstall it and install an older one. Let’s first uninstall CUDA 10.2.
# Uninstall the current CUDA version (-y answers apt's confirmation prompts automatically)
!apt-get --purge remove -y cuda 'nvidia*' 'libnvidia-*'
!dpkg -l | grep cuda- | awk '{print $2}' | xargs -n1 dpkg --purge
!apt-get remove -y 'cuda-*'
!apt-get autoremove -y
!apt-get update
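To confirm that the purge really removed everything, the dpkg -l | grep cuda- | awk '{print $2}' pipeline from the second command can be mirrored in Python. This is a minimal sketch that assumes dpkg -l’s usual whitespace-separated columns, with the package name in the second field; the function name is hypothetical. Once the uninstall commands have finished, it should ideally return an empty list.

```python
from typing import List


def remaining_cuda_packages(dpkg_list_output: str) -> List[str]:
    """Mimic `dpkg -l | grep cuda- | awk '{print $2}'` on captured dpkg -l text."""
    names = []
    for line in dpkg_list_output.splitlines():
        fields = line.split()
        # grep keeps any line containing "cuda-"; awk then prints the second column.
        if "cuda-" in line and len(fields) >= 2:
            names.append(fields[1])
    return names


# In Colab you could feed it the real listing:
#   import subprocess
#   listing = subprocess.check_output(["dpkg", "-l"], universal_newlines=True)
#   print(remaining_cuda_packages(listing))
sample = """\
||/ Name           Version        Architecture Description
+++-==============-==============-============-======================
ii  cuda-10-2      10.2.89-1      amd64        CUDA 10.2 meta-package
ii  libc6:amd64    2.27-3ubuntu1  amd64        GNU C Library
"""
print(remaining_cuda_packages(sample))  # ['cuda-10-2']
```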
Download and install CUDA 10.0
Once the uninstall has finished, we are going to install the older version 10.0 for mxnet.
# Download the CUDA 10.0 repository package
!wget --no-clobber https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
# Register the CUDA repository with dpkg and add NVIDIA's signing key
!dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
!sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
!apt-get update
!apt-get install -y cuda-10-0
The following commands are what I found necessary to solve the error message below, which appears when libcurand.so.10 is not installed properly:
OSError: libcurand.so.10: cannot open shared object file: No such file or directory
# Solve the libcurand.so.10 error by adding NVIDIA's machine-learning repository
# -nc, --no-clobber: skip downloads that would download to existing files.
!wget --no-clobber http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
!apt install -y ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
!apt-get update
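The OSError above is the message the dynamic loader produces when it cannot find a shared library, so a quick way to test the fix without importing mxnet is to ask Python’s ctypes module to load libcurand.so.10 directly. A minimal sketch; the helper name is hypothetical, and the check only succeeds if the library’s directory is on the loader’s search path.

```python
import ctypes


def can_load(library: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library)
    except OSError as err:
        # Same kind of failure as "cannot open shared object file: No such file or directory".
        print("cannot load {}: {}".format(library, err))
        return False
    return True


print(can_load("libcurand.so.10"))
```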
In **!apt-get install cuda-10-0**, the versioned package name **cuda-10-0** forces the system to install version 10.0. Otherwise, the system would just install the newest CUDA version, which mxnet does not support yet.
Check CUDA version
After setting up the CUDA 10.0 environment, let’s check the CUDA version again:
# Check cuda version
!nvcc --version
You should now see version 10.0.
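The CUDA version you end up with also determines which mxnet wheel to install. Judging from the package used in this article, mxnet-cu100mkl, and its non-MKL sibling mxnet-cu100, the pattern is "mxnet-cu" followed by the major and minor digits, with an optional "mkl" suffix. The sketch below only encodes that observed pattern; it is an assumption, so check PyPI before relying on it for other CUDA versions. The function name is hypothetical.

```python
def mxnet_wheel_name(cuda_version: str, mkl: bool = True) -> str:
    """Build a pip package name such as "mxnet-cu100mkl" from a CUDA version such as "10.0"."""
    major, minor = cuda_version.split(".")[:2]
    name = "mxnet-cu{}{}".format(major, minor)
    # The "mkl" suffix selects the build with MKL support.
    return name + "mkl" if mkl else name


print(mxnet_wheel_name("10.0"))  # mxnet-cu100mkl
```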
Install mxnet-related packages
Let’s install the mxnet-related packages. Some of mxnet’s dependencies also rely on older package versions. The **spacy**, **folium** and **imgaug** versions below are the ones suggested by the error messages I got when trying to install the newer versions.
!pip install spacy==2.0.18 folium==0.2.1 imgaug==0.2.7
!pip install numpy
# Let's install all the packages needed
!pip install mxnet-cu100mkl
!pip install gluoncv
!pip install d2l
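After the installs finish, you can confirm that the three pinned versions actually stuck, since a later pip install may upgrade a package again when another package depends on a newer version. A minimal sketch using importlib.metadata, which assumes Python 3.8 or newer (older runtimes need the importlib_metadata backport); the function name is hypothetical.

```python
from importlib import metadata  # Python 3.8+

# Versions pinned by the pip command above.
PINS = {"spacy": "2.0.18", "folium": "0.2.1", "imgaug": "0.2.7"}


def unmet_pins(pins, get_version=metadata.version):
    """Map each package whose installed version differs from its pin to that version (None if missing)."""
    unmet = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            unmet[name] = installed
    return unmet


# An empty dict means every pin is satisfied.
print(unmet_pins(PINS))
```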
**!pip install mxnet-cu100mkl** installs the mxnet build for CUDA 10.0 with MKL support.
Give it a try
Let’s try some mxnet code that computes on the GPU.
from mxnet import nd, gpu, gluon, autograd
from mxnet.gluon import nn
from mxnet.gluon.data.vision import datasets, transforms
import time
y = nd.random.uniform(shape=(3,4), ctx=gpu())
print(y)
The **ctx=gpu()** argument in **y = nd.random.uniform(shape=(3,4), ctx=gpu())** places the NDArray in GPU memory. If the printed output is tagged with **@gpu(0)**, the code has successfully run on the GPU.
You can now try to play around with some mxnet sample code and have fun 🙂
See this guide for more information on GPU computing with mxnet:
mxnet-Use GPUs