Running GPU Jobs
1. Running GPU Batch Jobs:
On MASSIVE, there are 244 GPU cards:
a) 76 NVIDIA K20 GPUs
b) 20 NVIDIA M2070Q GPUs (visualisation nodes)
c) 148 NVIDIA M2070 GPUs
When requesting a GPU, you can leave it up to the scheduler to decide which GPU you are given, or you can explicitly select the GPU type you would like.
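As a sketch of the two options, a generic request leaves the choice to the scheduler, while naming a type in the --gres request pins it down. Note that the exact type strings (e.g. k20) are an assumption here; the valid names depend on the cluster's GRES configuration, so check them against the MASSIVE documentation:

```shell
# Let the scheduler pick any available GPU:
#SBATCH --gres=gpu:1

# Explicitly request a K20 GPU ("k20" is an assumed type string; verify locally):
#SBATCH --gres=gpu:k20:1
```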
To submit a job that needs 1 node with 2 CPU cores and 2 GPUs, the Slurm submission script should look like:
#!/bin/bash
#SBATCH --job-name=MyJob
#SBATCH --account=monash001
#SBATCH --time=01:00:00
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:2
If you need 6 nodes with 4 CPU cores and 2 GPUs on each node, then the Slurm submission script should look like:
#!/bin/bash
#SBATCH --job-name=MyJob
#SBATCH --account=monash001
#SBATCH --time=01:00:00
#SBATCH --ntasks=24
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:2
On MASSIVE, sample Slurm submission scripts have been prepared and can be found here:
2. Compile your own CUDA or OpenCL codes and run on MASSIVE
The MASSIVE cluster has been configured to allow CUDA (or OpenCL) applications to be compiled (device-independent code ONLY) on the login node (which has no GPUs installed) for execution on a compute node (which has GPUs).
Login node: can compile device-independent CUDA (or OpenCL) source code only, and cannot run it.
Compute node: can compile all CUDA (or OpenCL) source code as well as execute it.
We strongly suggest you compile your code on a compute node. To do that, start an sinteractive session to get onto a compute node:
sinteractive --account=monash001 --gres=gpu:1
To load the cuda module
module load cuda
To check the GPU device information
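On the compute node, a quick way to inspect the installed GPUs is nvidia-smi; the deviceQuery sample that ships with the CUDA toolkit samples gives more detail (its exact path varies by installation, so treat it as an assumption):

```shell
# List the GPUs visible on this node, with driver version and memory usage
nvidia-smi

# More detailed per-device properties (compute capability, core counts, etc.);
# deviceQuery must be built from the CUDA samples first, so its location varies
deviceQuery
```

Both commands will only report devices on nodes that actually have GPUs, which is why they should be run inside the sinteractive session rather than on the login node.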
Then you should be able to compile your GPU code with nvcc. Once compilation is done, you can run your code.
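As a minimal end-to-end sketch, the following trivial CUDA program (a hypothetical hello.cu, not part of the MASSIVE sample scripts) can be compiled on the compute node with `nvcc hello.cu -o hello` and run with `./hello`:

```cuda
// hello.cu — a minimal CUDA sanity check (hypothetical example)
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each thread writes its global index into the output array
__global__ void fill(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

int main() {
    const int n = 8;
    int host[n];
    int *dev = nullptr;

    cudaMalloc(&dev, n * sizeof(int));
    fill<<<1, n>>>(dev, n);                    // launch 1 block of n threads
    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    // Should print: 0 1 2 3 4 5 6 7
    for (int i = 0; i < n; ++i) printf("%d ", host[i]);
    printf("\n");
    return 0;
}
```

If this program compiles and prints the expected sequence, the CUDA toolchain and the GPU on your compute node are both working.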
If you attempt to run a CUDA (or OpenCL) compiled executable on the login node, an error such as ‘no CUDA device found’ may be reported. This is because no CUDA-enabled GPU is installed on the login node. Instead, run it on a compute node.