Running Phi Jobs
1. Running Intel Xeon Phi Batch Jobs:
On MASSIVE, nodes m20[41-50] are Phi nodes; each node is equipped with two Xeon Phi cards.
When requesting Phi devices, add a --gres directive to your Slurm script, using the same mic:N syntax as the sinteractive commands in section 2:
#SBATCH --gres=mic:2
To submit a job that needs 1 node with 2 cores and 2 Phis, the Slurm submission script should look like:
#!/bin/bash
#SBATCH --job-name=MyJob
#SBATCH --account=monash001
#SBATCH --time=01:00:00
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=1
module load intel
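The script above does not show the device request or the run step. A fuller version might look like the following sketch. The --gres=mic:2 line reuses the option from the sinteractive examples on this page; --nodes=1, the binary name ./test, and the PHI_BINARY variable are assumptions for illustration, not settings confirmed for MASSIVE.

```shell
#!/bin/bash
#SBATCH --job-name=MyJob
#SBATCH --account=monash001
#SBATCH --time=01:00:00
#SBATCH --nodes=1            # assumption: keep both tasks on a single Phi node
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=1
#SBATCH --gres=mic:2         # request both Phi cards (same syntax as the sinteractive examples)

# Load the Intel toolchain if the module command exists in this shell.
if type module >/dev/null 2>&1; then
    module load intel
fi

# Hypothetical offload-enabled binary, compiled beforehand on a Phi node.
PHI_BINARY=./test

if [ -x "$PHI_BINARY" ]; then
    "$PHI_BINARY"
else
    echo "binary $PHI_BINARY not found; compile it on a Phi node first" >&2
fi
```

Saved as, say, phi_job.sh (a hypothetical file name), the script would be submitted with `sbatch phi_job.sh`.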
On MASSIVE, the sample slurm submission scripts have been prepared and can be found here:
2. Compile your own Phi codes and run on MASSIVE
You need to compile your code on a Phi node (m20[41-50]). To do that, start an sinteractive session to get onto a Phi node:
sinteractive --account=monash001 --ntasks=16 --ntasks-per-node=16 --gres=mic:2
The above command requests an entire Phi node, which is recommended. However, the cluster can be very busy, so it is sometimes hard to get an entire node. In that case, use the following command to request fewer resources:
sinteractive --account=monash001 --ntasks=1 --gres=mic:1
Once you get on a phi node, load the module first:
module load intel
Check the Phi device information. All tests should pass:
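The exact command is not shown here. On systems with the Intel MPSS software stack installed, micinfo (device information) and miccheck (sanity tests) are the usual tools for this step. That both are on the PATH on a MASSIVE Phi node is an assumption, so the sketch below checks for each tool before running it. The `checked` counter is only there for illustration.

```shell
# Assumption: the Intel MPSS utilities micinfo and miccheck are installed on the Phi node.
checked=0
for tool in micinfo miccheck; do
    if command -v "$tool" >/dev/null 2>&1; then
        "$tool"                      # print device info / run the sanity tests
    else
        echo "$tool not found on PATH"
    fi
    checked=$((checked + 1))
done
```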
Phi supports two programming modes: Offload and Native.
In Offload mode, compile your Phi code on the host. Once compilation is done, you can run the code on the node. In this mode, the Phi card works as a co-processor, very much like a GPU card.
icc -openmp test.c -o test
./test
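To make the Offload workflow concrete, the following sketch writes a small test program and builds it with the same icc invocation as above. The file name offload_demo.c and its contents are hypothetical. The program uses the Intel compiler's `#pragma offload target(mic:0)` extension to run an OpenMP region on the first card and report the thread count back to the host. If icc is not on the PATH, the sketch only prints a hint.

```shell
# Write a minimal offload test program (hypothetical example, not a MASSIVE sample).
cat > offload_demo.c <<'EOF'
#include <stdio.h>
#include <omp.h>

int main(void) {
    int nthreads = 0;
    /* Run this block on the first Phi card and copy nthreads back to the host. */
    #pragma offload target(mic:0) out(nthreads)
    {
        #pragma omp parallel
        {
            #pragma omp single
            nthreads = omp_get_num_threads();
        }
    }
    printf("threads used in offload region: %d\n", nthreads);
    return 0;
}
EOF

# Compile on the host and run; the offload region is shipped to the card at run time.
if command -v icc >/dev/null 2>&1; then
    icc -openmp offload_demo.c -o offload_demo && ./offload_demo
else
    echo "icc not found; run 'module load intel' on a Phi node first"
fi
```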
Intel Xeon Phi also supports Native mode, which allows you to compile and run code directly on the card. For example, if node m2041 is allocated, you should be able to ssh to its Phi cards, which have the hostnames m2041-mic0 and m2041-mic1. Once logged in to a card, you can compile and run code there. In this mode, the Phi card acts as an embedded device or 'mini-PC' with its own CPU, memory, file system, and OS, so CPU and memory usage on the host is minimal. For best performance, use Native mode only when the code is highly parallelized with a minimal serial part.
[kaixi@m2041]$ ssh mic0
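If the compiler turns out not to be available on the card itself, one common alternative is to cross-compile on the host with the Intel compiler's -mmic flag and then copy the binary to the card. The sketch below wraps those steps in a hypothetical helper, run_native. The source file hello.c, the /tmp target directory, and scp access to mic0 are all assumptions rather than confirmed MASSIVE settings.

```shell
# Hypothetical helper: cross-compile for the card on the host, copy the binary, run it.
# Assumes icc supports -mmic and that "mic0" resolves from the host, as in the prompt above.
run_native() {
    src=$1
    card=${2:-mic0}
    bin="${src%.c}.mic"                     # e.g. hello.c -> hello.mic
    icc -mmic "$src" -o "$bin" || return 1  # build a binary that runs natively on the card
    scp "$bin" "$card:/tmp/" || return 1    # copy it to the card's file system
    ssh "$card" "/tmp/$(basename "$bin")"   # execute it on the card
}

if command -v icc >/dev/null 2>&1; then
    run_native hello.c mic0
else
    echo "icc not found; load the intel module on a Phi node first"
fi
```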