
Transitioning to SLURM

Overview

MASSIVE has now moved from CentOS 5/MOAB to CentOS 6/SLURM. The details of the Operating System Upgrade and Scheduler Transition to SLURM were announced in the 2014 Q4 newsletter.

The following describes the process of migrating old workflows to CentOS 6/SLURM.

Desktop Users

Please make sure you are using the current Strudel version: https://www.massive.org.au/userguide/cluster-instructions/strudel 

Known Issues Desktop Users:

  • You may need to deselect and reselect MASSIVE under the menu File > Manage Sites > MASSIVE (this will refresh the list to include "CentOS 6 Desktop (For Eval Users)" on m2-login3.massive.org.au).
  • Loading CentOS 5 modules in your .bashrc will report errors if those modules do not exist in the CentOS 6.5 environment.

Batch Users

Please log in to m2-login2.massive.org.au or m2-login1.massive.org.au.

You will need to learn SLURM. The MASSIVE team can help with porting your PBS commands and scripts to sbatch; a rough mapping is sketched below.
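
As a starting point, the most common PBS/Torque directives and commands translate to SLURM roughly as follows. This is a sketch rather than an exhaustive mapping, and the job name, resource values and queue/partition names are illustrative placeholders, not MASSIVE-specific settings:

# PBS/Torque                          SLURM
#PBS -N myjob                         #SBATCH --job-name=myjob
#PBS -l nodes=1:ppn=4                 #SBATCH --nodes=1 --ntasks-per-node=4
#PBS -l walltime=02:00:00             #SBATCH --time=02:00:00
#PBS -l mem=4gb                       #SBATCH --mem=4G
#PBS -q batch                         #SBATCH --partition=batch
qsub job.sh                           sbatch job.sh
qstat / showq                         squeue
qdel <jobid>                          scancel <jobid>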

SLURM

SLURM is the new queue manager that we are running, and details of how to use SLURM are covered in Running Slurm Jobs.

If you are interested in greater detail, here are some other useful links:

Our intention is to make the transition to MASSIVE CentOS 6.5 and SLURM as easy as possible! If you have ANY issues, please contact help@massive.org.au.

SLURM Quick Start

We are still in the process of creating training material for SLURM. To start with, MASSIVE is providing information based on the commands you are used to using: qstat, showq and qsub. We have also provided example scripts in /usr/local/training/samples/slurm/.

Below are examples of information currently provided:

showq
# MASSIVE info...
#
# Under SLURM the qstat/showq commands are replaced with squeue/scontrol show...
# For information on using more advanced squeue/scontrol show features you can use...
info squeue
man squeue
squeue --usage
man scontrol
# The most common commands are...
squeue
squeue -u $USER
scontrol show job <jobid>
# We also have a MASSIVE script to show what is happening on the cluster
show_cluster

qsub
# MASSIVE info...
#
# Under SLURM the qsub command is replaced with sbatch...
# For information on using more advanced sbatch features you can use...
info sbatch
man sbatch
sbatch --usage
# Some examples of using sbatch and sinteractive
#
# Interactive:
# qsub -I is replaced by sinteractive; an example is below...
# Access a single node using 2 tasks and 2 GPUs
sinteractive --nodes=1 --ntasks-per-node=2 --gres=gpu:2
#
# Batch:
# Some example sbatch scripts can be found at:
# /usr/local/training/samples/slurm/
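
To make the batch case concrete, below is a minimal sbatch script sketch. The job name, resource requests, module and executable (my_program) are placeholders to adapt to your own work rather than MASSIVE defaults; the scripts under /usr/local/training/samples/slurm/ remain the local reference examples.

#!/bin/bash
#SBATCH --job-name=example_job       # job name shown by squeue
#SBATCH --nodes=1                    # number of nodes
#SBATCH --ntasks-per-node=2          # tasks (cores) per node
#SBATCH --time=01:00:00              # wall time limit (hh:mm:ss)
#SBATCH --mem=4G                     # memory per node

# SLURM starts in the submission directory, so there is no "cd $PBS_O_WORKDIR"
module purge                         # start from a clean module environment
module load <your_module>            # placeholder: load whatever your program needs

./my_program                         # placeholder for the real executable

Save it as, for example, example_job.sh, submit it with "sbatch example_job.sh" and check its progress with "squeue -u $USER".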

qdel
# MASSIVE info...
#
# Under SLURM the qdel command is replaced with scancel
# For information on using more advanced scancel features you can use...
info scancel
man scancel
scancel --usage
# The most common commands are...
scancel <jobid>

SLURM Differences to MOAB/Torque

  • SLURM exports the user's module environment from the login node to the compute nodes by default (placing a "module purge" at the top of your script gives you a fresh environment)
  • SLURM runs in the submission directory by default (so there is no need to "cd $PBS_O_WORKDIR")
  • SLURM combines stdout and stderr into a single output file by default (and the file naming is different)
  • SLURM is case insensitive (e.g. project names are lower case)
  • batch scripts use #SBATCH directives instead of #PBS (remove old #PBS lines, as SLURM will attempt to read these as well)
  • SLURM has an "srun" command for launching parallel jobs, which may have advantages over mpirun/mpiexec (see the sketch after this list).
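
As a small illustration of the last two points, here is a hedged sketch of an MPI job script under SLURM. The module name and the ./my_mpi_app executable are placeholders, and whether srun or mpirun works better depends on how your MPI library was built:

#!/bin/bash
#SBATCH --job-name=mpi_example
#SBATCH --nodes=2                    # two nodes
#SBATCH --ntasks-per-node=4          # four MPI ranks per node
#SBATCH --time=00:30:00

module purge                         # discard the environment exported from the login node
module load <your_mpi_module>        # placeholder: load your MPI and application modules

# srun launches one copy of the program for each allocated task;
# mpirun/mpiexec can still be used if your MPI library prefers it
srun ./my_mpi_app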

Known Issues Batch Users:

  • email notification is not automatic: system administrators do not currently receive emails when jobs fail, and you must add the email options to your job script explicitly to enable notifications (an example follows this list)
  • job output does not currently contain information about how much memory was used or other job statistics
  • the MASSIVE team is still working on the scheduling algorithm and parameters that give the best and fairest job throughput
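
As a hedged example of enabling notifications yourself, the standard sbatch mail directives below request email when a job begins, ends or fails; the address is of course a placeholder:

#SBATCH --mail-type=BEGIN,END,FAIL   # events that trigger an email
#SBATCH --mail-user=you@example.org  # replace with your own address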
