
Memory Management

On MASSIVE, nodes have 12-16 CPUs and 48-192GB of memory (see Resources). Many jobs only require a small amount of memory, so to prevent excessive memory use a default of 1000MB per processor (~1GB) is applied to SLURM job scripts. To use more memory, users can request a specific amount in the job script. If a job uses more RAM than it has requested, the scheduling software will kill it to protect other jobs on the node. If your job is killed for this reason you will see the error "Exceeded job memory limit".
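
For example, a minimal job script requesting 4000MB for a single process might look like the following sketch (the job name and command are placeholders; the --mem-per-cpu option is described further below):

#!/bin/bash
#SBATCH --job-name=example_job
#SBATCH --ntasks=1
# Override the 1000MB default and request 4000MB for this process
#SBATCH --mem-per-cpu=4000
./my_program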

We recommend requesting memory in multiples of 1000MB rather than in GB (where, because of binary units, the multiplier is 1024MB rather than 1000MB), so that jobs pack efficiently onto a node. A "stage one" 12 core node actually has 47GB/48389MB of RAM available: this will fit 12 jobs using 4000MB each, but only 11 jobs using 4GB each (one core is wasted). Newer "stage two" 16 core nodes have 62GB/64398MB. Requesting memory in GB may also stop jobs from running at all (without an error); for example, a request for 64GB will never run because no compute node has that much memory, whereas 64000MB will fit on a "stage two" compute node.
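
To see why, compare the totals against the 48389MB available on a stage one node:

  12 jobs x 4000MB = 48000MB, which fits within 48389MB
  12 jobs x 4096MB (4GB) = 49152MB, which does not fit; only 11 such jobs (45056MB) can run, leaving one core idle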


Specifying Memory in Jobs

To specify memory requirements, use the "--mem-per-cpu" (memory per process) option.

Below are some examples of memory requests, combined with other resource requests, for inclusion in a SLURM script (see running SLURM job scripts for more details on what these options mean).


Compute Nodes (48GB available)

# Memory request to take all memory (MB) on a standard node
#SBATCH --mem-per-cpu=48000

# Memory request to give each process ~10GB of memory
#SBATCH --mem-per-cpu=10000
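
Putting this together, a complete script for a four-process job where each process needs roughly 10GB might look like this sketch (the job name and command are placeholders):

#!/bin/bash
#SBATCH --job-name=example_10gb_job
#SBATCH --ntasks=4
# Each of the 4 processes is given 10000MB, 40000MB in total
#SBATCH --mem-per-cpu=10000
srun ./my_program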

Large Memory Jobs

When requesting large amounts of memory, the queue system works out where jobs will fit, so node memory limitations need to be considered. For example, a memory request of 48000MB means that only one process per node can run, because that single process consumes all the memory on a compute node and none is left to service other jobs on the remaining cores. Requesting 48000MB per process may also mean a distributed job hits the limits of what MASSIVE can provide.
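
As a sketch, one way to run a distributed job in which every process needs a whole node's memory is to place a single task on each node (the job name, task count and command are placeholders):

#!/bin/bash
#SBATCH --job-name=example_large_mem_job
#SBATCH --ntasks=4
# One process per node, so each process can use the full 48000MB
#SBATCH --ntasks-per-node=1
#SBATCH --mem-per-cpu=48000
srun ./my_program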

If you require very high memory, then contact help@massive.org.au as we may be able to allow access to Vis Nodes.

Software Specific

Some software hides the memory options from you, especially when run through a GUI. This section covers how to manage memory requests for particular software packages.

FSL and fsl_sub

FSL mainly launches jobs through fsl_sub. On MASSIVE, fsl_sub has been set to use a default memory request of 4000MB. To request more, use the -R option:

-R Max total RAM to use for job (integer in MB)
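
For example, to give a single command roughly 16GB through fsl_sub (the command shown is only a placeholder):

# Request 16000MB instead of the 4000MB default
fsl_sub -R 16000 ./my_fsl_command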


In cases where you do not have control of the memory request, please contact help@massive.org.au.
