Use the "sbatch" command to submit jobs.
- Use the "-A" option to request the Account for accounting, e.g "-A other"
- Use the "-p" option to request a queue, e.g. "-p cpu"
- Use the "-N" option to request the number of nodes, e.g. "-N 2"
- Use the "--ntasks-per-node=" option to request the number of tasks (threads) per node
- Use the "--gres:gpu=" option to request the number of GPUs per node
So for example,
- sbatch -A other -p cpu -N 1 --ntasks-per-node=1 run.slurm
runs a single-core job on the 'cpu' queue using the 'other' account.
- sbatch -A S1.1 -p cpu -N 1 --ntasks-per-node=16 run.slurm
runs a single whole-node job (16 tasks, one per core) on the 'cpu' queue using the 'S1.1' account.
- sbatch -A T1 -p cpu -N 2 --ntasks-per-node=16 run.slurm
runs a 2-node job (32 tasks) on the 'cpu' queue using the 'T1' account.
- sbatch -A other_mvz -p cpu -N 52 --ntasks-per-node=16 run.slurm
runs a 52-node job (832 tasks) on the 'cpu' queue using the 'other_mvz' account.
- sbatch -A S2.1 -p all_cluster -N 56 --ntasks-per-node=16 run.slurm
runs a 56-node job (896 tasks) on the 'all_cluster' queue using the 'S2.1' account. Note that the 'all_cluster' queue will reject jobs that request fewer than 54 nodes.
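For multi-node jobs like the last two examples, run.slurm would typically launch the program across all of the allocated tasks with srun. A sketch, assuming an MPI program (my_mpi_program is a placeholder):

    #!/bin/bash
    #SBATCH -A T1
    #SBATCH -p cpu
    #SBATCH -N 2
    #SBATCH --ntasks-per-node=16
    # srun starts one copy of the program per task (32 copies in total here)
    srun ./my_mpi_program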
To submit GPU jobs, use the "--gres=gpu:X" option to request a number of GPUs, and make sure that you submit to the 'gpu' queue.
- sbatch -A S3.1 -p gpu -N 1 --ntasks-per-node=1 --gres=gpu:1 run.slurm
runs a single-core, single-GPU job using the 'gpu' queue and the 'S3.1' account.
- sbatch -A S2.2 -p gpu -N 1 --ntasks-per-node=2 --gres=gpu:2 run.slurm
runs a two-core, dual-GPU job using the 'gpu' queue and the 'S2.2' account.
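Inside a GPU job, Slurm typically exposes only the GPUs that were allocated to the job (via the CUDA_VISIBLE_DEVICES environment variable). A sketch of a dual-GPU run.slurm (my_gpu_program is a placeholder):

    #!/bin/bash
    #SBATCH -A S2.2
    #SBATCH -p gpu
    #SBATCH -N 1
    #SBATCH --ntasks-per-node=2
    #SBATCH --gres=gpu:2
    # List the GPUs visible to this job, then run the program on them
    nvidia-smi
    ./my_gpu_program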
BlueGem has 53 compute nodes, each with 16 cores. Jobs only scale well over an even number of nodes, so the recommended maximum job size is 52 nodes (832 tasks).
BlueGem has 4 GPU nodes, each with 2 K80 cards. Each K80 card contains 2 GPUs, so each GPU node will appear to have 4 GPUs. Jobs do not scale well over multiple GPUs: you will only get good performance using 1 or 2 GPUs, and only when both GPUs are in the same K80 card.
If you want to ensure that only your jobs run on a node, then use the "--exclusive" option, e.g.
- sbatch -A other -p cpu -N 1 --ntasks-per-node=1 --exclusive run.slurm
runs a single task on an unshared node on the 'cpu' queue using the 'other' account. No other job will be allowed to run on that node, ensuring that your job has full access to everything available on it.
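A common use of "--exclusive" is to run a single multithreaded program that uses every core on the node. A sketch, assuming an OpenMP program (my_openmp_program is a placeholder), with "--cpus-per-task" added so that all 16 cores are assigned to the single task:

    #!/bin/bash
    #SBATCH -A other
    #SBATCH -p cpu
    #SBATCH -N 1
    #SBATCH --ntasks-per-node=1
    #SBATCH --cpus-per-task=16
    #SBATCH --exclusive
    # One task, with 16 OpenMP threads spread over the unshared node's 16 cores
    export OMP_NUM_THREADS=16
    ./my_openmp_program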