Running Jobs

Job Scripts

Users submit jobs to Slurm for scheduling by means of a job script, a plain text file that combines Slurm directives with ordinary shell commands.

The following example script launches a job that executes the my_exe program, charging the specified account:

#!/bin/csh

#SBATCH --account <account>                    # charged account
#SBATCH --time 1:30:00                         # 1 hour 30 minute time limit
#SBATCH --nodes 2                              # 2 nodes
#SBATCH --ntasks-per-node 36                   # 36 processes on each node
#SBATCH --job-name my_job_name                 # job name in queue (``squeue``)
#SBATCH --error my_job_name-%j.err             # stderr file with job_name-job_id.err
#SBATCH --output my_job_name-%j.out            # stdout file
#SBATCH --mail-user=my_email_address@pnnl.gov  # email user
#SBATCH --mail-type END                        # when job ends

module purge                                   # removes the default module set
module load intel
module load impi

mpirun -n 72 ./my_exe

Notes

  • After purging the modules, the script should load the compiler, MPI library, and math library specified when the program my_exe was compiled. The order of load instructions is important.

  • All #SBATCH lines must come before shell script commands.

  • Include your preferred shell as the first line in your batch script.

Slurm Directives

Options passed to Slurm via the #SBATCH keyword are referred to as directives. They can be specified in a submission script, as shown above in Job Scripts, or on the command line at submission time. The directives are the same in either case, but when placed in the job script each line must begin with the #SBATCH prefix. Some common directives and their descriptions:
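
For example, the same options shown in the script above could instead be passed on the command line when submitting (the script name myjobscript matches the submission example later in this section):

sbatch --account=<account> --time=1:30:00 --nodes=2 myjobscript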

-A, --account=<account>

Charge resources used by this job to specified account. The account is an arbitrary string. The account name may be changed after job submission using the scontrol command.
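A sketch of changing the account on a job that has already been submitted, using scontrol (the job ID shown is the example ID used later in this section, and the new account name is a placeholder):

scontrol update JobId=226783 Account=<new_account>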

-d, --dependency=<dependency_list>

Defer the start of this job until the specified dependencies have been satisfied, e.g. -d afterany:226783.
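
As a sketch, two jobs can be chained so the second starts only after the first has finished in any state; the script names here are illustrative:

jobid=$(sbatch --parsable first_step.sh)            # --parsable prints only the job ID
sbatch --dependency=afterany:${jobid} second_step.sh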

-D, --workdir=<directory>

Set the working directory of the batch script to directory before it is executed. The path can be specified as full path or relative path to the directory where the command is executed.

-e, --error=<filename pattern>

Instruct Slurm to connect the batch script’s standard error directly to the file name specified in the “filename pattern”. By default both standard output and standard error are directed to the same file. The default file name is “slurm-%j.out”, where the “%j” is replaced by the job ID.

-J, --job-name=<jobname>

Specify a name for the job allocation. The specified name will appear along with the job id number when querying running jobs on the system. The default is the name of the batch script, or just “sbatch” if the script is read on sbatch’s standard input.

--mail-type=<type>

Notify user by email when certain event types occur. Some type values are NONE, BEGIN, END, FAIL, TIME_LIMIT_90 (reached 90 percent of time limit). Multiple type values may be specified in a comma separated list.
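
For example, to be notified at job start, at job end, on failure, and when 90 percent of the time limit is reached:

#SBATCH --mail-type BEGIN,END,FAIL,TIME_LIMIT_90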

--mail-user=<user>

User to receive email notification of state changes as defined by --mail-type.

-N, --nodes=<minnodes[-maxnodes]>

Request that a minimum of minnodes nodes be allocated to this job. A maximum node count may also be specified with maxnodes. If only one number is specified, this is used as both the minimum and maximum node count.
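
For example, to let Slurm allocate anywhere between 2 and 4 nodes, whichever it can schedule first:

#SBATCH --nodes 2-4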

--ntasks-per-node=<ntasks>

Request that ntasks be invoked on each node. Meant to be used with the --nodes option.

-o, --output=<filename pattern>

Instruct Slurm to connect the batch script’s standard output directly to the file name specified in the “filename pattern”. By default both standard output and standard error are directed to the same file. The default file name is “slurm-%j.out”, where the “%j” is replaced by the job ID.

-p, --partition=<partition_names>

Request a specific partition for the resource allocation. If not specified, the default behavior is to allow the Slurm controller to select the default partition as designated by the system administrator. If the job can use more than one partition, specify their names in a comma separated list; the one offering the earliest initiation will be used, with no regard given to the partition name ordering.
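
For example, to allow the job to run in whichever of Tahoma's two partitions (see Queues and Queue Limits below) can start it first:

#SBATCH --partition normal,analysis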

-t, --time=<time>

Set a limit on the total run time of the job allocation. If the requested time limit exceeds the partition’s time limit, the job will be left in a PENDING state (possibly indefinitely). The default time limit is the partition’s default time limit. When the time limit is reached, each task in each job step is sent SIGTERM followed by SIGKILL. Acceptable time formats include “minutes”, “minutes:seconds”, “hours:minutes:seconds”, “days-hours”, “days-hours:minutes” and “days-hours:minutes:seconds”.
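
For example, using the formats above:

#SBATCH --time 90             # 90 minutes
#SBATCH --time 1:30:00        # 1 hour 30 minutes (same limit as the line above)
#SBATCH --time 2-00:00:00     # 2 days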

--gres=gpu:2

Run this job on the NVIDIA Tesla GPGPU nodes. Note: Maximum time limit 2 hours.
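
A minimal sketch of a GPU request that combines this directive with a time limit at the stated 2 hour maximum:

#SBATCH --gres=gpu:2          # request 2 GPUs on a GPU-equipped node
#SBATCH --time 2:00:00        # stay within the 2 hour GPU limit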

All available directives can be listed on the command line using sbatch --help, with more complete descriptions on the command-line using man sbatch, or online at https://slurm.schedmd.com/sbatch.html.

Job Submission

Submission scripts are used with the sbatch command to submit jobs. A successful submission results in the job’s ID being returned:

$ sbatch myjobscript

Submitted batch job 226783

Notes

  • Changes to the contents of the script after the job has been submitted do not affect the current job.

  • The job's working directory defaults to the directory from which you submit the script. If you need to run in another directory, either have your job script cd to that directory or specify sbatch --workdir=<directory>.
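
After submitting, job status can be checked with squeue (mentioned earlier), and a queued or running job can be removed with scancel; the job ID below is the example ID returned above:

$ squeue -u $USER        # list your pending and running jobs
$ scancel 226783         # cancel a job by ID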

Queues and Queue Limits

Each job queue has default limits on how long a single job with a particular number of nodes is permitted to run. The system automatically categorizes your job based on the resources it requests.

Tahoma has two main queues available for computational jobs, one for each type of node.

Queue/Partition Name | Maximum nodes in a job | Time Limit | Number of jobs | Available Nodes
---------------------|------------------------|------------|----------------|-----------------------------------------
normal               | 100                    | 2 days     | 10             | Standard Compute & ML/AI analysis nodes
analysis             | 4                      | 2 days     | 6              | ML/AI analysis nodes

Job Policy Constraints

The total number of jobs a user can have running at once depends on overall system activity. When the system is busy with many users:

  • Maximum number of active jobs in the Running state: 20

  • Minimum number of nodes per job: 1

Warning: Jobs must be run through the batch queue rather than on the login node. Jobs found running on the login node will be terminated by the system administrator and the user will be sent an e-mail message. Jobs in the batch queue that have been suspended for more than 12 hours will be deleted.

Running Interactive Jobs

Tahoma supports interactive jobs. Interactive jobs are given higher priority in the queue so that users can request nodes and obtain resources quickly, without a long wait. Because they receive such high priority, you are not allowed to run any other jobs while an interactive job is active. There are no restrictions on which partition you can use.

If your application requires an X session (usually a graphical application), make sure you used the '-Y' option on your ssh command when logging in (this enables X tunneling; in some cases the '-X' option may work instead):

ssh -Y <name>@tahoma.emsl.pnl.gov

To start an interactive job, use the salloc command:

salloc -A <allocation> -t <time> -N <nodes> -p <partition> -q interactive

Where:

  • -A is the project (allocation) the job will be charged to.

  • -t is the requested time limit (in minutes, or any other accepted Slurm time format).

  • -N is the number of nodes you want.

  • -p is the partition to run the job in.

  • -q requests the interactive quality of service (QOS).

Alternatively, you can use the long options:

salloc --account=<allocation> --time=<time> --nodes=<nodes> --partition=<partition> \
--qos=interactive

As with all jobs, an interactive job will wait in the queue until there is a set of available nodes for it.
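
A sketch of a short interactive session (the node count, time limit, and job ID shown are illustrative): once the allocation is granted, salloc opens a shell from which srun launches commands on the allocated compute node(s), and exiting the shell releases the allocation.

$ salloc -A <allocation> -t 30 -N 1 -p normal -q interactive
salloc: Granted job allocation 226785
$ srun hostname            # runs on the allocated compute node
$ exit                     # ends the session and releases the allocation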