Job statistics – High Performance Computing

How to find job resource usage

List your jobs (replace USERNAME with your cluster user name):

$ sacct -X --starttime 2025-04-01T00:00:00 --endtime 2025-05-01T23:59:59 --user USERNAME
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- --------
. . .
21541421       o2_assim       CPUQ share-ie-+         20  COMPLETED      0:0 
21541422       o2_assim       CPUQ share-ie-+         20  COMPLETED      0:0 
21541423       o2_assim       CPUQ share-ie-+         20  COMPLETED      0:0

Show the Slurm job efficiency report (seff) for one job:

$ seff 21541423
Job ID: 21541423
Cluster: idun
User/Group: USERNAME/group_s
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 20
CPU Utilized: 05:57:51
CPU Efficiency: 78.76% of 07:34:20 core-walltime
Job Wall-clock time: 00:22:43
Memory Utilized: 5.29 GB
Memory Efficiency: 90.23% of 5.86 GB (5.86 GB/node)
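
The CPU Efficiency figure can be checked by hand: the core-walltime is the wall-clock time multiplied by the number of allocated cores, and the efficiency is CPU Utilized divided by that product. The following is a minimal sketch rather than how seff itself is implemented. It assumes the times carry no day prefix (D-HH:MM:SS), and the function names hms_to_seconds and cpu_efficiency are made up for this example.

```shell
#!/bin/sh
# Convert HH:MM:SS to seconds (assumption: no leading "D-" day field).
hms_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Usage: cpu_efficiency CPU_UTILIZED WALLTIME CORES
# Prints the CPU efficiency in percent with two decimals.
cpu_efficiency() {
    used=$(hms_to_seconds "$1")
    wall=$(hms_to_seconds "$2")
    awk -v u="$used" -v w="$wall" -v c="$3" \
        'BEGIN { printf "%.2f\n", 100 * u / (w * c) }'
}

# Values taken from the seff report above.
cpu_efficiency 05:57:51 00:22:43 20
```

With the values from the report, 00:22:43 is 1363 seconds, so the core-walltime is 1363 * 20 = 27260 seconds (07:34:20), and 21471 / 27260 gives the reported 78.76%.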

List your jobs with extended formatting (replace USERNAME with your cluster user name):

$ sacct -X --starttime 2025-04-01T00:00:00 --endtime 2025-05-01T23:59:59 --user USERNAME --format jobid,jobname%10,start,elapsed,state%22,node%20
JobID           JobName               Start    Elapsed                  State             NodeList 
------------ ---------- ------------------- ---------- ---------------------- -------------------- 
. . .
21541421       o2_assim 2025-04-10T11:40:56   00:22:47              COMPLETED           idun-03-17 
21541422       o2_assim 2025-04-10T11:40:56   00:26:21              COMPLETED           idun-04-02 
21541423       o2_assim 2025-04-10T11:40:56   00:22:43              COMPLETED           idun-04-28
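
For scripting, the fixed-width table is awkward to parse. sacct can print pipe-delimited fields with --parsable2 and drop the header with --noheader. The sketch below counts jobs per state from such output; the helper name count_states is hypothetical, and the demonstration feeds it the three jobs listed above instead of calling sacct, so it also runs off the cluster.

```shell
#!/bin/sh
# Count jobs per state from pipe-delimited "jobid|state" lines on stdin.
count_states() {
    awk -F'|' '{ n[$2]++ } END { for (s in n) print s, n[s] }' | sort
}

# Intended use on the cluster (USERNAME is a placeholder):
#   sacct -X --noheader --parsable2 --starttime 2025-04-01T00:00:00 \
#         --user USERNAME --format jobid,state | count_states

# Offline demonstration with the three jobs from the listing above.
printf '%s\n' '21541421|COMPLETED' '21541422|COMPLETED' '21541423|COMPLETED' \
    | count_states
```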

Show detailed information for one job:

$ sacct -j 21541423 --units=G --format jobid,jobname,start,elapsed,state,node,CPUTime,TotalCPU,MaxRSS,MaxDiskRead,MaxDiskWrite,ReqTRES%40,TRESUsageInMax%70
JobID           JobName               Start    Elapsed      State        NodeList    CPUTime   TotalCPU     MaxRSS  MaxDiskRead MaxDiskWrite                                  ReqTRES                                                         TRESUsageInMax 
------------ ---------- ------------------- ---------- ---------- --------------- ---------- ---------- ---------- ------------ ------------ ---------------------------------------- ---------------------------------------------------------------------- 
21541423       o2_assim 2025-04-10T11:40:56   00:22:43  COMPLETED      idun-04-28   07:34:20   05:57:50                                           billing=240,cpu=20,mem=5.86G,node=1                                                                        
21541423.ba+      batch 2025-04-10T11:40:56   00:22:43  COMPLETED      idun-04-28   07:34:20   05:57:50      5.29G      259.26G        0.42G                                              cpu=05:57:50,energy=0,fs/disk=259.26G,mem=5.29G,pages=0.00G,vmem=0 
21541423.ex+     extern 2025-04-10T11:40:56   00:22:43  COMPLETED      idun-04-28   07:34:20   00:00:00          0        0.01M        0.00M                                                        cpu=00:00:00,energy=0,fs/disk=0.00G,mem=0,pages=0,vmem=

Explanation:

# CPUTime
Time used (Elapsed time * CPU count) by a job or step in HH:MM:SS format.

# TotalCPU
The sum of the SystemCPU and UserCPU time used by the job or job step.

# MaxRSS
Maximum resident set size of all tasks in job.

# ReqTRES
Trackable resources. These are the minimum resource counts requested by the job/step at submission time.

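# node%20 or ReqTRES%40
A % followed by a number sets the width of that column in characters. Useful when the text is too long for the default width; truncated values end in +, as in share-ie-+ in the first listing.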

More about sacct in the Slurm documentation: https://slurm.schedmd.com/sacct.html

Reporting

sreport generates usage reports from the Slurm accounting data. The example below lists the CPU, GPU, and billing hours used by one user (replace USERNAME with your cluster user name):

# sreport -n cluster AccountUtilizationByUser Start=2025-03-01T00:00:00 End=2025-04-01T00:00:00 -T cpu,gres/gpu,gres/gpu:p100,gres/gpu:v100,gres/gpu:a100,gres/gpu:h100,billing format=Accounts%21,Login,TRESName,Used -t hours User=USERNAME
           share-group  USERNAME            cpu      11995 
           share-group  USERNAME       gres/gpu        923 
           share-group  USERNAME  gres/gpu:p100         50 
           share-group  USERNAME  gres/gpu:v100         34 
           share-group  USERNAME  gres/gpu:a100          5 
           share-group  USERNAME  gres/gpu:h100        923 
           share-group  USERNAME        billing     882063
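
To pick a single figure out of such a report in a script, sreport can be run with -P (--parsable2), which prints pipe-delimited fields. The sketch below assumes the same four columns as above (Accounts, Login, TRESName, Used); the helper name tres_used is hypothetical, and the demonstration uses two rows copied from the report rather than calling sreport.

```shell
#!/bin/sh
# Print the Used value for one TRES from
# "Accounts|Login|TRESName|Used" lines on stdin.
tres_used() {
    awk -F'|' -v tres="$1" '$3 == tres { print $4 }'
}

# Intended use on the cluster: add -P to the sreport command above
# and pipe its output into, for example, tres_used cpu.

# Offline demonstration with two rows from the report above.
printf '%s\n' 'share-group|USERNAME|cpu|11995' 'share-group|USERNAME|gres/gpu|923' \
    | tres_used gres/gpu
```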