Managing group resource use

PIs may request that they, or a designated user within their group, be made a Coordinator for their group. To request this, send an email to hpc@umass.edu.

A Coordinator in Slurm is like a “power user” for a group (a Slurm “account”). As a Coordinator you can:

  • See and manage jobs for users in your account
  • Adjust limits that control how much of the cluster your group members can use

You do not have full administrator powers, but you can help keep your group’s jobs under control and share resources fairly.
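
If you are unsure which accounts you coordinate, sacctmgr can tell you; the withcoord option lists coordinated accounts alongside your user record:

# List the accounts you are a coordinator for
sacctmgr show user $USER withcoord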

Seeing jobs in your account

Use squeue to see which jobs your group members are running:

# See jobs for a specific account (e.g., "pi_groupname")
squeue -A pi_groupname

Common helpful filters:

# See all running jobs in your account
squeue -A pi_groupname -t RUNNING

# See pending (waiting) jobs in your account
squeue -A pi_groupname -t PENDING

# See jobs of a specific user in your account
squeue -A pi_groupname -u someuser
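
squeue output can also be customized to give a quick per-job resource view; the format string below is just one reasonable choice:

# Job ID, user, state, elapsed time, and allocated CPUs for your account
squeue -A pi_groupname -o "%.10i %.9u %.8T %.10M %.5C"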

Managing other users’ jobs

As a Coordinator for pi_groupname, you can hold, release, or cancel jobs submitted under that account, including jobs owned by other users in your group.

Cancel jobs

Cancel a job by job ID:

# Cancel a single job
scancel 123456

# Cancel all jobs from a specific user in your account
scancel -u someuser -A pi_groupname

# Cancel all pending jobs in your account
scancel -t PENDING -A pi_groupname

Use cases:

  • A user accidentally submitted 1,000 jobs instead of 10.
  • A job is clearly stuck (no progress, wrong partition, etc.).
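
Before a bulk cancel, it can help to preview exactly which job IDs would be affected; one common shell pattern is to list the IDs first and only then pipe them to scancel:

# Preview the job IDs (-h suppresses the header)
squeue -A pi_groupname -u someuser -h -o "%i"

# Then cancel that same list
squeue -A pi_groupname -u someuser -h -o "%i" | xargs -r scancel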

Hold and release jobs

You can hold a job so it does not start, and release it later.

# Hold a job (prevent it from starting)
scontrol hold 123456

# Release a held job
scontrol release 123456

Typical reasons to hold:

  • A student has submitted a very large job right before a deadline and you want to wait until other jobs finish.
  • You want the user to fix their code before the job starts (note that changes to the batch script itself require cancelling and resubmitting the job).
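
To verify that a job is held, check its state and reason code; held jobs typically report a reason such as JobHeldUser or JobHeldAdmin:

# %T shows the job state, %r the reason it is not running
squeue -j 123456 -o "%.10i %.8T %r"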

Viewing and editing user limits in your account

As Coordinator, you can adjust resource limits for users under your account using sacctmgr. These limits control how much of the cluster each user can use at once, for example:

  • Maximum jobs running at once
  • Maximum CPUs/cores in use
  • Maximum GPUs in use
  • Maximum jobs per user in a specific account

Check current user limits

To see the limits for a user under your account:

# Show user info, scoped to your account
sacctmgr show user someuser withassoc format=User,Account,GrpJobs,GrpTRES,MaxJobs,MaxTRES

You may see fields like:

  • MaxJobs – the maximum number of this user’s jobs that can run at once
  • MaxTRES – the maximum trackable resources (TRES, such as CPUs, GPUs, or memory) a single job can be allocated
  • GrpJobs / GrpTRES – aggregate limits for the association: set on a user’s association they cap everything that user has running at once; set on the account itself they cap the entire group
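
To review limits for every user in your account at once, you can list the account’s associations instead of querying users one by one (adjust the format fields to taste):

# All user associations under your account, with their limits
sacctmgr show assoc where account=pi_groupname format=Account,User,MaxJobs,MaxTRES,GrpTRES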

Examples of changing user limits

Use sacctmgr to modify limits. The commands below assume your account is pi_groupname.

Example: Limit how many jobs a user can run at once

Scenario: User alice keeps submitting hundreds of short jobs that overwhelm the queue. You want to limit her to 20 running jobs at any time in account pi_groupname.

# Set MaxJobs=20 for user alice in account pi_groupname
sacctmgr modify user where name=alice account=pi_groupname set MaxJobs=20

Verify:

sacctmgr show user alice withassoc format=User,Account,MaxJobs

Result: Alice can still submit more than 20 jobs, but Slurm will only allow 20 to run simultaneously. The rest will stay pending.
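
If you want to confirm the limit is taking effect, check the reason code on alice’s pending jobs; jobs blocked by an association limit typically show a reason such as AssocMaxJobsLimit:

# %T is the job state, %r the reason the job is pending
squeue -u alice -t PENDING -o "%.10i %.8T %r"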

Example: Cap the total CPUs a user can use at once

Scenario: User bob submits a few very large multi-core jobs that monopolize the cluster. You want to limit him to 64 CPUs at a time in pi_groupname.

# Limit bob to 64 CPUs in total across all of his running jobs in account pi_groupname
sacctmgr modify user where name=bob account=pi_groupname set GrpTRES=cpu=64

Check:

sacctmgr show user bob withassoc format=User,Account,GrpTRES

Result: If bob already has 64 CPUs in use, any new job asking for more CPUs will stay pending until some CPUs free up.
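
To remove a cap later, set the TRES value to -1, which clears that limit:

# Clear bob's CPU cap (-1 removes the limit)
sacctmgr modify user where name=bob account=pi_groupname set GrpTRES=cpu=-1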

Example: Limit total memory usage for a user

Scenario: User carol runs several memory-heavy jobs that cause memory pressure. You want to limit her to 256 GB of RAM at once.

# Limit carol to 256G of total memory across her running jobs
sacctmgr modify user where name=carol account=pi_groupname set GrpTRES=mem=256G

Check:

sacctmgr show user carol withassoc format=User,Account,GrpTRES

Result: Once Carol reaches 256G of allocated memory across her jobs, additional memory-demanding jobs will wait.
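
To see how much memory carol’s running jobs are currently holding, squeue can print the per-job memory request:

# %m shows the memory requested by each job
squeue -u carol -t RUNNING -o "%.10i %.8T %.10m"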

Tip: TRES limits can combine multiple resources in one setting, e.g.:

sacctmgr modify user where name=carol account=pi_groupname \
         set GrpTRES=cpu=64,mem=256G

Example: Temporary stricter limits for a whole class

Scenario: For a course group, you want to restrict each student to 4 jobs and 16 CPUs in your account ds532_school_edu.

# Limit max running jobs
sacctmgr modify user where account=ds532_school_edu set MaxJobs=4

# Limit total CPUs in use per student
sacctmgr modify user where account=ds532_school_edu set GrpTRES=cpu=16

Limits can also be set on a per-user basis by adding name=student_username to the where clause, as in the earlier examples. You can relax those limits later by raising them or clearing them (set the value to -1).
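
If you need to apply per-user limits to a whole roster at once, a small shell loop works. This is just a sketch; students.txt is a hypothetical file with one username per line:

# -i (immediate) commits each change without an interactive prompt
# students.txt is a hypothetical roster file, one username per line
while read -r student; do
    sacctmgr -i modify user where name="$student" account=ds532_school_edu \
             set MaxJobs=4 GrpTRES=cpu=16
done < students.txt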

Example: Limit how many A100 GPUs a user can use

Scenario: User dave is running multiple GPU jobs and tends to grab all the A100 GPUs. You want to limit him to 2 A100 GPUs total at any time in account mygroup.

# Limit dave to 2 A100 GPUs in total across all running jobs in account mygroup
sacctmgr modify user where name=dave account=mygroup set GrpTRES=gres/gpu:a100=2

Check:

sacctmgr show user dave withassoc format=User,Account,GrpTRES

Result: If dave already has 2 A100 GPUs allocated (e.g., one job with --gres=gpu:a100:2 or two jobs with --gres=gpu:a100:1), any additional job that requests an A100 GPU will remain PENDING until one of his GPU jobs finishes.

You can also combine GPU limits with CPU/memory limits in one command, for example:

sacctmgr modify user where name=dave account=mygroup \
    set GrpTRES=cpu=32,mem=128G,gres/gpu:a100=2

This keeps dave within 32 CPUs, 128G RAM, and 2 A100 GPUs simultaneously.
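
When verifying, note that long TRES strings can be truncated at the default column width; a %<width> suffix on a format field widens it:

# Widen the GrpTRES column to 40 characters so nothing is cut off
sacctmgr show user dave withassoc format=User,Account,GrpTRES%40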

Balancing fairness and usability

When changing limits, consider:

  • Fairness: Ensure a single user cannot block the whole group.
  • Flexibility: For advanced users running large but important jobs, coordinate with them so limits are reasonable.
  • Transparency: Tell users what limits are in place and why (e.g., “You’re limited to 32 CPUs so everyone gets a share”).
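
To ground these conversations in data, sshare reports each member’s recent usage and fair-share standing under your account:

# Show fair-share usage for every user in your account
sshare -A pi_groupname -a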

Summary of useful commands

View jobs:

squeue -A pi_groupname
squeue -A pi_groupname -u someuser

Manage jobs:

scancel 123456           # Cancel job
scancel -A pi_groupname -u X  # Cancel all jobs of user X in your account
scontrol hold 123456     # Hold job
scontrol release 123456  # Release job

View limits:

sacctmgr show user someuser withassoc format=User,Account,MaxJobs,MaxTRES,GrpTRES

Change limits (examples):

# Max 20 running jobs:
sacctmgr modify user where name=alice account=pi_groupname set MaxJobs=20

# Max 64 CPUs in use at once:
sacctmgr modify user where name=bob account=pi_groupname set GrpTRES=cpu=64

# Max GPUs in use at once (2 A100s, 4 GPUs of any type):
sacctmgr modify user where name=carol account=pi_groupname set GrpTRES=gres/gpu:a100=2,gres/gpu=4