All news with #slurm tag

Thu, November 20, 2025

AWS PCS Adds Slurm REST API for Programmatic Job Control

🔁 AWS Parallel Computing Service (AWS PCS) now supports the Slurm REST API, enabling programmatic job submission, resource management, and cluster monitoring over HTTP. This removes reliance on CLI-only workflows and lets teams integrate HPC operations into web portals, CI/CD pipelines, and data processing frameworks. The feature is available in all AWS Regions where AWS PCS is offered, at no additional charge.
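
As a rough illustration of what job submission over HTTP can look like, here is a minimal Python sketch that builds, but does not send, a submission request in the style of Slurm's `slurmrestd`. The endpoint URL, API version string, partition name, user, and token are placeholders and not details from the announcement. The payload fields follow the general shape of the Slurm REST job description and may differ between API versions, so check them against your cluster's OpenAPI specification.

```python
import json
import urllib.request

# Assumption: the API version is a placeholder; use the version your
# cluster's slurmrestd actually serves.
API_VERSION = "v0.0.41"


def build_submit_request(base_url, user, token, script, partition, name="example-job"):
    """Build (but do not send) a job-submission request for slurmrestd."""
    payload = {
        "job": {
            "name": name,
            "partition": partition,  # hypothetical queue name
            "current_working_directory": "/tmp",
            "environment": ["PATH=/usr/bin:/bin"],
            "script": script,
        }
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/slurm/{API_VERSION}/job/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Standard slurmrestd token-authentication headers.
            "X-SLURM-USER-NAME": user,
            "X-SLURM-USER-TOKEN": token,
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request; a CI/CD job or web portal backend could wrap that call with its own error handling and token management.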

Tue, November 11, 2025

AWS PCS Adds Slurm CLI Filter Plugin Support for HPC

🛠️ AWS Parallel Computing Service (PCS) now supports Slurm CLI Filter plugins, letting administrators extend and modify how Slurm evaluates and schedules HPC jobs without changing Slurm source code. With CLI Filter plugins, you can enforce custom submission policies: validating required flags, rejecting submissions that lack required attributes, or adjusting job parameters at submission time. This capability is available in all Regions where PCS is offered.
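
To make the kind of policy described above concrete, here is a small Python sketch of the decision logic only. It is not the Slurm CLI Filter plugin interface, which has its own plugin formats. The function name, the option keys (`account`, `time`), and the default time limit are all hypothetical choices for illustration.

```python
# Hypothetical site policy: every submission must name an account, and
# jobs without a time limit get a default one.
REQUIRED_OPTIONS = ("account",)
DEFAULT_TIME_LIMIT = "01:00:00"


def apply_submission_policy(options):
    """Return (True, adjusted_options) or (False, rejection_reason)."""
    # Reject submissions that are missing a required attribute.
    missing = [key for key in REQUIRED_OPTIONS if not options.get(key)]
    if missing:
        return False, "missing required option(s): " + ", ".join(missing)

    # Adjust job parameters at submission: fill in a default time limit
    # without modifying the caller's dictionary.
    adjusted = dict(options)
    adjusted.setdefault("time", DEFAULT_TIME_LIMIT)
    return True, adjusted
```

A real plugin would express the same checks in whatever form the cluster's CLI Filter configuration expects; the sketch only shows the validate, reject, and adjust steps.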

Wed, October 22, 2025

AWS PCS Adds Slurm Cluster Secret Rotation Support

🔐 AWS Parallel Computing Service (PCS) now supports rotation of Slurm cluster secret keys using AWS Secrets Manager. Administrators can update the credentials used for authentication between the Slurm controller and compute nodes without recreating a cluster, preserving running workloads and configuration. Regular rotation reduces the risk of credential compromise and helps meet security best practices and compliance requirements. The capability is available in all Regions where PCS operates and can be initiated from the Secrets Manager console or via API after preparing the cluster for rotation.

Fri, October 17, 2025

AWS Parallel Computing Service Adds Support for Slurm v25.05

🚀 AWS Parallel Computing Service (PCS) now supports Slurm v25.05, enabling PCS clusters to run the latest Slurm capabilities. The release introduces enhanced multi-cluster sackd configuration so login nodes can manage multiple clusters without requiring sackd reconfiguration or restarts, allowing administrators to preconfigure user access across clusters. It also implements improved requeue behavior that automatically retries failed instance launches during capacity shortages, increasing scheduling resilience and overall cluster reliability.

Thu, October 2, 2025

AWS PCS Adds Slurm Node Reboot, Available in All Regions

🔁 AWS Parallel Computing Service (PCS) now supports rebooting compute nodes using Slurm commands without triggering instance replacement. You can use the scontrol reboot command with options for immediate or deferred reboots to troubleshoot, perform resource cleanup, or recover from degraded states. This capability is available in all PCS-supported AWS Regions and helps teams maintain cluster health more efficiently while reducing costs associated with unnecessary instance replacements.
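
For teams scripting node maintenance, the following Python sketch assembles an `scontrol reboot` command line from the options documented in the upstream Slurm man page (`ASAP`, `nextstate=`, `reason=`). Which of these options PCS honors, and how they correspond to the immediate and deferred modes mentioned above, is an assumption to confirm against the PCS documentation. The node names in the usage note are hypothetical.

```python
def build_reboot_command(nodes, asap=False, next_state=None, reason=None):
    """Return the argv list for an `scontrol reboot` invocation."""
    argv = ["scontrol", "reboot"]
    if asap:
        # In upstream Slurm, ASAP drains the node so that no new jobs start
        # on it and it reboots once the running jobs finish.
        argv.append("ASAP")
    if next_state:
        # State the node should enter after the reboot, e.g. RESUME or DOWN.
        argv.append(f"nextstate={next_state}")
    if reason:
        argv.append(f"reason={reason}")
    argv.append(nodes)  # node list expression, or ALL
    return argv
```

On a host with the Slurm client tools installed, the result can be passed to `subprocess.run(argv, check=True)`, for example with `build_reboot_command("compute-[1-2]", asap=True, next_state="RESUME", reason="cleanup")`.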

Thu, October 2, 2025

AWS PCS Expands Slurm Configuration with 60+ Settings

🔧 AWS Parallel Computing Service (AWS PCS) now supports over 60 additional Slurm configuration parameters, giving administrators finer control of job scheduling, resource allocation, access permissions, and job lifecycle behavior. New options include queue-specific priority policies, preemption rules, custom time and resource limits, and account-level access controls. Per-job execution behaviors and QoS tuning help run multi-team production HPC environments more efficiently. The expanded settings are available in all Regions where AWS PCS is offered.

Thu, October 2, 2025

AWS PCS Allows Dynamic Slurm Cluster Configuration

🔧 AWS Parallel Computing Service (AWS PCS) now lets you change key Slurm workload manager settings on live clusters without rebuilding them. Administrators can update accounting and workload management parameters via the AWS Management Console, AWS CLI, or AWS SDK. This reduces operational disruption and enables faster adaptation to evolving HPC requirements. The capability is available in all Regions where AWS PCS is offered.

Wed, September 17, 2025

AWS PCS Supports EC2 Capacity Blocks for ML Workloads

🔧 Amazon Web Services has added native support for EC2 Capacity Blocks in AWS Parallel Computing Service (PCS), enabling use of reserved EC2 capacity directly within PCS Slurm clusters. Capacity Blocks can be associated with PCS compute node groups via an EC2 launch template, simplifying capacity planning for GPU-based ML workloads. The feature is available in all Regions where both services are offered and aims to improve availability and predictability for GPU jobs.
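
As a sketch of the launch template side of this setup, the Python function below returns the launch template data fields that EC2 uses to target a Capacity Block: a `capacity-block` market type plus the reservation ID. The function name and the reservation ID in the test are hypothetical, and a real PCS compute node group will likely need further launch template settings that are not shown here.

```python
def capacity_block_launch_template_data(capacity_reservation_id):
    """Return launch template data that targets an EC2 Capacity Block."""
    return {
        # Capacity Blocks are requested through a dedicated market type.
        "InstanceMarketOptions": {"MarketType": "capacity-block"},
        # Point instances at the specific Capacity Block reservation.
        "CapacityReservationSpecification": {
            "CapacityReservationTarget": {
                "CapacityReservationId": capacity_reservation_id
            }
        },
    }
```

The dictionary could be supplied as the `LaunchTemplateData` argument of the EC2 `CreateLaunchTemplate` API, with the resulting template then referenced from the PCS compute node group.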
