All news with #aws eks tag
Thu, September 18, 2025
Amazon SageMaker HyperPod Adds Managed Karpenter Autoscaling
🛠️ Amazon SageMaker HyperPod now supports managed node autoscaling using Karpenter, enabling automated cluster scaling for both inference and training workloads. This managed capability removes the operational burden of installing and maintaining autoscaling infrastructure while providing integrated resilience and fault tolerance. Customers gain just-in-time GPU provisioning, scale-to-zero during low demand, workload-aware instance selection, and cost reductions through intelligent consolidation.
Tue, September 16, 2025
Amazon EKS Adds Community Add-Ons Catalog for GovCloud
🔒 Amazon EKS now offers a curated catalog of community add-ons for AWS GovCloud (US) Regions. The catalog includes popular open-source components such as metrics-server, kube-state-metrics, cert-manager, prometheus-node-exporter, fluent-bit, and external-dns, all packaged, scanned, and validated for compatibility by EKS. Container images are hosted in an EKS-owned private ECR repository, and you can install and manage add-ons via the EKS Console, API, CLI, eksctl, or infrastructure-as-code tools like AWS CloudFormation.
Mon, September 15, 2025
Amazon GuardDuty Protection Plans and Threat Detection
🔐 Amazon GuardDuty centralizes continuous threat detection across AWS using AI/ML and integrated threat intelligence. It offers optional protection plans—S3, EKS, Runtime Monitoring, Malware Protection for EC2 and S3, RDS, and Lambda—that extend detections to service-specific telemetry and runtime behaviors. Built-in Extended Threat Detection correlates signals into high-confidence attack sequences and maps findings to MITRE ATT&CK, providing prioritized remediation guidance.
Wed, September 10, 2025
CloudWatch Flow Monitors Extend Cross-Region Visibility
🔍 Amazon CloudWatch Network Monitoring flow monitors can now observe traffic between AWS Regions over the AWS global network. Flow monitors deliver near real-time metrics for compute instances such as Amazon EC2 and Amazon EKS, and for services like Amazon S3 and Amazon DynamoDB, to help detect and attribute network-driven impairments. The network health indicator now captures cross-Region path health, including visibility into remote public IPs and private traffic over VPC and Transit Gateway peering.
Mon, September 8, 2025
Managed Tiered Checkpointing for Amazon SageMaker HyperPod
⚡ Amazon Web Services has announced general availability of managed tiered checkpointing for Amazon SageMaker HyperPod, a hybrid checkpointing capability that caches frequent checkpoints in CPU memory and periodically persists them to Amazon S3 for durability. The approach reduces model recovery time and minimizes training progress loss on large-scale clusters. It integrates with PyTorch Distributed Checkpoint (DCP) and is enabled via a CreateCluster/UpdateCluster API parameter; customers can use the sagemaker-checkpointing Python library to adopt it with minimal code changes. The feature is currently available for HyperPod clusters using the EKS orchestrator.
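To illustrate the two-tier idea described above, here is a minimal conceptual sketch in plain Python. It is not the sagemaker-checkpointing library or the HyperPod API: the class TieredCheckpointer, its methods, and the persist_every parameter are hypothetical names invented for this example, and ordinary dictionaries stand in for CPU memory and Amazon S3.

```python
from typing import Any, Dict, Optional, Tuple


class TieredCheckpointer:
    """Toy two-tier checkpointer (hypothetical, for illustration only).

    Every save lands in a fast in-memory tier; every Nth save is also
    copied to a slower durable tier.
    """

    def __init__(self, persist_every: int) -> None:
        if persist_every < 1:
            raise ValueError("persist_every must be >= 1")
        self.persist_every = persist_every
        self.memory_tier: Dict[int, Dict[str, Any]] = {}   # stand-in for CPU memory
        self.durable_tier: Dict[int, Dict[str, Any]] = {}  # stand-in for Amazon S3
        self._saves = 0

    def save(self, step: int, state: Dict[str, Any]) -> None:
        snapshot = dict(state)               # shallow copy of the training state
        self.memory_tier = {step: snapshot}  # keep only the newest fast copy
        self._saves += 1
        if self._saves % self.persist_every == 0:
            self.durable_tier[step] = snapshot  # periodic durable copy

    def simulate_memory_loss(self) -> None:
        """Drop the fast tier, as if the in-memory copies were lost."""
        self.memory_tier.clear()

    def latest(self) -> Optional[Tuple[int, Dict[str, Any]]]:
        """Return the newest checkpoint, preferring the fast tier."""
        for tier in (self.memory_tier, self.durable_tier):
            if tier:
                step = max(tier)
                return step, tier[step]
        return None


ckpt = TieredCheckpointer(persist_every=3)
for step in range(100, 600, 100):        # saves at steps 100, 200, 300, 400, 500
    ckpt.save(step, {"loss": 1.0 / step})

fast_step, _ = ckpt.latest()             # newest copy comes from memory
ckpt.simulate_memory_loss()
slow_step, _ = ckpt.latest()             # falls back to the durable tier
print(fast_step, slow_step)              # prints "500 300"
```

In this toy run, recovery from memory loses no progress, while falling back to the durable tier loses the two saves made since the last persisted checkpoint. That gap is the trade-off the persistence interval controls.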
Tue, September 2, 2025
AWS Split Cost Allocation Adds GPU and Accelerator Cost Tracking
🔍 Split Cost Allocation Data now supports accelerator-based workloads running in Amazon Elastic Kubernetes Service (EKS), allowing customers to track costs for Trainium, Inferentia, NVIDIA and AMD GPUs alongside CPU and memory. Cost details are included in the AWS Cost and Usage Report (including CUR 2.0) and can be visualized using the Containers Cost Allocation dashboard in Amazon QuickSight or queried with Amazon Athena. New customers can enable the feature in the Billing and Cost Management console; it is automatically enabled for existing Split Cost Allocation Data customers.
Wed, August 27, 2025
Amazon EKS Adds On-Demand Cluster Insights Refresh
🔁 Amazon EKS now supports on-demand refresh of cluster insights, enabling operators to retrieve the latest detection results immediately after making changes. The capability complements existing periodic checks that identify upgrade warnings and configuration recommendations. By allowing immediate verification, teams can accelerate upgrade testing, confirm that remediations took effect, and shorten the feedback loop for cluster configuration changes.
Wed, August 27, 2025
SageMaker HyperPod Supports EBS CSI Driver for Storage
🔧 Amazon SageMaker HyperPod now supports the Amazon Elastic Block Store (EBS) Container Storage Interface (CSI) driver, enabling dynamic provisioning and lifecycle management of persistent EBS volumes for machine learning workloads on HyperPod EKS clusters. Through standard Kubernetes persistent volume claims and storage classes, teams can create, attach, resize, snapshot, and encrypt volumes (including with customer-managed KMS keys), and volumes persist across pod restarts and node replacements. Install the EBS CSI driver as an EKS add-on to get started; the capability is available in all Regions where HyperPod EKS clusters are supported.
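As a rough sketch of the storage class and persistent volume claim workflow mentioned above, the Python snippet below builds the two Kubernetes manifests as dictionaries and prints them as a JSON list that kubectl can apply. The helper names, the resource names (hyperpod-gp3, training-scratch), the gp3 volume type, and the 200Gi size are illustrative assumptions, not values taken from the announcement; ebs.csi.aws.com is the provisioner name used by the EBS CSI driver.

```python
import json
from typing import Any, Dict, Optional


def ebs_storage_class(name: str, kms_key_id: Optional[str] = None) -> Dict[str, Any]:
    """Build a StorageClass manifest for encrypted, expandable EBS volumes."""
    parameters = {"type": "gp3", "encrypted": "true"}  # gp3 is an assumed choice
    if kms_key_id:
        parameters["kmsKeyId"] = kms_key_id  # optional customer-managed KMS key
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "ebs.csi.aws.com",
        "parameters": parameters,
        "allowVolumeExpansion": True,  # lets a claim be resized later
        "volumeBindingMode": "WaitForFirstConsumer",
    }


def volume_claim(name: str, storage_class: str, size: str) -> Dict[str, Any]:
    """Build a PersistentVolumeClaim that requests a volume from the class."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical names and size, chosen only for the example.
manifests = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [
        ebs_storage_class("hyperpod-gp3"),
        volume_claim("training-scratch", "hyperpod-gp3", "200Gi"),
    ],
}
print(json.dumps(manifests, indent=2))
```

Saving the output to a file and running kubectl apply -f on it would create both objects, assuming the EBS CSI driver add-on is already installed on the cluster.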
Fri, August 22, 2025
Amazon EKS Adds Namespace Configuration for Add-Ons
🔧 Amazon Elastic Kubernetes Service (Amazon EKS) now allows you to select a custom Kubernetes namespace when installing both AWS and Community add-ons, giving operators finer control over object organization and isolation within clusters. You can install add-ons into a chosen namespace via the AWS Console, EKS APIs, AWS CLI, or infrastructure-as-code tools like CloudFormation. Note that to move an installed add-on to a different namespace you must remove and recreate it. This capability is available in all commercial AWS Regions.
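For readers who script add-on installs, the sketch below shows one way the request could be assembled in Python before handing it to an EKS client. The namespaceConfig field name and its shape are an assumption based on this announcement and should be checked against the current EKS CreateAddon API reference; the helper name and the cluster, add-on, and namespace values are hypothetical. The snippet only builds the request and does not call AWS.

```python
from typing import Any, Dict


def build_addon_request(cluster: str, addon: str, namespace: str) -> Dict[str, Any]:
    """Assemble keyword arguments for an EKS create-add-on call.

    ASSUMPTION: the custom namespace is passed as
    namespaceConfig={"namespace": ...}; verify against the API reference.
    """
    if not namespace:
        raise ValueError("namespace must be a non-empty string")
    return {
        "clusterName": cluster,
        "addonName": addon,
        "namespaceConfig": {"namespace": namespace},
    }


# Hypothetical cluster, add-on, and namespace names.
request = build_addon_request("demo-cluster", "metrics-server", "observability")
print(request)

# With boto3 installed and credentials configured, the request could then be
# sent with something like:
#   import boto3
#   boto3.client("eks").create_addon(**request)
```

Because an installed add-on cannot be moved between namespaces in place, changing the namespace later would mean deleting the add-on and creating it again with a new request.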