All news with #aws eks tag

Sun, November 30, 2025

Amazon EKS Capabilities: Managed Kubernetes Platform

🚀 Amazon EKS Capabilities is now generally available, offering a fully managed, extensible set of Kubernetes-native platform features that offload operations to AWS. The capabilities run in AWS-owned infrastructure, separate from customer clusters, and AWS handles autoscaling, patching, and upgrades. Launch features include Argo CD for continuous deployment, AWS Controllers for Kubernetes (ACK) for resource management, and Kube Resource Orchestrator (KRO) for dynamic orchestration.

Wed, November 26, 2025

AWS Secrets Store CSI Driver Add-on for Amazon EKS

🔐 This post introduces the AWS provider for the Secrets Store CSI Driver and the new Amazon EKS add-on that mounts Secrets Manager secrets and Systems Manager parameters as files in Kubernetes pods. The add-on simplifies installation compared with Helm or kubectl, supports EC2 and hybrid nodes, and includes security patches and FIPS endpoint options. The walkthrough covers prerequisites, creating a test secret, installing the add-on, configuring an IAM role and EKS Pod Identity association, deploying an example pod that mounts the secret at /mnt/secrets-store, validating retrieval, and cleaning up resources.
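Once the example pod is running, the application reads the secret like any other file. A minimal sketch, assuming the mount path /mnt/secrets-store from the walkthrough; the helper name and the secret file name are hypothetical and not taken from the post:

```python
from pathlib import Path


def read_mounted_secret(name: str, mount_dir: str = "/mnt/secrets-store") -> str:
    """Return the contents of a secret that was mounted into the pod as a file.

    `name` is the file name under the mount directory (hypothetical here);
    the default mount path is the one used in the walkthrough.
    """
    path = Path(mount_dir) / name
    # The value is plain file content, so no AWS SDK call is needed in the pod.
    return path.read_text(encoding="utf-8")
```

Because the value arrives as a file, the application code needs no AWS credentials of its own; access is governed by the IAM role bound through the Pod Identity association.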

Wed, November 26, 2025

SageMaker HyperPod Adds Custom Kubernetes Labels and Taints

🛠️ Amazon SageMaker HyperPod now supports custom Kubernetes labels and taints configured at the instance group level via the CreateCluster and UpdateCluster APIs. You can specify up to 50 labels and 50 taints per instance group using the KubernetesConfig parameter. HyperPod automatically applies and preserves these settings across node creation, replacement, scaling, and patching, eliminating manual kubectl work and ensuring device plugin pods (EFA, NVIDIA) schedule correctly while allowing NoSchedule taints to protect costly GPU nodes.
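The per-group limits can be checked client-side before a request is sent. The sketch below is illustrative only: the dictionary shape (`Labels`, `Taints`, `Key`, `Value`, `Effect`) is an assumption rather than the confirmed layout of the KubernetesConfig parameter, and the helper name is hypothetical. The three taint effects are the standard Kubernetes ones.

```python
MAX_LABELS = 50  # limit per instance group stated in the announcement
MAX_TAINTS = 50  # limit per instance group stated in the announcement
VALID_EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}  # Kubernetes taint effects


def build_kubernetes_config(labels: dict, taints: list) -> dict:
    """Validate labels and taints, then return an assumed KubernetesConfig-style dict."""
    if len(labels) > MAX_LABELS:
        raise ValueError(f"at most {MAX_LABELS} labels per instance group")
    if len(taints) > MAX_TAINTS:
        raise ValueError(f"at most {MAX_TAINTS} taints per instance group")
    for taint in taints:
        if taint.get("Effect") not in VALID_EFFECTS:
            raise ValueError(f"unknown taint effect: {taint.get('Effect')!r}")
    # Field names below are assumptions made for illustration.
    return {"Labels": dict(labels), "Taints": [dict(t) for t in taints]}
```

For example, a GPU group might carry a label such as `{"workload": "training"}` plus a single `NoSchedule` taint so that only pods with a matching toleration land on those nodes.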

Wed, November 26, 2025

Amazon SageMaker HyperPod: Programmatic Node Recovery

🚀 Amazon SageMaker HyperPod now offers programmatic APIs that let administrators reboot or replace cluster nodes at scale. The BatchRebootClusterNodes and BatchReplaceClusterNodes APIs provide an orchestrator-agnostic way to recover unresponsive or degraded nodes for both Slurm and EKS clusters. Each API supports batch operations for up to 25 instances and complements existing orchestrator-specific workflows. The capabilities are currently available in US East (Ohio), Asia Pacific (Mumbai), and Asia Pacific (Tokyo) and are accessible via the AWS CLI, SDKs, or API calls.
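Since each call accepts at most 25 instances, recovering a larger set of nodes means splitting the list into batches first. A minimal sketch of that splitting step; the helper name is hypothetical, and the actual API call is deliberately left out because its exact request shape is not given here:

```python
from typing import Iterator, Sequence

MAX_BATCH = 25  # per-call instance limit stated in the announcement


def batches(node_ids: Sequence[str], size: int = MAX_BATCH) -> Iterator[list]:
    """Yield consecutive slices of node_ids, each with at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(node_ids), size):
        yield list(node_ids[start:start + size])
```

Each yielded batch would then be passed to one reboot or replace request, with any per-node failures collected and retried separately.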

Tue, November 25, 2025

Manage SageMaker HyperPod Clusters with AI MCP Server

🔧 The Amazon SageMaker AI MCP Server now provides tools to set up and manage HyperPod clusters, allowing AI coding assistants to provision and operate clusters for distributed training, fine-tuning, and deployment. It automates prerequisites and orchestrates clusters via Amazon EKS or Slurm with CloudFormation templates that optimize networking, storage, and compute. The server also handles lifecycle operations such as scaling, patching, and diagnostics, so administrators and data scientists can manage large-scale AI/ML clusters without deep infrastructure expertise.

Mon, November 24, 2025

Amazon SageMaker HyperPod Adds Spot Instance Support

⚡ Amazon SageMaker HyperPod now supports Spot Instances, enabling customers to reduce GPU compute costs by up to 90% compared with on-demand instances. The integration is available on HyperPod EKS clusters and works with Karpenter for intelligent autoscaling, automatic Spot capacity discovery, and interruption handling. You can enable Spot when creating instance groups via the CreateCluster API or the AWS Console, and the feature supports all HyperPod instance types across available regions.

Fri, November 21, 2025

Amazon EKS add-on: AWS Secrets Store CSI Driver Provider

🔐 AWS has announced general availability of the Amazon EKS add-on for the AWS Secrets Store CSI Driver provider, enabling clusters to mount secrets from AWS Secrets Manager and parameters from AWS Systems Manager Parameter Store as files on Kubernetes workloads. The add-on installs and manages the AWS provider component and supports automated setup and lifecycle management for new and existing Amazon EKS clusters. It is available in all AWS commercial and AWS GovCloud (US) Regions.

Fri, November 21, 2025

Amazon EKS Provisioned Control Plane for High Performance

🚀 Amazon EKS introduced Provisioned Control Plane, letting customers select pre-defined control plane capacity tiers for new or existing clusters via APIs, the AWS Console, or infrastructure-as-code. The feature pre-provisions capacity to deliver predictable, low-latency control plane performance during traffic spikes and unpredictable bursts. It unlocks higher cluster scalability for ultra-scale workloads such as AI training, high-performance computing, and large data processing, and helps align development, staging, production, and disaster recovery behavior.

Fri, November 21, 2025

CloudWatch Container Insights Supports Neuron UltraServers

🔍 Amazon CloudWatch Container Insights now supports Neuron UltraServers on Amazon EKS, enabling aggregated observability for multi-instance ML servers. The update adds a new UltraServer ID filter that presents consolidated metrics across all instances in a logical UltraServer group while retaining per-instance visibility. Available in all commercial AWS Regions and AWS GovCloud (US), this simplifies monitoring and troubleshooting for Trainium and Inferentia workloads.

Fri, November 21, 2025

Amazon ECS and EKS Add AI-Powered Troubleshooting in Console

🔍 The AWS Management Console now integrates Amazon Q Developer AI-assisted troubleshooting directly into Amazon ECS and Amazon EKS. Contextual 'Inspect with Amazon Q' controls appear alongside error and status messages to gather relevant logs and metrics, analyze root causes, and present one-click mitigation suggestions. The experience covers failed tasks, container health checks, deployment rollbacks, cluster and node health, and Kubernetes pod events, and is available in all AWS commercial regions.

Fri, November 21, 2025

CloudWatch Container Insights: Sub-Minute GPU Metrics

🔍 Amazon CloudWatch Container Insights now supports configurable sub-minute GPU sampling for Amazon EKS, enabling GPU metrics to be collected at a per-second sample rate and aggregated to CloudWatch once per minute. This enhancement gives teams finer visibility into short-lived AI/ML inference and GPU-intensive workloads, helping to optimize resource utilization, troubleshoot performance issues, and improve operational efficiency for containerized GPU applications. The feature is available in all AWS Commercial Regions and AWS GovCloud (US) Regions at no additional cost.
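To make the sampling model concrete, the sketch below rolls per-second readings up into one summary per minute. It only illustrates the idea of sub-minute sampling with per-minute aggregation; it is not the CloudWatch agent's implementation, and the function name and the choice of statistics are assumptions.

```python
from collections import defaultdict
from statistics import mean


def aggregate_per_minute(samples):
    """Group (epoch_seconds, gpu_utilization) samples into per-minute summaries.

    Returns a dict keyed by the minute's starting epoch second.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        minute_start = int(ts) // 60 * 60  # floor the timestamp to its minute
        buckets[minute_start].append(value)
    return {
        minute: {
            "min": min(values),
            "max": max(values),
            "avg": mean(values),
            "count": len(values),
        }
        for minute, values in sorted(buckets.items())
    }
```

A short burst that peaks for a few seconds still shows up in the minute's `max`, which is the kind of short-lived activity a once-per-minute sample can miss.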

Fri, November 21, 2025

Amazon SageMaker HyperPod Adds IDE and Notebook Support

🚀 Amazon SageMaker HyperPod now supports running IDEs and Notebooks on persistent EKS-based HyperPod clusters, allowing developers to run JupyterLab, Code Editor, or connect local IDEs directly to GPU-backed compute. Developers can share data across interactive sessions and training jobs via mounted file systems such as FSx and EFS, and use familiar tools including the HyperPod CLI. Administrators gain unified governance through HyperPod Task Governance and visibility into CPU, GPU, and memory consumption via HyperPod Observability, helping optimize cluster utilization. The feature is available in all AWS Regions that support HyperPod, excluding China and GovCloud (US).

Thu, November 20, 2025

Transfer Data Across AWS Partitions with Roles Anywhere

🔐 AWS outlines replacing cross-partition IAM user keys with IAM Roles Anywhere to securely transfer data between AWS partitions. The post explains partition isolation (Commercial, GovCloud, China), why long-lived access keys are discouraged, and how IAM Roles Anywhere uses X.509 certificates and temporary credentials. It also covers using an external CA or AWS Private CA to issue and manage certificates for workloads.

Thu, November 20, 2025

SageMaker Unified Studio: Long-Running Sessions with Corporate IDs

⏳ Amazon SageMaker Unified Studio now supports long-running background sessions using corporate identities via AWS IAM Identity Center's trusted identity propagation (TIP). Users can launch interactive notebooks and data processing on SageMaker, Amazon EMR, and AWS Glue that persist when they log off or experience network or credential interruptions. Sessions retain corporate permissions and can run up to 90 days (default 7 days), reducing the need for continuous monitoring and improving productivity for multi-hour or multi-day workloads.

Wed, November 19, 2025

Amazon EKS Adds Enhanced Container Network Observability

🔍 Amazon EKS now delivers enhanced container network observability with granular, network-related metrics and integrated console visualizations to help teams monitor and troubleshoot Kubernetes networking on AWS. Powered by Amazon CloudWatch Network Flow Monitor, the capabilities reveal cross-AZ flows, top-talkers, retransmissions, and retransmission timeouts for faster root cause analysis. Teams can ingest metrics into their preferred observability stacks and use the console views to eliminate blind spots during incidents. These features are available in all commercial Regions where CloudWatch Network Flow Monitor is offered.

Wed, November 12, 2025

Amazon EKS Independent Validation of Zero-Operator Access

🔒 AWS announced an independent affirmation of the Amazon EKS zero operator access design, validated by cybersecurity firm NCC Group. The review found no architectural gaps and confirmed that AWS personnel lack technical means to access or manipulate customer content in managed Kubernetes control planes or etcd backups. AWS highlights Nitro-based confidential compute, tightly scoped administrative APIs with multi-party change approval, mandatory logging and auditing, and envelope encryption for etcd as core protections. Customers retain visibility via cluster audit logs and remain responsible for securing worker node configurations outside managed modes.

Mon, November 10, 2025

AWS Backup Adds Native Support for Amazon EKS Across Regions

🔒 AWS Backup now supports Amazon EKS, providing a fully managed, centralized solution for backing up cluster state and persistent application data. The agent-free integration replaces custom scripts and third-party tools with a native, policy-driven service that offers automated scheduling, retention management, immutable vaults, and cross-Region and cross-account copies. You can restore entire clusters, specific namespaces, or individual persistent volumes to support disaster recovery, compliance, or pre-upgrade protection.

Thu, November 6, 2025

CloudWatch Application Signals Now in AWS GovCloud

🔒 CloudWatch Application Signals is now available in AWS GovCloud (US-East) and AWS GovCloud (US-West), extending automated application observability to government and regulated workloads. The service automatically collects telemetry from Amazon EC2, Amazon ECS, Amazon EKS and AWS Lambda to provide real-time health, dependency visualization and anomaly detection. By eliminating manual instrumentation, it helps teams meet compliance and monitoring requirements while improving incident detection and resolution. For pricing and setup, consult the CloudWatch pricing page and Application Signals documentation.

Wed, October 22, 2025

Amazon EKS Auto Mode Adds FIPS Support in GovCloud

🔐 Amazon Elastic Kubernetes Service (EKS) Auto Mode is now available in AWS GovCloud (US-East) and AWS GovCloud (US-West), automating compute, storage, and networking management for Kubernetes clusters. Its AMIs include FIPS-validated cryptographic modules to help meet FedRAMP-style requirements. EKS Auto Mode handles OS patching, leverages ephemeral compute to reduce persistent attack surface, and dynamically scales EC2 instances to optimize costs while maintaining availability. It supports clusters running Kubernetes 1.29 and later and carries no upfront fees.

Tue, October 14, 2025

AWS for Fluent Bit 3.0.0: Based on Fluent Bit 4.1.0

🚀 AWS for Fluent Bit 3.0.0, based on Fluent Bit 4.1.0 and Amazon Linux 2023, delivers faster, more secure container logging for Amazon ECS and Amazon EKS. It adds native OpenTelemetry (OTel) support for OTLP logs, metrics, and traces with SigV4 authentication and faster JSON parsing for higher throughput and lower latency. TLS minimum version and cipher controls enforce stronger output security. The image is available in the Amazon ECR Public Gallery and Amazon ECR, and source code and guidance are provided on GitHub.
