<ciso brief />

All news with #kubernetes tag

Kubernetes introduces control-plane minor-version rollback

🔁 Google and the Kubernetes community introduced control-plane minor-version rollback in Kubernetes 1.33, giving operators a safe, observable path to revert control-plane upgrades. The new KEP-4330 emulated-version model separates binary upgrades from API and storage transitions into a two-step process, enabling validation before committing changes. This capability is available in open-source Kubernetes and will be generally available in GKE 1.33 soon, reducing upgrade risk and shortening recovery time from unexpected regressions.
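
The two-step flow described above can be sketched as a toy state model. This is an illustration only, not the Kubernetes implementation: the `ControlPlane` class and its method names are hypothetical, and the rule that rollback is possible only before the emulated version is raised is a simplifying assumption drawn from the summary.

```python
from dataclasses import dataclass
from typing import Tuple

Version = Tuple[int, int]  # (major, minor)


@dataclass
class ControlPlane:
    """Hypothetical toy model: a binary version plus the version it emulates."""
    binary: Version
    emulated: Version

    def upgrade_binary(self, target: Version) -> None:
        # Step 1: swap the binary but keep serving the old API/storage behaviour.
        if target <= self.binary:
            raise ValueError("target must be newer than the current binary")
        self.binary = target

    def commit(self) -> None:
        # Step 2: after validation, move API and storage behaviour to the new version.
        self.emulated = self.binary

    def can_roll_back(self) -> bool:
        # Assumption: reverting is safe only while the old behaviour is still emulated.
        return self.emulated < self.binary

    def roll_back(self) -> None:
        if not self.can_roll_back():
            raise RuntimeError("already committed; nothing to roll back to")
        self.binary = self.emulated


cp = ControlPlane(binary=(1, 32), emulated=(1, 32))
cp.upgrade_binary((1, 33))                 # new binary, old behaviour
rollback_after_step_one = cp.can_roll_back()
cp.commit()                                # new behaviour enabled
rollback_after_commit = cp.can_roll_back()
```

Keeping the binary swap and the behaviour change as separate transitions is what gives operators a window to validate before committing.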

How Google Cloud Networking Supports AI Workloads at Scale

🔗 Networking is a critical enabler for AI on Google Cloud, connecting models, storage, and inference endpoints while preserving security and performance. The post outlines seven capabilities—from private API access and RDMA-backed GPU interconnects to hybrid Cross-Cloud links—that reduce latency, prevent data exfiltration, and simplify model serving. It also highlights options for exposing inference (managed services, GKE, load balancing) and previews AI-driven network operations using Gemini.

EKS Split Cost Allocation Now Imports Pod Labels for Billing

🔖 Starting today, Split Cost Allocation Data for Amazon EKS can import up to 50 Kubernetes custom labels per pod as cost allocation tags. You can attribute pod-level costs in the AWS Cost and Usage Report (CUR) using labels such as cost center, application, business unit, and environment. New customers enable the feature in the AWS Billing and Cost Management console; existing customers will have labels automatically imported but must activate them as cost allocation tags. After activation, labels appear in CUR within 24 hours and can be visualized via the Containers Cost Allocation dashboard in Amazon QuickSight or queried with Amazon Athena.

Deploying AWS Secrets Manager Agent as an EKS Sidecar

🔒 This post demonstrates deploying the AWS Secrets Manager Agent as a sidecar container in Amazon EKS to provide a language-agnostic local HTTP interface (localhost:2773) for secrets retrieval. The agent pulls and caches secret values, reducing direct API calls to Secrets Manager and improving application availability. It enforces SSRF protection via a generated token at /var/run/awssmatoken and implements ML‑KEM post‑quantum key exchange by default. Authentication uses Amazon EKS Pod Identity and IAM permissions (secretsmanager:GetSecretValue and secretsmanager:DescribeSecret), and the post includes build, containerization, and deployment steps.
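
A minimal sketch of how an application container might call the sidecar, assuming it runs in the same pod as the agent. The `/secretsmanager/get` path, the `secretId` query parameter, and the `X-Aws-Parameters-Secrets-Token` header follow the agent's public documentation as best understood here and should be verified against it. The token file name under `/var/run/awssmatoken` is likewise an assumption, and the secret name is a placeholder.

```python
import urllib.parse
import urllib.request

AGENT_URL = "http://localhost:2773/secretsmanager/get"  # assumed endpoint path
TOKEN_PATH = "/var/run/awssmatoken/token"  # assumed file inside the token directory
TOKEN_HEADER = "X-Aws-Parameters-Secrets-Token"  # assumed header name


def build_request(secret_id: str, token: str) -> urllib.request.Request:
    """Build the local GET request; the token header is the SSRF guard."""
    query = urllib.parse.urlencode({"secretId": secret_id})
    return urllib.request.Request(f"{AGENT_URL}?{query}", headers={TOKEN_HEADER: token})


def fetch_secret(secret_id: str) -> bytes:
    """Read the token and call the sidecar; only works inside a pod running the agent."""
    with open(TOKEN_PATH) as f:
        token = f.read().strip()
    with urllib.request.urlopen(build_request(secret_id, token), timeout=5) as resp:
        return resp.read()


# Placeholder secret name and token, used only to show the request shape.
req = build_request("my-app/db-password", "example-token")
```

Because the interface is plain HTTP on localhost, the same request shape works from any language, which is the language-agnostic property the post highlights.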

LinkPro Rootkit Uses eBPF and Magic TCP Packets to Hide

🔒 An AWS-hosted compromise revealed a new GNU/Linux rootkit dubbed LinkPro, discovered by Synacktiv. Attackers leveraged an exposed Jenkins server vulnerable to CVE-2024-23897 and deployed a malicious Docker image (kvlnt/vv) to Kubernetes clusters, delivering a VPN/proxy (vnt), a Rust downloader (vGet) and vShell backdoors. LinkPro relies on two eBPF modules—Hide and Knock—to conceal processes and activate via a magic TCP packet, with a user-space fallback via /etc/ld.so.preload when kernel support is missing.

Amazon EKS and EKS Distro Add Kubernetes 1.34 Support

🚀 AWS announced that Amazon EKS and EKS Distro now support Kubernetes version 1.34. Starting today, you can create new clusters or upgrade existing clusters via the EKS console, eksctl, or infrastructure-as-code tools, with EKS Distro images available in ECR Public Gallery and GitHub. Kubernetes 1.34 introduces projected service account tokens for kubelet image credential providers, Pod-level resource requests and limits for simpler multi-container resource management, and prioritized alternatives in Dynamic Resource Allocation to improve device scheduling and workload placement. AWS recommends using EKS Cluster Insights and consulting EKS version lifecycle guidance before upgrading.

GKE Autopilot Features Now Available to Qualified Clusters

🚀 Google Cloud has extended core Autopilot capabilities to qualified Standard GKE clusters, enabling access to the new container-optimized compute platform via built-in compute classes. Available initially to clusters in the Rapid release channel running 1.33.1-gke.1107000 or later, these features include the autopilot and autopilot-spot compute classes and a provisioning mode that supports gradual adoption. Benefits include rapid horizontal and vertical scaling, pay-for-request billing, efficient bin-packing, and support for GPUs and TPUs for AI workloads.

Amazon SageMaker HyperPod Adds Managed Karpenter Autoscaling

🛠️ Amazon SageMaker HyperPod now supports managed node autoscaling using Karpenter, enabling automated cluster scaling for both inference and training workloads. This managed capability removes the operational burden of installing and maintaining autoscaling infrastructure while providing integrated resilience and fault tolerance. Customers gain just-in-time GPU provisioning, scale-to-zero during low demand, workload-aware instance selection, and cost reductions through intelligent consolidation.

Critical Code-Execution CVEs Found in Chaos-Mesh Platform

⚠️ JFrog Security Research disclosed multiple CVEs in Chaos-Mesh, including three critical flaws that permit in-cluster attackers to execute arbitrary code on any pod. The Chaos Controller Manager exposes an unauthenticated ClusterIP GraphQL /query endpoint on port 10082 by default, enabling mutations such as killProcesses and cleanTcs. The critical issues (CVSS 9.8) arise from unsafe command construction in resolvers and an ExecBypass routine that allows OS command injection. Operators should upgrade to Chaos-Mesh 2.7.3 immediately; as a temporary mitigation, redeploy the Helm chart with the control server disabled.
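
As a rough triage aid, an inventory script could flag installations older than the fixed release named above. The sketch below assumes plain `MAJOR.MINOR.PATCH` version strings, with an optional leading `v`. The function names are hypothetical, and how the installed version is discovered is left out.

```python
import re
from typing import Tuple

FIXED = (2, 7, 3)  # fixed Chaos-Mesh release named in the summary above


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse 'X.Y.Z' or 'vX.Y.Z'; anything else (e.g. pre-release tags) is rejected."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unsupported version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def needs_upgrade(installed: str) -> bool:
    """True when the installed version sorts before the fixed release."""
    return parse_version(installed) < FIXED


flagged = [v for v in ["2.6.4", "v2.7.2", "2.7.3", "2.10.0"] if needs_upgrade(v)]
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating `2.10.0` as older than `2.7.3`.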

Amazon SageMaker HyperPod: Slurm Health Agent Now GA

🩺 Amazon announces general availability of the SageMaker HyperPod health monitoring agent for Slurm clusters. The agent runs continuously on GPU- and Trainium-based nodes to perform passive background checks, detect hardware faults (for example, unresponsive GPUs and NVLink errors), and mark and replace unhealthy nodes automatically. It supports automatic reboots and coordinates with Slurm job auto-resume so training can continue from the last checkpoint, reducing manual intervention and downtime.

Azure Container Storage v2.0.0: NVMe Boosts Kubernetes

⚡ Azure today released Azure Container Storage v2.0.0, a performance-first update that delivers up to 7× higher IOPS, 4× lower latency, and improved resource efficiency for Kubernetes stateful workloads. The release adds built-in support for local NVMe drives, removes prior pricing tiers for large pools, and is available as an open-source local CSI driver for non-AKS clusters. Optimized for storage- and GPU-optimized VM families, the update also enables single-node deployments and integrates with KAITO to speed AI model loading and scaling.

Amazon Managed Service for Prometheus Adds 11 Regions

📢 Amazon Managed Service for Prometheus is now generally available in 11 additional AWS regions, including Asia Pacific (Jakarta, Hyderabad, Osaka, Melbourne, Taipei), Canada West (Calgary), Europe (Spain), Israel (Tel Aviv), Mexico (Central), Middle East (Bahrain), and US West (N. California). The fully managed, Prometheus-compatible monitoring service makes it easier to collect, store, query, and alarm on operational metrics at scale. Customers can send up to 1 billion active metrics to a single workspace and create multiple workspaces per account to partition workloads. See the AWS user guide or product documentation for the full list of supported regions and configuration details.

Wesco Reimagines Risk Management with Data Consolidation

🔍 Wesco consolidated thousands of security alerts into a unified risk framework to separate urgent threats from noise. By integrating more than a dozen platforms — including GitHub, Azure DevOps, Veracode, JFrog, Kubernetes, Microsoft Defender, and CrowdStrike — the company applied ASPM, threat modeling, a security champions program, and AI-driven automation to prioritize remediation. The initiative reduced duplication, saved developer time, and improved risk visibility across the organization.

SageMaker HyperPod Supports EBS CSI Driver for Storage

🔧 Amazon SageMaker HyperPod now supports the Amazon Elastic Block Store (EBS) Container Storage Interface (CSI) driver, enabling dynamic provisioning and lifecycle management of persistent EBS volumes for machine learning workloads on HyperPod EKS clusters. Through standard Kubernetes persistent volume claims and storage classes, teams can create, attach, resize, snapshot, and encrypt volumes (including customer-managed KMS keys), and volumes persist across pod restarts and node replacements. Install the EBS CSI driver as an EKS add-on to get started; the capability is available in all regions where HyperPod EKS clusters are supported.
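
The storage class and persistent volume claim mentioned above might look roughly like the manifests generated below. This is a sketch under assumptions: the resource names, the `gp3` volume type, and the 100Gi size are illustrative choices, not values from the announcement. `ebs.csi.aws.com` is the EBS CSI driver's provisioner name, and `kubectl apply -f` accepts the emitted JSON as well as YAML.

```python
import json
from typing import Optional


def storage_class(name: str, kms_key_id: Optional[str] = None) -> dict:
    """StorageClass for encrypted EBS volumes; a customer-managed KMS key is optional."""
    params = {"type": "gp3", "encrypted": "true"}  # illustrative volume type
    if kms_key_id:
        params["kmsKeyId"] = kms_key_id
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "ebs.csi.aws.com",
        "parameters": params,
        "allowVolumeExpansion": True,  # lets a claim be resized later
        "volumeBindingMode": "WaitForFirstConsumer",
    }


def volume_claim(name: str, storage_class_name: str, size: str) -> dict:
    """PersistentVolumeClaim that triggers dynamic provisioning via the class above."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class_name,
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical names and size, for illustration only.
sc = storage_class("ebs-gp3")
pvc = volume_claim("training-data", "ebs-gp3", "100Gi")
manifest_json = json.dumps([sc, pvc], indent=2)
```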

CISA Releases Thorium: Scalable Malware Analysis Platform

🛡️ CISA, in partnership with Sandia National Laboratories, released Thorium, an automated, scalable malware and forensic analysis platform that consolidates commercial, custom, and open-source tools into unified, automated workflows. Thorium is configured to ingest over 10 million files per hour per permission group and schedule more than 1,700 jobs per second, enabling rapid, large-scale binary and artifact analysis while maintaining fast query performance. It scales on Kubernetes with ScyllaDB, supports Dockerized tools and VM/bare-metal integrations, and enforces strict group-based access controls along with tag and full-text filtering for results.