< ciso
brief />

All news with #amazon sagemaker ai tag


AWS Step Functions Adds Amazon Q AI Troubleshooting Guidance

🔍 AWS has integrated Amazon Q's AI diagnostics into the AWS Step Functions console to provide context-aware troubleshooting for workflow errors. Users can click the Diagnose with Amazon Q button in error alerts and the console notification area to receive tailored remediation steps for state machine execution failures and Amazon States Language (ASL) syntax errors and warnings. Troubleshooting recommendations appear in a dedicated window showing remediation steps, analysis of relevant state, input, and logs, and suggested fixes to reduce manual investigation. The feature is automatically enabled in commercial AWS Regions where Amazon Q is available to help teams accelerate resolution and lower operational overhead.

Aurora PostgreSQL zero-ETL now integrates SageMaker

🔁 Amazon Aurora PostgreSQL-Compatible Edition now offers zero-ETL integration with Amazon SageMaker, enabling near-real-time replication of PostgreSQL tables into a lakehouse. The synced data conforms to Apache Iceberg open standards and is immediately accessible to SQL, Apache Spark, BI, and ML tools via a simple no-code interface without impacting production workloads. Comprehensive, fine-grained access controls are enforced across analytics engines, and the capability is available in multiple AWS Regions.

SageMaker AI Projects Adds Custom ML Templates from S3

🛠️ Amazon Web Services announced that SageMaker AI Projects can now provision custom ML project templates stored in Amazon S3. Administrators can define and manage standardized end-to-end project templates in SageMaker AI Studio so data scientists can create projects that follow organizational patterns and automated workflows. The feature is available in all AWS Regions where SageMaker AI Projects is offered.

Amazon OpenSearch Service Adds Batch AI Inference Support

🧠 You can now run asynchronous batch AI inference inside Amazon OpenSearch Ingestion pipelines to enrich and ingest very large datasets for Amazon OpenSearch Service domains. The same AI connectors previously used for real-time calls to Amazon Bedrock, Amazon SageMaker, and third parties now support high-throughput, offline jobs. Batch inference is intended for offline enrichment scenarios, such as generating up to billions of vector embeddings, with improved performance and cost efficiency versus streaming inference. The feature is available in Regions that support OpenSearch Ingestion, for domains running OpenSearch 2.17 or later.

Amazon SageMaker Managed MLflow Now in AWS GovCloud

🛡️ Amazon SageMaker managed MLflow is now available in both AWS GovCloud (US-West) and AWS GovCloud (US-East) regions. The managed service integrates MLflow experiment tracking with SageMaker capabilities, streamlining AI experimentation and accelerating GenAI development from idea to production. It provides end-to-end observability to help reduce time-to-market and simplify compliance and operational oversight for government workloads.

Amazon SageMaker HyperPod Adds Managed Karpenter Autoscaling

🛠️ Amazon SageMaker HyperPod now supports managed node autoscaling using Karpenter, enabling automated cluster scaling for both inference and training workloads. This managed capability removes the operational burden of installing and maintaining autoscaling infrastructure while providing integrated resilience and fault tolerance. Customers gain just-in-time GPU provisioning, scale-to-zero during low demand, workload-aware instance selection, and cost reductions through intelligent consolidation.

Amazon SageMaker HyperPod: Slurm Health Agent Now GA

🩺 Amazon announces general availability of the SageMaker HyperPod health monitoring agent for Slurm clusters. The agent runs continuously on GPU- and Trainium-based nodes to perform passive background checks, detect hardware faults (for example, unresponsive GPUs and NVLink errors), and mark and replace unhealthy nodes automatically. It supports automatic reboots and coordinates with Slurm job auto-resume so training can continue from the last checkpoint, reducing manual intervention and downtime.

Amazon SageMaker Adds EC2 P6-B200 Notebook Instances

🚀 Amazon Web Services announced general availability of EC2 P6-B200 instances for SageMaker notebooks. These instances include eight NVIDIA Blackwell GPUs with 1,440 GB of high-bandwidth GPU memory and 5th Gen Intel Xeon processors, offering up to 2x the training performance versus P5en. They enable interactive development and fine-tuning of large foundation models in JupyterLab and Code Editor, and are available in US East (Ohio) and US West (Oregon).

SageMaker Unified Studio Connects Remotely to VS Code

🔗 AWS now enables remote connections from local VS Code to Amazon SageMaker Unified Studio, allowing developers to use their personalized VS Code setups while running workloads on SageMaker-managed compute and accessing cloud-resident data. Authentication is provided via the AWS Toolkit extension for secure, streamlined access. The integration preserves existing development workflows for data processing, SQL analytics, and ML.

Managed Tiered Checkpointing for Amazon SageMaker HyperPod

⚡ Amazon Web Services has announced general availability of managed tiered checkpointing for Amazon SageMaker HyperPod, a hybrid checkpointing capability that caches frequent checkpoints in CPU memory and periodically persists them to Amazon S3 for durability. The approach reduces model recovery time and minimizes training progress loss on large-scale clusters. It integrates with PyTorch Distributed Checkpoint (DCP) and is enabled via a CreateCluster/UpdateCluster API parameter; customers can use the sagemaker-checkpointing Python library to adopt it with minimal code changes. Currently available for HyperPod clusters using the EKS orchestrator.
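The two-tier idea in the summary above (frequent checkpoints kept in memory, periodic copies written to durable storage) can be illustrated with a toy sketch. This is not the sagemaker-checkpointing library or the HyperPod API: the class `TieredCheckpointer`, its `persist_every` parameter, and the local directory standing in for Amazon S3 are all hypothetical names chosen for illustration.

```python
import copy
import json
import tempfile
from pathlib import Path


class TieredCheckpointer:
    """Toy two-tier checkpointer: a fast in-memory tier plus a durable tier.

    Hypothetical illustration only. A local directory stands in for S3.
    """

    def __init__(self, durable_dir, persist_every):
        self.durable_dir = Path(durable_dir)
        self.persist_every = persist_every
        self.memory = None  # latest (step, state) held in memory

    def save(self, step, state):
        # Every checkpoint lands in the fast tier.
        self.memory = (step, copy.deepcopy(state))
        # Only every N-th checkpoint is also written to the durable tier.
        if step % self.persist_every == 0:
            path = self.durable_dir / f"step-{step:08d}.json"
            path.write_text(json.dumps(state))

    def drop_memory(self):
        # Simulates losing the in-memory tier, for example after a node failure.
        self.memory = None

    def restore(self):
        # Prefer the most recent in-memory checkpoint when it survives.
        if self.memory is not None:
            return self.memory[0], copy.deepcopy(self.memory[1])
        # Otherwise fall back to the newest durable checkpoint, if any.
        files = sorted(self.durable_dir.glob("step-*.json"))
        if not files:
            return None
        latest = files[-1]
        step = int(latest.stem.split("-")[1])
        return step, json.loads(latest.read_text())


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        ckpt = TieredCheckpointer(tmp, persist_every=2)
        for step in range(1, 6):
            ckpt.save(step, {"step": step, "weights": [0.1 * step]})
        fast_step, _ = ckpt.restore()      # memory tier still intact
        ckpt.drop_memory()
        durable_step, _ = ckpt.restore()   # falls back to the durable tier
        return fast_step, durable_step
```

In this sketch, recovery from memory resumes at the latest step, while recovery after losing memory resumes from the last persisted step, so the persistence interval bounds how much progress can be lost.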

Amazon SageMaker Unified Studio Adds Custom Blueprints

🔧 AWS announced general availability of Custom Blueprints in Amazon SageMaker Unified Studio, enabling customers to supply their own managed IAM policies when creating project roles. Teams can replace or augment the default service-managed policies and use custom AWS CloudFormation templates to define infrastructure and parameters for resources such as Amazon EMR on EC2, AWS Glue Data Catalog, and Amazon Redshift. Sample templates are available in the SageMaker documentation, and the capability is offered in all AWS Commercial Regions where the next-generation SageMaker is available.

Improved AI Assistance in Amazon SageMaker Unified Studio

🤖 Amazon Web Services announced enhancements to the Amazon Q Developer chat experience within SageMaker Unified Studio Jupyter notebooks and added a command-line interface for use in notebooks and the Code Editor. By integrating with Model Context Protocol (MCP) servers, the assistant becomes aware of project resources—data, compute, and code—and provides personalized, context-aware help. These updates aim to speed tasks like code refactoring, file edits, and troubleshooting while preserving transparency around assistant actions. The capabilities are available at no additional cost via the Amazon Q Developer Free Tier where SageMaker Unified Studio is offered; customers can enable Amazon Q Developer Pro for expanded functionality.

Amazon SageMaker Adds Restricted Classification Terms

🔒 Amazon SageMaker Catalog now supports governed classification using Restricted Classification Terms, enabling catalog administrators to mark sensitive glossary terms so only authorized users or projects can apply them to assets. Administrators grant usage through explicit policies and group membership, allowing centralized governance teams to control labels like Seller-MCF or PII. The capability is available in all regions that support SageMaker Unified Studio; consult the user guide to get started.

Amazon Managed Service for Prometheus Adds PagerDuty

🔔 Amazon Managed Service for Prometheus now sends alerts directly to PagerDuty, removing the need for custom Lambda functions or intermediary services. The native integration simplifies authentication and improves delivery reliability for incident notifications. It is available in all AWS regions where the service is generally available and can be configured from the Alert manager tab or via the AWS CLI, SDK, or APIs. Refer to the user guide for detailed setup instructions.

Amazon SageMaker Adds Account-Agnostic Project Profiles

🔁 Amazon SageMaker introduces account-agnostic, reusable project profiles within the SageMaker Unified Studio domain, enabling domain administrators to define project templates once and reuse them across multiple AWS accounts and regions. Profiles are decoupled from specific accounts and regions and can reference a new account pool for dynamic account and region selection at project creation, driven by custom authorization policies or predefined strategies. This reduces duplication, simplifies governance, and accelerates onboarding across large-scale data and ML environments. The feature is available in all Regions where Unified Studio is supported.

Amazon SageMaker Lakehouse Adds Tag-Based Access Control

🏷️ Amazon SageMaker lakehouse now supports tag-based access control (TBAC) across federated catalogs, extending the capability beyond the default AWS Glue Data Catalog to Amazon S3 Tables, Amazon Redshift, and federated sources such as DynamoDB, PostgreSQL, and SQL Server. TBAC lets administrators group resources with tags, grant access based on those tags, and rely on tag inheritance so new tables automatically receive fine-grained controls. Administrators can create and apply tags via the AWS Lake Formation console and grant tag-based permissions to principals; tagged resources are then usable through Amazon Athena, Amazon Redshift, Amazon EMR, and SageMaker Unified Studio. The feature is available in all commercial AWS Regions via the Console, AWS CLI, and SDKs; the Lake Formation Tags documentation and a blog post provide further details.
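How tag inheritance lets a new table pick up access rules automatically can be shown with a small model. This is a simplified sketch, not the Lake Formation API: `Resource`, `effective_tags`, `is_allowed`, and the tag keys and principal names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Resource:
    """A catalog object (database or table) carrying key/value tags."""
    name: str
    tags: dict = field(default_factory=dict)
    parent: Optional["Resource"] = None

    def effective_tags(self):
        # Tags flow down from the parent; a child's own tags override them.
        inherited = self.parent.effective_tags() if self.parent else {}
        return {**inherited, **self.tags}


def is_allowed(grants, principal, resource):
    """A principal is allowed if any one of its grants matches all required tags."""
    tags = resource.effective_tags()
    return any(
        all(tags.get(key) == value for key, value in required.items())
        for required in grants.get(principal, [])
    )


# Hypothetical catalog: one database, one inherited table, one overriding table.
sales_db = Resource("sales_db", {"domain": "sales", "sensitivity": "public"})
orders = Resource("orders", parent=sales_db)
customers = Resource("customers", {"sensitivity": "pii"}, parent=sales_db)

# Hypothetical grants expressed purely in terms of tags, not resource names.
grants = {
    "analyst": [{"domain": "sales", "sensitivity": "public"}],
    "steward": [{"domain": "sales"}],
}

# A table created later needs no new grant: it inherits the database tags.
returns = Resource("returns", parent=sales_db)
```

Here the analyst can read `orders` and the newly added `returns` but not `customers`, whose own tag overrides the inherited sensitivity, while the steward's broader grant covers all three.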

AWS SageMaker Adds P5.4xlarge with NVIDIA H100 GPU

🚀 Amazon SageMaker Training and Processing Jobs now support a new EC2 P5 instance size, P5.4xlarge, with a single NVIDIA H100 GPU for cost-effective ML and HPC workloads. The instance enables fine-grained scaling so customers can begin with smaller configurations and expand incrementally, improving cost management and infrastructure flexibility. P5.4xlarge is available via SageMaker Flexible Training Plans and in select regions through On-Demand and Spot.

Amazon SageMaker Unified Studio adds S3 file sharing option

📂 Amazon SageMaker Unified Studio now offers a simplified S3-based file storage option for project collaboration. Customers can choose between Git integrations (GitHub, GitLab, Bitbucket Cloud) or Amazon S3 buckets, with S3 set as the default while Git remains fully supported. The S3 option gives a consistent view of files across Studio tools, uses a last-write-wins model, and supports basic versioning when administrators enable it.
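What a last-write-wins model with optional versioning means for collaborators can be sketched as follows. This is an illustrative toy, not the Unified Studio or S3 API; `SharedFileStore` and its methods are hypothetical.

```python
class SharedFileStore:
    """Toy shared store: the most recent write to a path replaces earlier ones."""

    def __init__(self, versioning=False):
        self.versioning = versioning
        self.current = {}   # path -> (author, content)
        self.history = {}   # path -> list of earlier (author, content) pairs

    def write(self, path, author, content):
        # No merge is attempted; with versioning on, the old copy is retained.
        if self.versioning and path in self.current:
            self.history.setdefault(path, []).append(self.current[path])
        self.current[path] = (author, content)

    def read(self, path):
        return self.current[path][1]

    def previous_versions(self, path):
        return list(self.history.get(path, []))


store = SharedFileStore(versioning=True)
store.write("notebook.ipynb", "alice", "v1 by alice")
store.write("notebook.ipynb", "bob", "v2 by bob")   # overwrites Alice's copy
```

After both writes, every reader sees Bob's content; Alice's copy is recoverable only because versioning was enabled in this sketch.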