<ciso brief />

All news tagged #model monitoring

3 articles

SageMaker HyperPod Adds Data Capture for Inference

🧾 Amazon SageMaker HyperPod now supports data capture for inference workloads, allowing organizations to record request and response payloads for monitoring, compliance, debugging, and offline analysis. Traffic can be captured at the SageMaker endpoint, the load balancer, or the model pod, and the layers can be combined for richer observability. Captured data is delivered asynchronously to Amazon S3, with configurable sampling and encryption using customer-managed AWS KMS keys, and the capture path is designed never to block inference. Data capture can be enabled via the HyperPod Inference Operator or SageMaker JumpStart.

Amazon SageMaker HyperPod Adds RIG Observability for Training

🔍 Amazon SageMaker HyperPod now provides integrated observability for Restricted Instance Groups (RIG), giving teams that train foundation models with Nova Forge a unified view of compute resources and training workloads. A pre-configured Amazon Managed Grafana dashboard, backed by Amazon Managed Service for Prometheus, aggregates metrics from four exporters to show GPU utilization, NVLink bandwidth, CPU pressure, FSx for Lustre usage, network fabric, and Kubernetes state. It also surfaces curated logs, including epoch progress, step-level logs, pipeline errors, and Python tracebacks. Observability is enabled automatically for new RIG clusters and can be turned on for existing clusters via the HyperPod console. It is available in all Regions where SageMaker HyperPod RIG is supported.

Amazon Connect Adds AI Agent Analytics and Monitoring

📊 Amazon Connect now delivers built-in analytics and monitoring for AI agents across self-service and agent-assist experiences. Administrators can use customizable dashboards to track key metrics such as the number of AI-led interactions, hand-off rates, conversation turns, and average handle time, and to compare agent versions to find the best-performing configuration. The release also exposes AI agent traces via APIs and enables rule-based automation that triggers alerts or actions when conditions such as low-sentiment transfers occur.