<ciso brief />

All news with #model monitoring tag

2 articles

Amazon SageMaker HyperPod Adds RIG Observability for Training

🔍 Amazon SageMaker HyperPod now provides integrated observability for Restricted Instance Groups (RIG), giving teams training foundation models with Nova Forge a unified view of compute resources and training workloads. A pre-configured Amazon Managed Grafana dashboard, backed by Amazon Managed Service for Prometheus, aggregates metrics from four exporters to show GPU utilization, NVLink bandwidth, CPU pressure, FSx for Lustre usage, network fabric, Kubernetes state, and curated logs including epoch progress, step-level logs, pipeline errors, and Python tracebacks. Observability is automatically enabled for new RIG clusters and can be turned on for existing clusters via the HyperPod console; it is available in all Regions where SageMaker HyperPod RIG is supported.

Amazon Connect Adds AI Agent Analytics and Monitoring

📊 Amazon Connect now delivers built-in analytics and monitoring for AI agents across self-service and agent assist experiences. Administrators can use customizable dashboards to track key metrics such as the number of AI-led interactions, hand-off rates, conversation turns, and average handle time, and to compare agent versions to find optimal configurations. The release also exposes AI agent traces via APIs and supports rule-based automation that triggers alerts or actions when a defined condition occurs, such as a transfer with low sentiment.