
All news with #amazon emr tag

Tue, December 2, 2025

AWS launches Apache Spark Upgrade Agent for Amazon EMR

🛠️ AWS announced the Apache Spark upgrade agent, a capability that automates and accelerates Spark version upgrades for Amazon EMR on EC2 and EMR Serverless. The agent analyzes PySpark and Scala code automatically, identifies API and behavioral changes for upgrades from Spark 2.4 to 3.5, and suggests precise code transformations. Engineers can invoke the agent from SageMaker Unified Studio, the Kiro CLI, or any MCP-compatible IDE, interact with it through natural-language prompts, review the proposed edits, and approve them before they are applied. Functional correctness is validated through data quality checks to help maintain processing accuracy during migration.


Tue, December 2, 2025

Amazon EMR Serverless Removes Local Storage Provisioning

🚀 Amazon EMR Serverless now provides fully managed serverless local storage for Apache Spark workloads, removing the need to provision disk type or size per application. The service offloads intermediate operations such as shuffle to auto-scaling, encrypted serverless storage with job-level isolation, so customers pay only for the compute and memory they consume. This reduces disk-related job failures and can lower costs by up to 20%. It is generally available for EMR release 7.12 and later.


Wed, November 26, 2025

AWS Adds Apache Iceberg V3 Deletion Vectors and Lineage

🔔 AWS now supports Apache Iceberg V3 deletion vectors and row lineage across key analytics services. These features, available in Amazon EMR 7.12, AWS Glue, SageMaker notebooks, Amazon S3 Tables, and the AWS Glue Data Catalog, accelerate data modifications and make it simpler to identify changed records. Enable V3 by setting the table property 'format-version' to 3 in CREATE TABLE or by updating the table's metadata; supported AWS query engines will then use deletion vectors and row lineage automatically.
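
As a rough illustration of the 'format-version' setting described above, the following Python sketch builds the two Spark SQL statements one would typically run against an Iceberg catalog: a CREATE TABLE with the property set, and an ALTER TABLE that updates an existing table. The helper names and the table name glue_catalog.sales.orders are hypothetical and not taken from the announcement. The statements follow the standard Iceberg Spark DDL syntax, so check the exact form against the documentation for your engine.

```python
from typing import Mapping


def create_table_v3_sql(table: str, columns: Mapping[str, str]) -> str:
    """Build a Spark SQL CREATE TABLE statement for a new Iceberg V3 table.

    `table` and `columns` are illustrative inputs; this helper only formats
    a string and does not validate identifiers.
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return (
        f"CREATE TABLE {table} ({cols}) "
        "USING iceberg "
        "TBLPROPERTIES ('format-version' = '3')"
    )


def upgrade_table_v3_sql(table: str) -> str:
    """Build a Spark SQL statement that sets format-version 3 on an existing table."""
    return f"ALTER TABLE {table} SET TBLPROPERTIES ('format-version' = '3')"


if __name__ == "__main__":
    # Hypothetical catalog, database, and table names.
    name = "glue_catalog.sales.orders"
    print(create_table_v3_sql(name, {"order_id": "bigint", "status": "string"}))
    print(upgrade_table_v3_sql(name))
```

In a Spark session with an Iceberg catalog configured, each returned string could be passed to spark.sql(...). Raising the format version is generally a one-way change, so it is worth trying on a non-production table first.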


Wed, November 26, 2025

Amazon EMR and AWS Glue Add Audit Context for Lake Formation

🔒 Amazon EMR and AWS Glue now include comprehensive audit context support for AWS Lake Formation credential vending APIs and AWS Glue Data Catalog GetTable and GetTables calls. Enabled by default, the feature logs platform type and identifiers (Cluster ID, Step ID, Job Run ID, Virtual Cluster ID) to AWS CloudTrail for enhanced security auditing and troubleshooting. It supports EMR 7.12+ and AWS Glue 5.1+ across all Regions that offer EMR, AWS Glue, and Lake Formation.


Wed, November 26, 2025

Amazon EMR and AWS Glue Enforce Lake Formation Write FGAC

🔐 AWS has extended Lake Formation fine-grained access control to cover write operations on tables registered with Lake Formation when they are accessed from Apache Spark jobs on Amazon EMR and AWS Glue. Administrators can now enforce table-, column-, and row-level permissions for DDL and DML statements (CREATE, ALTER, INSERT, UPDATE, DELETE, MERGE INTO, DROP) as well as read operations, enabling single-job read/write pipelines. The change reduces the need for separate clusters or applications and centralizes governance. The feature is available in all Regions where EMR, Glue, and Lake Formation are supported.


Fri, November 21, 2025

Amazon EMR 7.12 Adds Apache Iceberg v3 Table Format

🆕 Amazon EMR 7.12 now supports the Apache Iceberg v3 table format (Iceberg 1.10) and includes Apache Spark 3.5.6. This update reduces storage and pipeline costs by marking deleted rows instead of rewriting files, while adding automatic row-level history for stronger governance and change-data capture. It also introduces table-level encryption and integrates with AWS Lake Formation. Trino 476 is included, and EMR 7.12 is available in all Regions that support EMR.


Fri, August 29, 2025

Amazon EMR S3A Connector: Faster S3 Access for Analytics

🚀 Amazon Web Services announced the Amazon EMR S3A connector, an AWS-optimized S3 interface for Apache Hadoop, Spark, and Hive on EMR. It extends open-source S3A with AWS-specific enhancements including MagicCommitter V2, improved credentials resolution, accelerated prefix listing, and Spark fine-grained access control. The connector is pre-configured in EMR release 7.10 and later and is available in all Regions where EMR runs.


Fri, August 29, 2025

Amazon EMR Adds Spark FGAC and Glue Data Catalog Views

🔒 Amazon EMR on EC2 now supports Apache Spark native fine-grained access control (FGAC) through AWS Lake Formation and adds support for AWS Glue Data Catalog views. These capabilities let administrators define and enforce granular Lake Formation policies once and apply them consistently to Spark jobs and interactive sessions, reducing administrative overhead and security risk. Access checks support named resource grants, data filters, and tag-based controls and are logged in AWS CloudTrail for auditing.
