All news with #apache iceberg tag

Tue, December 2, 2025

Amazon SageMaker Catalog Exports Asset Metadata to Iceberg

#AWS #Amazon SageMaker #Apache Iceberg #Amazon Athena

🔍 Amazon SageMaker Catalog now exports asset metadata as an Apache Iceberg table via Amazon S3 Tables, enabling teams to query catalog inventory with standard SQL without building custom ETL. The export includes technical fields (resource_id, resource_type), business metadata (asset_name, business_description), ownership details, and timestamps, partitioned by snapshot_date for time travel queries. The dataset appears in SageMaker Unified Studio and is queryable from Amazon Athena, Studio notebooks, AI agents, and BI tools. Available in all supported Regions at no additional SageMaker charge; you pay for S3 Tables storage and Athena queries.
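As a rough illustration of what querying the export with standard SQL could look like, the sketch below builds a query string that reads a single snapshot_date partition. The database and table name are hypothetical placeholders, and it assumes snapshot_date is a DATE column; only the column names come from the summary above.

```python
from datetime import date
from typing import Optional

# Hypothetical identifiers: the real database and table names depend on
# how the export is configured in your account.
CATALOG_TABLE = '"asset_metadata_db"."asset"'


def assets_on(snapshot: date, resource_type: Optional[str] = None) -> str:
    """Return a SQL string listing assets as of one snapshot_date partition."""
    # Assumption: snapshot_date is stored as a DATE, so a DATE literal is used.
    where = [f"snapshot_date = DATE '{snapshot.isoformat()}'"]
    if resource_type is not None:
        # Double any single quotes so the string literal stays well-formed.
        safe = resource_type.replace("'", "''")
        where.append(f"resource_type = '{safe}'")
    return (
        "SELECT resource_id, resource_type, asset_name, business_description\n"
        f"FROM {CATALOG_TABLE}\n"
        f"WHERE {' AND '.join(where)}\n"
        "ORDER BY asset_name"
    )


query = assets_on(date(2025, 12, 1), resource_type="GlueTable")
print(query)
```

Filtering on the partition column keeps the scan limited to one day's snapshot, and comparing two snapshot dates is one simple way to see how the inventory changed over time.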

Sun, November 30, 2025

AWS Glue Adds Apache Iceberg-Based Materialized Views

#AWS #AWS Glue #Apache Iceberg #Product Release

⚡ AWS Glue now supports materialized views stored in Apache Iceberg format and managed in the AWS Glue Data Catalog. Data teams can create views with standard Spark SQL, attach a refresh schedule, and rely on automatic change detection, incremental updates, and managed compute for refresh jobs. Query engines across Athena, EMR, and AWS Glue rewrite queries to use these views, improving performance by up to 8x and lowering compute costs, while SQL tools like Redshift and SageMaker can read the Iceberg tables directly.

Wed, November 26, 2025

AWS Adds Apache Iceberg V3 Deletion Vectors and Lineage

#AWS #Apache Iceberg #Amazon EMR #AWS Glue #Amazon SageMaker

🔔 AWS now supports Apache Iceberg V3 deletion vectors and row lineage across key analytics services. These features, available in Amazon EMR 7.12, AWS Glue, SageMaker notebooks, Amazon S3 Tables, and the AWS Glue Data Catalog, accelerate data modifications and make it simpler to identify changed records. Enable V3 by setting the table property 'format-version' = '3' in CREATE TABLE or by updating table metadata; supported AWS query engines will automatically use deletion vectors and row lineage.
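A minimal sketch of the two ways to opt in, written as Spark SQL statements built in Python. The TBLPROPERTIES clause with the 'format-version' key is standard Iceberg Spark SQL; the helper names and the table identifier in the usage lines are hypothetical.

```python
import re
from typing import Dict

# Accept only dotted identifiers such as catalog.db.table.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$")


def _check(name: str) -> str:
    """Reject anything that is not a plain dotted identifier."""
    if not _IDENT.match(name):
        raise ValueError(f"unexpected table identifier: {name!r}")
    return name


def create_v3_table(table: str, columns: Dict[str, str]) -> str:
    """DDL for a new Iceberg table created directly at format version 3."""
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE TABLE {_check(table)} ({cols}) USING iceberg "
        "TBLPROPERTIES ('format-version' = '3')"
    )


def upgrade_to_v3(table: str) -> str:
    """DDL that updates the metadata of an existing table to format version 3."""
    return f"ALTER TABLE {_check(table)} SET TBLPROPERTIES ('format-version' = '3')"


create_sql = create_v3_table("glue_catalog.db.events", {"id": "bigint", "payload": "string"})
upgrade_sql = upgrade_to_v3("glue_catalog.db.events")
print(create_sql)
print(upgrade_sql)
```

Either string could be passed to spark.sql(...) in a session that has an Iceberg catalog configured. Format version upgrades are one-way, so it is worth confirming that every engine reading the table supports V3 first.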

Wed, November 26, 2025

AWS Glue 5.1 GA: Spark 3.5, Iceberg 3.0, Lake Formation

#AWS #Product Release #AWS Glue #Apache Iceberg #Lake Formation #Apache Hudi #Delta Lake #Apache Spark

⚡ AWS Glue 5.1 is now generally available, upgrading core engines to Apache Spark 3.5.6, Python 3.11, and Scala 2.12.18 to deliver performance and security improvements. The release refreshes open table format support (Apache Hudi 1.0.2, Apache Iceberg 1.10.0, Delta Lake 3.3.2) and adds Apache Iceberg format version 3 features such as default column values and deletion vectors. AWS Lake Formation now enforces fine‑grained write control for Spark DDL/DML, and Glue adds full‑table access control for Hudi and Delta tables in Spark.

Mon, November 24, 2025

AWS Glue: Catalog Federation for Remote Iceberg Catalogs

#AWS #Product Release #AWS Glue #Apache Iceberg #AWS S3 #Lake Formation

🔗 AWS announces general availability of AWS Glue catalog federation for remote Apache Iceberg catalogs. The feature enables analytics engines to query Iceberg tables stored in Amazon S3 and cataloged remotely without moving or copying data, with real-time metadata synchronization to the AWS Glue Data Catalog. It leverages AWS Lake Formation for fine-grained access controls and supports the Iceberg REST specifications; federation is available in the Lake Formation console and via SDKs/APIs.

Fri, November 21, 2025

Amazon EMR 7.12 Adds Apache Iceberg v3 Table Format

#AWS #Amazon EMR #Apache Iceberg #Apache Spark #Lake Formation

🆕 Amazon EMR 7.12 now supports the Apache Iceberg v3 table format (Iceberg 1.10) and includes Apache Spark 3.5.6. This update reduces storage and pipeline costs by marking deleted rows instead of rewriting files, while adding automatic row-level history for stronger governance and change-data capture. It also introduces table-level encryption and integrates with AWS Lake Formation. Trino 476 is included, and EMR 7.12 is available in all Regions that support EMR.
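The idea of marking deleted rows instead of rewriting files can be shown with a toy model. This is a conceptual sketch only: the DataFile class is hypothetical, and a Python set stands in for the compact bitmap that a real deletion vector would use.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class DataFile:
    rows: List[str]  # written once, never modified afterwards
    deleted: Set[int] = field(default_factory=set)  # stand-in for a deletion vector

    def delete_where(self, predicate: Callable[[str], bool]) -> int:
        """Record matching row positions as deleted; the row data is untouched."""
        hits = {
            pos
            for pos, row in enumerate(self.rows)
            if pos not in self.deleted and predicate(row)
        }
        self.deleted |= hits
        return len(hits)

    def scan(self) -> List[str]:
        """Merge on read: skip positions recorded in the deletion vector."""
        return [row for pos, row in enumerate(self.rows) if pos not in self.deleted]


data_file = DataFile(rows=["a", "b", "c", "d"])
removed = data_file.delete_where(lambda row: row in {"b", "d"})
print(removed, data_file.scan(), data_file.rows)
```

The delete touches only the small set of positions, which is why the write is cheap; readers pay a small cost to filter those positions out until the file is eventually compacted.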

Wed, November 19, 2025

BigLake Metastore Adds Iceberg REST Catalog Support

#Google #BigLake #Apache Iceberg #BigQuery #Dataplex Universal Catalog #Vertex AI #GCP Cloud Storage

🔔 Google Cloud announced general availability of BigLake metastore support for the Iceberg REST Catalog, offering a serverless, standards-based runtime metastore that enables interoperability across Iceberg-compatible engines (Spark, Trino) and BigQuery. The service provides credential vending, integrated governance via Dataplex Universal Catalog for lineage and data quality, and a console UI for creating and managing Iceberg catalogs. By removing the need to run custom metastore deployments, BigLake metastore aims to reduce operational overhead while preserving enterprise scale and security.
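Interoperability here rests on every engine calling the same REST routes. As a small sketch, the function below builds the load-table URL following the path layout in the Iceberg REST catalog specification (/v1/{prefix}/namespaces/{namespace}/tables/{table}). The host name and the namespace and table values are placeholders, and only single-level namespaces are handled.

```python
from urllib.parse import quote


def load_table_url(base_uri: str, namespace: str, table: str, prefix: str = "") -> str:
    """Build the URL an engine would GET to load a table's metadata."""
    parts = ["v1"]
    if prefix:
        # The prefix is catalog-specific and may itself contain slashes,
        # so it is passed through without percent-encoding.
        parts.append(prefix.strip("/"))
    parts += ["namespaces", quote(namespace, safe=""), "tables", quote(table, safe="")]
    return base_uri.rstrip("/") + "/" + "/".join(parts)


url = load_table_url("https://catalog.example.com/", "sales", "orders")
print(url)
```

In practice a client library handles these calls along with authentication; the point is only that any engine speaking this route layout can use the same catalog.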

Tue, November 18, 2025

Amazon Redshift JIT ANALYZE for Apache Iceberg tables

#Product Release #AWS #Amazon Redshift #Apache Iceberg

📈 Amazon Redshift now supports Just‑In‑Time (JIT) ANALYZE for Apache Iceberg tables, automatically collecting table‑ and column‑level statistics during query execution. The feature uses intelligent heuristics and lightweight sketch data structures to determine when runtime statistics will improve optimizer decisions and to build high‑quality statistics on the fly. JIT ANALYZE is generally available in all AWS regions with Redshift and requires no configuration changes to begin improving query plans and performance.

Mon, November 17, 2025

Amazon Redshift Adds Apache Iceberg Write Support (GA)

#AWS #Amazon Redshift #Apache Iceberg #Product Release

🔔 Write support for Apache Iceberg tables is now generally available in Amazon Redshift, enabling SQL DDL and DML including CREATE, SHOW, DROP, and INSERT for append-only workloads. Customers can run concurrent read and write queries against Iceberg tables cataloged in the AWS Glue Data Catalog while benefiting from transactional consistency and support for schema and partition evolution. The capability is available in all Regions where Amazon Redshift is offered.

Thu, September 25, 2025

R2 SQL Deep Dive: Serverless Queries over R2 Data Platform

#Product Release #Cloudflare Workers #R2 SQL #R2 Data Catalog #Apache Iceberg #Apache Parquet #Apache DataFusion

⚡ R2 SQL is Cloudflare’s serverless query engine that runs SQL directly against Iceberg tables stored in R2, eliminating the need for Spark or Trino clusters. The Query Planner uses R2 Data Catalog metadata and multi-level stats to prune manifests, files, and Parquet row groups so only necessary bytes are read. Execution is distributed across Cloudflare’s network using Workers and query workers running Apache DataFusion, with results serialized via Apache Arrow. An ordered, streaming planning pipeline enables early termination for ORDER BY ... LIMIT queries; R2 SQL is currently available in open beta.
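Pruning with min/max statistics can be sketched in a few lines. This is an illustrative model rather than R2 SQL's implementation: FileStats and the min_ts and max_ts fields are hypothetical, and the same overlap test would apply at the manifest, file, and row-group levels.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileStats:
    path: str
    min_ts: int  # smallest value of the filter column in this file
    max_ts: int  # largest value of the filter column in this file


def prune(files: List[FileStats], lo: int, hi: int) -> List[str]:
    """Keep only files whose [min_ts, max_ts] range overlaps the query range [lo, hi]."""
    return [f.path for f in files if f.max_ts >= lo and f.min_ts <= hi]


files = [
    FileStats("a.parquet", 0, 99),
    FileStats("b.parquet", 100, 199),
    FileStats("c.parquet", 200, 299),
]
# Equivalent to: WHERE ts BETWEEN 150 AND 220
survivors = prune(files, 150, 220)
print(survivors)
```

Here a.parquet is skipped without being opened because its maximum value is below the lower bound of the filter, so none of its bytes need to be read.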

Thu, September 25, 2025

Cloudflare Data Platform: R2 Pipelines, Catalog, SQL

#Cloudflare #Product Release #R2 Data Catalog #R2 SQL #Cloudflare Pipelines #Apache Iceberg

🧭 Cloudflare announced the Cloudflare Data Platform, combining Cloudflare Pipelines, R2 Data Catalog, and R2 SQL to ingest, store, and query analytical tables directly on R2 object storage. Built on Apache Iceberg and open standards, the platform emphasizes engine interoperability and Cloudflare’s zero-cost egress. Pipelines offers exactly-once ingestion and SQL transforms today; stateful processing is planned. The products are in open beta, with usage-based pricing signaled ahead of GA.

Fri, August 29, 2025

Google Cloud and Partners Commit to Apache Iceberg

#Apache Iceberg #Confluent #Databricks #dbt Labs #Fivetran #Google #Snowflake

🔁 Google Cloud and an ecosystem of partners — including Confluent, Databricks, dbt, Fivetran, Informatica, and Snowflake — reaffirm support for the open table format Apache Iceberg to power modern lakehouse architectures. The post highlights Google innovations such as BigLake and a REST Catalog API that unify metadata and enable interoperability across engines like BigQuery, Databricks, and Snowflake. The collaboration aims to reduce data silos, enable time travel and pruning, and accelerate AI-ready analytics.