Remove Metadata Remove Reference Remove Snapshot
article thumbnail

Build a high-performance quant research platform with Apache Iceberg

AWS Big Data

Iceberg offers distinct advantages through its metadata layer over Parquet, such as improved data management, performance optimization, and integration with various query engines. Icebergs table format separates data files from metadata files, enabling efficient data modifications without full dataset rewrites.

Metadata 111
article thumbnail

Use open table format libraries on AWS Glue 5.0 for Apache Spark

AWS Big Data

For more details, refer to Iceberg Release 1.6.1. Branching Branches are independent lineage of snapshot history that point to the head of each lineage. An Iceberg table’s metadata stores a history of snapshots, which are updated with each transaction. We highlight its notable updates in this section.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Run Apache XTable in AWS Lambda for background conversion of open table formats

AWS Big Data

Central to a transactional data lake are open table formats (OTFs) such as Apache Hudi , Apache Iceberg , and Delta Lake , which act as a metadata layer over columnar formats. For more examples and references to other posts, refer to the following GitHub repository. This post is one of multiple posts about XTable on AWS.

Metadata 105
article thumbnail

Build Write-Audit-Publish pattern with Apache Iceberg branching and AWS Glue Data Quality

AWS Big Data

Heres how it works: As data streams in, it passes through a validation process Valid data is written directly to the table referred by downstream users Invalid or problematic data is redirected to a separate DLQ for later analysis and potential recovery The following screenshot shows this flow.

article thumbnail

The AWS Glue Data Catalog now supports storage optimization of Apache Iceberg tables

AWS Big Data

Along with the Glue Data Catalog’s automated compaction feature, these storage optimizations can help you reduce metadata overhead, control storage costs, and improve query performance. Iceberg creates a new version called a snapshot for every change to the data in the table. Snapshots are timestamped versions of an iceberg table.

article thumbnail

Apache Ozone Metadata Explained

Cloudera

As an important part of achieving better scalability, Ozone separates the metadata management among different services: . Ozone Manager (OM) service manages the metadata of the namespace such as volume, bucket and keys. Datanode service manages the metadata of blocks, containers and pipelines running on the datanode. .

article thumbnail

Use AWS Glue ETL to perform merge, partition evolution, and schema evolution on Apache Iceberg

AWS Big Data

Apache Iceberg manages these schema changes in a backward-compatible way through its innovative metadata table evolution architecture. With Lake Formation, you can manage fine-grained access control for your data lake data on Amazon S3 and its metadata in the Data Catalog. Iceberg maintains the table state in metadata files.

Snapshot 132