Remove Metadata Remove Optimization Remove Snapshot
article thumbnail

Manage concurrent write conflicts in Apache Iceberg on the AWS Glue Data Catalog

AWS Big Data

We will also cover the pattern with automatic compaction through AWS Glue Data Catalog table optimization. Consider a streaming pipeline ingesting real-time event data while a scheduled compaction job runs to optimize file sizes. Load the tables latest metadata, and determine which metadata version is used as the base for the update.

Snapshot 117
article thumbnail

Accelerate your migration to Amazon OpenSearch Service with Reindexing-from-Snapshot

AWS Big Data

In this post, we will introduce a new mechanism called Reindexing-from-Snapshot (RFS), and explain how it can address your concerns and simplify migrating to OpenSearch. Documents are parsed from the snapshot and then reindexed to the target cluster, so that performance impact to the source clusters is minimized during migration.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Build a high-performance quant research platform with Apache Iceberg

AWS Big Data

Iceberg offers distinct advantages through its metadata layer over Parquet, such as improved data management, performance optimization, and integration with various query engines. Icebergs table format separates data files from metadata files, enabling efficient data modifications without full dataset rewrites.

Metadata 107
article thumbnail

Use open table format libraries on AWS Glue 5.0 for Apache Spark

AWS Big Data

The adoption of open table formats is a crucial consideration for organizations looking to optimize their data management practices and extract maximum value from their data. Branching Branches are independent lineage of snapshot history that point to the head of each lineage. In earlier posts, we discussed AWS Glue 5.0

article thumbnail

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

AWS Big Data

Despite their advantages, traditional data lake architectures often grapple with challenges such as understanding deviations from the most optimal state of the table over time, identifying issues in data pipelines, and monitoring a large number of tables. It enables users to track changes over time and manage version history effectively.

Metadata 119
article thumbnail

The AWS Glue Data Catalog now supports storage optimization of Apache Iceberg tables

AWS Big Data

The AWS Glue Data Catalog now enhances managed table optimization of Apache Iceberg tables by automatically removing data files that are no longer needed. Along with the Glue Data Catalog’s automated compaction feature, these storage optimizations can help you reduce metadata overhead, control storage costs, and improve query performance.

article thumbnail

Amazon OpenSearch Service Under the Hood : OpenSearch Optimized Instances(OR1)

AWS Big Data

Amazon OpenSearch Service recently introduced the OpenSearch Optimized Instance family (OR1), which delivers up to 30% price-performance improvement over existing memory optimized instances in internal benchmarks, and uses Amazon Simple Storage Service (Amazon S3) to provide 11 9s of durability.