Remove Blog Remove Metadata Remove Snapshot
article thumbnail

Accelerate your migration to Amazon OpenSearch Service with Reindexing-from-Snapshot

AWS Big Data

In this post, we will introduce a new mechanism called Reindexing-from-Snapshot (RFS), and explain how it can address your concerns and simplify migrating to OpenSearch. Documents are parsed from the snapshot and then reindexed to the target cluster, so that performance impact to the source clusters is minimized during migration.

article thumbnail

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

AWS Big Data

In this blog post, we’ll discuss how the metadata layer of Apache Iceberg can be used to make data lakes more efficient. You will learn about an open-source solution that can collect important metrics from the Iceberg metadata layer. It enables users to track changes over time and manage version history effectively.

Metadata 126
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Apache Ozone Metadata Explained

Cloudera

As an important part of achieving better scalability, Ozone separates the metadata management among different services: . Ozone Manager (OM) service manages the metadata of the namespace such as volume, bucket and keys. Datanode service manages the metadata of blocks, containers and pipelines running on the datanode. .

article thumbnail

Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

Apache Iceberg is an open table format for very large analytic datasets, which captures metadata information on the state of datasets as they evolve and change over time. Apache Iceberg addresses customer needs by capturing rich metadata information about the dataset at the time the individual data files are created.

Data Lake 136
article thumbnail

Use open table format libraries on AWS Glue 5.0 for Apache Spark

AWS Big Data

Branching Branches are independent lineage of snapshot history that point to the head of each lineage. An Iceberg table’s metadata stores a history of snapshots, which are updated with each transaction. Iceberg implements features such as table versioning and concurrency control through the lineage of these snapshots.

article thumbnail

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

This means the data files in the data lake aren’t modified during the migration and all Apache Iceberg metadata files (manifests, manifest files, and table metadata files) are generated outside the purview of the data. In this method, the metadata are recreated in an isolated environment and colocated with the existing data files.

Data Lake 122
article thumbnail

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

In this blog post, we dive into different data aspects and how Cloudinary breaks the two concerns of vendor locking and cost efficient data analytics by using Apache Iceberg, Amazon Simple Storage Service (Amazon S3 ), Amazon Athena , Amazon EMR , and AWS Glue. This concept makes Iceberg extremely versatile. SparkActions.get().expireSnapshots(iceTable).expireOlderThan(TimeUnit.DAYS.toMillis(7)).execute()

Data Lake 126