Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

AWS Big Data

In this blog post, we’ll discuss how the metadata layer of Apache Iceberg can be used to make data lakes more efficient. You will learn about an open-source solution that can collect important metrics from the Iceberg metadata layer. It enables users to track changes over time and manage version history effectively.

Metadata 112

Apache Ozone Metadata Explained

Cloudera

To achieve better scalability, Ozone splits metadata management across different services. The Ozone Manager (OM) service manages the metadata of the namespace, such as volumes, buckets, and keys. The Datanode service manages the metadata of the blocks, containers, and pipelines running on the datanode.

Apache HBase online migration to Amazon EMR

AWS Big Data

During HBase migration, you can export the snapshot files to S3 and use them for recovery. This blog post introduces a set of typical HBase migration solutions with best practices based on real-world customers’ migration case studies. HBase provided by other cloud platforms doesn’t support snapshots.

Cloudera Lakehouse Optimizer Makes it Easier Than Ever to Deliver High-Performance Iceberg Tables

Cloudera

Table Cleanup: As tables grow, they often accumulate unused data files, manifest files, and snapshots that aren’t needed anymore. Users may want to perform table maintenance functions, like expiring snapshots, removing old metadata files, and deleting orphan files, to optimize storage utilization and improve performance.

Materialized Views in Hive for Iceberg Table Format

Cloudera

Overview: This blog post describes support for materialized views for the Iceberg table format. Create Iceberg materialized view: For the examples in this blog, we will use three tables from the TPC-DS dataset as our base tables: store_sales, customer, and date_dim. Both full and incremental rebuilds of the materialized view are supported.

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

In this blog post, we dive into different data aspects and how Cloudinary addresses the two concerns of vendor lock-in and cost-efficient data analytics by using Apache Iceberg, Amazon Simple Storage Service (Amazon S3), Amazon Athena, Amazon EMR, and AWS Glue. This concept makes Iceberg extremely versatile. SparkActions.get().expireSnapshots(iceTable).expireOlderThan(System.currentTimeMillis() - TimeUnit.DAYS.toMillis(7)).execute()
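
Iceberg's expireOlderThan takes an absolute timestamp in epoch milliseconds rather than a duration, so a seven-day retention has to be expressed as "now minus seven days". The sketch below illustrates that cutoff arithmetic in plain Python. It does not call Iceberg: the function names, the dictionary of snapshot IDs to commit timestamps, and the rule of always keeping the newest snapshot are simplifying assumptions made for illustration only.

```python
from datetime import datetime, timedelta, timezone


def cutoff_millis(now: datetime, retention_days: int) -> int:
    """Epoch-millisecond timestamp that lies `retention_days` before `now`."""
    return int((now - timedelta(days=retention_days)).timestamp() * 1000)


def snapshots_to_expire(snapshots: dict, cutoff_ms: int) -> list:
    """Return the IDs of snapshots committed strictly before the cutoff.

    `snapshots` is a hypothetical mapping of snapshot ID -> commit time in
    epoch milliseconds. The newest snapshot is always kept (an assumption
    of this sketch) so the table still has a current state.
    """
    if not snapshots:
        return []
    newest = max(snapshots, key=snapshots.get)
    return sorted(
        sid for sid, ts in snapshots.items()
        if ts < cutoff_ms and sid != newest
    )


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    cutoff = cutoff_millis(now, retention_days=7)
    day_ms = 24 * 60 * 60 * 1000
    # Hypothetical snapshot history: 10 days old, 3 days old, 1 hour old.
    history = {
        101: cutoff - 3 * day_ms,
        102: cutoff + 4 * day_ms,
        103: cutoff + 7 * day_ms - 60 * 60 * 1000,
    }
    print(snapshots_to_expire(history, cutoff))
```

Passing only the seven-day duration in milliseconds as the cutoff would point at a date in January 1970, so no real snapshot would be older than it and nothing would be selected.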

Data Lake 116

Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

Apache Iceberg is an open table format for very large analytic datasets, which captures metadata information on the state of datasets as they evolve and change over time. Apache Iceberg addresses customer needs by capturing rich metadata information about the dataset at the time the individual data files are created.

Data Lake 121