
Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

AWS Big Data

Apache Iceberg addresses many of the shortcomings of traditional data lakes by providing features such as ACID transactions, schema evolution, row-level updates and deletes, and time travel. In this blog post, we’ll discuss how the metadata layer of Apache Iceberg can be used to make data lakes more efficient.
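As a quick illustration of what that metadata layer exposes, here is a minimal PySpark sketch that queries Iceberg's built-in metadata tables (snapshots and files). The catalog and table names (glue_catalog, db.events) are illustrative assumptions, and the Spark session is assumed to already be configured with the Iceberg runtime and an AWS Glue catalog.

```python
# Minimal sketch: inspecting Iceberg metadata tables with PySpark.
# The table name glue_catalog.db.events and the catalog setup are
# illustrative assumptions, not taken from the article.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-metadata-inspection")
    # Assumes the Iceberg runtime and a Glue-backed catalog named
    # "glue_catalog" are already configured for this session.
    .getOrCreate()
)

# Snapshot history: one row per commit, useful for tracking table churn.
spark.sql(
    "SELECT snapshot_id, committed_at, operation "
    "FROM glue_catalog.db.events.snapshots"
).show()

# Data file inventory: counts and sizes expose small-file problems.
spark.sql("""
    SELECT count(*)                       AS data_files,
           sum(file_size_in_bytes) / 1e9  AS total_gb,
           avg(file_size_in_bytes) / 1e6  AS avg_file_mb
    FROM glue_catalog.db.events.files
""").show()
```

Metrics like these are what a scheduled Lambda or Glue job would publish to CloudWatch for ongoing monitoring.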


Build Write-Audit-Publish pattern with Apache Iceberg branching and AWS Glue Data Quality

AWS Big Data

Equally crucial is the ability to segregate and audit problematic data, not just for maintaining data integrity, but also for regulatory compliance, error analysis, and potential data recovery. We discuss two common strategies to verify the quality of published data.
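For orientation, the following is a minimal PySpark sketch of the Write-Audit-Publish flow using an Iceberg branch. The catalog, table, branch, and S3 path names are illustrative, and the inline check stands in for a real AWS Glue Data Quality ruleset.

```python
# Hedged sketch of Write-Audit-Publish with Iceberg branching in PySpark.
# Catalog, table, branch, and path names (glue_catalog, db.orders, audit,
# s3://example-bucket/...) are assumptions for illustration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wap-branching").getOrCreate()

# 1. Write: stage new data on a dedicated branch instead of main.
spark.sql("ALTER TABLE glue_catalog.db.orders "
          "SET TBLPROPERTIES ('write.wap.enabled'='true')")
spark.sql("ALTER TABLE glue_catalog.db.orders "
          "CREATE BRANCH IF NOT EXISTS audit")
spark.conf.set("spark.wap.branch", "audit")

incoming_df = spark.read.parquet("s3://example-bucket/incoming/orders/")
incoming_df.writeTo("glue_catalog.db.orders").append()

# 2. Audit: validate the branch only; main remains untouched.
bad_rows = spark.sql(
    "SELECT count(*) AS c FROM glue_catalog.db.orders "
    "VERSION AS OF 'audit' WHERE order_id IS NULL"
).first()["c"]
assert bad_rows == 0  # stand-in for a Glue Data Quality evaluation

# 3. Publish: fast-forward main to the audited branch head.
spark.sql("CALL glue_catalog.system.fast_forward('db.orders', 'main', 'audit')")
```

If the audit fails, the branch can simply be dropped, leaving consumers of main unaffected.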



Amazon DataZone now integrates with AWS Glue Data Quality and external data quality solutions

AWS Big Data

Today, we are pleased to announce that Amazon DataZone is now able to present data quality information for data assets. Many organizations monitor the quality of their data through third-party solutions, and Amazon DataZone now also offers APIs for importing data quality scores from those external systems.
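To the best of my understanding, external scores can be pushed through Amazon DataZone's time series API; the sketch below uses boto3's post_time_series_data_points call, and the domain and asset identifiers, the form type identifier, and the content payload shape are assumptions rather than values from the announcement.

```python
# Hedged sketch: importing an externally computed data quality score into
# Amazon DataZone via its time series API. The identifiers, form type name,
# and content payload layout are assumptions for illustration only.
import json
from datetime import datetime, timezone

import boto3

datazone = boto3.client("datazone")

datazone.post_time_series_data_points(
    domainIdentifier="dzd_example123",    # hypothetical DataZone domain ID
    entityIdentifier="asset_example456",  # hypothetical asset ID
    entityType="ASSET",
    forms=[
        {
            "formName": "ExternalDataQualityResults",
            # Assumed built-in form type for data quality results.
            "typeIdentifier": "amazon.datazone.DataQualityResultFormType",
            "timestamp": datetime.now(timezone.utc),
            # Assumed payload shape: a JSON document of rule evaluations.
            "content": json.dumps({
                "evaluations": [
                    {
                        "description": "order_id is never null",
                        "types": ["Completeness"],
                        "status": "PASS",
                    }
                ],
                "passingPercentage": 100.0,
                "evaluationsCount": 1,
            }),
        }
    ],
)
```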


Use open table format libraries on AWS Glue 5.0 for Apache Spark

AWS Big Data

These formats, exemplified by Apache Iceberg, Apache Hudi, and Delta Lake, address persistent challenges in traditional data lake structures by offering an advanced combination of flexibility, performance, and governance capabilities. Branching: branches are independent lineages of snapshot history, each pointing to the head of its own lineage.
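As a small, hedged example of that branching capability (the catalog and table names are illustrative, and AWS Glue 5.0 is assumed to provide the Iceberg Spark runtime), a branch can be created, written to, and read independently of main:

```python
# Minimal sketch of Iceberg branching with Spark SQL.
# The catalog and table names (glue_catalog.db.sales) are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-branching-demo").getOrCreate()

# Create an independent branch; it starts at the current head of main.
spark.sql("ALTER TABLE glue_catalog.db.sales "
          "CREATE BRANCH IF NOT EXISTS experiment")

# Writes addressed to the branch identifier advance only that lineage.
spark.sql("INSERT INTO glue_catalog.db.sales.branch_experiment "
          "VALUES (1001, 'pending')")

# main and the branch can now be read side by side.
spark.sql("SELECT count(*) FROM glue_catalog.db.sales").show()  # main
spark.sql("SELECT count(*) FROM glue_catalog.db.sales "
          "VERSION AS OF 'experiment'").show()                  # branch head
```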


BI Cubed: Data Lineage on OLAP Anyone?

Octopai

How much time has your BI team wasted on finding data and creating metadata management reports? BI groups spend more than 50% of their time and effort manually searching for metadata. In fact, BI projects used to take many months to complete and require huge numbers of IT professionals to extract data. Cube to the rescue.


Implement a Multi-Cloud Open Lakehouse with Apache Iceberg in Cloudera Data Platform

Cloudera

With in-place table migration, you can rapidly convert to Iceberg tables since there is no need to regenerate data files; only the metadata is regenerated, and the newly generated metadata then points to the existing source data files. The post also covers data quality safeguards using table rollback.
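For context, here is a minimal sketch of what in-place migration and rollback look like with Apache Iceberg's Spark procedures; the catalog and table names and the snapshot ID are placeholders, and Cloudera Data Platform may wrap these steps in its own tooling.

```python
# Hedged sketch of in-place migration and snapshot rollback using Apache
# Iceberg's Spark procedures. Catalog/table names and the snapshot ID are
# placeholders; CDP may expose equivalent tooling of its own.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-inplace-migration").getOrCreate()

# Convert an existing Hive/Parquet table to Iceberg in place: data files stay
# where they are, and only Iceberg metadata is generated to point at them.
spark.sql("CALL spark_catalog.system.migrate('db.web_logs')")

# Inspect the snapshot history produced by subsequent writes.
spark.sql("SELECT snapshot_id, committed_at FROM db.web_logs.snapshots").show()

# If a bad load slips through, roll the table back to a known-good snapshot
# (the ID below is a placeholder taken from the snapshots listing above).
spark.sql(
    "CALL spark_catalog.system.rollback_to_snapshot('db.web_logs', 1234567890123456789)"
)
```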


Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

Prior to the creation of the data lake, Orca’s data was distributed among various data silos, each owned by a different team with its own data pipelines and technology stack. Moreover, running advanced analytics and ML on disparate data sources proved challenging.