Remove Data Quality Remove Metadata Remove Snapshot
article thumbnail

Build Write-Audit-Publish pattern with Apache Iceberg branching and AWS Glue Data Quality

AWS Big Data

Equally crucial is the ability to segregate and audit problematic data, not just for maintaining data integrity, but also for regulatory compliance, error analysis, and potential data recovery. We discuss two common strategies to verify the quality of published data.

article thumbnail

Use open table format libraries on AWS Glue 5.0 for Apache Spark

AWS Big Data

These formats, exemplified by Apache Iceberg, Apache Hudi, and Delta Lake, addresses persistent challenges in traditional data lake structures by offering an advanced combination of flexibility, performance, and governance capabilities. Branching Branches are independent lineage of snapshot history that point to the head of each lineage.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

AWS Big Data

It addresses many of the shortcomings of traditional data lakes by providing features such as ACID transactions, schema evolution, row-level updates and deletes, and time travel. In this blog post, we’ll discuss how the metadata layer of Apache Iceberg can be used to make data lakes more efficient.

Metadata 126
article thumbnail

Amazon DataZone now integrates with AWS Glue Data Quality and external data quality solutions

AWS Big Data

Today, we are pleased to announce that Amazon DataZone is now able to present data quality information for data assets. Other organizations monitor the quality of their data through third-party solutions. Additionally, Amazon DataZone now offers APIs for importing data quality scores from external systems.

article thumbnail

Implement historical record lookup and Slowly Changing Dimensions Type-2 using Apache Iceberg

AWS Big Data

As organizations process vast amounts of data, maintaining an accurate historical record is crucial. History management in data systems is fundamental for compliance, business intelligence, data quality, and time-based analysis. You can obtain the table snapshots by querying for db.table.snapshots.

article thumbnail

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

AWS Big Data

It’s the preferred choice when customers need more control and customization over the data integration process or require complex transformations. This flexibility makes Glue ETL suitable for scenarios where data must be transformed or enriched before analysis.

article thumbnail

BI Cubed: Data Lineage on OLAP Anyone?

Octopai

How much time has your BI team wasted on finding data and creating metadata management reports? BI groups spend more than 50% of their time and effort manually searching for metadata. In fact, BI projects used to take many months to complete and require huge numbers of IT professionals to extract data. Cube to the rescue.

OLAP 81