Measure performance of AWS Glue Data Quality for ETL pipelines
AWS Big Data
MARCH 12, 2024
In this post, we provide benchmark results of running increasingly complex data quality rulesets over a predefined test dataset. Dataset details The test dataset contains 104 columns and 1 million rows stored in Parquet format. We define eight different AWS Glue ETL jobs where we run the data quality rulesets.
Let's personalize your content