Measure performance of AWS Glue Data Quality for ETL pipelines

AWS Big Data

In this post, we provide benchmark results of running increasingly complex data quality rulesets over a predefined test dataset. Dataset details: the test dataset contains 104 columns and 1 million rows stored in Parquet format. We define eight different AWS Glue ETL jobs in which we run the data quality rulesets.
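Rulesets like the ones benchmarked here are written in Glue's Data Quality Definition Language (DQDL). As a minimal sketch (the rule groups and column names below are hypothetical examples, not the benchmark's actual rules), increasingly complex rulesets can be composed like this:

```python
# Sketch: composing DQDL rulesets of increasing complexity.
# RowCount, IsComplete, and ColumnValues are standard DQDL rule types;
# the column names are hypothetical examples.
BASE_RULES = ['RowCount > 0']
COMPLETENESS_RULES = ['IsComplete "customer_id"', 'IsComplete "order_date"']
VALUE_RULES = ['ColumnValues "quantity" between 1 and 1000']

def build_ruleset(*rule_groups):
    """Render rule groups into the `Rules = [...]` block Glue Data Quality expects."""
    rules = [rule for group in rule_groups for rule in group]
    return "Rules = [\n    " + ",\n    ".join(rules) + "\n]"

# Each successive ruleset adds a rule group, mirroring the
# "increasingly complex" setup described above.
simple = build_ruleset(BASE_RULES)
full = build_ruleset(BASE_RULES, COMPLETENESS_RULES, VALUE_RULES)
```

The rendered string is what you would paste into a Glue Data Quality evaluation, whether in a visual ETL transform or an API call.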

article thumbnail

Use AWS Glue ETL to perform merge, partition evolution, and schema evolution on Apache Iceberg

AWS Big Data

Lake Formation tag-based access control (LF-TBAC) is an authorization strategy that defines permissions based on attributes. In Lake Formation, these attributes are called LF-Tags. You can attach LF-Tags to Data Catalog resources, Lake Formation principals, and table columns. You can see the associated database LF-Tags.
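Attaching an LF-Tag to a Data Catalog database goes through Lake Formation's AddLFTagsToResource API. A minimal sketch of building that request, assuming a hypothetical database and tag (this is the payload shape boto3's `lakeformation` client takes for `add_lf_tags_to_resource`):

```python
def lf_tag_database_request(database_name, tag_key, tag_values):
    """Build an AddLFTagsToResource payload that tags a whole database.

    The database name and tag below are hypothetical examples.
    """
    return {
        "Resource": {"Database": {"Name": database_name}},
        "LFTags": [{"TagKey": tag_key, "TagValues": tag_values}],
    }

# Would be sent as: lakeformation_client.add_lf_tags_to_resource(**request)
request = lf_tag_database_request("sales_db", "classification", ["confidential"])
```

Swapping the `Database` key for `Table` or `TableWithColumns` in the `Resource` dict scopes the same tag down to tables or columns.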



Introducing AWS Glue usage profiles for flexible cost control

AWS Big Data

AWS Glue is a serverless data integration service that enables you to run extract, transform, and load (ETL) workloads on your data in a scalable and serverless manner. Because an AWS Glue profile is a resource identified by an ARN, all the default IAM controls apply, including action-based, resource-based, and tag-based authorization.
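Because a usage profile is addressed by ARN, IAM policies can be scoped to individual profiles like any other Glue resource. A hedged sketch of such a resource-based policy (the action name and ARN pattern here are illustrative assumptions; consult the Glue service authorization reference for exact values):

```python
import json

# Sketch of a resource-scoped IAM policy for a Glue usage profile.
# The account ID, profile name, action, and ARN pattern are all
# illustrative assumptions, not verified values.
ACCOUNT = "123456789012"   # hypothetical account
PROFILE = "dev-profile"    # hypothetical usage profile name

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["glue:GetUsageProfile"],
        "Resource": f"arn:aws:glue:us-east-1:{ACCOUNT}:usageProfile/{PROFILE}",
    }],
}
policy_json = json.dumps(policy, indent=2)
```

Tag-based authorization works the same way: add a `Condition` block keyed on the profile's resource tags instead of naming the ARN directly.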


Automated data governance with AWS Glue Data Quality, sensitive data detection, and AWS Lake Formation

AWS Big Data

In this post, we showcase how to use AWS Glue with AWS Glue Data Quality, sensitive data detection transforms, and AWS Lake Formation tag-based access control to automate data governance. For the purpose of this post, the following governance policies are defined: No PII data should exist in tables or columns tagged as public.
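The policy above reduces to a simple check over tagged columns. A minimal sketch, assuming hypothetical column metadata in which sensitive data detection has already flagged PII and an LF-Tag carries the classification:

```python
def public_pii_violations(columns):
    """Return names of columns that break the rule:
    no PII may exist in columns tagged as public.
    """
    return [
        col["name"]
        for col in columns
        if col.get("classification") == "public" and col.get("contains_pii")
    ]

columns = [  # hypothetical scan results
    {"name": "email", "classification": "public", "contains_pii": True},
    {"name": "region", "classification": "public", "contains_pii": False},
    {"name": "ssn", "classification": "confidential", "contains_pii": True},
]
violations = public_pii_violations(columns)
```

In the automated pipeline, a non-empty violation list would block the table from being published rather than just being reported.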


Orchestrate an end-to-end ETL pipeline using Amazon S3, AWS Glue, and Amazon Redshift Serverless with Amazon MWAA

AWS Big Data

This post demonstrates how to orchestrate an end-to-end extract, transform, and load (ETL) pipeline using Amazon Simple Storage Service (Amazon S3), AWS Glue, and Amazon Redshift Serverless with Amazon MWAA. This is done by invoking AWS Glue ETL jobs and writing to data objects in a Redshift Serverless cluster in Account B.
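Whichever Airflow operator MWAA uses, invoking a Glue ETL job ultimately issues a StartJobRun call. A sketch of assembling that request with hypothetical job and argument names (Glue expects job arguments keyed with a `--` prefix):

```python
def start_job_run_request(job_name, arguments):
    """Build the StartJobRun payload an MWAA task would pass to the Glue API.

    Glue job arguments must be keyed as "--name"; plain names are
    prefixed here for convenience.
    """
    return {
        "JobName": job_name,
        "Arguments": {f"--{key}": value for key, value in arguments.items()},
    }

request = start_job_run_request(
    "s3-to-redshift-load",  # hypothetical job name
    {"target_table": "sales", "run_date": "2024-01-01"},
)
```

A DAG would chain one such task per Glue job, followed by a task that runs the load statements against the Redshift Serverless endpoint in the second account.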


Automate alerting and reporting for AWS Glue job resource usage

AWS Big Data

To gain these insights, customers often run ETL (extract, transform, and load) jobs against their source systems to output an enriched dataset. This team is allowed to create AWS Glue for Spark jobs in development, test, and production environments. Monitoring the resource usage of these jobs helps reduce cost and optimize your ETL pipelines.
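Alerting on Glue job resource usage typically means converting job-run records into DPU-hours and flagging runs that exceed a budget. A minimal sketch, assuming GetJobRun-style records with Glue's `MaxCapacity` (DPUs) and `ExecutionTime` (seconds) fields and a hypothetical per-run budget:

```python
def dpu_hours(run):
    """DPU-hours consumed by one job run (ExecutionTime is in seconds)."""
    return run["MaxCapacity"] * run["ExecutionTime"] / 3600.0

def runs_over_budget(runs, budget_dpu_hours):
    """Return IDs of runs that exceeded the per-run DPU-hour budget."""
    return [r["Id"] for r in runs if dpu_hours(r) > budget_dpu_hours]

runs = [  # hypothetical job-run records
    {"Id": "jr_1", "MaxCapacity": 10, "ExecutionTime": 7200},  # 20 DPU-hours
    {"Id": "jr_2", "MaxCapacity": 2, "ExecutionTime": 1800},   # 1 DPU-hour
]
flagged = runs_over_budget(runs, budget_dpu_hours=5)
```

In an automated setup, the flagged list would feed an SNS notification or a scheduled report rather than being inspected by hand.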


Top analytics announcements of AWS re:Invent 2024

AWS Big Data

This zero-ETL integration reduces the complexity and operational burden of data replication to let you focus on deriving insights from your data. When end-users in the appropriate user groups access Amazon S3 using AWS Glue ETL for Apache Spark, they will then automatically have the necessary permissions to read and write data.