How dbt Core helps data teams test, validate, and monitor complex data transformations and conversions. dbt Core, an open-source framework for developing, testing, and documenting SQL-based data transformations, has become a must-have tool for modern data teams as the complexity of data pipelines grows.
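For teams wiring dbt tests into an automated workflow, a minimal sketch like the following runs the tests from Python by shelling out to the dbt CLI. It assumes dbt is installed and the working directory is a dbt project; the model name "orders" is a hypothetical placeholder.

```python
# Minimal sketch: run dbt tests for a single model and fail fast on errors.
# Assumes the dbt CLI is installed and the current directory is a dbt project;
# the model name "orders" is hypothetical.
import subprocess
import sys

result = subprocess.run(
    ["dbt", "test", "--select", "orders"],
    capture_output=True,
    text=True,
)

print(result.stdout)
if result.returncode != 0:
    # A non-zero exit code means at least one test failed or errored.
    sys.exit(result.returncode)
```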
AI is transforming how senior data engineers and data scientists validate data transformations and conversions. AI-based verification approaches help detect anomalies, enforce data integrity, and optimize pipelines for improved efficiency.
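As one illustration (not drawn from the article), a scikit-learn IsolationForest can flag anomalous post-transformation metrics such as daily row counts; the counts below are hypothetical.

```python
# Illustrative sketch: flag anomalous daily row counts after a transformation job
# using scikit-learn's IsolationForest. The data here is made up.
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical history of daily row counts produced by a transformation job.
row_counts = np.array([10120, 10234, 9987, 10310, 10150, 410, 10275]).reshape(-1, 1)

model = IsolationForest(contamination=0.1, random_state=42)
labels = model.fit_predict(row_counts)  # -1 marks an outlier

for day, (count, label) in enumerate(zip(row_counts.ravel(), labels)):
    if label == -1:
        print(f"Day {day}: row count {count} looks anomalous; investigate the load.")
```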
Amazon Redshift, a cloud data warehouse service, supports attaching dynamic data masking (DDM) policies to paths of SUPER data type columns, and uses the OBJECT_TRANSFORM function with the SUPER data type. SUPER data type columns in Amazon Redshift contain semi-structured data like JSON documents.
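As a hedged sketch of working with SUPER paths, the following uses the Redshift Data API via boto3 to run a query that navigates a nested SUPER document with dot notation; the workgroup, database, table, and column names are all hypothetical, and it assumes a Redshift Serverless workgroup.

```python
# Minimal sketch: query a SUPER column's nested path through the Redshift Data API.
# The workgroup, database, table, and column names below are hypothetical.
import boto3

client = boto3.client("redshift-data", region_name="us-east-1")

response = client.execute_statement(
    WorkgroupName="my-serverless-workgroup",  # assumption: Redshift Serverless
    Database="dev",
    # Dot notation navigates into the semi-structured SUPER document.
    Sql="SELECT event.payload.customer.region FROM clickstream_events AS event LIMIT 10;",
)

print("Statement submitted:", response["Id"])
```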
A modern data platform entails maintaining data across multiple layers, targeting diverse platform capabilities like high performance, ease of development, cost-effectiveness, and DataOps features such as CI/CD, lineage, and unit testing. It does this by helping teams handle the T in ETL (extract, transform, and load) processes.
Amazon Athena provides an interactive analytics service for analyzing data in Amazon Simple Storage Service (Amazon S3). Amazon Redshift is used to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes.
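A minimal sketch of querying S3 data with Athena through boto3 follows; the database, table, and results bucket are hypothetical placeholders.

```python
# Minimal sketch: start an Athena query over data in S3 with boto3.
# The database, table, and output bucket are hypothetical placeholders.
import boto3

athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString="SELECT COUNT(*) FROM web_logs WHERE status = 500",
    QueryExecutionContext={"Database": "analytics_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/queries/"},
)

print("Query execution ID:", response["QueryExecutionId"])
```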
For downstream consumption by all departments across the organization, smava’s Data Platform team prepares curated data products following the extract, load, and transform (ELT) pattern. The data products from the Business Vault and Data Mart stages are now available for consumers.
Spark SQL is an Apache Spark module for structured data processing. They use various AWS analytics services, such as Amazon EMR, to enable their analysts and data scientists to apply advanced analytics techniques to interactively develop and test new surveillance patterns and improve investor protection.
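As a small illustration of Spark SQL on structured data, the following PySpark sketch registers a hypothetical trades dataset as a temporary view and queries it with SQL; the surveillance rule shown is an assumption for illustration only.

```python
# Minimal sketch of Spark SQL on structured data (hypothetical trades dataset).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("surveillance-patterns").getOrCreate()

# Register a small in-memory DataFrame as a temporary view, then query it with SQL.
trades = spark.createDataFrame(
    [("ACCT1", "XYZ", 100, 10.5), ("ACCT2", "XYZ", 250000, 10.7)],
    ["account", "symbol", "quantity", "price"],
)
trades.createOrReplaceTempView("trades")

# Example pattern: unusually large orders per account and symbol.
spark.sql(
    "SELECT account, symbol, SUM(quantity) AS total_qty "
    "FROM trades GROUP BY account, symbol HAVING SUM(quantity) > 100000"
).show()
```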
Amazon Redshift is a fully managed data warehousing service that offers both provisioned and serverless options, making it more efficient to run and scale analytics without having to manage your data warehouse. This approach minimizes the need for making query adjustments in multiple locations.
You can also use the data transformation feature of Data Firehose to invoke a Lambda function that transforms data in batches. This solution includes a Lambda function that continuously updates the Amazon Location tracker with simulated location data from fictitious journeys.
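A Firehose transformation Lambda follows a record-in/record-out contract, sketched below in hedged form; the payload fields are hypothetical and the solution's actual function will differ.

```python
# Illustrative sketch of a Data Firehose transformation Lambda: decode each record,
# reshape it, and return it with a result status. Payload field names are hypothetical.
import base64
import json

def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))

        # Example transformation: normalize a field and tag the record.
        payload["device_id"] = str(payload.get("device_id", "")).lower()
        payload["processed"] = True

        output.append({
            "recordId": record["recordId"],
            "result": "Ok",  # use "ProcessingFailed" to route to the error output
            "data": base64.b64encode(json.dumps(payload).encode()).decode(),
        })
    return {"records": output}
```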
Select the connection again and on the Actions menu, choose Test connection. Testing the connection can take approximately 1 minute. You will see the message “Successfully connected to the data store with connection blog-redshift-connection.” This concludes creating data sources on the AWS Glue job canvas.
In another decade, the internet and mobile started to generate data of unforeseen volume, variety, and velocity, which required a different data platform solution. Hence, the data lake emerged, which handles unstructured and structured data at huge volume. The data lakehouse was later created to solve these problems.
This, in turn, empowers data leaders to better identify and develop new revenue streams, customize patient offerings, and use data to optimize operations. Storing the same data in multiple places can lead to human error: mistakes when transcribing data reduce its quality and integrity.
Conduct data quality tests on anonymized data in compliance with data policies, and quickly identify and address data quality issues to maintain high-quality data at all times. The challenge: data quality tests require performing 1,300 tests on 10 TB of data monthly.
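As an illustrative stand-in for such tests (not the team's actual framework), the sketch below runs a few pandas-based checks on a hypothetical anonymized extract and fails loudly when any check does not pass.

```python
# Illustrative sketch: simple data quality checks on an anonymized extract with pandas.
# File and column names are hypothetical.
import pandas as pd

df = pd.read_parquet("anonymized_loans.parquet")  # hypothetical anonymized extract

checks = {
    "no_null_ids": df["loan_id"].notna().all(),
    "positive_amounts": (df["amount"] > 0).all(),
    "unique_ids": df["loan_id"].is_unique,
}

failed = [name for name, passed in checks.items() if not passed]
if failed:
    raise ValueError(f"Data quality checks failed: {failed}")
print("All data quality checks passed.")
```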
Based on the configuration file, the input data is fetched and technical validations are applied. If data mapping has been enabled within the data processing job, then the structured data is prepared based on the given schema.
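A hedged sketch of such a config-driven step is shown below; the configuration keys, file names, and field mapping are assumptions for illustration, not the solution's actual format.

```python
# Illustrative sketch of a config-driven processing step: read a configuration file,
# apply technical validations, and optionally map fields to a target schema.
# Configuration keys and file names are hypothetical.
import json

with open("job_config.json") as f:
    config = json.load(f)

with open(config["input_path"]) as f:
    records = [json.loads(line) for line in f]

# Technical validation: every record must contain the required fields.
required = set(config.get("required_fields", []))
valid = [r for r in records if required.issubset(r)]

# Optional data mapping: rename source fields to the target schema.
if config.get("data_mapping_enabled"):
    mapping = config["field_mapping"]  # e.g. {"cust_nm": "customer_name"}
    valid = [{mapping.get(k, k): v for k, v in r.items()} for r in valid]

print(f"{len(valid)} of {len(records)} records passed validation.")
```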
We use the built-in features of Data Firehose, including AWS Lambda for necessary data transformation and Amazon Simple Notification Service (Amazon SNS) for near real-time alerts. Each AWS account has one Data Catalog per AWS Region. Each Data Catalog is a highly scalable collection of tables organized into databases.
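For the alerting piece, a minimal boto3 sketch of publishing a near real-time alert to an SNS topic might look like this; the topic ARN and message are hypothetical placeholders.

```python
# Minimal sketch: publish a near real-time alert to Amazon SNS with boto3.
# The topic ARN is a hypothetical placeholder.
import boto3

sns = boto3.client("sns", region_name="us-east-1")

sns.publish(
    TopicArn="arn:aws:sns:us-east-1:123456789012:pipeline-alerts",
    Subject="Data Firehose transformation alert",
    Message="Record processing failures exceeded the configured threshold.",
)
```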