Data Quality, Data Transformation and Testing

Data Quality

Data Transformation

Testing

The Ultimate Guide to Modern Data Quality Management (DQM) For An Effective Data Quality Control Driven by The Right Metrics

datapine

SEPTEMBER 29, 2022

1) What Is Data Quality Management? 4) Data Quality Best Practices. 5) How Do You Measure Data Quality? 6) Data Quality Metrics Examples. 7) Data Quality Control: Use Case. 8) The Consequences Of Bad Data Quality. 9) 3 Sources Of Low-Quality Data.

Data Quality

Data Quality Metrics Data-driven Management

Data’s dark secret: Why poor quality cripples AI and growth

CIO Business Intelligence

APRIL 8, 2025

As technology and business leaders, your strategic initiatives, from AI-powered decision-making to predictive insights and personalized experiences, are all fueled by data. Yet, despite growing investments in advanced analytics and AI, organizations continue to grapple with a persistent and often underestimated challenge: poor data quality.

Data Quality

Data Quality Data-driven Key Performance Indicator Metadata

Webinars

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Trending Sources

From data lakes to insights: dbt adapter for Amazon Athena now supported in dbt Cloud

AWS Big Data

NOVEMBER 22, 2024

The need for streamlined data transformations As organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown. With dbt, teams can define data quality checks and access controls as part of their transformation workflow.

Data Lake

Data Lake Data Warehouse Cost-Benefit Data Transformation

Complex Data Transformations — Test Planning Best Practices

Wayne Yaddow

FEBRUARY 21, 2025

Ensuring data accuracy with structured testing and best practices. Data transformations and conversions are crucial for data pipelines, enabling organizations to process, integrate, and refine raw data into meaningful insights.

Testing

Testing Data Transformation Data Quality Data Integration

Functional Gaps in Your Data Transformation Testing Tools?

Wayne Yaddow

FEBRUARY 11, 2025

Managing tests of complex data transformations when automated data testing tools lack important features. Data transformations are at the core of modern business intelligence, blending and converting disparate datasets into coherent, reliable outputs.

Testing

Testing Data Transformation Data Quality Statistics

Available Now! Automated Testing for Data Transformations

Wayne Yaddow

FEBRUARY 18, 2025

Selecting the strategies and tools for validating data transformations and data conversions in your data pipelines. Data transformations and data conversions are crucial to ensure that raw data is organized, processed, and ready for useful analysis.

Testing

Testing Data Transformation Data-driven Data Quality

Key Challenges Affecting Data Transformations—Dev and Testing

Wayne Yaddow

FEBRUARY 6, 2025

Common challenges and practical mitigation strategies for reliable data transformations. Data transformations are important processes in data engineering, enabling organizations to structure, enrich, and integrate data for analytics, reporting, and operational decision-making.

Testing

Testing Data Transformation Data-driven Manufacturing

From Raw Inputs to Polished Outputs: The Art of Testing Data Transformations

Wayne Yaddow

MARCH 5, 2025

In this post, we’ll see the fundamental procedures, tools, and techniques that data engineers, data scientists, and QA/testing teams use to ensure high-quality data as soon as it’s deployed. First, we look at how unit and integration tests uncover transformation errors at an early stage.

Testing

Testing Data Transformation Statistics Metadata

Development Strategies to Prevent Data Quality Issues in Production (Part 1)

Wayne Yaddow

MARCH 3, 2025

Many data pipeline failures and quality issues that are detected by data observability tools in production could have been prevented earlier in the pipeline lifecycle with better pre-production testing strategies. Crucial for time-sensitive analytics and reporting processes.

Data Quality

Data Quality Strategy ROI Testing

Navigating the Chaos of Unruly Data: Solutions for Data Teams

DataKitchen

NOVEMBER 10, 2023

Extrinsic Control Deficit: Many of these changes stem from tools and processes beyond the immediate control of the data team. Unregulated ETL/ELT Processes: The absence of stringent data quality tests in ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform) processes further exacerbates the problem.

Data Quality

Data Quality Testing Data Lake Data Integration

It’s Essential — Verifying Data Transformations (Part 4)

Wayne Yaddow

FEBRUARY 4, 2025

Uncovering the leading problems in data transformation workflows and practical ways to detect and prevent them. In Parts 1–3 of this series of blogs, categories of data transformations were identified as among the top causes of data quality defects in data pipeline workflows.

Data Transformation

Data Transformation Testing Data Quality Strategy

Ensuring Data Transformation Results with Great Expectations

Wayne Yaddow

MARCH 12, 2025

How GX helps data teams validate, test, and monitor complex data pipelines. Data flows from diverse sources, and transformations are becoming increasingly complex. Great Expectations can enable a wide range of data transformations and conversion operations.

Data Transformation

Data Transformation Data Quality Testing Data Warehouse

The Journey to DataOps Success: Key Takeaways from Transformation Trailblazers

DataKitchen

APRIL 26, 2021

GSK had been pursuing DataOps capabilities such as automation, containerization, automated testing and monitoring, and reusability, for several years. At Workiva, they recognized that they are only as good as their data, so they centered their initial DataOps efforts around lowering errors. Early Results are Positive.

Measurement

Measurement Metrics Data-driven Dashboards

Ensuring Data Transformation Quality with dbt Core

Wayne Yaddow

MARCH 14, 2025

How dbt Core helps data teams test, validate, and monitor complex data transformations and conversions. dbt Core, an open-source framework for developing, testing, and documenting SQL-based data transformations, has become a must-have tool for modern data teams as the complexity of data pipelines grows.

Data Transformation

Data Transformation Testing Unstructured Data Data Quality

Data Engineers Are Using AI to Verify Data Transformations

Wayne Yaddow

FEBRUARY 26, 2025

AI is transforming how senior data engineers and data scientists validate data transformations and conversions. Artificial intelligence-based verification approaches aid in the detection of anomalies, the enforcement of data integrity, and the optimization of pipelines for improved efficiency.

Data Transformation

Data Transformation Testing Data-driven Data Quality

Data Integrity, the Basis for Reliable Insights

Sisense

AUGUST 28, 2020

Uncomfortable truth incoming: Most people in your organization don’t think about the quality of their data from intake to production of insights. However, as a data team member, you know how important data integrity (and a whole host of other aspects of data management) is. Data integrity: A process and a state.

Data Integration

Data Integration Testing Data Quality Data-driven

Create a modern data platform using the Data Build Tool (dbt) in the AWS Cloud

AWS Big Data

NOVEMBER 9, 2023

A modern data platform entails maintaining data across multiple layers, targeting diverse platform capabilities like high performance, ease of development, cost-effectiveness, and DataOps features such as CI/CD, lineage, and unit testing. It does this by helping teams handle the T in ETL (extract, transform, and load) processes.

Data Warehouse

Data Warehouse Testing Data Quality Reporting

AzureML and CRISP-DM – a Framework to help the Business Intelligence professional move to AI

Jen Stirrup

SEPTEMBER 30, 2021

Data Understanding is a crucial aspect of all of these areas, and the process will not proceed properly without it. From the perspective of CRISP-DM, this piece involves a number of activities: Collecting Initial Data Describing Data Exploring Data Verifying Data Quality. Evaluating the Model.

Business Intelligence

Business Intelligence Data mining Machine Learning Testing

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

DECEMBER 13, 2023

In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.

Metadata

Metadata Data Lake Visualization Data Transformation

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

AWS Big Data

AUGUST 1, 2024

However, you might face significant challenges when planning for a large-scale data warehouse migration. This will enable right-sizing the Redshift data warehouse to meet workload demands cost-effectively. Additional considerations – Factor in additional tasks beyond schema conversion.

Data Warehouse

Data Warehouse KPI Optimization Cost-Benefit

Use fuzzy string matching to approximate duplicate records in Amazon Redshift

AWS Big Data

FEBRUARY 8, 2023

It’s common to ingest multiple data sources into Amazon Redshift to perform analytics. Often, each data source will have its own processes of creating and maintaining data, which can lead to data quality challenges within and across sources. Answering questions as simple as “How many unique customers do we have?”

Data Quality

Data Quality Testing Data Warehouse Unstructured Data
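
As a rough illustration of the fuzzy-matching idea in the excerpt above, the sketch below flags likely duplicate customer names. It is not the Amazon Redshift approach from the article: it substitutes Python's standard-library `difflib.SequenceMatcher`, and the function names, sample names, and 0.85 threshold are all hypothetical choices made for this example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio after light normalization."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def likely_duplicates(names, threshold=0.85):
    """Compare every pair of names and keep those at or above the threshold.

    The threshold is an assumption; a real pipeline would tune it on
    labeled examples.
    """
    pairs = []
    for a, b in combinations(names, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs


# Hypothetical sample data, not taken from the article.
customers = ["Jon Smith", "John Smith", "Maria Garcia", "Acme Corp"]
for a, b, score in likely_duplicates(customers):
    print(f"possible duplicate: {a!r} ~ {b!r} (score {score})")
```

Pairwise comparison grows quadratically with the number of records, so larger datasets would normally be grouped first (for example by postal code) before comparing within each group.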

The Best Data Management Tools For Small Businesses

Smart Data Collective

APRIL 29, 2020

The techniques for managing organisational data in a standardised approach that minimises inefficiency. Extraction, Transform, Load (ETL). The extraction of raw data, transforming to a suitable format for business needs, and loading into a data warehouse. Data transformation. Microsoft Azure.

Management

Management Data Warehouse Digital Transformation Dashboards

Use AWS Glue DataBrew recipes in your AWS Glue Studio visual ETL jobs

AWS Big Data

JULY 27, 2023

DataBrew is a visual data preparation tool that enables you to clean and normalize data without writing any code. The over 200 transformations it provides are now available to be used in an AWS Glue Studio visual job. Now that we identified the data quality issues to address, we need to decide how to deal with each case.

Visualization

Visualization Cost-Benefit Data Quality Publishing

Alation and dbt Unlock Metadata and Increase Modern Data Stack Visibility

Alation

OCTOBER 18, 2022

Data analysts and engineers use dbt to transform, test, and document data in the cloud data warehouse. Yet every dbt transformation contains vital metadata that is not captured – until now. Data Transformation in the Modern Data Stack. How did the data transform exactly?

Metadata

Metadata Metrics Recreation/Entertainment Data Quality

How Tricentis unlocks insights across the software development lifecycle at speed and scale using Amazon Redshift

AWS Big Data

MARCH 3, 2023

Tricentis is the global leader in continuous testing for DevOps, cloud, and enterprise applications. Speed changes everything, and continuous testing across the entire CI/CD lifecycle is the key. Tricentis instills that confidence by providing software tools that enable Agile Continuous Testing (ACT) at scale.

Software

Software Data Lake Testing Dashboards

Data Preparation and Data Mapping: The Glue Between Data Management and Data Governance to Accelerate Insights and Reduce Risks

erwin

JANUARY 11, 2019

Organizations have spent a lot of time and money trying to harmonize data across diverse platforms, including cleansing, uploading metadata, converting code, defining business glossaries, tracking data transformations and so on. So questions linger about whether transformed data can be trusted.

Data Governance

Data Governance Risk Metadata Management

Turnkey Cloud DataOps: Solution from Alation and Accenture

Alation

MARCH 22, 2022

As the latest iteration in this pursuit of high-quality data sharing, DataOps combines a range of disciplines. It synthesizes all we’ve learned about agile, data quality, and ETL/ELT. They can better understand data transformations, checks, and normalization. Transparency is key.

Metadata

Metadata Cost-Benefit Data Quality Data Lake

The Chief Marketing Officer and the CDO – A Modern Fable

Peter James Thomas

OCTOBER 30, 2018

It may well be that one thing that a CDO needs to get going is a data transformation programme. This may purely be focused on cultural aspects of how an organisation records, shares and otherwise uses data. It may be to build a new (or a first) Data Architecture.

Marketing

Marketing Strategy Data Architecture Data Strategy

Showpad accelerates data maturity to unlock innovation using Amazon QuickSight

AWS Big Data

APRIL 5, 2023

“Each of these tools were getting data from a different place, and that’s where it gets difficult,” says Jeroen Minnaert, head of data at Showpad. “If each tool tells a different story because it has different data, we won’t have alignment within the business on what this data means.”

Dashboards

Dashboards Reporting Cost-Benefit Visualization

Smart Factories: Artificial Intelligence and Automation for Reduced OPEX in Manufacturing

DataRobot Blog

MARCH 10, 2022

The first step in building a model that can predict machine failure and even recommend the next best course of action is to aggregate, clean, and prepare data to train against. This task may require complex joins, aggregations, filtering, window functions, and many other data transformations against extremely large-scale data sets.

Manufacturing

Manufacturing IoT Machine Learning Forecasting

The Rising Need for Data Governance in Healthcare

Alation

OCTOBER 28, 2021

This, in turn, empowers data leaders to better identify and develop new revenue streams, customize patient offerings, and use data to optimize operations. To make good on this potential, healthcare organizations need to understand their data and how they can use it. Why Is Data Governance in Healthcare Important?

Data Governance

Data Governance Measurement Data Quality Metrics

Self-Serve Data Preparation Doesn’t Mean Traditional ETL is Dead!

Smarten

JANUARY 4, 2018

Extract, Transform and Load (ETL) refers to a process of connecting to data sources, integrating data from various data sources, improving data quality, aggregating it and then storing it in staging data source or data marts or data warehouses for consumption of various business applications including BI, Analytics and Reporting.

Data Warehouse

Data Warehouse OLAP Data Governance Optimization

What is Data Mapping?

Jet Global

FEBRUARY 23, 2024

The quick and dirty definition of data mapping is the process of connecting different types of data from various data sources. Data mapping is a crucial step in data modeling and can help organizations achieve their business goals by enabling data integration, migration, transformation, and quality.

Data Warehouse

Data Warehouse Reporting Data Transformation Visualization

How DeNA Co., Ltd. accelerated anonymized data quality tests up to 100 times faster using Amazon Redshift Serverless and dbt

AWS Big Data

DECEMBER 17, 2024

Prevent the inclusion of invalid values in categorical data and process data without any data loss. Conduct data quality tests on anonymized data in compliance with data policies Conduct data quality tests to quickly identify and address data quality issues, maintaining high-quality data at all times.

Data Quality

Data Quality Testing Metrics Optimization
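
The two checks named in the excerpt above (no invalid values in categorical data, and no data loss during processing) can be sketched in plain Python. This is only an illustration under stated assumptions, not DeNA's implementation or dbt code; the function names, the `status` column, and the sample rows are hypothetical. The first check is similar in spirit to dbt's built-in `accepted_values` test.

```python
def invalid_categories(rows, column, accepted):
    """Return the rows whose value in `column` is not in the accepted set."""
    return [row for row in rows if row.get(column) not in accepted]


def no_rows_lost(source_rows, target_rows):
    """Check that processing kept the same number of rows."""
    return len(source_rows) == len(target_rows)


# Hypothetical sample data for illustration only.
source = [
    {"id": 1, "status": "active"},
    {"id": 2, "status": "inactive"},
    {"id": 3, "status": "actve"},  # typo: should be flagged
]
# Stand-in for a real transformation step that copies each row.
target = [dict(row) for row in source]

bad_rows = invalid_categories(target, "status", {"active", "inactive"})
print("invalid rows:", bad_rows)
print("row count preserved:", no_rows_lost(source, target))
```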

Unleashing GenAI — Ensuring Data Quality at Scale (Part 2)

Wayne Yaddow

MARCH 28, 2025

Transitioning from individual repository source systems to consolidated AI LLM pipelines, the importance of automated checks, end-to-end observability, and compliance with enterprise business rules. First: It is critical to set up a thorough data inventory and assessment procedure.

Data Quality

Data Quality Data Integration Data Governance Metadata

How BMW Group built a serverless terabyte-scale data transformation architecture with dbt and Amazon Athena

AWS Big Data

APRIL 29, 2025

While enabling organization-wide efficiency, the team also applied these principles to the data architecture, making sure that CLEA itself operates frugally. After evaluating various tools, we built a serverless data transformation pipeline using Amazon Athena and dbt. The Source stage maintains raw data in its original form.

Data Transformation

Data Transformation Cost-Benefit Testing Data Lake

Automating Data Warehouses in the Era of AI, Data Products and Data Lakehouses

BI-Survey

MARCH 6, 2025

For data management teams, achieving more with fewer resources has become a familiar challenge. While efficiency is a priority, data quality and security remain non-negotiable. Developing and maintaining data transformation pipelines are among the first tasks to be targeted for automation.

Data Warehouse

Data Warehouse Metadata Unstructured Data Data-driven

Building and operating data pipelines at scale using CI/CD, Amazon MWAA and Apache Spark on Amazon EMR by Wipro

AWS Big Data

FEBRUARY 25, 2025

If data mapping has been enabled within the data processing job, then the structured data is prepared based on the given schema. This output is passed to next phase where data transformations and business validations can be applied. After this step, data is loaded to specified target.

Data Processing

Data Processing Machine Learning Data-driven Cost-Benefit

“You Complete Me,” said Data Lineage to DataOps Observability.

DataKitchen

JANUARY 23, 2023

DataOps Observability includes monitoring and testing the data pipeline, data quality, data testing, and alerting. Data testing can be done through various methods, such as data profiling, Statistical Process Control, and quality checks. Are problems with data tests?

Testing

Testing Data Governance Data Quality Data-driven
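
Statistical Process Control, mentioned in the excerpt above as one data testing method, can be sketched as a control-limit check on a pipeline metric. The example below is a minimal illustration, assuming daily row counts as the metric and the conventional three-sigma limits; the function names and numbers are hypothetical and do not come from the article.

```python
from statistics import mean, stdev


def control_limits(baseline, sigmas=3.0):
    """Compute lower and upper control limits from a baseline sample."""
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation
    return center - sigmas * spread, center + sigmas * spread


def in_control(value, baseline, sigmas=3.0):
    """Return True when the new observation falls inside the limits."""
    lower, upper = control_limits(baseline, sigmas)
    return lower <= value <= upper


# Hypothetical row counts from earlier pipeline runs.
daily_row_counts = [1000, 1020, 980, 1010, 990, 1005, 995]

for todays_count in (1003, 400):
    status = "ok" if in_control(todays_count, daily_row_counts) else "ALERT"
    print(f"row count {todays_count}: {status}")
```

A check like this could run after each load and raise an alert when a value falls outside the limits, which is one simple way to combine the testing and alerting described in the excerpt.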
