Announcing DataOps Data Quality TestGen 3.0: Open-Source, Generative Data Quality Software. It assesses your data, deploys production testing, monitors progress, and helps you build a constituency within your company for lasting change. New: Quality Dashboard & Score Explorer.
1) What Is Data Quality Management? 4) Data Quality Best Practices. 5) How Do You Measure Data Quality? 6) Data Quality Metrics Examples. 7) Data Quality Control: Use Case. 8) The Consequences Of Bad Data Quality. 9) 3 Sources Of Low-Quality Data.
As technology and business leaders, your strategic initiatives, from AI-powered decision-making to predictive insights and personalized experiences, are all fueled by data. Yet, despite growing investments in advanced analytics and AI, organizations continue to grapple with a persistent and often underestimated challenge: poor data quality.
Once the province of the data warehouse team, data management has increasingly become a C-suite priority, with data quality seen as key for both customer experience and business performance. But along with siloed data and compliance concerns, poor data quality is holding back enterprise AI projects.
Metadata management is key to wringing all the value possible from data assets. However, most organizations don’t use all the data at their disposal to reach deeper conclusions about how to drive revenue, achieve regulatory compliance or accomplish other strategic objectives. What Is Metadata? Harvest data.
Today, we are pleased to announce that Amazon DataZone is now able to present data quality information for data assets. Other organizations monitor the quality of their data through third-party solutions. Additionally, Amazon DataZone now offers APIs for importing data quality scores from external systems.
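For illustration only, here is a minimal sketch of pushing an externally computed score into DataZone with boto3's post_time_series_data_points call. The domain and asset identifiers, form name, type identifier, and content schema below are all placeholder assumptions; check the DataZone API reference for the exact names before use.

```python
import json
from datetime import datetime, timezone
import boto3

client = boto3.client("datazone")

# All identifiers below are hypothetical placeholders.
client.post_time_series_data_points(
    domainIdentifier="dzd_example123",      # assumed domain id
    entityIdentifier="asset_example456",    # assumed asset id
    entityType="ASSET",
    forms=[
        {
            "formName": "quality_score",    # assumed form name
            "typeIdentifier": "amazon.datazone.DataQualityResultFormType",  # assumed
            "timestamp": datetime.now(timezone.utc),
            "content": json.dumps({"passingPercentage": 94.0}),  # assumed schema
        }
    ],
)
```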
A catalog or a database that lists models, including when they were tested, trained, and deployed. A catalog of validation data sets and the accuracy measurements of stored models. Versioning (of models, feature vectors, data) and the ability to roll out, roll back, or have multiple live versions.
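A minimal in-memory sketch of such a catalog, for illustration only; a real registry would persist to a database and track feature-vector and data versions as well, and the model names and metrics below are invented:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class ModelRecord:
    """One catalog entry: when the model was trained, what it was validated on."""
    name: str
    version: str
    trained_at: datetime
    validation_set: str = ""           # which validation data set was used
    accuracy: Optional[float] = None   # stored accuracy measurement
    deployed_at: Optional[datetime] = None

class ModelCatalog:
    """Tracks every version so you can roll out, roll back, or keep several live."""
    def __init__(self) -> None:
        self._records: dict[tuple, ModelRecord] = {}
        self._live: dict[str, list] = {}   # model name -> stack of live versions

    def register(self, rec: ModelRecord) -> None:
        self._records[(rec.name, rec.version)] = rec

    def roll_out(self, name: str, version: str) -> None:
        self._live.setdefault(name, []).append(version)

    def roll_back(self, name: str) -> None:
        if self._live.get(name):
            self._live[name].pop()   # revert to the previous live version

catalog = ModelCatalog()
catalog.register(ModelRecord("churn", "1.2.0", datetime.now(timezone.utc),
                             validation_set="churn_holdout_2024q1", accuracy=0.91))
catalog.roll_out("churn", "1.2.0")
```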
Know thy data: understand what it is (formats, types, sampling, who, what, when, where, why), encourage the use of data across the enterprise, and enrich your datasets with searchable (semantic and content-based) metadata (labels, annotations, tags). Test early and often. Test and refine the chatbot.
Software developers have a large body of tools to choose from: IDEs, CI/CD tools, automated testing tools, and so on. Equivalent tools for machine learning are only starting to exist; one big task over the next two years is developing the IDEs for machine learning, plus other tools for data management, pipeline management, data cleaning, data provenance, and data lineage.
In the context of Data in Place, validating data quality automatically with Business Domain Tests is imperative for ensuring the trustworthiness of your data assets. Running these automated tests as part of your DataOps and Data Observability strategy allows for early detection of discrepancies or errors.
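As a hedged illustration of what such a business domain test can look like, here is a minimal PyTest sketch; the file paths and column names (warehouse/orders.parquet, order_total, customer_id) are hypothetical stand-ins for your own assets:

```python
import pandas as pd

# Hypothetical loaders; replace with your own warehouse queries.
def load_orders() -> pd.DataFrame:
    return pd.read_parquet("warehouse/orders.parquet")

def load_customers() -> pd.DataFrame:
    return pd.read_parquet("warehouse/customers.parquet")

def test_order_totals_are_non_negative():
    orders = load_orders()
    assert (orders["order_total"] >= 0).all(), "found orders with negative totals"

def test_every_order_has_a_known_customer():
    orders, customers = load_orders(), load_customers()
    orphans = set(orders["customer_id"]) - set(customers["customer_id"])
    assert not orphans, f"orders reference unknown customers: {sorted(orphans)[:5]}"
```

Run with `pytest` in CI so a failing domain rule blocks the pipeline before bad data reaches consumers.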
Some customers build custom in-house data parity frameworks to validate data during migration. Others use open-source data quality products for data parity use cases. Either way, building and maintaining a data parity framework diverts important person-hours away from the actual migration effort.
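A minimal sketch of the kind of parity check such a framework performs, assuming (purely for illustration) that both sides fit in pandas DataFrames and share a key column:

```python
import pandas as pd

def parity_report(source: pd.DataFrame, target: pd.DataFrame, key: str) -> dict:
    """Compare row counts, key coverage, and per-column fingerprints
    between a source table and its migrated copy."""
    report = {"source_rows": len(source), "target_rows": len(target)}
    report["missing_keys"] = set(source[key]) - set(target[key])
    report["extra_keys"] = set(target[key]) - set(source[key])
    # Cheap per-column fingerprint: hash every value and sum the hashes,
    # so row-ordering differences between systems do not matter.
    for col in source.columns.intersection(target.columns):
        src_sum = pd.util.hash_pandas_object(source[col], index=False).sum()
        tgt_sum = pd.util.hash_pandas_object(target[col], index=False).sum()
        report[f"{col}_match"] = bool(src_sum == tgt_sum)
    return report
```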
The data engineer then emails the BI team, who refresh a Tableau dashboard (Figure 1: example data pipeline with manual processes). There are no automated tests, so errors frequently pass through the pipeline. By contrast, a well-instrumented pipeline has automated tests at each step, making sure that each step completes successfully.
The purpose of this article is to provide a model to conduct a self-assessment of your organization’s data environment when preparing to build your Data Governance program. Take the […].
Data Pipeline Observability: Optimizes pipelines by monitoring data quality, detecting issues, tracing data lineage, and identifying anomalies using live and historical metadata. This capability includes monitoring, logging, and business-rule detection.
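As a hedged sketch of the historical-metadata idea: compare today's load metric against the distribution of past runs and flag outliers. The row-count metric and the three-sigma band are illustrative choices, not a prescribed method:

```python
from statistics import mean, stdev

def is_anomalous(todays_count: int, history: list, sigmas: float = 3.0) -> bool:
    """Flag a load whose row count falls outside the historical band.
    `history` holds the row count recorded for each previous run."""
    if len(history) < 5:          # not enough history to judge
        return False
    mu, sd = mean(history), stdev(history)
    if sd == 0:
        return todays_count != mu
    return abs(todays_count - mu) > sigmas * sd

# Example: past loads hovered near 10k rows; today only 1,200 arrived.
history = [9_800, 10_050, 9_950, 10_120, 9_870, 10_010]
print(is_anomalous(1_200, history))  # True -> alert and halt downstream jobs
```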
If you are not observing and reacting to the data, the model will accept every variant, and it may end up among the more than 50% of models that, according to Gartner, never make it to production, because there are no clear insights and the results have nothing to do with the original intent of the model.
If you’ve been following DataKitchen at all, you know we are all about transferring software development methods to data analytics. The organizational concepts behind data mesh are summarized as follows. A five- to nine-person team owns the dev, test, deployment, monitoring, and maintenance of a domain.
The Business Application Research Center (BARC) warns that data governance is a highly complex, ongoing program, not a “big bang initiative,” and it runs the risk of participants losing trust and interest over time. The program must introduce and support standardization of enterprise data.
Alation and Soda are excited to announce a new partnership, which will bring powerful data-quality capabilities into the data catalog. Soda’s data observability platform empowers data teams to discover and collaboratively resolve data issues quickly. Do we have end-to-end data pipeline control?
Deploying a Data Journey Instance unique to each customer’s payload is vital to fill this gap. Such an instance answers the critical question of ‘Dude, where is my data?’ while maintaining operational efficiency and ensuring data quality, thus preserving customer satisfaction and the team’s credibility.
In the above case of merging information about companies from different data sources, data linking helps us encode the real-world business logic into data linking rules. But before any larger-scale implementation of these rules, we have to test their validity. How does the Gold Standard help data linking?
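Concretely, testing linking rules against a gold standard usually reduces to precision and recall over record pairs. A minimal sketch, with made-up company pairs standing in for real labeled data:

```python
def evaluate_linking(predicted: set, gold: set) -> dict:
    """Score predicted record pairs against a hand-labeled gold standard
    of true matches."""
    tp = len(predicted & gold)               # correctly linked pairs
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

gold = {("acme-us", "acme-inc"), ("globex", "globex-llc")}
predicted = {("acme-us", "acme-inc"), ("initech", "globex-llc")}
print(evaluate_linking(predicted, gold))
# {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```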
In this post, we’ll see the fundamental procedures, tools, and techniques that data engineers, data scientists, and QA/testing teams use to ensure high-quality data as soon as it’s deployed. First, we look at how unit and integration tests uncover transformation errors at an early stage (PyTest, JUnit, NUnit).
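For instance, a unit test for a transformation might look like the following PyTest sketch; normalize_emails is a hypothetical transformation invented for illustration:

```python
import pandas as pd

def normalize_emails(df: pd.DataFrame) -> pd.DataFrame:
    """Transformation under test: strip and lowercase e-mail addresses."""
    out = df.copy()
    out["email"] = out["email"].str.strip().str.lower()
    return out

def test_normalize_emails_handles_mixed_case_and_whitespace():
    raw = pd.DataFrame({"email": ["  Alice@Example.COM ", "bob@example.com"]})
    result = normalize_emails(raw)
    assert result["email"].tolist() == ["alice@example.com", "bob@example.com"]
```

An integration test would apply the same assertions to the transformation's output in a staging environment rather than to an in-memory frame.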
Metadata enrichment is about scaling the onboarding of new data into a governed data landscape by taking data and applying the appropriate business terms, data classes and quality assessments so it can be discovered, governed and utilized effectively. Scalability and elasticity.
Data transformations and data conversions are crucial to ensure that raw data is organized, processed, and ready for useful analysis. However, these two processes are essentially distinct, and their testing needs differ in many ways.
Manually upgrading, testing, and deploying over 5,000 jobs every few quarters was time-consuming, error-prone, costly, and not sustainable. EDLS job steps and metadata: every EDLS job comprises one or more job steps chained together and run in a predefined order orchestrated by the custom ETL framework.
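The EDLS framework itself is custom and not shown here; as a generic sketch of the chained-step idea only, each step consumes the context produced by its predecessor and raises on failure so the orchestrator can stop the chain:

```python
from typing import Callable

Step = Callable[[dict], dict]

def run_job(steps: list, context: dict) -> dict:
    """Run job steps in their predefined order; each step receives the
    context produced by the previous one and must raise on failure."""
    for step in steps:
        context = step(context)
    return context

def extract(ctx):   ctx["rows"] = ["r1", "r2"]; return ctx
def transform(ctx): ctx["rows"] = [r.upper() for r in ctx["rows"]]; return ctx
def load(ctx):      print(f"loaded {len(ctx['rows'])} rows"); return ctx

run_job([extract, transform, load], {"job": "daily_sales"})
```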
Automated data enrichment: To create the knowledge catalog, you need automated data stewardship services. These services include the ability to auto-discover and classify data, to detect sensitive information, to analyze data quality, to link business terms to technical metadata and to publish data to the knowledge catalog.
A data catalog benefits organizations in a myriad of ways. With the right data catalog tool, organizations can automate enterprise metadata management – including data cataloging, data mapping, data quality and code generation – for faster time to value and greater accuracy for data movement and/or deployment projects.
While everyone may subscribe to the same design decisions and agree on an ontology, there may be differences in the dataquality. In such situations, data must be validated. Let’s build a very simple test dataset. For our test set, we can simply use sh:path. Instead, they provide metadata about the shapes.
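To make the validation step concrete, here is a minimal sketch using the pyshacl library; the ex: namespace, the Company class, and the shape itself are invented for illustration and are not the article's actual dataset:

```python
from rdflib import Graph
from pyshacl import validate   # pip install pyshacl

shapes_ttl = """
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix ex:  <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:CompanyShape a sh:NodeShape ;
    sh:targetClass ex:Company ;
    sh:property [ sh:path ex:name ; sh:minCount 1 ; sh:datatype xsd:string ] .
"""

data_ttl = """
@prefix ex: <http://example.org/> .
ex:acme a ex:Company .          # violates the shape: no ex:name
"""

conforms, _, report_text = validate(
    Graph().parse(data=data_ttl, format="turtle"),
    shacl_graph=Graph().parse(data=shapes_ttl, format="turtle"),
)
print(conforms)      # False
print(report_text)   # human-readable violation report
```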
In the modern data stack, dbt is a key tool to make data ready for analysis. Data analysts and engineers use dbt to transform, test, and document data in the cloud data warehouse. Yet every dbt transformation contains vital metadata that is not captured – until now. Conclusion.
The model outputs produced by the same code will vary with changes to things like the size of the training data (number of labeled examples), network training parameters, and training run time. This has serious implications for software testing, versioning, deployment, and other core development processes.
DataOps is an approach to best practices for data management that increases the quantity of data analytics products a data team can develop and deploy in a given time while drastically improving the level of data quality. SPC (statistical process control) is the continuous testing of the results of automated manufacturing processes.
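A minimal sketch of SPC applied to a pipeline metric, assuming a Shewhart-style three-sigma band around a baseline of known-healthy runs; the error-rate metric and the numbers are illustrative, not prescribed:

```python
from statistics import mean, stdev

def control_limits(samples: list, sigmas: float = 3.0) -> tuple:
    """Control limits derived from in-control historical samples."""
    mu, sd = mean(samples), stdev(samples)
    return mu - sigmas * sd, mu + sigmas * sd

# Historical error rates from runs known to be healthy:
baseline = [0.011, 0.009, 0.010, 0.012, 0.008, 0.010]
low, high = control_limits(baseline)

todays_error_rate = 0.031
if not (low <= todays_error_rate <= high):
    print("Out of control: stop the line and investigate before publishing.")
```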
Figure 1: Flow of actions for self-service analytics around data assets stored in relational databases First, the data producer needs to capture and catalog the technical metadata of the data asset. The producer also needs to manage and publish the data asset so it’s discoverable throughout the organization.
The companies that are most successful at marketing in both B2C and B2B are using data and online BI tools to craft hyper-specific campaigns that reach out to targeted prospects with a curated message. Everything is being tested, and then the campaigns that succeed get more money put into them, while the others aren’t repeated.
This also includes building an industry-standard integrated data repository as a single source of truth, operational reporting through real-time metrics, data quality monitoring, a 24/7 helpdesk, and revenue forecasting through financial projections and supply availability projections.
The Art of Service recommends candidates spend a minimum of 18 hours on the course to pass the certification test. It consists of three separate, 90-minute exams: the Information Systems (IS) Core exam, the Data Management Core exam, and the Specialty exam. How to prepare: The fee includes an online training program and PDF textbook.
With in-place table migration, you can rapidly convert to Iceberg tables since there is no need to regenerate data files. Only metadata will be regenerated. Newly generated metadata will then point to the source data files. Data quality using table rollback.
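As a hedged sketch of what this looks like from Spark, using Iceberg's documented migrate and rollback_to_snapshot procedures; the catalog name, table name, and snapshot id are placeholders, and an Iceberg-enabled Spark runtime is assumed:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-migrate").getOrCreate()

# In-place migration: existing data files stay where they are and
# only Iceberg metadata is generated to point at them.
spark.sql("CALL spark_catalog.system.migrate('db.events')")

# Every commit becomes a snapshot, which is what enables rollback.
spark.sql("SELECT snapshot_id, committed_at FROM db.events.snapshots").show()

# If a bad load lands, roll the table back to a known-good snapshot
# (123456789 is a placeholder snapshot id).
spark.sql("CALL spark_catalog.system.rollback_to_snapshot('db.events', 123456789)")
```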
Bergh added, “DataOps is part of the data fabric. You should use DataOps principles to build and iterate and continuously improve your Data Fabric. Automate the data collection and cleansing process.” Education is the biggest challenge. Take a show-me approach.
“Most enterprise data is unstructured and semi-structured documents and code, as well as images and video. For example, gen AI can be used to extract metadata from documents, create indexes of information and knowledge graphs, and to query, summarize, and analyze this data. But sometimes there isn’t enough data,” says Thurai.
To ensure the stability of the US financial system, the implementation of advanced liquidity risk models and stress testing using ML/AI could potentially serve as a protective measure. However, because most institutions lack a modern data architecture, they struggle to manage, integrate and analyze financial data at pace.
They make testing and learning a part of that process. Using this methodology, teams will test new processes, monitor performance, and adjust based on results. In an approach based on continuous improvement, organizations must identify key assets so their metadata can be ingested and analyzed. Monitor and Measure Curation.
The Data Fabric paradigm combines design principles and methodologies for building efficient, flexible and reliable data management ecosystems. Knowledge Graphs are the Warp and Weft of a Data Fabric. To implement any Data Fabric approach, it is essential to be able to understand the context of data.
Data has become an invaluable asset for businesses, offering critical insights to drive strategic decision-making and operational optimization. The business end-users were given a tool to discover data assets produced within the mesh and seamlessly self-serve on their data sharing needs.
What Is Data Intelligence? Data intelligence is a system to deliver trustworthy, reliable data. It includes intelligence about data, or metadata. IDC coined the term, stating, “data intelligence helps organizations answer six fundamental questions about data.” Yet finding data is just the beginning.
The right self-serve data prep solution can provide easy-to-use yet sophisticated data prep tools that are suitable for your business users, and enable data preparation techniques like: Connect and Mash Up, Auto-Suggesting Relationships, JOINs and Types, Sampling and Outliers, Exploration, Cleaning, Shaping, Reducing and Combining Data, and Data Insights (Data Quality […]
How dbt Core helps data teams test, validate, and monitor complex data transformations and conversions. dbt Core, an open-source framework for developing, testing, and documenting SQL-based data transformations, has become a must-have tool for modern data teams as the complexity of data pipelines grows.
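Since dbt-core 1.5, those tests can also be invoked programmatically rather than from the CLI. A minimal sketch, where the --select value "orders" is a hypothetical model name and a configured adapter (for example dbt-duckdb) is assumed:

```python
# pip install dbt-core plus an adapter, e.g. dbt-duckdb
from dbt.cli.main import dbtRunner

runner = dbtRunner()

# Run only the tests defined for the selected model: schema tests such as
# not_null / unique in .yml files, plus any singular SQL tests.
result = runner.invoke(["test", "--select", "orders"])

print(result.success)   # True if every selected test passed
```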