Data Quality, Metadata and Reference

The Ultimate Guide to Modern Data Quality Management (DQM) For An Effective Data Quality Control Driven by The Right Metrics

datapine

SEPTEMBER 29, 2022

1) What Is Data Quality Management? 4) Data Quality Best Practices. 5) How Do You Measure Data Quality? 6) Data Quality Metrics Examples. 7) Data Quality Control: Use Case. 8) The Consequences Of Bad Data Quality. 9) 3 Sources Of Low-Quality Data.

Data Quality

Data Quality Metrics Data-driven Management

Manage concurrent write conflicts in Apache Iceberg on the AWS Glue Data Catalog

AWS Big Data

APRIL 8, 2025

Concurrent UPDATE/DELETE on overlapping partitions When multiple processes attempt to modify the same partition simultaneously, data conflicts can arise. For example, imagine a data quality process updating customer records with corrected addresses while another process is deleting outdated customer records.

Snapshot

Snapshot Management Metadata Big Data

Amazon DataZone now integrates with AWS Glue Data Quality and external data quality solutions

AWS Big Data

APRIL 3, 2024

Today, we are pleased to announce that Amazon DataZone is now able to present data quality information for data assets. Other organizations monitor the quality of their data through third-party solutions. Additionally, Amazon DataZone now offers APIs for importing data quality scores from external systems.

Data Quality

Data Quality Visualization Metadata Metrics


RDF-Star: Metadata Complexity Simplified

Ontotext

JUNE 10, 2021

This is a graph of millions of edges and vertices – in enterprise data management terms it is a giant piece of master/reference data. Not Every Graph is a Knowledge Graph: Schemas and Semantic Metadata Matter. They aren’t concerned with publishing or integrating data. open-world vs. closed-world assumptions).

Metadata

Metadata Cost-Benefit OLAP Modeling

Implement data quality checks on Amazon Redshift data assets and integrate with Amazon DataZone

AWS Big Data

AUGUST 15, 2024

Data quality is crucial in data pipelines because it directly impacts the validity of the business insights derived from the data. Today, many organizations use AWS Glue Data Quality to define and enforce data quality rules on their data at rest and in transit.

Data Quality

Data Quality Visualization Metadata Key Performance Indicator

Best Practices for Metadata Management

Alation

JULY 19, 2021

What Is Metadata? Metadata is information about data. A clothing catalog or dictionary are both examples of metadata repositories. Indeed, a popular online catalog, like Amazon, offers rich metadata around products to guide shoppers: ratings, reviews, and product details are all examples of metadata.

Metadata

Metadata Management Data Governance Machine Learning

Use open table format libraries on AWS Glue 5.0 for Apache Spark

AWS Big Data

DECEMBER 4, 2024

These formats, exemplified by Apache Iceberg, Apache Hudi, and Delta Lake, address persistent challenges in traditional data lake structures by offering an advanced combination of flexibility, performance, and governance capabilities. For more details, refer to Iceberg Release 1.6.1. We highlight its notable updates in this section.

Snapshot

Snapshot Metadata Data Lake Optimization

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

AWS Big Data

DECEMBER 4, 2024

SageMaker still includes all the existing ML and AI capabilities you’ve come to know and love for data wrangling, human-in-the-loop data labeling with Amazon SageMaker Ground Truth, experiments, MLOps, Amazon SageMaker HyperPod managed distributed training, and more. Having confidence in your data is key.

Data Analytics

Data Analytics Analytics Data Lake Data Quality

Perform data parity at scale for data modernization programs using AWS Glue Data Quality

AWS Big Data

OCTOBER 9, 2024

Some customers build custom in-house data parity frameworks to validate data during migration. Others use open source data quality products for data parity use cases. This takes away important person hours from the actual migration effort into building and maintaining a data parity framework.

Data Quality

Data Quality Data Lake Data Warehouse Metrics

Alation 2022.2: Open Data Quality Initiative and Enhanced Data Governance

Alation

MAY 24, 2022

generally available on May 24, Alation introduces the Open Data Quality Initiative for the modern data stack, giving customers the freedom to choose the data quality vendor that’s best for them with the added confidence that those tools will integrate seamlessly with Alation’s Data Catalog and Data Governance application.

Data Quality

Data Quality Data Governance Metadata Metrics

Data Intelligence and Its Role in Combating Covid-19

erwin

MARCH 30, 2020

To marry the epidemiological data to the population data it will require a tremendous amount of data intelligence about the: Source of the data; Currency of the data; Quality of the data; and. Unraveling Data Complexities with Metadata Management. Data lineage to support impact analysis.

Metadata

Metadata IT Data Governance Data Quality

Alation and Salesforce partner on data governance for Data Cloud

CIO Business Intelligence

SEPTEMBER 19, 2024

It will do this, it said, with bidirectional integration between its platform and Salesforce’s to seamlessly deliver data governance and end-to-end lineage within Salesforce Data Cloud. Additional to that, we are also allowing the metadata inside of Alation to be read into these agents.”

Data Governance

Data Governance Metadata Unstructured Data Structured Data

Deep automation in machine learning

O'Reilly on Data

DECEMBER 19, 2018

Anomaly detection is well-known in the financial industry, where it’s frequently used to detect fraudulent transactions, but it can also be used to catch and fix data quality issues automatically. If you suddenly see unexpected patterns in your social data, that may mean adversaries are attempting to poison your data sources.

Machine Learning

Machine Learning Software Metadata Testing

Benefits of Data Dictionary Tools for Enterprise Metadata Management

Octopai

FEBRUARY 12, 2020

Like any good puzzle, metadata management comes with a lot of complex variables. That’s why you need to use data dictionary tools, which can help organize your metadata into an archive that can be navigated with ease and from which you can derive good information to power informed decision-making. Why Have a Data Dictionary? #1

Metadata

Metadata Enterprise Management Data Warehouse

What is data governance? Best practices for managing data assets

CIO Business Intelligence

MARCH 24, 2023

The Business Application Research Center (BARC) warns that data governance is a highly complex, ongoing program, not a “big bang initiative,” and it runs the risk of participants losing trust and interest over time. The program must introduce and support standardization of enterprise data.

Data Governance

Data Governance Management Metadata Data Quality

The Gold Standard – The Key to Information Extraction and Data Quality Control

Ontotext

MAY 26, 2021

Without all this background knowledge, before computers can perform like humans, they need a machine-readable point of reference that represents “the ground truth”. One of the main uses of the Gold Standard is to train AI systems to identify the patterns in various types of data with the help of machine learning (ML) algorithms.

Data Quality

Data Quality Machine Learning Measurement Metadata

Data integrity vs. data quality: Is there a difference?

IBM Big Data Hub

JULY 13, 2023

When we talk about data integrity, we’re referring to the overarching completeness, accuracy, consistency, accessibility, and security of an organization’s data. Together, these factors determine the reliability of the organization’s data. In short, yes.

Data Quality

Data Quality Data Integration Metadata Cost-Benefit

Informatica Embraces AI for Data Intelligence and Operations

David Menninger's Analyst Perspectives

MAY 8, 2025

It expanded its focus to address wider data integration and data management challenges, including master data management, data quality and data governance. The latter was boosted by the company’s most recent acquisition, adding the data management access and privacy capabilities of Privitar in 2023.

Data Quality

Data Quality Data Governance Data Integration Software

Data governance in the age of generative AI

AWS Big Data

FEBRUARY 29, 2024

Implement data privacy policies. Implement data quality by data type and source. Let’s look at some of the key changes in the data pipelines, namely data cataloging, data quality, and vector embedding security, in more detail. Link structured and unstructured datasets.

Data Governance

Data Governance Unstructured Data Metadata Data Lake

Maximize your data dividends with active metadata

IBM Big Data Hub

NOVEMBER 28, 2022

Metadata management performs a critical role within the modern data management stack. It helps blur data silos, and empowers data and analytics teams to better understand the context and quality of data. This, in turn, builds trust in data and the decision-making to follow. Improve data discovery.

Metadata

Metadata Data Quality Data-driven Data Governance

Top 10 Data Governance Trends for 2020: Data’s Real Value Comes Into Focus

erwin

JANUARY 3, 2020

As organizations become data-driven and awash in an overwhelming amount of data from multiple data sources (AI, IoT, ML, etc.), they will find new ways to get a handle on data quality and focus on data management processes and best practices.

Data Governance

Data Governance Digital Transformation IoT Metadata

The Need For Personalized Data Journeys for Your Data Consumers

DataKitchen

OCTOBER 20, 2023

Deploying a Data Journey Instance unique to each customer’s payload is vital to fill this gap. Such an instance answers the critical question of ‘Dude, Where is my data?’ while maintaining operational efficiency and ensuring data quality, thus preserving customer satisfaction and the team’s credibility.

Insurance

Insurance Metadata Data-driven Data Quality

Metadata Management and Data Governance with Cloudera SDX

Cloudera

JANUARY 26, 2024

In this article, we will walk you through the process of implementing fine-grained access control for the data governance framework within the Cloudera platform. In a good data governance strategy, it is important to define roles that allow the business to limit the level of access that users can have to their strategic data assets.

Metadata

Metadata Data Governance Management Finance

What an Old Dictionary teaches us about Metadata

Jim Harris

MAY 5, 2017

Spelling, pronunciation, and examples of usage are included in the dictionary definition of a word, which is a good example of one of the many uses of metadata, namely to provide a definition, description, and context for data. In practice, I haven’t encountered a metadata dictionary that could deliver on that promise.

Metadata

Metadata Publishing Management IT

Bridging the Gap: How ‘Data in Place’ and ‘Data in Use’ Define Complete Data Observability

DataKitchen

SEPTEMBER 21, 2023

It’s not just about playing detective to discover where things went wrong; it’s about proactively monitoring your entire data journey to ensure everything goes right with your data. What is Data in Place? There are multiple locations where problems can happen in a data and analytic system.

Testing

Testing Data Quality Predictive Modeling Metrics

Constructing A Digital Transformation Strategy: Putting the Data in Digital Transformation

erwin

JULY 17, 2019

Your organization won’t be able to take complete advantage of analytics tools to become data-driven unless you establish a foundation for agile and complete data management. You need automated data mapping and cataloging through the integration lifecycle process, inclusive of data at rest and data in motion.

Digital Transformation

Digital Transformation Strategy Metadata Data-driven

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

AWS Big Data

DECEMBER 4, 2024

It’s the preferred choice when customers need more control and customization over the data integration process or require complex transformations. This flexibility makes Glue ETL suitable for scenarios where data must be transformed or enriched before analysis. Review the details and choose Create and launch integration.

Data Integration

Data Integration Data Lake Statistics Data-driven

A Day in the Life of a DataOps Engineer

DataKitchen

OCTOBER 11, 2021

Based on business rules, additional data quality tests check the dimensional model after the ETL job completes. While implementing a DataOps solution, we make sure that the pipeline has enough automated tests to ensure data quality and reduce the fear of failure. Monitoring Job Metadata.

Testing

Testing Metadata Dashboards Statistics

Governing data in relational databases using Amazon DataZone

AWS Big Data

MAY 7, 2024

Figure 1: Flow of actions for self-service analytics around data assets stored in relational databases First, the data producer needs to capture and catalog the technical metadata of the data asset. The producer also needs to manage and publish the data asset so it’s discoverable throughout the organization.

Metadata

Metadata Data Lake Data Processing Data-driven

How HPE Aruba Supply Chain optimized cost and performance by migrating to an AWS modern data architecture

AWS Big Data

SEPTEMBER 11, 2024

This also includes building an industry-standard integrated data repository as a single source of truth, operational reporting through real-time metrics, data quality monitoring, 24/7 helpdesk, and revenue forecasting through financial projections and supply availability projections.

Data Architecture

Data Architecture Optimization Data Warehouse Metadata

AWS Lake Formation 2023 year in review

AWS Big Data

JANUARY 18, 2024

Easily and securely prepare, share, and query data – This session shows how you can use Lake Formation and the AWS Glue Data Catalog to share data without copying, transform and prepare data without coding, and query data. DataZone automatically manages the permissions of your shared data in the DataZone projects.

Data Lake

Data Lake Metadata Data Governance Statistics

The Semantic Web: 20 Years And a Handful of Enterprise Knowledge Graphs Later

Ontotext

JULY 29, 2021

KGs bring the Semantic Web paradigm to the enterprises, by introducing semantic metadata to drive data management and content management to new levels of efficiency and breaking silos to let them synergize with various forms of knowledge management. The RDF data model and the other standards in W3C’s Semantic Web stack (e.g.,

Enterprise

Enterprise Metadata Knowledge Discovery Management

What is BCBS 239 Compliance?

Octopai

JANUARY 19, 2020

BCBS 239 is a document published by that committee entitled, Principles for Effective Risk Data Aggregation and Risk Reporting. (You can see why it’s referred to by number and not by the title.) BCBS 239 and Automated Metadata Management Tools. You may recognize the common thread running through all of these principles: Metadata.

Metadata

Metadata Risk Management Business Intelligence Data Governance

The Value of Catalog-Led Data Governance

Alation

NOVEMBER 4, 2021

The practitioner asked me to add something to a presentation for his organization: the value of data governance for things other than data compliance and data security. Now to be honest, I immediately jumped onto data quality. Data quality is a very typical use case for data governance.

Data Governance

Data Governance Metadata Data Quality Enterprise

How gaming companies can use Amazon Redshift Serverless to build scalable analytical applications faster and easier

AWS Big Data

MARCH 7, 2023

Flexible and easy to use – The solutions should provide less restrictive, easy-to-access, and ready-to-use data. And unlike data warehouses, which are primarily analytical stores, a data hub is a combination of all types of repositories—analytical, transactional, operational, reference, and data I/O services, along with governance processes.

Analytics

Analytics Data Warehouse Data Lake Metadata

What you need to know about product management for AI

O'Reilly on Data

MARCH 31, 2020

There may even be someone on your team who built a personalized video recommender before and can help scope and estimate the project requirements using that past experience as a point of reference. You might have millions of short videos, with user ratings and limited metadata about the creators or content.

Management

Management Machine Learning Experimentation Metrics

Best Practices for Data Catalog Implementation

Octopai

JUNE 19, 2023

In an era where data is often referred to as the new oil, having a well-organized and easily accessible data catalog is no longer a luxury but a necessity as organizations deal with the deluge of too much data (data bloatedness) coming from every system and landscape.

Metadata

Metadata Data Governance Measurement Risk Management

Metadata Management & Data Governance with Cloudera SDX

Cloudera

MARCH 4, 2024

In this article, we will walk you through the process of implementing fine-grained access control for the data governance framework within the Cloudera platform. In a good data governance strategy, it is important to define roles that allow the business to limit the level of access that users can have to their strategic data assets.

Metadata

Metadata Data Governance Management Finance

Clean up your Excel and CSV files without writing code using AWS Glue DataBrew

AWS Big Data

NOVEMBER 15, 2023

As the organization receives data from multiple external vendors, it often arrives in different formats, typically Excel or CSV files, with each vendor using their own unique data layout and structure. DataBrew is an excellent tool for data quality and preprocessing. For Matching conditions, choose Match all conditions.

Metadata

Metadata Sales Data Lake Big Data

Introducing Amazon MWAA support for Apache Airflow version 2.7.2 and deferrable operators

AWS Big Data

NOVEMBER 6, 2023

Refer to the Configuration reference in the User Guide for detailed configuration values. The following graph describes a simple data quality check pipeline using setup and teardown tasks. To learn more about Setup and Teardown tasks, refer to the Apache Airflow documentation.

Metrics

Metrics Metadata Snapshot Management

How ATPCO enables governed self-service data access to accelerate innovation with Amazon DataZone

AWS Big Data

JULY 25, 2024

Amazon DataZone provides rich functionality to help a data platform team distribute ownership of tasks so that these teams can choose to operate less like gatekeepers. In Amazon DataZone, data owners can publish their data and its business catalog (metadata) to ATPCO’s DataZone domain. Choose Next.

Data Lake

Data Lake Metadata Sales Publishing

Seamless integration of data lake and data warehouse using Amazon Redshift Spectrum and Amazon DataZone

AWS Big Data

AUGUST 15, 2024

This streamlined architecture approach offers several advantages: Single source of truth – The Central IT team acts as the custodian of the combined and curated data from all business units, thereby providing a unified and consistent dataset. Similarly, individual business units produce their own domain-specific data.

Data Lake

Data Lake Data Warehouse Data Governance Publishing

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

A Gartner Marketing survey found only 14% of organizations have successfully implemented a C360 solution, due to lack of consensus on what a 360-degree view means, challenges with data quality, and lack of cross-functional governance structure for customer data. Then, you transform this data into a concise format.

Data Strategy

Data Strategy Strategy Data Warehouse Prescriptive Analytics

Data Lakes on Cloud & it’s Usage in Healthcare

BizAcuity

MARCH 29, 2019

Load data into staging, perform data quality checks, clean and enrich it, steward it, and run reports on it, completing the full management cycle. Numbers are only good if the data quality is good. To get in-depth knowledge of the practices mentioned above, please refer to the blog on Oracle’s webpage.

Data Lake

Data Lake Unstructured Data Cost-Benefit Data Quality
