1) What Is Data Quality Management? 4) Data Quality Best Practices. 5) How Do You Measure Data Quality? 6) Data Quality Metrics Examples. 7) Data Quality Control: Use Case. 8) The Consequences Of Bad Data Quality. 9) 3 Sources Of Low-Quality Data.
As technology and business leaders, you know that strategic initiatives, from AI-powered decision-making to predictive insights and personalized experiences, are all fueled by data. Yet, despite growing investments in advanced analytics and AI, organizations continue to grapple with a persistent and often underestimated challenge: poor data quality.
Concurrent UPDATE/DELETE on overlapping partitions: when multiple processes attempt to modify the same partition simultaneously, data conflicts can arise. For example, imagine a data quality process updating customer records with corrected addresses while another process deletes outdated customer records.
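To make the failure mode concrete, here is a minimal sketch assuming an Apache Iceberg table accessed through PySpark; the catalog, table, and column names (and the address_fixes source) are hypothetical, and the retry loop shows how the losing writer recovers from an optimistic-concurrency conflict.

```python
# Hypothetical sketch: two writers touching the same partitions of one table.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # assumes an Iceberg catalog is configured

def correct_addresses():
    # Data quality process: patch customer rows with corrected addresses.
    spark.sql("""
        MERGE INTO demo.db.customers t
        USING address_fixes s
        ON t.customer_id = s.customer_id
        WHEN MATCHED THEN UPDATE SET t.address = s.address
    """)

def purge_outdated():
    # Retention process: delete customers not seen in five years.
    spark.sql("DELETE FROM demo.db.customers WHERE last_seen < date '2019-01-01'")

# If both run concurrently against overlapping partitions, optimistic
# concurrency lets one commit win; the loser fails and should be retried.
for attempt in range(3):
    try:
        correct_addresses()
        break
    except Exception:  # e.g. Iceberg's CommitFailedException surfaced through Py4J
        if attempt == 2:
            raise
```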
Datasphere goes beyond the “big three” data usage end-user requirements (ease of discovery, access, and delivery) to include data orchestration (data ops and data transformations) and business data contextualization (semantics, metadata, catalog services).
Domain ownership recognizes that the teams generating the data have the deepest understanding of it and are therefore best suited to manage, govern, and share it effectively. This principle ensures that data accountability remains close to the source, fostering higher data quality and relevance.
It addresses many of the shortcomings of traditional data lakes by providing features such as ACID transactions, schema evolution, row-level updates and deletes, and time travel. In this blog post, we’ll discuss how the metadata layer of Apache Iceberg can be used to make data lakes more efficient.
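A short illustration of those features, assuming a Spark session already configured with an Iceberg catalog named demo; the table, snapshot ID, and timestamp are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Row-level delete, committed as an ACID transaction.
spark.sql("DELETE FROM demo.db.events WHERE event_type = 'test'")

# Time travel: query the table as of an earlier snapshot or point in time.
spark.sql("SELECT * FROM demo.db.events VERSION AS OF 123456789").show()
spark.sql("SELECT * FROM demo.db.events TIMESTAMP AS OF '2024-01-01 00:00:00'").show()

# The metadata layer is queryable directly through Iceberg's metadata tables.
spark.sql("SELECT snapshot_id, committed_at, operation FROM demo.db.events.snapshots").show()
```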
These formats, exemplified by Apache Iceberg, Apache Hudi, and Delta Lake, address persistent challenges in traditional data lake structures by offering an advanced combination of flexibility, performance, and governance capabilities, making them useful for flexible data lifecycle management.
Know thy data: understand what it is (formats, types, sampling, who, what, when, where, why), encourage the use of data across the enterprise, and enrich your datasets with searchable (semantic and content-based) metadata (labels, annotations, tags). So, if you have 1 trillion data points (e.g.,
Anomaly detection is well-known in the financial industry, where it’s frequently used to detect fraudulent transactions, but it can also be used to catch and fix data quality issues automatically. If you suddenly see unexpected patterns in your social data, that may mean adversaries are attempting to poison your data sources.
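A toy version of such a check, using a simple z-score over daily row counts; the history values and the 3-sigma threshold are illustrative assumptions, not a production detector.

```python
# Flag a data quality anomaly when the latest value drifts far from history.
import statistics

def is_anomalous(history: list[float], latest: float, threshold: float = 3.0) -> bool:
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return latest != mean
    return abs(latest - mean) / stdev > threshold

daily_row_counts = [10_120, 9_980, 10_250, 10_040, 10_180]
print(is_anomalous(daily_row_counts, 2_300))  # True: a sudden drop flags a pipeline issue
```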
An extract, transform, and load (ETL) process using AWS Glue is triggered once a day to extract the required data and transform it into the required format and quality, following the data product principle of data mesh architectures. From here, the metadata is published to Amazon DataZone by using AWS Glue Data Catalog.
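The trigger side of such a daily job might look like the following boto3 sketch; the job name is a placeholder, not one from the source architecture.

```python
import boto3

glue = boto3.client("glue")

# Kick off the daily extract-transform-load run.
run = glue.start_job_run(JobName="customer-data-product-etl")
print("Started run:", run["JobRunId"])

# Check the run state (a scheduler would poll until it finishes).
state = glue.get_job_run(JobName="customer-data-product-etl", RunId=run["JobRunId"])
print("Current state:", state["JobRun"]["JobRunState"])
```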
Data observability provides the ability to immediately recognize, and be alerted to, the emergence of hallucinations and accept or reject these changes iteratively, thereby training and validating the data. Maybe your AI model monitors sales data, and the data is spiking for one region of the country due to a world event.
It’s the preferred choice when customers need more control and customization over the data integration process or require complex transformations. This flexibility makes Glue ETL suitable for scenarios where data must be transformed or enriched before analysis. Check CloudWatch log events for the SEED Load.
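One way to inspect those log events programmatically, sketched with boto3; it assumes the job writes to Glue's default output log group, and the filter pattern is illustrative.

```python
import boto3

logs = boto3.client("logs")
resp = logs.filter_log_events(
    logGroupName="/aws-glue/jobs/output",  # Glue's default driver output log group
    filterPattern="SEED",                  # narrow to the SEED load messages
    limit=50,
)
for event in resp["events"]:
    print(event["timestamp"], event["message"].rstrip())
```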
When we talk about data integrity, we’re referring to the overarching completeness, accuracy, consistency, accessibility, and security of an organization’s data. Together, these factors determine the reliability of the organization’s data.
Deploying a Data Journey Instance unique to each customer’s payload is vital to fill this gap. Such an instance answers the critical question of “Dude, where is my data?” while maintaining operational efficiency and ensuring data quality, thus preserving customer satisfaction and the team’s credibility.
An understanding of the data’s origins and history helps answer questions about the origin of data in Key Performance Indicator (KPI) reports, including: How are the report tables and columns defined in the metadata? Who are the data owners? Data lineage offers proof that the data provided is reflected accurately.
The Business Application Research Center (BARC) warns that data governance is a highly complex, ongoing program, not a “big bang initiative,” and it runs the risk of participants losing trust and interest over time. The program must introduce and support standardization of enterprise data.
Aptly named, metadata management is the process in which BI and analytics teams manage metadata, which is the data that describes other data. In other words, data is the content and metadata is the context. Without metadata, BI teams are unable to understand the data’s full story.
These layers help teams delineate different stages of data processing, storage, and access, offering a structured approach to data management. In the context of Data in Place, validating data quality automatically with Business Domain Tests is imperative for ensuring the trustworthiness of your data assets.
The DataOps pipeline you have built has enough automated tests to catch errors, and error events are tied to some form of real-time alerts. Based on business rules, additional data quality tests check the dimensional model after the ETL job completes. Monitoring Job Metadata. Adding Tests to Reduce Stress.
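A minimal example of one such post-ETL data quality test, here a referential-integrity rule on a dimensional model sketched with pandas; the table shapes and the alerting hook are assumptions.

```python
import pandas as pd

def test_no_orphan_customers(fact: pd.DataFrame, dim: pd.DataFrame) -> None:
    # Business rule: every fact row must reference an existing dimension row.
    orphans = set(fact["customer_key"]) - set(dim["customer_key"])
    if orphans:
        # In a real DataOps pipeline this would fire a real-time alert
        # (Slack, PagerDuty, etc.) in addition to failing the run.
        raise AssertionError(f"{len(orphans)} fact rows reference missing customers")

fact = pd.DataFrame({"customer_key": [1, 2, 3, 99]})
dim = pd.DataFrame({"customer_key": [1, 2, 3]})
test_no_orphan_customers(fact, dim)  # raises: key 99 has no dimension row
```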
Apache Kafka is a well-known open-source event store and stream processing platform and has grown to become the de facto standard for data streaming. Apache Kafka transfers data without validating the information in the messages. Kafka does not examine the metadata of your messages. What’s next?
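Because the broker never inspects payloads, validation belongs at the producing or consuming edge. A hedged sketch with confluent-kafka and jsonschema; the topic, schema, and broker address are assumptions.

```python
import json
from confluent_kafka import Consumer
from jsonschema import validate, ValidationError

schema = {
    "type": "object",
    "required": ["order_id", "amount"],
    "properties": {"order_id": {"type": "string"}, "amount": {"type": "number"}},
}

consumer = Consumer({"bootstrap.servers": "localhost:9092", "group.id": "dq-check"})
consumer.subscribe(["orders"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    try:
        # Kafka delivered the bytes as-is; the application enforces the contract.
        validate(json.loads(msg.value()), schema)
    except (ValidationError, json.JSONDecodeError):
        pass  # route to a dead-letter topic rather than processing bad data
```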
Data Virtualization can include web process automation tools and semantic tools that help easily and reliably extract information from the web and combine it with corporate information to produce immediate results. How does Data Virtualization manage data quality requirements? In forecasting future events.
Here are six benefits of automating end-to-end data lineage: Reduced Errors and Operational Costs. Data quality is crucial to every organization. Automated data capture can significantly reduce errors when compared to manual entry. Automating data capture frees up resources to focus on more strategic and useful tasks.
This also includes building an industry-standard integrated data repository as a single source of truth, operational reporting through real-time metrics, data quality monitoring, a 24/7 helpdesk, and revenue forecasting through financial projections and supply availability projections.
Figure 1: Flow of actions for self-service analytics around data assets stored in relational databases First, the data producer needs to capture and catalog the technical metadata of the data asset. The producer also needs to manage and publish the data asset so it’s discoverable throughout the organization.
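Step one, capturing technical metadata in a catalog, might look like this boto3 sketch against the AWS Glue Data Catalog; the database, table, and column definitions are placeholders.

```python
import boto3

glue = boto3.client("glue")

# Register the data asset's technical metadata so it can later be published
# and discovered across the organization.
glue.create_table(
    DatabaseName="sales_db",
    TableInput={
        "Name": "orders",
        "StorageDescriptor": {
            "Columns": [
                {"Name": "order_id", "Type": "string"},
                {"Name": "amount", "Type": "double"},
            ],
            "Location": "s3://example-bucket/orders/",
        },
        "TableType": "EXTERNAL_TABLE",
    },
)
```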
The Regulatory Rationale for Integrating Data Management & Data Governance. This is also true for existing data regulations. Compliance is an ongoing requirement, so efforts to become compliant should not be treated as static events. In fact, such an understanding is arguably better put to use proactively.
It covers how to use a conceptual, logical architecture for some of the most popular gaming industry use cases like event analysis, in-game purchase recommendations, measuring player satisfaction, telemetry data analysis, and more. Unlike in ingestion processes, data can be transformed according to business rules before loading.
KGs bring the Semantic Web paradigm to enterprises, introducing semantic metadata to drive data management and content management to new levels of efficiency, and breaking down silos to let them synergize with various forms of knowledge management. The RDF data model and the other standards in W3C’s Semantic Web stack (e.g.,
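A small taste of the RDF data model with rdflib, expressing semantic metadata as subject-predicate-object triples; the namespace and facts are illustrative.

```python
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/")
g = Graph()

# Semantic metadata about a dataset, stored as triples.
g.add((EX.orders_table, RDF.type, EX.Dataset))
g.add((EX.orders_table, EX.ownedBy, EX.sales_team))
g.add((EX.orders_table, EX.description, Literal("Daily order facts")))

# Any application can query the same metadata with SPARQL, breaking silos.
for row in g.query("SELECT ?s WHERE { ?s a <http://example.org/Dataset> }"):
    print(row.s)
```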
Invest in maturing and improving your enterprise business metrics and metadata repositories, a multitiered data architecture, continuously improving data quality, and managing data acquisitions. The payoff: enhanced customer experiences by accelerating the use of data across the organization.
Today’s organizations are dealing with data of unprecedented diversity in terms of type, location, and use at equally unprecedented volumes, and no one is proposing that it is ever going to simplify. This multiplicity of data leads to the growth of silos, which in turn increases the cost of integration.
The trigger runs in a parent process called a triggerer, a service that runs an asyncio event loop. The following graph describes a simple data quality check pipeline using setup and teardown tasks. The triggerer has the capability to run triggers in parallel at scale, and to signal tasks to resume when a condition is met.
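A hedged sketch of such a setup/teardown pipeline using the Airflow TaskFlow API; the DAG id and task bodies are assumptions, and the teardown is wired so it runs even when the check fails.

```python
import pendulum
from airflow.decorators import dag, task

@dag(schedule=None, start_date=pendulum.datetime(2024, 1, 1), catchup=False)
def dq_check_pipeline():
    @task
    def create_resources():
        ...  # setup: provision whatever the check needs

    @task
    def run_dq_check():
        ...  # the actual data quality validation

    @task
    def delete_resources():
        ...  # teardown: clean up even if the check failed

    setup = create_resources()
    check = run_dq_check()
    # Marking the last task as a teardown ties cleanup to the setup's scope.
    setup >> check >> delete_resources().as_teardown(setups=setup)

dq_check_pipeline()
```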
As Dan Jeavons, Data Science Manager at Shell, stated: “What we try to do is to think about minimal viable products that are going to have a significant business impact immediately and use that to inform the KPIs that really matter to the business.” Experience the power of Business Intelligence with our 14-day free trial!
The particular episode we recommend looks at how WeWork struggled with understanding their data lineage, so they created a metadata repository to increase visibility. Another podcast we think is worth a listen is Agile Data. Currently, he is in charge of the Technical Operations team at MIT Open Learning.
We scored the highest in hybrid, intercloud, and multi-cloud capabilities because we are the only vendor in the market with a true hybrid data platform that can run on any cloud including private cloud to deliver a seamless, unified experience for all data, wherever it lies.
A Gartner Marketing survey found only 14% of organizations have successfully implemented a C360 solution, due to lack of consensus on what a 360-degree view means, challenges with data quality, and lack of a cross-functional governance structure for customer data. Then, you transform this data into a concise format.
Bad data tax is rampant in most organizations. Currently, every organization is blindly chasing the GenAI race, often forgetting that data quality and semantics are fundamental to achieving AI success. Sadly, data quality is losing to data quantity, resulting in “Infobesity.”
CDW is fully integrated with streaming, data engineering, and machine learning analytics. It has a consistent framework that secures and provides governance for all data and metadata on private clouds, multiple public clouds, or hybrid clouds. Smart DwH Mover helps accelerate data warehouse migration.
What Is Data Intelligence? Data intelligence is a system to deliver trustworthy, reliable data. It includes intelligence about data, or metadata. IDC coined the term, stating, “data intelligence helps organizations answer six fundamental questions about data.” Yet finding data is just the beginning.
Incorporate data from novel sources — social media feeds, alternative credit histories (utility and rental payments), geo-spatial systems, and IoT streams — into liquidity risk models. CDP also enables data and platform architects, data stewards, and other experts to manage and control data from a single location.
Alation attended last week’s Gartner Data and Analytics Summit in London, May 9–11, 2022. On the heels of the Data Innovation Summit in Stockholm, it’s clear that in-person events are back with a vengeance, and we’re thrilled about it. Establish what data you have. Leverage small data.
The event hosted presentations, discussions, and one-on-one meetings, bringing together more than 20 partners and 1,064 registrants from 41 countries, spanning 25 industries. According to him, “failing to ensure data quality in capturing and structuring knowledge turns any knowledge graph into a piece of abstract art.”
The generation, transmission, distribution and sale of electrical power generates a lot of data needed across a variety of roles to address reporting requirements, changing regulations, advancing technology, rapid responses to extreme weather events and more.
If you’re not familiar with DGIQ, it’s the world’s most comprehensive event dedicated to, you guessed it, data governance and information quality. This year’s DGIQ West will host tutorials, workshops, seminars, general conference sessions, and case studies for global data leaders.
The first one is: companies should invest more in improving their data quality before doing anything else. You must master your metadata and make sure that everything is lined up. To make a big step forward with data science, you first need to do that painful work. That’s an awful waste of resources.
We also used AWS Lambda for data processing. To further optimize and improve the developer velocity for our data consumers, we added Amazon DynamoDB as a metadata store for different data sources landing in the data lake. The data is partitioned on InputDataSetName, Year, Month, and Date.
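A sketch of that metadata store pattern with boto3; the table name is a placeholder, and for brevity the Year/Month/Date partition attributes are collapsed into a single sort key.

```python
import boto3

table = boto3.resource("dynamodb").Table("datalake-source-metadata")

# Register an incoming file for one data source and day.
table.put_item(Item={
    "InputDataSetName": "customer_orders",  # partition key
    "LoadDate": "2024-06-01",               # sort key standing in for Year/Month/Date
    "s3_path": "s3://example-lake/customer_orders/2024/06/01/part-0.parquet",
    "row_count": 18234,
})

# Data consumers look up what landed for a source on a given day.
resp = table.get_item(Key={"InputDataSetName": "customer_orders", "LoadDate": "2024-06-01"})
print(resp["Item"]["s3_path"])
```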