Announcing DataOps Data Quality TestGen 3.0: Open-Source, Generative Data Quality Software. You don’t have to imagine; start using it today: [link] Introducing Data Quality Scoring in Open Source DataOps Data Quality TestGen 3.0! DataOps just got more intelligent.
By adding the Octopai platform, Cloudera customers will benefit from: Enhanced Data Discovery: Octopai’s automated data discovery enables instantaneous search and location of desired data across multiple systems. This guarantees data quality and automates the laborious, manual processes required to maintain data reliability.
1) What Is Data Quality Management? 4) Data Quality Best Practices. 5) How Do You Measure Data Quality? 6) Data Quality Metrics Examples. 7) Data Quality Control: Use Case. 8) The Consequences Of Bad Data Quality. 9) 3 Sources Of Low-Quality Data.
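Measuring data quality (points 5 and 6 above) can start very simply. Below is a minimal sketch of a few common metrics computed with pandas; the DataFrame, key column, and email-validity rule are illustrative assumptions rather than anything from the article:

```python
# Minimal data quality metrics sketch; column names are illustrative.
import pandas as pd

def quality_metrics(df: pd.DataFrame, key: str) -> dict:
    total = len(df)
    return {
        # Completeness: share of non-null cells across the whole frame
        "completeness": float(df.notna().to_numpy().mean()),
        # Uniqueness: share of rows carrying a distinct key value
        "uniqueness": df[key].nunique(dropna=True) / total if total else 0.0,
        # Validity: share of emails matching a simple pattern (example rule)
        "email_validity": float(
            df["email"].str.match(r"[^@\s]+@[^@\s]+\.[^@\s]+", na=False).mean()
        ),
    }

df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "email": ["a@example.com", None, "not-an-email", "c@example.com"],
})
print(quality_metrics(df, key="customer_id"))  # e.g. {'completeness': 0.875, ...}
```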
As technology and business leaders, your strategic initiatives, from AI-powered decision-making to predictive insights and personalized experiences, are all fueled by data. Yet, despite growing investments in advanced analytics and AI, organizations continue to grapple with a persistent and often underestimated challenge: poor data quality.
We will explore Iceberg’s concurrency model, examine common conflict scenarios, and provide practical implementation patterns for both automatic retry mechanisms and situations requiring custom conflict-resolution logic for building resilient data pipelines. The Data Catalog serves as the Iceberg catalog.
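For context, Iceberg writers commit optimistically: a writer that loses the race gets a commit conflict and is expected to re-read table state and retry. Here is a minimal sketch of that retry pattern, assuming PyIceberg; the catalog name, table name, backoff constants, and the shape of `rows` are illustrative assumptions:

```python
# Optimistic-concurrency retry around an Iceberg commit (PyIceberg sketch).
import random
import time

from pyiceberg.catalog import load_catalog
from pyiceberg.exceptions import CommitFailedException

def append_with_retry(table_name: str, rows, max_attempts: int = 5) -> None:
    catalog = load_catalog("default")  # resolved from local PyIceberg config
    for attempt in range(1, max_attempts + 1):
        # Reload so the commit is based on the latest table snapshot
        table = catalog.load_table(table_name)
        try:
            table.append(rows)  # rows: a pyarrow.Table matching the schema
            return
        except CommitFailedException:
            if attempt == max_attempts:
                raise  # a custom conflict-resolution path would go here
            # Exponential backoff with jitter before retrying
            time.sleep(0.1 * (2 ** attempt) + random.random() * 0.1)
```

Automatic retry like this covers simple append races; genuine logical conflicts, such as two writers rewriting the same files, are where the custom conflict-resolution logic comes in.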
Despite their advantages, traditional data lake architectures often grapple with challenges such as understanding deviations from the optimal state of a table over time, identifying issues in data pipelines, and monitoring a large number of tables. Addressing these challenges is essential for optimizing read and write performance.
With graph databases, representing relationships as data makes it possible to better represent data in real time, accommodating newly discovered types of data and relationships. Relational databases benefit from decades of tweaks and optimizations to deliver performance. Metadata about Relationships Comes in Handy.
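To make that concrete, here is a minimal sketch using networkx, where each relationship is itself a piece of data carrying metadata; the entities and edge attributes are illustrative assumptions:

```python
# Relationships as first-class data, sketched with networkx.
import networkx as nx

g = nx.DiGraph()
# Each edge carries its own metadata, so a newly discovered relationship
# type is just new data, not a schema migration as in a relational model.
g.add_edge("customer:42", "order:1001", type="PLACED", ts="2024-05-01")
g.add_edge("order:1001", "product:7", type="CONTAINS", qty=3)

# Traversal follows edges directly rather than joining tables.
for _, target, attrs in g.out_edges("customer:42", data=True):
    print(target, attrs)  # order:1001 {'type': 'PLACED', 'ts': '2024-05-01'}
```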
Domain ownership recognizes that the teams generating the data have the deepest understanding of it and are therefore best suited to manage, govern, and share it effectively. This principle ensures data accountability remains close to the source, fostering higher data quality and relevance.
These formats, exemplified by Apache Iceberg, Apache Hudi, and Delta Lake, address persistent challenges in traditional data lake structures by offering an advanced combination of flexibility, performance, and governance capabilities. These are useful for flexible data lifecycle management. In earlier posts, we discussed AWS Glue 5.0
First, what active metadata management isn’t: “Okay, you metadata! Quit lounging around!” Now, what active metadata management is (well, kind of): “Okay, you metadata!” I will, of course, end up with a very amateurish finished product, because I used sub-optimal tools to do the job. Data assets are tools.
For container terminal operators, data-driven decision-making and efficient data sharing are vital to optimizing operations and boosting supply chain efficiency. From here, the metadata is published to Amazon DataZone by using AWS Glue Data Catalog.
We won’t be writing code to optimize scheduling in a manufacturing plant; we’ll be training ML algorithms to find optimum performance based on historical data. With machine learning, the challenge isn’t writing the code; the algorithms are implemented in a number of well-known and highly optimized libraries.
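A minimal sketch of that shift using scikit-learn: instead of hand-coding scheduling rules, fit a model to historical data and let it predict performance. The features, synthetic data, and model choice are illustrative assumptions:

```python
# "Train, don't hand-code": fit a model on historical operations data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Stand-in historical features, e.g. machine load, queue length, shift index
X = rng.random((500, 3))
# Stand-in observed throughput with some noise
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.1, 500)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)
print(model.predict(X[:3]))  # predicted throughput for three new conditions
```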
This also includes building an industry standard integrated data repository as a single source of truth, operational reporting through real time metrics, data quality monitoring, 24/7 helpdesk, and revenue forecasting through financial projections and supply availability projections.
At DataKitchen, we think of this as a ‘meta-orchestration’ of the code and tools acting upon the data. Data Pipeline Observability: Optimizes pipelines by monitoring data quality, detecting issues, tracing data lineage, and identifying anomalies using live and historical metadata.
In a previous post , we noted some key attributes that distinguish a machine learning project: Unlike traditional software where the goal is to meet a functional specification, in ML the goal is to optimize a metric. Quality depends not just on code, but also on data, tuning, regular updates, and retraining.
Metadata management performs a critical role within the modern data management stack. It helps break down data silos and empowers data and analytics teams to better understand the context and quality of data. This, in turn, builds trust in data and the decision-making that follows. Improve data discovery.
The Business Application Research Center (BARC) warns that data governance is a highly complex, ongoing program, not a “big bang initiative,” and it runs the risk of participants losing trust and interest over time. The program must introduce and support standardization of enterprise data.
Poor data quality is one of the top barriers faced by organizations aspiring to be more data-driven. Ill-timed business decisions and misinformed business processes, missed revenue opportunities, failed business initiatives and complex data systems can all stem from data quality issues.
L1 is usually the raw, unprocessed data ingested directly from various sources; L2 is an intermediate layer featuring data that has undergone some form of transformation or cleaning; and L3 contains highly processed, optimized data that is typically ready for analytics and decision-making. What is Data in Use?
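A minimal sketch of that layering in pandas, where each function produces the next layer; the source data, column names, and cleaning rules are illustrative assumptions:

```python
# L1 (raw) -> L2 (cleaned) -> L3 (analytics-ready), sketched with pandas.
import pandas as pd

def l1_ingest() -> pd.DataFrame:
    # L1: raw data exactly as ingested, untouched
    return pd.DataFrame({
        "order_id": [1, 2, 2, 3],
        "amount": ["10.50", "n/a", "7.25", "3.00"],
    })

def l2_clean(raw: pd.DataFrame) -> pd.DataFrame:
    # L2: deduplicated, type-corrected intermediate layer
    df = raw.drop_duplicates(subset="order_id")
    df = df.assign(amount=pd.to_numeric(df["amount"], errors="coerce"))
    return df.dropna(subset=["amount"])

def l3_serve(clean: pd.DataFrame) -> float:
    # L3: aggregated, decision-ready output
    return float(clean["amount"].sum())

print(l3_serve(l2_clean(l1_ingest())))  # 13.5
```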
In the public cloud, these cost management issues are compounded by consumption rates, where compute is often overused due to a lack of visibility into optimization opportunities. The data temperature feature lets us see whether hot or cold data sets are deployed optimally, including the underlying file sizes and partitioning styles.
It does feel, however, as if we need jet-like speed to analyze and understand our data, who is using it, how it is used, and if it is being used to drive value. With lots of data comes yet more calls for automation, optimization, and productivity initiatives to put that data to good use. This data about data is valuable.
Data Virtualization can include web process automation tools and semantic tools that help easily and reliably extract information from the web and combine it with corporate information to produce immediate results. How does Data Virtualization manage data quality requirements? In improving operational processes.
The next step in every organization’s data strategy, Guan says, should be investing in and leveraging artificial intelligence and machine learning to unlock more value out of their data. The fabric, especially at the active metadata level, is important, Saibene notes.
Curate your data at scale – This session shows how solutions like AWS Glue, AWS Glue Data Quality, and Lake Formation can help you manage your best sources and find sensitive information. DataZone automatically manages the permissions of your shared data in the DataZone projects. Crawlers, hello!
They conveniently store data in a flat architecture that can be queried in aggregate and offer the speed and lower cost required for big data analytics. On the other hand, they don’t support transactions or enforce data quality. Each ETL step risks introducing failures or bugs that reduce data quality.
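Since the lake won’t enforce those guarantees itself, one common mitigation is to wrap each ETL step in explicit checks that fail fast. A minimal sketch in pandas; the expectations and column names are illustrative assumptions:

```python
# Guarding ETL steps with explicit data quality assertions.
import pandas as pd

def check_step(df: pd.DataFrame, step: str) -> pd.DataFrame:
    # Fail the pipeline now rather than degrade every downstream table
    assert len(df) > 0, f"{step}: produced no rows"
    assert df["order_id"].notna().all(), f"{step}: null keys"
    assert df["order_id"].is_unique, f"{step}: duplicate keys"
    return df

raw = pd.DataFrame({"order_id": [1, 2, 3], "amount": [10.0, 5.5, 7.25]})
transformed = check_step(raw, "extract").assign(
    amount_cents=lambda d: (d["amount"] * 100).astype(int)
)
check_step(transformed, "transform")
```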
As organizations process vast amounts of data, maintaining an accurate historical record is crucial. History management in data systems is fundamental for compliance, business intelligence, data quality, and time-based analysis. In customer relationship management, it tracks changes in customer information over time.
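The customer-information example is classically handled with a Type 2 slowly changing dimension: rather than overwriting a record, the current version is closed out and a new one appended. A minimal sketch in pandas; the column names and dates are illustrative assumptions:

```python
# History management via a Type 2 slowly changing dimension (sketch).
from datetime import date
import pandas as pd

def scd2_update(dim: pd.DataFrame, customer_id: int, new_email: str,
                today: date) -> pd.DataFrame:
    current = (dim["customer_id"] == customer_id) & dim["is_current"]
    # Close out the existing current row instead of mutating it
    dim.loc[current, ["valid_to", "is_current"]] = [today, False]
    new_row = pd.DataFrame([{
        "customer_id": customer_id, "email": new_email,
        "valid_from": today, "valid_to": None, "is_current": True,
    }])
    return pd.concat([dim, new_row], ignore_index=True)

dim = pd.DataFrame([{"customer_id": 42, "email": "old@example.com",
                     "valid_from": date(2023, 1, 1), "valid_to": None,
                     "is_current": True}])
print(scd2_update(dim, 42, "new@example.com", date(2024, 6, 1)))
```

Every prior version stays queryable, which is what makes the time-based analysis and compliance use cases above possible.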
To provide a variety of products, services, and solutions that are better suited to customers and society in each region, we have built business processes and systems that are optimized for each region and its market. Responsibilities include: Load raw data from the data source system at the appropriate frequency.
It’s the preferred choice when customers need more control and customization over the data integration process or require complex transformations. This flexibility makes Glue ETL suitable for scenarios where data must be transformed or enriched before analysis. The company stores vast amounts of transactional data in ServiceNow.
You might have millions of short videos , with user ratings and limited metadata about the creators or content. Job postings have a much shorter relevant lifetime than movies, so content-based features and metadata about the company, skills, and education requirements will be more important in this case.
As Dan Jeavons, Data Science Manager at Shell, stated: “what we try to do is to think about minimal viable products that are going to have a significant business impact immediately and use that to inform the KPIs that really matter to the business”. Business intelligence and analytics allow users to know their businesses on a deeper level.
What, then, should users look for in a data modeling product to support their governance/intelligence requirements in the data-driven enterprise? Nine Steps to Data Modeling. Provide metadata and schema visualization regardless of where data is stored.
Without an accurate, high-quality, real-time enterprise data pipeline, it will be difficult to uncover the necessary intelligence to make optimal business decisions. So what’s holding organizations back from fully using their data to make better, smarter business decisions? Data Governance Bottlenecks. Regulations.
As part of a data governance strategy, a BPM tool aids organizations in visualizing their business processes, system interactions and organizational hierarchies to ensure elements are aligned and core operations are optimized. The lack of a central metadata repository is a far too common thorn in an organization’s side.
DataOps is an approach to best practices for data management that increases the quantity of data analytics products a data team can develop and deploy in a given time while drastically improving the level of data quality. “Just-in-Time” manufacturing increases production while optimizing resources.
This introduces the need for both polling and pushing the data to access and analyze in near-real time. From an operational standpoint, we designed a new shared responsibility model for data ingestion using AWS Glue instead of internal services (REST APIs) designed on Amazon EC2 to extract the data.
Despite soundings on this from leading thinkers such as Andrew Ng , the AI community remains largely oblivious to the important data management capabilities, practices, and – importantly – the tools that ensure the success of AI development and deployment. Further, data management activities don’t end once the AI model has been developed.
Prior to the creation of the data lake, Orca’s data was distributed among various data silos, each owned by a different team with its own data pipelines and technology stack. Moreover, running advanced analytics and ML on disparate data sources proved challenging.
Data has become an invaluable asset for businesses, offering critical insights to drive strategic decision-making and operational optimization. The business end-users were given a tool to discover data assets produced within the mesh and seamlessly self-serve on their data sharing needs.
KGs bring the Semantic Web paradigm to enterprises by introducing semantic metadata to drive data management and content management to new levels of efficiency, breaking silos and letting them synergize with various forms of knowledge management. The RDF data model and the other standards in W3C’s Semantic Web stack (e.g.,
With this in mind, it’s clear that no “one size fits all” architecture will work here; we need a diverse set of data services, fit for each workload and purpose, backed by optimized compute engines and tools. Data changes in numerous ways: the shape and form of the data change; the volume, variety, and velocity change.
Apache Kafka transfers data without validating the information in the messages. It does not have any visibility into what kind of data is being sent and received, or what data types the messages might contain. Kafka does not examine the metadata of your messages. Optimize your Kafka environment by using a schema registry.
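A minimal sketch of adding the validation Kafka itself doesn’t do, using a schema registry with Confluent’s Python client; the broker and registry URLs, topic name, and schema are illustrative assumptions:

```python
# Schema-validated production to Kafka via a schema registry (sketch).
from confluent_kafka import SerializingProducer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import StringSerializer

schema_str = """
{
  "type": "record", "name": "Order",
  "fields": [
    {"name": "order_id", "type": "int"},
    {"name": "amount", "type": "double"}
  ]
}
"""

sr = SchemaRegistryClient({"url": "http://localhost:8081"})
producer = SerializingProducer({
    "bootstrap.servers": "localhost:9092",
    "key.serializer": StringSerializer("utf_8"),
    # Serialization fails fast if a message doesn't match the schema,
    # so malformed records never reach the topic.
    "value.serializer": AvroSerializer(sr, schema_str),
})
producer.produce(topic="orders", key="42", value={"order_id": 42, "amount": 9.99})
producer.flush()
```

Consumers configured with the matching deserializer get the same guarantee on the read side.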
As the organization receives data from multiple external vendors, it often arrives in different formats, typically Excel or CSV files, with each vendor using its own unique data layout and structure. DataBrew is an excellent tool for data quality and preprocessing. For Matching conditions, choose Match all conditions.
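Outside of DataBrew, the same normalization step can be prototyped in a few lines. A minimal sketch with pandas; the vendor names, column mappings, and file names are illustrative assumptions standing in for what a DataBrew recipe would encode:

```python
# Normalizing heterogeneous vendor files into one canonical layout.
import pandas as pd

# Each vendor's columns mapped to the canonical schema
VENDOR_MAPPINGS = {
    "vendor_a": {"Item No": "sku", "Unit Price": "price"},
    "vendor_b": {"SKU_CODE": "sku", "price_usd": "price"},
}

def normalize(path: str, vendor: str) -> pd.DataFrame:
    reader = pd.read_excel if path.endswith((".xlsx", ".xls")) else pd.read_csv
    df = reader(path)
    df = df.rename(columns=VENDOR_MAPPINGS[vendor])
    return df[["sku", "price"]]  # canonical layout, vendor-agnostic downstream

# frames = [normalize("a.csv", "vendor_a"), normalize("b.xlsx", "vendor_b")]
# combined = pd.concat(frames, ignore_index=True)
```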
Relying on the past for future insights with data that is outdated, given changing customer preferences, a hyper-competitive world, and the emphasis on environmental, social, and governance concerns, produces irrelevant insights and sub-optimal returns. Quality data needs to be the normalizing factor.
The main reasons that a company’s data strategy and governance protocols fail to deliver are somewhat universal, regardless of the industry sector. Without a doubt, no company can achieve lasting profitability and sustainable growth with a poorly constructed data governance methodology. Data governance and AI.