1) What Is Data Quality Management? 4) Data Quality Best Practices. 5) How Do You Measure Data Quality? 6) Data Quality Metrics Examples. 7) Data Quality Control: Use Case. 8) The Consequences Of Bad Data Quality. 9) 3 Sources Of Low-Quality Data.
A Drug Launch Case Study in the Amazing Efficiency of a Data Team Using DataOps: How a Small Team Powered the Multi-Billion Dollar Acquisition of a Pharma Startup. When launching a groundbreaking pharmaceutical product, the stakes and the rewards couldn't be higher. Data engineers delivered over 100 lines of code and 1.5
They made us realise that building systems, processes and procedures to ensure quality is built in at the outset is far more cost effective than correcting mistakes once made. How about data quality? Redman and David Sammon propose an interesting (and simple) exercise to measure data quality.
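A minimal sketch of that style of measurement, assuming a small sample of recent records and hypothetical column names and rules: count how many records are entirely error-free and treat that fraction as the quality score.

```python
import pandas as pd

# Hypothetical sample: recent records with basic completeness/validity rules.
records = pd.read_csv("recent_orders_sample.csv")  # assumed file name

def is_error_free(row: pd.Series) -> bool:
    """A record passes only if every field-level rule holds."""
    return (
        pd.notna(row["customer_id"])   # required field present
        and pd.notna(row["order_date"])
        and row["amount"] >= 0         # no negative amounts
    )

error_free = records.apply(is_error_free, axis=1).sum()
score = error_free / len(records)
print(f"{error_free}/{len(records)} records error-free (score: {score:.0%})")
```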
Data teams struggle to find a unified approach that enables effortless discovery, understanding, and assurance of data quality and security across various sources. Collaboration is seamless, with straightforward publishing and subscribing workflows, fostering a more connected and efficient work environment.
Due to the volume, velocity, and variety of data being ingested in data lakes, it can get challenging to develop and maintain policies and procedures to ensure data governance at scale for your data lake. Data confidentiality and data quality are the two essential themes for data governance.
Poor-quality data can lead to incorrect insights, bad decisions, and lost opportunities. AWS Glue Data Quality measures and monitors the quality of your dataset. It supports both data quality at rest and data quality in AWS Glue extract, transform, and load (ETL) pipelines.
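A sketch of defining and running a data-quality-at-rest check with boto3; the ruleset name, table, columns, rules, and IAM role below are all illustrative, not from the article.

```python
import boto3

glue = boto3.client("glue")

# DQDL rules: completeness, uniqueness, and an allowed-value check
# (table and column names here are hypothetical).
ruleset = """Rules = [
    IsComplete "order_id",
    Uniqueness "order_id" > 0.99,
    ColumnValues "status" in ["PENDING", "SHIPPED", "DELIVERED"]
]"""

glue.create_data_quality_ruleset(
    Name="orders-basic-checks",
    Ruleset=ruleset,
    TargetTable={"DatabaseName": "sales_db", "TableName": "orders"},
)

# Evaluate the ruleset against the table at rest.
run = glue.start_data_quality_ruleset_evaluation_run(
    DataSource={"GlueTable": {"DatabaseName": "sales_db", "TableName": "orders"}},
    Role="arn:aws:iam::123456789012:role/GlueDataQualityRole",  # assumed role
    RulesetNames=["orders-basic-checks"],
)
print(run["RunId"])
```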
It’s the preferred choice when customers need more control and customization over the data integration process or require complex transformations. This flexibility makes Glue ETL suitable for scenarios where data must be transformed or enriched before analysis. The status and statistics of the CDC load are published to CloudWatch.
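One common way to publish load statistics like these is CloudWatch custom metrics; a sketch with boto3, where the namespace, metric names, and counter values are assumptions rather than details from the article.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Hypothetical counters collected during a CDC load.
rows_inserted, rows_updated = 1240, 87

cloudwatch.put_metric_data(
    Namespace="CDCPipeline",  # assumed namespace
    MetricData=[
        {"MetricName": "RowsInserted", "Value": rows_inserted, "Unit": "Count"},
        {"MetricName": "RowsUpdated", "Value": rows_updated, "Unit": "Count"},
    ],
)
```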
Our next book is dedicated to anyone who wants to start a career as a data scientist and is looking to get all the knowledge and skills in a way that is accessible and well-structured. Originally published in 2018, the book has a second edition that was released in January of 2022. 4) “SQL Performance Explained” by Markus Winand.
Here at Smart Data Collective, we never cease to be amazed about the advances in data analytics. We have been publishing content on data analytics since 2008, but surprising new discoveries in big data are still made every year. One of the biggest trends shaping the future of data analytics is drone surveying.
He is the President of Knowledge Integrity, Inc. and an expert in master data management, data quality, and business intelligence. His articles on TDWI deal with advice for analysts, customer data profiling, master data management technology, and machine learning. It is published by Robert S.
This blog post is co-written with Hardeep Randhawa and Abhay Kumar from HPE. Data quality checks: When the files land in the processing zone, the Step Functions workflow invokes another Lambda function that converts the raw files to CSV format, followed by stringent data quality checks.
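A sketch of what such a Lambda handler might look like; the event shape, bucket layout, input format, required column, and specific checks are all assumptions for illustration, not the HPE implementation.

```python
import boto3
import pandas as pd

s3 = boto3.client("s3")

def handler(event, context):
    # Hypothetical: Step Functions passes the object location in the event.
    bucket, key = event["bucket"], event["key"]
    local_path = f"/tmp/{key.rsplit('/', 1)[-1]}"
    s3.download_file(bucket, key, local_path)

    # Convert the raw file (assumed tab-delimited here) to CSV.
    df = pd.read_csv(local_path, sep="\t")
    csv_path = local_path.rsplit(".", 1)[0] + ".csv"
    df.to_csv(csv_path, index=False)

    # Stringent quality checks: fail fast so the workflow routes to error handling.
    if df.empty:
        raise ValueError(f"{key}: file contains no rows")
    if df["record_id"].isna().any():          # assumed required column
        raise ValueError(f"{key}: missing record IDs")
    if df.duplicated(subset=["record_id"]).any():
        raise ValueError(f"{key}: duplicate record IDs")

    s3.upload_file(csv_path, bucket, key.rsplit(".", 1)[0] + ".csv")
    return {"rows": len(df), "status": "PASSED"}
```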
Data intelligence software is continuously evolving to enable organizations to efficiently and effectively advance new data initiatives. With a variety of providers and offerings addressing data intelligence and governance needs, it can be easy to feel overwhelmed in selecting the right solution for your enterprise.
The medical insurance company wasn’t hacked, but its customers’ data was compromised through a third-party vendor’s employee. In the 2020 O’Reilly Data Quality survey, only 20% of respondents say their organizations publish information about data provenance or data lineage internally.
You will need to continually return to your business dashboard to make sure that it’s working, the data is accurate, and it’s still answering the right questions in the most effective way. Testing will eliminate lots of data quality challenges and bring a test-first approach through your agile cycle.
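A sketch of what one such automated check could look like, assuming the dashboard reads from an extract reachable via a hypothetical get_dashboard_data() helper; the column names and freshness threshold are illustrative.

```python
import pandas as pd

def get_dashboard_data() -> pd.DataFrame:
    """Hypothetical helper: returns the dataset backing the dashboard."""
    return pd.read_csv("dashboard_extract.csv")  # assumed extract location

def test_no_missing_revenue():
    df = get_dashboard_data()
    assert df["revenue"].notna().all(), "dashboard revenue column has gaps"

def test_dates_are_current():
    df = get_dashboard_data()
    latest = pd.to_datetime(df["report_date"]).max()
    staleness = pd.Timestamp.now() - latest
    assert staleness.days <= 1, f"dashboard data is {staleness.days} days stale"
```

Run with pytest on each agile cycle so stale or incomplete dashboard data fails loudly instead of silently misleading viewers.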
This blog post is co-written with Raj Samineni from ATPCO. In today’s data-driven world, companies across industries recognize the immense value of data in making decisions, driving innovation, and building new products to serve their customers. Publish data assets. Create and configure an Amazon DataZone domain.
Automated data enrichment: To create the knowledge catalog, you need automated data stewardship services. These services include the ability to auto-discover and classify data, to detect sensitive information, to analyze data quality, to link business terms to technical metadata and to publish data to the knowledge catalog.
At Workiva, they recognized that they are only as good as their data, so they centered their initial DataOps efforts around lowering errors. Hodges commented, “Our first focus was to up our game around data quality and lowering errors in production. Organizations should be optimizing and driving their data teams with data.”
As a result, the data may be compromised, rendering faulty analyses and insights. To marry the epidemiological data to the population data will require a tremendous amount of data intelligence about the: Source of the data; Currency of the data; Quality of the data; and.
Figure 2: Example data pipeline with DataOps automation. In this project, I automated data extraction from SFTP, the public websites, and the email attachments. The automated orchestration published the data to an AWS S3 Data Lake. All the code, Talend job, and the BI report are version controlled using Git.
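A sketch of the SFTP-extraction leg of such a pipeline, using paramiko and boto3 as one plausible toolchain; the host, credentials, paths, and bucket names are hypothetical, not from the project described.

```python
import boto3
import paramiko

# Connection details and paths are hypothetical.
SFTP_HOST, SFTP_USER, SFTP_KEY = "sftp.partner.example.com", "etl", "/etc/keys/etl_rsa"
BUCKET, PREFIX = "my-data-lake", "raw/partner/"

transport = paramiko.Transport((SFTP_HOST, 22))
transport.connect(username=SFTP_USER,
                  pkey=paramiko.RSAKey.from_private_key_file(SFTP_KEY))
sftp = paramiko.SFTPClient.from_transport(transport)

s3 = boto3.client("s3")
for name in sftp.listdir("outbound"):
    local = f"/tmp/{name}"
    sftp.get(f"outbound/{name}", local)            # pull file from SFTP
    s3.upload_file(local, BUCKET, PREFIX + name)   # publish to the S3 data lake

sftp.close()
transport.close()
```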
Sun has a PhD from MIT and continued to publish academic research papers during his time at Microsoft, in addition to teaching at Seattle and Washington universities. In a recent blog post, Sun described how Microsoft researchers conducted experiments to compare the performance of different AI models for use in Dynamics 365.
There is no disputing the fact that the collection and analysis of massive amounts of unstructured data has been a huge breakthrough. This is something that you can learn more about in just about any technology blog. We would like to talk about data visualization and its role in the big data movement.
According to a recent TechJury survey: Data analytics makes decision-making 5x faster for businesses. The top three business intelligence trends are data visualization, data quality management, and self-service business intelligence (BI). 7 out of 10 businesses rate data discovery as very important.
In this blog we take a closer look at the recently published BARC study “Sound Decisions in Dynamic Times.” The improvement in data quality follows at 53%. This finding is in line with what US-based FP&A expert Brian Kalish recently stated in his guest blog on data analysis: Data is abundant.
In fact, the Foundry’s recently published Cloud Computing Study (2022) found that 84% of organizations have at least one application, or a portion of their computing infrastructure, already in the cloud. This has increased the difficulty for IT to provide the governance, compliance, risk, and data quality management required.
Griffin is an open source data quality solution for big data that supports both batch and streaming modes. In today’s data-driven landscape, where organizations deal with petabytes of data, the need for automated data validation frameworks has become increasingly critical.
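Griffin expresses measures such as accuracy as declarative configurations it evaluates on Spark; as a rough PySpark equivalent of an accuracy-style measure (the fraction of source rows that find a match in the target), with table and key names assumed:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("accuracy-measure").getOrCreate()

# Hypothetical source and target tables for an accuracy-style measure.
source = spark.table("raw.orders")
target = spark.table("curated.orders")

total = source.count()
# Source rows that have a matching record in the target on the join key.
matched = source.join(target, on=["order_id"], how="left_semi").count()

accuracy = matched / total if total else 1.0
print(f"accuracy: {matched}/{total} = {accuracy:.2%}")
```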
It also helps enterprises put these strategic capabilities into action by: Understanding their business, technology and data architectures and their inter-relationships, aligning them with their goals and defining the people, processes and technologies required to achieve compliance.
These formats, exemplified by Apache Iceberg, Apache Hudi, and Delta Lake, address persistent challenges in traditional data lake structures by offering an advanced combination of flexibility, performance, and governance capabilities. Auto-generated keys: Traditionally, Hudi required explicit configuration of primary keys for every table.
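A sketch of what that simplification looks like in practice, assuming Hudi 0.14+ where the record key option can simply be omitted and keys are generated automatically; the input path and table name are illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-auto-keys").getOrCreate()
df = spark.read.json("s3://my-bucket/raw/events/")  # assumed input location

# No hoodie.datasource.write.recordkey.field is set: with Hudi 0.14+,
# the record key is auto-generated rather than explicitly configured.
(df.write.format("hudi")
    .option("hoodie.table.name", "events")
    .mode("append")
    .save("s3://my-bucket/lake/events/"))
```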
If he is to take Gartner’s advice to heart, Marcus will have to add a set of tasks to his team’s daily data engineering work. When these enablers are implemented, such as through DataKitchen products, teams will work faster, produce higher-quality results, and be happier. Learn More: Implement DataOps data engineering yourself.
Given the importance of data in the world today, organizations face the dual challenges of managing large-scale, continuously incoming data while vetting its quality and reliability. AWS Glue is a serverless data integration service that you can use to effectively monitor and manage data quality through AWS Glue Data Quality.
What are the metrics that matter? Gartner attempted to list every metric under the sun in their recent report, “Toolkit: Delivery Metrics for DataOps, Self-Service Analytics, ModelOps, and MLOps,” published February 7, 2023. For example, Gartner’s DataOps metrics can be categorized into Velocity, Efficiency, and Quality.
Since it’s uniquely metadata-driven, the abstraction layer of a data fabric makes it easier to model, integrate and query any data sources, build data pipelines, and integrate data in real time. This improves data engineering productivity and time-to-value for data consumers. What’s a data mesh?
If you read my blog regularly then you know I rarely write about IT vendors. If I am moved to write research about a vendor, I’ll write it and publish it behind our paywall, on the assumption the advice is valuable. This acquisition followed another with Mulesoft, a data integration vendor. That’s the way it is.
As data analysts or data scientists, we would all love to be able to do all these things, and much more. This is the promise of the modern data lakehouse architecture. Schema evolution: With fast-moving data and real-time data ingestion, we need new ways to keep up with data quality, consistency, accuracy, and overall integrity.
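A sketch of schema evolution on one lakehouse table format, using Iceberg's SQL support in Spark as the example; it assumes the Iceberg Spark extensions are enabled, and the catalog, table, and column names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema-evolution").getOrCreate()

# Iceberg handles these as metadata-only changes: no data files are rewritten,
# and readers of earlier snapshots are unaffected.
spark.sql("ALTER TABLE lake.sales.orders ADD COLUMN discount_code STRING")
spark.sql("ALTER TABLE lake.sales.orders ALTER COLUMN amount TYPE double")  # safe widening
spark.sql("ALTER TABLE lake.sales.orders RENAME COLUMN qty TO quantity")
```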
Security breach? Massive cloud consumption bill you can’t account for? Data quality issue? Good luck auditing data lineage and definitions where policies were never enforced. The new DataFlow Designer is more than just a new UI — it is a paradigm shift in the process of data flow development.
We live in a world of data: There’s more of it than ever before, in a ceaselessly expanding array of forms and locations. Dealing with Data is your window into the ways data teams are tackling the challenges of this new world to help their companies and their customers thrive. billion market by 2025.
It has been over a decade since the Federal Reserve Board (FRB) and the Office of the Comptroller of the Currency (OCC) published their seminal guidance focused on Model Risk Management (SR 11-7 and OCC Bulletin 2011-12, respectively). To reference SR 11-7:
Improved Decision Making : Well-modeled data provides insights that drive informed decision-making across various business domains, resulting in enhanced strategic planning. Reduced Data Redundancy : By eliminating data duplication, it optimizes storage and enhances dataquality, reducing errors and discrepancies.
Data governance policy should be owned by the top of the organization so data governance is given appropriate attention — including defining what’s a potential risk and what is poor data quality.” It comes down to the question: What is the value of your data?
If you are an Analyst or a Marketer or a Website Owner or a Website User it is critical that you read this short blog post – your data will make so much more sense after you are done. Email providers like hotmail (! :) or gmail.com, ecommerce websites like amazon.com or crutchfield.com, banks, even blogging platforms!
Background: “Apathy is the enemy of data quality”. I began work on data quality in the late 1980s at the great Bell Laboratories. This led me to conclude, by about 2000, that apathy was the number one enemy of data quality. I especially wanted to identify industries that were ripe for data quality.
The data mesh approach distributes data ownership and decentralizes data architecture, paving the way for enhanced agility and scalability. With distributed ownership there is a need for effective governance to ensure the success of any data initiative. Business Glossaries – what is the business meaning of our data?
Along with raw data entries in these statements, additional financial ratios such as year-on-year changes in return on assets or book-to-market value are useful machine learning features as well. The dataset used in the following example was published in the Journal of Accounting Research.
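A sketch of deriving such features with pandas, assuming a hypothetical frame of annual fundamentals with one row per company-year; the file and column names are illustrative, not from the cited dataset.

```python
import pandas as pd

# Hypothetical annual fundamentals, one row per company-year.
df = pd.read_csv("fundamentals.csv")
df = df.sort_values(["company_id", "fiscal_year"])

# Return on assets and its year-on-year change within each company.
df["roa"] = df["net_income"] / df["total_assets"]
df["roa_yoy_change"] = df.groupby("company_id")["roa"].diff()

# Book-to-market: book equity over market capitalization.
df["book_to_market"] = df["book_equity"] / df["market_cap"]
```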
This feature significantly increases the productivity of the data stewards who provide business context to data by ensuring data quality, usefulness and protection for broader consumption. The post Metadata enrichment – highly scalable data classification and data discovery appeared first on Journey to AI Blog.
Cloudera’s data-in-motion architecture is a comprehensive set of scalable, modular, re-composable capabilities that help organizations deliver smart automation and real-time data products with maximum efficiency while remaining agile to meet changing business needs.