The Race for Data Quality in a Medallion Architecture. The Medallion architecture pattern is gaining traction among data teams. It is a layered approach to managing and transforming data. It sounds great, but how do you prove the data is correct at each layer?
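As a sketch of what such layer-by-layer proofs can look like, here is a minimal Python example of quality gates at the bronze, silver, and gold layers. The record shapes, column names, and rules are hypothetical illustrations under assumed conventions, not a prescribed implementation.

```python
# A minimal sketch of per-layer quality gates in a medallion pipeline,
# using plain Python over lists of dicts; the "event_id" and "amount"
# fields are hypothetical examples.
def bronze_checks(rows: list[dict]) -> None:
    # Bronze: raw landing zone; only verify that data arrived and keys exist.
    assert rows, "bronze layer received no records"
    assert all("event_id" in r for r in rows), "missing event_id in raw records"

def silver_checks(rows: list[dict]) -> None:
    # Silver: cleaned and conformed; enforce non-null fields and deduplication.
    ids = [r["event_id"] for r in rows]
    assert len(ids) == len(set(ids)), "duplicate event_id after dedup step"
    assert all(r.get("amount") is not None for r in rows), "null amount in silver"

def gold_checks(rows: list[dict]) -> None:
    # Gold: business-level output; enforce business rules before publishing.
    assert all(r["amount"] >= 0 for r in rows), "negative amount in gold layer"
```

Each gate runs after its layer's transformation completes, so a failure pinpoints which layer introduced the problem.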
Today, we are pleased to announce that Amazon DataZone is now able to present data quality information for data assets. Other organizations monitor the quality of their data through third-party solutions. Additionally, Amazon DataZone now offers APIs for importing data quality scores from external systems.
AWS Glue Data Quality allows you to measure and monitor the quality of data in your data repositories. It’s important for business users to be able to see quality scores and metrics to make confident business decisions and debug data quality issues. An AWS Glue crawler crawls the results.
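As a hedged illustration of how such checks can be defined programmatically, the sketch below creates a small DQDL (Data Quality Definition Language) ruleset and starts an evaluation run with boto3. The database name, table name, IAM role ARN, and the rules themselves are placeholder assumptions, not values from the announcement.

```python
# A minimal boto3 sketch for AWS Glue Data Quality; "sales_db", "orders",
# and the role ARN are hypothetical placeholders.
import boto3

glue = boto3.client("glue")

# Example DQDL ruleset: completeness, uniqueness, and an allowed-values check.
ruleset = """
Rules = [
    IsComplete "order_id",
    Uniqueness "order_id" > 0.99,
    ColumnValues "status" in ["OPEN", "SHIPPED", "CLOSED"]
]
"""

glue.create_data_quality_ruleset(
    Name="orders-basic-checks",
    Ruleset=ruleset,
    TargetTable={"DatabaseName": "sales_db", "TableName": "orders"},
)

run = glue.start_data_quality_ruleset_evaluation_run(
    DataSource={"GlueTable": {"DatabaseName": "sales_db", "TableName": "orders"}},
    Role="arn:aws:iam::123456789012:role/GlueDataQualityRole",  # placeholder
    RulesetNames=["orders-basic-checks"],
)
print(run["RunId"])  # poll this run ID for pass/fail results
```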
SageMaker still includes all the existing ML and AI capabilities you’ve come to know and love for data wrangling, human-in-the-loop data labeling with Amazon SageMaker Ground Truth, experiments, MLOps, Amazon SageMaker HyperPod managed distributed training, and more. Having confidence in your data is key.
Some customers build custom in-house data parity frameworks to validate data during migration. Others use open-source data quality products for data parity use cases. This diverts valuable person-hours from the actual migration effort into building and maintaining a data parity framework.
Due to the volume, velocity, and variety of data being ingested in data lakes, it can get challenging to develop and maintain policies and procedures to ensure data governance at scale for your data lake. Data confidentiality and data quality are the two essential themes for data governance.
This complex process involves suppliers, logistics, quality control, and delivery. This post describes how HPE Aruba automated their Supply Chain management pipeline, and re-architected and deployed their data solution by adopting a modern data architecture on AWS.
To succeed in today’s landscape, every company, whether small, mid-sized, or large, must embrace a data-centric mindset. This article proposes a methodology for organizations to implement a modern data management function that can be tailored to meet their unique needs. Implementing ML capabilities can help find the right thresholds.
In modern data architectures, Apache Iceberg has emerged as a popular table format for data lakes, offering key features including ACID transactions and concurrent write support. For more detailed configuration, refer to Write properties in the Iceberg documentation.
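As a minimal sketch of what that configuration can look like in practice, the PySpark snippet below creates an Iceberg table with a few write properties drawn from that documentation page. The catalog name, warehouse path, and table schema are placeholder assumptions.

```python
# A minimal PySpark sketch for Iceberg write properties; requires the
# iceberg-spark-runtime jar on the classpath. "demo", the warehouse path,
# and the table schema are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-write-properties")
    # Register an Iceberg catalog named "demo" (placeholder name and path).
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "s3://my-bucket/warehouse")
    .getOrCreate()
)

# Table properties such as write.parquet.compression-codec and
# write.target-file-size-bytes are documented under "Write properties".
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo.db.events (
        id BIGINT, ts TIMESTAMP, payload STRING
    )
    USING iceberg
    TBLPROPERTIES (
        'write.format.default' = 'parquet',
        'write.parquet.compression-codec' = 'zstd',
        'write.target-file-size-bytes' = '134217728'
    )
""")
```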
When we talk about data integrity, we’re referring to the overarching completeness, accuracy, consistency, accessibility, and security of an organization’s data. Together, these factors determine the reliability of the organization’s data. In short, yes.
The Business Application Research Center (BARC) warns that data governance is a highly complex, ongoing program, not a “big bang initiative,” and it runs the risk of participants losing trust and interest over time. The program must introduce and support standardization of enterprise data.
In turn, they both must also have the data literacy skills to be able to verify the data’s accuracy, ensure its security, and provide or follow guidance on when and how it should be used. Data democratization uses a fit-for-purpose data architecture that is designed for the way today’s businesses operate, in real time.
First, you must understand the existing challenges of the data team, including the data architecture and end-to-end toolchain. Based on business rules, additional data quality tests check the dimensional model after the ETL job completes. A DataOps implementation project consists of three steps.
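To make the post-ETL test idea concrete, here is a small sketch of business-rule checks run against a dimensional model after an ETL job completes. It uses SQLite from the standard library for self-containment; the fact_sales and dim_customer tables, their columns, and the rules are hypothetical examples.

```python
# A minimal sketch of post-ETL data quality tests for a dimensional model;
# table and column names are hypothetical.
import sqlite3

def run_post_etl_checks(conn: sqlite3.Connection) -> list[str]:
    """Return the names of any failed business-rule checks."""
    failures = []
    checks = {
        # Every fact row must reference an existing dimension row.
        "orphaned fact keys": """
            SELECT COUNT(*) FROM fact_sales f
            LEFT JOIN dim_customer d ON f.customer_id = d.customer_id
            WHERE d.customer_id IS NULL
        """,
        # Keys in the dimension must be unique.
        "duplicate dimension keys": """
            SELECT COUNT(*) FROM (
                SELECT customer_id FROM dim_customer
                GROUP BY customer_id HAVING COUNT(*) > 1
            )
        """,
        # Business rule: sale amounts must be non-negative.
        "negative sale amounts": "SELECT COUNT(*) FROM fact_sales WHERE amount < 0",
    }
    for name, sql in checks.items():
        if conn.execute(sql).fetchone()[0] > 0:
            failures.append(name)
    return failures
```

A scheduler or CI step would call run_post_etl_checks right after the ETL job and fail the pipeline if the returned list is non-empty.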
Data governance is increasingly top-of-mind for customers as they recognize data as one of their most important assets. Effective data governance enables better decision-making by improving data quality, reducing data management costs, and ensuring secure access to data for stakeholders.
While traditional extract, transform, and load (ETL) processes have long been a staple of data integration due to their flexibility, for common use cases such as replication and ingestion they often prove time-consuming, complex, and less adaptable to the fast-changing demands of modern data architectures.
Uncomfortable truth incoming: Most people in your organization don’t think about the quality of their data from intake to production of insights. However, as a data team member, you know how important data integrity (and a whole host of other aspects of data management) is. Means of ensuring data integrity.
After countless open-source innovations ushered in the Big Data era, including the first commercial distribution of HDFS (Apache Hadoop Distributed File System), commonly referred to as Hadoop, the two companies joined forces, giving birth to an entire ecosystem of technology and tech companies.
Mark: The first element in the process is the link between the source data and the entry point into the data platform. At Ramsey International (RI), we refer to that layer in the architecture as the foundation, but others call it a staging area, raw zone, or even a source data lake. What is a data fabric?
Migrating to Amazon Redshift offers organizations the potential for improved price-performance, enhanced data processing, faster query response times, and better integration with technologies such as machine learning (ML) and artificial intelligence (AI).
The goal of a data product is to solve the long-standing issue of data silos and data quality. Independent data products often only have value if you can connect them, join them, and correlate them to create a higher-order data product that creates additional insights.
Data Architecture – Definition (2). Data Catalogue. Data Community. Data Domain (contributor: Taru Väre). Data Enrichment. Data Federation. Data Function. Data Model. Data Operating Model. Geospatial Data. Reference Data (contributor: George Firican).
Realize that a data governance program cannot exist on its own – it must solve business problems and deliver outcomes. Start by identifying business objectives, desired outcomes, key stakeholders, and the data needed to deliver these objectives. So where are you in your data governance journey?
“Technical debt” refers to the implied cost of future refactoring or rework to improve the quality of an asset to make it easy to understand, work with, maintain, and extend.
The data mesh framework. In the dynamic landscape of data management, the search for agility, scalability, and efficiency has led organizations to explore new, innovative approaches. One such innovation gaining traction is the data mesh framework. Business Glossaries – what is the business meaning of our data?
The term “data analytics” refers to the process of examining datasets to draw conclusions about the information they contain. Data analysis techniques enhance the ability to take raw data and uncover patterns to extract valuable insights from it.
Control of Data to ensure it is Fit-for-Purpose. This refers to a wide range of activities from Data Governance to Data Management to Data Quality improvement, and indeed related concepts such as Master Data Management. Data Architecture / Infrastructure.
Big Data technology in today’s world. Did you know that the big data and business analytics market is valued at $198.08 billion? Or that the US economy loses up to $3 trillion per year due to poor data quality? Or that the world generates quintillions of bytes of data every day, which means an average person generates over 1.5 megabytes of data every second?
Breaking down these silos to encourage data access, data sharing and collaboration will be an important challenge for organizations in the coming years. The right data architecture to link and gain insight across silos requires the communication and coordination of a strategic data governance program.
A Gartner Marketing survey found only 14% of organizations have successfully implemented a C360 solution, due to lack of consensus on what a 360-degree view means, challenges with data quality, and lack of a cross-functional governance structure for customer data.
Consume data assets as part of analyzing data to generate insights. Part 1: Set up account governance and identity management. Before you start, compare your current cloud environment, including data architecture, to ATPCO’s environment. Clear Data quality unless you have already set up AWS Glue Data Quality.
Bad data tax is rampant in most organizations. Currently, every organization is blindly chasing the GenAI race, often forgetting that data quality and semantics are among the fundamentals for achieving AI success. Sadly, data quality is losing to data quantity, resulting in “Infobesity”.
Like many, the team at Cbus wanted to use data to more effectively drive the business. “Finding the right data was a real challenge,” recalls John Gilbert, Data Governance Manager. “There was no single source of reference, there was no catalog to leverage, and it was unclear who to ask or seek assistance from.”
Business users often think of data as something technical that is not their concern; they believe the IT department should take care of it. While IT is happy to look after the technical storage and backup of data, they defer to line-of-business experts when it comes to quality and usability.
All the references I can find to it are modern pieces comparing it to the CDO role, so perhaps it is apocryphal. This may purely be focused on cultural aspects of how an organisation records, shares and otherwise uses data. It may be to build a new (or a first) Data Architecture. It may be to improve Data Quality.
Both use cases use semantic metadata, which describes information sources with respect to a unified conceptual model that includes ontologies, data schema, taxonomies, reference data, or other domain knowledge. They sift through documents, generate metadata, and store it in the knowledge graph.
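As a hedged illustration of that last step, the rdflib sketch below records document metadata as triples in a small RDF graph. The namespace, document identifier, and predicates are hypothetical stand-ins for a real ontology, not part of the system described above.

```python
# A minimal rdflib sketch of storing document metadata in a knowledge graph;
# the example.org namespace and "doc-42" identifier are hypothetical.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/kg/")

g = Graph()
g.bind("ex", EX)

doc = EX["doc-42"]                              # hypothetical document ID
g.add((doc, RDF.type, EX.Document))             # classify via the ontology
g.add((doc, EX.title, Literal("Quarterly supply report")))
g.add((doc, EX.mentions, EX["supplier-7"]))     # link into domain reference data

print(g.serialize(format="turtle"))             # inspect the generated metadata
```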
Good day from DAMA International. We hope your Data Management career and programs are progressing well. If you have issues, please refer to DAMA.org for references, as well as the DAMA Data Management Body of Knowledge (DMBoK). You can purchase the DMBoK at your favorite book source or via the website link.
They have to misallocate resources because 80% of the time the data scientists are busy with data finding, access, cleansing, and so on. This also results in the information loss I’ve already mentioned and severely impacts our insight creation and monetization of the data. So Schemata are interoperable by design.
The data catalog is a foundational layer of the data fabric. (This zoomed-in version has references to corresponding vendor markets removed.) Using this diagram as our guide, this blog will deep-dive into each layer of the data fabric, starting with the data catalog. Alation Data Catalog for the data fabric.
If you want to convert your data into the right insights to drive business decisions and processes, you need this data to be easily accessible and stored in a format that is flexible, accurate, and machine-readable. It must retain the context and insight of the original data and be traceable as it flows through the organization.
Most of D&A concerns and activities are done within EA in the Info/Dataarchitecture domain/phases. – I remember that I tried to answer this live during the webinar. – We see most, if not all, of data management being augmented with ML. – I am not totally sure what you mean by data capture.
Convergent Evolution refers to something else. Even back then, these were used for activities such as Analytics, Dashboards, Statistical Modelling, Data Mining and Advanced Visualisation. Of course some architectures featured both paradigms as well. So far so simple.
It allows organizations to see how data is being used, where it is coming from, its quality, and how it is being transformed. DataOps Observability includes monitoring and testing the data pipeline, data quality, data testing, and alerting. Data observability and data lineage are complementary concepts.
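As a minimal sketch of the monitoring-and-alerting side, the Python snippet below checks two common observability signals, freshness and volume. The thresholds, table name, and the alert stub are assumptions for illustration; real deployments would wire these into a dedicated observability tool.

```python
# A minimal sketch of pipeline observability checks; thresholds and the
# "orders" table are hypothetical, and alert() is a stand-in for a real
# paging or chat integration.
from datetime import datetime, timedelta, timezone

def check_freshness(last_loaded_at: datetime, max_lag: timedelta) -> bool:
    """True if the table was updated within the allowed lag."""
    return datetime.now(timezone.utc) - last_loaded_at <= max_lag

def check_volume(row_count: int, expected: int, tolerance: float = 0.2) -> bool:
    """True if today's row count is within 20% of the expected baseline."""
    return abs(row_count - expected) <= tolerance * expected

def alert(message: str) -> None:
    print(f"[ALERT] {message}")

if not check_freshness(datetime(2024, 1, 1, tzinfo=timezone.utc),
                       max_lag=timedelta(hours=6)):
    alert("orders table is stale")
if not check_volume(row_count=9_200, expected=12_000):
    alert("orders row count outside tolerance")
```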