We also examine how centralized, hybrid, and decentralized data architectures support scalable, trustworthy ecosystems. As data-centric AI, automated metadata management, and privacy-aware data sharing mature, the opportunity to embed data quality into the enterprise's core has never been more significant.
Need for a data mesh architecture Because entities in the EUROGATE group generate vast amounts of data from various sources (across departments, locations, and technologies), the traditional centralized data architecture struggles to keep up with the demands for real-time insights, agility, and scalability.
This post describes how HPE Aruba automated their Supply Chain management pipeline, and re-architected and deployed their data solution by adopting a modern data architecture on AWS. Each file arrives as a pair with a tail metadata file in CSV format containing the size and name of the file.
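A pairing like this is straightforward to handle in ingestion code. The sketch below parses such a tail metadata file; the column names (`file_name`, `file_size`) are hypothetical, since the post does not specify the actual CSV schema.

```python
import csv
import io

def parse_tail_metadata(csv_text):
    """Parse a tail metadata CSV that accompanies each data file.

    Assumes (hypothetically) a header row with 'file_name' and
    'file_size' columns; the real schema may differ.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {"name": row["file_name"], "size": int(row["file_size"])}
        for row in reader
    ]

# Example tail metadata for one arriving file
records = parse_tail_metadata("file_name,file_size\norders_2024.parquet,1048576\n")
```

A pipeline can compare the parsed size against the actual object size to detect truncated uploads before processing.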
Reading Time: 3 minutes While cleaning up our archive recently, I found an old article published in 1976 about data dictionary/directory systems (DD/DS). Nowadays, we no longer use the term DD/DS, but “data catalog” or simply “metadata system”. It was written by L.
Each of these trends claims to be a complete model for data architectures that solves the “everything everywhere all at once” problem. Data teams are confused as to whether they should get on the bandwagon of just one of these trends or pick a combination. First, we describe how data mesh and data fabric could be related.
Over the past decade, the successful deployment of large-scale data platforms at our customers has acted as a big data flywheel driving demand to bring in even more data, apply more sophisticated analytics, and on-board many new data practitioners from business analysts to data scientists.
Modern, strategic data governance, which involves both IT and the business, enables organizations to plan and document how they will discover and understand their data within context, track its physical existence and lineage, and maximize its security, quality and value. Five Steps to GDPR/CCPA Compliance. How erwin Can Help.
This blog post introduces Amazon DataZone and explores how VW used it to build their data mesh to enable streamlined data access across multiple data lakes. Amazon DataZone projects enable collaboration with teams through data assets and the ability to manage and monitor data assets across projects.
While traditional extract, transform, and load (ETL) processes have long been a staple of data integration due to their flexibility, for common use cases such as replication and ingestion they often prove time-consuming, complex, and less adaptable to the fast-changing demands of modern data architectures.
Aptly named, metadata management is the process in which BI and Analytics teams manage metadata, which is the data that describes other data. In other words, data is the content and metadata is the context. Without metadata, BI teams are unable to understand the data’s full story.
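The content/context distinction is easy to make concrete. In this minimal sketch, one structure holds the data itself and another describes it; all field names here are illustrative, not taken from any particular metadata standard.

```python
# The data: the actual records a BI team analyzes (the content).
sales = [
    {"region": "EMEA", "revenue": 120000},
    {"region": "APAC", "revenue": 95000},
]

# The metadata: describes the dataset itself (the context).
# Field names are hypothetical examples of what a catalog might track.
sales_metadata = {
    "dataset": "sales",
    "owner": "finance-bi",
    "columns": {"region": "string", "revenue": "integer"},
    "row_count": len(sales),
    "source": "crm_export",
}
```

Without the second structure, a consumer of `sales` has no way to know who owns it, where it came from, or what the columns mean.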
With this launch, you can query data regardless of where it is stored with support for a wide range of use cases, including analytics, ad-hoc querying, data science, machine learning, and generative AI. We’ve simplified data architectures, saving you time and costs on unnecessary data movement, data duplication, and custom solutions.
First, you must understand the existing challenges of the data team, including the data architecture and end-to-end toolchain. Figure 2: Example data pipeline with DataOps automation. In this project, I automated data extraction from SFTP, the public websites, and the email attachments. Monitoring Job Metadata.
They chose AWS Glue as their preferred data integration tool due to its serverless nature, low maintenance, ability to control compute resources in advance, and scale when needed. To share the datasets, they needed a way to share access to the data and access to catalog metadata in the form of tables and views.
Those decentralization efforts appeared under different monikers through time, e.g., data marts versus data warehousing implementations (a popular architectural debate in the era of structured data), then enterprise-wide data lakes versus smaller, typically BU-specific, “data ponds”.
Instead of a central data platform team with a data warehouse or data lake serving as the clearinghouse of all data across the company, a data mesh architecture encourages distributed ownership of data by data producers who publish and curate their data as products, which can then be discovered, requested, and used by data consumers.
Amazon SageMaker Lakehouse provides an open data architecture that reduces data silos and unifies data across Amazon Simple Storage Service (Amazon S3) data lakes, Redshift data warehouses, and third-party and federated data sources. It supports connection testing, metadata retrieval, and data preview.
Companies can now capitalize on the value in all their data by delivering a hybrid data platform for modern data architectures with data anywhere. Cloudera Data Platform (CDP) is designed to address the critical requirements for modern data architectures today and tomorrow.
The business end-users were given a tool to discover data assets produced within the mesh and seamlessly self-serve on their data sharing needs. The integration of Databricks Delta tables into Amazon DataZone is done using the AWS Glue Data Catalog. The following diagram illustrates the architecture of both accounts.
Data fabric and data mesh are emerging data management concepts that are meant to address the organizational change and complexities of understanding, governing and working with enterprise data in a hybrid multicloud ecosystem. The good news is that both data architecture concepts are complementary.
Organizations have multiple Hive data warehouses across EMR clusters, where the metadata gets generated. To address this challenge, organizations can deploy a data mesh using AWS Lake Formation that connects the multiple EMR clusters. The data resides on Amazon S3, which reduces the storage costs significantly.
AWS Glue Data Catalog stores information as metadata tables, where each table specifies a single data store. The AWS Glue crawler writes metadata to the Data Catalog by classifying the data to determine the format, schema, and associated properties of the data.
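Defining and running a crawler is a small amount of configuration. The sketch below builds a crawler request using the standard AWS Glue API shape via boto3; the crawler name, IAM role, database, and S3 path are all hypothetical placeholders.

```python
# Hypothetical names throughout; the request shape matches the
# AWS Glue create_crawler API.
crawler_request = {
    "Name": "orders-crawler",
    "Role": "arn:aws:iam::123456789012:role/GlueCrawlerRole",
    "DatabaseName": "analytics_catalog",
    # The crawler scans this S3 prefix, infers format and schema,
    # and writes the resulting table metadata to the Data Catalog.
    "Targets": {"S3Targets": [{"Path": "s3://example-bucket/orders/"}]},
}

# In a live AWS account you would then run:
# import boto3
# glue = boto3.client("glue")
# glue.create_crawler(**crawler_request)
# glue.start_crawler(Name=crawler_request["Name"])
```

Once the crawler finishes, the inferred table (format, schema, partition keys) is queryable from the catalog by services such as Athena.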
These inputs reinforced the need of a unified data strategy across the FinOps teams. We decided to build a scalable data management product that is based on the best practices of modern data architecture. Our source system and domain teams were mapped as data producers, and they would have ownership of the datasets.
The Analytics specialty practice of AWS Professional Services (AWS ProServe) helps customers across the globe with modern data architecture implementations on the AWS Cloud. The File Manager Lambda function consumes those messages, parses the metadata, and inserts the metadata into the DynamoDB table odpf_file_tracker.
Limiting growth by (data integration) complexity Most operational IT systems in an enterprise have been developed to serve a single business function and they use the simplest possible model for this. In both cases, semantic metadata is the glue that turns knowledge graphs into hubs of data, metadata, and content.
In this post, we are excited to summarize the features that the AWS Glue Data Catalog, AWS Glue crawler, and Lake Formation teams delivered in 2022. Whether you are a data platform builder, data engineer, data scientist, or any technology leader interested in data lake solutions, this post is for you.
Apache Iceberg overview Iceberg is an open-source table format that brings the power of SQL tables to big data files. It enables ACID transactions on tables, allowing for concurrent data ingestion, updates, and queries, all while using familiar SQL. The Iceberg table is synced with the AWS Glue Data Catalog.
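The SQL-table experience Iceberg provides looks like ordinary DDL and DML. The following is an illustrative Spark SQL sketch; the catalog, table, and column names are hypothetical.

```sql
-- Create an Iceberg table registered in a Glue-backed catalog
-- (names are illustrative).
CREATE TABLE glue_catalog.sales.orders (
    order_id BIGINT,
    amount   DECIMAL(10, 2),
    order_ts TIMESTAMP
) USING iceberg;

-- ACID semantics let concurrent writers run transactional upserts
-- while readers continue to query consistent snapshots:
MERGE INTO glue_catalog.sales.orders t
USING staged_orders s
ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET t.amount = s.amount
WHEN NOT MATCHED THEN INSERT *;
```

Because the table is synced with the AWS Glue Data Catalog, the same table definition is visible to other catalog-aware engines.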
But data leaders must work quickly, and use the right tools, to understand, manage, and protect data while complying with related regulations and standards. The Australian Prudential Regulation Authority (APRA) released nonbinding standards covering data risk management. Download the complete white paper now.
The data mesh framework In the dynamic landscape of data management, the search for agility, scalability, and efficiency has led organizations to explore new, innovative approaches. One such innovation gaining traction is the data mesh framework. This empowers individual teams to own and manage their data.
In this example, we use Amazon EMR Serverless in combination with the open source library Pydeequ to act as an external system for data quality. If the asset has AWS Glue Data Quality enabled, you can now quickly visualize the data quality score directly in the catalog search pane.
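Pydeequ expresses checks such as completeness and uniqueness over Spark DataFrames. As a minimal pure-Python sketch of the same two metrics (this is not the Pydeequ API, just an illustration of what those checks compute):

```python
from collections import Counter

def completeness(rows, column):
    """Fraction of rows with a non-null value in `column`."""
    if not rows:
        return 0.0
    non_null = sum(1 for r in rows if r.get(column) is not None)
    return non_null / len(rows)

def uniqueness(rows, column):
    """Fraction of rows whose value in `column` occurs exactly once."""
    if not rows:
        return 0.0
    counts = Counter(r.get(column) for r in rows)
    unique = sum(1 for v in counts.values() if v == 1)
    return unique / len(rows)

rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "a@example.com"},
]
# completeness(rows, "email") -> 2/3; uniqueness(rows, "id") -> 1.0
```

An external system like EMR Serverless would run such checks at scale and report the resulting scores back to the catalog.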
Ehtisham Zaidi, Gartner’s VP of data management, and Robert Thanaraj, Gartner’s director of data management, gave an update on the fabric versus mesh debate in light of what they call the “active metadata era” we’re currently in. The foundations of successful data governance The state of data governance was also top of mind.
Figure 1 shows the overall idea of a data mesh with the major components: What Is a Data Mesh and How Does It Work? Think of data mesh as an operational mode for organizations with a domain-driven, decentralized data architecture. What Is a Data Product and Who Owns It?
For example, GPS, social media, and cell phone handoffs are modeled as graphs, while data catalogs, data lineage, and MDM tools leverage knowledge graphs for linking metadata with semantics. LPGs lack schema and semantics, which makes them inappropriate for publishing and sharing data, and therefore inflexible.
Priority 2 logs, such as operating system security logs, firewall, identity provider (IdP), email metadata, and AWS CloudTrail , are ingested into Amazon OpenSearch Service to enable the following capabilities. She currently serves as the Global Head of Cyber Data Management at Zurich Group.
Twenty-five years ago today, I published the first issue of The Data Administration Newsletter. It only took a few months to recognize that there was an audience for an “online” publication focused on data administration. […].
Just as Istio applies security governance to containers in Kubernetes, the data fabric will apply policies to data according to similar principles, in real time. Data fabric promotes data discoverability, which enables access to data at all stages of its value lifecycle.
Why would Technics Publications publish a book outside its specialty of data management? We published Graham Witt’s Technical Writing for Quality for two reasons. First, Graham is a world-renowned data modeler and the author of Data Modeling for Quality, and therefore many of his examples are in the field of data management.
I try to relate as much published research as I can in the time available to draft a response. – In the webinar and Leadership Vision deck for Data and Analytics we called out AI engineering as a big trend.
The FHIRCat group at the Mayo Clinic has published the CORD-19-on-FHIR dataset for COVID-19 research. Ontotext’s knowledge graph technology is at the core of Cochrane’s data architecture developed by our partners from Data Language.
Data Environment: First off, the solutions you consider should be compatible with your current data architecture. We have outlined the requirements that most providers ask for. Data Sources (strategic objective): use native connectivity optimized for the data source.
Knowledge graphs, while not as well-known as other data management offerings, are a proven dynamic and scalable solution for addressing enterprise data management requirements across several verticals. The RDF-star extension makes it easy to model provenance and other structured metadata.
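RDF-star allows a statement itself to carry metadata, which is how provenance is modeled directly in the graph. A small illustrative Turtle sketch (the prefix, IRIs, and property names are hypothetical):

```turtle
@prefix ex:  <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# The base statement.
ex:alice ex:worksFor ex:acme .

# RDF-star: the quoted triple is annotated with provenance metadata.
<< ex:alice ex:worksFor ex:acme >>
    ex:source     ex:hrSystem ;
    ex:recordedOn "2023-05-01"^^xsd:date .
```

Without RDF-star, the same provenance would require reification or named graphs, both of which are more verbose to query.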
The solution uses the following key services: Amazon API Gateway – API Gateway is a fully managed service that makes it straightforward for developers to create, publish, maintain, monitor, and secure APIs at any scale. APIs act as the entry point for applications to access data, business logic, or functionality from your backend services.