Yet, despite growing investments in advanced analytics and AI, organizations continue to grapple with a persistent and often underestimated challenge: poor data quality. Fragmented systems, inconsistent definitions, legacy infrastructure and manual workarounds introduce critical risks.
This post was co-written with Dipankar Mazumdar, Staff Data Engineering Advocate with AWS Partner OneHouse. Data architecture has evolved significantly to handle growing data volumes and diverse workloads. This allows the existing data to be interpreted as if it were originally written in any of these formats.
Untapped data, if mined, represents tremendous potential for your organization. While there has been a lot of talk about big data over the years, the real hero in unlocking the value of enterprise data is metadata, or the data about the data. Metadata Is the Heart of Data Intelligence.
Standards exist for naming conventions, abbreviations and other pertinent metadata properties. Consistent business meaning is important because distinctions between business terms are not typically well defined or documented. What are the standards for writing […].
That’s because it’s the best way to visualize metadata, and metadata is now the heart of enterprise data management and data governance/intelligence efforts. So here’s why data modeling is so critical to data governance. erwin Data Modeler: Where the Magic Happens.
Each of these trends claims to be a complete model for data architecture that solves the “everything everywhere all at once” problem. Data teams are confused as to whether they should get on the bandwagon of just one of these trends or pick a combination. First, we describe how data mesh and data fabric could be related.
This solution only replicates metadata in the Data Catalog, not the actual underlying data. To have a redundant data lake using Lake Formation and AWS Glue in an additional Region, we recommend replicating the Amazon S3-based storage using S3 replication, S3 sync, aws-s3-copy-sync-using-batch, or the S3 Batch Replication process.
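As a minimal sketch of the S3 replication option (bucket names, account ID, and role ARN are hypothetical, and both buckets must already have versioning enabled):

import boto3

s3 = boto3.client("s3")
s3.put_bucket_replication(
    Bucket="primary-data-lake",  # hypothetical source bucket
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
        "Rules": [
            {
                "ID": "replicate-data-lake",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # replicate the whole bucket
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": "arn:aws:s3:::replica-data-lake"},
            }
        ],
    },
)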
Data governance definition: Data governance is a system for defining who within an organization has authority and control over data assets and how those data assets may be used. It encompasses the people, processes, and technologies required to manage and protect data assets.
First, you must understand the existing challenges of the data team, including the data architecture and end-to-end toolchain. Second, you must establish a definition of “done.” In DataOps, the definition of done includes more than just some working code. Definition of Done. Monitoring Job Metadata.
The challenge today is to think more broadly about what these data things could or should be. It’s important to realize that we need visibility into lineage and relationships between all data and data-related assets, including business terms, metric definitions, policies, quality rules, access controls, algorithms, etc.
Those decentralization efforts appeared under different monikers through time, e.g., data marts versus data warehousing implementations (a popular architectural debate in the era of structured data), then enterprise-wide data lakes versus smaller, typically BU-specific, “data ponds”.
BladeBridge offers a comprehensive suite of tools that automate much of the complex conversion work, allowing organizations to quickly and reliably transition their data analytics capabilities to the scalable Amazon Redshift data warehouse. Amazon Redshift is a fully managed data warehouse service offered by Amazon Web Services (AWS).
When evolving such a partition definition, the data in the table prior to the change is unaffected, as is its metadata. Only data that is written to the table after the evolution is partitioned with the new definition, and the metadata for this new set of data is kept separately. Snapshots older than seven days can then be expired with the Iceberg Spark API:

SparkActions.get().expireSnapshots(iceTable).expireOlderThan(System.currentTimeMillis() - TimeUnit.DAYS.toMillis(7)).execute()
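For illustration, a hedged PySpark sketch of such a partition evolution (catalog, table, and column names are assumptions, and the Iceberg Spark runtime is assumed to be on the classpath); the ALTER statement is metadata-only:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-partition-evolution")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .getOrCreate()
)

# Files written before this statement keep the old partition spec;
# only new writes are partitioned by day(event_ts).
spark.sql("ALTER TABLE my_catalog.db.events ADD PARTITION FIELD day(event_ts)")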
They chose AWS Glue as their preferred data integration tool due to its serverless nature, low maintenance, ability to control compute resources in advance, and scale when needed. To share the datasets, they needed a way to share access to the data and access to catalog metadata in the form of tables and views.
SAP helps to solve this search problem by offering ways to simplify business data with a solid data foundation that powers SAP Datasphere. It fits neatly with the renewed interest in data architecture, particularly data fabric architecture. They fail to get a grip on their data.
The Iceberg catalog stores a pointer to the table’s current metadata file. When a SELECT query reads an Iceberg table, the query engine first goes to the Iceberg catalog, then retrieves the location of the latest metadata file, as shown in the following diagram. The following example demonstrates this.
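A hedged sketch of inspecting that lookup (table name is hypothetical; assumes a SparkSession configured with the Iceberg extensions as above): Iceberg exposes the metadata-file history and snapshots as queryable metadata tables.

# The newest row is the metadata file the catalog pointer currently references.
spark.sql(
    "SELECT timestamp, file, latest_snapshot_id "
    "FROM my_catalog.db.events.metadata_log_entries"
).show(truncate=False)

# Each snapshot in the current metadata file references a manifest list.
spark.sql(
    "SELECT committed_at, snapshot_id, manifest_list "
    "FROM my_catalog.db.events.snapshots"
).show(truncate=False)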
The Analytics specialty practice of AWS Professional Services (AWS ProServe) helps customers across the globe with modern data architecture implementations on the AWS Cloud. The File Manager Lambda function consumes those messages, parses the metadata, and inserts the metadata into the DynamoDB table odpf_file_tracker.
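A rough sketch of that Lambda flow (message field names and key schema are assumptions; only the odpf_file_tracker table name comes from the post):

import json
import boto3

table = boto3.resource("dynamodb").Table("odpf_file_tracker")

def handler(event, context):
    # Parse the file metadata carried in each SQS message and record it.
    for record in event["Records"]:
        meta = json.loads(record["body"])
        table.put_item(
            Item={
                "file_id": meta["file_id"],  # assumed key attribute
                "s3_path": meta["s3_path"],  # assumed metadata field
                "status": "RECEIVED",
            }
        )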
EDM covers the entire organization’s data lifecycle: It designs and describes data pipelines for each enterprise data type: metadata, reference data, master data, transactional data, and reporting data.
Data mesh is an approach to data architecture that is intentionally distributed, where data is owned and governed by domain-specific teams who treat the data as a product to be consumed by other domain-specific teams. What are the principles behind data mesh architecture?
The consumption of the data should be supported through an elastic delivery layer that aligns with demand but also provides the flexibility to present the data in a physical format that aligns with the analytic application, ranging from the more traditional data warehouse view to a graph view in support of relationship analysis.
It seamlessly consolidates data from various data sources within AWS, including AWS Cost Explorer (and forecasting with Cost Explorer), AWS Trusted Advisor, and AWS Compute Optimizer. Data providers and consumers are the two fundamental users of a CDH dataset.
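For a feel of one such source, a hedged sketch of pulling monthly unblended cost from Cost Explorer (dates and metric choice are assumptions):

import boto3

ce = boto3.client("ce")
resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-01-01", "End": "2024-02-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
)
for period in resp["ResultsByTime"]:
    print(period["TimePeriod"], period["Total"]["UnblendedCost"]["Amount"])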
This means that specialized roles such as data architects, which focus on modernizing data architecture to help meet business goals, are increasingly important to support data governance. What is a data architect? Their broad range of responsibilities includes designing and implementing data architecture.
The data mesh framework In the dynamic landscape of data management, the search for agility, scalability, and efficiency has led organizations to explore new, innovative approaches. One such innovation gaining traction is the data mesh framework. This empowers individual teams to own and manage their data.
There are two reasons for this: First, Linked Data, or, to put it in Plain English, the practice of explaining the meaning of content to machines, is essentially about linking content to semantically modeled data. Second, Linked Data is creating highly connected, computer-processable definitions of entities.
Overview of solution As a data-driven company, smava relies on the AWS Cloud to power their analytics use cases. smava ingests data from various external and internal data sources into a landing stage on the data lake based on Amazon Simple Storage Service (Amazon S3).
The Australian Prudential Regulation Authority (APRA) released nonbinding standards covering data risk management. Another agency later also published a legally binding standard to strengthen risk management for financial institutions, with specific language related to data architecture and IT infrastructure.
In Data Fabric: A Love Story, we defined data fabric and outlined its uses and motivations. As a reminder, here’s Gartner’s definition of data fabric: “A design concept that serves as an integrated layer (fabric) of data and connecting processes.” The data catalog is a foundational layer of the data fabric.
As the use of ChatGPT becomes more prevalent, I frequently encounter customers and data users citing ChatGPT’s responses in their discussions. I love the enthusiasm surrounding ChatGPT and the eagerness to learn about modern data architectures such as data lakehouses, data meshes, and data fabrics.
In today’s AI/ML-driven world of data analytics, explainability needs a repository just as much as those doing the explaining need access to metadata, e.g., information about the data being used. The Cloud Data Migration Challenge. It’s not a simple definition. Legacy data adds to the challenge.
Open data formats that keep the data accessible to all but are optimized for high performance and have a well-defined structure. Open (sharable) metadata that enables multiple consumption engines or frameworks. The ability to update data (ACID properties) and support transactional concurrency.
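To make those properties concrete, a hedged PySpark sketch (table name is hypothetical; assumes an Iceberg catalog configured as in the earlier example) of an open-format table that any Iceberg-aware engine can read and that supports transactional updates:

spark.sql("""
    CREATE TABLE IF NOT EXISTS my_catalog.db.orders (
        order_id   BIGINT,
        status     STRING,
        updated_at TIMESTAMP
    ) USING iceberg
""")

# ACID update: committed atomically as a new snapshot, so concurrent
# readers keep seeing a consistent version of the table.
spark.sql("UPDATE my_catalog.db.orders SET status = 'shipped' WHERE order_id = 42")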
For example, GPS, social media, and cell phone handoff data are modeled as graphs, while data catalogs, data lineage, and MDM tools leverage knowledge graphs for linking metadata with semantics. This verbosity allows schema, metadata, and instance data to live in one place, enabling accessibility and manageability.
While the essence of success in data governance is people and not technology, having the right tools at your fingertips is crucial. Technology is an enabler, and for data governance this essentially means having an excellent metadata management tool. Next to data governance, data architecture is really embedded in our DNA.
They have to misallocate resources because 80% of the time the data scientists are busy finding, accessing, and cleansing data. This also results in the information loss I’ve already mentioned and severely impacts our ability to create insights and monetize the data. The next element is the variables.
What Are the Biggest Drivers of Cloud Data Warehousing? It’s costly and time-consuming to manage on-premises data warehouses; modern cloud data architectures can deliver business agility and innovation. “You really need to understand the metadata and data definitions around different data sets,” Kirsch says.
Data fabric promotes data discoverability. Here, data assets can be published into categories, creating an enterprise-wide data marketplace. This marketplace provides a search mechanism, utilizing metadata and a knowledge graph to enable asset discovery. Data mesh: A mostly new culture.
About a week ago, I was teaching a data modeling class, and an attendee asked me to explain the concept of a data catalog. Like a lot of hype-related terms in IT, there is more than one definition. However, I had recently read the book, The Data Catalog: Sherlock Holmes Data Sleuthing for Analytics by […].
…spark_travel_details" limit 10;

The table (native Delta table) has been created and updated in the AWS Glue Data Catalog from the EMR Serverless application code. Athena supports reading native Delta tables, and therefore we can read the data successfully even though the Data Catalog shows only a single array column.
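For context, a hedged sketch of issuing such a query programmatically through the Athena API (the delta_db database name and the results bucket are assumptions, not from the excerpt):

import boto3

athena = boto3.client("athena")
resp = athena.start_query_execution(
    QueryString='SELECT * FROM "spark_travel_details" LIMIT 10',
    QueryExecutionContext={"Database": "delta_db"},  # assumed database name
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(resp["QueryExecutionId"])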
Most of D&A concerns and activities are done within EA in the Info/Data architecture domain/phases. Here too is a blog of mine on the topic (By 2024, 60% of the data used for the development of AI and analytics projects will be synthetically generated). There really is not one plan per se for everyone.
Solution overview: The basic concept of the modernization project is to create metadata-driven frameworks, which are reusable, scalable, and able to respond to the different phases of the modernization process. These phases are: data orchestration, data migration, data ingestion, data processing, and data maintenance.
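As an illustrative sketch only (every field name here is invented, not from the project), a metadata-driven framework typically drives each phase from entries like this, generating the corresponding job rather than hand-coding it:

job_metadata = {
    "pipeline_id": "crm_customers_daily",
    "phase": "data_ingestion",  # one of the five phases listed above
    "source": {"system": "crm", "table": "customers"},
    "target": {"s3_path": "s3://data-lake/raw/crm/customers/"},
    "load_type": "incremental",
    "watermark_column": "updated_at",
}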
Introduction: Why should I read The Definitive Guide to Embedded Analytics? The Definitive Guide to Embedded Analytics is designed to answer any and all questions you have about the topic. It is now most definitely a need-to-have, whether embedded in applications (e.g., CRM, ERP, EHR/EMR) or portals (e.g., intranets or extranets).
Knowledge graphs, while not as well-known as other data management offerings, are a proven dynamic and scalable solution for addressing enterprise data management requirements across several verticals. The RDF-star extension makes it easy to model provenance and other structured metadata (A is B; B is C; C has D; A has D).
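For flavor, a hedged rdflib sketch of attaching provenance to a single statement (names are invented; standard RDF reification is used here as a stand-in, since RDF-star support varies by tooling):

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

EX = Namespace("http://example.org/")
g = Graph()

# The statement (EX.A, EX.partOf, EX.B), plus metadata about the statement itself.
stmt = URIRef("http://example.org/stmt1")
g.add((stmt, RDF.type, RDF.Statement))
g.add((stmt, RDF.subject, EX.A))
g.add((stmt, RDF.predicate, EX.partOf))
g.add((stmt, RDF.object, EX.B))
g.add((stmt, EX.source, Literal("master data system")))  # provenance

print(g.serialize(format="turtle"))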
Integrating lineage into EMR Serverless: AppsFlyer developed a robust solution for column-level lineage collection to provide comprehensive visibility into data transformations across pipelines. Lineage data is stored in Amazon S3 and subsequently ingested into DataHub, AppsFlyer's lineage and metadata management environment.
Data Observability technology learns what to monitor and provides insights into unforeseen exceptions. However, the market for Data Observability is fragmented and lacks a standard accepted definition, leading to confusion and tool adoption issues. There are also potential risks and challenges in adopting Data Observability.