Data Architecture, IT and Metadata

Manage concurrent write conflicts in Apache Iceberg on the AWS Glue Data Catalog

AWS Big Data

APRIL 8, 2025

In modern data architectures, Apache Iceberg has emerged as a popular table format for data lakes, offering key features including ACID transactions and concurrent write support. However, commits can still fail if the latest metadata is updated after the base metadata version is established.

Snapshot

Snapshot Management Metadata Big Data

Expand data access through Apache Iceberg using Delta Lake UniForm on AWS

AWS Big Data

NOVEMBER 14, 2024

The landscape of big data management has been transformed by the rising popularity of open table formats such as Apache Iceberg, Apache Hudi, and Linux Foundation Delta Lake. These formats, designed to address the limitations of traditional data storage systems, have become essential in modern data architectures.

Metadata

Metadata Data Warehouse Big Data Data Lake

Data’s dark secret: Why poor quality cripples AI and growth

CIO Business Intelligence

APRIL 8, 2025

Data quality is no longer a back-office concern. We also examine how centralized, hybrid and decentralized data architectures support scalable, trustworthy ecosystems. Why data quality matters and its impact on business AI and analytics are transforming how businesses operate, compete and grow.

Data Quality

Data Quality Data-driven Key Performance Indicator Metadata

Webinars

Automation, Evolved: Your New Playbook For Smarter Knowledge Work

MORE WEBINARS

Very Meta … Unlocking Data’s Potential with Metadata Management Solutions

erwin

OCTOBER 24, 2019

While there has been a lot of talk about big data over the years, the real hero in unlocking the value of enterprise data is metadata , or the data about the data. They don’t know exactly what data they have or even where some of it is. Metadata Is the Heart of Data Intelligence.

Metadata

Metadata Management Data-driven Data Architecture

Run Apache XTable in AWS Lambda for background conversion of open table formats

AWS Big Data

NOVEMBER 26, 2024

This post was co-written with Dipankar Mazumdar, Staff Data Engineering Advocate with AWS Partner OneHouse. Data architecture has evolved significantly to handle growing data volumes and diverse workloads. In later pipeline stages, data is converted to Iceberg, to benefit from its read performance.

Metadata

Metadata Data Lake Snapshot Data Warehouse

What is a Data Mesh?

DataKitchen

AUGUST 3, 2021

The data mesh design pattern breaks giant, monolithic enterprise data architectures into subsystems or domains, each managed by a dedicated team. Second-generation – gigantic, complex data lake maintained by a specialized team drowning in technical debt. Introduction to Data Mesh. See the pattern?

Data Architecture

Data Architecture Data Lake Cost-Benefit Data Warehouse

How Metadata Makes Data Meaningful

erwin

DECEMBER 12, 2019

Metadata is an important part of data governance, and as a result, most nascent data governance programs are rife with project plans for assessing and documenting metadata. But in many scenarios, it seems that the underlying driver of metadata collection projects is that it’s just something you do for data governance.

Metadata

Metadata Data Governance Digital Transformation Data Quality

Cloudera and Snowflake Partner to Deliver the Most Comprehensive Open Data Lakehouse

Cloudera

OCTOBER 23, 2024

In August, we wrote about how in a future where distributed data architectures are inevitable, unifying and managing operational and business metadata is critical to successfully maximizing the value of data, analytics, and AI.

Metadata

Metadata Data Lake Dashboards Interactive

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

Their terminal operations rely heavily on seamless data flows and the management of vast volumes of data. Recently, EUROGATE has developed a digital twin for its container terminal Hamburg (CTH), generating millions of data points every second from Internet of Things (IoT)devices attached to its container handling equipment (CHE).

IoT

IoT Machine Learning Metadata Data-driven

Metadata, the Neglected Stepchild of IT

Data Virtualization

DECEMBER 8, 2022

Reading Time: 3 minutes While cleaning up our archive recently, I found an old article published in 1976 about data dictionary/directory systems (DD/DS). Nowadays, we no longer use the term DD/DS, but “data catalog” or simply “metadata system”. It was written by L.

Metadata

Metadata IT Data Integration Publishing

How to Manage Risk with Modern Data Architectures

Cloudera

JUNE 29, 2023

To improve the way they model and manage risk, institutions must modernize their data management and data governance practices. Implementing a modern data architecture makes it possible for financial institutions to break down legacy data silos, simplifying data management, governance, and integration — and driving down costs.

Data Architecture

Data Architecture Risk Management Risk Management

Making OT-IT integration a reality with new data architectures and generative AI

CIO Business Intelligence

FEBRUARY 20, 2024

Manufacturers have long held a data-driven vision for the future of their industry. It’s one where near real-time data flows seamlessly between IT and operational technology (OT) systems. Legacy data management is holding back manufacturing transformation Until now, however, this vision has remained out of reach.

Data Architecture

Data Architecture Unstructured Data Manufacturing IT

Breaking State and Local Data Silos with Modern Data Architectures

Cloudera

AUGUST 30, 2022

But while state and local governments seek to improve policies, decision making, and the services constituents rely upon, data silos create accessibility and sharing challenges that hinder public sector agencies from transforming their data into a strategic asset and leveraging it for the common good. . Modern data architectures.

Data Architecture

Data Architecture Data Lake Data Warehouse Metadata

The Data Turf Wars are Over, But the Metadata Turf Wars Have Just Begun

Cloudera

AUGUST 6, 2024

And although there is clear momentum behind the data lakehouse as the ideal architecture for multi-function analytics, the demand for open table formats including Apache Iceberg is a clear signal that data leaders value interoperability and engine freedom. It no longer matters where the data is. Open data is the future.

Metadata

Metadata Cost-Benefit Management Enterprise

The Future Is Hybrid Data, Embrace It

Cloudera

JUNE 7, 2022

Only a fraction of data created is actually stored and managed, with analysts estimating it to be between 4 – 6 ZB in 2020. Clearly, hybrid data presents a massive opportunity and a tough challenge. Capitalizing on the potential requires the ability to harness the value of all of that data, no matter where it is.

IT

IT Data Architecture Unstructured Data Big Data

How HPE Aruba Supply Chain optimized cost and performance by migrating to an AWS modern data architecture

AWS Big Data

SEPTEMBER 11, 2024

This post describes how HPE Aruba automated their Supply Chain management pipeline, and re-architected and deployed their data solution by adopting a modern data architecture on AWS. Each file arrives as a pair with a tail metadata file in CSV format containing the size and name of the file.

Data Architecture

Data Architecture Optimization Data Warehouse Metadata

The Struggle Between Data Dark Ages and LLM Accuracy

Cloudera

DECEMBER 6, 2024

The AI Forecast: Data and AI in the Cloud Era , sponsored by Cloudera, aims to take an objective look at the impact of AI on business, industry, and the world at large. AI is only as successful as the data behind it. It could be metadata that you weren’t capturing before. But what does that future look like?

Manufacturing

Manufacturing Forecasting Metadata Data Processing

5 Ways Data Modeling Is Critical to Data Governance

erwin

JANUARY 9, 2020

For decades, data modeling has been the optimal way to design and deploy new relational databases with high-quality data sources and support application development. Today’s data modeling is not your father’s data modeling software. So here’s why data modeling is so critical to data governance.

Data Governance

Data Governance Modeling Metadata Unstructured Data

The Future Is Hybrid Data, Embrace It

CIO Business Intelligence

JUNE 23, 2022

Only a fraction of data created is actually stored and managed, with analysts estimating it to be between 4 – 6 ZB in 2020. Clearly, hybrid data presents a massive opportunity and a tough challenge. Capitalizing on the potential requires the ability to harness the value of all of that data, no matter where it is.

IT

IT Data Architecture Unstructured Data Big Data

The Top Three Entangled Trends in Data Architectures: Data Mesh, Data Fabric, and Hybrid Architectures

Cloudera

SEPTEMBER 29, 2022

Each of these trends claim to be complete models for their data architectures to solve the “everything everywhere all at once” problem. Data teams are confused as to whether they should get on the bandwagon of just one of these trends or pick a combination. First, we describe how data mesh and data fabric could be related.

Data Architecture

Data Architecture Data Warehouse Metadata Sales

Metadata is the Magic Behind Data Fabric

TDAN

MAY 31, 2022

The main goal of creating an enterprise data fabric is not new. It is the ability to deliver the right data at the right time, in the right shape, and to the right data consumer, irrespective of how and where it is stored. Data fabric is the common “net” that stitches integrated data from multiple data […].

Metadata

Metadata Enterprise IT Data Architecture

What is a data architect? Skills, salaries, and how to become a data framework master

CIO Business Intelligence

OCTOBER 13, 2023

Data architecture is a complex and varied field and different organizations and industries have unique needs when it comes to their data architects. Solutions data architect: These individuals design and implement data solutions for specific business needs, including data warehouses, data marts, and data lakes.

Data Architecture

Data Architecture Data Warehouse Statistics Visualization

Data architecture strategy for data quality

IBM Big Data Hub

JANUARY 5, 2023

Several factors determine the quality of your enterprise data like accuracy, completeness, consistency, to name a few. But there’s another factor of data quality that doesn’t get the recognition it deserves: your data architecture. How the right data architecture improves data quality.

Data Architecture

Data Architecture Data Quality Strategy Data Lake

How Metadata Makes Data Meaningful

erwin

DECEMBER 12, 2019

Metadata is an important part of data governance, and as a result, most nascent data governance programs are rife with project plans for assessing and documenting metadata. But in many scenarios, it seems that the underlying driver of metadata collection projects is that it’s just something you do for data governance.

Metadata

Metadata Data Governance Digital Transformation Data Quality

Build a multi-Region and highly resilient modern data architecture using AWS Glue and AWS Lake Formation

AWS Big Data

JANUARY 24, 2023

AWS Lake Formation helps with enterprise data governance and is important for a data mesh architecture. It works with the AWS Glue Data Catalog to enforce data access and governance. This solution only replicates metadata in the Data Catalog, not the actual underlying data.

Data Architecture

Data Architecture Metadata Data Lake Snapshot

Unstructured data management and governance using AWS AI/ML and analytics services

AWS Big Data

OCTOBER 25, 2023

Why it’s challenging to process and manage unstructured data Unstructured data makes up a large proportion of the data in the enterprise that can’t be stored in a traditional relational database management systems (RDBMS). Understanding the data, categorizing it, storing it, and extracting insights from it can be challenging.

Unstructured Data

Unstructured Data Metadata Management Analytics

Introducing Apache Iceberg in Cloudera Data Platform

Cloudera

FEBRUARY 22, 2022

Over the past decade, the successful deployment of large scale data platforms at our customers has acted as a big data flywheel driving demand to bring in even more data, apply more sophisticated analytics, and on-board many new data practitioners from business analysts to data scientists. Key Design Goals .

Snapshot

Snapshot Metadata Cost-Benefit Data Architecture

Top 10 Metadata Management Influencers, Sites, and Blogs You Must Follow in 2021

Octopai

APRIL 19, 2021

Aptly named, metadata management is the process in which BI and Analytics teams manage metadata, which is the data that describes other data. In other words, data is the context and metadata is the content. Without metadata, BI teams are unable to understand the data’s full story.

Metadata

Metadata Management Business Intelligence Data Governance

What is data governance? Best practices for managing data assets

CIO Business Intelligence

MARCH 24, 2023

Data governance definition Data governance is a system for defining who within an organization has authority and control over data assets and how those data assets may be used. It encompasses the people, processes, and technologies required to manage and protect data assets.

Data Governance

Data Governance Management Metadata Data Quality

Using Strategic Data Governance to Manage GDPR/CCPA Complexity

erwin

JULY 12, 2019

In light of recent, high-profile data breaches, it’s past-time we re-examined strategic data governance and its role in managing regulatory requirements. for alleged violations of the European Union’s General Data Protection Regulation (GDPR). Complexity. Five Steps to GDPR/CCPA Compliance. Govern PII “in motion”.

Data Governance

Data Governance Management Metadata Risk Management

The Future of Data Lineage and the Role of Metadata

Alation

AUGUST 18, 2022

I’m convinced that a fundamental shift in approach to lineage is needed to drive both the value of data (and the analytics culture) to a new level of effectiveness. What Is Data Lineage Creation & Maintenance? But what data things are interconnected? Simply put, I find it fascinating! Why Focus on Lineage? Look familiar?

Metadata

Metadata Visualization Statistics Data Architecture

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

APRIL 3, 2024

They understand that a one-size-fits-all approach no longer works, and recognize the value in adopting scalable, flexible tools and open data formats to support interoperability in a modern data architecture to accelerate the delivery of new solutions.

Data Lake

Data Lake Snapshot Metadata Data Architecture

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

A modern data strategy redefines and enables sharing data across the enterprise and allows for both reading and writing of a singular instance of the data using an open table format. Why Cloudinary chose Apache Iceberg Apache Iceberg is a high-performance table format for huge analytic workloads.

Data Lake

Data Lake Metadata Snapshot Analytics

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

AWS Big Data

DECEMBER 4, 2024

While traditional extract, transform, and load (ETL) processes have long been a staple of data integration due to its flexibility, for common use cases such as replication and ingestion, they often prove time-consuming, complex, and less adaptable to the fast-changing demands of modern data architectures.

Data Integration

Data Integration Data Lake Statistics Data-driven

Data Governance 2.0: The CIO’s Guide to Collaborative Data Governance

erwin

DECEMBER 6, 2019

In the data-driven era, CIO’s need a solid understanding of data governance 2.0 … Data governance (DG) is no longer about just compliance or relegated to the confines of IT. Today, data governance needs to be a ubiquitous part of your organization’s culture. It also requires funding.

Data Governance

Data Governance Metadata Enterprise Data-driven

How Cargotec uses metadata replication to enable cross-account data sharing

AWS Big Data

JUNE 7, 2023

They chose AWS Glue as their preferred data integration tool due to its serverless nature, low maintenance, ability to control compute resources in advance, and scale when needed. To share the datasets, they needed a way to share access to the data and access to catalog metadata in the form of tables and views.

Metadata

Metadata Data Lake Machine Learning Big Data

Top analytics announcements of AWS re:Invent 2024

AWS Big Data

FEBRUARY 26, 2025

Amazon SageMaker Lakehouse provides an open data architecture that reduces data silos and unifies data across Amazon Simple Storage Service (Amazon S3) data lakes, Redshift data warehouses, and third-party and federated data sources. With AWS Glue 5.0, AWS Glue 5.0 AWS Glue 5.0 Apache Iceberg 1.6.1,

Analytics

Analytics Data Lake Metadata Data Warehouse

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

AWS Big Data

DECEMBER 4, 2024

They’re taking data they’ve historically used for analytics or business reporting and putting it to work in machine learning (ML) models and AI-powered applications. You’ll get a single unified view of all your data for your data and AI workers, regardless of where the data sits, breaking down your data siloes.

Data Analytics

Data Analytics Analytics Data Lake Data Quality

A Day in the Life of a DataOps Engineer

DataKitchen

OCTOBER 11, 2021

First, you must understand the existing challenges of the data team, including the data architecture and end-to-end toolchain. The final step is designing a data solution and its implementation. The biggest challenge is broken data pipelines due to highly manual processes. List of Challenges. Definition of Done.

Testing

Testing Metadata Dashboards Statistics

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

Over the years, data lakes on Amazon Simple Storage Service (Amazon S3) have become the default repository for enterprise data and are a common choice for a large set of users who query data for a variety of analytics and machine leaning use cases. Analytics use cases on data lakes are always evolving.

Data Lake

Data Lake Metadata Snapshot Recreation/Entertainment

Data democratization: How data architecture can drive business decisions and AI initiatives

IBM Big Data Hub

AUGUST 4, 2023

Data democratization, much like the term digital transformation five years ago, has become a popular buzzword throughout organizations, from IT departments to the C-suite. It’s often described as a way to simply increase data access, but the transition is about far more than that.

Data Architecture

Data Architecture Data Lake Machine Learning Data Governance

Top 6 Benefits of Automating End-to-End Data Lineage

erwin

SEPTEMBER 17, 2020

Replace manual and recurring tasks for fast, reliable data lineage and overall data governance. It’s paramount that organizations understand the benefits of automating end-to-end data lineage. The importance of end-to-end data lineage is widely understood and ignoring it is risky business. Doing Data Lineage Right.

Cost-Benefit

Cost-Benefit Data Governance Metadata Reporting

Modern Data Modeling: The Foundation of Enterprise Data Management and Data Governance

erwin

MAY 13, 2020

The role of data modeling (DM) has expanded to support enterprise data management, including data governance and intelligence efforts. Metadata management is the key to managing and governing your data and drawing intelligence from it. Types of Data Models: Conceptual, Logical and Physical.

Data Governance

Data Governance Enterprise Modeling Management

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Cloudera

OCTOBER 7, 2021

Within the context of a data mesh architecture, I will present industry settings / use cases where the particular architecture is relevant and highlight the business value that it delivers against business and technology areas. Introduction to the Data Mesh Architecture and its Required Capabilities.

Metadata

Metadata Cost-Benefit Enterprise Interactive

Manage concurrent write conflicts in Apache Iceberg on the AWS Glue Data Catalog

Expand data access through Apache Iceberg using Delta Lake UniForm on AWS

Webinars

Trending Sources

Data’s dark secret: Why poor quality cripples AI and growth

Webinars

Very Meta … Unlocking Data’s Potential with Metadata Management Solutions

Run Apache XTable in AWS Lambda for background conversion of open table formats

What is a Data Mesh?

How Metadata Makes Data Meaningful

Cloudera and Snowflake Partner to Deliver the Most Comprehensive Open Data Lakehouse

How EUROGATE established a data mesh architecture using Amazon DataZone

Metadata, the Neglected Stepchild of IT

How to Manage Risk with Modern Data Architectures

Making OT-IT integration a reality with new data architectures and generative AI

Breaking State and Local Data Silos with Modern Data Architectures

The Data Turf Wars are Over, But the Metadata Turf Wars Have Just Begun

The Future Is Hybrid Data, Embrace It

How HPE Aruba Supply Chain optimized cost and performance by migrating to an AWS modern data architecture

The Struggle Between Data Dark Ages and LLM Accuracy

5 Ways Data Modeling Is Critical to Data Governance

The Future Is Hybrid Data, Embrace It

The Top Three Entangled Trends in Data Architectures: Data Mesh, Data Fabric, and Hybrid Architectures

Metadata is the Magic Behind Data Fabric

What is a data architect? Skills, salaries, and how to become a data framework master

Data architecture strategy for data quality

How Metadata Makes Data Meaningful

Build a multi-Region and highly resilient modern data architecture using AWS Glue and AWS Lake Formation

Unstructured data management and governance using AWS AI/ML and analytics services

Introducing Apache Iceberg in Cloudera Data Platform

Top 10 Metadata Management Influencers, Sites, and Blogs You Must Follow in 2021

What is data governance? Best practices for managing data assets

Using Strategic Data Governance to Manage GDPR/CCPA Complexity

The Future of Data Lineage and the Role of Metadata

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

Data Governance 2.0: The CIO’s Guide to Collaborative Data Governance

How Cargotec uses metadata replication to enable cross-account data sharing

Top analytics announcements of AWS re:Invent 2024

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

A Day in the Life of a DataOps Engineer

Migrate an existing data lake to a transactional data lake using Apache Iceberg

Data democratization: How data architecture can drive business decisions and AI initiatives

Top 6 Benefits of Automating End-to-End Data Lineage

Modern Data Modeling: The Foundation of Enterprise Data Management and Data Governance

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Stay Connected