In modern data architectures, Apache Iceberg has emerged as a popular table format for data lakes, offering key features including ACID transactions and concurrent write support. However, a commit can still fail if another writer updates the table's latest metadata after the base metadata version for that commit is established.
This post was co-written with Dipankar Mazumdar, Staff Data Engineering Advocate with AWS Partner OneHouse. Data architecture has evolved significantly to handle growing data volumes and diverse workloads. This allows the existing data to be interpreted as if it were originally written in any of these formats.
In August, we wrote about how in a future where distributed data architectures are inevitable, unifying and managing operational and business metadata is critical to successfully maximizing the value of data, analytics, and AI.
The data mesh design pattern breaks giant, monolithic enterprise data architectures into subsystems or domains, each managed by a dedicated team. The communication between business units and data professionals is usually incomplete and inconsistent. Introduction to Data Mesh. Source: Thoughtworks.
Over the years, data lakes on Amazon Simple Storage Service (Amazon S3) have become the default repository for enterprise data and are a common choice for a large set of users who query data for a variety of analytics and machine learning use cases. Analytics use cases on data lakes are always evolving.
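A minimal sketch of one way to soften those conflicts: Iceberg retries a failed commit against refreshed metadata, and the retry behavior is tunable per table. The catalog and table names below ("glue_catalog", "db.orders") are assumptions for illustration, not names from the post.

```python
# Tune Iceberg's optimistic-concurrency retries so a commit whose base metadata
# version has gone stale is retried against the refreshed metadata instead of failing.
# Assumes a Spark session with the Iceberg runtime and an "glue_catalog" catalog configured.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-commit-retries").getOrCreate()

spark.sql("""
    ALTER TABLE glue_catalog.db.orders SET TBLPROPERTIES (
        'commit.retry.num-retries' = '10',   -- retry the commit up to 10 times
        'commit.retry.min-wait-ms' = '100'   -- wait at least 100 ms between attempts
    )
""")
```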
Over the past decade, the successful deployment of large-scale data platforms at our customers has acted as a big data flywheel, driving demand to bring in even more data, apply more sophisticated analytics, and onboard many new data practitioners, from business analysts to data scientists.
Those decentralization efforts appeared under different monikers through time, e.g., data marts versus data warehousing implementations (a popular architectural debate in the era of structured data), then enterprise-wide data lakes versus smaller, typically BU-specific, “data ponds”.
Amazon SageMaker Lakehouse provides an open data architecture that reduces data silos and unifies data across Amazon Simple Storage Service (Amazon S3) data lakes, Amazon Redshift data warehouses, and third-party and federated data sources. Connection testing, metadata retrieval, and data preview.
Traditionally, data was seen as information to be put on reserve, only called upon during customer interactions or executing a program. Today, the way businesses use data is much more fluid; data-literate employees use data across hundreds of apps, analyze data for better decision-making, and access data from numerous locations.
Companies can now capitalize on the value in all their data, by delivering a hybrid data platform for modern data architectures with data anywhere. Cloudera Data Platform (CDP) is designed to address the critical requirements for modern data architectures today and tomorrow.
The company is expanding its partnership with Collibra to integrate Collibra’s AI Governance platform with SAP data assets to facilitate data governance for non-SAP data assets in customer environments. “We are also seeing customers bringing in other data assets from other apps or data sources.
SAP helps to solve this search problem by offering ways to simplify business data with a solid data foundation that powers SAP Datasphere. It fits neatly with the renewed interest in data architecture, particularly data fabric architecture. They fail to get a grip on their data.
In the past, First Service Credit Union’s chief data officer Ty Robbins struggled to integrate data from the legacy, non-relational, and often proprietary tabular databases on which many credit unions run. Start early: the time to standardize everything from data modeling to its security is when the data is acquired.
Invest in maturing and improving your enterprise business metrics and metadata repositories, a multitiered data architecture, continuously improving data quality, and managing data acquisitions. enhanced customer experiences by accelerating the use of data across the organization.
Data domain producers publish data assets using data source runs to Amazon DataZone in the Central Governance account. This populates the technical metadata in the business data catalog for each data asset. Producers control what to share, for how long, and how consumers interact with it.
It seamlessly consolidates data from various data sources within AWS, including AWS Cost Explorer (and forecasting with Cost Explorer), AWS Trusted Advisor, and AWS Compute Optimizer. Data providers and consumers are the two fundamental users of a CDH dataset.
These inputs reinforced the need of a unified data strategy across the FinOps teams. We decided to build a scalable data management product that is based on the best practices of modern data architecture. Our source system and domain teams were mapped as data producers, and they would have ownership of the datasets.
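As a hedged sketch of how one of those sources might be pulled programmatically, the snippet below queries AWS Cost Explorer with boto3, including a forward-looking forecast. The dates, metric names, and region are illustrative assumptions, not values from the post.

```python
# Pull historical spend and a cost forecast from AWS Cost Explorer with boto3.
import boto3

ce = boto3.client("ce", region_name="us-east-1")

# Historical spend, grouped by service (dates are illustrative).
usage = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-01-01", "End": "2024-02-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

# Forward-looking forecast ("forecasting with Cost Explorer").
forecast = ce.get_cost_forecast(
    TimePeriod={"Start": "2024-02-01", "End": "2024-03-01"},
    Metric="UNBLENDED_COST",
    Granularity="MONTHLY",
)
print(forecast["Total"]["Amount"])
```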
Limiting growth by (data integration) complexity: most operational IT systems in an enterprise have been developed to serve a single business function, and they use the simplest possible model for this. In both cases, semantic metadata is the glue that turns knowledge graphs into hubs of data, metadata, and content.
The Analytics specialty practice of AWS Professional Services (AWS ProServe) helps customers across the globe with modern data architecture implementations on the AWS Cloud. Amazon Athena is used for interactive querying and AWS Lake Formation is used for access controls. Similarly, you will find 17 such folders in the bucket.
In recent years, the term “data lakehouse” was coined to describe this architectural pattern of tabular analytics over data in the data lake. In a rush to own this term, many vendors have lost sight of the fact that the openness of a data architecture is what guarantees its durability and longevity.
Apache Iceberg overview: Iceberg is an open-source table format that brings the power of SQL tables to big data files. It enables ACID transactions on tables, allowing for concurrent data ingestion, updates, and queries, all while using familiar SQL.
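A minimal sketch of the interactive-query piece: submitting an Athena query with boto3 and polling until it finishes. The database, table, and results bucket are hypothetical, and Lake Formation permissions on the underlying data are assumed to be in place.

```python
# Submit an Athena query and wait for it to complete.
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

run = athena.start_query_execution(
    QueryString="SELECT * FROM sales LIMIT 10",
    QueryExecutionContext={"Database": "analytics_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)

query_id = run["QueryExecutionId"]
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
```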
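Illustrative only: what "SQL tables on big data files" can look like in practice with Spark and Iceberg. The catalog, database, table, and the `updates` source view are assumptions, not artifacts from the post.

```python
# Create an Iceberg table and perform an atomic upsert with familiar SQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-sql").getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS glue_catalog.db.events (
        event_id BIGINT,
        user_id  BIGINT,
        status   STRING
    ) USING iceberg
""")

# Assumes a staged view or table named "updates" with a matching schema exists.
# Iceberg commits the MERGE atomically, so concurrent readers never see partial results.
spark.sql("""
    MERGE INTO glue_catalog.db.events AS t
    USING updates AS s
    ON t.event_id = s.event_id
    WHEN MATCHED THEN UPDATE SET t.status = s.status
    WHEN NOT MATCHED THEN INSERT *
""")
```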
In this post, we are excited to summarize the features that the AWS Glue Data Catalog, AWS Glue crawler, and Lake Formation teams delivered in 2022. Whether you are a data platform builder, data engineer, data scientist, or any technology leader interested in data lake solutions, this post is for you.
Atanas Kiryakov presenting at KGF 2023 on Where Shall an Enterprise Start Their Knowledge Graph Journey. Only data integration through semantic metadata can drive business efficiency, as “it’s the glue that turns knowledge graphs into hubs of metadata and content”.
Data mesh is an approach to data architecture that is intentionally distributed, where data is owned and governed by domain-specific teams who treat the data as a product to be consumed by other domain-specific teams. What are the principles behind data mesh architecture?
Prerequisites include appropriate AWS credentials for interacting with resources in your AWS account, Apache Maven version 3.8.4 or higher, Docker version 24.0.2 or higher, Node.js, AWS CLI 2.12.1 or higher, and AWS CDK 2.89.0. He works with enterprise FSI customers and is primarily specialized in machine learning and data architectures.
Customer 360 (C360) provides a complete and unified view of a customer’s interactions and behavior across all touchpoints and channels. This view is used to identify patterns and trends in customer behavior, which can inform data-driven decisions to improve business outcomes. Then, you transform this data into a concise format.
The RDV organizes data into three key types of tables: Hubs – This type of table represents a core business entity such as a customer. Each record in a hub table is paired with metadata that identifies the record’s creation time, originating source system, and unique business key.
Figure 1 shows the overall idea of a data mesh with the major components: What Is a Data Mesh and How Does It Work? Think of data mesh as an operational mode for organizations with a domain-driven, decentralized data architecture. What Is a Data Product and Who Owns Them?
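A minimal sketch of what a Raw Data Vault hub table for a customer entity could look like, carrying the metadata the excerpt describes (creation time, source system, business key). Table and column names are illustrative assumptions.

```python
# Define a hypothetical Data Vault hub table for the "customer" business entity.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdv-hub").getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS dv.hub_customer (
        customer_hash_key STRING,    -- surrogate key derived from the business key
        customer_id       STRING,    -- unique business key
        load_ts           TIMESTAMP, -- record creation time
        record_source     STRING     -- originating source system
    )
""")
```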
Overview of solution: as a data-driven company, smava relies on the AWS Cloud to power its analytics use cases. smava ingests data from various external and internal data sources into a landing stage on the data lake based on Amazon Simple Storage Service (Amazon S3).
In 2013 I joined American Family Insurance as a metadata analyst. I had always been fascinated by how people find, organize, and access information, so a metadata management role after school was a natural choice. The use cases for metadata are boundless, offering opportunities for innovation in every sector.
Curation enables active management of the data sets that are available in the EDM: it selects and qualifies data sets, describes each one, and manages all metadata. Cataloging exposes data sets for data shoppers, including descriptions and metadata, and provides a view into the inventory of curated data sets.
Ehtisham Zaidi, Gartner’s VP of data management, and Robert Thanaraj, Gartner’s director of data management, gave an update on the fabric versus mesh debate in light of what they call the “active metadata era” we’re currently in. The foundations of successful data governance The state of data governance was also top of mind.
How recently the data was updated. How technical metadata maps to business descriptions. Alation Connect synchronizes metadata, sample data, and query logs into the Alation Data Catalog. We decided to address these needs for SQL engines over Hadoop in Alation 4.0.
In today’s AI/ML-driven world of data analytics, explainability needs a repository just as much as those doing the explaining need access to metadata, e.g., information about the data being used. The Cloud Data Migration Challenge. A useful feature for exposing patterns in the data. Legacy data adds to the challenge.
If the asset has AWS Glue Data Quality enabled, you can now quickly visualize the data quality score directly in the catalog search pane. By selecting the corresponding asset, you can understand its content through the readme, glossary terms, and technical and business metadata.
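A hedged sketch of fetching that score programmatically: the Glue Data Quality APIs in boto3 can list evaluation results for a table and return the overall score the catalog surfaces. The database and table names are assumptions.

```python
# List Glue Data Quality evaluation results for a table and print each run's score.
import boto3

glue = boto3.client("glue", region_name="us-east-1")

results = glue.list_data_quality_results(
    Filter={"DataSource": {"GlueTable": {"DatabaseName": "analytics_db", "TableName": "sales"}}}
)

for summary in results.get("Results", []):
    detail = glue.get_data_quality_result(ResultId=summary["ResultId"])
    print(detail.get("Score"))  # overall data quality score for that evaluation run
```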
Priority 2 logs, such as operating system security logs, firewall, identity provider (IdP), email metadata, and AWS CloudTrail , are ingested into Amazon OpenSearch Service to enable the following capabilities. Develop log and trace analytics solutions with interactive queries and visualize results with high adaptability and speed.
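A minimal sketch of the "interactive queries" capability: searching those Priority 2 logs once they land in an OpenSearch Service index, using opensearch-py. The endpoint, index pattern, and field names are hypothetical, and authentication is elided.

```python
# Query recent log events from an OpenSearch Service domain.
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],
    use_ssl=True,  # auth (SigV4 or basic) omitted in this sketch
)

response = client.search(
    index="priority2-logs-*",
    body={
        "query": {
            "bool": {
                "must": [{"match": {"eventSource": "cloudtrail"}}],
                "filter": [{"range": {"@timestamp": {"gte": "now-24h"}}}],
            }
        },
        "size": 20,
    },
)
print(response["hits"]["total"])
```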
A highly available distributed processing framework meant giving up on performance in favor of resiliency (we are talking orders of magnitude performance degradation for interactive analytics and BI). Get the ebook on the benefits of a lakehouse architecture Why modernize your data lake?
While the essence of success in data governance is people and not technology, having the right tools at your fingertips is crucial. Technology is an enabler, and for data governance this is essentially having an excellent metadata management tool. Next to data governance, data architecture is really embedded in our DNA.
And, we have now moved on to getting people engaged with those two other aspects – ensuring that they understand the tech and policies, and understanding how they interact with the data – which is where Alation came in. What are the goals of your data team? Alation was the way that we introduced data stewards into Blockdaemon.
The purpose of the data fabric is to make data available wherever and whenever it is needed, abstracting away the technological complexities involved in data movement, transformation and integration, so that anyone can use the data. Some key characteristics of data fabric are: A network of data nodes.
Streaming jobs constantly ingest new data to synchronize across systems and can perform enrichment, transformations, joins, and aggregations across windows of time more efficiently. AWS Glue can interact with streaming data services such as Kinesis Data Streams and Amazon MSK for processing and transforming CDC data.
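A hedged sketch of that pattern using Spark Structured Streaming (the engine underneath AWS Glue streaming jobs): read CDC events from an MSK/Kafka topic and aggregate them over time windows. Broker addresses, topic name, and the event schema are assumptions, and the Kafka connector package must be on the classpath.

```python
# Read CDC events from Kafka/MSK and count operations per table in 5-minute windows.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("cdc-stream").getOrCreate()

schema = StructType([
    StructField("op", StringType()),          # CDC operation: insert/update/delete
    StructField("table", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "cdc-topic")
       .load())

events = (raw
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

counts = (events
          .withWatermark("event_time", "10 minutes")
          .groupBy(F.window("event_time", "5 minutes"), "table", "op")
          .count())

query = counts.writeStream.outputMode("update").format("console").start()
```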
One very influential factor that can potentially undermine your data and document strategies is the natural and emotional reactions of people when things change. Interactions between hardware and software are cautiously investigated, operating systems and network connections are carefully tested, […].