In modern data architectures, Apache Iceberg has emerged as a popular table format for data lakes, offering key features including ACID transactions and concurrent write support. However, commits can still fail if the latest metadata is updated after the base metadata version is established.
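The commit failure described above follows from Iceberg's optimistic concurrency model: a writer builds new metadata on top of the version it last read, and the commit is rejected if another writer got there first. A minimal sketch of the retry pattern, assuming hypothetical stand-ins (`load_metadata`, `try_swap_metadata`, and `CommitConflict` are illustrative, not a real Iceberg API):

```python
import random
import time

class CommitConflict(Exception):
    """Raised when the base metadata version is no longer the latest."""

def commit_with_retries(table, apply_change, max_attempts=4):
    """Retry a metadata commit under optimistic concurrency."""
    for attempt in range(max_attempts):
        base = table.load_metadata()       # read the latest metadata version
        candidate = apply_change(base)     # build new metadata on top of it
        try:
            # atomic compare-and-swap: succeeds only if `base` is still current
            table.try_swap_metadata(base, candidate)
            return candidate
        except CommitConflict:
            # another writer committed first; back off, then retry on fresh metadata
            time.sleep(random.uniform(0, 0.1 * 2 ** attempt))
    raise CommitConflict(f"gave up after {max_attempts} attempts")
```

The exponential backoff with jitter mirrors what real table-format clients do to avoid repeated collisions between concurrent writers.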
Data architecture is a complex and varied field, and different organizations and industries have unique needs when it comes to their data architects. Solutions data architect: These individuals design and implement data solutions for specific business needs, including data warehouses, data marts, and data lakes.
Amazon SageMaker Lakehouse provides an open data architecture that reduces data silos and unifies data across Amazon Simple Storage Service (Amazon S3) data lakes, Redshift data warehouses, and third-party and federated data sources, with connection testing, metadata retrieval, and data preview.
Over the years, data lakes on Amazon Simple Storage Service (Amazon S3) have become the default repository for enterprise data and are a common choice for a large set of users who query data for a variety of analytics and machine learning use cases. Analytics use cases on data lakes are always evolving.
When evolving such a partition definition, the data in the table prior to the change is unaffected, as is its metadata. Only data that is written to the table after the evolution is partitioned with the new definition, and the metadata for this new set of data is kept separately. For example, snapshots older than seven days can be expired with:
SparkActions.get().expireSnapshots(iceTable).expireOlderThan(System.currentTimeMillis() - TimeUnit.DAYS.toMillis(7)).execute()
Within the context of a data mesh architecture, I will present industry settings / use cases where the particular architecture is relevant and highlight the business value that it delivers against business and technology areas. Components of a Data Mesh. How CDF enables successful Data Mesh Architectures.
Recently, I was giving a presentation and someone asked me which segment of “the DAMA wheel” did I think semantics most affected. I said I thought it affected all of them pretty profoundly, but perhaps the Metadata wedge the most. I thought I’d spend a bit of time to reflect on the question and answer […].
SAP helps to solve this search problem by offering ways to simplify business data with a solid data foundation that powers SAP Datasphere. It fits neatly with the renewed interest in dataarchitecture, particularly data fabric architecture. They fail to get a grip on their data.
Organizations have multiple Hive data warehouses across EMR clusters, where the metadata gets generated. To address this challenge, organizations can deploy a data mesh using AWS Lake Formation that connects the multiple EMR clusters. An entity can act both as a producer of data assets and as a consumer of data assets.
The cause is hybrid data – the massive amounts of data created everywhere businesses operate – in clouds, on-prem, and at the edge. Only a fraction of data created is actually stored and managed, with analysts estimating it to be between 4 – 6 ZB in 2020. Clearly, hybrid data presents a massive opportunity and a tough challenge.
With exponential growth in data volume, centralized monitoring becomes challenging. It is also crucial to audit granular data access for security and compliance needs. This blog post presents an architecture solution that allows customers to extract key insights from Amazon S3 access logs at scale.
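Extracting insights from S3 access logs at scale starts with parsing the space-delimited log records. A minimal sketch, assuming the documented field order of Amazon S3 server access logs; the regex covers only the leading fields, and the sample record is illustrative:

```python
import re

# Matches the leading fields of an Amazon S3 server access log record:
# bucket owner, bucket, timestamp, remote IP, requester, request ID,
# operation, and object key. A production parser should handle the full
# field list, including quoted and missing fields.
LOG_PATTERN = re.compile(
    r'^(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] '
    r'(?P<ip>\S+) (?P<requester>\S+) (?P<request_id>\S+) '
    r'(?P<operation>\S+) (?P<key>\S+)'
)

def parse_access_log_line(line: str) -> dict:
    """Return the leading fields of one access log record, or {} on no match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else {}

sample = ('79a59df900b949e55d96a1e698fbaced amzn-s3-demo-bucket '
          '[06/Feb/2019:00:00:38 +0000] 192.0.2.3 '
          '79a59df900b949e55d96a1e698fbaced 3E57427F3EXAMPLE '
          'REST.GET.VERSIONING - "GET /?versioning HTTP/1.1" 200')
```

At scale, the same parsing logic would typically run in a distributed engine (for example, an Athena table over the log bucket) rather than line by line in Python.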
The event hosted presentations, discussions, and one-on-one meetings, where more than 20 partners and 1,064 registrants from 41 countries, spanning 25 industries, came together. It was presented by Sumit Pal, Strategic Technology Director at Ontotext and former Gartner VP Analyst.
At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with a consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.
Data Vault 2.0 allows for the following:
- Agile data warehouse development
- Parallel data ingestion
- A scalable approach to handle multiple data sources even on the same entity
- A high level of automation
- Historization
- Full lineage support
However, Data Vault 2.0
From establishing an enterprise-wide data inventory and improving data discoverability, to enabling decentralized data sharing and governance, Amazon DataZone has been a game changer for HEMA. HEMA has a bespoke enterprise architecture, built around the concept of services.
Limiting growth by (data integration) complexity Most operational IT systems in an enterprise have been developed to serve a single business function and they use the simplest possible model for this. In both cases, semantic metadata is the glue that turns knowledge graphs into hubs of data, metadata, and content.
The following graph illustrates these runtime improvements for the full benchmark (all TPC-DS queries) over the past year, including the additional boost from using AWS Glue Data Catalog column statistics. This can have a significant impact on overall query performance.
Cost and resource efficiency – This is an area where Acast observed a reduction in data duplication, and therefore cost reduction (in some accounts, removing duplicated copies of data entirely), by reading data across accounts while enabling scaling. Some examples of Acast’s domains are presented in the following figure.
Independent data products often only have value if you can connect them, join them, and correlate them to create a higher order data product that creates additional insights. A modern data architecture is critical in order to become a data-driven organization. We focus on the former.
It seamlessly consolidates data from various data sources within AWS, including AWS Cost Explorer (and forecasting with Cost Explorer ), AWS Trusted Advisor , and AWS Compute Optimizer. Data providers and consumers are the two fundamental users of a CDH dataset. These ingested datasets are used as a source in CLEA dashboards.
The Analytics specialty practice of AWS Professional Services (AWS ProServe) helps customers across the globe with modern data architecture implementations on the AWS Cloud. The File Manager Lambda function consumes those messages, parses the metadata, and inserts the metadata to the DynamoDB table odpf_file_tracker.
In this post, we are excited to summarize the features that the AWS Glue Data Catalog, AWS Glue crawler, and Lake Formation teams delivered in 2022. Whether you are a data platform builder, data engineer, data scientist, or any technology leader interested in data lake solutions, this post is for you.
Today, we are pleased to announce that Amazon DataZone is now able to present data quality information for data assets. If the asset has AWS Glue Data Quality enabled, you can now quickly visualize the data quality score directly in the catalog search pane.
In this post, we aim to address this issue and present how you can use Amazon API Gateway and AWS Lambda to navigate around this obstacle. He works with enterprise FSI customers and is primarily specialized in machine learning and data architectures. Daniel Wessendorf is a Global Solutions Architect at AWS based in Munich.
The consumption of the data should be supported through an elastic delivery layer that aligns with demand, but also provides the flexibility to present the data in a physical format that aligns with the analytic application, ranging from the more traditional data warehouse view to a graph view in support of relationship analysis.
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. We explore why Orca chose to build a transactional data lake and examine the key considerations that guided the selection of Apache Iceberg as the preferred table format.
Ehtisham Zaidi, Gartner’s VP of data management, and Robert Thanaraj, Gartner’s director of data management, gave an update on the fabric versus mesh debate in light of what they call the “active metadata era” we’re currently in. The foundations of successful data governance The state of data governance was also top of mind.
In 2013 I joined American Family Insurance as a metadata analyst. I had always been fascinated by how people find, organize, and access information, so a metadata management role after school was a natural choice. The use cases for metadata are boundless, offering opportunities for innovation in every sector.
In fact, Wells has identified four characteristics of digital marketplaces that should be present in any EDW. Categorization organizes the marketplace to simplify browsing (either by data asset type or topic). Through this ongoing feedback loop, the quality of the data in the marketplace undergoes continuous improvement.
Figure 1 shows the overall idea of a data mesh with the major components: What Is a Data Mesh and How Does It Work? Think of data mesh as an operational mode for organizations with a domain-driven, decentralized data architecture. What Is a Data Product and Who Owns Them?
The diversity of data types, data processing, integration and consumption patterns used by organizations has grown exponentially. Extend data governance to foster trust in your data by creating transparency, eliminating bias and ensuring explainability for data and insights fueled by machine learning and AI.
While the essence of success in data governance is people and not technology, having the right tools at your fingertips is crucial. Technology is an enabler, and for data governance this is essentially having an excellent metadata management tool. Next to data governance, data architecture is really embedded in our DNA.
In today’s data-driven world, organizations are demanding and consuming vast amounts of data — data that needs to be easily accessed, analyzed, and presented in a way that enables quick action.
Whichever metaphor you would like to use, what is certain is that no organization will survive the twenty-first century without optimizing the use of its data assets. Similarly, cybersecurity, privacy, and compliance risks increasingly present huge […].
tables = table_names.split(";")
partition_keys = partition_keys_str.split(";")
# Exit if the number of table names and partition keys differ, to ensure a
# partition key is provided for every table; if a table has no partition key,
# leave its entry empty between semicolons, e.g. T1_PK;;T3_PK
if len(tables) != len(partition_keys):
    sys.exit(0)
i = 0
while i < len(tables):
    table = tables[i]
    partition_key = partition_keys[i].split(",")
Check this out: The Foundation of an Effective Data and Analytics Operating Model — Presentation Materials. Most of D&A concerns and activities are done within EA in the Info/Data architecture domain/phases. – Data (and analytics) governance remains a challenge. Great presentation, thank you.
An example of the sort of linked data reasoning that can be employed here is that if quarantine and social distancing measures are in place for a region, then a community that’s part of this region will be subject to those same restrictions, so you don’t need to materialize everything in the graph. To Sum It Up.
Discuss, don’t present. Present your business case. To support your case, present findings from the State of Embedded Analytics study. Information Delivery The main reason software providers take on an embedded analytics project is to improve how data is presented. It is now most definitely a need-to-have.
Jumia is a technology company born in 2012, present in 14 African countries, with its main headquarters in Lagos, Nigeria. Jumia is listed on the NYSE and has a market cap of $554 million. These phases are: data orchestration, data migration, data ingestion, data processing, and data maintenance.
AI in the enterprise has become a strategic imperative for every organization, but for it to be truly effective, CIOs need to manage the data layer in a way that can support the evolutionary breakthroughs in large language models and frameworks. That's why there is a massive pivot toward AI-powered open lakehouse architectures.
Amazon SageMaker Lakehouse enables a unified, open, and secure lakehouse platform on your existing data lakes and warehouses. Its unified data architecture supports data analysis, business intelligence, machine learning, and generative AI applications, which can now take advantage of a single authoritative copy of data.
The data is stored in Apache Parquet format with AWS Glue Catalog providing metadata management. While this architecture supported NI analytical needs, it lacked the flexibility required for a truly open and adaptable data platform. This meant NI couldn't rely on Glue Catalog events to detect partition changes.
The rapid adoption has enabled them to quickly streamline operations, enhance collaboration, and gain more accessible, scalable solutions for managing their critical data and workflows. AWS Glue also supports the ability to apply complex data transformations, enabling efficient data integration and preparation to meet your needs.