Data Architecture, Metadata and Software

Manage concurrent write conflicts in Apache Iceberg on the AWS Glue Data Catalog

AWS Big Data

APRIL 8, 2025

In modern data architectures, Apache Iceberg has emerged as a popular table format for data lakes, offering key features including ACID transactions and concurrent write support. However, commits can still fail if the latest metadata is updated after the base metadata version is established.

Snapshot

Snapshot Management Metadata Big Data

Run Apache XTable in AWS Lambda for background conversion of open table formats

AWS Big Data

NOVEMBER 26, 2024

This post was co-written with Dipankar Mazumdar, Staff Data Engineering Advocate with AWS Partner OneHouse. Data architecture has evolved significantly to handle growing data volumes and diverse workloads. This allows the existing data to be interpreted as if it were originally written in any of these formats.

Metadata

Metadata Data Lake Snapshot Data Warehouse

Expand data access through Apache Iceberg using Delta Lake UniForm on AWS

AWS Big Data

NOVEMBER 14, 2024

The landscape of big data management has been transformed by the rising popularity of open table formats such as Apache Iceberg, Apache Hudi, and Linux Foundation Delta Lake. These formats, designed to address the limitations of traditional data storage systems, have become essential in modern data architectures.

Metadata

Metadata Data Warehouse Big Data Data Lake

Very Meta … Unlocking Data’s Potential with Metadata Management Solutions

erwin

OCTOBER 24, 2019

Untapped data, if mined, represents tremendous potential for your organization. While there has been a lot of talk about big data over the years, the real hero in unlocking the value of enterprise data is metadata, or the data about the data. Metadata Is the Heart of Data Intelligence.

Metadata

Metadata Management Data-driven Data Architecture

What is a Data Mesh?

DataKitchen

AUGUST 3, 2021

The data mesh design pattern breaks giant, monolithic enterprise data architectures into subsystems or domains, each managed by a dedicated team. The communication between business units and data professionals is usually incomplete and inconsistent. Introduction to Data Mesh.

Data Architecture

Data Architecture Data Lake Cost-Benefit Data Warehouse

Data’s dark secret: Why poor quality cripples AI and growth

CIO Business Intelligence

APRIL 8, 2025

We also examine how centralized, hybrid and decentralized data architectures support scalable, trustworthy ecosystems. As data-centric AI, automated metadata management and privacy-aware data sharing mature, the opportunity to embed data quality into the enterprise’s core has never been more significant.

Data Quality

Data Quality Data-driven Key Performance Indicator Metadata

5 Ways Data Modeling Is Critical to Data Governance

erwin

JANUARY 9, 2020

Today’s data modeling is not your father’s data modeling software. While it’s always been the best way to understand complex data sources and automate design standards and integrity rules, the role of data modeling continues to expand as the fulcrum of collaboration between data generators, stewards and consumers.

Data Governance

Data Governance Modeling Metadata Unstructured Data

The Data Turf Wars are Over, But the Metadata Turf Wars Have Just Begun

Cloudera

AUGUST 6, 2024

Open data is the future. And for that future to be a reality, data teams must shift their attention to metadata, the new turf war for data. The need for unified metadata: While open and distributed architectures offer many benefits, they come with their own set of challenges. A few solutions manage both.

Metadata

Metadata Cost-Benefit Management Enterprise

How HPE Aruba Supply Chain optimized cost and performance by migrating to an AWS modern data architecture

AWS Big Data

SEPTEMBER 11, 2024

Aruba offers networking hardware like access points, switches, routers, software, security devices, and Internet of Things (IoT) products. This post describes how HPE Aruba automated their Supply Chain management pipeline, and re-architected and deployed their data solution by adopting a modern data architecture on AWS.

Data Architecture

Data Architecture Optimization Data Warehouse Metadata

Making OT-IT integration a reality with new data architectures and generative AI

CIO Business Intelligence

FEBRUARY 20, 2024

“Here, industrial knowledge graphs are going to prove vital by enabling manufacturers to combine structured and unstructured data from a wide range of operational and enterprise software systems to drive better decision-making, problem-solving and more advanced automation.”

Data Architecture

Data Architecture Unstructured Data Manufacturing IT

What is a data architect? Skills, salaries, and how to become a data framework master

CIO Business Intelligence

OCTOBER 13, 2023

Data architecture is a complex and varied field and different organizations and industries have unique needs when it comes to their data architects. Solutions data architect: These individuals design and implement data solutions for specific business needs, including data warehouses, data marts, and data lakes.

Data Architecture

Data Architecture Data Warehouse Statistics Visualization

Using Strategic Data Governance to Manage GDPR/CCPA Complexity

erwin

JULY 12, 2019

Modern, strategic data governance, which involves both IT and the business, enables organizations to plan and document how they will discover and understand their data within context, track its physical existence and lineage, and maximize its security, quality and value. Five Steps to GDPR/CCPA Compliance. How erwin Can Help.

Data Governance

Data Governance Management Metadata Risk Management

What is data governance? Best practices for managing data assets

CIO Business Intelligence

MARCH 24, 2023

The program must introduce and support standardization of enterprise data. Programs must support proactive and reactive change management activities for reference data values and the structure/use of master data and metadata. Your data governance program needs to continually break down new siloes.

Data Governance

Data Governance Management Metadata Data Quality

Introducing Apache Iceberg in Cloudera Data Platform

Cloudera

FEBRUARY 22, 2022

Over the past decade, the successful deployment of large scale data platforms at our customers has acted as a big data flywheel driving demand to bring in even more data, apply more sophisticated analytics, and on-board many new data practitioners from business analysts to data scientists.

Snapshot

Snapshot Metadata Cost-Benefit Data Architecture

Unstructured data management and governance using AWS AI/ML and analytics services

AWS Big Data

OCTOBER 25, 2023

After decades of digitizing everything in your enterprise, you may have an enormous amount of data, but with dormant value. However, with the help of AI and machine learning (ML), new software tools are now available to unearth the value of unstructured data. The solution integrates data in three tiers.

Unstructured Data

Unstructured Data Metadata Management Analytics

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

APRIL 3, 2024

They understand that a one-size-fits-all approach no longer works, and recognize the value in adopting scalable, flexible tools and open data formats to support interoperability in a modern data architecture to accelerate the delivery of new solutions.

Data Lake

Data Lake Snapshot Metadata Data Architecture

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

AWS Big Data

DECEMBER 4, 2024

Collaborate and build faster using familiar AWS tools for model development, generative AI, data processing, and SQL analytics with Amazon Q Developer, the most capable generative AI assistant for software development, helping you along the way. Having confidence in your data is key.

Data Analytics

Data Analytics Analytics Data Lake Data Quality

Accelerate SQL code migration from Google BigQuery to Amazon Redshift using BladeBridge

AWS Big Data

NOVEMBER 7, 2024

BladeBridge offers a comprehensive suite of tools that automate much of the complex conversion work, allowing organizations to quickly and reliably transition their data analytics capabilities to the scalable Amazon Redshift data warehouse. Amazon Redshift is a fully managed data warehouse service offered by Amazon Web Services (AWS).

Data Warehouse

Data Warehouse Reporting Big Data Data Lake

How Cargotec uses metadata replication to enable cross-account data sharing

AWS Big Data

JUNE 7, 2023

They chose AWS Glue as their preferred data integration tool due to its serverless nature, low maintenance, ability to control compute resources in advance, and scale when needed. To share the datasets, they needed a way to share access to the data and access to catalog metadata in the form of tables and views.

Metadata

Metadata Data Lake Machine Learning Big Data

Modern Data Modeling: The Foundation of Enterprise Data Management and Data Governance

erwin

MAY 13, 2020

The role of data modeling (DM) has expanded to support enterprise data management, including data governance and intelligence efforts. Metadata management is the key to managing and governing your data and drawing intelligence from it. Types of Data Models: Conceptual, Logical and Physical.

Data Governance

Data Governance Enterprise Modeling Management

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

When evolving such a partition definition, the data in the table prior to the change is unaffected, as is its metadata. Only data that is written to the table after the evolution is partitioned with the new definition, and the metadata for this new set of data is kept separately. SparkActions.get().expireSnapshots(iceTable).expireOlderThan(System.currentTimeMillis() - TimeUnit.DAYS.toMillis(7)).execute()

Data Lake

Data Lake Metadata Snapshot Analytics

Top 6 Benefits of Automating End-to-End Data Lineage

erwin

SEPTEMBER 17, 2020

Data automation reduces the loss of time in collecting, processing and storing large chunks of data because it replaces manual processes (and human errors) with intelligent processes, software and artificial intelligence (AI). Automating data capture frees up resources to focus on more strategic and useful tasks.

Cost-Benefit

Cost-Benefit Data Governance Metadata Reporting

Lay the groundwork now for advanced analytics and AI

CIO Business Intelligence

AUGUST 3, 2023

In response, Lenovo launched a new line of entry-level gaming laptops and desktops it now brands as Lenovo LOQ that caters to a new gamer’s first foray into gaming, says Girish Hoogar, global head of engineering for Lenovo’s cloud and software business in its Intelligent Devices Group.

Analytics

Analytics Data Lake Metadata Cost-Benefit

SAP enhances Datasphere and SAC for AI-driven transformation

CIO Business Intelligence

MARCH 6, 2024

“SAP is executing on a roadmap that brings an important semantic layer to enterprise data, and creates the critical foundation for implementing AI-based use cases,” said analyst Robert Parker, SVP of industry, software, and services research at IDC. In the SuccessFactors application, Joule will behave like an HR assistant.

Unstructured Data

Unstructured Data Dashboards Business Intelligence Data Governance

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

Cloudera

JUNE 30, 2022

We are excited to announce the general availability of Apache Iceberg in Cloudera Data Platform (CDP). Iceberg is a 100% open table format, developed through the Apache Software Foundation, and helps users avoid vendor lock-in. Why integrate Apache Iceberg with Cloudera Data Platform? This is a huge accelerator to adoption.

Data Lake

Data Lake Data Warehouse Data Architecture Metadata

How Amazon Finance Automation built a data mesh to support distributed data ownership and centralize governance

AWS Big Data

JULY 14, 2023

These inputs reinforced the need for a unified data strategy across the FinOps teams. We decided to build a scalable data management product that is based on the best practices of modern data architecture. Our source system and domain teams were mapped as data producers, and they would have ownership of the datasets.

Finance

Finance Metadata Big Data Recreation/Entertainment

Dive deep into security management: The Data on EKS Platform

AWS Big Data

APRIL 29, 2024

The construction of big data applications based on open source software has become increasingly uncomplicated since the advent of projects like Data on EKS, an open source project from AWS to provide blueprints for building data and machine learning (ML) applications on Amazon Elastic Kubernetes Service (Amazon EKS).

Management

Management Big Data Data Warehouse Metadata

Building a Beautiful Data Lakehouse

CIO Business Intelligence

MARCH 9, 2022

They conveniently store data in a flat architecture that can be queried in aggregate and offer the speed and lower cost required for big data analytics. On the other hand, they don’t support transactions or enforce data quality. If only there were a best-of-both-worlds compromise. Just starting out with analytics?

Data Lake

Data Lake Unstructured Data Data Warehouse Big Data

Enterprise Data Management — Driving Large-Scale Change in Your Organization

Sisense

JULY 6, 2020

First off, this involves defining workflows for every business process within the enterprise: the what, how, why, who, when, and where aspects of data. These regulations, ultimately, ensure key business values: data consistency, quality, and trustworthiness. There he oversaw product strategy, planning, design, and delivery.

Enterprise

Enterprise Management Data Architecture Data-driven

Embedding AI Into Every Aspect of Your Business

Cloudera

JULY 20, 2021

Once companies are able to leverage their data, they’re then able to fuel machine learning and analytics models, transforming their business by embedding AI into every aspect of their business. Build your data strategy around the convergence of software and hardware.

Manufacturing

Manufacturing Forecasting IoT Insurance

How Swisscom automated Amazon Redshift as part of their One Data Platform solution using AWS CDK – Part 1

AWS Big Data

JUNE 12, 2024

One Data Platform: The ODP architecture is based on the AWS Well-Architected Framework Analytics Lens and follows the pattern of having raw, standardized, conformed, and enriched layers as described in Modern data architecture. Samuel Bucheli is a Lead Cloud Architect at Zühlke Engineering AG.

Data Architecture

Data Architecture Cost-Benefit Data-driven Experimentation

HEMA accelerates their data governance journey with Amazon DataZone

AWS Big Data

DECEMBER 19, 2024

HEMA built its first ecommerce system on AWS in 2018 and 5 years later, its developers have the freedom to innovate and build software fast with their choice of tools in the AWS Cloud. HEMA has a bespoke enterprise architecture, built around the concept of services. Oghosa Omorisiagbon is a Senior Data Engineer at HEMA.

Data Governance

Data Governance Publishing Data-driven Metadata

Boosting Object Storage Performance with Ozone Manager

Cloudera

JULY 19, 2023

Introduction: Ozone is an Apache Software Foundation project to build a distributed storage platform that caters to the demanding performance needs of analytical workloads, content distribution, and object storage use cases. The tool reads only the metadata for objects in a cluster with around 100 million keys.

Management

Management Metadata Metrics Optimization

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

MARCH 7, 2024

At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.

Data Lake

Data Lake Analytics Dashboards Metrics

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

AWS Big Data

OCTOBER 1, 2024

However, even the most powerful systems can experience performance degradation if they encounter anti-patterns like grossly inaccurate table statistics, such as the row count metadata. Software Development Engineer with Amazon. This can have a significant impact on overall query performance.

Data Lake

Data Lake Statistics Broadcasting Optimization

Choosing an open table format for your transactional data lake on AWS

AWS Big Data

JUNE 9, 2023

A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.

Data Lake

Data Lake Metadata Statistics Optimization

Empowering data mesh: The tools to deliver BI excellence

erwin

APRIL 16, 2024

The data mesh framework: In the dynamic landscape of data management, the search for agility, scalability, and efficiency has led organizations to explore new, innovative approaches. One such innovation gaining traction is the data mesh framework. This empowers individual teams to own and manage their data.

Metadata

Metadata Data Quality Data Governance Modeling

Usability and Connecting Threads: How Data Fabric Makes Sense Out of Disparate Data

Ontotext

AUGUST 4, 2023

A data fabric utilizes an integrated data layer over existing, discoverable, and inferenced metadata assets to support the design, deployment, and utilization of data across enterprises, including hybrid and multi-cloud platforms. It also helps capture and connect data based on business or domains.

Metadata

Metadata Data-driven Data Architecture Data Quality

If Johnny Mnemonic Smuggled Linked Data

Ontotext

MAY 30, 2019

The lack of structure and the presence of too many siloed (often meaning duplicate) data entries, which make data expand endlessly, can be avoided if these data are properly interlinked and given explicit machine-interpretable metadata for easier and automatic search and retrieval. Linked Data and Information Retrieval.

Cost-Benefit

Cost-Benefit Big Data Technology Metadata

KGF 2023: Bikes To The Moon, Datastrophies, Abstract Art And A Knowledge Graph Forum To Embrace Them All

Ontotext

DECEMBER 1, 2023

Most famous for inventing the first wiki and one of the pioneers of software design patterns and Extreme Programming, he is no stranger to it. Most organisations are missing this ability to connect all the data together. “Complexity is empowering”, argues Howard G. Cunningham.

Metadata

Metadata Sales Machine Learning Consulting

Design a data mesh on AWS that reflects the envisioned organization

AWS Big Data

JANUARY 22, 2024

Cost and resource efficiency – This is an area where Acast observed a reduction in data duplication, and therefore cost reduction (in some accounts, removing the copy of data 100%), by reading data across accounts while enabling scaling. In this approach, teams responsible for generating data are referred to as producers.

Data-driven

Data-driven Advertising Metadata Data Architecture

Build streaming data pipelines with Amazon MSK Serverless and IAM authentication

AWS Big Data

SEPTEMBER 6, 2023

The following software installed on your development machine, or use an AWS Cloud9 environment, which comes with all requirements preinstalled: Java Development Kit 17 or higher (for example, Amazon Corretto 17, OpenJDK 17); Python version 3.11. If you haven’t signed up, complete the following steps: Create an account. Create an IAM user.

Testing

Testing Metadata Cost-Benefit Internet of Things

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

Overview of solution: As a data-driven company, smava relies on the AWS Cloud to power their analytics use cases. smava ingests data from various external and internal data sources into a landing stage on the data lake based on Amazon Simple Storage Service (Amazon S3).

Data Lake

Data Lake Data Warehouse Data-driven B2B

How Huron built an Amazon QuickSight Asset Catalogue with AWS CDK Based Deployment Pipeline

AWS Big Data

APRIL 26, 2023

Having an accurate and up-to-date inventory of all technical assets helps an organization ensure it can keep track of all its resources with metadata information such as their assigned owners, last updated date, used by whom, how frequently and more.

Metadata

Metadata Dashboards Visualization Consulting
