Data Architecture, Data Integration and Metadata

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

AWS Big Data

DECEMBER 4, 2024

While traditional extract, transform, and load (ETL) processes have long been a staple of data integration due to their flexibility, for common use cases such as replication and ingestion, they often prove time-consuming, complex, and less adaptable to the fast-changing demands of modern data architectures.

Data Integration

Data Integration Data Lake Statistics Data-driven

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

Need for a data mesh architecture: Because entities in the EUROGATE group generate vast amounts of data from various sources (across departments, locations, and technologies), the traditional centralized data architecture struggles to keep up with the demands for real-time insights, agility, and scalability.

IoT

IoT Machine Learning Metadata Data-driven

How Metadata Makes Data Meaningful

erwin

DECEMBER 12, 2019

Metadata is an important part of data governance, and as a result, most nascent data governance programs are rife with project plans for assessing and documenting metadata. But in many scenarios, it seems that the underlying driver of metadata collection projects is that it’s just something you do for data governance.

Metadata

Metadata Data Governance Digital Transformation Data Quality

How HPE Aruba Supply Chain optimized cost and performance by migrating to an AWS modern data architecture

AWS Big Data

SEPTEMBER 11, 2024

This post describes how HPE Aruba automated their Supply Chain management pipeline, and re-architected and deployed their data solution by adopting a modern data architecture on AWS. Each file arrives as a pair with a tail metadata file in CSV format containing the size and name of the file.

Data Architecture

Data Architecture Optimization Data Warehouse Metadata

5 Ways Data Modeling Is Critical to Data Governance

erwin

JANUARY 9, 2020

That’s because it’s the best way to visualize metadata, and metadata is now the heart of enterprise data management and data governance/intelligence efforts. So here’s why data modeling is so critical to data governance. erwin Data Modeler: Where the Magic Happens.

Data Governance

Data Governance Modeling Metadata Unstructured Data

Metadata, the Neglected Stepchild of IT

Data Virtualization

DECEMBER 8, 2022

While cleaning up our archive recently, I found an old article published in 1976 about data dictionary/directory systems (DD/DS). Nowadays, we no longer use the term DD/DS, but “data catalog” or simply “metadata system”. It was written by L.

Metadata

Metadata IT Data Integration Publishing

IBM named a leader in the 2022 Gartner® Magic Quadrant™ for Data Integration Tools

IBM Big Data Hub

AUGUST 24, 2022

The only question is, how do you ensure effective ways of breaking down data silos and bringing data together for self-service access? It starts by modernizing your data integration capabilities – ensuring disparate data sources and cloud environments can come together to deliver data in real time and fuel AI initiatives.

Data Integration

Data Integration Metadata Data-driven Data Architecture

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

APRIL 3, 2024

They understand that a one-size-fits-all approach no longer works, and recognize the value in adopting scalable, flexible tools and open data formats to support interoperability in a modern data architecture to accelerate the delivery of new solutions.

Data Lake

Data Lake Snapshot Metadata Data Architecture

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

AWS Big Data

DECEMBER 4, 2024

With this launch, you can query data regardless of where it is stored with support for a wide range of use cases, including analytics, ad-hoc querying, data science, machine learning, and generative AI. We’ve simplified data architectures, saving you time and costs on unnecessary data movement, data duplication, and custom solutions.

Data Analytics

Data Analytics Analytics Data Lake Data Quality

Data integrity vs. data quality: Is there a difference?

IBM Big Data Hub

JULY 13, 2023

When we talk about data integrity, we’re referring to the overarching completeness, accuracy, consistency, accessibility, and security of an organization’s data. Together, these factors determine the reliability of the organization’s data. In short, yes.

Data Quality

Data Quality Data Integration Metadata Cost-Benefit

Data architecture strategy for data quality

IBM Big Data Hub

JANUARY 5, 2023

Several factors determine the quality of your enterprise data, such as accuracy, completeness, and consistency, to name a few. But there’s another factor of data quality that doesn’t get the recognition it deserves: your data architecture. How the right data architecture improves data quality.

Data Architecture

Data Architecture Data Quality Strategy Data Lake

What is data governance? Best practices for managing data assets

CIO Business Intelligence

MARCH 24, 2023

The program must introduce and support standardization of enterprise data. Programs must support proactive and reactive change management activities for reference data values and the structure/use of master data and metadata.

Data Governance

Data Governance Management Metadata Data Quality

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

Over the years, data lakes on Amazon Simple Storage Service (Amazon S3) have become the default repository for enterprise data and are a common choice for a large set of users who query data for a variety of analytics and machine learning use cases. Analytics use cases on data lakes are always evolving.

Data Lake

Data Lake Metadata Snapshot Recreation/Entertainment

How Cargotec uses metadata replication to enable cross-account data sharing

AWS Big Data

JUNE 7, 2023

For this, Cargotec built an Amazon Simple Storage Service (Amazon S3) data lake and cataloged the data assets in AWS Glue Data Catalog. They chose AWS Glue as their preferred data integration tool due to its serverless nature, low maintenance, ability to control compute resources in advance, and scale when needed.

Metadata

Metadata Data Lake Machine Learning Big Data

SAP Datasphere review: turning data from a technical problem to a business data product.

Jen Stirrup

MARCH 29, 2023

However, to turn data into a business problem, organizations need support to move away from technical issues to start getting value as quickly as possible. SAP Datasphere simplifies data integration, cataloging, semantic modeling, warehousing, federation, and virtualization through a unified interface. Why is this interesting?

Data Warehouse

Data Warehouse Metadata Data Integration Business Intelligence

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

When evolving such a partition definition, the data in the table prior to the change is unaffected, as is its metadata. Only data that is written to the table after the evolution is partitioned with the new definition, and the metadata for this new set of data is kept separately. SparkActions.get().expireSnapshots(iceTable).expireOlderThan(System.currentTimeMillis() - TimeUnit.DAYS.toMillis(7)).execute()

Data Lake

Data Lake Metadata Snapshot Analytics

Modern Data Modeling: The Foundation of Enterprise Data Management and Data Governance

erwin

MAY 13, 2020

The role of data modeling (DM) has expanded to support enterprise data management, including data governance and intelligence efforts. Metadata management is the key to managing and governing your data and drawing intelligence from it. Types of Data Models: Conceptual, Logical and Physical.

Data Governance

Data Governance Enterprise Modeling Management

Top analytics announcements of AWS re:Invent 2024

AWS Big Data

FEBRUARY 26, 2025

Amazon SageMaker Lakehouse provides an open data architecture that reduces data silos and unifies data across Amazon Simple Storage Service (Amazon S3) data lakes, Redshift data warehouses, and third-party and federated data sources.

Analytics

Analytics Data Lake Metadata Data Warehouse

Extracting key insights from Amazon S3 access logs with AWS Glue for Ray

AWS Big Data

SEPTEMBER 7, 2023

This blog post presents an architecture solution that allows customers to extract key insights from Amazon S3 access logs at scale. We will partition and format the server access logs with Amazon Web Services (AWS) Glue, a serverless data integration service, to generate a catalog for access logs and create dashboards for insights.

Metadata

Metadata Dashboards Metrics Visualization

Augmented data management: Data fabric versus data mesh

IBM Big Data Hub

APRIL 27, 2022

Data fabric and data mesh are emerging data management concepts that are meant to address the organizational change and complexities of understanding, governing and working with enterprise data in a hybrid multicloud ecosystem. The good news is that both data architecture concepts are complementary.

Management

Management Metadata Data Architecture Data Lake

5 Reasons to Use Apache Iceberg on Cloudera Data Platform (CDP)

Cloudera

MARCH 23, 2022

In fact, we recently announced the integration with our cloud ecosystem bringing the benefits of Iceberg to enterprises as they make their journey to the public cloud, and as they adopt more converged architectures like the Lakehouse. 1: Multi-function analytics. 3: Open Performance.

Metadata

Metadata Data Architecture Machine Learning Cost-Benefit

SAP enhances Datasphere and SAC for AI-driven transformation

CIO Business Intelligence

MARCH 6, 2024

“SAP is executing on a roadmap that brings an important semantic layer to enterprise data, and creates the critical foundation for implementing AI-based use cases,” said analyst Robert Parker, SVP of industry, software, and services research at IDC. We are also seeing customers bringing in other data assets from other apps or data sources.

Unstructured Data

Unstructured Data Dashboards Business Intelligence Data Governance

How to stay ahead of ever-evolving data privacy regulations

IBM Big Data Hub

SEPTEMBER 12, 2022

The journey starts with having a multimodal data governance framework that is underpinned by a robust data architecture like data fabric. To understand how a data fabric helps maintain compliance with privacy regulations, it’s helpful to look at some essential elements of that single pane of glass.

Metadata

Metadata Data Governance Enterprise Data Architecture

You Cannot Get to the Moon on a Bike!

Ontotext

JANUARY 10, 2024

And each of these gains requires data integration across business lines and divisions. Limiting growth by (data integration) complexity: Most operational IT systems in an enterprise have been developed to serve a single business function and they use the simplest possible model for this. We call this the Bad Data Tax.

Metadata

Metadata Slice and Dice Data Integration Enterprise

Dive deep into security management: The Data on EKS Platform

AWS Big Data

APRIL 29, 2024

Addressing big data challenges – Big data comes with unique challenges, like managing large volumes of rapidly evolving data across multiple platforms. Effective permission management helps tackle these challenges by controlling how data is accessed and used, providing data integrity and minimizing the risk of data breaches.

Management

Management Big Data Data Warehouse Metadata

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

AWS Big Data

JUNE 10, 2024

Organizations have multiple Hive data warehouses across EMR clusters, where the metadata gets generated. To address this challenge, organizations can deploy a data mesh using AWS Lake Formation that connects the multiple EMR clusters. An entity can act both as a producer of data assets and as a consumer of data assets.

Data Lake

Data Lake Metadata Data Warehouse Data Processing

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

AWS Big Data

AUGUST 31, 2023

Iceberg stores the metadata pointer for all the metadata files. When a SELECT query is reading an Iceberg table, the query engine first goes to the Iceberg catalog, then retrieves the entry of the location of the latest metadata file, as shown in the following diagram.

Data Lake

Data Lake Data Warehouse Metadata Data Architecture

KGF 2023: Bikes To The Moon, Datastrophies, Abstract Art And A Knowledge Graph Forum To Embrace Them All

Ontotext

DECEMBER 1, 2023

So, KGF 2023 proved to be a breath of fresh air for anyone interested in topics like data mesh and data fabric, knowledge graphs, text analysis, large language model (LLM) integrations, retrieval augmented generation (RAG), chatbots, semantic data integration, and ontology building.

Metadata

Metadata Sales Machine Learning Consulting

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

Maximize value with comprehensive analytics and ML capabilities: “Amazon Redshift is one of the most important tools we had in growing Jobcase as a company.” – Ajay Joshi, Distinguished Engineer, Jobcase. With all your data integrated and available, you can easily build and run near real-time analytics to AI/ML/Generative AI applications.

Data Warehouse

Data Warehouse Analytics Data Lake Machine Learning

How Knowledge Graphs Power Data Mesh and Data Fabric

Ontotext

APRIL 10, 2024

In most enterprises, data teams lack a data map and data asset inventory and are often unaware of data that exists across the organization, its associated profile, quality and associated metadata. Teams can’t access data to build their business use cases. For example, a product data tag is basic metadata.

Metadata

Metadata Data Lake Data Warehouse Data Quality

Knowledge Graphs vs. Property Graphs – Part 1

TDAN

AUGUST 18, 2020

Flexibility is one strong driver: heterogeneous data, integrating new data sources, and analytics all require flexibility. We are in the era of graphs. Graphs are hot. Graphs deliver it in spades. Over the last few years, a number of new graph databases came to market. As we start the next decade, dare we say […].

Data Integration

Data Integration Marketing Analytics Data Architecture

My Reflections on the Gartner® Hype Cycle™ for Data Management, 2024

Data Virtualization

DECEMBER 20, 2024

Gartner Hype Cycle methodology provides a view of how.

Management

Management Data Integration Technology Data Architecture

Data Governance in a Data Mesh or Data Fabric Architecture

Data Virtualization

DECEMBER 21, 2023

Data mesh is a modern, distributed data architecture in which different domain-based data products are owned by different groups within an organization. And data fabric is a self-service data layer that is supported in an orchestrated fashion to serve.

Data Governance

Data Governance Data Architecture Data Integration Management

If Johnny Mnemonic Smuggled Linked Data

Ontotext

MAY 30, 2019

The lack of structure and the presence of too many siloed (often meaning duplicate) data entries, which make data expand endlessly, can be avoided if these data are properly interlinked and given explicit machine-interpretable metadata for easier and automatic search and retrieval. Linked Data and Information Retrieval.

Cost-Benefit

Cost-Benefit Big Data Technology Metadata

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

Data ingestion: You have to build ingestion pipelines based on factors like types of data sources (on-premises data stores, files, SaaS applications, third-party data), and flow of data (unbounded streams or batch data). Then, you transform this data into a concise format.

Data Strategy

Data Strategy Strategy Data Warehouse Prescriptive Analytics

What Is a Data Fabric and How Does a Data Catalog Support It?

Alation

JANUARY 25, 2022

As a reminder, here’s Gartner’s definition of data fabric: “A design concept that serves as an integrated layer (fabric) of data and connecting processes. In this blog, we will focus on the “integrated layer” part of this definition by examining each of the key layers of a comprehensive data fabric in more detail.

Metadata

Metadata IT Data-driven Metrics

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. Metadata tables offer insights into the physical data storage layout of the tables and offer the convenience of querying them with Athena version 3.

Data Lake

Data Lake Analytics Snapshot Data Quality

Querying Minds Want to Know: Can a Data Fabric and RAG Clean up LLMs? – Part 4 : Intelligent Autonomous Agents

Data Virtualization

AUGUST 23, 2024

In previous posts, I spoke.

Data Integration

Data Integration Modeling Management Data Architecture

Data Mesh vs Data Fabric: Understanding the Key Differences

Data Virtualization

JANUARY 17, 2023

In recent years, there has been a growing interest in data architecture. One of the key considerations is how best to handle data, and this is where data mesh and data fabric come into play. But what are the key.

Data Architecture

Data Architecture Data Integration Management Metadata

Building a Semantic Capability Stack to Support FAIR Knowledge Graphs at Scale

Ontotext

FEBRUARY 7, 2024

However, what we usually don’t talk about when generating an asset are the huge invisible or unplanned costs occurring at a later stage when the data needs to be made available for analysis or secondary usage. As a result, a big portion of the IT capacity in Pharma is bound by data integration.

Metadata

Metadata Data Integration Measurement Data-driven

Choosing A Graph Data Model to Best Serve Your Use Case

Ontotext

MARCH 27, 2024

For example, GPS, social media, cell phone handoffs are modeled as graphs while data catalogs, data lineage and MDM tools leverage knowledge graphs for linking metadata with semantics. This verbosity allows schema, metadata, and instance data to be in one place, enabling accessibility and manageability.

Modeling

Modeling Metadata Data Quality Enterprise

Four starting points to transform your organization into a data-driven enterprise

IBM Big Data Hub

JANUARY 17, 2023

IBM Cloud Pak for Data Express solutions offer clients a simple on-ramp to start realizing the business value of a modern architecture. Data governance. The data governance capability of a data fabric focuses on the collection, management and automation of an organization’s data. Data integration.

Data-driven

Data-driven Enterprise Data Governance Data Science
