Need for a data mesh architecture: Because entities in the EUROGATE group generate vast amounts of data from various sources (across departments, locations, and technologies), the traditional centralized data architecture struggles to keep up with the demands for real-time insights, agility, and scalability.
This post was co-written with Dipankar Mazumdar, Staff Data Engineering Advocate with AWS Partner OneHouse. Data architecture has evolved significantly to handle growing data volumes and diverse workloads. In practice, OTFs are used in a broad range of analytical workloads, from business intelligence to machine learning.
We also examine how centralized, hybrid, and decentralized data architectures support scalable, trustworthy ecosystems. As data-centric AI, automated metadata management, and privacy-aware data sharing mature, the opportunity to embed data quality into the enterprise's core has never been more significant.
The data mesh design pattern breaks giant, monolithic enterprise data architectures into subsystems or domains, each managed by a dedicated team. Second generation: a gigantic, complex data lake maintained by a specialized team drowning in technical debt. Introduction to Data Mesh. See the pattern?
They use data better. Using machine learning and AI, Spotify creates value for their users by providing a more personalized experience. How does Spotify win against a competitor like Apple?
This post describes how HPE Aruba automated their Supply Chain management pipeline, and re-architected and deployed their data solution by adopting a modern data architecture on AWS. Each file arrives as a pair with a tail metadata file in CSV format containing the size and name of the file.
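As a rough illustration of that pairing pattern, here is a minimal Python sketch that validates an arriving data file against its tail metadata CSV before promoting it into a pipeline. The file names and column names are hypothetical, not taken from the HPE Aruba post.

```python
import csv
import os

def validate_file_pair(data_path: str, tail_path: str) -> bool:
    """Check an arriving data file against its tail metadata CSV.

    Assumed layout: the tail file holds one row with the expected
    file name and size in bytes (columns: file_name, file_size).
    """
    with open(tail_path, newline="") as f:
        row = next(csv.DictReader(f))
    name_ok = os.path.basename(data_path) == row["file_name"]
    size_ok = os.path.getsize(data_path) == int(row["file_size"])
    return name_ok and size_ok

# Usage: only ingest the data file when the pair agrees.
if validate_file_pair("orders_20240101.dat", "orders_20240101.tail.csv"):
    print("pair validated, safe to ingest")
```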
To improve the way they model and manage risk, institutions must modernize their data management and data governance practices. Implementing a modern dataarchitecture makes it possible for financial institutions to break down legacy data silos, simplifying data management, governance, and integration — and driving down costs.
Our customers are telling us that their analytics and AI workloads increasingly converge around a lot of the same data, and this is changing how they use analytics tools with their data. They aren't using analytics and AI tools in isolation. Having confidence in your data is key.
To eliminate or integrate these silos, the public sector needs to adopt robust data management solutions that support modern data architectures (MDAs). Lack of sharing hinders the elimination of fraud, waste, and abuse.
While traditional extract, transform, and load (ETL) processes have long been a staple of data integration due to their flexibility, for common use cases such as replication and ingestion, they often prove time-consuming, complex, and less adaptable to the fast-changing demands of modern data architectures.
They understand that a one-size-fits-all approach no longer works, and recognize the value in adopting scalable, flexible tools and open data formats to support interoperability in a modern data architecture to accelerate the delivery of new solutions.
Data architecture is a complex and varied field, and different organizations and industries have unique needs when it comes to their data architects. Solutions data architect: These individuals design and implement data solutions for specific business needs, including data warehouses, data marts, and data lakes.
By some estimates, unstructured data makes up 80–90% of all new enterprise data and is growing many times faster than structured data. After decades of digitizing everything in your enterprise, you may have an enormous amount of data, but with dormant value. The solution integrates data in three tiers.
Over the past decade, the successful deployment of large-scale data platforms at our customers has acted as a big data flywheel, driving demand to bring in even more data, apply more sophisticated analytics, and onboard many new data practitioners, from business analysts to data scientists.
Aptly named, metadata management is the process by which BI and analytics teams manage metadata, which is the data that describes other data. In other words, data is the content and metadata is the context. Without metadata, BI teams are unable to understand the data’s full story.
This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue.
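A minimal PySpark sketch of that load step, assuming an Iceberg-enabled Spark session such as an AWS Glue job provides: the JDBC connection details, catalog name ("glue_catalog"), and table names below are placeholders, and a real Glue job would also configure the Iceberg catalog in its job settings.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sqlserver-to-iceberg").getOrCreate()

# Read the legacy SQL Server table over JDBC.
source_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://legacy-host:1433;databaseName=sales")
    .option("dbtable", "dbo.orders")
    .option("user", "etl_user")
    .option("password", "***")  # fetch from AWS Secrets Manager in practice
    .load()
)

# Append into the Iceberg table; Iceberg commits are transactional,
# so readers never observe a partial load.
source_df.writeTo("glue_catalog.lakehouse.orders").append()
```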
Cargotec captures terabytes of IoT telemetry data from their machinery operated by numerous customers across the globe. This data needs to be ingested into a data lake, transformed, and made available for analytics, machine learning (ML), and visualization.
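One common ingestion path for telemetry like this is an Amazon Data Firehose delivery stream that buffers records and lands them in S3; the sketch below assumes that pattern, and the stream name and record shape are illustrative, not details from the Cargotec post.

```python
import json
import boto3

firehose = boto3.client("firehose")

# A single telemetry reading; Firehose buffers these and writes
# batches to the configured S3 data lake destination.
record = {"machine_id": "crane-042", "ts": "2024-01-01T00:00:00Z", "load_kg": 18500}

firehose.put_record(
    DeliveryStreamName="iot-telemetry-to-lake",  # hypothetical stream
    Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
)
```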
This amalgamation empowers vendors with authority over a diverse range of workloads by virtue of owning the data. This authority extends across realms such as business intelligence, data engineering, and machine learning, thus limiting the tools and capabilities that can be used. Here is where it can get complicated.
Today, the way businesses use data is much more fluid; data literate employees use data across hundreds of apps, analyze data for better decision-making, and access data from numerous locations. It uses knowledge graphs, semantics and AI/ML technology to discover patterns in various types of metadata.
The program must introduce and support standardization of enterprise data. Programs must support proactive and reactive change management activities for reference data values and the structure/use of master data and metadata.
In fact, we recently announced the integration with our cloud ecosystem, bringing the benefits of Iceberg to enterprises as they make their journey to the public cloud and adopt more converged architectures like the Lakehouse.
AWS announces the next generation of Amazon SageMaker, a unified platform for data, analytics, and AI. S3 Metadata is designed to automatically capture metadata from objects as they are uploaded into a bucket, and to make that metadata queryable in a read-only table.
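Since that metadata lands in a queryable table, one way to explore it is through Athena. The sketch below assumes that setup; the database, table, and column names are illustrative placeholders, so check the S3 Metadata documentation for the actual schema of your metadata table.

```python
import boto3

athena = boto3.client("athena")

# Find the ten largest recently captured objects under a prefix.
resp = athena.start_query_execution(
    QueryString="""
        SELECT key, size
        FROM "s3_metadata_db"."bucket_journal"
        WHERE key LIKE 'raw/%'
        ORDER BY size DESC
        LIMIT 10
    """,
    ResultConfiguration={"OutputLocation": "s3://query-results-bucket/athena/"},
)
print(resp["QueryExecutionId"])
```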
We deliver cloud-native data analytics across the full data lifecycle – data distribution, data engineering, data warehousing, transactional data, streaming data, data science, and machine learning – that’s portable across infrastructures.
Today’s general availability announcement covers Iceberg running within key data services in the Cloudera Data Platform (CDP) — including Cloudera Data Warehousing (CDW), Cloudera Data Engineering (CDE), and Cloudera Machine Learning (CML). Why integrate Apache Iceberg with Cloudera Data Platform?
Amazon Redshift powers data-driven decisions for tens of thousands of customers every day with a fully managed, AI-powered cloud data warehouse, delivering the best price-performance for your analytics workloads. Learn more about the AWS zero-ETL future with newly launched AWS database integrations with Amazon Redshift.
Your Guide to Machine Learning Data Lineage for BI: From Source to Target. Too often, data lineage is considered a linear course that does little more than reveal the route data took to arrive at its final destination. It required banks to develop a data architecture that could support risk-management tools.
Since then, customer demands for better scale, higher throughput, and agility in handling a wide variety of changing but increasingly business-critical analytics and machine learning use cases have exploded, and we have been keeping pace.
Organizations have multiple Hive data warehouses across EMR clusters, where the metadata gets generated. To address this challenge, organizations can deploy a data mesh using AWS Lake Formation that connects the multiple EMR clusters. An entity can act both as a producer of data assets and as a consumer of data assets.
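The core sharing step in such a Lake Formation data mesh is a cross-account grant on a cataloged table. Here is a minimal boto3 sketch of that step, under the assumption of a producer/consumer account pair; the account IDs and names are placeholders.

```python
import boto3

lf = boto3.client("lakeformation")

# Producer account grants the consumer account SELECT on one table
# registered in its Glue Data Catalog.
lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "222222222222"},  # consumer
    Resource={
        "Table": {
            "CatalogId": "111111111111",   # producer's catalog
            "DatabaseName": "sales_domain",
            "Name": "orders",
        }
    },
    Permissions=["SELECT"],
    PermissionsWithGrantOption=[],
)
```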
Data fabric and data mesh are emerging data management concepts that are meant to address the organizational change and complexities of understanding, governing, and working with enterprise data in a hybrid multicloud ecosystem. The good news is that both data architecture concepts are complementary.
A well-designed dataarchitecture should support business intelligence and analysis, automation, and AI—all of which can help organizations to quickly seize market opportunities, build customer value, drive major efficiencies, and respond to risks such as supply chain disruptions.
They conveniently store data in a flat architecture that can be queried in aggregate and offer the speed and lower cost required for big data analytics. On the other hand, they don’t support transactions or enforce data quality. If only there were a best-of-both-worlds compromise.
It seamlessly consolidates data from various data sources within AWS, including AWS Cost Explorer (and forecasting with Cost Explorer), AWS Trusted Advisor, and AWS Compute Optimizer. They can use their own toolsets or rely on provided blueprints to ingest the data from source systems.
Data domain producers publish data assets using data source runs to Amazon DataZone in the Central Governance account. This populates the technical metadata in the business data catalog for each data asset. Data ownership remains with the producer.
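A hedged sketch of triggering one such data source run with boto3, so the producer's technical metadata is harvested into the catalog; both identifiers are placeholders, and the exact parameter names should be confirmed against the boto3 DataZone documentation.

```python
import boto3

datazone = boto3.client("datazone")

# Kick off a run of an existing DataZone data source; the run crawls
# the underlying assets and publishes their technical metadata.
run = datazone.start_data_source_run(
    domainIdentifier="dzd_exampledomainid",        # hypothetical domain ID
    dataSourceIdentifier="ds_exampledatasourceid",  # hypothetical data source ID
)
print(run["status"])
```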
Thousands of our customers across all industries are harnessing the power of their data in order to drive insights and innovation. Once companies are able to leverage their data, they’re then able to fuel machine learning and analytics models, transforming their business by embedding AI into every aspect of it.
The construction of big data applications based on open source software has become increasingly straightforward since the advent of projects like Data on EKS, an open source project from AWS that provides blueprints for building data and machine learning (ML) applications on Amazon Elastic Kubernetes Service (Amazon EKS).
Amazon DataZone provides rich functionality to help a data platform team distribute ownership of tasks so that these teams can choose to operate less like gatekeepers. In Amazon DataZone, data owners can publish their data and its business catalog (metadata) to ATPCO’s DataZone domain. For Publishing settings, select No.
In this post, we are excited to summarize the features that the AWS Glue Data Catalog, AWS Glue crawler, and Lake Formation teams delivered in 2022. Whether you are a data platform builder, data engineer, data scientist, or any technology leader interested in data lake solutions, this post is for you.
However, when a data producer shares data products on a data mesh self-serve web portal, it’s neither intuitive nor easy for a data consumer to know which data products they can join to create new insights. This is especially true in a large enterprise with thousands of data products.
In this post, we discuss how the Amazon Finance Automation team used AWS Lake Formation and the AWS Glue Data Catalog to build a data mesh architecture that simplified data governance at scale and provided seamless data access for analytics, AI, and machine learning (ML) use cases.
These lakes power mission-critical, large-scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. In recent years, the term “data lakehouse” was coined to describe this architectural pattern of tabular analytics over data in the data lake.
Limiting growth by (data integration) complexity: Most operational IT systems in an enterprise have been developed to serve a single business function, and they use the simplest possible model for this. No matter what machine learning or graph algorithms are used, they cannot uncover dependencies if the corresponding “signals” are missing.
However, even the most powerful systems can experience performance degradation if they encounter anti-patterns like grossly inaccurate table statistics, such as the row count metadata. This can have a significant impact on overall query performance.
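The usual remedy for that anti-pattern is to recompute the statistics so the cost-based optimizer sees accurate row counts. Syntax varies by engine; the sketch below shows the Spark SQL form with a placeholder table name.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("refresh-stats").getOrCreate()

# Recompute table-level statistics (including row count) so the
# optimizer stops planning against stale or wildly wrong metadata.
spark.sql("ANALYZE TABLE lakehouse.sales.orders COMPUTE STATISTICS")

# Column-level statistics further improve selectivity estimates,
# which drive join ordering and strategy choices.
spark.sql(
    "ANALYZE TABLE lakehouse.sales.orders "
    "COMPUTE STATISTICS FOR COLUMNS order_date, customer_id"
)
```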
At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with a consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.