Data Analytics, Data Architecture and Metadata

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

AWS Big Data

DECEMBER 4, 2024

This week on the keynote stages at AWS re:Invent 2024, you heard Matt Garman, CEO, AWS, and Swami Sivasubramanian, VP of AI and Data, AWS, speak about the next generation of Amazon SageMaker, the center for all of your data, analytics, and AI. The relationship between analytics and AI is rapidly evolving.

Data Analytics, Analytics, Data Lake, Data Quality

What is a Data Mesh?

DataKitchen

AUGUST 3, 2021

The data mesh design pattern breaks giant, monolithic enterprise data architectures into subsystems or domains, each managed by a dedicated team. Third-generation – more or less like the previous generation but with streaming data, cloud, machine learning and other (fill-in-the-blank) fancy tools. See the pattern?

Data Architecture, Data Lake, Cost-Benefit, Data Warehouse

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

Need for a data mesh architecture: Because entities in the EUROGATE group generate vast amounts of data from various sources (across departments, locations, and technologies), the traditional centralized data architecture struggles to keep up with the demands for real-time insights, agility, and scalability.

IoT, Machine Learning, Metadata, Data-driven

Cloudera and Snowflake Partner to Deliver the Most Comprehensive Open Data Lakehouse

Cloudera

OCTOBER 23, 2024

In August, we wrote about how in a future where distributed data architectures are inevitable, unifying and managing operational and business metadata is critical to successfully maximizing the value of data, analytics, and AI.

Metadata, Data Lake, Dashboards, Interactive

How HPE Aruba Supply Chain optimized cost and performance by migrating to an AWS modern data architecture

AWS Big Data

SEPTEMBER 11, 2024

This post describes how HPE Aruba automated their Supply Chain management pipeline, and re-architected and deployed their data solution by adopting a modern data architecture on AWS. Each file arrives as a pair with a tail metadata file in CSV format containing the size and name of the file.
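The excerpt above mentions that each data file arrives with a companion metadata file in CSV format that records the file's name and size. A minimal Python sketch of how such a pair might be checked before ingestion follows. The column names `file_name` and `file_size`, the sample file names, and the helper functions are assumptions made for illustration; the original post does not specify them.

```python
import csv
import os
import tempfile


def matches_tail_metadata(data_path, meta_path):
    """Return True when the data file's name and size match its metadata row."""
    # Assumed layout: a header row, then one row describing the data file.
    with open(meta_path, newline="") as handle:
        row = next(csv.DictReader(handle))
    expected_name = row["file_name"]        # hypothetical column name
    expected_size = int(row["file_size"])   # hypothetical column name
    actual_name = os.path.basename(data_path)
    actual_size = os.path.getsize(data_path)
    return actual_name == expected_name and actual_size == expected_size


def write_pair(workdir, name, payload, declared_size):
    """Write a data file and its companion metadata CSV; return both paths."""
    data_path = os.path.join(workdir, name)
    meta_path = os.path.join(workdir, name + ".meta.csv")
    with open(data_path, "wb") as handle:
        handle.write(payload)
    with open(meta_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file_name", "file_size"])
        writer.writerow([name, declared_size])
    return data_path, meta_path


def demo():
    """Build one consistent pair and one truncated pair, then check both."""
    with tempfile.TemporaryDirectory() as workdir:
        good = write_pair(workdir, "orders.dat", b"x" * 128, 128)
        bad = write_pair(workdir, "returns.dat", b"x" * 100, 128)
        return matches_tail_metadata(*good), matches_tail_metadata(*bad)


if __name__ == "__main__":
    print(demo())
```

A check like this lets a pipeline reject truncated or misnamed uploads before any transformation runs.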

Data Architecture, Optimization, Data Warehouse, Metadata

Build a multi-Region and highly resilient modern data architecture using AWS Glue and AWS Lake Formation

AWS Big Data

JANUARY 24, 2023

This solution only replicates metadata in the Data Catalog, not the actual underlying data. To have a redundant data lake using Lake Formation and AWS Glue in an additional Region, we recommend replicating the Amazon S3-based storage using S3 replication, S3 sync, aws-s3-copy-sync-using-batch, or an S3 Batch Replication process.

Data Architecture, Metadata, Data Lake, Snapshot

Data architecture strategy for data quality

IBM Big Data Hub

JANUARY 5, 2023

Several factors determine the quality of your enterprise data: accuracy, completeness, and consistency, to name a few. But there’s another factor of data quality that doesn’t get the recognition it deserves: your data architecture. How the right data architecture improves data quality.

Data Architecture, Data Quality, Strategy, Data Lake

Unstructured data management and governance using AWS AI/ML and analytics services

AWS Big Data

OCTOBER 25, 2023

But most important of all, the assumed dormant value in the unstructured data is a question mark, which can only be answered after these sophisticated techniques have been applied. Therefore, there is a need to analyze and extract value from the data economically and flexibly. The solution integrates data in three tiers.

Unstructured Data, Metadata, Management, Analytics

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

AWS Big Data

DECEMBER 4, 2024

While traditional extract, transform, and load (ETL) processes have long been a staple of data integration due to their flexibility, for common use cases such as replication and ingestion, they often prove time-consuming, complex, and less adaptable to the fast-changing demands of modern data architectures.

Data Integration, Data Lake, Statistics, Data-driven

Accelerate SQL code migration from Google BigQuery to Amazon Redshift using BladeBridge

AWS Big Data

NOVEMBER 7, 2024

BladeBridge offers a comprehensive suite of tools that automate much of the complex conversion work, allowing organizations to quickly and reliably transition their data analytics capabilities to the scalable Amazon Redshift data warehouse.

Data Warehouse, Reporting, Big Data, Data Lake

Top analytics announcements of AWS re:Invent 2024

AWS Big Data

FEBRUARY 26, 2025

In this post, we walk you through the top analytics announcements from re:Invent 2024 and explore how these innovations can help you unlock the full potential of your data. S3 Metadata is designed to automatically capture metadata from objects as they are uploaded into a bucket, and to make that metadata queryable in a read-only table.

Analytics, Data Lake, Metadata, Data Warehouse

The Future Is Hybrid Data, Embrace It

Cloudera

JUNE 7, 2022

Want to manage and analyze data of all types including machine, structured, transactional, and unstructured – anywhere? Only Cloudera has the power to span multi-cloud and on-premises with a hybrid data platform. Common security, governance, metadata, replication, and automation enable CDP to operate as an integrated system.

IT, Data Architecture, Unstructured Data, Big Data

What is data governance? Best practices for managing data assets

CIO Business Intelligence

MARCH 24, 2023

The program must introduce and support standardization of enterprise data. Programs must support proactive and reactive change management activities for reference data values and the structure/use of master data and metadata.

Data Governance, Management, Metadata, Data Quality

How Cargotec uses metadata replication to enable cross-account data sharing

AWS Big Data

JUNE 7, 2023

They chose AWS Glue as their preferred data integration tool due to its serverless nature, low maintenance, ability to control compute resources in advance, and scale when needed. To share the datasets, they needed a way to share access to the data and access to catalog metadata in the form of tables and views.

Metadata, Data Lake, Machine Learning, Big Data

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

In this blog post, we dive into different data aspects and how Cloudinary addresses the two concerns of vendor lock-in and cost-efficient data analytics by using Apache Iceberg, Amazon Simple Storage Service (Amazon S3), Amazon Athena, Amazon EMR, and AWS Glue. This concept makes Iceberg extremely versatile. SparkActions.get().expireSnapshots(iceTable).expireOlderThan(System.currentTimeMillis() - TimeUnit.DAYS.toMillis(7)).execute()
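The snapshot-expiration call quoted above passes a time value to expireOlderThan. As the Apache Iceberg API documentation describes it, that argument is an absolute cutoff timestamp in epoch milliseconds, so a seven-day retention window corresponds to the current time minus seven days. The Python sketch below illustrates only that arithmetic. It does not call Iceberg, the function names are hypothetical, and it ignores other retention rules Iceberg applies, such as always keeping the current snapshot.

```python
import time

# Same value that TimeUnit.DAYS.toMillis(7) yields in Java.
SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000


def expiry_cutoff_ms(now_ms, retention_ms=SEVEN_DAYS_MS):
    """Return the absolute epoch-millisecond cutoff for a retention window."""
    return now_ms - retention_ms


def snapshots_to_expire(snapshot_timestamps_ms, now_ms):
    """Return the snapshot timestamps that are strictly older than the cutoff."""
    cutoff = expiry_cutoff_ms(now_ms)
    return [ts for ts in snapshot_timestamps_ms if ts < cutoff]


if __name__ == "__main__":
    now_ms = int(time.time() * 1000)
    # Hypothetical snapshots taken 10 days, 8 days, and 1 day ago.
    day_ms = 24 * 60 * 60 * 1000
    snapshots = [now_ms - 10 * day_ms, now_ms - 8 * day_ms, now_ms - day_ms]
    print(len(snapshots_to_expire(snapshots, now_ms)), "snapshots would expire")
```

Passing the bare seven-day duration as the cutoff would name a moment in January 1970, so no realistic snapshot would qualify for expiration.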

Data Lake, Metadata, Snapshot, Analytics

A Day in the Life of a DataOps Engineer

DataKitchen

OCTOBER 11, 2021

First, you must understand the existing challenges of the data team, including the data architecture and end-to-end toolchain. The biggest challenge is broken data pipelines due to highly manual processes. Figure 1 shows a manually executed data analytics pipeline. Monitoring Job Metadata.

Testing, Metadata, Dashboards, Statistics

The Future Is Hybrid Data, Embrace It

CIO Business Intelligence

JUNE 23, 2022

Want to manage and analyze data of all types including machine, structured, transactional, and unstructured – anywhere? Only Cloudera has the power to span multi-cloud and on-premises with a hybrid data platform. Common security, governance, metadata, replication, and automation enable CDP to operate as an integrated system.

IT, Data Architecture, Unstructured Data, Big Data

Surviving Radical Disruption with Data Intelligence

erwin

OCTOBER 16, 2020

And this time sensitivity is a massive issue, as taking a proactive and data-driven approach can literally mean life or death to your business or to your customers. And that’s where data analytics can play a huge role. There’s a common denominator in what they’re all missing, and that is data intelligence.

Internet of Things, Data-driven, Uncertainty, Data Governance

Building a Beautiful Data Lakehouse

CIO Business Intelligence

MARCH 9, 2022

Applying artificial intelligence (AI) to data analytics for deeper, better insights and automation is a growing enterprise IT priority. But the data repository options that have been around for a while tend to fall short in their ability to serve as the foundation for big data analytics powered by AI.

Data Lake, Unstructured Data, Data Warehouse, Big Data

Extracting key insights from Amazon S3 access logs with AWS Glue for Ray

AWS Big Data

SEPTEMBER 7, 2023

AWS Glue Data Catalog stores information as metadata tables, where each table specifies a single data store. The AWS Glue crawler writes metadata to the Data Catalog by classifying the data to determine the format, schema, and associated properties of the data.

Metadata, Dashboards, Metrics, Visualization

Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

AWS Big Data

OCTOBER 30, 2024

This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue.

Data Lake, Data Processing, Optimization, Machine Learning

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

AWS Big Data

JUNE 10, 2024

Organizations have multiple Hive data warehouses across EMR clusters, where the metadata gets generated. To address this challenge, organizations can deploy a data mesh using AWS Lake Formation that connects the multiple EMR clusters. An entity can act both as a producer of data assets and as a consumer of data assets.

Data Lake, Metadata, Data Warehouse, Data Processing

Augmented data management: Data fabric versus data mesh

IBM Big Data Hub

APRIL 27, 2022

Data fabric and data mesh are emerging data management concepts that are meant to address the organizational change and complexities of understanding, governing and working with enterprise data in a hybrid multicloud ecosystem. The good news is that both data architecture concepts are complementary.

Management, Metadata, Data Architecture, Data Lake

Choosing an open table format for your transactional data lake on AWS

AWS Big Data

JUNE 9, 2023

A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.

Data Lake, Metadata, Statistics, Optimization

Insights from Gartner Data & Analytics Summit Orlando 2023

Alation

MARCH 31, 2023

Ehtisham Zaidi, Gartner’s VP of data management, and Robert Thanaraj, Gartner’s director of data management, gave an update on the fabric versus mesh debate in light of what they call the “active metadata era” we’re currently in. The foundations of successful data governance: The state of data governance was also top of mind.

Data Analytics, Analytics, Metadata, Data Governance

How Swisscom automated Amazon Redshift as part of their One Data Platform solution using AWS CDK – Part 1

AWS Big Data

JUNE 12, 2024

Swisscom’s Data, Analytics, and AI division is building a One Data Platform (ODP) solution that will enable every Swisscom employee, process, and product to benefit from the massive value of Swisscom’s data. The following high-level architecture diagram shows ODP with different layers of the modern data architecture.

Data Architecture, Cost-Benefit, Data-driven, Experimentation

AI at Scale isn’t Magic, it’s Data – Hybrid Data

Cloudera

OCTOBER 11, 2022

AI needs machine learning (ML), ML needs data science. Data science needs analytics. And they all need lots of data. Different data types need different types of analytics – real-time, streaming, operational, data warehouses. And that data is likely in clouds, in data centers and at the edge.

Data Science, Snapshot, Data Warehouse, Metadata

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

AWS Big Data

OCTOBER 1, 2024

However, even the most powerful systems can experience performance degradation if they encounter anti-patterns like grossly inaccurate table statistics, such as the row count metadata.

Data Lake, Statistics, Broadcasting, Optimization

Upgrade Journey: The Path from CDH to CDP Private Cloud

Cloudera

SEPTEMBER 28, 2020

The customer leverages Cloudera’s multi-function analytics stack in CDP. The data lifecycle model ingests data using Kafka, enriches that data with a Spark-based batch process, performs deep data analytics using Hive and Impala, and finally uses that data for data science using Cloudera Data Science Workbench to get deep insights.

Testing, Metadata, Risk, Data Science

Habib Bank manages data at scale with Cloudera Data Platform

Cloudera

NOVEMBER 17, 2022

While Cloudera CDH was already a success story at HBL, in 2022, HBL identified the need to move its customer data centre environment from Cloudera’s CDH to Cloudera Data Platform (CDP) Private Cloud to accommodate growing volumes of data. Smooth, hassle-free deployment in just six weeks.

Management, Data Lake, Consulting, Unstructured Data

The Future of the Data Lakehouse – Open

CIO Business Intelligence

JUNE 23, 2022

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission-critical, large-scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. On data warehouses and data lakes.

Data Lake, Data Warehouse, Machine Learning, Data-driven

Design a data mesh on AWS that reflects the envisioned organization

AWS Big Data

JANUARY 22, 2024

Cost and resource efficiency – This is an area where Acast observed a reduction in data duplication, and therefore cost reduction (in some accounts, eliminating duplicate copies of data entirely), by reading data across accounts while enabling scaling. In this approach, teams responsible for generating data are referred to as producers.

Data-driven, Advertising, Metadata, Data Architecture

The Future of the Data Lakehouse – Open

Cloudera

JUNE 18, 2022

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission-critical, large-scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. On data warehouses and data lakes.

Data Lake, Data Warehouse, Machine Learning, Data-driven

Amazon DataZone now integrates with AWS Glue Data Quality and external data quality solutions

AWS Big Data

APRIL 3, 2024

If the asset has AWS Glue Data Quality enabled, you can now quickly visualize the data quality score directly in the catalog search pane. By selecting the corresponding asset, you can understand its content through the readme, glossary terms, and technical and business metadata.

Data Quality, Visualization, Metadata, Metrics

Why We Started the Data Intelligence Project

Alation

JULY 7, 2022

In 2013 I joined American Family Insurance as a metadata analyst. I had always been fascinated by how people find, organize, and access information, so a metadata management role after school was a natural choice. The use cases for metadata are boundless, offering opportunities for innovation in every sector.

Metadata, Data-driven, Insurance, Statistics

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. Metadata tables offer insights into the physical data storage layout of the tables and offer the convenience of querying them with Athena version 3.

Data Lake, Analytics, Snapshot, Data Quality

Build SAML identity federation for Amazon OpenSearch Service domains within a VPC

AWS Big Data

FEBRUARY 7, 2024

Download the IAM Identity Center SAML metadata file to use in a later step. Choose Import from XML file and import the IAM Identity Center SAML metadata file that you downloaded in an earlier step. Luca Menichetti is a Big Data Architect with Amazon Web Services. Take note of the group ID. Create a new custom SAML 2.0

Dashboards, Data Processing, Metadata, Consulting

Architectural patterns for real-time analytics using Amazon Kinesis Data Streams, part 1

AWS Big Data

JANUARY 8, 2024

This is the first post in a blog series that offers common architectural patterns for building real-time data streaming infrastructures using Kinesis Data Streams for a wide range of use cases. Refer to Amazon Kinesis Data Streams integrations for additional details.

Analytics, IoT, Data-driven, Snapshot

How Huron built an Amazon QuickSight Asset Catalogue with AWS CDK Based Deployment Pipeline

AWS Big Data

APRIL 26, 2023

Having an accurate and up-to-date inventory of all technical assets helps an organization keep track of all its resources with metadata such as their assigned owners, last updated date, who uses them, how frequently, and more. This is a guest blog post co-written with Corey Johnson from Huron.

Metadata, Dashboards, Visualization, Consulting

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

Profile aggregation – When you’ve uniquely identified a customer, you can build applications in Managed Service for Apache Flink to consolidate all their metadata, from name to interaction history. Then, you transform this data into a concise format.

Data Strategy, Strategy, Data Warehouse, Prescriptive Analytics

The Cloud Connection: How Governance Supports Security

Alation

APRIL 14, 2022

And, as organizations progress and grow, “data drift” starts to impact data usage, models, and your business. In today’s AI/ML-driven world of data analytics, explainability needs a repository just as much as those doing the explaining need access to metadata, that is, information about the data being used.

Metadata, Data Governance, Data-driven, Modeling

2023 Predictions: Data Trends That Will Dominate Business Agenda in APAC

Cloudera

JANUARY 5, 2023

We have been working with APAC organizations to operationalize data analytics and AI solutions to unlock data-driven decision-making and operational efficiency, with them quickly seeing distinct business benefits. These features provide businesses with a common metadata, security, and governance model across all their data.

Cost-Benefit, Business Objectives, Machine Learning, Data Architecture

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

AWS Big Data

MARCH 3, 2023

You can then apply transformations and store data in Delta format for managing inserts, updates, and deletes. Amazon EMR Serverless is a serverless option in Amazon EMR that makes it easy for data analysts and engineers to run open-source big data analytics frameworks without configuring, managing, and scaling clusters or servers.

Data Lake, Dashboards, Metrics, Metadata

Exploring real-time streaming for generative AI Applications

AWS Big Data

MARCH 25, 2024

Streaming jobs constantly ingest new data to synchronize across systems and can perform enrichment, transformations, joins, and aggregations across windows of time more efficiently. Data streaming enables you to ingest data from a variety of databases across various systems.

Data Lake, Unstructured Data, Management, Snapshot
