Big Data, Data Lake and Metadata

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

AWS Big Data

JULY 29, 2024

In the era of big data, data lakes have emerged as a cornerstone for storing vast amounts of raw data in its native format. They support structured, semi-structured, and unstructured data, offering a flexible and scalable environment for data ingestion from multiple sources.

Metadata

Metadata Snapshot Data Lake Metrics

Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

AWS Big Data

OCTOBER 30, 2024

This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue.

Migrate an existing data lake to a transactional data lake using Apache Iceberg

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

Migrate Delta tables from Azure Data Lake Storage to Amazon S3 using AWS Glue

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

Seamless integration of data lake and data warehouse using Amazon Redshift Spectrum and Amazon DataZone

Multicloud data lake analytics with Amazon Athena

Use Apache Iceberg in a data lake to support incremental data processing

Choosing an open table format for your transactional data lake on AWS

How Volkswagen streamlined access to data across multiple data lakes using Amazon DataZone – Part 1

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

Understanding the Differences Between Data Lakes and Data Warehouses

Enrich your serverless data lake with Amazon Bedrock

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

How Cargotec uses metadata replication to enable cross-account data sharing

Build a real-time GDPR-aligned Apache Iceberg data lake

Introducing MongoDB Atlas metadata collection with AWS Glue crawlers

Efficiently crawl your data lake and improve data access with an AWS Glue crawler using partition indexes

Data Lakes on Cloud & it’s Usage in Healthcare

Use AWS Glue ETL to perform merge, partition evolution, and schema evolution on Apache Iceberg

Data Cataloging in the Data Lake: Alation + Kylo

Ingest telemetry messages in near real time with Amazon API Gateway, Amazon Data Firehose, and Amazon Location Service

Expand data access through Apache Iceberg using Delta Lake UniForm on AWS

How Ruparupa gained updated insights with an Amazon S3 data lake, AWS Glue, Apache Hudi, and Amazon QuickSight

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

Data Lakes: What Are They and Who Needs Them?

Introducing Apache Hudi support with AWS Glue crawlers

Use AWS Glue Data Catalog views to analyze data

The AWS Glue Data Catalog now supports storage optimization of Apache Iceberg tables

Where Do Data Catalogs Fit in Metadata Management?

What is an open data lakehouse and why you should care?

Building a Beautiful Data Lakehouse

Introducing AWS Glue crawler and create table support for Apache Iceberg format

Unstructured data management and governance using AWS AI/ML and analytics services

Apache Iceberg optimization: Solving the small files problem in Amazon EMR

How BMW streamlined data access using AWS Lake Formation fine-grained access control

Query AWS Glue Data Catalog views using Amazon Athena and Amazon Redshift
