Data Lake, Metadata and Optimization

Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

AWS Big Data

OCTOBER 30, 2024

This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake ( Apache Iceberg ) using AWS Glue.

Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

Webinars

Trending Sources

Recap of Amazon Redshift key product announcements in 2024

Webinars

Build a high-performance quant research platform with Apache Iceberg

Manage concurrent write conflicts in Apache Iceberg on the AWS Glue Data Catalog

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

Use Apache Iceberg in a data lake to support incremental data processing

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

Choosing an open table format for your transactional data lake on AWS

Use open table format libraries on AWS Glue 5.0 for Apache Spark

How EUROGATE established a data mesh architecture using Amazon DataZone

Multicloud data lake analytics with Amazon Athena

Write queries faster with Amazon Q generative SQL for Amazon Redshift

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

The AWS Glue Data Catalog now supports storage optimization of Apache Iceberg tables

Apache Iceberg optimization: Solving the small files problem in Amazon EMR

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

Seamless integration of data lake and data warehouse using Amazon Redshift Spectrum and Amazon DataZone

Migrate Delta tables from Azure Data Lake Storage to Amazon S3 using AWS Glue

Enrich your serverless data lake with Amazon Bedrock

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

Top analytics announcements of AWS re:Invent 2024

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

Speed up queries with the cost-based optimizer in Amazon Athena

Amazon SageMaker Lakehouse now supports attribute-based access control

AWS Lake Formation 2023 year in review

Introducing MongoDB Atlas metadata collection with AWS Glue crawlers

Efficiently crawl your data lake and improve data access with an AWS Glue crawler using partition indexes

Empower your Jira data in a data lake with Amazon AppFlow and AWS Glue

Data Lakes: What Are They and Who Needs Them?

Demystify data sharing and collaboration patterns on AWS: Choosing the right tool for the job

Introducing Apache Hudi support with AWS Glue crawlers

Empower financial analytics by creating structured knowledge bases using Amazon Bedrock and Amazon Redshift

Data’s dark secret: Why poor quality cripples AI and growth

Use Amazon Athena with Spark SQL for your open-source transactional table formats

How to modernize data lakes with a data lakehouse architecture

Building a Beautiful Data Lakehouse

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

Doing Cloud Migration and Data Governance Right the First Time

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

Stay Connected