Big Data, Data Lake and Metadata

Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

AWS Big Data

OCTOBER 30, 2024

This is part two of a three-part series that shows how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue.

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

Run Apache XTable in AWS Lambda for background conversion of open table formats

Manage concurrent write conflicts in Apache Iceberg on the AWS Glue Data Catalog

Expand data access through Apache Iceberg using Delta Lake UniForm on AWS

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

Understanding the Differences Between Data Lakes and Data Warehouses

Migrate an existing data lake to a transactional data lake using Apache Iceberg

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

How BMW streamlined data access using AWS Lake Formation fine-grained access control

Use Apache Iceberg in a data lake to support incremental data processing

Recap of Amazon Redshift key product announcements in 2024

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

Choosing an open table format for your transactional data lake on AWS

Build a high-performance quant research platform with Apache Iceberg

Use open table format libraries on AWS Glue 5.0 for Apache Spark

Multicloud data lake analytics with Amazon Athena

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

How EUROGATE established a data mesh architecture using Amazon DataZone

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

Seamless integration of data lake and data warehouse using Amazon Redshift Spectrum and Amazon DataZone

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

How Volkswagen streamlined access to data across multiple data lakes using Amazon DataZone – Part 1

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

Migrate Delta tables from Azure Data Lake Storage to Amazon S3 using AWS Glue

Data Lakes on Cloud & it’s Usage in Healthcare

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

Enrich your serverless data lake with Amazon Bedrock

Write queries faster with Amazon Q generative SQL for Amazon Redshift

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

Amazon SageMaker Lakehouse now supports attribute-based access control

Build a real-time GDPR-aligned Apache Iceberg data lake

Introducing MongoDB Atlas metadata collection with AWS Glue crawlers

Build a data lake with Apache Flink on Amazon EMR

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

Efficiently crawl your data lake and improve data access with an AWS Glue crawler using partition indexes

Data Lakes: What Are They and Who Needs Them?

Top analytics announcements of AWS re:Invent 2024

Use AWS Glue ETL to perform merge, partition evolution, and schema evolution on Apache Iceberg

Read and write S3 Iceberg table using AWS Glue Iceberg Rest Catalog from Open Source Apache Spark
