Data Lake, Management and Optimization

Data Lake

Management

Optimization

Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

AWS Big Data

OCTOBER 30, 2024

This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake ( Apache Iceberg ) using AWS Glue. source_s3_bucket – The raw S3 bucket name. S3FileIO").getOrCreate()

Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

Incremental refresh for Amazon Redshift materialized views on data lake tables

Webinars

Trending Sources

From data lakes to insights: dbt adapter for Amazon Athena now supported in dbt Cloud

Webinars

What is data architecture? A framework to manage data

The Unexpected Cost of Data Copies

Manage concurrent write conflicts in Apache Iceberg on the AWS Glue Data Catalog

Drug Launch Case Study: Amazing Efficiency Using DataOps

Recap of Amazon Redshift key product announcements in 2024

Unleash deeper insights with Amazon Redshift data sharing for data lake tables

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

Multicloud data lake analytics with Amazon Athena

Choosing an open table format for your transactional data lake on AWS

How Salesforce optimized their detection and response platform using AWS managed services

MongoDB Enhances Developer Data Platform

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

Seamless integration of data lake and data warehouse using Amazon Redshift Spectrum and Amazon DataZone

Optimize your workloads with Amazon Redshift Serverless AI-driven scaling and optimization

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

Use Apache Iceberg in a data lake to support incremental data processing

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

Enrich your serverless data lake with Amazon Bedrock

Build a high-performance quant research platform with Apache Iceberg

The success of GenAI models lies in your data management strategy

Synchronize data lakes with CDC-based UPSERT using open table format, AWS Glue, and Amazon MSK

Amazon SageMaker Lakehouse now supports attribute-based access control

Migrate Delta tables from Azure Data Lake Storage to Amazon S3 using AWS Glue

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

Steps taken to build Sevita’s first enterprise data platform

Apache Iceberg optimization: Solving the small files problem in Amazon EMR

How EUROGATE established a data mesh architecture using Amazon DataZone

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

Top 15 data management platforms

The AWS Glue Data Catalog now supports storage optimization of Apache Iceberg tables

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

Use open table format libraries on AWS Glue 5.0 for Apache Spark

Centralize Your Data Processes With a DataOps Process Hub

Streamline AI-driven analytics with governance: Integrating Tableau with Amazon DataZone

Perform upserts in a data lake using Amazon Athena and Apache Iceberg

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

Denodo Provides a Logical Approach to Data Management

Implementing a Pharma Data Mesh using DataOps

Automate replication of relational sources into a transactional data lake with Apache Iceberg and AWS Glue

Enable business users to analyze large datasets in your data lake with Amazon QuickSight

Empower your Jira data in a data lake with Amazon AppFlow and AWS Glue

Stay Connected