Analytics, Data Lake and Data Processing

Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

AWS Big Data

OCTOBER 30, 2024

This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue. The job script sets Spark catalog properties such as spark.sql.catalog.glue_catalog.catalog-impl, and you start the job by choosing Run.
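The spark.sql.catalog.glue_catalog.catalog-impl property mentioned above is one of several settings Spark needs before it can treat the AWS Glue Data Catalog as an Iceberg catalog. The following is a minimal sketch of how those settings might be assembled, not the post's actual job script. The helper names (iceberg_glue_catalog_conf, apply_conf) and the example bucket path are hypothetical; the property names and class names follow the Apache Iceberg documentation for its AWS integration.

```python
from typing import Dict


def iceberg_glue_catalog_conf(
    warehouse: str, catalog_name: str = "glue_catalog"
) -> Dict[str, str]:
    """Build Spark properties that register an Iceberg catalog backed by AWS Glue.

    `warehouse` is the S3 location for table data (a placeholder here, not a
    value from the post). `catalog_name` is the name used in Spark SQL
    identifiers, for example glue_catalog.<database>.<table>.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Enables Iceberg SQL extensions such as MERGE INTO and CALL procedures.
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        ),
        # Registers the catalog name with Spark.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Root S3 path under which Iceberg writes data and metadata files.
        f"{prefix}.warehouse": warehouse,
        # Stores table metadata pointers in the AWS Glue Data Catalog.
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        # Reads and writes files through Iceberg's S3 file IO implementation.
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    }


def apply_conf(builder, conf: Dict[str, str]):
    """Apply each property to a builder exposing .config(key, value)."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder


# Usage inside a Spark or AWS Glue job (assumes pyspark and the Iceberg
# runtime are available in the job environment):
#
#   from pyspark.sql import SparkSession
#   conf = iceberg_glue_catalog_conf("s3://example-bucket/warehouse/")
#   spark = apply_conf(SparkSession.builder, conf).getOrCreate()
```

Keeping the properties in a plain dictionary makes it easy to inspect or log them before the Spark session is created, and the catalog name can be changed in one place.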

Migrate an existing data lake to a transactional data lake using Apache Iceberg

Webinars

Important Considerations When Migrating to a Data Lake

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

Accomplish Agile Business Intelligence & Analytics For Your Business

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

How BMW streamlined data access using AWS Lake Formation fine-grained access control

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

Eight Top DataOps Trends for 2022

Enrich your serverless data lake with Amazon Bedrock

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

Amazon Q data integration adds DataFrame support and in-prompt context-aware job creation

How EUROGATE established a data mesh architecture using Amazon DataZone

Demystify data sharing and collaboration patterns on AWS: Choosing the right tool for the job

Scaling RISE with SAP data and AWS Glue

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

Create an Apache Hudi-based near-real-time transactional data lake using AWS DMS, Amazon Kinesis, AWS Glue streaming ETL, and data visualization using Amazon QuickSight

Achieve data resilience using Amazon OpenSearch Service disaster recovery with snapshot and restore

Unlock scalable analytics with a secure connectivity pattern in AWS Glue to read from or write to Snowflake

Get Your Analytics Insights Instantly – Without Abandoning Central IT

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

Build a data lake with Apache Flink on Amazon EMR

The essential check list for effective data democratization

Use your corporate identities for analytics with Amazon EMR and AWS IAM Identity Center

Data Management Requirements for the Enterprise Data Lake

Top 15 data management platforms

Join a streaming data source with CDC data for real-time serverless data analytics using AWS Glue, AWS DMS, and Amazon DynamoDB

10 Things AWS Can Do for Your SaaS Company

Query your Apache Hive metastore with AWS Lake Formation permissions

Configure cross-Region table access with the AWS Glue Catalog and AWS Lake Formation

Create your Private Data Warehousing Environment Using Azure Kubernetes Service

Announcing the 2020 Data Impact Award Winners

Habib Bank manages data at scale with Cloudera Data Platform

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

DS Smith sets a single-cloud agenda for sustainability

AWS Glue crawlers support cross-account crawling to support data mesh architecture

How smava makes loans transparent and affordable using Amazon Redshift Serverless

Governing data in relational databases using Amazon DataZone

Preparing the foundations for Generative AI

Empowering data-driven excellence: How the Bluestone Data Platform embraced data mesh for success

Introducing AWS Glue crawler and create table support for Apache Iceberg format

Use AWS Glue to streamline SFTP data processing

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

How Gilead used Amazon Redshift to quickly and cost-effectively load third-party medical claims data
