This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue. To start the job, choose Run.
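The original excerpt trails off mid-way through a Spark configuration (`.config("spark.sql.catalog.glue_catalog.catalog-impl", ...`). Below is a minimal sketch of the session setup that fragment typically belongs to: an Iceberg catalog backed by the AWS Glue Data Catalog. The database name and S3 warehouse path are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

# Minimal sketch: a Spark session wired to manage Iceberg tables through
# the AWS Glue Data Catalog. Requires the Iceberg runtime and AWS bundle
# jars on the classpath; dbname and the S3 bucket are placeholders.
dbname = "legacy_sqlserver_db"  # hypothetical source database name

spark = (
    SparkSession.builder
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.glue_catalog",
            "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.warehouse",
            "s3://my-datalake-bucket/{}/".format(dbname))
    .config("spark.sql.catalog.glue_catalog.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.io-impl",
            "org.apache.iceberg.aws.s3.S3FileIO")
    .getOrCreate()
)
```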
A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure it, and run different types of analytics for better business insights. Open AWS Glue Studio. Choose ETL Jobs.
Azure Data Lake Storage Gen2 is based on Azure Blob storage and offers a suite of big data analytics features. If you don’t understand the concept, you might want to check out our previous article on the difference between data lakes and data warehouses. Determine your preparedness.
Hive metastore federation for Amazon EMR is applicable to the following use cases: governance of Amazon EMR-based data lakes, where producers generate data within their AWS accounts using an Amazon EMR-based data lake backed by EMRFS on Amazon Simple Storage Service (Amazon S3) and HBase.
For many organizations, this centralized data store follows a data lake architecture. Although data lakes provide a centralized repository, making sense of this data and extracting valuable insights can be challenging.
Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.
A domain has an important job and a dedicated team of five to nine members who develop an intimate knowledge of data sources, data consumers, and functional nuances. This comes with challenges, for example managing ordered data dependencies, inter-domain communication, shared infrastructure, and incoherent workflows.
Near-real-time analytics on operational data is becoming a common need. Due to the exponential growth of data volume, it has become common practice to replace read replicas with data lakes to achieve better scalability and performance. For more information, see Changing the default settings for your data lake.
However, this enthusiasm may be tempered by a host of challenges and risks stemming from scaling GenAI. As the technology subsists on data, customer trust and their confidential information are at stake—and enterprises cannot afford to overlook its pitfalls.
At the same time, they need to optimize operational costs to unlock the value of this data for timely insights, and do so with consistent performance. With this massive data growth, data proliferation across your data stores, data warehouses, and data lakes can become equally challenging.
The Hive metastore is a repository of metadata about the SQL tables, such as database names, table names, schema, serialization and deserialization information, data location, and partition details of each table. Therefore, organizations have come to host huge volumes of metadata of their structured datasets in the Hive metastore.
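As a hedged illustration of the kind of metadata the metastore serves, the following sketch inspects a hypothetical table from a Hive-enabled Spark session; the `sales_db.orders` names are made up.

```python
from pyspark.sql import SparkSession

# Hedged sketch: inspecting the metadata the Hive metastore tracks.
# Assumes a Hive-enabled Spark deployment; sales_db.orders is a
# hypothetical placeholder table.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Schema, SerDe, and data-location details for one table
spark.sql("DESCRIBE FORMATTED sales_db.orders").show(truncate=False)

# Partition details for the same table
spark.sql("SHOW PARTITIONS sales_db.orders").show()
```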
This led to inefficiencies in data governance and access control. AWS Lake Formation is a service that streamlines and centralizes the data lake creation and management process. The solution to BMW’s data duplication: the CDH is a company-wide data lake built on Amazon Simple Storage Service (Amazon S3).
It also makes it easier for engineers, data scientists, product managers, analysts, and business users to access data throughout an organization to discover, use, and collaborate to derive data-driven insights. Note that a managed data asset is an asset for which Amazon DataZone can manage permissions.
Many companies whose AI model training infrastructure is not proximal to their data lake incur steeper costs as the data sets grow larger and AI models become more complex. Companies such as Cyxtera, Digital Realty and Equinix, among others, offer hosting, managing and operations services for AI infrastructure.
All this data arrives by the terabyte, and a data management platform can help marketers make sense of it all. Marketing-focused or not, DMPs excel at negotiating with a wide array of databases, data lakes, or data warehouses, ingesting their streams of data and then cleaning, sorting, and unifying the information therein.
Data lakes have come a long way, and there’s been tremendous innovation in this space. Today’s modern data lakes are cloud native, work with multiple data types, and make this data easily available to diverse stakeholders across the business. In the navigation pane, under Data catalog, choose Settings.
Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback. Solution overview For our example use case, a customer uses Amazon EMR for data processing and Iceberg format for the transactional data. Choose Create.
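As a hedged sketch of the time travel and rollback features, assuming a `glue_catalog` Spark catalog like the one sketched earlier; the `claims_db.claims` table and the snapshot ID are hypothetical placeholders.

```python
# List the snapshots Iceberg has recorded for the table
spark.sql(
    "SELECT snapshot_id, committed_at, operation "
    "FROM glue_catalog.claims_db.claims.snapshots"
).show(truncate=False)

# Time travel: query the table as of an earlier snapshot (ID is made up)
spark.sql(
    "SELECT COUNT(*) FROM glue_catalog.claims_db.claims "
    "VERSION AS OF 5629394721902977364"
).show()

# Rollback: restore the table's current state to that snapshot
spark.sql(
    "CALL glue_catalog.system.rollback_to_snapshot"
    "('claims_db.claims', 5629394721902977364)"
)
```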
They recently needed to do a monthly load of 140 TB of uncompressed healthcare claims data in under 24 hours after receiving it to provide analysts and data scientists with up-to-date information on a patient’s healthcare journey. This data volume is expected to increase monthly and is fully refreshed each month.
As a global company with more than 6,000 employees, BMC faces many of the same data challenges that other large enterprises face. The organization has 500 applications for business services, 80,000 VMs, 3,000 hosts, and more than 100,000 containers. Given the sheer volume of enterprise data, it’s impossible to do this manually.
Data storage databases: your SaaS company can store and protect any amount of data using Amazon Simple Storage Service (Amazon S3), which is ideal for data lakes, cloud-native applications, and mobile apps.
In addition to AKS and the load balancers mentioned above, this includes VNET, Data Lake Storage, PostgreSQL Azure database, and more. By default, Azure Data Lake Storage, PostgreSQL Database, and Virtual Machines are accessible over public endpoints.
The solution: CDP Private Cloud brings a next-generation hybrid architecture with cloud-native benefits to HBL’s data platform. HBL started its data journey in 2019, when a data lake initiative was launched to consolidate complex data sources and enable the bank to use a single version of truth for decision making.
With AWS Glue, you can discover and connect to hundreds of diverse data sources and manage your data in a centralized data catalog. It enables you to visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes. Choose Store a new secret.
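A minimal sketch of the "store a new secret" step done with boto3 instead of the console, so an AWS Glue connection can reference the credentials rather than hard-coding them; the secret name and all values are hypothetical placeholders.

```python
import json
import boto3

# Hedged sketch: storing source-database credentials in AWS Secrets
# Manager. Every name and value below is a placeholder.
client = boto3.client("secretsmanager")
client.create_secret(
    Name="prod/sqlserver/etl",
    SecretString=json.dumps({
        "username": "etl_user",
        "password": "REPLACE_ME",
        "host": "sqlserver.example.internal",
        "port": 1433,
    }),
)
```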
For the past 5 years, BMS has used a custom framework called Enterprise Data Lake Services (EDLS) to create ETL jobs for business users. BMS’s EDLS platform hosts over 5,000 jobs and is growing 15% year over year. About the authors: Sivaprasad Mahamkali is a Senior Streaming Data Engineer at AWS Professional Services.
Its digital transformation began with an application modernization phase, in which Dickson and her IT teams determined which applications should be hosted in the public cloud and which should remain on a private cloud. Here, Dickson sees data generated from its industrial machines being very productive.
All data is held in a lake-centric hub, and protected by a strong, universal security model, with data loss prevention and protection for sensitive data, and features for auditing and forensic investigation already built-in.
Many organizations are building datalakes to store and analyze large volumes of structured, semi-structured, and unstructured data. In addition, many teams are moving towards a data mesh architecture, which requires them to expose their data sets as easily consumable data products.
This blog post outlines detailed step-by-step instructions to perform Hive replication from an on-premises CDH cluster to a CDP Public Cloud Data Lake (Data Lake cluster versions: CM 7.4.0, Runtime 7.2.8). It covers pre-checks on the Data Lake cluster and understanding Ranger policies in the Data Lake cluster.
The DataFrame code generation now extends beyond AWS Glue DynamicFrame to support a broader range of data processing scenarios. Next, the merged data is filtered to include only a specific geographic region. Then the transformed output data is saved to Amazon S3 for further processing in the future.
For Host, enter events.PagerDuty.com. Enter a name for the channel and an optional description. Vivek Shrivastava is a Principal Data Architect, Data Lake, in AWS Professional Services. At AWS, he is focused on data lake implementations, and search and analytical workloads using Amazon OpenSearch Service.
Data lakes are designed for storing vast amounts of raw, unstructured, or semi-structured data at a low cost, and organizations share those datasets across multiple departments and teams. The queries on these large datasets read vast amounts of data and can perform complex join operations on multiple datasets.
This involves creating VPC endpoints in both the AWS and Snowflake VPCs, making sure data transfer remains within the AWS network. Use Amazon Route 53 to create a private hosted zone that resolves the Snowflake endpoint within your VPC. This unlocks scalable analytics while maintaining data governance, compliance, and access control.
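A hedged boto3 sketch of that Route 53 step follows. The zone name, VPC ID, and region are placeholders; in practice the zone name would come from Snowflake's SYSTEM$GET_PRIVATELINK_CONFIG output.

```python
import boto3

# Hedged sketch: a private hosted zone so the Snowflake PrivateLink
# endpoint resolves inside the VPC. All identifiers are placeholders.
route53 = boto3.client("route53")
route53.create_hosted_zone(
    Name="privatelink.snowflakecomputing.com",
    VPC={"VPCRegion": "us-east-1", "VPCId": "vpc-0123456789abcdef0"},
    CallerReference="snowflake-privatelink-zone-1",  # any unique string
    HostedZoneConfig={
        "Comment": "Resolve Snowflake over PrivateLink",
        "PrivateZone": True,
    },
)
```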
The technological linchpin of its digital transformation has been its Enterprise Data Architecture & Governance platform. It hosts over 150 big data analytics sandboxes across the region with over 200 users utilizing the sandbox for data discovery.
Typically, you have multiple accounts to manage and run resources for your data pipeline. His team focuses on building distributed systems that give customers interactive, simple-to-use interfaces to efficiently manage and transform petabytes of data seamlessly across data lakes on Amazon S3, and databases and data warehouses in the cloud.
You need to determine whether you are going with an on-premises or cloud-hosted strategy. For example, you can track the amount of business information fed into a data lake weekly, and therefore react immediately if issues arise. Then, you need to choose and set up the right BI solution for your organization!
Cloudera’s Data Warehouse service allows raw data to be stored in the cloud storage of your choice (S3, ADLSg2). It will be stored in your own namespace, and not force you to move data into someone else’s proprietary file formats or hosted storage. Proprietary file formats mean no one else is invited in!
It supports both data quality at rest and data quality in AWS Glue extract, transform, and load (ETL) pipelines. Data quality at rest focuses on validating the data stored in data lakes, databases, or data warehouses. It ensures that the data meets specific quality standards before it is consumed.
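As a hedged sketch, a data-quality-at-rest ruleset can be expressed in DQDL and attached to a Data Catalog table via boto3; the database, table, and thresholds below are hypothetical.

```python
import boto3

# Hedged sketch: a Glue Data Quality ruleset (DQDL) for a hypothetical
# Data Catalog table. Names and thresholds are placeholders.
glue = boto3.client("glue")
glue.create_data_quality_ruleset(
    Name="claims-quality-checks",
    Ruleset=(
        'Rules = [ IsComplete "claim_id", '
        'Uniqueness "claim_id" > 0.99, '
        'ColumnValues "amount" > 0 ]'
    ),
    TargetTable={"DatabaseName": "claims_db", "TableName": "claims"},
)
```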
Amazon Redshift is a fast, scalable, and fully managed cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. It also helps you securely access your data in operational databases, data lakes, or third-party datasets with minimal movement or copying of data.
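A hedged sketch of such a query over the Redshift Data API, assuming a hypothetical Redshift Serverless workgroup and a `spectrum_schema` external schema already mapped to the S3 data lake via Redshift Spectrum.

```python
import boto3

# Hedged sketch: one SQL statement that reads data-lake tables through
# an external (Spectrum) schema. Workgroup and schema names are placeholders.
rsd = boto3.client("redshift-data")
rsd.execute_statement(
    WorkgroupName="analytics-wg",  # Redshift Serverless workgroup (placeholder)
    Database="dev",
    Sql=(
        "SELECT region, COUNT(*) AS orders "
        "FROM spectrum_schema.orders_lake "
        "GROUP BY region"
    ),
)
```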
At the lowest layer is the infrastructure, made up of databases and data lakes. These applications live on innumerable servers, yet some technology is hosted in the public cloud.
Modern applications store massive amounts of data on Amazon Simple Storage Service (Amazon S3) data lakes, providing cost-effective and highly durable storage, and allowing you to run analytics and machine learning (ML) from your data lake to generate insights on your data.
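One common way to run such analytics directly over S3 is Amazon Athena; here is a minimal boto3 sketch, with hypothetical database, table, and output-location names.

```python
import boto3

# Minimal sketch: an ad hoc Athena query over data in the S3 data lake.
# lake_db, app_logs, and the results bucket are placeholders.
athena = boto3.client("athena")
resp = athena.start_query_execution(
    QueryString="SELECT event_type, COUNT(*) FROM app_logs GROUP BY event_type",
    QueryExecutionContext={"Database": "lake_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/queries/"},
)
print(resp["QueryExecutionId"])  # poll get_query_execution with this ID
```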
While managing unstructured data remains a challenge for 36% of organizations, according to the 2022 Foundry Data and Analytics Research survey, many IT leaders are actively seeking ways of harnessing all types of data stored in data lakes.