Big Data, Data Lake and Data Processing

Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

AWS Big Data

OCTOBER 30, 2024

This is part two of a three-part series showing how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue. … To start the job, choose Run. … format(dbname)).config("spark.sql.catalog.glue_catalog.catalog-impl", …
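
The excerpt's truncated fragment sets spark.sql.catalog.glue_catalog.catalog-impl, which is how Spark is told to use the AWS Glue Data Catalog as an Iceberg catalog. The sketch below collects the standard Iceberg-on-Glue Spark properties into a dictionary so they can be passed to a SparkSession. It is a minimal illustration under assumptions, not the post's code: the helper name iceberg_glue_spark_conf and the S3 warehouse path are hypothetical, and because the excerpt is cut off, the exact values the post passes (including whatever dbname is interpolated into) are not reproduced here.

```python
from typing import Dict

# Hypothetical default; "glue_catalog" matches the catalog name in the excerpt.
CATALOG_NAME = "glue_catalog"


def iceberg_glue_spark_conf(warehouse_path: str, catalog: str = CATALOG_NAME) -> Dict[str, str]:
    """Build Spark properties that register an Iceberg catalog backed by AWS Glue.

    Hypothetical helper for illustration; warehouse_path is an assumed S3 location.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Enable Iceberg's SQL extensions (e.g. MERGE INTO) in Spark SQL.
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Register the catalog name with Iceberg's Spark catalog implementation.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Store table metadata pointers in the AWS Glue Data Catalog.
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        # Read and write data files on Amazon S3 through Iceberg's S3FileIO.
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        # Root S3 location for new tables (assumed value supplied by the caller).
        f"{prefix}.warehouse": warehouse_path,
    }


if __name__ == "__main__":
    conf = iceberg_glue_spark_conf("s3://example-bucket/iceberg-warehouse/")
    for key, value in sorted(conf.items()):
        print(f"{key}={value}")
    # With PySpark available (for example inside a Glue job), the entries
    # could be applied one at a time on the session builder:
    # from pyspark.sql import SparkSession
    # builder = SparkSession.builder
    # for key, value in conf.items():
    #     builder = builder.config(key, value)
    # spark = builder.getOrCreate()
```

Keeping the properties in a plain dictionary makes the catalog name and warehouse location easy to change per environment without touching the job logic.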

Important Considerations When Migrating to a Data Lake

Trending Sources

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

Migrate an existing data lake to a transactional data lake using Apache Iceberg

Amazon Q data integration adds DataFrame support and in-prompt context-aware job creation

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

How BMW streamlined data access using AWS Lake Formation fine-grained access control

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

Enrich your serverless data lake with Amazon Bedrock

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

Scaling RISE with SAP data and AWS Glue

How EUROGATE established a data mesh architecture using Amazon DataZone

Demystify data sharing and collaboration patterns on AWS: Choosing the right tool for the job

Create an Apache Hudi-based near-real-time transactional data lake using AWS DMS, Amazon Kinesis, AWS Glue streaming ETL, and data visualization using Amazon QuickSight

Build a data lake with Apache Flink on Amazon EMR

Why Big Data Needs A Robust Off-Site Data Backup Method

Achieve data resilience using Amazon OpenSearch Service disaster recovery with snapshot and restore

Query your Apache Hive metastore with AWS Lake Formation permissions

Use AWS Glue to streamline SFTP data processing

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

Unlock scalable analytics with a secure connectivity pattern in AWS Glue to read from or write to Snowflake

Announcing the 2020 Data Impact Award Winners

Configure cross-Region table access with the AWS Glue Catalog and AWS Lake Formation

Governing data in relational databases using Amazon DataZone

How Gilead used Amazon Redshift to quickly and cost-effectively load third-party medical claims data

Data Management Requirements for the Enterprise Data Lake

10 Things AWS Can Do for Your SaaS Company

AWS Glue crawlers support cross-account crawling to support data mesh architecture

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

Introducing AWS Glue crawler and create table support for Apache Iceberg format

Join a streaming data source with CDC data for real-time serverless data analytics using AWS Glue, AWS DMS, and Amazon DynamoDB

Periscope Data Expands to Israel, Empowering Data Teams with Powerful Tools

How smava makes loans transparent and affordable using Amazon Redshift Serverless

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

Run Spark SQL on Amazon Athena Spark

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

Empowering data-driven excellence: How the Bluestone Data Platform embraced data mesh for success

Attribute Amazon EMR on EC2 costs to your end-users

Enhance query performance using AWS Glue Data Catalog column-level statistics

How Amazon Finance Automation built a data mesh to support distributed data ownership and centralize governance

The disruptive potential of open data lakehouse architectures and IBM watsonx.data

Access Amazon Athena in your applications using the WebSocket API

Implement alerts in Amazon OpenSearch Service with PagerDuty
