Data Lake, Management and Snapshot

Achieve data resilience using Amazon OpenSearch Service disaster recovery with snapshot and restore

AWS Big Data

NOVEMBER 11, 2024

Amazon OpenSearch Service is a fully managed AWS service for deploying, operating, and scaling OpenSearch domains. OpenSearch itself is an open-source, distributed search and analytics engine. This post introduces an active-passive disaster recovery approach based on a snapshot and restore strategy.
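As a rough illustration of what a snapshot and restore flow involves, the sketch below only builds the REST requests for the OpenSearch snapshot API; it does not send them. The repository name, bucket, Region, role ARN, and index pattern are hypothetical placeholders, and the assumption that the standby domain reads the same S3 repository is ours, not the post's. On Amazon OpenSearch Service these requests would also need to be signed, which is left out here.

```python
def build_dr_requests(repo, snapshot, bucket, region, role_arn, indices="logs-*"):
    """Build (method, path, body) tuples for an active-passive snapshot flow.

    All argument values are placeholders supplied by the caller; nothing is sent.
    """
    # Registering an S3-backed snapshot repository (needed on both domains).
    register = (
        "PUT",
        f"/_snapshot/{repo}",
        {"type": "s3", "settings": {"bucket": bucket, "region": region, "role_arn": role_arn}},
    )
    # The active domain takes a snapshot of the chosen indices.
    take = (
        "PUT",
        f"/_snapshot/{repo}/{snapshot}",
        {"indices": indices, "include_global_state": False},
    )
    # The passive domain restores that snapshot when it has to take over.
    restore = (
        "POST",
        f"/_snapshot/{repo}/{snapshot}/_restore",
        {"indices": indices, "include_global_state": False},
    )
    return {"primary": [register, take], "standby": [register, restore]}


if __name__ == "__main__":
    plan = build_dr_requests(
        "dr-repo", "snap-1", "example-bucket", "us-west-2",
        "arn:aws:iam::123456789012:role/ExampleSnapshotRole",
    )
    for side, requests in plan.items():
        for method, path, _body in requests:
            print(side, method, path)
```

Keeping request construction separate from sending makes the plan easy to review before it touches either domain.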

Snapshot, Strategy, Dashboards, Data Lake

Manage concurrent write conflicts in Apache Iceberg on the AWS Glue Data Catalog

AWS Big Data

APRIL 8, 2025

In modern data architectures, Apache Iceberg has emerged as a popular table format for data lakes, offering key features including ACID transactions and concurrent write support. The AWS Glue Data Catalog can serve as the Iceberg catalog. During a write, the writer determines the changes in the transaction and writes new data files.
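To make the idea of concurrent write conflicts concrete, here is a toy sketch of optimistic concurrency: a writer reads the current table state, prepares its changes, and commits only if nobody else committed in the meantime, retrying otherwise. `ToyCatalog`, `CommitConflict`, and `append_with_retry` are hypothetical names for this illustration; this is not the Iceberg or AWS Glue API, and real conflict handling is more nuanced.

```python
import threading


class CommitConflict(Exception):
    """Raised when the table changed since the writer read it."""


class ToyCatalog:
    """In-memory stand-in for a catalog tracking one table's current snapshot."""

    def __init__(self):
        self._lock = threading.Lock()
        self.snapshot_id = 0
        self.files = ()

    def load(self):
        with self._lock:
            return self.snapshot_id, self.files

    def commit(self, base_id, new_files):
        # Compare-and-swap: accept the commit only if it builds on the latest snapshot.
        with self._lock:
            if base_id != self.snapshot_id:
                raise CommitConflict(f"expected snapshot {base_id}, found {self.snapshot_id}")
            self.files = new_files
            self.snapshot_id += 1
            return self.snapshot_id


def append_with_retry(catalog, data_file, max_retries=3):
    """Append one data file, re-reading the table state after each conflict."""
    for _ in range(max_retries + 1):
        base_id, files = catalog.load()
        try:
            return catalog.commit(base_id, files + (data_file,))
        except CommitConflict:
            continue
    raise CommitConflict("gave up after retries")
```

A stale writer that calls `commit` with an old `base_id` gets a `CommitConflict`, while `append_with_retry` reloads the state and tries again.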

Snapshot, Management, Metadata, Big Data

Use open table format libraries on AWS Glue 5.0 for Apache Spark

AWS Big Data

DECEMBER 4, 2024

Open table formats are emerging in the rapidly evolving domain of big data management, fundamentally altering the landscape of data storage and analysis. Their ability to resolve critical issues such as data consistency, query efficiency, and governance renders them indispensable for data-driven organizations.

Snapshot, Metadata, Data Lake, Optimization

Build a high-performance quant research platform with Apache Iceberg

AWS Big Data

JANUARY 9, 2025

In this post, we focus on data management implementation options such as accessing data directly in Amazon Simple Storage Service (Amazon S3), using popular data formats like Parquet, or using open table formats like Iceberg. Data management is the foundation of quantitative research.

Metadata, Snapshot, Cost-Benefit, Optimization

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure the data and then run different types of analytics for better business insights.

Use Apache Iceberg in a data lake to support incremental data processing

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

Choosing an open table format for your transactional data lake on AWS

Load data incrementally from transactional data lakes to data warehouses

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

Building end-to-end data lineage for one-time and complex queries using Amazon Athena, Amazon Redshift, Amazon Neptune and dbt

Perform upserts in a data lake using Amazon Athena and Apache Iceberg

Use AWS Glue ETL to perform merge, partition evolution, and schema evolution on Apache Iceberg

The AWS Glue Data Catalog now supports storage optimization of Apache Iceberg tables

MLOps and DevOps: Why Data Makes It Different

Implement historical record lookup and Slowly Changing Dimensions Type-2 using Apache Iceberg

Apache Iceberg optimization: Solving the small files problem in Amazon EMR

Use Amazon Athena with Spark SQL for your open-source transactional table formats

Manage your data warehouse cost allocations with Amazon Redshift Serverless tagging

Implement slowly changing dimensions in a data lake using AWS Glue and Delta

Backtesting index rebalancing arbitrage with Amazon EMR and Apache Iceberg

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

Introducing Apache Hudi support with AWS Glue crawlers

Amazon Managed Service for Apache Flink now supports Apache Flink version 1.19

Introducing native support for Apache Hudi, Delta Lake, and Apache Iceberg on AWS Glue for Apache Spark, Part 2: AWS Glue Studio Visual Editor

Build a data lake with Apache Flink on Amazon EMR

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

How Gupshup built their multi-tenant messaging analytics platform on Amazon Redshift

Analyze Elastic IP usage history using Amazon Athena and AWS CloudTrail

Migrate Amazon Redshift from DC2 to RA3 to accommodate increasing data volumes and analytics demands

Implement disaster recovery with Amazon Redshift

Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

Simplifying data processing at Capitec with Amazon Redshift integration for Apache Spark

Exploring real-time streaming for generative AI Applications

Open Data Lakehouse powered by Iceberg for all your Data Warehouse needs

Introducing AWS Glue crawler and create table support for Apache Iceberg format

Build a multi-Region and highly resilient modern data architecture using AWS Glue and AWS Lake Formation

Unleashing the power of Presto: The Uber case study
