This post introduces an active-passive approach using a snapshot and restore strategy. Snapshot and restore in OpenSearch Service: the snapshot and restore strategy in OpenSearch Service involves creating point-in-time backups, known as snapshots, of your OpenSearch domain.
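As a rough illustration, here is a minimal sketch of that flow with the opensearch-py client, assuming an S3 snapshot repository has already been registered with the domain; the endpoint, repository, index pattern, and snapshot names are placeholders, and authentication setup is omitted:

```python
# Hypothetical sketch: manual snapshot and restore with opensearch-py.
# SigV4/basic auth configuration is omitted for brevity.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=["https://my-domain.us-east-1.es.amazonaws.com"])

# On the active domain: create a point-in-time snapshot of selected indexes.
client.snapshot.create(
    repository="s3-backups",
    snapshot="nightly-2024-01-01",
    body={"indices": "orders*", "include_global_state": False},
)

# On the passive domain: restore the same snapshot from the shared S3 repository.
client.snapshot.restore(
    repository="s3-backups",
    snapshot="nightly-2024-01-01",
    body={"indices": "orders*"},
)
```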
In modern data architectures, Apache Iceberg has emerged as a popular table format for data lakes, offering key features including ACID transactions and concurrent write support. The AWS Glue Data Catalog can serve as the Iceberg catalog. Writers determine the changes in a transaction and write new data files.
Why: Data Makes It Different. In contrast, a defining feature of ML-powered applications is that they are directly exposed to a large amount of messy, real-world data which is too complex to be understood and modeled by hand. However, the concept is quite abstract. Can’t we just fold it into existing DevOps best practices?
Open table formats are emerging in the rapidly evolving domain of big data management, fundamentally altering the landscape of data storage and analysis. By providing a standardized framework for data representation, open table formats break down data silos, enhance data quality, and accelerate analytics at scale.
Initially, data warehouses were the go-to solution for structured data and analytical workloads but were limited by proprietary storage formats and their inability to handle unstructured data. Eventually, transactional data lakes emerged to add the transactional consistency and performance of a data warehouse to the data lake.
A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure it, and run different types of analytics for better business insights.
Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback. AWS Glue 3.0 and later supports the Apache Iceberg framework for data lakes. The snapshot points to the manifest list.
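To make the snapshot-to-manifest-list relationship concrete, here is a small Spark SQL sketch against a hypothetical Glue-cataloged Iceberg table; the catalog and table names are assumptions:

```python
# Hypothetical sketch: inspecting a table's snapshots via Iceberg's metadata tables.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-metadata").getOrCreate()

# Each row is one snapshot; manifest_list points to the Avro manifest list file
# that in turn references the manifests and data files for that table state.
spark.sql("""
    SELECT snapshot_id, committed_at, operation, manifest_list
    FROM glue_catalog.db.sales.snapshots
    ORDER BY committed_at DESC
""").show(truncate=False)
```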
Iceberg offers distinct advantages through the metadata layer it adds over Parquet, such as improved data management, performance optimization, and integration with various query engines. As mentioned earlier, 80% of quantitative research work is attributed to data management tasks.
A modern data strategy redefines and enables sharing data across the enterprise and allows for both reading and writing of a singular instance of the data using an open table format. Why Cloudinary chose Apache Iceberg: Apache Iceberg is a high-performance table format for huge analytic workloads.
A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.
This post is co-written with Andries Engelbrecht and Scott Teal from Snowflake. Businesses are constantly evolving, and data leaders are challenged every day to meet new requirements. Apache Iceberg is an Apache-licensed, 100% open-source data table format that helps simplify data processing on large datasets stored in data lakes.
Data lakes and data warehouses are two of the most important data storage and management technologies in a modern data architecture. Data lakes store all of an organization's data, regardless of its format or structure.
Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.
When you build your transactional data lake using Apache Iceberg to solve your functional use cases, you also need to focus on operational use cases for your S3 data lake to optimize the production environment. Expiration actions define when objects expire; Amazon S3 deletes expired objects on your behalf.
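A minimal boto3 sketch of such an expiration rule, with a placeholder bucket and prefix:

```python
# Hypothetical sketch: an S3 lifecycle rule that expires objects under an
# assumed staging prefix after 30 days.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-iceberg-lake",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-stale-temp-files",
                "Filter": {"Prefix": "warehouse/tmp/"},
                "Status": "Enabled",
                # Amazon S3 deletes matching objects ~30 days after creation.
                "Expiration": {"Days": 30},
            }
        ]
    },
)
```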
In the context of comprehensive data governance, Amazon DataZone offers organization-wide data lineage visualization using Amazon Web Services (AWS) services, while dbt provides project-level lineage through model analysis and supports cross-project integration between data lakes and warehouses.
In the era of big data, data lakes have emerged as a cornerstone for storing vast amounts of raw data in its native format. They support structured, semi-structured, and unstructured data, offering a flexible and scalable environment for data ingestion from multiple sources.
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca's journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.
With the growing emphasis on data, organizations are constantly seeking more efficient and agile ways to integrate their data, especially from a wide variety of applications. SageMaker Lakehouse gives you the flexibility to access and query your data in-place with all Apache Iceberg compatible tools and engines.
A modern data architecture is an evolutionary architecture pattern designed to integrate a data lake, data warehouse, and purpose-built stores with a unified governance model. The company wanted the ability to continue processing operational data in the secondary Region in the rare event of primary Region failure.
Amazon Athena supports the MERGE command on Apache Iceberg tables, which allows you to perform inserts, updates, and deletes in your data lake at scale using familiar SQL statements that are ACID-compliant (atomicity, consistency, isolation, durability). Most databases use a transaction log to record changes made to the database.
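A hedged sketch of what such a MERGE might look like when submitted through the Athena API with boto3; the database, tables, columns, and output location are all illustrative:

```python
# Hypothetical sketch: MERGE into an Iceberg table via the Athena API.
import boto3

athena = boto3.client("athena")
merge_sql = """
MERGE INTO lake_db.customers AS t
USING lake_db.customer_updates AS s
ON t.customer_id = s.customer_id
WHEN MATCHED AND s.op = 'D' THEN DELETE
WHEN MATCHED THEN UPDATE SET email = s.email, updated_at = s.updated_at
WHEN NOT MATCHED THEN INSERT (customer_id, email, updated_at)
    VALUES (s.customer_id, s.email, s.updated_at)
"""
athena.start_query_execution(
    QueryString=merge_sql,
    QueryExecutionContext={"Database": "lake_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
```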
As enterprises collect increasing amounts of data from various sources, the structure and organization of that data often need to change over time to meet evolving analytical needs. Schema evolution enables adding, deleting, renaming, or modifying columns without needing to rewrite existing data.
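With Apache Iceberg, for instance, these schema changes are metadata-only operations; a Spark SQL sketch with hypothetical table and column names:

```python
# Hypothetical sketch: Iceberg schema evolution in Spark SQL. These ALTER
# statements change table metadata only; no data files are rewritten.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema-evolution").getOrCreate()

spark.sql("ALTER TABLE glue_catalog.db.events ADD COLUMN device_os STRING")
spark.sql("ALTER TABLE glue_catalog.db.events RENAME COLUMN os TO client_os")
spark.sql("ALTER TABLE glue_catalog.db.events ALTER COLUMN latency TYPE BIGINT")  # safe int-to-bigint widening
spark.sql("ALTER TABLE glue_catalog.db.events DROP COLUMN legacy_flag")
```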
As organizations process vast amounts of data, maintaining an accurate historical record is crucial. History management in data systems is fundamental for compliance, business intelligence, data quality, and time-based analysis. Common use cases for historical record management in CDC scenarios span various domains.
Along with the Glue Data Catalog’s automated compaction feature, these storage optimizations can help you reduce metadata overhead, control storage costs, and improve query performance. Iceberg creates a new version called a snapshot for every change to the data in the table.
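Old snapshot metadata can also be pruned with Iceberg's expire_snapshots Spark procedure; a sketch assuming a Glue-backed catalog named glue_catalog and a placeholder table:

```python
# Hypothetical sketch: expiring old Iceberg snapshots to cap metadata growth.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("table-maintenance").getOrCreate()

# Drop snapshots committed before the given timestamp, while always
# retaining at least the 10 most recent ones.
spark.sql("""
    CALL glue_catalog.system.expire_snapshots(
        table => 'db.sales',
        older_than => TIMESTAMP '2024-01-01 00:00:00',
        retain_last => 10
    )
""")
```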
A slowly changing dimension (SCD) is a data warehousing concept for dimension data that is relatively static but can change slowly over time. There are three major types of SCDs maintained in data warehousing: Type 1 (no history), Type 2 (full history), and Type 3 (limited history).
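As a rough Type 2 illustration in Spark SQL over a hypothetical Iceberg dimension table (assumes the Iceberg Spark extensions are enabled; the table, columns, and the two-step close-then-insert flow are all illustrative):

```python
# Hypothetical SCD Type 2 sketch: dim_customer keeps full history via
# is_current/valid_from/valid_to columns; dedup and change filtering omitted.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("scd2").getOrCreate()

# Step 1: close out current rows whose attributes changed in the staging feed.
spark.sql("""
    MERGE INTO dw.dim_customer AS d
    USING staging.customer_updates AS s
    ON d.customer_id = s.customer_id AND d.is_current = true
    WHEN MATCHED AND d.address <> s.address THEN
      UPDATE SET is_current = false, valid_to = s.effective_date
""")

# Step 2: insert the incoming rows as the new current versions.
spark.sql("""
    INSERT INTO dw.dim_customer
    SELECT s.customer_id, s.address, true AS is_current,
           s.effective_date AS valid_from, NULL AS valid_to
    FROM staging.customer_updates s
""")
```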
AWS-powered data lakes, supported by the unmatched availability of Amazon Simple Storage Service (Amazon S3), can handle the scale, agility, and flexibility required to combine different data and analytics approaches.
In our previous post, Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes, we discussed how you can implement solutions to improve operational efficiencies of your Amazon Simple Storage Service (Amazon S3) data lake that is using the Apache Iceberg open table format and running on the Amazon EMR big data platform.
Backtesting is a process used in quantitative finance to evaluate trading strategies using historical data. Amazon Simple Storage Service (Amazon S3) is a popular cloud-based object storage service that can be used as the foundation for building a data lake.
Apache Hudi is an open table format that brings database and data warehouse capabilities to data lakes. Apache Hudi helps data engineers manage complex challenges, such as managing continuously evolving datasets with transactions while maintaining query performance. Create an S3 bucket if you do not already have one.
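A minimal PySpark upsert sketch for a Hudi table; the record key, precombine field, table name, and S3 path are assumptions, and the Hudi Spark bundle must be on the classpath:

```python
# Hypothetical sketch: upserting a DataFrame into a Hudi copy-on-write table.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-write").getOrCreate()
df = spark.createDataFrame(
    [(1, "alice", "2024-01-01")], ["record_id", "name", "event_date"]
)

hudi_options = {
    "hoodie.table.name": "events",
    "hoodie.datasource.write.recordkey.field": "record_id",    # unique key per record
    "hoodie.datasource.write.precombine.field": "event_date",  # latest value wins on conflict
    "hoodie.datasource.write.operation": "upsert",
}

(
    df.write.format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save("s3://my-bucket/hudi/events/")
)
```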
In the first post of this series, we described how AWS Glue for Apache Spark works with Apache Hudi, Linux Foundation Delta Lake, and Apache Iceberg tables using the native support of those data lake formats. Even without prior experience using Hudi, Delta Lake, or Iceberg, you can easily achieve typical use cases.
With Amazon EMR 6.15, we launched AWS Lake Formation based fine-grained access controls (FGAC) on Open Table Formats (OTFs), including Apache Hudi, Apache Iceberg, and Delta Lake. Many large enterprise companies seek to use their transactional data lake to gain insights and improve decision-making.
To build a data-driven business, it is important to democratize enterprise data assets in a data catalog. With a unified data catalog, you can quickly search datasets and figure out data schema, data format, and location. Verify all table metadata is stored in the AWS Glue Data Catalog.
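A quick boto3 sketch of such a lookup against the AWS Glue Data Catalog, with a placeholder database name:

```python
# Hypothetical sketch: listing table metadata (name, location) from the
# AWS Glue Data Catalog for one database.
import boto3

glue = boto3.client("glue")
paginator = glue.get_paginator("get_tables")

for page in paginator.paginate(DatabaseName="lake_db"):
    for table in page["TableList"]:
        # StorageDescriptor carries format and S3 location details.
        loc = table.get("StorageDescriptor", {}).get("Location", "n/a")
        print(table["Name"], loc)
```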
These processes retrieve data from around 90 different data sources, resulting in updating roughly 2,000 tables in the data warehouse and 3,000 external tables in Parquet format, accessed through Amazon Redshift Spectrum and a data lake on Amazon Simple Storage Service (Amazon S3).
Athena is an interactive query service that simplifies data analysis in Amazon Simple Storage Service (Amazon S3) using standard SQL. By extracting detailed information from CloudTrail and querying it using Athena, this solution streamlines the process of data collection, analysis, and reporting of Elastic IP address (EIP) usage within an AWS account.
With built-in features such as automated snapshots and cross-Region replication, you can enhance your disaster resilience with Amazon Redshift. Disaster recovery strategies: Amazon Redshift is a cloud-based data warehouse that supports many recovery capabilities out of the box to address unforeseen outages and minimize downtime.
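As a rough sketch, both capabilities can be driven with boto3; the cluster and snapshot identifiers, destination Region, and retention period are placeholders:

```python
# Hypothetical sketch: a manual Redshift snapshot plus cross-Region snapshot copy.
import boto3

redshift = boto3.client("redshift")

# Manual snapshots persist until deleted, unlike automated snapshots.
redshift.create_cluster_snapshot(
    SnapshotIdentifier="analytics-pre-upgrade-2024-01-01",
    ClusterIdentifier="analytics-cluster",
)

# Copy automated snapshots to a secondary Region for disaster resilience.
redshift.enable_snapshot_copy(
    ClusterIdentifier="analytics-cluster",
    DestinationRegion="us-west-2",
    RetentionPeriod=7,
)
```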
Amazon Redshift Serverless makes it simple to run and scale analytics without having to manage your data warehouse infrastructure. For Filter by resource type, you can filter by Workgroup, Namespace, Snapshot, and Recovery Point. For this post, we don't include any tag filters, so we can view all the resources across our account.
About Redshift and some relevant features for the use case: Amazon Redshift is a fully managed, petabyte-scale, massively parallel data warehouse that offers simple operations and high performance. It makes it fast, simple, and cost-effective to analyze all your data using standard SQL and your existing business intelligence (BI) tools.
dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouse customers (such as Amazon Redshift users) who are looking to keep their data transform logic separate from storage and engine.
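For illustration only, a minimal dbt Python model; Python models are available on some adapters (for example, Spark-based ones), while dbt projects on Redshift are typically SQL models, and all names here are made up:

```python
# Hypothetical dbt Python model. On Spark-based adapters, dbt.ref() returns
# a Spark DataFrame, so the transform stays declarative and repeatable.
def model(dbt, session):
    dbt.config(materialized="table")
    orders = dbt.ref("stg_orders")  # resolves an upstream model by name
    # A simple transform kept separate from storage and engine: daily totals.
    return orders.groupBy("order_date").sum("amount")
```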
Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization's Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more, all while providing up to 7.9x better price-performance than other cloud data warehouses.
Apache Spark is a widely used open source distributed processing system renowned for handling large-scale data workloads. Amazon Redshift offers seamless integration with Apache Spark, allowing you to easily access your Redshift data on both Amazon Redshift provisioned clusters and Amazon Redshift Serverless, as sketched in the read example below.
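A hedged completion of the excerpt's truncated read snippet, using the community spark-redshift connector; the JDBC URL, staging bucket, IAM role, and query are placeholders:

```python
# Hypothetical sketch: reading Redshift query results into a Spark DataFrame.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("redshift-read").getOrCreate()

read_config = {
    "url": "jdbc:redshift://analytics.example.us-east-1.redshift.amazonaws.com:5439/dev",
    "tempdir": "s3://my-bucket/redshift-staging/",  # S3 staging area for UNLOAD
    "aws_iam_role": "arn:aws:iam::123456789012:role/redshift-s3-access",
}

df = (
    spark.read.format("io.github.spark_redshift_community.spark.redshift")
    .options(**read_config)
    .option("query", "SELECT region, SUM(sales) AS sales FROM orders GROUP BY region")
    .load()
)
df.show()
```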
Every table change creates an Iceberg snapshot; this helps resolve concurrency issues and allows readers to scan a stable table state every time. The table metadata is stored next to the data files under a metadata directory, which allows multiple engines to use the same table simultaneously.
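A short PySpark time-travel sketch that pins one of those snapshots; the snapshot ID, catalog, and table name are made up:

```python
# Hypothetical sketch: reading an older, stable table state by snapshot ID.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("time-travel").getOrCreate()

# The reader pins a snapshot, so concurrent writers never disturb this scan.
old_state = (
    spark.read.format("iceberg")
    .option("snapshot-id", 5952604158589264673)
    .load("glue_catalog.db.sales")
)
old_state.show()
```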
AWS Lake Formation helps with enterprise data governance and is important for a data mesh architecture. It works with the AWS Glue Data Catalog to enforce data access and governance. This solution only replicates metadata in the Data Catalog, not the actual underlying data. Migrate Amazon S3 data.
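A minimal boto3 sketch of a Lake Formation grant that the Data Catalog then enforces; the role ARN, database, and table are placeholders:

```python
# Hypothetical sketch: granting table-level read access through Lake Formation.
import boto3

lf = boto3.client("lakeformation")
lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/analyst"},
    Resource={"Table": {"DatabaseName": "lake_db", "Name": "sales"}},
    Permissions=["SELECT", "DESCRIBE"],
)
```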
The magic behind Uber’s data-driven success Uber, the ride-hailing giant, is a household name worldwide. But what most people don’t realize is that behind the scenes, Uber is not just a transportation service; it’s a data and analytics powerhouse.
Iceberg brings the reliability and simplicity of SQL tables to big data while enabling engines like Hive, Impala, Spark, Trino, Flink, and Presto to work with the same tables at the same time. Subsequently, these snapshot IDs are used to determine the delta changes that should be applied to the materialized view rows.
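A hedged sketch of that delta computation as an Iceberg incremental read in PySpark; the snapshot IDs and table name are made up, and incremental reads cover append snapshots:

```python
# Hypothetical sketch: reading only the rows added between two snapshots,
# the same delta a materialized-view refresh would apply.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("incremental-read").getOrCreate()

delta = (
    spark.read.format("iceberg")
    .option("start-snapshot-id", 1111111111111111111)  # exclusive lower bound
    .option("end-snapshot-id", 2222222222222222222)    # inclusive upper bound
    .load("glue_catalog.db.sales")
)
delta.show()
```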
Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback. Iceberg provides several implementation options for the Iceberg catalog, including the AWS Glue Data Catalog, Hive Metastore, and JDBC catalogs.
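A configuration sketch for the Glue Data Catalog option in PySpark; the catalog name and warehouse path are assumptions, and the Iceberg Spark runtime jar must be on the classpath:

```python
# Hypothetical sketch: wiring Spark to use the AWS Glue Data Catalog as the
# Iceberg catalog, exposed under the name "glue_catalog".
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-glue-catalog")
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.warehouse", "s3://my-bucket/warehouse/")
    .config("spark.sql.catalog.glue_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .getOrCreate()
)

spark.sql("CREATE DATABASE IF NOT EXISTS glue_catalog.db")
```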