Data Lake, Metadata and Reference

Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

MARCH 2, 2023

Apache Iceberg is an open table format for very large analytic datasets, which captures metadata information on the state of datasets as they evolve and change over time. Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback.

Multicloud data lake analytics with Amazon Athena

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

Choosing an open table format for your transactional data lake on AWS

Data Lakes on Cloud & it’s Usage in Healthcare

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

Seamless integration of data lake and data warehouse using Amazon Redshift Spectrum and Amazon DataZone

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

Migrate Delta tables from Azure Data Lake Storage to Amazon S3 using AWS Glue

Enrich your serverless data lake with Amazon Bedrock

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

Build a real-time GDPR-aligned Apache Iceberg data lake

Use AWS Glue ETL to perform merge, partition evolution, and schema evolution on Apache Iceberg

Run Apache XTable in AWS Lambda for background conversion of open table formats

Salesforce debuts Zero Copy Partner Network to ease data integration

Efficiently crawl your data lake and improve data access with an AWS Glue crawler using partition indexes

Apache Iceberg optimization: Solving the small files problem in Amazon EMR

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

Use Amazon Athena with Spark SQL for your open-source transactional table formats

Introducing MongoDB Atlas metadata collection with AWS Glue crawlers

Driving Business Value and ROI from a Hybrid Cloud Data Lake

Data Cataloging in the Data Lake: Alation + Kylo

Data governance in the age of generative AI

Use AWS Glue Data Catalog views to analyze data

Expand data access through Apache Iceberg using Delta Lake UniForm on AWS

Use open table format libraries on AWS Glue 5.0 for Apache Spark

Unstructured data management and governance using AWS AI/ML and analytics services

A Day in the Life of a DataOps Engineer

The AWS Glue Data Catalog now supports storage optimization of Apache Iceberg tables

Denodo Provides a Logical Approach to Data Management

What is a data architect? Skills, salaries, and how to become a data framework master

How Cargotec uses metadata replication to enable cross-account data sharing

Recap of Amazon Redshift key product announcements in 2024

Build a high-performance quant research platform with Apache Iceberg

Access Amazon Redshift data from Salesforce Data Cloud with Zero Copy Data Federation

Introducing AWS Glue crawler and create table support for Apache Iceberg format

Harness Zero Copy data sharing from Salesforce Data Cloud to Amazon Redshift for Unified Analytics – Part 2

Write queries faster with Amazon Q generative SQL for Amazon Redshift

Set up cross-account AWS Glue Data Catalog access using AWS Lake Formation and AWS IAM Identity Center with Amazon Redshift and Amazon QuickSight

Amazon DataZone announces custom blueprints for AWS services
