Big Data, Data Lake and Reference

Run Apache XTable in AWS Lambda for background conversion of open table formats

AWS Big Data

NOVEMBER 26, 2024

Initially, data warehouses were the go-to solution for structured data and analytical workloads, but they were limited by proprietary storage formats and an inability to handle unstructured data. Eventually, transactional data lakes emerged to bring the transactional consistency and performance of a data warehouse to the data lake.

Metadata, Data Lake, Snapshot, Data Warehouse

Build Write-Audit-Publish pattern with Apache Iceberg branching and AWS Glue Data Quality

AWS Big Data

DECEMBER 9, 2024

Today, many customers build data quality validation pipelines with AWS Glue Data Quality and its Data Quality Definition Language (DQDL) because, with static rules, dynamic rules, and anomaly detection capability, it's fairly straightforward. One of Apache Iceberg's key features is the ability to manage data using branches.

Data Quality, Publishing, Snapshot, Data Lake

Multicloud data lake analytics with Amazon Athena

AWS Big Data

MARCH 18, 2024

Many organizations operate data lakes spanning multiple cloud data stores. In these cases, you may want an integrated query layer to seamlessly run analytical queries across these diverse cloud stores and streamline your data analytics processes. Refer to Using Amazon Athena Federated Query for further details.

Unleash deeper insights with Amazon Redshift data sharing for data lake tables

Recap of Amazon Redshift key product announcements in 2024

Use open table format libraries on AWS Glue 5.0 for Apache Spark

Seamless integration of data lake and data warehouse using Amazon Redshift Spectrum and Amazon DataZone

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

Load data incrementally from transactional data lakes to data warehouses

Choosing an open table format for your transactional data lake on AWS

Data Lakes on Cloud & it’s Usage in Healthcare

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

Expand data access through Apache Iceberg using Delta Lake UniForm on AWS

Use Apache Iceberg in a data lake to support incremental data processing

Synchronize data lakes with CDC-based UPSERT using open table format, AWS Glue, and Amazon MSK

Migrate Delta tables from Azure Data Lake Storage to Amazon S3 using AWS Glue

Manage concurrent write conflicts in Apache Iceberg on the AWS Glue Data Catalog

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

Enrich your serverless data lake with Amazon Bedrock

Insiders Cite The Wondrous Benefits Of Big Data In Fortnite

Complexity Drives Costs: A Look Inside BYOD and Azure Data Lakes

Amazon Q data integration adds DataFrame support and in-prompt context-aware job creation

Understanding Apache Iceberg on AWS with the new technical guide

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

Accelerate SQL code migration from Google BigQuery to Amazon Redshift using BladeBridge

Automate replication of relational sources into a transactional data lake with Apache Iceberg and AWS Glue

Ingest data from Google Analytics 4 and Google Sheets to Amazon Redshift using Amazon AppFlow

Building end-to-end data lineage for one-time and complex queries using Amazon Athena, Amazon Redshift, Amazon Neptune and dbt

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

Introducing native support for Apache Hudi, Delta Lake, and Apache Iceberg on AWS Glue for Apache Spark, Part 1: Getting Started

Build a real-time GDPR-aligned Apache Iceberg data lake

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

Simplify data ingestion from Amazon S3 to Amazon Redshift using auto-copy

Empower your Jira data in a data lake with Amazon AppFlow and AWS Glue

Implement slowly changing dimensions in a data lake using AWS Glue and Delta

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

Write queries faster with Amazon Q generative SQL for Amazon Redshift

Create an Apache Hudi-based near-real-time transactional data lake using AWS DMS, Amazon Kinesis, AWS Glue streaming ETL, and data visualization using Amazon QuickSight

Perform upserts in a data lake using Amazon Athena and Apache Iceberg

Efficiently crawl your data lake and improve data access with an AWS Glue crawler using partition indexes

Streamline AI-driven analytics with governance: Integrating Tableau with Amazon DataZone
