Data Architecture, Data Lake and Reference

Run Apache XTable in AWS Lambda for background conversion of open table formats

AWS Big Data

NOVEMBER 26, 2024

This post was co-written with Dipankar Mazumdar, Staff Data Engineering Advocate with AWS Partner OneHouse. Data architecture has evolved significantly to handle growing data volumes and diverse workloads. For more examples and references to other posts, refer to the following GitHub repository.

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

AWS Big Data

AUGUST 31, 2023

Amazon Redshift enables you to directly access data stored in Amazon Simple Storage Service (Amazon S3) using SQL queries and join data across your data warehouse and data lake. With Amazon Redshift, you can query the data in your S3 data lake using a central AWS Glue metastore from your Redshift data warehouse.

Load data incrementally from transactional data lakes to data warehouses

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

Choosing an open table format for your transactional data lake on AWS

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

Synchronize data lakes with CDC-based UPSERT using open table format, AWS Glue, and Amazon MSK

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

Expand data access through Apache Iceberg using Delta Lake UniForm on AWS

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

Automate replication of relational sources into a transactional data lake with Apache Iceberg and AWS Glue

Accelerate SQL code migration from Google BigQuery to Amazon Redshift using BladeBridge

Implement slowly changing dimensions in a data lake using AWS Glue and Delta

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

Simplify data ingestion from Amazon S3 to Amazon Redshift using auto-copy

What is a data architect? Skills, salaries, and how to become a data framework master

Ingest data from Google Analytics 4 and Google Sheets to Amazon Redshift using Amazon AppFlow

Build a multi-Region and highly resilient modern data architecture using AWS Glue and AWS Lake Formation

AWS Lake Formation 2022 year in review

Detect, mask, and redact PII data using AWS Glue before loading into Amazon OpenSearch Service

Perform data parity at scale for data modernization programs using AWS Glue Data Quality

Automated data governance with AWS Glue Data Quality, sensitive data detection, and AWS Lake Formation

Why the Data Journey Manifesto?

AWS Glue crawlers support cross-account crawling to support data mesh architecture

Enhance data security and governance for Amazon Redshift Spectrum with VPC endpoints

Insiders Cite The Wondrous Benefits Of Big Data In Fortnite

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

Simplify access management with Amazon Redshift and AWS Lake Formation for users in an External Identity Provider

Exploring real-time streaming for generative AI Applications

Power enterprise-grade Data Vaults with Amazon Redshift – Part 1

An Introduction to Disaster Recovery with the Cloudera Data Platform

How GamesKraft uses Amazon Redshift data sharing to support growing analytics workloads

A Day in the Life of a DataOps Engineer

Migrate a petabyte-scale data warehouse from Actian Vectorwise to Amazon Redshift

Addressing the Elephant in the Room – Welcome to Today’s Cloudera

Convergent Evolution

Demystifying Modern Data Platforms

Deep dive into the AWS ProServe Hadoop Migration Delivery Kit TCO tool

Generic orchestration framework for data warehousing workloads using Amazon Redshift RSQL

Architectural patterns for real-time analytics using Amazon Kinesis Data Streams, part 1

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless
