Data Architecture, Data Lake and Reference

Run Apache XTable in AWS Lambda for background conversion of open table formats

AWS Big Data

NOVEMBER 26, 2024

This post was co-written with Dipankar Mazumdar, Staff Data Engineering Advocate with AWS Partner OneHouse. Data architecture has evolved significantly to handle growing data volumes and diverse workloads. For more examples and references to other posts, refer to the following GitHub repository.

Metadata, Data Lake, Snapshot, Data Warehouse

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

APRIL 3, 2024

They understand that a one-size-fits-all approach no longer works, and recognize the value in adopting scalable, flexible tools and open data formats to support interoperability in a modern data architecture to accelerate the delivery of new solutions.

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

Load data incrementally from transactional data lakes to data warehouses

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

Choosing an open table format for your transactional data lake on AWS

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

Synchronize data lakes with CDC-based UPSERT using open table format, AWS Glue, and Amazon MSK

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

Expand data access through Apache Iceberg using Delta Lake UniForm on AWS

Automate replication of relational sources into a transactional data lake with Apache Iceberg and AWS Glue

Implement slowly changing dimensions in a data lake using AWS Glue and Delta

Accelerate SQL code migration from Google BigQuery to Amazon Redshift using BladeBridge

Simplify data ingestion from Amazon S3 to Amazon Redshift using auto-copy

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

Build a multi-Region and highly resilient modern data architecture using AWS Glue and AWS Lake Formation

What is a data architect? Skills, salaries, and how to become a data framework master

Ingest data from Google Analytics 4 and Google Sheets to Amazon Redshift using Amazon AppFlow

AWS Lake Formation 2022 year in review

Detect, mask, and redact PII data using AWS Glue before loading into Amazon OpenSearch Service

Perform data parity at scale for data modernization programs using AWS Glue Data Quality

Automated data governance with AWS Glue Data Quality, sensitive data detection, and AWS Lake Formation

AWS Glue crawlers support cross-account crawling to support data mesh architecture

Why the Data Journey Manifesto?

Enhance data security and governance for Amazon Redshift Spectrum with VPC endpoints

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

Insiders Cite The Wondrous Benefits Of Big Data In Fortnite

Exploring real-time streaming for generative AI Applications

Simplify access management with Amazon Redshift and AWS Lake Formation for users in an External Identity Provider

Power enterprise-grade Data Vaults with Amazon Redshift – Part 1

An Introduction to Disaster Recovery with the Cloudera Data Platform

How GamesKraft uses Amazon Redshift data sharing to support growing analytics workloads

Migrate a petabyte-scale data warehouse from Actian Vectorwise to Amazon Redshift

A Day in the Life of a DataOps Engineer

Deep dive into the AWS ProServe Hadoop Migration Delivery Kit TCO tool

Addressing the Elephant in the Room – Welcome to Today’s Cloudera

Convergent Evolution

Demystifying Modern Data Platforms

Visualize data quality scores and metrics generated by AWS Glue Data Quality

Estimating Scope 1 Carbon Footprint with Amazon Athena

Generic orchestration framework for data warehousing workloads using Amazon Redshift RSQL
