Enterprise data is brought into data lakes and data warehouses to carry out analytical, reporting, and data science use cases using AWS analytical services such as Amazon Athena, Amazon Redshift, and Amazon EMR. Table metadata is fetched from AWS Glue, and the generated Athena SQL query is then run.
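As a rough illustration of that flow, the boto3 sketch below reads a table's schema from the Glue Data Catalog and then submits a query to Athena. The database, table, and S3 output location are hypothetical placeholders, not names from the original post.

```python
# Minimal sketch (boto3): fetch table metadata from AWS Glue, then run a SQL
# query with Amazon Athena. Database, table, and bucket names are placeholders.
import time
import boto3

glue = boto3.client("glue")
athena = boto3.client("athena")

# Fetch table metadata from the Glue Data Catalog
table = glue.get_table(DatabaseName="sales_db", Name="orders")["Table"]
columns = [c["Name"] for c in table["StorageDescriptor"]["Columns"]]
print("Columns:", columns)

# Run a query against that table with Athena
query = athena.start_query_execution(
    QueryString=f"SELECT {', '.join(columns[:3])} FROM orders LIMIT 10",
    QueryExecutionContext={"Database": "sales_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)

# Poll until the query finishes, then read the results
qid = query["QueryExecutionId"]
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    print(f"Fetched {len(rows) - 1} rows")  # the first row holds column headers
```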
This improvement streamlines the ability to access and manage your Airflow environments and their integration with external systems, and allows you to interact with your workflows programmatically. The Airflow REST API is a programmatic interface that allows you to interact with Airflow's core functionality.
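For illustration, the sketch below lists DAGs and triggers a run through Airflow's stable REST API using the requests library. The host, credentials, and DAG ID are placeholders, and your deployment may use a different auth scheme (Amazon MWAA, for example, issues short-lived API tokens).

```python
# Minimal sketch: interact with the Airflow stable REST API (v1).
# Host, credentials, and dag_id are placeholders; auth depends on your deployment.
import requests

AIRFLOW_URL = "https://airflow.example.com/api/v1"
auth = ("api_user", "api_password")  # basic auth, if enabled

# List the DAGs known to this Airflow environment
resp = requests.get(f"{AIRFLOW_URL}/dags", auth=auth, timeout=30)
resp.raise_for_status()
for dag in resp.json()["dags"]:
    print(dag["dag_id"], "paused" if dag["is_paused"] else "active")

# Trigger a run of one DAG with a configuration payload
run = requests.post(
    f"{AIRFLOW_URL}/dags/daily_sales_report/dagRuns",
    json={"conf": {"run_date": "2024-01-01"}},
    auth=auth,
    timeout=30,
)
run.raise_for_status()
print("Triggered run:", run.json()["dag_run_id"])
```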
Initially, data warehouses were the go-to solution for structured data and analytical workloads, but they were limited by proprietary storage formats and their inability to handle unstructured data. Eventually, transactional data lakes emerged, bringing the transactional consistency and performance of a data warehouse to the data lake.
In modern data architectures, Apache Iceberg has emerged as a popular table format for data lakes, offering key features including ACID transactions and concurrent write support. However, commits can still fail if the latest metadata is updated after the base metadata version is established.
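As a hedged sketch of how a writer might handle that failure mode with PyIceberg (the catalog and table names are made up), a commit that loses the race can be retried after reloading the table so it picks up the latest metadata:

```python
# Minimal sketch (PyIceberg): retry an append when a concurrent writer has
# already advanced the table metadata. Catalog and table names are placeholders.
import pyarrow as pa
from pyiceberg.catalog import load_catalog
from pyiceberg.exceptions import CommitFailedException

catalog = load_catalog("my_catalog")  # e.g. a Glue or REST catalog configured elsewhere
batch = pa.table({"order_id": [1, 2], "amount": [10.5, 7.25]})  # must match the table schema

for attempt in range(3):
    table = catalog.load_table("sales_db.orders")  # picks up the latest metadata version
    try:
        table.append(batch)  # commits a new snapshot
        break
    except CommitFailedException:
        # Another writer committed first; reload the metadata and try again
        continue
else:
    raise RuntimeError("Append still failing after 3 attempts")
```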
A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to structure it first, and run different types of analytics to gain better business insights.
Iceberg's metadata layer offers distinct advantages over plain Parquet, such as improved data management, performance optimization, and integration with various query engines. Unlike direct Amazon S3 access, Iceberg supports these operations on petabyte-scale data lakes without requiring complex custom code.
In the era of big data, data lakes have emerged as a cornerstone for storing vast amounts of raw data in its native format. They support structured, semi-structured, and unstructured data, offering a flexible and scalable environment for data ingestion from multiple sources.
To achieve this, they aimed to break down the data silos that had led to inefficiencies in data governance and access control, and to centralize data from various business units and countries into the BMW Cloud Data Hub (CDH), which allows users to discover datasets, manage data assets, and consume data for their use cases.
Today, Amazon Redshift is used by customers across all industries for a variety of use cases, including data warehouse migration and modernization, near real-time analytics, self-service analytics, data lake analytics, machine learning (ML), and data monetization.
Over the years, organizations have invested in creating purpose-built, cloud-based data lakes that are siloed from one another. A major challenge is enabling cross-organization discovery and access to data across these multiple data lakes, each built on a different technology stack.
In August, we wrote about how, in a future where distributed data architectures are inevitable, unifying and managing operational and business metadata is critical to maximizing the value of data, analytics, and AI.
We often see requests from customers who have started their data journey by building data lakes on Microsoft Azure and want to extend access to that data to AWS services. In such scenarios, data engineers face challenges in connecting to and extracting data from storage containers on Microsoft Azure.
Amazon Q generative SQL for Amazon Redshift uses generative AI to analyze user intent, query patterns, and schema metadata to identify common SQL query patterns directly within Amazon Redshift, accelerating the query authoring process for users and reducing the time required to derive actionable data insights.
Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.
A modern data architecture is an evolutionary architecture pattern designed to integrate a data lake, data warehouse, and purpose-built stores with a unified governance model. The company wanted the ability to continue processing operational data in the secondary Region in the rare event of a primary Region failure.
First-generation – expensive, proprietary enterprise data warehouse and business intelligence platforms maintained by a specialized team drowning in technical debt. Second-generation – gigantic, complex data lake maintained by a specialized team drowning in technical debt.
For many organizations, this centralized data store follows a data lake architecture. Although data lakes provide a centralized repository, making sense of this data and extracting valuable insights can be challenging.
Data lakes are a popular choice for today's organizations to store data about their business activities. As a best practice of data lake design, data should be immutable once stored. A data lake built on AWS uses Amazon Simple Storage Service (Amazon S3) as its primary storage environment.
To address the flood of data and the needs of enterprise businesses to store, sort, and analyze that data, a new storage solution has evolved: the data lake. So what's in a data lake? Data warehouses do a great job of standardizing data from disparate sources for analysis.
Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. From enhancing data lakes to empowering AI-driven analytics, AWS unveiled new tools and services that are set to shape the future of data and analytics.
Data lakes have been around for well over a decade now, supporting the analytic operations of some of the world's largest corporations. Such data volumes are not easy to move, migrate, or modernize. The challenges of a monolithic data lake architecture start with the fact that data lakes are, at a high level, single repositories of data at scale.
In today's data-driven world, organizations are constantly seeking efficient ways to process and analyze vast amounts of information across data lakes and warehouses. This post will showcase how this data can also be queried by other data teams using Amazon Athena. Verify that you have Python version 3.7
With a unified data catalog, you can quickly search datasets and figure out data schema, data format, and location. The AWS Glue Data Catalog provides a uniform repository where disparate systems can store and find metadata to keep track of data in data silos. Refer to Catalogs for more information.
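To make that kind of catalog lookup concrete, here is a small boto3 sketch that searches the Glue Data Catalog by keyword and prints each match's schema, format, and location. The search text is only an example, not a name from the original post.

```python
# Minimal sketch (boto3): search the AWS Glue Data Catalog and inspect each
# matching table's schema, format, and storage location.
import boto3

glue = boto3.client("glue")

# Keyword search across table names, descriptions, and properties
resp = glue.search_tables(SearchText="orders", MaxResults=25)

for table in resp["TableList"]:
    sd = table.get("StorageDescriptor", {})
    print(table["DatabaseName"], table["Name"])
    print("  location:", sd.get("Location"))
    print("  format:  ", sd.get("InputFormat"))
    print("  columns: ", [c["Name"] for c in sd.get("Columns", [])])
```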
When it was no longer a hard requirement that a physical data model be created upon the ingestion of data, there was a resulting drop in richness of the description and consistency of the data stored in Hadoop. You did not have to understand or prepare the data to get it into Hadoop, so people rarely did.
Unstructured data is typically stored across siloed systems in varying formats, and generally not managed or governed with the same level of rigor as structured data. On the backend, the batch data engineering processes refreshing the enterprise data lake need to expand to ingest, transform, and manage unstructured data.
Cloudera customers run some of the biggest data lakes on earth. These lakes power mission-critical, large-scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses.
Iceberg has become very popular for its support for ACID transactions in data lakes and for features like schema and partition evolution, time travel, and rollback. Iceberg captures metadata about the state of datasets as they evolve and change over time.
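For illustration, a minimal PySpark sketch of time travel and rollback on an Iceberg table follows. It assumes a SparkSession already configured with an Iceberg catalog named glue_catalog and a table sales_db.orders; both names are placeholders.

```python
# Minimal sketch (PySpark + Iceberg): inspect snapshots, time travel, and roll back.
# Catalog and table names ("glue_catalog", "sales_db.orders") are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-time-travel").getOrCreate()

# Iceberg exposes a metadata table listing every snapshot the table has gone through
snapshots = spark.sql(
    "SELECT snapshot_id, committed_at, operation FROM glue_catalog.sales_db.orders.snapshots"
)
snapshots.show(truncate=False)

# Time travel: query the table as of an earlier snapshot
old_snapshot_id = snapshots.orderBy("committed_at").first()["snapshot_id"]
spark.sql(
    f"SELECT count(*) FROM glue_catalog.sales_db.orders VERSION AS OF {old_snapshot_id}"
).show()

# Rollback: restore the table's current state to that earlier snapshot
spark.sql(
    f"CALL glue_catalog.system.rollback_to_snapshot('sales_db.orders', {old_snapshot_id})"
)
```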
To better understand and align data governance and enterprise architecture, let's look at data at rest and data in motion and why they both have to be documented. Documenting data at rest involves looking at where data is stored, such as in databases, data lakes, data warehouses, and flat files.
We have seen strong customer demand to expand its scope to cloud-based data lakes, because data lakes are increasingly the enterprise solution for large-scale data initiatives due to their power and capabilities. Let's say that this company is located in Europe and the data product must comply with the GDPR.
In this post, we show how Ruparupa implemented an incrementally updated data lake to get insights into their business using Amazon Simple Storage Service (Amazon S3), AWS Glue, Apache Hudi, and Amazon QuickSight. An AWS Glue ETL job, using the Apache Hudi connector, updates the S3 data lake hourly with incremental data.
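As a rough sketch of what such an hourly incremental update can look like, the PySpark snippet below upserts a batch of new records into a Hudi table on Amazon S3. The paths, table name, and key fields are illustrative assumptions, not values from Ruparupa's implementation.

```python
# Minimal sketch (PySpark + Apache Hudi): upsert an hourly batch of incremental
# records into a Hudi table on Amazon S3. Paths, table, and key fields are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-incremental-upsert").getOrCreate()

# Incremental records that landed since the last run (e.g. CDC output or new files)
incremental_df = spark.read.parquet("s3://my-raw-bucket/orders/incremental/")

hudi_options = {
    "hoodie.table.name": "orders",
    "hoodie.datasource.write.recordkey.field": "order_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.partitionpath.field": "order_date",
    "hoodie.datasource.write.operation": "upsert",
}

# Append mode with the upsert operation merges new records into the existing table
(
    incremental_df.write.format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save("s3://my-lake-bucket/curated/orders/")
)
```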
This post is co-authored by Vijay Gopalakrishnan, Director of Product, Salesforce Data Cloud. In today's data-driven business landscape, organizations collect a wealth of data across various touch points and unify it in a central data warehouse or data lake to deliver business insights.
Those decentralization efforts appeared under different monikers through time, e.g., data marts versus data warehousing implementations (a popular architectural debate in the era of structured data), then enterprise-wide data lakes versus smaller, typically BU-specific, “data ponds”.
What's changed since then, apart from Shih's title, is that Salesforce has rearchitected its underlying Data Cloud and Einstein AI framework to use an improved metadata framework, creating a new platform it calls Einstein 1.
Quick setup enables two default blueprints and creates the default environment profiles for the data lake and data warehouse blueprints. You will then publish the data assets from these data sources: add an AWS Glue data source to publish the new AWS Glue table, then review and choose Create.
In the past, First Service Credit Union's chief data officer Ty Robbins struggled to integrate data from the legacy, non-relational, and often proprietary tabular databases on which many credit unions run. After moving its expensive, on-premises data lake to the cloud, Comcast created a three-tiered architecture.
Modak Nabu converges data cataloging, data ingestion, data profiling, data tagging, data discovery, and data exploration into a unified, metadata-driven platform. It automates repetitive tasks in the data preparation process and thus accelerates data preparation by 4x.
A data lakehouse architecture combines the performance of data warehouses with the flexibility of data lakes to address the challenges of today's complex data landscape and scale AI. New insights and relationships emerge from this combination, all of which supports the use of AI.
With Amazon EMR 6.15, we launched AWS Lake Formation-based fine-grained access controls (FGAC) on Open Table Formats (OTFs), including Apache Hudi, Apache Iceberg, and Delta Lake. Many large enterprise companies seek to use their transactional data lake to gain insights and improve decision-making.
Today's data lakes are expanding across lines of business operating in diverse landscapes and using various engines to process and analyze data. Traditionally, SQL views have been used to define and share filtered data sets that meet the requirements of these lines of business for easier consumption.
A data hub contains data at multiple levels of granularity and is often not integrated. It differs from a data lake by offering data that is pre-validated and standardized, allowing for simpler consumption by users. Data hubs and data lakes can coexist in an organization, complementing each other.
We split the solution into two primary components: generating Spark job metadata and running the SQL on Amazon EMR. The first component (metadata setup) consumes existing Hive job configurations and generates metadata such as the number of parameters, the number of actions (steps), and file formats.
AWS Lake Formation makes it straightforward to centrally govern, secure, and globally share data for analytics and machine learning (ML). It also delivers fine-grained data access control, so you can make sure users have access to the right data down to the row and column level. For this post, we use mybucket.
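To make the column-level idea concrete, here is a hedged boto3 sketch that grants a principal SELECT on only two columns of a catalog table. The role ARN, database, table, and column names are placeholders; row-level restrictions would additionally use Lake Formation data cells filters.

```python
# Minimal sketch (boto3): grant a principal SELECT on only two columns of a
# Glue Data Catalog table through AWS Lake Formation. All names are placeholders.
# Row-level restrictions would use data cells filters in addition to this grant.
import boto3

lf = boto3.client("lakeformation")

lf.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/analyst-role"
    },
    Resource={
        "TableWithColumns": {
            "DatabaseName": "sales_db",
            "Name": "orders",
            "ColumnNames": ["order_id", "order_total"],
        }
    },
    Permissions=["SELECT"],
)
print("Granted column-level SELECT to analyst-role")
```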
We have collected some of the key talks and solutions on data governance, data mesh, and modern data architecture published and presented at AWS re:Invent 2022, and a few data lake solutions built by customers and AWS Partners, for easy reference. Starting with Amazon EMR release 6.7.0,