Data Lake, Metadata and Reference

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

AWS Big Data

OCTOBER 14, 2024

Enterprise data is brought into data lakes and data warehouses to support analytical, reporting, and data science use cases using AWS analytics services such as Amazon Athena, Amazon Redshift, and Amazon EMR. Table metadata is fetched from AWS Glue. The generated Athena SQL query is run.
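
The flow this excerpt mentions (fetch table metadata, then generate and run an Athena query) might start with a step like the one below. This is a minimal sketch, not the article's implementation: the sample dictionary only mimics the shape of the `Table` entry in an AWS Glue `get_table` response, and `describe_table`, `build_prompt`, and the `orders` table are hypothetical names chosen for illustration.

```python
def describe_table(table: dict) -> str:
    """Render one Glue-style table definition as a compact schema description."""
    lines = [f"Table: {table['Name']}"]
    for col in table["StorageDescriptor"]["Columns"]:
        comment = col.get("Comment", "")
        # Column comments are the kind of enriched metadata that gives the model context.
        suffix = f" -- {comment}" if comment else ""
        lines.append(f"  {col['Name']} {col['Type']}{suffix}")
    return "\n".join(lines)


def build_prompt(question: str, tables: list[dict]) -> str:
    """Combine the schema descriptions and the user's question into one prompt."""
    schema = "\n\n".join(describe_table(t) for t in tables)
    return (
        "Generate an Amazon Athena SQL query.\n\n"
        f"Schema:\n{schema}\n\n"
        f"Question: {question}\n"
    )


# Hypothetical sample metadata (assumption, not taken from the article).
orders = {
    "Name": "orders",
    "StorageDescriptor": {
        "Columns": [
            {"Name": "order_id", "Type": "bigint", "Comment": "Unique order identifier"},
            {"Name": "order_ts", "Type": "timestamp"},
        ]
    },
}

prompt = build_prompt("How many orders were placed yesterday?", [orders])
print(prompt)
```

The resulting prompt would then be sent to a language model, and the SQL it returns would be submitted to Athena; both of those steps are left out here because they depend on the model and account setup.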

Metadata, Data Lake, Modeling, Data Warehouse

Run Apache XTable in AWS Lambda for background conversion of open table formats

AWS Big Data

NOVEMBER 26, 2024

Initially, data warehouses were the go-to solution for structured data and analytical workloads but were limited by proprietary storage formats and their inability to handle unstructured data. Eventually, transactional data lakes emerged to bring the transactional consistency and performance of a data warehouse to the data lake.

Metadata, Data Lake, Snapshot, Data Warehouse

Expand data access through Apache Iceberg using Delta Lake UniForm on AWS

AWS Big Data

NOVEMBER 14, 2024

Under the hood, UniForm generates Iceberg metadata files (including metadata and manifest files) that are required for Iceberg clients to access the underlying data files in Delta Lake tables. Both Delta Lake and Iceberg metadata files reference the same data files.
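
The idea that two metadata layers point at one set of data files can be pictured with a toy model. This is only an illustration under simplifying assumptions: real Delta Lake transaction logs and Iceberg metadata and manifest files are much richer formats, and the dictionaries, the `referenced_files` helper, and the file names below are all hypothetical.

```python
# Toy model only: not the actual Delta Lake or Iceberg file formats.
data_files = ["part-00000.parquet", "part-00001.parquet"]  # hypothetical file names

# Each format keeps its own metadata, but neither copies the data itself.
delta_log = {"format": "delta", "files": list(data_files)}
iceberg_metadata = {"format": "iceberg", "files": list(data_files)}


def referenced_files(metadata: dict) -> set[str]:
    """Return the set of data files a metadata layer points at."""
    return set(metadata["files"])


# Both layers resolve to the same underlying files.
shared = referenced_files(delta_log) & referenced_files(iceberg_metadata)
print(sorted(shared))
```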

Metadata, Data Warehouse, Big Data, Data Lake

Recap of Amazon Redshift key product announcements in 2024

AWS Big Data

DECEMBER 17, 2024

Today, Amazon Redshift is used by customers across all industries for a variety of use cases, including data warehouse migration and modernization, near real-time analytics, self-service analytics, data lake analytics, machine learning (ML), and data monetization.

Build a high-performance quant research platform with Apache Iceberg

Use Apache Iceberg in a data lake to support incremental data processing

Use open table format libraries on AWS Glue 5.0 for Apache Spark

Write queries faster with Amazon Q generative SQL for Amazon Redshift

Multicloud data lake analytics with Amazon Athena

Build Write-Audit-Publish pattern with Apache Iceberg branching and AWS Glue Data Quality

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

Choosing an open table format for your transactional data lake on AWS

Seamless integration of data lake and data warehouse using Amazon Redshift Spectrum and Amazon DataZone

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

Data Lakes on Cloud & it’s Usage in Healthcare

Enrich your serverless data lake with Amazon Bedrock

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

Migrate Delta tables from Azure Data Lake Storage to Amazon S3 using AWS Glue

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

Build a real-time GDPR-aligned Apache Iceberg data lake

Use AWS Glue ETL to perform merge, partition evolution, and schema evolution on Apache Iceberg

Efficiently crawl your data lake and improve data access with an AWS Glue crawler using partition indexes

Accelerate SQL code migration from Google BigQuery to Amazon Redshift using BladeBridge

Introducing MongoDB Atlas metadata collection with AWS Glue crawlers

Apache Iceberg optimization: Solving the small files problem in Amazon EMR

Use Amazon Athena with Spark SQL for your open-source transactional table formats

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

Salesforce debuts Zero Copy Partner Network to ease data integration

Use AWS Glue Data Catalog views to analyze data

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

Data governance in the age of generative AI

The AWS Glue Data Catalog now supports storage optimization of Apache Iceberg tables

Driving Business Value and ROI from a Hybrid Cloud Data Lake

Unstructured data management and governance using AWS AI/ML and analytics services

Configure cross-Region table access with the AWS Glue Catalog and AWS Lake Formation

Data Cataloging in the Data Lake: Alation + Kylo

Introducing AWS Glue crawler and create table support for Apache Iceberg format

Access Amazon Redshift data from Salesforce Data Cloud with Zero Copy Data Federation

Harness Zero Copy data sharing from Salesforce Data Cloud to Amazon Redshift for Unified Analytics – Part 2

Set up cross-account AWS Glue Data Catalog access using AWS Lake Formation and AWS IAM Identity Center with Amazon Redshift and Amazon QuickSight
