Data Lake, Metadata and Testing - Data Leaders Brief

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

AWS Big Data

OCTOBER 14, 2024

Enterprise data is brought into data lakes and data warehouses to carry out analytical, reporting, and data science use cases using AWS analytical services like Amazon Athena , Amazon Redshift , Amazon EMR , and so on. Table metadata is fetched from AWS Glue. The generated Athena SQL query is run.

Metadata

Metadata Data Lake Modeling Data Warehouse

Run Apache XTable in AWS Lambda for background conversion of open table formats

AWS Big Data

NOVEMBER 26, 2024

Initially, data warehouses were the go-to solution for structured data and analytical workloads but were limited by proprietary storage formats and their inability to handle unstructured data. Eventually, transactional data lakes emerged to add transactional consistency and performance of a data warehouse to the data lake.

Metadata

Metadata Data Lake Snapshot Data Warehouse

Build a high-performance quant research platform with Apache Iceberg

AWS Big Data

JANUARY 9, 2025

Iceberg offers distinct advantages through its metadata layer over Parquet, such as improved data management, performance optimization, and integration with various query engines. Unlike direct Amazon S3 access, Iceberg supports these operations on petabyte-scale data lakes without requiring complex custom code.

Metadata

Metadata Snapshot Cost-Benefit Optimization

Webinars

Automation, Evolved: Your New Playbook For Smarter Knowledge Work

MORE WEBINARS

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure the data and then run different types of analytics for better business insights.

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

Run Apache XTable in AWS Lambda for background conversion of open table formats

Webinars

Trending Sources

Build a high-performance quant research platform with Apache Iceberg

Webinars

Migrate an existing data lake to a transactional data lake using Apache Iceberg

Use Apache Iceberg in a data lake to support incremental data processing

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

How Volkswagen streamlined access to data across multiple data lakes using Amazon DataZone – Part 1

Recap of Amazon Redshift key product announcements in 2024

Choosing an open table format for your transactional data lake on AWS

Introducing simplified interaction with the Airflow REST API in Amazon MWAA

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

Addressing Data Mesh Technical Challenges with DataOps

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

Enrich your serverless data lake with Amazon Bedrock

Write queries faster with Amazon Q generative SQL for Amazon Redshift

What is a Data Mesh?

Build a real-time GDPR-aligned Apache Iceberg data lake

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

Data’s dark secret: Why poor quality cripples AI and growth

Use AWS Glue ETL to perform merge, partition evolution, and schema evolution on Apache Iceberg

Top analytics announcements of AWS re:Invent 2024

Doing Cloud Migration and Data Governance Right the First Time

A Day in the Life of a DataOps Engineer

Apache Iceberg optimization: Solving the small files problem in Amazon EMR

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

Regeneron turns to IT to accelerate drug discovery

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

Apache Ozone and Dense Data Nodes

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

Governing data in relational databases using Amazon DataZone

Top 15 data management platforms

Integrate custom applications with AWS Lake Formation – Part 1

What is a data architect? Skills, salaries, and how to become a data framework master

Backtesting index rebalancing arbitrage with Amazon EMR and Apache Iceberg

Access Amazon Redshift data from Salesforce Data Cloud with Zero Copy Data Federation

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

Accelerate HiveQL with Oozie to Spark SQL migration on Amazon EMR

NVIDIA RAPIDS in Cloudera Machine Learning

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

Perform data parity at scale for data modernization programs using AWS Glue Data Quality

Query your Apache Hive metastore with AWS Lake Formation permissions

Stay Connected