Data Lake, Machine Learning and Metadata

Understanding the Differences Between Data Lakes and Data Warehouses

Smart Data Collective

AUGUST 28, 2021

Data lakes and data warehouses are probably the two most widely used structures for storing data.

Data Warehouses and Data Lakes in a Nutshell

A data warehouse is used as a central storage space for large amounts of structured data coming from various sources.

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

NVIDIA RAPIDS in Cloudera Machine Learning

Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

How Volkswagen streamlined access to data across multiple data lakes using Amazon DataZone – Part 1

Choosing an open table format for your transactional data lake on AWS

Data Lakes on Cloud & its Usage in Healthcare

Collibra Brings Effective Data Governance to Line-of-Business

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

Migrate Delta tables from Azure Data Lake Storage to Amazon S3 using AWS Glue

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

What is a Data Mesh?

Use AWS Glue ETL to perform merge, partition evolution, and schema evolution on Apache Iceberg

Run Apache XTable in AWS Lambda for background conversion of open table formats

Of Muffins and Machine Learning Models

Read and write S3 Iceberg table using AWS Glue Iceberg Rest Catalog from Open Source Apache Spark

Introducing MongoDB Atlas metadata collection with AWS Glue crawlers

Data Lakes: What Are They and Who Needs Them?

Informatica’s new data management clouds target health, finance services

Unstructured data management and governance using AWS AI/ML and analytics services

How BMW streamlined data access using AWS Lake Formation fine-grained access control

Regeneron turns to IT to accelerate drug discovery

Building a Beautiful Data Lakehouse

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

Introducing Apache Hudi support with AWS Glue crawlers

Data Swamp, Data Lake, Data Lakehouse: What to Know

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

Data Cataloging in the Data Lake: Alation + Kylo

Data governance in the age of generative AI

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

The Future of the Data Lakehouse – Open

How Cargotec uses metadata replication to enable cross-account data sharing

Cloud Data Science News – Beta 6

Recap of Amazon Redshift key product announcements in 2024

Governing data in relational databases using Amazon DataZone