Big Data, Data Lake and Modeling

Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

AWS Big Data

OCTOBER 30, 2024

This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake ( Apache Iceberg ) using AWS Glue. Delete the bucket.

Data Lake

Data Lake Data Processing Optimization Machine Learning

Databricks Lakehouse Platform Streamlines Big Data Processing

David Menninger's Analyst Perspectives

OCTOBER 26, 2021

Databricks is a data engineering and analytics cloud platform built on top of Apache Spark that processes and transforms huge volumes of data and offers data exploration capabilities through machine learning models. The platform supports streaming data, SQL queries, graph processing and machine learning.

Big Data

Big Data Data Processing Machine Learning Modeling

From data lakes to insights: dbt adapter for Amazon Athena now supported in dbt Cloud

AWS Big Data

NOVEMBER 22, 2024

The need for streamlined data transformations As organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown. This saves time and effort, especially for teams looking to minimize infrastructure management and focus solely on data modeling.

Data Lake

Data Lake Data Warehouse Cost-Benefit Data Transformation

Webinars

Automation, Evolved: Your New Playbook For Smarter Knowledge Work

MORE WEBINARS

Understanding the Differences Between Data Lakes and Data Warehouses

Smart Data Collective

AUGUST 28, 2021

Data lakes and data warehouses are probably the two most widely used structures for storing data. Data Warehouses and Data Lakes in a Nutshell. A data warehouse is used as a central storage space for large amounts of structured data coming from various sources. Data Type and Processing.

Data Lake

Data Lake Data Warehouse Unstructured Data Structured Data

Here’s Why Automation For Data Lakes Could Be Important

Smart Data Collective

APRIL 2, 2019

Data Lakes are among the most complex and sophisticated data storage and processing facilities we have available to us today as human beings. Analytics Magazine notes that data lakes are among the most useful tools that an enterprise may have at its disposal when aiming to compete with competitors via innovation.

Data Lake

Data Lake Big Data OLAP Testing

7 Key Benefits of Proper Data Lake Ingestion

Smart Data Collective

APRIL 24, 2020

Perhaps one of the biggest perks is scalability, which simply means that with good data lake ingestion a small business can begin to handle bigger data numbers. The reality is businesses that are collecting data will likely be doing so on several levels. Proper Scalability. Stores in Raw Format. Uses Powerful Algorithms.

Data Lake

Data Lake Data Collection Deep Learning Management

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

AWS Big Data

DECEMBER 4, 2024

They’re taking data they’ve historically used for analytics or business reporting and putting it to work in machine learning (ML) models and AI-powered applications. Amazon SageMaker Unified Studio (Preview) solves this challenge by providing an integrated authoring experience to use all your data and tools for analytics and AI.

Data Analytics

Data Analytics Analytics Data Lake Data Quality

Choosing an open table format for your transactional data lake on AWS

AWS Big Data

JUNE 9, 2023

A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.

Data Lake

Data Lake Metadata Statistics Optimization

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Manage concurrent write conflicts in Apache Iceberg on the AWS Glue Data Catalog

AWS Big Data

APRIL 8, 2025

In modern data architectures, Apache Iceberg has emerged as a popular table format for data lakes, offering key features including ACID transactions and concurrent write support. When combined with well-timed maintenance operations, these patterns help build resilient data pipelines that can handle concurrent writes reliably.

Snapshot

Snapshot Management Metadata Big Data

Run Apache XTable in AWS Lambda for background conversion of open table formats

AWS Big Data

NOVEMBER 26, 2024

Initially, data warehouses were the go-to solution for structured data and analytical workloads but were limited by proprietary storage formats and their inability to handle unstructured data. Eventually, transactional data lakes emerged to add transactional consistency and performance of a data warehouse to the data lake.

Metadata

Metadata Data Lake Snapshot Data Warehouse

Recap of Amazon Redshift key product announcements in 2024

AWS Big Data

DECEMBER 17, 2024

Today, Amazon Redshift is used by customers across all industries for a variety of use cases, including data warehouse migration and modernization, near real-time analytics, self-service analytics, data lake analytics, machine learning (ML), and data monetization.

Data Lake

Data Lake Data Warehouse Data-driven Optimization

Seamless integration of data lake and data warehouse using Amazon Redshift Spectrum and Amazon DataZone

AWS Big Data

AUGUST 15, 2024

Unlocking the true value of data often gets impeded by siloed information. Traditional data management—wherein each business unit ingests raw data in separate data lakes or warehouses—hinders visibility and cross-functional analysis. Amazon DataZone natively supports data sharing for Amazon Redshift data assets.

Data Lake

Data Lake Data Warehouse Data Governance Publishing

Unleash deeper insights with Amazon Redshift data sharing for data lake tables

AWS Big Data

OCTOBER 10, 2024

Over the years, this customer-centric approach has led to the introduction of groundbreaking features such as zero-ETL , data sharing , streaming ingestion , data lake integration , Amazon Redshift ML , Amazon Q generative SQL , and transactional data lake capabilities.

Data Lake

Data Lake Data Warehouse Recreation/Entertainment Data-driven

Accelerate analytics and AI innovation with the next generation of Amazon SageMaker

AWS Big Data

MARCH 13, 2025

From within the unified studio, you can discover data and AI assets from across your organization, then work together in projects to securely build and share analytics and AI artifacts, including data, models, and generative AI applications.

Analytics

Analytics Data Lake Data Warehouse Data-driven

2021 Gift Giving Guide for Data Nerds

DataKitchen

DECEMBER 7, 2021

Fail Fast, Learn Faster: Lessons in Data-Driven Leadership in an Age of Disruption, Big Data, and AI, by Randy Bean. In the book, author Zhamak Dehghani reveals that, despite the time, money, and effort poured into them, data warehouses and data lakes fail when applied at the scale and speed of today’s organizations.

Data-driven

Data-driven Data Governance Big Data Data Science

Enrich your serverless data lake with Amazon Bedrock

AWS Big Data

SEPTEMBER 26, 2024

For many organizations, this centralized data store follows a data lake architecture. Although data lakes provide a centralized repository, making sense of this data and extracting valuable insights can be challenging. We recommend testing your use case and data with different models.

Data Lake

Data Lake Cost-Benefit Unstructured Data Modeling

Complexity Drives Costs: A Look Inside BYOD and Azure Data Lakes

Jet Global

NOVEMBER 5, 2020

That stands for “bring your own database,” and it refers to a model in which core ERP data are replicated to a separate standalone database used exclusively for reporting. OLAP reporting based on a data warehouse model is a well-proven solution for companies with robust reporting requirements. Option 3: Azure Data Lakes.

Data Lake

Data Lake OLAP Data Warehouse Unstructured Data

How BMW streamlined data access using AWS Lake Formation fine-grained access control

AWS Big Data

OCTOBER 29, 2024

This led to inefficiencies in data governance and access control. AWS Lake Formation is a service that streamlines and centralizes the data lake creation and management process. The Solution: How BMW CDH solved data duplication The CDH is a company-wide data lake built on Amazon Simple Storage Service (Amazon S3).

Data Lake

Data Lake Sales Metadata Machine Learning

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Data Quality

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

In addition to real-time analytics and visualization, the data needs to be shared for long-term data analytics and machine learning applications. To achieve this, EUROGATE designed an architecture that uses Amazon DataZone to publish specific digital twin data sets, enabling access to them with SageMaker in a separate AWS account.

IoT

IoT Machine Learning Metadata Data-driven

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

AWS Big Data

OCTOBER 14, 2024

Enterprise data is brought into data lakes and data warehouses to carry out analytical, reporting, and data science use cases using AWS analytical services like Amazon Athena , Amazon Redshift , Amazon EMR , and so on. foundation model (FM) in Amazon Bedrock as the LLM. Can it also help write SQL queries?

Metadata

Metadata Data Lake Modeling Data Warehouse

Did Big Data Deliver Business Transformation & Improved CX?

Alation

AUGUST 4, 2022

It’s been one decade since the “ Big Data Era ” began (and to much acclaim!). Analysts asked, What if we could manage massive volumes and varieties of data? Yet the question remains: How much value have organizations derived from big data? Big Data as an Enabler of Digital Transformation.

Big Data

Big Data Digital Transformation Data Lake Data-driven

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

AWS Big Data

JUNE 10, 2024

In this post, we delve into the key aspects of using Amazon EMR for modern data management, covering topics such as data governance, data mesh deployment, and streamlined data discovery. Organizations have multiple Hive data warehouses across EMR clusters, where the metadata gets generated.

Data Lake

Data Lake Metadata Data Warehouse Data Processing

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

A modern data architecture is an evolutionary architecture pattern designed to integrate a data lake, data warehouse, and purpose-built stores with a unified governance model. Of those tables, some are larger (such as in terms of record volume) than others, and some are updated more frequently than others.

Data Lake

Data Lake Data Processing Metadata Snapshot

Outdated business apps can cloud your AI vision

CIO Business Intelligence

FEBRUARY 20, 2025

Stone called outdated apps a multi-trillion-dollar problem, even after organizations have spent the past decade focused on modernizing their infrastructure to deal with big data. This allows for the extraction and integration of data into AI models without overhauling entire platforms, Erolin says.

Insurance

Insurance Cost-Benefit Unstructured Data Data Lake

Data Lakes: What Are They and Who Needs Them?

Jet Global

JULY 2, 2019

To address the flood of data and the needs of enterprise businesses to store, sort, and analyze that data, a new storage solution has evolved: the data lake. What’s in a Data Lake? Data warehouses do a great job of standardizing data from disparate sources for analysis. Taking a Dip.

Data Lake

Data Lake Data Warehouse Big Data Machine Learning

Waking Up The World of Big Data

Sisense

JUNE 11, 2019

The term “Big Data” has lost its relevance. The fact remains, though: every dataset is becoming a Big Data set, whether its owners and users know (and understand) that or not. Big Data isn’t just something that happens to other people or giant companies like Google and Amazon. Big Data Today.

Big Data

Big Data Slice and Dice IoT Data Lake

Building end-to-end data lineage for one-time and complex queries using Amazon Athena, Amazon Redshift, Amazon Neptune and dbt

AWS Big Data

DECEMBER 12, 2024

In the context of comprehensive data governance, Amazon DataZone offers organization-wide data lineage visualization using Amazon Web Services (AWS) services, while dbt provides project-level lineage through model analysis and supports cross-project integration between data lakes and warehouses.

Snapshot

Snapshot Recreation/Entertainment Experimentation Data Lake

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

AWS Big Data

JULY 29, 2024

In the era of big data, data lakes have emerged as a cornerstone for storing vast amounts of raw data in its native format. They support structured, semi-structured, and unstructured data, offering a flexible and scalable environment for data ingestion from multiple sources.

Metadata

Metadata Snapshot Data Lake Metrics

Building a Beautiful Data Lakehouse

CIO Business Intelligence

MARCH 9, 2022

But the data repository options that have been around for a while tend to fall short in their ability to serve as the foundation for big data analytics powered by AI. Traditional data warehouses, for example, support datasets from multiple sources but require a consistent data structure. Meet the data lakehouse.

Data Lake

Data Lake Unstructured Data Data Warehouse Big Data

Empower your Jira data in a data lake with Amazon AppFlow and AWS Glue

AWS Big Data

AUGUST 1, 2023

Although Jira Cloud provides reporting capability, loading this data into a data lake will facilitate enrichment with other business data, as well as support the use of business intelligence (BI) tools and artificial intelligence (AI) and machine learning (ML) applications. Search for the Jira Cloud connector.

Data Lake

Data Lake Data Transformation Data-driven Cost-Benefit

Enable business users to analyze large datasets in your data lake with Amazon QuickSight

AWS Big Data

JUNE 23, 2023

Events and many other security data types are stored in Imperva’s Threat Research Multi-Region data lake. Imperva harnesses data to improve their business outcomes. As part of their solution, they are using Amazon QuickSight to unlock insights from their data.

Data Lake

Data Lake Dashboards Cost-Benefit Data Warehouse

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

AWS Big Data

JULY 21, 2023

Data-driven organizations treat data as an asset and use it across different lines of business (LOBs) to drive timely insights and better business decisions. This leads to having data across many instances of data warehouses and data lakes using a modern data architecture in separate AWS accounts.

Data Lake

Data Lake Data Warehouse Marketing Management

Data Modeling 301 for the cloud: data lake and NoSQL data modeling and design

erwin

AUGUST 15, 2022

For NoSQL, data lakes, and data lake houses—data modeling of both structured and unstructured data is somewhat novel and thorny. This blog is an introduction to some advanced NoSQL and data lake database design techniques (while avoiding common pitfalls) is noteworthy. Data Modeling.

Data Lake

Data Lake Modeling Unstructured Data Data Warehouse

Amazon Web Services named a Leader in the 2024 Gartner Magic Quadrant for Data Integration Tools

AWS Big Data

FEBRUARY 26, 2025

This evaluation, we feel, critically examines vendors capabilities to address key service needs, including data engineering, operational data integration, modern data architecture delivery, and enabling less-technical data integration across various deployment models.

Data Integration

Data Integration Data Lake Data Warehouse Unstructured Data

7 key Microsoft Azure analytics services (plus one extra)

CIO Business Intelligence

JUNE 29, 2022

Taking the broadest possible interpretation of data analytics , Azure offers more than a dozen services — and that’s before you include Power BI, with its AI-powered analysis and new datamart option , or governance-oriented approaches such as Microsoft Purview. Azure Data Factory. Azure Data Explorer. Azure Data Lake Analytics.

Data Lake

Data Lake Analytics Data Warehouse Machine Learning

Build a high-performance quant research platform with Apache Iceberg

AWS Big Data

JANUARY 9, 2025

These features allow efficient data corrections, gap-filling in time series, and historical data updates without disrupting ongoing analyses or compromising data integrity. Unlike direct Amazon S3 access, Iceberg supports these operations on petabyte-scale data lakes without requiring complex custom code.

Metadata

Metadata Snapshot Cost-Benefit Optimization

The rise of the data lakehouse: A new era of data value

CIO Business Intelligence

AUGUST 18, 2022

Previously, Walgreens was attempting to perform that task with its data lake but faced two significant obstacles: cost and time. Those challenges are well-known to many organizations as they have sought to obtain analytical knowledge from their vast amounts of data. Lakehouses redeem the failures of some data lakes.

Data Lake

Data Lake Data Warehouse Unstructured Data Business Intelligence

Building Better Data Models to Unlock Next-Level Intelligence

Sisense

MAY 11, 2021

You can’t talk about data analytics without talking about data modeling. The reasons for this are simple: Before you can start analyzing data, huge datasets like data lakes must be modeled or transformed to be usable. Building the right data model is an important part of your data strategy.

Modeling

Modeling Big Data IoT Data Warehouse

Data Cataloging in the Data Lake: Alation + Kylo

Alation

FEBRUARY 20, 2020

Architecturally the introduction of Hadoop, a file system designed to store massive amounts of data, radically affected the cost model of data. Organizationally the innovation of self-service analytics, pioneered by Tableau and Qlik, fundamentally transformed the user model for data analysis.

Data Lake

Data Lake Metadata Structured Data Big Data

Build a transactional data lake using Apache Iceberg, AWS Glue, and cross-account data shares using AWS Lake Formation and Amazon Athena

AWS Big Data

APRIL 24, 2023

Building a data lake on Amazon Simple Storage Service (Amazon S3) provides numerous benefits for an organization. However, many use cases, like performing change data capture (CDC) from an upstream relational database to an Amazon S3-based data lake, require handling data at a record level.

Data Lake

Data Lake Data Governance Machine Learning Cost-Benefit

The Enduring Significance of Data Modeling in the Modern Data-Driven Enterprise

erwin

AUGUST 31, 2023

Q: Is data modeling cool again? In today’s fast-paced digital landscape, data reigns supreme. The data-driven enterprise relies on accurate, accessible, and actionable information to make strategic decisions and drive innovation. A: It always was and is getting cooler!!

Data-driven

Data-driven Modeling Enterprise Structured Data

Demystify data sharing and collaboration patterns on AWS: Choosing the right tool for the job

AWS Big Data

OCTOBER 21, 2024

However, enterprises often encounter challenges with data silos, insufficient access controls, poor governance, and quality issues. Embracing data as a product is the key to address these challenges and foster a data-driven culture. To achieve this, they plan to use machine learning (ML) models to extract insights from data.

Sales

Sales Data-driven Data Processing Key Performance Indicator

Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

Databricks Lakehouse Platform Streamlines Big Data Processing

Webinars

Trending Sources

From data lakes to insights: dbt adapter for Amazon Athena now supported in dbt Cloud

Webinars

Understanding the Differences Between Data Lakes and Data Warehouses

Here’s Why Automation For Data Lakes Could Be Important

7 Key Benefits of Proper Data Lake Ingestion

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

Choosing an open table format for your transactional data lake on AWS

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

Manage concurrent write conflicts in Apache Iceberg on the AWS Glue Data Catalog

Run Apache XTable in AWS Lambda for background conversion of open table formats

Recap of Amazon Redshift key product announcements in 2024

Seamless integration of data lake and data warehouse using Amazon Redshift Spectrum and Amazon DataZone

Unleash deeper insights with Amazon Redshift data sharing for data lake tables

Accelerate analytics and AI innovation with the next generation of Amazon SageMaker

2021 Gift Giving Guide for Data Nerds

Enrich your serverless data lake with Amazon Bedrock

Complexity Drives Costs: A Look Inside BYOD and Azure Data Lakes

How BMW streamlined data access using AWS Lake Formation fine-grained access control

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

How EUROGATE established a data mesh architecture using Amazon DataZone

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

Did Big Data Deliver Business Transformation & Improved CX?

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

Outdated business apps can cloud your AI vision

Data Lakes: What Are They and Who Needs Them?

Waking Up The World of Big Data

Building end-to-end data lineage for one-time and complex queries using Amazon Athena, Amazon Redshift, Amazon Neptune and dbt

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

Building a Beautiful Data Lakehouse

Empower your Jira data in a data lake with Amazon AppFlow and AWS Glue

Enable business users to analyze large datasets in your data lake with Amazon QuickSight

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

Data Modeling 301 for the cloud: data lake and NoSQL data modeling and design

Amazon Web Services named a Leader in the 2024 Gartner Magic Quadrant for Data Integration Tools

7 key Microsoft Azure analytics services (plus one extra)

Build a high-performance quant research platform with Apache Iceberg

The rise of the data lakehouse: A new era of data value

Building Better Data Models to Unlock Next-Level Intelligence

Data Cataloging in the Data Lake: Alation + Kylo

Build a transactional data lake using Apache Iceberg, AWS Glue, and cross-account data shares using AWS Lake Formation and Amazon Athena

The Enduring Significance of Data Modeling in the Modern Data-Driven Enterprise

Demystify data sharing and collaboration patterns on AWS: Choosing the right tool for the job

Stay Connected