Data Lake and Strategy - Data Leaders Brief

Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

AWS Big Data

OCTOBER 30, 2024

This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake ( Apache Iceberg ) using AWS Glue. Delete the bucket.

Data Lake

Data Lake Data Processing Optimization Machine Learning

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure the data and then run different types of analytics for better business insights. They are the same.

Data Lake

Data Lake Metadata Snapshot Recreation/Entertainment

Build Write-Audit-Publish pattern with Apache Iceberg branching and AWS Glue Data Quality

AWS Big Data

DECEMBER 9, 2024

Apache Iceberg is an open table format that brings atomicity, consistency, isolation, and durability (ACID) transactions to data lakes, streamlining data management. One of its key features is the ability to manage data using branches. We discuss two common strategies to verify the quality of published data.

Data Quality

Data Quality Publishing Snapshot Data Lake

Webinars

How to Achieve High-Accuracy Results When Using LLMs

MORE WEBINARS

Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

MARCH 2, 2023

Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback. and later supports the Apache Iceberg framework for data lakes. AWS Glue 3.0 The following diagram illustrates the solution architecture.

Data Lake

Data Lake Data Processing Metadata Snapshot

The Unexpected Cost of Data Copies

Fortunately, a next-gen data architecture enabled by the Dremio data lake service removes the need for replicated data, helping organizations to minimize complexity, boost efficiency and dramatically reduce costs. Read this whitepaper to learn: Why organizations frequently end up with unnecessary data copies.

Data Lake

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

AWS Big Data

OCTOBER 1, 2024

Amazon Redshift enables you to efficiently query and retrieve structured and semi-structured data from open format files in Amazon S3 data lake without having to load the data into Amazon Redshift tables. Amazon Redshift extends SQL capabilities to your data lake, enabling you to run analytical queries.

Data Lake

Data Lake Statistics Broadcasting Optimization

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

A modern data strategy redefines and enables sharing data across the enterprise and allows for both reading and writing of a singular instance of the data using an open table format.

Data Lake

Data Lake Metadata Snapshot Analytics

Choosing an open table format for your transactional data lake on AWS

AWS Big Data

JUNE 9, 2023

A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.

Data Lake

Data Lake Metadata Statistics Optimization

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

How BMW streamlined data access using AWS Lake Formation fine-grained access control

AWS Big Data

OCTOBER 29, 2024

This led to inefficiencies in data governance and access control. AWS Lake Formation is a service that streamlines and centralizes the data lake creation and management process. The Solution: How BMW CDH solved data duplication The CDH is a company-wide data lake built on Amazon Simple Storage Service (Amazon S3).

Data Lake

Data Lake Sales Metadata Machine Learning

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

AWS Big Data

AUGUST 31, 2023

Amazon Redshift enables you to directly access data stored in Amazon Simple Storage Service (Amazon S3) using SQL queries and join data across your data warehouse and data lake. With Amazon Redshift, you can query the data in your S3 data lake using a central AWS Glue metastore from your Redshift data warehouse.

Data Lake

Data Lake Data Warehouse Metadata Data Architecture

The success of GenAI models lies in your data management strategy

CIO Business Intelligence

OCTOBER 9, 2024

The data preparation process should take place alongside a long-term strategy built around GenAI use cases, such as content creation, digital assistants, and code generation. Known as data engineering, this involves setting up a data lake or lakehouse, with their data integrated with GenAI models.

Strategy

Strategy Modeling Management Data Lake

Recap of Amazon Redshift key product announcements in 2024

AWS Big Data

DECEMBER 17, 2024

Today, Amazon Redshift is used by customers across all industries for a variety of use cases, including data warehouse migration and modernization, near real-time analytics, self-service analytics, data lake analytics, machine learning (ML), and data monetization.

Data Lake

Data Lake Data Warehouse Data-driven Optimization

Enrich your serverless data lake with Amazon Bedrock

AWS Big Data

SEPTEMBER 26, 2024

For many organizations, this centralized data store follows a data lake architecture. Although data lakes provide a centralized repository, making sense of this data and extracting valuable insights can be challenging. The process creates a JSON file with the original_content and summary fields.

Data Lake

Data Lake Cost-Benefit Unstructured Data Modeling

Accelerate analytics and AI innovation with the next generation of Amazon SageMaker

AWS Big Data

MARCH 13, 2025

Unified access to your data is provided by Amazon SageMaker Lakehouse , a unified, open, and secure data lakehouse built on Apache Iceberg open standards. To identify the most promising opportunities, the team develops a segmentation strategy. The data analyst then discovers it and creates a comprehensive view of their market.

Analytics

Analytics Data Lake Data Warehouse Data-driven

Migrate Delta tables from Azure Data Lake Storage to Amazon S3 using AWS Glue

AWS Big Data

SEPTEMBER 10, 2024

Organizations are increasingly using a multi-cloud strategy to run their production workloads. We often see requests from customers who have started their data journey by building data lakes on Microsoft Azure, to extend access to the data to AWS services. For this post, we use the Shared Key authentication method.

Data Lake

Data Lake Metadata Management Software

Differences Between Data Lake and Data Warehouses

TDAN

SEPTEMBER 14, 2021

Data lake is a newer IT term created for a new category of data store. But just what is a data lake? According to IBM, “a data lake is a storage repository that holds an enormous amount of raw or refined data in native format until it is accessed.” That makes sense. I think the […].

Data Lake

Data Lake Data Warehouse IT Data Strategy

Complexity Drives Costs: A Look Inside BYOD and Azure Data Lakes

Jet Global

NOVEMBER 5, 2020

Option 3: Azure Data Lakes. This leads us to Microsoft’s apparent long-term strategy for D365 F&SCM reporting: Azure Data Lakes. Azure Data Lakes are highly complex and designed with a different fundamental purpose in mind than financial and operational reporting. Azure Data Lakes are complicated.

Data Lake

Data Lake OLAP Data Warehouse Unstructured Data

The data flywheel: A better way to think about your data strategy

CIO Business Intelligence

OCTOBER 25, 2022

This article was co-authored by Duke Dyksterhouse , an Associate at Metis Strategy. Data & Analytics is delivering on its promise. Some are our clients—and more of them are asking our help with their data strategy. So, they built a data-lake. Often their ask is a thinly veiled admission of overwhelm.

Data Strategy

Data Strategy Strategy Data Lake Data-driven

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

A modern data architecture is an evolutionary architecture pattern designed to integrate a data lake, data warehouse, and purpose-built stores with a unified governance model. The company wanted the ability to continue processing operational data in the secondary Region in the rare event of primary Region failure.

Data Lake

Data Lake Data Processing Metadata Snapshot

El Consejo Superior de Deportes destinará 2,8 millones a la creación de una plataforma de TI y un ‘data lake’

CIO Business Intelligence

AUGUST 6, 2024

La firma de consultoría también deberá crear un data lake que permita almacenar y compartir fácilmente los datos de todo el ecosistema deportivo español, para lograr sinergias en torno a los distintos proyectos que se realicen.

Data Lake

Data Lake Software Digital Transformation Strategy

Navigating Data Entities, BYOD, and Data Lakes in Microsoft Dynamics

Jet Global

SEPTEMBER 4, 2020

There is an established body of practice around creating, managing, and accessing OLAP data (known as “cubes”). Data Lakes. There has been a lot of talk over the past year or two in the D365F&SCM world about “data lakes.” Traditional databases and data warehouses do not lend themselves to that task.

Data Lake

Data Lake OLAP Data Warehouse Unstructured Data

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

AWS Big Data

MAY 24, 2023

When you build your transactional data lake using Apache Iceberg to solve your functional use cases, you need to focus on operational use cases for your S3 data lake to optimize the production environment. availability. You still need to set appropriate EMRFS retries to provide additional resiliency.

Data Lake

Data Lake Snapshot Metadata Optimization

Build a real-time GDPR-aligned Apache Iceberg data lake

AWS Big Data

FEBRUARY 24, 2023

Data lakes are a popular choice for today’s organizations to store their data around their business activities. As a best practice of a data lake design, data should be immutable once stored. A data lake built on AWS uses Amazon Simple Storage Service (Amazon S3) as its primary storage environment.

Data Lake

Data Lake Metadata Testing Data Warehouse

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Data Quality

Build a high-performance quant research platform with Apache Iceberg

AWS Big Data

JANUARY 9, 2025

In our previous post Backtesting index rebalancing arbitrage with Amazon EMR and Apache Iceberg , we showed how to use Apache Iceberg in the context of strategy backtesting. Our analysis shows that Iceberg can accelerate query performance by up to 52%, reduce operational costs, and significantly improve data management at scale.

Metadata

Metadata Snapshot Cost-Benefit Optimization

MongoDB Enhances Developer Data Platform

David Menninger's Analyst Perspectives

JANUARY 21, 2025

It continues to position its document database product as a developer data platform which is primarily used to support the development and deployment of net-new applications rather than as a direct replacement for relational databases. The recent launch of MongoDB 8.0

Data Lake

Data Lake IoT Cost-Benefit Enterprise

Achieve data resilience using Amazon OpenSearch Service disaster recovery with snapshot and restore

AWS Big Data

NOVEMBER 11, 2024

Disaster recovery is vital for organizations, offering a proactive strategy to mitigate the impact of unforeseen events like system failures, natural disasters, or cyberattacks. In Disaster Recovery (DR) Architecture on AWS, Part I: Strategies for Recovery in the Cloud , we introduced four major strategies for disaster recovery (DR) on AWS.

Snapshot

Snapshot Strategy Dashboards Data Lake

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

AWS Big Data

JULY 21, 2023

Data-driven organizations treat data as an asset and use it across different lines of business (LOBs) to drive timely insights and better business decisions. This leads to having data across many instances of data warehouses and data lakes using a modern data architecture in separate AWS accounts.

Data Lake

Data Lake Data Warehouse Marketing Management

Outdated business apps can cloud your AI vision

CIO Business Intelligence

FEBRUARY 20, 2025

Decades-old apps designed to retain a limited amount of data due to storage costs at the time are also unlikely to integrate easily with AI tools, says Brian Klingbeil, chief strategy officer at managed services provider Ensono. The aim is to create integration pipelines that seamlessly connect different systems and data sources.

Insurance

Insurance Cost-Benefit Unstructured Data Data Lake

Implement slowly changing dimensions in a data lake using AWS Glue and Delta

AWS Big Data

MARCH 28, 2023

As organizations across the globe are modernizing their data platforms with data lakes on Amazon Simple Storage Service (Amazon S3), handling SCDs in data lakes can be challenging.

Data Lake

Data Lake Testing Snapshot Big Data

Five Strategies to Accelerate Data Product Development

Cloudera

JULY 26, 2021

With this first article of the two-part series on data product strategies, I am presenting some of the emerging themes in data product development and how they inform the prerequisites and foundational capabilities of an Enterprise data platform that would serve as the backbone for developing successful data product strategies.

Strategy

Strategy Data Science Unstructured Data Marketing

Enable business users to analyze large datasets in your data lake with Amazon QuickSight

AWS Big Data

JUNE 23, 2023

Events and many other security data types are stored in Imperva’s Threat Research Multi-Region data lake. Imperva harnesses data to improve their business outcomes. As part of their solution, they are using Amazon QuickSight to unlock insights from their data.

Data Lake

Data Lake Dashboards Cost-Benefit Data Warehouse

Steps taken to build Sevita’s first enterprise data platform

CIO Business Intelligence

OCTOBER 23, 2024

But because of the infrastructure, employees spent hours on manual data analysis and spreadsheet jockeying. We had plenty of reporting, but very little data insight, and no real semblance of a data strategy. Second, the manual spreadsheet work resulted in significant manual data entry.

Enterprise

Enterprise Dashboards KPI Data Lake

Architecture for the Data Lake

TDAN

JANUARY 3, 2023

For a while now, vendors have been advocating that people put their data in a data lake when they put their data in the cloud. The Data Lake The idea is that you put your data into a data lake. Then, at a later point in time, the end user analyst can come along and […].

Data Lake

Data Lake Data Architecture Data Warehouse Data Strategy

Build a transactional data lake using Apache Iceberg, AWS Glue, and cross-account data shares using AWS Lake Formation and Amazon Athena

AWS Big Data

APRIL 24, 2023

Building a data lake on Amazon Simple Storage Service (Amazon S3) provides numerous benefits for an organization. However, many use cases, like performing change data capture (CDC) from an upstream relational database to an Amazon S3-based data lake, require handling data at a record level.

Data Lake

Data Lake Data Governance Machine Learning Cost-Benefit

What is data architecture? A framework to manage data

CIO Business Intelligence

DECEMBER 20, 2024

Beyond breaking down silos, modern data architectures need to provide interfaces that make it easy for users to consume data using tools fit for their jobs. Data must be able to freely move to and from data warehouses, data lakes, and data marts, and interfaces must make it easy for users to consume that data.

Data Architecture

Data Architecture Management Consulting Internet of Things

Data Architecture and Strategy in the AI Era

Cloudera

MARCH 28, 2024

But, even with the backdrop of an AI-dominated future, many organizations still find themselves struggling with everything from managing data volumes and complexity to security concerns to rapidly proliferating data silos and governance challenges.

Data Architecture

Data Architecture Strategy Data Lake Data-driven

The Increasing Importance of Open Table Formats

David Menninger's Analyst Perspectives

OCTOBER 31, 2024

I previously wrote about the importance of open table formats to the evolution of data lakes into data lakehouses. The concept of the data lake was initially proposed as a single environment where data could be combined from multiple sources to be stored and processed to enable analysis by multiple users for multiple purposes.

Data Lake

Data Lake Unstructured Data Data Warehouse Software

The rise of the data lakehouse: A new era of data value

CIO Business Intelligence

AUGUST 18, 2022

Previously, Walgreens was attempting to perform that task with its data lake but faced two significant obstacles: cost and time. Those challenges are well-known to many organizations as they have sought to obtain analytical knowledge from their vast amounts of data. Lakehouses redeem the failures of some data lakes.

Data Lake

Data Lake Data Warehouse Unstructured Data Business Intelligence

Analyzing the business-case approach Perdue Farms takes to derive value from data

CIO Business Intelligence

SEPTEMBER 20, 2023

Mark Booth: We have a growth strategy to improve our business, and to support that, we’re driving a transformation in technology and business processes. But the more challenging work is in making our processes as efficient as possible so we capture the right data in our desire to become a more data-driven business.

Data Lake

Data Lake Data-driven Dashboards Risk

DaVita’s technology strategy driven by the ‘power of purpose’

CIO Business Intelligence

DECEMBER 13, 2022

Our digital transformation strategy is centered around establishing a consumer-oriented model that helps us customize chronic care management based on the ever-changing conditions of each patient.” Tim Scannell: How much of a role do technologies like data analytics and AI play in DaVita’s overall technology and business strategy?

Strategy

Strategy Technology Digital Transformation Data Lake

Building a Beautiful Data Lakehouse

CIO Business Intelligence

MARCH 9, 2022

However, they do contain effective data management, organization, and integrity capabilities. As a result, users can easily find what they need, and organizations avoid the operational and cost burdens of storing unneeded or duplicate data copies. Warehouse, data lake convergence. Meet the data lakehouse.

Data Lake

Data Lake Unstructured Data Data Warehouse Big Data

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

AWS Big Data

JULY 29, 2024

In the era of big data, data lakes have emerged as a cornerstone for storing vast amounts of raw data in its native format. They support structured, semi-structured, and unstructured data, offering a flexible and scalable environment for data ingestion from multiple sources.

Metadata

Metadata Snapshot Data Lake Metrics

Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

Migrate an existing data lake to a transactional data lake using Apache Iceberg

Webinars

Trending Sources

Build Write-Audit-Publish pattern with Apache Iceberg branching and AWS Glue Data Quality

Webinars

Use Apache Iceberg in a data lake to support incremental data processing

The Unexpected Cost of Data Copies

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

Choosing an open table format for your transactional data lake on AWS

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

How BMW streamlined data access using AWS Lake Formation fine-grained access control

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

The success of GenAI models lies in your data management strategy

Recap of Amazon Redshift key product announcements in 2024

Enrich your serverless data lake with Amazon Bedrock

Accelerate analytics and AI innovation with the next generation of Amazon SageMaker

Migrate Delta tables from Azure Data Lake Storage to Amazon S3 using AWS Glue

Differences Between Data Lake and Data Warehouses

Complexity Drives Costs: A Look Inside BYOD and Azure Data Lakes

The data flywheel: A better way to think about your data strategy

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

El Consejo Superior de Deportes destinará 2,8 millones a la creación de una plataforma de TI y un ‘data lake’

Navigating Data Entities, BYOD, and Data Lakes in Microsoft Dynamics

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

Build a real-time GDPR-aligned Apache Iceberg data lake

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

Build a high-performance quant research platform with Apache Iceberg

MongoDB Enhances Developer Data Platform

Achieve data resilience using Amazon OpenSearch Service disaster recovery with snapshot and restore

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

Outdated business apps can cloud your AI vision

Implement slowly changing dimensions in a data lake using AWS Glue and Delta

Five Strategies to Accelerate Data Product Development

Enable business users to analyze large datasets in your data lake with Amazon QuickSight

Steps taken to build Sevita’s first enterprise data platform

Architecture for the Data Lake

Build a transactional data lake using Apache Iceberg, AWS Glue, and cross-account data shares using AWS Lake Formation and Amazon Athena

What is data architecture? A framework to manage data

Data Architecture and Strategy in the AI Era

The Increasing Importance of Open Table Formats

The rise of the data lakehouse: A new era of data value

Analyzing the business-case approach Perdue Farms takes to derive value from data

DaVita’s technology strategy driven by the ‘power of purpose’

Building a Beautiful Data Lakehouse

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

Stay Connected