Data Lake and Enterprise - Data Leaders Brief

Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

AWS Big Data

OCTOBER 30, 2024

This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue.

Data Lake

Data Lake Data Processing Optimization Machine Learning

Incremental refresh for Amazon Redshift materialized views on data lake tables

AWS Big Data

NOVEMBER 8, 2024

Amazon Redshift is a fast, fully managed cloud data warehouse that makes it cost-effective to analyze your data using standard SQL and business intelligence tools. Customers use data lake tables to achieve cost-effective storage and interoperability with other tools.

Data Lake

Data Lake Data Warehouse Optimization Testing

Steps taken to build Sevita’s first enterprise data platform

CIO Business Intelligence

OCTOBER 23, 2024

Here, CIO Patrick Piccininno provides a roadmap of his journey from data with no integration to meaningful dashboards, insights, and a data literate culture. You’re building an enterprise data platform for the first time in Sevita’s history. Second, the manual spreadsheet work resulted in significant manual data entry.

Enterprise

Enterprise Dashboards KPI Data Lake

Webinars

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Rapidminer Platform Supports Entire Data Science Lifecycle

David Menninger's Analyst Perspectives

SEPTEMBER 16, 2021

Rapidminer is a visual enterprise data science platform that includes data extraction, data mining, deep learning, artificial intelligence and machine learning (AI/ML) and predictive analytics. It can support AI/ML processes with data preparation, model validation, results visualization and model optimization.

Data Science

Data Science Data Lake Data mining Deep Learning

12 Considerations When Evaluating Data Lake Engine Vendors for Analytics and BI

Businesses today compete on their ability to turn big data into essential business insights. To do so, modern enterprises leverage cloud data lakes as the platform used to store data for analytical purposes, combined with various compute engines for processing that data.

Data Lake

Delta Lake in Action – Quick Hands-on Tutorial for Beginners

Analytics Vidhya

OCTOBER 10, 2022

In the modern data world, Lakehouse has become one of the most discussed topics for building a data platform. Enterprises have slowly started adopting Lakehouses for their data ecosystems as they offer the cost efficiencies of data lakes and the performance of warehouses. […]

Data Lake

Data Lake Data Science Publishing Enterprise

Better together? Why AWS is unifying data analytics and AI services in SageMaker

CIO Business Intelligence

DECEMBER 6, 2024

Another offering that AWS announced to support the integration is the SageMaker Data Lakehouse, aimed at helping enterprises unify data across Amazon S3 data lakes and Amazon Redshift data warehouses.

Data Analytics

Data Analytics Analytics Data Lake Data Warehouse

A Comprehensive Guide on Delta Lake

Analytics Vidhya

FEBRUARY 27, 2023

Enterprises today produce vast quantities of data, which can be a valuable source of business intelligence and insight when used appropriately. Delta Lake allows businesses to access and break down new data in real time.

Data Lake

Data Lake Business Intelligence Enterprise Analytics

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure the data and then run different types of analytics for better business insights.

Data Lake

Data Lake Metadata Snapshot Recreation/Entertainment

Building Best-in-Class Enterprise Analytics

Speaker: Anthony Roach, Director of Product Management at Tableau Software, and Jeremiah Morrow, Partner Solution Marketing Director at Dremio

Register now for the webinar on April 21, 2022 at 10:00 am PDT, 12:00 pm EDT to learn how Dremio and Tableau are delivering mission critical BI and interactive analytics on data directly in the data lake.

Analytics

Recap of Amazon Redshift key product announcements in 2024

AWS Big Data

DECEMBER 17, 2024

Today, Amazon Redshift is used by customers across all industries for a variety of use cases, including data warehouse migration and modernization, near real-time analytics, self-service analytics, data lake analytics, machine learning (ML), and data monetization.

Data Lake

Data Lake Data Warehouse Data-driven Optimization

Oracle Wants to Be the Database for AI

David Menninger's Analyst Perspectives

MAY 15, 2025

For context, read this perspective by my colleague, Matt Aslett, on the importance of local data processing. Our research shows that more than half of enterprises (58%) have the majority of data platforms in the cloud, but a substantial portion is deployed on premises. Regards, David Menninger

Data Lake

Data Lake Data Warehouse Machine Learning Software

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

APRIL 3, 2024

Businesses are constantly evolving, and data leaders are challenged every day to meet new requirements. For many enterprises and large organizations, it is not feasible to have one processing engine or tool to deal with the various business requirements. This post is co-written with Andries Engelbrecht and Scott Teal from Snowflake.

Data Lake

Data Lake Snapshot Metadata Data Architecture

Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

MARCH 2, 2023

Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback. AWS Glue 3.0 and later supports the Apache Iceberg framework for data lakes.

Data Lake

Data Lake Data Processing Metadata Snapshot

Top Considerations for Building an Open Cloud Data Lake

Data fuels the modern enterprise — today more than ever, businesses compete on their ability to turn big data into essential business insights. Increasingly, enterprises are leveraging cloud data lakes as the platform used to store data for analytics, combined with various compute engines for processing that data.

Data Lake

Enrich your serverless data lake with Amazon Bedrock

AWS Big Data

SEPTEMBER 26, 2024

For many organizations, this centralized data store follows a data lake architecture. Although data lakes provide a centralized repository, making sense of this data and extracting valuable insights can be challenging.

Data Lake

Data Lake Cost-Benefit Unstructured Data Modeling

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

Enterprises and organizations across the globe want to harness the power of data to make better decisions by putting data at the center of every decision-making process. This post is co-written with Amit Gilad, Alex Dickman and Itay Takersman from Cloudinary.

Data Lake

Data Lake Metadata Snapshot Analytics

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Accelerate analytics and AI innovation with the next generation of Amazon SageMaker

AWS Big Data

MARCH 13, 2025

Governance features including fine-grained access control are built into SageMaker Unified Studio using Amazon SageMaker Catalog to help you meet enterprise security requirements across your entire data estate.

Analytics

Analytics Data Lake Data Warehouse Data-driven

Migrate Delta tables from Azure Data Lake Storage to Amazon S3 using AWS Glue

AWS Big Data

SEPTEMBER 10, 2024

We often see requests from customers who have started their data journey by building data lakes on Microsoft Azure, to extend access to the data to AWS services. In such scenarios, data engineers face challenges in connecting and extracting data from storage containers on Microsoft Azure.

Data Lake

Data Lake Metadata Management Software

MongoDB Enhances Developer Data Platform

David Menninger's Analyst Perspectives

JANUARY 21, 2025

While new and emerging capabilities might catch the eye, features that address data platform security, performance and availability remain some of the most significant deal-breakers when enterprises are considering potential data platform providers. This is especially true for mission-critical workloads. Regards, Matt Aslett

Data Lake

Data Lake IoT Cost-Benefit Enterprise

Monitor data pipelines in a serverless data lake

AWS Big Data

AUGUST 9, 2023

The combination of a data lake and a serverless paradigm brings significant cost and performance benefits. By monitoring application logs, you can gain insights into job execution and troubleshoot issues promptly to ensure the overall health and reliability of data pipelines.

Data Lake

Data Lake Metrics Testing Cost-Benefit

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

AWS Big Data

DECEMBER 4, 2024

Amazon SageMaker Lakehouse, now generally available, unifies all your data across Amazon Simple Storage Service (Amazon S3) data lakes and Amazon Redshift data warehouses, helping you build powerful analytics and AI/ML applications on a single copy of data. The tools to transform your business are here.

Data Analytics

Data Analytics Analytics Data Lake Data Quality

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

AWS Big Data

OCTOBER 30, 2024

Amazon DataZone now supports authentication through the Amazon Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets via popular business intelligence (BI) and analytics tools like Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more.

Visualization

Visualization Data Lake Testing Data Governance

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

AWS Big Data

JUNE 10, 2024

Hive metastore federation for Amazon EMR is applicable to the following use cases: Governance of Amazon EMR-based data lakes – Producers generate data within their AWS accounts using an Amazon EMR-based data lake supported by EMRFS on Amazon Simple Storage Service (Amazon S3) and HBase.

Data Lake

Data Lake Metadata Data Warehouse Data Processing

What is data architecture? A framework to manage data

CIO Business Intelligence

DECEMBER 20, 2024

Data architecture describes the structure of an organization’s logical and physical data assets and data management resources, according to The Open Group Architecture Framework (TOGAF). An organization’s data architecture is the purview of data architects.

Data Architecture

Data Architecture Management Consulting Internet of Things

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

AWS Big Data

DECEMBER 4, 2024

With this new functionality, customers can create up-to-date replicas of their data from applications such as Salesforce, ServiceNow, and Zendesk in an Amazon SageMaker Lakehouse and Amazon Redshift. SageMaker Lakehouse gives you the flexibility to access and query your data in-place with all Apache Iceberg compatible tools and engines.

Data Integration

Data Integration Data Lake Statistics Data-driven

United Airlines sets its flight plan for gen AI success

CIO Business Intelligence

DECEMBER 20, 2024

“United’s embrace of SageMaker and Bedrock as well as Amazon Q is going to be a game changer for building data products,” said Mai-Lan Tomsen Bukovec, AWS vice president of technology, who pointed to United Data Hub as a transformational component in its AI journey at re:Invent. That number has increased to 21% in just 18 months.

IT

IT Unstructured Data Experimentation Data Lake

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

A modern data architecture is an evolutionary architecture pattern designed to integrate a data lake, data warehouse, and purpose-built stores with a unified governance model. The company wanted the ability to continue processing operational data in the secondary Region in the rare event of primary Region failure.

Data Lake

Data Lake Data Processing Metadata Snapshot

Eight Top DataOps Trends for 2022

DataKitchen

NOVEMBER 29, 2021

DataOps adoption continues to expand as a perfect storm of social, economic, and technological factors drives enterprises to invest in process-driven innovation. As a result, enterprises will examine their end-to-end data operations and analytics creation workflows. Data Gets Meshier. Hub-Spoke Enterprise Architectures.

Testing

Testing Data Lake Data Architecture Manufacturing

Bridging the gap between mainframe data and hybrid cloud environments

CIO Business Intelligence

FEBRUARY 27, 2025

A high hurdle many enterprises have yet to overcome is accessing mainframe data via the cloud. Giving the mobile workforce access to this data via the cloud allows them to be productive from anywhere, fosters collaboration, and improves overall strategic decision-making.

Metadata

Metadata Data Lake Cost-Benefit Forecasting

Centralize Your Data Processes With a DataOps Process Hub

DataKitchen

NOVEMBER 4, 2021

Cloud computing has made it much easier to integrate data sets, but that’s only the beginning. Creating a data lake has become much easier, but that’s only ten percent of the job of delivering analytics to users. It often takes months to progress from a data lake to the final delivery of insights.

Data Processing

Data Processing Data Lake Cost-Benefit Testing

What is a Data Mesh?

DataKitchen

AUGUST 3, 2021

The data mesh design pattern breaks giant, monolithic enterprise data architectures into subsystems or domains, each managed by a dedicated team. But first, let’s define the data mesh design pattern. The past decades of enterprise data platform architectures can be summarized in 69 words.

Data Architecture

Data Architecture Data Lake Cost-Benefit Data Warehouse

Build a real-time GDPR-aligned Apache Iceberg data lake

AWS Big Data

FEBRUARY 24, 2023

Data lakes are a popular choice for today’s organizations to store their data around their business activities. As a best practice of a data lake design, data should be immutable once stored. A data lake built on AWS uses Amazon Simple Storage Service (Amazon S3) as its primary storage environment.

Data Lake

Data Lake Metadata Testing Data Warehouse

Implementing a Pharma Data Mesh using DataOps

DataKitchen

AUGUST 19, 2021

Data mesh and DataOps provide the organization, enterprise architecture, and workflow automation that together enable a relatively small data team to address the analytics needs of hundreds of active business users. Figure 1: Data requirements for phases of the drug product lifecycle. The new Recipes run, and BOOM!

Data Warehouse

Data Warehouse Data Lake Manufacturing Testing

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

AWS Big Data

MAY 24, 2023

When you build your transactional data lake using Apache Iceberg to solve your functional use cases, you need to focus on operational use cases for your S3 data lake to optimize the production environment. This example is demonstrated on EMR version emr-6.10.0.

Data Lake

Data Lake Snapshot Metadata Optimization

Streamline AI-driven analytics with governance: Integrating Tableau with Amazon DataZone

AWS Big Data

OCTOBER 30, 2024

With this integration, you can now seamlessly query your governed data lake assets in Amazon DataZone using popular business intelligence (BI) and analytics tools, including partner solutions like Tableau.

Analytics

Analytics Visualization Data Governance Data-driven

Apply enterprise data governance and management using AWS Lake Formation and AWS IAM Identity Center

AWS Big Data

SEPTEMBER 26, 2024

In today’s rapidly evolving digital landscape, enterprises across regulated industries face a critical challenge as they navigate their digital transformation journeys: effectively managing and governing data from legacy systems that are being phased out or replaced.

Data Governance

Data Governance Enterprise Management Data Lake

Amazon Web Services named a Leader in the 2024 Gartner Magic Quadrant for Data Integration Tools

AWS Big Data

FEBRUARY 26, 2025

Given the diverse data integration needs of customers, AWS offers a robust data integration system through multiple services including Amazon EMR, Amazon Athena, Amazon Managed Workflows for Apache Airflow (Amazon MWAA), Amazon Managed Streaming for Apache Kafka (MSK), Amazon Kinesis, and others.

Data Integration

Data Integration Data Lake Data Warehouse Unstructured Data

Scalable analytics and centralized governance for Apache Iceberg tables using Amazon S3 Tables and Amazon Redshift

AWS Big Data

MAY 22, 2025

While this blog post helps you to get started using Amazon Redshift with Amazon S3 Tables, there are additional steps you need to consider when working with your data in production environments, including who has access to your data and with what level of permissions.

Analytics

Analytics Data Lake Management Insurance

The Increasing Importance of Open Table Formats

David Menninger's Analyst Perspectives

OCTOBER 31, 2024

I previously wrote about the importance of open table formats to the evolution of data lakes into data lakehouses. The concept of the data lake was initially proposed as a single environment where data could be combined from multiple sources to be stored and processed to enable analysis by multiple users for multiple purposes.

Data Lake

Data Lake Unstructured Data Data Warehouse Software

Why enterprise CIOs need to plan for Microsoft gen AI

CIO Business Intelligence

AUGUST 14, 2024

Between building gen AI features into almost every enterprise tool it offers, adding the most popular gen AI developer tool to GitHub — GitHub Copilot is already bigger than GitHub when Microsoft bought it — and running the cloud powering OpenAI, Microsoft has taken a commanding lead in enterprise gen AI.

Enterprise

Enterprise Cost-Benefit Experimentation Modeling

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

AWS Big Data

JULY 21, 2023

Data-driven organizations treat data as an asset and use it across different lines of business (LOBs) to drive timely insights and better business decisions. This leads to having data across many instances of data warehouses and data lakes using a modern data architecture in separate AWS accounts.

Data Lake

Data Lake Data Warehouse Marketing Management

Generative AI: 5 enterprise predictions for AI and security — for 2023, 2024, and beyond

CIO Business Intelligence

OCTOBER 25, 2023

From IT, to finance, marketing, engineering, and more, AI advances are causing enterprises to re-evaluate their traditional approaches to unlock the transformative potential of AI. What can enterprises learn from these trends, and what future enterprise developments can we expect around generative AI?

Enterprise

Enterprise Manufacturing Risk Data-driven
