Big Data, Data Governance and Data Lake

Big Data

Data Governance

Data Lake

Diving Deeper into the Data Lake

David Menninger's Analyst Perspectives

NOVEMBER 13, 2020

A data lake is a centralized repository designed to house big data in structured, semi-structured and unstructured form. I have been covering the data lake topic for several years and encourage you to check out an earlier perspective called Data Lakes: Safe Way to Swim in Big Data?

Data Lake

Data Lake Big Data Data Governance Technology

Data Lakes Meet Data Warehouses

David Menninger's Analyst Perspectives

MAY 7, 2020

In this analyst perspective, Dave Menninger takes a look at data lakes. He explains the term “data lake,” describes common use cases and shares his views on some of the latest market trends. He explores the relationship between data warehouses and data lakes and share some of Ventana Research’s findings on the subject.

Data Lake

Data Lake Data Warehouse Risk Marketing

Join 42,000+

Insiders

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Trending Sources

From data lakes to insights: dbt adapter for Amazon Athena now supported in dbt Cloud

AWS Big Data

NOVEMBER 22, 2024

The need for streamlined data transformations As organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown. Using Athena and the dbt adapter, you can transform raw data in Amazon S3 into well-structured tables suitable for analytics.

Data Lake

Data Lake Data Warehouse Cost-Benefit Data Transformation

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Databricks Lakehouse Platform Streamlines Big Data Processing

David Menninger's Analyst Perspectives

OCTOBER 26, 2021

Databricks is a data engineering and analytics cloud platform built on top of Apache Spark that processes and transforms huge volumes of data and offers data exploration capabilities through machine learning models. The platform supports streaming data, SQL queries, graph processing and machine learning.

Big Data

Big Data Data Processing Machine Learning Modeling

Top Considerations for Building an Open Cloud Data Lake

Data fuels the modern enterprise — today more than ever, businesses compete on their ability to turn big data into essential business insights. Increasingly, enterprises are leveraging cloud data lakes as the platform used to store data for analytics, combined with various compute engines for processing that data.

Data Lake

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

AWS Big Data

OCTOBER 30, 2024

Amazon DataZone now launched authentication supports through the Amazon Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets via popular business intelligence (BI) and analytics tools like Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more.

Visualization

Visualization Data Lake Testing Data Governance

Seamless integration of data lake and data warehouse using Amazon Redshift Spectrum and Amazon DataZone

AWS Big Data

AUGUST 15, 2024

Unlocking the true value of data often gets impeded by siloed information. Traditional data management—wherein each business unit ingests raw data in separate data lakes or warehouses—hinders visibility and cross-functional analysis. Amazon DataZone natively supports data sharing for Amazon Redshift data assets.

Data Lake

Data Lake Data Warehouse Data Governance Publishing

Unleash deeper insights with Amazon Redshift data sharing for data lake tables

AWS Big Data

OCTOBER 10, 2024

Over the years, this customer-centric approach has led to the introduction of groundbreaking features such as zero-ETL , data sharing , streaming ingestion , data lake integration , Amazon Redshift ML , Amazon Q generative SQL , and transactional data lake capabilities.

Data Lake

Data Lake Data Warehouse Recreation/Entertainment Data-driven

2021 Gift Giving Guide for Data Nerds

DataKitchen

DECEMBER 7, 2021

Fail Fast, Learn Faster: Lessons in Data-Driven Leadership in an Age of Disruption, Big Data, and AI, by Randy Bean. This book is not available until January 2022, but considering all the hype around the data mesh, we expect it to be a best seller. A distributed data mesh is a better choice. How did we get here?

Data-driven

Data-driven Data Governance Big Data Data Science

How BMW streamlined data access using AWS Lake Formation fine-grained access control

AWS Big Data

OCTOBER 29, 2024

However, the initial version of CDH supported only coarse-grained access control to entire data assets, and hence it was not possible to scope access to data asset subsets. This led to inefficiencies in data governance and access control.

Data Lake

Data Lake Sales Metadata Machine Learning

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

Data landscape in EUROGATE and current challenges faced in data governance The EUROGATE Group is a conglomerate of container terminals and service providers, providing container handling, intermodal transports, maintenance and repair, and seaworthy packaging services. Eliminate centralized bottlenecks and complex data pipelines.

IoT

IoT Machine Learning Metadata Data-driven

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

AWS Big Data

JUNE 10, 2024

In this post, we delve into the key aspects of using Amazon EMR for modern data management, covering topics such as data governance, data mesh deployment, and streamlined data discovery. Organizations have multiple Hive data warehouses across EMR clusters, where the metadata gets generated.

Data Lake

Data Lake Metadata Data Warehouse Data Processing

Choosing an open table format for your transactional data lake on AWS

AWS Big Data

JUNE 9, 2023

A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.

Data Lake

Data Lake Metadata Statistics Optimization

Data governance in the age of generative AI

AWS Big Data

FEBRUARY 29, 2024

Data is your generative AI differentiator, and a successful generative AI implementation depends on a robust data strategy incorporating a comprehensive data governance approach. Data governance is a critical building block across all these approaches, and we see two emerging areas of focus.

Data Governance

Data Governance Unstructured Data Metadata Data Lake

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

How Volkswagen streamlined access to data across multiple data lakes using Amazon DataZone – Part 1

AWS Big Data

JULY 18, 2024

Over the years, organizations have invested in creating purpose-built, cloud-based data lakes that are siloed from one another. A major challenge is enabling cross-organization discovery and access to data across these multiple data lakes, each built on different technology stacks.

Data Lake

Data Lake Publishing Metadata Data-driven

Data Governance Makes Data Security Less Scary

erwin

OCTOBER 31, 2019

The Regulatory Rationale for Integrating Data Management & Data Governance. Now, as Cybersecurity Awareness Month comes to a close – and ghosts and goblins roam the streets – we thought it a good time to resurrect some guidance on how data governance can make data security less scary.

Data Governance

Data Governance Metadata Risk Data Lake

Emerging Data Platforms Tackle Big Challenges

David Menninger's Analyst Perspectives

JANUARY 14, 2021

Organizations still struggle with limited data visibility and insufficient insights, which are often caused by a multitude of reasons such as analytic workloads running independently, data spread across multiple data centers, data governance, etc.

Machine Learning

Machine Learning Data Governance Big Data Data Science

Streamline AI-driven analytics with governance: Integrating Tableau with Amazon DataZone

AWS Big Data

OCTOBER 30, 2024

With this integration, you can now seamlessly query your governed data lake assets in Amazon DataZone using popular business intelligence (BI) and analytics tools, including partner solutions like Tableau. When you’re connected, you can query, visualize, and share data—governed by Amazon DataZone—within Tableau.

Analytics

Analytics Visualization Data Governance Data-driven

HEMA accelerates their data governance journey with Amazon DataZone

AWS Big Data

DECEMBER 19, 2024

Initially, the data inventories of different services were siloed within isolated environments, making data discovery and sharing across services manual and time-consuming for all teams involved. Implementing robust data governance is challenging. The following figure illustrates the data mesh architecture.

Data Governance

Data Governance Publishing Data-driven Metadata

How Data Governance Protects Sensitive Data

erwin

APRIL 2, 2021

How can companies protect their enterprise data assets, while also ensuring their availability to stewards and consumers while minimizing costs and meeting data privacy requirements? Data Security Starts with Data Governance. Lack of a solid data governance foundation increases the risk of data-security incidents.

Data Governance

Data Governance Cost-Benefit Metadata Risk

Automated data governance with AWS Glue Data Quality, sensitive data detection, and AWS Lake Formation

AWS Big Data

OCTOBER 10, 2023

Data governance is the process of ensuring the integrity, availability, usability, and security of an organization’s data. Due to the volume, velocity, and variety of data being ingested in data lakes, it can get challenging to develop and maintain policies and procedures to ensure data governance at scale for your data lake.

Data Quality

Data Quality Data Governance Data Lake Testing

Building end-to-end data lineage for one-time and complex queries using Amazon Athena, Amazon Redshift, Amazon Neptune and dbt

AWS Big Data

DECEMBER 12, 2024

The combination of these three services provides a powerful, comprehensive solution for end-to-end data lineage analysis. In this post, we use dbt for data modeling on both Amazon Athena and Amazon Redshift. This led to the implementation of both Athena on dbt and Amazon Redshift on dbt architectures.

Snapshot

Snapshot Recreation/Entertainment Experimentation Data Lake

Apply enterprise data governance and management using AWS Lake Formation and AWS IAM Identity Center

AWS Big Data

SEPTEMBER 26, 2024

In this blog post, there are three personas: Data Lake Administrator (with admin level access) User Silver from the Data Engineering group User Lead Auditor from the Auditor group. You will see how different personas in an organization can access the data without the need to modify their existing enterprise entitlements.

Data Governance

Data Governance Enterprise Management Data Lake

Talend Data Fabric Simplifies Data Life Cycle Management

David Menninger's Analyst Perspectives

NOVEMBER 16, 2021

Talend is a data integration and management software company that offers applications for cloud computing, big data integration, application integration, data quality and master data management.

Management

Management Data Warehouse Data Quality Data Integration

Data Lakes: What Are They and Who Needs Them?

Jet Global

JULY 2, 2019

To address the flood of data and the needs of enterprise businesses to store, sort, and analyze that data, a new storage solution has evolved: the data lake. What’s in a Data Lake? Data warehouses do a great job of standardizing data from disparate sources for analysis. Taking a Dip.

Data Lake

Data Lake Data Warehouse Big Data Machine Learning

Big Data for Business: A Requirement for Today’s Business Analytics

David Menninger's Analyst Perspectives

MAY 13, 2019

Organizations now must store, process and use data of significantly greater volume and variety than in the past.

Business Analytics

Business Analytics Big Data Analytics Enterprise

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

AWS Big Data

JULY 29, 2024

In the era of big data, data lakes have emerged as a cornerstone for storing vast amounts of raw data in its native format. They support structured, semi-structured, and unstructured data, offering a flexible and scalable environment for data ingestion from multiple sources.

Metadata

Metadata Snapshot Data Lake Metrics

Snowflake Builds on Its Success

David Menninger's Analyst Perspectives

JANUARY 15, 2021

Traditional on-premises data processing solutions have led to a hugely complex and expensive set of data silos where IT spends more time managing the infrastructure than extracting value from the data.

IT Data Architecture Big Data Data Processing

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

AWS Big Data

JULY 21, 2023

Data-driven organizations treat data as an asset and use it across different lines of business (LOBs) to drive timely insights and better business decisions. This leads to having data across many instances of data warehouses and data lakes using a modern data architecture in separate AWS accounts.

Data Lake

Data Lake Data Warehouse Marketing Management

Build a transactional data lake using Apache Iceberg, AWS Glue, and cross-account data shares using AWS Lake Formation and Amazon Athena

AWS Big Data

APRIL 24, 2023

Building a data lake on Amazon Simple Storage Service (Amazon S3) provides numerous benefits for an organization. However, many use cases, like performing change data capture (CDC) from an upstream relational database to an Amazon S3-based data lake, require handling data at a record level.

Data Lake

Data Lake Data Governance Machine Learning Cost-Benefit

AWS Lake Formation 2023 year in review

AWS Big Data

JANUARY 18, 2024

AWS Lake Formation and the AWS Glue Data Catalog form an integral part of a data governance solution for data lakes built on Amazon Simple Storage Service (Amazon S3) with multiple AWS analytics services integrating with them. We realized that your use cases need more flexibility in data governance.

Data Lake

Data Lake Metadata Data Governance Statistics

Read and write S3 Iceberg table using AWS Glue Iceberg Rest Catalog from Open Source Apache Spark

AWS Big Data

DECEMBER 4, 2024

In today’s data-driven world , organizations are constantly seeking efficient ways to process and analyze vast amounts of information across data lakes and warehouses. This post will showcase how this data can also be queried by other data teams using Amazon Athena. Verify that you have Python version 3.7

Data Lake

Data Lake Metadata Insurance Data-driven

Accelerate Amazon Redshift secure data use with Satori – Part 2

AWS Big Data

DECEMBER 12, 2024

He has over 17 years of experience architecting, building, leading, and maintaining big data platforms. Rohit helps customers modernize their analytic workloads using the breadth of AWS services and ensures that customers get the best price/performance with utmost security and data governance.

Data Warehouse

Data Warehouse Cost-Benefit Data Lake Data Architecture

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

AWS Big Data

DECEMBER 4, 2024

Under the federated mesh architecture, each divisional mesh functions as a node within the broader enterprise data mesh, maintaining a degree of autonomy in managing its data products. These nodes can implement analytical platforms like data lake houses, data warehouses, or data marts, all united by producing data products.

Metadata

Metadata Data Governance Data Quality Data-driven

How to modernize data lakes with a data lakehouse architecture

IBM Big Data Hub

JULY 5, 2023

Data Lakes have been around for well over a decade now, supporting the analytic operations of some of the largest world corporations. Such data volumes are not easy to move, migrate or modernize. The challenges of a monolithic data lake architecture Data lakes are, at a high level, single repositories of data at scale.

Data Lake

Data Lake Metadata Cost-Benefit Data Warehouse

AWS Lake Formation 2022 year in review

AWS Big Data

JANUARY 31, 2023

Data governance is the collection of policies, processes, and systems that organizations use to ensure the quality and appropriate handling of their data throughout its lifecycle for the purpose of generating business value.

Data Lake

Data Lake Data Governance Data Architecture Machine Learning

How Verizon Is Building A Big Data And AI Culture

Bruno Aziza

NOVEMBER 15, 2017

VEDA — Verizon Enterprise Data Analytics—is an enterprise organization that addresses data management, data governance, data warehousing and data lakes and common analytical and AI technologies.

Big Data

Big Data Data Lake Data Governance Enterprise

Decentralize LF-tag management with AWS Lake Formation

AWS Big Data

NOVEMBER 16, 2023

One of the core features of AWS Lake Formation is the delegation of permissions on a subset of resources such as databases, tables, and columns in AWS Glue Data Catalog to data stewards, empowering them make decisions regarding who should get access to their resources and helping you decentralize the permissions management of your data lakes.

Management

Management Data Lake Sales Machine Learning

Migrate data from Azure Blob Storage to Amazon S3 using AWS Glue

AWS Big Data

OCTOBER 20, 2023

Today, we are pleased to announce new AWS Glue connectors for Azure Blob Storage and Azure Data Lake Storage that allow you to move data bi-directionally between Azure Blob Storage, Azure Data Lake Storage, and Amazon Simple Storage Service (Amazon S3). option("header","true").load("wasbs://yourblob@youraccountname.blob.core.windows.net/loadingtest-input/100mb")

Data Lake

Data Lake Big Data Data Warehouse Consulting

Automatically detect Personally Identifiable Information in Amazon Redshift using AWS Glue

AWS Big Data

DECEMBER 15, 2023

Even after identification, it’s cumbersome to implement redaction, masking, or encryption of sensitive data at scale. In this post, we provide an automated solution to detect PII data in Amazon Redshift using AWS Glue. For our solution, we use Amazon Redshift to store the data.

Data Warehouse

Data Warehouse Data Lake Big Data Structured Data

Amazon DataZone announces custom blueprints for AWS services

AWS Big Data

JUNE 26, 2024

New feature: Custom AWS service blueprints Previously, Amazon DataZone provided default blueprints that created AWS resources required for data lake, data warehouse, and machine learning use cases. You can build projects and subscribe to both unstructured and structured data assets within the Amazon DataZone portal.

Data Lake

Data Lake Data Warehouse Unstructured Data Data Governance

What is a data architect? Skills, salaries, and how to become a data framework master

CIO Business Intelligence

OCTOBER 13, 2023

Solutions data architect: These individuals design and implement data solutions for specific business needs, including data warehouses, data marts, and data lakes. Application data architect: The application data architect designs and implements data models for specific software applications.

Data Architecture

Data Architecture Data Warehouse Statistics Visualization

Top analytics announcements of AWS re:Invent 2024

AWS Big Data

FEBRUARY 26, 2025

Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. From enhancing data lakes to empowering AI-driven analytics, AWS unveiled new tools and services that are set to shape the future of data and analytics.

Analytics

Analytics Data Lake Metadata Data Warehouse

Diving Deeper into the Data Lake

Data Lakes Meet Data Warehouses

Webinars

Trending Sources

From data lakes to insights: dbt adapter for Amazon Athena now supported in dbt Cloud

Webinars

Databricks Lakehouse Platform Streamlines Big Data Processing

Top Considerations for Building an Open Cloud Data Lake

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

Seamless integration of data lake and data warehouse using Amazon Redshift Spectrum and Amazon DataZone

Unleash deeper insights with Amazon Redshift data sharing for data lake tables

2021 Gift Giving Guide for Data Nerds

How BMW streamlined data access using AWS Lake Formation fine-grained access control

How EUROGATE established a data mesh architecture using Amazon DataZone

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

Choosing an open table format for your transactional data lake on AWS

Data governance in the age of generative AI

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

How Volkswagen streamlined access to data across multiple data lakes using Amazon DataZone – Part 1

Data Governance Makes Data Security Less Scary

Emerging Data Platforms Tackle Big Challenges

Streamline AI-driven analytics with governance: Integrating Tableau with Amazon DataZone

HEMA accelerates their data governance journey with Amazon DataZone

How Data Governance Protects Sensitive Data

Automated data governance with AWS Glue Data Quality, sensitive data detection, and AWS Lake Formation

Building end-to-end data lineage for one-time and complex queries using Amazon Athena, Amazon Redshift, Amazon Neptune and dbt

Apply enterprise data governance and management using AWS Lake Formation and AWS IAM Identity Center

Talend Data Fabric Simplifies Data Life Cycle Management

Data Lakes: What Are They and Who Needs Them?

Big Data for Business: A Requirement for Today’s Business Analytics

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

Snowflake Builds on Its Success

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

Build a transactional data lake using Apache Iceberg, AWS Glue, and cross-account data shares using AWS Lake Formation and Amazon Athena

AWS Lake Formation 2023 year in review

Read and write S3 Iceberg table using AWS Glue Iceberg Rest Catalog from Open Source Apache Spark

Accelerate Amazon Redshift secure data use with Satori – Part 2

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

How to modernize data lakes with a data lakehouse architecture

AWS Lake Formation 2022 year in review

How Verizon Is Building A Big Data And AI Culture

Decentralize LF-tag management with AWS Lake Formation

Migrate data from Azure Blob Storage to Amazon S3 using AWS Glue

Automatically detect Personally Identifiable Information in Amazon Redshift using AWS Glue

Amazon DataZone announces custom blueprints for AWS services

What is a data architect? Skills, salaries, and how to become a data framework master

Top analytics announcements of AWS re:Invent 2024

Stay Connected