The landscape of big data management has been transformed by the rising popularity of open table formats such as Apache Iceberg, Apache Hudi, and Linux Foundation Delta Lake. These formats, designed to address the limitations of traditional data storage systems, have become essential in modern data architectures.
We are excited to announce the acquisition of Octopai, a leading data lineage and catalog platform that provides data discovery and governance for enterprises to enhance their data-driven decision making.
With the growing emphasis on data, organizations are constantly seeking more efficient and agile ways to integrate their data, especially from a wide variety of applications. In addition, as organizations rely on an increasingly diverse array of digital systems, data fragmentation has become a significant challenge.
Before we start, I have a few questions for you. What attributes of your organization’s strategies can you attribute to successful outcomes? Do you converse with your employees about decisions that might be the converse of what they would expect? What you have just experienced is a plethora of heteronyms. Can you find them all?
Customers often want to augment and enrich SAP source data with other non-SAP source data. Such analytic use cases can be enabled by building a data warehouse or data lake. Customers can now use the AWS Glue SAP OData connector to extract data from SAP.
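A minimal sketch of what such an extraction might look like inside a Glue job, assuming a preconfigured Glue connection named sap-odata-connection and a hypothetical OData entity path; exact option names can vary by connector version.

```python
# Hedged sketch: reading an SAP OData entity into a Glue DynamicFrame and
# landing it in S3 as Parquet. The connection name and entity path are
# hypothetical; consult the connector docs for the exact option names.
from pyspark.context import SparkContext
from awsglue.context import GlueContext

glue_context = GlueContext(SparkContext.getOrCreate())

sales_orders = glue_context.create_dynamic_frame.from_options(
    connection_type="SAPOData",
    connection_options={
        "connectionName": "sap-odata-connection",  # assumed Glue connection
        "ENTITY_NAME": "/sap/opu/odata/sap/API_SALES_ORDER_SRV/A_SalesOrder",
    },
)

glue_context.write_dynamic_frame.from_options(
    frame=sales_orders,
    connection_type="s3",
    connection_options={"path": "s3://my-data-lake/sap/sales_orders/"},
    format="parquet",
)
```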
Below is our third post (3 of 5) on combining data mesh with DataOps to foster greater innovation while addressing the challenges of a decentralized architecture. We’ve talked about data mesh in organizational terms (see our first post, “What is a Data Mesh?”) and how team structure supports agility.
Amazon Redshift, launched in 2013, has undergone significant evolution since its inception, allowing customers to expand the horizons of data warehousing and SQL analytics. Industry-leading price-performance: Amazon Redshift offers up to three times better price-performance than alternative cloud data warehouses.
Is Your Team in Denial of Data Quality? Here’s How to Tell. In many organizations, data quality problems fester in the shadows: ignored, rationalized, or swept aside with confident-sounding statements that mask a deeper dysfunction. A pipeline ran “all green”? That doesn’t mean the data inside was correct.
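To make that point concrete, here is a minimal, hypothetical content-level check in Python: it can fail even when every pipeline task reported success. All table, column, and path names are illustrative.

```python
# Hypothetical content-level data test: the pipeline status alone cannot
# catch these failures. Table, column, and path names are illustrative.
import pandas as pd

def check_orders(df: pd.DataFrame) -> list[str]:
    """Return a list of data-quality violations found in the frame."""
    problems = []
    if df.empty:
        problems.append("orders table is empty")
    if df["order_total"].lt(0).any():
        problems.append("negative order totals found")
    if df["customer_id"].isna().any():
        problems.append("orders with missing customer_id")
    return problems

orders = pd.read_parquet("s3://my-bucket/orders/latest/")  # assumed location
violations = check_orders(orders)
if violations:
    raise ValueError(f"Data quality check failed: {violations}")
```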
This week on the keynote stages at AWS re:Invent 2024, you heard Matt Garman, CEO, AWS, and Swami Sivasubramanian, VP of AI and Data, AWS, speak about the next generation of Amazon SageMaker, the center for all of your data, analytics, and AI. The relationship between analytics and AI is rapidly evolving.
Third, any commitment to a disruptive technology (including data-intensive and AI implementations) must start with a business strategy. I suggest that the simplest business strategy starts with answering three basic questions. That is: (1) What is it you want to do and where does it fit within the context of your organization?
Enterprise data is brought into data lakes and data warehouses to carry out analytical, reporting, and data science use cases using AWS analytical services like Amazon Athena, Amazon Redshift, Amazon EMR, and so on. Can it also help write SQL queries? The answer is yes. The solution architecture and workflow are described below.
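As an illustration only, the sketch below asks a foundation model on Amazon Bedrock to draft a query and then submits it to Athena; the model ID, schema, database, and output location are all assumptions, and generated SQL should always be reviewed before execution.

```python
# Illustrative sketch: have a foundation model draft SQL, then run it on
# Athena. Model ID, schema, database, and output path are assumptions.
import boto3

bedrock = boto3.client("bedrock-runtime")
athena = boto3.client("athena")

question = "Total revenue by product category for the last 30 days"
prompt = (
    "Write one ANSI SQL query against the table "
    "sales(order_date DATE, category STRING, revenue DOUBLE) "
    f"that answers: {question}. Return only the SQL."
)

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # assumed model ID
    messages=[{"role": "user", "content": [{"text": prompt}]}],
)
sql = response["output"]["message"]["content"][0]["text"]

# Review generated SQL before pointing it at production data.
athena.start_query_execution(
    QueryString=sql,
    QueryExecutionContext={"Database": "analytics"},  # assumed database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
```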
No matter where data comes from, becoming data-driven depends on every member of your organization being able to find, access, and use the data they need.
What Software Are We Talking About? What is it, how does it work, what can it do, and what are the risks of using it? Many of these are unsurprising: you can ask it to write a letter, you can ask it to make up a story, you can ask it to write descriptive entries for products in a catalog. It’s much more.
The data mesh design pattern breaks giant, monolithic enterprise data architectures into subsystems or domains, each managed by a dedicated team. DataOps helps the data mesh deliver greater business agility by enabling decentralized domains to work in concert. But first, let’s define the data mesh design pattern.
Read the complete blog below for a more detailed description of the vendors and their capabilities, including testing and data observability. This is not surprising given that DataOps enables enterprise data teams to generate significant business value from their data. Download the 2021 DataOps Vendor Landscape here.
Industry analysts who follow the data and analytics industry tell DataKitchen that they are receiving inquiries about “data fabrics” from enterprise clients on a near-daily basis. Gartner included data fabrics in their top ten trends for data and analytics in 2019. What is a Data Fabric?
We wanted to find out what people are actually doing, so in September we surveyed O’Reilly’s users. Our survey focused on how companies use generative AI, what bottlenecks they see in adoption, and what skills gaps need to be addressed. AI users say that AI programming (66%) and data analysis (59%) are the most needed skills.
So if you’re going to move your data from on-premises legacy data stores and warehouse systems to the cloud, you should do it right the first time. And as you make this transition, you need to understand what data you have, know where it is located, and govern it along the way. Then you must bulk load the legacy data.
When it comes to using AI and machine learning across your organization, there are many good reasons to provide your data and analytics community with an intelligent data foundation. For instance, Large Language Models (LLMs) are known to ultimately perform better when data is structured.
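As a small illustration of that point, the hypothetical snippet below serializes records into explicitly labeled JSON before handing them to a model, rather than pasting in unlabeled text.

```python
# Illustrative only: labeled, structured context gives the model explicit
# field names to ground its answer in. All records here are made up.
import json

rows = [
    {"customer_id": 101, "segment": "enterprise", "churn_risk": 0.82},
    {"customer_id": 102, "segment": "smb", "churn_risk": 0.12},
]

prompt = (
    "Given these customer records (JSON, one object per line), list the "
    "customer_ids with churn_risk above 0.5:\n"
    + "\n".join(json.dumps(r, sort_keys=True) for r in rows)
)
print(prompt)
```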
If you’re serious about a data-driven strategy, you’re going to need a data catalog. Organizations need a data catalog because it enables them to create a seamless way for employees to access and consume data and business assets in an organized manner. Three Types of Metadata in a Data Catalog.
Organizations with a solid understanding of data governance (DG) are better equipped to keep pace with the speed of modern business. In this post, the erwin Experts address: What Is Data Governance? Why Is Data Governance Important? What Is Good Data Governance? What Are the Key Benefits of Data Governance?
Recently, we announced enhanced multi-function analytics support in Cloudera Data Platform (CDP) with Apache Iceberg. Iceberg is a high-performance open table format for huge analytic data sets. This enables you to maximize utilization of streaming data at scale. The Catalog Type should be set to Hive.
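For readers working outside the CDW UI, here is a minimal sketch of the equivalent Spark session configuration, assuming the Iceberg runtime is on the classpath; the catalog name cdp and the table are placeholders.

```python
# Hedged sketch: a Spark session with an Iceberg catalog of type "hive",
# mirroring the "Catalog Type: Hive" setting above. The catalog name "cdp"
# and the table are placeholders; Iceberg jar/runtime setup is omitted.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-hive-catalog")
    .config("spark.sql.catalog.cdp", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.cdp.type", "hive")  # Hive Metastore-backed
    .getOrCreate()
)

spark.sql(
    "CREATE TABLE IF NOT EXISTS cdp.db.events (id BIGINT, ts TIMESTAMP) "
    "USING iceberg"
)
```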
A constant flow of breaking news from the data lakehouse space is making notable tech headlines this week. On Tuesday, Databricks announced that it will acquire Tabular, a data management company founded by the creators of Apache Iceberg: Ryan Blue, Daniel Weeks, and Jason Reid. How Open is Your Open Source?
Data organizations often have a mix of centralized and decentralized activity. DataOps concerns itself with the complex flow of data across teams, data centers and organizational boundaries. It expands beyond tools and data architecture and views the data organization from the perspective of its processes and workflows.
Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback.
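A brief sketch of two of those features, time travel and rollback, from Spark SQL; it assumes an Iceberg catalog named cdp is already configured, and the timestamp and snapshot ID are placeholders.

```python
# Hedged sketch: Iceberg time travel and rollback via Spark SQL. Assumes an
# Iceberg catalog named "cdp"; timestamp and snapshot ID are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Time travel: query the table as it existed at an earlier point in time.
spark.sql(
    "SELECT * FROM cdp.db.events TIMESTAMP AS OF '2024-01-01 00:00:00'"
).show()

# Rollback: restore the table to a previous snapshot via a stored procedure.
spark.sql("CALL cdp.system.rollback_to_snapshot('db.events', 123456789)")
```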
Open table formats are emerging in the rapidly evolving domain of big data management, fundamentally altering the landscape of data storage and analysis. By providing a standardized framework for data representation, open table formats break down data silos, enhance data quality, and accelerate analytics at scale.
Amazon DataZone enables customers to discover, access, share, and govern data at scale across organizational boundaries, reducing the undifferentiated heavy lifting of making data and analytics tools accessible to everyone in the organization. This is challenging because access to data is managed differently by each of the tools.
We just announced Cloudera DataFlow for the Public Cloud (CDF-PC), the first cloud-native runtime for Apache NiFi data flows. In this blog post we’re revisiting the challenges that come with running Apache NiFi at scale before we take a closer look at the architecture and core features of CDF-PC.
As organizations deal with managing ever more data, the need to automate data management becomes clear. Last week erwin issued its 2020 State of Data Governance and Automation (DGA) Report. One piece of the research that stuck with me is that 70% of respondents spend 10 or more hours per week on data-related activities.
Teams need to urgently respond to everything from massive changes in workforce access and management to what-if planning for a variety of grim scenarios, in addition to building and documenting new applications and providing fast, accurate access to data for smart decision-making. Data Modeling. Data Governance.
To simplify data access and empower users to leverage trusted information, organizations need a better approach, one that delivers insights and business outcomes faster without sacrificing data access controls. There are many different approaches, but you’ll want an architecture that can be used regardless of your data estate.
Director of Product, Salesforce Data Cloud. In today’s ever-evolving business landscape, organizations must harness and act on data to fuel analytics, generate insights, and make informed decisions to deliver exceptional customer experiences. What is Salesforce Data Cloud? What is Amazon Redshift?
This is part of our series of blog posts on recent enhancements to Impala. Apache Impala is synonymous with high-performance processing of extremely large datasets, but what if our data isn’t huge? What if our queries are very selective? It turns out that Apache Impala scales down with data just as well as it scales up.
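A minimal sketch of such a selective query issued through impyla, Impala’s Python DB-API client; the coordinator host and table names are placeholders.

```python
# Hedged sketch: a highly selective query through impyla, Impala's Python
# DB-API client. Host, port, and table names are placeholders.
from impala.dbapi import connect

conn = connect(host="impala-coordinator.example.com", port=21050)
cursor = conn.cursor()

# A point-lookup-style query: tiny result set, tiny working set.
cursor.execute("SELECT order_id, total FROM sales.orders WHERE order_id = 42")
for row in cursor.fetchall():
    print(row)

cursor.close()
conn.close()
```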
Since then, we have been working hard to expose the full power of Apache Flink SQL and the existing data warehousing tools in CDP, combining them into a state-of-the-art real-time analytics platform. Flink SQL DDL and Catalog support. Catalog Integrations. This step significantly speeds up query development and data exploration.
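A minimal PyFlink sketch of Flink SQL DDL in action, using the built-in datagen connector so the example runs self-contained; table and column names are illustrative.

```python
# Hedged sketch: Flink SQL DDL from PyFlink, using the built-in datagen
# connector so the example is self-contained. Names are illustrative.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

t_env.execute_sql("""
    CREATE TABLE transactions (
        account_id BIGINT,
        amount DOUBLE,
        ts TIMESTAMP(3)
    ) WITH (
        'connector' = 'datagen',
        'number-of-rows' = '100'  -- bounded, so the job terminates
    )
""")

# Registering external catalogs (e.g. Hive) follows the same DDL pattern.
t_env.execute_sql(
    "SELECT account_id, SUM(amount) AS total FROM transactions "
    "GROUP BY account_id"
).print()
```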
When an organization’s data governance and metadata management programs work in harmony, everything is easier. Data governance is a complex but critical practice. There’s always more data to handle, much of it unstructured; more data sources, like IoT; more points of integration; and more regulatory compliance requirements.
Many data catalog initiatives fail. How can prospective buyers ensure they partner with the right catalog to drive success? According to the latest report from Eckerson Group, Deep Dive on Data Catalogs, shoppers must match the goals of their organizations to the capabilities of their chosen catalog.
Performance is one of the key criteria, if not the most important one, in choosing a cloud data warehouse service. In today’s fast-changing world, enterprises have to make data-driven decisions quickly, and for that they rely heavily on their data warehouse service. Cloudera Data Warehouse vs HDInsight.
Cloud migration is the process of moving enterprise data and infrastructure from on premises to off premises. The cloud supports this new workforce, connecting remote workers to vital data, no matter their location. But why migrate at all? And what are the benefits? Data Cloud Migration Challenges and Solutions.
Customers of all sizes and industries use Amazon Simple Storage Service (Amazon S3) to store data globally for a variety of use cases. Customers want to know how their data is being accessed, when it is being accessed, and who is accessing it. With exponential growth in data volume, centralized monitoring becomes challenging.
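One common first step toward answering those questions is S3 server access logging, enabled here via boto3; the bucket names are placeholders, and CloudTrail data events are an alternative source for the same information.

```python
# Hedged sketch: enable S3 server access logging so object access can be
# monitored from one place. Bucket names are placeholders.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_logging(
    Bucket="my-data-bucket",
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "my-central-log-bucket",
            "TargetPrefix": "access-logs/my-data-bucket/",
        }
    },
)
```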
In the era of big data, data lakes have emerged as a cornerstone for storing vast amounts of raw data in its native format. They support structured, semi-structured, and unstructured data, offering a flexible and scalable environment for data ingestion from multiple sources.
Over the years, organizations have invested in creating purpose-built, cloud-based data lakes that are siloed from one another. A major challenge is enabling cross-organization discovery and access to data across these multiple data lakes, each built on different technology stacks.
In a previous blog post on CDW performance, we compared Azure HDInsight to CDW. In this blog post, we compare Cloudera Data Warehouse (CDW) on Cloudera Data Platform (CDP) using Apache Hive-LLAP to EMR 6.0 (also powered by Apache Hive-LLAP) on Amazon, using the TPC-DS 2.9 benchmark. More on this later in the blog.
Thousands of organizations build data integration pipelines to extract and transform data. They establish data quality rules to ensure the extracted data is of high quality for accurate business decisions. These rules commonly assess the data based on fixed criteria reflecting the current business state.
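A minimal illustration of such fixed criteria as code; the names and thresholds below are hypothetical and encode a point-in-time business state that must be revisited as that state changes.

```python
# Illustrative sketch: data-quality rules as fixed criteria. The names and
# thresholds are hypothetical and reflect one point-in-time business state.
RULES = {
    "row_count_min": 1_000,              # expect at least this many rows
    "max_null_ratio": {"email": 0.05},   # at most 5% missing emails
}

def evaluate(records: list[dict]) -> list[str]:
    """Return the rule violations found in a batch of extracted records."""
    failures = []
    if len(records) < RULES["row_count_min"]:
        failures.append(f"row count {len(records)} below minimum")
    for column, limit in RULES["max_null_ratio"].items():
        nulls = sum(1 for r in records if r.get(column) in (None, ""))
        if records and nulls / len(records) > limit:
            failures.append(f"{column} null ratio exceeds {limit:.0%}")
    return failures
```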
Data flows are an integral part of every modern enterprise. At Cloudera, we’re helping our customers implement data flows on-premises and in the public cloud using Apache NiFi , a core component of Cloudera DataFlow. In this blog post, I want to share the top three requirements for data flows in 2021 that we hear from our customers.