Data Leaders Brief

Knowledge Graphs are Critical to Data Intelligence and AI

David Menninger's Analyst Perspectives

MAY 22, 2025

I recently described how business data catalogs are evolving into data intelligence catalogs. These catalogs combine technical and business metadata and data governance capabilities with knowledge graph functionality to deliver a holistic, business-level view of data production and consumption.

Metadata, Enterprise, Data-driven, Publishing

Manage concurrent write conflicts in Apache Iceberg on the AWS Glue Data Catalog

AWS Big Data

APRIL 8, 2025

In modern data architectures, Apache Iceberg has emerged as a popular table format for data lakes, offering key features including ACID transactions and concurrent write support. Consider a common scenario: A streaming pipeline continuously writes data to an Iceberg table while scheduled maintenance jobs perform compaction operations.

Snapshot, Management, Metadata, Big Data

Expand data access through Apache Iceberg using Delta Lake UniForm on AWS

AWS Big Data

NOVEMBER 14, 2024

The landscape of big data management has been transformed by the rising popularity of open table formats such as Apache Iceberg, Apache Hudi, and Linux Foundation Delta Lake. These formats, designed to address the limitations of traditional data storage systems, have become essential in modern data architectures.

Metadata, Data Warehouse, Big Data, Data Lake

Webinars

What’s New in Apache Airflow® 3.0—And How Will It Reshape Your Data Workflows?

Announcing Open Source DataOps Data Quality TestGen 3.0

DataKitchen

FEBRUARY 20, 2025

Announcing DataOps Data Quality TestGen 3.0: Open-Source, Generative Data Quality Software. It assesses your data, deploys production testing, monitors progress, and helps you build a constituency within your company for lasting change. New Quality Dashboard & Score Explorer. DataOps just got more intelligent.

Data Quality, Scorecard, Testing, Dashboards

How to Evaluate a Data Catalog

More data, more problems. Do you struggle to find, understand, and trust data in your daily work? A data catalog will make your work life easier -- and more productive. This guide offers handy tips for evaluating data catalogs. But where do you start?

Demystify data sharing and collaboration patterns on AWS: Choosing the right tool for the job

AWS Big Data

OCTOBER 21, 2024

Data is the most significant asset of any organization. However, enterprises often encounter challenges with data silos, insufficient access controls, poor governance, and quality issues. Embracing data as a product is the key to addressing these challenges and fostering a data-driven culture.

Sales, Data-driven, Data Processing, Key Performance Indicator

Streamline data discovery with precise technical identifier search in Amazon SageMaker Unified Studio

AWS Big Data

APRIL 9, 2025

We’re excited to introduce a new enhancement to the search experience in Amazon SageMaker Catalog, part of the next generation of Amazon SageMaker: exact match search using technical identifiers. This yields results with exact precision, dramatically improving the speed and accuracy of data discovery.

Metadata, Metrics, Data-driven, Cost-Benefit

Streamline AI-driven analytics with governance: Integrating Tableau with Amazon DataZone

AWS Big Data

OCTOBER 30, 2024

Amazon DataZone is a data management service that makes it faster and easier for customers to catalog, discover, share, and govern data stored across AWS, on premises, and from third-party sources. Amazon DataZone addresses your data sharing challenges and optimizes data availability.

Analytics, Visualization, Data Governance, Data-driven

Amazon Q data integration adds DataFrame support and in-prompt context-aware job creation

AWS Big Data

DECEMBER 20, 2024

Amazon Q data integration, introduced in January 2024, allows you to use natural language to author extract, transform, load (ETL) jobs and operations in the AWS Glue-specific data abstraction DynamicFrame. The DataFrame code generation now extends beyond AWS Glue DynamicFrame to support a broader range of data processing scenarios.

Data Integration, Visualization, Data Processing, Big Data

Why Modern Data Challenges Require a New Approach to Governance

A healthy data-driven culture minimizes knowledge debt while maximizing analytics productivity. Agile Data Governance is the process of creating and improving data assets by iteratively capturing knowledge as data producers and consumers work together so that everyone can benefit.

Metadata

Author visual ETL flows on Amazon SageMaker Unified Studio (preview)

AWS Big Data

DECEMBER 4, 2024

Amazon SageMaker Unified Studio (preview) provides an integrated data and AI development environment within Amazon SageMaker. From the Unified Studio, you can collaborate and build faster using familiar AWS tools for model development, generative AI, data processing, and SQL analytics.

Visualization, Sales, Data-driven, Analytics

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

For container terminal operators, data-driven decision-making and efficient data sharing are vital to optimizing operations and boosting supply chain efficiency. Together, these capabilities enable terminal operators to enhance efficiency and competitiveness in an industry that is increasingly data driven.

IoT, Machine Learning, Metadata, Data-driven

Amazon SageMaker Lakehouse now supports attribute-based access control

AWS Big Data

APRIL 24, 2025

Amazon SageMaker Lakehouse now supports attribute-based access control (ABAC) with AWS Lake Formation, using AWS Identity and Access Management (IAM) principals and session tags to simplify data access, grant creation, and maintenance. You can then query, analyze, and join the data using Redshift, Amazon Athena, Amazon EMR, and AWS Glue.

Sales, Data Lake, Management, Data-driven

The key to operational AI: Modern data architecture

CIO Business Intelligence

NOVEMBER 27, 2024

From customer service chatbots to marketing teams analyzing call center data, the majority of enterprises (about 90%, according to recent data) have begun exploring AI. For companies investing in data science, realizing the return on these investments requires embedding AI deeply into business processes.

Data Architecture, Cost-Benefit, Machine Learning, Experimentation

Octopai Acquisition Enhances Metadata Management to Trust Data Across Entire Data Estate

Cloudera

NOVEMBER 13, 2024

We are excited to announce the acquisition of Octopai, a leading data lineage and catalog platform that provides data discovery and governance for enterprises to enhance their data-driven decision making.

Metadata, Management, Data Governance, Data-driven

How BMW streamlined data access using AWS Lake Formation fine-grained access control

AWS Big Data

OCTOBER 29, 2024

To achieve this, they aimed to break down data silos and centralize data from various business units and countries into the BMW Cloud Data Hub (CDH). However, the initial version of CDH supported only coarse-grained access control to entire data assets, and hence it was not possible to scope access to data asset subsets.

Data Lake, Sales, Metadata, Machine Learning

SAP Datasphere Powers Business at the Speed of Data

Rocket-Powered Data Science

MARCH 20, 2023

We live in a data-rich, insights-rich, and content-rich world. Data collections are the ones and zeroes that encode the actionable insights (patterns, trends, relationships) that we seek to extract from our data through machine learning and data science. Plus, AI can also help find key insights encoded in data.

Data Warehouse, Metadata, Digital Transformation, Machine Learning

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

AWS Big Data

DECEMBER 16, 2024

The need to integrate diverse data sources has grown exponentially, but there are several common challenges when integrating and analyzing data from multiple sources, services, and applications. First, you need to create and maintain independent connections to the same data source for different services.

Visualization, Data Processing, Testing, Publishing

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

AWS Big Data

DECEMBER 4, 2024

In today’s rapidly evolving financial landscape, data is the bedrock of innovation, enhancing customer and employee experiences and securing a competitive edge. Like many large financial institutions, ANZ Institutional Division operated with siloed data practices and centralized data management teams.

Metadata, Data Governance, Data Quality, Data-driven

The future of data: A 5-pillar approach to modern data management

CIO Business Intelligence

DECEMBER 11, 2024

In today’s economy, as the saying goes, data is the new gold: a valuable asset from a financial standpoint. A similar transformation has occurred with data. More than 20 years ago, data within organizations was like scattered rocks on early Earth.

Management, Data Governance, Data Science, Reporting

Scalable analytics and centralized governance for Apache Iceberg tables using Amazon S3 Tables and Amazon Redshift

AWS Big Data

MAY 22, 2025

Amazon Redshift supports querying data stored in Apache Iceberg tables managed by Amazon S3 Tables, which we previously covered in a getting started blog post. We’ll also review an example that simultaneously uses data residing in both Amazon Redshift and Amazon S3 Tables, enabling a unified analytics experience.

Analytics, Data Lake, Management, Insurance

Becoming a machine learning company means investing in foundational technologies

O'Reilly on Data

MAY 21, 2019

Companies successfully adopt machine learning either by building on existing data products and services, or by modernizing existing models and algorithms. In this post, I share slides and notes from a keynote I gave at the Strata Data Conference in London earlier this year. Use ML to unlock new data types—e.g.,

Machine Learning, Technology, Deep Learning, Data Science

Steps taken to build Sevita’s first enterprise data platform

CIO Business Intelligence

OCTOBER 23, 2024

As such, the data on labor, occupancy, and engagement is extremely meaningful. Here, CIO Patrick Piccininno provides a roadmap of his journey from data with no integration to meaningful dashboards, insights, and a data literate culture. You’re building an enterprise data platform for the first time in Sevita’s history.

Enterprise, Dashboards, KPI, Data Lake

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

AWS Big Data

DECEMBER 4, 2024

With the growing emphasis on data, organizations are constantly seeking more efficient and agile ways to integrate their data, especially from a wide variety of applications. In addition, as organizations rely on an increasingly diverse array of digital systems, data fragmentation has become a significant challenge.

Data Integration, Data Lake, Statistics, Data-driven

The Symbiotic Relationship Between Data Governance and AI

David Menninger's Analyst Perspectives

MAY 14, 2025

Data governance has always been a critical part of the data and analytics landscape. However, for many years, it was seen as a preventive function to limit access to data and ensure compliance with security and data privacy requirements. Data governance is integral to an overall data intelligence strategy.

Data Governance, Data Quality, Data-driven, Metadata

Scaling RISE with SAP data and AWS Glue

AWS Big Data

NOVEMBER 29, 2024

Customers often want to augment and enrich SAP source data with other non-SAP source data. Such analytic use cases can be enabled by building a data warehouse or data lake. Customers can now use the AWS Glue SAP OData connector to extract data from SAP.

Visualization, Data Processing, Data-driven, Cost-Benefit

What you need to know about product management for AI

O'Reilly on Data

MARCH 31, 2020

If you’re already a software product manager (PM), you have a head start on becoming a PM for artificial intelligence (AI) or machine learning (ML). You’re responsible for the design, the product-market fit, and ultimately for getting the product out the door. Why AI software development is different.

Management, Machine Learning, Experimentation, Metrics

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

AWS Big Data

OCTOBER 30, 2024

Amazon DataZone has launched authentication support through the Amazon Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets via popular business intelligence (BI) and analytics tools like Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more.

Visualization, Data Lake, Testing, Data Governance

Are You Content with Your Organization’s Content Strategy?

Rocket-Powered Data Science

JULY 6, 2021

Specifically, in the modern era of massive data collections and exploding content repositories, we can no longer simply rely on keyword searches to be sufficient. Clearly, such a content delivery system is not good for business productivity. “I believe that this product is good” is quite different from a post that states “Yeah, sure.

Strategy, Machine Learning, Metadata, Knowledge Discovery

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

AWS Big Data

DECEMBER 4, 2024

This week on the keynote stages at AWS re:Invent 2024, you heard Matt Garman, CEO, AWS, and Swami Sivasubramanian, VP of AI and Data, AWS, speak about the next generation of Amazon SageMaker, the center for all of your data, analytics, and AI. The relationship between analytics and AI is rapidly evolving.

Data Analytics, Analytics, Data Lake, Data Quality

Amazon OpenSearch Service launches flow builder to empower rapid AI search innovation

AWS Big Data

MAY 2, 2025

Through a visual designer, you can configure custom AI search flows: a series of AI-driven data enrichments performed during ingestion and search. Each processor applies a type of data transform such as encoding text into vector embeddings, or summarizing search results with a chatbot AI service.

Machine Learning, Visualization, Dashboards, Metadata

Oracle Wants to Be the Database for AI

David Menninger's Analyst Perspectives

MAY 15, 2025

Oracle recently hosted its annual Database Analyst Summit, sharing the vision and strategy for its data platform. While much of the event was under non-disclosure as product plans and launch schedules are finalized, it still served as a useful recap of the broad portfolio of data platform capabilities that Oracle has to offer.

Data Lake, Data Warehouse, Machine Learning, Software

Recap of Amazon Redshift key product announcements in 2024

AWS Big Data

DECEMBER 17, 2024

Amazon Redshift, launched in 2013, has undergone significant evolution since its inception, allowing customers to expand the horizons of data warehousing and SQL analytics. Industry-leading price-performance: Amazon Redshift offers up to three times better price-performance than alternative cloud data warehouses.

Data Lake, Data Warehouse, Data-driven, Optimization

Addressing Data Mesh Technical Challenges with DataOps

DataKitchen

AUGUST 9, 2021

Below is our third post (3 of 5) on combining data mesh with DataOps to foster greater innovation while addressing the challenges of a decentralized architecture. We’ve talked about data mesh in organizational terms (see our first post, “What is a Data Mesh?”) and how team structure supports agility. Source: Thoughtworks.

Testing, Data Lake, Metadata, Publishing

Incremental refresh for Amazon Redshift materialized views on data lake tables

AWS Big Data

NOVEMBER 8, 2024

Amazon Redshift is a fast, fully managed cloud data warehouse that makes it cost-effective to analyze your data using standard SQL and business intelligence tools. However, it also offers additional optimizations that you can use to further improve this performance and achieve even faster query response times from your data warehouse.

Data Lake, Data Warehouse, Optimization, Testing

Specialized tools for machine learning development and model governance are becoming essential

O'Reilly on Data

APRIL 2, 2019

A few years ago, we started publishing articles (see “Related resources” at the end of this post) on the challenges facing data teams as they start taking on more machine learning (ML) projects. So, why is this new open source project resonating with data scientists and machine learning engineers? The upcoming 0.9.0

Machine Learning, Modeling, Data Science, Software

Build a high-performance quant research platform with Apache Iceberg

AWS Big Data

JANUARY 9, 2025

In this post, we focus on data management implementation options such as accessing data directly in Amazon Simple Storage Service (Amazon S3), using popular data formats like Parquet, or using open table formats like Iceberg. Data management is the foundation of quantitative research.

Metadata, Snapshot, Cost-Benefit, Optimization

What are model governance and model operations?

O'Reilly on Data

JUNE 19, 2019

A look at the landscape of tools for building and deploying robust, production-ready machine learning models. A few factors are contributing to this strong interest in implementing ML in products and services. Quality depends not just on code, but also on data, tuning, regular updates, and retraining.

Modeling, Machine Learning, Testing, Metrics

Marsh McLennan IT reorg lays foundation for gen AI

CIO Business Intelligence

NOVEMBER 1, 2024

Re-platforming to reduce friction: Marsh McLennan had been running several strategic data centers globally, with some workloads on the cloud that had sprung up organically. Several co-location centers host the remainder of the firm’s workloads, and Marsh McLennan’s big data centers will go away once all the workloads are moved, Beswick says.

IT, Insurance, Consulting, Risk

Amazon Web Services named a Leader in the 2024 Gartner Magic Quadrant for Data Integration Tools

AWS Big Data

FEBRUARY 26, 2025

Amazon Web Services (AWS) has been recognized as a Leader in the 2024 Gartner Magic Quadrant for Data Integration Tools. This recognition, we feel, reflects our ongoing commitment to innovation and excellence in data integration, demonstrating our continued progress in providing comprehensive data management solutions.

Data Integration, Data Lake, Data Warehouse, Unstructured Data

The DataOps Vendor Landscape, 2021

DataKitchen

APRIL 13, 2021

This is not surprising given that DataOps enables enterprise data teams to generate significant business value from their data. Companies that implement DataOps find that they are able to reduce cycle times from weeks (or months) to days, virtually eliminate data errors, increase collaboration, and dramatically improve productivity.

Testing, Machine Learning, Consulting, Data Science

Business Strategies for Deploying Disruptive Tech: Generative AI and ChatGPT

Rocket-Powered Data Science

FEBRUARY 15, 2023

Third, any commitment to a disruptive technology (including data-intensive and AI implementations) must start with a business strategy. These changes may include requirements drift, data drift, model drift, or concept drift. I suggest that the simplest business strategy starts with answering three basic questions: What?

Strategy, Experimentation, Uncertainty, Machine Learning

Deep automation in machine learning

O'Reilly on Data

DECEMBER 19, 2018

We need to do more than automate model building with autoML; we need to automate tasks at every stage of the data pipeline. In a previous post, we talked about applications of machine learning (ML) to software development, which included a tour through sample tools in data science and for managing data infrastructure.

Machine Learning, Software, Metadata, Testing

HEMA accelerates their data governance journey with Amazon DataZone

AWS Big Data

DECEMBER 19, 2024

Data has become an invaluable asset for businesses, offering critical insights to drive strategic decision-making and operational optimization. HEMA has been a household Dutch retail brand name since 1926, providing daily convenience products with unique design. This post is cowritten by Tommaso Paracciani and Oghosa Omorisiagbon from HEMA.

Data Governance, Publishing, Data-driven, Metadata
