Iceberg offers distinct advantages through its metadata layer over Parquet, such as improved data management, performance optimization, and integration with various query engines. These capabilities translate into the ability to run append, insert, update, and delete operations. Data management is the foundation of quantitative research.
Primary among these is the need to ensure the data that will power their AI strategies is fit for purpose. In fact, a data framework is a critical first step for AI success.
This is where active metadata comes in. Listen to “Why is Active Metadata Management Essential?” What is active metadata? The post The Power of Active Metadata appeared first on Data Management Blog - Data Integration and Modern Data Management Articles, Analysis and Information.
Amazon Redshift is a fully managed, AI-powered cloud data warehouse that delivers the best price-performance for your analytics workloads at any scale. It enables you to get insights faster without extensive knowledge of your organization’s complex database schema and metadata. With this feature, user data remains secure and private.
As technology and business leaders, your strategic initiatives, from AI-powered decision-making to predictive insights and personalized experiences, are all fueled by data. Key recommendations include investing in AI-powered cleansing tools and adopting federated governance models that empower domains while ensuring enterprise alignment.
Amazon DataZone now supports authentication through the Amazon Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets via popular business intelligence (BI) and analytics tools like Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more.
OpenSearch Ingestion is a serverless pipeline that provides powerful tools for extracting, transforming, and loading data into an OpenSearch Service domain. You can use this approach for a variety of use cases, from real-time log analytics to integrating application messaging data for real-time search.
Amazon SageMaker Lakehouse unifies all your data across Amazon S3 data lakes and Amazon Redshift data warehouses, helping you build powerful analytics and AI/ML applications on a single copy of data. As organizations rely on an increasingly diverse array of digital systems, data fragmentation has become a significant challenge.
Solution overview To illustrate the new Amazon Bedrock Knowledge Bases integration with structured data in Amazon Redshift, we will build a conversational AI-powered assistant for financial assistance that is designed to help answer financial inquiries, like Who has the most accounts?
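At the heart of that assistant is a text-to-SQL step: the model turns a natural-language question into a query over structured data. A minimal sketch of the idea, using an in-memory SQLite table as a stand-in for Redshift and a single canned question-to-SQL mapping in place of the model (schema and data here are hypothetical):

```python
import sqlite3

# Toy schema standing in for the Redshift tables; in the real integration,
# Amazon Bedrock Knowledge Bases generates the SQL from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (account_id INTEGER, owner TEXT);
    INSERT INTO accounts VALUES (1,'alice'),(2,'alice'),(3,'bob');
""")

def answer(question: str) -> str:
    # Stand-in for the model's text-to-SQL step: one hardcoded mapping.
    if question == "Who has the most accounts?":
        sql = ("SELECT owner, COUNT(*) AS n FROM accounts "
               "GROUP BY owner ORDER BY n DESC LIMIT 1")
        owner, n = conn.execute(sql).fetchone()
        return f"{owner} ({n} accounts)"
    raise NotImplementedError(question)

print(answer("Who has the most accounts?"))  # alice (2 accounts)
```

The hard part the managed service handles is generating correct SQL from arbitrary phrasing; the sketch only shows the query-then-summarize shape of the flow.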
In this context, the adoption of data lakes and the data mesh framework emerges as a powerful approach. Data is the most significant asset of any organization. At the core of this ecosystem lies the enterprise data platform.
When we analyzed the results, we determined the AI space was in a state of rapid change, so we eagerly commissioned a follow-up survey to help find out where AI stands right now. The new survey, which ran for a few weeks in December 2019, generated an enthusiastic 1,388 responses. There’s a lot to bite into here, so let’s get started.
Like many others, I’ve known for some time that machine learning models themselves could pose security risks. A recent flourish of posts and papers has outlined the broader topic, listed attack vectors and vulnerabilities, started to propose defensive solutions, and provided the necessary framework for this post.
Amazon Redshift is a fast, petabyte-scale, cloud data warehouse that tens of thousands of customers rely on to power their analytics workloads. Many customers have already implemented identity providers (IdPs) like Microsoft Entra ID (formerly Azure Active Directory) for single sign-on (SSO) access across their applications and services.
First, what active metadata management isn’t: “Okay, you metadata! Quit lounging around!” Now, what active metadata management is (well, kind of): “Okay, you metadata! And one – and zero!” Metadata are the details on those tools: what they are, what to use them for, what to use them with.
Organizations need to understand what the most critical operational activities are and the most impactful projects that need to proceed. Where crisis leads to vulnerability, data governance as an emergency service enables organization management to direct or redirect efforts to ensure activities continue and risks are mitigated.
Customer relationship management (CRM) platforms are very reliant on big data. As these platforms become more widely used, some of the data resources they depend on become more stretched. CRM providers need to find ways to address the technical debt problem they are facing through new big data initiatives. What is technical debt anyway?
Whether driving digital experiences, mapping customer journeys, enhancing digital operations, developing digital innovations, finding new ways to interact with customers, or building digital ecosystems or marketplaces – all of this digital transformation is powered by data. Data readiness is everything. The State of Data Automation.
It’s a set of HTTP endpoints to perform operations such as invoking Directed Acyclic Graphs (DAGs), checking task statuses, retrieving metadata about workflows, managing connections and variables, and even initiating dataset-related events, without directly accessing the Airflow web interface or command line tools.
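Because these are plain HTTP endpoints, a client only needs to construct the right request. A minimal sketch of building (but not sending) the call that triggers a DAG run against the Airflow stable REST API; the base URL, DAG id, and auth setup are assumptions about your deployment:

```python
import json
from urllib.request import Request

def trigger_dag_request(base_url: str, dag_id: str, conf: dict) -> Request:
    """Build a POST against the Airflow stable REST API endpoint that
    creates a DAG run. Auth headers are omitted; they depend on how your
    deployment is configured (basic auth, token, etc.)."""
    body = json.dumps({"conf": conf}).encode("utf-8")
    return Request(
        url=f"{base_url}/api/v1/dags/{dag_id}/dagRuns",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = trigger_dag_request("http://localhost:8080", "daily_etl",
                          {"run_date": "2024-01-01"})
print(req.full_url)   # .../api/v1/dags/daily_etl/dagRuns
```

The same pattern (different paths and methods) covers checking task status, listing DAGs, and managing connections and variables.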
If you’re already a software product manager (PM), you have a head start on becoming a PM for artificial intelligence (AI) or machine learning (ML). You already know the game and how it is played: you’re the coordinator who ties everything together, from the developers and designers to the executives. Why AI software development is different.
Lanier argues that training a model should be a protected activity, but that the output generated by a model can infringe on someone’s copyright. Generative AI stretches our current copyright law in unforeseen and uncomfortable ways. If a human writes software to generate prompts that in turn generate an image, is that copyrightable?
Metadata is the pertinent, practical details about data assets: what they are, what to use them for, what to use them with. Without metadata, data is just a heap of numbers and letters collecting dust. Where does metadata come from? What is a metadata management tool? What are examples of metadata management tools?
Background Multi-AZ with Standby deploys OpenSearch Service domain instances across three Availability Zones, with two zones designated as active and one as standby. During regular operations, the active zone handles coordinator traffic for both read and write requests, as well as shard query traffic.
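The routing logic behind this can be sketched in a few lines: traffic goes only to active zones, and when an active zone fails the standby is promoted so serving capacity stays constant. This is an illustrative toy, not the actual OpenSearch Service implementation:

```python
# Three zones: two active, one standby, as in Multi-AZ with Standby.
zones = {"az-1": "active", "az-2": "active", "az-3": "standby"}

def routable(zones):
    """Zones that should receive coordinator traffic."""
    return sorted(z for z, role in zones.items() if role == "active")

def fail_zone(zones, failed):
    """Mark a zone failed and promote the standby to keep two active zones."""
    zones[failed] = "failed"
    for z, role in zones.items():
        if role == "standby":
            zones[z] = "active"
            break
    return zones

assert routable(zones) == ["az-1", "az-2"]
fail_zone(zones, "az-1")
assert routable(zones) == ["az-2", "az-3"]  # standby promoted
```

The point of the standby is exactly this invariant: the number of zones serving traffic does not drop during a zonal failure.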
The lineage visualized includes activities inside the Amazon DataZone business data catalog. Lineage captures the assets cataloged as well as the subscribers to those assets and to activities that happen outside the business data catalog captured programmatically using the API.
Relational database management systems (RDBMSs) have been the workhorse of ICT for decades. Being able to sit down and define a complete schema, a blueprint of the database, gave everyone assurance and consistency. Sure, you have to ignore the edge cases and hope that they stay edge cases. Surely, business requirements don’t change over time, right?
Iceberg tables maintain metadata to abstract large collections of files, providing data management features including time travel, rollback, data compaction, and full schema evolution, reducing management overhead. Implementing these solutions requires data sharing between purpose-built data stores.
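The key idea is that every commit records an immutable snapshot in the table metadata, so time travel is just reading an older snapshot and rollback is a metadata pointer move rather than a data rewrite. A toy pure-Python sketch of that snapshot model (not the actual Iceberg metadata format):

```python
import copy

class SnapshotTable:
    """Toy sketch of snapshot-based table metadata, Iceberg-style."""

    def __init__(self):
        self.snapshots = []   # ordered list of (snapshot_id, rows)
        self.current = None   # snapshot id that readers see by default

    def commit(self, rows):
        snap_id = len(self.snapshots) + 1
        self.snapshots.append((snap_id, copy.deepcopy(rows)))
        self.current = snap_id
        return snap_id

    def read(self, snapshot_id=None):
        target = snapshot_id if snapshot_id is not None else self.current
        for snap_id, rows in self.snapshots:
            if snap_id == target:
                return rows
        raise KeyError(f"unknown snapshot {target}")

    def rollback(self, snapshot_id):
        self.read(snapshot_id)      # validate the snapshot exists
        self.current = snapshot_id  # pointer move, no data rewrite

table = SnapshotTable()
v1 = table.commit([{"id": 1}])
v2 = table.commit([{"id": 1}, {"id": 2}])
assert len(table.read()) == 2       # latest snapshot
assert len(table.read(v1)) == 1     # time travel
table.rollback(v1)
assert len(table.read()) == 1       # rollback
```

Because old snapshots stay addressable until they are expired, compaction and schema evolution can likewise happen as new metadata commits without disturbing readers.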
Today’s AI-powered catalogs are more like trusted advisors that work alongside you, anticipating needs and taking initiative. Think of AI copilots as your data expedition partners: they’re not just answering questions but actively helping you navigate the terrain. Each day, the walls shift and new pathways emerge.
I recently saw an informal online survey that asked users which types of data (tabular, text, images, or “other”) are being used in their organization’s analytics applications. This was not a scientific or statistically robust survey, so the results are not necessarily reliable, but they are interesting and provocative.
Salesforce Data Cloud is a data platform that unifies all of your company’s data into Salesforce’s Einstein 1 Platform , giving every team a 360-degree view of the customer to drive automation, create analytics, personalize engagement, and power trusted artificial intelligence (AI). What is Zero Copy Data Federation?
Experience the power of Business Intelligence with our 14-day free trial! The benefits of business intelligence and analytics are plentiful and varied, but they all have one thing in common: they bring power. Consumers have grown more and more immune to ads that aren’t targeted directly at them.
What is Microsoft’s Common Data Model (CDM), and why is it so powerful? Insights: Given the meaning of the data is the same, regardless of the domain it came from, an organization can use its data to power business insights. It would make your work frustrating, complicated and slow. The CDM takes this concept to the next level.
The zero-copy pattern helps customers map the data from external platforms into the Salesforce metadata model, providing a virtual object definition for that object.
High costs associated with launching campaigns, the security risk of duplicating data, and the time spent on SQL requests have created a demand for a better solution for managing and activating customer data. Organizations are demanding secure, cost efficient, and time efficient solutions to power their marketing outcomes.
For B2B sales and marketing teams, few metaphors are as powerful as the sales funnel. Today, so much online activity has shifted to social media channels, leading to an inescapable conclusion. But the question that matters is, how can you measure and analyze the true impact of social activity on your sales funnel?
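One concrete way to start measuring that impact is stage-to-stage conversion rates, from social touch down to closed sale. A minimal sketch with hypothetical stage names and counts:

```python
# Funnel as ordered (stage, count) pairs; names and numbers are illustrative.
funnel = [("social_click", 1000), ("lead", 200),
          ("opportunity", 50), ("sale", 10)]

def conversion_rates(stages):
    """Rate of each stage relative to the stage before it."""
    return [
        (curr_name, round(curr_n / prev_n, 3))
        for (prev_name, prev_n), (curr_name, curr_n)
        in zip(stages, stages[1:])
    ]

assert conversion_rates(funnel) == [
    ("lead", 0.2), ("opportunity", 0.25), ("sale", 0.2)
]
```

Comparing these rates for social-sourced traffic against other channels is one simple way to quantify social's effect on the funnel.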
This platform is an advanced information retrieval system engineered to assist healthcare professionals and researchers in navigating vast repositories of medical documents, medical literature, research articles, clinical guidelines, protocol documents, activity logs, and more. Evidence generation is rife with knowledge management challenges.
According to erwin’s “2020 State of Data Governance and Automation” report , close to 70 percent of data professional respondents say they spend an average of 10 or more hours per week on data-related activities, and most of that time is spent searching for and preparing data. Doing Data Lineage Right. Faster Business Turnaround.
We split the solution into two primary components: generating Spark job metadata and running the SQL on Amazon EMR. The first component (metadata setup) consumes existing Hive job configurations and generates metadata such as number of parameters, number of actions (steps), and file formats.
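The metadata-setup component boils down to walking the existing job configuration and deriving counts and formats. A hypothetical sketch of that step; the field names here are illustrative, not the actual pipeline's schema:

```python
def build_job_metadata(hive_conf: dict) -> dict:
    """Derive simple job metadata (parameter count, action/step count,
    file formats) from an existing Hive job configuration."""
    steps = hive_conf.get("steps", [])
    params = {p for step in steps for p in step.get("parameters", [])}
    return {
        "num_parameters": len(params),
        "num_actions": len(steps),
        "file_formats": sorted({s.get("format", "text") for s in steps}),
    }

conf = {
    "steps": [
        {"parameters": ["run_date"], "format": "parquet"},
        {"parameters": ["run_date", "region"], "format": "orc"},
    ]
}
meta = build_job_metadata(conf)
assert meta == {"num_parameters": 2, "num_actions": 2,
                "file_formats": ["orc", "parquet"]}
```

The second component then uses this metadata to parameterize and submit the SQL on Amazon EMR.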
These include internet-scale web and mobile applications, low-latency metadata stores, high-traffic retail websites, Internet of Things (IoT) and time series data, online gaming, and more. Table metadata, such as column names and data types, is stored using the AWS Glue Data Catalog. You don’t need to write any code.
As someone who is passionate about the transformative power of technology, it is fascinating to see intelligent computing – in all its various guises – bridge the schism between fantasy and reality. The excitement is palpable. This first article emphasizes data as the ‘foundation-stone’ of AI-based initiatives. Establishing a Data Foundation.
Today, this is powering every part of the organization, from the customer-favorite online cake customization feature to democratizing data to drive business insight.
Foundation models are a class of very powerful AI models that can be used as the basis for other models: they can be specialized, or retrained, or otherwise modified for specific applications. What is it, how does it work, what can it do, and what are the risks of using it? It has helped to write a book. Or a text adventure game.
Knowledge graphs (KG) came later, but quickly became a powerful driver for adoption of Semantic Web standards and all species of semantic technology implementing them.
This is where the true power of complete data observability comes into play, and it’s time to get acquainted with its two critical parts: ‘Data in Place’ and ‘Data in Use.’ Complaints from dissatisfied customers and apathetic data providers only add to the mounting stress. One of the primary sources of tension?
This option provides improved reliability and the added benefit of simplifying cluster configuration and management by enforcing best practices and reducing complexity. In this post, we share how Multi-AZ with Standby works under the hood to achieve high resiliency and consistent performance to meet the four 9s. This approach was reactive at best.
In this blog post, we compare Cloudera Data Warehouse (CDW) on Cloudera Data Platform (CDP) using Apache Hive-LLAP to Microsoft HDInsight (also powered by Apache Hive-LLAP) on Azure using the TPC-DS 2.9 benchmark. Once the benchmark run has completed, the Virtual Warehouse automatically suspends itself when no further activity is detected.