Blog - Data Leaders Brief

Blog

Drug Launch Case Study: Amazing Efficiency Using DataOps

DataKitchen

DECEMBER 9, 2024

A Drug Launch Case Study in the Amazing Efficiency of a Data Team Using DataOps: How a Small Team Powered the Multi-Billion Dollar Acquisition of a Pharma Startup. When launching a groundbreaking pharmaceutical product, the stakes and the rewards couldn’t be higher. The numbers speak for themselves: working towards the launch, an average of 1.5

Data Quality

Data Quality Data Lake Testing Statistics

We’ve Been Using FITT Data Architecture For Many Years, And Honestly, We Can Never Go Back

DataKitchen

JULY 22, 2025

TL;DR: Functional, Idempotent, Tested, Two-stage (FITT) data architecture has saved our sanity—no more 3 AM pipeline debugging sessions. We lived this nightmare for years until we discovered something that changed everything about how we approach data engineering. What is FITT Data Architecture? Sound familiar?

Data Architecture

Data Architecture Testing Data Quality Cost-Benefit

Webinars

How to Streamline Payment Applications & Lien Waivers Through Innovative Construction Technology

Airflow Best Practices for ETL/ELT Pipelines

MORE WEBINARS

Trending Sources

Nvidia unveils generative physical AI platform, agentic AI advances at CES

CIO Business Intelligence

JANUARY 6, 2025

“AI requires us to build an entirely new computing stack to build AI factories, accelerated computing at data center scale,” Rev Lebaredian, vice president of Omniverse and simulation technology at Nvidia, said at a press conference Monday. Large language models (LLMs), Nvidia says, are one-dimensional.

B2B

B2B Interactive Modeling Reporting

Vibe Coding, Vibe Checking, and Vibe Blogging

O'Reilly on Data

APRIL 22, 2025

It’s a middle path that’s worked surprisingly well for my personal projects, and today I want to share some insights from that journey. For the past decade and a half, I’ve been exploring the intersection of technology, education, and design as a professor of cognitive science and design at UC San Diego.

Software

Software Data Processing IT Visualization

Generative AI: A Self-Study Roadmap

KDnuggets

JULY 11, 2025

For developers and data practitioners, this shift presents both opportunity and challenge. You’ll learn to work with large language models, implement retrieval-augmented generation systems, and deploy production-ready generative applications. This difference shapes everything about how you work with these systems.

Machine Learning

Machine Learning Testing Data Science Cost-Benefit

Author visual ETL flows on Amazon SageMaker Unified Studio (preview)

AWS Big Data

DECEMBER 4, 2024

Amazon SageMaker Unified Studio (preview) provides an integrated data and AI development environment within Amazon SageMaker. From the Unified Studio, you can collaborate and build faster using familiar AWS tools for model development, generative AI, data processing, and SQL analytics.

Visualization

Visualization Sales Data-driven Analytics

Take manual snapshots and restore in a different domain spanning across various Regions and accounts in Amazon OpenSearch Service

AWS Big Data

OCTOBER 11, 2024

Snapshots are crucial for data backup and disaster recovery in Amazon OpenSearch Service. Snapshots play a critical role in providing the availability, integrity and ability to recover data in OpenSearch Service domains. Snapshots are not instantaneous.

Snapshot

Snapshot Dashboards Management Testing

Integrating DuckDB & Python: An Analytics Guide

KDnuggets

JUNE 10, 2025

By Josep Ferrer, KDnuggets AI Content Specialist. DuckDB is a fast, in-process analytical database designed for modern data analysis. As understanding how to deal with data is becoming more important, today I want to show you how to build a Python workflow with DuckDB and explore its key features.

OLAP

OLAP Analytics Machine Learning Data Science

The Data Quality Revolution Starts with You

DataKitchen

JUNE 20, 2025

The Data Quality Revolution Starts with One Person (Yes, That’s You!). Picture this: You’re sitting in yet another meeting where someone asks, “Can we trust this data?” Start Small, Think Customer: Here’s where most data quality initiatives go wrong: they try to boil the ocean. Sound familiar?

Data Quality

Data Quality Measurement Dashboards Reporting

AI Agents in Analytics Workflows: Too Early or Already Behind?

KDnuggets

JUNE 13, 2025

Analytics

Analytics Data Science Visualization Machine Learning

Automate Data Quality Reports with n8n: From CSV to Professional Analysis

KDnuggets

JUNE 26, 2025

Before diving into analysis, you need to understand what you’re working with: How many missing values? What’s the overall data quality score? Most data scientists spend 15-30 minutes manually exploring each new dataset: loading it into pandas, running .info(), .describe(), and .isnull().sum()

Data Quality

Data Quality Reporting Machine Learning Data Science
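
The manual first-pass checks that the n8n teaser above describes (row counts, missing values, an overall quality score) can be sketched in a few lines. This is a minimal illustration, not the article's workflow: it swaps the pandas calls for the Python standard library `csv` module so it runs anywhere, and the `profile_csv` helper, the `SAMPLE` data, and the "percentage of non-empty cells" score are all assumptions made for the example.

```python
import csv
import io


def profile_csv(text):
    """Return row count, per-column missing counts, and a simple quality score.

    The score is a hypothetical metric: the percentage of non-empty cells.
    """
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys()) if rows else []
    # A cell counts as missing when it is absent or only whitespace.
    missing = {
        col: sum(1 for row in rows if (row[col] or "").strip() == "")
        for col in columns
    }
    total_cells = len(rows) * len(columns)
    filled = total_cells - sum(missing.values())
    score = round(100 * filled / total_cells, 1) if total_cells else 0.0
    return {"rows": len(rows), "missing": missing, "score": score}


# Illustrative data only: four rows, two empty cells.
SAMPLE = "id,name,amount\n1,Ann,10\n2,,20\n3,Bob,\n4,Cy,40\n"
report = profile_csv(SAMPLE)
print(report)
```

With pandas installed, `df.isnull().sum()` gives the same per-column missing counts on a loaded DataFrame.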

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

AWS Big Data

DECEMBER 4, 2024

This week on the keynote stages at AWS re:Invent 2024, you heard Matt Garman, CEO, AWS, and Swami Sivasubramanian, VP of AI and Data, AWS, speak about the next generation of Amazon SageMaker, the center for all of your data, analytics, and AI. The relationship between analytics and AI is rapidly evolving.

Data Analytics

Data Analytics Analytics Data Lake Data Quality

Go vs. Python for Modern Data Workflows: Need Help Deciding?

KDnuggets

JUNE 19, 2025

Experimentation

Experimentation Machine Learning Data Science Advertising

The Race For Data Quality in a Medallion Architecture

DataKitchen

NOVEMBER 5, 2024

The Medallion architecture pattern is gaining traction among data teams. It is a layered approach to managing and transforming data. By systematically moving data through these layers, the Medallion architecture enhances the data structure in a data lakehouse environment.

Data Quality

Data Quality Testing Metrics Reporting

Telco Enterprise Data Platforms: Key Success Factors in Building for an AI Future

Cloudera

DECEMBER 17, 2024

Because data management is a key variable for overcoming these challenges, carriers are turning to hybrid cloud solutions, which provide the flexibility and scalability needed to adapt to the evolving landscape 5G enables. Cost is also a constant concern, especially as carriers work to scale their infrastructure to support 5G networks.

Enterprise

Enterprise Data Architecture Data-driven Optimization

Agentic AI design: An architectural case study

CIO Business Intelligence

NOVEMBER 19, 2024

From obscurity to ubiquity, the rise of large language models (LLMs) is a testament to rapid technological advancement. Just a few short years ago, models like GPT-1 (2018) and GPT-2 (2019) barely registered a blip on anyone’s tech radar. There are many areas of research and focus sprouting from the capabilities presented through LLMs.

Cost-Benefit

Cost-Benefit Testing Interactive ROI

Scaling RISE with SAP data and AWS Glue

AWS Big Data

NOVEMBER 29, 2024

Customers often want to augment and enrich SAP source data with other non-SAP source data. Such analytic use cases can be enabled by building a data warehouse or data lake. Customers can now use the AWS Glue SAP OData connector to extract data from SAP.

Visualization

Visualization Data Processing Data-driven Cost-Benefit

Build an analytics pipeline that is resilient to Avro schema changes using Amazon Athena

AWS Big Data

JULY 25, 2025

As a result, organizations collect vast amounts of data from diverse sensor devices monitoring everything from industrial equipment to smart buildings. As a result, the data structure (schema) of the information transmitted by these devices evolves continuously.

IoT

IoT Analytics Metadata Measurement

Key Takeaways from AWS re:Invent 2024

Cloudera

DECEMBER 19, 2024

Finally, we hosted a hands-on workshop to walk attendees through a Retrieval-Augmented Generation (RAG) workflow within Cloudera AI to show how easy it is to deploy contextualized models based on organizational data. AWS re:Invent is one of my favorite trade shows. You can read more about the partnership and its implications here.

Metadata

Metadata Data Processing Machine Learning Cost-Benefit

Accelerate your data quality journey for lakehouse architecture with Amazon SageMaker, Apache Iceberg on AWS, Amazon S3 tables, and AWS Glue Data Quality

AWS Big Data

JULY 28, 2025

In an era where data drives innovation and decision-making, organizations are increasingly focused on not only accumulating data but on maintaining its quality and reliability. By using AWS Glue Data Quality, you can measure and monitor the quality of your data. With this, you can make confident business decisions.

Data Quality

Data Quality Data Lake Data Architecture Visualization

AI’s Achilles’ Heel: The Data Quality Dilemma

DataFloq

JULY 20, 2025

As AI has gained prominence, all the data quality issues we’ve faced historically are still relevant. However, there are additional complexities faced when dealing with the nontraditional data that AI often makes use of. When using AI models with this type of data, quality is as important as ever. It isn’t easy!

Data Quality

Data Quality Unstructured Data Structured Data Modeling

The Risks and Governance Requirements of Agentic AI

Dataiku

JULY 16, 2025

This blog, the first in a three-part series, explores why and how organizations must implement new governance controls to address the distinct requirements of AI models and the agents that use them.

Risk

Risk Recreation/Entertainment Modeling Data Governance

Scaling Data Reliability: The Definitive Guide to Test Coverage for Data Engineers

DataKitchen

JULY 8, 2025

The parallels between software development and data analytics have never been more apparent. Let us show you how to implement full-coverage automatic data checks on every table, column, tool, and step in your delivery process.

Testing

Testing Data Quality Cost-Benefit Manufacturing

How Nexthink built real-time alerts with Amazon Managed Service for Apache Flink

AWS Big Data

JUNE 12, 2025

Internally, Infinity comprises more than 300 microservices that use the power of Apache Kafka through Amazon Managed Service for Apache Kafka (Amazon MSK) for data ingestion and intra-service communication. Amazon MSK and ClickHouse serve as the backbone for this data pipeline.

Management

Management Metrics Cost-Benefit Technology

Data Insights Assure Quality Data and Confident Decisions!

Smarten

NOVEMBER 26, 2024

Why is Data Insight So Important? Every business (large or small) creates and depends upon data. Decisions were based on opinion, guesswork and a complicated mixture of notes and records reflecting historical results that might or might not be relevant to the future. But too much data can also create issues.

Machine Learning

Machine Learning Data Quality Predictive Modeling Metadata

Unifying metadata governance across Amazon SageMaker and Collibra

AWS Big Data

JULY 16, 2025

Managing metadata across tools and teams is a growing challenge for organizations building modern data and AI platforms. As data volumes grow and generative AI becomes more central to business strategy, teams need a consistent way to define, discover, and govern their datasets, features, and models.

Metadata

Metadata Publishing Management Modeling

Introducing Jobs in Amazon SageMaker

AWS Big Data

JULY 15, 2025

Processing large volumes of data efficiently is critical for businesses, and so data engineers, data scientists, and business analysts need reliable and scalable ways to run data processing workloads. The next generation of Amazon SageMaker is the center for all your data, analytics, and AI.

Visualization

Visualization Data Processing Metrics Big Data

Orchestrate data processing jobs, querybooks, and notebooks using visual workflow experience in Amazon SageMaker

AWS Big Data

JULY 15, 2025

Automation of data processing and data integration tasks and queries is essential for data engineers and analysts to maintain up-to-date data pipelines and reports. SageMaker Unified Studio offers multiple ways to integrate with data through the Visual ETL, Query Editor, and JupyterLab builders.

Data Processing

Data Processing Visualization Metadata Software

Predictive Models Are Nothing Without Trust

Cloudera

JANUARY 7, 2025

For a smaller airport in Canada, data has grown to be its North Star in an industry full of surprises. In order for data to bring true value to operations, and ultimately customer experiences, those data insights must be grounded in trust. Data needs to be an asset and not a commodity. What’s the reason for data?

Predictive Modeling

Predictive Modeling Modeling Forecasting Data-driven

Vibing at Home

O'Reilly on Data

MAY 13, 2025

If it doesn’t work, you have the AI try again, perhaps with a modified prompt that explains what went wrong. Simon Willison has an excellent blog post about what vibe coding means, when it’s appropriate, and how to do it. My programming consists of weekend projects and quick data analyses for O’Reilly. Vibe coding works.

Testing

Testing Modeling Dashboards IT

Introducing erwin Data Modeler 15.0: Bridging the Gap Between Data Modeling & Data Engineering

erwin

JULY 9, 2025

The data landscape has evolved dramatically. Today’s data teams are more distributed than ever, working with an increasingly complex modern data stack that spans cloud warehouses, transformation tools, and API-first architectures.

Modeling

Modeling Metadata Visualization Data Architecture

From Machine Learning to AI: Simplifying the Path to Enterprise Intelligence

Cloudera

JANUARY 9, 2025

A Name That Matches the Moment. For years, Cloudera’s platform has helped the world’s most innovative organizations turn data into action. It’s a signal that we’re fully embracing the future of enterprise intelligence. But over the years, data teams and data scientists overcame these hurdles and AI became an engine of real-world innovation.

Machine Learning

Machine Learning Enterprise Data-driven Modeling

Data Quality Is Free

Anmut

JANUARY 30, 2025

If quality is free, why isn't data? Originally applied to manufacturing, this principle holds profound relevance in today’s data-driven world. How about data quality? What do we know about the cost of bad quality data?

Data Quality

Data Quality Cost-Benefit Statistics Data-driven

What the Rise of AI Web Scrapers Means for Data Teams

Smart Data Collective

JUNE 22, 2025

You often hear about machine learning in broad strokes, but we aim to look at how these tools handle the messy reality of raw data.

Big Data

Big Data Data mining Machine Learning Structured Data

Optimizing Business Performance with Dynamics 365 and BI Dashboards: The Missing Link Between Data and Decisions

BizAcuity

FEBRUARY 21, 2025

Businesses have never had access to more data than they do today. Because data without intelligence is just noise. It’s not that the data doesn’t exist; it’s that it isn’t connected. Without proper Dynamics 365 integration, data remains siloed, and decision-making becomes guesswork.

Dashboards

Dashboards Optimization Sales Finance

Empower Your Cyber Defenders with Real-Time Analytics Author: Carolyn Duby, Field CTO

Cloudera

NOVEMBER 15, 2024

Today, cyber defenders face an unprecedented set of challenges as they work to secure and protect their organizations. In fact, according to the Identity Theft Resource Center (ITRC) Annual Data Breach Report, there were 2,365 cyber attacks in 2023 with more than 300 million victims, and a 72% increase in data breaches since 2021.

Analytics

Analytics Metadata Snapshot Data-driven

The New Career Crisis: AI Is Breaking the Entry-Level Path for Gen Z

DataFloq

JULY 1, 2025

Due to the emergence of artificial intelligence, much of the tedious or mechanical work usually performed by freshers has been automated, making freshers almost redundant. Junior analysts have to deal with AI tools that are better and cheaper at cleaning, processing, and visualizing data at scale.

Machine Learning

Machine Learning Deep Learning Marketing Data-driven

Near real-time baggage operational insights for airlines using Amazon Kinesis Data Streams

AWS Big Data

JULY 8, 2025

Traditional baggage analytics systems often struggle with adaptability, real-time insights, data integrity, operational costs, and security, limiting their effectiveness in dynamic environments. Before diving into the solution’s architecture, we first examine the traditional baggage analytics process and the need for modernization.

Internet of Things

Internet of Things IoT Metrics Data-driven

Don’t get left in the dark with SAP PowerDesigner: Keep the lights on with erwin

erwin

MAY 22, 2025

As SAP PowerDesigner approaches its end of life (EOL), you could soon find yourself plunged into data modeling darkness. Limited connectivity: PowerDesigner’s restrictions on data-platform integration will hinder your ability to adapt and scale. Because data modeling is more crucial than ever to keep up in the AI race.

Uncertainty

Uncertainty Modeling Metadata Data Integration

How Does Low-Code, No-Code Development Support BI Tools?

Smarten

JUNE 11, 2025

The LCNC approach allows business intelligence vendors to create, configure, integrate, deploy and support BI tools at a lower cost, reducing the cost of the solution and ensuring that your team can transition to a Citizen Data Scientist role. 70% of new business applications will use low-code/no-code technologies by 2025.

Business Intelligence

Business Intelligence ROI Cost-Benefit Dashboards

7 Steps to Mastering Vibe Coding

KDnuggets

JULY 8, 2025

It suggests a future where the friction between concept and creation is smoothed away by intelligent algorithms. The initial, near-magical experience of writing a simple prompt and receiving a working piece of software (should you be so lucky on your first attempt) is the foundation of this entire practice.

Machine Learning

Machine Learning Data Science Testing Advertising

Meet Michelle Hoover, Cloudera’s new SVP of Global Alliances and Channels

Cloudera

NOVEMBER 5, 2024

Cloudera is committed to fostering collaboration with partners, growing relationships, and innovating for the future. Michelle’s deep partnership expertise and strong relationships within the data and AI ecosystem make her a great leader of the Cloudera alliances and partner channels strategies.

Strategy

Strategy Software Technology Enterprise

AI’s Future: Not Always Bigger

O'Reilly on Data

MARCH 11, 2025

Did DeepSeek steal training data from OpenAI? If you’re in the trenches building tomorrow’s development practices today and interested in speaking at the event, we’d love to hear from you by March 12. That’s roughly 1/10th what it cost to train OpenAI’s most recent models. Claude 3.7,

Data Processing

Data Processing Software Modeling Marketing

No Python, No SQL Templates, No YAML: Why Your Open Source Data Quality Tool Should Generate 80% Of Your Data Quality Tests Automatically

DataKitchen

FEBRUARY 17, 2025

As a data engineer, ensuring data quality is both essential and overwhelming. They are all in the realm of software, domain-specific language to help you write data quality tests.

Data Quality

Data Quality Testing Scorecard Data-driven
