Testing - Data Leaders Brief

Data Quality Testing: A Shared Resource for Modern Data Teams

DataKitchen

JUNE 6, 2025

In today’s AI-driven landscape, where data is king, every role in the modern data and analytics ecosystem shares one fundamental responsibility: ensuring that incorrect data never reaches business customers. That must change.

Data Quality

Data Quality Testing Dashboards Metrics

Scaling Data Reliability: The Definitive Guide to Test Coverage for Data Engineers

DataKitchen

JULY 8, 2025

The parallels between software development and data analytics have never been more apparent. Also covered: how you can create thousands of tests in a minute using open source tools.

Testing

Testing Data Quality Cost-Benefit Manufacturing

We’ve Been Using FITT Data Architecture For Many Years, And Honestly, We Can Never Go Back

DataKitchen

JULY 22, 2025

TL;DR: Functional, Idempotent, Tested, Two-stage (FITT) data architecture has saved our sanity—no more 3 AM pipeline debugging sessions. Each transformation becomes a mathematical function that you can reason about, test, and trust. Want to test a change safely? Consider a typical calculation of customer lifetime value.
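As a rough illustration of the "transformation as a function" idea, the sketch below computes customer lifetime value as a pure function of its inputs. The order layout, the `customer_lifetime_value` name, and the sum-of-order-amounts definition are assumptions made for this example, not the article's actual calculation.

```python
from collections import defaultdict


def customer_lifetime_value(orders):
    """Return total spend per customer as a new dict; the input is not modified.

    `orders` is an iterable of (customer_id, amount) pairs (hypothetical layout).
    """
    totals = defaultdict(float)
    for customer_id, amount in orders:
        totals[customer_id] += amount
    return dict(totals)


orders = [("a", 10.0), ("b", 5.0), ("a", 2.5)]
first = customer_lifetime_value(orders)
second = customer_lifetime_value(orders)  # re-running on the same input
```

Because the function depends only on its argument and writes nothing in place, running it twice on the same input gives the same output, which is what makes it straightforward to test a change against a copy of the data.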

Data Architecture

Data Architecture Testing Data Quality Cost-Benefit

Webinars

How to Streamline Payment Applications & Lien Waivers Through Innovative Construction Technology

Airflow Best Practices for ETL/ELT Pipelines

Salesforce adds Testing Center to Agentforce for AI agents

CIO Business Intelligence

NOVEMBER 21, 2024

Salesforce has added a new set of tools under the name of Testing Center to its agentic AI offering, Agentforce, to help enterprise users test and observe agents before deploying them in production. Sandboxes, according to Salesforce, work by mirroring images of an enterprise’s production data and configurations.

Testing

Testing Unstructured Data Interactive Metadata

The Ultimate Guide to Apache Airflow DAGS

You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale your Airflow environment; and systematically test and debug Airflow DAGs. By the end of this guide, you’ll know how to (..)

Testing

Webinar: Test Coverage: The Software Development Idea That Supercharges Data Quality & Data Engineering

DataKitchen

JULY 17, 2025

In software engineering, test coverage is non-negotiable. So why do most data teams still ship data without knowing what’s tested—and what isn’t? Explore how leading data teams are applying the proven discipline of test coverage to data and analytics—automating quality checks across every table, not just the “important” ones.
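One simple way to picture test coverage for data is the share of columns that have at least one test attached. The sketch below assumes that definition; the `column_test_coverage` helper and the table and column names are hypothetical and do not come from the webinar.

```python
def column_test_coverage(columns, tested_columns):
    """Fraction of known columns that have at least one test attached."""
    columns = set(columns)
    if not columns:
        return 0.0
    covered = columns & set(tested_columns)
    return len(covered) / len(columns)


all_columns = {"orders.id", "orders.amount", "customers.id", "customers.email"}
tested = {"orders.id", "customers.id"}

coverage = column_test_coverage(all_columns, tested)
untested = sorted(all_columns - tested)  # what still ships without a check
```

Listing the untested columns alongside the ratio makes the gap actionable, not only measurable.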

Data Quality

Data Quality Testing Software Analytics

Image Classification with JAX, Flax, and Optax : A Step-by-Step Guide

Analytics Vidhya

NOVEMBER 18, 2024

This tutorial walks through setting up the environment and preprocessing the data, then defining the CNN structure, and ends with testing the model.

Testing

Testing Modeling Analytics Deep Learning

Beyond “Prompt and Pray”

O'Reilly on Data

JANUARY 21, 2025

Instead of having LLMs make runtime decisions about business logic, use them to help create robust, reusable workflows that can be tested, versioned, and maintained like traditional software. By predefined, tested workflows, we mean creating workflows during the design phase, using AI to assist with ideas and patterns.

Cost-Benefit

Cost-Benefit Testing Interactive Software

The Race For Data Quality in a Medallion Architecture

DataKitchen

NOVEMBER 5, 2024

Get Off The Blocks Fast: Data Quality In The Bronze Layer. Effective Production QA techniques begin with rigorous automated testing at the Bronze layer, where raw data enters the lakehouse environment. Data Drift Checks (does it make sense): Is there a shift in the overall data quality?
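A data drift check of the kind described can be as small as comparing one quality statistic between a baseline batch and the current one. The sketch below uses the null rate and an absolute tolerance of 0.10; both choices, and the function names, are assumptions for illustration and are not taken from the article.

```python
def null_rate(values):
    """Share of entries that are None; 0.0 for an empty batch."""
    values = list(values)
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def null_rate_drifted(baseline, current, tolerance=0.10):
    """True when the null rate moved by more than `tolerance` (absolute)."""
    return abs(null_rate(current) - null_rate(baseline)) > tolerance


baseline_batch = [1, 2, 3, None, 5, 6, 7, 8, 9, 10]          # 10% nulls
current_batch = [1, None, None, None, 5, None, 7, 8, 9, 10]  # 40% nulls
drifted = null_rate_drifted(baseline_batch, current_batch)
```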

Data Quality

Data Quality Testing Metrics Reporting

Best Practices for Creating Long-Lasting and Continuous Discovery Habits

Speaker: Teresa Torres, Internationally Acclaimed Author, Speaker, and Coach at ProductTalk.org

As a result, many of us are still stuck in a project-world rut: research, usability testing, engineering, and a/b testing, ad nauseam. Industry-wide, product teams have adopted discovery practices like customer interviews and experimentation merely for end-user satisfaction.

Testing

Drug Launch Case Study: Amazing Efficiency Using DataOps

DataKitchen

DECEMBER 9, 2024

data quality tests every day to support a cast of analysts and customers. DataKitchen loaded this data and implemented data tests to ensure integrity and data quality via statistical process control (SPC) from day one. The numbers speak for themselves: working towards the launch, an average of 1.5
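Statistical process control, in its simplest form, flags a measurement that falls outside control limits derived from history. The sketch below assumes mean plus or minus three sample standard deviations applied to daily row counts; the metric, the numbers, and the function names are made up for illustration and are not from the case study.

```python
from statistics import mean, stdev


def control_limits(history, sigmas=3.0):
    """Lower and upper control limits from historical measurements."""
    center = mean(history)
    spread = stdev(history)  # sample standard deviation
    return center - sigmas * spread, center + sigmas * spread


def in_control(value, history, sigmas=3.0):
    """True when `value` lies within the control limits of `history`."""
    lower, upper = control_limits(history, sigmas)
    return lower <= value <= upper


daily_row_counts = [1000, 1010, 990, 1005, 995, 1002, 998]
typical_day = in_control(1003, daily_row_counts)
suspicious_day = in_control(400, daily_row_counts)
```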

Data Quality

Data Quality Data Lake Testing Statistics

How DeNA Co., Ltd. accelerated anonymized data quality tests up to 100 times faster using Amazon Redshift Serverless and dbt

AWS Big Data

DECEMBER 17, 2024

Conduct data quality tests on anonymized data in compliance with data policies. Conduct data quality tests to quickly identify and address data quality issues, maintaining high-quality data at all times. The challenge: Data quality tests require performing 1,300 tests on 10 TB of data monthly.

Data Quality

Data Quality Testing Metrics Optimization

Introducing simplified interaction with the Airflow REST API in Amazon MWAA

AWS Big Data

OCTOBER 23, 2024

response = client.create(key="test", value="Test value", description="Test description")
print(response)
print("\nListing all variables.")
variables = client.list()
print(variables)
print("\nGetting the test variable.")

Creating a test variable.

Interactive

Interactive Testing Data-driven Data Lake

How to Learn Math for Data Science: A Roadmap for Beginners

KDnuggets

JUNE 12, 2025

When you know hypothesis testing, you know whether your A/B test results actually mean something. Hypothesis testing gives you the framework to make valid and provable claims. Learn t-tests, chi-square tests, and confidence intervals. When you understand distributions, you can spot data quality issues instantly.
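To make the A/B testing point concrete, here is a minimal two-proportion z-test written with only the standard library. The conversion counts are invented, and the pooled-variance z-test is one common choice among several (the article itself mentions t-tests and chi-square tests); treat it as a sketch, not a recommendation.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    Assumes at least one conversion and one non-conversion overall,
    otherwise the standard error is zero.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Standard normal CDF evaluated at |z| via the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


# Hypothetical experiment: 10% vs 15% conversion on 1,000 visitors each.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
significant = p_value < 0.05
```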

Data Science

Data Science Statistics Machine Learning Optimization

New Planning Maturity Assessment

Test your Planning Fitness. In today's new supply chain paradigm, resilience and agility are key. Is your planning process fit enough to keep up with the pace of change? Is your tech stack helping or hindering your progress? Take AIMMS's new quiz to uncover learnings and benchmark yourself against peers!

Testing

AI-native software engineering may be closer than developers think

CIO Business Intelligence

OCTOBER 17, 2024

“This agentic approach to creation and validation is especially useful for people who are already taking a test-driven development approach to writing software,” Davis says. “With existing, human-written tests you just loop through generated code, feeding the errors back in, until you get to a success state.”
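The loop Davis describes can be sketched as follows. Everything here is a stand-in: `fake_generate` plays the role of a code-generating model by returning canned candidates, and `run_tests` plays the role of an existing human-written test suite. No real agent framework or API is implied.

```python
def refine_until_passing(generate, run_tests, max_attempts=5):
    """Call `generate(feedback)` until `run_tests(code)` reports no error."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        code = generate(feedback)
        feedback = run_tests(code)  # None means every test passed
        if feedback is None:
            return code, attempt
    raise RuntimeError("no passing candidate within the attempt budget")


# Stand-in generator: first candidate is wrong, second is right.
candidates = iter(["def add(a, b): return a - b", "def add(a, b): return a + b"])
seen_feedback = []


def fake_generate(feedback):
    seen_feedback.append(feedback)  # a real generator would use this text
    return next(candidates)


def run_tests(code):
    namespace = {}
    exec(code, namespace)  # acceptable for a toy example only
    result = namespace["add"](2, 3)
    return None if result == 5 else f"add(2, 3) returned {result}, expected 5"


code, attempts = refine_until_passing(fake_generate, run_tests)
```

The attempt budget matters: without it, a generator that never converges would loop forever.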

Software

Software Testing Experimentation Consulting

10 GitHub LLM Repositories Every AI Engineer Should Know

Analytics Vidhya

JULY 8, 2025

Are you an AI engineer, wondering how to attain resources that can put your skills to a practical test? It might be difficult to look for the right solution for you, based on the vast amount of information out there.

Testing

Testing Analytics IT

What’s Killing Data Innovation At Your Company? The Hidden Crisis in Data Usability

DataKitchen

JULY 8, 2025

Fix The Fear: Why Data Engineers and Quality Teams Love TestGen. We test software code with care and consistency, so why don’t we apply the same discipline to our data? In production, TestGen continuously monitors your data with more than forty column-level tests. Just connect it to your data and start testing.

Scorecard

Scorecard Data Quality Statistics Testing

A stress test in Spanish is under way to measure the biases of generative AI

CIO Business Intelligence

JANUARY 8, 2025

Datasets of this kind are specifically designed to act as a stress test that pushes models to their limits. The researchers supply what would constitute a biased response for each situation, which serves as a baseline for comparison with the results the AI produces.

Testing

Testing Technology

Agentic AI design: An architectural case study

CIO Business Intelligence

NOVEMBER 19, 2024

Development teams starting small and building up, learning, testing and figuring out the realities from the hype will be the ones to succeed. In our real-world case study, we needed a system that would create test data. This data would be utilized for different types of application testing.

Cost-Benefit

Cost-Benefit Testing Interactive ROI

MLFlow Mastery: A Complete Guide to Experiment Tracking and Model Management

KDnuggets

JUNE 23, 2025

It logs parameters, metrics, and files created during tests. This gives a clear record of what was tested. You can see how each test performed. It saves exact settings used for each test. CI/CD for Machine Learning: Integrate MLflow with Jenkins or GitHub Actions to automate testing and deployment of ML models.

Modeling

Modeling Management Machine Learning Data Science

7 Python Statistics Tools That Data Scientists Actually Use in 2025

KDnuggets

JULY 14, 2025

SciPy: Advanced Statistical Functions and More. SciPy builds on NumPy and provides a wide range of advanced statistical functions, probability distributions, and hypothesis testing capabilities. Statsmodels: In-Depth Statistical Modeling. Statsmodels is designed for statistical modeling and hypothesis testing.

Statistics

Statistics Machine Learning Data Science Advertising

Generative Logic

O'Reilly on Data

DECEMBER 10, 2024

That seemed like something worth testing out, or at least playing around with, so when I heard that it very quickly became available in Ollama and wasn't too large to run on a moderately well-equipped laptop, I downloaded QwQ and tried it out. How do you test a reasoning model? But that's hardly a valid test.

Testing

Testing Modeling Software IT

100 Pipeline Plays: The Modern Sales Playbook

Advertiser: ZoomInfo

Apply tested plays to your funnel - Use real-world scenarios, triggers, actions and expected results to improve your entire funnel. Use our proven data-driven plays to grow your pipeline and crush your revenue targets. Close more deals with these winning plays!

Sales

Essential Skills for the Modern Data Analyst in 2025

DataFloq

JUNE 10, 2025

One use of such skills is hypothesis testing, commonly applied as A/B testing. A/B testing can determine which of two pages (A or B) performed better in terms of user interaction. A good example is determining the effectiveness of a newly built page.

Statistics

Statistics Machine Learning Big Data Data-driven

Generative AI: A Self-Study Roadmap

KDnuggets

JULY 11, 2025

Quality Evaluation and Testing: Unlike traditional ML models with clear accuracy metrics, evaluating generative AI requires more sophisticated approaches. Design iteratively: test variations and measure results systematically. This requires new approaches to testing, debugging, and quality assurance.

Machine Learning

Machine Learning Testing Data Science Cost-Benefit

From Data Lake to Data Products: Operationalising Analytics at Scale

DataFloq

JULY 28, 2025

Test: Validate timeliness, schema, quality. Engineer: Pipelines, APIs, data contracts, metadata. Deploy & Maintain: Insights monitoring usage, SLA monitoring, logs, support. SLAs, SLOs & Contracts SLAs (Agreements) and SLOs (Objectives) are fundamental for data products.
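The "Test" step above (timeliness, schema, quality) might look roughly like this in code. The expected schema, the 24-hour freshness window, and the non-negative-amount rule are all hypothetical contract terms invented for the sketch.

```python
from datetime import datetime, timedelta, timezone

EXPECTED_SCHEMA = {"order_id": int, "amount": float}  # hypothetical contract


def validate_batch(rows, loaded_at, now, max_age=timedelta(hours=24)):
    """Return a list of human-readable failures; an empty list means pass."""
    failures = []
    if now - loaded_at > max_age:
        failures.append("timeliness: batch is older than the allowed age")
    for i, row in enumerate(rows):
        if set(row) != set(EXPECTED_SCHEMA):
            failures.append(f"schema: row {i} has unexpected columns")
            continue
        for name, expected_type in EXPECTED_SCHEMA.items():
            if not isinstance(row[name], expected_type):
                failures.append(f"schema: row {i} column {name} has wrong type")
        if isinstance(row["amount"], float) and row["amount"] < 0:
            failures.append(f"quality: row {i} has a negative amount")
    return failures


now = datetime(2025, 7, 28, 12, 0, tzinfo=timezone.utc)
good = validate_batch([{"order_id": 1, "amount": 9.5}], now - timedelta(hours=1), now)
bad = validate_batch([{"order_id": 2, "amount": -1.0}], now - timedelta(days=2), now)
```

Passing `now` in explicitly, instead of reading the clock inside the function, keeps the check deterministic and easy to test.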

Data Lake

Data Lake Metadata Analytics Data-driven

A Gentle Introduction to Principal Component Analysis (PCA) in Python

KDnuggets

JULY 4, 2025

This is the other reason why we previously split the data into training and test data, to have the opportunity to discuss this: in data transformations like standardization of numerical attributes, transformations across the training and test sets must be consistent.
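The consistency requirement can be shown in a few lines: the mean and standard deviation are learned from the training split only and then reused, unchanged, on the test split. The numbers and helper names below are invented for illustration, and the sketch uses the population standard deviation as an assumption.

```python
from statistics import mean, pstdev


def fit_standardizer(train_column):
    """Learn mean and (population) standard deviation from training data only."""
    return mean(train_column), pstdev(train_column)


def standardize(column, mu, sigma):
    """Apply previously learned statistics to any split."""
    return [(x - mu) / sigma for x in column]


train_split = [2.0, 4.0, 6.0, 8.0]
test_split = [5.0, 10.0]

mu, sigma = fit_standardizer(train_split)
train_scaled = standardize(train_split, mu, sigma)
test_scaled = standardize(test_split, mu, sigma)  # same mu and sigma as train
```

Fitting a second standardizer on the test split would put the two splits on different scales and leak test-set information into the preprocessing.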

Machine Learning

Machine Learning Data Science Advertising Testing

A Tale of Two Case Studies: Using LLMs in Production

Speaker: Tony Karrer, Ryan Barker, Grant Wiles, Zach Asman, & Mark Pace

Some takeaways include: How to test and evaluate results 📊 Why confidence scoring matters 🔐 How to assess cost and quality 🤖 Cross-platform cost vs. quality trade offs 🔀 and more!

Testing

Escaping POC Purgatory: Evaluation-Driven Development for AI Systems

O'Reilly on Data

MARCH 25, 2025

We've seen this across dozens of companies, and the teams that break out of this trap all adopt some version of Evaluation-Driven Development (EDD), where testing, monitoring, and evaluation drive every decision from the start. What breaks your app in production isn't always what you tested for in dev! The way out?

Testing

Testing Data-driven Software Measurement

Why Data Quality Is the Keystone of Generative AI

DataFloq

JULY 8, 2025

Bias Auditing and Testing: Before feeding data into models, evaluate it for bias, gaps, or systemic issues. Implement fairness metrics and conduct adversarial testing during model training. Maintaining lineage ensures you know the provenance of the data fueling your AI.
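As one example of a fairness metric, the sketch below computes a demographic parity difference: the gap between the highest and lowest positive-outcome rate across groups. The choice of metric, the group labels, and the outcomes are assumptions for illustration; the article does not name a specific metric.

```python
def positive_rate(outcomes):
    """Share of positive (1) outcomes in a non-empty list of 0/1 values."""
    return sum(outcomes) / len(outcomes)


def demographic_parity_difference(outcomes_by_group):
    """Gap between the highest and lowest positive rate across groups."""
    rates = [positive_rate(values) for values in outcomes_by_group.values()]
    return max(rates) - min(rates)


# Hypothetical binary outcomes per group.
outcomes = {
    "group_a": [1, 1, 0, 1],  # 75% positive
    "group_b": [1, 0, 0, 0],  # 25% positive
}
gap = demographic_parity_difference(outcomes)
```

A gap of zero means every group receives positive outcomes at the same rate; what counts as an acceptable gap is a policy decision, not something the metric decides.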

Data Quality

Data Quality Metrics Testing Data-driven

Build a high-performance quant research platform with Apache Iceberg

AWS Big Data

JANUARY 9, 2025

show(truncate=False)

Test results: To evaluate the performance and cost benefits of using Iceberg for our quant research data lake, we created four different datasets: two with Iceberg tables and two with direct Amazon S3 Parquet access, each using both sorted and unsorted write distributions.

groupBy("exchange_code", "instrument").count().orderBy("count",

Metadata

Metadata Snapshot Cost-Benefit Optimization

Enhance Amazon EMR scaling capabilities with Application Master Placement

AWS Big Data

OCTOBER 14, 2024

Launch an EMR cluster with Application Manager placement awareness: To perform some tests, you can launch the following AWS CloudFormation stack, which provisions an EMR cluster with managed scaling and the Application Manager placement awareness feature enabled.

Cost-Benefit

Cost-Benefit Optimization Big Data Management

The Recruiting Crossword Puzzle

Advertiser: ZoomInfo

Test your recruiter-brain with this crossword puzzle, which reveals the best ways to move forward in your efforts with every answer! You can solve your recruiting problems using new tools and data specifically designed to help do your job: find top passive talent and fill those open reqs – faster than you thought possible.

Testing

10 AI strategy questions every CIO must answer

CIO Business Intelligence

JANUARY 14, 2025

It's typical for organizations to test out an AI use case, launching a proof of concept and pilot to determine whether they're placing a good bet. But as CIOs devise their AI strategies, they must ask whether they're prepared to move a successful AI test into production, Mason says.

Strategy

Strategy ROI Experimentation Consulting

AI as a growing child: How we can shape its future responsibly

CIO Business Intelligence

JANUARY 6, 2025

This includes mandating bias testing, diversifying datasets, and holding companies accountable for the societal impacts of their technologies. To ensure it grows responsibly, we need diverse voices at the table: developers, policymakers, and community leaders who can represent the needs of all users, not just the privileged few.

IT

IT Risk Technology Testing

Amazon Q data integration adds DataFrame support and in-prompt context-aware job creation

AWS Big Data

DECEMBER 20, 2024

We can ask the following question in Amazon Q: update the s3 sink node to write to s3://xxx-testing-in-356769412531/output/ in CSV format in the same way to update the Amazon S3 data target. Upon checking the S3 data target, we can see the S3 path is now a placeholder and the output format is Parquet.

Data Integration

Data Integration Visualization Data Processing Big Data

Amazon EMR 7.5 runtime for Apache Spark and Iceberg can run Spark workloads 3.6 times faster than Spark 3.5.3 and Iceberg 1.6.1

AWS Big Data

DECEMBER 27, 2024

To assess the Spark engine's performance with the Iceberg table format, we performed benchmark tests using the 3 TB TPC-DS dataset, version 2.13 (our results derived from the TPC-DS dataset are not directly comparable to the official TPC-DS results due to setup differences). 4xlarge instances, for testing both open source Spark 3.5.3

Cost-Benefit

Cost-Benefit Testing Metrics Optimization

Buyer's Guide for Supply Chain Network Design Software

As a result, most organizations struggle to answer network design questions or test hypotheses in weeks, when results are demanded in hours. Network design as a discipline is complex and too many businesses are still relying on spreadsheets to design and optimize their supply chain.

Software

Intel Accelerators on Amazon OpenSearch Service improve price-performance on vector search by up to 51%

AWS Big Data

NOVEMBER 27, 2024

Replicate these tests using the older R5 instances as the baseline. FAISS engine results We also examine results from the same tests performed on k-NN indexes configured on the FAISS engine. Using your OpenSearch 2.17 domain, create a k-NN index configured to use either the Lucene or FAISS engine.

Cost-Benefit

Cost-Benefit Machine Learning Optimization Software

Take manual snapshots and restore in a different domain spanning across various Regions and accounts in Amazon OpenSearch Service

AWS Big Data

OCTOBER 11, 2024

Testing and development – You can use snapshots to create copies of your data for testing or development purposes. Migration – Manual snapshots can be useful when you want to migrate data from one domain to another. You can create a snapshot of the source domain and then restore it on the target domain.

Snapshot

Snapshot Dashboards Management Testing

How REA Group approaches Amazon MSK cluster capacity planning

AWS Big Data

DECEMBER 5, 2024

To address this, we used the AWS performance testing framework for Apache Kafka to evaluate the theoretical performance limits. We conducted performance and capacity tests on the test MSK clusters that had the same cluster configurations as our development and production clusters.

Metrics

Metrics Dashboards Testing Optimization

How I Broke Our SLA and Delighted Our Customer

DataKitchen

MAY 17, 2025

An embedded test had failed. And I was tempted, so tempted, as the clock kept ticking, to disable the test and let it go. Then it dawned on me that this test wasn't even ours. These tests weren't easy to define or implement. We trusted stakeholders to define critical business rules that would test for major problems.

Data Quality

Data Quality Testing Data Warehouse Dashboards

Easily Build an Optimization App and Empower Your Data

Speaker: Gertjan de Lange

Discover how the AIMMS IDE allows you to analyze, build, and test a model. In this short demo, you will: See how to quickly model sets, parameters, variables, and a multitude of constraints that will define your mathematical formulation. Experience how efficient you can be when you fit your model with actionable data.

Optimization
