What breaks your app in production isn't always what you tested for in dev! The way out? We've seen this across dozens of companies, and the teams that break out of this trap all adopt some version of Evaluation-Driven Development (EDD), where testing, monitoring, and evaluation drive every decision from the start.
Product Managers are responsible for the successful development, testing, release, and adoption of a product, and for leading the team that implements those milestones. Without clarity in metrics, it’s impossible to do meaningful experimentation. Ongoing monitoring of critical metrics is yet another form of experimentation.
This post is a primer on the delightful world of testing and experimentation (A/B, Multivariate, and a new term from me: Experience Testing). Experimentation and testing help us figure out where we are wrong, quickly and repeatedly, and if you think about it, that is a great thing for our customers and for our employers.
AI PMs should enter feature development and experimentation phases only after deciding what problem they want to solve as precisely as possible, and placing the problem into one of these categories. Experimentation: It’s just not possible to create a product by building, evaluating, and deploying a single model.
Speaker: Teresa Torres, Internationally Acclaimed Author, Speaker, and Coach at ProductTalk.org
Industry-wide, product teams have adopted discovery practices like customer interviews and experimentation merely for end-user satisfaction. As a result, many of us are still stuck in a project-world rut: research, usability testing, engineering, and A/B testing, ad nauseam.
Let’s start by considering the job of a non-ML software engineer: writing traditional software deals with well-defined, narrowly-scoped inputs, which the engineer can exhaustively and cleanly model in the code. Not only is data larger, but models—deep learning models in particular—are much larger than before.
This article was published as a part of the Data Science Blogathon. Introduction to Statistics: Statistics is a type of mathematical analysis that employs quantified models and representations to analyse a set of experimental data or real-world studies. Data processing is […].
Despite critics, most, if not all, vendors offering coding assistants are now moving toward autonomous agents, although full AI coding independence is still experimental, Walsh says. “With existing, human-written tests you just loop through generated code, feeding the errors back in, until you get to a success state.”
While generative AI has been around for several years, the arrival of ChatGPT (a conversational AI tool for all business occasions, built and trained from large language models) has been like a brilliant torch brought into a dark room, illuminating many previously unseen opportunities. So, if you have 1 trillion data points (e.g.,
Testing and Data Observability. DataOps needs a directed graph-based workflow that contains all the data access, integration, model, and visualization steps in the data analytic production process. It orchestrates complex pipelines, toolchains, and tests across teams, locations, and data centers.
While genAI has been a hot topic for the past couple of years, organizations have largely focused on experimentation. In 2025, that's going to change. Like any new technology, organizations typically need to upskill existing talent or work with trusted technology partners to continuously tune and integrate their AI foundation models.
There is a tendency to think experimentation and testing are optional. Just don't fall for their bashing of all other vendors, or their silly, false claims of "superiority" in terms of running 19 billion combinations of tests, or the bonus feature of helping you into your underwear each morning. And I meant every word of it.
than multi-channel attribution modeling. By the time you are done with this post you'll have complete knowledge of what's ugly and bad when it comes to attribution modeling. You'll know how to use the good model, even if it is far from perfect. Multi-Channel Attribution Models. Linear Attribution Model.
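To make the idea behind a linear attribution model concrete (equal credit to every channel in a converting user's path), here is a minimal Python sketch; the channel names and conversion value are invented for the example, not taken from the post.

    # Hypothetical example: split one conversion's value equally across the
    # channels that appeared in the user's path (linear attribution).
    def linear_attribution(touchpoints, conversion_value):
        credit = conversion_value / len(touchpoints)
        attribution = {}
        for channel in touchpoints:
            attribution[channel] = attribution.get(channel, 0.0) + credit
        return attribution

    # Example path: the user saw a display ad, clicked a search ad, then returned directly.
    print(linear_attribution(["display", "paid_search", "direct"], 90.0))
    # -> {'display': 30.0, 'paid_search': 30.0, 'direct': 30.0}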
Proof that even the most rigid of organizations are willing to explore generative AI arrived this week when the US Department of the Air Force (DAF) launched an experimental initiative aimed at Guardians, Airmen, civilian employees, and contractors. It is not training the model, nor are responses refined based on any user inputs.
Instead of writing code with hard-coded algorithms and rules that always behave in a predictable manner, ML engineers collect a large number of examples of input and output pairs and use them as training data for their models. This has serious implications for software testing, versioning, deployment, and other core development processes.
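To make that contrast concrete, here is a small, purely illustrative sketch (using scikit-learn with made-up example pairs): instead of encoding the decision as a fixed rule, the behaviour is learned from input/output examples, which is why data has to be versioned and tested alongside code.

    from sklearn.linear_model import LogisticRegression

    # Hard-coded rule: behaviour is fixed and predictable.
    def is_long_message(length):
        return length > 100

    # ML approach: learn a similar decision from labelled examples.
    # Inputs are message lengths, outputs are 0/1 labels (made-up data).
    X = [[20], [35], [80], [120], [150], [300]]
    y = [0, 0, 0, 1, 1, 1]

    model = LogisticRegression().fit(X, y)
    print(model.predict([[90], [200]]))  # behaviour depends on the training data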
Similarly, in “ Building Machine Learning Powered Applications: Going from Idea to Product ,” Emmanuel Ameisen states: “Indeed, exposing a model to users in production comes with a set of challenges that mirrors the ones that come with debugging a model.”. Debugging AI Products.
Generative Design. Generative design is a new approach to product development that uses artificial intelligence to generate and test many possible designs. These patterns could then be used as the basis for additional experimentation by scientists or engineers. Automated Testing of Features. Quality Assurance.
It covers essential topics like artificial intelligence, our use of data models, our approach to technical debt, and the modernization of legacy systems. This initiative offers a safe environment for learning and experimentation. We are also testing it with engineering. We’ve structured our approach into phases.
As they look to operationalize lessons learned through experimentation, they will deliver short-term wins and successfully play the gen AI — and other emerging tech — long game,” Leaver said. The rest of their time is spent creating designs, writing tests, fixing bugs, and meeting with stakeholders. “So
Develop/execute regression testing. Test data management and other functions provided ‘as a service’. The center of excellence (COE) model leverages the DataOps team to solve real-world challenges. Examples of technologies that can be delivered ‘as a service’ include: source code control repository. Deploy to production.
It’s important to understand that ChatGPT is not actually a language model. It’s a convenient user interface built around one specific language model, GPT-3.5, with specialized training. GPT-3.5 is one of a class of language models that are sometimes called “large language models” (LLMs)—though that term isn’t very helpful.
In my book, I introduce the Technical Maturity Model: I define technical maturity as a combination of three factors at a given point of time. Outputs from trained AI models include numbers (continuous or discrete), categories or classes (e.g., spam or not-spam), probabilities, groups/segments, or a sequence (e.g.,
Two years of experimentation may have given rise to several valuable use cases for gen AI, but during the same period, IT leaders have also learned that the new, fast-evolving technology isn't something to jump into blindly. The next thing is to make sure they have an objective way of testing the outcome and measuring success.
Customers maintain multiple MWAA environments to separate development stages, optimize resources, manage versions, enhance security, ensure redundancy, customize settings, improve scalability, and facilitate experimentation. This approach offers greater flexibility and control over workflow management. The introduction of mw1.micro
Fractal’s recommendation is to take an incremental, test and learn approach to analytics to fully demonstrate the program value before making larger capital investments. It is also important to have a strong test and learn culture to encourage rapid experimentation. What is the most common mistake people make around data?
In recent years, we have witnessed a tidal wave of progress and excitement around large language models (LLMs) such as ChatGPT and GPT-4. The No Test Gaps Principle: Under the No Test Gaps Principle, it is unacceptable that LLMs are not tested holistically with a reproducible test suite before deployment.
Yehoshua, I've covered this topic in detail in this blog post: Multi-Channel Attribution: Definitions, Models and a Reality Check. I explain three different models (Online to Store, Across Multiple Devices, Across Digital Channels) and for each I've highlighted: 1. What's possible to measure.
From budget allocations to model preferences and testing methodologies, the survey unearths the areas that matter most to large, medium, and small companies, respectively. Medium companies Medium-sized companies—501 to 5,000 employees—were characterized by agility and a strong focus on GenAI experimentation.
They’ve also been using low-code and gen AI to quickly conceive, build, test, and deploy new customer-facing apps and experiences. In a fiercely competitive industry, where CX is critical to differentiation, this approach has enabled them to build and test new innovations about 10 times faster than traditional development.
But continuous deployment isn’t always appropriate for your business , stakeholders don’t always understand the costs of implementing robust continuous testing , and end-users don’t always tolerate frequent app deployments during peak usage. CrowdStrike recently made the news about a failed deployment impacting 8.5
Unfortunately, a common challenge that many industry people face includes battling “ the model myth ,” or the perception that because their work includes code and data, their work “should” be treated like software engineering. These steps also reflect the experimental nature of ML product management.
Sometimes, we escape the clutches of this suboptimal existence and do pick good metrics or engage in simple A/B testing. Let's listen in as Alistair discusses the lean analytics model… The Lean Analytics Cycle is a simple, four-step process that shows you how to improve a part of your business. Testing out a new feature.
Experiments, Parameters and Models. At YouTube, the relationships between system parameters and metrics often seem simple — straight-line models sometimes fit our data well. That is true generally, not just in these experiments — spreading measurements out is generally better, if the straight-line model is a priori correct.
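A quick way to see why spreading measurements out helps under a straight-line model is the textbook formula for the variance of the fitted slope; this sketch (with invented parameter values, not data from the post) compares clustered and spread-out settings:

    import numpy as np

    # Illustrative sketch: under a straight-line model y = a + b*x + noise, the
    # variance of the fitted slope is sigma^2 / sum((x - mean(x))^2), so spreading
    # the parameter settings x further apart shrinks that variance.
    def slope_variance(x, sigma=1.0):
        x = np.asarray(x, dtype=float)
        return sigma**2 / np.sum((x - x.mean())**2)

    clustered = [0.45, 0.50, 0.55]   # parameter values close together (made-up numbers)
    spread    = [0.10, 0.50, 0.90]   # same number of measurements, spread out

    print(slope_variance(clustered))  # larger variance
    print(slope_variance(spread))     # smaller variance -> more precise slope estimate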
Cloud maturity models are a useful tool for addressing these concerns, grounding organizational cloud strategy and proceeding confidently in cloud adoption with a plan. Cloud maturity models (or CMMs) are frameworks for evaluating an organization’s cloud adoption readiness on both a macro and individual service level.
Bonus #2: The Askers-Pukers Business Model. Hypothesis development and design of experimentation. Ok, maybe statistical modeling smells like an analytical skill. If these 50 pass the sniff test, send the survey. Three thoughts that explain the Econsultancy/Lynchpin graph.
by HENNING HOHNHOLD, DEIRDRE O'BRIEN, and DIANE TANG In this post we discuss the challenges in measuring and modeling the long-term effect of ads on user behavior. We describe experiment designs which have proven effective for us and discuss the subtleties of trying to generalize the results via modeling.
Our mental models of what constitutes a high-performance team have evolved considerably over the past five years. Post-pandemic, high-performance teams excelled at remote and hybrid working models, were more empathetic to individual needs, and leveraged automation to reduce manual work.
Another reason to use ramp-up is to test if a website's infrastructure can handle deploying a new arm to all of its users. The website wants to make sure they have the infrastructure to handle the feature while testing if engagement increases enough to justify the infrastructure. We offer two examples where this may be the case.
Sandeep Davé knows the value of experimentation as well as anyone. As chief digital and technology officer at CBRE, Davé recognized early that the commercial real estate industry was ripe for AI and machine learning enhancements, and he and his team have tested countless use cases across the enterprise ever since.
In the context of Retrieval-Augmented Generation (RAG), knowledge retrieval plays a crucial role, because the effectiveness of retrieval directly impacts the maximum potential of large language model (LLM) generation. document-only) ~ 20% (bi-encoder) higher NDCG@10, comparable to the TAS-B dense vector model.
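For readers new to RAG, the retrieval step can be sketched generically as "embed the query, score it against embedded documents, pass the best match to the LLM"; everything below (the toy embed() function and the sample documents) is invented for illustration and is not the OpenSearch/TAS-B setup the excerpt refers to.

    import numpy as np

    # Toy stand-in for an embedding model: character-frequency vector.
    # Real systems use a trained dense or sparse encoder plus a vector index.
    def embed(text):
        vec = np.zeros(26)
        for ch in text.lower():
            if ch.isalpha():
                vec[ord(ch) - ord('a')] += 1
        norm = np.linalg.norm(vec)
        return vec / norm if norm else vec

    documents = ["OpenSearch supports sparse retrieval", "LLMs generate text from prompts"]
    doc_vectors = [embed(d) for d in documents]

    query = "how does sparse retrieval work"
    scores = [float(np.dot(embed(query), v)) for v in doc_vectors]
    best = documents[int(np.argmax(scores))]
    print(best)  # retrieved context that would be passed to the LLM for generation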
We present data from Google Cloud Platform (GCP) as an example of how we use A/B testing when users are connected. Experimentation on networks: A/B testing is a standard method of measuring the effect of changes by randomizing samples into different treatment groups. This simulation is based on the actual user network of GCP.
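As a generic illustration of the randomization step (not the networked-experiment design the post is about), a common pattern is to assign each user to an arm with a deterministic hash so the same user always sees the same variant; the experiment and arm names below are made up.

    import hashlib

    # Generic sketch: stable assignment of users to experiment arms via hashing.
    def assign_arm(user_id, experiment="exp_checkout_v2", arms=("control", "treatment")):
        digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % len(arms)
        return arms[bucket]

    print(assign_arm("user_123"))  # same user always lands in the same arm for this experiment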
We build models to test our understanding, but these models are not “one and done.” In ML, the learning cycle is sometimes called backpropagation, where the errors (inaccurate predictions) of our models are fed back into adjusting the model’s input parameters in a way that aims to improve the output accuracy. (3)
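As a rough, generic illustration of that error-feedback loop (a single-parameter gradient-descent step on made-up data, not the article's own example), the prediction error is used to nudge the parameter in the direction that reduces it:

    # Toy sketch of the error-feedback loop: fit y ~ w*x by gradient descent.
    data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # (input, target) pairs, made up
    w = 0.0
    learning_rate = 0.05

    for step in range(200):
        # Gradient of mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= learning_rate * grad  # errors fed back to adjust the parameter

    print(round(w, 2))  # ends up close to 2, the slope implied by the data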
The exam tests general knowledge of the platform and applies to multiple roles, including administrator, developer, data analyst, data engineer, data scientist, and system architect. Candidates for the exam are tested on ML, AI solutions, NLP, computer vision, and predictive analytics.
Data scientists at Bayer have developed several proofs of concept of generative AI models on the new platform that remain in discovery and evaluation phase for “efficacy,” McQueen says, adding that the models won’t be in production until 2025. “The R&D pipeline is pretty highly confidential at this point,” he says. “It’s additive.”