What breaks your app in production isn't always what you tested for in dev. The way out? We've seen this across dozens of companies, and the teams that break out of this trap all adopt some version of Evaluation-Driven Development (EDD), where testing, monitoring, and evaluation drive every decision from the start.
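As a deliberately minimal sketch of what EDD can look like in practice, the Python below gates a release on a golden-dataset eval suite; the `model` stub, the cases, and the threshold are hypothetical illustrations, not details from the article:

```python
# Minimal sketch of an evaluation-driven loop: a golden dataset of
# inputs and expected behaviors is scored on every change, so the
# eval suite (not ad-hoc manual checks) gates each release.
GOLDEN_CASES = [
    {"input": "Cancel my subscription", "must_contain": "cancel"},
    {"input": "What's my balance?", "must_contain": "balance"},
]

def model(prompt: str) -> str:
    # Stand-in for the real application call (LLM, chain, etc.).
    return f"Sure, I can help you {prompt.lower()}"

def run_evals(threshold: float = 0.9) -> bool:
    passed = sum(
        case["must_contain"] in model(case["input"]).lower()
        for case in GOLDEN_CASES
    )
    score = passed / len(GOLDEN_CASES)
    print(f"eval pass rate: {score:.0%}")
    return score >= threshold  # block the release if evals regress

if __name__ == "__main__":
    assert run_evals(), "EDD gate failed: fix regressions before shipping"
```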
Those F’s are: Fragility, Friction, and FUD (Fear, Uncertainty, and Doubt). Foster (encourage and reward) a culture of experimentation across the organization. Keep it agile, with short design, develop, test, release, and feedback cycles; keep it lean, and build on incremental changes. Test early and often. Launch the chatbot.
Machine learning adds uncertainty. This has serious implications for software testing, versioning, deployment, and other core development processes. Underneath this uncertainty lies further uncertainty in the development process itself. Models within AI products change the same world they try to predict.
Technical competence results in reduced risk and uncertainty. Results are typically achieved through a scientific process of discovery, exploration, and experimentation, and these processes are not always predictable. There’s a lot of overlap between these factors.
In Bringing an AI Product to Market, we distinguished the debugging phase of product development from pre-deployment evaluation and testing. During testing and evaluation, application performance is important, but not critical to success.
Pete indicates, in both his November 2018 and Strata London talks, that ML requires a more experimental approach than traditional software engineering. It is more experimental because it is “an approach that involves learning from data instead of programmatically following a set of human rules.”
Crucially, it takes into account the uncertainty inherent in our experiments. To find optimal values of two parameters experimentally, the obvious strategy would be to experiment with and update them in separate, sequential stages. In this section we’ll discuss how we approach these two kinds of uncertainty with QCQP.
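To make that sequential strategy concrete, here is a toy Python sketch of tuning two parameters one at a time under noisy measurements; the objective, grid, and repetition count are invented for illustration, and this is not the QCQP formulation the post goes on to develop:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noisy objective: true optimum at (x, y) = (2.0, -1.0).
# Each "experiment" returns the objective plus measurement noise.
def run_experiment(x, y):
    return -(x - 2.0) ** 2 - (y + 1.0) ** 2 + rng.normal(scale=0.5)

def best_setting(candidates, fixed, vary="x", n_reps=20):
    """Try each candidate value for one parameter (holding the other
    fixed), averaging repeated noisy experiments to damp uncertainty."""
    means = []
    for c in candidates:
        args = (c, fixed) if vary == "x" else (fixed, c)
        means.append(np.mean([run_experiment(*args) for _ in range(n_reps)]))
    return candidates[int(np.argmax(means))]

# Sequential (coordinate-wise) strategy: tune x first, then y.
grid = np.linspace(-4, 4, 17)
x_best = best_setting(grid, fixed=0.0, vary="x")
y_best = best_setting(grid, fixed=x_best, vary="y")
print(f"sequential estimate: x={x_best:.2f}, y={y_best:.2f}")
```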
by AMIR NAJMI & MUKUND SUNDARARAJAN Data science is about decision making under uncertainty. Some of that uncertainty is the result of statistical inference, i.e., using a finite sample of observations for estimation. But there are other kinds of uncertainty, at least as important, that are not statistical in nature.
Because of this trifecta of errors, we need dynamic models that quantify the uncertainty inherent in our financial estimates and predictions. Practitioners in all social sciences, especially financial economics, use confidence intervals to quantify the uncertainty in their estimates and predictions.
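For instance, a standard 95% confidence interval for a mean estimated from a finite sample can be computed like so; the simulated returns are purely illustrative, not real data:

```python
import numpy as np
from scipy import stats

# Hypothetical sample of daily returns; a real analysis would use
# actual data rather than this simulated draw.
rng = np.random.default_rng(42)
returns = rng.normal(loc=0.001, scale=0.02, size=250)

mean = returns.mean()
sem = stats.sem(returns)  # standard error of the mean
# 95% confidence interval for the mean, using the t distribution
# since the population variance is estimated from the sample.
lo, hi = stats.t.interval(0.95, df=len(returns) - 1, loc=mean, scale=sem)
print(f"mean daily return: {mean:.5f}, 95% CI: [{lo:.5f}, {hi:.5f}]")
```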
Another reason to use ramp-up is to test whether a website's infrastructure can handle deploying a new arm to all of its users. The website wants to make sure it has the infrastructure to handle the feature while testing whether engagement increases enough to justify the infrastructure. We offer two examples where this may be the case.
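A common way to implement such a ramp-up is deterministic hash-based bucketing, so each user's assignment stays stable as the exposure percentage grows; the function and feature names in this sketch are hypothetical:

```python
import hashlib

def in_rollout(user_id: str, feature: str, ramp_pct: float) -> bool:
    """Deterministically assign a user to a feature at the current
    ramp percentage. Hashing (feature, user_id) gives a stable
    bucket in [0, 100), so raising ramp_pct only ever adds users."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10000 / 100.0  # bucket in [0, 100)
    return bucket < ramp_pct

# Ramp schedule: 1% -> 10% -> 50% -> 100%, watching infrastructure and
# engagement metrics at each stage before increasing exposure.
for pct in (1, 10, 50, 100):
    exposed = sum(in_rollout(str(uid), "new_arm", pct) for uid in range(100000))
    print(f"ramp {pct:>3}%: {exposed} of 100000 users exposed")
```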
How can enterprises attain these in the face of uncertainty? Rogers: This is one of two fundamental challenges of corporate innovation — managing innovation under high uncertainty and managing innovation far from the core — that I have studied in my work advising companies and try to tackle in my new book The Digital Transformation Roadmap.
Sometimes, we escape the clutches of this suboptimal existence and do pick good metrics or engage in simple A/B testing. Testing out a new feature. Identify, hypothesize, test, react. But at the same time, they had to have a real test of an actual feature. You don’t need a beautiful beast to go out and test.
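The "identify, hypothesize, test, react" loop usually bottoms out in a significance test. For a conversion-rate A/B test, a two-proportion z-test is one common choice; the counts below are made up:

```python
import math
from scipy.stats import norm

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates
    between control (A) and treatment (B)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - norm.cdf(abs(z)))
    return p_b - p_a, z, p_value

# Hypothetical results: 480/10000 conversions in A, 532/10000 in B.
lift, z, p = two_proportion_ztest(480, 10_000, 532, 10_000)
print(f"lift={lift:.4f}, z={z:.2f}, p={p:.4f}")
```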
Intuitively, for some extremely short user inputs, the vectors generated by dense vector models might carry significant semantic uncertainty, so overlaying a sparse vector model could be beneficial. For retrieval evaluation, we used datasets from BeIR. How to combine dense and sparse?
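One simple and widely used way to combine them is a convex combination of normalized dense and sparse scores; this sketch is a generic illustration, not the specific fusion method or weights from the article:

```python
import numpy as np

def hybrid_score(dense_scores, sparse_scores, alpha=0.7):
    """Convex combination of dense and sparse relevance scores.
    Scores are min-max normalized per query so the two retrievers
    are on a comparable scale; alpha is a tunable weight."""
    def norm(s):
        s = np.asarray(s, dtype=float)
        span = s.max() - s.min()
        return (s - s.min()) / span if span > 0 else np.zeros_like(s)
    return alpha * norm(dense_scores) + (1 - alpha) * norm(sparse_scores)

# Hypothetical per-document scores for one query.
dense = [0.82, 0.75, 0.60, 0.58]   # e.g., cosine similarity
sparse = [12.1, 3.4, 9.8, 1.2]     # e.g., BM25
ranked = np.argsort(-hybrid_score(dense, sparse))
print("documents ranked by hybrid score:", ranked.tolist())
```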
The use of AI-generated code is still in an experimental phase for many organizations due to numerous uncertainties such as its impact on security, data privacy, copyright, and more. Best practices and education Currently, there are no established best practices for leveraging AI in software development.
He was talking about something we call the ‘compound uncertainty’ that must be navigated when we want to test and introduce a real breakthrough digital business idea. “You can connect social groups, economic groups and communities, which would be extraordinarily cumbersome and time-consuming in bigger societies”.
“These circumstances have induced uncertainty across our entire business value chain,” says Venkat Gopalan, chief digital, data and technology officer at Belcorp. That, in turn, led to a slew of manual processes for descriptive analysis of the test results. “This allowed us to derive insights more easily.”
Skomoroch proposes that managing ML projects is challenging for organizations because shipping ML projects requires an experimental culture that fundamentally changes how many companies approach building and shipping software. Yet, this challenge is not insurmountable.
If anything, 2023 has proved to be a year of reckoning for businesses, and IT leaders in particular, as they attempt to come to grips with the disruptive potential of this technology — just as debates over the best path forward for AI have accelerated and regulatory uncertainty has cast a longer shadow over its outlook in the wake of these events.
by MICHAEL FORTE Large-scale live experimentation is a big part of online product development. This means a small and growing product has to use experimentation differently and very carefully. This blog post is about experimentation in this regime. Such decisions involve an actual hypothesis test on specific metrics.
Unlike experimentation in some other areas, LSOS experiments present a surprising challenge to statisticians — even though we operate in the realm of “big data”, the statistical uncertainty in our experiments can be substantial. We must therefore maintain statistical rigor in quantifying experimental uncertainty.
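A quick simulation shows why statistical uncertainty can remain substantial even at "big data" scale: with a heavy-tailed metric and a tiny true effect, the confidence interval on the lift can still be wide relative to the effect. All numbers here are hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical LSOS-style metric: heavy-tailed per-user values, a
# tiny true treatment effect, and millions of users per arm.
n = 2_000_000
control = rng.lognormal(mean=0.0, sigma=1.5, size=n)
treatment = rng.lognormal(mean=0.001, sigma=1.5, size=n)  # ~0.1% lift

t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
diff = treatment.mean() - control.mean()
se = np.sqrt(treatment.var(ddof=1) / n + control.var(ddof=1) / n)
print(f"diff={diff:.5f} +/- {1.96 * se:.5f} (95% CI), p={p_value:.3f}")
```

Even with two million users per arm, the interval here can straddle zero, which is exactly the rigor problem the excerpt describes.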
As vendors add generative AI to their enterprise software offerings, and as employees test out the tech, CIOs must advise their colleagues on the pros and cons of gen AI’s use as well as the potential consequences of banning or limiting it. There’s a lot of uncertainty. People are thinking, ‘How is this going to affect my career?’
In the last few years, businesses have experienced disruptions and uncertainty on an unprecedented scale. However, hand-coding, testing, evaluating and deploying highly accurate models is a tedious and time-consuming process.
While leaders have some reservations about the benefits of current AI, organizations are actively investing in gen AI deployment, significantly increasing budgets, expanding use cases, and transitioning projects from experimentation to production. The AGI would need to handle uncertainty and make decisions with incomplete information.
The beliefs of this community are always evolving, and the process of thoughtfully generating, testing, refuting and accepting ideas looks a lot like Science. It is important to make clear distinctions among each of these, and to advance the state of knowledge through concerted observation, modeling and experimentation.
Similarly, we could test the effectiveness of a search ad compared to showing only organic search results. A geo experiment is an experiment where the experimental units are defined by geographic regions. A typical geo experiment consists of two distinct time periods: pretest and test.
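With pretest and test periods in hand, one basic readout is a difference-in-differences estimate: compare how treated geos changed across the two periods against how control geos changed. A minimal sketch with invented region assignments and revenue numbers:

```python
import numpy as np

# Hypothetical per-geo revenue in each period.
pretest = {"treat": np.array([100.0, 95.0, 110.0]),   # treated geos
           "ctrl":  np.array([102.0, 98.0, 107.0])}   # control geos
test    = {"treat": np.array([112.0, 104.0, 121.0]),
           "ctrl":  np.array([103.0, 99.0, 109.0])}

# Change in each arm between the pretest and test periods...
delta_treat = test["treat"].mean() - pretest["treat"].mean()
delta_ctrl = test["ctrl"].mean() - pretest["ctrl"].mean()
# ...and the difference-in-differences estimate of the ad's effect,
# which nets out shared seasonality captured by the control geos.
effect = delta_treat - delta_ctrl
print(f"estimated lift per geo: {effect:.2f}")
```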
And while it's beyond the scope of this article, the applicable knowledge gained through our hands-on experimentation with genAI was head and shoulders above simple web searches. He specializes in removing fear, uncertainty, and doubt from strategic decision-making through empirical data and market sensing.