Experimentation, Measurement and Uncertainty

Escaping POC Purgatory: Evaluation-Driven Development for AI Systems

O'Reilly on Data

MARCH 25, 2025

ML apps needed to be developed through cycles of experimentation (as we’re no longer able to reason about how they’ll behave based on software specs). The skillset and the background of people building the applications were realigned: People who were at home with data and experimentation got involved! How will you measure success?

Testing

Testing Data-driven Software Measurement

What you need to know about product management for AI

O'Reilly on Data

MARCH 31, 2020

Machine learning adds uncertainty. Underneath this uncertainty lies further uncertainty in the development process itself. There are strategies for dealing with all of this uncertainty, starting with the proverb from the early days of Agile: “do the simplest thing that could possibly work.”

Management

Management Machine Learning Experimentation Metrics

Business Strategies for Deploying Disruptive Tech: Generative AI and ChatGPT

Rocket-Powered Data Science

FEBRUARY 15, 2023

Those F’s are: Fragility, Friction, and FUD (Fear, Uncertainty, Doubt). encouraging and rewarding) a culture of experimentation across the organization. Encourage and reward a Culture of Experimentation that learns from failure, “Test, or get fired! Test early and often. Expect continuous improvement.

Strategy

Strategy Experimentation Uncertainty Machine Learning

How to Set AI Goals

O'Reilly on Data

SEPTEMBER 15, 2020

Technical sophistication: Sophistication measures a team’s ability to use advanced tools and techniques (e.g., Technical competence: Competence measures a team’s ability to successfully deliver on initiatives and projects. Technical competence results in reduced risk and uncertainty.

Advertising

Advertising Cost-Benefit ROI Machine Learning

Uncertainties: Statistical, Representational, Interventional

The Unofficial Google Data Science Blog

DECEMBER 14, 2021

by AMIR NAJMI & MUKUND SUNDARARAJAN Data science is about decision making under uncertainty. Some of that uncertainty is the result of statistical inference, i.e., using a finite sample of observations for estimation. But there are other kinds of uncertainty, at least as important, that are not statistical in nature.

Uncertainty

Uncertainty Statistics Measurement Cost-Benefit

Towards optimal experimentation in online systems

The Unofficial Google Data Science Blog

APRIL 23, 2024

the weight given to Likes in our video recommendation algorithm) while $Y$ is a vector of outcome measures such as different metrics of user experience (e.g., Crucially, it takes into account the uncertainty inherent in our experiments. Figure 2: Spreading measurements out makes estimates of model (slope of line) more accurate.

Experimentation

Experimentation Optimization Uncertainty Metrics
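The excerpt's remark that spreading measurements out makes the slope estimate more accurate can be illustrated with the textbook standard error of an ordinary least squares slope, $\sigma / \sqrt{\sum_i (x_i - \bar{x})^2}$. The sketch below is only an illustration under stated assumptions, not the method from the linked post: the function name, the design points, and the noise level are all hypothetical.

```python
import numpy as np

def slope_standard_error(x, noise_sd):
    """Standard error of the OLS slope for design points x,
    assuming independent noise with standard deviation noise_sd."""
    x = np.asarray(x, dtype=float)
    return noise_sd / np.sqrt(np.sum((x - x.mean()) ** 2))

# Hypothetical designs: ten parameter settings bunched together
# versus the same number of settings spread over a 10x wider range.
narrow = np.linspace(0.45, 0.55, 10)
wide = np.linspace(0.0, 1.0, 10)

se_narrow = slope_standard_error(narrow, noise_sd=1.0)
se_wide = slope_standard_error(wide, noise_sd=1.0)

# The wider design gives the smaller standard error for the slope.
print(se_narrow, se_wide)
```

With the same number of measurements and the same noise, widening the range of settings by a factor of ten shrinks the slope's standard error by the same factor, because the sum of squared deviations in the denominator grows with the spread.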

AI Product Management After Deployment

O'Reilly on Data

OCTOBER 13, 2020

In an incident management blog post, Atlassian defines SLOs as: “the individual promises you’re making to that customer… SLOs are what set customer expectations and tell IT and DevOps teams what goals they need to hit and measure themselves against. While useful, these constructs are not beyond criticism.

Management

Management Machine Learning Metrics Modeling

The trinity of errors in applying confidence intervals: An exploration using Statsmodels

O'Reilly on Data

DECEMBER 9, 2019

Because of this trifecta of errors, we need dynamic models that quantify the uncertainty inherent in our financial estimates and predictions. Practitioners in all social sciences, especially financial economics, use confidence intervals to quantify the uncertainty in their estimates and predictions.

Statistics

Statistics Uncertainty Risk Marketing

The Lean Analytics Cycle: Metrics > Hypothesis > Experiment > Act

Occam's Razor

APRIL 8, 2013

First, you figure out what you want to improve; then you create an experiment; then you run the experiment; then you measure the results and decide what to do. For each of them, write down the KPI you're measuring, and what that KPI should be for you to consider your efforts a success. Measure and decide what to do.

Metrics

Metrics KPI Analytics Key Performance Indicator

Changing assignment weights with time-based confounders

The Unofficial Google Data Science Blog

JULY 22, 2020

Instead, we focus on the case where an experimenter has decided to run a full traffic ramp-up experiment and wants to use the data from all of the epochs in the analysis. When there are changing assignment weights and time-based confounders, this complication must be considered either in the analysis or the experimental design.

Experimentation

Experimentation Statistics Testing Knowledge Discovery

How to create a culture of innovation

CIO Business Intelligence

SEPTEMBER 12, 2022

Prioritize time for experimentation. It requires bold bets and a willingness to persevere despite setbacks, criticism, and uncertainty,” wrote McKinsey senior partners Laura Furstenthal and Erik Roth in a recent blog post. “By Here, they and others share seven ways to create and nurture a culture of innovation.

Experimentation

Experimentation Consulting Technology Machine Learning

Integrate sparse and dense vectors to enhance knowledge retrieval in RAG using Amazon OpenSearch Service

AWS Big Data

SEPTEMBER 5, 2024

Intuitively, for some extremely short user inputs, the vectors generated by dense vector models might have significant semantic uncertainty, where overlaying with a sparse vector model could be beneficial. Experimental data selection: For retrieval evaluation, we used the datasets from BeIR. We care more about the recall metric.

Metrics

Metrics Testing Experimentation Modeling

Misadventures in experiments for growth

The Unofficial Google Data Science Blog

APRIL 16, 2019

by MICHAEL FORTE Large-scale live experimentation is a big part of online product development. This means a small and growing product has to use experimentation differently and very carefully. This blog post is about experimentation in this regime. But these are not usually amenable to A/B experimentation.

Experimentation

Experimentation Sales Metrics Measurement

CIOs press ahead for gen AI edge — despite misgivings

CIO Business Intelligence

OCTOBER 18, 2023

If anything, 2023 has proved to be a year of reckoning for businesses, and IT leaders in particular, as they attempt to come to grips with the disruptive potential of this technology — just as debates over the best path forward for AI have accelerated and regulatory uncertainty has cast a longer shadow over its outlook in the wake of these events.

Risk

Risk Manufacturing Enterprise Technology

Product Management for AI

Domino Data Lab

JUNE 23, 2019

Skomoroch proposes that managing ML projects is challenging for organizations because shipping ML projects requires an experimental culture that fundamentally changes how many companies approach building and shipping software. These measurement-obsessed companies have an advantage when it comes to AI.

Management

Management Machine Learning Experimentation Metrics

Variance and significance in large-scale online services

The Unofficial Google Data Science Blog

JANUARY 14, 2016

Unlike experimentation in some other areas, LSOS experiments present a surprising challenge to statisticians — even though we operate in the realm of “big data”, the statistical uncertainty in our experiments can be substantial. We must therefore maintain statistical rigor in quantifying experimental uncertainty.

Experimentation

Experimentation Statistics Metrics Measurement

LSOS experiments: how I learned to stop worrying and love the variability

The Unofficial Google Data Science Blog

FEBRUARY 29, 2016

Despite a very large number of experimental units, the experiments conducted by LSOS cannot presume statistical significance of all effects they deem practically significant. The result is that experimenters can’t afford to be sloppy about quantifying uncertainty. At Google, we tend to refer to them as slices.

Experimentation

Experimentation Statistics Metrics Measurement

Getting ready for artificial general intelligence with examples

IBM Big Data Hub

APRIL 18, 2024

While leaders have some reservations about the benefits of current AI, organizations are actively investing in gen AI deployment, significantly increasing budgets, expanding use cases, and transitioning projects from experimentation to production. The AGI would need to handle uncertainty and make decisions with incomplete information.

Cost-Benefit

Cost-Benefit Manufacturing Modeling Interactive

Data scientist as scientist

The Unofficial Google Data Science Blog

OCTOBER 21, 2015

It is important to make clear distinctions among each of these, and to advance the state of knowledge through concerted observation, modeling and experimentation. Note also that this account does not involve ambiguity due to statistical uncertainty. We sliced and diced the experimental data in many many ways.

Slice and Dice

Slice and Dice Experimentation Data-driven Data Science

Estimating causal effects using geo experiments

The Unofficial Google Data Science Blog

MAY 31, 2016

It is important that we can measure the effect of these offline conversions as well. Panel studies make it possible to measure user behavior along with the exposure to ads and other online elements. Let's take a look at larger groups of individuals whose aggregate behavior we can measure. days or weeks).

Advertising

Advertising Testing Sales Statistics

IT leaders’ top 9 takeaways from 2024

CIO Business Intelligence

DECEMBER 23, 2024

AI investment and pressure grew upward: As AI has moved from emerging to mainstream, and organizations matured in their ability to harness AI’s potential over the past year or two, CEOs now expect less experimentation and more AI projects that deliver outcomes with measurable business value.

IT

IT Strategy Technology Experimentation

10 ways to kill your IT culture

CIO Business Intelligence

NOVEMBER 12, 2024

Measure the impact of software developers by how teams meet release commitments, promote design peer reviews, and demonstrate the impacts of experimentation. When changes are made without transparency or input from the team, it breeds uncertainty and resentment.

IT

IT Digital Transformation Management Experimentation
