Metrics, Testing and Uncertainty

Escaping POC Purgatory: Evaluation-Driven Development for AI Systems

O'Reilly on Data

MARCH 25, 2025

We've seen this across dozens of companies, and the teams that break out of this trap all adopt some version of Evaluation-Driven Development (EDD), where testing, monitoring, and evaluation drive every decision from the start. What breaks your app in production isn't always what you tested for in dev! The way out? How do we do so?

Testing

Testing Data-driven Software Measurement

The Lean Analytics Cycle: Metrics > Hypothesis > Experiment > Act

Occam's Razor

APRIL 8, 2013

To win in business you need to follow this process: Metrics > Hypothesis > Experiment > Act. We are far too enamored with data collection and reporting the standard metrics we love because others love them because someone else said they were nice so many years ago. That metric is tied to a KPI.

Metrics

Metrics KPI Analytics Key Performance Indicator

You Can’t Regulate What You Don’t Understand

O'Reilly on Data

JUNE 15, 2023

If we want prosocial outcomes, we need to design and report on the metrics that explicitly aim for those outcomes and measure the extent to which they have been achieved. And they are stress testing and “red teaming” them to uncover vulnerabilities. There is no simple way to solve the alignment problem.

Metrics

Metrics Reporting Measurement Finance

What you need to know about product management for AI

O'Reilly on Data

MARCH 31, 2020

Machine learning adds uncertainty. This has serious implications for software testing, versioning, deployment, and other core development processes. Underneath this uncertainty lies further uncertainty in the development process itself. Models within AI products change the same world they try to predict.

Management

Management Machine Learning Experimentation Metrics

AI Product Management After Deployment

O'Reilly on Data

OCTOBER 13, 2020

In Bringing an AI Product to Market, we distinguished the debugging phase of product development from pre-deployment evaluation and testing. During testing and evaluation, application performance is important, but not critical to success. require not only disclosure, but also monitored testing. Debugging AI Products.

Management

Management Machine Learning Metrics Modeling

3 ways to avoid the generative AI ROI doom loop

CIO Business Intelligence

NOVEMBER 12, 2024

He did not get to the point of 100% specificity and confidence about exactly how this makes him happier and more productive through a quick one-and-done test of a use case or two. Make ‘soft metrics’ matter. Imagine an experienced manager with an “open door policy.” Each workflow is aimed at a problem or opportunity to be solved.

ROI

ROI Uncertainty Metrics Testing

Bridging the Gap: How ‘Data in Place’ and ‘Data in Use’ Define Complete Data Observability

DataKitchen

SEPTEMBER 21, 2023

The uncertainty of not knowing where data issues will crop up next and the tiresome game of ‘who’s to blame’ when pinpointing the failure. In the context of Data in Place, validating data quality automatically with Business Domain Tests is imperative for ensuring the trustworthiness of your data assets.

Testing

Testing Data Quality Predictive Modeling Metrics

Uncertainties: Statistical, Representational, Interventional

The Unofficial Google Data Science Blog

DECEMBER 14, 2021

by AMIR NAJMI & MUKUND SUNDARARAJAN. Data science is about decision making under uncertainty. Some of that uncertainty is the result of statistical inference, i.e., using a finite sample of observations for estimation. But there are other kinds of uncertainty, at least as important, that are not statistical in nature.

Uncertainty

Uncertainty Statistics Measurement Cost-Benefit

Leveraging Data Science To Grow And Manage Your Team

Smart Data Collective

AUGUST 18, 2021

Although widely used, keyword scanning software alone simply doesn’t generate sufficient success metrics when sifting through candidate resumes. So, in this situation, you may devise and implement an online test designed to assess candidates on their basic skills and knowledge of their field of work. Speed up the recruitment process.

Data Science

Data Science Management Big Data Metrics

Why HR professionals struggle with big data

CIO Business Intelligence

FEBRUARY 20, 2025

This is due, on the one hand, to the uncertainty associated with handling confidential, sensitive data and, on the other hand, to a number of structural problems. If a database already exists, the available data must be tested and corrected. Aspects such as employee satisfaction and talent development are often neglected.

Big Data

Big Data Measurement Visualization Machine Learning

Integrate sparse and dense vectors to enhance knowledge retrieval in RAG using Amazon OpenSearch Service

AWS Big Data

SEPTEMBER 5, 2024

Although the absolute metrics of the sparse vector model can’t surpass those of the best dense vector models, it possesses unique and advantageous characteristics. In subsequent experiments, we input the context field into the index of OpenSearch as text content, and use the question field as a query for the retrieval test.

Metrics

Metrics Testing Experimentation Modeling

IT leader’s survival guide: 11 ways to thrive in the years ahead

CIO Business Intelligence

JUNE 8, 2022

Digital disruption, global pandemic, geopolitical crises, economic uncertainty — volatility has thrown into question time-honored beliefs about how best to lead IT. Thriving amid uncertainty means staying flexible, he argues. “The coming months are a leadership test for CIOs, and it’s a pass/fail grade.” Keep calm and lead on.

IT

IT Cost-Benefit Uncertainty Digital Transformation

In AI we trust? Why we Need to Talk About Ethics and Governance (part 2 of 2)

Cloudera

DECEMBER 3, 2021

Systems should be designed with bias, causality and uncertainty in mind. Uncertainty is a measure of our confidence in the predictions made by a system. We need to understand and provide the greatest human oversight on systems with the greatest levels of uncertainty. System Design. Human Judgement & Oversight. Model Drift.

Uncertainty

Uncertainty Measurement Metrics Risk

Humans-in-the-loop forecasting: integrating data science and business planning

The Unofficial Google Data Science Blog

DECEMBER 4, 2019

This classification is based on the purpose, horizon, update frequency and uncertainty of the forecast. A single model may also not shed light on the uncertainty range we actually face. For example, we may prefer one model to generate a range, but use a second scenario-based model to “stress test” the range.

Forecasting

Forecasting Data Science Statistics Uncertainty

Rebooting expectations to connect and lead in more meaningful ways

CIO Business Intelligence

SEPTEMBER 22, 2022

Seeing that remote working continues to be a pressing issue still finding its footing after nearly three years in beta testing, the work surrounding feasible solutions seems to compound as time goes on, with some intending a full return to office while others have forged the company future on remote models. Go for the answer you already know.

Uncertainty

Uncertainty Forecasting Visualization Big Data

Data Teams and Their Types of Data Journeys

DataKitchen

OCTOBER 2, 2023

Data Journeys track and monitor all levels of the data stack, from data to tools to code to tests across all critical dimensions. A Data Journey supplies real-time statuses and alerts on start times, processing durations, test results, and infrastructure events, among other metrics.

Data Quality

Data Quality Testing Uncertainty Data Enablement

8 ways to retain top developer talent

CIO Business Intelligence

MARCH 10, 2023

Because of this, the importance of code quality shows why project velocity is not a metric to be seen in isolation as it often is. Mitigate DX-killing red tape. The business longs for metrics and insights into what’s happening in the dark interior of software creation. But too much intrusion into developer workflow is a real DX killer.

Software

Software Metrics Uncertainty Sales

Product Management for AI

Domino Data Lab

JUNE 23, 2019

Ensure that product managers work on projects that matter to the business and/or are aligned to strategic company metrics. As a result, Skomoroch advocates getting “designers and data scientists, machine learning folks together and using real data and prototyping and testing” as quickly as possible. It is similar to R&D.

Management

Management Machine Learning Experimentation Metrics

What is ERP? Enterprise resource planning systems explained

CIO Business Intelligence

SEPTEMBER 27, 2022

Deploy the system: Prior to the final cutover, multiple activities have to be completed, including training of staff on the system, planning support to answer questions and resolve problems after the ERP is operational, testing the system, making the “Go live” decision in conjunction with the executive sponsor. Hidden costs of ERP.

Enterprise

Enterprise Cost-Benefit Manufacturing Software

Towards optimal experimentation in online systems

The Unofficial Google Data Science Blog

APRIL 23, 2024

Here, $X$ is a vector of tuning parameters that control the system's operating characteristics (e.g., the weight given to Likes in our video recommendation algorithm) while $Y$ is a vector of outcome measures such as different metrics of user experience (e.g., ... Crucially, it takes into account the uncertainty inherent in our experiments.

Experimentation

Experimentation Optimization Uncertainty Metrics

Why model calibration matters and how to achieve it

The Unofficial Google Data Science Blog

APRIL 19, 2021

To explain, let’s borrow a quote from Nate Silver’s The Signal and the Noise: One of the most important tests of a forecast — I would argue that it is the single most important one — is called calibration. Calibration and other considerations: Calibration is a desirable property, but it is not the only important metric.

Modeling

Modeling IT Metrics Testing
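
As an illustration of the calibration test mentioned in the excerpt above, here is a minimal Python sketch, not taken from the linked article. It bins predicted probabilities, compares each bin's mean prediction with the observed event rate, and summarizes the gap as an expected calibration error. The function names, the equal-width binning, and the choice of five bins are all assumptions made for this example.

```python
def calibration_table(probs, outcomes, n_bins=5):
    """Compare mean predicted probability with observed event rate per bin.

    probs: predicted probabilities in [0, 1]
    outcomes: observed labels, 0 or 1
    Returns a list of (mean_prediction, observed_rate, count) for non-empty bins.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # Equal-width bins; a prediction of exactly 1.0 goes in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    rows = []
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        obs_rate = sum(y for _, y in members) / len(members)
        rows.append((mean_pred, obs_rate, len(members)))
    return rows


def expected_calibration_error(probs, outcomes, n_bins=5):
    """Count-weighted average of |mean prediction - observed rate| over bins."""
    total = len(probs)
    table = calibration_table(probs, outcomes, n_bins)
    return sum(n / total * abs(mp - obs) for mp, obs, n in table)


if __name__ == "__main__":
    # Hypothetical forecasts and outcomes, for illustration only.
    forecasts = [0.1, 0.1, 0.9, 0.9]
    observed = [0, 0, 1, 1]
    for row in calibration_table(forecasts, observed):
        print(row)
    print(expected_calibration_error(forecasts, observed))
```

A well-calibrated forecaster has observed rates close to mean predictions in every bin. As the excerpt notes, calibration is not the only metric that matters: a model can be well calibrated and still discriminate poorly between cases.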

ITIL certification guide: Costs, requirements, levels, and paths

CIO Business Intelligence

JULY 7, 2023

ITIL Specialist High Velocity IT: In this module you’ll learn how to integrate methodologies such as agile and Lean with other technical skills, including cloud, automation, and automatic testing, to deliver rapid delivery of products and services.

Cost-Benefit

Cost-Benefit Strategy Management Uncertainty

The Case for Continuous Financial Planning after Covid-19

Jet Global

OCTOBER 21, 2020

During periods of uncertainty, this helps us plan for – and be ready to respond to – different outcomes. Communicate the key performance indicators: Identify the key metrics that senior management will use to monitor the business on a daily, weekly, and monthly basis. The past few months have shown the benefits of continuous planning.

Forecasting

Forecasting Finance Business Driver Key Performance Indicator

Viral, Social, Sentiment, Mobile: 4 Delightful Web Analytics Solutions

Occam's Razor

JULY 12, 2010

I love the data you saw in the very first screenshot, and I absolutely love this… Say it ain't so! :) Why is this cool?

Analytics

Analytics Measurement Metrics KPI

Estimating causal effects using geo experiments

The Unofficial Google Data Science Blog

MAY 31, 2016

Similarly, we could test the effectiveness of a search ad compared to showing only organic search results. This means it is possible to specify exactly in which geos an ad campaign will be served – and to observe the ad spend and the response metric at the geo level. They are non-overlapping geo-targetable regions.

Advertising

Advertising Testing Sales Statistics

Making Financial Planning a Continuous and Popular Activity

Jet Global

SEPTEMBER 10, 2020

Living through periods of rapid upheaval and uncertainty, like the recent pandemic, forces us to adapt quickly to new working practices. Imagine having real-time indicators consolidated across sales, marketing, operations, and HR—in addition to metrics on recent acquisitions and overall market trends.

Forecasting

Forecasting Finance Uncertainty Sales

Data scientist as scientist

The Unofficial Google Data Science Blog

OCTOBER 21, 2015

The beliefs of this community are always evolving, and the process of thoughtfully generating, testing, refuting and accepting ideas looks a lot like Science. Note also that this account does not involve ambiguity due to statistical uncertainty. the power grid, a streaming music service, the human body, the weather).

Slice and Dice

Slice and Dice Experimentation Data-driven Data Science

How guardrails allow enterprises to deploy safe, effective AI

CIO Business Intelligence

JULY 10, 2024

On top of this, Relex added instructions to its prompt to avoid answering any questions outside the company’s knowledge base, he says, and to express uncertainty when the question was at the limits of its knowledge or skills. Other hyperscalers also offer guardrails that work with their gen AI platforms. “It

Enterprise

Enterprise Risk Modeling Risk Management

Misadventures in experiments for growth

The Unofficial Google Data Science Blog

APRIL 16, 2019

Such decisions involve an actual hypothesis test on specific metrics (e.g. Often, an established product will have an overall evaluation criterion (OEC) that incorporates trade-offs among important metrics and between short- and long-term success. The metrics to measure the impact of the change might not yet be established.

Experimentation

Experimentation Sales Metrics Measurement

Measuring Validity and Reliability of Human Ratings

The Unofficial Google Data Science Blog

JULY 18, 2023

Once we’ve answered that, we will then define and use metrics to understand the quality of human-labeled data, along with a measurement framework that we call Cross-replication Reliability or xRR. We will follow the example of Janson and Olsson, and start from this generalized definition of the metric, which they call iota.

Measurement

Measurement Metrics Uncertainty Slice and Dice

Variance and significance in large-scale online services

The Unofficial Google Data Science Blog

JANUARY 14, 2016

Unlike experimentation in some other areas, LSOS experiments present a surprising challenge to statisticians — even though we operate in the realm of “big data”, the statistical uncertainty in our experiments can be substantial. We must therefore maintain statistical rigor in quantifying experimental uncertainty.

Experimentation

Experimentation Statistics Metrics Measurement

Predicting Movie Profitability and Risk at the Pre-production Phase

Insight

FEBRUARY 19, 2020

I held out 20% of this as a test set and used the remainder for training and validation. Building Models to Predict Movie Profitability: Here I use profitability as the metric of success for a film and define profitability as the return on investment (ROI). Scatterplot of the predicted ROI vs. the true ROI for the hold-out test set.

Risk

Risk ROI Modeling Metrics
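
The 20% hold-out and the ROI target described in the excerpt above can be sketched as follows. This is not the author's code. The ROI formula used here, (revenue - budget) / budget, is a common definition that the excerpt does not spell out, and the field names, the seed, and the helper names are hypothetical.

```python
import random


def roi(revenue, budget):
    """Return on investment, assuming ROI = (revenue - budget) / budget."""
    return (revenue - budget) / budget


def holdout_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and reserve test_fraction of them as a test set.

    Returns (train_rows, test_rows). The input list is left unchanged.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Made-up films, for illustration only.
    films = [{"id": i, "budget": 100.0, "revenue": 100.0 + 25.0 * i} for i in range(10)]
    for film in films:
        film["roi"] = roi(film["revenue"], film["budget"])

    train, test = holdout_split(films, test_fraction=0.2)
    print(len(train), len(test))
```

The test rows are set aside before any model fitting, so a comparison of predicted and true ROI on them reflects films the model has not seen.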

Take Advantage Of The Best Interactive & Effective Data Visualization Examples

datapine

SEPTEMBER 4, 2023

Your Chance: Want to test a powerful data visualization software? Not only is each flight color-coded by the airline, but this short movie-style visualization has transformed flight-based metrics into a piece of art that shows the path of each flight in action.

Interactive

Interactive Visualization Cost-Benefit Dashboards

Data Leaders Brief
