The Evolution of Expectations
For years, the AI world was driven by scaling laws: the empirical observation that larger models and bigger datasets led to proportionally better performance. This fueled a belief that simply making models bigger would solve deeper issues like accuracy, understanding, and reasoning.
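As a rough sketch of what "scaling laws" means formally (the functional form below comes from Kaplan et al.'s 2020 language-model study, not from this excerpt): test loss falls as a power law in parameter count N.

```latex
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}
```

Here $N_c$ and $\alpha_N$ are empirically fitted constants; the practical reading is simply that doubling model size buys a predictable but diminishing reduction in loss.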
What breaks your app in production isn't always what you tested for in dev. The way out? We've seen this across dozens of companies, and the teams that break out of this trap all adopt some version of Evaluation-Driven Development (EDD), where testing, monitoring, and evaluation drive every decision from the start.
A look at the landscape of tools for building and deploying robust, production-ready machine learning models. We are also beginning to see researchers share sample code written in popular open source libraries, and some even share pre-trained models. Model development. Model governance. Source: Ben Lorica.
Data Observability and Data Quality Testing Certification Series
We are excited to invite you to a free four-part webinar series that will elevate your understanding and skills in Data Observability and Data Quality Testing. Register for free today and take the first step toward mastering data observability and quality testing!
Not least is the broadening realization that ML models can fail. And that’s why model debugging, the art and science of understanding and fixing problems in ML models, is so critical to the future of ML. Because all ML models make mistakes, everyone who cares about ML should also care about model debugging. [1]
Kevlin Henney and I were riffing on some ideas about GitHub Copilot, the tool for automatically generating code based on GPT-3's language model, trained on the body of code that's in GitHub. We know how to test whether or not code is correct (at least up to a certain limit). First, we wondered about code quality.
Product Managers are responsible for the successful development, testing, release, and adoption of a product, and for leading the team that implements those milestones. When a measure becomes a target, it ceases to be a good measure ( Goodhart’s Law ). You must detect when the model has become stale, and retrain it as necessary.
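To make the staleness point concrete, here is a minimal sketch (all names are ours, not from the excerpt) of one common approach: track accuracy on a rolling window of recent labeled traffic and flag the model for retraining when it drops below the deployment-time baseline.

```python
# Illustrative staleness detector: compare rolling live accuracy
# against the accuracy measured at deployment time.
from collections import deque

class StalenessMonitor:
    def __init__(self, baseline_accuracy, window=500, tolerance=0.05):
        self.baseline = baseline_accuracy      # accuracy at deployment time
        self.window = deque(maxlen=window)     # rolling record of hits/misses
        self.tolerance = tolerance             # allowed drop before retraining

    def record(self, prediction, actual):
        self.window.append(prediction == actual)

    def is_stale(self):
        if len(self.window) < self.window.maxlen:
            return False                       # not enough recent evidence yet
        rolling_acc = sum(self.window) / len(self.window)
        return rolling_acc < self.baseline - self.tolerance

monitor = StalenessMonitor(baseline_accuracy=0.92)
# In serving code: monitor.record(pred, label); if monitor.is_stale(): retrain
```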
Let's start by considering the job of a non-ML software engineer: writing traditional software means dealing with well-defined, narrowly scoped inputs that the engineer can exhaustively and cleanly model in code. Not only is data larger, but models—deep learning models in particular—are much larger than before.
Testing and Data Observability. DataOps needs a directed graph-based workflow that contains all the data access, integration, model, and visualization steps in the data analytic production process. It orchestrates complex pipelines, toolchains, and tests across teams, locations, and data centers.
The best way to ensure error-free execution of data production is through automated testing and monitoring. The DataKitchen Platform enables data teams to integrate testing and observability into data pipeline orchestrations. Automated tests work 24×7 to ensure that the results of each processing stage are accurate and correct.
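As one illustration of what such automated tests can look like (a generic sketch, not DataKitchen's actual API; the column names are invented), each processing stage asserts basic correctness invariants on its output and fails the run loudly:

```python
# Hedged sketch: validate a stage's output before passing it downstream.
import pandas as pd

def check_stage(df: pd.DataFrame, stage: str) -> pd.DataFrame:
    """Assert basic accuracy/correctness invariants on a stage's output."""
    assert not df.empty, f"{stage}: produced no rows"
    assert df["order_id"].is_unique, f"{stage}: duplicate order_id values"
    assert df["amount"].ge(0).all(), f"{stage}: negative amounts"
    return df

raw = pd.DataFrame({"order_id": [1, 2, 3], "amount": [9.5, 12.0, 3.25]})
cleaned = check_stage(raw.dropna(), stage="clean")  # runs on every execution
```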
As a result, many data teams were not as productive as they might be, with time and effort spent on manually troubleshooting data-quality issues and testing data pipelines. The ability to monitor and measure improvements in data quality relies on instrumentation.
Measuring developer productivity has long been a Holy Grail of business, and like the Holy Grail, it has been elusive. System, team, and individual productivity all need to be measured. The inner loop comprises activities directly related to creating the software product: coding, building, and unit testing.
In a joint study with Markus Westner and Tobias Held from the department of computer science and mathematics at the University of Regensburg, the 4C experts examined the topic by focusing on how the IT value proposition is measured, made visible, and communicated. They also tested the concept in a German mechanical engineering company.
Many farmers measure their yield in bags of rice, but what is “a bag of rice”? While RAG is conceptually simple—look up relevant documents and construct a prompt that tells the model to build its response from them—in practice, it’s more complex. Digital Green tests with “Golden QAs,” highly rated sets of questions and answers.
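To make "look up relevant documents and construct a prompt" concrete, here is a minimal RAG sketch; TF-IDF stands in for a production embedding index, and the model call itself is stubbed out:

```python
# Minimal retrieval-augmented generation pattern: retrieve, then prompt.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "Transplant rice seedlings 20-25 days after sowing.",
    "Apply nitrogen fertilizer in three split doses.",
    "Harvest when 80 percent of grains turn golden yellow.",
]
question = "When should I transplant rice seedlings?"

vectorizer = TfidfVectorizer().fit(docs + [question])
doc_vecs = vectorizer.transform(docs)
q_vec = vectorizer.transform([question])
best = cosine_similarity(q_vec, doc_vecs).argmax()   # top-1 retrieval

prompt = (
    "Answer using only the context below.\n"
    f"Context: {docs[best]}\n"
    f"Question: {question}"
)
print(prompt)  # would be sent to the LLM; answers compared against golden QAs
```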
Using the company's data in LLMs, AI agents, or other generative AI models creates more risk. Build up: databases that have grown in size, complexity, and usage build up the need to rearchitect the model and architecture to support that growth over time.
But a recent discussion of Google’s new Large Language Models (LLMs), and its claim that one of these models (named Gopher) has demonstrated reading comprehension approaching human performance , has spurred some thoughts about comprehension, ambiguity, intelligence, and will. Ethics is for beings who can make choices.
To address this, Gartner has recommended treating AI-driven productivity like a portfolio — balancing operational improvements with high-reward, game-changing initiatives that reshape business models. You must understand the cost components and pricing model options, and you need to know how to reduce these costs and negotiate with vendors.
Using the new scores, Apgar and her colleagues proved that many infants who initially seemed lifeless could be revived, with success or failure in each case measured by the difference between an Apgar score at one minute after birth, and a second score taken at five minutes. Books, in turn, get matching scores to reflect their difficulty.
Experimentation: It’s just not possible to create a product by building, evaluating, and deploying a single model. In reality, many candidate models (frequently hundreds or even thousands) are created during the development process. Modelling: The model is often misconstrued as the most important component of an AI product.
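A small sketch of that experimentation loop, using scikit-learn's grid search on toy data (every grid point is one "candidate model"; the search space here is illustrative):

```python
# Many candidates are trained and scored; only the best survives.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, random_state=0)
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10], "penalty": ["l2"]},
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)   # each grid point is one candidate model
print(search.best_params_, round(search.best_score_, 3))
```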
While generative AI has been around for several years, the arrival of ChatGPT (a conversational AI tool for all business occasions, built and trained from large language models) has been like a brilliant torch brought into a dark room, illuminating many previously unseen opportunities. So, if you have 1 trillion data points (e.g.,
Similarly, in “Building Machine Learning Powered Applications: Going from Idea to Product,” Emmanuel Ameisen states: “Indeed, exposing a model to users in production comes with a set of challenges that mirrors the ones that come with debugging a model.” Debugging AI Products.
Instead of seeing digital as a new paradigm for our business, we over-indexed on digitizing legacy models and processes and modernizing our existing organization. This only fortified traditional models instead of breaking down the walls that separate people and work inside our organizations. And it's testing us all over again.
While a lot of effort and content is now available, it tends to be high level, and work will still be required to create a governance model specifically for your organization. Governance is action, and there are many actions an organization can take to create and implement an effective AI governance model.
Not instant perfection: the NIPRGPT experiment is an opportunity to conduct real-world testing, measuring generative AI's computational efficiency, resource utilization, and security compliance to understand its practical applications. It is not training the model, nor are responses refined based on any user inputs.
The next thing is to make sure they have an objective way of testing the outcome and measuring success. Large software vendors are used to solving the integration problems that enterprises deal with on a daily basis, says Lee McClendon, chief digital and technology officer at software testing company Tricentis.
Model developers will test for AI bias as part of their pre-deployment testing. Quality test suites will enforce “equity,” like any other performance metric. Continuous testing, monitoring, and observability will prevent biased models from deploying or continuing to operate.
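One way such a quality test suite might enforce equity as a metric (an illustrative sketch, not any vendor's tooling; the 0.2 threshold is arbitrary): compute a demographic parity gap across groups and fail the build if it is too large.

```python
# Fail pre-deployment testing when positive-prediction rates diverge by group.
import numpy as np

def demographic_parity_gap(preds: np.ndarray, groups: np.ndarray) -> float:
    rates = [preds[groups == g].mean() for g in np.unique(groups)]
    return max(rates) - min(rates)

preds = np.array([1, 0, 1, 0, 0, 1, 1, 0])
groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
gap = demographic_parity_gap(preds, groups)
assert gap <= 0.2, f"bias test failed: parity gap {gap:.2f} exceeds 0.2"
```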
DataOps introduces agility by advocating for: Measuring data quality early: data quality leaders should begin measuring and assessing data quality even before perfect standards are in place. Early measurements provide valuable insights that can guide future improvements. Measuring and refining: DataOps is an iterative process.
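A sketch of what "measuring early" can look like before formal standards exist (the metrics and data below are illustrative): compute a few cheap quality scores now, and treat them as the baseline to improve against.

```python
# Simple early data-quality scores: completeness, uniqueness, validity.
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 2, 4],
                   "email": ["a@x.io", None, "b@x.io", "c@x"]})

quality = {
    "completeness": df["email"].notna().mean(),       # share of non-null emails
    "uniqueness": 1 - df["id"].duplicated().mean(),   # share of non-duplicate ids
    "validity": df["email"].str.contains(r"@.+\..+", na=False).mean(),
}
print(quality)  # baseline to improve against on the next iteration
```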
To address this, we used the AWS performance testing framework for Apache Kafka to evaluate the theoretical performance limits. We conducted performance and capacity tests on test MSK clusters that had the same configurations as our development and production clusters.
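The excerpt's framework itself isn't shown here, but the following is a hedged sketch of the kind of throughput measurement such a performance test makes, using the kafka-python client (the broker address, topic name, and message size are placeholders):

```python
# Measure sustained producer throughput against a test cluster.
import time
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")  # placeholder broker
payload = b"x" * 1024                 # 1 KiB test message
n_messages = 10_000

start = time.time()
for _ in range(n_messages):
    producer.send("perf-test-topic", payload)
producer.flush()                      # wait until all sends are acknowledged
elapsed = time.time() - start

print(f"{n_messages / elapsed:,.0f} msgs/s, "
      f"{n_messages * len(payload) / elapsed / 1e6:.1f} MB/s")
```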
In my book, I introduce the Technical Maturity Model: I define technical maturity as a combination of three factors at a given point of time. Technical sophistication: Sophistication measures a team’s ability to use advanced tools and techniques (e.g., PyTorch, TensorFlow, reinforcement learning, self-supervised learning).
Using AI-based models increases your organization’s revenue, improves operational efficiency, and enhances client relationships. You need to know where your deployed models are, what they do, the data they use, the results they produce, and who relies upon their results. That requires a good model governance framework.
Centralizing analytics helps the organization standardize enterprise-wide measurements and metrics. Develop/execute regression testing. Test data management and other functions provided ‘as a service’. Central DataOps process measurement function with reports. Agile ticketing/Kanban tools. Deploy to production.
It’s important to understand that ChatGPT is not actually a language model. It’s a convenient user interface built around one specific language model, GPT-3.5, with specialized training. GPT-3.5 is one of a class of language models that are sometimes called “large language models” (LLMs)—though that term isn’t very helpful.
Language understanding benefits from every part of the fast-improving ABC of software: AI (freely available deep learning libraries like PyText and language models like BERT), big data (Hadoop, Spark, and Spark NLP), and cloud (GPUs on demand and NLP-as-a-service from all the major cloud providers). Azure Text Analytics. Stanford Core NLP.
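As one concrete illustration of how accessible this stack has become (via the Hugging Face transformers library, which the excerpt doesn't name), applying a pre-trained BERT-family model takes only a few lines:

```python
# Load a default pre-trained sentiment model and run inference.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # downloads a fine-tuned model
print(classifier("Cloud GPUs make NLP experiments cheap to run."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```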
Some will argue that observability is nothing more than testing and monitoring applications using tests, metrics, logs, and other artifacts. Since 2008, teams working for our founding team and our customers have delivered hundreds of millions of data sets, dashboards, and models with almost no errors. Tie tests to alerts.
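A minimal sketch of "tie tests to alerts" (the webhook URL and bounds are hypothetical): every failing check notifies someone instead of just logging.

```python
# Each failed data test posts an alert rather than silently logging.
import logging

import requests

ALERT_WEBHOOK = "https://hooks.example.com/data-alerts"  # placeholder

def run_test(name: str, condition: bool) -> bool:
    if condition:
        logging.info("PASS %s", name)
        return True
    requests.post(ALERT_WEBHOOK, json={"test": name, "status": "FAIL"})
    logging.error("FAIL %s -- alert sent", name)
    return False

row_count = 10_482  # would come from the pipeline run
run_test("row_count_nonzero", row_count > 0)
run_test("row_count_within_bounds", 9_000 <= row_count <= 12_000)
```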
Taking the time to work this out is like building a mathematical model: if you understand what a company truly does, you don’t just get a better understanding of the present, but you can also predict the future. Since I work in the AI space, people sometimes have a preconceived notion that I’ll only talk about data and models.
Instead of writing code with hard-coded algorithms and rules that always behave in a predictable manner, ML engineers collect a large number of examples of input and output pairs and use them as training data for their models. This has serious implications for software testing, versioning, deployment, and other core development processes.
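To illustrate the contrast with hard-coded rules, a toy example of fitting behavior from input/output pairs (the data is invented):

```python
# Behavior is learned from example pairs, not written as if/else rules.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

inputs = ["refund my order", "track my package", "cancel subscription",
          "where is my delivery", "stop charging my card", "money back please"]
outputs = ["refund", "shipping", "billing", "shipping", "billing", "refund"]

model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
model.fit(inputs, outputs)                   # training data defines behavior
print(model.predict(["please refund me"]))   # the model, not rules, decides
```

This is exactly why testing and versioning change: the "program" now lives partly in the training data, so reproducing a behavior means versioning the data and retraining, not just diffing source code.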
This kind of humility is likely to deliver more meaningful progress and a more measured understanding of such progress. DeepMind’s Gato is an AI model that can be taught to carry out many different kinds of tasks based on a single transformer neural network. We typically underappreciate how complex such systems are.
A DataOps Engineer can make test data available on demand. We have automated testing and a system for exception reporting, where tests identify issues that need to be addressed. It then autogenerates QC tests based on those rules. You can track, measure and create graphs and reporting in an automated way.
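A hedged sketch of the autogeneration idea (not the DataKitchen implementation): declarative rules are compiled into runnable QC checks.

```python
# Turn declarative data rules into executable QC tests.
import pandas as pd

rules = [
    {"column": "age", "check": "not_null"},
    {"column": "age", "check": "between", "low": 0, "high": 120},
    {"column": "email", "check": "not_null"},
]

def build_test(rule):
    col = rule["column"]
    if rule["check"] == "not_null":
        return lambda df: df[col].notna().all()
    if rule["check"] == "between":
        return lambda df: df[col].between(rule["low"], rule["high"]).all()
    raise ValueError(f"unknown check: {rule['check']}")

df = pd.DataFrame({"age": [34, 7, 99], "email": ["a@x.io", "b@x.io", "c@x.io"]})
for rule in rules:
    print(rule, "->", "PASS" if build_test(rule)(df) else "FAIL")
```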
One is going through the big areas where we have operational services and looking at every process to be optimized using artificial intelligence and large language models. But a substantial 23% of respondents say AI has underperformed expectations, as models can prove unreliable and projects fail to scale.
The argument is that some systems are intrinsically difficult to model. You can’t control for, or even measure, several of these factors. Wearing masks as a prophylactic measure isn’t the big cultural leap that it has been in the United States. What does that mean?
Your Chance: Want to test an agile business intelligence solution? Business intelligence is moving away from the traditional engineering model: analysis, design, construction, testing, and implementation. In the traditional model, communication between developers and business users is not a priority. Finalize testing.
In recent posts, we described requisite foundational technologies needed to sustain machine learning practices within organizations, and specialized tools for model development, model governance, and model operations/testing/monitoring. Sources of model risk. Model risk management. Image by Ben Lorica.
Business analytic teams have ongoing deliverables – a dashboard, a PowerPoint, or a model that they refresh and renew. Tests that verify and validate data flowing through the data pipelines are executed continuously. An impact review test suite executes before new analytics are deployed. Business Analytic Challenges.
DataOps produces clear measurement and monitoring of the end-to-end analytics pipelines, starting with data sources. Design your data analytics workflows with tests at every stage of processing so that errors are virtually zero. In the DataKitchen context, monitoring and functional tests use the same code.
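One illustrative pattern for "tests at every stage" (the step names and validators below are ours, not DataKitchen's): wrap each pipeline step in a validator so bad data stops at the stage that produced it.

```python
# Decorator that validates each step's output before it flows downstream.
import pandas as pd

def validated(validator):
    def wrap(step):
        def run(df):
            out = step(df)
            assert validator(out), f"validation failed after {step.__name__}"
            return out
        return run
    return wrap

@validated(lambda df: df["value"].notna().all())
def clean(df):
    return df.dropna()

@validated(lambda df: (df["value"] >= 0).all())
def normalize(df):
    return df.assign(value=df["value"] / df["value"].max())

result = normalize(clean(pd.DataFrame({"value": [3.0, None, 9.0]})))
print(result)
```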