Beyond “Prompt and Pray”

O'Reilly on Data

The Evolution of Expectations. For years, the AI world was driven by scaling laws: the empirical observation that larger models and bigger datasets led to proportionally better performance. This fueled a belief that simply making models bigger would solve deeper issues like accuracy, understanding, and reasoning.

Risk Management for AI Chatbots

O'Reilly on Data

Doing so means giving the general public a freeform text box for interacting with your AI model. Welcome to your company’s new AI risk management nightmare. With a chatbot, the web form passes an end-user’s freeform text input (a “prompt,” or a request to act) to a generative AI model.

Why you should care about debugging machine learning models

O'Reilly on Data

Not least is the broadening realization that ML models can fail. And that’s why model debugging, the art and science of understanding and fixing problems in ML models, is so critical to the future of ML. Because all ML models make mistakes, everyone who cares about ML should also care about model debugging. [1]

What are model governance and model operations?

O'Reilly on Data

A look at the landscape of tools for building and deploying robust, production-ready machine learning models. We are also beginning to see researchers share sample code written in popular open source libraries, and some even share pre-trained models. Model development. Model governance. Source: Ben Lorica.

Escaping POC Purgatory: Evaluation-Driven Development for AI Systems

O'Reilly on Data

We’ve seen this across dozens of companies, and the teams that break out of this trap all adopt some version of Evaluation-Driven Development (EDD), where testing, monitoring, and evaluation drive every decision from the start. What breaks your app in production isn’t always what you tested for in dev! The way out?

7 types of tech debt that could cripple your business

CIO Business Intelligence

CIOs perennially deal with technical debt’s risks, costs, and complexities. While the impacts of legacy systems can be quantified, technical debt is also often embedded in subtler ways across the IT ecosystem, making it hard to account for the full list of issues and risks.

5 top business use cases for AI agents

CIO Business Intelligence

“There are risks around hallucinations and bias,” says Arnab Chakraborty, chief responsible AI officer at Accenture. Meanwhile, in December, OpenAI’s new o3 model, an agentic model not yet available to the public, scored 72% on the same test. SS&C uses Meta’s Llama as well as other models, says Halpin.
