Document, Modeling and Testing - Data Leaders Brief

Beyond “Prompt and Pray”

O'Reilly on Data

JANUARY 21, 2025

Your company’s AI assistant confidently tells a customer it’s processed their urgent withdrawal request, except it hasn’t, because it misinterpreted the API documentation. This fueled a belief that simply making models bigger would solve deeper issues like accuracy, understanding, and reasoning. Development velocity grinds to a halt.

Cost-Benefit

Cost-Benefit Testing Interactive Software

Can Language Models Replace Compilers?

O'Reilly on Data

JANUARY 9, 2024

We still rely on humans to test and fix the errors. With the current models, every time you generate code, you’re likely to get something different. How do you understand what the program is doing if it’s a different program each time you generate and test it? (Bard even gives you several alternatives to choose from.)

Modeling

Modeling Software Testing Optimization

Escaping POC Purgatory: Evaluation-Driven Development for AI Systems

O'Reilly on Data

MARCH 25, 2025

We’ve seen this across dozens of companies, and the teams that break out of this trap all adopt some version of Evaluation-Driven Development (EDD), where testing, monitoring, and evaluation drive every decision from the start. What breaks your app in production isn’t always what you tested for in dev! The way out?

Testing

Testing Data-driven Software Measurement

Why you should care about debugging machine learning models

O'Reilly on Data

DECEMBER 12, 2019

Not least is the broadening realization that ML models can fail. And that’s why model debugging, the art and science of understanding and fixing problems in ML models, is so critical to the future of ML. Because all ML models make mistakes, everyone who cares about ML should also care about model debugging. [1]

Machine Learning

Machine Learning Modeling Testing Risk Management

Generative AI for Farming

O'Reilly on Data

JUNE 18, 2024

While RAG is conceptually simple—look up relevant documents and construct a prompt that tells the model to build its response from them—in practice, it’s more complex. Including all those results in a RAG query would be impossible with most language models, and impractical with the few that allow large context windows.

Testing

Testing Software Modeling Measurement

5 top business use cases for AI agents

CIO Business Intelligence

MARCH 19, 2025

Meanwhile, in December, OpenAI’s new O3 model, an agentic model not yet available to the public, scored 72% on the same test. “We’re developing our own AI models customized to improve code understanding on rare platforms,” he adds. That adds up to millions of documents a month that need to be processed.

Software

Software Risk Enterprise Cost-Benefit

Unlocking near real-time analytics with petabytes of transaction data using Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud

AWS Big Data

NOVEMBER 27, 2024

dbt helps manage data transformation by enabling teams to deploy analytics code following software engineering best practices such as modularity, continuous integration and continuous deployment (CI/CD), and embedded documentation. Create dbt models in dbt Cloud. Deploy dbt models to Amazon Redshift. Choose Test Connection.

Data Warehouse

Data Warehouse Analytics Testing Sales

Agentic AI design: An architectural case study

CIO Business Intelligence

NOVEMBER 19, 2024

From obscurity to ubiquity, the rise of large language models (LLMs) is a testament to rapid technological advancement. Just a few short years ago, models like GPT-1 (2018) and GPT-2 (2019) barely registered a blip on anyone’s tech radar. In our real-world case study, we needed a system that would create test data.

Testing

Testing Cost-Benefit Interactive ROI

Copyright, AI, and Provenance

O'Reilly on Data

DECEMBER 12, 2023

If the output of a model can’t be owned by a human, who (or what) is responsible if that output infringes existing copyright? In an article in The New Yorker, Jaron Lanier introduces the idea of data dignity, which implicitly distinguishes between training a model and generating output using a model.

Modeling

Modeling Sales Software Statistics

Accomplish Agile Business Intelligence & Analytics For Your Business

datapine

APRIL 15, 2020

Your Chance: Want to test an agile business intelligence solution? Working software over comprehensive documentation. Business intelligence is moving away from the traditional engineering model: analysis, design, construction, testing, and implementation. Test BI in a small group and deploy the software internally.

Business Intelligence

Business Intelligence Analytics Testing Dashboards

Real World Programming with ChatGPT

O'Reilly on Data

APRIL 25, 2023

There’s a lot of excitement about how the GPT models and their successors will change programming. Many of the prompts are about testing: ChatGPT is instructed to generate tests for each function that it generates. At least in theory, test driven development (TDD) is widely practiced among professional programmers.

Testing

Testing Software Strategy Enterprise

Getting the timing right at Setterwalls to invest in AI support

CIO Business Intelligence

OCTOBER 11, 2024

With backing from management and great interest outside the organization, the agency started a pilot project where three AI tools specially designed for lawyers were tested, compared, and evaluated. “We had a fairly large evaluation group that test drove them side by side,” he says. So all of this has been adapted for AI. “No

Uncertainty

Uncertainty Testing Technology Modeling

Generative AI in the Enterprise

O'Reilly on Data

NOVEMBER 28, 2023

And everyone has opinions about how these language models and art generation programs are going to change the nature of work, usher in the singularity, or perhaps even doom the human race. 16% of respondents working with AI are using open source models. A few have even tried out Bard or Claude, or run LLaMA 1 on their laptop.

Enterprise

Enterprise Testing Modeling Reporting

Manage access controls in generative AI-powered search applications using Amazon OpenSearch Service and Amazon Cognito

AWS Big Data

NOVEMBER 19, 2024

A common adoption pattern is to introduce document search tools to internal teams, especially advanced document searches based on semantic search. In a real-world scenario, organizations want to make sure their users access only documents they are entitled to access. The following diagram depicts the solution architecture.

Management

Management Metadata Manufacturing Testing

Preparing for AI

O'Reilly on Data

SEPTEMBER 17, 2024

Chain-of-thought prompts often include some examples of problems, procedures, and solutions that are done correctly, giving the AI a model to emulate. Include documents: You can include documents as part of a prompt. Checking the AI is a strenuous test of your own knowledge. It may reduce hallucination.

Modeling

Modeling Reporting Sales Testing

7 types of tech debt that could cripple your business

CIO Business Intelligence

MARCH 25, 2025

Using the company’s data in LLMs, AI agents, or other generative AI models creates more risk. Build up: Databases that have grown in size, complexity, and usage build up the need to rearchitect the model and architecture to support that growth over time.

Risk

Risk Cost-Benefit Data-driven Digital Transformation

Amazon OpenSearch Service launches flow builder to empower rapid AI search innovation

AWS Big Data

MAY 2, 2025

They consist of: A data sample of the documents you want to index. A pipeline of processors that apply transforms on ingested documents. An index constructed from the processed documents. This template requires us to select a text embedding model. Ingest flows are created to enrich data as it’s added to an index.

Machine Learning

Machine Learning Visualization Dashboards Metadata

Introducing Accelerator for Machine Learning (ML) Projects: Summarization with Gemini from Vertex AI

Cloudera

DECEMBER 9, 2024

We built this AMP for two reasons: To add an AI application prototype to our AMP catalog that can handle both full document summarization and raw text block summarization. To showcase how easy it is to build an AI application using Cloudera AI and Google’s Vertex AI Model Garden.

Machine Learning

Machine Learning Modeling Testing Optimization

Business Strategies for Deploying Disruptive Tech: Generative AI and ChatGPT

Rocket-Powered Data Science

FEBRUARY 15, 2023

While generative AI has been around for several years, the arrival of ChatGPT (a conversational AI tool for all business occasions, built and trained from large language models) has been like a brilliant torch brought into a dark room, illuminating many previously unseen opportunities. So, if you have 1 trillion data points (e.g.,

Strategy

Strategy Experimentation Uncertainty Machine Learning

Managing machine learning in the enterprise: Lessons from banking and health care

O'Reilly on Data

JULY 15, 2019

In recent posts, we described requisite foundational technologies needed to sustain machine learning practices within organizations, and specialized tools for model development, model governance, and model operations/testing/monitoring. Sources of model risk. Model risk management. Image by Ben Lorica.

Machine Learning

Machine Learning Management Enterprise Risk Management

How IT leaders use agentic AI for business workflows

CIO Business Intelligence

APRIL 30, 2025

At ServiceNow, they’re infusing agentic AI into three core areas: answering customer or employee requests for things like technical support and payroll info; reducing workloads for teams in IT, HR, and customer service; and boosting developer productivity by speeding up coding and testing. For others, integration remains the biggest obstacle.

IT

IT Sales Cost-Benefit Data-driven

The Role of Model Governance in Machine Learning and Artificial Intelligence

Domino Data Lab

AUGUST 6, 2021

All models require testing and auditing throughout their deployment and, because models are continually learning, there is always an element of risk that they will drift from their original standards. As such, model governance needs to be applied to each model for as long as it’s being used.

Machine Learning

Machine Learning Modeling Testing Data Science

Structural Evolutions in Data

O'Reilly on Data

SEPTEMBER 19, 2023

Stage 2: Machine learning models. Hadoop could kind of do ML, thanks to third-party tools. While data scientists were no longer handling Hadoop-sized workloads, they were trying to build predictive models on a different kind of “large” dataset: so-called “unstructured data.” Specifically, through simulation.

Machine Learning

Machine Learning Testing Modeling Cost-Benefit

AI Governance: Act now, thrive later

CIO Business Intelligence

JANUARY 30, 2025

While there is a lot of effort and content that is now available, it tends to be at a higher level which will require work to be done to create a governance model specifically for your organization. Governance is action and there are many actions an organization can take to create and implement an effective AI governance model.

Testing

Testing Metrics Cost-Benefit Modeling

From project to product: Architecting the future of enterprise technology

CIO Business Intelligence

JANUARY 14, 2025

Documentation and diagrams transform abstract discussions into something tangible. By articulating fitness functions (automated tests tied to specific quality attributes like reliability, security or performance), teams can visualize and measure system qualities that align with business goals.

Enterprise

Enterprise Technology Metrics Measurement

Lessons learned building natural language processing systems in health care

O'Reilly on Data

MARCH 7, 2019

Language understanding benefits from every part of the fast-improving ABC of software: AI (freely available deep learning libraries like PyText and language models like BERT), big data (Hadoop, Spark, and Spark NLP), and cloud (GPUs on demand and NLP-as-a-service from all the major cloud providers). Azure Text Analytics. Stanford Core NLP.

Deep Learning

Deep Learning Testing Machine Learning Modeling

The unreasonable importance of data preparation

O'Reilly on Data

MARCH 24, 2020

In a world focused on buzzword-driven models and algorithms, you’d be forgiven for forgetting about the unreasonable importance of data preparation and quality: your models are only as good as the data you feed them. The model and the data specification become more important than the code. Let’s get everybody to do X.

Machine Learning

Machine Learning Statistics Data Quality Data Collection

Beyond the hype: Do you really need an LLM for your data?

CIO Business Intelligence

FEBRUARY 6, 2025

The hype around large language models (LLMs) is undeniable. Think about it: LLMs like GPT-3 are incredibly complex deep learning models trained on massive datasets. Even basic predictive modeling can be done with lightweight machine learning in Python or R. This article reflects some of what I’ve learned.

Unstructured Data

Unstructured Data Manufacturing Data Governance Sales

What Are ChatGPT and Its Friends?

O'Reilly on Data

MARCH 23, 2023

It’s important to understand that ChatGPT is not actually a language model. It’s a convenient user interface built around one specific language model, GPT-3.5, which is one of a class of language models that are sometimes called “large language models” (LLMs), though that term isn’t very helpful. with specialized training.

IT

IT Modeling Testing Risk

From data lakes to insights: dbt adapter for Amazon Athena now supported in dbt Cloud

AWS Big Data

NOVEMBER 22, 2024

This upgrade allows you to build, test, and deploy data models in dbt with greater ease and efficiency, using all the features that dbt Cloud provides. This saves time and effort, especially for teams looking to minimize infrastructure management and focus solely on data modeling.

Data Lake

Data Lake Data Warehouse Cost-Benefit Data Transformation

Introducing Cloudera Fine Tuning Studio for Training, Evaluating, and Deploying LLMs with Cloudera AI

Cloudera

NOVEMBER 13, 2024

Large Language Models (LLMs) will be at the core of many groundbreaking AI solutions for enterprise organizations. These enable customer service representatives to focus their time and attention on more high-value interactions, leading to a more cost-efficient service model. Increase Productivity.

Cost-Benefit

Cost-Benefit Data Processing Machine Learning Testing

When is data too clean to be useful for enterprise AI?

CIO Business Intelligence

NOVEMBER 27, 2024

Data quality for AI needs to cover bias detection, infringement prevention, skew detection in data for model features, and noise detection. Not all columns are equal, so you need to prioritize cleaning data features that matter to your model, and your business outcomes. asks Friedman.

Enterprise

Enterprise Data Quality Structured Data Modeling

AI Product Management After Deployment

O'Reilly on Data

OCTOBER 13, 2020

Similarly, in “Building Machine Learning Powered Applications: Going from Idea to Product,” Emmanuel Ameisen states: “Indeed, exposing a model to users in production comes with a set of challenges that mirrors the ones that come with debugging a model.” Debugging AI Products.

Management

Management Machine Learning Metrics Modeling

Improve search results for AI using Amazon OpenSearch Service as a vector database with Amazon Bedrock

AWS Big Data

FEBRUARY 21, 2025

Search applications include ecommerce websites, document repository search, customer support call centers, customer relationship management, matchmaking for gaming, and application search. However, generative AI models can produce hallucinations, outputs that appear convincing but contain factual errors.

Dashboards

Dashboards Modeling Measurement Interactive

What gives IT leaders pause as they look to integrate agentic AI with legacy infrastructure

CIO Business Intelligence

FEBRUARY 26, 2025

AI agents are powered by gen AI models but, unlike chatbots, they can handle more complex tasks, work autonomously, and be combined with other AI agents into agentic systems capable of tackling entire workflows, replacing employees or addressing high-level business goals. “You can make AI agents return XML or an API call,” says Avancini.

IT

IT Enterprise Interactive Data Quality

UK Government tests frictionless trade models with Ecosystem of Trust pilots

IBM Big Data Hub

SEPTEMBER 12, 2023

The UK government’s Ecosystem of Trust is a potential future border model for frictionless trade, which the UK government committed to pilot testing from October 2022 to March 2023. The models also reduce private sector customs data collection costs by 40%.

Testing

Testing Modeling Cost-Benefit Consulting

Automating the Automators: Shift Change in the Robot Factory

O'Reilly on Data

JANUARY 17, 2023

Building Models. A common task for a data scientist is to build a predictive model. You’ll try this with a few other algorithms, and their respective tuning parameters–maybe even break out TensorFlow to build a custom neural net along the way–and the winning model will be the one that heads to production.

Machine Learning

Machine Learning Predictive Modeling Software Modeling

Deep automation in machine learning

O'Reilly on Data

DECEMBER 19, 2018

We need to do more than automate model building with autoML; we need to automate tasks at every stage of the data pipeline. have a large body of tools to choose from: IDEs, CI/CD tools, automated testing tools, and so on. We have great tools for working with code: creating it, managing it, testing it, and deploying it.

Machine Learning

Machine Learning Software Metadata Testing

Closer to AGI?

O'Reilly on Data

JUNE 7, 2022

DeepMind’s new model, Gato, has sparked a debate on whether artificial general intelligence (AGI) is nearer–almost at hand–just a matter of scale. Gato is a model that can solve multiple unrelated problems: it can play a large number of different games, label images, chat, operate a robot, and more. If we had AGI, how would we know it?

Modeling

Modeling Interactive Optimization Deep Learning

Minding Your Models

DataRobot Blog

JULY 22, 2022

Using AI-based models increases your organization’s revenue, improves operational efficiency, and enhances client relationships. You need to know where your deployed models are, what they do, the data they use, the results they produce, and who relies upon their results. That requires a good model governance framework.

Modeling

Modeling Risk Management Testing Machine Learning

Data Insights for Everyone — The Semantic Layer to the Rescue

Rocket-Powered Data Science

SEPTEMBER 20, 2021

We would be able to go far beyond searching for correctly spelled column headings in databases or specific keywords in data documentation, to find the data we needed (assuming we even knew the correct labels, metatags, and keywords used by the dataset creators). Sharing and integrating such important data streams has never been such a dream.

Data Science

Data Science Forecasting Business Intelligence Sales

ChatGPT, Author of The Quixote

O'Reilly on Data

MARCH 26, 2024

TL;DR LLMs and other GenAI models can reproduce significant chunks of training data. Researchers are finding more and more ways to extract training data from ChatGPT and other models. And the space is moving quickly: SORA, OpenAI’s text-to-video model, is yet to be released and has already taken the world by storm.

Modeling

Modeling Machine Learning Risk Advertising

The New O’Reilly Answers: The R in “RAG” Stands for “Royalties”

O'Reilly on Data

JUNE 14, 2024

Generative AI models are trained on large repositories of information and media. They are then able to take in prompts and produce outputs based on the statistical weights of the pretrained models of those corpora. The newest Answers release is again built with an open source model—in this case, Llama 3.

Metadata

Metadata Publishing Data-driven Modeling

This AI summer is abloom with smaller models, on more devices

CIO Business Intelligence

AUGUST 19, 2024

Bigger models, with more data, invariably equal better AI experiences. It turns out companies adopting generative AI today don’t need models with 1 trillion parameters or even hundreds of billions of parameters frontier LLMs are trained on. This lends itself well to use cases where corporate IP is included as part of the model.

Modeling

Modeling Testing Technology Marketing
