Document and Testing - Data Leaders Brief

Beyond “Prompt and Pray”

O'Reilly on Data

JANUARY 21, 2025

Your company's AI assistant confidently tells a customer it's processed their urgent withdrawal request, except it hasn't, because it misinterpreted the API documentation. These are systems that engage in conversations and integrate with APIs but don't create stand-alone content like emails, presentations, or documents.

Cost-Benefit

Cost-Benefit Testing Interactive Software

The Race For Data Quality in a Medallion Architecture

DataKitchen

NOVEMBER 5, 2024

Finally, the challenge we are addressing in this document is how to prove the data is correct at each layer. Get Off The Blocks Fast: Data Quality In The Bronze Layer. Effective Production QA techniques begin with rigorous automated testing at the Bronze layer, where raw data enters the lakehouse environment.

Data Quality

Data Quality Testing Metrics Reporting

Escaping POC Purgatory: Evaluation-Driven Development for AI Systems

O'Reilly on Data

MARCH 25, 2025

We've seen this across dozens of companies, and the teams that break out of this trap all adopt some version of Evaluation-Driven Development (EDD), where testing, monitoring, and evaluation drive every decision from the start. What breaks your app in production isn't always what you tested for in dev! The way out?

Testing

Testing Data-driven Software Measurement

5 predictions for emerging ’25 technology trends

CIO Business Intelligence

JANUARY 13, 2025

Advances in AI and ML will automate the compliance, testing, documentation and other tasks which can occupy 40-50% of a developer's time. There will be productivity boosts for documentation and test cases. The biggest immediate value add is human-in-the-loop internal efficiency use cases.

Technology

Technology Interactive Cost-Benefit Testing

Drug Launch Case Study: Amazing Efficiency Using DataOps

DataKitchen

DECEMBER 9, 2024

data quality tests every day to support a cast of analysts and customers. DataKitchen loaded this data and implemented data tests to ensure integrity and data quality via statistical process control (SPC) from day one. The numbers speak for themselves: working towards the launch, an average of 1.5

Data Quality

Data Quality Data Lake Testing Statistics

Generative AI for Farming

O'Reilly on Data

JUNE 18, 2024

While RAG is conceptually simple—look up relevant documents and construct a prompt that tells the model to build its response from them—in practice, it’s more complex. Keep in mind that, for Digital Green, this problem is both multilingual and multimodal: relevant documents can turn up in any of the languages or modes that they use.

Testing

Testing Software Modeling Measurement

5 top business use cases for AI agents

CIO Business Intelligence

MARCH 19, 2025

Meanwhile, in December, OpenAI's new O3 model, an agentic model not yet available to the public, scored 72% on the same test. Mitre has also tested dozens of commercial AI models in a secure Mitre-managed cloud environment with AWS Bedrock. That adds up to millions of documents a month that need to be processed.

Software

Software Risk Enterprise Cost-Benefit

Close Brothers unlocks RPA with Document Understanding

CIO Business Intelligence

SEPTEMBER 9, 2024

But Stephen Durnin, the company’s head of operational excellence and automation, says the 2020 Covid-19 pandemic thrust automation around unstructured input, like email and documents, into the spotlight. This was exacerbated by errors or missing information in documents provided by customers, leading to additional work downstream.

Finance

Finance Dashboards Sales Testing

Agentic AI design: An architectural case study

CIO Business Intelligence

NOVEMBER 19, 2024

Development teams starting small and building up, learning, testing and figuring out the realities from the hype will be the ones to succeed. These might be self-explanatory, but no matter what, there must always be documentation of the system. In our real-world case study, we needed a system that would create test data.

Testing

Testing Cost-Benefit Interactive ROI

Accomplish Agile Business Intelligence & Analytics For Your Business

datapine

APRIL 15, 2020

Your Chance: Want to test an agile business intelligence solution? Working software over comprehensive documentation. Business intelligence is moving away from the traditional engineering model: analysis, design, construction, testing, and implementation. Test BI in a small group and deploy the software internally.

Business Intelligence

Business Intelligence Analytics Testing Dashboards

Unlocking near real-time analytics with petabytes of transaction data using Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud

AWS Big Data

NOVEMBER 27, 2024

dbt helps manage data transformation by enabling teams to deploy analytics code following software engineering best practices such as modularity, continuous integration and continuous deployment (CI/CD), and embedded documentation. Choose Test Connection. Choose Next if the test succeeded.

Data Warehouse

Data Warehouse Analytics Testing Sales

Real World Programming with ChatGPT

O'Reilly on Data

APRIL 25, 2023

Many of the prompts are about testing: ChatGPT is instructed to generate tests for each function that it generates. At least in theory, test driven development (TDD) is widely practiced among professional programmers. Tests tend to be very simple, and rarely get to the “hard stuff”: corner cases, error conditions, and the like.

Testing

Testing Software Strategy Enterprise

Manage access controls in generative AI-powered search applications using Amazon OpenSearch Service and Amazon Cognito

AWS Big Data

NOVEMBER 19, 2024

A common adoption pattern is to introduce document search tools to internal teams, especially advanced document searches based on semantic search. In a real-world scenario, organizations want to make sure their users access only documents they are entitled to access. The following diagram depicts the solution architecture.

Management

Management Metadata Manufacturing Testing

Getting the timing right at Setterwalls to invest in AI support

CIO Business Intelligence

OCTOBER 11, 2024

With backing from management and great interest outside the organization, the agency started a pilot project where three AI tools specially designed for lawyers were tested, compared, and evaluated. “We had a fairly large evaluation group that test drove them side by side,” he says. So all of this has been adapted for AI.

Uncertainty

Uncertainty Testing Technology Modeling

Copyright, AI, and Provenance

O'Reilly on Data

DECEMBER 12, 2023

I can also ask for a reading list about plagues in 16th century England, algorithms for testing prime numbers, or anything else. RAG takes your prompt, loads documents in your company’s archive that are relevant, packages everything together, and sends the prompt to the model. We have provenance.

Modeling

Modeling Sales Software Statistics

Accelerating Drug Discovery and Development with DataOps

DataKitchen

AUGUST 13, 2021

A drug company tests 50,000 molecules and spends a billion dollars or more to find a single safe and effective medicine that addresses a substantial market. Figure 1: A pharmaceutical company tests 50,000 compounds just to find one that reaches the market. A DataOps superstructure provides a common testing framework.

Testing

Testing Dashboards Marketing Measurement

Preparing for AI

O'Reilly on Data

SEPTEMBER 17, 2024

Include documents: You can include documents as part of a prompt. Checking an AI is more like being a fact-checker for someone writing an important article: Can every fact be traced back to a documentable source? Checking the AI is a strenuous test of your own knowledge. It may reduce hallucination.

Modeling

Modeling Reporting Sales Testing

How IT leaders use agentic AI for business workflows

CIO Business Intelligence

APRIL 30, 2025

At ServiceNow, they're infusing agentic AI into three core areas: answering customer or employee requests for things like technical support and payroll info; reducing workloads for teams in IT, HR, and customer service; and boosting developer productivity by speeding up coding and testing. For others, integration remains the biggest obstacle.

IT

IT Sales Cost-Benefit Data-driven

Mastering Python Docstrings: A Comprehensive Guide

Analytics Vidhya

JANUARY 13, 2024

Welcome to “A Comprehensive Guide to Python Docstrings,” where we embark on a journey into documenting Python code effectively. Docstrings are pivotal in enhancing code readability, maintainability, and collaboration among developers.

Analytics

Analytics Testing

Data center provider fakes Tier 4 data center certificate to bag $11M SEC deal

CIO Business Intelligence

OCTOBER 17, 2024

According to the indictment, Jain’s firm provided fraudulent certification documents during contract negotiations in 2011, claiming that their Beltsville, Maryland, data center met Tier 4 standards, which require 99.995% uptime and advanced resilience features. By then, the Commission had spent $10.7 million on the contract.

Broadcasting

Broadcasting Risk Reporting Measurement

Unlock the power of optimization in Amazon Redshift Serverless

AWS Big Data

MARCH 10, 2025

You can use the query from the Amazon Redshift documentation and add the same start and end times. Also, we designed our test environment without setting the Amazon Redshift Serverless workgroup max capacity parameter, a key configuration that controls the maximum RPUs available to your data warehouse.

Optimization

Optimization Data Warehouse Data-driven Testing

Amazon OpenSearch Service launches flow builder to empower rapid AI search innovation

AWS Big Data

MAY 2, 2025

They consist of: A data sample of the documents you want to index. A pipeline of processors that apply transforms on ingested documents. An index constructed from the processed documents. From the designer, we see that Cohere Rerank requires a list of documents and the query context as input.

Machine Learning

Machine Learning Visualization Dashboards Metadata

Introducing Accelerator for Machine Learning (ML) Projects: Summarization with Gemini from Vertex AI

Cloudera

DECEMBER 9, 2024

We built this AMP for two reasons: To add an AI application prototype to our AMP catalog that can handle both full document summarization and raw text block summarization. Benchmark tests indicate that Gemini Pro demonstrates superior speed in token processing compared to its competitors like GPT-4. More on AMPs can be found here.

Machine Learning

Machine Learning Modeling Testing Optimization

Generative AI in the Enterprise

O'Reilly on Data

NOVEMBER 28, 2023

Unexpected outcomes, security, safety, fairness and bias, and privacy are the biggest risks for which adopters are testing. And there are tools for archiving and indexing prompts for reuse, vector databases for retrieving documents that an AI can use to answer a question, and much more. Only 4% pointed to lower head counts.

Enterprise

Enterprise Testing Modeling Reporting

Take manual snapshots and restore in a different domain spanning across various Regions and accounts in Amazon OpenSearch Service

AWS Big Data

OCTOBER 11, 2024

While a snapshot is in progress, you can still index documents and make other requests to the domain, but new documents and updates to existing documents generally aren’t included in the snapshot. Testing and development – You can use snapshots to create copies of your data for testing or development purposes.

Snapshot

Snapshot Dashboards Management Testing

Preparing for Q-Day: Safeguarding Enterprises Against Quantum Threats

David Menninger's Analyst Perspectives

MAY 15, 2025

However, they do not yet exist nor could they be tested. This reality leads to documenting enterprise encryption inventory as a step every organization can take today. A desirable outcome is to implement quantum-safe encryption algorithms. Until there are known quantum threats, the typical threat scanning methods serve no purpose.

Enterprise

Enterprise Risk Measurement Risk Management

Business Strategies for Deploying Disruptive Tech: Generative AI and ChatGPT

Rocket-Powered Data Science

FEBRUARY 15, 2023

Since ChatGPT is built from large language models that are trained against massive data sets (mostly business documents, internal text repositories, and similar resources) within your organization, attention must be given to the stability, accessibility, and reliability of those resources. Test early and often.

Strategy

Strategy Experimentation Uncertainty Machine Learning

From project to product: Architecting the future of enterprise technology

CIO Business Intelligence

JANUARY 14, 2025

Documentation and diagrams transform abstract discussions into something tangible. By articulating fitness functions (automated tests tied to specific quality attributes like reliability, security or performance), teams can visualize and measure system qualities that align with business goals.

Enterprise

Enterprise Technology Metrics Measurement

Lessons learned building natural language processing systems in health care

O'Reilly on Data

MARCH 7, 2019

If you don’t believe me, feel free to test it yourself with the six popular NLP cloud services and libraries listed below. In a test done during December 2018, of the six engines, the only medical term (which only two of them recognized) was Tylenol as a product. IBM Watson NLU. Azure Text Analytics. spaCy Named Entity Visualizer.

Deep Learning

Deep Learning Testing Machine Learning Modeling

Fearing the Wrong Thing

O'Reilly on Data

JULY 11, 2023

Some of that time is spent in pointless meetings, but much of “the rest of the job” is understanding the user’s needs, designing, testing, debugging, reviewing code, finding out what the user really needs (that they didn’t tell you the first time), refining the design, building an effective user interface, auditing for security, and so on.

Testing

Testing Software Visualization Interactive

AI Governance: Act now, thrive later

CIO Business Intelligence

JANUARY 30, 2025

You need to perform testing of the new model and ensure that you are setting aside enough time for testing and evaluation. You can look at the documentation for additional information as well as to see which model can be used as a suggested replacement. The next part of any model update is the testing that needs to take place.

Testing

Testing Metrics Cost-Benefit Modeling

DataKitchen Training And Certification Offerings

DataKitchen

MAY 7, 2024

For individual contributors with a background in Data Analytics/Science/Engineering. Overall Ideas and Principles of DataOps: DataOps Cookbook (200-page book, over 30,000 readers, free); DataOps Certification (3 hours, online, free, sign up online); DataOps Manifesto (over 30,000 signatures). One (..)

Data Quality

Data Quality Testing Consulting Metrics

Improve search results for AI using Amazon OpenSearch Service as a vector database with Amazon Bedrock

AWS Big Data

FEBRUARY 21, 2025

Search applications include ecommerce websites, document repository search, customer support call centers, customer relationship management, matchmaking for gaming, and application search. Before FMs, search engines used a word-frequency scoring system called term frequency/inverse document frequency (TF/IDF).

Dashboards

Dashboards Modeling Measurement Interactive
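
As a rough illustration of the term frequency/inverse document frequency (TF/IDF) scoring mentioned in the excerpt above, here is a minimal sketch. The function name `tf_idf_scores`, the whitespace tokenizer, and the sample documents are assumptions made for illustration only; real search engines use more refined variants of this idea.

```python
import math
from collections import Counter


def tf_idf_scores(query_terms, documents):
    """Score each document against the query with a basic TF-IDF sum.

    Hypothetical helper for illustration; not taken from any search engine.
    """
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: how many documents contain each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if counts[term] == 0:
                continue  # term absent from this document
            tf = counts[term] / len(tokens)           # term frequency
            idf = math.log(n_docs / doc_freq[term])   # inverse document frequency
            score += tf * idf
        scores.append(score)
    return scores


# Made-up sample documents, one string per document.
docs = [
    "red running shoes",
    "blue running jacket",
    "red wool scarf",
]
scores = tf_idf_scores(["red", "shoes"], docs)
for doc, score in zip(docs, scores):
    print(f"{score:.3f}  {doc}")
```

In this sketch the first document ranks highest because it contains both query terms, and the rarer term ("shoes") contributes more than the common one ("red"). The second document contains neither term and scores zero.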

Managing machine learning in the enterprise: Lessons from banking and health care

O'Reilly on Data

JULY 15, 2019

In recent posts, we described requisite foundational technologies needed to sustain machine learning practices within organizations, and specialized tools for model development, model governance, and model operations/testing/monitoring.

Machine Learning

Machine Learning Management Enterprise Risk Management

7 types of tech debt that could cripple your business

CIO Business Intelligence

MARCH 25, 2025

What CIOs can do: To make transitions to new AI capabilities less costly, invest in regression testing and change management practices around AI-enabled large-scale workflows.

Risk

Risk Cost-Benefit Data-driven Digital Transformation

Can Language Models Replace Compilers?

O'Reilly on Data

JANUARY 9, 2024

We still rely on humans to test and fix the errors. How do you understand what the program is doing if it’s a different program each time you generate and test it? Automated code generation doesn’t yet have the kind of reliability we expect from traditional programming; Simon Willison calls this “vibes-based development.”

Modeling

Modeling Software Testing Optimization

From data lakes to insights: dbt adapter for Amazon Athena now supported in dbt Cloud

AWS Big Data

NOVEMBER 22, 2024

This upgrade allows you to build, test, and deploy data models in dbt with greater ease and efficiency, using all the features that dbt Cloud provides. This makes sure your data models are well-documented, versioned, and straightforward to manage within a collaborative environment.

Data Lake

Data Lake Data Warehouse Cost-Benefit Data Transformation

Structural Evolutions in Data

O'Reilly on Data

SEPTEMBER 19, 2023

A single document may represent thousands of features. You can see a simulation as a temporary, synthetic environment in which to test an idea. Millions of tests, across as many parameters as will fit on the hardware. Other groups have tested evolutionary algorithms in drug discovery. Specifically, through simulation.

Machine Learning

Machine Learning Testing Modeling Cost-Benefit

Beyond the hype: Do you really need an LLM for your data?

CIO Business Intelligence

FEBRUARY 6, 2025

For example, at a company providing manufacturing technology services, the priority was predicting sales opportunities, while at a company that designs and manufactures automatic test equipment (ATE), it was developing a platform for equipment production automation that relied heavily on forecasting.

Unstructured Data

Unstructured Data Manufacturing Data Governance Sales

Why you should care about debugging machine learning models

O'Reilly on Data

DECEMBER 12, 2019

In addition to newer innovations, the practice borrows from model risk management, traditional model diagnostics, and software testing. The study of security in ML is a growing field—and a growing problem, as we documented in a recent Future of Privacy Forum report.

Machine Learning

Machine Learning Modeling Testing Risk Management

How Birmingham’s $48M Oracle ERP project turned into an epic failure

CIO Business Intelligence

FEBRUARY 21, 2025

Integration with Oracle's systems proved more complex than expected, leading to prolonged testing and spiraling costs, the report stated. Despite providing a senior director to advise council officers and recommending go-live, EvoSys's actual contribution to program discussions appears minimal in meeting minutes and other documentation.

Reporting

Reporting Risk Testing Risk Management

Introducing Cloudera Fine Tuning Studio for Training, Evaluating, and Deploying LLMs with Cloudera AI

Cloudera

NOVEMBER 13, 2024

LLMs deployed as internal enterprise-specific agents can help employees find internal documentation, data, and other company information to help organizations easily extract and summarize important internal content. Build and test training and inference prompts. Increase Productivity. Evaluate the performance of trained LLMs.

Cost-Benefit

Cost-Benefit Data Processing Machine Learning Testing

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

AWS Big Data

OCTOBER 30, 2024

Collaborating closely with our partners, we have tested and validated Amazon DataZone authentication via the Athena JDBC connection, providing an intuitive and secure connection experience for users. Choose Test connection. Choose Test Connection. Get started with our technical documentation.

Visualization

Visualization Data Lake Testing Data Governance

AI, Protests, and Justice

O'Reilly on Data

JULY 21, 2020

The CVDazzle site states clearly that its designs have only been tested against one algorithm (and one that is now relatively old). Indeed, human rights groups are already using AI: there’s an important initiative to use AI to document war crimes in Yemen. Juggalo makeup doesn’t alter basic facial structure.

Technology

Technology Sales Testing Software
