Reference and Testing - Data Leaders Brief

Test – Blogathon

Analytics Vidhya

AUGUST 29, 2024

Introduction Hallucination in large language models (LLMs) refers to the generation of information that is factually incorrect, misleading, or fabricated. What […] The post Test – Blogathon appeared first on Analytics Vidhya.

Testing

Testing Modeling Analytics IT

Beyond “Prompt and Pray”

O'Reilly on Data

JANUARY 21, 2025

When we talk about conversational AI, were referring to systems designed to have a conversation, orchestrate workflows, and make decisions in real time. Instead of having LLMs make runtime decisions about business logic, use them to help create robust, reusable workflows that can be tested, versioned, and maintained like traditional software.

Cost-Benefit

Cost-Benefit Testing Interactive Software

The Race For Data Quality in a Medallion Architecture

DataKitchen

NOVEMBER 5, 2024

For instance, records may be cleaned up to create unique, non-duplicated transaction logs, master customer records, and cross-reference tables. This involves setting up automated, column-by-column quality tests to quickly identify deviations from expected values and catch emerging issues before they impact downstream layers.

Data Quality

Data Quality Testing Metrics Reporting

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Agentic AI design: An architectural case study

CIO Business Intelligence

NOVEMBER 19, 2024

Now that we have covered AI agents, we can see that agentic AI refers to the concept of AI systems being capable of independent action and goal achievement, while AI agents are the individual components within this system that perform each specific task. In our real-world case study, we needed a system that would create test data.

Testing

Testing Cost-Benefit Interactive ROI

Escaping POC Purgatory: Evaluation-Driven Development for AI Systems

O'Reilly on Data

MARCH 25, 2025

Weve seen this across dozens of companies, and the teams that break out of this trap all adopt some version of Evaluation-Driven Development (EDD), where testing, monitoring, and evaluation drive every decision from the start. What breaks your app in production isnt always what you tested for in dev! The way out?

Testing

Testing Data-driven Software Measurement

The Syntax, Semantics, and Pragmatics Gap in Data Quality Validation Testing

DataKitchen

JULY 12, 2023

The Syntax, Semantics, and Pragmatics Gap in Data Quality Validate Testing Data Teams often have too many things on their ‘to-do’ list. Syntax-Based Profiling and Testing : By profiling the columns of data in a table, you can look at values in a column to understand and craft rules about what is allowed for a column.

Data Quality

Data Quality Testing Manufacturing Finance

CIOs contend with gen AI growing pains

CIO Business Intelligence

NOVEMBER 22, 2024

Unfortunately, despite hard-earned lessons around what works and what doesn’t, pressure-tested reference architectures for gen AI — what IT executives want most — remain few and far between, she said. “What’s Next for GenAI in Business” panel at last week’s Big.AI@MIT

Unstructured Data

Unstructured Data Testing Modeling Enterprise

12 AI predictions for 2025

CIO Business Intelligence

DECEMBER 30, 2024

The company says it can achieve PhD-level performance in challenging benchmark tests in physics, chemistry, and biology. In these uses case, we have enough reference implementations to point to and say, Theres value to be had here.'

Software

Software ROI Modeling Interactive

MLOps and DevOps: Why Data Makes It Different

O'Reilly on Data

OCTOBER 19, 2021

The applications must be integrated to the surrounding business systems so ideas can be tested and validated in the real world in a controlled manner. but to reference concrete tooling used today in order to ground what could otherwise be a somewhat abstract exercise. An Overarching Concern: Correctness and Testing.

IT

IT Testing Experimentation Software

Start DataOps Today with ‘Lean DataOps’

DataKitchen

SEPTEMBER 20, 2021

The best way to ensure error-free execution of data production is through automated testing and monitoring. The DataKitchen Platform enables data teams to integrate testing and observability into data pipeline orchestrations. Automated tests work 24×7 to ensure that the results of each processing stage are accurate and correct.

Testing

Testing Metrics Measurement Dashboards

How IT leaders use agentic AI for business workflows

CIO Business Intelligence

APRIL 30, 2025

Though loosely applied, agentic AI generally refers to granting AI agents more autonomy to optimize tasks and chain together increasingly complex actions. Boosting IT and security AI agents are transforming software engineering , aiding in code generation , testing, refactoring, observability, and beyond.

IT

IT Sales Cost-Benefit Data-driven

My top learning and pondering moments at Splunk.conf22

Rocket-Powered Data Science

JUNE 17, 2022

The dominant references everywhere to Observability was just the start of awesome brain food offered at Splunk’s.conf22 event. Reference ) The latest updates to the Splunk platform address the complexities of multi-cloud and hybrid environments, enabling cybersecurity and network big data functions (e.g., is here, now!

Machine Learning

Machine Learning Recreation/Entertainment Risk Business Objectives

Unlock the power of optimization in Amazon Redshift Serverless

AWS Big Data

MARCH 10, 2025

Also, we designed our test environment without setting the Amazon Redshift Serverless workgroup max capacity parametera key configuration that controls the maximum RPUs available to your data warehouse. By removing this limit, we could clearly showcase how different configurations affect scaling behavior in our test endpoints.

Optimization

Optimization Data Warehouse Data-driven Testing

Amazon EMR 7.5 runtime for Apache Spark and Iceberg can run Spark workloads 3.6 times faster than Spark 3.5.3 and Iceberg 1.6.1

AWS Big Data

DECEMBER 27, 2024

To assess the Spark engines performance with the Iceberg table format, we performed benchmark tests using the 3 TB TPC-DS dataset, version 2.13 (our results derived from the TPC-DS dataset are not directly comparable to the official TPC-DS results due to setup differences). 4xlarge instances, for testing both open source Spark 3.5.3

Cost-Benefit

Cost-Benefit Testing Metrics Optimization

Run Apache XTable in AWS Lambda for background conversion of open table formats

AWS Big Data

NOVEMBER 26, 2024

For more examples and references to other posts, refer to the following GitHub repository. In case you don’t have sample data available for testing, we provide scripts for generating sample datasets on GitHub. For more examples and references to other posts on using XTable on AWS, refer to the following GitHub repository.

Metadata

Metadata Data Lake Snapshot Data Warehouse

How To Succeed As a DataOps Engineer

DataKitchen

NOVEMBER 20, 2021

A DataOps Engineer can make test data available on demand. We have automated testing and a system for exception reporting, where tests identify issues that need to be addressed. We often refer to data operations and analytics as a factory. It then autogenerates QC tests based on those rules.

Testing

Testing Machine Learning Data Warehouse Analytics

Lessons learned building natural language processing systems in health care

O'Reilly on Data

MARCH 7, 2019

They use a lot of jargon: 10/10 refers to the intensity of pain. Generalized abd radiating to lower” refers to general abdominal (stomach) pain that radiates to the lower back. Jargon refers to the 100-200 new words you learn in the first month after you join a new school or workplace. They don’t have a subject. IBM Watson NLU.

Deep Learning

Deep Learning Testing Machine Learning Modeling

Evaluating sample Amazon Redshift data sharing architecture using Redshift Test Drive and advanced SQL analysis

AWS Big Data

SEPTEMBER 10, 2024

Redshift Test Drive is a tool hosted on the GitHub repository that let customers evaluate which data warehouse configurations options are best suited for their workload. Generating and accessing Test Drive metrics The results of Amazon Redshift Test Drive can be accessed using an external schema for analysis of a replay.

Testing

Testing Snapshot Data Warehouse Metrics

Developer guidance on how to do local testing with Amazon MSK Serverless

AWS Big Data

SEPTEMBER 11, 2024

This allows developers to test their application with a Kafka cluster that has the same configuration as production and provides an identical infrastructure to the actual environment without needing to run Kafka locally. For guidance, refer to How to install Linux on Windows with WSL. ssh -i "~/ " ec2-user@ > -L 127.0.0.1:9098:

Testing

Testing Data Processing Management IT

You Can’t Regulate What You Don’t Understand

O'Reilly on Data

JUNE 15, 2023

And they are stress testing and “ red teaming ” them to uncover vulnerabilities. But exactly how this stress testing, post processing, and hardening works—or doesn’t—is mostly invisible to regulators. This is what Jeff Bezos has referred to as a “ one way door ,” a decision that, once made, is very hard to undo.

Metrics

Metrics Reporting Measurement Finance

Data center provider fakes Tier 4 data center certificate to bag $11M SEC deal

CIO Business Intelligence

OCTOBER 17, 2024

Deepak Jain, 49, of Potomac, was the CEO of an information technology services company (referred to in the indictment as Company A) that provided data center services to customers, including the SEC,” the US DOJ said in a statement. From 2012 through 2018, the SEC paid Company A approximately $10.7

Broadcasting

Broadcasting Risk Reporting Measurement

Unlocking near real-time analytics with petabytes of transaction data using Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud

AWS Big Data

NOVEMBER 27, 2024

You’re now ready to sign in to both Aurora MySQL cluster and Amazon Redshift Serverless data warehouse and run some basic commands to test them. Choose Test Connection. Choose Next if the test succeeded. To add tests to your project: Create a new YAML file in the models directory and name it models/schema.yml.

Data Warehouse

Data Warehouse Analytics Testing Sales

Build a high-performance quant research platform with Apache Iceberg

AWS Big Data

JANUARY 9, 2025

In this post, we use the term vanilla Parquet to refer to Parquet files stored directly in Amazon S3 and accessed through standard query engines like Apache Spark, without the additional features provided by table formats such as Iceberg. Moreover, our tests show that for read-intensive workloads, Iceberg reduced DPU hours by 32.4%

Metadata

Metadata Snapshot Cost-Benefit Optimization

Write queries faster with Amazon Q generative SQL for Amazon Redshift

AWS Big Data

NOVEMBER 7, 2024

Refer to Easy analytics and cost-optimization with Amazon Redshift Serverless to get started. To test this, let’s ask Amazon Q to “delete data from web_sales table.” It can help optimize the generation process by reducing unnecessary table references. For pricing information, refer to Amazon Q generative SQL pricing.

Metadata

Metadata Sales Data Warehouse Optimization

Take manual snapshots and restore in a different domain spanning across various Regions and accounts in Amazon OpenSearch Service

AWS Big Data

OCTOBER 11, 2024

Refer to this developer guide to understand more about index snapshots Understanding manual snapshots Manual snapshots are point-in-time backups of your OpenSearch Service domain that are initiated by the user. Testing and development – You can use snapshots to create copies of your data for testing or development purposes.

Snapshot

Snapshot Dashboards Management Testing

Generative AI in the Enterprise

O'Reilly on Data

NOVEMBER 28, 2023

Unexpected outcomes, security, safety, fairness and bias, and privacy are the biggest risks for which adopters are testing. Programmers have always developed tools that would help them do their jobs, from test frameworks to source control to integrated development environments. We’d like to see more companies test for fairness.

Enterprise

Enterprise Testing Modeling Reporting

Manage access controls in generative AI-powered search applications using Amazon OpenSearch Service and Amazon Cognito

AWS Big Data

NOVEMBER 19, 2024

Refer to Service Quotas for more details. Deploy the solution To deploy the solution to your AWS account, refer to the Readme file in our GitHub repo. Query documents with different personas Now let’s test the application using different personas. If needed, you can initiate a quota increase request.

Management

Management Metadata Manufacturing Testing

Avoiding Toxicity in Generative AI

David Menninger's Analyst Perspectives

SEPTEMBER 24, 2024

These types of prompts are referred to as jailbreak prompts. Regardless of whether you can curate the training data, it’s necessary to test the output of the models to identify any toxic content from an adversarial action. Red-teaming is a term used to describe human testing of models for vulnerabilities.

Testing

Testing Modeling Enterprise Risk

The New O’Reilly Answers: The R in “RAG” Stands for “Royalties”

O'Reilly on Data

JUNE 14, 2024

After all, research is only as good as your references, and the teams at both organizations acutely understood that the possibility of hallucinations and ungrounded answers could outright confuse and frustrate learners. Miso’s team shares O’Reilly’s belief in not developing LLMs without credit, consent, and compensation from creators.

Metadata

Metadata Publishing Data-driven Modeling

Introducing Amazon MWAA micro environments for Apache Airflow

AWS Big Data

NOVEMBER 19, 2024

These organizations often maintain multiple AWS accounts for development, testing, and production stages, leading to increased complexity and cost. This micro environment is particularly well-suited for development, testing, or small production workloads where resource optimization and cost-efficiency are primary concerns.

Metadata

Metadata Cost-Benefit Metrics Optimization

Data Observability and Monitoring with DataOps

DataKitchen

MAY 10, 2021

Some will argue that observability is nothing more than testing and monitoring applications using tests, metrics, logs, and other artifacts. Below we will explain how to virtually eliminate data errors using DataOps automation and the simple building blocks of data and analytics testing and monitoring. . Tie tests to alerts.

Testing

Testing Manufacturing Data Quality Statistics

Recap of Amazon Redshift key product announcements in 2024

AWS Big Data

DECEMBER 17, 2024

In internal tests, AI-driven scaling and optimizations showcased up to 10 times price-performance improvements for variable workloads. Launch summary Following is the launch summary which provides the announcement links and reference blogs for the key announcements. Industry-leading price-performance: Amazon Redshift launches RA3.large

Data Lake

Data Lake Data Warehouse Data-driven Optimization

From project to product: Architecting the future of enterprise technology

CIO Business Intelligence

JANUARY 14, 2025

By articulating fitness functions automated tests tied to specific quality attributes like reliability, security or performance teams can visualize and measure system qualities that align with business goals. Experimentation: The innovation zone Progressive cities designate innovation districts where new ideas can be tested safely.

Enterprise

Enterprise Technology Metrics Measurement

Preparing for AI

O'Reilly on Data

SEPTEMBER 17, 2024

Is every reference correct and—even more important—does it exist? Checking the AI is a strenuous test of your own knowledge. Checking an AI is more like being a fact-checker for someone writing an important article: Can every fact be traced back to a documentable source? Is the AI’s output too vague or general to be useful?

Modeling

Modeling Reporting Sales Testing

What Are ChatGPT and Its Friends?

O'Reilly on Data

MARCH 23, 2023

There’s a very important difference between these two almost identical sentences: in the first, “it” refers to the cup. In the second, “it” refers to the pitcher. It’s by far the most convincing example of a conversation with a machine; it has certainly passed the Turing test. Ethan Mollick says that it is “only OK at search.

IT

IT Modeling Testing Risk

Streamline AI-driven analytics with governance: Integrating Tableau with Amazon DataZone

AWS Big Data

OCTOBER 30, 2024

Collaborating closely with our partners, we have tested and validated Amazon DataZone authentication via the Athena JDBC connection, providing an intuitive and secure connection experience for users. Refer to the detailed blog post on how you can use this to connect through various other tools.

Analytics

Analytics Visualization Data Governance Data-driven

Cost, security, and flexibility: the business case for open source gen AI

CIO Business Intelligence

DECEMBER 11, 2024

Thats a problem, since building commercial products requires a lot of testing and optimization. An abundance of choice In the most general definition, open source here refers to the code thats available, and that the model can be modified and used for free in a variety of contexts. Finally, theres the price.

Cost-Benefit

Cost-Benefit Modeling Marketing Sales

Integrate custom applications with AWS Lake Formation – Part 2

AWS Big Data

NOVEMBER 19, 2024

You can now test the newly created application by running the following command: npm run dev By default, the application is available on port 5173 on your local machine. Enter the following command: cd lfappblog && npm install You should now see the directory structure shown in the following screenshot.

Data Processing

Data Processing Metadata Publishing Testing

Implementing a Pharma Data Mesh using DataOps

DataKitchen

AUGUST 19, 2021

For each domain, one would want to know that a build was completed, that tests were applied and passed, and that data flowing through the system is correct. One challenge is that each domain team can choose a different toolset that complicates multi-level orchestration, testing and monitoring. Figure 5: Domain layer processing steps.

Data Warehouse

Data Warehouse Data Lake Manufacturing Testing

14 Dashboard Design Principles & Best Practices To Enhance Your Data Analysis

datapine

JULY 11, 2019

For reference, here are the 4 primary types of dashboards for each main branch business-based activity: Strategic: A dashboard focused on monitoring long-term company strategies by analyzing and benchmarking a wide range of critical trend-based information. Don’t try to place all the information on the same page. Provide context.

Dashboards

Dashboards Metrics Visualization Key Performance Indicator

5 top business use cases for AI agents

CIO Business Intelligence

MARCH 19, 2025

Meanwhile, in December, OpenAIs new O3 model, an agentic model not yet available to the public, scored 72% on the same test. Mitre has also tested dozens of commercial AI models in a secure Mitre-managed cloud environment with AWS Bedrock. By August, agentic AI systems approached 40% and today, theyve passed the 60% milestone.

Software

Software Risk Enterprise Cost-Benefit

Deep automation in machine learning

O'Reilly on Data

DECEMBER 19, 2018

have a large body of tools to choose from: IDEs, CI/CD tools, automated testing tools, and so on. We have great tools for working with code: creating it, managing it, testing it, and deploying it. Both refer to the source of the data: where does the data come from, how was it gathered, and how was it modified along the way?

Machine Learning

Machine Learning Software Metadata Testing

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

AWS Big Data

DECEMBER 16, 2024

For each service, you need to learn the supported authorization and authentication methods, data access APIs, and framework to onboard and test data sources. The SageMaker Lakehouse data connection testing capability boosts your confidence in established connections. To learn more, refer to Amazon SageMaker Unified Studio.

Visualization

Visualization Data Processing Testing Publishing

Build a secure data visualization application using the Amazon Redshift Data API with AWS IAM Identity Center

AWS Big Data

MARCH 6, 2025

Refer to IAM Identity Center identity source tutorials for the IdP setup. For more details, refer to Creating a workgroup with a namespace. Refer to Authorization servers for more information about authorization servers in Okta. The application has been tested successfully with versions v3.12.8 Choose Create workgroup.

Visualization

Visualization Sales Data Warehouse Management

Test – Blogathon

Beyond “Prompt and Pray”

Webinars

Trending Sources

The Race For Data Quality in a Medallion Architecture

Webinars

Agentic AI design: An architectural case study

Escaping POC Purgatory: Evaluation-Driven Development for AI Systems

The Syntax, Semantics, and Pragmatics Gap in Data Quality Validation Testing

CIOs contend with gen AI growing pains

12 AI predictions for 2025

MLOps and DevOps: Why Data Makes It Different

Start DataOps Today with ‘Lean DataOps’

How IT leaders use agentic AI for business workflows

My top learning and pondering moments at Splunk.conf22

Unlock the power of optimization in Amazon Redshift Serverless

Amazon EMR 7.5 runtime for Apache Spark and Iceberg can run Spark workloads 3.6 times faster than Spark 3.5.3 and Iceberg 1.6.1

Run Apache XTable in AWS Lambda for background conversion of open table formats

How To Succeed As a DataOps Engineer

Lessons learned building natural language processing systems in health care

Evaluating sample Amazon Redshift data sharing architecture using Redshift Test Drive and advanced SQL analysis

Developer guidance on how to do local testing with Amazon MSK Serverless

You Can’t Regulate What You Don’t Understand

Data center provider fakes Tier 4 data center certificate to bag $11M SEC deal

Unlocking near real-time analytics with petabytes of transaction data using Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud

Build a high-performance quant research platform with Apache Iceberg

Write queries faster with Amazon Q generative SQL for Amazon Redshift

Take manual snapshots and restore in a different domain spanning across various Regions and accounts in Amazon OpenSearch Service

Generative AI in the Enterprise

Manage access controls in generative AI-powered search applications using Amazon OpenSearch Service and Amazon Cognito

Avoiding Toxicity in Generative AI

The New O’Reilly Answers: The R in “RAG” Stands for “Royalties”

Introducing Amazon MWAA micro environments for Apache Airflow

Data Observability and Monitoring with DataOps

Recap of Amazon Redshift key product announcements in 2024

From project to product: Architecting the future of enterprise technology

Preparing for AI

What Are ChatGPT and Its Friends?

Streamline AI-driven analytics with governance: Integrating Tableau with Amazon DataZone

Cost, security, and flexibility: the business case for open source gen AI

Integrate custom applications with AWS Lake Formation – Part 2

Implementing a Pharma Data Mesh using DataOps

14 Dashboard Design Principles & Best Practices To Enhance Your Data Analysis

5 top business use cases for AI agents

Deep automation in machine learning

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

Build a secure data visualization application using the Amazon Redshift Data API with AWS IAM Identity Center

Stay Connected