Get Off The Blocks Fast: Data Quality in the Bronze Layer. Effective production QA techniques begin with rigorous automated testing at the Bronze layer, where raw data enters the lakehouse environment. Data drift checks (does it make sense?): Is there a shift in the overall data quality?
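A Bronze-layer drift check like the one described can be sketched with the Population Stability Index (PSI), a common drift metric. This is a minimal illustration, not the article's implementation; the bucket count and the thresholds are assumed values.

```python
# Hypothetical sketch of a Bronze-layer data drift check using the
# Population Stability Index (PSI). Bucket count and thresholds are
# illustrative assumptions, not taken from the article.
import math

def psi(expected, actual, buckets=10):
    """Population Stability Index between two numeric samples."""
    lo, hi = min(expected), max(expected)
    step = (hi - lo) / buckets or 1.0
    def frac(sample, i):
        lo_b, hi_b = lo + i * step, lo + (i + 1) * step
        # count values in this bucket; the top bucket includes the max value
        n = sum(lo_b <= x < hi_b or (i == buckets - 1 and x == hi) for x in sample)
        return max(n / len(sample), 1e-6)  # floor to avoid log(0)
    return sum(
        (frac(actual, i) - frac(expected, i)) * math.log(frac(actual, i) / frac(expected, i))
        for i in range(buckets)
    )

baseline = [float(i % 100) for i in range(1000)]  # yesterday's load
todays = [float(i % 100) for i in range(1000)]    # identical distribution
assert psi(baseline, todays) < 0.1                # no drift flagged
```

A common convention is to treat PSI below 0.1 as stable and above 0.2 as significant drift worth blocking the pipeline on.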
Data Observability and Data Quality Testing Certification Series. We are excited to invite you to a free four-part webinar series that will elevate your understanding and skills in Data Observability and Data Quality Testing. Register for free today and take the first step toward mastering data observability and quality testing!
Introduction: My last blog discussed “Training a Convolutional Neural Network from Scratch Using a Custom Dataset.” In that blog, I explained how to create a dataset directory, split the data into train, test, and validation sets, and train from scratch. This blog is […].
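The train/test/validation split mentioned above can be sketched in a few lines. The 70/15/15 ratios and the fixed shuffle seed are assumptions for illustration, not the blog's actual settings.

```python
# Sketch of a deterministic train/test/validation split; the 70/15/15
# ratios and the seed are assumed values for illustration only.
import random

def split_dataset(items, train=0.7, val=0.15, seed=42):
    items = list(items)
    random.Random(seed).shuffle(items)  # deterministic shuffle
    n = len(items)
    n_train, n_val = int(n * train), int(n * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

files = [f"img_{i}.jpg" for i in range(100)]
tr, va, te = split_dataset(files)
assert len(tr) == 70 and len(va) == 15 and len(te) == 15
assert set(tr) | set(va) | set(te) == set(files)  # no file lost or duplicated
```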
This article was published as a part of the Data Science Blogathon. Introduction: In the last blog we looked at a test to […]. The post Decoding the Chi-Square Test: Use, Implementation and Visualization appeared first on Analytics Vidhya.
This blog dives into the remarkable journey of a data team that achieved unparalleled efficiency using DataOps principles and software, transforming their analytics and data teams into a hyper-efficient powerhouse running […] data quality tests every day to support a cast of analysts and customers.
Introduction: This article is part of a blog series on Machine Learning Operations (MLOps). In the previous articles, we went through the introduction, the MLOps pipeline, model training, model testing, model packaging, and model registering. We have seen how to train, test, package, and register […].
This article was published as a part of the Data Science Blogathon. Dear readers, in this blog, let’s build our own custom CNN (Convolutional Neural Network) model from scratch by training and testing it with our custom image dataset.
Now With Actionable, Automatic Data Quality Dashboards. Imagine a tool that you can point at any dataset that learns from your data, screens for typical data quality issues, and then automatically generates and performs powerful tests, analyzing and scoring your data to pinpoint issues before they snowball. DataOps just got more intelligent.
Introduction: Are you a data scientist looking for an exciting and informative read? My latest blog post is jam-packed with fun and innovative experiments that I conducted with ChatGPT over the weekend. In this experiment, I put ChatGPT to the test and challenged it to […]. The post: How to Use ChatGPT as a Data Scientist?
Read the complete blog below for a more detailed description of the vendors and their capabilities. Testing and Data Observability: It orchestrates complex pipelines, toolchains, and tests across teams, locations, and data centers. Production Monitoring and Development Testing.
We have talked extensively about some of the benefits of AI and machine learning in mobile app development in previous blog posts. However, one benefit we haven’t talked about as much is the application of machine learning for testing new apps during the design process. What Is Automated Mobile App Testing?
Development teams starting small and building up, learning, testing and figuring out the realities from the hype will be the ones to succeed. For instance, if you want to create a system to write blog entries, you might have a researcher agent, a writer agent, and a user agent. There can be up to eight different data sets or files.
Introduction: Cross-validation is a machine learning technique that evaluates a model’s performance on new data. It involves dividing a training dataset into multiple subsets and testing the model on each held-out subset in turn. This prevents overfitting by encouraging the model to learn the underlying trends in the data.
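The procedure above can be sketched from scratch. To keep the example self-contained, the "model" here is a deliberately trivial stand-in (predict the training-fold mean); the fold count of 5 is an assumption.

```python
# Minimal sketch of k-fold cross-validation; the "model" (predict the
# training-fold mean) is a stand-in chosen to keep the example self-contained.
def k_fold_splits(n, k):
    """Yield (train_indices, test_indices) pairs covering all n samples."""
    fold = n // k
    idx = list(range(n))
    for i in range(k):
        # the last fold absorbs any remainder
        test = idx[i * fold:(i + 1) * fold] if i < k - 1 else idx[i * fold:]
        held_out = set(test)
        train = [j for j in idx if j not in held_out]
        yield train, test

data = [float(x) for x in range(20)]
scores = []
for train, test in k_fold_splits(len(data), k=5):
    mean = sum(data[j] for j in train) / len(train)              # "train" the model
    mse = sum((data[j] - mean) ** 2 for j in test) / len(test)   # score on held-out fold
    scores.append(mse)

assert len(scores) == 5  # one score per fold
```

Averaging the per-fold scores gives a more stable performance estimate than a single train/test split.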
You’re now ready to sign in to both the Aurora MySQL cluster and the Amazon Redshift Serverless data warehouse and run some basic commands to test them. Choose Test Connection. Choose Next if the test succeeded. This verifies that dbt Cloud can access your Redshift data warehouse. Make your initial commit by choosing Commit and sync.
Without further ado, here are DataKitchen’s top ten blog posts, top five white papers, and top five webinars from 2021. Top 10 Blog Posts. Add DataOps Tests to Deploy with Confidence. We hope you and your family have happy holidays and we look forward to continuing your DataOps journey with you in the new year.
Rather than concentrating on individual tables, these teams devote their resources to ensuring each pipeline, workflow, or DAG (Directed Acyclic Graph) is transparent, thoroughly tested, and easily deployable through automation. Their data tables become dependable by-products of meticulously crafted and managed workflows.
The Syntax, Semantics, and Pragmatics Gap in Data Quality Validation Testing. Data teams often have too many things on their ‘to-do’ list. Syntax-based profiling and testing: by profiling the columns of data in a table, you can look at the values in a column to understand and craft rules about what is allowed for that column.
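Syntax-based profiling as described can be sketched as rules derived from observed column values, then applied to new rows. The column names, the regex pattern, and the allowed value set below are hypothetical examples, not from the post.

```python
# Illustrative sketch of syntax-based profiling: rules inferred from observed
# column values, applied to new rows. Column names and rules are hypothetical.
import re

profile = {
    "customer_id": re.compile(r"^C\d{4}$"),  # e.g. all observed values match C####
    "country":     {"US", "DE", "FR"},       # observed value set
}

def violations(row):
    """Return the list of columns whose value breaks its profiled rule."""
    bad = []
    for col, rule in profile.items():
        value = row.get(col, "")
        # a compiled regex has .match; otherwise treat the rule as a value set
        ok = rule.match(value) if hasattr(rule, "match") else value in rule
        if not ok:
            bad.append(col)
    return bad

assert violations({"customer_id": "C0042", "country": "US"}) == []
assert violations({"customer_id": "42", "country": "UK"}) == ["customer_id", "country"]
```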
EY, in a recent blog post focused on top opportunities for IT companies in 2025, recommends that money raised from these activities be used on AI projects. Divestitures can also help companies zero in on their potential and market relevance, the blog authors note. […] billion.
Full disclosure: some images have been edited to remove ads or to shorten the scrolling in this blog post. DataKitchen provides an end-to-end DataOps platform that automates and coordinates people, tools, and environments in the entire data analytics organization—from orchestration, testing, and monitoring to development and deployment.
“2025 will be about the pursuit of near-term, bottom-line gains while competing for declining consumer loyalty and digital-first business buyers,” Sharyn Leaver, Forrester chief research officer, wrote in a blog post Tuesday. The rest of their time is spent creating designs, writing tests, fixing bugs, and meeting with stakeholders. “So […]
Introduction In this technologically advanced era, programming languages come and go, but Python has stood the test of time, emerging as a titan in coding. Its simplicity, versatility, and robust community support have made it the go-to language for beginners and experts alike.
We will also discuss how the vast majority of data engineers are so busy that they don’t know how, or don’t have time, to write tests that find data errors. It is the missing piece of our data systems. The post UPCOMING WEBINAR: Automated Test Generation – Why Data Teams Need It first appeared on DataKitchen.
The Terms and Conditions of a Data Contract are Automated Production Data Tests. The best data contract is an automated production data test. Data testing plays a critical role in the process of implementing data contracts. Data testing ensures that the data is transmitted and received accurately and consistently.
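The idea of a data contract's terms and conditions expressed as an automated production test can be sketched as a schema check over incoming records. The field names and types below are assumed examples, not from the post.

```python
# Hedged sketch: a data contract's "terms and conditions" expressed as an
# automated test over incoming records. The schema is an assumed example.
contract = {
    "order_id": int,
    "amount":   float,
    "currency": str,
}

def meets_contract(record):
    """True iff the record has every contracted field with the right type."""
    return all(isinstance(record.get(k), t) for k, t in contract.items())

good = {"order_id": 1, "amount": 9.99, "currency": "EUR"}
bad = {"order_id": "1", "amount": 9.99}  # wrong type, missing field
assert meets_contract(good) is True
assert meets_contract(bad) is False
```

In production, a check like this would run on every batch, failing the pipeline (rather than silently passing bad data downstream) when a record breaks the contract.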
We will also discuss how the vast majority of data engineers are so busy that they don’t know how, or don’t have time, to write tests that find data errors. It is the missing piece of our data systems. The post ON DEMAND WEBINAR: Automated Test Generation – Why Data Teams Need It first appeared on DataKitchen.
The domain requires a team that creates, updates, and runs the domain, and we can’t forget metadata: catalogs, lineage, test results, processing history, etc. It can orchestrate a hierarchy of directed acyclic graphs (DAGs) that span domains and integrate testing at each step of processing.
There are excellent summaries of these failures in Ben Thompson’s newsletter Stratechery and Simon Willison’s blog. That’s what beta tests are for. Will it take weeks, months, or years to iron out the problems with Microsoft’s and Google’s beta tests? The important question is where we go from here.
To assess the Spark engine’s performance with the Iceberg table format, we performed benchmark tests using the 3 TB TPC-DS dataset, version 2.13 (our results derived from the TPC-DS dataset are not directly comparable to the official TPC-DS results due to setup differences), on […]4xlarge instances, for testing both open source Spark 3.5.3 […]
Your Chance: Want to test an agile business intelligence solution? Business intelligence is moving away from the traditional engineering model: analysis, design, construction, testing, and implementation. Test BI in a small group and deploy the software internally. Finalize testing. Test throughout the lifecycle.
The best way to ensure error-free execution of data production is through automated testing and monitoring. The DataKitchen Platform enables data teams to integrate testing and observability into data pipeline orchestrations. Automated tests work 24×7 to ensure that the results of each processing stage are accurate and correct.
Design your data analytics workflows with tests at every stage of processing so that errors are virtually eliminated. It’s hard enough to test within a single domain, but imagine testing with other domains that use different teams and toolchains, managed in other locations. Take a broader view.
A DataOps Engineer can make test data available on demand. We have automated testing and a system for exception reporting, where tests identify issues that need to be addressed. It then autogenerates QC tests based on those rules. Every time we see an error, we address it with a new automated test.
These rules are not necessarily “Rocket Science” (despite the name of this blog site), but they are common business sense for most business-disruptive technology implementations in enterprises. Keep it agile, with short design, develop, test, release, and feedback cycles: keep it lean, and build on incremental changes.
Model developers will test for AI bias as part of their pre-deployment testing. Quality test suites will enforce “equity,” like any other performance metric. Continuous testing, monitoring, and observability will prevent biased models from deploying or continuing to operate.
DataKitchen Training and Certification Offerings. For individual contributors with a background in Data Analytics/Science/Engineering. Overall ideas and principles of DataOps: DataOps Cookbook (200-page book, over 30,000 readers, free); DataOps Certification (3 hours, online, free, sign up online); DataOps Manifesto (over 30,000 signatures) […]
The Otezla team built a system with tens of thousands of automated tests checking data and analytics quality. When the tests pass, the orchestration admits the data to a data catalog. The DataKitchen DataOps Platform implements automation that replaces an army of people who previously executed manual tests, checklists and procedures.
In this blog post, we’re going to give a bit of background and context about management reports, and then we’re going to outline 10 essential best practices you can use to make sure your reports are effective. Ask other key stakeholders within the organization to test your report and offer their feedback. Get testing!
To attempt to answer this question, this blog post will compare responses from ChatGPT and surveyed individuals. ChatGPT was released almost two years ago now, so we thought this would be a good time to analyze its performance. After two years of reinforcement learning , how knowledgeable has ChatGPT become?
Some will argue that observability is nothing more than testing and monitoring applications using tests, metrics, logs, and other artifacts. Below, we explain how to virtually eliminate data errors using DataOps automation and the simple building blocks of data and analytics testing and monitoring. Tie tests to alerts.
A drug company tests 50,000 molecules and spends a billion dollars or more to find a single safe and effective medicine that addresses a substantial market. Figure 1: A pharmaceutical company tests 50,000 compounds just to find one that reaches the market. A DataOps superstructure provides a common testing framework.
Every patient has their own digital record, which includes demographics, medical history, allergies, laboratory test results, etc. EHRs can also trigger warnings and reminders when a patient should get a new lab test, or track prescriptions to see if a patient has been following doctors’ orders. 2) Electronic Health Records (EHRs).
Build and test training and inference prompts. Fine Tuning Studio ships with powerful prompt templating features, so users can build and test the performance of different prompts to feed into different models and model adapters during training. We can then test the prompt against the dataset to make sure everything is working properly.
Unexpected outcomes, security, safety, fairness and bias, and privacy are the biggest risks for which adopters are testing. Programmers have always developed tools that would help them do their jobs, from test frameworks to source control to integrated development environments. Only 4% pointed to lower head counts. Perhaps not yet.
Testing and development – You can use snapshots to create copies of your data for testing or development purposes. Note: While using Postman or Insomnia to run the API calls mentioned throughout this blog, choose AWS IAM v4 as the authentication method and input your IAM credentials in the Authorization section.
They then need to modify their Spark scripts and configurations, updating features, connectors, and library dependencies as needed (e.g., […] Python 3.7) to Spark 3.3.0. Testing these upgrades involves running the application and addressing issues as they arise. Each test run may reveal new problems, resulting in multiple iterations of changes.