
Why you should care about debugging machine learning models

O'Reilly on Data

Not least is the broadening realization that ML models can fail. And that’s why model debugging, the art and science of understanding and fixing problems in ML models, is so critical to the future of ML. Because all ML models make mistakes, everyone who cares about ML should also care about model debugging. [1]


The AIgent: Using Google’s BERT Language Model to Connect Writers & Representation

Insight

In 2013, Robert Galbraith, an aspiring author, finished… The AIgent was built with BERT, Google's state-of-the-art language model. In this article, I will discuss the construction of the AIgent, from data collection to model assembly. More relevant to the AIgent is Google's BERT model, a task-agnostic…



Data Drift Detection for Image Classifiers

Domino Data Lab

Introduction: preventing silent model degradation in production. This article explores an approach for detecting data drift in models that ingest image data as their input, in order to prevent their silent degradation in production.
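One common way to detect drift on image inputs (a sketch of the general idea, not necessarily the exact method in Domino's article) is to reduce each image to an embedding vector and compare the distribution of a reference batch against the current production batch, dimension by dimension, with a two-sample Kolmogorov-Smirnov test. The `detect_drift` helper below is hypothetical; it assumes embeddings are already computed as NumPy arrays.

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(reference: np.ndarray, current: np.ndarray,
                 alpha: float = 0.05) -> bool:
    """Flag drift if any embedding dimension's distribution differs
    significantly between the reference and current batches
    (two-sample KS test with a Bonferroni correction)."""
    n_dims = reference.shape[1]
    corrected_alpha = alpha / n_dims  # correct for testing many dimensions
    for d in range(n_dims):
        _, p_value = ks_2samp(reference[:, d], current[:, d])
        if p_value < corrected_alpha:
            return True
    return False
```

A batch drawn from the same distribution as the reference should pass, while a batch whose embeddings have shifted should be flagged for retraining or investigation.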


Towards optimal experimentation in online systems

The Unofficial Google Data Science Blog

If $Y$ at that point is (statistically and practically) significantly better than our current operating point, and that point is deemed acceptable, we update the system parameters to this better value. Figure 2: Spreading measurements out makes estimates of the model (the slope of the line) more accurate.
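The update rule the excerpt describes, adopt the candidate operating point only when the improvement in $Y$ is both statistically and practically significant, can be sketched as follows. This is a minimal illustration, not the blog's actual procedure: `should_update`, `alpha`, and `min_lift` are hypothetical names, and Welch's t-test stands in for whatever test the experiment actually uses.

```python
import numpy as np
from scipy.stats import ttest_ind

def should_update(current_y, candidate_y,
                  alpha: float = 0.05, min_lift: float = 0.01) -> bool:
    """Adopt the candidate operating point only if its measured Y is
    statistically significantly better (Welch's t-test) AND the mean
    lift clears a practical-significance bar (min_lift)."""
    _, p_value = ttest_ind(candidate_y, current_y, equal_var=False)
    lift = np.mean(candidate_y) - np.mean(current_y)
    return bool(p_value < alpha and lift > min_lift)
```

Requiring both conditions avoids shipping changes that are statistically detectable but too small to matter in practice.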


Credit Card Fraud Detection using XGBoost, SMOTE, and threshold moving

Domino Data Lab

We’ll use a gradient boosting technique via XGBoost to create a model, and I’ll walk you through steps you can take to avoid overfitting and build a model that is fit for purpose and ready for production. We’ll also look at the basic descriptive statistics for all attributes, and evaluate the model using sklearn's metrics module (from sklearn import metrics).
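The "threshold moving" piece of the title can be sketched end to end with scikit-learn alone. This is an illustrative stand-in for the article's pipeline: it uses `GradientBoostingClassifier` instead of XGBoost, skips SMOTE, and picks the probability cut-off that maximizes F1 on held-out data rather than using the default 0.5, which is the key trick for imbalanced problems like fraud.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_curve, f1_score

# Imbalanced toy data standing in for the credit card fraud set (~5% positives).
X, y = make_classification(n_samples=4000, weights=[0.95, 0.05], random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, stratify=y, random_state=42)

clf = GradientBoostingClassifier(random_state=42).fit(X_train, y_train)
probs = clf.predict_proba(X_val)[:, 1]

# Threshold moving: sweep candidate cut-offs and keep the one that
# maximizes F1 on validation data instead of defaulting to 0.5.
precision, recall, thresholds = precision_recall_curve(y_val, probs)
f1 = 2 * precision * recall / (precision + recall + 1e-12)
best_threshold = thresholds[np.argmax(f1[:-1])]

tuned_f1 = f1_score(y_val, probs >= best_threshold)
default_f1 = f1_score(y_val, probs >= 0.5)
```

Because the classifier sees few positives, its scores are skewed low, so the F1-optimal threshold typically sits well below 0.5.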


Build a RAG data ingestion pipeline for large-scale ML workloads

AWS Big Data

For building any generative AI application, enriching the large language models (LLMs) with new data is imperative. Ray cluster for ingestion and creating vector embeddings: in our testing, we found that the GPUs make the biggest impact to performance when creating the embeddings. This breaks out to approximately 200 per record.
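The core pattern behind GPU-efficient embedding ingestion is to feed the model large batches rather than one record at a time; the article distributes this across a Ray cluster, but the batching logic itself can be shown in a minimal single-process sketch. `ingest` and `embed_batch` are hypothetical names, and the embedding model is abstracted as a callable.

```python
from typing import Callable, Iterable, List

Vector = List[float]

def ingest(records: Iterable[str],
           embed_batch: Callable[[List[str]], List[Vector]],
           batch_size: int = 64) -> List[Vector]:
    """Accumulate records into fixed-size batches and embed each batch,
    so a GPU-backed model sees large, efficient batches instead of
    one record per call."""
    embeddings: List[Vector] = []
    batch: List[str] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            embeddings.extend(embed_batch(batch))
            batch = []
    if batch:  # flush the final partial batch
        embeddings.extend(embed_batch(batch))
    return embeddings
```

In a real pipeline the batched vectors would then be written to a vector store; frameworks like Ray parallelize exactly this map-over-batches step across workers.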


Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

In 2013, Amazon Web Services revolutionized the data warehousing industry by launching Amazon Redshift, the first fully managed, petabyte-scale, enterprise-grade cloud data warehouse. Amazon Redshift made it simple and cost-effective to efficiently analyze large volumes of data using existing business intelligence tools.