2013, Machine Learning and Testing

Why you should care about debugging machine learning models

O'Reilly on Data

DECEMBER 12, 2019

For all the excitement about machine learning (ML), there are serious impediments to its widespread adoption, not least the broadening realization that ML models can fail. Debugging ML models borrows from model risk management, traditional model diagnostics, and software testing, in addition to newer innovations such as ML security audits.

DataKitchen’s 2020 Honors & Awards

DataKitchen

DECEMBER 30, 2020

In June of 2020, Database Trends & Applications featured DataKitchen’s end-to-end DataOps platform for its ability to coordinate data teams, tools, and environments across the entire data analytics organization, with features such as meta-orchestration, automated testing and monitoring, and continuous deployment.

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Trending Sources

Recap of Amazon Redshift key product announcements in 2024

AWS Big Data

DECEMBER 17, 2024

Amazon Redshift, launched in 2013, has evolved significantly since its inception, allowing customers to expand the horizons of data warehousing and SQL analytics. In internal tests, AI-driven scaling and optimizations delivered up to 10 times better price-performance for variable workloads.

Overcoming Common Challenges in Natural Language Processing

Sisense

MAY 26, 2020

When training an NLP model, words that are not present in the training data commonly appear in the test data, so predictions made on the test data may not be correct. To solve this problem, the model needs to capture the semantic meaning of words. In the article’s example, the test data then contains the sentence “Pasta is delicious.”

FINRA CIO Steve Randich pushes the public cloud forward

CIO Business Intelligence

FEBRUARY 10, 2023

“But for two years, we were testing limits within the public cloud.” Randich, who came to FINRA in 2013 after stints as co-CIO of Citigroup and CIO of Nasdaq, is no stranger to the public cloud. “We spent about a year and a half going through several bottlenecks, taking them out one at a time with Amazon engineers.”

Protein similarity search using ProtT5-XL-UniRef50 and Amazon OpenSearch Service

AWS Big Data

JULY 11, 2024

ProtT5-XL-UniRef50 is a machine learning (ML) model specifically designed to understand the language of proteins by converting protein sequences into multidimensional embeddings. It was introduced in the paper “ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning.” Code for this solution is available on GitHub.
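Once proteins are represented as embedding vectors, similarity search reduces to finding the nearest vectors to a query. The sketch below is a brute-force stand-in for what a vector index (such as OpenSearch’s k-NN search) does at scale; it is not the post’s code. The 4-dimensional vectors and the protein names are hypothetical placeholders, and cosine similarity as the metric is an assumption.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Exact (brute-force) k-nearest-neighbour search ranked by cosine similarity."""
    scored = [(name, cosine(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real ProtT5 embeddings have far more dimensions.
index = {
    "protein_A": [0.9, 0.1, 0.0, 0.2],
    "protein_B": [0.1, 0.8, 0.3, 0.0],
    "protein_C": [0.85, 0.15, 0.05, 0.25],
}
query = [0.88, 0.12, 0.02, 0.22]
print(top_k(query, index, k=2))
```

Exact search scans every vector, which is fine for a handful of items; approximate nearest-neighbour indexes trade a little recall for much lower latency on millions of embeddings.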

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

In 2013, Amazon Web Services revolutionized the data warehousing industry by launching Amazon Redshift , the first fully-managed, petabyte-scale, enterprise-grade cloud data warehouse. Amazon Redshift made it simple and cost-effective to efficiently analyze large volumes of data using existing business intelligence tools.

Build a RAG data ingestion pipeline for large-scale ML workloads

AWS Big Data

MARCH 13, 2024

Retrieval Augmented Generation (RAG) is a machine learning (ML) architecture that uses external documents (such as Wikipedia) to augment a model’s knowledge and achieve state-of-the-art results on knowledge-intensive tasks. (One of the post’s authors entered the big data space in 2013 and continues to explore that area.)

How Wallapop improved performance of analytics workloads with Amazon Redshift Serverless and data sharing

AWS Big Data

NOVEMBER 14, 2023

Wallapop is a Spanish ecommerce marketplace for second-hand items, founded in 2013. Since then, it has reached more than 40 million downloads, and more than 700 million products have been listed. The marketplace can be accessed via mobile app or website.

Data Drift Detection for Image Classifiers

Domino Data Lab

DECEMBER 1, 2019

In the context of machine learning, we consider data drift [1] to be the change in model input data that leads to a degradation of model performance. Step 4: Generate the train, test, and noisy MNIST data sets. The train and test sets are generated with (x_train, _), (x_test, _) = mnist.load_data(); x_train = x_train.astype('float32') / 255.
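One simple way to see drift of the kind described above is to perturb normalized pixel values with noise and compare the two distributions. The sketch below swaps in a two-sample Kolmogorov-Smirnov (KS) statistic on pooled pixel intensities; it is not necessarily the detector the article uses. The synthetic pixel data, the noise level `sigma=0.3`, and the helper names are hypothetical, chosen only so the example runs without downloading MNIST.

```python
import random
from bisect import bisect_right
from typing import List, Sequence


def add_gaussian_noise(pixels: Sequence[float], sigma: float, rng: random.Random) -> List[float]:
    """Add zero-mean Gaussian noise, then clip back to the [0, 1] pixel range."""
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The supremum is attained at one of the sample points, so checking them all is exact.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


rng = random.Random(0)
# Hypothetical stand-ins for 8-bit pixels scaled to [0, 1], as in x.astype('float32') / 255.
reference = [rng.randint(0, 255) / 255.0 for _ in range(2000)]
same_dist = [rng.randint(0, 255) / 255.0 for _ in range(2000)]
noisy = add_gaussian_noise(same_dist, sigma=0.3, rng=rng)

ks_same = ks_statistic(reference, same_dist)   # small: no drift
ks_noisy = ks_statistic(reference, noisy)      # larger: the inputs have shifted
print(f"KS same distribution: {ks_same:.3f}, KS noisy: {ks_noisy:.3f}")
```

Pooling all pixels ignores spatial structure, so this only catches shifts in the overall intensity distribution; a drift alarm would fire when the statistic crosses a threshold chosen for the application.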

PODCAST: COVID19 | Redefining Digital Enterprises – Episode 12: How AI is rapidly transforming the enterprise landscape in the post-COVID world

bridgei2i

JULY 28, 2020

It is my immense pleasure to introduce our guest today, Ria Persad, who was named international woman of the year in power engineering by Renewable Energy World in 2013 and lifetime achievement leader by the Platts Global Energy Awards in 2014. “We need people who can test.”

Dresner’s Point: Don’t Overlook the Zigzagging of Collaboration & Text Analytics

Howard Dresner

FEBRUARY 11, 2014

Collaboration BI: At one of my weekly #BIWisdom tweetchats this month, collaboration, social media, and text analytics turned up in a discussion about 2013 BI predictions that didn’t pan out. Comments included “Vendors need to automate and decrease that effort” and “I tested a social analytics tool; I was less than impressed.”

Credit Card Fraud Detection using XGBoost, SMOTE, and threshold moving

Domino Data Lab

APRIL 21, 2021

In this article, we’ll discuss the challenge organizations face around fraud detection and how machine learning can be used to spot anomalies that the human eye might not catch. This is to prevent any information leakage into our test set.
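Threshold moving, named in the title above, replaces the default 0.5 cutoff on predicted fraud probabilities with one tuned for the imbalanced problem. A minimal sketch, assuming F1 as the metric and a held-out validation set (not the test set, to avoid the leakage mentioned above); the labels, scores, and function names are hypothetical, and the scores could come from any probabilistic classifier such as XGBoost.

```python
from typing import Sequence, Tuple


def f1_at_threshold(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """F1 score when every score >= threshold is predicted as fraud (label 1)."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Try each distinct score as a cutoff and keep the one with the highest F1."""
    candidates = sorted(set(scores))
    return max(((t, f1_at_threshold(y_true, scores, t)) for t in candidates), key=lambda pair: pair[1])


# Hypothetical validation labels (1 = fraud) and predicted probabilities.
y_val = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
p_val = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.45, 0.40, 0.60]

default_f1 = f1_at_threshold(y_val, p_val, 0.5)
threshold, tuned_f1 = best_threshold(y_val, p_val)
print(f"F1 at 0.5: {default_f1:.3f}; best threshold {threshold} gives F1 {tuned_f1:.3f}")
```

Because fraud is rare, the model’s probabilities for true fraud cases often sit below 0.5, so the tuned cutoff tends to be lower than the default.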

The Value of Data for Philanthropy

Cloudera

AUGUST 6, 2018

For example, Crisis Text Line, which provides online support to people in crisis, received a total of 8 million text messages in the first two years of its existence, between 2013 and 2015. The Michael J. Fox Foundation is testing a watch-type wearable device in Australia to continuously monitor the symptoms of patients with Parkinson’s disease.

Using Empirical Bayes to approximate posteriors for large "black box" estimators

The Unofficial Google Data Science Blog

NOVEMBER 4, 2015

by Omkar Muralidharan

Many machine learning applications have some kind of regression at their core, so understanding large-scale regression systems is important. But most common machine learning methods don’t give posteriors, and many don’t have explicit probability models. Figure 4 shows the results of such a test.
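To make the idea of approximating posteriors concrete, here is the textbook normal-normal empirical Bayes model, which is not necessarily the post’s method: each observed estimate $y_i$ is treated as the true value plus noise of known variance, the prior mean and variance are estimated from the estimates themselves, and each estimate is shrunk toward the overall mean. The noise variance and data below are hypothetical.

```python
from statistics import fmean, pvariance
from typing import List, Sequence, Tuple


def normal_normal_eb(estimates: Sequence[float], noise_var: float) -> List[Tuple[float, float]]:
    """Posterior (mean, variance) for each estimate under y_i ~ N(theta_i, noise_var),
    theta_i ~ N(mu, tau2), with mu and tau2 estimated by the method of moments."""
    mu = fmean(estimates)
    # Spread of the estimates beyond what the noise alone explains; clipped at zero.
    tau2 = max(0.0, pvariance(estimates, mu) - noise_var)
    # Fraction of each deviation from mu that is kept (1 = no shrinkage).
    shrink = tau2 / (tau2 + noise_var) if tau2 + noise_var > 0 else 0.0
    # Posterior variance 1 / (1/tau2 + 1/noise_var) simplifies to shrink * noise_var.
    return [(mu + shrink * (y - mu), shrink * noise_var) for y in estimates]


# Hypothetical model outputs with assumed noise variance 1.0.
posteriors = normal_normal_eb([1.0, 2.0, 3.0, 4.0, 5.0], noise_var=1.0)
for mean, var in posteriors:
    print(f"posterior mean {mean:.2f}, variance {var:.2f}")
```

This plug-in version ignores the uncertainty in the estimated hyperparameters, which is a reasonable approximation only when there are many estimates.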

Deep Learning Illustrated: Building Natural Language Processing Models

Domino Data Lab

AUGUST 22, 2019

word2vec is an unsupervised learning technique; that is, it is applied to a corpus of natural language without making use of any labels that may or may not happen to exist for the corpus. Note: A test set of 19,500 such analogies was developed by Tomas Mikolov and his colleagues in their 2013 word2vec paper.
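Analogy test sets like the one mentioned above are usually scored with the vector-offset method: for “a is to b as c is to ?”, compute b − a + c and return the vocabulary word whose vector is most cosine-similar, excluding the three query words. A minimal sketch with hypothetical 3-dimensional toy vectors, not real word2vec embeddings:

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def solve_analogy(a: str, b: str, c: str, vectors: Dict[str, List[float]]) -> str:
    """Answer 'a is to b as c is to ?' with the word closest to vec(b) - vec(a) + vec(c)."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Hypothetical toy vectors: dimension 0 ~ "royalty", 1 ~ "gender", 2 ~ "food".
vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, -0.8, 0.1],
    "man": [0.1, 0.8, 0.2],
    "woman": [0.1, -0.8, 0.2],
    "apple": [0.0, 0.0, 0.9],
}
print(solve_analogy("man", "king", "woman", vectors))
```

An analogy counts as correct only if the top-ranked word matches the expected answer exactly, which is how accuracy on such a test set is typically reported.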

How The Cloud Made ‘Data-Driven Culture’ Possible | Part 1

BizAcuity

MAY 10, 2022

2013: Google launches Google Compute Engine (IaaS), its own version of EC2. 2014: Microsoft launches Azure ML Studio for machine learning capabilities on the cloud, and Google releases Kubernetes. 2017: AWS rolls out SageMaker, designed to build, train, test, and deploy machine learning (ML) models.

Towards optimal experimentation in online systems

The Unofficial Google Data Science Blog

APRIL 23, 2024

Multiparameter experiments, however, generate richer data than standard A/B tests, and automated t-tests alone are insufficient to analyze them well. We use PrePost in most of our A/B tests, so we have pre-experiment metric measurements readily available that we can use as covariates in our models.
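One common way to use pre-experiment measurements as covariates is a CUPED-style adjustment: subtract the part of each unit’s metric that is predictable from its pre-period value, which leaves the mean unchanged but can sharply reduce variance. The excerpt does not say whether PrePost works exactly this way, so treat this as an illustrative sketch; the function name and data are hypothetical.

```python
from statistics import fmean, pvariance
from typing import List, Sequence


def cuped_adjust(post: Sequence[float], pre: Sequence[float]) -> List[float]:
    """Y_adj = Y - theta * (X - mean(X)), with theta = cov(X, Y) / var(X).
    theta should be estimated on treatment and control pooled together."""
    x_bar, y_bar = fmean(pre), fmean(post)
    cov_xy = fmean([(x - x_bar) * (y - y_bar) for x, y in zip(pre, post)])
    var_x = pvariance(pre, x_bar)
    theta = cov_xy / var_x if var_x > 0 else 0.0
    return [y - theta * (x - x_bar) for x, y in zip(pre, post)]


# Hypothetical per-user metric before and during the experiment.
pre = [10, 12, 14, 16, 18]
post = [11, 13, 15, 17, 19]
adjusted = cuped_adjust(post, pre)
print(adjusted, pvariance(post), pvariance(adjusted))
```

Here the in-experiment metric is fully explained by the pre-period value, so the adjusted values have zero variance; with real data the reduction depends on how strongly pre and post measurements correlate.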

Data Science at The New York Times

Domino Data Lab

JULY 9, 2019

Wiggins advocated that data scientists find problems that impact the business; re-frame the problem as a machine learning (ML) task; execute on the ML task; and communicate the results back to the business in an impactful way. I still believe that data science is the craft of trying to apply machine learning to some real world problem.

Schrodinger’s Automation in AI and the Automation Bias

Jen Stirrup

FEBRUARY 1, 2023

The effects of AI will be magnified in the coming decade as manufacturing, retailing, transportation, finance, health care, law, advertising, insurance, entertainment, education, and virtually every other industry transform their core processes and business models to take advantage of machine learning.

Operationalizing responsible AI principles for defense

IBM Big Data Hub

FEBRUARY 22, 2024

Reliable: “The Department’s AI capabilities will have explicit, well-defined uses, and the safety, security, and effectiveness of such capabilities will be subject to testing and assurance within those defined uses across their entire life cycles.” This is misguided. But it is well worth the effort.

The AIgent: Using Google’s BERT Language Model to Connect Writers & Representation

Insight

MARCH 12, 2020

In 2013, Robert Galbraith, an … I tested several different flavors of BERT for use as synopsis classifiers before settling on the DistilBERT model from Hugging Face. On my test set, this approach resulted in ~75–95% accuracy and ~.65 … Test Case: Dune. Let’s see an example of genre tag prediction in action.

What Is Embedded Analytics?

Jet Global

MAY 1, 2023

Companies like Tableau (which raised over $250 million when it had its IPO in 2013) demonstrated an unmet need in the market. Later on, you’ll appreciate being able to test ideas and leverage best practices as your needs evolve. Users’ varied needs require a shift in traditional BI thinking. Their dashboards were visually stunning.

How search accelerates your path to “AI first”

CIO Business Intelligence

MARCH 26, 2025

The combination of AI and search enables new levels of enterprise intelligence, with technologies such as natural language processing (NLP), machine learning (ML)-based relevancy, vector/semantic search, and large language models (LLMs) helping organizations finally unlock the value of unanalyzed data.
