Instead of writing code with hard-coded algorithms and rules that always behave in a predictable manner, ML engineers collect a large number of examples of input and output pairs and use them as training data for their models. The model is produced by code, but it isn’t code; it’s an artifact of the code and the training data.
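The idea above can be sketched in a few lines: a minimal, illustrative example (not any particular framework) where the "model" is a pair of parameters recovered from example input/output pairs by a closed-form least-squares fit, rather than a rule written by hand. The data and the hidden rule (y = 2x + 1) are made up for illustration.

```python
# "Training data": example (input, output) pairs instead of a hand-coded rule.
pairs = [(0, 1), (1, 3), (2, 5), (3, 7), (4, 9)]

n = len(pairs)
mean_x = sum(x for x, _ in pairs) / n
mean_y = sum(y for _, y in pairs) / n

# Closed-form least-squares fit of a line through the examples.
slope = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / sum(
    (x - mean_x) ** 2 for x, _ in pairs
)
intercept = mean_y - slope * mean_x

def model(x):
    # The model is an artifact of code + data: learned parameters, not rules.
    return slope * x + intercept

print(model(10))  # the learned rule generalizes to an unseen input
```

The point is the shape of the workflow, not the algorithm: the code that produced `slope` and `intercept` is ordinary code, but the model itself is just what fell out of the data.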
While generative AI has been around for several years, the arrival of ChatGPT (a conversational AI tool for all business occasions, built and trained from large language models) has been like a brilliant torch brought into a dark room, illuminating many previously unseen opportunities.
EUROGATE’s data science team aims to create machine learning models that integrate key data sources from various AWS accounts, allowing for training and deployment across different container terminals. From here, the metadata is published to Amazon DataZone by using AWS Glue Data Catalog. This process is shown in the following figure.
Customers maintain multiple MWAA environments to separate development stages, optimize resources, manage versions, enhance security, ensure redundancy, customize settings, improve scalability, and facilitate experimentation. This approach offers greater flexibility and control over workflow management. The introduction of mw1.micro
Whether it’s controlling for common risk factors—bias in model development, missing or poorly conditioned data, the tendency of models to degrade in production—or instantiating formal processes to promote data governance, adopters will have their work cut out for them as they work to establish reliable AI production lines.
It’s important to understand that ChatGPT is not actually a language model. It’s a convenient user interface built around one specific language model, GPT-3.5, with specialized training. GPT-3.5 is one of a class of language models that are sometimes called “large language models” (LLMs), though that term isn’t very helpful.
Generative AI (GenAI) models, such as GPT-4, offer a promising solution, potentially reducing the dependency on labor-intensive annotation. Through iterative experimentation, we incrementally added new modules, refining the prompts. BioRED performance:

  Prompt           Model    P    R    F1    Price   Latency
  Generic prompt   GPT-4o   72   35   47.8  -       -
In this example, the machine learning (ML) model struggles to differentiate between a chihuahua and a muffin. Will the model correctly determine it is a muffin, or get confused and think it is a chihuahua? The extent to which we can predict how the model will classify an image given a changed input is a question of model visibility.
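A toy sketch of why this is hard to predict (made-up 2-D features and centroids, not a real image model): a nearest-centroid classifier over two hypothetical features, where a small change to an input near the decision boundary flips the predicted class.

```python
# Hypothetical feature centroids -- the feature names are invented for illustration.
CENTROIDS = {
    "chihuahua": (0.2, 0.8),  # (roundness, eye-likeness), both made up
    "muffin": (0.8, 0.3),
}

def classify(features):
    # Assign the label of the nearest centroid (squared Euclidean distance).
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(CENTROIDS, key=lambda label: dist2(features, CENTROIDS[label]))

print(classify((0.49, 0.56)))  # near the boundary: chihuahua
print(classify((0.53, 0.52)))  # a tiny perturbation flips it to muffin
```

Close to the boundary, the prediction depends on details of the input that a human would never notice, which is exactly the visibility problem the snippet describes.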
Our mission at Domino is to enable organizations to put models at the heart of their business. Today we’re announcing two major new capabilities in Domino that make model development easier and faster for data scientists. This pain point is magnified in organizations with teams of data scientists working on numerous experiments.
The company’s multicloud infrastructure has since expanded to include Microsoft Azure for business applications and Google Cloud Platform to provide its scientists with a greater array of options for experimentation. “Google created some very interesting algorithms and tools that are available in AWS,” McCowan says.
It is well known that Artificial Intelligence (AI) has progressed, moving past the era of experimentation. Many organizations still juggle multiple unsupported tools for building and deploying models. Consistent principles guiding the design, development, deployment and monitoring of models are critical in driving responsible, trustworthy AI.
Paco Nathan’s latest article covers program synthesis, AutoPandas, model-driven data queries, and more. In other words, using metadata about data science work to generate code. Using ML models to search more effectively brought the search space down to 10²—which can run on modest hardware. Model-Driven Data Queries.
Why model-driven AI falls short of delivering value: Teams that focus solely on model performance using model-centric and data-centric ML risk missing the big-picture business context. We are also thrilled to share the innovations and capabilities that we have developed at DataRobot to meet and exceed those requirements.
An Amazon DataZone domain contains an associated business data catalog for search and discovery, a set of metadata definitions to decorate the data assets that are used for discovery purposes, and data projects with integrated analytics and ML tools for users and groups to consume and publish data assets.
They’re about having the mindset of an experimenter and being willing to let data guide a company’s decision-making process. To do so, the company started by defining the goals, and finding a way to translate employees’ behavior and experience into data, so as to model against actual outcomes.
Companies in various industries are now relying on artificial intelligence (AI) to work more efficiently and develop new, innovative products and business models. The games industry is no exception. We encourage our teams to experiment with different AI models and platforms and explore new application fields. The KAWAII frontend.
Let's listen in as Alistair discusses the lean analytics model… The Lean Analytics Cycle is a simple, four-step process that shows you how to improve a part of your business. Another way to find the metric you want to change is to look at your business model. The business model also tells you what the metric should be.
NLQ serves those users who are in a rush, or who lack the skills or permissions to model their data using visualization tools or code editors. Last, and still a very painful challenge for most users, is the familiarity with the underlying data and data model.
The more high-quality data available to data scientists, the more parameters they can include in a given model, and the more data they will have on hand for training their models. It doesn’t conform to a data model but does have associated metadata that can be used to group it. Semi-structured data falls between the two.
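A small sketch of the semi-structured case described above (records and field names invented for illustration): each record has a different shape, so there is no fixed data model, but the attached metadata is enough to group them.

```python
from collections import defaultdict

# Semi-structured records: bodies differ in format, but each carries metadata.
records = [
    {"body": "<order total='42'/>", "meta": {"source": "shop", "kind": "order"}},
    {"body": "cpu=93% mem=71%", "meta": {"source": "host1", "kind": "metric"}},
    {"body": "<order total='7'/>", "meta": {"source": "shop", "kind": "order"}},
]

# Group by metadata rather than by any shared schema.
groups = defaultdict(list)
for record in records:
    groups[record["meta"]["kind"]].append(record)

print({kind: len(items) for kind, items in groups.items()})
```

The bodies here would defy a single relational schema, yet the `meta` fields still let us bucket, route, or search the records, which is what makes the data semi-structured rather than unstructured.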
It is well known that Artificial Intelligence (AI) has progressed, moving past the era of experimentation to become business critical for many organizations. Success in delivering scalable enterprise AI necessitates the use of tools and processes that are specifically made for building, deploying, monitoring and retraining AI models.
While this approach provides isolation, it creates another significant challenge: duplication of data, metadata, and security policies, or ‘split-brain’ data lake. Now the admins need to synchronize multiple copies of the data and metadata and ensure that users across the many clusters are not viewing stale information.
Many data scientists and researchers have used the MNIST test set of 10,000 samples for training and testing models for over 20 years; “the rediscovery of the 50,000 lost MNIST test digits provides an opportunity to quantify the degradation of the official MNIST test set over a quarter-century of experimental research.”
Now users seek methods that allow them to get even more relevant results through semantic understanding or even search through image visual similarities instead of textual search of metadata. Traditional lexical search, based on term frequency models like BM25, is widely used and effective for many search applications.
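For concreteness, here is a minimal, illustrative BM25 scorer (a simplified sketch, not a production implementation; the corpus and default parameters are made up): it rewards term frequency, dampens repeats, and normalizes by document length, which is the lexical baseline the snippet contrasts with semantic search.

```python
import math

def bm25_scores(query, docs, k1=1.5, b=0.75):
    # Tokenize naively by whitespace; real systems use proper analyzers.
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    scores = []
    for doc in tokenized:
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for d in tokenized if term in d)  # document frequency
            if df == 0:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            tf = doc.count(term)
            # Saturating tf weight with length normalization.
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

docs = [
    "cat photos of a sleeping cat",
    "metadata for archived photos",
    "a guide to sleeping well",
]
print(bm25_scores("sleeping cat", docs))  # first doc should score highest
```

Because BM25 only matches surface terms, a query like "dozing kitten" would score zero here, which is exactly the gap that semantic (embedding-based) search aims to close.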
Vassil Momtchev: RDF-star (formerly known as RDF*) helps in every case where the user needs to express a complex relationship with metadata associated with a triple. Technically speaking, RDF-star is syntactic sugar that makes it easier to attach metadata to edges in the graph, for example annotating a quoted triple (written between << and >>) with a source such as :TheNationalEnquirer.
Advancements in analytics and AI as well as support for unstructured data in centralized data lakes are key benefits of doing business in the cloud, and Shutterstock is capitalizing on its cloud foundation, creating new revenue streams and business models using the cloud and data lakes as key components of its innovation platform.
Ever since Hippocrates founded his school of medicine in ancient Greece some 2,500 years ago, writes Hannah Fry in her book Hello World: Being Human in the Age of Algorithms , what has been fundamental to healthcare (as she calls it “the fight to keep us healthy”) was observation, experimentation and the analysis of data. Certainly not!
Removal of experimental Smart Sensors. This feature is particularly useful if you want to externally process various files, evaluate multiple machine learning models, or process a varying amount of data based on a SQL request. In Apache Airflow v2.4.3, Smart Sensors, which were added in v2.0, have now been removed.
By using infrastructure as code (IaC) tools, ODP enables self-service data access with unified data management, metadata management (data catalog), and standard interfaces for analytics tools with a high degree of automation by providing the infrastructure, integrations, and compliance measures out of the box.
Amazon SageMaker is used to build, train, and deploy a range of ML models. Additionally, SageMaker training jobs are employed for training the models. After the models are trained, they are deployed and used to identify anomalies and alert customers in real time to potential security threats.
Furthermore, a global effort to create new data privacy laws, and the increased attention on biases in AI models, has resulted in convoluted business processes for getting data to users. The automated metadata generation is essential to turn a manual process into one that is better controlled. AI is no longer experimental.
When DataOps principles are implemented within an organization, you see an increase in collaboration, experimentation, deployment speed and data quality. A wheel should be a standardized part that you don’t have to think twice about before you incorporate it into a new car model. Let’s take a look. Six DataOps best practices.
“We have this new data set, actually it is sensor data, and we want to model it quickly with some historic customer usage data… and oh yeah, it should be about 100TB per day.” Provides a pay-as-you-go model. Experimental and production workloads access the same data without users impacting each other’s SLAs.
With scalable metadata indexing, Apache Iceberg is able to deliver performant queries to a variety of engines such as Spark and Athena by reducing planning time. Our model portfolio will buy stocks that are added to the index, known as going long, and will sell an equivalent amount of stocks removed from the index, known as going short.
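The long/short rule described above can be sketched as a few lines of bookkeeping (ticker symbols and the per-side notional are invented for illustration): go long the additions, short an equivalent dollar amount of the deletions, so the book nets to zero.

```python
def rebalance(additions, deletions, notional_per_side=1_000_000):
    # Equal-weight each side so the long and short legs offset in dollar terms,
    # leaving a market-neutral book on the index change.
    longs = {t: notional_per_side / len(additions) for t in additions}
    shorts = {t: -notional_per_side / len(deletions) for t in deletions}
    return {**longs, **shorts}

# Hypothetical index change: AAA and BBB added, CCC removed.
book = rebalance(additions=["AAA", "BBB"], deletions=["CCC"])
print(book)  # {'AAA': 500000.0, 'BBB': 500000.0, 'CCC': -1000000.0}
```

The invariant worth checking is that the signed positions sum to zero, which is what "an equivalent amount" buys you: exposure to the index-change effect rather than to the market as a whole.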
Nine years of research, prototyping, and experimentation went into developing enterprise-ready Semantic Technology products. Metadata Studio – our new product for streamlining the development and operation of solutions involving text analysis. Ontotext develops re-usable domain models as pre-packaged knowledge graphs.
This enables you to process a user’s query to find the closest vectors and combine them with additional metadata without relying on external data sources or additional application code to integrate the results. We recognize that many of you are in the experimentation phase and would like a more economical option for dev-test.
Without clarity in metrics, it’s impossible to do meaningful experimentation. AI PMs must ensure that experimentation occurs during three phases of the product lifecycle: Phase 1: Concept During the concept phase, it’s important to determine if it’s even possible for an AI product “ intervention ” to move an upstream business metric.
Healthcare Domain Expertise: It cannot be said enough that anyone developing AI-driven models for healthcare needs to understand the unique use cases and stringent data security and privacy requirements – and the detailed nuances of how this information will be used – in the specific healthcare setting where the technology will be deployed.
That’s not just about the cost of preparing a larger data set than you need, which takes expertise that’s still uncommon and commands a high salary, but also what you’re teaching the model. “Do you want to have an even more powerful search capability with AI in your data, and to be unsure about how you’ve organized that data?”
I’m a professor who is interested in how we can use LLMs (Large Language Models) to teach programming. Here’s how I worked on it: I subscribed to ChatGPT Plus and used the GPT-4 model in ChatGPT (first the May 12, 2023 version, then the May 24 version) to help me with design and implementation.
A large oil and gas company was suffering over not being able to offer users an easy and fast way to access the data needed to fuel their experimentation. To address this, they focused on creating an experimentation-oriented culture, enabled thanks to a cloud-native platform supporting the full data lifecycle.
The AIgent was built with BERT, Google’s state-of-the-art language model. In this article, I will discuss the construction of the AIgent, from data collection to model assembly. Data Collection The AIgent leverages book synopses and book metadata. To build the AIgent, I started with synopses and metadata from 100,000 books.
Even small UX decisions, like where to place metadata or which filters to expose, can make the difference between a tool people actually use and one they avoid. The most successful teams flip this model by giving domain experts tools to write and iterate on prompts directly. Our model suffers from hallucination issues.