Kevlin Henney and I were riffing on some ideas about GitHub Copilot, the tool for automatically generating code based on GPT-3's language model, trained on the body of code that's in GitHub. Forty years ago, we might have cared about the assembly language code generated by a compiler. Does code quality improve?
Generative artificial intelligence (genAI) and in particular large language models (LLMs) are changing the way companies develop and deliver software. Instead of manually entering specific parameters, users will increasingly be able to describe their requirements in natural language.
NLP systems in health care are hard—they require broad general and medical knowledge, must handle a large variety of inputs, and need to understand context. We’re in an exciting decade for natural language processing (NLP). Meet the language of emergency room triage notes. Yes, emergency rooms have their own language.
While generative AI has been around for several years, the arrival of ChatGPT (a conversational AI tool for all business occasions, built and trained from large language models) has been like a brilliant torch brought into a dark room, illuminating many previously unseen opportunities.
These data processing and analytical services support Structured Query Language (SQL) to interact with the data. Large language model (LLM)-based generative AI is a new technology trend for comprehending a large corpus of information and assisting with complex tasks. Can it also help write SQL queries?
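As a hypothetical sketch of that idea, the Python below assembles a text-to-SQL prompt from a schema and a natural-language question; `call_llm` is a placeholder for whichever model API you use, not a real library function.

```python
# Hypothetical sketch: asking an LLM to draft a SQL query from a natural-language request.
def build_sql_prompt(schema_ddl: str, question: str) -> str:
    """Combine table definitions and a natural-language question into one prompt."""
    return (
        "You are a SQL assistant. Given the schema below, write a single SQL query.\n\n"
        f"Schema:\n{schema_ddl}\n\n"
        f"Question: {question}\n"
        "Return only the SQL, with no explanation."
    )

schema = "CREATE TABLE orders (order_id INT, customer_id INT, total DECIMAL(10,2), order_date DATE);"
prompt = build_sql_prompt(schema, "Total revenue per customer in 2023, highest first")
# sql = call_llm(prompt)  # hypothetical model call; review the generated query before executing it
```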
And in a January survey by KPMG of 100 senior executives at large enterprises, 12% of companies are already deploying AI agents, 37% are in pilot stages, and another 51% are exploring their use. Meanwhile, in December, OpenAI's new o3 model, an agentic model not yet available to the public, scored 72% on the same test.
Generative AI (GenAI) models, such as GPT-4, offer a promising solution, potentially reducing the dependency on labor-intensive annotation. Beyond knowledge graph building, NER supports use cases such as natural language querying (NLQ), where accurate entity recognition improves search accuracy and user experience.
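To make the idea of model-assisted annotation concrete, here is a hedged sketch of prompt-based entity extraction in Python; `call_llm` is a stub for whatever model client you use, and the label set is invented for illustration.

```python
import json

# Prompt-based NER sketch: ask the model for entities as JSON, then parse the reply.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")  # placeholder, not a real API

def extract_entities(text: str) -> list:
    prompt = (
        "Extract named entities (PERSON, ORG, LOCATION) from the text below.\n"
        'Reply with a JSON list of {"text": ..., "label": ...} objects and nothing else.\n\n'
        f"Text: {text}"
    )
    # The parsed annotations can then feed a knowledge graph or an NLQ index.
    return json.loads(call_llm(prompt))
```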
While warp speed is a fictional concept, it's an apt way to describe what generative AI (GenAI) and large language models (LLMs) are doing to exponentially accelerate Industry 4.0. By leveraging a GenAI-fueled edge with small language models (SLMs), this lengthy work will be streamlined and simplified.
These innovations run AI search flows to uncover relevant information through semantic, cross-language, and content understanding; adapt information ranking to individual behaviors; and enable guided conversations to pinpoint answers. This template requires us to select a text embedding model that can operate on text and images.
The process starts by creating a vector based on the question (embedding) by invoking the embedding model. Pre-filtered documents that relate to the user query are included in the prompt of the large language model (LLM) that summarizes the answer.
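That retrieval-augmented flow (embed the question, rank pre-filtered documents, and hand the best matches to the LLM in the prompt) can be sketched in a few lines of Python. The `embed` and `llm` callables here are assumptions standing in for your embedding model and LLM client.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def answer(question, documents, embed, llm, top_k=3):
    """Embed the question, pick the closest pre-filtered documents, and prompt the LLM."""
    q_vec = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q_vec, d["vector"]), reverse=True)
    context = "\n\n".join(d["text"] for d in ranked[:top_k])
    prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
    return llm(prompt)  # the model summarizes an answer grounded in the retrieved passages
```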
And this: perhaps the most powerful node in a graph model for real-world use cases might be “context”. How does one express “context” in a data model? After all, the standard relational model of databases instantiated these types of relationships in its very foundation decades ago: the ERD (Entity-Relationship Diagram).
The capabilities of these new generative AI tools, most of which are powered by large language models (LLMs), forced every company and employee to rethink how they work. Vector Databases To make use of a large language model, you're going to need to vectorize your data.
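As a hedged illustration of vectorizing data, the sketch below uses the sentence-transformers package and a simple in-memory index; the model name and documents are examples only, and a real deployment would typically store the vectors in a vector database.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # compact general-purpose embedding model

docs = [
    "Refund policy: items can be returned within 30 days.",
    "Shipping normally takes three to five business days.",
]
doc_vectors = model.encode(docs, normalize_embeddings=True)  # one vector per document

def search(query: str, top_k: int = 1):
    """Return the documents most similar to the query by cosine similarity."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ q          # normalized vectors: dot product equals cosine similarity
    best = np.argsort(scores)[::-1][:top_k]
    return [(docs[i], float(scores[i])) for i in best]

print(search("How long does delivery take?"))
```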
They can even make context-relevant suggestions for upsells in natural language: “ You know if you want the meal deal, I can sub in some rings instead of fries for you.” RFID tags have been around for decades and now cost just pennies. RFID tags combined with GenAI can be used for inventory tracking, loss prevention, and stocking.
Unstructured data is information that doesn’t conform to a predefined schema or isn’t organized according to a preset data model. The following are the most commonly used services for unstructured data processing: Amazon Comprehend – This natural language processing (NLP) service uses ML to extract metadata from text data.
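For example, a hedged sketch of calling Amazon Comprehend from Python with boto3 to pull entities and key phrases out of unstructured text; it assumes AWS credentials and a region are already configured, and the sample sentence is invented.

```python
import boto3

comprehend = boto3.client("comprehend")

text = "Acme Corp opened a new distribution center in Austin, Texas in March 2024."

# Extract entities and key phrases as metadata for the unstructured text.
entities = comprehend.detect_entities(Text=text, LanguageCode="en")["Entities"]
phrases = comprehend.detect_key_phrases(Text=text, LanguageCode="en")["KeyPhrases"]

for ent in entities:
    print(ent["Type"], ent["Text"])   # e.g. ORGANIZATION Acme Corp, LOCATION Austin, Texas
```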
This scenario is not science fiction but a glimpse into the capabilities of Multimodal Large Language Models (M-LLMs), where the convergence of various modalities extends the landscape of AI. M-LLMs are well suited to tackle VQA due to their ability to process and fuse information from both textual and visual modalities.
These tools categorize and tag various elements of the artwork, whether it's a character, landscape, or some other element. The company also uses large language models (LLMs) to summarize recognition trends over time and to suggest language for an effective recognition message.
To remain at the forefront of quantitative investing, CFM has put in place a large-scale data acquisition strategy. Some datasets require large or specific compute capabilities that we can’t afford to buy if the trial is a failure. CFM data scientists then look up the data and build features that can be used in our trading models.
One reason is that documents, medical records, emails, images, video, audio, and so on were almost impossible to prepare, manage, and use in AI applications before recent technological strides in areas such as AI, computer vision, and large language models like those used in generative AI.
Education starts with prompt engineering, the art and science of framing prompts that steer large language models (LLMs) towards desired outputs. Learning the proper coding prompts can help software developers use LLMs to create and debug software, as well as increase their skills working with natural language processing (NLP).
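As an illustration of the kind of structured coding prompt described above, here is a small Python sketch that assembles a debugging prompt; the wording and the buggy function are invented examples of the pattern, not a prescribed template.

```python
# Build a structured debugging prompt for an LLM.
buggy_snippet = '''
def average(values):
    return sum(values) / len(values)   # fails on an empty list
'''

prompt = (
    "You are a senior Python reviewer.\n"
    "1. Explain the bug in the function below.\n"
    "2. Provide a corrected version with a docstring.\n"
    "3. Suggest one unit test that would have caught the bug.\n\n"
    f"Code:\n{buggy_snippet}"
)
# Send `prompt` to whichever LLM you use; numbered instructions tend to produce
# more complete, easier-to-verify answers than a bare "fix this".
```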
OpenAI's text-generating ChatGPT, along with its image generation cousin DALL-E, are the most prominent among a series of large language models, also known as generative language models or generative AI, that have captured the public's imagination over the last year. That's incredibly powerful."
Data scientists use algorithms for creating data models. These data models predict outcomes of new data. Programming Language (R or Python). Programming knowledge is needed for the typical tasks of transforming data, creating graphs, and creating data models. For academics and domain experts, R is the preferred language.
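A tiny Python example of that fit-then-predict loop, assuming scikit-learn is installed; the data is made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Historical observations: hours studied -> exam score
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([52, 58, 66, 71, 80])

model = LinearRegression().fit(X, y)      # fit the data model to existing data
print(model.predict(np.array([[6]])))     # predict the outcome for new data
```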
But even though technologies like Building Information Modelling (BIM) have finally introduced symbolic representation, in many ways, AECO still clings to outdated, analog practices and documents. Here, one of the challenges involves digitizing the national specifics of regulatory documents and building codes in multiple languages.
spaCy is a Python library that provides capabilities to conduct advanced natural language processing analysis and build models that can underpin document analysis, chatbot capabilities, and all other forms of text analysis. It brings many improvements to help build, configure, and maintain your NLP models.
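A minimal spaCy example, assuming the small English model has been downloaded with `python -m spacy download en_core_web_sm`; the sentence is invented for illustration.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is opening a new office in Berlin next year.")

# Named entities detected in the text
for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. Apple ORG, Berlin GPE, next year DATE

# Part-of-speech and dependency labels for the first few tokens
for token in doc[:5]:
    print(token.text, token.pos_, token.dep_)
```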
Most enterprise data has traditionally been stored structured in tables or spreadsheets, according to Nucleus Research CEO Ian Campbell, but a large amount of valuable information exists in unstructured formats like video, audio, and text. This ensures faster, more accurate customer interactions.
It was designed to manage complex queries and business intelligence (BI) use cases on a large scale. We decided to explore streaming analytics solutions where we can capture, transform, and store event streams at scale, and serve rule-based fraud detection models and machine learning (ML) models with milliseconds latency.
Pinot has been tested at very large scale in large enterprises, serving over 70 LinkedIn data products , handling over 120,000 Queries Per Second (QPS), ingesting over 1.5 In Apache Pinot, tables are tagged with an identifier that’s used for routing queries to the appropriate servers. First, bootstrap the AWS CDK.
Many see the high price tag of initial offerings, and uncertainty about how workflows will need to be adjusted to truly capture the value of copilots, as deterrents to signing on for the added functionality. The announcement comes amid reluctance among some CIOs regarding the ROI of generative AI copilots.
For those unaware, ChatGPT is a large language model developed by OpenAI. It would probably have been better here to write that bar charts can't handle large datasets well. Limited knowledge: ChatGPT is a machine learning model that was trained on a dataset of text up to the year 2021.
This article and paired Domino project provide a brief introduction to working with natural language (sometimes called "text analytics") in Python using spaCy and related libraries.
Foundation models (FMs) are marking the beginning of a new era in machine learning (ML) and artificial intelligence (AI), which is leading to faster development of AI that can be adapted to a wide range of downstream tasks and fine-tuned for an array of applications. What are large language models?
Analytics is the means for discovering those insights, and doing it well requires the right tools for ingesting and preparing data, enriching and tagging it, building and sharing reports, and managing and protecting your data and insights. For many enterprises, Microsoft Azure has become a central hub for analytics. Azure Data Factory.
Robin Roacho, lead FinOps financial analyst at SADA, says, “CIOs should be mindful of increasing cloud costs without clear justification,” and recommends: When establishing cost ownership, ensure that resources are labeled and tagged. Confirm that the financial models accurately explain budget-to-actual variances.
Working with large language models (LLMs) for enterprise use cases requires the implementation of quality and privacy considerations to drive responsible AI. From the raw zone in Amazon S3, the objects need to be processed before they can be consumed by downstream generative AI models.
A single GraphQL model acts as a language and grammar that aids communication between developers, domain experts, and clients. As you model and engineer a large knowledge graph, it can become very difficult to manage complexity.
Trusted by large data science teams across hundreds of enterprises, Cloudera Data Science Workbench is a web-based application that allows data scientists to use their favorite open source libraries and languages, including R, Python, and Scala, directly in secure environments, accelerating analytics projects from research to production.
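As a hedged sketch of what that raw-zone processing can look like, the Python below reads a text object from S3 with boto3 and masks email addresses before the text reaches a model; the bucket, key, and regex are illustrative only, and real pipelines would use dedicated PII-detection tooling.

```python
import re
import boto3

s3 = boto3.client("s3")

def load_and_redact(bucket: str, key: str) -> str:
    """Read a raw-zone text object and mask obvious email addresses."""
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", body)  # crude email masking

# cleaned = load_and_redact("my-raw-zone-bucket", "emails/2024/03/message.txt")  # hypothetical names
```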
They conjure mistakes out of thin air There’s something almost magical about the way largelanguagemodels (LLMs) write 1,000-word essays on obscure topics like the mating rituals of sand cranes or the importance of crenulations in 17th century Eastern European architecture.
AI systems like LaMDA and GPT-3 excel at generating human-quality text, accomplishing specific tasks, translating languages as needed, and creating different kinds of creative content. These feats are accomplished through a combination of sophisticated algorithms, natural language processing (NLP), and computer science principles.
OTKG models information about Ontotext, combined with content produced by different teams inside the organization. Our standard methodology for such projects is to start by defining competency questions that would help us understand what we need to model in our graph. This is graph-based tagging, so the mentions are not just keywords.
ML, a subset of AI, involves training models on existing data sets so they can make predictions or decisions without being explicitly programmed to do so. This advanced approach not only enhances the efficiency of detection models but also yields more insightful and valuable outcomes.
"In general, the most common use of the work I do is to remove bad stuff from the Internet or tag it as suspicious." I mostly use U-SQL, a mix between C# and SQL that can be distributed across very large clusters.
For data engineering teams, Airflow is regarded as the best-in-class tool for orchestration (scheduling and managing end-to-end workflow) of pipelines that are built using languages and frameworks like Python and Spark. Impala vs. Spark: use Impala primarily for analytical workloads triggered by end users.
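For context, a minimal Airflow DAG looks like the sketch below; the task logic, schedule, and names are placeholders rather than a recommended pipeline, and the `schedule` argument assumes Airflow 2.4 or later.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull data from the source system")

def transform():
    print("clean and aggregate the data")

with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task   # run transform only after extract succeeds
```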
"If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state." – James Dixon. What's in a Data Lake?
From healthcare to manufacturing, this year's award winners span a wide range of industries, proving once again the impact information technology has in reshaping business and society at large. In partnership with OpenAI and Microsoft, CarMax worked to develop, test, and iterate GPT-3 natural language models aimed at achieving those results.
By analyzing XML files, organizations can easily integrate data from different sources and ensure consistency across their systems. However, XML files contain semi-structured, highly nested data, making it difficult to access and analyze information, especially if the file is large and has a complex, highly nested schema.
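As a small illustration of flattening nested XML in Python with the standard library; the tag names and values are invented for the example.

```python
import xml.etree.ElementTree as ET

xml_doc = """
<orders>
  <order id="1">
    <customer><name>Ada</name><country>UK</country></customer>
    <total>42.50</total>
  </order>
</orders>
"""

root = ET.fromstring(xml_doc)
rows = []
for order in root.findall("order"):
    # Pull nested fields up into one flat record per order.
    rows.append({
        "order_id": order.get("id"),
        "customer": order.findtext("customer/name"),
        "country": order.findtext("customer/country"),
        "total": float(order.findtext("total")),
    })
print(rows)   # flattened records ready to load into a table
```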