For all the excitement about machine learning (ML), there are serious impediments to its widespread adoption. Not least is the broadening realization that ML models can fail. And that’s why model debugging, the art and science of understanding and fixing problems in ML models, is so critical to the future of ML.
Considerations for a world where ML models are becoming mission-critical. In this post, I share slides and notes from a keynote I gave at the Strata Data Conference in New York last September. As the data community begins to deploy more machine learning (ML) models, I wanted to review some important considerations.
We need to do more than automate model building with autoML; we need to automate tasks at every stage of the data pipeline. In a previous post, we talked about applications of machine learning (ML) to software development, which included a tour through sample tools in data science and for managing data infrastructure.
In a recent survey, we explored how companies were adjusting to the growing importance of machine learning and analytics, while also preparing for the explosion in the number of data sources. You can find full results from the survey in the free report “Evolving Data Infrastructure.” Data Platforms.
Companies successfully adopt machine learning either by building on existing data products and services, or by modernizing existing models and algorithms. In this post, I share slides and notes from a keynote I gave at the Strata Data Conference in London earlier this year. Use ML to unlock new data types—e.g.,
Apply fair and private models, white-hat and forensic model debugging, and common sense to protect machine learning models from malicious actors. Like many others, I’ve known for some time that machine learning models themselves could pose security risks.
Large language models (LLMs) just keep getting better. In just about two years since OpenAI jolted the news cycle with the introduction of ChatGPT, we’ve already seen the launch and subsequent upgrades of dozens of competing models. million on inference, grounding, and data integration for just proof-of-concept AI projects.
The data integration landscape is under a constant metamorphosis. In the current disruptive times, businesses depend heavily on information in real-time and data analysis techniques to make better business decisions, raising the bar for data integration. Why is Data Integration a Challenge for Enterprises?
Machine learning solutions for data integration, cleaning, and data generation are beginning to emerge. “AI starts with ‘good’ data” is a statement that receives wide agreement from data scientists, analysts, and business owners. These data sets are often siloed, incomplete, and extremely sparse.
We have also included vendors for the specific use cases of ModelOps, MLOps, DataGovOps and DataSecOps, which apply DataOps principles to machine learning, AI, data governance, and data security operations. Dagster / ElementL — a data orchestrator for machine learning, analytics, and ETL.
The following requirements were essential to decide for adopting a modern data mesh architecture: Domain-oriented ownership and data-as-a-product: EUROGATE aims to enable scalable and straightforward data sharing across organizational boundaries, and eliminate centralized bottlenecks and complex data pipelines.
Bigeye’s anomaly detection capabilities rely on the automated generation of data quality thresholds based on machine learning (ML) models fueled by historical data.
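The thresholding idea described above can be sketched in plain Python — a minimal illustration of learning bounds from historical data, not Bigeye’s actual implementation. Here, bounds are the mean of past metric values plus or minus k standard deviations, and a new observation outside the bounds is flagged:

```python
import statistics

def learn_thresholds(history, k=3.0):
    """Derive data-quality bounds from historical metric values: mean +/- k * stddev."""
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    return (mean - k * stdev, mean + k * stdev)

def is_anomalous(value, bounds):
    """Flag values falling outside the learned bounds."""
    lower, upper = bounds
    return not (lower <= value <= upper)

# Daily row counts for a table; a sudden drop in volume should be flagged.
history = [1000, 1020, 980, 1010, 995, 1005, 990]
bounds = learn_thresholds(history)
print(is_anomalous(250, bounds))   # → True (a ~75% drop in volume)
print(is_anomalous(1002, bounds))  # → False (an ordinary day)
```

A real system would also account for trend and seasonality rather than a static mean, but the learn-then-compare loop is the core of the approach.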
Highlights and use cases from companies that are building the technologies needed to sustain their use of analytics and machine learning. In a forthcoming survey, “Evolving Data Infrastructure,” we found strong interest in machine learning (ML) among respondents across geographic regions. Deep Learning.
Data architecture definition Data architecture describes the structure of an organization’s logical and physical data assets, and data management resources, according to The Open Group Architecture Framework (TOGAF). An organization’s data architecture is the purview of data architects. Curate the data.
In this paper, I show you how marketers can improve their customer retention efforts by 1) integrating disparate data silos and 2) employing machine learning predictive analytics. In our world of Big Data, marketers no longer need to simply rely on their gut instincts to make marketing decisions.
My favorite approach to TAM creation and to modern data management in general is AI and machine learning (ML). That is, use AI and machine learning techniques on digital content (databases, documents, images, videos, press releases, forms, web content, social network posts, etc.)
They’re taking data they’ve historically used for analytics or business reporting and putting it to work in machine learning (ML) models and AI-powered applications. You’ll get a single unified view of all your data for your data and AI workers, regardless of where the data sits, breaking down your data silos.
The development of business intelligence to analyze and extract value from the countless sources of data we gather at high scale brought along a host of errors and low-quality reports: the disparity of data sources and data types added further complexity to the data integration process.
At AWS re:Invent 2024, we announced the next generation of Amazon SageMaker , the center for all your data, analytics, and AI. Governance features including fine-grained access control are built into SageMaker Unified Studio using Amazon SageMaker Catalog to help you meet enterprise security requirements across your entire data estate.
These strategies, such as investing in AI-powered cleansing tools and adopting federated governance models, not only address the current data quality challenges but also pave the way for improved decision-making, operational efficiency and customer satisfaction. When financial data is inconsistent, reporting becomes unreliable.
destination fields may contain no more than 10 characters) Frequency of transfer for data integration cases (e.g., transfer data from source to target every 12 hours). If you’re aiming for uninterrupted data flow and accurate data, thorough data mapping is a critical piece of the puzzle.
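A mapping rule like the one in the excerpt above — destination fields capped at 10 characters — can be checked mechanically before any transfer runs. A minimal sketch, with hypothetical field names and limits:

```python
# Hypothetical mapping spec: destination field name -> maximum allowed length.
MAPPING_RULES = {
    "cust_name": 10,
    "region_cd": 10,
}

def validate_record(record, rules):
    """Return (field, value) pairs that violate destination length limits."""
    violations = []
    for field, max_len in rules.items():
        value = record.get(field, "")
        if len(value) > max_len:
            violations.append((field, value))
    return violations

record = {"cust_name": "Exceedingly Long Name", "region_cd": "EU-WEST"}
print(validate_record(record, MAPPING_RULES))
# → [('cust_name', 'Exceedingly Long Name')]
```

Running checks like this at mapping time, rather than letting the target system reject rows mid-transfer, is what keeps the data flow uninterrupted.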
From the Unified Studio, you can collaborate and build faster using familiar AWS tools for model development, generative AI, data processing, and SQL analytics. You can use a simple visual interface to compose flows that move and transform data and run them on serverless compute.
Simplified data corrections and updates Iceberg enhances data management for quants in capital markets through its robust insert, delete, and update capabilities. These features allow efficient data corrections, gap-filling in time series, and historical data updates without disrupting ongoing analyses or compromising data integrity.
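The two operations named above — row-level corrections and gap-filling in a time series — can be illustrated in plain Python. This is only a sketch of the pattern, not Iceberg’s API; the timestamps and prices are made up:

```python
def apply_corrections(series, corrections):
    """Overwrite specific timestamps with corrected values (an upsert)."""
    fixed = dict(series)
    fixed.update(corrections)
    return fixed

def fill_gaps(series, timestamps):
    """Fill missing timestamps by carrying the last observed value forward."""
    filled, last = {}, None
    for ts in timestamps:
        if ts in series:
            last = series[ts]
        filled[ts] = last
    return filled

prices = {"09:00": 101.5, "09:02": 102.0}             # the 09:01 tick is missing
prices = apply_corrections(prices, {"09:02": 101.9})  # vendor restated the 09:02 tick
print(fill_gaps(prices, ["09:00", "09:01", "09:02"]))
# → {'09:00': 101.5, '09:01': 101.5, '09:02': 101.9}
```

What Iceberg adds over this naive picture is doing the same thing transactionally at table scale, so readers mid-query never see a half-applied correction.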
Many AWS customers have integrated their data across multiple data sources using AWS Glue, a serverless data integration service, in order to make data-driven business decisions. Are there recommended approaches to provisioning components for data integration?
When dealing with third-party data sources, AWS Data Exchange simplifies the discovery, subscription, and utilization of third-party data from a diverse range of producers or providers. As a producer, you can also monetize your data through the subscription model using AWS Data Exchange.
ChatGPT> DataOps is a term that refers to the set of practices and tools that organizations use to improve the quality and speed of data analytics and machine learning. It involves bringing together people, processes, and technology to enable data-driven decision making and improve the efficiency of data-related workflows.
At Atlanta’s Hartsfield-Jackson International Airport, an IT pilot has led to a wholesale data journey destined to transform operations at the world’s busiest airport, fueled by machine learning and generative AI. Data integrity presented a major challenge for the team, as there were many instances of duplicate data.
We've blogged before about the importance of model validation, a process that ensures that the model is performing the way it was intended and that it solves the problem it was designed to solve. Validations and tests are key elements to building machine learning pipelines you can trust.
In financial services, another highly regulated, data-intensive industry, some 80 percent of industry experts say artificial intelligence is helping to reduce fraud. Machine learning algorithms enable fraud detection systems to distinguish between legitimate and fraudulent behaviors. The Public Sector data challenge.
When we talk about data integrity, we’re referring to the overarching completeness, accuracy, consistency, accessibility, and security of an organization’s data. Together, these factors determine the reliability of the organization’s data. In short, yes.
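Those dimensions translate directly into checks you can run against a dataset. A minimal sketch of completeness and consistency checks, with hypothetical field names and rules:

```python
def completeness(rows, required):
    """Fraction of rows where all required fields are present and non-empty."""
    ok = sum(1 for r in rows if all(r.get(f) for f in required))
    return ok / len(rows)

def consistent(rows, field, allowed):
    """True if every value of `field` comes from the allowed set."""
    return all(r.get(field) in allowed for r in rows)

rows = [
    {"id": 1, "email": "a@x.com", "status": "active"},
    {"id": 2, "email": "",        "status": "active"},    # missing email
    {"id": 3, "email": "c@x.com", "status": "ACTIVE"},    # inconsistent casing
]
print(round(completeness(rows, ["id", "email"]), 3))  # → 0.667
print(consistent(rows, "status", {"active", "inactive"}))  # → False
```

Accuracy and security don’t reduce to one-liners like this, but completeness and consistency are cheap to measure continuously, which is why they usually anchor data integrity dashboards.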
AWS offers AWS Glue to help you integrate your data from multiple sources on serverless infrastructure for analysis, machine learning (ML), and application development. AWS Glue provides different authoring experiences for you to build data integration jobs. This integration is available today in US East (N.
Today, Amazon Redshift is used by customers across all industries for a variety of use cases, including data warehouse migration and modernization, near real-time analytics, self-service analytics, data lake analytics, machine learning (ML), and data monetization.
Modern delivery is product (rather than project) management, agile development, small cross-functional teams that co-create, and continuous integration and delivery — all with a new financial model that funds “value,” not “projects.” This model allows us to pivot from a data-defensive to a data-offensive position.”
In this edition of GraphDB In Action, we present to you the work of three bright researchers who have set out to find solutions that allow meaningful analysis and interpretation of data, supported by Ontotext GraphDB. The study discusses the key concepts and technologies related to semantic data integration in the field of brain diseases.
As such, we are witnessing a revolution in the healthcare industry, in which there is now an opportunity to employ a new model of improved, personalized, evidence- and data-driven clinical care. Additionally, organizations are increasingly constrained by tight budgets and limited data science resources.
Data analytics draws from a range of disciplines — including computer programming, mathematics, and statistics — to perform analysis on data in an effort to describe, predict, and improve performance. What are the four types of data analytics? Data analytics methods and techniques. Data analytics vs. business analytics.
In today’s data-driven world, organizations often deal with data from multiple sources, leading to challenges in data integration and governance. This process is crucial for maintaining data integrity and avoiding duplication that could skew analytics and insights.
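The deduplication step described above can be sketched in plain Python: merge records from several sources by a shared key, keeping the most recently updated copy of each. Field names and sources here are hypothetical, a minimal sketch rather than any vendor’s implementation:

```python
def merge_dedupe(*sources, key="id", version="updated_at"):
    """Merge records from several sources, keeping the newest copy per key."""
    latest = {}
    for source in sources:
        for rec in source:
            k = rec[key]
            if k not in latest or rec[version] > latest[k][version]:
                latest[k] = rec
    return sorted(latest.values(), key=lambda r: r[key])

crm = [{"id": 1, "name": "Acme", "updated_at": "2024-01-10"}]
erp = [{"id": 1, "name": "Acme Corp", "updated_at": "2024-03-02"},
       {"id": 2, "name": "Globex", "updated_at": "2024-02-15"}]
print(merge_dedupe(crm, erp))
# Record 1 keeps the ERP copy (newer); record 2 passes through unchanged.
```

The "newest version wins" rule is only one survivorship policy; governance teams often prefer source-priority or field-level merging, but the key-and-compare structure stays the same.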
Though we know who’s paying your income taxes this April (sorry to rub it in: it’s you), we have to ask: who’s paying your data integration tax? Data integration tax is a term used to describe the hidden costs associated with integrating data solutions to process your data from disparate sources and for different needs.
Data in Use pertains explicitly to how data is actively employed in business intelligence tools, predictive models, visualization platforms, and even during export or reverse ETL processes. These applications are where the rubber meets the road and often where customers first encounter data quality issues.
When it comes to using AI and machine learning across your organization, there are many good reasons to provide your data and analytics community with an intelligent data foundation. For instance, Large Language Models (LLMs) are known to ultimately perform better when data is structured.
Let’s go through the ten Azure data pipeline tools Azure Data Factory: This cloud-based data integration service allows you to create data-driven workflows for orchestrating and automating data movement and transformation. You can use it for big data analytics and machine learning workloads.
While many organizations still struggle to get started, the most innovative organizations are using modern analytics to improve business outcomes, deliver personalized experiences, monetize data as an asset, and prepare for the unexpected. Don’t just lift and shift with the old design principles that caused today’s bottlenecks.
Here, I’ll highlight the where and why of these important “data integration points” that are key determinants of success in an organization’s data and analytics strategy. The technical debt keeps increasing and everything around working with data gets harder. Data and cloud strategy must align.