Apply fair and private models, white-hat and forensic model debugging, and common sense to protect machine learning models from malicious actors. Like many others, I’ve known for some time that machine learning models themselves could pose security risks. Data poisoning attacks are one such risk.
Machine learning solutions for data integration, cleaning, and data generation are beginning to emerge. “AI starts with ‘good’ data” is a statement that receives wide agreement from data scientists, analysts, and business owners. The problem is even more magnified in the case of structured enterprise data.
Though loosely applied, agentic AI generally refers to granting AI agents more autonomy to optimize tasks and chain together increasingly complex actions. As Xerox continues its reinvention, shifting from its traditional print roots to a services-led model, agentic AI fits well into that journey.
Seamless lakehouse architectures bring together the flexibility and openness of data lakes with the performance and transactional capabilities of data warehouses. A lakehouse lets you use the analytics engines and AI models of your choice with consistent governance across all your data.
“The challenge that a lot of our customers have is that it requires you to copy that data, store it in Salesforce; you have to create a place to store it; you have to create an object or field in which to store it; and then you have to maintain that pipeline of data synchronization and make sure that data is updated,” Carlson said.
They’re taking data they’ve historically used for analytics or business reporting and putting it to work in machine learning (ML) models and AI-powered applications. Amazon SageMaker Unified Studio (Preview) solves this challenge by providing an integrated authoring experience to use all your data and tools for analytics and AI.
From the Unified Studio, you can collaborate and build faster using familiar AWS tools for model development, generative AI, data processing, and SQL analytics. You can use a simple visual interface to compose flows that move and transform data and run them on serverless compute.
Many AWS customers have integrated their data across multiple data sources using AWS Glue, a serverless data integration service, in order to make data-driven business decisions. Are there recommended approaches to provisioning components for data integration?
We need to do more than automate model building with autoML; we need to automate tasks at every stage of the data pipeline. In a previous post , we talked about applications of machine learning (ML) to software development, which included a tour through sample tools in data science and for managing data infrastructure.
Iceberg offers distinct advantages through its metadata layer over Parquet, such as improved data management, performance optimization, and integration with various query engines. Unlike direct Amazon S3 access, Iceberg supports these operations on petabyte-scale data lakes without requiring complex custom code.
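As a rough illustration of what that looks like in practice, the sketch below creates and modifies an Iceberg table on S3 through the metadata layer. The Glue catalog name, bucket, table, and snapshot ID are hypothetical placeholders, and it assumes a Spark session with the Iceberg runtime and AWS Glue catalog support available.

```python
from pyspark.sql import SparkSession

# Hypothetical catalog/bucket names; assumes the Iceberg Spark runtime jar is on the classpath.
spark = (
    SparkSession.builder
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.catalog.glue_catalog.warehouse", "s3://my-data-lake/warehouse/")
    .getOrCreate()
)

# The Iceberg metadata layer tracks schema, partitions, and snapshots for the table.
spark.sql("""
    CREATE TABLE IF NOT EXISTS glue_catalog.sales.orders (
        order_id BIGINT, customer_id BIGINT, amount DOUBLE, order_date DATE)
    USING iceberg
    PARTITIONED BY (order_date)
""")

# Row-level DML works against S3-backed data without hand-written file manipulation code.
spark.sql("DELETE FROM glue_catalog.sales.orders WHERE order_date < DATE '2020-01-01'")

# Time travel to an earlier snapshot via the metadata layer (snapshot ID is a placeholder).
spark.sql("SELECT * FROM glue_catalog.sales.orders VERSION AS OF 1234567890").show()
```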
Structuring your data, measuring business processes, and getting valuable insights quickly can all be done by using a dimensional model. Amazon Redshift provides built-in features to accelerate the process of modeling, orchestrating, and reporting from a dimensional model. Declare the grain of your data.
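As a hedged sketch of what declaring the grain can look like on Amazon Redshift, the snippet below creates one hypothetical dimension table and one fact table whose grain is one row per product per day. The cluster endpoint, credentials, and table names are placeholders, and it uses the redshift_connector driver.

```python
import redshift_connector

# Placeholder connection details for an Amazon Redshift cluster.
conn = redshift_connector.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    database="dev", user="awsuser", password="***")
cur = conn.cursor()

# Dimension table: one row per product, replicated to every node for cheap joins.
cur.execute("""
    CREATE TABLE IF NOT EXISTS dim_product (
        product_key BIGINT IDENTITY(1,1),
        product_id  VARCHAR(32),
        category    VARCHAR(64)
    ) DISTSTYLE ALL;
""")

# Fact table: the declared grain is one row per product per day.
cur.execute("""
    CREATE TABLE IF NOT EXISTS fact_sales (
        date_key    INT,
        product_key BIGINT,
        quantity    INT,
        amount      DECIMAL(12,2)
    ) DISTKEY(product_key) SORTKEY(date_key);
""")
conn.commit()
```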
When dealing with third-party data sources, AWS Data Exchange simplifies the discovery, subscription, and utilization of third-party data from a diverse range of producers or providers. As a producer, you can also monetize your data through the subscription model using AWS Data Exchange.
When we talk about data integrity, we’re referring to the overarching completeness, accuracy, consistency, accessibility, and security of an organization’s data. Together, these factors determine the reliability of the organization’s data. In short, yes.
ChatGPT is capable of doing many of these tasks, but the custom support chatbot uses another model called text-embedding-ada-002, an embedding model from OpenAI designed to produce embeddings—numerical vector representations of text that can be stored in a vector database and used to feed relevant data into large language models (LLMs).
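A minimal sketch of the retrieval step such a chatbot might use: embed a handful of support documents and a user question, then pick the closest document to pass to the LLM as context. The example documents are invented, and the snippet assumes the current openai Python client (v1-style API) with an API key in the environment.

```python
from openai import OpenAI
import numpy as np

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical support documents to retrieve from.
docs = ["How do I reset my password?", "Shipping usually takes 3-5 business days."]
resp = client.embeddings.create(model="text-embedding-ada-002", input=docs)
doc_vecs = np.array([d.embedding for d in resp.data])

query = "I forgot my password"
q_vec = np.array(
    client.embeddings.create(model="text-embedding-ada-002", input=[query]).data[0].embedding)

# Cosine similarity picks the most relevant document, which would then be
# passed to the chat model as grounding context.
scores = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
print(docs[int(scores.argmax())])
```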
In Figure 1, the nodes could be sources of data, storage, internal/external applications, users – anything that accesses or relates to data. Data fabrics provide reusable services that span data integration, access, transformation, modeling, visualization, governance, and delivery.
Q: Is data modeling cool again? In today’s fast-paced digital landscape, data reigns supreme. The data-driven enterprise relies on accurate, accessible, and actionable information to make strategic decisions and drive innovation. A: It always was and is getting cooler!!
The development of business intelligence to analyze and extract value from the countless sources of data we gather at high scale brought with it a host of errors and low-quality reports: the disparity of data sources and data types added further complexity to the data integration process.
And yeah, the real-world relationships among the entities represented in the data had to be fudged a bit to fit in the counterintuitive model of tabular data, but, in trade, you get reliability and speed. They create reliable, consistent and communicable models for representing data. Schemas are powerful.
The goal of DataOps is to help organizations make better use of their data to drive business decisions and improve outcomes. ChatGPT> DataOps is a term that refers to the set of practices and tools that organizations use to improve the quality and speed of data analytics and machine learning.
The Solution: ‘Payload’ Data Journeys. Traditional Data Observability usually focuses on a ‘process journey,’ tracking the performance and status of data pipelines. A payload data journey instead assigns unique identifiers to each data item—referred to as a ‘payload’—and relates them to each event.
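The sketch below is a library-free illustration of that idea (not the vendor's implementation): each data item gets a unique payload identifier that travels with it, so every pipeline event can be related back to the item.

```python
import time
import uuid

def new_payload(record: dict) -> dict:
    # Wrap a data item with a unique identifier and an empty event history.
    return {"payload_id": str(uuid.uuid4()), "record": record, "events": []}

def log_event(payload: dict, step: str, status: str) -> None:
    # Relate a pipeline event back to this specific payload.
    payload["events"].append({"step": step, "status": status, "ts": time.time()})

p = new_payload({"order_id": 42, "amount": 99.5})
log_event(p, "ingest", "ok")
log_event(p, "validate", "ok")
print(p["payload_id"], [e["step"] for e in p["events"]])
```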
The solution is choosing one of the standard provenance models. Graph Replace is probably the most straightforward of these: it is fast and simple to implement, and we recommend it for batch updates. A further trade-off is whether to use persistent or non-persistent IDs.
It’s not just about playing detective to discover where things went wrong; it’s about proactively monitoring your entire data journey to ensure everything goes right with your data. What is Data in Place? For example, these tools may offer metadata-based notifications.
In today’s data-driven world, organizations often deal with data from multiple sources, leading to challenges in data integration and governance. This process is crucial for maintaining data integrity and avoiding duplication that could skew analytics and insights. Choose Create notebook.
As organizations increasingly rely on data stored across various platforms, such as Snowflake , Amazon Simple Storage Service (Amazon S3), and various software as a service (SaaS) applications, the challenge of bringing these disparate data sources together has never been more pressing. Choose Create connection. Choose Next.
The Semantic Web started in the late ’90s as a fascinating vision for a web of data that is easy to interpret by both humans and machines. One of its pillars is ontologies, which represent explicit formal conceptual models used to describe semantically both unstructured content and databases.
Data is your generative AI differentiator, and a successful generative AI implementation depends on a robust data strategy incorporating a comprehensive data governance approach. Data governance is a critical building block across all these approaches, and we see two emerging areas of focus.
It encompasses the people, processes, and technologies required to manage and protect data assets. The Data Management Association (DAMA) International defines it as the “planning, oversight, and control over management of data and the use of data and data-related sources.”
Part Two of the Digital Transformation Journey … In our last blog on driving digital transformation , we explored how enterprise architecture (EA) and business process (BP) modeling are pivotal factors in a viable digital transformation strategy. Digital Transformation Strategy: Smarter Data.
As stated earlier, data is the digital gold in the modern era. Your business’s success or failure depends on your collection and processing of relevant data. Companies use data to develop their marketing and pricing models and gain access to a larger consumer base. Data integrity is important.
Without this knowledge, the result can be compromised data or ruined data integrity. Other common issues include duplicate data, which may need to be merged; obsolete data, which will need to be deleted before conversion; and incorrect data, which may result in the need for a manual fix.
AWS has invested in a zero-ETL (extract, transform, and load) future so that builders can focus more on creating value from data, instead of having to spend time preparing data for analysis. To create an AWS HealthLake data store, refer to Getting started with AWS HealthLake.
These two tasks (building data lakes or data warehouses and application modernization) involve data movement, which uses an extract, transform, and load (ETL) process. Developers would also need to build this quickly to migrate the data. Developers can use AWS Glue Studio to efficiently create such data pipelines.
The dbt-glue adapter democratized data lake access for dbt users, and enabled many users to effortlessly run their transformation workloads on the cloud with the serverless data integration capability of AWS Glue. The team uses dbt-glue to build a transformed gold model optimized for business intelligence (BI).
According to the definition, business intelligence and analytics refer to the data management solutions implemented in companies to collect, analyze and drive insights from data. By contrast, business analytics may use historical data to predict what may happen in the future or how the organization will move forward.
If you’re a long-time erwin® Data Modeler by Quest® customer, you might be asking yourself, “What happened to the release naming convention of erwin Data Modeler?” In 2021, erwin Data Modeler released 2021R1. What’s new in erwin Data Modeler R12.0? DevOps GitHub integration via Mart.
This data is usually saved in different databases, external applications, or in an indefinite number of Excel sheets, which makes it almost impossible to combine different data sets and update every source promptly. BI tools aim to make data integration a simple task by providing the following features: a) Data Connectors.
The Central IT team manages a unified Redshift data warehouse, handling all data integration, processing, and maintenance. Business units access clean, standardized data. This model enables the units to focus on insights, with costs aligned to actual consumption. In this post, we use three AWS accounts.
A significant part of this phase involved the innovative process of data flattening, a technique crucial for managing complex product data. This product is linked to several related tables: one for basic details like model number and manufacturer, another for pricing, and another for features such as energy efficiency and capacity.
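As an illustrative sketch of that flattening step, the snippet below joins hypothetical detail, pricing, and feature tables into one wide record per product using pandas; all table and column names are made up.

```python
import pandas as pd

# Hypothetical related tables keyed by product_id.
details = pd.DataFrame({"product_id": [1], "model_number": ["WX-100"], "manufacturer": ["Acme"]})
pricing = pd.DataFrame({"product_id": [1], "list_price": [499.00]})
features = pd.DataFrame({"product_id": [1], "energy_rating": ["A+"], "capacity_l": [300]})

# Flatten the related tables into a single wide record per product.
flat = (
    details
    .merge(pricing, on="product_id", how="left")
    .merge(features, on="product_id", how="left")
)
print(flat.to_dict(orient="records"))
```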
They range from organizational architects who define business and operating models to projects, platforms, and digital architects. That may require, for example, an analysis of ERP systems to understand all the dependencies and functions that reference a bill of materials, he says.
But few organizations have made the strategic shift to managing “data as a product.” This approach to data management means applying product development practices to data. Serve: Data products are discoverable and consumed as services, typically via a platform.
With this functionality, you’re empowered to focus on extracting valuable insights from your data, while AWS Glue handles the infrastructure heavy lifting using a serverless compute model. To get started today, refer to Developing AWS Glue jobs with Notebooks and Interactive sessions.
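For a feel of the authoring experience, here is a hedged sketch of the kind of cell you might run in a Glue interactive-session notebook; the catalog database, table, and S3 path are placeholders.

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read a Data Catalog table into a DynamicFrame on serverless Glue workers.
orders = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="raw_orders")

# Light transformation, then write the result back to the lake as Parquet.
cleaned = orders.drop_fields(["_corrupt_record"])
glue_context.write_dynamic_frame.from_options(
    frame=cleaned,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/clean/orders/"},
    format="parquet")
```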
They wanted to develop a simple incremental data processing pipeline without having to update the entire database each time the pipeline ran. The Apache Hudi framework allowed the Infomedia team to maintain a golden reference dataset and capture changes so that the downstream database could be incrementally updated in a short timeframe.
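A minimal sketch of that pattern with PySpark and Hudi: upsert only the captured changes into a golden dataset, then let downstream consumers read just the increment between commits. The table name, record key, paths, and begin-instant time are placeholders, and it assumes the Hudi Spark bundle is available to the session.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Stand-in for the changed records captured since the last pipeline run.
changes_df = spark.createDataFrame(
    [(1, "2024-06-01T00:00:00", "new value")],
    ["record_id", "updated_at", "payload"])

hudi_options = {
    "hoodie.table.name": "golden_reference",
    "hoodie.datasource.write.recordkey.field": "record_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.operation": "upsert",
}

# Upsert the changes into the golden reference dataset on S3.
(changes_df.write.format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save("s3://my-bucket/hudi/golden_reference/"))

# Downstream consumers can read only the increment since a given commit time.
incremental = (spark.read.format("hudi")
    .option("hoodie.datasource.query.type", "incremental")
    .option("hoodie.datasource.read.begin.instanttime", "20240101000000")
    .load("s3://my-bucket/hudi/golden_reference/"))
incremental.show()
```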
OpenSearch Service is used for multiple purposes, such as observability, search analytics, consolidation, cost savings, compliance, and integration. Movement of data across data lakes, data warehouses, and purpose-built stores is achieved by extract, transform, and load (ETL) processes using data integration services such as AWS Glue.
As a result of utilizing the Amazon Redshift integration for Apache Spark, developer productivity increased by a factor of 10, feature generation pipelines were streamlined, and data duplication was reduced to zero. You can gain performance improvements by using the default Parquet format for unloading with this integration.
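A hedged sketch of reading a Redshift table into Spark with that integration follows; the format name here is the community spark-redshift connector's, and the JDBC URL, temp directory, and IAM role ARN are placeholders (on recent EMR and AWS Glue versions the connector is already on the classpath).

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read a Redshift table into a Spark DataFrame; unload staging goes through the
# S3 tempdir (the AWS integration unloads in Parquet format by default for performance).
features = (spark.read
    .format("io.github.spark_redshift_community.spark.redshift")
    .option("url", "jdbc:redshift://my-cluster:5439/dev?user=awsuser&password=***")
    .option("dbtable", "public.customer_events")
    .option("tempdir", "s3://my-bucket/redshift-temp/")
    .option("aws_iam_role", "arn:aws:iam::123456789012:role/redshift-s3-access")
    .load())

# The DataFrame can now feed feature generation pipelines without copying data by hand.
features.createOrReplaceTempView("customer_events")
```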