Data Integration, Machine Learning and Metadata

Deep automation in machine learning

O'Reilly on Data

DECEMBER 19, 2018

We need to do more than automate model building with autoML; we need to automate tasks at every stage of the data pipeline. In a previous post, we talked about applications of machine learning (ML) to software development, which included a tour through sample tools in data science and for managing data infrastructure.

Machine Learning

Machine Learning Software Metadata Testing

Becoming a machine learning company means investing in foundational technologies

O'Reilly on Data

MAY 21, 2019

Companies successfully adopt machine learning either by building on existing data products and services, or by modernizing existing models and algorithms. In this post, I share slides and notes from a keynote I gave at the Strata Data Conference in London earlier this year. Use ML to unlock new data types—e.g.,

Machine Learning

Machine Learning Technology Deep Learning Data Science

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

AWS Big Data

DECEMBER 4, 2024

With the growing emphasis on data, organizations are constantly seeking more efficient and agile ways to integrate their data, especially from a wide variety of applications. We take care of the ETL for you by automating the creation and management of data replication. Glue ETL offers customer-managed data ingestion.

Data Integration

Data Integration Data Lake Statistics Data-driven

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

The following requirements were essential in deciding to adopt a modern data mesh architecture: Domain-oriented ownership and data-as-a-product: EUROGATE aims to enable scalable and straightforward data sharing across organizational boundaries and to eliminate centralized bottlenecks and complex data pipelines.

IoT

IoT Machine Learning Metadata Data-driven

How companies are building sustainable AI and ML initiatives

O'Reilly on Data

JANUARY 29, 2019

In 2017, we published “How Companies Are Putting AI to Work Through Deep Learning,” a report based on a survey we ran aiming to help leaders better understand how organizations are applying AI through deep learning. We found companies were planning to use deep learning over the next 12-18 months.

Deep Learning

Deep Learning Machine Learning Data Science Metadata

Are You Content with Your Organization’s Content Strategy?

Rocket-Powered Data Science

JULY 6, 2021

This is accomplished through tags, annotations, and metadata (TAM). My favorite approach to TAM creation and to modern data management in general is AI and machine learning (ML). Smart content includes labeled (tagged, annotated) metadata (TAM). Tagging and annotating those subcomponents and subsets (i.e.,

Strategy

Strategy Machine Learning Metadata Knowledge Discovery

Build a high-performance quant research platform with Apache Iceberg

AWS Big Data

JANUARY 9, 2025

Iceberg offers distinct advantages through its metadata layer over Parquet, such as improved data management, performance optimization, and integration with various query engines. Unlike direct Amazon S3 access, Iceberg supports these operations on petabyte-scale data lakes without requiring complex custom code.

Metadata

Metadata Snapshot Cost-Benefit Optimization

Enterprises can gain an edge with Metadata Management

CIO Business Intelligence

SEPTEMBER 6, 2024

As artificial intelligence (AI) and machine learning (ML) continue to reshape industries, robust data management has become essential for organizations of all sizes. Let’s dive into what that looks like, what workarounds some IT teams use today, and why metadata management is the key to success.

Metadata

Metadata Enterprise Management Cost-Benefit

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

AWS Big Data

JULY 29, 2024

Data lakes provide a unified repository for organizations to store and use large volumes of data. This enables more informed decision-making and innovative insights through various analytics and machine learning applications.

Metadata

Metadata Snapshot Data Lake Metrics

Proposals for model vulnerability and security

O'Reilly on Data

MARCH 20, 2019

Apply fair and private models, white-hat and forensic model debugging, and common sense to protect machine learning models from malicious actors. Like many others, I’ve known for some time that machine learning models themselves could pose security risks. Data poisoning attacks. Inversion by surrogate models.

Modeling

Modeling Machine Learning Predictive Modeling Consulting

Data’s dark secret: Why poor quality cripples AI and growth

CIO Business Intelligence

APRIL 8, 2025

We also examine how centralized, hybrid and decentralized data architectures support scalable, trustworthy ecosystems. As data-centric AI, automated metadata management and privacy-aware data sharing mature, the opportunity to embed data quality into the enterprise’s core has never been more significant.

Data Quality

Data Quality Data-driven Key Performance Indicator Metadata

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

AWS Big Data

DECEMBER 4, 2024

Our customers are telling us that they are seeing their analytics and AI workloads increasingly converge around a lot of the same data, and this is changing how they are using analytics tools with their data. Having confidence in your data is key. They aren’t using analytics and AI tools in isolation.

Data Analytics

Data Analytics Analytics Data Lake Data Quality

Introducing MongoDB Atlas metadata collection with AWS Glue crawlers

AWS Big Data

FEBRUARY 6, 2023

AWS Glue is a serverless data integration service that makes it simple to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development. MongoDB Atlas is a developer data service from AWS technology partner MongoDB, Inc.

Metadata

Metadata Data Lake Machine Learning Big Data

Demystify data sharing and collaboration patterns on AWS: Choosing the right tool for the job

AWS Big Data

OCTOBER 21, 2024

Let’s briefly describe the capabilities of the AWS services we referred to above: AWS Glue is a fully managed, serverless, and scalable extract, transform, and load (ETL) service that simplifies the process of discovering, preparing, and loading data for analytics. Amazon Athena is used to query and explore the data.

Sales

Sales Data-driven Data Processing Key Performance Indicator

Recap of Amazon Redshift key product announcements in 2024

AWS Big Data

DECEMBER 17, 2024

Today, Amazon Redshift is used by customers across all industries for a variety of use cases, including data warehouse migration and modernization, near real-time analytics, self-service analytics, data lake analytics, machine learning (ML), and data monetization.

Data Lake

Data Lake Data Warehouse Data-driven Optimization

What’s the Current State of Data Governance and Automation?

erwin

JANUARY 30, 2020

The results of our new research show that organizations are still trying to master data governance, including adjusting their strategies to address changing priorities and overcoming challenges related to data discovery, preparation, quality and traceability. And close to 50 percent have deployed data catalogs and business glossaries.

Data Governance

Data Governance Metadata Cost-Benefit Digital Transformation

Data integrity vs. data quality: Is there a difference?

IBM Big Data Hub

JULY 13, 2023

When we talk about data integrity, we’re referring to the overarching completeness, accuracy, consistency, accessibility, and security of an organization’s data. Together, these factors determine the reliability of the organization’s data. In short, yes.

Data Quality

Data Quality Data Integration Metadata Cost-Benefit

Why data observability is essential to AI governance

erwin

DECEMBER 9, 2024

When it comes to using AI and machine learning across your organization, there are many good reasons to provide your data and analytics community with an intelligent data foundation. For instance, Large Language Models (LLMs) are known to ultimately perform better when data is structured.

Metadata

Metadata Data Quality Sales Modeling

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

AWS Big Data

DECEMBER 16, 2024

Third, some services require you to set up and manage compute resources used for federated connectivity, and capabilities like connection testing and data preview aren’t available in all services. To solve for these challenges, we launched Amazon SageMaker Lakehouse unified data connectivity.

Visualization

Visualization Data Processing Testing Publishing

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

APRIL 3, 2024

Data engineers use Apache Iceberg because it’s fast, efficient, and reliable at any scale and keeps records of how datasets change over time. Apache Iceberg offers integrations with popular data processing frameworks such as Apache Spark, Apache Flink, Apache Hive, Presto, and more.

Data Lake

Data Lake Snapshot Metadata Data Architecture

How Cargotec uses metadata replication to enable cross-account data sharing

AWS Big Data

JUNE 7, 2023

This data needs to be ingested into a data lake, transformed, and made available for analytics, machine learning (ML), and visualization. For this, Cargotec built an Amazon Simple Storage Service (Amazon S3) data lake and cataloged the data assets in AWS Glue Data Catalog.

Metadata

Metadata Data Lake Machine Learning Big Data

What is data governance? Best practices for managing data assets

CIO Business Intelligence

MARCH 24, 2023

The program must introduce and support standardization of enterprise data. Programs must support proactive and reactive change management activities for reference data values and the structure/use of master data and metadata.

Data Governance

Data Governance Management Metadata Data Quality

What is a data fabric architecture?

IBM Big Data Hub

MARCH 25, 2022

A data fabric is an architectural approach that enables organizations to simplify data access and data governance across a hybrid multicloud landscape for better 360-degree views of the customer and enhanced MLOps and trustworthy AI. The post What is a data fabric architecture? appeared first on Journey to AI Blog.

Metadata

Metadata Data Quality Data Governance Data Integration

Informatica Embraces AI for Data Intelligence and Operations

David Menninger's Analyst Perspectives

MAY 8, 2025

Many longstanding providers of data management products, such as Informatica, have adopted DataOps capabilities and methodologies, adapting product portfolios to cloud-based consumption and automated, collaborative and agile processes. Informatica is still closely associated with data integration.

Data Quality

Data Quality Data Governance Data Integration Software

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

This amalgamation empowers vendors with authority over a diverse range of workloads by virtue of owning the data. This authority extends across realms such as business intelligence, data engineering, and machine learning, thus limiting the tools and capabilities that can be used. Here is where it can get complicated.

Data Lake

Data Lake Metadata Snapshot Analytics

Top analytics announcements of AWS re:Invent 2024

AWS Big Data

FEBRUARY 26, 2025

Amazon SageMaker: Introducing the next generation of Amazon SageMaker. AWS announces the next generation of Amazon SageMaker, a unified platform for data, analytics, and AI. S3 Metadata is designed to automatically capture metadata from objects as they are uploaded into a bucket, and to make that metadata queryable in a read-only table.

Analytics

Analytics Data Lake Metadata Data Warehouse

There’s More to erwin Data Governance Automation Than Meets the AI

erwin

NOVEMBER 6, 2020

The clear benefit is that data stewards spend less time building and populating the data governance framework and more time realizing value and ROI from it. Industry analysts and other people who write about data governance and automation define it narrowly, with an emphasis on artificial intelligence (AI) and machine learning (ML).

Data Governance

Data Governance Metadata Data-driven Visualization

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

This cloud service was a significant leap from the traditional data warehousing solutions, which were expensive, not elastic, and required significant expertise to tune and operate. Here are a couple of highlights from this week; for the full list, see below.

Data Warehouse

Data Warehouse Analytics Data Lake Machine Learning

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

DECEMBER 13, 2023

In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.

Metadata

Metadata Data Lake Visualization Data Transformation

You Cannot Get to the Moon on a Bike!

Ontotext

JANUARY 10, 2024

And each of these gains requires data integration across business lines and divisions. Limiting growth by (data integration) complexity: Most operational IT systems in an enterprise have been developed to serve a single business function and they use the simplest possible model for this. We call this the Bad Data Tax.

Metadata

Metadata Slice and Dice Data Integration Enterprise

Bridging the Gap: How ‘Data in Place’ and ‘Data in Use’ Define Complete Data Observability

DataKitchen

SEPTEMBER 21, 2023

These applications are where the rubber meets the road and often where customers first encounter data quality issues. Problems can manifest in various ways, such as Model Prediction Errors in machine learning applications, empty dashboards in BI tools, or row counts in exported data falling short of expectations.

Testing

Testing Data Quality Predictive Modeling Metrics

Don’t Fear Artificial Intelligence; Embrace it Through Data Governance

CIO Business Intelligence

APRIL 29, 2022

Software development, once solely the domain of human programmers, is now increasingly the by-product of data being carefully selected, ingested, and analysed by machine learning (ML) systems in a recurrent cycle. Further, data management activities don’t end once the AI model has been developed. era is upon us.

Data Governance

Data Governance IT Risk Data Lake

AWS re:Invent 2023 Amazon Redshift Sessions Recap

AWS Big Data

DECEMBER 18, 2023

Customers use Amazon Redshift as a key component of their data architecture to drive use cases from typical dashboarding to self-service analytics, real-time analytics, machine learning (ML), data sharing and monetization, and more.

Data Warehouse

Data Warehouse Machine Learning Data-driven Data Lake

5 Reasons to Use Apache Iceberg on Cloudera Data Platform (CDP)

Cloudera

MARCH 23, 2022

Figure 1: Apache Iceberg fits the next generation data architecture by abstracting storage layer from analytics layer while introducing net new capabilities like time-travel and partition evolution. #1: Apache Iceberg enables seamless integration between different streaming and processing engines while maintaining data integrity between them.

Metadata

Metadata Data Architecture Machine Learning Cost-Benefit

Augmented data management: Data fabric versus data mesh

IBM Big Data Hub

APRIL 27, 2022

Gartner defines a data fabric as “a design concept that serves as an integrated layer of data and connecting processes.” The data fabric architectural approach can simplify data access in an organization and facilitate self-service data consumption at scale. What’s a data mesh?

Management

Management Metadata Data Architecture Data Lake

Data governance in the age of generative AI

AWS Big Data

FEBRUARY 29, 2024

However, enterprise data generated from siloed sources combined with the lack of a data integration strategy creates challenges for provisioning the data for generative AI applications. Data discoverability: Unlike structured data, which is managed in well-defined rows and columns, unstructured data is stored as objects.

Data Governance

Data Governance Unstructured Data Metadata Data Lake

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

AWS Big Data

JUNE 10, 2024

Organizations have multiple Hive data warehouses across EMR clusters, where the metadata gets generated. To address this challenge, organizations can deploy a data mesh using AWS Lake Formation that connects the multiple EMR clusters. An entity can act both as a producer of data assets and as a consumer of data assets.

Data Lake

Data Lake Metadata Data Warehouse Data Processing

Introducing Apache Hudi support with AWS Glue crawlers

AWS Big Data

NOVEMBER 22, 2023

Many AWS customers adopted Apache Hudi on their data lakes built on top of Amazon S3 using AWS Glue, a serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development.

Data Lake

Data Lake Snapshot Metadata Optimization

How HPE Aruba Supply Chain optimized cost and performance by migrating to an AWS modern data architecture

AWS Big Data

SEPTEMBER 11, 2024

AWS Transfer Family seamlessly integrates with other AWS services, automates transfer, and makes sure data is protected with encryption and access controls. Each file arrives as a pair with a tail metadata file in CSV format containing the size and name of the file. 2 GB into the landing zone daily.

Data Architecture

Data Architecture Optimization Data Warehouse Metadata

Access Amazon Redshift data from Salesforce Data Cloud with Zero Copy Data Federation

AWS Big Data

JUNE 25, 2024

In today’s data-driven business landscape, organizations collect a wealth of data across various touch points and unify it in a central data warehouse or a data lake to deliver business insights. It provides secure, real-time access to Redshift data without copying, keeping enterprise data in place.

Data Lake

Data Lake Cost-Benefit Data-driven Data Warehouse

Dive deep into security management: The Data on EKS Platform

AWS Big Data

APRIL 29, 2024

The construction of big data applications based on open source software has become increasingly uncomplicated since the advent of projects like Data on EKS, an open source project from AWS to provide blueprints for building data and machine learning (ML) applications on Amazon Elastic Kubernetes Service (Amazon EKS).

Management

Management Big Data Data Warehouse Metadata

KGF 2023: Bikes To The Moon, Datastrophies, Abstract Art And A Knowledge Graph Forum To Embrace Them All

Ontotext

DECEMBER 1, 2023

So, KGF 2023 proved to be a breath of fresh air for anyone interested in topics like data mesh and data fabric, knowledge graphs, text analysis, large language model (LLM) integrations, retrieval augmented generation (RAG), chatbots, semantic data integration, and ontology building.

Metadata

Metadata Sales Machine Learning Consulting

Cloudera Provides First Look at Cloudera Data Platform, the Industry’s First Enterprise Data Cloud

Cloudera

JUNE 25, 2019

Cloudera shared a comprehensive overview and demonstration of the all-new Cloudera Data Platform (CDP). Hybrid and multi-cloud – provides choice to manage, analyze and experiment with data in any public cloud and in private data centers for maximum choice and flexibility.

Enterprise

Enterprise Machine Learning Recreation/Entertainment IoT

Themes and Conferences per Pacoid, Episode 8

Domino Data Lab

APRIL 3, 2019

Data governance shows up as the fourth-most-popular kind of solution that enterprise teams were adopting or evaluating during 2019. That’s a lot of priorities – especially when you group together closely related items such as data lineage and metadata management which rank nearby. We keep feeding the monster data.

Machine Learning

Machine Learning Data Governance Metadata Data Science
