Why companies are turning to specialized machine learning tools like MLflow. A few years ago, we started publishing articles (see “Related resources” at the end of this post) on the challenges facing data teams as they start taking on more machine learning (ML) projects. The upcoming 0.9.0
This article was published as a part of the Data Science Blogathon. Any type of contextual information, like device context, conversational context, and metadata, […].
This article was published as a part of the Data Science Blogathon. A centralized location for research and production teams to govern models and experiments by storing metadata throughout the ML model lifecycle. Keeping track of […]. The post Neptune.ai - A Metadata Store for MLOps appeared first on Analytics Vidhya.
Just 20% of organizations publish data provenance and data lineage. Almost half (48%) of respondents say they use data analysis, machine learning, or AI tools to address data quality issues. These include the basics, such as metadata creation and management, data provenance, data lineage, and other essentials.
Improve accuracy and resiliency of analytics and machine learning by fostering data standards and high-quality data products. In addition to real-time analytics and visualization, the data needs to be shared for long-term data analytics and machine learning applications. This process is shown in the following figure.
In 2017, we published “ How Companies Are Putting AI to Work Through Deep Learning ,” a report based on a survey we ran aiming to help leaders better understand how organizations are applying AI through deep learning. We found companies were planning to use deep learning over the next 12-18 months.
If you’re already a software product manager (PM), you have a head start on becoming a PM for artificial intelligence (AI) or machine learning (ML). AI products are automated systems that collect and learn from data to make user-facing decisions. We won’t go into the mathematics or engineering of modern machine learning here.
In their wisdom, the editors of the book decided that I wrote “too much.” So they correctly shortened my contribution by about half in the final published version of my Foreword for the book. I publish this in its original form in order to capture the essence of my point of view on the power of graph analytics.
For instance, Domain A will have the flexibility to create data products that can be published to the divisional catalog, while also maintaining the autonomy to develop data products that are exclusively accessible to teams within the domain. A data portal for consumers to discover data products and access associated metadata.
The CDH is used to create, discover, and consume data products through a central metadata catalog, while enforcing permission policies and tightly integrating data engineering, analytics, and machine learning services to streamline the user journey from data to insight.
We’re excited to announce a new feature in Amazon DataZone that offers enhanced metadata governance for your subscription approval process. With this update, domain owners can define and enforce metadata requirements for data consumers when they request access to data assets. Key benefits The feature benefits multiple stakeholders.
To achieve this, they plan to use machine learning (ML) models to extract insights from data. Business analysts enhance the data with business metadata/glossaries and publish it as data assets or data products. The data security officer sets permissions in Amazon DataZone to allow users to access the data portal.
Apply fair and private models, white-hat and forensic model debugging, and common sense to protect machine learning models from malicious actors. Like many others, I’ve known for some time that machine learning models themselves could pose security risks. Data poisoning attacks. General concerns.
As data-centric AI, automated metadata management, and privacy-aware data sharing mature, the opportunity to embed data quality into the enterprise's core has never been more significant. Data fabric: a metadata-rich integration layer across distributed systems; its main challenges are implementation complexity and reliance on robust metadata management.
In this example, the machine learning (ML) model struggles to differentiate between a chihuahua and a muffin. We will learn what it is, why it is important, and how Cloudera Machine Learning (CML) is helping organisations tackle this challenge as part of the broader objective of achieving Ethical AI.
Extract, transform, and load (ETL) is the process of combining, cleaning, and normalizing data from different sources to prepare it for analytics, artificial intelligence (AI), and machine learning (ML) workloads. The data is also registered in the Glue Data Catalog, a metadata repository.
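The extract/transform/load steps described above can be sketched in a few lines. This is a minimal illustration only: the toy records, field names, and in-memory "warehouse" are invented for the example and are not tied to any particular article or AWS service.

```python
# A minimal, illustrative ETL pass over a toy CSV source.
import csv
import io

def extract(raw_csv: str) -> list[dict]:
    """Read records from a CSV source."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows: list[dict]) -> list[dict]:
    """Clean and normalize: trim whitespace, lowercase emails, drop rows missing an id."""
    cleaned = []
    for row in rows:
        if not row.get("id", "").strip():
            continue  # drop incomplete records
        cleaned.append({
            "id": row["id"].strip(),
            "email": row["email"].strip().lower(),
        })
    return cleaned

def load(rows: list[dict], warehouse: dict) -> None:
    """Upsert rows into a toy in-memory 'warehouse' keyed by id."""
    for row in rows:
        warehouse[row["id"]] = row

raw = "id,email\n 1 , Alice@Example.com \n,missing@id.com\n2,bob@example.com\n"
warehouse: dict = {}
load(transform(extract(raw)), warehouse)
print(warehouse)
```

In a real pipeline each stage would also emit metadata (schemas, row counts) to a catalog so downstream consumers can discover the output.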
They’re taking data they’ve historically used for analytics or business reporting and putting it to work in machine learning (ML) models and AI-powered applications. Collaboration is seamless, with straightforward publishing and subscribing workflows, fostering a more connected and efficient work environment.
Today, Amazon Redshift is used by customers across all industries for a variety of use cases, including data warehouse migration and modernization, near real-time analytics, self-service analytics, data lake analytics, machine learning (ML), and data monetization.
This fragmented, repetitive, and error-prone experience for data connectivity is a significant obstacle to data integration, analysis, and machine learning (ML) initiatives. If you want to revert a draft notebook to its last published state, choose Revert to published version to roll back to the most recently published version.
It focuses on the key aspect of the solution, which was enabling data providers to automatically publish data assets to Amazon DataZone, which served as the central data mesh for enhanced data discoverability. Data domain producers publish data assets using data source runs to Amazon DataZone in the Central Governance account.
Metadata enrichment is about scaling the onboarding of new data into a governed data landscape by taking data and applying the appropriate business terms, data classes and quality assessments so it can be discovered, governed and utilized effectively.
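As a rough illustration of that enrichment step, the sketch below tags a column with a business term, a data class, and a completeness score. The glossary entries, pattern rules, and column names are hypothetical stand-ins for what a governance tool would apply automatically.

```python
# Hypothetical metadata-enrichment sketch: glossary and rules are invented.
import re

GLOSSARY = {"cust_email": "Customer Email", "txn_amt": "Transaction Amount"}

def classify(sample: str) -> str:
    """Assign a data class from a simple pattern rule."""
    if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", sample):
        return "Email Address"
    if re.fullmatch(r"-?\d+(\.\d+)?", sample):
        return "Numeric"
    return "Text"

def enrich(column: str, samples: list[str]) -> dict:
    """Attach a business term, data class, and completeness score to a column."""
    non_empty = [s for s in samples if s]
    return {
        "column": column,
        "business_term": GLOSSARY.get(column, "Unassigned"),
        "data_class": classify(non_empty[0]) if non_empty else "Unknown",
        "completeness": len(non_empty) / len(samples) if samples else 0.0,
    }

meta = enrich("cust_email", ["a@b.com", "c@d.org", ""])
print(meta)
```

Once columns carry terms, classes, and quality scores like these, they can be indexed by a catalog and discovered through business-term search rather than raw column names.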
Instead of a central data platform team with a data warehouse or data lake serving as the clearinghouse of all data across the company, a data mesh architecture encourages distributed ownership of data by data producers who publish and curate their data as products, which can then be discovered, requested, and used by data consumers.
This data needs to be ingested into a data lake, transformed, and made available for analytics, machine learning (ML), and visualization. To share the datasets, they needed a way to share access to the data and access to catalog metadata in the form of tables and views.
Companies such as Adobe , Expedia , LinkedIn , Tencent , and Netflix have published blogs about their Apache Iceberg adoption for processing their large scale analytics datasets. In CDP we enable Iceberg tables side-by-side with the Hive table types, both of which are part of our SDX metadata and security framework.
What’s covered in this post is already implemented and available in the Guidance for Connecting Data Products with Amazon DataZone solution, published in the AWS Solutions Library. It offers AWS Glue connections and AWS Glue crawlers as a means to capture the data asset’s metadata easily from their source database and keep it up to date.
Aptly named, metadata management is the process in which BI and Analytics teams manage metadata, which is the data that describes other data. In other words, data is the content and metadata is the context. Without metadata, BI teams are unable to understand the data’s full story. It is published by Robert S.
Use business terms to search, share, and access cataloged data, so that all configured users can learn more about the data they want to use through the business glossary. Automate data discovery and cataloging with machine learning (ML).
Fusion Data Intelligence — which can be viewed as an updated avatar of Fusion Analytics Warehouse — combines enterprise data and ready-to-use analytics with prebuilt AI and machine learning models to deliver business intelligence.
The new feature, which Claire Cheng, vice president of machine learning and AI engineering at Salesforce, said was in the works last month , has been launched as the Prompt Builder and has been made generally available.
Hydro is powered by Amazon MSK and other tools with which teams can move, transform, and publish data at low latency using event-driven architectures. In the future, we plan to profile workloads based on metadata, cross-check them with capacity metrics, and place them in the appropriate MSK cluster.
Solution overview OneData defines three personas: Publisher – This role includes the organizational and management team of systems that serve as data sources. Provide and keep up to date the technical metadata for loaded data. Use the latest data published by the publisher to update data as needed.
This launch brings together widely adopted AWS machine learning (ML) and analytics capabilities and provides an integrated experience for analytics and AI with unified access to data and built-in governance. These metadata tables are stored in S3 Tables, the new S3 storage offering optimized for tabular data. With AWS Glue 5.0,
These services include the ability to auto-discover and classify data, to detect sensitive information, to analyze data quality, to link business terms to technical metadata and to publish data to the knowledge catalog.
Each file arrives as a pair with a tail metadata file in CSV format containing the size and name of the file. This metadata file is later used to read source file names during processing into the staging layer. The Redshift publish zone is a different set of tables in the same Redshift provisioned cluster.
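Assuming the tail metadata file is a one-row CSV (the column names below are a guess for illustration, not taken from the original solution), reading it during staging might look like:

```python
# Sketch: parse a "tail" metadata CSV that describes its paired data file.
# The column names file_name and file_size are assumed for this example.
import csv
import io

def read_tail_metadata(metadata_csv: str) -> tuple[str, int]:
    """Return (source file name, size in bytes) from the tail metadata file."""
    row = next(csv.DictReader(io.StringIO(metadata_csv)))
    return row["file_name"], int(row["file_size"])

tail = "file_name,file_size\nsales_2024_01.dat,104857600\n"
name, size = read_tail_metadata(tail)
print(name, size)
```

The recovered file name can then drive which source file the staging job reads, and the size can be checked against the landed object to detect truncated transfers.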
analyst Sumit Pal, in “Exploring Lakehouse Architecture and Use Cases,” published January 11, 2022: “Data lakehouses integrate and unify the capabilities of data warehouses and data lakes, aiming to support AI, BI, ML, and data engineering on a single platform.” This is the promise of the modern data lakehouse architecture.
Organizations have multiple Hive data warehouses across EMR clusters, where the metadata gets generated. Centralized catalog for published data – Multiple producers release data currently governed by their respective entities. For consumer access, a centralized catalog is necessary where producers can publish their data assets.
We have been working hard to build our cloud-native data services on Cloudera Data Platform (CDP), which include CDP Data Warehouse, CDP Operational Database, CDP Machine Learning, CDP Data Engineering and CDP Data Flow. Gartner and Magic Quadrant are registered trademarks of Gartner, Inc. and/or its affiliates in the U.S.
The need for Amazon MWAA disaster recovery Amazon MWAA, a fully managed service for Apache Airflow , brings immense value to organizations by automating workflow orchestration for extract, transform, and load (ETL), DevOps, and machinelearning (ML) workloads. This makes it difficult to implement a comprehensive DR strategy.
After some impressive advances over the past decade, largely thanks to the techniques of machine learning (ML) and deep learning, the technology seems to have taken a sudden leap forward. [1] Users can access data through a single point of entry, with a shared metadata layer across clouds and on-premises environments.
In the 2020 O’Reilly Data Quality survey, only 20% of respondents say their organizations publish information about data provenance or data lineage internally. What’s more, SDX provides access to the lineage, metadata, and metrics associated with data utilization across environments. From Bad to Worse.
This article was originally published at Algorithmia’s website. The Amazon Product Reviews Dataset provides over 142 million Amazon product reviews with their associated metadata, allowing machine learning practitioners to train sentiment models using product ratings as a proxy for the sentiment label. It provides 1.6
But now is the time to understand what “information” should be digitized and which digitized assets should be enriched and tagged with metadata so that they are searchable and can be connected to other relevant records. That means developers can use any technology stack and publish content consistently across various digital channels.
A data fabric utilizes continuous analytics over existing, discoverable and inferenced metadata to support the design, deployment and utilization of integrated and reusable datasets across all environments, including hybrid and multicloud platforms.” [1]. This improves data engineering productivity and time-to-value for data consumers.
In this post, we discuss how the Amazon Finance Automation team used AWS Lake Formation and the AWS Glue Data Catalog to build a data mesh architecture that simplified data governance at scale and provided seamless data access for analytics, AI, and machine learning (ML) use cases.