Metadata and Publishing - Data Leaders Brief

AWS Glue for Handling Metadata

Analytics Vidhya

AUGUST 19, 2022

This article was published as a part of the Data Science Blogathon. The post AWS Glue for Handling Metadata appeared first on Analytics Vidhya. Introduction AWS Glue helps Data Engineers to prepare data for other data consumers through the Extract, Transform & Load (ETL) Process. It provides organizations with […].

Metadata

Metadata Data Science Big Data Publishing

Implement a custom subscription workflow for unmanaged Amazon S3 assets published with Amazon DataZone

AWS Big Data

DECEMBER 19, 2024

We want to publish this data to Amazon DataZone as discoverable S3 data. To implement the solution, we complete the following steps: As a data producer, publish an unstructured S3-based data asset as S3ObjectCollectionType to Amazon DataZone.

Publishing

Publishing Unstructured Data Metadata Data-driven

Underlying Engineering Behind Alexa’s Contextual ASR

Analytics Vidhya

SEPTEMBER 17, 2022

This article was published as a part of the Data Science Blogathon. Any type of contextual information, like device context, conversational context, and metadata, […].

Metadata

Metadata Statistics Data Science Publishing

Neptune.ai - A Metadata Store for MLOps

Analytics Vidhya

JANUARY 27, 2022

This article was published as a part of the Data Science Blogathon. A centralized location for research and production teams to govern models and experiments by storing metadata throughout the ML model lifecycle. Keeping track of […]. The post Neptune.ai - A Metadata Store for MLOps appeared first on Analytics Vidhya.

Metadata

Metadata Machine Learning Data Science Publishing

Knowledge Graphs are Critical to Data Intelligence and AI

David Menninger's Analyst Perspectives

MAY 22, 2025

These catalogs combine technical and business metadata and data governance capabilities with knowledge graph functionality to deliver a holistic, business-level view of data production and consumption. Key industries include media, publishing, life sciences and pharmaceuticals.

Metadata

Metadata Enterprise Data-driven Publishing

The state of data quality in 2020

O'Reilly on Data

FEBRUARY 11, 2020

Just 20% of organizations publish data provenance and data lineage. These include the basics, such as metadata creation and management, data provenance, data lineage, and other essentials. They’re still struggling with the basics: tagging and labeling data, creating (and managing) metadata, managing unstructured data, etc.

Data Quality

Data Quality Metadata Data Governance Publishing

Metadata is Like Packaging: Seeing Beyond the Library Card Metaphor

Ontotext

MARCH 19, 2021

The way we package information has a lot to do with metadata. The somewhat conventional metaphor for metadata is the library card. In this metaphor, books are the data and library cards are the metadata that help us find what we need, what we want to know more about, or even what we didn’t know we were looking for.

Metadata

Metadata Publishing Enterprise Management

RDF-Star: Metadata Complexity Simplified

Ontotext

JUNE 10, 2021

Not Every Graph is a Knowledge Graph: Schemas and Semantic Metadata Matter. To be able to automate these operations and maintain sufficient data quality, enterprises have started implementing so-called data fabrics that employ diverse metadata sourced from different systems. Metadata about Relationships Come in Handy.

Metadata

Metadata Cost-Benefit OLAP Modeling

Metadata Management Best Practices: How to Plan Your Metadata Management Program

Octopai

NOVEMBER 10, 2021

Metadata has been defined as the who, what, where, when, why, and how of data. Without the context given by metadata, data is just a bunch of numbers and letters. But going on a rampage to define, categorize, and otherwise metadata-ize your data doesn’t necessarily give you the key to the value in your data. Hold on tight!

Metadata

Metadata Management Interactive Strategy

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

To achieve this, EUROGATE designed an architecture that uses Amazon DataZone to publish specific digital twin data sets, enabling access to them with SageMaker in a separate AWS account. From here, the metadata is published to Amazon DataZone by using AWS Glue Data Catalog. This process is shown in the following figure.

IoT

IoT Machine Learning Metadata Data-driven

Streamline data discovery with precise technical identifier search in Amazon SageMaker Unified Studio

AWS Big Data

APRIL 9, 2025

Whether you’re a data analyst seeking a specific metric or a data steward validating metadata compliance, this update delivers a more precise, governed, and intuitive search experience. Refer to the product documentation to learn more about how to set up metadata rules for subscription and publishing workflows.

Metadata

Metadata Metrics Data-driven Cost-Benefit

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

AWS Big Data

DECEMBER 4, 2024

For instance, Domain A will have the flexibility to create data products that can be published to the divisional catalog, while also maintaining the autonomy to develop data products that are exclusively accessible to teams within the domain. A data portal for consumers to discover data products and access associated metadata.

Metadata

Metadata Data Governance Data Quality Data-driven

The New O’Reilly Answers: The R in “RAG” Stands for “Royalties”

O'Reilly on Data

JUNE 14, 2024

Will content creators and publishers on the open web ever be directly credited and fairly compensated for their works’ contributions to AI platforms? At the same time, Miso went about an in-depth chunking and metadata-mapping of every book in the O’Reilly catalog to generate enriched vector snippet embeddings of each work.

Metadata

Metadata Publishing Data-driven Modeling

Enhance data governance with enforced metadata rules in Amazon DataZone

AWS Big Data

NOVEMBER 20, 2024

We’re excited to announce a new feature in Amazon DataZone that offers enhanced metadata governance for your subscription approval process. With this update, domain owners can define and enforce metadata requirements for data consumers when they request access to data assets. Key benefits The feature benefits multiple stakeholders.

Metadata

Metadata Data Governance Metrics Marketing

It’s 2025. Are your data strategies strong enough to de-risk AI adoption?

CIO Business Intelligence

DECEMBER 11, 2024

This need to improve data governance is therefore at the forefront of many AI strategies, as highlighted by the findings of The State of Data Intelligence report published in October 2024 by Quest, which found the top drivers of data governance were improving data quality (42%), security (40%), and analytics (40%).

Risk

Risk Data Strategy Strategy Data Governance

How companies are building sustainable AI and ML initiatives

O'Reilly on Data

JANUARY 29, 2019

In 2017, we published “How Companies Are Putting AI to Work Through Deep Learning,” a report based on a survey we ran aiming to help leaders better understand how organizations are applying AI through deep learning. Data scientists and data engineers are in demand.

Deep Learning

Deep Learning Machine Learning Data Science Metadata

Automate AWS Clean Rooms querying and dashboard publishing using AWS Step Functions and Amazon QuickSight – Part 2

AWS Big Data

FEBRUARY 12, 2024

We automate running queries using Step Functions with Amazon EventBridge schedules, build an AWS Glue Data Catalog on query outputs, and publish dashboards using QuickSight so they automatically refresh with new data. QuickSight is used to query, build visualizations, and publish dashboards using the data from the query results.

Publishing

Publishing Dashboards Metadata Visualization

Illuminating the black box: why CIOs should consider publishing an annual IT report

CIO Business Intelligence

NOVEMBER 15, 2023

One vehicle might be an annual report, one similar to those that have been published for years by public companies: 10-Ks, 10-Qs, and all those other filings by which stakeholders judge a company’s performance, posture, and potential. And don’t just rattle off project metadata. Such a report has a legacy already, if only a short one.

Publishing

Publishing Reporting IT Finance

Reduce your compute costs for stream processing applications with Kinesis Client Library 3.0

AWS Big Data

NOVEMBER 6, 2024

Kinesis Data Streams not only offers the flexibility to use many out-of-box integrations to process the data published to the streams, but also provides the capability to build custom stream processing applications that can be deployed on your compute fleet. KCL uses DynamoDB to store metadata such as shard-worker mapping and checkpoints.

Cost-Benefit

Cost-Benefit Metadata Optimization Publishing

The Power of Graph Databases, Linked Data, and Graph Algorithms

Rocket-Powered Data Science

MARCH 10, 2020

In their wisdom, the editors of the book decided that I wrote “too much.” So, they correctly shortened my contribution by about half in the final published version of my Foreword for the book. I publish this in its original form in order to capture the essence of my point of view on the power of graph analytics.

Metadata

Metadata Machine Learning Prescriptive Analytics ROI

Demystify data sharing and collaboration patterns on AWS: Choosing the right tool for the job

AWS Big Data

OCTOBER 21, 2024

Business analysts enhance the data with business metadata and glossaries and publish it as data assets or data products. Users can search for assets in the Amazon DataZone catalog, view the metadata assigned to them, and access the assets. Amazon Athena is used to query and explore the data.

Sales

Sales Data-driven Data Processing Key Performance Indicator

How BMW streamlined data access using AWS Lake Formation fine-grained access control

AWS Big Data

OCTOBER 29, 2024

The CDH is used to create, discover, and consume data products through a central metadata catalog, while enforcing permission policies and tightly integrating data engineering, analytics, and machine learning services to streamline the user journey from data to insight.

Data Lake

Data Lake Sales Metadata Machine Learning

Metadata, the Neglected Stepchild of IT

Data Virtualization

DECEMBER 8, 2022

While cleaning up our archive recently, I found an old article published in 1976 about data dictionary/directory systems (DD/DS). Nowadays, we no longer use the term DD/DS, but “data catalog” or simply “metadata system”. It was written by L.

Metadata

Metadata IT Publishing Data Integration

Data Warehouses: Basic Concepts for data enthusiasts

Analytics Vidhya

SEPTEMBER 13, 2022

This article was published as a part of the Data Science Blogathon. Introduction The purpose of a data warehouse is to combine multiple sources to generate different insights that help companies make better decisions and forecasts. It consists of historical and cumulative data from single or multiple sources.

Data Warehouse

Data Warehouse Forecasting Data Science Big Data

Integrate custom applications with AWS Lake Formation – Part 2

AWS Big Data

NOVEMBER 19, 2024

Solution overview AWS AppSync creates serverless GraphQL and pub/sub APIs that simplify application development through a single endpoint to securely query, update, or publish data. Unfiltered Table Metadata This tab displays the response of the AWS Glue API GetUnfilteredTableMetadata policies for the selected table.

Data Processing

Data Processing Metadata Publishing Testing

Use open table format libraries on AWS Glue 5.0 for Apache Spark

AWS Big Data

DECEMBER 4, 2024

An Iceberg table’s metadata stores a history of snapshots, which are updated with each transaction. Over time, this creates multiple data files and metadata files as changes accumulate. Additionally, they can impact query performance due to the overhead of handling large amounts of metadata.

Snapshot

Snapshot Metadata Data Lake Optimization

Cross-account data collaboration with Amazon DataZone and AWS analytical tools

AWS Big Data

MARCH 5, 2025

This post describes the process of using the business data catalog resource of Amazon DataZone to publish data assets so they’re discoverable by other accounts. Data publishers: Users in producer AWS accounts. Create the necessary publish project for AWS Glue and Amazon Redshift in the producer account.

Analytics

Analytics Publishing Metadata Sales

Ingest, transform, and deliver events published by Amazon Security Lake to Amazon OpenSearch Service

AWS Big Data

JUNE 19, 2023

In a series of follow-up posts, we will review the source code and walk through published examples of the Lambda ingestion framework in the AWS Samples GitHub repo. The framework can be modified for use in containers to help companies that have longer processing times for large files published in Security Lake.

Publishing

Publishing Dashboards Visualization Management

Do I Need a Data Catalog?

erwin

JUNE 26, 2020

Organizations with particularly deep data stores might need a data catalog with advanced capabilities, such as automated metadata harvesting to speed up the data preparation process. Three Types of Metadata in a Data Catalog. The metadata provides information about the asset that makes it easier to locate, understand and evaluate.

Metadata

Metadata Cost-Benefit Measurement Data-driven

Data’s dark secret: Why poor quality cripples AI and growth

CIO Business Intelligence

APRIL 8, 2025

As data-centric AI, automated metadata management and privacy-aware data sharing mature, the opportunity to embed data quality into the enterprise’s core has never been more significant. Data fabric: Metadata-rich integration layer across distributed systems. Implementation complexity, relies on robust metadata management.

Data Quality

Data Quality Data-driven Key Performance Indicator Metadata

2019 Gartner Magic Quadrant for Metadata Management Solutions

erwin

OCTOBER 18, 2019

erwin positioned as a Leader in Gartner’s “2019 Magic Quadrant for Metadata Management Solutions”. We were excited to announce earlier today that erwin was named as a Leader in the @Gartner_inc “2019 Magic Quadrant for Metadata Management Solutions.” This graphic was published by Gartner, Inc.

Metadata

Metadata Management Reporting Publishing

What an Old Dictionary teaches us about Metadata

Jim Harris

MAY 5, 2017

Spelling, pronunciation, and examples of usage are included in the dictionary definition of a word, which is a good example of one of the many uses of metadata, namely to provide a definition, description, and context for data. In practice, I haven’t encountered a metadata dictionary that could deliver on that promise.

Metadata

Metadata Publishing Management IT

Recap of Amazon Redshift key product announcements in 2024

AWS Big Data

DECEMBER 17, 2024

We have enhanced data sharing performance with improved metadata handling, resulting in data sharing first query execution that is up to four times faster when the data sharing producer’s data is being updated. You can also create new data lake tables using Redshift Managed Storage (RMS) as a native storage option.

Data Lake

Data Lake Data Warehouse Data-driven Optimization

Addressing Data Mesh Technical Challenges with DataOps

DataKitchen

AUGUST 9, 2021

The domain requires a team that creates/updates/runs the domain, and we can’t forget metadata: catalogs, lineage, test results, processing history, etc., …. It’s convenient to publish a set of URLs that provide access to domain-related data and services. Figure 5: Domain interfaces as URLs. How does one get access to a domain?

Testing

Testing Data Lake Metadata Publishing

How Volkswagen streamlined access to data across multiple data lakes using Amazon DataZone – Part 1

AWS Big Data

JULY 18, 2024

It focuses on the key aspect of the solution, which was enabling data providers to automatically publish data assets to Amazon DataZone, which served as the central data mesh for enhanced data discoverability. Data domain producers publish data assets using datasource run to Amazon DataZone in the Central Governance account.

Data Lake

Data Lake Publishing Metadata Data-driven

Amazon DataZone introduces OpenLineage-compatible data lineage visualization in preview

AWS Big Data

JULY 8, 2024

It also offers a reference implementation of an object model to persist metadata along with integration to major data and analytics tools. Lineage form types – Form types, or facets, provide additional metadata or context about lineage entities or events, enabling richer and more descriptive lineage information.

Visualization

Visualization Metadata Publishing Sales

Data Intelligence and Its Role in Combating Covid-19

erwin

MARCH 30, 2020

Unraveling Data Complexities with Metadata Management. Metadata management will be critical to the process for cataloging data via automated scans. Essentially, metadata management is the administration of data that describes other data, with an emphasis on associations and lineage. Data lineage to support impact analysis.

Metadata

Metadata IT Data Governance Data Quality

Specialized tools for machine learning development and model governance are becoming essential

O'Reilly on Data

APRIL 2, 2019

A few years ago, we started publishing articles (see “Related resources” at the end of this post) on the challenges facing data teams as they start taking on more machine learning (ML) projects. Metadata and artifacts needed for audits: as an example, the output from the components of MLflow will be very pertinent for audits.

Machine Learning

Machine Learning Modeling Data Science Software

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

AWS Big Data

OCTOBER 30, 2024

The retail team, acting as the data producer, publishes the necessary data assets to Amazon DataZone, allowing you, as a consumer, to discover and subscribe to these assets. Publish data assets – As the data producer from the retail team, you must ingest individual data assets into Amazon DataZone.

Visualization

Visualization Data Lake Testing Data Governance

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

AWS Big Data

DECEMBER 16, 2024

With the ability to browse metadata, you can understand the structure and schema of the data source, identify relevant tables and fields, and discover useful data assets you may not be aware of. This approach simplifies your data journey and helps you meet your security requirements. Now, let’s start running queries on your notebook.

Visualization

Visualization Data Processing Testing Publishing

Metadata enrichment – highly scalable data classification and data discovery

IBM Big Data Hub

JULY 28, 2022

Metadata enrichment is about scaling the onboarding of new data into a governed data landscape by taking data and applying the appropriate business terms, data classes and quality assessments so it can be discovered, governed and utilized effectively. With public API you can now manage metadata enrichment from external tools and workflows.

Metadata

Metadata Machine Learning Data Quality Statistics

Get started with the new Amazon DataZone enhancements for Amazon Redshift

AWS Big Data

JULY 29, 2024

On March 21, 2024, Amazon DataZone introduced several exciting enhancements to its Amazon Redshift integration that simplify the process of publishing and subscribing to data warehouse assets like tables and views, while enabling Amazon Redshift customers to take advantage of the data management and governance capabilities of Amazon DataZone.

Data Warehouse

Data Warehouse Sales Metadata Publishing

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

AWS Big Data

DECEMBER 4, 2024

The data is also registered in the Glue Data Catalog, a metadata repository. The database will be used to store the metadata related to the data integrations performed by zero-ETL. The status and statistics of the CDC load are published into CloudWatch.

Data Integration

Data Integration Data Lake Statistics Data-driven

How ATPCO enables governed self-service data access to accelerate innovation with Amazon DataZone

AWS Big Data

JULY 25, 2024

Instead of a central data platform team with a data warehouse or data lake serving as the clearinghouse of all data across the company, a data mesh architecture encourages distributed ownership of data by data producers who publish and curate their data as products, which can then be discovered, requested, and used by data consumers.

Data Lake

Data Lake Metadata Sales Publishing
