Blog and Metadata - Data Leaders Brief

Octopai Acquisition Enhances Metadata Management to Trust Data Across Entire Data Estate

Cloudera

NOVEMBER 13, 2024

It leverages knowledge graphs to keep track of all the data sources and data flows, using AI to fill the gaps so you have the most comprehensive metadata management solution. Together, Cloudera and Octopai will help reinvent how customers manage their metadata and track lineage across all their data sources.

Metadata

Metadata Management Data Governance Data-driven

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

AWS Big Data

OCTOBER 14, 2024

Writing SQL queries requires not just remembering the SQL syntax rules, but also knowledge of the table metadata, which is data about table schemas, relationships among the tables, and possible column values. Although LLMs can generate syntactically correct SQL queries, they still need the table metadata to write accurate SQL queries.
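As a rough illustration of the idea in this teaser, the sketch below shows one way table metadata (schema, relationships, possible column values) could be rendered into the text of a prompt before it is sent to an LLM. It is not taken from the linked post: the `ColumnInfo` and `TableInfo` classes, the `build_prompt` function, and the prompt wording are all hypothetical, and no model or Athena call is made.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnInfo:
    # Hypothetical container for per-column metadata.
    name: str
    data_type: str
    description: str = ""
    sample_values: list[str] = field(default_factory=list)


@dataclass
class TableInfo:
    # Hypothetical container for per-table metadata.
    name: str
    columns: list[ColumnInfo]
    relationships: list[str] = field(default_factory=list)


def render_table_metadata(table: TableInfo) -> str:
    """Render one table's schema, value hints, and relationships as plain text."""
    lines = [f"Table: {table.name}"]
    for col in table.columns:
        line = f"  - {col.name} ({col.data_type})"
        if col.description:
            line += f": {col.description}"
        if col.sample_values:
            line += f" [possible values: {', '.join(col.sample_values)}]"
        lines.append(line)
    for rel in table.relationships:
        lines.append(f"  Relationship: {rel}")
    return "\n".join(lines)


def build_prompt(question: str, tables: list[TableInfo]) -> str:
    """Combine the rendered metadata and the user's question into one prompt string."""
    metadata = "\n\n".join(render_table_metadata(t) for t in tables)
    return (
        "Write one SQL query that answers the question, "
        "using only the tables below.\n\n"
        f"{metadata}\n\n"
        f"Question: {question}\nSQL:"
    )


if __name__ == "__main__":
    orders = TableInfo(
        name="orders",
        columns=[
            ColumnInfo("order_id", "bigint", "Primary key"),
            ColumnInfo("status", "string", sample_values=["OPEN", "SHIPPED"]),
        ],
        relationships=["orders.customer_id references customers.customer_id"],
    )
    print(build_prompt("How many orders are still open?", [orders]))
```

Listing possible column values next to the schema is what lets a model pick a literal such as `'OPEN'` instead of guessing at one.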

Metadata

Metadata Data Lake Modeling Data Warehouse

Data Governance and Metadata Management: You Can’t Have One Without the Other

erwin

FEBRUARY 13, 2020

When an organization’s data governance and metadata management programs work in harmony, then everything is easier. Creating and sustaining an enterprise-wide view of and easy access to underlying metadata is also a tall order. Metadata Management Takes Time. Finding metadata, “the data about the data,” isn’t easy.

Metadata

Metadata Data Governance Management Cost-Benefit

Best Practices for Metadata Management

Alation

JULY 19, 2021

What Is Metadata? Metadata is information about data. A clothing catalog or dictionary are both examples of metadata repositories. Indeed, a popular online catalog, like Amazon, offers rich metadata around products to guide shoppers: ratings, reviews, and product details are all examples of metadata.

Metadata

Metadata Management Data Governance Machine Learning

7 Benefits of Metadata Management

erwin

FEBRUARY 19, 2021

Metadata management is key to wringing all the value possible from data assets. What Is Metadata? Analyst firm Gartner defines metadata as “information that describes various facets of an information asset to improve its usability throughout its life cycle. It is metadata that turns information into an asset.”

Metadata

Metadata Management Data Quality Cost-Benefit

Metadata is Like Packaging: Seeing Beyond the Library Card Metaphor

Ontotext

MARCH 19, 2021

The way we package information has a lot to do with metadata. The somewhat conventional metaphor about metadata is the one of the library card. This metaphor has it that books are the data and library cards are the metadata helping us find what we need, want to know more about or even what we don’t know we were looking for.

Metadata

Metadata Publishing Enterprise Management

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

AWS Big Data

JULY 29, 2024

In this blog post, we’ll discuss how the metadata layer of Apache Iceberg can be used to make data lakes more efficient. You will learn about an open-source solution that can collect important metrics from the Iceberg metadata layer.

Metadata

Metadata Snapshot Data Lake Metrics

Expand data access through Apache Iceberg using Delta Lake UniForm on AWS

AWS Big Data

NOVEMBER 14, 2024

Under the hood, UniForm generates Iceberg metadata files (including metadata and manifest files) that are required for Iceberg clients to access the underlying data files in Delta Lake tables. Both Delta Lake and Iceberg metadata files reference the same data files. The table is registered in AWS Glue Data Catalog.

Metadata

Metadata Data Warehouse Big Data Data Lake

RDF-Star: Metadata Complexity Simplified

Ontotext

JUNE 10, 2021

Not Every Graph is a Knowledge Graph: Schemas and Semantic Metadata Matter. To be able to automate these operations and maintain sufficient data quality, enterprises have started implementing so-called data fabrics, which employ diverse metadata sourced from different systems. Metadata about Relationships Come in Handy.

Metadata

Metadata Cost-Benefit OLAP Modeling

Announcing Open Source DataOps Data Quality TestGen 3.0

DataKitchen

FEBRUARY 20, 2025

Better Metadata Management: Add Descriptions and Data Product tags to tables and columns in the Data Catalog for improved governance. Smarter Profiling & Test Generation: Improved logic reduces false positives, making test results more accurate and actionable. DataOps just got more intelligent.

Data Quality

Data Quality Scorecard Testing Dashboards

Accelerate your migration to Amazon OpenSearch Service with Reindexing-from-Snapshot

AWS Big Data

NOVEMBER 22, 2024

How RFS works: OpenSearch and Elasticsearch snapshots are a directory tree that contains both data and metadata. Metadata files exist in the snapshot to provide details about the snapshot as a whole, the source cluster’s global metadata and settings, each index in the snapshot, and each shard in the snapshot.

Snapshot

Snapshot Metadata Recreation/Entertainment Data Processing

Are You Content with Your Organization’s Content Strategy?

Rocket-Powered Data Science

JULY 6, 2021

If you include the title of this blog, you were just presented with 13 examples of heteronyms in the preceding paragraphs. This is accomplished through tags, annotations, and metadata (TAM). Smart content includes labeled (tagged, annotated) metadata (TAM). What you have just experienced is a plethora of heteronyms.

Strategy

Strategy Machine Learning Metadata Knowledge Discovery

Cloudera and Snowflake Partner to Deliver the Most Comprehensive Open Data Lakehouse

Cloudera

OCTOBER 23, 2024

In August, we wrote about how in a future where distributed data architectures are inevitable, unifying and managing operational and business metadata is critical to successfully maximizing the value of data, analytics, and AI. It is a critical feature for delivering unified access to data in distributed, multi-engine architectures.

Metadata

Metadata Data Lake Dashboards Interactive

The Data Turf Wars are Over, But the Metadata Turf Wars Have Just Begun

Cloudera

AUGUST 6, 2024

And for that future to be a reality, data teams must shift their attention to metadata, the new turf war for data. The need for unified metadata: While open and distributed architectures offer many benefits, they come with their own set of challenges. Data teams actually need to unify the metadata. Open data is the future.

Metadata

Metadata Cost-Benefit Management Enterprise

Enhance data governance with enforced metadata rules in Amazon DataZone

AWS Big Data

NOVEMBER 20, 2024

We’re excited to announce a new feature in Amazon DataZone that offers enhanced metadata governance for your subscription approval process. With this update, domain owners can define and enforce metadata requirements for data consumers when they request access to data assets. Key benefits: The feature benefits multiple stakeholders.

Metadata

Metadata Data Governance Metrics Marketing

Use open table format libraries on AWS Glue 5.0 for Apache Spark

AWS Big Data

DECEMBER 4, 2024

An Iceberg table’s metadata stores a history of snapshots, which are updated with each transaction. Over time, this creates multiple data files and metadata files as changes accumulate. Additionally, they can impact query performance due to the overhead of handling large amounts of metadata.
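To make the accumulation described here concrete, the following toy model tracks a snapshot history that grows by one entry per commit and is then trimmed by an expiration step. It is only a sketch under stated assumptions: the `Snapshot` class and the `commit` and `expire_older_than` functions are hypothetical and are not the Iceberg API. Iceberg ships its own snapshot expiration maintenance, which should be used on real tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical, simplified stand-in for an entry in a table's snapshot log.
    snapshot_id: int
    committed_at: int  # logical timestamp, e.g. seconds since some epoch


def commit(history: list[Snapshot], committed_at: int) -> list[Snapshot]:
    """Return a new history with one more snapshot, as each transaction would add."""
    next_id = history[-1].snapshot_id + 1 if history else 1
    return history + [Snapshot(next_id, committed_at)]


def expire_older_than(
    history: list[Snapshot], cutoff: int, retain_last: int = 1
) -> list[Snapshot]:
    """Drop snapshots committed before `cutoff`, always keeping the newest `retain_last`."""
    protected = history[-retain_last:] if retain_last > 0 else []
    return [s for s in history if s.committed_at >= cutoff or s in protected]


if __name__ == "__main__":
    history: list[Snapshot] = []
    for ts in (10, 20, 30, 40):
        history = commit(history, ts)
    print(len(history), "snapshots before expiration")
    trimmed = expire_older_than(history, cutoff=35, retain_last=2)
    print([s.snapshot_id for s in trimmed], "remain after expiration")
```

Keeping a minimum number of recent snapshots regardless of age is a design choice in this sketch so that the table never loses its current state.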

Snapshot

Snapshot Metadata Data Lake Optimization

Recap of Amazon Redshift key product announcements in 2024

AWS Big Data

DECEMBER 17, 2024

We have enhanced data sharing performance with improved metadata handling, resulting in data sharing first query execution that is up to four times faster when the data sharing producer’s data is being updated. Launch summary: Following is the launch summary, which provides the announcement links and reference blogs for the key announcements.

Data Lake

Data Lake Data Warehouse Data-driven Optimization

Metadata Management and Data Governance with Cloudera SDX

Cloudera

JANUARY 26, 2024

This will allow a data office to implement access policies over metadata management assets like tags or classifications, business glossaries, and data catalog entities, laying the foundation for comprehensive data access control. First, a set of initial metadata objects are created by the data steward.

Metadata

Metadata Data Governance Management Finance

How BMW streamlined data access using AWS Lake Formation fine-grained access control

AWS Big Data

OCTOBER 29, 2024

The CDH is used to create, discover, and consume data products through a central metadata catalog, while enforcing permission policies and tightly integrating data engineering, analytics, and machine learning services to streamline the user journey from data to insight.

Data Lake

Data Lake Sales Metadata Machine Learning

The Struggle Between Data Dark Ages and LLM Accuracy

Cloudera

DECEMBER 6, 2024

And specifically, I was reading one of your blog posts recently that talked about the dark ages of data. It could be metadata that you weren’t capturing before. The post The Struggle Between Data Dark Ages and LLM Accuracy appeared first on Cloudera Blog. Here are some key takeaways from Ray in that conversation.

Manufacturing

Manufacturing Forecasting Metadata Data Processing

Metadata is the Magic Behind Data Fabric

TDAN

MAY 31, 2022

The main goal of creating an enterprise data fabric is not new. It is the ability to deliver the right data at the right time, in the right shape, and to the right data consumer, irrespective of how and where it is stored. Data fabric is the common “net” that stitches integrated data from multiple data […].

Metadata

Metadata Enterprise IT Data Architecture

Integrate custom applications with AWS Lake Formation – Part 2

AWS Big Data

NOVEMBER 19, 2024

Run the following commands:
    export PROJ_NAME=lfappblog
    aws s3 cp s3://aws-blogs-artifacts-public/BDB-3934/schema.graphql ~/${PROJ_NAME}/amplify/backend/api/${PROJ_NAME}/schema.graphql
In the schema.graphql file, you can see that the lf-app-lambda-engine function is set as the data source for the GraphQL queries.

Data Processing

Data Processing Metadata Publishing Testing

Do I Need a Data Catalog?

erwin

JUNE 26, 2020

Organizations with particularly deep data stores might need a data catalog with advanced capabilities, such as automated metadata harvesting to speed up the data preparation process. Three Types of Metadata in a Data Catalog. The metadata provides information about the asset that makes it easier to locate, understand and evaluate.

Metadata

Metadata Cost-Benefit Measurement Data-driven

Doing Cloud Migration and Data Governance Right the First Time

erwin

OCTOBER 8, 2020

With all these diverse metadata sources, it is difficult to understand the complicated web they form, much less get a simple visual flow of data lineage and impact analysis. The metadata-driven suite automatically finds, models, ingests, catalogs and governs cloud data assets. Subscribe to the erwin Expert Blog.

Data Governance

Data Governance Metadata Testing Data Lake

Addressing Data Mesh Technical Challenges with DataOps

DataKitchen

AUGUST 9, 2021

The domain requires a team that creates/updates/runs the domain, and we can’t forget metadata: catalogs, lineage, test results, processing history, etc., …. At this point in our blog series, we have talked about data mesh from an organizational and technical perspective.

Testing

Testing Data Lake Metadata Publishing

What is a Data Mesh?

DataKitchen

AUGUST 3, 2021

Discoverable – users have access to a catalog or metadata management tool which renders the domain discoverable and accessible. We’ll cover some of the potential challenges facing data mesh enterprise architectures in our next blog. Also, the domain must support the attributes that are part of every modern data architecture.

Data Architecture

Data Architecture Data Lake Cost-Benefit Data Warehouse

Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

MARCH 2, 2023

Apache Iceberg is an open table format for very large analytic datasets, which captures metadata information on the state of datasets as they evolve and change over time. Apache Iceberg addresses customer needs by capturing rich metadata information about the dataset at the time the individual data files are created.

Data Lake

Data Lake Data Processing Metadata Snapshot

Business Strategies for Deploying Disruptive Tech: Generative AI and ChatGPT

Rocket-Powered Data Science

FEBRUARY 15, 2023

These rules are not necessarily “Rocket Science” (despite the name of this blog site), but they are common business sense for most business-disruptive technology implementations in enterprises. The latter is essential for Generative AI implementations.

Strategy

Strategy Experimentation Uncertainty Machine Learning

Dark Data: How to Find It and What to Do with It

Timo Elliott

JANUARY 6, 2022

In a previous blog, I explored the value of dark data and how it can reveal insights that can streamline processes, improve customer experiences, generate more revenue – and maybe even help make the world a better place. Analyze your metadata. The data you’ve collected and saved over the years isn’t free. Use people.

IT

IT Metadata Data-driven Data Governance

5 Ways Data Modeling Is Critical to Data Governance

erwin

JANUARY 9, 2020

That’s because it’s the best way to visualize metadata, and metadata is now the heart of enterprise data management and data governance/intelligence efforts. erwin DM 2020 is an essential source of metadata and a critical enabler of data governance and intelligence efforts. erwin Data Modeler: Where the Magic Happens.

Data Governance

Data Governance Modeling Metadata Unstructured Data

Demystify data sharing and collaboration patterns on AWS: Choosing the right tool for the job

AWS Big Data

OCTOBER 21, 2024

Business analysts enhance the data with business metadata/glossaries and publish the same as data assets or data products. Users can search for assets in the Amazon DataZone catalog, view the metadata assigned to them, and access the assets. Amazon Athena is used to query, and explore the data.

Sales

Sales Data-driven Data Processing Key Performance Indicator

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

This means the data files in the data lake aren’t modified during the migration and all Apache Iceberg metadata files (manifests, manifest files, and table metadata files) are generated outside the purview of the data. In this method, the metadata are recreated in an isolated environment and colocated with the existing data files.

Data Lake

Data Lake Metadata Snapshot Recreation/Entertainment

Why data observability is essential to AI governance

erwin

DECEMBER 9, 2024

Metadata is the basis of trust for data forensics as we answer the questions of fact or fiction when it comes to the data we see. Being that AI is comprised of more data than code, it is now more essential than ever to combine data with metadata in near real-time.

Metadata

Metadata Data Quality Sales Modeling

Keeping Small Queries Fast – Short query optimizations in Apache Impala

Cloudera

NOVEMBER 13, 2020

This is part of our series of blog posts on recent enhancements to Impala. Metadata Caching. As Impala’s adoption grew, the catalog service started to experience these growing pains, so we recently introduced two new features to alleviate the stress: On-demand Metadata and Zero Touch Metadata. More on this below.

Optimization

Optimization Metadata Statistics Cost-Benefit

Implement a custom subscription workflow for unmanaged Amazon S3 assets published with Amazon DataZone

AWS Big Data

DECEMBER 19, 2024

After you create the asset, you can add glossaries or metadata forms, but it’s not necessary for this post. Create it as a JSON file on your workstation (for this post, we call it blog-sub-target.json). Enter a name for the asset. For Asset type, choose S3 object collection. For S3 location ARN, enter the ARN of the S3 prefix.

Publishing

Publishing Unstructured Data Metadata Data-driven

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

AWS Big Data

DECEMBER 4, 2024

This blog post will explore how zero-ETL capabilities combined with its new application connectors are transforming the way businesses integrate and analyze their data from popular platforms such as ServiceNow, Salesforce, Zendesk, SAP and others. The data is also registered in the Glue Data Catalog, a metadata repository.

Data Integration

Data Integration Data Lake Statistics Data-driven

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Cloudera

OCTOBER 7, 2021

In this blog, I will demonstrate the value of Cloudera DataFlow (CDF), the edge-to-cloud streaming data platform available on the Cloudera Data Platform (CDP), as a Data integration and Democratization fabric. Data and Metadata: Data inputs and data outputs produced based on the application logic. Introduction.

Metadata

Metadata Cost-Benefit Enterprise Interactive

How Far We Can Go with GenAI as an Information Extraction Tool

Ontotext

JANUARY 10, 2025

This blog post summarizes our findings, focusing on NER as a first-step key task for knowledge extraction. You can use the Ontotext Metadata Studio (OMDS) to integrate any NER model and apply it to your documents to extract the entities you are interested in.

Informatics

Informatics Modeling Metadata Experimentation

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

In this blog post, we dive into different data aspects and how Cloudinary breaks the two concerns of vendor lock-in and cost-efficient data analytics by using Apache Iceberg, Amazon Simple Storage Service (Amazon S3), Amazon Athena, Amazon EMR, and AWS Glue. Each change to a table produces a new metadata file to provide atomicity.

Data Lake

Data Lake Metadata Snapshot Analytics

A Data Prediction for 2025

DataKitchen

FEBRUARY 2, 2023

Most data governance tools today start with the slow, waterfall building of metadata with data stewards and then hope to use that metadata to drive code that runs in production. In reality, the ‘active metadata’ is just a written specification for a data developer to write their code.

Metadata

Metadata Testing Data Science Risk

Apache Ozone Powers Data Science in CDP Private Cloud

Cloudera

AUGUST 26, 2021

In this blog post, we will ingest a real world dataset into Ozone, create a Hive table on top of it and analyze the data to study the correlation between new vaccinations and new cases per country using a Spark ML Jupyter notebook in CML. Learn more about the impacts of global data sharing in this blog, The Ethics of Data Exchange.

Data Science

Data Science Forecasting Metadata Machine Learning

Introducing Apache Iceberg in Cloudera Data Platform

Cloudera

FEBRUARY 22, 2022

Companies such as Adobe, Expedia, LinkedIn, Tencent, and Netflix have published blogs about their Apache Iceberg adoption for processing their large scale analytics datasets. In CDP we enable Iceberg tables side-by-side with the Hive table types, both of which are part of our SDX metadata and security framework.

Snapshot

Snapshot Metadata Cost-Benefit Data Architecture

Why You Need End-to-End Data Lineage

erwin

SEPTEMBER 10, 2020

In a previous blog, I explained that data lineage is basically the history of data, including a data set’s origin, characteristics, quality and movement over time. Data lineage helps answer questions about the origin of data in key performance indicator (KPI) reports, including: How are the report tables and columns defined in the metadata?

Data Governance

Data Governance Key Performance Indicator Metadata Digital Transformation

What’s the Current State of Data Governance and Automation?

erwin

JANUARY 30, 2020

However, more than 50 percent say they have deployed metadata management, data analytics, and data quality solutions. erwin Named a Leader in Gartner 2019 Metadata Management Magic Quadrant. And close to 50 percent have deployed data catalogs and business glossaries. Most have only data governance operations.

Data Governance

Data Governance Metadata Cost-Benefit Digital Transformation
