Blog, Metadata and Technology - Data Leaders Brief

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

AWS Big Data

OCTOBER 14, 2024

Writing SQL queries requires not just remembering SQL syntax rules but also knowing the tables' metadata, which is data about table schemas, relationships among the tables, and possible column values. Although LLMs can generate syntactically correct SQL queries, they still need this table metadata to write accurate ones.

Metadata

Metadata Data Lake Modeling Data Warehouse
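
The snippet's point is that a model needs schemas, relationships, and representative column values, not only SQL syntax. As a minimal sketch, assuming a hypothetical in-memory representation (`Column`, `Table`, and `render_schema_context` are illustrative names, not part of Athena, AWS Glue, or any AWS API), that metadata could be flattened into plain text for a text-to-SQL prompt like this:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Column:
    name: str
    data_type: str
    description: str = ""
    # Representative values let the model match literals (e.g. status codes).
    sample_values: List[str] = field(default_factory=list)


@dataclass
class Table:
    name: str
    columns: List[Column]
    # Join hints as (local_column, "other_table.other_column") pairs.
    foreign_keys: List[Tuple[str, str]] = field(default_factory=list)


def render_schema_context(tables: List[Table]) -> str:
    """Flatten table metadata into plain text to prepend to a text-to-SQL prompt."""
    lines: List[str] = []
    for table in tables:
        lines.append(f"Table {table.name}:")
        for col in table.columns:
            line = f"  - {col.name} ({col.data_type})"
            if col.description:
                line += f": {col.description}"
            if col.sample_values:
                line += f" [values: {', '.join(col.sample_values)}]"
            lines.append(line)
        for local_col, target in table.foreign_keys:
            lines.append(f"  join: {table.name}.{local_col} = {target}")
    return "\n".join(lines)


if __name__ == "__main__":
    orders = Table(
        name="orders",
        columns=[
            Column("order_id", "bigint"),
            Column("customer_id", "bigint"),
            Column("status", "string", "order state", ["PENDING", "SHIPPED"]),
        ],
        foreign_keys=[("customer_id", "customers.customer_id")],
    )
    print(render_schema_context([orders]))
```

The rendered text would sit ahead of the user's question in the prompt; how the original post actually structures its prompt is not shown in this excerpt.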

Expand data access through Apache Iceberg using Delta Lake UniForm on AWS

AWS Big Data

NOVEMBER 14, 2024

Under the hood, UniForm generates Iceberg metadata files (including metadata and manifest files) that are required for Iceberg clients to access the underlying data files in Delta Lake tables. Both Delta Lake and Iceberg metadata files reference the same data files. The table is registered in AWS Glue Data Catalog.

Metadata

Metadata Data Warehouse Big Data Data Lake

Trending Sources

Accelerate your migration to Amazon OpenSearch Service with Reindexing-from-Snapshot

AWS Big Data

NOVEMBER 22, 2024

How RFS works: OpenSearch and Elasticsearch snapshots are directory trees that contain both data and metadata. Metadata files in a snapshot describe the snapshot as a whole, the source cluster’s global metadata and settings, each index in the snapshot, and each shard in the snapshot.

Snapshot

Snapshot Metadata Recreation/Entertainment Data Processing

How Volkswagen Autoeuropa built a data solution with a robust governance framework, simplifying access to quality data using Amazon DataZone

AWS Big Data

NOVEMBER 13, 2024

In addition, the team aligned on business metadata attributes that would help with data discovery. Business metadata helps users understand the context of the data, which can lead to increased trust in the data. This provides consistency of business metadata across the organization.

Metadata

Metadata Data Quality Digital Transformation Data-driven

How BMW streamlined data access using AWS Lake Formation fine-grained access control

AWS Big Data

OCTOBER 29, 2024

The CDH is used to create, discover, and consume data products through a central metadata catalog, while enforcing permission policies and tightly integrating data engineering, analytics, and machine learning services to streamline the user journey from data to insight. Durga Mishra is a Principal solutions architect at AWS.

Data Lake

Data Lake Sales Metadata Machine Learning

Recap of Amazon Redshift key product announcements in 2024

AWS Big Data

DECEMBER 17, 2024

We have enhanced data sharing performance with improved metadata handling, resulting in first-query execution on shared data that is up to four times faster when the data sharing producer's data is being updated. Launch summary: the following summary provides the announcement links and reference blogs for the key announcements.

Data Lake

Data Lake Data Warehouse Data-driven Optimization

Implement a custom subscription workflow for unmanaged Amazon S3 assets published with Amazon DataZone

AWS Big Data

DECEMBER 19, 2024

After you create the asset, you can add glossaries or metadata forms, but it's not necessary for this post. Create it as a JSON file on your workstation (for this post, we call it blog-sub-target.json). Enter a name for the asset. For Asset type, choose S3 object collection. For S3 location ARN, enter the ARN of the S3 prefix.

Publishing

Publishing Unstructured Data Metadata Data-driven

Key Takeaways from AWS re:Invent 2024

Cloudera

DECEMBER 19, 2024

It is one of the biggest technology conferences of the year and an opportunity to have hundreds of conversations with customers and prospects, listen to their priorities, challenges, and hopes, and give them a Cloudera tote bag or a pair of orange sunglasses. The post Key Takeaways from AWS re:Invent 2024 appeared first on Cloudera Blog.

Metadata

Metadata Data Processing Machine Learning Cost-Benefit

AI-Powered Feature Engineering with n8n: Scaling Data Science Intelligence

KDnuggets

AUGUST 8, 2025

Here's where n8n really shines: you can connect different technologies smoothly, combining data processing, AI analysis, and professional reporting without jumping between tools or managing complex infrastructure. He bridges the gap between emerging AI technologies and practical implementation for working professionals.

Data Science

Data Science Statistics Machine Learning Advertising

Build an analytics pipeline that is resilient to Avro schema changes using Amazon Athena

AWS Big Data

JULY 25, 2025

As technology progresses, the Internet of Things (IoT) expands to encompass more and more things. The schema literal serves as a form of metadata, providing a clear description of your data structure. Additionally, it reduces the number of API calls to the metadata store, potentially lowering costs associated with these operations.

IoT

IoT Analytics Metadata Measurement
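
An Avro schema literal is a JSON document describing the record structure, and resilience to schema changes largely comes from Avro's resolution rules: fields the reader does not know are ignored, and new reader fields fall back to their declared defaults. The sketch below is a simplified, flat-record model of those two rules only (no type promotion, aliases, or nested types); the `SensorReading` schema and `resolve_record` helper are hypothetical illustrations, not the post's pipeline or an Avro library API.

```python
import json

# Hypothetical reader schema literal. "firmware" was added later,
# so it declares a default for records written before it existed.
READER_SCHEMA_LITERAL = """
{
  "type": "record",
  "name": "SensorReading",
  "fields": [
    {"name": "device_id", "type": "string"},
    {"name": "temperature", "type": "double"},
    {"name": "firmware", "type": "string", "default": "unknown"}
  ]
}
"""


def resolve_record(record: dict, reader_schema: dict) -> dict:
    """Project a decoded record onto the reader schema (simplified).

    Fields absent from the reader schema are dropped; reader fields missing
    from the record take their declared default, or raise if none exists.
    """
    resolved = {}
    for spec in reader_schema["fields"]:
        name = spec["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in spec:
            resolved[name] = spec["default"]
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    return resolved


if __name__ == "__main__":
    schema = json.loads(READER_SCHEMA_LITERAL)
    # Written with an older schema: no "firmware", plus a retired "humidity" field.
    old_record = {"device_id": "sensor-1", "temperature": 21.5, "humidity": 40}
    print(resolve_record(old_record, schema))
    # {'device_id': 'sensor-1', 'temperature': 21.5, 'firmware': 'unknown'}
```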

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

AWS Big Data

DECEMBER 4, 2024

This balance between unification and maintaining advanced capabilities is key to supporting our customers’ ongoing innovation and adaptability in a rapidly changing technological landscape. Collaboration is seamless, with straightforward publishing and subscribing workflows, fostering a more connected and efficient work environment.

Data Analytics

Data Analytics Analytics Data Lake Data Quality

My Take on the 2024 Gartner® Critical Capabilities for Data Integration Tools Report

Data Virtualization

AUGUST 5, 2025

The post My Take on the 2024 Gartner® Critical Capabilities for Data Integration Tools Report appeared first on Data Management Blog - Data Integration and Modern Data Management Articles, Analysis and Information.

Data Integration

Data Integration Reporting Metadata Data Architecture

Use open table format libraries on AWS Glue 5.0 for Apache Spark

AWS Big Data

DECEMBER 4, 2024

An Iceberg table’s metadata stores a history of snapshots, which grows with each transaction. Over time, this creates many data files and metadata files as changes accumulate. These accumulated files can also degrade query performance because of the overhead of handling large amounts of metadata.

Snapshot

Snapshot Metadata Data Lake Optimization
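
Because every transaction adds a snapshot, the history is usually trimmed by expiring old snapshots. Iceberg exposes this as the `expire_snapshots` Spark procedure with `older_than` and `retain_last` arguments; the sketch below only loosely mirrors that selection idea under a simplifying assumption of a linear history (no branches or tags), and `Snapshot` and `snapshots_to_expire` are hypothetical names rather than Iceberg's implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int  # commit time in epoch milliseconds


def snapshots_to_expire(
    history: List[Snapshot], older_than_ms: int, retain_last: int = 1
) -> List[int]:
    """Return IDs of snapshots eligible for expiration, oldest first.

    A snapshot is eligible when it was committed before older_than_ms and is
    not one of the retain_last most recent snapshots, so the current
    snapshot is always kept.
    """
    if retain_last < 1:
        raise ValueError("retain_last must be at least 1")
    ordered = sorted(history, key=lambda s: s.timestamp_ms)
    protected = {s.snapshot_id for s in ordered[-retain_last:]}
    return [
        s.snapshot_id
        for s in ordered
        if s.timestamp_ms < older_than_ms and s.snapshot_id not in protected
    ]


if __name__ == "__main__":
    # One snapshot per transaction, committed one second apart.
    history = [Snapshot(i, i * 1000) for i in range(1, 6)]
    print(snapshots_to_expire(history, older_than_ms=4500, retain_last=2))
    # [1, 2, 3]
```

Expiring snapshots is what allows data and metadata files referenced only by those snapshots to be cleaned up; this sketch stops at choosing which snapshots qualify.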

Data Quality Testing: A Shared Resource for Modern Data Teams

DataKitchen

JUNE 6, 2025

Data Governance Teams: Data Governance professionals employ quality testing as a means to enhance data catalogs with high-quality metadata. They establish quality metrics, set thresholds, and collaborate with upstream systems to identify and address the root causes of data issues.

Data Quality

Data Quality Testing Dashboards Metrics

Why data observability is essential to AI governance

erwin

DECEMBER 9, 2024

Metadata is the basis of trust for data forensics as we answer questions of fact or fiction about the data we see. Because AI consists of more data than code, it is now more essential than ever to combine data with metadata in near real time.

Metadata

Metadata Data Quality Sales Modeling

Generative AI: A Self-Study Roadmap

KDnuggets

JULY 11, 2025

Preprocessing steps like cleaning up formatting, extracting metadata, and creating document summaries improve retrieval accuracy. One example project is a marketing content generator that produces blog posts, social media content, and email campaigns based on product information and target audience.

Machine Learning

Machine Learning Testing Data Science Cost-Benefit

Accelerate queries on Apache Iceberg tables through AWS Glue auto compaction

AWS Big Data

DECEMBER 19, 2024

By using features like Iceberg's compaction, open table formats (OTFs) streamline maintenance, making it straightforward to manage object and metadata versioning at scale. Enabling automatic compaction on Iceberg tables reduces metadata overhead and improves query performance. The Data Catalog manages the metadata for the datasets.

Data Lake

Data Lake IoT Metadata Testing
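
Compaction reduces metadata overhead by rewriting many small data files into fewer, larger ones. A common way to plan that rewrite is bin packing; the sketch below uses first-fit decreasing over files below a size threshold. This is an illustration of the general technique under assumed parameter names (`target_bytes`, `small_file_bytes`), not the algorithm AWS Glue auto compaction actually runs.

```python
from typing import List


def plan_compaction(
    file_sizes: List[int], target_bytes: int, small_file_bytes: int
) -> List[List[int]]:
    """Group small data files into rewrite batches of at most target_bytes.

    First-fit decreasing bin packing over files smaller than small_file_bytes.
    Single-file batches are dropped because rewriting one file alone does not
    reduce the file count.
    """
    candidates = sorted((s for s in file_sizes if s < small_file_bytes), reverse=True)
    batches: List[List[int]] = []
    totals: List[int] = []
    for size in candidates:
        for i, total in enumerate(totals):
            if total + size <= target_bytes:
                batches[i].append(size)
                totals[i] += size
                break
        else:
            # No existing batch has room: start a new one.
            batches.append([size])
            totals.append(size)
    return [batch for batch in batches if len(batch) > 1]


if __name__ == "__main__":
    MB = 1024 * 1024
    sizes = [s * MB for s in (10, 20, 30, 40, 100, 600)]
    for batch in plan_compaction(sizes, target_bytes=64 * MB, small_file_bytes=100 * MB):
        print([s // MB for s in batch])
    # [40, 20]
    # [30, 10]
```

Here four small files become two rewritten files, while the 100 MB and 600 MB files are left alone because they are not below the small-file threshold.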

Data Insights Assure Quality Data and Confident Decisions!

Smarten

NOVEMBER 26, 2024

Today, organizations look to data and to technology to help them understand historical results, and predict the future needs of the enterprise to manage everything from suppliers and supplies to new locations, new products and services, hiring, training and investments.

Machine Learning

Machine Learning Data Quality Predictive Modeling Metadata

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

AWS Big Data

DECEMBER 4, 2024

This blog post explores how zero-ETL capabilities, combined with new application connectors, are transforming the way businesses integrate and analyze their data from popular platforms such as ServiceNow, Salesforce, Zendesk, SAP, and others. The data is also registered in the AWS Glue Data Catalog, a metadata repository.

Data Integration

Data Integration Data Lake Statistics Data-driven

How Far We Can Go with GenAI as an Information Extraction Tool

Ontotext

JANUARY 10, 2025

This blog post summarizes our findings, focusing on NER as a first-step key task for knowledge extraction. You can use the Ontotext Metadata Studio (OMDS) to integrate any NER model and apply it to your documents to extract the entities you are interested in.

Informatics

Informatics Metadata Modeling Experimentation

The R in RAG

Data Virtualization

JULY 30, 2025

The post The R in RAG appeared first on Data Management Blog - Data Integration and Modern Data Management Articles, Analysis and Information. Many know that it stands for retrieval augmented generation, but recently I’ve encountered some confusion around the “R” (retrieval) aspect of RAG. I think that much of that confusion.

Data Integration

Data Integration Management IT ROI

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

AWS Big Data

OCTOBER 30, 2024

For this use case, create a data source and import the technical metadata of six data assets (customers, order_items, orders, products, reviews, and shipments) from the AWS Glue Data Catalog. See the Amazon DataZone and Tableau blog post for step-by-step instructions.

Visualization

Visualization Data Lake Testing Data Governance

Build Write-Audit-Publish pattern with Apache Iceberg branching and AWS Glue Data Quality

AWS Big Data

DECEMBER 9, 2024

The metadata of an Iceberg table stores a history of snapshots.
aws s3 cp s3://aws-blogs-artifacts-public/artifacts/BDB-4341/data/part-00000-fa08487a-43c2-4398-bae9-9cb912f8843c-c000.snappy.parquet
aws s3 cp s3://aws-blogs-artifacts-public/artifacts/BDB-4341/data/new-part-00000-e8a06ab0-f33d-4b3b-bd0a-f04d366f067e-c000.snappy.parquet

Data Quality

Data Quality Publishing Snapshot Data Lake

Introducing erwin Data Modeler 15.0: Bridging the Gap Between Data Modeling & Data Engineering

erwin

JULY 9, 2025

Yet despite these technological advances, one challenge persists: overcoming the gap between data modeling and implementation. For the first time, data architects can export YAML files directly from erwin Data Modeler with rich metadata intact, creating a seamless handoff to data engineering teams.

Modeling

Modeling Metadata Visualization Data Architecture

Secure Data Sharing and Interoperability Powered by Iceberg REST Catalog

Cloudera

DECEMBER 3, 2024

Many enterprises have heterogeneous data platforms and technology stacks across different business units or data domains. REST Catalog value proposition: it provides open, metastore-agnostic APIs for Iceberg metadata operations, dramatically simplifying the Iceberg client and metastore/engine integration.

Metadata

Metadata Data Warehouse ROI Snapshot
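
Part of what makes the REST catalog metastore-agnostic is that clients address tables through plain HTTP paths defined by the Iceberg REST catalog specification, such as `/v1/{prefix}/namespaces/{namespace}/tables/{table}` for loading a table, with multi-level namespaces joined by the 0x1F unit separator. The helper below is a hypothetical sketch of building that path (`load_table_path` is not part of any Iceberg client library), and it omits authentication and the HTTP request itself.

```python
from typing import List, Optional
from urllib.parse import quote

# Separator placed between namespace levels in a REST catalog path.
UNIT_SEPARATOR = "\x1f"


def load_table_path(namespace: List[str], table: str, prefix: Optional[str] = None) -> str:
    """Build the path for a load-table request against a REST catalog.

    Namespace levels are joined with the 0x1F unit separator and then
    percent-encoded, so ["sales", "emea"] becomes "sales%1Femea".
    """
    if not namespace:
        raise ValueError("namespace must have at least one level")
    segments = ["v1"]
    if prefix:
        # Optional catalog-specific prefix (the spec's {prefix} path parameter).
        segments.append(quote(prefix, safe=""))
    segments += [
        "namespaces",
        quote(UNIT_SEPARATOR.join(namespace), safe=""),
        "tables",
        quote(table, safe=""),
    ]
    return "/" + "/".join(segments)


if __name__ == "__main__":
    print(load_table_path(["sales", "emea"], "orders"))
    # /v1/namespaces/sales%1Femea/tables/orders
    print(load_table_path(["sales"], "orders", prefix="warehouse"))
    # /v1/warehouse/namespaces/sales/tables/orders
```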

Summary of the Gartner Presentation: “How Can You Leverage Technologies to Solve Data Quality Challenges?”

DataKitchen

DECEMBER 17, 2024

In the Gartner presentation “How Can You Leverage Technologies to Solve Data Quality Challenges?”, Gartner's solution emphasizes adopting augmented data quality technologies that use automation, AI/ML-driven insights, and metadata-driven workflows to improve efficiency. Poor data quality, on average, costs organizations $12.9

Data Quality

Data Quality Technology Data-driven Testing

Amazon OpenSearch Service 101: Create your first search application with OpenSearch

AWS Big Data

JUNE 25, 2025

Each product record contains rich metadata, including title, detailed description, category, color, and price. For more insights, best practices and architectures, and industry trends, refer to Amazon OpenSearch Service blog posts and hands-on workshops at AWS Workshops. For an exhaustive list, refer to Search features.

Dashboards

Dashboards IoT Interactive Visualization

Don’t get left in the dark with SAP PowerDesigner: Keep the lights on with erwin

erwin

MAY 22, 2025

A looming power outage: the darkness is already creeping in, and it'll only get worse as you face the end of updates and discontinued support. As SAP PowerDesigner is phased out, ongoing development will cease, leaving you stuck with outdated technology. When issues arise, you'll have nowhere to turn for help.

Uncertainty

Uncertainty Modeling Metadata Data Integration

Near real-time baggage operational insights for airlines using Amazon Kinesis Data Streams

AWS Big Data

JULY 8, 2025

More details related to baggage operational database modernization can be found at Enhance the reliability of airlines’ mission-critical baggage handling using Amazon DynamoDB in the AWS Database Blog. As a trusted advisor, he works directly with the client executive and architects on business strategy to define a technology roadmap.

Internet of Things

Internet of Things IoT Metrics Data-driven

Advance top 2025 data initiatives with analyst firm-recognized erwin by Quest

erwin

JANUARY 23, 2025

Thankfully, technology can help. Industry analysts provide valuable insights for both software evaluators and technology providers. If you're new to the data intelligence and governance analyst community, there are many respected research firms providing insights through different lenses, tackling a variety of data intelligence use cases.

Metadata

Metadata Data Quality Data Governance Software

My Reflections on the Gartner® Hype Cycle™ for Data Management, 2024

Data Virtualization

DECEMBER 20, 2024

Gartner Hype Cycle provides a graphic representation of the maturity and adoption of technologies and applications, and how they are potentially relevant to solving real business problems and exploiting new opportunities. Gartner Hype Cycle methodology provides a view of how.

Management

Management Data Integration Technology Data Architecture

Data Management with the User Experience in Mind

Data Virtualization

JANUARY 8, 2025

The post Data Management with the User Experience in Mind appeared first on Data Management Blog - Data Integration and Modern Data Management Articles, Analysis and Information. Generative AI (GenAI), with its challenges, brought hope in a new reality of what could become of our data. There is still some hard work ahead, but now we.

Management

Management Data Integration IT Data Lake

Connect, share, and query where your data sits using Amazon SageMaker Unified Studio

AWS Big Data

MARCH 21, 2025

In this blog post, we will demonstrate how business units can use Amazon SageMaker Unified Studio to discover, subscribe to, and analyze these distributed data assets. The table metadata is managed by Data Catalog. This is a SageMaker Lakehouse managed catalog backed by RMS storage.

Data Warehouse

Data Warehouse Metadata Publishing Sales

A Field Guide to Rapidly Improving AI Products

O'Reilly on Data

APRIL 15, 2025

Even small UX decisions, like where to place metadata or which filters to expose, can make the difference between a tool people actually use and one they avoid. As I wrote in my LLM-as-a-Judge blog post, synthetic data can be remarkably effective for evaluation. Fortunately, there's a solution that works surprisingly well: synthetic data.

Experimentation

Experimentation Testing Metrics Measurement

Denodo on Deepseek R1: Opportunities & Considerations for GenAI Initiatives

Data Virtualization

FEBRUARY 25, 2025

The post Denodo on Deepseek R1: Opportunities & Considerations for GenAI Initiatives appeared first on Data Management Blog - Data Integration and Modern Data Management Articles, Analysis and Information. Denodo applauds the release of Deepseek R1 and the ingenuity.

Data Integration

Data Integration Marketing Management Metadata

Data Governance and Metadata Management: You Can’t Have One Without the Other

erwin

FEBRUARY 13, 2020

When an organization’s data governance and metadata management programs work in harmony, then everything is easier. Creating and sustaining an enterprise-wide view of and easy access to underlying metadata is also a tall order. Metadata Management Takes Time. Finding metadata, “the data about the data,” isn’t easy.

Metadata

Metadata Data Governance Management Cost-Benefit

7 Benefits of Metadata Management

erwin

FEBRUARY 19, 2021

Metadata management is key to wringing all the value possible from data assets. What Is Metadata? Analyst firm Gartner defines metadata as “information that describes various facets of an information asset to improve its usability throughout its life cycle. It is metadata that turns information into an asset.”

Metadata

Metadata Management Data Quality Cost-Benefit

Metadata is Like Packaging: Seeing Beyond the Library Card Metaphor

Ontotext

MARCH 19, 2021

way we package information has a lot to do with metadata. The somewhat conventional metaphor about metadata is the one of the library card. This metaphor has it that books are the data and library cards are the metadata helping us find what we need, want to know more about or even what we don’t know we were looking for.

Metadata

Metadata Publishing Enterprise Management

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

AWS Big Data

JULY 29, 2024

In this blog post, we’ll discuss how the metadata layer of Apache Iceberg can be used to make data lakes more efficient. You will learn about an open-source solution that can collect important metrics from the Iceberg metadata layer.

Metadata

Metadata Snapshot Data Lake Metrics
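
Metrics from the metadata layer are typically simple aggregates over data-file entries, such as file counts, average file size, and the share of small files. Iceberg exposes per-file information through metadata tables (for example, the `files` table includes `file_size_in_bytes` and `record_count`). The sketch below assumes those values have already been read into hypothetical `DataFileEntry` objects; it is not the open-source solution from the post, and the step of publishing the results to CloudWatch is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DataFileEntry:
    file_size_bytes: int
    record_count: int


def table_health_metrics(files: List[DataFileEntry], small_file_bytes: int) -> Dict[str, float]:
    """Summarize data-file entries into metrics worth tracking over time."""
    if not files:
        return {"file_count": 0, "total_records": 0, "avg_file_size_bytes": 0.0, "small_file_ratio": 0.0}
    sizes = [f.file_size_bytes for f in files]
    small = sum(1 for s in sizes if s < small_file_bytes)
    return {
        "file_count": len(files),
        "total_records": sum(f.record_count for f in files),
        "avg_file_size_bytes": sum(sizes) / len(sizes),
        # A rising share of small files is a common signal that compaction is due.
        "small_file_ratio": small / len(files),
    }


if __name__ == "__main__":
    entries = [DataFileEntry(100, 10), DataFileEntry(300, 30), DataFileEntry(1000, 100)]
    print(table_health_metrics(entries, small_file_bytes=500))
```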

Very Meta … Unlocking Data’s Potential with Metadata Management Solutions

erwin

OCTOBER 24, 2019

While there has been a lot of talk about big data over the years, the real hero in unlocking the value of enterprise data is metadata, or the data about the data. And to truly understand it, you need to be able to create and sustain an enterprise-wide view of and easy access to underlying metadata. This isn’t an easy task.

Metadata

Metadata Management Data-driven Data Architecture

Metadata Management, Data Governance and Automation

erwin

NOVEMBER 6, 2019

According to IDC’s “Data Intelligence in Context” Technology Spotlight sponsored by erwin, “professionals who work with data spend 80 percent of their time looking for and preparing data and only 20 percent of their time on analytics.” IDC Technology Spotlight, Data Intelligence in Context: Get the report (… it’s free).

Metadata

Metadata Data Governance Management Cost-Benefit

Enhance data governance with enforced metadata rules in Amazon DataZone

AWS Big Data

NOVEMBER 20, 2024

We’re excited to announce a new feature in Amazon DataZone that offers enhanced metadata governance for your subscription approval process. With this update, domain owners can define and enforce metadata requirements for data consumers when they request access to data assets. Key benefits: the feature benefits multiple stakeholders.

Metadata

Metadata Data Governance Metrics Marketing

Business Strategies for Deploying Disruptive Tech: Generative AI and ChatGPT

Rocket-Powered Data Science

FEBRUARY 15, 2023

Third, any commitment to a disruptive technology (including data-intensive and AI implementations) must start with a business strategy. Another perspective on technology-induced business disruption (including ChatGPT deployments) is to consider the three F’s that affect (and can potentially derail) such projects.

Strategy

Strategy Experimentation Uncertainty Machine Learning

Top 10 Data Lineage Podcasts, Blogs, and Magazines

Octopai

JANUARY 31, 2021

Our list of Top 10 Data Lineage Podcasts, Blogs, and Websites To Follow in 2021. The particular episode we recommend looks at how WeWork struggled with understanding their data lineage so they created a metadata repository to increase visibility. Data Engineering Podcast. Agile Data. A-Team Insight. Malcolm Chisholm.

Data Governance

Data Governance Data Processing Data Quality Metadata
