Writing SQL queries requires not just remembering SQL syntax rules, but also knowledge of the table metadata: data about table schemas, relationships among the tables, and possible column values. Although LLMs can generate syntactically correct SQL queries, they still need the table metadata to write accurate ones.
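One common way to supply that metadata is to render the table schemas directly into the model's prompt. A minimal sketch, assuming hypothetical table names and a hand-rolled prompt format (no real text-to-SQL API is used here):

```python
def build_sql_prompt(question: str, schemas: dict[str, list[str]]) -> str:
    """Render table schemas into a prompt for a text-to-SQL model."""
    lines = ["You are a SQL assistant. Available tables:"]
    for table, columns in schemas.items():
        # One line per table: name followed by its column list.
        lines.append(f"- {table}({', '.join(columns)})")
    lines.append(f"Question: {question}")
    lines.append("Answer with a single SQL query.")
    return "\n".join(lines)

# Illustrative schemas; a production system would pull these from a catalog.
schemas = {
    "orders": ["order_id", "customer_id", "total", "created_at"],
    "customers": ["customer_id", "name", "region"],
}
prompt = build_sql_prompt("Total sales per region last month?", schemas)
```

The resulting prompt gives the model enough schema context to choose the right tables, columns, and join keys.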
However, commits can still fail if the latest metadata is updated after the base metadata version is established. Iceberg uses a layered architecture to manage table state and data: the catalog layer maintains a pointer to the current table metadata file, serving as the single source of truth for table state.
Metadata and artifacts needed for audits. Machine learning often interacts with and impacts users, so companies not only need to put processes in place that let them deploy ML responsibly, they need to build foundational technologies that allow them to retain oversight, particularly when things go wrong.
In fact, by putting a single label like AI on all the steps of a data-driven business process, we have effectively blurred not only the process, but also the particular characteristics that make each step separately distinct, uniquely critical, and ultimately dependent on specialized, specific technologies.
Speakers: SafeGraph, Facteus, AWS Data Exchange, SimilarWeb, and AtScale
Leveraging metadata (labels, annotations) for deep dimensional analysis. Leveraging technologies including Amazon SageMaker & Redshift, Microsoft Power BI & Excel, and AtScale. Extending analysis-ready data to all of your business stakeholders at scale.
Any type of contextual information, like device context, conversational context, and metadata, […]. However, we can improve the system’s accuracy by leveraging contextual information. The post Underlying Engineering Behind Alexa’s Contextual ASR appeared first on Analytics Vidhya.
Central to this is metadata management, a critical component for driving future success. AI and ML need large amounts of accurate data for companies to get the most out of the technology. Let’s dive into what that looks like, what workarounds some IT teams use today, and why metadata management is the key to success.
The way we package information has a lot to do with metadata. The somewhat conventional metaphor about metadata is that of the library card. This metaphor has it that books are the data and library cards are the metadata, helping us find what we need, what we want to know more about, or even what we didn’t know we were looking for.
These include the basics, such as metadata creation and management, data provenance, data lineage, and other essentials. They’re still struggling with the basics: tagging and labeling data, creating (and managing) metadata, managing unstructured data, etc. On the good side, there is technological progress.
Whether you’re a data analyst seeking a specific metric or a data steward validating metadata compliance, this update delivers a more precise, governed, and intuitive search experience. Refer to the product documentation to learn more about how to set up metadata rules for subscription and publishing workflows.
Iceberg offers distinct advantages through its metadata layer over Parquet, such as improved data management, performance optimization, and integration with various query engines. Iceberg’s table format separates data files from metadata files, enabling efficient data modifications without full dataset rewrites.
Under the hood, UniForm generates Iceberg metadata files (including metadata and manifest files) that are required for Iceberg clients to access the underlying data files in Delta Lake tables. Both Delta Lake and Iceberg metadata files reference the same data files. The table is registered in AWS Glue Data Catalog.
In this blog post, we’ll discuss how the metadata layer of Apache Iceberg can be used to make data lakes more efficient. You will learn about an open-source solution that can collect important metrics from the Iceberg metadata layer. This ensures that each change is tracked and reversible, enhancing data governance and auditability.
Central to a transactional data lake are open table formats (OTFs) such as Apache Hudi , Apache Iceberg , and Delta Lake , which act as a metadata layer over columnar formats. XTable isn’t a new table format but provides abstractions and tools to translate the metadata associated with existing formats.
How RFS works OpenSearch and Elasticsearch snapshots are a directory tree that contains both data and metadata. Metadata files exist in the snapshot to provide details about the snapshot as a whole, the source cluster’s global metadata and settings, each index in the snapshot, and each shard in the snapshot.
According to Richard Kulkarni, Country Manager for Quest, a lack of clarity concerning governance and policy around AI means that employees and teams are finding workarounds to access the technology. Some senior technology leaders fear a Pandora’s Box-type situation, with AI becoming impossible to control once unleashed.
With the addition of these technologies alongside existing systems like terminal operating systems (TOS) and SAP, the number of data producers has grown substantially. From here, the metadata is published to Amazon DataZone by using AWS Glue Data Catalog. This process is shown in the following figure.
We’re excited to announce a new feature in Amazon DataZone that offers enhanced metadata governance for your subscription approval process. With this update, domain owners can define and enforce metadata requirements for data consumers when they request access to data assets. Key benefits The feature benefits multiple stakeholders.
Nodes and domains serve business needs and are not technology mandated. The Institutional Data & AI platform adopts a federated approach to data while centralizing the metadata to facilitate simpler discovery and sharing of data products. A data portal for consumers to discover data products and access associated metadata.
The transformative power of AI is already evident in the way it drives significant operational efficiencies, particularly when combined with technologies like robotic process automation (RPA). The platform also offers a deeply integrated set of security and governance technologies, ensuring comprehensive data management and reducing risk.
Doing this gives them access to their raw analytics data, which can then be integrated into their analytics infrastructure irrespective of the technology stack they use. This post discusses restricting listing of data catalog metadata as per the granted permissions. Securing customer data is a top priority for Eightfold.
The current generation of AI and ML methods and technologies rely on large amounts of data—specifically, labeled training data. Companies are building or evaluating solutions in foundational technologies needed to sustain success in analytics and AI.
As technology and business leaders, your strategic initiatives, from AI-powered decision-making to predictive insights and personalized experiences, are all fueled by data. It’s a strategic imperative that demands the focus of both technology and business leaders. Data fabric: a metadata-rich integration layer across distributed systems.
For example, you can use metadata about the Kinesis data stream name to index by data stream ( ${getMetadata("kinesis_stream_name")} ), or you can use document fields to index data depending on the CloudWatch log group or other document data ( ${path/to/field/in/document} ).
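The substitution pattern behind these expressions can be illustrated with a toy resolver. This is a simplified sketch of how ${getMetadata("key")} and ${path/to/field} placeholders could be expanded, not the actual OpenSearch Ingestion implementation:

```python
import re

def resolve_index_pattern(pattern: str, metadata: dict, document: dict) -> str:
    """Expand ${getMetadata("key")} and ${path/to/field} placeholders."""
    def repl(match: re.Match) -> str:
        expr = match.group(1)
        meta = re.fullmatch(r'getMetadata\("([^"]+)"\)', expr)
        if meta:
            # Metadata lookup, e.g. the source Kinesis stream name.
            return str(metadata[meta.group(1)])
        # Otherwise treat the expression as a /-separated document path.
        value = document
        for part in expr.strip("/").split("/"):
            value = value[part]
        return str(value)
    return re.sub(r"\$\{([^}]+)\}", repl, pattern)

index = resolve_index_pattern(
    'logs-${getMetadata("kinesis_stream_name")}',
    metadata={"kinesis_stream_name": "app-stream"},
    document={},
)
```

Given a record from the stream "app-stream", the pattern resolves to an index name of logs-app-stream, routing each record to an index derived from its own metadata.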
The CDH is used to create, discover, and consume data products through a central metadata catalog, while enforcing permission policies and tightly integrating data engineering, analytics, and machine learning services to streamline the user journey from data to insight. Durga Mishra is a Principal solutions architect at AWS.
Pricing and availability Amazon MWAA pricing dimensions remain unchanged, and you only pay for what you use: the environment class and the metadata database storage consumed. Metadata database storage pricing remains the same. He is passionate about serverless technologies, security, and compliance.
And for that future to be a reality, data teams must shift their attention to metadata, the new turf war for data. The need for unified metadata While open and distributed architectures offer many benefits, they come with their own set of challenges. Data teams actually need to unify the metadata. Open data is the future.
Generative AI may be a groundbreaking new technology, but it’s also unleashed a torrent of complications that undermine its trustworthiness, many of which are the basis of lawsuits. How O’Reilly Answers Came to Be O’Reilly is a technology-focused learning platform that supports the continuous learning of tech teams.
The core of their problem is applying AI technology to the data they already have, whether in the cloud, on their premises, or more likely both. For AI to be effective, the relevant data must be easily discoverable and accessible, which requires powerful metadata management and data exploration tools.
We are also seeing more training programs aimed at executives and decision makers , who need to understand how these new ML technologies can impact their current operations and products. Data engineers and data scientists are beginning to use new cloud technologies, like serverless, for some of their tasks.
An Iceberg table’s metadata stores a history of snapshots, which are updated with each transaction. Over time, this creates multiple data files and metadata files as changes accumulate. Additionally, they can impact query performance due to the overhead of handling large amounts of metadata.
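In Iceberg, this accumulation is typically managed by expiring old snapshots (for example, via the expire_snapshots Spark procedure). The pruning logic can be sketched over a simplified snapshot list; this is an illustrative model, not the real Iceberg API:

```python
from dataclasses import dataclass

@dataclass
class Snapshot:
    snapshot_id: int
    timestamp_ms: int  # commit time

def expire_snapshots(snapshots, older_than_ms, retain_last=1):
    """Drop snapshots older than the cutoff, but always retain at least
    `retain_last` of the most recent ones (a simplified version of
    Iceberg's expire_snapshots semantics)."""
    ordered = sorted(snapshots, key=lambda s: s.timestamp_ms)
    keep = [s for s in ordered if s.timestamp_ms >= older_than_ms]
    # Ensure the most recent `retain_last` snapshots survive regardless.
    for s in ordered[-retain_last:]:
        if s not in keep:
            keep.append(s)
    return sorted(keep, key=lambda s: s.timestamp_ms)

history = [Snapshot(1, 100), Snapshot(2, 200), Snapshot(3, 300)]
kept = expire_snapshots(history, older_than_ms=250)
```

Expiring snapshots this way trims the metadata history (and lets the unreferenced data files be cleaned up), which keeps the metadata overhead on queries in check.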
The problem wasn’t with privacy technology, but with the intention: to use purchase data to target advertising circulars. While neither of these is a complete solution, I can imagine a future version of these proposals that standardizes metadata so data routing protocols can determine which flows are appropriate and which aren't.
Before the turn of the century, reliance on data technology was practically nonexistent. Hadoop technology is helping disrupt online marketing in various ways. Some of the benefits are detailed below: Optimizing metadata for greater reach and branding benefits. One of the most overlooked factors is metadata.
Third, any commitment to a disruptive technology (including data-intensive and AI implementations) must start with a business strategy. Another perspective on technology-induced business disruption (including ChatGPT deployments) is to consider the three F’s that affect (and can potentially derail) such projects.
Look for the Metadata. This metadata (read: data about your data) is key to tracking your data. In other words, kind of like Hansel and Gretel in the forest, your data leaves a trail of breadcrumbs – the metadata – to record where it came from and who it really is. It can work on any system or any technology.
We have enhanced data sharing performance with improved metadata handling, resulting in data sharing first query execution that is up to four times faster when the data sharing producer’s data is being updated. We enhanced support for querying Apache Iceberg data and improved the performance of querying Iceberg up to threefold year-over-year.
Solution overview Data and metadata discovery is one of the primary requirements in data analytics, where data consumers explore what data is available and in what format, and then consume or query it for analysis. But in the case of unstructured data, metadata discovery is challenging because the raw data isn’t easily readable.
More than half of respondent organizations identify as “mature” adopters of AI technologies: that is, they’re using AI for analysis or in production. The sample is far from tech-laden, however: the only other explicit technology category—“Computers, Electronics, & Hardware”—accounts for less than 7% of the sample.
A golden dataset of questions paired with a gold standard response can help you quickly benchmark new models as the technology improves. “In the generative AI world, the notion of accuracy is much more nebulous.” Missing trends Cleaning old and new data in the same way can lead to other problems.
Along with the Glue Data Catalog’s automated compaction feature, these storage optimizations can help you reduce metadata overhead, control storage costs, and improve query performance. The Glue Data Catalog monitors tables daily, removes snapshots from table metadata, and removes the data files and orphan files that are no longer needed.
Metadata analysis makes it possible to build data catalogs, which in turn allow humans to discover data that’s relevant to their projects. We now need to build the tools for software+data: tools to track data provenance and lineage, tools to build catalogs from metadata, tools to do fundamental operations like ETL.
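The catalog-from-metadata idea can be shown in a few lines. A toy sketch, assuming hypothetical table names; real catalogs also track provenance, lineage, and access policies:

```python
def build_catalog(tables):
    """Index tables by column name so users can discover which
    datasets carry a given field."""
    catalog = {}
    for table, meta in tables.items():
        for column in meta["columns"]:
            catalog.setdefault(column, []).append(table)
    return catalog

# Illustrative metadata, as might be harvested from table schemas.
tables = {
    "orders": {"columns": ["order_id", "customer_id", "total"]},
    "customers": {"columns": ["customer_id", "name"]},
}
catalog = build_catalog(tables)
```

Looking up a column such as customer_id then reveals every dataset that carries it, which is the discovery primitive a data catalog is built on.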
We took this a step further by creating a blueprint to create smart recommendations by linking similar data products using graph technology and ML. In this post, we showed how an organization can augment a data catalog with additional metadata by using ML and Neptune with an automated process.
This integration takes the power of one of the most advanced large language model technologies that exists today in Azure OpenAI Service, and through DataRobot, drives value-centric outcomes with machine learning. This saves us the time it would otherwise take to memorize metadata and APIs. DataRobot Launch Event From Vision to Value.
The desire to modernize technology, over time, leads to acquiring many different systems with various data entry points and transformation rules for data as it moves into and across the organization. The metadata-driven suite automatically finds, models, ingests, catalogs and governs cloud data assets (GDPR, CCPA, HIPAA, SOX, PCI DSS).