Analytics, Metadata and Reference

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

AWS Big Data

OCTOBER 14, 2024

Enterprise data is brought into data lakes and data warehouses to carry out analytical, reporting, and data science use cases using AWS analytical services like Amazon Athena, Amazon Redshift, Amazon EMR, and so on. These data processing and analytical services support Structured Query Language (SQL) to interact with the data.

Metadata

Metadata Data Lake Modeling Data Warehouse

Empower financial analytics by creating structured knowledge bases using Amazon Bedrock and Amazon Redshift

AWS Big Data

MAY 20, 2025

For instructions, refer to Creating a general purpose bucket. Amazon Bedrock Knowledge Bases reads metadata from your structured data store to generate SQL queries. For more information, refer to Set up query engine for your structured data store in Amazon Bedrock Knowledge Bases. To learn more, refer to Amazon Bedrock pricing. Choose Next.
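Generating SQL from store metadata works best when table and column descriptions carry business meaning that terse column names lack. The following is a minimal sketch, assuming the metadata is available as simple Python objects; the `Table` and `Column` classes and the `render_schema_context` and `build_prompt` helpers are hypothetical and do not reflect how Amazon Bedrock Knowledge Bases formats metadata internally.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    data_type: str
    description: str = ""


@dataclass
class Table:
    name: str
    description: str
    columns: list[Column] = field(default_factory=list)


def render_schema_context(tables: list[Table]) -> str:
    """Serialize table and column metadata into plain text for a text-to-SQL prompt."""
    lines = []
    for table in tables:
        lines.append(f"Table {table.name}: {table.description}")
        for col in table.columns:
            # A description explains what a terse column name such as amt_usd means.
            note = f" -- {col.description}" if col.description else ""
            lines.append(f"  {col.name} {col.data_type}{note}")
    return "\n".join(lines)


def build_prompt(question: str, tables: list[Table]) -> str:
    """Combine the schema context and the user's question into one prompt string."""
    return (
        "Write one SQL query that answers the question, "
        "using only the tables and columns below.\n\n"
        f"{render_schema_context(tables)}\n\n"
        f"Question: {question}\nSQL:"
    )


if __name__ == "__main__":
    orders = Table(
        name="orders",
        description="One row per customer order",
        columns=[
            Column("order_id", "bigint", "Unique order identifier"),
            Column("amt_usd", "decimal(12,2)", "Order total in US dollars"),
            Column("ord_dt", "date", "Date the order was placed"),
        ],
    )
    print(build_prompt("What was total revenue in March 2024?", [orders]))
```

The sketch only assembles prompt text; sending it to a model and running the generated SQL are out of scope.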

Structured Data

Structured Data Data Warehouse Analytics Finance

Expand data access through Apache Iceberg using Delta Lake UniForm on AWS

AWS Big Data

NOVEMBER 14, 2024

With UniForm, you can read Delta Lake tables as Apache Iceberg tables. This expands data access to a broader range of analytics engines. Under the hood, UniForm generates the Iceberg metadata files (including metadata and manifest files) that Iceberg clients require to access the underlying data files in Delta Lake tables.

Metadata

Metadata Data Warehouse Big Data Data Lake


Run Apache XTable in AWS Lambda for background conversion of open table formats

AWS Big Data

NOVEMBER 26, 2024

Initially, data warehouses were the go-to solution for structured data and analytical workloads, but they were limited by proprietary storage formats and their inability to handle unstructured data. In practice, open table formats (OTFs) are used in a broad range of analytical workloads, from business intelligence to machine learning.

Metadata

Metadata Data Lake Snapshot Data Warehouse

Manage concurrent write conflicts in Apache Iceberg on the AWS Glue Data Catalog

AWS Big Data

APRIL 8, 2025

However, commits can still fail if the latest metadata is updated after the base metadata version is established. Iceberg uses a layered architecture to manage table state and data. The catalog layer maintains a pointer to the current table metadata file, serving as the single source of truth for table state.
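The commit behavior described here is optimistic concurrency: a writer reads the current metadata pointer, writes new metadata based on it, and then atomically swaps the pointer only if it has not moved. The following toy model sketches that compare-and-swap with retries, assuming an in-memory catalog. The `Catalog`, `commit_with_retry`, and `next_location` names and the metadata file naming are hypothetical; real Iceberg commits additionally check whether a concurrent change actually conflicts and back off between retries.

```python
import threading
from typing import Callable


class CommitFailedError(Exception):
    """Raised when the table pointer moved after a writer read its base version."""


class Catalog:
    """Toy catalog: one pointer per table to its current metadata file."""

    def __init__(self, pointers: dict[str, str]) -> None:
        self._pointers = dict(pointers)
        self._lock = threading.Lock()

    def current(self, table: str) -> str:
        with self._lock:
            return self._pointers[table]

    def swap(self, table: str, expected: str, new: str) -> None:
        # Atomic compare-and-swap: succeed only if no one committed since `expected` was read.
        with self._lock:
            if self._pointers[table] != expected:
                raise CommitFailedError(f"{table}: base {expected} is stale")
            self._pointers[table] = new


def next_location(base: str, writer: str) -> str:
    """Hypothetical naming scheme: '<5-digit version>-<writer>.metadata.json'."""
    version = int(base.split("-", 1)[0]) + 1
    return f"{version:05d}-{writer}.metadata.json"


def commit_with_retry(
    catalog: Catalog,
    table: str,
    write_metadata: Callable[[str], str],
    max_attempts: int = 4,
) -> str:
    last_error = None
    for _ in range(max_attempts):
        base = catalog.current(table)            # 1. read the base metadata version
        new_location = write_metadata(base)      # 2. write new metadata derived from it
        try:
            catalog.swap(table, base, new_location)  # 3. atomically move the pointer
            return new_location
        except CommitFailedError as err:
            last_error = err                     # another writer won; re-read and retry
    raise CommitFailedError(f"gave up after {max_attempts} attempts") from last_error


if __name__ == "__main__":
    catalog = Catalog({"db.sales": "00001-init.metadata.json"})
    attempts: list[str] = []

    def write_metadata(base: str) -> str:
        attempts.append(base)
        if len(attempts) == 1:
            # Simulate writer B committing after writer A has read its base version.
            catalog.swap("db.sales", base, next_location(base, "B"))
        return next_location(base, "A")

    print(commit_with_retry(catalog, "db.sales", write_metadata))  # 00003-A.metadata.json
```

In the usage example, writer A's first swap fails because writer B moved the pointer to version 2, so A re-reads and commits version 3 on top of B's change.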

Snapshot

Snapshot Management Metadata Big Data

Write queries faster with Amazon Q generative SQL for Amazon Redshift

AWS Big Data

NOVEMBER 7, 2024

Amazon Redshift is a fully managed, AI-powered cloud data warehouse that delivers the best price-performance for your analytics workloads at any scale. Amazon Q generative SQL enables you to get insights faster without extensive knowledge of your organization’s complex database schema and metadata. With this feature, user data remains secure and private.

Metadata

Metadata Sales Data Warehouse Optimization

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

AWS Big Data

DECEMBER 4, 2024

This week on the keynote stages at AWS re:Invent 2024, you heard from Matt Garman, CEO, AWS, and Swami Sivasubramanian, VP of AI and Data, AWS, speak about the next generation of Amazon SageMaker, the center for all of your data, analytics, and AI. The relationship between analytics and AI is rapidly evolving.

Data Analytics

Data Analytics Analytics Data Lake Data Quality

Streamline data discovery with precise technical identifier search in Amazon SageMaker Unified Studio

AWS Big Data

APRIL 9, 2025

Whether you’re a data analyst seeking a specific metric or a data steward validating metadata compliance, this update delivers a more precise, governed, and intuitive search experience. Refer to the product documentation to learn more about how to set up metadata rules for subscription and publishing workflows.

Metadata

Metadata Metrics Data-driven Cost-Benefit

RDF-Star: Metadata Complexity Simplified

Ontotext

JUNE 10, 2021

This is a graph of millions of edges and vertices – in enterprise data management terms it is a giant piece of master/reference data. To handle such scenarios you need a transalytical graph database – a database engine that can deal with both frequent updates (OLTP workload) as well as with graph analytics (OLAP).

Metadata

Metadata Cost-Benefit OLAP Modeling

Build a high-performance quant research platform with Apache Iceberg

AWS Big Data

JANUARY 9, 2025

Through its metadata layer, Iceberg offers distinct advantages over plain Parquet, such as improved data management, performance optimization, and integration with various query engines. Iceberg’s table format separates data files from metadata files, enabling efficient data modifications without full dataset rewrites.

Metadata

Metadata Snapshot Cost-Benefit Optimization

Recap of Amazon Redshift key product announcements in 2024

AWS Big Data

DECEMBER 17, 2024

Amazon Redshift, launched in 2013, has undergone significant evolution since its inception, allowing customers to expand the horizons of data warehousing and SQL analytics. This allowed customers to scale read analytics workloads and offered isolation to help maintain SLAs for business-critical applications.

Data Lake

Data Lake Data Warehouse Data-driven Optimization

Use open table format libraries on AWS Glue 5.0 for Apache Spark

AWS Big Data

DECEMBER 4, 2024

By providing a standardized framework for data representation, open table formats break down data silos, enhance data quality, and accelerate analytics at scale. For more details, refer to Iceberg Release 1.6.1. An Iceberg table’s metadata stores a history of snapshots, which are updated with each transaction.
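To illustrate how each transaction adds a snapshot to the table metadata without rewriting existing data files, here is a simplified model. It assumes a flat set of data files per snapshot, whereas real Iceberg tracks files through manifest lists and manifests; the `TableMetadata` class and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    timestamp_ms: int
    data_files: frozenset[str]


@dataclass
class TableMetadata:
    snapshots: list[Snapshot] = field(default_factory=list)
    current_snapshot_id: Optional[int] = None

    def current(self) -> Optional[Snapshot]:
        for snap in self.snapshots:
            if snap.snapshot_id == self.current_snapshot_id:
                return snap
        return None

    def commit(self, added: Iterable[str] = (), removed: Iterable[str] = (),
               timestamp_ms: int = 0) -> Snapshot:
        """Record one transaction as a new snapshot; earlier snapshots are never modified."""
        parent = self.current()
        files = set(parent.data_files) if parent else set()
        files -= set(removed)   # removed files stay on storage for older snapshots
        files |= set(added)
        snap = Snapshot(
            snapshot_id=len(self.snapshots) + 1,
            parent_id=parent.snapshot_id if parent else None,
            timestamp_ms=timestamp_ms,
            data_files=frozenset(files),
        )
        self.snapshots.append(snap)
        self.current_snapshot_id = snap.snapshot_id
        return snap

    def as_of(self, timestamp_ms: int) -> Optional[Snapshot]:
        """Time travel: the latest snapshot committed at or before `timestamp_ms`."""
        earlier = [s for s in self.snapshots if s.timestamp_ms <= timestamp_ms]
        return max(earlier, key=lambda s: s.timestamp_ms) if earlier else None


if __name__ == "__main__":
    meta = TableMetadata()
    meta.commit(added=["a.parquet", "b.parquet"], timestamp_ms=1_000)
    meta.commit(added=["c.parquet"], removed=["a.parquet"], timestamp_ms=2_000)
    print(sorted(meta.current().data_files))     # ['b.parquet', 'c.parquet']
    print(sorted(meta.as_of(1_500).data_files))  # ['a.parquet', 'b.parquet']
```

Because the second commit only records a new file set, the first snapshot still describes the table as it was, which is what makes reading an older version possible.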

Snapshot

Snapshot Metadata Data Lake Optimization

Best Practices for Metadata Management

Alation

JULY 19, 2021

What Is Metadata? Metadata is information about data. A clothing catalog and a dictionary are both examples of metadata repositories. Indeed, a popular online catalog, like Amazon, offers rich metadata around products to guide shoppers: ratings, reviews, and product details are all examples of metadata.

Metadata

Metadata Management Data Governance Machine Learning

How Eightfold AI implemented metadata security in a multi-tenant data analytics environment with Amazon Redshift

AWS Big Data

NOVEMBER 29, 2023

The Eightfold Talent Intelligence Platform powered by Amazon Redshift and Amazon QuickSight provides a full-fledged analytics platform for Eightfold’s customers. It delivers analytics and enhanced insights about the customer’s Talent Acquisition, Talent Management pipelines, and much more.

Metadata

Metadata Data Warehouse Analytics Data Analytics

Enhance data governance with enforced metadata rules in Amazon DataZone

AWS Big Data

NOVEMBER 20, 2024

We’re excited to announce a new feature in Amazon DataZone that offers enhanced metadata governance for your subscription approval process. With this update, domain owners can define and enforce metadata requirements for data consumers when they request access to data assets. Key benefits: The feature benefits multiple stakeholders.
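Enforcing metadata requirements on access requests amounts to validating a request form against rules the domain owner defined before the request reaches approval. The following is a minimal sketch of that check, assuming each rule marks a field as required and optionally restricts its values; `FieldRule` and `validate_request` are hypothetical names, not the Amazon DataZone API, which is configured through metadata forms rather than code like this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldRule:
    """One field of a hypothetical metadata form attached to subscription requests."""
    name: str
    required: bool = True
    allowed_values: Optional[frozenset[str]] = None


def validate_request(form: dict[str, str], rules: list[FieldRule]) -> list[str]:
    """Return rule violations; an empty list means the request can go on to approval."""
    errors = []
    for rule in rules:
        value = form.get(rule.name, "").strip()
        if not value:
            if rule.required:
                errors.append(f"missing required field '{rule.name}'")
            continue
        if rule.allowed_values is not None and value not in rule.allowed_values:
            errors.append(f"'{rule.name}' must be one of {sorted(rule.allowed_values)}")
    return errors


if __name__ == "__main__":
    rules = [
        FieldRule("business_justification"),
        FieldRule("data_classification", allowed_values=frozenset({"internal", "confidential"})),
        FieldRule("cost_center", required=False),
    ]
    for problem in validate_request({"data_classification": "public"}, rules):
        print(problem)
```

Returning every violation at once, rather than stopping at the first, lets a consumer fix the whole request in one pass.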

Metadata

Metadata Data Governance Metrics Marketing

Modernize a legacy real-time analytics application with Amazon Managed Service for Apache Flink

AWS Big Data

OCTOBER 11, 2023

Organizations with legacy, on-premises, near-real-time analytics solutions typically rely on self-managed relational databases as their data store for analytics workloads. Near-real-time streaming analytics captures the value of operational data and metrics to provide new insights to create business opportunities.

Management

Management Metadata Analytics Dashboards

Reduce your compute costs for stream processing applications with Kinesis Client Library 3.0

AWS Big Data

NOVEMBER 6, 2024

KCL 3.0 reduces the Amazon DynamoDB cost associated with KCL by optimizing read operations on the DynamoDB table that stores metadata. KCL uses DynamoDB to store metadata such as shard-worker mappings and checkpoints. For the benefits of the AWS SDK for Java 2.x, refer to Use features of the AWS SDK for Java 2.x. Refer to Step 4 of Migrating from KCL 2.x to KCL 3.x.
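Shard-worker mappings and checkpoints are what let a stream processor fail over safely. The sketch below models that metadata in memory: a conditional update keeps two workers from claiming the same shard, and a new owner resumes from the last checkpoint. The `Lease` and `LeaseTable` classes and their fields are illustrative assumptions loosely modelled on a lease table, not KCL's actual schema or API, and lease expiry and renewal are omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """One row of lease metadata: which worker owns a shard and where it left off."""
    shard_id: str
    owner: Optional[str] = None
    checkpoint: str = "TRIM_HORIZON"  # start position until the first checkpoint
    counter: int = 0                  # bumped on every ownership change


class LeaseTable:
    """In-memory stand-in for the DynamoDB table in which KCL keeps lease metadata."""

    def __init__(self, shard_ids: list[str]) -> None:
        self._leases = {shard_id: Lease(shard_id) for shard_id in shard_ids}

    def get(self, shard_id: str) -> Lease:
        return self._leases[shard_id]

    def take(self, shard_id: str, worker: str, expected_counter: int) -> bool:
        """Conditional update: claim the lease only if no one changed it since it was read."""
        lease = self._leases[shard_id]
        if lease.counter != expected_counter:
            return False
        lease.owner = worker
        lease.counter += 1
        return True

    def checkpoint(self, shard_id: str, worker: str, sequence_number: str) -> None:
        """Record progress; only the current owner may move the checkpoint."""
        lease = self._leases[shard_id]
        if lease.owner != worker:
            raise PermissionError(f"{worker} does not hold the lease for {shard_id}")
        lease.checkpoint = sequence_number

    def assignments(self) -> dict[str, Optional[str]]:
        """The shard-to-worker mapping."""
        return {shard_id: lease.owner for shard_id, lease in self._leases.items()}


if __name__ == "__main__":
    table = LeaseTable(["shard-0", "shard-1"])
    table.take("shard-0", "worker-1", expected_counter=0)
    table.checkpoint("shard-0", "worker-1", "1001")
    # worker-1 stops; worker-2 reads the lease and takes it over.
    lease = table.get("shard-0")
    table.take("shard-0", "worker-2", expected_counter=lease.counter)
    print(table.assignments(), "resume after", table.get("shard-0").checkpoint)
```

Every worker consults this table to coordinate, which is why the number of reads against it shows up in the DynamoDB bill.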

Cost-Benefit

Cost-Benefit Metadata Optimization Publishing

A Reference Architecture for the Cloudera Private Cloud Base Data Platform

Cloudera

JULY 15, 2021

Best of CDH & HDP, with added analytic and platform features. All three will host quorums of ZooKeeper and HDFS JournalNodes to track changes to the HDFS metadata stored on the NameNodes.

Data Processing

Data Processing Metadata Testing Management

Introducing Amazon MWAA micro environments for Apache Airflow

AWS Big Data

NOVEMBER 19, 2024

Pricing and availability: Amazon MWAA pricing dimensions remain unchanged, and you only pay for what you use: the environment class and the metadata database storage consumed. Metadata database storage pricing remains the same. Refer to Amazon Managed Workflows for Apache Airflow Pricing for rates and more details.

Metadata

Metadata Cost-Benefit Metrics Optimization

Unstructured data management and governance using AWS AI/ML and analytics services

AWS Big Data

OCTOBER 25, 2023

Solution overview: Data and metadata discovery is one of the primary requirements in data analytics, where data consumers explore what data is available and in what format, and then consume or query it for analysis. But in the case of unstructured data, metadata discovery is challenging because the raw data isn’t easily readable.

Unstructured Data

Unstructured Data Metadata Management Analytics

Amazon SageMaker Lakehouse now supports attribute-based access control

AWS Big Data

APRIL 24, 2025

You can secure and centrally manage your data in the lakehouse by defining fine-grained permissions with Lake Formation that are consistently applied across all analytics and machine learning (ML) tools and engines. In this post, we demonstrate how to get started with attribute-based access control (ABAC) in SageMaker Lakehouse and use it with various analytics services.
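The core idea of ABAC is that a grant names attributes rather than individual users, so any principal carrying matching attributes receives the permission. The following is a minimal sketch of that decision, assuming exact key/value matching; the `Grant` class and `is_allowed` function are hypothetical and are not the Lake Formation API.

```python
from dataclasses import dataclass


@dataclass
class Grant:
    """Permissions on a resource for any principal whose attributes match all pairs."""
    resource: str
    permissions: set[str]
    required_attributes: dict[str, str]


def is_allowed(principal_attributes: dict[str, str], resource: str, action: str,
               grants: list[Grant]) -> bool:
    for grant in grants:
        if grant.resource != resource or action not in grant.permissions:
            continue
        # Access follows from the principal's attributes, not from a per-user grant.
        if all(principal_attributes.get(key) == value
               for key, value in grant.required_attributes.items()):
            return True
    return False


if __name__ == "__main__":
    grants = [Grant("sales_db.orders", {"SELECT"}, {"department": "sales", "region": "emea"})]
    analyst = {"department": "sales", "region": "emea"}
    print(is_allowed(analyst, "sales_db.orders", "SELECT", grants))  # True
    print(is_allowed({"department": "sales", "region": "apac"},
                     "sales_db.orders", "SELECT", grants))  # False
```

With this design, onboarding a new analyst means assigning attributes, not editing grants, which is what keeps permissions manageable as the number of users grows.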

Sales

Sales Data Lake Management Data-driven

6 Case Studies on The Benefits of Business Intelligence And Analytics

datapine

JANUARY 31, 2022

Using business intelligence and analytics effectively is the crucial difference between companies that succeed and companies that fail in the modern environment. Experts say that BI and data analytics make the decision-making process 5x faster for businesses.

Business Intelligence

Business Intelligence Analytics Cost-Benefit ROI

Three Emerging Analytics Products Derived from Value-driven Data Innovation and Insights Discovery in the Enterprise

Rocket-Powered Data Science

JULY 19, 2023

I recently saw an informal online survey that asked users which types of data (tabular, text, images, or “other”) are being used in their organization’s analytics applications. The results showed that (among those surveyed) approximately 90% of enterprise analytics applications are being built on tabular data.

Data-driven

Data-driven Enterprise Analytics Machine Learning

Accelerate SQL code migration from Google BigQuery to Amazon Redshift using BladeBridge

AWS Big Data

NOVEMBER 7, 2024

BladeBridge offers a comprehensive suite of tools that automate much of the complex conversion work, allowing organizations to quickly and reliably transition their data analytics capabilities to the scalable Amazon Redshift data warehouse. For more details, refer to the BladeBridge Analyzer Demo.

Data Warehouse

Data Warehouse Reporting Big Data Data Lake

What Is a Metadata Catalog? (And How it Can Dramatically Improve Your Data Accuracy)

Octopai

JANUARY 31, 2022

If you’re a mystery lover, I’m sure you’ve read that classic tale: Sherlock Holmes and the Case of the Deceptive Data, and you know how a metadata catalog was a key plot element. When your CRM talks about “conversions” and your analytics suite talks about “conversions,” are they speaking the same language? Enter the metadata catalog.

Metadata

Metadata IT Unstructured Data IoT

The Power of Graph Databases, Linked Data, and Graph Algorithms

Rocket-Powered Data Science

MARCH 10, 2020

I wrote an extensive piece on the power of graph databases, linked data, graph algorithms, and various significant graph analytics applications. The book is awesome, an absolute must-have reference volume, and it is free (for now, downloadable from Neo4j). Well, the graph analytics algorithm would notice!

Metadata

Metadata Machine Learning Prescriptive Analytics ROI

Cross-account data collaboration with Amazon DataZone and AWS analytical tools

AWS Big Data

MARCH 5, 2025

After they’ve been published, you can query the published assets from another AWS account using analytical tools such as Amazon Athena and the Amazon Redshift query editor, as shown in the following figure. The AWS account that needs to access or use the data from the producer account is referred to as the consumer account.

Analytics

Analytics Publishing Metadata Sales

Best practices for upgrading Amazon MWAA environments

AWS Big Data

JUNE 2, 2025

Refer to Introducing in-place version upgrades with Amazon MWAA for more details. With this approach, you create a new Amazon MWAA environment, migrate your metadata, and manage the transition between environments. The Airflow scheduler automatically populates some metadata tables (dag, dag_tag, and dag_code) in your new environment.

Metadata

Metadata Testing Metrics Management

Multicloud data lake analytics with Amazon Athena

AWS Big Data

MARCH 18, 2024

In these cases, you may want an integrated query layer to seamlessly run analytical queries across these diverse cloud stores and streamline your data analytics processes. You can consolidate your analytics workflows, reducing the need for extensive tooling and infrastructure management.

Data Lake

Data Lake Analytics Cost-Benefit Management

Introducing MongoDB Atlas metadata collection with AWS Glue crawlers

AWS Big Data

FEBRUARY 6, 2023

AWS Glue is a serverless data integration service that makes it simple to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development. For instructions, refer to How to Set Up a MongoDB Cluster. Choose the table to view the schema and other metadata.

Metadata

Metadata Data Lake Machine Learning Big Data

Demystify data sharing and collaboration patterns on AWS: Choosing the right tool for the job

AWS Big Data

OCTOBER 21, 2024

Let’s briefly describe the capabilities of the AWS services we referred to above: AWS Glue is a fully managed, serverless, and scalable extract, transform, and load (ETL) service that simplifies the process of discovering, preparing, and loading data for analytics. Amazon Athena is used to query and explore the data.

Sales

Sales Data-driven Data Processing Key Performance Indicator

Integrate custom applications with AWS Lake Formation – Part 2

AWS Big Data

NOVEMBER 19, 2024

Unfiltered Table Metadata: This tab displays the response of the AWS Glue API GetUnfilteredTableMetadata policies for the selected table. Get table data and metadata for this user to see how Lake Formation permissions are enforced, so that the two users can see different data (on the Authorized Data tab).

Data Processing

Data Processing Metadata Publishing Testing

Implement a custom subscription workflow for unmanaged Amazon S3 assets published with Amazon DataZone

AWS Big Data

DECEMBER 19, 2024

To learn more about working with events using EventBridge, refer to Events via Amazon EventBridge default bus. After you create the asset, you can add glossaries or metadata forms, but it’s not necessary for this post. We refer to this role as the instance-role throughout the post. Enter a name for the asset.

Publishing

Publishing Unstructured Data Metadata Data-driven

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

MARCH 7, 2024

You can take all your data from various silos, aggregate that data in your data lake, and perform analytics and machine learning (ML) directly on top of that data. We refer to this concept as outside-in data movement. For a list of supported metrics, refer to Monitoring pipeline metrics.

Data Lake

Data Lake Analytics Dashboards Metrics

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

AWS Big Data

DECEMBER 4, 2024

Amazon SageMaker Lakehouse unifies all your data across Amazon S3 data lakes and Amazon Redshift data warehouses, helping you build powerful analytics and AI/ML applications on a single copy of data. The data is also registered in the Glue Data Catalog, a metadata repository. You don’t need to maintain complex ETL pipelines.

Data Integration

Data Integration Data Lake Statistics Data-driven

Do I Need a Data Catalog?

erwin

JUNE 26, 2020

Organizations with particularly deep data stores might need a data catalog with advanced capabilities, such as automated metadata harvesting to speed up the data preparation process. Three Types of Metadata in a Data Catalog. The metadata provides information about the asset that makes it easier to locate, understand and evaluate.

Metadata

Metadata Cost-Benefit Measurement Data-driven

How gaming companies can use Amazon Redshift Serverless to build scalable analytical applications faster and easier

AWS Big Data

MARCH 7, 2023

This post provides guidance on how to build scalable analytical solutions for gaming industry use cases using Amazon Redshift Serverless. The following diagram is a conceptual analytics data hub reference architecture. They should also provide optimal performance with low or no tuning.

Analytics

Analytics Data Warehouse Data Lake Metadata

The AWS Glue Data Catalog now supports storage optimization of Apache Iceberg tables

AWS Big Data

SEPTEMBER 12, 2024

Along with the Glue Data Catalog’s automated compaction feature, these storage optimizations can help you reduce metadata overhead, control storage costs, and improve query performance. The Glue Data Catalog monitors tables daily, removes snapshots from table metadata, and removes the data files and orphan files that are no longer needed.
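The two storage optimizations described here, expiring old snapshots and deleting files no longer needed, can be sketched as two steps: decide which snapshots to keep, then find files that no kept snapshot references. This is a simplified model, assuming each snapshot lists its data files directly and storage is a map of path to last-modified time; the function and parameter names are hypothetical and are not the Glue Data Catalog's configuration keys. Real Iceberg handles expired-snapshot files and orphan files in separate operations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int
    data_files: frozenset[str]


def expire_snapshots(snapshots: list[Snapshot], current_id: int, older_than_ms: int,
                     retain_last: int = 1) -> list[Snapshot]:
    """Drop snapshots older than the cutoff, always keeping the current and newest ones."""
    newest_first = sorted(snapshots, key=lambda s: s.timestamp_ms, reverse=True)
    protected = {s.snapshot_id for s in newest_first[:retain_last]} | {current_id}
    return [s for s in snapshots
            if s.snapshot_id in protected or s.timestamp_ms >= older_than_ms]


def files_to_delete(retained: list[Snapshot], storage_files: dict[str, int],
                    older_than_ms: int) -> list[str]:
    """Files that no retained snapshot references and that are older than the cutoff.

    This covers files of expired snapshots and orphan files (for example, left behind
    by failed writes). The age check avoids deleting files of in-flight writes.
    """
    referenced = set().union(*(s.data_files for s in retained))
    return sorted(path for path, modified_ms in storage_files.items()
                  if path not in referenced and modified_ms < older_than_ms)


if __name__ == "__main__":
    history = [
        Snapshot(1, 1_000, frozenset({"a", "b"})),
        Snapshot(2, 2_000, frozenset({"b", "c"})),
        Snapshot(3, 9_000, frozenset({"c", "d"})),
    ]
    kept = expire_snapshots(history, current_id=3, older_than_ms=5_000)
    storage = {"a": 900, "b": 1_900, "c": 1_900, "d": 8_900, "tmp-write": 8_990, "orphan": 100}
    print([s.snapshot_id for s in kept])                          # [3]
    print(files_to_delete(kept, storage, older_than_ms=5_000))   # ['a', 'b', 'orphan']
```

The recent, unreferenced file `tmp-write` survives because it could belong to a write that has not committed yet.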

Optimization

Optimization Snapshot Metadata Metrics

BMW Cloud Efficiency Analytics powered by Amazon QuickSight and Amazon Athena

AWS Big Data

NOVEMBER 15, 2023

BMW Cloud Efficiency Analytics (CLEA) is a homegrown tool developed within the BMW FinOps CoE (Center of Excellence) aiming to optimize and reduce costs across all these accounts. In this post, we explore how the BMW Group FinOps CoE implemented their Cloud Efficiency Analytics tool (CLEA), powered by Amazon QuickSight and Amazon Athena.

Analytics

Analytics Dashboards Metadata Data Warehouse

Migrate from Google BigQuery to Amazon Redshift using AWS Glue and Custom Auto Loader Framework

AWS Big Data

JUNE 2, 2023

Tens of thousands of customers use Amazon Redshift to process exabytes of data every day to power their analytic workloads. This JSON file contains the migration metadata, namely the following: A list of Google BigQuery projects and datasets. If you don’t have one, refer to Amazon Redshift Serverless. An S3 bucket.

Metadata

Metadata Data Warehouse Big Data Analytics

The Ultimate Guide to Modern Data Quality Management (DQM) For An Effective Data Quality Control Driven by The Right Metrics

datapine

SEPTEMBER 29, 2022

With a shocking 2.5 quintillion bytes of data being produced on a daily basis and the wide range of online data analysis tools in the market, the use of data and analytics has never been more accessible. Data quality refers to the assessment of the information you have, relative to its purpose and its ability to serve that purpose.

Data Quality

Data Quality Metrics Data-driven Management

Near-real-time analytics using Amazon Redshift streaming ingestion with Amazon Kinesis Data Streams and Amazon DynamoDB

AWS Big Data

JULY 27, 2023

Amazon Redshift is a fully managed, scalable cloud data warehouse that accelerates your time to insights with fast, easy, and secure analytics at scale. Tens of thousands of customers rely on Amazon Redshift to analyze exabytes of data and run complex analytical queries, making it the most widely used cloud data warehouse.

Data Warehouse

Data Warehouse Analytics Metadata Dashboards

Harness Zero Copy data sharing from Salesforce Data Cloud to Amazon Redshift for Unified Analytics – Part 2

AWS Big Data

SEPTEMBER 12, 2024

Considerations when using data sharing in Amazon Redshift For a comprehensive list of considerations and limitations of data sharing, refer to Considerations when using data sharing in Amazon Redshift. Refer to Part 1 of this series to complete the setup. xlplus) and Redshift Serverless. Choose Create to complete the setup.

Data Lake

Data Lake Analytics Data-driven Data Strategy

Introducing support for Apache Kafka on Raft mode (KRaft) with Amazon MSK clusters

AWS Big Data

MAY 29, 2024

Since its inception, Apache Kafka has depended on Apache ZooKeeper for storing and replicating the metadata of Kafka brokers and topics. The Kafka community has adopted KRaft (Apache Kafka on Raft), a consensus protocol, to replace Kafka’s dependency on ZooKeeper for metadata management. For Metadata mode, select KRaft.

Metadata

Metadata Cost-Benefit Management Big Data
