Enterprise data is brought into data lakes and data warehouses to support analytical, reporting, and data science use cases using AWS analytics services such as Amazon Athena, Amazon Redshift, Amazon EMR, and so on. Table metadata is fetched from AWS Glue, and the generated Athena SQL query is then run.
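As a rough illustration of that flow, the sketch below fetches table metadata from the AWS Glue Data Catalog and runs a query through Athena with boto3; the database, table, and output-location names are placeholders, not values from the original post.

```python
import boto3

glue = boto3.client("glue")
athena = boto3.client("athena")

# Fetch table metadata from the Glue Data Catalog (names are placeholders).
table = glue.get_table(DatabaseName="sales_db", Name="orders")
columns = [c["Name"] for c in table["Table"]["StorageDescriptor"]["Columns"]]

# Build and run a simple Athena query over the discovered columns.
query = f"SELECT {', '.join(columns[:3])} FROM orders LIMIT 10"
response = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "sales_db"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print(response["QueryExecutionId"])
```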
You can learn how to query Delta Lake tables through UniForm from other data warehouses or engines, such as Amazon Redshift, as an example of expanding data access to more engines. With UniForm, both the Delta Lake and Iceberg metadata files reference the same data files, as described in the Delta Lake public documentation (see Appendix 1).
Amazon Redshift is a fully managed, AI-powered cloud data warehouse that delivers the best price-performance for your analytics workloads at any scale. It enables you to get insights faster without extensive knowledge of your organization’s complex database schema and metadata. Your data is not shared across accounts.
This post was co-written with Dipankar Mazumdar, Staff Data Engineering Advocate with AWS Partner OneHouse. Data architecture has evolved significantly to handle growing data volumes and diverse workloads. For more examples and references to other posts, refer to the following GitHub repository.
Amazon Redshift, launched in 2013, has undergone significant evolution since its inception, allowing customers to expand the horizons of data warehousing and SQL analytics. Industry-leading price-performance: Amazon Redshift offers up to three times better price-performance than alternative cloud data warehouses.
BladeBridge offers a comprehensive suite of tools that automate much of the complex conversion work, allowing organizations to quickly and reliably transition their data analytics capabilities to the scalable Amazon Redshift data warehouse.
Unifying these necessitates additional data processing, requiring each business unit to provision and maintain a separate data warehouse. This burdens business units focused solely on consuming the curated data for analysis and not concerned with data management tasks, cleansing, or comprehensive data processing.
As part of the Talent Intelligence Platform, Eightfold also exposes a data hub where each customer can access their Amazon Redshift-based data warehouse and run ad hoc queries as well as schedule queries for reporting and data export. Many customers have implemented Amazon Redshift to support multi-tenant applications.
Organizations cannot hope to make the most of a data-driven strategy without at least some degree of metadata-driven automation. The volume and variety of data have snowballed, and so has its velocity. As such, traditional, and mostly manual, processes associated with data management and data governance have broken down.
SageMaker still includes all the existing ML and AI capabilities you’ve come to know and love for data wrangling, human-in-the-loop data labeling with Amazon SageMaker Ground Truth, experiments, MLOps, Amazon SageMaker HyperPod managed distributed training, and more. Having confidence in your data is key.
Amazon Redshift Serverless makes it simple to run and scale analytics without having to manage your data warehouse infrastructure. Tags allow you to assign metadata to your AWS resources. For more details on tagging, refer to Tagging resources overview. For more tagging best practices, refer to Tagging AWS resources.
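A minimal sketch of assigning tags to a Redshift Serverless workgroup with boto3; the workgroup name and tag keys/values are illustrative assumptions, not values from the post.

```python
import boto3

client = boto3.client("redshift-serverless")

# Look up the workgroup ARN (the workgroup name is a placeholder).
workgroup = client.get_workgroup(workgroupName="analytics-wg")["workgroup"]

# Attach cost-allocation metadata as tags (keys and values are examples).
client.tag_resource(
    resourceArn=workgroup["workgroupArn"],
    tags=[
        {"key": "team", "value": "data-platform"},
        {"key": "environment", "value": "production"},
    ],
)
```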
The DLQ approach: the dead-letter queue (DLQ) strategy focuses on efficiently segregating high-quality data from problematic entries so that only clean data makes it into your primary dataset. The metadata of an Iceberg table stores a history of snapshots. Replace the placeholder with the S3 bucket from the CloudFormation Outputs tab.
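A simplified sketch of the idea, splitting incoming records into a clean set and a dead-letter set before loading; the validation rule, record shape, and file paths are assumptions for illustration, not the post's actual implementation.

```python
import json

def is_valid(record: dict) -> bool:
    # Example rule: a record must carry a non-empty id and a numeric amount.
    return bool(record.get("id")) and isinstance(record.get("amount"), (int, float))

incoming = [
    {"id": "a1", "amount": 19.99},
    {"id": "", "amount": 5.00},        # problematic: missing id
    {"id": "a3", "amount": "oops"},    # problematic: non-numeric amount
]

clean = [r for r in incoming if is_valid(r)]
dead_letter = [r for r in incoming if not is_valid(r)]

# Only clean rows reach the primary dataset; the rest land in the DLQ for review.
with open("primary_dataset.jsonl", "w") as f:
    f.writelines(json.dumps(r) + "\n" for r in clean)
with open("dead_letter_queue.jsonl", "w") as f:
    f.writelines(json.dumps(r) + "\n" for r in dead_letter)
```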
Currently, a handful of startups offer “reverse” extract, transform, and load (ETL), in which they copy data from a customer’s data warehouse or data platform back into systems of engagement where business users do their work. “It works in Salesforce just like any other native Salesforce data,” Carlson said.
Given the value this sort of data-driven insight can provide, the reason organizations need a data catalog should become clearer. It’s no surprise that most organizations’ data is often fragmented and siloed across numerous sources. Three types of metadata in a data catalog: technical metadata.
Like any good puzzle, metadata management comes with a lot of complex variables. That’s why you need to use data dictionary tools, which can help organize your metadata into an archive that can be navigated with ease and from which you can derive good information to power informed decision-making. Why have a data dictionary?
Amazon Redshift is a fast, petabyte-scale cloud data warehouse that tens of thousands of customers rely on to power their analytics workloads. With its massively parallel processing (MPP) architecture and columnar data storage, Amazon Redshift delivers high price-performance for complex analytical queries against large datasets.
Amazon Redshift is a widely used, fully managed, petabyte-scale cloud data warehouse. Tens of thousands of customers use Amazon Redshift to process exabytes of data every day to power their analytics workloads. This JSON file contains the migration metadata, namely a list of Google BigQuery projects and datasets.
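To make the shape of such a file concrete, here is a hypothetical example of migration metadata listing BigQuery projects and datasets; the field names are invented for illustration and are not the schema used in the post.

```python
import json

# Hypothetical migration metadata; every field name here is illustrative only.
migration_metadata = {
    "bigquery_projects": [
        {
            "project_id": "example-project",
            "datasets": ["sales", "marketing"],
        }
    ],
    "target": {"redshift_database": "dev", "redshift_schema": "public"},
}

with open("migration_metadata.json", "w") as f:
    json.dump(migration_metadata, f, indent=2)
```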
Today’s customers have a growing need for faster end-to-end data ingestion to meet the expected speed of insights and overall business demand. This ‘need for speed’ drives a rethink on building a more modern data warehouse solution, one that balances speed with platform cost management, performance, and reliability.
Amazon Redshift is a fast, fully managed, petabyte-scale cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing business intelligence (BI) tools. Amazon Redshift also supports querying nested data with complex data types such as struct, array, and map.
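As a sketch of what querying nested types can look like, the snippet below submits a query that reaches into struct fields and unnests an array column through the Redshift Data API; the workgroup, schema, table, and column names are placeholders, not names from the original post.

```python
import boto3

client = boto3.client("redshift-data")

# Dot notation reaches into struct fields; listing c.orders in the FROM clause
# unnests the array. All identifiers below are placeholders.
sql = """
    SELECT c.customer_id, o.product_id, o.price
    FROM spectrum.customers c, c.orders o
    LIMIT 10
"""

response = client.execute_statement(
    WorkgroupName="analytics-wg",   # or ClusterIdentifier for a provisioned cluster
    Database="dev",
    Sql=sql,
)
print(response["Id"])
```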
Data architect Armando Vázquez identifies eight common types of data architects: Enterprise data architect: These data architects oversee an organization’s overall data architecture, defining data architecture strategy and designing and implementing architectures.
First, many LLM use cases rely on enterprise knowledge that needs to be drawn from unstructured data such as documents, transcripts, and images, in addition to structured data from data warehouses. Data enrichment: in addition, metadata may need to be extracted from the objects.
Amazon Redshift is a fully managed, scalable cloud data warehouse that accelerates your time to insights with fast, easy, and secure analytics at scale. Tens of thousands of customers rely on Amazon Redshift to analyze exabytes of data and run complex analytical queries, making it the most widely used cloud data warehouse.
In this solution (as shown in the preceding figure), the AWS account that contains the data assets is referred to as the producer account. The AWS account that needs to access or use the data from the producer account is referred to as the consumer account. You will then publish the data assets from these data sources.
With this new functionality, customers can create up-to-date replicas of their data from applications such as Salesforce, ServiceNow, and Zendesk in an Amazon SageMaker Lakehouse and Amazon Redshift. SageMaker Lakehouse gives you the flexibility to access and query your data in-place with all Apache Iceberg compatible tools and engines.
This blog is intended to give an overview of the considerations you’ll want to make as you build your Redshift data warehouse to ensure you are getting optimal performance. OLTP databases are best at queries involving point scans or short scans of the data; think “return the number of deposits by user X this week.”
Flexible and easy to use: the solutions should provide less restrictive, easy-to-access, and ready-to-use data. And unlike data warehouses, which are primarily analytical stores, a data hub is a combination of all types of repositories (analytical, transactional, operational, reference, and data I/O services), along with governance processes.
Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more, all while providing up to 7.9x better price-performance than other cloud data warehouses.
Since reporting is part of an effective DQM, we will also go through some data quality metrics examples you can use to assess your efforts in the matter. But first, let’s define what data quality actually is, and why you need data quality management.
In 2013, Amazon Web Services revolutionized the data warehousing industry by launching Amazon Redshift, the first fully managed, petabyte-scale, enterprise-grade cloud data warehouse. Amazon Redshift made it simple and cost-effective to efficiently analyze large volumes of data using existing business intelligence tools.
Amazon DataZone is a powerful data management service that empowers data engineers, data scientists, product managers, analysts, and business users to seamlessly catalog, discover, analyze, and govern data across organizational boundaries, AWS accounts, data lakes, and data warehouses.
Data engineers use Apache Iceberg because it’s fast, efficient, and reliable at any scale and keeps records of how datasets change over time. Apache Iceberg offers integrations with popular data processing frameworks such as Apache Spark, Apache Flink, Apache Hive, Presto, and more.
For George H., a senior business process management architect at a pharma/biotech company with more than 5,000 employees, erwin Evolve was useful for enterprise architecture reference. As he put it, “We are describing our business process and we are trying to describe our data catalog.” Data modeling with erwin Data Modeler.
But whatever their business goals, in order to turn their invisible data into a valuable asset, they need to understand what they have and be able to efficiently find what they need. Enter metadata: it enables us to make sense of our data because it tells us what it is and how best to use it. The knowledge (metadata) layer.
Data producers (data owners) can add context and control access through predefined approvals, providing secure and governed data sharing. To learn more about the core components of Amazon DataZone, refer to Amazon DataZone terminology and concepts.
New feature: custom AWS service blueprints. Previously, Amazon DataZone provided default blueprints that created the AWS resources required for data lake, data warehouse, and machine learning use cases. If you’re new to Amazon DataZone, refer to Getting started.
Organizations have multiple Hive data warehouses across EMR clusters, where the metadata gets generated. To address this challenge, organizations can deploy a data mesh using AWS Lake Formation that connects the multiple EMR clusters. An entity can act both as a producer of data assets and as a consumer of data assets.
These formats enable ACID (atomicity, consistency, isolation, durability) transactions, upserts, and deletes, and advanced features such as time travel and snapshots that were previously only available in data warehouses. For more information, refer to Amazon S3: Allows read and write access to objects in an S3 bucket.
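A brief PySpark sketch of Iceberg time travel and snapshot history, assuming a Spark session that is already configured with an Iceberg catalog and an existing table; the catalog, table, and timestamp are placeholders.

```python
from pyspark.sql import SparkSession

# Assumes an Iceberg catalog (e.g. spark.sql.catalog.glue_catalog) is already configured.
spark = SparkSession.builder.appName("iceberg-time-travel").getOrCreate()

# Current state of the table (catalog and table names are placeholders).
current_df = spark.table("glue_catalog.db.orders")
current_df.printSchema()

# Time travel: read the table as it existed at an earlier point in time.
historical_df = spark.sql(
    "SELECT * FROM glue_catalog.db.orders TIMESTAMP AS OF '2024-01-01 00:00:00'"
)

# The snapshot history recorded in the table metadata.
spark.sql(
    "SELECT snapshot_id, committed_at FROM glue_catalog.db.orders.snapshots"
).show()
```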
This post is co-authored by Vijay Gopalakrishnan, Director of Product, Salesforce Data Cloud. In today’s data-driven business landscape, organizations collect a wealth of data across various touchpoints and unify it in a central data warehouse or a data lake to deliver business insights.
It also makes it easier for engineers, data scientists, product managers, analysts, and business users to access data throughout an organization to discover, use, and collaborate to derive data-driven insights. The producer also needs to manage and publish the data asset so it’s discoverable throughout the organization.
Cloudera and Accenture demonstrate the strength of their relationship with an accelerator called the Smart Data Transition Toolkit for migrating legacy data warehouses into Cloudera Data Platform. Accenture’s Smart Data Transition Toolkit. Are you looking for your data warehouse to support the hybrid multi-cloud?
Source systems: Aruba’s source repository includes data from three different operating regions (AMER, EMEA, and APJ), along with one worldwide (WW) data pipeline from varied sources such as SAP S/4 HANA, Salesforce, Enterprise Data Warehouse (EDW), Enterprise Analytics Platform (EAP), SharePoint, and more.
Many customers run big data workloads such as extract, transform, and load (ETL) on Apache Hive to create a data warehouse on Hadoop. We split the solution into two primary components: generating Spark job metadata and running the SQL on Amazon EMR. The script generates a metadata JSON file for each step.
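A small sketch of the "one metadata JSON file per step" idea; the structure and field names are assumptions for illustration rather than the actual script from the post.

```python
import json
from pathlib import Path

# Hypothetical step definitions extracted from Hive ETL scripts (names are illustrative).
steps = [
    {"step_name": "stage_orders", "sql_file": "sql/stage_orders.sql", "depends_on": []},
    {"step_name": "build_facts", "sql_file": "sql/build_facts.sql", "depends_on": ["stage_orders"]},
]

output_dir = Path("spark_job_metadata")
output_dir.mkdir(exist_ok=True)

# Write one metadata JSON file per step, to be consumed by the EMR job runner.
for step in steps:
    (output_dir / f"{step['step_name']}.json").write_text(json.dumps(step, indent=2))
```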
For more information on this foundation, refer to A Detailed Overview of the Cost Intelligence Dashboard. It seamlessly consolidates data from various data sources within AWS, including AWS Cost Explorer (and forecasting with Cost Explorer ), AWS Trusted Advisor , and AWS Compute Optimizer.
Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed orchestration service for Apache Airflow that you can use to set up and operate data pipelines in the cloud at scale. Apache Airflow is an open source tool used to programmatically author, schedule, and monitor sequences of processes and tasks, referred to as workflows.
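For readers new to Airflow, here is a minimal DAG sketch, assuming Apache Airflow 2.x; the DAG id, schedule, and task are illustrative and not taken from the post.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    # Placeholder task body; a real pipeline would pull data from a source system here.
    print("extracting data")

# A minimal workflow: one task scheduled daily. All names are illustrative.
with DAG(
    dag_id="example_ingestion",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
```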