Data Warehouse, Metadata and Optimization

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

AWS Big Data

OCTOBER 14, 2024

Enterprise data is brought into data lakes and data warehouses to carry out analytical, reporting, and data science use cases using AWS analytical services like Amazon Athena, Amazon Redshift, Amazon EMR, and so on. Table metadata is fetched from AWS Glue. The generated Athena SQL query is run.

Metadata

Metadata Data Lake Modeling Data Warehouse

Recap of Amazon Redshift key product announcements in 2024

AWS Big Data

DECEMBER 17, 2024

Amazon Redshift, launched in 2013, has undergone significant evolution since its inception, allowing customers to expand the horizons of data warehousing and SQL analytics. Industry-leading price-performance: Amazon Redshift offers up to three times better price-performance than alternative cloud data warehouses.

Data Lake

Data Lake Data Warehouse Data-driven Optimization

Write queries faster with Amazon Q generative SQL for Amazon Redshift

AWS Big Data

NOVEMBER 7, 2024

Amazon Redshift is a fully managed, AI-powered cloud data warehouse that delivers the best price-performance for your analytics workloads at any scale. It enables you to get insights faster without extensive knowledge of your organization’s complex database schema and metadata. Your data is not shared across accounts.

Metadata

Metadata Sales Data Warehouse Optimization

Accelerate SQL code migration from Google BigQuery to Amazon Redshift using BladeBridge

AWS Big Data

NOVEMBER 7, 2024

BladeBridge offers a comprehensive suite of tools that automate much of the complex conversion work, allowing organizations to quickly and reliably transition their data analytics capabilities to the scalable Amazon Redshift data warehouse. times better price performance than other cloud data warehouses.

Data Warehouse

Data Warehouse Reporting Big Data Data Lake

Empower financial analytics by creating structured knowledge bases using Amazon Bedrock and Amazon Redshift

AWS Big Data

MAY 20, 2025

Now with Amazon Bedrock Knowledge Bases integration with structured data, you can use simple, natural language prompts to query complex financial datasets. From customer portals to internal dashboards and mobile apps, this API-driven approach makes enterprise-grade data analysis accessible to everyone in your organization. Choose Next.

Structured Data

Structured Data Data Warehouse Analytics Finance

Cloudera Data Warehouse outperforms Azure HDInsight in TPC-DS benchmark

Cloudera

SEPTEMBER 29, 2020

Performance is one of the key criteria, if not the most important one, in choosing a Cloud Data Warehouse service. In today’s fast-changing world, enterprises have to make data-driven decisions quickly, and for that they rely heavily on their data warehouse service. Cloudera Data Warehouse vs HDInsight.

Data Warehouse

Data Warehouse Metadata Data-driven Machine Learning

Seamless integration of data lake and data warehouse using Amazon Redshift Spectrum and Amazon DataZone

AWS Big Data

AUGUST 15, 2024

Unifying these necessitates additional data processing, requiring each business unit to provision and maintain a separate data warehouse. This burdens business units focused solely on consuming the curated data for analysis and not concerned with data management tasks, cleansing, or comprehensive data processing.

Data Lake

Data Lake Data Warehouse Data Governance Publishing

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

For container terminal operators, data-driven decision-making and efficient data sharing are vital to optimizing operations and boosting supply chain efficiency. From here, the metadata is published to Amazon DataZone by using AWS Glue Data Catalog. This process is shown in the following figure.

IoT

IoT Machine Learning Metadata Data-driven

3x better performance with CDP Data Warehouse compared to EMR in TPC-DS benchmark

Cloudera

DECEMBER 11, 2020

In this blog post, we compare Cloudera Data Warehouse (CDW) on Cloudera Data Platform (CDP) using Apache Hive-LLAP to EMR 6.0 (also powered by Apache Hive-LLAP) on Amazon using the TPC-DS 2.9 benchmark. Cloudera Data Warehouse vs EMR. Learn more about Cloudera Data Warehouse on CDP.

Data Warehouse

Data Warehouse Metadata Machine Learning Measurement

Cloudera Data Warehouse Demonstrates Best-in-Class Cloud-Native Price-Performance

Cloudera

JANUARY 15, 2021

Cloud data warehouses allow users to run analytic workloads with greater agility, better isolation and scale, and lower administrative overhead than ever before. The results demonstrate superior price performance of Cloudera Data Warehouse on the full set of 99 queries from the TPC-DS benchmark. Introduction.

Data Warehouse

Data Warehouse Cost-Benefit Consulting Interactive

Key considerations when making a decision on a Cloud Data Warehouse

Cloudera

MAY 17, 2021

Making a decision on a cloud data warehouse is a big deal. Modernizing your data warehousing experience with the cloud means moving from dedicated, on-premises hardware focused on traditional relational analytics on structured data to a modern platform.

Data Warehouse

Data Warehouse Measurement Reporting Testing

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

AWS Big Data

DECEMBER 4, 2024

Within the ANZ enterprise data mesh strategy, aligning data mesh nodes with the ANZ Group’s divisional structure provides optimal alignment between data mesh principles and organizational structure, as shown in the following diagram. Consumer feedback and demand drive creation and maintenance of the data product.

Metadata

Metadata Data Governance Data Quality Data-driven

Top analytics announcements of AWS re:Invent 2024

AWS Big Data

FEBRUARY 26, 2025

Amazon SageMaker Lakehouse provides an open data architecture that reduces data silos and unifies data across Amazon Simple Storage Service (Amazon S3) data lakes, Redshift data warehouses, and third-party and federated data sources.

Analytics

Analytics Data Lake Metadata Data Warehouse

How HPE Aruba Supply Chain optimized cost and performance by migrating to an AWS modern data architecture

AWS Big Data

SEPTEMBER 11, 2024

Source systems: Aruba’s source repository includes data from three different operating regions in AMER, EMEA, and APJ, along with one worldwide (WW) data pipeline from varied sources like SAP S/4 HANA, Salesforce, Enterprise Data Warehouse (EDW), Enterprise Analytics Platform (EAP) SharePoint, and more.

Data Architecture

Data Architecture Optimization Data Warehouse Metadata

Open Data Lakehouse powered by Iceberg for all your Data Warehouse needs

Cloudera

APRIL 3, 2023

In this blog, we will share with you in detail how Cloudera integrates core compute engines including Apache Hive and Apache Impala in Cloudera Data Warehouse with Iceberg. We will publish follow-up blogs for other data services. Try Cloudera Data Warehouse (CDW) by signing up for a 60-day trial, or test drive CDP.

Data Warehouse

Data Warehouse Snapshot Metadata Cost-Benefit

Choosing the right Data Warehouse SQL Engine: Apache Hive LLAP vs Apache Impala

Cloudera

SEPTEMBER 24, 2020

Some of the most powerful results come from combining complementary superpowers, and the “dynamic duo” of Apache Hive LLAP and Apache Impala, both included in Cloudera Data Warehouse, is further evidence of this. Both Impala and Hive can operate at an unprecedented and massive scale, with many petabytes of data.

Data Warehouse

Data Warehouse Metadata Interactive Dashboards

Do I Need a Data Catalog?

erwin

JUNE 26, 2020

Given the value this sort of data-driven insight can provide, the reason organizations need a data catalog should become clearer. It’s no surprise that most organizations’ data is often fragmented and siloed across numerous sources (e.g., Three Types of Metadata in a Data Catalog. Technical Metadata.

Metadata

Metadata Cost-Benefit Measurement Data-driven

Data’s dark secret: Why poor quality cripples AI and growth

CIO Business Intelligence

APRIL 8, 2025

We also examine how centralized, hybrid and decentralized data architectures support scalable, trustworthy ecosystems. As data-centric AI, automated metadata management and privacy-aware data sharing mature, the opportunity to embed data quality into the enterprise’s core has never been more significant.

Data Quality

Data Quality Data-driven Key Performance Indicator Metadata

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

Cloudinary is a cloud-based media management platform that provides a comprehensive set of tools and services for managing, optimizing, and delivering images, videos, and other media assets on websites and mobile applications. This concept makes Iceberg extremely versatile.

Data Lake

Data Lake Metadata Snapshot Analytics

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

In 2013, Amazon Web Services revolutionized the data warehousing industry by launching Amazon Redshift, the first fully-managed, petabyte-scale, enterprise-grade cloud data warehouse. Amazon Redshift made it simple and cost-effective to efficiently analyze large volumes of data using existing business intelligence tools.

Data Warehouse

Data Warehouse Analytics Data Lake Machine Learning

A Cost-Effective Data Warehouse Solution in CDP Public Cloud – Part1

Cloudera

FEBRUARY 9, 2021

Today’s customers have a growing need for faster end-to-end data ingestion to meet the expected speed of insights and overall business demand. This ‘need for speed’ drives a rethink on building a more modern data warehouse solution, one that balances speed with platform cost management, performance, and reliability.

Data Warehouse

Data Warehouse Cost-Benefit Metadata Management

How to Build a Performant Data Warehouse in Redshift

Sisense

SEPTEMBER 3, 2019

This blog is intended to give an overview of the considerations you’ll want to make as you build your Redshift data warehouse to ensure you are getting the optimal performance. Amazon describes the dense storage nodes (DS2) as optimized for large data workloads; they use hard disk drives (HDD) for storage.

Data Warehouse

Data Warehouse OLAP Statistics Cost-Benefit

Enabling Self-Service Business Insights with Cloudera Data Warehouse

Cloudera

JANUARY 11, 2021

How self-service data warehousing frees IT resources. Cloudera Data Warehouse (CDW) is a cloud service and an integral part of the newly released Cloudera Data Platform (CDP). Key features are: Highly scalable and performant open-source engines for BI and data warehousing workloads. Simplified provisioning.

Data Warehouse

Data Warehouse Data Lake Cost-Benefit Machine Learning

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

AWS Big Data

OCTOBER 1, 2024

The external data catalog can be AWS Glue Data Catalog, the data catalog that comes with Amazon Athena, or your own Apache Hive metastore. To get the best performance on data lake queries with Redshift, you can use AWS Glue Data Catalog’s column statistics feature to collect statistics on Data Lake tables.

Data Lake

Data Lake Statistics Broadcasting Optimization

AWS re:Invent 2023 Amazon Redshift Sessions Recap

AWS Big Data

DECEMBER 18, 2023

Amazon Redshift powers data-driven decisions for tens of thousands of customers every day with a fully managed, AI-powered cloud data warehouse, delivering the best price-performance for your analytics workloads.

Data Warehouse

Data Warehouse Machine Learning Data-driven Data Lake

What is a data architect? Skills, salaries, and how to become a data framework master

CIO Business Intelligence

OCTOBER 13, 2023

Data architect Armando Vázquez identifies eight common types of data architects: Enterprise data architect: These data architects oversee an organization’s overall data architecture, defining data architecture strategy and designing and implementing architectures.

Data Architecture

Data Architecture Data Warehouse Statistics Visualization

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

AWS Big Data

AUGUST 31, 2023

Amazon Redshift is a fast, fully managed petabyte-scale cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing business intelligence (BI) tools. Amazon Redshift also supports querying nested data with complex data types such as struct, array, and map.

Data Lake

Data Lake Data Warehouse Metadata Data Architecture

Building a Beautiful Data Lakehouse

CIO Business Intelligence

MARCH 9, 2022

But the data repository options that have been around for a while tend to fall short in their ability to serve as the foundation for big data analytics powered by AI. Traditional data warehouses, for example, support datasets from multiple sources but require a consistent data structure.

Data Lake

Data Lake Unstructured Data Data Warehouse Big Data

Power enterprise-grade Data Vaults with Amazon Redshift – Part 1

AWS Big Data

NOVEMBER 16, 2023

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x

Enterprise

Enterprise Data Warehouse Data Lake Optimization

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

AWS Big Data

DECEMBER 4, 2024

With this new functionality, customers can create up-to-date replicas of their data from applications such as Salesforce, ServiceNow, and Zendesk in an Amazon SageMaker Lakehouse and Amazon Redshift. SageMaker Lakehouse gives you the flexibility to access and query your data in-place with all Apache Iceberg compatible tools and engines.

Data Integration

Data Integration Data Lake Statistics Data-driven

Get Your Analytics Insights Instantly – Without Abandoning Central IT

Cloudera

JANUARY 21, 2021

While cloud-native, point-solution data warehouse services may serve your immediate business needs, there are dangers to the corporation as a whole when you do your own IT this way. Cloudera Data Warehouse (CDW) is here to save the day! CDW is an integrated data warehouse service within Cloudera Data Platform (CDP).

Data Warehouse

Data Warehouse Data Lake IT Analytics

Altus Data Warehouse

Cloudera

SEPTEMBER 9, 2018

We are proud to announce the general availability of Cloudera Altus Data Warehouse, the only cloud data warehousing service that brings the warehouse to the data. Modern data warehousing for the cloud. Using Cloudera Altus for your cloud data warehouse.

Data Warehouse

Data Warehouse Metadata Cost-Benefit Reporting

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

To create and manage the data products, smava uses Amazon Redshift, a cloud data warehouse. In this post, we show how smava optimized their data platform by using Amazon Redshift Serverless and Amazon Redshift data sharing to overcome right-sizing challenges for unpredictable workloads and further improve price-performance.

Data Lake

Data Lake Data Warehouse Data-driven B2B

Biggest Trends in Data Visualization Taking Shape in 2022

Smart Data Collective

OCTOBER 13, 2021

It can control changes in the sources from which it extracts data and includes Data Lineage capabilities, which means confidence for the user. How is Data Virtualization performance optimized? How does Data Virtualization complement Data Warehousing and SOA Architectures? In improving operational processes.

Visualization

Visualization Cost-Benefit Big Data Prescriptive Analytics

Improve performance of workloads containing repetitive scan filters with multidimensional data layout sort keys in Amazon Redshift

AWS Big Data

NOVEMBER 27, 2023

Amazon Redshift, the most widely used cloud data warehouse, has evolved significantly to meet the performance requirements of the most demanding workloads. This post covers one such new feature—the multidimensional data layout sort key. Refer to Working with automatic table optimization for more details on ATO.

Data Warehouse

Data Warehouse Cost-Benefit Optimization Testing

Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

AWS Big Data

NOVEMBER 16, 2023

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x

Enterprise

Enterprise Data Warehouse Snapshot Cost-Benefit

Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

MARCH 2, 2023

Apache Iceberg is an open table format for very large analytic datasets, which captures metadata information on the state of datasets as they evolve and change over time. Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback.

Data Lake

Data Lake Data Processing Metadata Snapshot

BMW Cloud Efficiency Analytics powered by Amazon QuickSight and Amazon Athena

AWS Big Data

NOVEMBER 15, 2023

BMW Group uses 4,500 AWS Cloud accounts across the entire organization but is faced with the challenge of reducing unnecessary costs, optimizing spend, and having a central place to monitor costs. The ultimate goal is to raise awareness of cloud efficiency and optimize cloud utilization in a cost-effective and sustainable manner.

Analytics

Analytics Dashboards Metadata Data Warehouse

Implement historical record lookup and Slowly Changing Dimensions Type-2 using Apache Iceberg

AWS Big Data

DECEMBER 9, 2024

Inventory management benefits from historical data for analyzing sales patterns and optimizing stock levels. In fraud detection, historical data helps identify anomalous patterns in transactions or user behaviors. In customer relationship management, it tracks changes in customer information over time.

Snapshot

Snapshot Data Warehouse Data Lake Data Quality

Extreme data center pressure? Burst to the cloud with CDP!

Cloudera

NOVEMBER 12, 2020

Burst to Cloud not only relieves pressure on your data center, but it also protects your VIP applications and users by giving them optimal performance without breaking your bank. Cloud deployments for suitable workloads give you the agility to keep pace with rapidly changing business and data needs. You are probably hesitant.

Data Warehouse

Data Warehouse Reporting Risk Cost-Benefit

Integrating Data Governance and Enterprise Architecture

erwin

SEPTEMBER 3, 2020

Data governance and EA also provide many of the same benefits of enterprise architecture or business process modeling projects: reducing risk, optimizing operations, and increasing the use of trusted data. Automating Data Governance and Enterprise Architecture.

Data Governance

Data Governance Enterprise Risk Data Lake

Choosing an open table format for your transactional data lake on AWS

AWS Big Data

JUNE 9, 2023

A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.

Data Lake

Data Lake Metadata Statistics Optimization

How AppsFlyer modernized their interactive workload by moving to Amazon Athena and saved 80% of costs

AWS Big Data

AUGUST 8, 2024

It’s designed to make it straightforward for users to analyze data stored in Amazon Simple Storage Service (Amazon S3) using standard SQL queries. We dive into the various optimization techniques AppsFlyer employed, such as partition projection, sorting, parallel query runs, and query result reuse.

Interactive

Interactive Metadata Optimization Testing

The Security Challenges of Data Warehousing in the Cloud

Cloudera

NOVEMBER 5, 2020

Many organizations struggle to meet growing and variable data warehouse demands. How do you control data privacy and protect against data breaches when the data is spread across so many different systems? How do you optimize your enterprise-wide infrastructure (mostly cloud) and application expenditures?

Data Lake

Data Lake Data Warehouse Metadata Optimization
