Blog, Data Warehouse and Metadata

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

AWS Big Data

OCTOBER 14, 2024

Enterprise data is brought into data lakes and data warehouses to carry out analytical, reporting, and data science use cases using AWS analytical services like Amazon Athena , Amazon Redshift , Amazon EMR , and so on. Table metadata is fetched from AWS Glue. The generated Athena SQL query is run.

Metadata

Metadata Data Lake Modeling Data Warehouse

Expand data access through Apache Iceberg using Delta Lake UniForm on AWS

AWS Big Data

NOVEMBER 14, 2024

You can learn how to query Delta Lake native tables through UniForm from different data warehouses or engines such as Amazon Redshift as an example of expanding data access to more engines. Both Delta Lake and Iceberg metadata files reference the same data files.

Metadata

Metadata Data Warehouse Big Data Data Lake

Data Governance and Metadata Management: You Can’t Have One Without the Other

erwin

FEBRUARY 13, 2020

When an organization’s data governance and metadata management programs work in harmony, then everything is easier. Data governance is a complex but critical practice. There’s always more data to handle, much of it unstructured; more data sources, like IoT, more points of integration, and more regulatory compliance requirements.

Metadata

Metadata Data Governance Management Cost-Benefit

Webinars

Automation, Evolved: Your New Playbook For Smarter Knowledge Work

MORE WEBINARS

Recap of Amazon Redshift key product announcements in 2024

AWS Big Data

DECEMBER 17, 2024

Amazon Redshift , launched in 2013, has undergone significant evolution since its inception, allowing customers to expand the horizons of data warehousing and SQL analytics. Industry-leading price-performance Amazon Redshift offers up to three times better price-performance than alternative cloud data warehouses.

Data Lake

Data Lake Data Warehouse Data-driven Optimization

Cloudera Data Warehouse outperforms Azure HDInsight in TPC-DS benchmark

Cloudera

SEPTEMBER 29, 2020

Performance is one of the key, if not the most important deciding criterion, in choosing a Cloud Data Warehouse service. In today’s fast changing world, enterprises have to make data driven decisions quickly and for that they rely heavily on their data warehouse service. . Cloudera Data Warehouse vs HDInsight.

Data Warehouse

Data Warehouse Metadata Data-driven Machine Learning

7 Benefits of Metadata Management

erwin

FEBRUARY 19, 2021

Metadata management is key to wringing all the value possible from data assets. However, most organizations don’t use all the data at their disposal to reach deeper conclusions about how to drive revenue, achieve regulatory compliance or accomplish other strategic objectives. What Is Metadata? Harvest data.

Metadata

Metadata Management Data Quality Cost-Benefit

3x better performance with CDP Data Warehouse compared to EMR in TPC-DS benchmark

Cloudera

DECEMBER 11, 2020

In a previous blog post on CDW performance, we compared Azure HDInsight to CDW. In this blog post, we compare Cloudera Data Warehouse (CDW) on Cloudera Data Platform (CDP) using Apache Hive-LLAP to EMR 6.0 (also powered by Apache Hive-LLAP) on Amazon using the TPC-DS 2.9 More on this later in the blog.

Data Warehouse

Data Warehouse Metadata Machine Learning Measurement

Cloudera Data Warehouse Demonstrates Best-in-Class Cloud-Native Price-Performance

Cloudera

JANUARY 15, 2021

Cloud data warehouses allow users to run analytic workloads with greater agility, better isolation and scale, and lower administrative overhead than ever before. The results demonstrate superior price performance of Cloudera Data Warehouse on the full set of 99 queries from the TPC-DS benchmark. Introduction.

Data Warehouse

Data Warehouse Cost-Benefit Consulting Interactive

Four Use Cases Proving the Benefits of Metadata-Driven Automation

erwin

FEBRUARY 7, 2019

Organization’s cannot hope to make the most out of a data-driven strategy, without at least some degree of metadata-driven automation. The volume and variety of data has snowballed, and so has its velocity. As such, traditional – and mostly manual – processes associated with data management and data governance have broken down.

Metadata

Metadata Insurance Data-driven Cost-Benefit

Key considerations when making a decision on a Cloud Data Warehouse

Cloudera

MAY 17, 2021

Making a decision on a cloud data warehouse is a big deal. Modernizing your data warehousing experience with the cloud means moving from dedicated, on-premises hardware focused on traditional relational analytics on structured data to a modern platform.

Data Warehouse

Data Warehouse Measurement Reporting Testing

The Role Of Data Warehousing In Your Business Intelligence Architecture

datapine

MAY 29, 2019

One of the BI architecture components is data warehousing. Organizing, storing, cleaning, and extraction of the data must be carried by a central repository system, namely data warehouse, that is considered as the fundamental component of business intelligence. What Is Data Warehousing And Business Intelligence?

Business Intelligence

Business Intelligence Data Warehouse Dashboards Visualization

What is a Data Mesh?

DataKitchen

AUGUST 3, 2021

The past decades of enterprise data platform architectures can be summarized in 69 words. First-generation – expensive, proprietary enterprise data warehouse and business intelligence platforms maintained by a specialized team drowning in technical debt. The post What is a Data Mesh? first appeared on DataKitchen.

Data Architecture

Data Architecture Data Lake Cost-Benefit Data Warehouse

How Metadata Makes Data Meaningful

erwin

DECEMBER 12, 2019

Metadata is an important part of data governance, and as a result, most nascent data governance programs are rife with project plans for assessing and documenting metadata. But in many scenarios, it seems that the underlying driver of metadata collection projects is that it’s just something you do for data governance.

Metadata

Metadata Data Governance Digital Transformation Data Quality

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

AWS Big Data

DECEMBER 4, 2024

Amazon SageMaker Lakehouse , now generally available, unifies all your data across Amazon Simple Storage Service (Amazon S3) data lakes and Amazon Redshift data warehouses, helping you build powerful analytics and AI/ML applications on a single copy of data. Having confidence in your data is key.

Data Analytics

Data Analytics Analytics Data Lake Data Quality

Open Data Lakehouse powered by Iceberg for all your Data Warehouse needs

Cloudera

APRIL 3, 2023

In this blog, we will share with you in detail how Cloudera integrates core compute engines including Apache Hive and Apache Impala in Cloudera Data Warehouse with Iceberg. We will publish follow up blogs for other data services. Iceberg basics Iceberg is an open table format designed for large analytic workloads.

Data Warehouse

Data Warehouse Snapshot Metadata Cost-Benefit

Choosing the right Data Warehouse SQL Engine: Apache Hive LLAP vs Apache Impala

Cloudera

SEPTEMBER 24, 2020

Some of the most powerful results come from combining complementary superpowers, and the “dynamic duo” of Apache Hive LLAP and Apache Impala, both included in Cloudera Data Warehouse , is further evidence of this. Both Impala and Hive can operate at an unprecedented and massive scale, with many petabytes of data.

Data Warehouse

Data Warehouse Metadata Interactive Dashboards

Cloud Data Warehouse Migration 101: Expert Tips

Alation

JULY 28, 2022

It’s costly and time-consuming to manage on-premises data warehouses — and modern cloud data architectures can deliver business agility and innovation. However, CIOs declare that agility, innovation, security, adopting new capabilities, and time to value — never cost — are the top drivers for cloud data warehousing.

Data Warehouse

Data Warehouse Cost-Benefit Data-driven Data Governance

Metadata, the Neglected Stepchild of IT

Data Virtualization

DECEMBER 8, 2022

Reading Time: 3 minutes While cleaning up our archive recently, I found an old article published in 1976 about data dictionary/directory systems (DD/DS). Nowadays, we no longer use the term DD/DS, but “data catalog” or simply “metadata system”. It was written by L.

Metadata

Metadata IT Data Integration Publishing

A Cost-Effective Data Warehouse Solution in CDP Public Cloud – Part1

Cloudera

FEBRUARY 9, 2021

Today’s customers have a growing need for a faster end to end data ingestion to meet the expected speed of insights and overall business demand. This ‘need for speed’ drives a rethink on building a more modern data warehouse solution, one that balances speed with platform cost management, performance, and reliability.

Data Warehouse

Data Warehouse Cost-Benefit Metadata Management

Do I Need a Data Catalog?

erwin

JUNE 26, 2020

Given the value this sort of data-driven insight can provide, the reason organizations need a data catalog should become clearer. It’s no surprise that most organizations’ data is often fragmented and siloed across numerous sources (e.g., Three Types of Metadata in a Data Catalog. Technical Metadata.

Metadata

Metadata Cost-Benefit Measurement Data-driven

Dark Data: How to Find It and What to Do with It

Timo Elliott

JANUARY 6, 2022

Like the proverbial man looking for his keys under the streetlight , when it comes to enterprise data, if you only look at where the light is already shining, you can end up missing a lot. The data you’ve collected and saved over the years isn’t free. Analyze your metadata. Data sense-making. Storing data isn’t enough.

IT

IT Metadata Data-driven Data Governance

The Data Lakehouse: Blending Data Warehouses and Data Lakes

Data Virtualization

APRIL 21, 2022

Reading Time: 3 minutes First we had data warehouses, then came data lakes, and now the new kid on the block is the data lakehouse. But what is a data lakehouse and why should we develop one? In a way, the name describes what.

Data Lake

Data Lake Data Warehouse Data Integration Management

How to Build a Performant Data Warehouse in Redshift

Sisense

SEPTEMBER 3, 2019

This blog is intended to give an overview of the considerations you’ll want to make as you build your Redshift data warehouse to ensure you are getting the optimal performance. This results in less joins between the metric data in fact tables, and the dimensions. So let’s dive in! OLTP vs OLAP.

Data Warehouse

Data Warehouse OLAP Statistics Cost-Benefit

Enabling Self-Service Business Insights with Cloudera Data Warehouse

Cloudera

JANUARY 11, 2021

How self-service data warehousing frees IT resources. Cloudera Data Warehouse (CDW) is a cloud service and an integral part of the newly released Cloudera Data Platform (CDP). Key features are: Highly scalable and performant open-source engines for BI and data warehousing workloads. Simplified provisioning.

Data Warehouse

Data Warehouse Data Lake Cost-Benefit Machine Learning

Constructing A Digital Transformation Strategy: Putting the Data in Digital Transformation

erwin

JULY 17, 2019

Part Two of the Digital Transformation Journey … In our last blog on driving digital transformation , we explored how enterprise architecture (EA) and business process (BP) modeling are pivotal factors in a viable digital transformation strategy. Analyze metadata – Understand how data relates to the business and what attributes it has.

Digital Transformation

Digital Transformation Strategy Metadata Data-driven

Next Stop – Building a Data Pipeline from Edge to Insight

Cloudera

FEBRUARY 8, 2021

This is part 2 in this blog series. You can read part 1, here: Digital Transformation is a Data Journey From Edge to Insight. The first blog introduced a mock connected vehicle manufacturing company, The Electric Car Company (ECC), to illustrate the manufacturing data path through the data lifecycle.

Manufacturing

Manufacturing Data Warehouse Sales Predictive Analytics

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

AWS Big Data

DECEMBER 4, 2024

With this new functionality, customers can create up-to-date replicas of their data from applications such as Salesforce, ServiceNow, and Zendesk in an Amazon SageMaker Lakehouse and Amazon Redshift. SageMaker Lakehouse gives you the flexibility to access and query your data in-place with all Apache Iceberg compatible tools and engines.

Data Integration

Data Integration Data Lake Statistics Data-driven

Altus Data Warehouse

Cloudera

SEPTEMBER 9, 2018

We are proud to announce the general availability of Cloudera Altus Data Warehouse , the only cloud data warehousing service that brings the warehouse to the data. Modern data warehousing for the cloud. Modern data warehousing for the cloud. Using Cloudera Altus for your cloud data warehouse.

Data Warehouse

Data Warehouse Metadata Cost-Benefit Reporting

How Metadata Makes Data Meaningful

erwin

DECEMBER 12, 2019

Metadata is an important part of data governance, and as a result, most nascent data governance programs are rife with project plans for assessing and documenting metadata. But in many scenarios, it seems that the underlying driver of metadata collection projects is that it’s just something you do for data governance.

Metadata

Metadata Data Governance Digital Transformation Data Quality

Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

MARCH 2, 2023

Apache Iceberg is an open table format for very large analytic datasets, which captures metadata information on the state of datasets as they evolve and change over time. Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback.

Data Lake

Data Lake Data Processing Metadata Snapshot

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

In this blog post, we dive into different data aspects and how Cloudinary breaks the two concerns of vendor locking and cost efficient data analytics by using Apache Iceberg, Amazon Simple Storage Service (Amazon S3 ), Amazon Athena , Amazon EMR , and AWS Glue. Old metadata files are kept for history by default.

Data Lake

Data Lake Metadata Snapshot Analytics

Where Do Data Catalogs Fit in Metadata Management?

Alation

FEBRUARY 13, 2020

In an earlier blog, I defined a data catalog as “a collection of metadata, combined with data management and search tools, that helps analysts and other data users to find the data that they need, serves as an inventory of available data, and provides information to evaluate fitness data for intended uses.”.

Metadata

Metadata Management Data Lake Data Governance

Accenture’s Smart Data Transition Toolkit Now Available for Cloudera Data Platform

Cloudera

AUGUST 31, 2021

Cloudera and Accenture demonstrate strength in their relationship with an accelerator called the Smart Data Transition Toolkit for migration of legacy data warehouses into Cloudera Data Platform. Accenture’s Smart Data Transition Toolkit . Are you looking for your data warehouse to support the hybrid multi-cloud?

Data Warehouse

Data Warehouse Cost-Benefit Metadata Data-driven

Benefits of Enterprise Modeling and Data Intelligence Solutions

erwin

JULY 2, 2020

This matters because, as he said, “By placing the data and the metadata into a model, which is what the tool does, you gain the abilities for linkages between different objects in the model, linkages that you cannot get on paper or with Visio or PowerPoint.” Data Modeling with erwin Data Modeler. George H.,

Enterprise

Enterprise Modeling Metadata Data Governance

Migrate from Google BigQuery to Amazon Redshift using AWS Glue and Custom Auto Loader Framework

AWS Big Data

JUNE 2, 2023

Amazon Redshift is a widely used, fully managed, petabyte-scale cloud data warehouse. Tens of thousands of customers use Amazon Redshift to process exabytes of data every day to power their analytic workloads. This JSON file contains the migration metadata, namely the following: A list of Google BigQuery projects and datasets.

Metadata

Metadata Data Warehouse Big Data Analytics

The Benefits of a Knowledge Graph-based Metadata Hub

Ontotext

DECEMBER 15, 2022

But whatever their business goals, in order to turn their invisible data into a valuable asset, they need to understand what they have and to be able to efficiently find what they need. Enter metadata. It enables us to make sense of our data because it tells us what it is and how best to use it. Knowledge (metadata) layer.

Metadata

Metadata Unstructured Data Structured Data Enterprise

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

AWS Big Data

AUGUST 31, 2023

Amazon Redshift is a fast, fully managed petabyte-scale cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing business intelligence (BI) tools. Amazon Redshift also supports querying nested data with complex data types such as struct, array, and map.

Data Lake

Data Lake Data Warehouse Metadata Data Architecture

Materialized Views in Hive for Iceberg Table Format

Cloudera

FEBRUARY 8, 2024

Overview This blog post describes support for materialized views for the Iceberg table format. It brings the reliability and simplicity of SQL tables to big data while enabling engines like Hive, Impala, Spark, Trino, Flink, and Presto to work with the same tables at the same time. Starting from the CDW Public Cloud DWX-1.6.1

Snapshot

Snapshot Metadata Cost-Benefit Data Warehouse

How HPE Aruba Supply Chain optimized cost and performance by migrating to an AWS modern data architecture

AWS Big Data

SEPTEMBER 11, 2024

This blog post is co-written with Hardeep Randhawa and Abhay Kumar from HPE. The data sources include 150+ files including 10-15 mandatory files per region ingested in various formats like xlxs, csv, and dat. Each file arrives as a pair with a tail metadata file in CSV format containing the size and name of the file.

Data Architecture

Data Architecture Optimization Data Warehouse Metadata

The Security Challenges of Data Warehousing in the Cloud

Cloudera

NOVEMBER 5, 2020

Many organizations struggle to meet growing and variable data warehouse demands. This is exactly what Cloudera Data Platform (CDP) provides to the Cloudera Data Warehouse. CDP is a data platform that is optimized for both business units and central IT. . Cloudera Data Warehouse Security.

Data Lake

Data Lake Data Warehouse Metadata Optimization

The Future of the Data Lakehouse – Open

Cloudera

JUNE 18, 2022

These lakes power mission critical large scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. In recent years, the term “data lakehouse” was coined to describe this architectural pattern of tabular analytics over data in the data lake.

Data Lake

Data Lake Data Warehouse Machine Learning Data-driven

Get Your Analytics Insights Instantly – Without Abandoning Central IT

Cloudera

JANUARY 21, 2021

While cloud-native, point-solution data warehouse services may serve your immediate business needs, there are dangers to the corporation as a whole when you do your own IT this way. Cloudera Data Warehouse (CDW) is here to save the day! CDW is an integrated data warehouse service within Cloudera Data Platform (CDP).

Data Warehouse

Data Warehouse Data Lake IT Analytics

Unlock data across organizational boundaries using Amazon DataZone – now generally available

AWS Big Data

OCTOBER 4, 2023

Additionally, data owners and data stewards can make data discovery simpler by adding business context to data while balancing access governance to the data via pre-defined approval workflows in the user interface. The metadata forms types, and asset types can be used as templates for defining your assets.

Metadata

Metadata Data Lake Publishing Data Governance

Biggest Trends in Data Visualization Taking Shape in 2022

Smart Data Collective

OCTOBER 13, 2021

There is no disputing the fact that the collection and analysis of massive amounts of unstructured data has been a huge breakthrough. This is something that you can learn more about in just about any technology blog. We would like to talk about data visualization and its role in the big data movement.

Visualization

Visualization Cost-Benefit Big Data Prescriptive Analytics

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

Expand data access through Apache Iceberg using Delta Lake UniForm on AWS

Webinars

Trending Sources

Data Governance and Metadata Management: You Can’t Have One Without the Other

Webinars

Recap of Amazon Redshift key product announcements in 2024

Cloudera Data Warehouse outperforms Azure HDInsight in TPC-DS benchmark

7 Benefits of Metadata Management

3x better performance with CDP Data Warehouse compared to EMR in TPC-DS benchmark

Cloudera Data Warehouse Demonstrates Best-in-Class Cloud-Native Price-Performance

Four Use Cases Proving the Benefits of Metadata-Driven Automation

Key considerations when making a decision on a Cloud Data Warehouse

The Role Of Data Warehousing In Your Business Intelligence Architecture

What is a Data Mesh?

How Metadata Makes Data Meaningful

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

Open Data Lakehouse powered by Iceberg for all your Data Warehouse needs

Choosing the right Data Warehouse SQL Engine: Apache Hive LLAP vs Apache Impala

Cloud Data Warehouse Migration 101: Expert Tips

Metadata, the Neglected Stepchild of IT

A Cost-Effective Data Warehouse Solution in CDP Public Cloud – Part1

Do I Need a Data Catalog?

Dark Data: How to Find It and What to Do with It

The Data Lakehouse: Blending Data Warehouses and Data Lakes

How to Build a Performant Data Warehouse in Redshift

Enabling Self-Service Business Insights with Cloudera Data Warehouse

Constructing A Digital Transformation Strategy: Putting the Data in Digital Transformation

Next Stop – Building a Data Pipeline from Edge to Insight

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

Altus Data Warehouse

How Metadata Makes Data Meaningful

Use Apache Iceberg in a data lake to support incremental data processing

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

Where Do Data Catalogs Fit in Metadata Management?

Accenture’s Smart Data Transition Toolkit Now Available for Cloudera Data Platform

Benefits of Enterprise Modeling and Data Intelligence Solutions

Migrate from Google BigQuery to Amazon Redshift using AWS Glue and Custom Auto Loader Framework

The Benefits of a Knowledge Graph-based Metadata Hub

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

Materialized Views in Hive for Iceberg Table Format

How HPE Aruba Supply Chain optimized cost and performance by migrating to an AWS modern data architecture

The Security Challenges of Data Warehousing in the Cloud

The Future of the Data Lakehouse – Open

Get Your Analytics Insights Instantly – Without Abandoning Central IT

Unlock data across organizational boundaries using Amazon DataZone – now generally available

Biggest Trends in Data Visualization Taking Shape in 2022

Stay Connected