2023, Data Warehouse and Metadata

SAP Datasphere Powers Business at the Speed of Data

Rocket-Powered Data Science

MARCH 20, 2023

In fact, by putting a single label like AI on all the steps of a data-driven business process, we have effectively not only blurred the process, but we have also blurred the particular characteristics that make each step separately distinct, uniquely critical, and ultimately dependent on specialized, specific technologies at each step.

Data Warehouse

Data Warehouse Metadata Digital Transformation Machine Learning

Write queries faster with Amazon Q generative SQL for Amazon Redshift

AWS Big Data

NOVEMBER 7, 2024

Amazon Redshift is a fully managed, AI-powered cloud data warehouse that delivers the best price-performance for your analytics workloads at any scale. Amazon Q generative SQL for Amazon Redshift was launched in preview during AWS re:Invent 2023. Within this feature, user data is secure and private.

Metadata

Metadata Sales Data Warehouse Optimization

Run Apache XTable in AWS Lambda for background conversion of open table formats

AWS Big Data

NOVEMBER 26, 2024

This post was co-written with Dipankar Mazumdar, Staff Data Engineering Advocate with AWS Partner OneHouse. Data architecture has evolved significantly to handle growing data volumes and diverse workloads. The synchronization process in XTable works by translating table metadata using the existing APIs of these table formats.

Metadata

Metadata Data Lake Snapshot Data Warehouse

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

In 2013, Amazon Web Services revolutionized the data warehousing industry by launching Amazon Redshift , the first fully-managed, petabyte-scale, enterprise-grade cloud data warehouse. Amazon Redshift made it simple and cost-effective to efficiently analyze large volumes of data using existing business intelligence tools.

Data Warehouse

Data Warehouse Analytics Data Lake Machine Learning

AWS re:Invent 2023 Amazon Redshift Sessions Recap

AWS Big Data

DECEMBER 18, 2023

Amazon Redshift powers data-driven decisions for tens of thousands of customers every day with a fully managed, AI-powered cloud data warehouse, delivering the best price-performance for your analytics workloads.

Data Warehouse

Data Warehouse Machine Learning Data-driven Data Lake

Top Opportunities for SAP Partners in 2023

Timo Elliott

NOVEMBER 30, 2022

My role was to talk about the trends and opportunities for 2023, for customers, SAP, and our partners. Because of technology limitations, we have always had to start by ripping information from the business systems and moving it to a different platform—a data warehouse, data lake, data lakehouse, data cloud.

Recreation/Entertainment

Recreation/Entertainment Metadata Data Warehouse Cost-Benefit

The Increasing Importance of Open Table Formats

David Menninger's Analyst Perspectives

OCTOBER 31, 2024

It was not until the addition of open table formats— specifically Apache Hudi, Apache Iceberg and Delta Lake—that data lakes truly became capable of supporting multiple business intelligence (BI) projects as well as data science and even operational applications and, in doing so, began to evolve into data lakehouses.

Data Lake

Data Lake Unstructured Data Data Warehouse Software

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

They enable transactions on top of data lakes and can simplify data storage, management, ingestion, and processing. These transactional data lakes combine features from both the data lake and the data warehouse. The Iceberg table is synced with the AWS Glue Data Catalog.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

MARCH 2, 2023

Apache Iceberg is an open table format for very large analytic datasets, which captures metadata information on the state of datasets as they evolve and change over time. Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback.

Data Lake

Data Lake Data Processing Metadata Snapshot

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

AWS Big Data

OCTOBER 1, 2024

Performance was tested on a Redshift serverless data warehouse with 128 RPU. In our testing, the dataset was stored in Amazon S3 in Parquet format and AWS Glue Data Catalog was used to manage external databases and tables. He works on the intersection of data lakes and data warehouses.

Data Lake

Data Lake Statistics Broadcasting Optimization

Use Amazon Athena with Spark SQL for your open-source transactional table formats

AWS Big Data

JANUARY 24, 2024

These formats enable ACID (atomicity, consistency, isolation, durability) transactions, upserts, and deletes, and advanced features such as time travel and snapshots that were previously only available in data warehouses. The output will give a count of the number of data and metadata files deleted.

Snapshot

Snapshot Data Lake Metadata Optimization

Introducing watsonx: The future of AI for business

IBM Big Data Hub

MAY 9, 2023

With watsonx.data , businesses can quickly connect to data, get trusted insights and reduce data warehouse costs. A data store built on open lakehouse architecture, it runs both on premises and across multi-cloud environments. Savings may vary depending on configurations, workloads and vendors.

Data Warehouse

Data Warehouse Machine Learning Cost-Benefit Metadata

BMW Cloud Efficiency Analytics powered by Amazon QuickSight and Amazon Athena

AWS Big Data

NOVEMBER 15, 2023

It seamlessly consolidates data from various data sources within AWS, including AWS Cost Explorer (and forecasting with Cost Explorer ), AWS Trusted Advisor , and AWS Compute Optimizer. Overview of the BMW Cloud Data Hub At the BMW Group, Cloud Data Hub (CDH) is the central platform for managing company-wide data and data solutions.

Dashboards

Dashboards Analytics Metadata Data Warehouse

Materialized Views in Hive for Iceberg Table Format

Cloudera

FEBRUARY 8, 2024

Cloudera Data Warehouse (CDW) running Hive has previously supported creating materialized views against Hive ACID source tables. release and the matching CDW Private Cloud Data Services release, Hive also supports creating, using, and rebuilding materialized views for Iceberg table format.

Snapshot

Snapshot Metadata Cost-Benefit Data Warehouse

Cloudera Named a Visionary in the Gartner MQ for Cloud DBMS

Cloudera

APRIL 1, 2024

We’re excited to share that Gartner has recognized Cloudera as a Visionary among all vendors evaluated in the 2023 Gartner® Magic Quadrant for Cloud Database Management Systems. Download the complimentary 2023 Gartner Magic Quadrant for Cloud Database Management Systems report.

Unstructured Data

Unstructured Data Cost-Benefit Metadata Machine Learning

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

AWS Big Data

MAY 24, 2023

RIO is really great",date("2023-04-06"),2023)""") You can check the new snapshot is created after this append operation by querying the Iceberg snapshot: spark.sql("""SELECT * FROM dev.db.amazon_reviews_iceberg.snapshots""").show() When you want to access the data back, you can bulk restore the archived objects.

Data Lake

Data Lake Snapshot Metadata Optimization

Educating ChatGPT on Data Lakehouse

Cloudera

MARCH 17, 2023

I took the free version of ChatGPT on a test drive (in March 2023) and asked some simple questions on data lakehouse and its components. Hopefully this blog will give ChatGPT an opportunity to learn and correct itself while counting towards my 2023 contribution to social good. I thought this was a fairly comprehensive list.

Unstructured Data

Unstructured Data Data Lake Data Warehouse Machine Learning

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

AWS Big Data

JULY 21, 2023

This leads to having data across many instances of data warehouses and data lakes using a modern data architecture in separate AWS accounts. We recently announced the integration of Amazon Redshift data sharing with AWS Lake Formation. See Managing LF-Tags for metadata access control for more details.

Data Lake

Data Lake Data Warehouse Marketing Management

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

MARCH 7, 2024

At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with a consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.

Data Lake

Data Lake Analytics Dashboards Metrics

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

AWS Big Data

MARCH 29, 2024

Data Firehose uses an AWS Lambda function to transform data and ingest the transformed records into an Amazon Simple Storage Service (Amazon S3) bucket. An AWS Glue crawler scans data on the S3 bucket and populates table metadata on the AWS Glue Data Catalog. Let’s drill down into details.

Metrics

Metrics Visualization Dashboards Publishing

Empower Your Cyber Defenders with Real-Time Analytics

Cloudera

NOVEMBER 15, 2024

In fact, according to the Identity Theft Resource Center (ITRC) Annual Data Breach Report , there were 2,365 cyber attacks in 2023 with more than 300 million victims, and a 72% increase in data breaches since 2021. Real-Time Threat Detection with Iceberg Cyber log data is massive and constantly evolving.

Analytics

Analytics Metadata Snapshot Data-driven

Exploring the AI and data capabilities of watsonx

IBM Big Data Hub

JULY 17, 2023

It is supported by querying, governance, and open data formats to access and share data across the hybrid cloud. Through workload optimization across multiple query engines and storage tiers, organizations can reduce data warehouse costs by up to 50 percent. Later this year, it will leverage watsonx.ai

Machine Learning

Machine Learning Data Warehouse Modeling Cost-Benefit

Tackling AI’s data challenges with IBM databases on AWS

IBM Big Data Hub

MARCH 14, 2024

This involves unifying and sharing a single copy of data and metadata across IBM® watsonx.data ™, IBM® Db2 ®, IBM® Db2® Warehouse and IBM® Netezza ®, using native integrations and supporting open formats, all without the need for migration or recataloging.

Cost-Benefit

Cost-Benefit Metadata Optimization Management

Best practices to implement near-real-time analytics using Amazon Redshift Streaming Ingestion with Amazon MSK

AWS Big Data

MARCH 11, 2024

Amazon Redshift is a fully managed, scalable cloud data warehouse that accelerates your time to insights with fast, straightforward, and secure analytics at scale. Tens of thousands of customers rely on Amazon Redshift to analyze exabytes of data and run complex analytical queries, making it the most widely used cloud data warehouse.

Analytics

Analytics Data Warehouse Optimization Metrics

Top Takeaways from the Gartner® Innovation Insight: Data Security Posture Management

Laminar Security

MAY 3, 2023

According to our recent State of Cloud Data Security Report 2023 , 77% of organizations experienced a cloud data breach in 2022. That’s particularly concerning considering that 60% of worldwide corporate data was stored in the cloud during that same period. The first step in combating shadow data is discovering it.

Management

Management Risk Risk Management Data Processing

Use the Amazon Redshift Data API to interact with Amazon Redshift Serverless

AWS Big Data

APRIL 28, 2023

Amazon Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing ETL (extract, transform, and load), business intelligence (BI), and reporting tools.

Interactive

Interactive Data Warehouse Metadata Data-driven

Fivetran Modern Data Stack Conference 2023: Key Takeaways

Alation

APRIL 14, 2023

In “The modern data stack is dead, long live the modern data stack!” the presenters elaborated on the common pain points of the cloud data warehouse today and predicted what it may look like in the future. Subscribe to Alation's Blog Get the latest data cataloging news and trends in your inbox.

Data Warehouse

Data Warehouse Data-driven Metadata Digital Transformation

Implement a serverless CDC process with Apache Iceberg using Amazon DynamoDB and Amazon Athena

AWS Big Data

AUGUST 16, 2023

We fetch the metadata of the users_xxxxxx table from Athena. The following are a few important considerations regarding how the Lambda function handles Iceberg table metadata changes: In this approach, target metadata takes precedence during DML operations. It’s imperative that the source and target metadata match.

Data Lake

Data Lake Metadata Testing Snapshot

Do the Benefits of Cloud Outweigh the Costs?

Jet Global

SEPTEMBER 19, 2023

Data Access What insights can we derive from our cloud ERP? What are the best practices for analyzing cloud ERP data? Data Management How do we create a data warehouse or data lake in the cloud using our cloud ERP? How do I access the legacy data from my previous ERP? Cross-functional collaboration.

Cost-Benefit

Cost-Benefit Data Warehouse Reporting Enterprise

Build a data lakehouse in a hybrid Environment using Amazon EMR Serverless, Apache DolphinScheduler, and TiDB

AWS Big Data

MARCH 20, 2025

Solution overview For our use case, an enterprise data warehouse with business data is hosted on an on-premises TiDB platform, an AWS Global Partner that is also available on AWS through AWS Marketplace. Typically, there are four layers in terms of data warehouse design. bin.tar.gz cd apache-zookeeper-3.8.0-bin/conf

Data Warehouse

Data Warehouse Metadata Testing Management

Empower Your Cyber Defenders with Real-Time Analytics Author: Carolyn Duby, Field CTO

Cloudera

NOVEMBER 15, 2024

In fact, according to the Identity Theft Resource Center (ITRC) Annual Data Breach Report , there were 2,365 cyber attacks in 2023 with more than 300 million victims, and a 72% increase in data breaches since 2021. Real-Time Threat Detection with Iceberg Cyber log data is massive and constantly evolving.

Analytics

Analytics Metadata Snapshot Data-driven

SAP aims to unify data for AI, analytics with new Business Data Cloud

CIO Business Intelligence

FEBRUARY 13, 2025

Another key component of BDC is its native integration with Databricks, enabling customers to access machine learning, data science, and AI capabilities such as Mosaic AI , Databricks SQL , and Unity Catalog inside SAP.

Analytics

Analytics Enterprise Finance Sales

How BMW Group built a serverless terabyte-scale data transformation architecture with dbt and Amazon Athena

AWS Big Data

APRIL 29, 2025

These views are carefully crafted to include only the necessary columns, relevant metadata joins, and appropriate time filtering to have only recent data available in dashboards. The effectiveness of these optimizations implemented at the end of 2023 is visible in the following diagram, showing costs by Athena workgroups.

Data Transformation

Data Transformation Cost-Benefit Testing Data Lake

Connect, share, and query where your data sits using Amazon SageMaker Unified Studio

AWS Big Data

MARCH 21, 2025

Traditionally, answering this question would involve multiple data exports, complex extract, transform, and load (ETL) processes, and careful data synchronization across systems. The existing Data Catalog becomes the Default catalog (identified by the AWS account number) and is readily available in SageMaker Lakehouse.

Data Warehouse

Data Warehouse Metadata Publishing Sales

Data Leaders Brief

SAP Datasphere Powers Business at the Speed of Data

Write queries faster with Amazon Q generative SQL for Amazon Redshift

Webinars

Trending Sources

Run Apache XTable in AWS Lambda for background conversion of open table formats

Webinars

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS re:Invent 2023 Amazon Redshift Sessions Recap

Top Opportunities for SAP Partners in 2023

The Increasing Importance of Open Table Formats

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

Use Apache Iceberg in a data lake to support incremental data processing

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

Use Amazon Athena with Spark SQL for your open-source transactional table formats

Introducing watsonx: The future of AI for business

BMW Cloud Efficiency Analytics powered by Amazon QuickSight and Amazon Athena

Materialized Views in Hive for Iceberg Table Format

Cloudera Named a Visionary in the Gartner MQ for Cloud DBMS

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

Educating ChatGPT on Data Lakehouse

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

Empower Your Cyber Defenders with Real-Time Analytics

Exploring the AI and data capabilities of watsonx

Tackling AI’s data challenges with IBM databases on AWS

Best practices to implement near-real-time analytics using Amazon Redshift Streaming Ingestion with Amazon MSK

Top Takeaways from the Gartner® Innovation Insight: Data Security Posture Management

Use the Amazon Redshift Data API to interact with Amazon Redshift Serverless

Fivetran Modern Data Stack Conference 2023: Key Takeaways

Implement a serverless CDC process with Apache Iceberg using Amazon DynamoDB and Amazon Athena

Do the Benefits of Cloud Outweigh the Costs?

Build a data lakehouse in a hybrid Environment using Amazon EMR Serverless, Apache DolphinScheduler, and TiDB

Empower Your Cyber Defenders with Real-Time Analytics Author: Carolyn Duby, Field CTO

SAP aims to unify data for AI, analytics with new Business Data Cloud

How BMW Group built a serverless terabyte-scale data transformation architecture with dbt and Amazon Athena

Connect, share, and query where your data sits using Amazon SageMaker Unified Studio

Stay Connected