2023, Data Lake and Data Warehouse

Load data incrementally from transactional data lakes to data warehouses

AWS Big Data

OCTOBER 19, 2023

Data lakes and data warehouses are two of the most important data storage and management technologies in a modern data architecture. Data lakes store all of an organization’s data, regardless of its format or structure.

Data Lake

Data Lake Data Warehouse Visualization Snapshot

United Airlines sets its flight plan for gen AI success

CIO Business Intelligence

DECEMBER 20, 2024

Uniteds embrace of SageMaker and Bedrock as well as Amazon Q is going to be a game changer for building data products, said Mai-LanTomsenBukovec, AWS vice president of technology, who pointed to United Data Hub as a transformational component in its AI journey at re:Invent. That number has increased to 21% in just 18 months.

IT

IT Unstructured Data Experimentation Data Lake

Run Apache XTable in AWS Lambda for background conversion of open table formats

AWS Big Data

NOVEMBER 26, 2024

Data architecture has evolved significantly to handle growing data volumes and diverse workloads. Initially, data warehouses were the go-to solution for structured data and analytical workloads but were limited by proprietary storage formats and their inability to handle unstructured data.

Metadata

Metadata Data Lake Snapshot Data Warehouse

Webinars

Automation, Evolved: Your New Playbook For Smarter Knowledge Work

MORE WEBINARS

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

AWS Big Data

OCTOBER 1, 2024

Amazon Redshift enables you to efficiently query and retrieve structured and semi-structured data from open format files in Amazon S3 data lake without having to load the data into Amazon Redshift tables. Amazon Redshift extends SQL capabilities to your data lake, enabling you to run analytical queries.

Data Lake

Data Lake Statistics Broadcasting Optimization

AWS re:Invent 2023 Amazon Redshift Sessions Recap

AWS Big Data

DECEMBER 18, 2023

Amazon Redshift powers data-driven decisions for tens of thousands of customers every day with a fully managed, AI-powered cloud data warehouse, delivering the best price-performance for your analytics workloads.

Data Warehouse

Data Warehouse Machine Learning Data-driven Data Lake

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

In 2013, Amazon Web Services revolutionized the data warehousing industry by launching Amazon Redshift , the first fully-managed, petabyte-scale, enterprise-grade cloud data warehouse. Amazon Redshift made it simple and cost-effective to efficiently analyze large volumes of data using existing business intelligence tools.

Data Warehouse

Data Warehouse Analytics Data Lake Machine Learning

Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

MARCH 2, 2023

Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback. and later supports the Apache Iceberg framework for data lakes. AWS Glue 3.0 The following diagram illustrates the solution architecture.

Data Lake

Data Lake Data Processing Metadata Snapshot

Synchronize data lakes with CDC-based UPSERT using open table format, AWS Glue, and Amazon MSK

AWS Big Data

JULY 31, 2024

In the current industry landscape, data lakes have become a cornerstone of modern data architecture, serving as repositories for vast amounts of structured and unstructured data. Maintaining data consistency and integrity across distributed data lakes is crucial for decision-making and analytics.

Data Lake

Data Lake Marketing Data Processing Management

Amazon Web Services named a Leader in the 2024 Gartner Magic Quadrant for Data Integration Tools

AWS Big Data

FEBRUARY 26, 2025

Amazon Web Services (AWS) has been recognized as a Leader in the 2024 Gartner Magic Quadrant for Data Integration Tools. We were positioned in the Challengers Quadrant in 2023.

Data Integration

Data Integration Data Lake Data Warehouse Unstructured Data

Your guide to AWS Analytics at AWS re:Invent 2023

AWS Big Data

NOVEMBER 13, 2023

2023 AWS Analytics Superheroes We are excited to introduce the 2023 AWS Analytics Superheroes at this year’s re:Invent conference! A shapeshifting guardian and protector of data like Data Lynx? 11:30 AM – 12:30 PM (PDT) Ceasars Forum ANT318 | Accelerate innovation with end-to-end serverless data architecture.

Analytics

Analytics Data Lake Data Warehouse Data-driven

Accelerate your data warehouse migration to Amazon Redshift – Part 7

AWS Big Data

OCTOBER 17, 2023

With Amazon Redshift, you can use standard SQL to query data across your data warehouse, operational data stores, and data lake. Migrating a data warehouse can be complex. You have to migrate terabytes or petabytes of data from your legacy system while not disrupting your production workload.

Data Warehouse

Data Warehouse Data Processing Data Lake Management

The Increasing Importance of Open Table Formats

David Menninger's Analyst Perspectives

OCTOBER 31, 2024

I previously wrote about the importance of open table formats to the evolution of data lakes into data lakehouses. The concept of the data lake was initially proposed as a single environment where data could be combined from multiple sources to be stored and processed to enable analysis by multiple users for multiple purposes.

Data Lake

Data Lake Unstructured Data Data Warehouse Software

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

AWS Big Data

JULY 21, 2023

This leads to having data across many instances of data warehouses and data lakes using a modern data architecture in separate AWS accounts. We recently announced the integration of Amazon Redshift data sharing with AWS Lake Formation.

Data Lake

Data Lake Data Warehouse Marketing Management

Write queries faster with Amazon Q generative SQL for Amazon Redshift

AWS Big Data

NOVEMBER 7, 2024

Amazon Redshift is a fully managed, AI-powered cloud data warehouse that delivers the best price-performance for your analytics workloads at any scale. Amazon Q generative SQL for Amazon Redshift was launched in preview during AWS re:Invent 2023. Choose Run all on each notebook tab.

Metadata

Metadata Sales Data Warehouse Optimization

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

AWS Big Data

MAY 24, 2023

When you build your transactional data lake using Apache Iceberg to solve your functional use cases, you need to focus on operational use cases for your S3 data lake to optimize the production environment. availability. parquet") df.sortWithinPartitions("review_date").writeTo("dev.db.amazon_reviews_iceberg").append()

Data Lake

Data Lake Snapshot Metadata Optimization

Key finding from Forrester’s latest BI research including The Forrester Wave™: Augmented Business Intelligence Platforms, Q2 2023

Boris Evelson

JUNE 14, 2023

No matter what technology foundation you’re using – a data lake, a data warehouse, data fabric, data mesh, etc. – BI applications are where business users consume data and turn it into actionable insights and decisions. The BI market has […]

Business Intelligence

Business Intelligence Data Lake Data Warehouse Data-driven

Implement slowly changing dimensions in a data lake using AWS Glue and Delta

AWS Big Data

MARCH 28, 2023

In a data warehouse, a dimension is a structure that categorizes facts and measures in order to enable users to answer business questions. As organizations across the globe are modernizing their data platforms with data lakes on Amazon Simple Storage Service (Amazon S3), handling SCDs in data lakes can be challenging.

Data Lake

Data Lake Testing Snapshot Sales

Top Opportunities for SAP Partners in 2023

Timo Elliott

NOVEMBER 30, 2022

My role was to talk about the trends and opportunities for 2023, for customers, SAP, and our partners. Because of technology limitations, we have always had to start by ripping information from the business systems and moving it to a different platform—a data warehouse, data lake, data lakehouse, data cloud.

Recreation/Entertainment

Recreation/Entertainment Metadata Data Warehouse Cost-Benefit

What’s cooking with Amazon Redshift at AWS re:Invent 2023

AWS Big Data

NOVEMBER 15, 2023

Connect with experts, meet with book authors on data warehousing and analytics (at the Meet the Authors event on November 29 and 30, 3:00 PM – 4:00 PM), win prizes, and learn all about the latest innovations from our AWS Analytics services. A shapeshifting guardian and protector of data like Data Lynx?

Data Lake

Data Lake Data Warehouse B2B Deep Learning

Real-time streaming data top picks you cannot miss at AWS re:Invent 2023

AWS Big Data

NOVEMBER 8, 2023

Save the date: AWS re:Invent 2023 is happening from November 27 to December 1 in Las Vegas, and you cannot miss it. In today’s data-driven landscape, the quality of data is the foundation upon which the success of organizations and innovations stands. Reserve your seat now! Register now to secure your spot!

Data-driven

Data-driven Machine Learning Data Lake Cost-Benefit

Simplifying data processing at Capitec with Amazon Redshift integration for Apache Spark

AWS Big Data

NOVEMBER 10, 2023

This integration expands the possibilities for AWS analytics and machine learning (ML) solutions, making the data warehouse accessible to a broader range of applications. Your applications can seamlessly read from and write to your Amazon Redshift data warehouse while maintaining optimal performance and transactional consistency.

Data Processing

Data Processing Data Lake Data Warehouse Optimization

5 Key Takeaways from Flink Forward 2023

Cloudera

NOVEMBER 27, 2023

Earlier this month (November 6 through 8, 2023) a few hundred Apache Flink enthusiasts descended upon a Hyatt Regency Lake near Seattle for the annual Flink Forward conference. Sign up for a free trial of Cloudera’s NiFi-based DataFlow and walk through use cases like stream filtering and cloud data warehouse ingest.

Advertising

Advertising Data Lake Data Warehouse ROI

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

MARCH 7, 2024

At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with a consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.

Data Lake

Data Lake Analytics Dashboards Metrics

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics

AWS Big Data

NOVEMBER 20, 2023

Use case A typical workload for AWS Glue for Apache Spark jobs is to load data from a relational database to a data lake with SQL-based transformations. The following is a visual representation of an example job where the number of workers is 10. When the example job ran, the workerUtilization metrics showed the following trend.

Metrics

Metrics Data Lake Cost-Benefit Dashboards

Improve healthcare services through patient 360: A zero-ETL approach to enable near real-time data analytics

AWS Big Data

MARCH 27, 2024

You can send data from your streaming source to this resource for ingesting the data into a Redshift data warehouse. This will be your online transaction processing (OLTP) data store for transactional data. With continuous innovations added to Amazon Redshift, it is now more than just a data warehouse.

Data Analytics

Data Analytics Analytics Data Warehouse Data Lake

Preparing the foundations for Generative AI

CIO Business Intelligence

FEBRUARY 20, 2024

Data also needs to be sorted, annotated and labelled in order to meet the requirements of generative AI. No wonder CIO’s 2023 AI Priorities study found that data integration was the number one concern for IT leaders around generative AI integration, above security and privacy and the user experience. “We

Cost-Benefit

Cost-Benefit Data Lake Data Warehouse Data Processing

Use Amazon Athena with Spark SQL for your open-source transactional table formats

AWS Big Data

JANUARY 24, 2024

AWS-powered data lakes, supported by the unmatched availability of Amazon Simple Storage Service (Amazon S3), can handle the scale, agility, and flexibility required to combine different data and analytics approaches. The output will give a count of the number of data and metadata files deleted.

Snapshot

Snapshot Data Lake Metadata Optimization

Educating ChatGPT on Data Lakehouse

Cloudera

MARCH 17, 2023

I took the free version of ChatGPT on a test drive (in March 2023) and asked some simple questions on data lakehouse and its components. Hopefully this blog will give ChatGPT an opportunity to learn and correct itself while counting towards my 2023 contribution to social good. I thought this was a fairly comprehensive list.

Unstructured Data

Unstructured Data Data Lake Data Warehouse Machine Learning

Building a vision for real-time artificial intelligence

CIO Business Intelligence

APRIL 12, 2023

Most current data architectures were designed for batch processing with analytics and machine learning models running on data warehouses and data lakes. Wrapping up Moving real-time AI from the edge of innovation to the center of the business will be one of the biggest challenges for organizations in 2023.

Machine Learning

Machine Learning Cost-Benefit Data-driven Strategy

Backcountry modernizes for the cloud era

CIO Business Intelligence

APRIL 26, 2022

Backcountry also lacked many core services critical for an online retailer — no CMS, no analytics, no data platform, and no data lake. In recent years, e-commerce platforms have evolved into a combination of cloud, analytics, CX UIs, and data lakes dubbed customer data platforms (CDPs). That’s right.

Data Lake

Data Lake Dashboards Recreation/Entertainment Sales

Introducing watsonx: The future of AI for business

IBM Big Data Hub

MAY 9, 2023

With watsonx.data , businesses can quickly connect to data, get trusted insights and reduce data warehouse costs. A data store built on open lakehouse architecture, it runs both on premises and across multi-cloud environments. Savings may vary depending on configurations, workloads and vendors.

Data Warehouse

Data Warehouse Machine Learning Cost-Benefit Metadata

Unleashing the power of Presto: The Uber case study

IBM Big Data Hub

SEPTEMBER 25, 2023

Uber understood that digital superiority required the capture of all their transactional data, not just a sampling. They stood up a file-based data lake alongside their analytical database. Because much of the work done on their data lake is exploratory in nature, many users want to execute untested queries on petabytes of data.

OLAP

OLAP Data Lake Data-driven Online Analytical Processing

Wonderla Holidays goes digital to enhance business and customer fun

CIO Business Intelligence

OCTOBER 18, 2022

One pulse sends 150 bytes of data. So, each band can send out 500KB to 750KB of data. To handle the huge volume of data thus generated, the company is in the process of deploying a data lake, data warehouse, and real-time analytical tools in a hybrid model.

Data Lake

Data Lake Data Warehouse Cost-Benefit Digital Transformation

Get started with Amazon DynamoDB zero-ETL integration with Amazon Redshift

AWS Big Data

OCTOBER 17, 2024

You can then run enhanced analysis on this DynamoDB data with the rich capabilities of Amazon Redshift, such as high-performance SQL, built-in machine learning (ML) and Spark integrations, materialized views (MV) with automatic and incremental refresh, data sharing, and the ability to join data across multiple data stores and data lakes.

Metrics

Metrics Dashboards Data Warehouse Statistics

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

AWS Big Data

MARCH 29, 2024

Looking at the Skewness Job per Job visualization, there was spike on November 1, 2023. You can choose Controls , and change filter conditions based on date time, Region, AWS account ID, AWS Glue job name, job run ID, and the source and sink of the data stores. Let’s drill down into details.

Metrics

Metrics Visualization Dashboards Interactive

Materialized Views in Hive for Iceberg Table Format

Cloudera

FEBRUARY 8, 2024

Cloudera Data Warehouse (CDW) running Hive has previously supported creating materialized views against Hive ACID source tables. release and the matching CDW Private Cloud Data Services release, Hive also supports creating, using, and rebuilding materialized views for Iceberg table format.

Snapshot

Snapshot Metadata Cost-Benefit Data Warehouse

Modeling, Modernization and Automation

BI-Survey

APRIL 27, 2023

Eckerson Group wrote this report in collaboration with BARC by studying the results of a global survey of 238 data & analytics practitioners and leaders. BARC conducted the survey in December 2022 and January 2023, drawing respondents from companies of various sizes and across various industries.

Modeling

Modeling Data Warehouse Data Quality Business Driver

Getting started with AWS Glue Data Quality from the AWS Glue Data Catalog

AWS Big Data

JUNE 6, 2023

You can use AWS Glue to create, run, and monitor data integration and ETL (extract, transform, and load) pipelines and catalog your assets across multiple data stores. Hundreds of thousands of customers use data lakes for analytics and ML to make data-driven business decisions.

Data Quality

Data Quality Data-driven Data Lake Metrics

Cybersecurity e NIS2: come si muovono i CIO per dormire sonni (un po’) più tranquilli

CIO Business Intelligence

APRIL 22, 2024

L’ultimo Rapporto Clusit ha contato 2.779 incidenti gravi a livello globale nel 2023 (+12% rispetto al 2022), di cui 310 in Italia, ovvero l’11% del totale mondiale e un incremento addirittura del 65% in un anno. Questa attenzione massima rispecchia la consapevolezza che le cyber-minacce sono sempre più numerose e preoccupanti.

Data Lake

Data Lake Testing Management IoT

Amazon Kinesis Data Streams: celebrating a decade of real-time data innovation

AWS Big Data

NOVEMBER 14, 2023

However, in many organizations, data is typically spread across a number of different systems such as software as a service (SaaS) applications, operational databases, and data warehouses. Such data silos make it difficult to get unified views of the data in an organization and act in real time to derive the most value.

IoT

IoT Data-driven Data Lake Data Strategy

The year’s top 10 enterprise AI trends — so far

CIO Business Intelligence

SEPTEMBER 21, 2023

To make all this possible, the data had to be collected, processed, and fed into the systems that needed it in a reliable, efficient, scalable, and secure way. Data warehouses then evolved into data lakes, and then data fabrics and other enterprise-wide data architectures.

Enterprise

Enterprise Consulting Modeling Cost-Benefit

Empower Your Cyber Defenders with Real-Time Analytics

Cloudera

NOVEMBER 15, 2024

In fact, according to the Identity Theft Resource Center (ITRC) Annual Data Breach Report , there were 2,365 cyber attacks in 2023 with more than 300 million victims, and a 72% increase in data breaches since 2021.

Analytics

Analytics Metadata Snapshot Data-driven

Exploring the AI and data capabilities of watsonx

IBM Big Data Hub

JULY 17, 2023

It is supported by querying, governance, and open data formats to access and share data across the hybrid cloud. Through workload optimization across multiple query engines and storage tiers, organizations can reduce data warehouse costs by up to 50 percent.

Machine Learning

Machine Learning Data Warehouse Modeling Cost-Benefit

Load data incrementally from transactional data lakes to data warehouses

United Airlines sets its flight plan for gen AI success

Webinars

Trending Sources

Run Apache XTable in AWS Lambda for background conversion of open table formats

Webinars

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

AWS re:Invent 2023 Amazon Redshift Sessions Recap

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

Use Apache Iceberg in a data lake to support incremental data processing

Synchronize data lakes with CDC-based UPSERT using open table format, AWS Glue, and Amazon MSK

Amazon Web Services named a Leader in the 2024 Gartner Magic Quadrant for Data Integration Tools

Your guide to AWS Analytics at AWS re:Invent 2023

Accelerate your data warehouse migration to Amazon Redshift – Part 7

The Increasing Importance of Open Table Formats

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

Write queries faster with Amazon Q generative SQL for Amazon Redshift

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

Key finding from Forrester’s latest BI research including The Forrester Wave™: Augmented Business Intelligence Platforms, Q2 2023

Implement slowly changing dimensions in a data lake using AWS Glue and Delta

Top Opportunities for SAP Partners in 2023

What’s cooking with Amazon Redshift at AWS re:Invent 2023

Real-time streaming data top picks you cannot miss at AWS re:Invent 2023

Simplifying data processing at Capitec with Amazon Redshift integration for Apache Spark

5 Key Takeaways from Flink Forward 2023

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics

Improve healthcare services through patient 360: A zero-ETL approach to enable near real-time data analytics

Preparing the foundations for Generative AI

Use Amazon Athena with Spark SQL for your open-source transactional table formats

Educating ChatGPT on Data Lakehouse

Building a vision for real-time artificial intelligence

Backcountry modernizes for the cloud era

Introducing watsonx: The future of AI for business

Unleashing the power of Presto: The Uber case study

Wonderla Holidays goes digital to enhance business and customer fun

Get started with Amazon DynamoDB zero-ETL integration with Amazon Redshift

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

Materialized Views in Hive for Iceberg Table Format

Modeling, Modernization and Automation

Getting started with AWS Glue Data Quality from the AWS Glue Data Catalog

Cybersecurity e NIS2: come si muovono i CIO per dormire sonni (un po’) più tranquilli

Amazon Kinesis Data Streams: celebrating a decade of real-time data innovation

The year’s top 10 enterprise AI trends — so far

Empower Your Cyber Defenders with Real-Time Analytics

Exploring the AI and data capabilities of watsonx

Stay Connected