Data Architecture, Data Warehouse and Reference

Accelerate SQL code migration from Google BigQuery to Amazon Redshift using BladeBridge

AWS Big Data

NOVEMBER 7, 2024

BladeBridge offers a comprehensive suite of tools that automate much of the complex conversion work, allowing organizations to quickly and reliably transition their data analytics capabilities to the scalable Amazon Redshift data warehouse. … times better price performance than other cloud data warehouses.

Tags: Data Warehouse, Reporting, Big Data, Data Lake

Load data incrementally from transactional data lakes to data warehouses

AWS Big Data

OCTOBER 19, 2023

Data lakes and data warehouses are two of the most important data storage and management technologies in a modern data architecture. Data lakes store all of an organization’s data, regardless of its format or structure. Delta Lake doesn’t have a specific concept for incremental queries.

Tags: Data Lake, Data Warehouse, Visualization, Snapshot

Simplify data ingestion from Amazon S3 to Amazon Redshift using auto-copy

AWS Big Data

OCTOBER 30, 2024

Amazon Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that makes it simple and cost-effective to analyze your data using standard SQL and your existing business intelligence (BI) tools. Data ingestion is the process of getting data to Amazon Redshift.

Tags: Data Warehouse, Sales, Data Lake, Recreation/Entertainment

Expand data access through Apache Iceberg using Delta Lake UniForm on AWS

AWS Big Data

NOVEMBER 14, 2024

The landscape of big data management has been transformed by the rising popularity of open table formats such as Apache Iceberg, Apache Hudi, and Linux Foundation Delta Lake. These formats, designed to address the limitations of traditional data storage systems, have become essential in modern data architectures.

Tags: Metadata, Data Warehouse, Big Data, Data Lake

Migrate a petabyte-scale data warehouse from Actian Vectorwise to Amazon Redshift

AWS Big Data

MAY 30, 2024

Amazon Redshift is a fast, scalable, and fully managed cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. Amazon Redshift is an industry-leading cloud data warehouse.

Tags: Data Warehouse, Data Lake, Cost-Benefit, Structured Data

Run Apache XTable in AWS Lambda for background conversion of open table formats

AWS Big Data

NOVEMBER 26, 2024

This post was co-written with Dipankar Mazumdar, Staff Data Engineering Advocate with AWS Partner OneHouse. Data architecture has evolved significantly to handle growing data volumes and diverse workloads. For more examples and references to other posts, refer to the following GitHub repository.

Tags: Metadata, Data Lake, Snapshot, Data Warehouse

Unlocking near real-time analytics with petabytes of transaction data using Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud

AWS Big Data

NOVEMBER 27, 2024

While customers can perform some basic analysis within their operational or transactional databases, many still need to build custom data pipelines that use batch or streaming jobs to extract, transform, and load (ETL) data into their data warehouse for more comprehensive analysis.

Tags: Data Warehouse, Analytics, Testing, Modeling

Ingest data from Google Analytics 4 and Google Sheets to Amazon Redshift using Amazon AppFlow

AWS Big Data

JANUARY 6, 2025

Amazon AppFlow automatically encrypts data in motion, and allows you to restrict data from flowing over the public internet for SaaS applications that are integrated with AWS PrivateLink, reducing exposure to security threats. Refer to the Amazon Redshift Database Developer Guide for more details.

Tags: Analytics, Data Warehouse, Big Data, Metrics

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

AWS Big Data

DECEMBER 4, 2024

SageMaker still includes all the existing ML and AI capabilities you’ve come to know and love for data wrangling, human-in-the-loop data labeling with Amazon SageMaker Ground Truth, experiments, MLOps, Amazon SageMaker HyperPod managed distributed training, and more. The tools to transform your business are here.

Tags: Data Analytics, Analytics, Data Lake, Data Quality

How HPE Aruba Supply Chain optimized cost and performance by migrating to an AWS modern data architecture

AWS Big Data

SEPTEMBER 11, 2024

This post describes how HPE Aruba automated their Supply Chain management pipeline, and re-architected and deployed their data solution by adopting a modern data architecture on AWS. The following diagram illustrates the solution architecture.

Tags: Data Architecture, Optimization, Data Warehouse, Metadata

What is a data architect? Skills, salaries, and how to become a data framework master

CIO Business Intelligence

OCTOBER 13, 2023

Data architecture is a complex and varied field, and different organizations and industries have unique needs when it comes to their data architects. Solutions data architect: These individuals design and implement data solutions for specific business needs, including data warehouses, data marts, and data lakes.

Tags: Data Architecture, Data Warehouse, Statistics, Visualization

Combine transactional, streaming, and third-party data on Amazon Redshift for financial services

AWS Big Data

FEBRUARY 1, 2024

Amazon Redshift features like streaming ingestion, Amazon Aurora zero-ETL integration, and data sharing with AWS Data Exchange enable near-real-time processing for trade reporting, risk management, and trade optimization. This will be your OLTP data store for transactional data.

Tags: Data Warehouse, Dashboards, Risk Management, Risk

Centralize near-real-time governance through alerts on Amazon Redshift data warehouses for sensitive queries

AWS Big Data

JUNE 29, 2023

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud that delivers powerful and secure insights on all your data with the best price-performance. With Amazon Redshift, you can analyze your data to derive holistic insights about your business and your customers.

Tags: Data Warehouse, Dashboards, Testing, Visualization

Build a secure data visualization application using the Amazon Redshift Data API with AWS IAM Identity Center

AWS Big Data

MARCH 6, 2025

Tens of thousands of customers use Amazon Redshift for modern data analytics at scale, delivering up to three times better price-performance and seven times better throughput than other cloud data warehouses. Refer to IAM Identity Center identity source tutorials for the IdP setup. IAM Identity Center enabled.

Tags: Visualization, Sales, Data Warehouse, Management

Has the Data Warehouse Had Its Day?

BI-Survey

JANUARY 15, 2023

Data architecture is a topic that is as relevant today as ever. It is widely regarded as a matter for data engineers, not business domain experts. Statements from countless interviews with our customers reveal that the data warehouse is seen as a “black box” by many and understood by few business users.

Tags: Data Warehouse, IT, Data Architecture, Measurement

How to Build a Performant Data Warehouse in Redshift

Sisense

SEPTEMBER 3, 2019

This blog is intended to give an overview of the considerations you’ll want to make as you build your Redshift data warehouse to ensure you are getting optimal performance. OLTP databases are best at queries that do point scans or short scans of the data; think “return the number of deposits by user X this week.”

Tags: Data Warehouse, OLAP, Statistics, Cost-Benefit

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

AWS Big Data

AUGUST 31, 2023

Amazon Redshift is a fast, fully managed petabyte-scale cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing business intelligence (BI) tools. Amazon Redshift also supports querying nested data with complex data types such as struct, array, and map.

Tags: Data Lake, Data Warehouse, Metadata, Data Architecture

Build a multi-Region and highly resilient modern data architecture using AWS Glue and AWS Lake Formation

AWS Big Data

JANUARY 24, 2023

This means that if data is moved from a bucket in the source Region to another bucket in the target Region, the data access permissions need to be reapplied in the target Region. The AWS Glue Data Catalog is a central repository of metadata about data stored in your data lake.

Tags: Data Architecture, Metadata, Data Lake, Snapshot

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

AWS Big Data

AUGUST 1, 2024

Large-scale data warehouse migration to the cloud is a complex and challenging endeavor that many organizations undertake to modernize their data infrastructure, enhance data management capabilities, and unlock new business opportunities. This makes sure the new data platform can meet current and future business goals.

Tags: Data Warehouse, KPI, Optimization, Cost-Benefit

Power enterprise-grade Data Vaults with Amazon Redshift – Part 1

AWS Big Data

NOVEMBER 16, 2023

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more, all while providing up to 7.9x …

Tags: Enterprise, Data Warehouse, Data Lake, Optimization

How Open Universities Australia modernized their data platform and significantly reduced their ETL costs with AWS Cloud Development Kit and AWS Step Functions

AWS Big Data

JANUARY 30, 2025

Diagram 1: Overall architecture of the solution, using AWS Step Functions, Amazon Redshift and Amazon S3. The following AWS services were used to shape our new ETL architecture: Amazon Redshift, a fully managed, petabyte-scale data warehouse service in the cloud.

Tags: Data Warehouse, Data Architecture, Machine Learning, Data Transformation

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

In 2013, Amazon Web Services revolutionized the data warehousing industry by launching Amazon Redshift, the first fully-managed, petabyte-scale, enterprise-grade cloud data warehouse. Amazon Redshift made it simple and cost-effective to efficiently analyze large volumes of data using existing business intelligence tools.

Tags: Data Warehouse, Analytics, Data Lake, Machine Learning

Generic orchestration framework for data warehousing workloads using Amazon Redshift RSQL

AWS Big Data

APRIL 3, 2023

Tens of thousands of customers run business-critical workloads on Amazon Redshift, AWS’s fast, petabyte-scale cloud data warehouse delivering the best price-performance. With Amazon Redshift, you can query data across your data warehouse, operational data stores, and data lake using standard SQL.

Tags: Data Warehouse, Testing, Data Lake, Data-driven

Perform data parity at scale for data modernization programs using AWS Glue Data Quality

AWS Big Data

OCTOBER 9, 2024

Today, customers are embarking on data modernization programs by migrating on-premises data warehouses and data lakes to the AWS Cloud to take advantage of the scale and advanced analytical capabilities of the cloud. In the following sections, we showcase how to configure an AWS Glue Data Quality job for comparison.

Tags: Data Quality, Data Lake, Data Warehouse, Metrics

AWS Lake Formation 2022 year in review

AWS Big Data

JANUARY 31, 2023

In this post, we are excited to summarize the features that the AWS Glue Data Catalog, AWS Glue crawler, and Lake Formation teams delivered in 2022. Whether you are a data platform builder, data engineer, data scientist, or any technology leader interested in data lake solutions, this post is for you.

Tags: Data Lake, Data Governance, Data Architecture, Machine Learning

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

AWS Big Data

DECEMBER 4, 2024

While traditional extract, transform, and load (ETL) processes have long been a staple of data integration due to their flexibility, for common use cases such as replication and ingestion, they often prove time-consuming, complex, and less adaptable to the fast-changing demands of modern data architectures.

Tags: Data Integration, Data Lake, Statistics, Data-driven

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

APRIL 3, 2024

They understand that a one-size-fits-all approach no longer works, and recognize the value in adopting scalable, flexible tools and open data formats to support interoperability in a modern data architecture to accelerate the delivery of new solutions.

Tags: Data Lake, Snapshot, Metadata, Data Architecture

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

AWS Big Data

JUNE 10, 2024

One of the key challenges in modern big data management is facilitating efficient data sharing and access control across multiple EMR clusters. Organizations have multiple Hive data warehouses across EMR clusters, where the metadata gets generated.

Tags: Data Lake, Metadata, Data Warehouse, Data Processing

Migrate Microsoft Azure Synapse Analytics to Amazon Redshift using AWS SCT

AWS Big Data

OCTOBER 18, 2023

Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse that provides the flexibility to use provisioned or serverless compute for your analytical workloads. The decoupled compute and storage architecture of Amazon Redshift enables you to build highly scalable, resilient, and cost-effective workloads.

Tags: Analytics, Data Warehouse, Dashboards, Testing

Enhance data security and governance for Amazon Redshift Spectrum with VPC endpoints

AWS Big Data

FEBRUARY 16, 2024

Many customers are extending their data warehouse capabilities to their data lake with Amazon Redshift. They are looking to further enhance their security posture where they can enforce access policies on their data lakes based on Amazon Simple Storage Service (Amazon S3).

Tags: Data Lake, Data Warehouse, Testing, Business Objectives

How GamesKraft uses Amazon Redshift data sharing to support growing analytics workloads

AWS Big Data

NOVEMBER 13, 2023

Amazon Redshift is a fully managed data warehousing service that offers both provisioned and serverless options, making it more efficient to run and scale analytics without having to manage your data warehouse. These upstream data sources constitute the data producer components.

Tags: Data Warehouse, Analytics, Data Lake, Data Science

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

MARCH 7, 2024

At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with a consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.

Tags: Data Lake, Analytics, Dashboards, Metrics

An Introduction to Disaster Recovery with the Cloudera Data Platform

Cloudera

AUGUST 9, 2022

Data platforms are no longer skunkworks projects or science experiments. As customers import their mainframe and legacy data warehouse workloads, there is an expectation on the platform that it can meet, if not exceed, the resilience of the prior system and its associated dependencies. Cloudera Data Platform.

Tags: Data Lake, Data Warehouse, Data-driven, IoT

Amazon Redshift data ingestion options

AWS Big Data

SEPTEMBER 5, 2024

Federated queries allow querying data across Amazon RDS for MySQL and PostgreSQL data sources without the need for extract, transform, and load (ETL) pipelines. If storing operational data in a data warehouse is a requirement, synchronization of tables between operational data stores and Amazon Redshift tables is supported.

Tags: IoT, Data Warehouse, Cost-Benefit, Reporting

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

The Analytics specialty practice of AWS Professional Services (AWS ProServe) helps customers across the globe with modern data architecture implementations on the AWS Cloud. The company wanted the ability to continue processing operational data in the secondary Region in the rare event of primary Region failure.

Tags: Data Lake, Data Processing, Metadata, Snapshot

BMW Cloud Efficiency Analytics powered by Amazon QuickSight and Amazon Athena

AWS Big Data

NOVEMBER 15, 2023

For more information on this foundation, refer to A Detailed Overview of the Cost Intelligence Dashboard. Additionally, it manages table definitions in the AWS Glue Data Catalog, containing references to data sources and targets of extract, transform, and load (ETL) jobs in AWS Glue.

Tags: Dashboards, Analytics, Metadata, Data Warehouse

Visualize data quality scores and metrics generated by AWS Glue Data Quality

AWS Big Data

JUNE 6, 2023

These are the six main steps in the data pipeline: Amazon EventBridge triggers an AWS Lambda function when the event pattern for AWS Glue Data Quality matches the defined rule. For more information, refer to Working with Query Results, Output Files, and Query History. For S3 path, enter the S3 path to your data source.

Tags: Data Quality, Metrics, Visualization, Dashboards

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

AWS Big Data

OCTOBER 1, 2024

Performance was tested on a Redshift Serverless data warehouse with 128 RPUs. In our testing, the dataset was stored in Amazon S3 in Parquet format and the AWS Glue Data Catalog was used to manage external databases and tables. He works at the intersection of data lakes and data warehouses.

Tags: Data Lake, Statistics, Broadcasting, Optimization

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

AWS Big Data

JULY 21, 2023

This leads to having data across many instances of data warehouses and data lakes using a modern data architecture in separate AWS accounts. We recently announced the integration of Amazon Redshift data sharing with AWS Lake Formation. S3 data lake – Contains the web activity and leads datasets.

Tags: Data Lake, Data Warehouse, Marketing, Management

Exploring real-time streaming for generative AI Applications

AWS Big Data

MARCH 25, 2024

To learn more about RAG, refer to Question answering using Retrieval Augmented Generation with foundation models in Amazon SageMaker JumpStart. A RAG-based generative AI application can only produce generic responses based on its training data and the relevant documents in the knowledge base.

Tags: Data Lake, Unstructured Data, Management, Snapshot

Architectural patterns for real-time analytics using Amazon Kinesis Data Streams, part 1

AWS Big Data

JANUARY 8, 2024

Kinesis Data Streams has native integrations with other AWS services such as AWS Glue and Amazon EventBridge to build real-time streaming applications on AWS. Refer to Amazon Kinesis Data Streams integrations for additional details. To access your data from Timestream, you need to install the Timestream plugin for Grafana.

Tags: Analytics, IoT, Data-driven, Snapshot

Dive deep into AWS Glue 4.0 for Apache Spark

AWS Big Data

MAY 18, 2023

It’s even harder when your organization is dealing with silos that impede data access across different data stores. Seamless data integration is a key requirement in a modern data architecture to break down data silos. For more details, refer to Spark Release 3.3.0. AWS Glue Data Catalog client: 3.6.0.

Tags: Testing, Data Lake, Cost-Benefit, Data Integration

Bringing Financial Services Business Use Cases to Life: Leveraging Data Analytics, ML/AI, and Gen AI

Cloudera

MAY 30, 2024

The client opted to adopt Kafka and Flink with Iceberg on Cloudera Private Cloud for streaming analytics scenarios and Cloudera Machine Learning and Data Warehouse on CDP Public Cloud for machine learning model development and data visualization applications.

Tags: Data Analytics, Risk Management, Analytics, Digital Transformation

Demystifying Modern Data Platforms

Cloudera

SEPTEMBER 15, 2022

Mark: The first element in the process is the link between the source data and the entry point into the data platform. At Ramsey International (RI), we refer to that layer in the architecture as the foundation, but others call it a staging area, raw zone, or even a source data lake. What is a data fabric?

Tags: Data Lake, Data Architecture, Data-driven, Data Warehouse
