Data Architecture, Data Warehouse and Testing

Unlocking near real-time analytics with petabytes of transaction data using Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud

AWS Big Data

NOVEMBER 27, 2024

While customers can perform some basic analysis within their operational or transactional databases, many still need to build custom data pipelines that use batch or streaming jobs to extract, transform, and load (ETL) data into their data warehouse for more comprehensive analysis.

Data Warehouse

Data Warehouse Analytics Testing Sales

Incremental refresh for Amazon Redshift materialized views on data lake tables

AWS Big Data

NOVEMBER 8, 2024

Amazon Redshift is a fast, fully managed cloud data warehouse that makes it cost-effective to analyze your data using standard SQL and business intelligence tools. However, if you want to test the examples using sample data, download the sample data. The sample files are ‘|’ delimited text files.

Data Lake

Data Lake Data Warehouse Optimization Testing

What is a Data Mesh?

DataKitchen

AUGUST 3, 2021

The data mesh design pattern breaks giant, monolithic enterprise data architectures into subsystems or domains, each managed by a dedicated team. The past decades of enterprise data platform architectures can be summarized in 69 words. Introduction to Data Mesh. Source: Thoughtworks.

Data Architecture

Data Architecture Data Lake Cost-Benefit Data Warehouse

Webinars

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Modernizing the Data Warehouse: Challenges and Benefits

BI-Survey

AUGUST 21, 2020

Many companies are therefore forced to put these concepts to the test. But what are the right measures to make the data warehouse and BI fit for the future? Can the basic nature of the data be proactively improved? The data landscape and the data integration tasks to be solved are often too complex.

Data Warehouse

Data Warehouse Data Lake Data Governance Data Architecture

From data lakes to insights: dbt adapter for Amazon Athena now supported in dbt Cloud

AWS Big Data

NOVEMBER 22, 2024

This enables you to extract insights from your data without the complexity of managing infrastructure. dbt has emerged as a leading framework, allowing data teams to transform and manage data pipelines effectively.

Data Lake

Data Lake Data Warehouse Cost-Benefit Data Transformation

Migrate a petabyte-scale data warehouse from Actian Vectorwise to Amazon Redshift

AWS Big Data

MAY 30, 2024

Amazon Redshift is a fast, scalable, and fully managed cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. Solution overview Amazon Redshift is an industry-leading cloud data warehouse.

Data Warehouse

Data Warehouse Data Lake Cost-Benefit Structured Data

The future of data: A 5-pillar approach to modern data management

CIO Business Intelligence

DECEMBER 11, 2024

They must also select the data processing frameworks such as Spark, Beam or SQL-based processing and choose tools for ML. Based on business needs and the nature of the data, raw vs structured, organizations should determine whether to set up a data warehouse, a Lakehouse or consider a data fabric technology.

Management

Management Data Governance Data Science Reporting

How HPE Aruba Supply Chain optimized cost and performance by migrating to an AWS modern data architecture

AWS Big Data

SEPTEMBER 11, 2024

This post describes how HPE Aruba automated their Supply Chain management pipeline, and re-architected and deployed their data solution by adopting a modern data architecture on AWS. The following diagram illustrates the solution architecture.

Data Architecture

Data Architecture Optimization Data Warehouse Metadata

Run Apache XTable in AWS Lambda for background conversion of open table formats

AWS Big Data

NOVEMBER 26, 2024

This post was co-written with Dipankar Mazumdar, Staff Data Engineering Advocate with AWS Partner OneHouse. Data architecture has evolved significantly to handle growing data volumes and diverse workloads. Data and metadata are shown in blue in the following detail diagram. create_hudi_s3.py

Metadata

Metadata Data Lake Snapshot Data Warehouse

Centralize near-real-time governance through alerts on Amazon Redshift data warehouses for sensitive queries

AWS Big Data

JUNE 29, 2023

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud that delivers powerful and secure insights on all your data with the best price-performance. With Amazon Redshift, you can analyze your data to derive holistic insights about your business and your customers.

Data Warehouse

Data Warehouse Dashboards Testing Visualization

What is a data architect? Skills, salaries, and how to become a data framework master

CIO Business Intelligence

OCTOBER 13, 2023

Data architecture is a complex and varied field and different organizations and industries have unique needs when it comes to their data architects. Solutions data architect: These individuals design and implement data solutions for specific business needs, including data warehouses, data marts, and data lakes.

Data Architecture

Data Architecture Data Warehouse Statistics Visualization

Centralize Your Data Processes With a DataOps Process Hub

DataKitchen

NOVEMBER 4, 2021

Data organizations often have a mix of centralized and decentralized activity. DataOps concerns itself with the complex flow of data across teams, data centers and organizational boundaries. It expands beyond tools and data architecture and views the data organization from the perspective of its processes and workflows.

Data Processing

Data Processing Data Lake Cost-Benefit Testing

DataOps For Business Analytics Teams

DataKitchen

JANUARY 3, 2022

There’s a recent trend toward people creating data lake or data warehouse patterns and calling it data enablement or a data hub. DataOps expands upon this approach by focusing on the processes and workflows that create data enablement and business analytics. DataOps Process Hub. Stop Firefighting.

Business Analytics

Business Analytics Analytics Testing Dashboards

Data’s dark secret: Why poor quality cripples AI and growth

CIO Business Intelligence

APRIL 8, 2025

We also examine how centralized, hybrid and decentralized data architectures support scalable, trustworthy ecosystems. As data-centric AI, automated metadata management and privacy-aware data sharing mature, the opportunity to embed data quality into the enterprises core has never been more significant.

Data Quality

Data Quality Data-driven Key Performance Indicator Metadata

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

AWS Big Data

AUGUST 1, 2024

Large-scale data warehouse migration to the cloud is a complex and challenging endeavor that many organizations undertake to modernize their data infrastructure, enhance data management capabilities, and unlock new business opportunities. This makes sure the new data platform can meet current and future business goals.

Data Warehouse

Data Warehouse KPI Optimization Cost-Benefit

Carhartt turns to data under new CIO

CIO Business Intelligence

NOVEMBER 25, 2022

Today, more than 90% of its applications run in the cloud, with most of its data is housed and analyzed in a homegrown enterprise data warehouse. Like many CIOs, Carhartt’s top digital leader is aware that data is the key to making advanced technologies work. Today, we backflush our data lake through our data warehouse.

Data Lake

Data Lake Data Warehouse Unstructured Data Data Architecture

Has the Data Warehouse Had Its Day?

BI-Survey

JANUARY 15, 2023

Data architecture is a topic that is as relevant today as ever. It is widely regarded as a matter for data engineers, not business domain experts. Statements from countless interviews with our customers reveal that the data warehouse is seen as a “black box” by many and understood by few business users.

Data Warehouse

Data Warehouse IT Data Architecture Measurement

Build a secure data visualization application using the Amazon Redshift Data API with AWS IAM Identity Center

AWS Big Data

MARCH 6, 2025

Tens of thousands of customers use Amazon Redshift for modern data analytics at scale, delivering up to three times better price-performance and seven times better throughput than other cloud data warehouses. The application has been tested successfully with versions v3.12.8 Create a Python virtual environment.

Visualization

Visualization Sales Data Warehouse Management

How Open Universities Australia modernized their data platform and significantly reduced their ETL costs with AWS Cloud Development Kit and AWS Step Functions

AWS Big Data

JANUARY 30, 2025

Our approach The migration initiative consisted of two main parts: building the new architecture and migrating data pipelines from the existing tool to the new architecture. Often, we would work on both in parallel, testing one component of the architecture while developing another at the same time.

Data Warehouse

Data Warehouse Data Architecture Machine Learning Data Transformation

Automate data loading from your database into Amazon Redshift using AWS Database Migration Service (DMS), AWS Step Functions, and the Redshift Data API

AWS Big Data

JULY 2, 2024

Amazon Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing ETL (extract, transform, and load), business intelligence (BI), and reporting tools. For Name , enter a name (for example, dms-test ).

Data Warehouse

Data Warehouse Sales Testing Big Data

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

To speed up the self-service analytics and foster innovation based on data, a solution was needed to provide ways to allow any team to create data products on their own in a decentralized manner. To create and manage the data products, smava uses Amazon Redshift , a cloud data warehouse.

Data Lake

Data Lake Data Warehouse Data-driven B2B

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

In 2013, Amazon Web Services revolutionized the data warehousing industry by launching Amazon Redshift , the first fully-managed, petabyte-scale, enterprise-grade cloud data warehouse. Amazon Redshift made it simple and cost-effective to efficiently analyze large volumes of data using existing business intelligence tools.

Data Warehouse

Data Warehouse Analytics Data Lake Machine Learning

Create a modern data platform using the Data Build Tool (dbt) in the AWS Cloud

AWS Big Data

NOVEMBER 9, 2023

A modern data platform entails maintaining data across multiple layers, targeting diverse platform capabilities like high performance, ease of development, cost-effectiveness, and DataOps features such as CI/CD, lineage, and unit testing. It’s raw, unprocessed data straight from the source. usr/local/airflow/.local/bin/dbt

Data Warehouse

Data Warehouse Testing Data Quality Reporting

Enhance data security and governance for Amazon Redshift Spectrum with VPC endpoints

AWS Big Data

FEBRUARY 16, 2024

Many customers are extending their data warehouse capabilities to their data lake with Amazon Redshift. They are looking to further enhance their security posture where they can enforce access policies on their data lakes based on Amazon Simple Storage Service (Amazon S3).

Data Lake

Data Lake Data Warehouse Testing Business Objectives

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

AWS Big Data

JUNE 10, 2024

One of the key challenges in modern big data management is facilitating efficient data sharing and access control across multiple EMR clusters. Organizations have multiple Hive data warehouses across EMR clusters, where the metadata gets generated. Test access using Athena queries in the consumer account.

Data Lake

Data Lake Metadata Data Warehouse Data Processing

Power enterprise-grade Data Vaults with Amazon Redshift – Part 1

AWS Big Data

NOVEMBER 16, 2023

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x

Enterprise

Enterprise Data Warehouse Data Lake Optimization

Top analytics announcements of AWS re:Invent 2024

AWS Big Data

FEBRUARY 26, 2025

Amazon SageMaker Lakehouse provides an open data architecture that reduces data silos and unifies data across Amazon Simple Storage Service (Amazon S3) data lakes, Redshift data warehouses, and third-party and federated data sources. connection testing, metadata retrieval, and data preview.

Analytics

Analytics Data Lake Metadata Data Warehouse

Migrate Microsoft Azure Synapse Analytics to Amazon Redshift using AWS SCT

AWS Big Data

OCTOBER 18, 2023

Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse that provides the flexibility to use provisioned or serverless compute for your analytical workloads. The decoupled compute and storage architecture of Amazon Redshift enables you to build highly scalable, resilient, and cost-effective workloads.

Analytics

Analytics Data Warehouse Dashboards Testing

Perform data parity at scale for data modernization programs using AWS Glue Data Quality

AWS Big Data

OCTOBER 9, 2024

Today, customers are embarking on data modernization programs by migrating on-premises data warehouses and data lakes to the AWS Cloud to take advantage of the scale and advanced analytical capabilities of the cloud. This helps prevent bad data from entering your data lakes and data warehouses.

Data Quality

Data Quality Data Lake Data Warehouse Metrics

How GamesKraft uses Amazon Redshift data sharing to support growing analytics workloads

AWS Big Data

NOVEMBER 13, 2023

Amazon Redshift is a fully managed data warehousing service that offers both provisioned and serverless options, making it more efficient to run and scale analytics without having to manage your data warehouse. These upstream data sources constitute the data producer components.

Data Warehouse

Data Warehouse Analytics Data Lake Data Science

Generic orchestration framework for data warehousing workloads using Amazon Redshift RSQL

AWS Big Data

APRIL 3, 2023

Tens of thousands of customers run business-critical workloads on Amazon Redshift , AWS’s fast, petabyte-scale cloud data warehouse delivering the best price-performance. With Amazon Redshift, you can query data across your data warehouse, operational data stores, and data lake using standard SQL.

Data Warehouse

Data Warehouse Testing Data Lake Data-driven

What is a data engineer? An analytics role in high demand

CIO Business Intelligence

SEPTEMBER 14, 2023

Database-centric: In larger organizations, where managing the flow of data is a full-time job, data engineers focus on analytics databases. Database-centric data engineers work with data warehouses across multiple databases and are responsible for developing table schemas.

Analytics

Analytics Data Science Unstructured Data Data mining

What is a data engineer? An analytics role in high demand

CIO Business Intelligence

AUGUST 9, 2022

Database-centric: In larger organizations, where managing the flow of data is a full-time job, data engineers focus on analytics databases. Database-centric data engineers work with data warehouses across multiple databases and are responsible for developing table schemas. Data engineer job description.

Analytics

Analytics Data Science Statistics Unstructured Data

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

Many of the tests to check performance and volumes of data scanned have used Athena because it provides a simple to use, fully serverless, cost effective, interface without the need to setup infrastructure. 12xl) cluster The test included four types of queries that represent different production workloads that Cloudinary is running.

Data Lake

Data Lake Metadata Snapshot Analytics

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

Cloudera

JUNE 30, 2022

Companies, on the other hand, have continued to demand highly scalable and flexible analytic engines and services on the data lake, without vendor lock-in. Organizations want modern data architectures that evolve at the speed of their business and we are happy to support them with the first open data lakehouse. .

Data Lake

Data Lake Data Warehouse Data Architecture Metadata

How Wallapop improved performance of analytics workloads with Amazon Redshift Serverless and data sharing

AWS Big Data

NOVEMBER 14, 2023

Amazon Redshift is a fast, fully managed cloud data warehouse that makes it straightforward and cost-effective to analyze all your data at petabyte scale, using standard SQL and your existing business intelligence (BI) tools. Their cluster size of the provisioned data warehouse didn’t change.

Data Warehouse

Data Warehouse Analytics Testing Cost-Benefit

Accelerate Amazon Redshift secure data use with Satori – Part 1

AWS Big Data

SEPTEMBER 21, 2023

Satori integrates natively with both Amazon Redshift provisioned clusters and Amazon Redshift Serverless for easy setup of your Amazon Redshift data warehouse in the secure Satori portal. In part 2, we will explore how to set up self-service data access with Satori to data stored in Amazon Redshift.

Data Warehouse

Data Warehouse Interactive Data Architecture Data-driven

Cloudera Uses CDP to Reduce IT Cloud Spend by $12 Million

Cloudera

OCTOBER 18, 2022

Like all of our customers, Cloudera depends on the Cloudera Data Platform (CDP) to manage our day-to-day analytics and operational insights. Many aspects of our business live within this modern data architecture, providing all Clouderans the ability to ask, and answer, important questions for the business.

IT

IT Data Warehouse Dashboards Visualization

Dive deep into AWS Glue 4.0 for Apache Spark

AWS Big Data

MAY 18, 2023

It’s even harder when your organization is dealing with silos that impede data access across different data stores. Seamless data integration is a key requirement in a modern data architecture to break down data silos. We observed that our TPC-DS tests on Amazon S3 had a total job runtime on AWS Glue 4.0

Testing

Testing Data Lake Cost-Benefit Data Integration

Make the leap to Hybrid with Cloudera Data Engineering

Cloudera

FEBRUARY 14, 2022

CDE like the other data services (Data Warehouse and Machine Learning for example) deploys within the same kubernetes cluster and is managed through the same security and governance model. The key is that CDP, as a hybrid data platform, allows this shift to be fluid. With the introduction of PVC 1.3.0 Agile multi-tenancy.

Data-driven

Data-driven Testing Data Warehouse Enterprise

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

AWS Big Data

OCTOBER 1, 2024

Performance was tested on a Redshift serverless data warehouse with 128 RPU. In our testing, the dataset was stored in Amazon S3 in Parquet format and AWS Glue Data Catalog was used to manage external databases and tables. He works on the intersection of data lakes and data warehouses.

Data Lake

Data Lake Statistics Broadcasting Optimization

Extract data from SAP ERP using AWS Glue and the SAP SDK

AWS Big Data

FEBRUARY 8, 2023

Test the connection with SAP using the wheel file. Deenbandhu Prasad is a Senior Analytics Specialist at AWS, specializing in big data services. He is passionate about helping customers build modern data architecture on the AWS Cloud. The high-level steps are as follows: Clone the PyRFC module from GitHub.

Testing

Testing Data Integration Data Lake Enterprise

BMW Cloud Efficiency Analytics powered by Amazon QuickSight and Amazon Athena

AWS Big Data

NOVEMBER 15, 2023

Each CDH dataset has three processing layers: source (raw data), prepared (transformed data in Parquet), and semantic (combined datasets). It is possible to define stages (DEV, INT, PROD) in each layer to allow structured release and test without affecting PROD.

Analytics

Analytics Dashboards Metadata Data Warehouse

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

The Analytics specialty practice of AWS Professional Services (AWS ProServe) helps customers across the globe with modern data architecture implementations on the AWS Cloud. We begin with a Data lake reference architecture followed by an overview of operational data processing framework. This concludes the demo.

Data Lake

Data Lake Data Processing Metadata Snapshot

Unlocking near real-time analytics with petabytes of transaction data using Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud

Incremental refresh for Amazon Redshift materialized views on data lake tables

Webinars

Trending Sources

What is a Data Mesh?

Webinars

Modernizing the Data Warehouse: Challenges and Benefits

From data lakes to insights: dbt adapter for Amazon Athena now supported in dbt Cloud

Migrate a petabyte-scale data warehouse from Actian Vectorwise to Amazon Redshift

The future of data: A 5-pillar approach to modern data management

How HPE Aruba Supply Chain optimized cost and performance by migrating to an AWS modern data architecture

Run Apache XTable in AWS Lambda for background conversion of open table formats

Centralize near-real-time governance through alerts on Amazon Redshift data warehouses for sensitive queries

What is a data architect? Skills, salaries, and how to become a data framework master

Centralize Your Data Processes With a DataOps Process Hub

DataOps For Business Analytics Teams

Data’s dark secret: Why poor quality cripples AI and growth

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

Carhartt turns to data under new CIO

Has the Data Warehouse Had Its Day?

Build a secure data visualization application using the Amazon Redshift Data API with AWS IAM Identity Center

How Open Universities Australia modernized their data platform and significantly reduced their ETL costs with AWS Cloud Development Kit and AWS Step Functions

Automate data loading from your database into Amazon Redshift using AWS Database Migration Service (DMS), AWS Step Functions, and the Redshift Data API

How smava makes loans transparent and affordable using Amazon Redshift Serverless

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

Create a modern data platform using the Data Build Tool (dbt) in the AWS Cloud

Enhance data security and governance for Amazon Redshift Spectrum with VPC endpoints

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

Power enterprise-grade Data Vaults with Amazon Redshift – Part 1

Top analytics announcements of AWS re:Invent 2024

Migrate Microsoft Azure Synapse Analytics to Amazon Redshift using AWS SCT

Perform data parity at scale for data modernization programs using AWS Glue Data Quality

How GamesKraft uses Amazon Redshift data sharing to support growing analytics workloads

Generic orchestration framework for data warehousing workloads using Amazon Redshift RSQL

What is a data engineer? An analytics role in high demand

What is a data engineer? An analytics role in high demand

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

How Wallapop improved performance of analytics workloads with Amazon Redshift Serverless and data sharing

Accelerate Amazon Redshift secure data use with Satori – Part 1

Cloudera Uses CDP to Reduce IT Cloud Spend by $12 Million

Dive deep into AWS Glue 4.0 for Apache Spark

Make the leap to Hybrid with Cloudera Data Engineering

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

Extract data from SAP ERP using AWS Glue and the SAP SDK

BMW Cloud Efficiency Analytics powered by Amazon QuickSight and Amazon Athena

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

Stay Connected