The need for streamlined data transformations
As organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown. These tools enable you to extract insights from your data without the complexity of managing infrastructure.
For container terminal operators, data-driven decision-making and efficient data sharing are vital to optimizing operations and boosting supply chain efficiency. With a unified catalog, enhanced analytics capabilities, and efficient data transformation processes, we're laying the groundwork for future growth.
Together with price-performance, Amazon Redshift offers capabilities such as a serverless architecture, machine learning integration within your data warehouse, and secure data sharing across the organization. dbt Cloud is a hosted service that helps data teams productionize dbt deployments.
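To make the dbt side concrete, dbt-core (1.5+) exposes a programmatic entry point, so a scheduled job can invoke a run the same way the CLI does. A minimal sketch; the model selector is a hypothetical placeholder, and a profiles.yml pointing at a Redshift target is assumed:

```python
# Programmatic dbt invocation, equivalent to `dbt run --select staging_orders`.
from dbt.cli.main import dbtRunner, dbtRunnerResult

dbt = dbtRunner()

# "staging_orders" is a placeholder model name.
res: dbtRunnerResult = dbt.invoke(["run", "--select", "staging_orders"])

# Report the status of each executed node.
for r in res.result:
    print(f"{r.node.name}: {r.status}")
```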
However, you might face significant challenges when planning for a large-scale data warehouse migration. The following diagram illustrates a scalable migration pattern for an extract, transform, and load (ETL) scenario. The success criteria are the key performance indicators (KPIs) for each component of the data workflow.
BMW Group uses 4,500 AWS Cloud accounts across the entire organization but faces the challenge of reducing unnecessary costs, optimizing spend, and having a central place to monitor costs. The ultimate goal is to raise awareness of cloud efficiency and optimize cloud utilization in a cost-effective and sustainable manner.
Pattern 1: Data transformation, load, and unload
Several of our data pipelines included significant data transformation steps, which were primarily performed through SQL statements executed by Amazon Redshift. Diagram 2 shows this workflow.
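A rough sketch of what such a transform-and-unload step can look like from Python, using the Redshift Data API via boto3; the workgroup, schema, bucket, and IAM role ARN are hypothetical placeholders:

```python
import boto3

client = boto3.client("redshift-data")

# Transform inside Redshift, then unload the result to S3 as Parquet.
sql = """
    UNLOAD ('SELECT order_id, SUM(amount) AS total
             FROM staging.orders GROUP BY order_id')
    TO 's3://example-bucket/unload/orders_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftUnloadRole'
    FORMAT PARQUET;
"""

resp = client.execute_statement(
    WorkgroupName="example-workgroup",  # or ClusterIdentifier + DbUser
    Database="dev",
    Sql=sql,
)
# Statement runs asynchronously; poll describe_statement with this ID.
print(resp["Id"])
```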
With Amazon AppFlow, you can run data flows at nearly any scale and at the frequency you choose: on a schedule, in response to a business event, or on demand. You can configure data transformation capabilities such as filtering and validation to generate rich, ready-to-use data as part of the flow itself, without additional steps.
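For the on-demand case, triggering an existing flow from Python is a one-liner with boto3; the flow name below is a hypothetical placeholder, and any filtering or validation tasks are configured on the flow itself, not at invocation time:

```python
import boto3

appflow = boto3.client("appflow")

# Starts a flow whose trigger type is OnDemand; scheduled and
# event-driven flows start automatically once activated.
response = appflow.start_flow(flowName="salesforce-to-s3-example")
print(response["executionId"])
```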
Since the release of Cloudera Data Engineering (CDE) more than a year ago, our number one goal has been operationalizing Spark pipelines at scale with first-class tooling designed to streamline automation and observability. This enabled new use cases with customers that were using a mix of Spark and Hive to perform data transformations.
Auto-copy enhances the COPY command by adding jobs that ingest new data automatically. If storing operational data in a data warehouse is a requirement, synchronization of tables between operational data stores and Amazon Redshift tables is supported.
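A hedged sketch of what creating such a job can look like through the Redshift Data API: once the JOB CREATE ... AUTO ON clause is in place, new files landing under the S3 prefix are ingested without further COPY statements. The table, prefix, role ARN, and workgroup are placeholders:

```python
import boto3

client = boto3.client("redshift-data")

# Define an auto-copy job so Redshift watches the prefix for new files.
sql = """
    COPY public.orders
    FROM 's3://example-bucket/orders/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT CSV
    JOB CREATE orders_autocopy_job AUTO ON;
"""

client.execute_statement(
    WorkgroupName="example-workgroup",
    Database="dev",
    Sql=sql,
)
```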
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. This ensures that the data is suitable for training purposes. The following diagram illustrates the solution architecture.
Additionally, a TCO calculator generates a TCO estimate for an optimized EMR cluster to facilitate the migration. After you complete the checklist, you'll have a better understanding of how to design the future architecture. For compute-heavy workloads such as MapReduce or Hive-on-MR jobs, use CPU-optimized instances.
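As a hypothetical illustration of that last point, a CPU-optimized cluster definition with boto3 might look like the following; all names, instance types, counts, and the log bucket are placeholders, not output from the calculator:

```python
import boto3

emr = boto3.client("emr")

# Provision a transient cluster sized for compute-heavy Hive/MapReduce jobs,
# using CPU-optimized (c5-family) instances.
response = emr.run_job_flow(
    Name="compute-heavy-migration-test",
    ReleaseLabel="emr-6.9.0",
    Applications=[{"Name": "Hive"}, {"Name": "Spark"}],
    Instances={
        "MasterInstanceType": "c5.xlarge",
        "SlaveInstanceType": "c5.2xlarge",   # CPU-optimized core nodes
        "InstanceCount": 5,
        "KeepJobFlowAliveWhenNoSteps": False,  # terminate when steps finish
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
    LogUri="s3://example-bucket/emr-logs/",
)
print(response["JobFlowId"])
```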
Data Vault 2.0 allows for the following:
- Agile data warehouse development
- Parallel data ingestion
- A scalable approach to handle multiple data sources, even on the same entity
- A high level of automation
- Historization
- Full lineage support
However, Data Vault 2.0
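To make one of those ideas concrete: hub loading in Data Vault 2.0 conventionally keys rows on a hash of the business key, which is what makes parallel, multi-source ingestion deterministic. A minimal sketch, with illustrative column and delimiter choices:

```python
import hashlib

def hub_hash_key(*business_key_parts: str, delimiter: str = "||") -> str:
    """Normalize business key parts, join, and hash (MD5 is a common DV2.0 choice)."""
    normalized = delimiter.join(p.strip().upper() for p in business_key_parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

# The same customer key always yields the same hub key, regardless of
# which source system or parallel load produced it.
print(hub_hash_key("C-1001"))              # hub_customer key
print(hub_hash_key("C-1001", "STORE-07"))  # e.g., a composite link key
```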
Amazon Redshift enables you to use SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and machine learning (ML) to deliver the best price-performance at scale. Shashank Tewari is a Senior Technical Account Manager at AWS.
When migrating Hadoop workloads to Amazon EMR, it's often difficult to identify the optimal cluster configuration without analyzing existing workloads by hand. Amazon EMR enables compute, such as EMR instances, and storage, such as Amazon Simple Storage Service (Amazon S3) data lakes, to scale independently. For more information, see the GitHub repo.
It accelerates data projects with data quality and lineage and contextualizes through ontologies, taxonomies, and vocabularies, making integrations easier. RDF is used extensively for data publishing and data interchange and is based on W3C and other industry standards. Increasingly, organizations are using both.
Data ingestion – Steps 1 and 2 use AWS DMS, which connects to the source database and moves full and incremental (CDC) data to Amazon S3 in Parquet format. Data transformation – Steps 3 and 4 represent an EMR Serverless Spark application (Amazon EMR 6.9). Monjumi Sarma is a Data Lab Solutions Architect at AWS.
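A hedged PySpark sketch of what the transformation step might do with those files: collapse the DMS change records down to the latest row per key. It assumes the CDC Parquet files carry DMS's Op column (I/U/D) plus an updated_at timestamp; paths and column names are placeholders, not the post's actual schema:

```python
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.appName("dms-cdc-merge").getOrCreate()

# Read the CDC Parquet files that AWS DMS wrote to S3.
cdc = spark.read.parquet("s3://example-bucket/dms/orders/")

# Keep only the newest change per business key, then drop deletes.
w = Window.partitionBy("order_id").orderBy(F.col("updated_at").desc())
latest = (
    cdc.withColumn("rn", F.row_number().over(w))
       .filter((F.col("rn") == 1) & (F.col("Op") != "D"))
       .drop("rn")
)

latest.write.mode("overwrite").parquet("s3://example-bucket/curated/orders/")
```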
A read-optimized platform that can integrate data from multiple applications emerged. In another decade, the internet and mobile started to generate data of unforeseen volume, variety, and velocity. This adds an additional ETL step, making the data even more stale. The data lakehouse was created to solve these problems.
To create and manage the data products, smava uses Amazon Redshift , a cloud data warehouse. In this post, we show how smava optimized their data platform by using Amazon Redshift Serverless and Amazon Redshift data sharing to overcome right-sizing challenges for unpredictable workloads and further improve price-performance.
It also used device data to develop Lenovo Device Intelligence, which uses AI-driven predictive analytics to help customers understand and proactively prevent and solve potential IT issues. Lenovo Device Intelligence can also help to optimize IT support costs, reduce employee downtime, and improve the user experience, the company says.
The company started its New Analytics Era initiative by migrating its data from outdated SQL servers to a modern AWS data lake. It then built a cutting-edge cloud-based analytics platform, designed with an innovative data architecture. It also crafted multiple machine learning and AI models to tackle business challenges.
Given the importance of sharing information among diverse disciplines in the era of digital transformation, this concept is arguably as important as ever. The aim is to normalize, aggregate, and eventually make data that originates in various pockets of the enterprise available to analysts across the organization.
Customers such as Crossmark, DJO Global, and others use Birst with Snowflake to deliver the ultimate modern data architecture. The Snowflake/Birst combination creates the optimal balance between IT control and end-user freedom, eliminating analytic silos once and for all.
The challenges of a monolithic data lake architecture
Data lakes are, at a high level, single repositories of data at scale. Data may be stored in its raw original form or optimized into a different format suitable for consumption by specialized engines.
Furthermore, these tools boast customization options, allowing users to tailor data sources to address areas critical to their business success, thereby generating actionable insights and customizable reports.
Best BI Tools for Data Analysts
Key Features: Extensive library of pre-built connectors for diverse data sources.
The company also used the opportunity to reimagine its data pipeline and architecture. A key architectural decision that Showpad took during this time was to create a portable data layer by decoupling the data transformation from visualization, ML, or ad hoc querying tools and centralizing its business logic.
We use the built-in features of Data Firehose, including AWS Lambda for necessary data transformation and Amazon Simple Notification Service (Amazon SNS) for near real-time alerts. To maintain up-to-date data, an AWS Glue crawler reads and updates the AWS Glue Data Catalog from transformed Parquet files.
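For context, a Firehose transformation Lambda follows a fixed contract: it receives base64-encoded records and must return each one with a recordId, a result status, and re-encoded data. A minimal sketch, where the added field is a hypothetical enrichment rather than this pipeline's actual logic:

```python
import base64
import json

def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        # Firehose delivers each record's payload base64-encoded.
        payload = json.loads(base64.b64decode(record["data"]))

        # Hypothetical transformation: tag each event before delivery.
        payload["processed"] = True

        output.append({
            "recordId": record["recordId"],
            "result": "Ok",  # or "Dropped" / "ProcessingFailed"
            "data": base64.b64encode(
                (json.dumps(payload) + "\n").encode("utf-8")
            ).decode("utf-8"),
        })
    return {"records": output}
```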
AWS Glue establishes a secure connection to HubSpot using OAuth for authorization and TLS for data encryption in transit. AWS Glue can also apply complex data transformations, enabling efficient data integration and preparation to meet your needs.
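A hedged sketch of the transformation side of such a Glue job, using DynamicFrames and ApplyMapping; the database, table, column names, and output path are placeholders rather than HubSpot's actual schema, and the HubSpot connection itself is configured in Glue, not in this script:

```python
from awsglue.context import GlueContext
from awsglue.transforms import ApplyMapping
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read a cataloged source table into a DynamicFrame.
contacts = glue_context.create_dynamic_frame.from_catalog(
    database="hubspot_db", table_name="contacts"
)

# Rename and retype fields as part of data preparation.
mapped = ApplyMapping.apply(
    frame=contacts,
    mappings=[
        ("properties.email", "string", "email", "string"),
        ("properties.createdate", "string", "created_at", "timestamp"),
    ],
)

# Write the prepared data to S3 as Parquet.
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/hubspot/contacts/"},
    format="parquet",
)
```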
Reports – In formats that are both static and interactive, these showcase tabular views of data. Strategic Objective – Provide an optimal user experience regardless of where and how users prefer to access information. Data Environment – First off, the solutions you consider should be compatible with your current data architecture.
Trino allows users to run ad hoc queries across massive datasets, making real-time decision-making a reality without needing extensive data transformations. This is particularly valuable for teams that require instant answers from their data. Data Lake Analytics: Trino doesn't just stop at databases.
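As a simple illustration, an ad hoc query through the official Python client (pip install trino); the host, catalog, schema, and table names are hypothetical:

```python
import trino

# Connect to a Trino coordinator; one connection can query any catalog.
conn = trino.dbapi.connect(
    host="trino.example.com",
    port=8080,
    user="analyst",
    catalog="hive",
    schema="default",
)

cur = conn.cursor()
# A single query can also join across catalogs (e.g., hive + postgresql).
cur.execute("SELECT status, COUNT(*) FROM orders GROUP BY status")
for row in cur.fetchall():
    print(row)
```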