Big Data, Data Architecture and Data Transformation

From data lakes to insights: dbt adapter for Amazon Athena now supported in dbt Cloud

AWS Big Data

NOVEMBER 22, 2024

The need for streamlined data transformations: As organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown. This enables you to extract insights from your data without the complexity of managing infrastructure.

Data Lake

Data Lake Data Warehouse Cost-Benefit Data Transformation

Ingest data from Google Analytics 4 and Google Sheets to Amazon Redshift using Amazon AppFlow

AWS Big Data

JANUARY 6, 2025

With Amazon AppFlow, you can run data flows at nearly any scale and at the frequency you choose: on a schedule, in response to a business event, or on demand. You can configure data transformation capabilities such as filtering and validation to generate rich, ready-to-use data as part of the flow itself, without additional steps.

Analytics

Analytics Data Warehouse Big Data Metrics

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

Need for a data mesh architecture: Because entities in the EUROGATE group generate vast amounts of data from various sources (across departments, locations, and technologies), the traditional centralized data architecture struggles to keep up with the demands for real-time insights, agility, and scalability.

IoT

IoT Machine Learning Metadata Data-driven

Unlocking near real-time analytics with petabytes of transaction data using Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud

AWS Big Data

NOVEMBER 27, 2024

Together with price-performance, Amazon Redshift offers capabilities such as serverless architecture, machine learning integration within your data warehouse and secure data sharing across the organization. dbt Cloud is a hosted service that helps data teams productionize dbt deployments.

Data Warehouse

Data Warehouse Analytics Testing Sales

Introducing blueprint discovery and other UI enhancements for Amazon OpenSearch Ingestion

AWS Big Data

MAY 22, 2024

Amazon OpenSearch Ingestion is a fully managed serverless pipeline that allows you to ingest, filter, transform, enrich, and route data to an Amazon OpenSearch Service domain or Amazon OpenSearch Serverless collection. He is deeply passionate about Data Architecture and helps customers build analytics solutions at scale on AWS.

Data Architecture

Data Architecture Visualization Data Transformation Management

How Open Universities Australia modernized their data platform and significantly reduced their ETL costs with AWS Cloud Development Kit and AWS Step Functions

AWS Big Data

JANUARY 30, 2025

Pattern 1: Data transformation, load, and unload. Several of our data pipelines included significant data transformation steps, which were primarily performed through SQL statements executed by Amazon Redshift. The following Diagram 2 shows this workflow.

Data Warehouse

Data Warehouse Data Architecture Machine Learning Data Transformation

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

AWS Big Data

AUGUST 1, 2024

However, you might face significant challenges when planning for a large-scale data warehouse migration. The following diagram illustrates a scalable migration pattern for an extract, transform, and load (ETL) scenario. The success criteria are the key performance indicators (KPIs) for each component of the data workflow.

Data Warehouse

Data Warehouse KPI Optimization Cost-Benefit

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. By decoupling storage and compute, data lakes promote cost-effective storage and processing of big data. Why did Orca choose Apache Iceberg?

Data Lake

Data Lake Analytics Snapshot Data Quality

Deep dive into the AWS ProServe Hadoop Migration Delivery Kit TCO tool

AWS Big Data

FEBRUARY 6, 2023

He has a specialty in big data services and technologies and an interest in building customer business outcomes together. Jiseong Kim is a Senior Data Architect at AWS ProServe. He also understands how to apply technologies to solve big data problems and build a well-designed data architecture.

Dashboards

Dashboards Optimization Data Lake Cost-Benefit

BMW Cloud Efficiency Analytics powered by Amazon QuickSight and Amazon Athena

AWS Big Data

NOVEMBER 15, 2023

The difference lies in when and where data transformation takes place. In ETL, data is transformed before it’s loaded into the data warehouse. In ELT, raw data is loaded into the data warehouse first, then it’s transformed directly within the warehouse.
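The ETL and ELT ordering described above can be sketched in a few lines. This is a minimal illustration only, assuming an in-memory SQLite database as a stand-in for the data warehouse; the table names, the sample rows, and the functions `run_etl`, `run_elt`, and `fetch` are hypothetical and not taken from the article.

```python
import sqlite3

# Hypothetical raw input: order IDs with amounts stored as text.
RAW_ORDERS = [("o1", "19.99"), ("o2", "5.00"), ("o3", "12.50")]


def run_etl(conn):
    """ETL: transform in application code, then load only the cleaned rows."""
    conn.execute("CREATE TABLE orders_etl (order_id TEXT, amount_cents INTEGER)")
    cleaned = [(oid, round(float(amount) * 100)) for oid, amount in RAW_ORDERS]
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?)", cleaned)


def run_elt(conn):
    """ELT: load the raw rows first, then transform with SQL inside the database."""
    conn.execute("CREATE TABLE orders_raw (order_id TEXT, amount TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?)", RAW_ORDERS)
    conn.execute(
        "CREATE TABLE orders_elt AS "
        "SELECT order_id, "
        "CAST(ROUND(CAST(amount AS REAL) * 100) AS INTEGER) AS amount_cents "
        "FROM orders_raw"
    )


def fetch(conn, table):
    """Return the transformed rows from the given table, ordered by ID."""
    query = f"SELECT order_id, amount_cents FROM {table} ORDER BY order_id"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
etl_rows = fetch(conn, "orders_etl")
elt_rows = fetch(conn, "orders_elt")
print(etl_rows == elt_rows)
```

Both paths end with the same transformed table. The difference is that the ELT path also keeps `orders_raw` in the database, so the transformation can be rerun or changed later without reloading the source.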

Analytics

Analytics Dashboards Metadata Data Warehouse

Connecting the Data Lifecycle

Cloudera

NOVEMBER 29, 2021

Data transforms businesses. That’s where the data lifecycle comes into play. Managing data and its flow, from the edge to the cloud, is one of the most important tasks in the process of gaining data intelligence. The company needed a modern data architecture to manage the growing traffic effectively.

Data Lake

Data Lake Data Warehouse Data Architecture Reporting

Create a modern data platform using the Data Build Tool (dbt) in the AWS Cloud

AWS Big Data

NOVEMBER 9, 2023

In this post, we delve into a case study for a retail use case, exploring how the Data Build Tool (dbt) was used effectively within an AWS environment to build a high-performing, efficient, and modern data platform. It does this by helping teams handle the T in ETL (extract, transform, and load) processes.

Data Warehouse

Data Warehouse Testing Data Quality Reporting

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

Overview of solution: As a data-driven company, smava relies on the AWS Cloud to power their analytics use cases. smava ingests data from various external and internal data sources into a landing stage on the data lake based on Amazon Simple Storage Service (Amazon S3).

Data Lake

Data Lake Data Warehouse Data-driven B2B

Amazon Redshift data ingestion options

AWS Big Data

SEPTEMBER 5, 2024

If storing operational data in a data warehouse is a requirement, synchronization of tables between operational data stores and Amazon Redshift tables is supported. In scenarios where data transformation is required, you can use Redshift stored procedures to modify data in Redshift tables.

IoT

IoT Data Warehouse Cost-Benefit Reporting

Automate discovery of data relationships using ML and Amazon Neptune graph technology

AWS Big Data

APRIL 19, 2023

Independent data products often only have value if you can connect them, join them, and correlate them to create a higher order data product that creates additional insights. A modern data architecture is critical in order to become a data-driven organization.

Technology

Technology Data-driven Machine Learning Sales

Introducing the AWS ProServe Hadoop Migration Delivery Kit TCO tool

AWS Big Data

FEBRUARY 6, 2023

Use case overview: Migrating Hadoop workloads to Amazon EMR accelerates big data analytics modernization, increases productivity, and reduces operational cost. Refactoring coupled compute and storage to a decoupling architecture is a modern data solution. Jiseong Kim is a Senior Data Architect at AWS ProServe.

Cost-Benefit

Cost-Benefit Data Lake Dashboards Big Data

How GamesKraft uses Amazon Redshift data sharing to support growing analytics workloads

AWS Big Data

NOVEMBER 13, 2023

Key considerations: Gameskraft embraces a modern data architecture, with the data lake residing in Amazon S3. To grant seamless access to the data lake, we use the innovative capabilities of Redshift Spectrum, which is a bridge between the data warehouse (Amazon Redshift) and data lake (Amazon S3).

Data Warehouse

Data Warehouse Analytics Data Lake Data Science

Power enterprise-grade Data Vaults with Amazon Redshift – Part 1

AWS Big Data

NOVEMBER 16, 2023

Data Vault 2.0 allows for the following: agile data warehouse development; parallel data ingestion; a scalable approach to handle multiple data sources even on the same entity; a high level of automation; historization; full lineage support. However, Data Vault 2.0

Enterprise

Enterprise Data Warehouse Data Lake Optimization

How to modernize data lakes with a data lakehouse architecture

IBM Big Data Hub

JULY 5, 2023

This was, without question, a significant departure from traditional analytic environments, which often meant vendor lock-in and the inability to work with data at scale. Another unexpected challenge was the introduction of Spark as a processing framework for big data.

Data Lake

Data Lake Metadata Cost-Benefit Data Warehouse

A step-by-step guide to setting up a data governance program

IBM Big Data Hub

FEBRUARY 9, 2023

In our last blog, we delved into the seven most prevalent data challenges that can be addressed with effective data governance. Today we will share our approach to developing a data governance program to drive data transformation and fuel a data-driven culture. Don’t try to do everything at once!

Data Governance

Data Governance Business Objectives Data Quality Measurement

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

AWS Big Data

MARCH 3, 2023

You can then apply transformations and store data in Delta format for managing inserts, updates, and deletes. Amazon EMR Serverless is a serverless option in Amazon EMR that makes it easy for data analysts and engineers to run open-source big data analytics frameworks without configuring, managing, and scaling clusters or servers.
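As a rough illustration of what managing inserts, updates, and deletes means for a table, the sketch below applies an ordered list of change records to an in-memory table keyed by primary key. It is plain Python, not the Delta Lake or AWS DMS API; the record layout, the operation codes `I`, `U`, and `D`, and the function `apply_changes` are assumptions made for this example.

```python
def apply_changes(table, changes):
    """Apply ordered change records to a copy of a dict keyed by primary key."""
    result = dict(table)
    for change in changes:
        op, key = change["op"], change["id"]
        if op in ("I", "U"):
            # Inserts and updates both set the latest version of the row.
            result[key] = change["row"]
        elif op == "D":
            # Deleting a key that is already absent is treated as a no-op.
            result.pop(key, None)
        else:
            raise ValueError(f"unknown op: {op!r}")
    return result


# Hypothetical current table state and a batch of captured changes.
current = {1: {"name": "Ada"}, 2: {"name": "Bob"}}
changes = [
    {"op": "U", "id": 2, "row": {"name": "Robert"}},
    {"op": "I", "id": 3, "row": {"name": "Cy"}},
    {"op": "D", "id": 1},
]
merged = apply_changes(current, changes)
print(merged)
```

Changes are applied in order, so a later record for the same key overrides an earlier one. A table format with merge support performs the equivalent operation on files at scale.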

Data Lake

Data Lake Dashboards Metrics Metadata

Data platform trinity: Competitive or complementary?

IBM Big Data Hub

JANUARY 18, 2023

This adds an additional ETL step, making the data even more stale. Data lakehouse was created to solve these problems. The data warehouse storage layer is removed from lakehouse architectures. Instead, continuous data transformation is performed within the BLOB storage. Data mesh: A mostly new culture.

Data Lake

Data Lake Data Warehouse Data-driven Metadata

Start Thinking About DataOps

TDAN

DECEMBER 3, 2019

Everyone’s talking about data. Data is the key to unlocking insight— the secret sauce that will help you get predictive, the fuel for business intelligence. The transformative potential in AI? It relies on data. The good news is that data has never […].

Business Intelligence

Business Intelligence Dashboards Reporting Data Architecture

Showpad accelerates data maturity to unlock innovation using Amazon QuickSight

AWS Big Data

APRIL 5, 2023

The company also used the opportunity to reimagine its data pipeline and architecture. A key architectural decision that Showpad took during this time was to create a portable data layer by decoupling the data transformation from visualization, ML, or ad hoc querying tools and centralizing its business logic.

Dashboards

Dashboards Reporting Cost-Benefit Visualization

Successful Data Virtualisation: more than the right choice of platform

Data Virtualization

JANUARY 20, 2021

Learn in 12 minutes: what makes a strong use case for data virtualisation; how to come up with a solid Proof of Concept; how to prepare your organisation for data virtualisation. You’ll have read all about data virtualisation and you’ve.

Data Warehouse

Data Warehouse Data Architecture Data Transformation Big Data

Accelerate data pipeline creation with the new visual interface in Amazon OpenSearch Ingestion

AWS Big Data

APRIL 22, 2025

Amazon OpenSearch Ingestion is a fully managed serverless pipeline that allows you to ingest, filter, transform, enrich, and route data to an Amazon OpenSearch Service domain or Amazon OpenSearch Serverless collection. He is deeply passionate about Data Architecture and helps customers build analytics solutions at scale on AWS.

Visualization

Visualization Data Transformation Management Risk

What Is Embedded Analytics?

Jet Global

MAY 1, 2023

Data Environment: First off, the solutions you consider should be compatible with your current data architecture. We have outlined the requirements that most providers ask for: Data Sources Strategic Objective Use native connectivity optimized for the data source. addresses).

Analytics

Analytics Cost-Benefit Visualization Dashboards

How BMW Group built a serverless terabyte-scale data transformation architecture with dbt and Amazon Athena

AWS Big Data

APRIL 29, 2025

While enabling organization-wide efficiency, the team also applied these principles to the data architecture, making sure that CLEA itself operates frugally. After evaluating various tools, we built a serverless data transformation pipeline using Amazon Athena and dbt.

Data Transformation

Data Transformation Cost-Benefit Testing Data Lake

Introducing the HubSpot connector for AWS Glue

AWS Big Data

DECEMBER 2, 2024

AWS Glue establishes a secure connection to HubSpot using OAuth for authorization and TLS for data encryption in transit. AWS Glue also supports the ability to apply complex data transformations, enabling efficient data integration and preparation to meet your needs. Kamen Sharlandjiev is a Sr.

Data Lake

Data Lake Testing Data Integration Metadata

Petabyte-scale data migration made simple: AppsFlyer’s best practice journey with Amazon EMR Serverless

AWS Big Data

MAY 12, 2025

In this post, we share how AppsFlyer successfully migrated their massive data infrastructure from self-managed Hadoop clusters to Amazon EMR Serverless, detailing their best practices, challenges to overcome, and lessons learned that can help guide other organizations in similar transformations.

Metrics

Metrics Cost-Benefit Metadata Data Lake

Melting the ice — How Natural Intelligence simplified a data lake migration to Apache Iceberg

AWS Big Data

APRIL 28, 2025

Technical recap: The AWS Glue Data Catalog served as the primary source of truth for schema and table updates, with Amazon EventBridge capturing Data Catalog events to trigger synchronization workflows. AWS Lambda parsed event metadata and managed schema synchronization, while Apache Kafka buffered events for real-time processing.

Data Lake

Data Lake Metadata Cost-Benefit Snapshot

Ingest telemetry messages in near real time with Amazon API Gateway, Amazon Data Firehose, and Amazon Location Service

AWS Big Data

NOVEMBER 14, 2024

We use the built-in features of Data Firehose, including AWS Lambda for necessary data transformation and Amazon Simple Notification Service (Amazon SNS) for near real-time alerts. He has helped customers build scalable data warehousing and big data solutions for over 20 years.

Data Lake

Data Lake Metadata Testing Data-driven
