
Build data pipelines with dbt in Amazon Redshift using Amazon MWAA and Cosmos

AWS Big Data

When integrated with modern development practices, dbt projects can use version control for collaboration, incorporate testing for data quality, and utilize reusable components through macros. dbt also automatically manages dependencies, making sure data transformations execute in the correct sequence.
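The excerpt mentions reusable macros and automatic dependency management. A minimal sketch of how these look in a dbt project (model, macro, and column names here are invented for illustration, not taken from the article):

```sql
-- macros/cents_to_dollars.sql  (a reusable user-defined macro)
{% macro cents_to_dollars(column_name) %}
    ({{ column_name }} / 100.0)
{% endmacro %}
```

```sql
-- models/fct_daily_revenue.sql  (hypothetical model)
-- {{ ref() }} records the dependency, so dbt builds stg_orders first
select
    order_date,
    sum({{ cents_to_dollars('amount_cents') }}) as revenue_usd
from {{ ref('stg_orders') }}
group by order_date
```

Because dependencies are declared through `ref()` rather than hard-coded table names, dbt can derive the execution graph and run transformations in the correct sequence.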


Amazon Q data integration adds DataFrame support and in-prompt context-aware job creation

AWS Big Data

Your generated jobs can use a variety of data transformations, including filters, projections, unions, joins, and aggregations, giving you the flexibility to handle complex data processing requirements. In this post, we discuss how Amazon Q data integration transforms ETL workflow development.
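The excerpt names five transformation types: filters, projections, unions, joins, and aggregations. A minimal pandas sketch of each (sample data and column names are invented for illustration; the generated AWS Glue jobs would express the same operations as Spark DataFrame code):

```python
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 10, 20, 30],
    "amount": [50.0, 75.0, 20.0, 90.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 20, 30],
    "region": ["EAST", "WEST", "WEST"],
})
more_orders = pd.DataFrame({
    "order_id": [5], "customer_id": [20], "amount": [120.0],
})

# Union: append a second batch of orders
all_orders = pd.concat([orders, more_orders], ignore_index=True)

# Filter: keep orders of at least 50
large = all_orders[all_orders["amount"] >= 50.0]

# Projection: select only the columns needed downstream
projected = large[["order_id", "customer_id", "amount"]]

# Join: enrich orders with the customer's region
joined = projected.merge(customers, on="customer_id", how="inner")

# Aggregation: total order amount per region
totals = joined.groupby("region", as_index=False)["amount"].sum()
```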



Ingest data from Google Analytics 4 and Google Sheets to Amazon Redshift using Amazon AppFlow

AWS Big Data

With Amazon AppFlow, you can run data flows at nearly any scale and at the frequency you choose: on a schedule, in response to a business event, or on demand. You can configure data transformation capabilities such as filtering and validation to generate rich, ready-to-use data as part of the flow itself, without additional steps.


Accelerate your data workflows with Amazon Redshift Data API persistent sessions

AWS Big Data

You can create temporary tables once and reference them throughout, without having to constantly refresh database connections and restart from scratch. For session limits, refer to Amazon Redshift quotas and limits. Anusha Challa is a Senior Analytics Specialist Solutions Architect focused on Amazon Redshift.


How DeNA Co., Ltd. accelerated anonymized data quality tests up to 100 times faster using Amazon Redshift Serverless and dbt

AWS Big Data

dbt provides a SQL-first templating engine for repeatable and extensible data transformations, including a data tests feature, which allows verifying data models and tables against expected rules and conditions using SQL.
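The data tests feature the excerpt describes is typically declared in a model's YAML properties file. A minimal sketch (model and column names are hypothetical, not from the DeNA project):

```yaml
# models/schema.yml  (hypothetical model)
version: 2
models:
  - name: anonymized_users
    columns:
      - name: user_id
        tests:
          - unique      # fails if any user_id repeats
          - not_null    # fails if any user_id is missing
      - name: age_band
        tests:
          - accepted_values:
              values: ["18-24", "25-34", "35-44", "45+"]
```

Running `dbt test` compiles each declaration into a SQL query against the warehouse; a test fails when the query returns any violating rows.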


How To Use Airbyte, dbt-teradata, Dagster, and Teradata Vantage™ for Seamless Data Integration

Teradata

Infrastructure layout: diagram illustrating the data flow between each component of the infrastructure.

Prerequisites: Before you embark on this integration, ensure you have the following set up:

- Access to a Vantage instance: if you need a test instance of Vantage, you can provision one for free
- Python 3.10

In this example, we have used Airbyte.


Stream data from Amazon MSK to Apache Iceberg tables in Amazon S3 and Amazon S3 Tables using Amazon Data Firehose

AWS Big Data

Prerequisites: To follow the tutorial in this post, you need the following:

- An AWS account
- An S3 bucket
- An Iceberg table in the AWS Glue Data Catalog
- An active Amazon MSK provisioned cluster with AWS Identity and Access Management (IAM) access control authentication enabled and multi-VPC connectivity