Data Architecture, Data Transformation and Reference

Ingest data from Google Analytics 4 and Google Sheets to Amazon Redshift using Amazon AppFlow

AWS Big Data

JANUARY 6, 2025

With Amazon AppFlow, you can run data flows at nearly any scale and at the frequency you choose: on a schedule, in response to a business event, or on demand. You can configure data transformation capabilities such as filtering and validation to generate rich, ready-to-use data as part of the flow itself, without additional steps.

Analytics

Analytics Data Warehouse Big Data Metrics

Introducing blueprint discovery and other UI enhancements for Amazon OpenSearch Ingestion

AWS Big Data

MAY 22, 2024

Amazon OpenSearch Ingestion is a fully managed serverless pipeline that allows you to ingest, filter, transform, enrich, and route data to an Amazon OpenSearch Service domain or Amazon OpenSearch Serverless collection. He is deeply passionate about Data Architecture and helps customers build analytics solutions at scale on AWS.

Data Architecture

Data Architecture Visualization Data Transformation Management

Unlocking near real-time analytics with petabytes of transaction data using Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud

AWS Big Data

NOVEMBER 27, 2024

Together with price-performance, Amazon Redshift offers capabilities such as serverless architecture, machine learning integration within your data warehouse and secure data sharing across the organization. dbt Cloud is a hosted service that helps data teams productionize dbt deployments.

Data Warehouse

Data Warehouse Analytics Testing Modeling

How Open Universities Australia modernized their data platform and significantly reduced their ETL costs with AWS Cloud Development Kit and AWS Step Functions

AWS Big Data

JANUARY 30, 2025

We set up our AWS CDK to refer to the contents of a specific directory and define a resource (for example, an AWS Step Functions state machine or an AWS Glue job) for each file it found in that directory. We also used it as a repository for storing code that could be retrieved and used by other services.
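The "one resource per file in a directory" pattern described above can be sketched in plain Python. This is a minimal illustration, not the authors' AWS CDK code: the `plan_resources` helper, the suffix-to-resource mapping, and the file names are all hypothetical, and a real CDK app would construct state machine or AWS Glue job resources where this sketch only builds dictionaries.

```python
import tempfile
from pathlib import Path


def plan_resources(directory: Path) -> list[dict]:
    """Return one hypothetical resource definition per recognized file."""
    # Assumed convention: the file suffix decides the kind of resource.
    kinds = {".json": "state_machine", ".py": "glue_job"}
    plan = []
    for path in sorted(directory.iterdir()):
        kind = kinds.get(path.suffix)
        if path.is_file() and kind is not None:
            plan.append({"name": path.stem, "kind": kind, "source": path.name})
    return plan


# Build a throwaway directory to show the scan in action.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "load_orders.json").write_text("{}")
    (root / "clean_orders.py").write_text("# job script")
    (root / "README.md").write_text("not a resource")  # skipped: unknown suffix
    resources = plan_resources(root)

for resource in resources:
    print(resource)
```

Adding a new file to the directory is then enough to add a resource to the plan, which is the appeal of the approach the excerpt describes.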

Data Warehouse

Data Warehouse Data Architecture Machine Learning Data Transformation

BMW Cloud Efficiency Analytics powered by Amazon QuickSight and Amazon Athena

AWS Big Data

NOVEMBER 15, 2023

For more information on this foundation, refer to A Detailed Overview of the Cost Intelligence Dashboard. Additionally, it manages table definitions in the AWS Glue Data Catalog, containing references to data sources and targets of extract, transform, and load (ETL) jobs in AWS Glue.

Analytics

Analytics Dashboards Metadata Data Warehouse

Automate discovery of data relationships using ML and Amazon Neptune graph technology

AWS Big Data

APRIL 19, 2023

Independent data products often only have value if you can connect them, join them, and correlate them to create a higher order data product that creates additional insights. A modern data architecture is critical in order to become a data-driven organization.

Technology

Technology Data-driven Machine Learning Sales

Deep dive into the AWS ProServe Hadoop Migration Delivery Kit TCO tool

AWS Big Data

FEBRUARY 6, 2023

For more details on how to configure and schedule the log collector, refer to the yarn-log-collector GitHub repo. Transform the YARN job history logs from JSON to CSV: After obtaining YARN logs, you run a YARN log organizer, yarn-log-organizer.py, which is a parser to transform JSON-based logs to CSV files.
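A JSON-to-CSV transformation of this kind can be sketched with the Python standard library. This is only an illustration of the general technique and is not the code of yarn-log-organizer.py; the `json_logs_to_csv` helper and the sample field names (`id`, `elapsedTime`, `queue`) are assumptions made for the example.

```python
import csv
import io
import json


def json_logs_to_csv(json_text: str) -> str:
    """Flatten a JSON array of flat job records into CSV text."""
    records = json.loads(json_text)
    # Collect every key that appears, preserving first-seen order.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(record)  # keys missing from a record become empty cells
    return buffer.getvalue()


# Hypothetical sample records; real YARN history logs have many more fields.
sample = json.dumps([
    {"id": "application_1", "elapsedTime": 1200, "queue": "default"},
    {"id": "application_2", "elapsedTime": 300},
])
csv_text = json_logs_to_csv(sample)
print(csv_text)
```

Deriving the header from the union of keys keeps the output rectangular even when individual records omit a field.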

Dashboards

Dashboards Optimization Data Lake Cost-Benefit

Power enterprise-grade Data Vaults with Amazon Redshift – Part 1

AWS Big Data

NOVEMBER 16, 2023

Data Vault 2.0 allows for the following: agile data warehouse development; parallel data ingestion; a scalable approach to handle multiple data sources even on the same entity; a high level of automation; historization; full lineage support. However, Data Vault 2.0 JOB_NAME All The process name from the ETL framework.

Enterprise

Enterprise Data Warehouse Data Lake Optimization

Amazon Redshift data ingestion options

AWS Big Data

SEPTEMBER 5, 2024

If storing operational data in a data warehouse is a requirement, synchronization of tables between operational data stores and Amazon Redshift tables is supported. In scenarios where data transformation is required, you can use Redshift stored procedures to modify data in Redshift tables.

IoT

IoT Data Warehouse Cost-Benefit Reporting

Data Integrity, the Basis for Reliable Insights

Sisense

AUGUST 28, 2020

All these pitfalls are avoidable with the right data integrity policies in place. Means of ensuring data integrity. Data integrity can be divided into two areas: physical and logical. Physical data integrity refers to how data is stored and accessed. How are your devices physically secured?

Data Integration

Data Integration Testing Data Quality Data-driven

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

AWS Big Data

AUGUST 1, 2024

However, you might face significant challenges when planning for a large-scale data warehouse migration. The following diagram illustrates a scalable migration pattern for an extract, transform, and load (ETL) scenario. The success criteria are the key performance indicators (KPIs) for each component of the data workflow.

Data Warehouse

Data Warehouse KPI Optimization Cost-Benefit

A step-by-step guide to setting up a data governance program

IBM Big Data Hub

FEBRUARY 9, 2023

In our last blog, we delved into the seven most prevalent data challenges that can be addressed with effective data governance. Today we will share our approach to developing a data governance program to drive data transformation and fuel a data-driven culture. Don’t try to do everything at once!

Data Governance

Data Governance Business Objectives Data Quality Measurement

How GamesKraft uses Amazon Redshift data sharing to support growing analytics workloads

AWS Big Data

NOVEMBER 13, 2023

Key considerations: Gameskraft embraces a modern data architecture, with the data lake residing in Amazon S3. To grant seamless access to the data lake, we use the innovative capabilities of Redshift Spectrum, which is a bridge between the data warehouse (Amazon Redshift) and data lake (Amazon S3).

Data Warehouse

Data Warehouse Analytics Data Lake Data Science

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

AWS Big Data

MARCH 3, 2023

Data ingestion – Steps 1 and 2 use AWS DMS, which connects to the source database and moves full and incremental data (CDC) to Amazon S3 in Parquet format. Let’s refer to this S3 bucket as the raw layer. Data transformation – Steps 3 and 4 represent an EMR Serverless Spark application (Amazon EMR 6.9

Data Lake

Data Lake Dashboards Metrics Metadata

Measuring Maturity

Peter James Thomas

MARCH 9, 2020

I used to talk about carrying out a Situational Analysis of Data Capabilities; nowadays I am more likely to refer to a Data Capability Review. I make such reviews with respect to my own Data Capability Framework, which I introduced to the public in 2019 via A Simple Data Capability Framework.

Measurement

Measurement Data Strategy Strategy Modeling

The Chief Marketing Officer and the CDO – A Modern Fable

Peter James Thomas

OCTOBER 30, 2018

All the references I can find to it are modern pieces comparing it to the CDO role, so perhaps it is apocryphal. It may well be that one thing that a CDO needs to get going is a data transformation programme. This may purely be focused on cultural aspects of how an organisation records, shares and otherwise uses data.

Marketing

Marketing Strategy Data Architecture Data Strategy

Data Landscape – Navigating The Data Jungle

Anmut

MARCH 24, 2022

We could give many answers, but they all centre on the same root cause: most data leaders focus on flashy technology and symptomatic fixes instead of approaching data transformation in a way that addresses the root causes of data problems and leads to tangible results and business success. It doesn’t have to be this way.

ROI

ROI Measurement Data-driven Data Transformation

Introducing the AWS ProServe Hadoop Migration Delivery Kit TCO tool

AWS Big Data

FEBRUARY 6, 2023

To dive deep into the HMDK TCO tool, refer to the next post in this series, How AWS ProServe Hadoop TCO tool accelerate Hadoop workload migrations to Amazon EMR. He also understands how to apply technologies to solve big data problems and build a well-designed data architecture.

Cost-Benefit

Cost-Benefit Data Lake Dashboards Big Data

Empowering data mesh: The tools to deliver BI excellence

erwin

APRIL 16, 2024

The data mesh framework: In the dynamic landscape of data management, the search for agility, scalability, and efficiency has led organizations to explore new, innovative approaches. One such innovation gaining traction is the data mesh framework. This empowers individual teams to own and manage their data.

Metadata

Metadata Data Quality Data Governance Modeling

Ingest telemetry messages in near real time with Amazon API Gateway, Amazon Data Firehose, and Amazon Location Service

AWS Big Data

NOVEMBER 14, 2024

We use the built-in features of Data Firehose, including AWS Lambda for necessary data transformation and Amazon Simple Notification Service (Amazon SNS) for near real-time alerts. For more information, refer to Amazon Kinesis Data Firehose now supports zero buffering. Munim Abbasi is currently a Sr.

Data Lake

Data Lake Metadata Testing Data-driven

Introducing the HubSpot connector for AWS Glue

AWS Big Data

DECEMBER 2, 2024

AWS Glue establishes a secure connection to HubSpot using OAuth for authorization and TLS for data encryption in transit. AWS Glue also supports the ability to apply complex data transformations, enabling efficient data integration and preparation to meet your needs. For more information on AWS Glue, visit AWS Glue.

Data Lake

Data Lake Testing Data Integration Metadata

“You Complete Me,” said Data Lineage to DataOps Observability.

DataKitchen

JANUARY 23, 2023

It is important to have additional tools and processes in place to understand the impact of data errors and to minimize their effect on the data pipeline and downstream systems. These operations can include data movement, validation, cleaning, transformation, aggregation, analysis, and more. Are problems with data tests?

Testing

Testing Data Governance Data Quality Data-driven

What Is Embedded Analytics?

Jet Global

MAY 1, 2023

that gathers data from many sources. Data Environment: First off, the solutions you consider should be compatible with your current data architecture. We have outlined the requirements that most providers ask for: Data Sources; Strategic Objective: Use native connectivity optimized for the data source.

Analytics

Analytics Cost-Benefit Visualization Dashboards
