With Amazon AppFlow, you can run data flows at nearly any scale and at the frequency you choose: on a schedule, in response to a business event, or on demand. You can configure data transformation capabilities such as filtering and validation to generate rich, ready-to-use data as part of the flow itself, without additional steps.
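As a rough, assumption-laden sketch of what such a flow definition can look like with boto3, the snippet below creates a scheduled flow with a filter task. The connector profile, object, field names, bucket, and schedule expression are hypothetical, and the exact task list and request shape should be verified against the current Amazon AppFlow API documentation.

```python
import boto3

appflow = boto3.client("appflow")

# Hypothetical flow: pull a Salesforce object hourly, keep only "Active" rows,
# and land the result in an S3 bucket. All names below are placeholders.
response = appflow.create_flow(
    flowName="accounts-hourly",
    triggerConfig={
        "triggerType": "Scheduled",
        # Check the accepted schedule expression format in the AppFlow docs.
        "triggerProperties": {"Scheduled": {"scheduleExpression": "rate(1hour)"}},
    },
    sourceFlowConfig={
        "connectorType": "Salesforce",
        "connectorProfileName": "my-salesforce-profile",
        "sourceConnectorProperties": {"Salesforce": {"object": "Account"}},
    },
    destinationFlowConfigList=[
        {
            "connectorType": "S3",
            "destinationConnectorProperties": {
                "S3": {"bucketName": "my-appflow-bucket", "bucketPrefix": "accounts"}
            },
        }
    ],
    tasks=[
        # Filter task: only rows where Status equals "Active" flow downstream.
        {
            "taskType": "Filter",
            "sourceFields": ["Status"],
            "connectorOperator": {"Salesforce": "EQUAL_TO"},
            "taskProperties": {"VALUE": "Active"},
        },
        # Map task: pass the Id field through to the destination unchanged.
        {
            "taskType": "Map",
            "sourceFields": ["Id"],
            "destinationField": "Id",
            "connectorOperator": {"Salesforce": "NO_OP"},
        },
    ],
)
print(response["flowArn"])
```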
Your generated jobs can use a variety of data transformations, including filters, projections, unions, joins, and aggregations, giving you the flexibility to handle complex data processing requirements. In this post, we discuss how Amazon Q data integration transforms ETL workflow development.
Because reporting is part of an effective DQM practice, we will also go through some data quality metric examples you can use to assess your efforts in the matter. But first, let's define what data quality actually is. What is the definition of data quality? Why Do You Need Data Quality Management?
ElastiCache manages the real-time application data caching, allowing your customers to experience microsecond response times while supporting high-throughput handling of hundreds of millions of operations per second. In the inventory management and forecasting solution, AWS Glue is recommended for data transformation.
You can now use your tool of choice, including Tableau, to quickly derive business insights from your data while using standardized definitions and decentralized ownership. Refer to the detailed blog post on how you can use this to connect through various other tools.
Amazon Athena provides an interactive analytics service for analyzing the data in Amazon Simple Storage Service (Amazon S3). Amazon Redshift is used to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes.
You can create temporary tables once and reference them throughout, without having to constantly refresh database connections and restart from scratch. Refer to Redshift quotas and limits for details. After 24 hours the session is forcibly closed, and in-progress queries are terminated.
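A minimal sketch of that pattern, assuming the redshift_connector driver and placeholder connection and table names: create a temporary table once and reuse it across several queries in the same session (keeping the 24-hour session limit above in mind).

```python
import redshift_connector

# Placeholder connection details; temporary tables live only for the session.
conn = redshift_connector.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    database="dev",
    user="awsuser",
    password="example-password",
)
cur = conn.cursor()

# Create the temporary table once...
cur.execute("""
    CREATE TEMP TABLE recent_orders AS
    SELECT order_id, customer_id, amount
    FROM orders
    WHERE order_date > dateadd(day, -7, current_date)
""")

# ...then reference it repeatedly without re-deriving it.
cur.execute("SELECT count(*) FROM recent_orders")
print(cur.fetchone())

cur.execute("SELECT customer_id, sum(amount) FROM recent_orders GROUP BY customer_id")
print(cur.fetchall())

conn.close()
```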
With the ability to browse metadata, you can understand the structure and schema of the data source, identify relevant tables and fields, and discover useful data assets you may not be aware of. To learn more, refer to Amazon SageMaker Unified Studio. Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team.
The Cloud Data Hub processes and combines anonymized data from vehicle sensors and other sources across the enterprise to make it easily accessible for internal teams creating customer-facing and internal applications. To learn more about the Cloud Data Hub, refer to BMW Group Uses AWS-Based Data Lake to Unlock the Power of Data.
Together with price-performance, Amazon Redshift offers capabilities such as serverless architecture, machine learning integration within your data warehouse and secure data sharing across the organization. dbt Cloud is a hosted service that helps data teams productionize dbt deployments.
Attempting to learn more about the role of big data (here taken to mean datasets of high volume, velocity, and variety) within business intelligence today can sometimes create more confusion than it alleviates, as vital terms are used interchangeably instead of distinctly. Big data challenges and solutions.
Oracle GoldenGate for Oracle Database and Big Data adapters: Oracle GoldenGate is a real-time data integration and replication tool used for disaster recovery, data migrations, and high availability. Configure GoldenGate for Oracle Database and extract data from the Oracle database to trail files.
The integration between AWS Step Functions and Amazon EMR Serverless makes it easier to manage and orchestrate big data workflows. References: Amazon EMR Serverless, AWS Step Functions. About the Authors: Naveen Balaraman is a Sr. Cloud Application Architect at Amazon Web Services. Now, with the support for “Run a Job (.sync)”
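As a hedged sketch of what the "Run a Job (.sync)" pattern looks like, the state below (expressed as a Python dict) submits an EMR Serverless Spark job and waits for it to finish before the workflow continues. The application ID, role ARN, and script path are placeholders, and the exact resource ARN and parameter casing should be checked against the Step Functions documentation.

```python
import json

# Hypothetical state machine using the EMR Serverless .sync integration:
# Step Functions pauses on this state until the job run completes.
state_machine = {
    "StartAt": "RunSparkJob",
    "States": {
        "RunSparkJob": {
            "Type": "Task",
            "Resource": "arn:aws:states:::emr-serverless:startJobRun.sync",
            "Parameters": {
                "ApplicationId": "00example1234567",
                "ExecutionRoleArn": "arn:aws:iam::111122223333:role/EMRServerlessJobRole",
                "JobDriver": {
                    "SparkSubmit": {
                        "EntryPoint": "s3://my-bucket/scripts/transform.py"
                    }
                },
            },
            "End": True,
        }
    },
}

print(json.dumps(state_machine, indent=2))
```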
But the features in Power BI Premium are now more powerful than the functionality in Azure Analysis Services, so while the service isn’t going away, Microsoft will offer an automated migration tool in the second half of this year for customers who want to move their data models into Power BI instead. Azure Data Factory.
Refer to Enabling AWS PrivateLink in the Snowflake documentation to verify the steps, required access level, and service level to set the configurations. For Data sources, search for and select Snowflake. To obtain the Snowflake PrivateLink account URL, refer to parameters obtained in the prerequisites. Choose Next.
Amazon Q Developer can now generate complex data integration jobs with multiple sources, destinations, and data transformations. Generated jobs can use a variety of data transformations, including filter, project, union, join, and custom user-supplied SQL. In his spare time, he enjoys cycling with his road bike.
For more information on this foundation, refer to A Detailed Overview of the Cost Intelligence Dashboard. Additionally, it manages table definitions in the AWS Glue Data Catalog, containing references to data sources and targets of extract, transform, and load (ETL) jobs in AWS Glue.
Airbus was conceiving an ambitious plan to develop an open aviation data platform, Skywise, as a single platform of reference for all major aviation players that would enable them to improve their operational performance and business results and support Airbus’ own digital transformation.
What is data management? Data management can be defined in many ways. Usually the term refers to the practices, techniques and tools that allow access and delivery through different fields and data structures in an organisation. Extract, Transform, Load (ETL). Data transformation.
dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouse customers (such as those on Amazon Redshift) who are looking to keep their data transform logic separate from storage and engine.
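As a rough illustration of the Python side, the sketch below is a dbt Python model; note that Python models are only available on adapters that support them (for example Snowflake, Databricks, or BigQuery), and the upstream model name here is a placeholder.

```python
# models/orders_passthrough.py -- a hypothetical dbt Python model.
# dbt injects the `dbt` and `session` objects; the returned dataframe is materialized.
def model(dbt, session):
    # Materialize this model as a table in the warehouse.
    dbt.config(materialized="table")

    # Reference an upstream model, just like {{ ref('stg_orders') }} in a SQL model.
    df = dbt.ref("stg_orders")

    # Adapter-specific transforms would go here (Snowpark, PySpark, or pandas,
    # depending on the warehouse); this sketch simply passes the rows through.
    return df
```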
We refer to multiple masking policies being attached to a table as a multi-modal masking policy. The OBJECT_TRANSFORM function in Amazon Redshift is designed to facilitate data transformations by allowing you to manipulate JSON data directly within the database. All columns should be masked for them. The SUPER paths a.b.c
Amazon OpenSearch Ingestion is a fully managed serverless pipeline that allows you to ingest, filter, transform, enrich, and route data to an Amazon OpenSearch Service domain or Amazon OpenSearch Serverless collection. You can control the costs OCUs incur by configuring the maximum OCUs that a pipeline is allowed to scale to.
Traditionally, such a legacy call center analytics platform would be built on a relational database that stores data from streaming sources. Data transformations through stored procedures and use of materialized views to curate datasets and generate insights is a known pattern with relational databases.
OpenSearch Ingestion can ingest data from a wide variety of sources, such as Amazon Simple Storage Service (Amazon S3) buckets and HTTP endpoints, and has a rich ecosystem of built-in processors to take care of your most complex data transformation needs.
For more details on how to configure and schedule the log collector, refer to the yarn-log-collector GitHub repo. Transform the YARN job history logs from JSON to CSV. After obtaining YARN logs, you run a YARN log organizer, yarn-log-organizer.py, which is a parser that transforms JSON-based logs into CSV files.
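The repo's parser handles the real log layout; as a simplified sketch of the same idea, converting newline-delimited JSON job records into CSV could look like the following. The field names here are hypothetical stand-ins for the actual YARN schema.

```python
import csv
import json

# Hypothetical field names; the real yarn-log-organizer.py knows the actual YARN schema.
FIELDS = ["applicationId", "user", "queue", "startedTime", "finishedTime", "finalStatus"]


def json_logs_to_csv(in_path: str, out_path: str) -> None:
    """Read newline-delimited JSON records and write one CSV row per record."""
    with open(in_path) as src, open(out_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        for line in src:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            writer.writerow({field: record.get(field, "") for field in FIELDS})


if __name__ == "__main__":
    json_logs_to_csv("yarn-logs.json", "yarn-logs.csv")
```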
With these settings, you can now seamlessly ingest decompressed CloudWatch log data into Splunk using Firehose. Pricing: the Firehose decompression feature decompresses the data and charges per GB of decompressed data. To understand decompression pricing, refer to Amazon Data Firehose pricing.
Kinesis Data Firehose is a fully managed service for delivering near-real-time streaming data to various destinations for storage and performing near-real-time analytics. You can perform analytics on VPC flow logs delivered from your VPC using the Kinesis Data Firehose integration with Datadog as a destination.
By preserving historical versions, data lake time travel provides benefits such as auditing and compliance, data recovery and rollback, reproducible analysis, and data exploration at different points in time. Another popular transaction data lake use case is incremental query. in all Regions where Amazon EMR is available.
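As a minimal sketch, assuming an Apache Iceberg table queried from Spark on EMR (Spark 3.3+ time-travel syntax), reading a historical snapshot and comparing it with the current state could look like this; the table name and timestamp are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("time-travel-example").getOrCreate()

# Query the table as it existed at a point in time (auditing, rollback checks,
# reproducible analysis at a fixed snapshot).
historical = spark.sql(
    "SELECT * FROM db.orders TIMESTAMP AS OF '2024-01-01 00:00:00'"
)

# Compare against the current state to see what has been added since that point.
current = spark.sql("SELECT * FROM db.orders")
new_rows = current.subtract(historical)
new_rows.show()
```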
We set up our AWS CDK to refer to the contents of a specific directory and define a resource (for example, an AWS Step Functions state machine or an AWS Glue job) for each file it found in that directory. We also used it as a repository for storing code that could be retrieved and used by other services.
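A minimal sketch of that pattern in CDK for Python, assuming a local glue_jobs/ directory of scripts, a pre-existing IAM role, and an artifacts bucket; all names and paths are placeholders.

```python
import os

from aws_cdk import Stack, aws_glue as glue
from constructs import Construct


class GlueJobsStack(Stack):
    """Defines one AWS Glue job per script found in a local directory."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        scripts_dir = "glue_jobs"  # hypothetical directory of .py scripts
        for filename in sorted(os.listdir(scripts_dir)):
            if not filename.endswith(".py"):
                continue
            job_name = filename.removesuffix(".py")
            glue.CfnJob(
                self,
                f"GlueJob-{job_name}",
                name=job_name,
                role="arn:aws:iam::111122223333:role/GlueJobRole",  # placeholder role
                command=glue.CfnJob.JobCommandProperty(
                    name="glueetl",
                    script_location=f"s3://my-artifacts-bucket/{scripts_dir}/{filename}",
                ),
                glue_version="4.0",
            )
```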
AWS Glue Studio is a graphical interface that makes it easy to create, run, and monitor extract, transform, and load (ETL) jobs in AWS Glue. It allows you to visually compose data transformation workflows using nodes that represent different data handling steps, which are later converted automatically into runnable code.
It has not been specifically designed for heavy data transformation tasks. Now that the data is on Amazon S3, you can delete the directory that has been downloaded from your Linux machine. Create the Lambda functions: for step-by-step instructions on how to create a Lambda function, refer to Getting started with Lambda.
If storing operational data in a data warehouse is a requirement, synchronization of tables between operational data stores and Amazon Redshift tables is supported. In scenarios where datatransformation is required, you can use Redshift stored procedures to modify data in Redshift tables.
Components of the consumer application The consumer application comprises three main parts that work together to consume, transform, and load messages from Amazon MSK into a target database. The following diagram shows an example of datatransformations in the handler component.
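As an assumption-heavy sketch of those three parts (consume, transform, load), using the kafka-python client and a PostgreSQL target via psycopg2; the broker address, topic, table, and field names are all placeholders, and the original post's handler logic will differ.

```python
import json

import psycopg2
from kafka import KafkaConsumer

# Consume: placeholder MSK bootstrap broker and topic.
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers=["b-1.example.kafka.us-east-1.amazonaws.com:9092"],
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    group_id="orders-loader",
)

# Load target: placeholder PostgreSQL connection details.
conn = psycopg2.connect(host="mydb.example.com", dbname="analytics",
                        user="loader", password="example-password")


def transform(message: dict) -> tuple:
    """Handler component: normalize the raw message into the target row shape."""
    return (
        message["order_id"],
        message["customer_id"],
        round(float(message["amount"]), 2),
    )


with conn, conn.cursor() as cur:
    for record in consumer:
        row = transform(record.value)
        cur.execute(
            "INSERT INTO orders (order_id, customer_id, amount) VALUES (%s, %s, %s)",
            row,
        )
        conn.commit()
```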
In the next sections, we explore the following topics: the DAG file, in order to understand how to define and then pass the correlation ID in the AWS Glue and EMR tasks; and the code needed in the Python scripts to output information based on the correlation ID. Refer to the GitHub repo for the detailed DAG definition and Spark scripts.
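A minimal sketch of the DAG side, assuming the Amazon provider's GlueJobOperator and a hypothetical Glue script that reads a --correlation_id argument; here the correlation ID is simply the Airflow run ID.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

with DAG(
    dag_id="correlated_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # Pass the DAG run ID as a correlation ID so the Glue job can tag its log output.
    run_glue_job = GlueJobOperator(
        task_id="run_glue_job",
        job_name="my-glue-job",  # placeholder Glue job name
        script_args={"--correlation_id": "{{ run_id }}"},
    )
```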
For full instructions, refer to Jira Cloud connector for Amazon AppFlow. You can do this by updating the CloudFormation stack with a flag that includes the CDC and data transformation steps. This will enable both the CDC steps and the data transformation steps for the Jira data. Choose Update.
When designing the data processing pipeline for the attribute API, the Infomedia team wanted to use a flexible and open-source solution for processing data workloads with minimal operational overhead. If you would like to learn more, please visit AWS Glue and AWS Lake Formation to get started on your data integration journey.
Under the Transparency in Coverage (TCR) rule, hospitals and payors are required to publish their pricing data in a machine-readable format. For more information, refer to Delivering Consumer-friendly Healthcare Transparency in Coverage On AWS. The Data Catalog now contains references to the machine-readable data.
In addition, more data is becoming available for processing and enrichment of existing and new use cases; for example, we have recently experienced rapid growth in data collection at the edge and an increase in the availability of frameworks for processing that data. As a result, alternative data integration technologies (e.g.,
Furthermore, it allows for necessary actions to be taken, such as rectifying errors in the data source, refining data transformation processes, and updating data quality rules. This automated approach reduces the need for manual intervention and streamlines the data quality evaluation process.
Encounter 4 appears to refer to the customer with ID 8, but the email doesn’t match, and no Customer_ID is given. To learn more about ML in Neptune, refer to Amazon Neptune ML for machine learning on graphs. You can also explore Neptune notebooks demonstrating ML and data science for graphs.
However, you might face significant challenges when planning for a large-scale data warehouse migration. For an example, refer to How JPMorgan Chase built a data mesh architecture to drive significant value to enhance their enterprise data platform. Platform architects define a well-architected platform.
Data Vault 2.0 allows for the following: agile data warehouse development, parallel data ingestion, a scalable approach to handle multiple data sources even on the same entity, a high level of automation, historization, and full lineage support. However, Data Vault 2.0 JOB_NAME (All): the process name from the ETL framework.
Stored procedures are commonly used to encapsulate logic for data transformation, data validation, and business-specific logic. You can also schedule stored procedures to automate data processing on Amazon Redshift. For more information, refer to Bringing your stored procedures to Amazon Redshift.
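A simple hedged sketch using the redshift_connector driver: create a stored procedure that encapsulates a transformation with basic validation, then call it (the same CALL could be issued periodically by the Redshift query scheduler or an external orchestrator). The table and column names are placeholders.

```python
import redshift_connector

# Placeholder connection details for an example Redshift cluster.
conn = redshift_connector.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    database="dev",
    user="awsuser",
    password="example-password",
)
cur = conn.cursor()

# Encapsulate a transformation: move validated rows from a staging table into the target.
cur.execute("""
CREATE OR REPLACE PROCEDURE load_clean_orders()
AS $$
BEGIN
    INSERT INTO orders (order_id, customer_id, amount)
    SELECT order_id, customer_id, amount
    FROM staging_orders
    WHERE amount IS NOT NULL AND amount >= 0;

    TRUNCATE staging_orders;
END;
$$ LANGUAGE plpgsql;
""")

# Run the procedure; a scheduled query could issue this same CALL on a cadence.
cur.execute("CALL load_clean_orders();")
conn.commit()
conn.close()
```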