Amazon DataZone is a data management service that makes it faster and easier for customers to catalog, discover, share, and govern data stored across AWS, on premises, and in third-party sources. Refer to the detailed blog post on how you can use DataZone to connect through various other tools.
This means you can refine your ETL jobs through natural follow-up questions, starting with a basic data pipeline and progressively adding transformations, filters, and business logic through conversation. The DataFrame code generation now extends beyond AWS Glue DynamicFrame to support a broader range of data processing scenarios.
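A loose sketch of what that progressive refinement can produce as PySpark DataFrame code; the dataset path, column names, and tax rate below are invented for illustration:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-refinement").getOrCreate()

# First request: "build me a basic pipeline over the orders data"
# (the S3 path and column names are hypothetical)
orders = spark.read.parquet("s3://my-bucket/raw/orders/")

# Follow-up: "keep only completed orders"
orders = orders.filter(F.col("status") == "completed")

# Follow-up: "add a total that includes 8% tax"
orders = orders.withColumn("total_with_tax", F.col("total") * 1.08)

orders.show(5)
```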
Since reporting is part of an effective DQM practice, we will also go through some data quality metric examples you can use to assess your efforts in the matter. But first, let's define what data quality actually is. What is the definition of data quality? Why do you need data quality management?
AWS Glue Studio is a graphical interface that makes it easy to create, run, and monitor extract, transform, and load (ETL) jobs in AWS Glue. It allows you to visually compose data transformation workflows using nodes that represent different data handling steps, which are later converted automatically into runnable code, along the lines of the sketch below.
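For a sense of what that generated code looks like, here is a minimal Glue job sketch with one source node, one transform node, and one target node; the database, table, field, and S3 path names are assumptions:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Source node: read a Data Catalog table (hypothetical database/table names)
source = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="raw_orders"
)

# Transform node: drop a field not needed downstream
trimmed = source.drop_fields(["internal_notes"])

# Target node: write the result to S3 as Parquet
glue_context.write_dynamic_frame.from_options(
    frame=trimmed,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/curated/orders/"},
    format="parquet",
)
job.commit()
```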
With the ability to browse metadata, you can understand the structure and schema of the data source, identify relevant tables and fields, and discover useful data assets you may not be aware of. You can navigate to the project's Data page to visually verify the existence of the newly created table. The excerpt trails off mid-snippet with `option("url", jdbcurl).option("dbtable",`, the start of a Spark JDBC read; a completed sketch follows.
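A completed version of that truncated JDBC read, as a sketch; the URL, table, and credentials are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-read").getOrCreate()

# All connection details here are hypothetical; substitute your own.
jdbcurl = "jdbc:postgresql://my-host:5432/mydb"

df = (
    spark.read.format("jdbc")
    .option("url", jdbcurl)
    .option("dbtable", "public.orders")
    .option("user", "etl_user")
    .option("password", "REPLACE_ME")  # prefer AWS Secrets Manager in practice
    .load()
)
df.printSchema()
```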
You can use AWS Glue Studio to set up data replication and mask PII with no coding required. The AWS Glue Studio visual editor provides a low-code graphical environment to build, run, and monitor extract, transform, and load (ETL) scripts. Data transformation – Adjusts fields and removes unnecessary ones.
The goal is to examine five major methods of verifying and validating data transformations in data pipelines, with an eye toward high-quality data deployment. First, we look at how unit and integration tests uncover transformation errors at an early stage; a small test sketch follows. Key tools and processes include data profiling tools.
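For instance, a unit test for a small transformation might look like the following sketch; the transformation and column names are invented for illustration and the test runs under pytest:

```python
import pandas as pd


def normalize_amounts(df: pd.DataFrame) -> pd.DataFrame:
    """Transformation under test: convert cents to dollars, drop negative rows."""
    out = df[df["amount_cents"] >= 0].copy()
    out["amount_usd"] = out["amount_cents"] / 100.0
    return out


def test_normalize_amounts_filters_and_converts():
    raw = pd.DataFrame({"amount_cents": [1250, -5, 0]})
    result = normalize_amounts(raw)
    # The negative row is removed; remaining values are converted to dollars.
    assert list(result["amount_usd"]) == [12.5, 0.0]
```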
Together the technologies aim to help business users and “novice” data analysts explore their data and gain insights without having to resort to data experts. “This is really empowering everyone to be a data expert,” Maxon said.
AI is transforming how senior data engineers and data scientists validate data transformations and conversions. Artificial intelligence-based verification approaches aid in detecting anomalies, enforcing data integrity, and optimizing pipelines for improved efficiency.
Traditionally, such a legacy call center analytics platform would be built on a relational database that stores data from streaming sources. Data transformation through stored procedures, and the use of materialized views to curate datasets and generate insights, is a well-known pattern with relational databases.
Under the Transparency in Coverage (TCR) rule, hospitals and payors are required to publish their pricing data in a machine-readable format. For more information, refer to Delivering Consumer-friendly Healthcare Transparency in Coverage On AWS. Then you can use Amazon Athena (engine version 3) to query the tables in the Data Catalog, for example as follows.
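A sketch of querying such a Data Catalog table with the boto3 Athena client; the database, table, columns, and output location below are hypothetical:

```python
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Hypothetical database and table registered in the Glue Data Catalog.
response = athena.start_query_execution(
    QueryString="SELECT hospital_name, gross_charge FROM pricing_data LIMIT 10",
    QueryExecutionContext={"Database": "tcr_pricing"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(response["QueryExecutionId"])  # poll get_query_execution for completion
```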
I, thankfully, learned this early in my career, at a time when I could still refer to myself as a software developer. That takes us to a conspicuous omission from that list of roles: the data scientists who focused on building basic models. Companies will still need advanced ML modeling and data viz, sure.
Amazon OpenSearch Ingestion is a fully managed serverless pipeline that allows you to ingest, filter, transform, enrich, and route data to an Amazon OpenSearch Service domain or Amazon OpenSearch Serverless collection. You can now view the configurations in JSON format in addition to the YAML format and edit them in place.
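As a rough sketch of defining such a pipeline programmatically with the boto3 OSIS client, assuming a hypothetical Data Prepper configuration; the pipeline name, domain endpoint, index, and role ARN are all placeholders:

```python
import boto3

osis = boto3.client("osis")

# Minimal Data Prepper pipeline body (all endpoints and ARNs are hypothetical).
pipeline_body = """
version: "2"
log-pipeline:
  source:
    http:
      path: "/log-pipeline/logs"
  sink:
    - opensearch:
        hosts: ["https://my-domain.us-east-1.es.amazonaws.com"]
        index: "app-logs"
        aws:
          sts_role_arn: "arn:aws:iam::123456789012:role/OsisSinkRole"
          region: "us-east-1"
"""

osis.create_pipeline(
    PipelineName="log-pipeline",
    MinUnits=1,
    MaxUnits=2,
    PipelineConfigurationBody=pipeline_body,
)
```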
For more information on this foundation, refer to A Detailed Overview of the Cost Intelligence Dashboard. Additionally, it manages table definitions in the AWS Glue Data Catalog, containing references to data sources and targets of extract, transform, and load (ETL) jobs in AWS Glue.
Kinesis Data Firehose is a fully managed service for delivering near-real-time streaming data to various destinations for storage and performing near-real-time analytics. You can perform analytics on VPC flow logs delivered from your VPC using the Kinesis Data Firehose integration with Datadog as a destination.
AWS Glue is a serverless data integration service that makes it straightforward to discover, prepare, and combine data for analytics, machine learning (ML), and application development. AWS Glue provides both visual and code-based interfaces to make data integration effortless. Choose Create job and Visual ETL.
But the features in Power BI Premium are now more powerful than the functionality in Azure Analysis Services, so while the service isn't going away, Microsoft will offer an automated migration tool in the second half of this year for customers who want to move their data models into Power BI instead.
This allows business analysts and decision-makers to gain valuable insights, visualize key metrics, and explore the data in depth, enabling informed decision-making and strategic planning for pricing and promotional strategies. For Data sources , search for and select Snowflake. On the Visual tab, choose Add nodes.
If storing operational data in a data warehouse is a requirement, synchronization of tables between operational data stores and Amazon Redshift tables is supported. In scenarios where data transformation is required, you can use Redshift stored procedures to modify data in Redshift tables, as sketched below.
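For illustration, such a stored procedure could be invoked through the Redshift Data API; the cluster, database, user, and procedure names here are placeholders:

```python
import boto3

rsd = boto3.client("redshift-data")

# Call a hypothetical stored procedure that reshapes a synced table.
rsd.execute_statement(
    ClusterIdentifier="my-cluster",
    Database="analytics",
    DbUser="etl_user",
    Sql="CALL public.sp_transform_orders();",
)
```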
It has not been specifically designed for heavy data transformation tasks. Now that the data is on Amazon S3, you can delete the directory that was downloaded to your Linux machine. Create the Lambda functions: for step-by-step instructions on how to create a Lambda function, refer to Getting started with Lambda; a minimal handler sketch follows.
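A minimal handler sketch, assuming a hypothetical raw-data bucket and prefix; it simply lists what has landed in S3:

```python
import json

import boto3

s3 = boto3.client("s3")


def lambda_handler(event, context):
    # Hypothetical bucket/prefix; in practice these come from the event or env vars.
    bucket = event.get("bucket", "my-raw-bucket")
    listing = s3.list_objects_v2(Bucket=bucket, Prefix="raw/")
    keys = [obj["Key"] for obj in listing.get("Contents", [])]
    print(json.dumps({"object_count": len(keys)}))
    return {"statusCode": 200, "objects": keys}
```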
For full instructions, refer to Jira Cloud connector for Amazon AppFlow. You can do this by updating the CloudFormation stack with a flag that includes the CDC and data transformation steps. This will enable both the CDC steps and the data transformation steps for the Jira data. Choose Update.
Developers need to onboard new data sources, chain multiple data transformation steps together, and explore data as it travels through the flow. A reimagined visual editor boosts developer productivity and enables self-service. Figure 5: Parameter references in the configuration panel and auto-complete.
In this post, we dive deep into the tool, walking through all steps from log ingestion, transformation, and visualization to architecture design, in order to calculate TCO. For more details on how to configure and schedule the log collector, refer to the yarn-log-collector GitHub repo.
In addition, more data is becoming available for processing and enrichment of existing and new use cases; for example, we have recently experienced rapid growth in data collection at the edge and an increase in the availability of frameworks for processing that data. As a result, alternative data integration technologies have been gaining ground.
You can visualize the PCA insights in the business intelligence (BI) tool Amazon QuickSight for advanced analysis. In this post, we show you how to use PCA’s data to build automated QuickSight dashboards for advanced analytics to assist in quality assurance (QA) and quality management (QM) processes.
You can use your preferred IDE to implement AWS resource definitions using the AWS Cloud Development Kit (AWS CDK) or AWS CloudFormation, as well as the business logic of AWS Glue job scripts for data integration. To learn more about how to implement your AWS Glue job scripts locally, refer to Develop and test AWS Glue version 3.0 jobs locally.
Notebooks are provisioned quickly and provide a way for you to instantly view and analyze your streaming data. This pipeline could further be used to send data to Amazon OpenSearch Service or other targets for additional processing and visualization. View the stream data. Transform and enrich the data.
Furthermore, it allows necessary actions to be taken, such as rectifying errors in the data source, refining data transformation processes, and updating data quality rules. The following sample email provides operational metrics for the AWS Glue Data Quality ruleset evaluation.
citibike-tripdata-destination-ACCOUNT_ID – The bucket used for storing the transformed dataset. When implementing the solution in this post, replace references to airflow-blog-bucket-ACCOUNT_ID and citibike-tripdata-destination-ACCOUNT_ID with the names of your own S3 buckets. Let's look at how to run the DAGs.
You can also use the data transformation feature of Data Firehose to invoke a Lambda function to perform data transformation in batches, as sketched below. Refer to the instructions in the README file for steps on how to provision and decommission this solution.
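A minimal sketch of such a transformation Lambda, following the Data Firehose record-transformation contract (base64-decode each record, modify it, and return it with its recordId and a result status); the enrichment field is made up for illustration:

```python
import base64
import json


def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        payload["source"] = "firehose-transform"  # hypothetical enrichment
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(
                (json.dumps(payload) + "\n").encode("utf-8")
            ).decode("utf-8"),
        })
    return {"records": output}
```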
AWS Step Functions is a serverless orchestration service that enables developers to build visual workflows for applications as a series of event-driven steps. References: Amazon EMR Serverless, AWS Step Functions.
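As a rough illustration, a minimal state machine can be written in Amazon States Language and registered with boto3; the Lambda ARN, role ARN, and state names below are hypothetical:

```python
import json

import boto3

sfn = boto3.client("stepfunctions")

# Two-step workflow sketch: run a transform Lambda, then succeed.
definition = {
    "StartAt": "TransformData",
    "States": {
        "TransformData": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:transform",
            "Next": "Done",
        },
        "Done": {"Type": "Succeed"},
    },
}

sfn.create_state_machine(
    name="etl-workflow",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsRole",
)
```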
The following AWS services are used for data ingestion, processing, and load: Amazon AppFlow is a fully managed integration service that enables you to securely transfer data between SaaS applications like Salesforce, SAP, Marketo, Slack, and ServiceNow, and AWS services like Amazon S3 and Amazon Redshift, in just a few clicks.
For addressing data quality challenges in Amazon Simple Storage Service (Amazon S3) data lakes and data pipelines, AWS has announced AWS Glue Data Quality (preview). Create a table for weight information: this reference table holds two columns, the table name and the column mapping with weights.
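As a hedged sketch, a Data Quality ruleset for such a reference table could be registered with the boto3 Glue client using DQDL; the database, table, and column names are assumptions:

```python
import boto3

glue = boto3.client("glue")

# DQDL rules: the column names and bounds here are hypothetical.
ruleset = 'Rules = [ IsComplete "table_name", ColumnValues "weight" between 0 and 1 ]'

glue.create_data_quality_ruleset(
    Name="weights-ruleset",
    Ruleset=ruleset,
    TargetTable={"DatabaseName": "reference_db", "TableName": "column_weights"},
)
```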
In summary, embedded analytics refers to actionable intelligence seamlessly integrated into customer-facing products, applications, or services. These solutions typically include data visualization, customizable dashboards, and self-service analytics. Features include interactive visualizations and native data connectors.
AWS DMS enables us to capture deltas, including deletes from the source database, through the use of Change Data Capture (CDC) configuration. CDC in DMS enables us to capture deltas without writing code and without missing any changes, which is critical for the integrity of the data. Navigate to the Visual tab.
A modern data stack relies on cloud computing, whereas a legacy data stack stores data on on-premises servers instead of in the cloud. Modern data stacks also provide data access to more data professionals than a legacy data stack does. Business intelligence (BI) tools are used to visualize your data.
However, you might face significant challenges when planning for a large-scale data warehouse migration. For an example, refer to How JPMorgan Chase built a data mesh architecture to drive significant value to enhance their enterprise data platform. Platform architects define a well-architected platform.
I used to talk about carrying out a Situational Analysis of Data Capabilities, nowadays I am more likely to refer to a Data Capability Review. I make such reviews with respect to my own Data Capability Framework, which I introduced to the public in 2019 via A Simple Data Capability Framework.
Data ingestion – Steps 1 and 2 use AWS DMS, which connects to the source database and moves full-load and incremental (CDC) data to Amazon S3 in Parquet format. Let's refer to this S3 bucket as the raw layer. Data transformation – Steps 3 and 4 represent an EMR Serverless Spark application (Amazon EMR 6.9).
Alternatively, you can use AWS Glue for Apache Spark, which provides built-in support for bucketing configurations during the data transformation process. AWS Glue allows you to define bucketing parameters, such as the number of buckets and the columns to bucket on, producing an optimized data layout for efficient querying with Athena; a sketch follows.
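A minimal Spark sketch of a bucketed write, assuming hypothetical paths and a pre-existing analytics database; whether Athena can take advantage of these buckets depends on your Athena engine version, so treat this as illustrative:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("bucketed-write")
    .enableHiveSupport()  # on EMR/Glue, table metadata can land in the Data Catalog
    .getOrCreate()
)

# Hypothetical input path and database/table names.
df = spark.read.parquet("s3://my-bucket/raw/events/")

# Write 16 buckets hashed on customer_id; bucketBy requires saveAsTable.
(
    df.write.format("parquet")
    .bucketBy(16, "customer_id")
    .sortBy("customer_id")
    .option("path", "s3://my-bucket/curated/events/")
    .saveAsTable("analytics.events_bucketed")
)
```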
Before we dive into the topics of big data as a service and the analytics applied to it, let's quickly clarify data analytics using an oft-used application of analytics: visualization! As we move from right to left in the diagram, from big data to BI, we notice that unstructured data transforms into structured data.
Data Analysis Report (by FineReport) Note: All the data analysis reports in this article are created using the FineReport reporting tool. Leveraging the advanced enterprise-level web reporting capabilities of FineReport, we empower businesses to achieve genuine data transformation.
We are going to turn our attention away from expanding our catalog of models [as mentioned previously in the book] and instead take a closer look at the data. Feature engineering refers to the manipulation of features: addition, deletion, combination, or mutation. In our example, we had data from a uniform or flat-ish distribution.
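As a small illustration of those manipulations in pandas (the frame and column names are invented), combining, mutating, and deleting features:

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-15", "2023-06-02"]),
    "price": [12.0, 30.0],
    "quantity": [3, 2],
})

# Combination: derive a new feature from two existing ones.
df["revenue"] = df["price"] * df["quantity"]

# Mutation: decompose a timestamp into day, month, and year components.
df["day"] = df["date"].dt.day
df["month"] = df["date"].dt.month
df["year"] = df["date"].dt.year

# Deletion: drop the raw column once its information has been extracted.
df = df.drop(columns=["date"])
print(df)
```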