With this launch of JDBC connectivity, Amazon DataZone expands its support for data users, including analysts and scientists, allowing them to work in their preferred environments, whether that's SQL Workbench, Domino, or an Amazon-native solution, while ensuring secure, governed access within Amazon DataZone.
Amazon DataZone is a data management service that makes it faster and easier for customers to catalog, discover, share, and govern data stored across AWS, on premises, and from third-party sources. When you’re connected, you can query, visualize, and share data—governed by Amazon DataZone—within Tableau.
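As an illustration of what connecting over JDBC looks like from the client side, here is a generic Python sketch using the jaydebeapi package. The driver class, URL, credentials, and JAR path are placeholders, not Amazon DataZone's actual connection details.

```python
# Generic JDBC-from-Python sketch using jaydebeapi; every connection
# detail below is a placeholder, not DataZone's real endpoint or driver.
import jaydebeapi

conn = jaydebeapi.connect(
    "com.example.jdbc.Driver",             # hypothetical driver class
    "jdbc:example://host:5439/analytics",  # hypothetical JDBC URL
    {"user": "analyst", "password": "..."},
    jars="/path/to/vendor-driver.jar",
)
cur = conn.cursor()
cur.execute("SELECT 1")  # the same kind of check a "Test connection" button performs
print(cur.fetchall())
cur.close()
conn.close()
```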
This means you can refine your ETL jobs through natural follow-up questions, starting with a basic data pipeline and progressively adding transformations, filters, and business logic through conversation. The DataFrame code generation now extends beyond AWS Glue DynamicFrame to support a broader range of data processing scenarios.
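As a rough illustration, here is the kind of AWS Glue DynamicFrame code such an iterative conversation might converge on: a read, then a filter, then a mapping layered on step by step. The database, table, and column names are invented for the example.

```python
# Sketch of conversationally refined Glue DynamicFrame code; names below
# (sales_db, orders, status, amount) are illustrative placeholders.
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.transforms import Filter, ApplyMapping

glue_context = GlueContext(SparkContext.getOrCreate())

# Step 1: basic pipeline -- read the source table from the catalog.
orders = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="orders"
)

# Step 2: follow-up question -- keep only completed orders.
completed = Filter.apply(frame=orders,
                         f=lambda row: row["status"] == "COMPLETED")

# Step 3: follow-up question -- rename and retype columns for the target.
mapped = ApplyMapping.apply(
    frame=completed,
    mappings=[("order_id", "string", "id", "string"),
              ("amount", "double", "total", "double")],
)
```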
For each service, you need to learn the supported authorization and authentication methods, data access APIs, and frameworks to onboard and test data sources. This approach simplifies your data journey and helps you meet your security requirements. Let's try a quick visualization to analyze the rating distribution.
Through a visual designer, you can configure custom AI search flows: a series of AI-driven data enrichments performed during ingestion and search. You can use the flow builder through APIs or the visual designer; the visual designer is recommended for managing workflow projects.
At Atlanta's Hartsfield-Jackson International Airport, an IT pilot has led to a wholesale data journey destined to transform operations at the world's busiest airport, fueled by machine learning and generative AI. That enables the analytics team using Power BI to create a single visualization for the GM.
Selecting the strategies and tools for validating data transformations and data conversions in your data pipelines. Data transformations and data conversions are crucial to ensure that raw data is organized, processed, and ready for useful analysis.
In this post, we'll see the fundamental procedures, tools, and techniques that data engineers, data scientists, and QA/testing teams use to ensure high-quality data as soon as it's deployed. First, we look at how unit and integration tests uncover transformation errors at an early stage (for example, with PyTest, JUnit, and NUnit).
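For a concrete flavor of that first step, here is a minimal PyTest sketch; normalize_price is a hypothetical transformation under test, not one taken from the post.

```python
# Minimal unit tests that catch transformation errors early.
import pytest

def normalize_price(raw: str) -> float:
    """Hypothetical transformation: strip currency symbols, return a float."""
    return float(raw.replace("$", "").replace(",", ""))

def test_normalize_price_strips_symbols():
    assert normalize_price("$1,234.50") == 1234.50

def test_normalize_price_rejects_garbage():
    # Bad input should fail loudly rather than flow downstream silently.
    with pytest.raises(ValueError):
        normalize_price("not-a-price")
```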
Financial efficiency: One of the key benefits of big data in supply chain and logistics management is the reduction of unnecessary costs. Using the right dashboard and data visualizations, it's possible to home in on any trends or patterns that uncover inefficiencies within your processes.
AWS Glue Studio is a graphical interface that makes it easy to create, run, and monitor extract, transform, and load (ETL) jobs in AWS Glue. It allows you to visually compose data transformation workflows using nodes that represent different data handling steps, which are later converted automatically into runnable code.
Duplicating data from a production database to a lower or lateral environment and masking personally identifiable information (PII) to comply with regulations enables development, testing, and reporting without impacting critical systems or exposing sensitive customer data.
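A minimal sketch of that masking step, assuming the data lands in a pandas DataFrame; the column names and hashing scheme are illustrative choices, not a prescribed method.

```python
# Copy production data and replace PII with stable, irreversible tokens
# before handing it to a lower environment. Columns are placeholders.
import hashlib
import pandas as pd

def mask_email(email: str) -> str:
    """Hash the email so joins still work but the identity is unrecoverable."""
    digest = hashlib.sha256(email.encode("utf-8")).hexdigest()[:12]
    return f"user_{digest}@masked.example"

prod = pd.DataFrame({"customer_id": [1, 2],
                     "email": ["a@example.com", "b@example.com"]})
lower_env = prod.copy()
lower_env["email"] = lower_env["email"].map(mask_email)
print(lower_env)
```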
AWS Glue DataBrew is a visual data preparation tool that enables you to clean and normalize data without writing any code.
They assist the organization by providing clarity and insight into advanced data technology solutions. As quality issues are often highlighted with the use of dashboard software, the change manager plays an important role in the visualization of data quality. Here, it all comes down to the data transformation error rate.
AI is transforming how senior data engineers and data scientists validate data transformations and conversions. AI-based verification approaches help detect anomalies, enforce data integrity, and optimize pipelines for improved efficiency.
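As a down-to-earth stand-in for such AI-based checks, here is a small Python sketch that flags post-transformation metrics deviating sharply from recent history, using a robust median-absolute-deviation rule; the numbers are invented.

```python
# Flag values far from the median in MAD units -- a robust stand-in for
# a learned anomaly detector over pipeline metrics.
import statistics

def find_anomalies(values, threshold=3.5):
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return [v for v in values if mad and abs(v - med) / mad > threshold]

# Daily row counts from five loads; the last one looks suspicious.
daily_row_counts = [10_120, 10_340, 9_980, 10_200, 98_000]
print(find_anomalies(daily_row_counts))  # -> [98000]
```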
While quantitative analysis, operational analysis, and data visualizations are key components of business analytics, the goal is to use the insights gained to shape business decisions. What is the difference between business analytics and data analytics? Business analytics is a subset of data analytics.
Data analytics draws from a range of disciplines — including computer programming, mathematics, and statistics — to perform analysis on data in an effort to describe, predict, and improve performance. What are the four types of data analytics? Data analytics and data science are closely related.
In addition to using managed AWS services that BMS didn't need to worry about upgrading, BMS wanted to offer non-technical business users an ETL service that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.
The data organization wants to run the Value Pipeline as robustly as a Six Sigma factory, and it must be able to implement and deploy process improvements as rapidly as a Silicon Valley start-up. The data engineer builds data transformations. Their product is the data. Create tests. Run the factory.
Upload your data, click through a workflow, walk away, and get your results in a few hours. That takes us to a conspicuous omission from that list of roles: the data scientists who focused on building basic models. If you're a professional data scientist, you already have the knowledge and skills to test these models.
Here are a few examples that we have seen of how this can be done. Batch ETL with Azure Data Factory and Azure Databricks: in this pattern, Azure Data Factory is used to orchestrate and schedule batch ETL processes, while Azure Blob Storage serves as the data lake to store raw data.
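A hedged sketch of the Databricks half of that pattern: read raw JSON from the lake, transform it, and write curated Parquet back. The storage account, containers, and column names are placeholders, and the abfss paths assume storage credentials are already configured on the cluster.

```python
# Databricks-side batch transform: raw zone in, curated zone out.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # provided by Databricks at runtime

raw = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/events/")
curated = (raw
           .filter(F.col("event_type") == "purchase")           # keep purchases
           .withColumn("event_date", F.to_date("event_ts")))    # derive a date
curated.write.mode("overwrite").parquet(
    "abfss://curated@examplelake.dfs.core.windows.net/purchases/")
```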
The main driving factors include lower total cost of ownership, scalability, stability, improved ingestion connectors (such as Data Prepper, Fluent Bit, and OpenSearch Ingestion), elimination of external cluster managers like ZooKeeper, enhanced reporting, and rich visualizations with OpenSearch Dashboards.
The goal of DataOps Observability is to provide visibility into every journey that data takes from source to customer value, across every tool, environment, data store, data and analytics team, and customer, so that problems are detected, localized, and raised immediately. A data journey spans and tracks multiple pipelines.
The exam tests general knowledge of the platform and applies to multiple roles, including administrator, developer, data analyst, data engineer, data scientist, and system architect. Candidates for the exam are tested on ML, AI solutions, NLP, computer vision, and predictive analytics.
To grow the power of data at scale for the long term, it’s highly recommended to design an end-to-end development lifecycle for your data integration pipelines. The following are common asks from our customers: Is it possible to develop and test AWS Glue data integration jobs on my local laptop?
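One common answer, sketched under assumptions: factor the job's core logic into plain functions over Spark DataFrames so it can be exercised on a laptop with PyTest and a local SparkSession, independent of the Glue runtime. The function and schema below are invented for illustration.

```python
# Locally testable Glue job logic: keep the transformation free of
# Glue-specific types so a plain local Spark session can drive it.
import pytest
from pyspark.sql import SparkSession

def dedupe_orders(df):
    """The job's core logic: drop duplicate orders by key."""
    return df.dropDuplicates(["order_id"])

@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[2]").getOrCreate()

def test_dedupe_orders(spark):
    df = spark.createDataFrame([(1, "a"), (1, "a"), (2, "b")],
                               ["order_id", "sku"])
    assert dedupe_orders(df).count() == 2
```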
Developers need to onboard new data sources, chain multiple data transformation steps together, and explore data as it travels through the flow. A reimagined visual editor boosts developer productivity and enables self-service. Figure 7: Test sessions provide an interactive experience that NiFi developers love.
CDP Data Engineering (1): a service purpose-built for data engineers focused on deploying and orchestrating data transformation using Spark at scale. We invite you to learn more about CDP Public Cloud for yourself by watching a product demo or by taking the platform for a test drive (it's free to get started).
Once released, consumers use datasets from different providers for analysis, machine learning (ML) workloads, and visualization. Each CDH dataset has three processing layers: source (raw data), prepared (transformed data in Parquet), and semantic (combined datasets).
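To make the three layers concrete, here is a toy sketch using pandas (with pyarrow installed for Parquet support); the file paths, columns, and join key are illustrative, not CDH's actual layout.

```python
# Toy three-layer flow: source (raw) -> prepared (Parquet) -> semantic.
import pandas as pd

raw = pd.read_csv("source/loans.csv")              # source layer: raw data
prepared = raw.dropna(subset=["loan_id"])          # prepared layer: cleaned
prepared.to_parquet("prepared/loans.parquet")      # stored as Parquet
semantic = prepared.merge(                         # semantic layer: combined
    pd.read_parquet("prepared/customers.parquet"),
    on="customer_id")
semantic.to_parquet("semantic/loans_by_customer.parquet")
```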
AWS offers Redshift Test Drive to validate whether the configuration chosen for Amazon Redshift is ideal for your workload before migrating the production environment. Do you want to know more about what we’re doing in the data area at Dafiti? We removed the DC2 cluster and completed the migration.
Traditionally, such a legacy call center analytics platform would be built on a relational database that stores data from streaming sources. Data transformations through stored procedures and the use of materialized views to curate datasets and generate insights are a known pattern with relational databases.
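A minimal sketch of that pattern, assuming a PostgreSQL-compatible database reachable via psycopg2; the table, view, and connection details are placeholders, not the platform described in the post.

```python
# Curate a dataset with a materialized view over raw call records.
import psycopg2

conn = psycopg2.connect(host="warehouse.example.com", dbname="callcenter",
                        user="etl", password="...")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE MATERIALIZED VIEW IF NOT EXISTS agent_daily_stats AS
        SELECT agent_id,
               date_trunc('day', call_start) AS call_day,
               count(*)                      AS calls,
               avg(duration_seconds)         AS avg_duration
        FROM raw_calls
        GROUP BY agent_id, date_trunc('day', call_start);
    """)
    # Refresh on a schedule so downstream insights stay current.
    cur.execute("REFRESH MATERIALIZED VIEW agent_daily_stats;")
```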
Kinesis Data Firehose is a fully managed service for delivering near-real-time streaming data to various destinations for storage and performing near-real-time analytics. You can perform analytics on VPC flow logs delivered from your VPC using the Kinesis Data Firehose integration with Datadog as a destination.
However, you might face significant challenges when planning for a large-scale data warehouse migration. This enables right-sizing the Redshift data warehouse to meet workload demands cost-effectively. Additional considerations: factor in additional tasks beyond schema conversion.
The rapid adoption of serverless data lake architectures, with ever-growing datasets that need to be ingested from a variety of sources and then fed through complex data transformation and machine learning (ML) pipelines, can present a challenge. Disable the rules after testing to avoid repeated messages.
Allows them to iteratively develop processing logic and test with as little overhead as possible. Plays nice with existing CI/CD processes to promote a data pipeline to production. Provides monitoring, alerting, and troubleshooting for production data pipelines.
It has not been specifically designed for heavy data transformation tasks. Configure the Step Functions workflow: after you create the two Lambda functions, you can design the Step Functions workflow in the visual editor by using the Lambda Invoke and Map blocks, as shown in the following diagram.
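For readers who prefer code to the visual editor, here is a hedged sketch of an equivalent workflow expressed in Amazon States Language and registered with boto3; the state names, ARNs, and input shape are placeholders.

```python
# Define a Map state that invokes a Lambda per input file, then register
# the state machine. ARNs and the "$.files" input path are illustrative.
import json
import boto3

definition = {
    "StartAt": "ProcessEachFile",
    "States": {
        "ProcessEachFile": {
            "Type": "Map",
            "ItemsPath": "$.files",
            "Iterator": {
                "StartAt": "TransformFile",
                "States": {
                    "TransformFile": {
                        "Type": "Task",
                        "Resource": "arn:aws:lambda:us-east-1:123456789012:function:transform-file",
                        "End": True,
                    }
                },
            },
            "End": True,
        }
    },
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="file-processing-workflow",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsExecutionRole",
)
```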
The data products from the Business Vault and Data Mart stages are now available for consumers. smava decided to use Tableau for business intelligence, data visualization, and further analytics. The data transformations are managed with dbt to simplify workflow governance and team collaboration.
In this post, we discuss why AWS recommends moving from Kinesis Data Analytics for SQL Applications to Amazon Kinesis Data Analytics for Apache Flink to take advantage of Apache Flink’s advanced streaming capabilities. Notebooks are provisioned quickly and provide a way for you to instantly view and analyze your streaming data.
Harnessing the power of advanced APIs, automation, and AI, these tools simplify data compilation, organization, and visualization, empowering users to extract actionable insights effortlessly. Key features include comprehensive data connectors, user-friendly report-building tools, and web-based sharing options.
For addressing data quality challenges in Amazon Simple Storage Service (Amazon S3) data lakes and data pipelines, AWS has announced AWS Glue Data Quality (preview).
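A tentative sketch of what evaluating a ruleset inside a Glue job can look like, assuming the awsgluedq module shipped with the Glue runtime; the DQDL rules and the orders_dyf frame are illustrative placeholders.

```python
# Evaluate a small DQDL ruleset against a DynamicFrame inside a Glue job.
from awsgluedq.transforms import EvaluateDataQuality

ruleset = """
Rules = [
    IsComplete "order_id",
    ColumnValues "amount" > 0,
    RowCount > 1000
]
"""

results = EvaluateDataQuality.apply(
    frame=orders_dyf,  # assumption: a DynamicFrame loaded earlier in the job
    ruleset=ruleset,
    publishing_options={"dataQualityEvaluationContext": "orders_check"},
)
```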
The four signs include: reporting is done manually in Excel and is time-consuming; difficulty pulling and joining data from multiple data sources; inability to access and utilize the collected data to see insights; and the need for data visualization in real time. This helps reduce extra costs beyond the software license fees.
Such a view also helps administrators visualize scheduled jobs for debugging purposes. The Test and Development queues have fixed resource limits. For instance, Spark driver pods need to be scheduled earlier than worker pods. Another gap is the lack of efficient capacity/quota management capability.
Simple, drag-and-drop building of dashboards and apps with Cloudera Data Visualization. Stock data: for pulling the stock data, I used the Alpha Vantage service (free version). The next step is to create the table in our database in which the data will be stored. Now, let's start testing our model!
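A small sketch of that ingestion step: pull daily prices from Alpha Vantage and stage them in a local SQLite table. The API key is a placeholder, and SQLite stands in for whatever database the project actually uses.

```python
# Fetch daily prices from Alpha Vantage and load them into SQLite.
import sqlite3
import requests

resp = requests.get("https://www.alphavantage.co/query", params={
    "function": "TIME_SERIES_DAILY",
    "symbol": "IBM",
    "apikey": "YOUR_API_KEY",  # placeholder; free keys are available
})
series = resp.json()["Time Series (Daily)"]

conn = sqlite3.connect("stocks.db")
conn.execute("""CREATE TABLE IF NOT EXISTS daily_prices
                (day TEXT, close REAL)""")
conn.executemany(
    "INSERT INTO daily_prices VALUES (?, ?)",
    [(day, float(bar["4. close"])) for day, bar in series.items()],
)
conn.commit()
```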
The requirements file is based on Amazon MWAA version 2.6.3. If you're testing on a different Amazon MWAA version, update the requirements file accordingly. For testing purposes, you can choose Add permissions and add the managed AmazonS3FullAccess policy to the user instead of providing restricted access.
In this post, we dive deep into the tool, walking through all steps from log ingestion, transformation, visualization, and architecture design to calculating TCO. With QuickSight, you can visualize YARN log data and conduct analysis against the datasets generated by pre-built dashboard templates and a widget.
In perhaps a preview of things to come next year, we decided to test how a Data Catalog might work with Tableau on the same data. You can check out a self-service data prep flow from catalog to viz in this recorded version here. Rita Sallam introduces the Data Prep Rodeo.