1) What Is Data Quality Management?
4) Data Quality Best Practices
5) How Do You Measure Data Quality?
6) Data Quality Metrics Examples
7) Data Quality Control: Use Case
8) The Consequences Of Bad Data Quality
9) 3 Sources Of Low-Quality Data
10) Data Quality Solutions: Key Attributes
The need for streamlined data transformations
As organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown. This approach helps in managing storage costs while maintaining the flexibility to analyze historical trends when needed.
Together with price-performance, Amazon Redshift offers capabilities such as serverless architecture, machine learning integration within your data warehouse, and secure data sharing across the organization. dbt Cloud is a hosted service that helps data teams productionize dbt deployments.
Jon Pruitt, director of IT at Hartsfield-Jackson Atlanta International Airport, and his team crafted a visual business intelligence dashboard for a top executive in its Emergency Response Team to provide key metrics at a glance, including weather status, terminal occupancy, concessions operations, and parking capacity.
Co-author: Mike Godwin, Head of Marketing, Rill Data. Cloudera has partnered with Rill Data, an expert in metrics at any scale, as Cloudera's preferred ISV partner to provide technical expertise and support services for Apache Druid customers. Deploying metrics shouldn't be so hard.
AI is transforming how senior data engineers and data scientists validate data transformations and conversions. Artificial intelligence-based verification approaches aid in the detection of anomalies, the enforcement of data integrity, and the optimization of pipelines for improved efficiency.
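A toy example of the simplest such anomaly check, a z-score threshold over a batch-level metric (purely illustrative; real AI-based verification approaches are far more sophisticated, and the threshold here is an arbitrary assumption):

```python
def zscore_anomalies(values, threshold=3.0):
    """Flag indices whose value deviates from the batch mean by more than
    `threshold` standard deviations. A stand-in for the kind of statistical
    anomaly detection a validation tool might run on, say, daily row counts."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = variance ** 0.5
    if std == 0:
        return []  # all values identical: nothing can be anomalous
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]
```

For example, twenty days of stable row counts followed by a sudden spike would flag only the spike.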
Extrinsic Control Deficit: Many of these changes stem from tools and processes beyond the immediate control of the data team. Unregulated ETL/ELT Processes: The absence of stringent data quality tests in ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform) processes further exacerbates the problem.
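A minimal sketch of the kind of quality gate an ETL/ELT step could run before loading (the rules and field names are illustrative, not from any specific framework):

```python
def run_quality_checks(rows):
    """Run basic data quality checks on a batch of extracted rows.

    Returns a list of human-readable failure messages; an empty list
    means the batch passes. Rules here are illustrative examples.
    """
    failures = []
    for i, row in enumerate(rows):
        if row.get("id") is None:
            failures.append(f"row {i}: missing id")
        amount = row.get("amount")
        if amount is not None and amount < 0:
            failures.append(f"row {i}: negative amount {amount}")
    ids = [r["id"] for r in rows if r.get("id") is not None]
    if len(ids) != len(set(ids)):
        failures.append("duplicate ids detected")
    return failures
```

A pipeline would call this after extract and refuse to load (or quarantine the batch) when the returned list is non-empty.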
Upload your data, click through a workflow, walk away. Get your results in a few hours. Why would you want autoML to build models for you? It buys time and breathing room. If you're a professional data scientist, you already have the knowledge and skills to test these models.
The data organization wants to run the Value Pipeline as robustly as a six sigma factory, and it must be able to implement and deploy process improvements as rapidly as a Silicon Valley start-up. The data engineer builds data transformations. Their product is the data. Create tests. Run the factory.
Here are a few examples that we have seen of how this can be done: Batch ETL with Azure Data Factory and Azure Databricks: In this pattern, Azure Data Factory is used to orchestrate and schedule batch ETL processes. Azure Blob Storage serves as the data lake to store raw data.
What is the difference between business analytics and data analytics? Business analytics is a subset of data analytics. Data analytics is used across disciplines to find trends and solve problems using data mining, data cleansing, data transformation, data modeling, and more.
If you can show ROI on a DW it would be a good use of your money to go with Omniture Discover, WebTrends Data Mart, Coremetrics Explore. If you have evolved to a stage that you need behavior targeting then get Omniture Test and Target or Sitespect. Mongoose Metrics ~ ifbyphone. Five Reasons And Awesome Testing Ideas.
However, you might face significant challenges when planning for a large-scale data warehouse migration. The success criteria are the key performance indicators (KPIs) for each component of the data workflow. This will enable right-sizing the Redshift data warehouse to meet workload demands cost-effectively.
Many data pipeline failures and quality issues that are detected by data observability tools in production could have been prevented earlier in the pipeline lifecycle with better pre-production testing strategies. Pre-production testing helps identify transformation errors and data quality issues early, minimizing risks.
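A pre-production test for a transformation can be as simple as a unit test against small, hand-built inputs. A sketch, using a hypothetical email-normalizing transformation (the function and test names are assumptions, not from any cited pipeline):

```python
def normalize_records(records):
    """Example transformation: trim whitespace and lowercase email fields."""
    out = []
    for rec in records:
        cleaned = dict(rec)
        if isinstance(cleaned.get("email"), str):
            cleaned["email"] = cleaned["email"].strip().lower()
        out.append(cleaned)
    return out

def test_normalize_records():
    raw = [{"email": "  Alice@Example.COM "}, {"email": "bob@example.com"}]
    result = normalize_records(raw)
    assert result[0]["email"] == "alice@example.com"
    assert result[1]["email"] == "bob@example.com"
    # the transformation must not mutate its input
    assert raw[0]["email"] == "  Alice@Example.COM "
```

Running such tests in CI, before the pipeline ever touches production data, is what catches the errors that observability tools would otherwise only surface after the fact.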
A modern data platform entails maintaining data across multiple layers, targeting diverse platform capabilities like high performance, ease of development, cost-effectiveness, and DataOps features such as CI/CD, lineage, and unit testing. It does this by helping teams handle the T in ETL (extract, transform, and load) processes.
The rapid adoption of serverless data lake architectures, with ever-growing datasets that need to be ingested from a variety of sources and then fed through complex data transformation and machine learning (ML) pipelines, can present a challenge.
The goal of DataOps Observability is to provide visibility of every journey that data takes from source to customer value across every tool, environment, data store, data and analytic team, and customer so that problems are detected, localized and raised immediately. A data journey spans and tracks multiple pipelines.
In this post, we discuss ways to modernize your legacy, on-premises, real-time analytics architecture to build serverless data analytics solutions on AWS using Amazon Managed Service for Apache Flink. Near-real-time streaming analytics captures the value of operational data and metrics to provide new insights to create business opportunities.
AWS offers Redshift Test Drive to validate whether the configuration chosen for Amazon Redshift is ideal for your workload before migrating the production environment. At this point, only one-time queries and those made by Amazon QuickSight reached the new cluster. We removed the DC2 cluster and completed the migration.
Data transformation plays a pivotal role in providing the necessary data insights for businesses in any organization, small and large. To gain these insights, customers often perform ETL (extract, transform, and load) jobs from their source systems and output an enriched dataset.
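As a toy illustration of such an enrichment step, here is a lookup-join in plain Python (field names are hypothetical and this is not tied to any particular ETL service):

```python
def enrich_orders(orders, customers):
    """Join raw orders with a customer lookup to produce an enriched dataset.

    `customers` maps customer_id -> attribute dict. Unknown customers are
    kept with a None region so no rows are silently dropped during the join.
    """
    enriched = []
    for order in orders:
        cust = customers.get(order["customer_id"], {})
        enriched.append({**order, "region": cust.get("region")})
    return enriched
```

In a real ETL job the same left-join-and-enrich logic would typically run in SQL or a distributed engine, but the shape of the operation is the same.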
Allows them to iteratively develop processing logic and test with as little overhead as possible. Plays nice with existing CI/CD processes to promote a data pipeline to production. Provides monitoring, alerting, and troubleshooting for production data pipelines.
An obvious mechanical answer is: use relevance as a metric. Another important method is to benchmark existing metrics. Know the limitations of your existing dataset and answer these questions: What categories of data are there? Be sure test cases represent the diversity of app users. Can a chatbot help improve relations?
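As one concrete way to "use relevance as a metric", a precision-at-k helper might look like the following (a minimal sketch; the function name and the choice of cutoff are illustrative):

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved items that appear in the relevant set.

    `retrieved` is an ordered list of result ids; `relevant` is a set of
    ids judged relevant for the query.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = retrieved[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k
```

Benchmarking then means computing such metrics over a fixed, diverse set of test queries so that changes to the system can be compared against a stable baseline.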
Kinesis Data Firehose is a fully managed service for delivering near-real-time streaming data to various destinations for storage and performing near-real-time analytics. You can perform analytics on VPC flow logs delivered from your VPC using the Kinesis Data Firehose integration with Datadog as a destination.
Each CDH dataset has three processing layers: source (raw data), prepared (transformed data in Parquet), and semantic (combined datasets). It is possible to define stages (DEV, INT, PROD) in each layer to allow structured release and test without affecting PROD.
You simply configure your data sources to send information to OpenSearch Ingestion, which then automatically delivers the data to your specified destination. Additionally, you can configure OpenSearch Ingestion to apply data transformations before delivery. Choose the Test tab. For Method type, choose POST.
Detailed Data and Model Lineage Tracking: Ensures comprehensive tracking and documentation of data transformations and model lifecycle events, enhancing reproducibility and auditability.
Alation is pleased to be named a dbt Metrics Partner and to announce the start of a partnership with dbt, which will bring dbt data into the Alation data catalog. In the modern data stack, dbt is a key tool to make data ready for analysis. Data Transformation in the Modern Data Stack.
Let’s look at some key metrics. After analyzing YARN logs by various metrics, you’re ready to design future EMR architectures. Clean up After you complete all the steps and finish testing, complete the following steps to delete resources to avoid incurring costs: On the AWS CloudFormation console, choose the stack you created.
Few actors in the modern data stack have inspired as much enthusiasm and fervent support as dbt. This data transformation tool enables data analysts and engineers to transform, test, and document data in the cloud data warehouse. But what does this mean from a practitioner's perspective?
Tricentis is the global leader in continuous testing for DevOps, cloud, and enterprise applications. Speed changes everything, and continuous testing across the entire CI/CD lifecycle is the key. Tricentis instills that confidence by providing software tools that enable Agile Continuous Testing (ACT) at scale.
Within a large enterprise, there is a huge amount of data accumulated over the years – many decisions have been made and different methods have been tested. They have different metrics for judging whether some content is interesting or not. This is one of the main diagnostic tests.
DataBrew is a visual data preparation tool that enables you to clean and normalize data without writing any code. The over 200 transformations it provides are now available to be used in an AWS Glue Studio visual job. On the Runs tab, you can keep track of the process and see detailed job metrics using the job ID link.
We use Apache Spark as our main data processing engine and have over 1,000 Spark applications running over massive amounts of data every day. These Spark applications implement our business logic ranging from datatransformation, machine learning (ML) model inference, to operational tasks. Their costs were climbing.
The first step in building a model that can predict machine failure and even recommend the next best course of action is to aggregate, clean, and prepare data to train against. This task may require complex joins, aggregations, filtering, window functions, and many other datatransformations against extremely large-scale data sets.
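For instance, a trailing rolling mean over sensor readings is a typical window-function feature for failure prediction. A minimal pure-Python sketch (real workloads at this scale would use Spark or SQL window functions; the window size here is an arbitrary assumption):

```python
def rolling_mean(values, window):
    """Trailing rolling mean over a series of sensor readings.

    Positions with fewer than `window` prior values average whatever is
    available, so the output has the same length as the input.
    """
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```

In SQL terms this corresponds to `AVG(x) OVER (ORDER BY ts ROWS BETWEEN n PRECEDING AND CURRENT ROW)`, applied per machine after the joins and filtering described above.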
This, in turn, empowers data leaders to better identify and develop new revenue streams, customize patient offerings, and use data to optimize operations. To make good on this potential, healthcare organizations need to understand their data and how they can use it. Why Is Data Governance in Healthcare Important?
Media data (usually weekly): media costs, media ratings generated (TVRs, magazine copies, digital impressions, likes, shares, etc.). The standard practice is that the data should be aggregated into a weekly format and span at least the last two to three years (ideally around five years). Classical Modeling Considerations.
GSK had been pursuing DataOps capabilities such as automation, containerization, automated testing and monitoring, and reusability, for several years. DataOps provides the “continuous delivery equivalent for Machine Learning and enables teams to manage the complexities around continuous training, A/B testing, and deploying without downtime.
dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouse (such as Amazon Redshift) customers who are looking to keep their data transform logic separate from storage and engine.
The data products from the Business Vault and Data Mart stages are now available for consumers. smava decided to use Tableau for business intelligence, data visualization, and further analytics. The data transformations are managed with dbt to simplify the workflow governance and team collaboration.
Conduct data quality tests on anonymized data in compliance with data policies. Conduct data quality tests to quickly identify and address data quality issues, maintaining high-quality data at all times. The challenge: data quality tests require performing 1,300 tests on 10 TB of data monthly.
If data mapping has been enabled within the data processing job, then the structured data is prepared based on the given schema. This output is passed to the next phase, where data transformations and business validations can be applied. After this step, the data is loaded to the specified target.
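A minimal sketch of such a schema-mapping step (the schema shape, field names, and cast functions are hypothetical, chosen only to illustrate the idea):

```python
def apply_schema(record, schema):
    """Prepare a raw record against a schema mapping.

    `schema` maps output field -> (source key, cast function). Missing
    source keys yield None rather than raising, mirroring a lenient
    mapping step that defers rejection to the validation phase.
    """
    out = {}
    for field, (src, cast) in schema.items():
        value = record.get(src)
        out[field] = cast(value) if value is not None else None
    return out
```

The prepared output would then flow into the transformation and business-validation phase before loading.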
DataOps Observability includes monitoring and testing the data pipeline, data quality, data testing, and alerting. Data testing is an essential aspect of DataOps Observability; it helps to ensure that data is accurate, complete, and consistent with its specifications, documentation, and end-user requirements.
As a result, end users can better view shared metrics (backed by accurate data), which ultimately drives performance. When treating a patient, a doctor may wish to study the patient’s vital metrics in comparison to those of their peer group. Visual Analytics Users are given data from which they can uncover new insights.
Tableau's certifications, in particular, focus on performance-based testing rather than theory in an effort to verify a candidate's ability to apply the subject matter in a real work environment. The Tableau Certified Data Analyst title is active for two years from the date achieved. The certification does not expire.