Data Transformation and Metrics - Data Leaders Brief

The Ultimate Guide to Modern Data Quality Management (DQM) For An Effective Data Quality Control Driven by The Right Metrics

datapine

SEPTEMBER 29, 2022

1) What Is Data Quality Management? 4) Data Quality Best Practices. 5) How Do You Measure Data Quality? 6) Data Quality Metrics Examples. 7) Data Quality Control: Use Case. 8) The Consequences Of Bad Data Quality. 9) 3 Sources Of Low-Quality Data. 10) Data Quality Solutions: Key Attributes.

Data Quality

Data Quality Metrics Data-driven Management

From data lakes to insights: dbt adapter for Amazon Athena now supported in dbt Cloud

AWS Big Data

NOVEMBER 22, 2024

The need for streamlined data transformations: As organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown. This approach helps in managing storage costs while maintaining the flexibility to analyze historical trends when needed.

Data Lake

Data Lake Data Warehouse Cost-Benefit Data Transformation

Data transformation takes flight at Atlanta’s Hartsfield-Jackson airport

CIO Business Intelligence

AUGUST 9, 2024

Jon Pruitt, director of IT at Hartsfield-Jackson Atlanta International Airport, and his team crafted a visual business intelligence dashboard for a top executive in its Emergency Response Team to provide key metrics at a glance, including weather status, terminal occupancy, concessions operations, and parking capacity.

Data Transformation

Data Transformation Machine Learning Data Lake Dashboards

Simplify Metrics on Apache Druid With Rill Data and Cloudera

Cloudera

JULY 21, 2022

Co-author: Mike Godwin, Head of Marketing, Rill Data. Cloudera has partnered with Rill Data, an expert in metrics at any scale, as Cloudera’s preferred ISV partner to provide technical expertise and support services for Apache Druid customers. Deploying metrics shouldn’t be so hard. Cloudera Data Warehouse).

Metrics

Metrics Slice and Dice Data Warehouse Dashboards

Bridging the gap between mainframe data and hybrid cloud environments

CIO Business Intelligence

FEBRUARY 27, 2025

According to a study from Rocket Software and Foundry, 76% of IT decision-makers say challenges around accessing mainframe data and contextual metadata are a barrier to mainframe data usage, while 64% view integrating mainframe data with cloud data sources as the primary challenge.

Metadata

Metadata Data Lake Cost-Benefit Forecasting

Ingest data from Google Analytics 4 and Google Sheets to Amazon Redshift using Amazon AppFlow

AWS Big Data

JANUARY 6, 2025

With Amazon AppFlow, you can run data flows at nearly any scale and at the frequency you choose: on a schedule, in response to a business event, or on demand. You can configure data transformation capabilities such as filtering and validation to generate rich, ready-to-use data as part of the flow itself, without additional steps.

Analytics

Analytics Data Warehouse Metrics Big Data

How Your Finance Team Can Lead Your Enterprise Data Transformation

Alation

OCTOBER 26, 2021

Building a Data Culture Within a Finance Department. Our finance users tell us that their first exposure to the Alation Data Catalog often comes soon after the launch of organization-wide data transformation efforts. After all, finance is one of the greatest consumers of data within a business.

Finance

Finance Data Transformation Enterprise Metrics

Unlocking near real-time analytics with petabytes of transaction data using Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud

AWS Big Data

NOVEMBER 27, 2024

Together with price-performance, Amazon Redshift offers capabilities such as serverless architecture, machine learning integration within your data warehouse and secure data sharing across the organization. dbt Cloud is a hosted service that helps data teams productionize dbt deployments. or a later version) database.

Data Warehouse

Data Warehouse Analytics Testing Modeling

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

With a unified catalog, enhanced analytics capabilities, and efficient data transformation processes, we’re laying the groundwork for future growth. Amazon DataZone empowers EUROGATE by setting the stage for long-term operational excellence and scalability.

IoT

IoT Machine Learning Metadata Data-driven

Navigating the Chaos of Unruly Data: Solutions for Data Teams

DataKitchen

NOVEMBER 10, 2023

Identifying Anomalies: Use advanced algorithms to detect anomalies in data patterns. Establish baseline metrics for normal database operations, enabling the system to flag deviations as potential issues. Monitor for freshness, schema changes, volume, field health/quality, new tables, and usage.

Data Quality

Data Quality Testing Data Lake Data Integration
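
The baseline idea in the excerpt above can be sketched in a few lines. This is only an illustration, not DataKitchen's method: it assumes a simple mean and standard deviation baseline with a z-score cutoff, and the function names (`build_baseline`, `flag_deviations`), the threshold of 3.0, and the sample row counts are all hypothetical.

```python
from statistics import mean, stdev


def build_baseline(history):
    """Summarize normal behavior as (mean, sample standard deviation)."""
    return mean(history), stdev(history)


def flag_deviations(baseline, observations, threshold=3.0):
    """Return (index, value) pairs whose z-score exceeds the threshold."""
    mu, sigma = baseline
    flagged = []
    for i, value in enumerate(observations):
        if sigma == 0:
            # A perfectly flat baseline: any change counts as a deviation.
            is_anomaly = value != mu
        else:
            is_anomaly = abs(value - mu) / sigma > threshold
        if is_anomaly:
            flagged.append((i, value))
    return flagged


# Hypothetical daily row counts for one table (illustrative data only).
daily_row_counts = [1000, 1020, 980, 1010, 990, 1005, 995]
baseline = build_baseline(daily_row_counts)

# New loads to check against the baseline; the second one is a volume drop.
new_counts = [1003, 400, 1012]
print(flag_deviations(baseline, new_counts))  # [(1, 400)]
```

The same pattern could be applied to freshness (minutes since last load) or null rates per field; a production monitor would likely use a rolling window rather than a fixed history.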

The Ten Standard Tools To Develop Data Pipelines In Microsoft Azure

DataKitchen

JULY 27, 2023

Here are a few examples that we have seen of how this can be done: Batch ETL with Azure Data Factory and Azure Databricks: In this pattern, Azure Data Factory is used to orchestrate and schedule batch ETL processes. Azure Blob Storage serves as the data lake to store raw data. Azure Machine Learning). So go ahead.

Machine Learning

Machine Learning Cost-Benefit Data Transformation Testing

Modernize a legacy real-time analytics application with Amazon Managed Service for Apache Flink

AWS Big Data

OCTOBER 11, 2023

In this post, we discuss ways to modernize your legacy, on-premises, real-time analytics architecture to build serverless data analytics solutions on AWS using Amazon Managed Service for Apache Flink. Near-real-time streaming analytics captures the value of operational data and metrics to provide new insights to create business opportunities.

Management

Management Metadata Analytics Dashboards

Automating the Automators: Shift Change in the Robot Factory

O'Reilly on Data

JANUARY 17, 2023

In the past they understood the APIs of TensorFlow and Torch to build models by hand; today they are fluent in the autoML vendor’s APIs to train models, and they understand how to review the metrics. The second is the experienced ML professional who really knows how to build and tune models. It does not exist in the code.

Machine Learning

Machine Learning Predictive Modeling Software Modeling

Is your data supply chain a liability?

CIO Business Intelligence

JUNE 23, 2022

The challenge is to capture the source of the data correctly from the outset and ensure data quality does not degrade when moving across the data supply chain. A key supply chain management metric used to evaluate the performance of physical supply chains is OTIF – On-Time-In-Full. Supply chain complexity.

Data Quality

Data Quality Key Performance Indicator Metrics KPI

Monitor data pipelines in a serverless data lake

AWS Big Data

AUGUST 9, 2023

The advent of rapid adoption of serverless data lake architectures—with ever-growing datasets that need to be ingested from a variety of sources, followed by complex data transformation and machine learning (ML) pipelines—can present a challenge. Notify any failures to a Slack channel.

Data Lake

Data Lake Metrics Testing Cost-Benefit

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

AWS Big Data

AUGUST 1, 2024

However, you might face significant challenges when planning for a large-scale data warehouse migration. The success criteria are the key performance indicators (KPIs) for each component of the data workflow. Data transformation experts to convert database stored functions in the producer or consumer.

Data Warehouse

Data Warehouse KPI Optimization Cost-Benefit

Automate alerting and reporting for AWS Glue job resource usage

AWS Big Data

MAY 25, 2023

Data transformation plays a pivotal role in providing the necessary data insights for businesses in any organization, small and large. To gain these insights, customers often perform ETL (extract, transform, and load) jobs from their source systems and output an enriched dataset.

Reporting

Reporting Metrics Optimization Data Lake

Stream VPC Flow Logs to Datadog via Amazon Kinesis Data Firehose

AWS Big Data

JUNE 20, 2023

Kinesis Data Firehose is a fully managed service for delivering near-real-time streaming data to various destinations for storage and performing near-real-time analytics. You can perform analytics on VPC flow logs delivered from your VPC using the Kinesis Data Firehose integration with Datadog as a destination.

Dashboards

Dashboards Visualization Metrics Data Transformation

What is business analytics? Using data to improve business outcomes

CIO Business Intelligence

JULY 5, 2022

What is the difference between business analytics and data analytics? Business analytics is a subset of data analytics. Data analytics is used across disciplines to find trends and solve problems using data mining, data cleansing, data transformation, data modeling, and more.

Business Analytics

Business Analytics Prescriptive Analytics Data mining Diagnostic Analytics

Set up alerts and orchestrate data quality rules with AWS Glue Data Quality

AWS Big Data

JUNE 6, 2023

Furthermore, it allows for necessary actions to be taken, such as rectifying errors in the data source, refining data transformation processes, and updating data quality rules. The Lambda function is responsible for converting the data quality metrics and dispatching them to the designated email addresses via Amazon SNS.

Data Quality

Data Quality Metrics Data-driven Visualization

Create a modern data platform using the Data Build Tool (dbt) in the AWS Cloud

AWS Big Data

NOVEMBER 9, 2023

In this post, we delve into a case study for a retail use case, exploring how the Data Build Tool (dbt) was used effectively within an AWS environment to build a high-performing, efficient, and modern data platform. It does this by helping teams handle the T in ETL (extract, transform, and load) processes.

Data Warehouse

Data Warehouse Testing Data Quality Reporting

Best Web Analytics 2.0 Tools: Quantitative, Qualitative, Life Saving!

Occam's Razor

OCTOBER 19, 2010

Mongoose Metrics ~ ifbyphone. I know Mongoose Metrics a bit more and have been impressed with their solution and evolution over the last couple of years. Twitter to me is a proxy of how data collection is changing and what the future of relevant metrics might look like. Mongoose Metrics. AnalyzeWords. LivePerson.

Analytics

Analytics Testing Measurement Optimization

Alation and dbt Unlock Metadata and Increase Modern Data Stack Visibility

Alation

OCTOBER 18, 2022

Alation is pleased to be named a dbt Metrics Partner and to announce the start of a partnership with dbt, which will bring dbt data into the Alation data catalog. In the modern data stack, dbt is a key tool to make data ready for analysis. Data Transformation in the Modern Data Stack.

Metadata

Metadata Metrics Recreation/Entertainment Data Quality

BMW Cloud Efficiency Analytics powered by Amazon QuickSight and Amazon Athena

AWS Big Data

NOVEMBER 15, 2023

The difference lies in when and where data transformation takes place. In ETL, data is transformed before it’s loaded into the data warehouse. In ELT, raw data is loaded into the data warehouse first, then it’s transformed directly within the warehouse.

Dashboards

Dashboards Analytics Metadata Data Warehouse
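
The ETL versus ELT distinction in the excerpt above can be made concrete with a small sketch. It is an illustration under stated assumptions, not the architecture from the BMW post: an in-memory SQLite database stands in for the data warehouse, and the table names, sample rows, and function names (`run_etl`, `run_elt`, `clean`) are hypothetical.

```python
import sqlite3

# Hypothetical raw extract: untrimmed names and amounts stored as text.
RAW_ROWS = [("  Alice ", "10.50"), ("BOB", "3"), ("carol", "7.25")]


def clean(name, amount):
    """Application-side transformation used by the ETL path."""
    return name.strip().title(), float(amount)


def run_etl(rows):
    """ETL: transform in application code, then load only the cleaned rows."""
    wh = sqlite3.connect(":memory:")
    wh.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    wh.executemany(
        "INSERT INTO orders VALUES (?, ?)", [clean(n, a) for n, a in rows]
    )
    return wh.execute(
        "SELECT customer, amount FROM orders ORDER BY customer"
    ).fetchall()


def run_elt(rows):
    """ELT: load raw rows first, then transform with SQL inside the warehouse."""
    wh = sqlite3.connect(":memory:")
    wh.execute("CREATE TABLE raw_orders (customer TEXT, amount TEXT)")
    wh.executemany("INSERT INTO raw_orders VALUES (?, ?)", rows)
    wh.execute(
        "CREATE TABLE orders AS "
        "SELECT upper(substr(trim(customer), 1, 1)) "
        "       || lower(substr(trim(customer), 2)) AS customer, "
        "       CAST(amount AS REAL) AS amount "
        "FROM raw_orders"
    )
    return wh.execute(
        "SELECT customer, amount FROM orders ORDER BY customer"
    ).fetchall()


# Both paths end with the same cleaned table; only the place of the
# transformation differs.
print(run_etl(RAW_ROWS) == run_elt(RAW_ROWS))  # True
```

In the ELT path the raw table stays available in the warehouse, so the transformation can be rerun or revised later without extracting the data again.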

Adding AI to Products: A High-Level Guide for Product Managers

Sisense

AUGUST 6, 2020

An obvious mechanical answer is: use relevance as a metric. Another important method is to benchmark existing metrics. Know the limitations of your existing dataset and answer these questions: What categories of data are there? What data transformations are needed from your data scientists to prepare the data?

Management

Management Machine Learning Key Performance Indicator Cost-Benefit

What is a DataOps Engineer?

DataKitchen

OCTOBER 5, 2021

The data organization wants to run the Value Pipeline as robustly as a six sigma factory, and it must be able to implement and deploy process improvements as rapidly as a Silicon Valley start-up. The data engineer builds data transformations. Their product is the data.

Testing

Testing Dashboards Measurement Experimentation

Migrate Amazon Redshift from DC2 to RA3 to accommodate increasing data volumes and analytics demands

AWS Big Data

AUGUST 9, 2024

After the read query validation stage was complete and we were satisfied with the performance, we reconnected our orchestrator so that the data transformation queries could be run in the new cluster. At this point, only one-time queries and those made by Amazon QuickSight reached the new cluster.

Data Lake

Data Lake Analytics Data Warehouse Data-driven

Deploy and Scale AI Applications With Cloudera AI Inference Service

Cloudera

OCTOBER 8, 2024

Detailed Data and Model Lineage Tracking: Ensures comprehensive tracking and documentation of data transformations and model lifecycle events, enhancing reproducibility and auditability.

Optimization

Optimization Experimentation Metrics Enterprise

Amazon Redshift data ingestion options

AWS Big Data

SEPTEMBER 5, 2024

If storing operational data in a data warehouse is a requirement, synchronization of tables between operational data stores and Amazon Redshift tables is supported. In scenarios where data transformation is required, you can use Redshift stored procedures to modify data in Redshift tables.

IoT

IoT Data Warehouse Cost-Benefit Reporting

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

Specifically, the system uses Amazon SageMaker Processing jobs to process the data stored in the data lake, employing the AWS SDK for Pandas (previously known as AWS Wrangler) for various data transformation operations, including cleaning, normalization, and feature engineering.

Data Lake

Data Lake Analytics Snapshot Data Quality

Deep dive into the AWS ProServe Hadoop Migration Delivery Kit TCO tool

AWS Big Data

FEBRUARY 6, 2023

Let’s look at some key metrics. After analyzing YARN logs by various metrics, you’re ready to design future EMR architectures. His areas of interest are data lakes and cloud modern data architecture delivery. Kalen Zhang was the Global Segment Tech Lead of Partner Data and Analytics at AWS.

Dashboards

Dashboards Optimization Data Lake Cost-Benefit

Reference guide to build inventory management and forecasting solutions on AWS

AWS Big Data

APRIL 11, 2023

With the proliferation of IoT devices and the abundance of data generated by them, it has become possible to collect real-time data on inventory levels, customer behavior, and other key metrics. In the inventory management and forecasting solution, AWS Glue is recommended for data transformation.

Forecasting

Forecasting Management IoT Data-driven

DataOps Observability: Taming the Chaos (Part 2)

DataKitchen

OCTOBER 25, 2022

It’s because it’s a hard thing to accomplish when there are so many teams, locales, data sources, pipelines, dependencies, data transformations, models, visualizations, tests, internal customers, and external customers. It’s not just a fear of change.

Testing

Testing Data-driven Visualization Dashboards

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

AWS Big Data

MARCH 3, 2023

Data ingestion – Steps 1 and 2 use AWS DMS, which connects to the source database and moves full and incremental data (CDC) to Amazon S3 in Parquet format. Data transformation – Steps 3 and 4 represent an EMR Serverless Spark application (Amazon EMR 6.9 Let’s refer to this S3 bucket as the raw layer.

Data Lake

Data Lake Dashboards Metrics Metadata

Alation & Bigeye: A Potent Partnership for Data Quality

Alation

DECEMBER 7, 2021

This platform should: Connect to diverse data sources (on-prem, hybrid, legacy, or modern). Extract data quality information. Monitor data anomalies and data drift. Track how data transforms, noting unexpected changes during its lifecycle. Alation’s Data Catalog: Built-in Data Quality Capabilities.

Data Quality

Data Quality Data-driven Metrics Dashboards

Build Hybrid Data Pipelines and Enable Universal Connectivity With CDF-PC Inbound Connections

Cloudera

JUNE 17, 2022

The NiFi flow behind the Inbound Connection can not only receive data and forward it to a Kafka topic, but can perform schema validation, format conversions, and data transformation, as well as routing, filtering, and enriching the data.

Cost-Benefit

Cost-Benefit IoT Data Warehouse Manufacturing

Building Better Data Models to Unlock Next-Level Intelligence

Sisense

MAY 11, 2021

Reporting: Reporting contains the flattest and most cleaned version of our data. It often will collapse the metrics in a fact table to the level of a single dimension through a form of aggregation or lookback window. Importantly, both workflows for data analytics are supported by a set of data models that follow the same data pipeline.

Modeling

Modeling Big Data IoT Data Warehouse

Cloudera DataFlow Designer: The Key to Agile Data Pipeline Development

Cloudera

MARCH 14, 2023

A critical feature for every developer however is to get instantaneous feedback like configuration validations or performance metrics, as well as previewing data transformations for each step of their data flow. Test Sessions provide this functionality by provisioning compute resources on the fly within minutes.

Testing

Testing Publishing Metadata Interactive

Declarative Knowledge Graph APIs

Ontotext

DECEMBER 9, 2020

Are you having difficulty joining your knowledge graph APIs with other data sources? Maybe you spend an inordinate amount of time and effort managing operational concerns, deployments, monitoring, metrics and log collation? This leads to lots of small data fetches to/from GraphDB over the network.

Modeling

Modeling Management Optimization Machine Learning

Sure, Trust Your Data… Until It Breaks Everything: How Automated Data Lineage Saves the Day

Octopai

JUNE 9, 2024

For instance, aligning patient care data from Oracle databases with operational metrics from Power BI was daunting without clear data lineage. Different departments managed their data independently, leading to silos and inconsistencies. Accurate data lineage rebuilt trust among decision-makers.

IT

IT Data-driven Predictive Analytics Data Strategy

How Alation’s Data Team Uses the Modern Data Stack to Power Insights

Alation

OCTOBER 27, 2022

Few actors in the modern data stack have inspired as much enthusiasm and fervent support as dbt. This data transformation tool enables data analysts and engineers to transform, test and document data in the cloud data warehouse. But what does this mean from a practitioner perspective?

Metrics

Metrics Dashboards Sales Reporting

Unlock scalable analytics with a secure connectivity pattern in AWS Glue to read from or write to Snowflake

AWS Big Data

AUGUST 19, 2024

This allows business analysts and decision-makers to gain valuable insights, visualize key metrics, and explore the data in depth, enabling informed decision-making and strategic planning for pricing and promotional strategies. Refer to Editing AWS Glue managed data transform nodes for more information.

Analytics

Analytics Data-driven Data Integration Data Lake

Enable advanced search capabilities for Amazon Keyspaces data by integrating with Amazon OpenSearch Service

AWS Big Data

FEBRUARY 26, 2024

You simply configure your data sources to send information to OpenSearch Ingestion, which then automatically delivers the data to your specified destination. Additionally, you can configure OpenSearch Ingestion to apply data transformations before delivery.

Dashboards

Dashboards Testing Metrics Optimization

Improve power utility operational efficiency using smart sensor data and Amazon QuickSight

AWS Big Data

MAY 16, 2023

Data collection and processing are handled by a third-party smart sensor manufacturer application residing in Amazon Virtual Private Cloud (Amazon VPC) private subnets behind a Network Load Balancer. The AWS Glue Data Catalog contains the table definitions for the smart sensor data sources stored in the S3 buckets.

Dashboards

Dashboards Statistics Data Collection Business Intelligence
