1) What Is Data Quality Management? 4) Data Quality Best Practices. 5) How Do You Measure Data Quality? 6) Data Quality Metrics Examples. 7) Data Quality Control: Use Case. 8) The Consequences Of Bad Data Quality. 9) 3 Sources Of Low-Quality Data.
With all the data in and around the enterprise, users would say that they have a lot of information but need more insights to help them produce better and more informative content. This is where we dispel an old “big data” notion (heard a decade ago) that was expressed like this: “we need our data to run at the speed of business.”
The need for streamlined data transformations: as organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown. With dbt, teams can define data quality checks and access controls as part of their transformation workflow.
Alerts and notifications play a crucial role in maintaining data quality because they facilitate prompt and efficient responses to any data quality issues that may arise within a dataset. This proactive approach helps mitigate the risk of making decisions based on inaccurate information.
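As a concrete illustration (not from the original article), here is a minimal Python sketch of such an alerting check; the dataset, threshold, and notify hook are hypothetical stand-ins:

```python
import pandas as pd

def notify(message: str) -> None:
    # Stand-in for a real notification channel (email, Slack, PagerDuty, etc.)
    print(f"[DATA QUALITY ALERT] {message}")

def check_null_rate(df: pd.DataFrame, column: str, max_null_rate: float = 0.05) -> None:
    """Alert when the share of missing values in a column exceeds a threshold."""
    null_rate = df[column].isna().mean()
    if null_rate > max_null_rate:
        notify(f"Column '{column}': {null_rate:.1%} nulls exceeds limit of {max_null_rate:.0%}")

orders = pd.DataFrame({"customer_id": [1, 2, None, 4, None]})
check_null_rate(orders, "customer_id")  # fires: 40% nulls > 5% limit
```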
There are countless examples of big data transforming many different industries. There is no disputing the fact that the collection and analysis of massive amounts of unstructured data has been a huge breakthrough. We would like to talk about data visualization and its role in the big data movement.
cycle_end";') con.close() With this, as the data lands in the curated data lake (Amazon S3 in parquet format) in the producer account, the data science and AI teams gain instant access to the source data eliminating traditional delays in the data availability. She can reached via LinkedIn.
Domain ownership recognizes that the teams generating the data have the deepest understanding of it and are therefore best suited to manage, govern, and share it effectively. This principle makes sure data accountability remains close to the source, fostering higher data quality and relevance.
Yet as companies fight for skilled analysts to use data to make better decisions, they often fall short in improving the data supply chain and the resulting data quality. Without solid data supply-chain management practices in place, data quality often suffers. First mile/last mile impacts.
In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.
The Orca Platform is powered by a state-of-the-art anomaly detection system that uses cutting-edge ML algorithms and big data capabilities to detect potential security threats and alert customers in real time, ensuring maximum security for their cloud environment. This ensures that the data is suitable for training purposes.
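The excerpt does not disclose Orca’s actual algorithms; purely as a generic illustration of ML-based anomaly detection, here is a minimal scikit-learn sketch on toy data:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Toy feature matrix: rows are events, columns are hypothetical signals
# (e.g., request rate, bytes transferred). Real systems use far richer features.
rng = np.random.default_rng(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
suspicious = np.array([[8.0, 9.0], [10.0, -7.0]])  # clear outliers
events = np.vstack([normal, suspicious])

model = IsolationForest(contamination=0.01, random_state=42).fit(normal)
flags = model.predict(events)  # -1 = anomaly, 1 = normal
print(f"{int((flags == -1).sum())} events flagged for alerting")
```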
Airbus was conceiving an ambitious plan to develop an open aviation data platform, Skywise, as a single platform of reference for all major aviation players that would enable them to improve their operational performance and business results and support Airbus’ own digital transformation.
In this post, we delve into a case study for a retail use case, exploring how the Data Build Tool (dbt) was used effectively within an AWS environment to build a high-performing, efficient, and modern data platform. It does this by helping teams handle the T in ETL (extract, transform, and load) processes.
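As one hedged illustration of driving that T step from Python, dbt-core (1.5+) exposes a programmatic runner; the model selector below is a hypothetical stand-in for a real project:

```python
# Requires dbt-core >= 1.5; run from inside a dbt project directory.
from dbt.cli.main import dbtRunner, dbtRunnerResult

dbt = dbtRunner()

# Run the transformation models for a hypothetical retail mart,
# then execute the data quality tests defined alongside them.
for args in (["run", "--select", "retail_mart"],
             ["test", "--select", "retail_mart"]):
    result: dbtRunnerResult = dbt.invoke(args)
    if not result.success:
        raise SystemExit(f"dbt {args[0]} failed")
```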
However, you might face significant challenges when planning for a large-scale data warehouse migration. Data engineers are crucial for schema conversion and data transformation, and DBAs can handle cluster configuration and workload monitoring. Platform architects define a well-architected platform.
The techniques for managing organisational data in a standardised approach that minimises inefficiency. Extract, Transform, Load (ETL): the extraction of raw data, its transformation into a format suited to business needs, and its loading into a data warehouse. Data transformation. Microsoft Azure.
The entire generative AI pipeline hinges on the data pipelines that empower it, making it imperative to take the correct precautions. Among the 4 key components that ensure reliable data ingestion is data quality and governance: data quality means ensuring the security of data sources, maintaining holistic data, and providing clear metadata.
DataBrew is a visual data preparation tool that enables you to clean and normalize data without writing any code. The over 200 transformations it provides are now available for use in an AWS Glue Studio visual job. Now that we have identified the data quality issues to address, we need to decide how to deal with each case.
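For instance, a DataBrew recipe job can also be started and monitored from Python with boto3; the job name below is a hypothetical stand-in:

```python
import boto3

databrew = boto3.client("databrew", region_name="us-east-1")

# Start a pre-built recipe job (hypothetical name) that applies the
# cleaning/normalization transformations, then check its status.
run = databrew.start_job_run(Name="clean-customer-data")
status = databrew.describe_job_run(Name="clean-customer-data",
                                   RunId=run["RunId"])
print(status["State"])  # e.g. RUNNING, SUCCEEDED, FAILED
```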
In the era of bigdata, organizations are grappling with the challenge of effectively managing and leveraging vast amounts of data. Two prominent approaches have emerged: self-service data management and centralized data management. This approach offers greater control and reduces the risk of data inconsistencies.
It’s common to ingest multiple data sources into Amazon Redshift to perform analytics. Often, each data source has its own processes for creating and maintaining data, which can lead to data quality challenges within and across sources. Answering questions as simple as “How many unique customers do we have?” can become surprisingly difficult.
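As a sketch of what answering that question can involve, here is a hypothetical cross-source distinct count issued from Python with the redshift_connector driver; all connection details and table names are invented:

```python
import redshift_connector  # Amazon's Python driver for Redshift

# Invented connection details, purely for illustration.
conn = redshift_connector.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    database="analytics",
    user="analyst",
    password="...",
)
cur = conn.cursor()
# Count distinct customers across two overlapping sources, matching on a
# normalized email; real matching logic is usually far more involved.
cur.execute("""
    SELECT COUNT(DISTINCT LOWER(TRIM(email)))
    FROM (SELECT email FROM crm.customers
          UNION ALL
          SELECT email FROM web.signups) AS all_sources;
""")
print("unique customers:", cur.fetchone()[0])
conn.close()
```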
In our last blog, we delved into the seven most prevalent data challenges that can be addressed with effective data governance. Today we will share our approach to developing a data governance program to drive data transformation and fuel a data-driven culture.
AWS Glue provides both visual and code-based interfaces to make data integration effortless. Using a native AWS Glue connector increases agility, simplifies data movement, and improves data quality. This enables organizations to streamline data integration and analytics with OpenSearch Service.
Additionally, the scale is significant because the multi-tenant data sources provide a continuous stream of testing activity, and our users require quick data refreshes as well as historical context for up to a decade due to compliance and regulatory demands. Finally, data integrity is of paramount importance.
Background: the success of a data-driven organization rests on recognizing data as a key enabler to increase and sustain innovation. The goal of a data product is to solve the long-standing issue of data silos and data quality. It follows what is called a distributed system architecture.
Businesses face significant hurdles when preparing data for artificial intelligence (AI) applications. Data silos and duplication, alongside concerns about data quality, present a complex environment for organizations to manage.
“Each of these tools was getting data from a different place, and that’s where it gets difficult,” says Jeroen Minnaert, head of data at Showpad. “If each tool tells a different story because it has different data, we won’t have alignment within the business on what this data means.”
Data Extraction: the process of gathering data from disparate sources, each of which may have its own schema defining its structure and format, and making that data available for processing. This can include tasks such as data ingestion, cleansing, filtering, aggregation, or standardization.
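A minimal sketch of that idea, with two invented source schemas mapped onto one common structure:

```python
from typing import Iterable

# Two hypothetical sources with different field names for the same facts.
crm_rows = [{"CustomerId": "17", "FullName": "Ada Lovelace"}]
shop_rows = [{"cust_no": 17, "name": "Ada Lovelace"}]

def extract(rows: Iterable[dict], id_key: str, name_key: str) -> list[dict]:
    """Map a source-specific schema onto a common structure for processing."""
    return [{"customer_id": int(r[id_key]), "name": str(r[name_key]).strip()}
            for r in rows]

records = extract(crm_rows, "CustomerId", "FullName") + \
          extract(shop_rows, "cust_no", "name")
print(records)
```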
Prevent the inclusion of invalid values in categorical data and process data without any data loss. Conduct data quality tests on anonymized data in compliance with data policies. Conduct data quality tests to quickly identify and address data quality issues, maintaining high-quality data at all times.
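One hedged way to express the “no data loss” rule in code is to quarantine invalid categorical values under a sentinel rather than dropping rows; this sketch uses invented column names and thresholds:

```python
import pandas as pd

VALID_SEGMENTS = {"retail", "wholesale", "online"}

df = pd.DataFrame({"segment": ["retail", "Retail ", "unknown??", "online"]})

# Normalize, then quarantine invalid categories under a sentinel value
# instead of dropping rows, so no records are lost.
cleaned = df["segment"].str.strip().str.lower()
df["segment"] = cleaned.where(cleaned.isin(VALID_SEGMENTS), other="__invalid__")

invalid_rate = (df["segment"] == "__invalid__").mean()
assert invalid_rate <= 0.25, f"too many invalid segments: {invalid_rate:.0%}"
print(df)
```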
If data mapping has been enabled within the data processing job, the structured data is prepared based on the given schema. This output is passed to the next phase, where data transformations and business validations can be applied. After this step, the data is loaded to the specified target.
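A minimal sketch of that map-transform-validate-load flow, with an invented schema and a stand-in target:

```python
# Hypothetical phases mirroring the flow described above:
# map to schema -> validate -> load to target.
SCHEMA = {"order_id": int, "amount": float}

def map_to_schema(raw: dict) -> dict:
    """Prepare structured data according to the given schema."""
    return {field: cast(raw[field]) for field, cast in SCHEMA.items()}

def validate(row: dict) -> dict:
    # Business validation: amounts must be non-negative.
    if row["amount"] < 0:
        raise ValueError(f"negative amount in order {row['order_id']}")
    return row

def load(rows: list[dict]) -> None:
    # Stand-in for writing to the specified target (warehouse, lake, API).
    print(f"loaded {len(rows)} rows")

raw_feed = [{"order_id": "1001", "amount": "25.50"}]
load([validate(map_to_schema(r)) for r in raw_feed])
```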