1) What Is Data Quality Management? 4) Data Quality Best Practices. 5) How Do You Measure Data Quality? 6) Data Quality Metrics Examples. 7) Data Quality Control: Use Case. 8) The Consequences Of Bad Data Quality. 9) 3 Sources Of Low-Quality Data.
Amazon SageMaker Lakehouse, now generally available, unifies all your data across Amazon Simple Storage Service (Amazon S3) data lakes and Amazon Redshift data warehouses, helping you build powerful analytics and AI/ML applications on a single copy of data. Having confidence in your data is key.
Today, customers are embarking on data modernization programs by migrating on-premises data warehouses and data lakes to the AWS Cloud to take advantage of the scale and advanced analytical capabilities of the cloud. Some customers build custom in-house data parity frameworks to validate data during migration.
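As a rough illustration of what such a parity framework checks, here is a minimal sketch in Python that compares row counts and an order-insensitive content fingerprint between a source table and its migrated copy (the rows, columns, and threshold are hypothetical):

```python
# Minimal sketch of a data parity check between a source table and its
# migrated copy: compare row counts, then an order-insensitive fingerprint.
import hashlib

def table_fingerprint(rows, columns):
    """Hash each row, then XOR the digests so row order doesn't matter."""
    fp = 0
    for row in rows:
        digest = hashlib.sha256(
            "|".join(str(row[c]) for c in columns).encode("utf-8")
        ).digest()
        fp ^= int.from_bytes(digest[:8], "big")
    return fp

def check_parity(source_rows, target_rows, columns):
    if len(source_rows) != len(target_rows):
        return f"row count mismatch: {len(source_rows)} vs {len(target_rows)}"
    if table_fingerprint(source_rows, columns) != table_fingerprint(target_rows, columns):
        return "checksum mismatch: contents differ"
    return "parity OK"

# Hypothetical usage with two small extracts (target rows arrive reordered):
src = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 20.0}]
tgt = [{"id": 2, "amount": 20.0}, {"id": 1, "amount": 10.0}]
print(check_parity(src, tgt, columns=["id", "amount"]))  # parity OK
```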
We are excited to announce the General Availability of AWS Glue Data Quality. Our journey started by working backward from our customers who create, manage, and operate data lakes and data warehouses for analytics and machine learning. It takes days for data engineers to identify and implement data quality rules.
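For flavor, Glue Data Quality rules are written in DQDL (Data Quality Definition Language). Below is a minimal sketch of defining a ruleset against a catalog table and starting an at-rest evaluation run with boto3; the database, table, and IAM role names are placeholders:

```python
# Sketch: define a DQDL ruleset for a Glue Data Catalog table and start an
# evaluation run ("data quality at rest"). All names/ARNs are placeholders.
import boto3

glue = boto3.client("glue")

ruleset = """Rules = [
    RowCount > 0,
    IsComplete "order_id",
    IsUnique "order_id",
    ColumnValues "status" in ["NEW", "SHIPPED", "RETURNED"]
]"""

glue.create_data_quality_ruleset(
    Name="orders_basic_checks",
    Ruleset=ruleset,
    TargetTable={"DatabaseName": "sales_db", "TableName": "orders"},
)

run = glue.start_data_quality_ruleset_evaluation_run(
    DataSource={"GlueTable": {"DatabaseName": "sales_db", "TableName": "orders"}},
    Role="arn:aws:iam::123456789012:role/GlueDataQualityRole",
    RulesetNames=["orders_basic_checks"],
)
print(run["RunId"])
```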
Read the complete blog below for a more detailed description of the vendors and their capabilities. This is not surprising given that DataOps enables enterprise data teams to generate significant business value from their data. QuerySurge – Continuously detect data issues in your delivery pipelines.
The past decades of enterprise data platform architectures can be summarized in 69 words. First-generation – expensive, proprietary enterprise data warehouse and business intelligence platforms maintained by a specialized team drowning in technical debt. Secure and permissioned – data is protected from unauthorized users.
It’s costly and time-consuming to manage on-premises data warehouses — and modern cloud data architectures can deliver business agility and innovation. However, CIOs declare that agility, innovation, security, adopting new capabilities, and time to value — never cost — are the top drivers for cloud data warehousing.
This can include a multitude of processes, like data profiling, data quality management, or data cleaning, but we will focus on tips and questions to ask when analyzing data to gain the most cost-effective solution for an effective business strategy. 4) How can you ensure data quality?
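To make the profiling-and-cleaning step concrete, here is a small hedged sketch with pandas; the file name and columns are hypothetical:

```python
# Quick profile-then-clean pass over a tabular extract (illustrative columns).
import pandas as pd

df = pd.read_csv("orders.csv")  # hypothetical input file

# Profile: dtypes, null rates, cardinality, and duplicate rows.
profile = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "null_rate": df.isna().mean().round(3),
    "n_unique": df.nunique(),
})
print(profile)
print("duplicate rows:", df.duplicated().sum())

# Clean: drop exact duplicates, normalize a text column, fill a numeric default.
df = df.drop_duplicates()
df["status"] = df["status"].str.strip().str.upper()
df["quantity"] = df["quantity"].fillna(0).astype(int)
```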
The all-encompassing nature of this book makes it a must for a data bookshelf. 18) “The Data Warehouse Toolkit” By Ralph Kimball and Margy Ross. It is a must-read for understanding data warehouse design. The book covers Oracle, Microsoft SQL Server, IBM DB2, MySQL, PostgreSQL, and Microsoft Access.
Poor-quality data can lead to incorrect insights, bad decisions, and lost opportunities. AWS Glue Data Quality measures and monitors the quality of your dataset. It supports both data quality at rest and data quality in AWS Glue extract, transform, and load (ETL) pipelines.
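The at-rest path is sketched above; for the in-pipeline case, Glue ETL jobs expose an EvaluateDataQuality transform. A minimal sketch follows (it is meant to run inside a Glue job environment; the database, table, and evaluation-context names are placeholders):

```python
# Sketch: evaluate a DQDL ruleset inside a Glue ETL job with the
# EvaluateDataQuality transform. Catalog names are placeholders.
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsgluedq.transforms import EvaluateDataQuality

glue_ctx = GlueContext(SparkContext.getOrCreate())
orders = glue_ctx.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="orders"  # placeholder catalog table
)

ruleset = """Rules = [ RowCount > 0, IsComplete "order_id" ]"""

results = EvaluateDataQuality().process_rows(
    frame=orders,
    ruleset=ruleset,
    publishing_options={
        "dataQualityEvaluationContext": "orders_check",
        "enableDataQualityCloudWatchMetrics": True,
        "enableDataQualityResultsPublishing": True,
    },
)
```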
Poor data quality is one of the top barriers faced by organizations aspiring to be more data-driven. Ill-timed business decisions and misinformed business processes, missed revenue opportunities, failed business initiatives and complex data systems can all stem from data quality issues.
Data in Place refers to the organized structuring and storage of data within a specific storage medium, be it a database, bucket store, files, or other storage platforms. In the contemporary data landscape, data teams commonly utilize data warehouses or lakes to arrange their data into L1, L2, and L3 layers.
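Reading L1/L2/L3 as raw, cleaned, and curated layers (one common convention; the excerpt does not define them), a toy pandas sketch of that flow might look like this, with hypothetical paths and columns:

```python
# Toy three-layer flow: L1 = raw as landed, L2 = cleaned/conformed,
# L3 = aggregated and analytics-ready. Paths and columns are illustrative.
import pandas as pd

l1 = pd.read_json("lake/l1/orders_raw.json", lines=True)   # raw layer

l2 = (
    l1.drop_duplicates(subset="order_id")                  # cleaned layer
      .dropna(subset=["order_id", "amount"])
      .assign(order_date=lambda d: pd.to_datetime(d["order_date"]))
)
l2.to_parquet("lake/l2/orders_clean.parquet")

l3 = (
    l2.assign(month=l2["order_date"].dt.to_period("M").astype(str))
      .groupby("month")                                    # curated layer
      .agg(revenue=("amount", "sum"), orders=("order_id", "count"))
)
l3.to_parquet("lake/l3/monthly_revenue.parquet")
```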
Other benefits of automating data governance and metadata management processes include: Better Data Quality – Identification and repair of data issues and inconsistencies within integrated data sources in real time.
In addition to increasing the price of deployment, setting up these data warehouses and processors also consumed expensive IT labor. These tools can easily merge different data sets on the fly without the need to restructure databases or set up a data warehouse. Welcome to the future.
Like the proverbial man looking for his keys under the streetlight , when it comes to enterprise data, if you only look at where the light is already shining, you can end up missing a lot. Modern technologies allow the creation of data orchestration pipelines that help pool and aggregate dark data silos. Data sense-making.
This blog post is co-written with Hardeep Randhawa and Abhay Kumar from HPE. The data sources comprise 150+ files, including 10-15 mandatory files per region, ingested in various formats such as xlsx, csv, and dat. In addition, they use AWS Glue jobs for orchestrating validation jobs and moving data through the data warehouse.
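A hedged sketch of the kind of pre-ingestion validation such a pipeline might run, checking that each region's mandatory files actually landed (the file names and landing layout are invented for illustration):

```python
# Check that the mandatory files for a region are present in the landing area
# before kicking off ingestion. Names and layout are illustrative.
from pathlib import Path

MANDATORY_FILES = {"orders.xlsx", "shipments.csv", "inventory.dat"}

def missing_mandatory(landing_root: str, region: str) -> list[str]:
    present = {p.name for p in Path(landing_root, region).glob("*")}
    return sorted(MANDATORY_FILES - present)

missing = missing_mandatory("landing", "emea")
if missing:
    raise RuntimeError(f"region 'emea' is missing mandatory files: {missing}")
print("all mandatory files present")
```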
Cloudera and Accenture demonstrate strength in their relationship with an accelerator called the Smart Data Transition Toolkit for migration of legacy data warehouses into Cloudera Data Platform. Accenture’s Smart Data Transition Toolkit. Are you looking for your data warehouse to support the hybrid multi-cloud?
This should also include creating a plan for data storage services. Are the data sources going to remain disparate? Or does building a datawarehouse make sense for your organization? Clean data in, clean analytics out. Cleaning your data may not be quite as simple, but it will ensure the success of your BI.
With this new functionality, customers can create up-to-date replicas of their data from applications such as Salesforce, ServiceNow, and Zendesk in an Amazon SageMaker Lakehouse and Amazon Redshift. SageMaker Lakehouse gives you the flexibility to access and query your data in-place with all Apache Iceberg compatible tools and engines.
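As one example of that in-place access, here is a PyIceberg sketch that scans an Iceberg table without copying it; the catalog name, table, filter, and the use of a Glue catalog are all assumptions:

```python
# Sketch: query an Iceberg lakehouse table in place with PyIceberg.
# Catalog/table names are placeholders; assumes an AWS Glue catalog.
from pyiceberg.catalog import load_catalog

catalog = load_catalog("lakehouse", type="glue")
orders = catalog.load_table("sales_db.orders")

df = orders.scan(
    row_filter="order_total >= 100.0",
    selected_fields=("order_id", "order_total", "status"),
).to_pandas()
print(df.head())
```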
There is no disputing the fact that the collection and analysis of massive amounts of unstructured data has been a huge breakthrough. This is something that you can learn more about in just about any technology blog. We would like to talk about data visualization and its role in the big data movement.
A strong data management strategy and supporting technology enable the data quality the business requires, including data cataloging (integration of data sets from various sources), mapping, versioning, maintenance of business rules and glossaries, and metadata management (associations and lineage).
In this blog, we will discuss a common problem for data warehouses that are designed to maintain data quality and provide evidence of accuracy. Without verification, the data can’t be trusted. Enter the mundane, but necessary, task of data reconciliation. Fortunately, it doesn’t have to be.
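One common shape for that reconciliation is comparing control totals between the source system and the warehouse. A hedged sketch with pandas and SQLite stand-ins (the connections, table names, and tolerance are hypothetical):

```python
# Reconcile control totals (sums and counts by business date) between a
# source extract and the warehouse copy. All names are stand-ins.
import sqlite3
import pandas as pd

source_conn = sqlite3.connect("source.db")    # hypothetical source extract
wh_conn = sqlite3.connect("warehouse.db")     # hypothetical warehouse copy

q_src = ("SELECT order_date, SUM(amount) AS total, COUNT(*) AS n "
         "FROM orders GROUP BY order_date")
q_wh = ("SELECT order_date, SUM(amount) AS total, COUNT(*) AS n "
        "FROM fact_orders GROUP BY order_date")

recon = pd.read_sql(q_src, source_conn).merge(
    pd.read_sql(q_wh, wh_conn), on="order_date", suffixes=("_src", "_wh")
)
breaks = recon[
    ((recon["total_src"] - recon["total_wh"]).abs() > 0.01)
    | (recon["n_src"] != recon["n_wh"])
]
print(breaks if not breaks.empty else "reconciled: no breaks")
```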
Part Two of the Digital Transformation Journey … In our last blog on driving digital transformation, we explored how enterprise architecture (EA) and business process (BP) modeling are pivotal factors in a viable digital transformation strategy. With automation, data quality is systemically assured.
A data catalog benefits organizations in a myriad of ways. With the right data catalog tool, organizations can automate enterprise metadata management – including data cataloging, data mapping, data quality and code generation for faster time to value and greater accuracy for data movement and/or deployment projects.
Gluent’s Smart Connector is capable of pushing processing to Cloudera, thereby reducing the storage and compute footprint on traditional data warehouses like Oracle. This allows our customers to reduce spend on highly specialized hardware and leverage the tools of a modern data warehouse. Certified Data Quality Partner.
Although it’s been around for decades, predictive analytics is becoming more and more mainstream, with growing volumes of data and readily accessible software ripe for transforming. In this blog post, we are going to cover the role of business intelligence in demand forecasting, an area of predictive analytics focused on customer demand.
For state and local agencies, data silos create compounding problems: Inaccessible or hard-to-access data creates barriers to data-driven decision making. Legacy data sharing involves proliferating copies of data, creating data management and security challenges. (Forrester; Gartner)
A key challenge of legacy approaches involved data quality. How could you ensure data was valid and accurate, and then follow through on new insights with action? “It got people realizing that data is a business tool, and that technologists are the custodians of that data,” points out New Zealand CIO Anthony McMahon.
DataKitchen acts as a process hub that unifies tools and pipelines across teams and data centers. DataKitchen could, for example, provide the scaffolding upon which a Snowflake cloud data platform or data warehouse could be integrated into a heterogeneous data mesh domain.
With in-place table migration, you can rapidly convert to Iceberg tables since there is no need to regenerate data files. Newly generated metadata will then point to source data files as illustrated in the diagram below. Data quality using table rollback. Read why the future of data lakehouses is open.
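On the rollback point: Iceberg's snapshot model lets you restore a table to a last-known-good state after a bad load. A hedged Spark SQL sketch (it assumes a Spark session already configured with an Iceberg catalog named my_catalog; the table name and snapshot ID are placeholders):

```python
# Sketch: roll an Iceberg table back to a prior snapshot after a bad load.
# Assumes Spark is configured with an Iceberg catalog named "my_catalog".
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-rollback").getOrCreate()

# Inspect the snapshot history to find the last-known-good snapshot.
spark.sql(
    "SELECT snapshot_id, committed_at FROM my_catalog.db.orders.snapshots"
).show()

# Restore the table; subsequent reads see the pre-load state.
spark.sql(
    "CALL my_catalog.system.rollback_to_snapshot('db.orders', 1234567890123456789)"
)
```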
Griffin is an open source data quality solution for big data, which supports both batch and streaming mode. In today’s data-driven landscape, where organizations deal with petabytes of data, the need for automated data validation frameworks has become increasingly critical.
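Griffin itself is configured through JSON "measure" definitions, but the heart of its batch accuracy measure is simple: the fraction of source records that find a match in the target. A hedged PySpark equivalent (the paths and key columns are invented):

```python
# Hedged PySpark equivalent of a batch "accuracy" measure: the share of
# source records with an exact match in the target. Paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("accuracy-measure").getOrCreate()

source = spark.read.parquet("s3://bucket/source/orders/")
target = spark.read.parquet("s3://bucket/target/orders/")

keys = ["order_id", "amount", "status"]
total = source.count()
matched = source.join(target, on=keys, how="left_semi").count()

print(f"accuracy = {matched}/{total} = {matched / total:.2%}")
```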
Therefore, the organization needed to catalog the data it acquires from suppliers, ensure its quality, classify it, and then sell it to customers. The company wanted to assemble the data in a data warehouse and then provide controlled access to it. This, among other safeguards, ensures data quality.
However, companies are still struggling to manage data effectively enough to implement GenAI applications that deliver proven business value. The post O’Reilly Releases First Chapters of a New Book about Logical Data Management appeared first on Data Management Blog - Data Integration and Modern Data Management Articles, Analysis and Information.
According to Gartner, Inc. analyst Sumit Pal, in “Exploring Lakehouse Architecture and Use Cases,” published January 11, 2022: “Data lakehouses integrate and unify the capabilities of data warehouses and data lakes, aiming to support AI, BI, ML, and data engineering on a single platform.”
Here are some benefits of metadata management for data governance use cases: Better Data Quality: Data issues and inconsistencies within integrated data sources or targets are identified in real time to improve overall data quality by reducing time to insight and/or repair by up to 70 percent.
If you read my blog regularly then you know I rarely write about IT vendors. The only time I have blogged about vendors was to comment on their messages or call out an interesting and contrary observation. This acquisition followed another with Mulesoft, a data integration vendor. That’s the way it is.
The consumption of the data should be supported through an elastic delivery layer that aligns with demand, but also provides the flexibility to present the data in a physical format that aligns with the analytic application, ranging from the more traditional data warehouse view to a graph view in support of relationship analysis.
Since it’s uniquely metadata-driven, the abstraction layer of a data fabric makes it easier to model, integrate and query any data source, build data pipelines, and integrate data in real time. This improves data engineering productivity and time-to-value for data consumers. What’s a data mesh?
Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. Amazon Redshift enables you to run complex SQL analytics at scale and performance on terabytes to petabytes of structured and unstructured data, and make the insights widely available through popular business intelligence (BI) and analytics tools.
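A small sketch of running such a query programmatically with the Redshift Data API via boto3; the workgroup, database, and SQL are placeholders (a provisioned cluster would pass ClusterIdentifier instead of WorkgroupName):

```python
# Sketch: run SQL on Redshift through the Data API and poll for the result.
# Workgroup/database/SQL are placeholders (serverless form shown).
import time
import boto3

client = boto3.client("redshift-data")
run = client.execute_statement(
    WorkgroupName="analytics-wg",
    Database="dev",
    Sql="SELECT region, SUM(amount) AS revenue FROM fact_orders GROUP BY region",
)

while True:
    status = client.describe_statement(Id=run["Id"])["Status"]
    if status in ("FINISHED", "FAILED", "ABORTED"):
        break
    time.sleep(1)

if status == "FINISHED":
    rows = client.get_statement_result(Id=run["Id"])["Records"]
    print(rows)
```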
When you have the ability to understand all of the information related to a piece of data, you have more confidence in how it is analyzed, used and protected. Data governance doesn’t take place at a single application or in the data warehouse.
A data lakehouse is an emerging data management architecture that converges data warehouse and data lake capabilities, driven by a need to improve efficiency and obtain critical insights faster. Let’s start with why data lakehouses are becoming increasingly important.
Previously we would have a very laborious data warehouse or data mart initiative that might take a very long time and carry a large price tag. Automate the data collection and cleansing process. Jim Tyo added that in the financial services world, agility is critical. Take a show-me approach.
This is the last post in the 4-part blog series. In the previous blog, we discussed how Alation provides a platform for data scientists and analysts to complete projects and analysis at speed. In this blog we will discuss how Alation helps minimize risk with active data governance. Find Trusted Data.
Refer to the following Cloudera blog to understand the full potential of Cloudera Data Engineering. Precisely Data Integration, Change Data Capture and Data Quality tools support CDP Public Cloud as well as CDP Private Cloud. Why should technology partners care about CDE? References: [link].