data engineers delivered over 100 lines of code and 1.5 data quality tests every day to support a cast of analysts and customers. The company's guiding principle was to deliver small increments of customer value: data sets, reports, and other items.
Everyone talks about data quality, as they should. Our research shows that improving the quality of information is the top benefit of data preparation activities. Data quality efforts are focused on clean data. Yes, clean data is important, but so is bad data.
Talend is a data integration and management software company that offers applications for cloud computing, big data integration, application integration, data quality and master data management.
Under that focus, Informatica's conference emphasized capabilities across six areas (all strong areas for Informatica): data integration, data management, data quality & governance, Master Data Management (MDM), data cataloging, and data security.
This integration enables data teams to efficiently transform and manage data using Athena with dbt Cloud's robust features, enhancing the overall data workflow experience and letting you extract insights from your data without the complexity of managing infrastructure.
A DataOps approach to data quality must reckon with its growing complexity: data quality issues are widespread, affecting organizations across industries, from manufacturing to healthcare and financial services. 73% of data practitioners do not trust their data (IDC).
In modern data architectures, Apache Iceberg has emerged as a popular table format for data lakes, offering key features including ACID transactions and concurrent write support. Catalog commit conflicts, which arise when concurrent writers race to update the same table, are relatively straightforward to handle through table properties.
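As a minimal sketch of what "handle through table properties" means in practice: Iceberg retries failed catalog commits automatically, and standard table properties tune that behavior. This assumes a SparkSession already configured with an Iceberg catalog named glue_catalog; the sales.orders table is hypothetical.

```python
from pyspark.sql import SparkSession

# Assumes the Iceberg runtime and a catalog named "glue_catalog" are already
# configured on this session; the table name is hypothetical.
spark = SparkSession.builder.getOrCreate()

# Standard Iceberg table properties controlling commit-retry behavior:
# more retries and a wider backoff window make concurrent writers far less
# likely to fail on a commit conflict.
spark.sql("""
    ALTER TABLE glue_catalog.sales.orders SET TBLPROPERTIES (
        'commit.retry.num-retries' = '10',
        'commit.retry.min-wait-ms' = '100',
        'commit.retry.max-wait-ms' = '60000'
    )
""")
```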
As technology and business leaders, your strategic initiatives, from AI-powered decision-making to predictive insights and personalized experiences, are all fueled by data. Yet, despite growing investments in advanced analytics and AI, organizations continue to grapple with a persistent and often underestimated challenge: poor data quality.
Ask questions in plain English to find the right datasets, automatically generate SQL queries, or create data pipelines without writing code. This innovation drives an important change: you’ll no longer have to copy or move data between data lakes and data warehouses. Having confidence in your data is key.
Data lakes are centralized repositories that can store all structured and unstructured data at any desired scale. The power of the data lake lies in the fact that it is often a cost-effective way to store data.
Unlocking the true value of data often gets impeded by siloed information. Traditional data management, wherein each business unit ingests raw data into separate data lakes or warehouses, hinders visibility and cross-functional analysis. The goal is for business units to access clean, standardized data.
Data governance is the process of ensuring the integrity, availability, usability, and security of an organization’s data. Due to the volume, velocity, and variety of data being ingested into data lakes, it can get challenging to develop and maintain policies and procedures to ensure data governance at scale for your data lake.
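On AWS, one common way to enforce such policies centrally is AWS Lake Formation, which grants fine-grained permissions on Data Catalog tables. A minimal sketch with boto3; the role ARN, database, and table names are hypothetical.

```python
import boto3

lf = boto3.client("lakeformation")

# Grant an analyst role read-only access to a single catalog table, rather
# than hand-managing S3 bucket policies; all identifiers are hypothetical.
lf.grant_permissions(
    Principal={"DataLakePrincipalArn": "arn:aws:iam::123456789012:role/AnalystRole"},
    Resource={"Table": {"DatabaseName": "sales", "Name": "orders"}},
    Permissions=["SELECT"],
)
```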
Today, customers are embarking on data modernization programs by migrating on-premises data warehouses and data lakes to the AWS Cloud to take advantage of the scale and advanced analytical capabilities of the cloud. Some customers build custom in-house data parity frameworks to validate data during migration.
We are excited to announce the General Availability of AWS Glue Data Quality. Our journey started by working backward from our customers who create, manage, and operate data lakes and data warehouses for analytics and machine learning.
Just after launching a focused data management platform for retail customers in March, enterprise data management vendor Informatica has now released two more industry-specific versions of its Intelligent Data Management Cloud (IDMC): one for financial services and the other for health and life sciences.
With this new functionality, customers can create up-to-date replicas of their data from applications such as Salesforce, ServiceNow, and Zendesk in an Amazon SageMaker Lakehouse and Amazon Redshift. SageMaker Lakehouse gives you the flexibility to access and query your data in place with all Apache Iceberg compatible tools and engines.
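A minimal sketch of what "Iceberg-compatible" access looks like from Spark, assuming the Iceberg runtime is on the classpath; the catalog, database, and table names below are hypothetical.

```python
from pyspark.sql import SparkSession

# Register an Iceberg catalog backed by the AWS Glue Data Catalog and read a
# replicated table in place; warehouse/auth settings are assumed to be
# configured elsewhere, and all names are hypothetical.
spark = (
    SparkSession.builder
    .config("spark.sql.catalog.glue_catalog",
            "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .getOrCreate()
)

spark.table("glue_catalog.crm.salesforce_accounts").show(5)
```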
But more than anything, the data platform is putting decision-making tools in the hands of our business so people can better manage their operations. How would you categorize the change management that needed to happen to build a new enterprise data platform? We thought about change in two ways.
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.
When internal resources fall short, companies outsource data engineering and analytics. There’s no shortage of consultants who will promise to manage the end-to-end lifecycle of data from integration to transformation to visualization. The challenge is that data engineering and analytics are incredibly complex.
The data mesh design pattern breaks giant, monolithic enterprise data architectures into subsystems or domains, each managed by a dedicated team. First generation: expensive, proprietary enterprise data warehouse and business intelligence platforms maintained by a specialized team drowning in technical debt.
They establish data quality rules to ensure the extracted data is of high quality for accurate business decisions. These rules assess the data based on fixed criteria reflecting current business states. We are excited to talk about how to use dynamic rules, a new capability of AWS Glue Data Quality.
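A minimal sketch of registering such a ruleset with boto3: dynamic DQDL expressions like last(k) compare the current run against aggregates of previous runs, so thresholds adapt instead of staying fixed. The ruleset contents and table names are hypothetical.

```python
import boto3

glue = boto3.client("glue")

# Hypothetical DQDL ruleset: row count must stay within 80% of the average
# of the last 10 runs, and completeness must not dip below its recent average.
ruleset = """Rules = [
    RowCount > avg(last(10)) * 0.8,
    Completeness "order_id" >= avg(last(5))
]"""

glue.create_data_quality_ruleset(
    Name="orders-dynamic-rules",
    Ruleset=ruleset,
    TargetTable={"DatabaseName": "sales", "TableName": "orders"},
)
```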
In recent years, data lakes have become a mainstream architecture, and data quality validation is a critical factor to improve the reusability and consistency of the data. In this post, we provide benchmark results of running increasingly complex data quality rulesets over a predefined test dataset.
In today’s data-driven world, organizations face unprecedented challenges in managing and extracting valuable insights from their ever-expanding data ecosystems. As the number of data assets and users grows, the traditional approaches to data management and governance are no longer sufficient.
AWS Glue Data Quality allows you to measure and monitor the quality of data in your data repositories. It’s important for business users to be able to see quality scores and metrics to make confident business decisions and debug data quality issues. An AWS Glue crawler crawls the results.
Their terminal operations rely heavily on seamless data flows and the management of vast volumes of data. With the addition of these technologies alongside existing systems like terminal operating systems (TOS) and SAP, the number of data producers has grown substantially.
Recognizing this paradigm shift, ANZ Institutional Division has embarked on a transformative journey to redefine how it manages and utilizes data and extracts significant business value from data insights.
To address the flood of data and the needs of enterprise businesses to store, sort, and analyze that data, a new storage solution has evolved: the data lake. What’s in a data lake? Data warehouses do a great job of standardizing data from disparate sources for analysis.
According to Kari Briski, VP of AI models, software, and services at Nvidia, successfully implementing gen AI hinges on effective data management and evaluating how different models work together to serve a specific use case. Data management, when done poorly, results in both diminished returns and extra costs.
They’re comparatively expensive and can’t handle big data analytics. However, they do contain effective data management, organization, and integrity capabilities. As a result, users can easily find what they need, and organizations avoid the operational and cost burdens of storing unneeded or duplicate data copies.
Poor-quality data can lead to incorrect insights, bad decisions, and lost opportunities. AWS Glue Data Quality measures and monitors the quality of your dataset. It supports both data quality at rest and data quality in AWS Glue extract, transform, and load (ETL) pipelines.
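A minimal sketch of the in-pipeline case, using the EvaluateDataQuality transform available in Glue ETL jobs (Glue 3.0 or later); the database, table, and context names are hypothetical.

```python
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsgluedq.transforms import EvaluateDataQuality

glue_ctx = GlueContext(SparkContext.getOrCreate())

# Read a catalog table and evaluate a small DQDL ruleset mid-pipeline;
# database and table names are hypothetical.
frame = glue_ctx.create_dynamic_frame.from_catalog(
    database="sales", table_name="orders"
)

ruleset = """Rules = [
    IsComplete "order_id",
    ColumnValues "amount" >= 0
]"""

results = EvaluateDataQuality.apply(
    frame=frame,
    ruleset=ruleset,
    publishing_options={"dataQualityEvaluationContext": "orders_checks"},
)

# Each row of the result describes one rule's pass/fail outcome.
results.toDF().show()
```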
On the agribusiness side, we source, purchase, and process agricultural commodities and offer a diverse portfolio of products including grains, soybean meal, blended feed ingredients, and top-quality oils for the food industry to add value to the commodities our customers desire. The data can also help us enrich our commodity products.
You can use AWS Glue to create, run, and monitor data integration and ETL (extract, transform, and load) pipelines and catalog your assets across multiple data stores. Hundreds of thousands of customers use data lakes for analytics and ML to make data-driven business decisions.
Open table formats are emerging in the rapidly evolving domain of big data management, fundamentally altering the landscape of data storage and analysis. By providing a standardized framework for data representation, open table formats break down data silos, enhance data quality, and accelerate analytics at scale.
In the era of big data, data lakes have emerged as a cornerstone for storing vast amounts of raw data in its native format. They support structured, semi-structured, and unstructured data, offering a flexible and scalable environment for data ingestion from multiple sources.
“All of a sudden, you’re trying to give this data to somebody who’s not a data person,” he says, “and it’s really easy for them to draw erroneous or misleading insights from that data.” As more companies use the cloud and cloud-native development, normalizing data has become more complicated.
Poor data quality is one of the top barriers faced by organizations aspiring to be more data-driven. Ill-timed business decisions and misinformed business processes, missed revenue opportunities, failed business initiatives, and complex data systems can all stem from data quality issues.
In this post, we explore how Bluestone uses AWS services, notably the cloud data warehousing service Amazon Redshift, to implement a cutting-edge data mesh architecture, revolutionizing the way they manage, access, and utilize their data assets. This enables data-driven decision-making across the organization.
As organizations process vast amounts of data, maintaining an accurate historical record is crucial. History management in data systems is fundamental for compliance, business intelligence, data quality, and time-based analysis. Financial systems use it for maintaining accurate transaction and balance histories.
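Table formats covered elsewhere in this roundup support this directly; for example, a minimal sketch of a time-travel query against an Apache Iceberg table in Spark SQL. The catalog, table, and column names are hypothetical, and the session is assumed to have an Iceberg catalog configured.

```python
from pyspark.sql import SparkSession

# Iceberg keeps table snapshots, so a query can be pinned to the table's
# state at a past point in time; all names here are hypothetical.
spark = SparkSession.builder.getOrCreate()

spark.sql("""
    SELECT account_id, balance
    FROM glue_catalog.finance.balances
    TIMESTAMP AS OF '2024-01-01 00:00:00'
""").show()
```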
Alternatively, you might treat them as code and use source code control to manage their evolution over time. Amazon Bedrock is a fully managed service that makes high-performing FMs from leading AI startups and Amazon available through a unified API. The user interaction is stored in a data lake for downstream usage and BI analysis.
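A minimal sketch of calling a Bedrock-hosted model through that unified API with boto3's Converse operation; the model ID is only an example, and region and IAM permissions are assumed to be in place.

```python
import boto3

# The same Converse call shape works across Bedrock model providers;
# the model ID and region below are example values.
brt = boto3.client("bedrock-runtime", region_name="us-east-1")

response = brt.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[
        {"role": "user",
         "content": [{"text": "Summarize the key data quality findings."}]}
    ],
)

print(response["output"]["message"]["content"][0]["text"])
```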
When it comes to implementing and managing a successful BI strategy, we have always proclaimed: start small, use the right BI tools, and involve your team. To fully utilize agile business analytics, we will go through a basic agile framework with regard to BI implementation and management. Let’s start with the concept.
However, enterprise data generated from siloed sources combined with the lack of a data integration strategy creates challenges for provisioning the data for generative AI applications. As part of the transformation, the objects need to be treated to ensure data privacy (for example, PII redaction).
It’s stored in corporate data warehouses, data lakes, and a myriad of other locations; while some of it is put to good use, it’s estimated that around 73% of this data remains unexplored. Improving data quality: unexamined and unused data is often of poor quality.
However, companies are still struggling to manage data effectively and to implement GenAI applications that deliver proven business value. Gartner predicts that by the end of this year, 30%.
Data fabric and data mesh are emerging data management concepts that are meant to address the organizational change and complexities of understanding, governing and working with enterprise data in a hybrid multicloud ecosystem. The good news is that both data architecture concepts are complementary.