High-quality data is essential for building trust in analytics, enhancing the performance of machine learning (ML) models, and supporting strategic business initiatives. By using AWS Glue Data Quality, you can measure and monitor the quality of your data.
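As a sketch of what this can look like in practice, the snippet below defines and runs a ruleset with the AWS Glue Data Quality API via boto3. The database, table, and IAM role names are hypothetical placeholders, and the ruleset is a minimal example written in DQDL (Data Quality Definition Language).

```python
import boto3

glue = boto3.client("glue")

# A minimal DQDL ruleset; column and threshold choices are illustrative.
ruleset = """Rules = [
    RowCount > 0,
    IsComplete "customer_id",
    Uniqueness "customer_id" > 0.99
]"""

# "sales_db" and "orders" are hypothetical names.
glue.create_data_quality_ruleset(
    Name="orders_quality_checks",
    Ruleset=ruleset,
    TargetTable={"DatabaseName": "sales_db", "TableName": "orders"},
)

# Kick off an evaluation run against the table; the role ARN is a placeholder.
run = glue.start_data_quality_ruleset_evaluation_run(
    DataSource={"GlueTable": {"DatabaseName": "sales_db", "TableName": "orders"}},
    Role="arn:aws:iam::123456789012:role/GlueDataQualityRole",
    RulesetNames=["orders_quality_checks"],
)
print(run["RunId"])
```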
Data engineers delivered over 100 lines of code and 1.5 data quality tests every day to support a cast of analysts and customers. The company focused on delivering small increments of customer value (data sets, reports, and other items) as its guiding principle.
This integration enables data teams to efficiently transform and manage data using Athena with dbt Cloud’s robust features, enhancing the overall data workflow experience. It lets you extract insights from your data without the complexity of managing infrastructure.
Ask questions in plain English to find the right datasets, automatically generate SQL queries, or create data pipelines without writing code. This innovation drives an important change: you’ll no longer have to copy or move data between data lakes and data warehouses. Having confidence in your data is key.
In modern data architectures, Apache Iceberg has emerged as a popular table format for data lakes, offering key features including ACID transactions and concurrent write support. Catalog commit conflicts are relatively straightforward to handle through table properties.
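For illustration, a minimal sketch of tuning those commit-retry table properties from Spark SQL; the catalog name "glue_catalog" and table "db.events" are hypothetical, and a Spark session with the Iceberg extensions configured is assumed.

```python
from pyspark.sql import SparkSession

# Assumes an existing Spark session configured with an Iceberg catalog
# named "glue_catalog" (hypothetical).
spark = SparkSession.builder.appName("iceberg-commit-tuning").getOrCreate()

# Iceberg retries failed metadata commits automatically; these table
# properties control how many retries it makes and how long it backs off.
spark.sql("""
    ALTER TABLE glue_catalog.db.events SET TBLPROPERTIES (
        'commit.retry.num-retries' = '10',
        'commit.retry.min-wait-ms' = '100',
        'commit.retry.max-wait-ms' = '60000'
    )
""")
```

Raising `commit.retry.num-retries` above its default helps tables with many concurrent writers ride out transient catalog conflicts instead of failing the job.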
But let’s be honest: creating a reliable, scalable, and maintainable data pipeline is not an easy task. Whether it’s integrating multiple data sources, managing data transfers, or simply ensuring timely reporting, each component presents its own challenges. Processed data may also be sent directly to dashboards, APIs, or ML models.
According to the MIT Technology Review's 2024 Data Integration Survey, organizations with highly fragmented data environments spend up to 67% of their data scientists' time on data collection and preparation rather than on developing and refining AI models.
To address this gap and ensure the data supply chain receives enough top-level attention, CIOs have hired or partnered with chief data officers, entrusting them to address the data debt, automate data pipelines, and transform to a proactive data governance model focusing on health metrics, data quality, and data model interoperability.
As technology and business leaders, your strategic initiatives, from AI-powered decision-making to predictive insights and personalized experiences, are all fueled by data. Yet, despite growing investments in advanced analytics and AI, organizations continue to grapple with a persistent and often underestimated challenge: poor data quality.
Their terminal operations rely heavily on seamless data flows and the management of vast volumes of data. With the addition of these technologies alongside existing systems like terminal operating systems (TOS) and SAP, the number of data producers has grown substantially.
With this new functionality, customers can create up-to-date replicas of their data from applications such as Salesforce, ServiceNow, and Zendesk in an Amazon SageMaker Lakehouse and Amazon Redshift. SageMaker Lakehouse gives you the flexibility to access and query your data in place with all Apache Iceberg-compatible tools and engines.
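As a sketch of what querying in place with an Iceberg-compatible client can look like, the following uses the open source pyiceberg library against an Iceberg REST catalog. The endpoint, warehouse, and table names are hypothetical placeholders, not actual SageMaker Lakehouse values.

```python
# Requires: pip install "pyiceberg[pyarrow]"
from pyiceberg.catalog import load_catalog

# All connection properties below are placeholders for illustration.
catalog = load_catalog(
    "lakehouse",
    **{
        "type": "rest",
        "uri": "https://iceberg-rest.example.com/api",
        "warehouse": "my_warehouse",
    },
)

# Hypothetical namespace and table replicated from a SaaS application.
table = catalog.load_table("crm.salesforce_accounts")

# Scan the Iceberg table in place and pull a filtered sample into pandas.
df = table.scan(row_filter="region = 'EMEA'", limit=100).to_pandas()
print(df.head())
```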
But more than anything, the data platform is putting decision-making tools in the hands of our business so people can better manage their operations. How would you categorize the change management that needed to happen to build a new enterprise data platform? We thought about change in two ways.
The dual challenge of production and development testing: test coverage in data and analytics operates across two distinct but interconnected dimensions, production testing and development testing. Production test coverage ensures that data quality remains high and error rates remain low throughout the value pipeline during live operations.
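To make the development-testing side concrete, here is a minimal pytest-style sketch of data quality tests; the table, columns, and thresholds are hypothetical examples, not a prescribed framework.

```python
# A minimal pytest-style sketch; "orders.parquet", column names, and
# thresholds are hypothetical.
import pandas as pd

def load_orders() -> pd.DataFrame:
    # Stand-in for reading the real table from the warehouse.
    return pd.read_parquet("orders.parquet")

def test_orders_not_empty():
    assert len(load_orders()) > 0

def test_order_id_is_complete_and_unique():
    df = load_orders()
    assert df["order_id"].notna().all(), "order_id must be complete"
    assert df["order_id"].is_unique, "order_id must be unique"

def test_amount_within_expected_range():
    df = load_orders()
    # Error-rate style check: allow at most 1% of rows out of range.
    out_of_range = ((df["amount"] < 0) | (df["amount"] > 1_000_000)).mean()
    assert out_of_range <= 0.01
```

The same assertions can run as scheduled checks in production, which is what ties the two testing dimensions together.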
Data lakes were originally designed to store large volumes of raw, unstructured, or semi-structured data at a low cost, primarily serving big data and analytics use cases. By using features like Iceberg’s compaction, OTFs streamline maintenance, making it straightforward to manage object and metadata versioning at scale.
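A minimal sketch of that maintenance loop using Iceberg's built-in Spark procedures; the catalog name "glue_catalog", the table "db.events", and the cutoff timestamp are hypothetical.

```python
from pyspark.sql import SparkSession

# Assumes a Spark session configured with an Iceberg catalog
# named "glue_catalog" (hypothetical).
spark = SparkSession.builder.appName("iceberg-maintenance").getOrCreate()

# Compact many small files into larger ones (~512 MB targets) with
# Iceberg's rewrite_data_files procedure.
spark.sql("""
    CALL glue_catalog.system.rewrite_data_files(
        table => 'db.events',
        options => map('target-file-size-bytes', '536870912')
    )
""")

# Expire old snapshots so metadata and storage don't grow unbounded.
spark.sql("""
    CALL glue_catalog.system.expire_snapshots(
        table => 'db.events',
        older_than => TIMESTAMP '2025-01-01 00:00:00'
    )
""")
```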
This readability becomes valuable when collaborating with domain experts who need to understand and validate your data transformations. Real-world data projects often involve integrating multiple data sources, handling different formats, and dealing with inconsistent data quality.
Open table formats are emerging in the rapidly evolving domain of big data management, fundamentally altering the landscape of data storage and analysis. By providing a standardized framework for data representation, open table formats break down data silos, enhance data quality, and accelerate analytics at scale.
As organizations process vast amounts of data, maintaining an accurate historical record is crucial. History management in data systems is fundamental for compliance, business intelligence, data quality, and time-based analysis. Financial systems use it for maintaining accurate transaction and balance histories.
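Where the underlying table format is Apache Iceberg (an assumption here), time-based analysis can lean on built-in time travel. A small sketch in Spark SQL; the catalog, namespace, table, and timestamp are hypothetical.

```python
from pyspark.sql import SparkSession

# Assumes a Spark session with a hypothetical Iceberg catalog "glue_catalog".
spark = SparkSession.builder.appName("iceberg-history").getOrCreate()

# Query the table as it existed at a point in time (time travel).
spark.sql("""
    SELECT account_id, balance
    FROM glue_catalog.finance.balances
    FOR SYSTEM_TIME AS OF '2025-06-01 00:00:00'
""").show()

# Inspect the commit history via the snapshots metadata table.
spark.sql("""
    SELECT committed_at, snapshot_id, operation
    FROM glue_catalog.finance.balances.snapshots
    ORDER BY committed_at DESC
""").show()
```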
Apache Iceberg, a high-performance open table format (OTF), has gained widespread adoption among organizations managing large-scale analytic tables and data volumes. Parquet is one of the most common and fastest-growing data types in Amazon S3. Prerequisites include an EC2 instance (c5.xlarge); for more information, see Get started with Amazon EC2.
To streamline an operation with so many moving parts, the company has deployed Hertz Connected Fleet OS, an AI-enabled operating system for fleet management. He also didn’t ask for extra funding to put data architecture and data governance in place. “The trick for us was don’t try to perfect all of it,” he said.
These challenges are encountered by financial institutions worldwide, leading to a reassessment of traditional data management practices. EventBridge supports custom event buses for domain-specific events, enabling clear separation of concerns and improved manageability.
With data lineage captured at the table, column, and job level, data producers can conduct impact analysis of changes in their data pipelines and respond to data issues when needed, for example, when a column in the resulting dataset does not meet the quality required by the business.
Under the company motto of “making the invisible visible”, they’ve expanded their business centered on marine sensing technology and are now extending into subscription-based data businesses using Internet of Things (IoT) data. Integrated risk management was also difficult.
This plane drives users to engage in data-driven conversations with knowledge and insights shared across the organization. Through the product experience plane, data product owners can use automated workflows to capture data lineage and data quality metrics and oversee access controls.
Given the importance of data in the world today, organizations face the dual challenges of managing large-scale, continuously incoming data while vetting its quality and reliability. One of its key features is the ability to manage data using branches.
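Assuming the table format in question is Apache Iceberg (branches are also offered by other systems, so this is an illustrative assumption), branch-based data management can look like the sketch below; catalog, table, branch, and path names are hypothetical, and the Iceberg Spark SQL extensions are assumed to be enabled.

```python
from pyspark.sql import SparkSession

# Assumes a Spark session with Iceberg SQL extensions and a hypothetical
# catalog "glue_catalog".
spark = SparkSession.builder.appName("iceberg-branches").getOrCreate()

# Create an audit branch so incoming data is staged away from main.
spark.sql("ALTER TABLE glue_catalog.db.events CREATE BRANCH audit")

# Write new data to the branch only; readers of main are unaffected.
incoming = spark.read.parquet("s3://example-bucket/new-events/")  # placeholder
incoming.writeTo("glue_catalog.db.events").option("branch", "audit").append()

# After validation passes, fast-forward main to the audited state.
spark.sql("CALL glue_catalog.system.fast_forward('db.events', 'main', 'audit')")
```

This write-audit-publish pattern is one common way branches are used to vet quality before data becomes visible to consumers.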
However, many companies today still struggle to effectively harness and use their data due to challenges such as data silos, lack of discoverability, poor data quality, and a lack of data literacy and analytical capabilities to quickly access and use data across the organization.
Technically, things fall apart when data quality doesn’t scale. But in the real world, enterprise data is fragmented, stale, and riddled with missing metadata. It takes more than data scientists to scale AI; the teams that succeed build strong data and tech foundations, because you can’t scale AI on broken plumbing.
Recognizing this paradigm shift, ANZ Institutional Division has embarked on a transformative journey to redefine its approach to data management and utilization, and to extracting significant business value from data insights.
Moreover, 68% of vice presidents in charge of AI or data management already see their companies making decisions based on bad data all or most of the time, versus 47% of C-level IT leaders. That emphasis can erode an organization’s data foundation over time.
As the world embraces artificial intelligence (AI), data has emerged as the most critical asset in driving innovation and efficiency. But true AI readiness starts with data readiness. The AI Data Lake Solution also supported the pathology AI model to cut diagnosis and report generation to just 15 seconds, he says.
However, companies are still struggling to manage data effectively and to implement GenAI applications that deliver proven business value. Gartner predicts that by the end of this year, 30%.
May 30, 2025 | 6 min read | Doug Mbaya and Jimmy Hayes. In this article, we'll explore how to build a data mesh architecture using Teradata VantageCloud Lake as the core data platform on Amazon Web Services (AWS). This emphasis on simplicity and ease of use in workload management streamlines operations and minimizes complexity.
The EA function (usually managed by IT) has not only struggled to adapt to outcome-driven business dynamics but has also unwittingly created its own existential crisis in the 21st-century enterprise. AI initiatives often need centralized data lakes, while domain-driven models emphasize decentralized ownership.
“It’s important to build that architecture and infrastructure — to understand the data source, to generate the data, and to build a single data platform,” Jayadev says. A decade and more ago when big data burst onto the scene, data lakes emerged to accommodate unstructured data as a source of analytic insights.
AI in the enterprise has become a strategic imperative for every organization, but for it to be truly effective, CIOs need to manage the data layer in a way that can support the evolutionary breakthroughs in large language models and frameworks. These issues are resolved by the current lakehouse evolution and modern unified catalogs.
There are several consistent patterns I’ve observed across transformation programs, and they often fall into one of four categories: data quality, data silos, governance gaps, and cloud cost sprawl. What’s worse, poor quality undermines trust, and once that’s gone, it’s hard to win back stakeholders.
Its distributed architecture empowers organizations to query massive datasets across databases, data lakes, and cloud platforms with speed and reliability. Optimizing coordinators and workers ensures efficient query management, while intelligent load balancing prevents performance bottlenecks.
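Assuming the engine described here is Trino (the coordinator/worker terminology suggests it, but that is an inference), a federated query submitted through its coordinator can look like this sketch using the open source Trino Python client; the host, user, catalogs, and table names are hypothetical.

```python
# Requires: pip install trino
import trino

# All connection details are placeholders for illustration.
conn = trino.dbapi.connect(
    host="trino-coordinator.example.com",  # the coordinator endpoint
    port=8080,
    user="analyst",
    catalog="hive",
    schema="default",
)

cur = conn.cursor()
# A federated join across two catalogs: a data lake table and a database.
cur.execute("""
    SELECT o.order_id, c.customer_name
    FROM hive.sales.orders AS o
    JOIN postgresql.crm.customers AS c
      ON o.customer_id = c.customer_id
    LIMIT 10
""")
for row in cur.fetchall():
    print(row)
```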
Many still rely on legacy platforms, such as on-premises warehouses or siloed data systems. These environments often consist of multiple disconnected systems, each managing distinct functions (policy administration, claims processing, billing, and customer relationship management), all generating exponentially growing data as businesses scale.
However, the underlying data sources remain distinct and can therefore be managed in whichever way is most appropriate on a case-by-case basis. Data mesh solves the challenge of forcing all of an organization’s data into a single, inflexible location. The data is already cataloged and available through the data mesh.
Let’s follow that journey from the ground up and look at positioning AI in the modern enterprise in manageable, prioritized chunks of capabilities and incremental investment. Start with data as an AI foundation: data quality is the first and most critical investment priority for any viable enterprise AI strategy.
The results were initially challenging, with accuracy rates starting at just 55%, but through focused data quality improvements, including humans in the loop, and enhanced search capabilities, the system now delivers 80-90% accuracy on technical responses.
Customer service agents are paid for their time on the phone, so we carefully measure first-call resolution and track time against SLAs. We used to need structured data because our machine learning models expected field-level information. What matters is that the data is ingestible and has longevity.
First, data catalog vendors have been integrating ML algorithms for years to automate tasks such as tagging and data classification, reducing manual effort and improving metadata management. As laid out earlier, the scope of data governance is expanding, as AI model governance has become an additional requirement.
Everyone talks about data quality, as they should. Our research shows that improving the quality of information is the top benefit of data preparation activities. Data quality efforts are focused on clean data. Yes, clean data is important, but so is bad data.
Talend is a data integration and management software company that offers applications for cloud computing, big data integration, application integration, data quality, and master data management.