Data engineers delivered over 100 lines of code and 1.5 data quality tests every day to support a cast of analysts and customers. They opted for Snowflake, a cloud-native data platform ideal for SQL-based analysis. More than just a data lake and a database is needed.
A DataOps Approach to Data Quality: The Growing Complexity of Data Quality. Data quality issues are widespread, affecting organizations across industries, from manufacturing to healthcare and financial services. 73% of data practitioners do not trust their data (IDC).
The need for streamlined data transformations: As organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown. Using Athena and the dbt adapter, you can transform raw data in Amazon S3 into well-structured tables suitable for analytics.
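The post centers on dbt's Athena adapter; as a rough stand-in for the same pattern, here is a minimal sketch that issues a CTAS-style transformation directly through boto3's Athena client. The database, table, and bucket names are hypothetical placeholders.

```python
import boto3

# Run an Athena CTAS query that materializes raw S3 data into a
# partitioned Parquet table. All names below are illustrative.
athena = boto3.client("athena", region_name="us-east-1")

ctas = """
CREATE TABLE analytics.orders_clean
WITH (format = 'PARQUET',
      external_location = 's3://my-bucket/curated/orders_clean/',
      partitioned_by = ARRAY['order_date']) AS
SELECT order_id,
       CAST(amount AS DECIMAL(10, 2)) AS amount,
       order_date                       -- partition column comes last
FROM raw.orders
WHERE order_id IS NOT NULL
"""

response = athena.start_query_execution(
    QueryString=ctas,
    QueryExecutionContext={"Database": "raw"},
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)
print(response["QueryExecutionId"])
```

With dbt, the same SELECT would live in a model file and the adapter would handle the CTAS materialization for you.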
As technology and business leaders, your strategic initiatives, from AI-powered decision-making to predictive insights and personalized experiences, are all fueled by data. Yet, despite growing investments in advanced analytics and AI, organizations continue to grapple with a persistent and often underestimated challenge: poor data quality.
They’re taking data they’ve historically used for analytics or business reporting and putting it to work in machine learning (ML) models and AI-powered applications. This innovation drives an important change: you’ll no longer have to copy or move data between data lakes and data warehouses.
Data lakes are centralized repositories that can store all structured and unstructured data at any desired scale. The power of the data lake lies in the fact that it is often a cost-effective way to store data.
Data governance is the process of ensuring the integrity, availability, usability, and security of an organization’s data. Due to the volume, velocity, and variety of data being ingested in data lakes, it can get challenging to develop and maintain policies and procedures to ensure data governance at scale for your data lake.
Today, customers are embarking on data modernization programs by migrating on-premises data warehouses and data lakes to the AWS Cloud to take advantage of the scale and advanced analytical capabilities of the cloud. Data parity can help build confidence and trust with business users on the quality of migrated data.
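A parity check at its simplest compares row counts and aggregate checksums between the source extract and the migrated copy. A minimal sketch with pandas, assuming both sides are available as Parquet (paths and columns are hypothetical; reading s3:// paths requires s3fs):

```python
import pandas as pd

# Compare basic invariants between a legacy export and its migrated copy.
source = pd.read_parquet("s3://legacy-export/orders/")
target = pd.read_parquet("s3://migrated-lake/orders/")

checks = {
    "row_count": len(source) == len(target),
    "amount_sum": source["amount"].sum() == target["amount"].sum(),
    "distinct_ids": source["order_id"].nunique() == target["order_id"].nunique(),
}
for name, passed in checks.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

Real migrations typically add per-column hashes and sampled record-level diffs on top of these coarse checks.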
Unlocking the true value of data often gets impeded by siloed information. Traditional data management, wherein each business unit ingests raw data into separate data lakes or warehouses, hinders visibility and cross-functional analysis. Amazon DataZone natively supports data sharing for Amazon Redshift data assets.
We are excited to announce the General Availability of AWS Glue Data Quality. Our journey started by working backward from our customers who create, manage, and operate data lakes and data warehouses for analytics and machine learning. It takes days for data engineers to identify and implement data quality rules.
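To make that concrete, rules can be codified and evaluated through the Glue Data Quality API. A hedged sketch using boto3; the table, rules, and role ARN are illustrative:

```python
import boto3

glue = boto3.client("glue")

# Define a ruleset in DQDL against a cataloged table.
glue.create_data_quality_ruleset(
    Name="orders_ruleset",
    Ruleset="""Rules = [
        RowCount > 0,
        IsComplete "order_id",
        IsUnique "order_id",
        ColumnValues "amount" >= 0
    ]""",
    TargetTable={"DatabaseName": "analytics", "TableName": "orders"},
)

# Kick off an evaluation run against the table.
glue.start_data_quality_ruleset_evaluation_run(
    DataSource={"GlueTable": {"DatabaseName": "analytics", "TableName": "orders"}},
    Role="arn:aws:iam::123456789012:role/GlueDQRole",
    RulesetNames=["orders_ruleset"],
)
```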
Talend is a data integration and management software company that offers applications for cloud computing, big data integration, application integration, data quality and master data management. Its code generation architecture uses a visual interface to create Java or SQL code.
They establish data quality rules to ensure the extracted data is of high quality for accurate business decisions. These rules assess the data based on fixed criteria reflecting current business states. We are excited to talk about how to use dynamic rules, a new capability of AWS Glue Data Quality.
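Dynamic rules benchmark a metric against its own history rather than a fixed threshold. A hedged DQDL sketch (the window sizes and the 0.8 factor are illustrative, not taken from the post); the string would be passed to create_data_quality_ruleset exactly like a static ruleset:

```python
# Row count must stay within 80% of the average of the last 10 runs,
# and completeness must not regress below its recent average.
dynamic_ruleset = """Rules = [
    RowCount > avg(last(10)) * 0.8,
    Completeness "customer_id" >= avg(last(5))
]"""
```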
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.
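As a rough illustration of what "transactional" means here, the sketch below creates an Iceberg table on S3 registered in the AWS Glue Data Catalog and appends to it atomically. It assumes the Iceberg Spark runtime and AWS bundle jars are on the classpath; catalog, bucket, and table names are hypothetical:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.glue", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue.warehouse", "s3://my-lake/warehouse/")
    .config("spark.sql.catalog.glue.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .getOrCreate()
)

spark.sql("""
    CREATE TABLE IF NOT EXISTS glue.security.findings (
        finding_id STRING, severity STRING, detected_at TIMESTAMP)
    USING iceberg PARTITIONED BY (days(detected_at))
""")

# Appends commit as a new snapshot: concurrent readers never observe
# a half-written batch.
spark.sql(
    "INSERT INTO glue.security.findings VALUES ('f-1', 'HIGH', current_timestamp())"
)
```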
In recent years, data lakes have become a mainstream architecture, and data quality validation is a critical factor to improve the reusability and consistency of the data. In this post, we provide benchmark results of running increasingly complex data quality rulesets over a predefined test dataset.
When internal resources fall short, companies outsource data engineering and analytics. Large enterprises integrate hundreds or thousands of asynchronous data sources into a web of pipelines that flow into visualizations and purpose-built databases that support self-service analysis. Here is where the loss of control begins.
With the growing emphasis on data, organizations are constantly seeking more efficient and agile ways to integrate their data, especially from a wide variety of applications. SageMaker Lakehouse gives you the flexibility to access and query your data in place with all Apache Iceberg-compatible tools and engines.
You’re building an enterprise data platform for the first time in Sevita’s history. Our legacy architecture consisted of multiple standalone, on-prem data marts intended to integrate transactional data from roughly 30 electronic health record systems to deliver a reporting capability. What’s driving this investment?
First-generation – expensive, proprietary enterprise data warehouse and business intelligence platforms maintained by a specialized team drowning in technical debt. Second-generation – gigantic, complex data lake maintained by a specialized team drowning in technical debt. The problem is not “you.”
AWS Glue Data Quality allows you to measure and monitor the quality of data in your data repositories. It’s important for business users to be able to see quality scores and metrics to make confident business decisions and debug data quality issues. An AWS Glue crawler crawls the results.
It’s stored in corporate data warehouses, data lakes, and a myriad of other locations – and while some of it is put to good use, it’s estimated that around 73% of this data remains unexplored. So how can organizations find data in their own universes? Every data point stored has potential value.
To address the flood of data and the needs of enterprise businesses to store, sort, and analyze that data, a new storage solution has evolved: the data lake. What’s in a Data Lake? Remember that data stream/river analogy earlier? Taking a Dip.
This would be a straightforward task were it not for the fact that, in the digital era, there has been an explosion of data, collected and stored everywhere, much of it poorly governed, ill-understood, and irrelevant. Data Centricity. There is evidence to suggest that there is a blind spot when it comes to data in the AI context.
Making your data lake a “governed data lake” is the game changer. Without governance, organizations put both the security and the protection of their data at risk. A governed data lake contains data that’s accessible, clean, trusted and protected.
Poor-quality data can lead to incorrect insights, bad decisions, and lost opportunities. AWS Glue Data Quality measures and monitors the quality of your dataset. It supports both data quality at rest and data quality in AWS Glue extract, transform, and load (ETL) pipelines.
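For the in-pipeline case, a hedged sketch based on the documented EvaluateDataQuality transform follows; it only runs inside a Glue ETL job, and the catalog names, ruleset, and evaluation context are illustrative:

```python
from awsglue.context import GlueContext
from awsgluedq.transforms import EvaluateDataQuality
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read the frame to be validated mid-pipeline.
orders = glue_context.create_dynamic_frame.from_catalog(
    database="analytics", table_name="orders"
)

ruleset = """Rules = [ IsComplete "order_id", ColumnValues "amount" >= 0 ]"""

# Evaluate the rules and publish scores to CloudWatch and the DQ console.
dq_results = EvaluateDataQuality().process_rows(
    frame=orders,
    ruleset=ruleset,
    publishing_options={
        "dataQualityEvaluationContext": "orders_check",
        "enableDataQualityCloudWatchMetrics": True,
        "enableDataQualityResultsPublishing": True,
    },
)
```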
Their terminal operations rely heavily on seamless data flows and the management of vast volumes of data. Recently, EUROGATE has developed a digital twin for its container terminal Hamburg (CTH), generating millions of data points every second from Internet of Things (IoT) devices attached to its container handling equipment (CHE).
AWS Glue is a serverless data integration service that makes it simple to discover, prepare, and combine data for analytics, machine learning (ML), and application development. Hundreds of thousands of customers use data lakes for analytics and ML to make data-driven business decisions.
Applying artificial intelligence (AI) to data analytics for deeper, better insights and automation is a growing enterprise IT priority. But the data repository options that have been around for a while tend to fall short in their ability to serve as the foundation for big data analytics powered by AI. Meet the data lakehouse.
In modern data architectures, Apache Iceberg has emerged as a popular table format for data lakes, offering key features including ACID transactions and concurrent write support. An update determines the changes in a transaction and writes new data files. This scenario applies to any type of update on an Iceberg table.
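A typical way to express such an update is MERGE INTO, which requires the Iceberg SQL extensions. A minimal sketch assuming an Iceberg-enabled session configured as in the earlier example; the table and column names are hypothetical:

```python
from pyspark.sql import SparkSession

# Assumes Iceberg catalog/extension config is already set on the session.
spark = SparkSession.builder.getOrCreate()

# Stage incoming change rows as a temporary view.
updates_df = spark.createDataFrame([("o-42", 19.99)], ["order_id", "amount"])
updates_df.createOrReplaceTempView("updates")

# On a copy-on-write table, this rewrites only the affected data files
# and publishes them atomically as a new snapshot.
spark.sql("""
    MERGE INTO glue.analytics.orders AS t
    USING updates AS s
      ON t.order_id = s.order_id
    WHEN MATCHED THEN UPDATE SET t.amount = s.amount
    WHEN NOT MATCHED THEN INSERT *
""")
```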
On the agribusiness side we source, purchase, and process agricultural commodities and offer a diverse portfolio of products including grains, soybean meal, blended feed ingredients, and top-quality oils for the food industry to add value to the commodities our customers desire. The data can also help us enrich our commodity products.
In today’s rapidly evolving financial landscape, data is the bedrock of innovation, enhancing customer and employee experiences and securing a competitive edge. Like many large financial institutions, ANZ Institutional Division operated with siloed data practices and centralized data management teams.
In the era of big data, data lakes have emerged as a cornerstone for storing vast amounts of raw data in its native format. They support structured, semi-structured, and unstructured data, offering a flexible and scalable environment for data ingestion from multiple sources.
These formats, exemplified by Apache Iceberg, Apache Hudi, and Delta Lake, address persistent challenges in traditional data lake structures by offering an advanced combination of flexibility, performance, and governance capabilities. In this post, we highlight notable updates on Iceberg, Hudi, and Delta Lake in AWS Glue 5.0.
Every day, organizations of every description are deluged with data from a variety of sources, and attempting to make sense of it all can be overwhelming. “By 2025, it’s estimated we’ll have 463 million terabytes of data created every day,” says Lisa Thee, data for good sector lead at Launch Consulting Group in Seattle.
But to get maximum value out of data and analytics, companies need to have a data-driven culture permeating the entire organization, one in which every business unit gets full access to the data it needs in the way it needs it. This is called data democratization. “They have data swamps,” he says.
Poor data quality is one of the top barriers faced by organizations aspiring to be more data-driven. Ill-timed business decisions and misinformed business processes, missed revenue opportunities, failed business initiatives and complex data systems can all stem from data quality issues.
The core issue plaguing many organizations is the presence of out-of-control databases or data lakes characterized by unrestrained data changes: numerous users and tools incessantly alter data, leading to a tumultuous environment. Monitoring freshness, schema changes, volume, and column health is standard.
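Two of those standard monitors are easy to sketch. A hypothetical illustration, with the SLA thresholds and column names invented for the example:

```python
import datetime as dt
import pandas as pd

def check_freshness(df: pd.DataFrame, ts_col: str, max_lag_hours: int = 24) -> bool:
    """Flag a table whose newest record is older than the freshness SLA."""
    lag = dt.datetime.utcnow() - df[ts_col].max()
    return lag <= dt.timedelta(hours=max_lag_hours)

def check_volume(today_rows: int, trailing_avg: float, tolerance: float = 0.5) -> bool:
    """Flag a daily batch whose row count swings beyond tolerance."""
    return abs(today_rows - trailing_avg) <= tolerance * trailing_avg

# Example: a batch of 4,200 rows against a trailing average of 10,000 fails.
print(check_volume(4200, 10000.0))  # False
```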
Just after launching a focused data management platform for retail customers in March, enterprise data management vendor Informatica has now released two more industry-specific versions of its Intelligent Data Management Cloud (IDMC) — one for financial services, and the other for health and life sciences.
In the ever-evolving world of finance and lending, the need for real-time, reliable, and centralized data has become paramount. Bluestone, a leading financial institution, embarked on a transformative journey to modernize its data infrastructure and transition to a data-driven organization.
As the volume and complexity of analytics workloads continue to grow, customers are looking for more efficient and cost-effective ways to ingest and analyze data. AWS Glue provides both visual and code-based interfaces to make data integration effortless. For setup instructions, refer to Getting started with Amazon OpenSearch Service.
As organizations process vast amounts of data, maintaining an accurate historical record is crucial. History management in data systems is fundamental for compliance, business intelligence, data quality, and time-based analysis. Common use cases for historical record management in CDC scenarios span various domains.
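One common pattern is a Type 2 slowly changing dimension, where incoming CDC rows close out the current version of a record and insert a new one, preserving full history. A hedged Spark SQL sketch; the table layout and column names are invented, and the INSERT is simplified (in practice you would filter it to new and changed keys only):

```python
from pyspark.sql import SparkSession

# Assumes a session already configured for your table format.
spark = SparkSession.builder.getOrCreate()

# Close out the current version of every changed record.
spark.sql("""
    MERGE INTO dim_customer AS t
    USING cdc_batch AS s
      ON t.customer_id = s.customer_id AND t.is_current = true
    WHEN MATCHED AND t.row_hash <> s.row_hash THEN
      UPDATE SET t.is_current = false, t.valid_to = s.change_ts
""")

# Insert the new versions with open-ended validity.
spark.sql("""
    INSERT INTO dim_customer
    SELECT customer_id, name, address, row_hash,
           change_ts AS valid_from, NULL AS valid_to, true AS is_current
    FROM cdc_batch
""")
```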
For the past 5 years, BMS has used a custom framework called Enterprise Data Lake Services (EDLS) to create ETL jobs for business users.
To provide a response that includes the enterprise context, each user prompt needs to be augmented with a combination of insights from structured data from the data warehouse and unstructured data from the enterprise data lake. Implement data privacy policies. Implement data quality by data type and source.
One of the core features of AWS Lake Formation is the delegation of permissions on a subset of resources such as databases, tables, and columns in AWS Glue Data Catalog to data stewards, empowering them to make decisions regarding who should get access to their resources and helping you decentralize the permissions management of your data lakes.
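A minimal sketch of such a delegated, column-level grant with boto3; the principal ARN, database, table, and columns are hypothetical:

```python
import boto3

lf = boto3.client("lakeformation")

# Grant a steward role SELECT on two columns, with the option to
# re-grant that access to others (the delegation part).
lf.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/DataStewardRole"
    },
    Resource={
        "TableWithColumns": {
            "DatabaseName": "sales",
            "Name": "orders",
            "ColumnNames": ["order_id", "amount"],
        }
    },
    Permissions=["SELECT"],
    PermissionsWithGrantOption=["SELECT"],
)
```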
These processes are recurrent and require continuous evolution of reports, online data visualization, dashboards, and new functionalities to adapt current processes and develop new ones: pilot release to a small subgroup, train end users, train production staff, and deploy into production.