The need for streamlined data transformations. As organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown. With dbt, teams can define data quality checks and access controls as part of their transformation workflow.
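As a minimal sketch of how such checks can be wired into a workflow, the snippet below invokes dbt's test suite programmatically; it assumes dbt-core 1.5+ and a hypothetical model named "orders", neither of which comes from the excerpt above.

```python
from dbt.cli.main import dbtRunner  # requires dbt-core >= 1.5

# Run the project's declared data quality tests (e.g., not_null, unique)
# for a hypothetical "orders" model as one step of a transformation workflow.
result = dbtRunner().invoke(["test", "--select", "orders"])
if not result.success:
    raise RuntimeError("data quality tests failed; halting downstream steps")
```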
AWS Glue Data Quality allows you to measure and monitor the quality of data in your data repositories. It’s important for business users to be able to see quality scores and metrics to make confident business decisions and debug data quality issues.
Today, customers are embarking on data modernization programs by migrating on-premises data warehouses and data lakes to the AWS Cloud to take advantage of the scale and advanced analytical capabilities of the cloud. Some customers build custom in-house data parity frameworks to validate data during migration.
They establish data quality rules to ensure the extracted data is of high quality for accurate business decisions. These rules assess the data based on fixed criteria reflecting current business states. We are excited to talk about how to use dynamic rules, a new capability of AWS Glue Data Quality.
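A hedged sketch of what a dynamic rule could look like, using boto3; the ruleset name, database, table, and threshold are illustrative assumptions, and the RowCount expression follows the dynamic-rule style (comparing the current run against recent history) that the capability introduces.

```python
import boto3

glue = boto3.client("glue")

# Illustrative DQDL ruleset: the dynamic rule compares today's row count
# against the average of the last 10 evaluation runs instead of a fixed value.
ruleset = """
Rules = [
    RowCount > avg(last(10)) * 0.8,
    IsComplete "order_id"
]
"""

glue.create_data_quality_ruleset(
    Name="orders-dynamic-rules",  # hypothetical name
    Ruleset=ruleset,
    TargetTable={"DatabaseName": "sales_db", "TableName": "orders"},  # assumptions
)
```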
Data governance is the process of ensuring the integrity, availability, usability, and security of an organization’s data. Due to the volume, velocity, and variety of data being ingested in data lakes, it can get challenging to develop and maintain policies and procedures to ensure data governance at scale for your data lake.
We are excited to announce the General Availability of AWS Glue Data Quality. Our journey started by working backward from our customers who create, manage, and operate data lakes and data warehouses for analytics and machine learning. It takes days for data engineers to identify and implement data quality rules.
In recent years, data lakes have become a mainstream architecture, and data quality validation is a critical factor to improve the reusability and consistency of the data. In this post, we provide benchmark results of running increasingly complex data quality rulesets over a predefined test dataset.
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.
In the era of big data, data lakes have emerged as a cornerstone for storing vast amounts of raw data in its native format. They support structured, semi-structured, and unstructured data, offering a flexible and scalable environment for data ingestion from multiple sources. The default output is log-based.
With this new functionality, customers can create up-to-date replicas of their data from applications such as Salesforce, ServiceNow, and Zendesk in an Amazon SageMaker Lakehouse and Amazon Redshift. SageMaker Lakehouse gives you the flexibility to access and query your data in place with all Apache Iceberg compatible tools and engines.
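As one hedged example of such an Iceberg-compatible tool, the sketch below reads a replica with PyIceberg, assuming the lakehouse tables are registered in the AWS Glue Data Catalog; the catalog name, namespace, and table name are all hypothetical.

```python
from pyiceberg.catalog import load_catalog  # pip install "pyiceberg[glue]"

# Assumed setup: lakehouse tables registered in the AWS Glue Data Catalog.
catalog = load_catalog("lakehouse", type="glue")
table = catalog.load_table("crm_replica.salesforce_accounts")  # hypothetical table

# Scan a handful of rows into pandas to verify the replica is queryable in place.
df = table.scan(limit=10).to_pandas()
print(df.head())
```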
We pulled these people together and defined use cases we could all agree were the best to demonstrate our new data capability. Once they were identified, we had to determine whether we had the right data. Then we migrated the data to our new data lake and stood up the new platform.
You can use AWS Glue to create, run, and monitor data integration and ETL (extract, transform, and load) pipelines and catalog your assets across multiple data stores. Hundreds of thousands of customers use data lakes for analytics and ML to make data-driven business decisions.
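For instance, a pipeline defined in Glue can be started and monitored with a few boto3 calls; the job name below is a hypothetical stand-in.

```python
import boto3

glue = boto3.client("glue")

# Start a Glue ETL job and check its state; "nightly-orders-etl" is an assumption.
run = glue.start_job_run(JobName="nightly-orders-etl")
status = glue.get_job_run(JobName="nightly-orders-etl", RunId=run["JobRunId"])
print(status["JobRun"]["JobRunState"])  # e.g. RUNNING, SUCCEEDED, FAILED
```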
To address the flood of data and the needs of enterprise businesses to store, sort, and analyze that data, a new storage solution has evolved: the data lake. What’s in a Data Lake? Data warehouses do a great job of standardizing data from disparate sources for analysis. Taking a Dip.
Poor-quality data can lead to incorrect insights, bad decisions, and lost opportunities. AWS Glue Data Quality measures and monitors the quality of your dataset. It supports both data quality at rest and data quality in AWS Glue extract, transform, and load (ETL) pipelines.
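A sketch of the in-pipeline variant, assuming a Glue job environment; the EvaluateDataQuality transform shown here follows the shape of Glue Studio-generated scripts, and the database and table names are assumptions.

```python
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsgluedq.transforms import EvaluateDataQuality  # available inside Glue jobs

glue_ctx = GlueContext(SparkContext.getOrCreate())

# Hypothetical source table read from the Glue Data Catalog.
dyf = glue_ctx.create_dynamic_frame.from_catalog(database="sales_db", table_name="orders")

# Evaluate a small ruleset inline, so bad data is caught inside the ETL run.
results = EvaluateDataQuality.apply(
    frame=dyf,
    ruleset='Rules = [ IsComplete "order_id", RowCount > 0 ]',
    publishing_options={"dataQualityEvaluationContext": "orders_dq"},
)
```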
…cycle_end"', "sagemakedatalakeenvironment_sub_db", ctas_approach=False) A similar approach is used to connect to shared data from Amazon Redshift, which is also shared using Amazon DataZone. The data science and AI teams are able to explore and use new data sources as they become available through Amazon DataZone.
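The excerpt opens mid-call; a plausible reconstruction with awswrangler is sketched below. The SQL text is invented, while the database name and ctas_approach flag come from the fragment itself.

```python
import awswrangler as wr

# Hypothetical reconstruction of the truncated call: query a DataZone-shared
# Athena database. The SQL is an assumption; the database name and
# ctas_approach flag appear in the excerpt.
df = wr.athena.read_sql_query(
    'SELECT * FROM billing WHERE cycle_start <= current_date AND current_date <= "cycle_end"',
    database="sagemakedatalakeenvironment_sub_db",
    ctas_approach=False,
)
```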
On the agribusiness side, we source, purchase, and process agricultural commodities and offer a diverse portfolio of products, including grains, soybean meal, blended feed ingredients, and top-quality oils for the food industry, to add value to the commodities our customers desire. The data can also help us enrich our commodity products.
The core issue plaguing many organizations is the presence of out-of-control databases or data lakes characterized by unrestrained data changes: numerous users and tools incessantly alter data, leading to a tumultuous environment. Monitor for freshness, schema changes, volume, field health/quality, new tables, and usage.
To provide a response that includes the enterprise context, each user prompt needs to be augmented with a combination of insights from structured data from the data warehouse and unstructured data from the enterprise data lake. Implement data privacy policies. Implement data quality by data type and source.
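A minimal, library-free sketch of that augmentation step; the function and field names are invented for illustration.

```python
def augment_prompt(user_prompt: str, warehouse_rows: list[dict], lake_passages: list[str]) -> str:
    """Prepend structured warehouse facts and retrieved lake passages to a prompt."""
    structured = "\n".join(f"- {row}" for row in warehouse_rows)
    unstructured = "\n".join(f"- {p}" for p in lake_passages)
    return (
        f"Structured context:\n{structured}\n\n"
        f"Document context:\n{unstructured}\n\n"
        f"Question: {user_prompt}"
    )

# Example usage with made-up enterprise context:
print(augment_prompt(
    "What drove Q3 churn?",
    [{"region": "EMEA", "churn": 0.08}],
    ["Q3 churn rose after the pricing change."],
))
```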
Figure 2: Example data pipeline with DataOps automation. In this project, I automated data extraction from SFTP, the public websites, and the email attachments. The automated orchestration published the data to an AWS S3 data lake. All the code, the Talend job, and the BI report are version controlled using Git.
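A hedged sketch of the extract-and-publish step described above, using paramiko for SFTP and boto3 for S3; the host, credentials, paths, and bucket are all placeholders.

```python
import boto3
import paramiko

# Pull a file over SFTP (placeholder host/credentials; prefer key-based auth).
transport = paramiko.Transport(("sftp.example.com", 22))
transport.connect(username="etl_user", password="change-me")
sftp = paramiko.SFTPClient.from_transport(transport)
sftp.get("/outbound/orders.csv", "/tmp/orders.csv")
transport.close()

# Land the extract in the S3 data lake (placeholder bucket/key).
boto3.client("s3").upload_file("/tmp/orders.csv", "example-datalake", "raw/orders/orders.csv")
```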
This would be a straightforward task were it not for the fact that, in the digital era, there has been an explosion of data, collected and stored everywhere, much of it poorly governed, ill-understood, and irrelevant. Further, data management activities don’t end once the AI model has been developed. Addressing the Challenge.
Migrating to Amazon Redshift offers organizations the potential for improved price-performance, enhanced data processing, faster query response times, and better integration with technologies such as machine learning (ML) and artificial intelligence (AI).
In addition to the tracking of relationships and quality metrics, DataOps Observability journeys allow users to establish baselines: concrete expectations for run schedules, run durations, data quality, and upstream and downstream dependencies. And she’ll know when newer data will arrive.
Data observability provides insight into the condition and evolution of the data resources from source through the delivery of the data products. Barr Moses of Monte Carlo presents it as a combination of data flow, data quality, data governance, and data lineage.
You will need to continually return to your business dashboard to make sure that it’s working, the data is accurate, and it’s still answering the right questions in the most effective way. Testing will eliminate lots of data quality challenges and bring a test-first approach through your agile cycle.
A data hub contains data at multiple levels of granularity and is often not integrated. It differs from a data lake by offering data that is pre-validated and standardized, allowing for simpler consumption by users. Data hubs and data lakes can coexist in an organization, complementing each other.
Jim Hare, distinguished VP and analyst at Gartner, says that some people think they need to take all the data siloed in systems in various business units and dump it into a data lake. “But what they really need to do is fundamentally rethink how data is managed and accessed,” he says.
Data quality for account and customer data – Altron wanted to enable data quality and data governance best practices. Goals – Lay the foundation for a data platform that can be used in the future by internal and external stakeholders.
In Foundry’s 2022 Data & Analytics Study, 88% of IT decision-makers agree that data collection and analysis have the potential to fundamentally change their business models over the next three years. The ability to pivot quickly to address rapidly changing customer or market demands is driving the need for real-time data.
As part of their cloud modernization initiative, they sought to migrate and modernize their legacy data platform. Third-party APIs – These provide analytics and survey data related to ecommerce websites. This could include details like traffic metrics, user behavior, conversion rates, customer feedback, and more.
This plane drives users to engage in data-driven conversations with knowledge and insights shared across the organization. Through the product experience plane, data product owners can use automated workflows to capture data lineage and data quality metrics and oversee access controls.
As an integrated manufacturing capability, Dow is a complex puzzle, and these AI models help us incorporate historical data, market trends, and customer behaviors, all of which allow us to produce a more precise demand plan. Is gen AI a shiny toy in a hype cycle or will it have material impact on your business?
In 2022, AWS commissioned a study conducted by the American Productivity and Quality Center (APQC) to quantify the Business Value of Customer 360. The following figure shows some of the metrics derived from the study. The AWS modern data architecture shows a way to build a purpose-built, secure, and scalable data platform in the cloud.
However, a foundational step in evolving into a data-driven organization requires trusted, readily available, and easily accessible data for users within the organization; thus, an effective data governance program is key. Here are a few common data management challenges: Regulatory compliance on data use.
With data volumes exhibiting a double-digit percentage growth rate year on year and the COVID pandemic disrupting global logistics in 2021, it became more critical to scale and generate near-real-time data. You can visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes.
Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x better price-performance than other cloud data warehouses.
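As one hedged illustration of that integration, the Redshift Data API lets code run SQL without managing connections; the workgroup, database, and query below are assumptions.

```python
import boto3

rsd = boto3.client("redshift-data")

# Submit a query to a (hypothetical) Redshift Serverless workgroup; for a
# provisioned cluster, pass ClusterIdentifier instead of WorkgroupName.
resp = rsd.execute_statement(
    WorkgroupName="example-wg",
    Database="dev",
    Sql="SELECT count(*) FROM sales",
)
print(resp["Id"])  # statement id, usable with describe_statement/get_statement_result
```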
Modern data catalogs also facilitate data quality checks. Historically restricted to the purview of data engineers, data quality information is essential for all user groups to see. Data scientists often have different requirements for a data catalog than data analysts.
However, often the biggest stumbling block is a human one: getting people to buy into the idea that the care and attention they pay to data capture will pay dividends later in the process. These and other areas are covered in greater detail in an older article, Using BI to drive improvements in data quality.
Several large organizations have faltered at different stages of BI implementation, from poor data quality to the inability to scale due to larger volumes of data and extremely complex BI architecture. Data governance and security measures are critical components of data strategy. What is Business Intelligence?
Since the State of DevOps 2019 DORA metrics were published, it has been well documented that with DevOps, companies can deploy software 208 times more often and 106 times faster, recover from incidents 2,604 times faster, and release 7 times fewer defects. Finally, data integrity is of paramount importance.
To optimize data analytics and AI workloads, organizations need a data store built on an open data lakehouse architecture. This type of architecture combines the performance and usability of a data warehouse with the flexibility and scalability of a data lake.
Storage-centric approach. In the storage-centric approach, people try to address data silos by throwing everything into a data lake or a data warehouse. Although this helps somewhat in terms of architecture, these data lakes soon become unwieldy.
Guided Navigation – Guided navigation provides intelligent suggestions that steer users toward correct usage of data. Behavioral intelligence, embedded in the catalog, learns from user behavior to enforce best practices through features like data quality flags, which help users stay compliant as they use data.
Having been in business for over 50 years, ARC had accumulated a massive amount of data that was stored in siloed, on-premises servers across its 7 business domains. Using Alation, ARC automated the data curation and cataloging process.