Data Lake, Data Quality and Interactive

From data lakes to insights: dbt adapter for Amazon Athena now supported in dbt Cloud

AWS Big Data

NOVEMBER 22, 2024

The need for streamlined data transformations: As organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown. Using Athena and the dbt adapter, you can transform raw data in Amazon S3 into well-structured tables suitable for analytics.

Data Lake

Data Lake Data Warehouse Cost-Benefit Data Transformation

Visualize data quality scores and metrics generated by AWS Glue Data Quality

AWS Big Data

JUNE 6, 2023

AWS Glue Data Quality allows you to measure and monitor the quality of data in your data repositories. It’s important for business users to be able to see quality scores and metrics to make confident business decisions and debug data quality issues. An AWS Glue crawler crawls the results.

Data Quality

Data Quality Metrics Visualization Dashboards

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Trending Sources

AWS Glue Data Quality is Generally Available

AWS Big Data

JUNE 6, 2023

We are excited to announce the General Availability of AWS Glue Data Quality. Our journey started by working backward from our customers who create, manage, and operate data lakes and data warehouses for analytics and machine learning. It takes days for data engineers to identify and implement data quality rules.

Data Quality

Data Quality Statistics Data Lake Visualization

What is a Data Mesh?

DataKitchen

AUGUST 3, 2021

First-generation – expensive, proprietary enterprise data warehouse and business intelligence platforms maintained by a specialized team drowning in technical debt. Second-generation – gigantic, complex data lake maintained by a specialized team drowning in technical debt.

Data Architecture

Data Architecture Data Lake Cost-Benefit Data Warehouse

Data Lakes: What Are They and Who Needs Them?

Jet Global

JULY 2, 2019

To address the flood of data and the needs of enterprise businesses to store, sort, and analyze that data, a new storage solution has evolved: the data lake. What’s in a Data Lake? Data warehouses do a great job of standardizing data from disparate sources for analysis. Taking a Dip.

Data Lake

Data Lake Data Warehouse Big Data Machine Learning

Manage concurrent write conflicts in Apache Iceberg on the AWS Glue Data Catalog

AWS Big Data

APRIL 8, 2025

In modern data architectures, Apache Iceberg has emerged as a popular table format for data lakes, offering key features including ACID transactions and concurrent write support. We show two example scripts demonstrating a practical implementation of error handling for data conflicts in Iceberg streaming jobs.

Snapshot

Snapshot Management Metadata Big Data
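
The error handling for write conflicts that this teaser mentions can be sketched as a retry loop with exponential backoff. This is a minimal illustration, not the scripts from the article: `CommitConflictError` and `commit_with_retry` are hypothetical names, and the `commit` callable stands in for whatever operation writes to the table.

```python
import random
import time


class CommitConflictError(Exception):
    """Hypothetical stand-in for the error raised when a concurrent write wins."""


def commit_with_retry(commit, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call commit() until it succeeds or max_attempts is exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return commit()
        except CommitConflictError:
            if attempt == max_attempts:
                raise  # give up and surface the conflict to the caller
            # Exponential backoff with jitter so competing writers spread out.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))
```

The `sleep` parameter is injectable only so the loop can be exercised without real waiting; a streaming job would normally leave it at the default.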

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

AWS Big Data

JULY 29, 2024

In the era of big data, data lakes have emerged as a cornerstone for storing vast amounts of raw data in its native format. They support structured, semi-structured, and unstructured data, offering a flexible and scalable environment for data ingestion from multiple sources.

Metadata

Metadata Snapshot Data Lake Metrics

Navigating the Chaos of Unruly Data: Solutions for Data Teams

DataKitchen

NOVEMBER 10, 2023

The core issue plaguing many organizations is the presence of out-of-control databases or data lakes characterized by: Unrestrained Data Changes: Numerous users and tools incessantly alter data, leading to a tumultuous environment. Monitoring freshness, schema changes, volume, and column health is standard.

Data Quality

Data Quality Testing Data Lake Data Integration

Data governance in the age of generative AI

AWS Big Data

FEBRUARY 29, 2024

Unstructured data is typically stored across siloed systems in varying formats, and generally not managed or governed with the same level of rigor as structured data. On the backend, the batch data engineering processes refreshing the enterprise data lake need to expand to ingest, transform, and manage unstructured data.

Data Governance

Data Governance Unstructured Data Metadata Data Lake

Empowering data-driven excellence: How the Bluestone Data Platform embraced data mesh for success

AWS Big Data

FEBRUARY 27, 2024

The following are the key components of the Bluestone Data Platform: Data mesh architecture – Bluestone adopted a data mesh architecture, a paradigm that distributes data ownership across different business units. This enables data-driven decision-making across the organization.

Data-driven

Data-driven Data Lake Data Quality Data Governance

Build a semantic search engine for tabular columns with Transformers and Amazon OpenSearch Service

AWS Big Data

MARCH 1, 2023

Finding similar columns in a data lake has important applications in data cleaning and annotation, schema matching, data discovery, and analytics across multiple data sources. Finally, to interact with and visualize results from our solution, we build an interactive Streamlit web application running on AWS Fargate.

Data Lake

Data Lake Deep Learning Interactive Machine Learning

Accomplish Agile Business Intelligence & Analytics For Your Business

datapine

APRIL 15, 2020

17 software developers met to discuss lightweight development methods and subsequently produced the following manifesto: Manifesto for Agile Software Development: Individuals and interactions over processes and tools. Testing will eliminate lots of data quality challenges and bring a test-first approach through your agile cycle.

Business Intelligence

Business Intelligence Analytics Testing Dashboards

How gaming companies can use Amazon Redshift Serverless to build scalable analytical applications faster and easier

AWS Big Data

MARCH 7, 2023

A data hub contains data at multiple levels of granularity and is often not integrated. It differs from a data lake by offering data that is pre-validated and standardized, allowing for simpler consumption by users. Data hubs and data lakes can coexist in an organization, complementing each other.

Analytics

Analytics Data Warehouse Data Lake Metadata

Differentiate generative AI applications with your data using AWS analytics and managed databases

AWS Big Data

SEPTEMBER 12, 2024

This is just the tip of the iceberg, because what enables you to obtain differential value from generative AI is your data. This data is derived from your purpose-built data stores and previous interactions. Semantic context – Is there any meaningfully relevant data that would help the FMs generate the response?

Management

Management Analytics Data Lake Interactive

AWS Lake Formation 2022 year in review

AWS Big Data

JANUARY 31, 2023

Data governance is increasingly top-of-mind for customers as they recognize data as one of their most important assets. Effective data governance enables better decision-making by improving data quality, reducing data management costs, and ensuring secure access to data for stakeholders.

Data Lake

Data Lake Data Governance Data Architecture Machine Learning

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

Customer 360 (C360) provides a complete and unified view of a customer’s interactions and behavior across all touchpoints and channels. This view is used to identify patterns and trends in customer behavior, which can inform data-driven decisions to improve business outcomes. Then, you transform this data into a concise format.

Data Strategy

Data Strategy Strategy Data Warehouse Prescriptive Analytics

Better, faster decisions: Why businesses thrive on real-time data

CIO Business Intelligence

SEPTEMBER 8, 2022

In Foundry’s 2022 Data & Analytics Study, 88% of IT decision-makers agree that data collection and analysis have the potential to fundamentally change their business models over the next three years. The ability to pivot quickly to address rapidly changing customer or market demands is driving the need for real-time data.

Cost-Benefit

Cost-Benefit Internet of Things Data-driven Data Lake

Automate large-scale data validation using Amazon EMR and Apache Griffin

AWS Big Data

APRIL 4, 2024

Griffin is an open source data quality solution for big data, which supports both batch and streaming mode. In today’s data-driven landscape, where organizations deal with petabytes of data, the need for automated data validation frameworks has become increasingly critical.

Data Quality

Data Quality Data Lake Data Warehouse Data-driven

Data Architecture and Strategy in the AI Era

Cloudera

MARCH 28, 2024

Among the most common challenges to achieving AI adoption at scale were data quality and availability (36%), scalability and deployment (36%), integration with existing systems and processes (35%), and change management and organizational culture (34%).

Data Architecture

Data Architecture Strategy Data Lake Data-driven

Addressing the Elephant in the Room – Welcome to Today’s Cloudera

Cloudera

JUNE 13, 2024

After countless open-source innovations ushered in the Big Data era, including the first commercial distribution of HDFS (Apache Hadoop Distributed File System), commonly referred to as Hadoop, the two companies joined forces, giving birth to an entire ecosystem of technology and tech companies. But, What Happened to Hadoop?

Big Data

Big Data Machine Learning Contextual Data Data Lake

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

AWS Big Data

AUGUST 1, 2024

Migrating to Amazon Redshift offers organizations the potential for improved price-performance, enhanced data processing, faster query response times, and better integration with technologies such as machine learning (ML) and artificial intelligence (AI).

Data Warehouse

Data Warehouse KPI Optimization Cost-Benefit

How data literacy allows gen AI to drive productivity at Dow

CIO Business Intelligence

JULY 31, 2024

How do data and digital technologies impact your business strategy? At the core, digital at Dow is about changing how we work, which includes how we interact with systems, data, and each other to be more productive and to grow. Data is at the heart of everything we do today, from AI to machine learning or generative AI.

Manufacturing

Manufacturing Cost-Benefit Digital Transformation Forecasting

8 tips for unleashing the power of unstructured data

CIO Business Intelligence

NOVEMBER 28, 2023

“The insights derived from this audio data have directly contributed to improving the game’s audio experience, ensuring that players are constantly emotionally engaged in the gameplay and interacting with the environment,” Konoval says. Games are dynamic, and so is the data they generate, Konoval says. Quality is job one.

Unstructured Data

Unstructured Data Data-driven Visualization Data Quality

Create a modern data platform using the Data Build Tool (dbt) in the AWS Cloud

AWS Big Data

NOVEMBER 9, 2023

As part of their cloud modernization initiative, they sought to migrate and modernize their legacy data platform. Data sources: As part of this data platform, we are ingesting data from diverse and varied data sources, including: Transactional databases – These are active databases that store real-time data from various applications.

Data Warehouse

Data Warehouse Testing Data Quality Reporting

Foundational blocks of Amazon SageMaker Unified Studio: An admin’s guide to implement unified access to all your data, analytics, and AI

AWS Big Data

FEBRUARY 13, 2025

This plane drives users to engage in data-driven conversations with knowledge and insights shared across the organization. Through the product experience plane, data product owners can use automated workflows to capture data lineage and data quality metrics and oversee access controls.

Data Analytics

Data Analytics Analytics Modeling Data-driven

6 BI challenges IT teams must address

CIO Business Intelligence

DECEMBER 21, 2022

“The number-one issue for our BI team is convincing people that business intelligence will help to make true data-driven decisions,” says Diana Stout, senior business analyst at Schellman, a global cybersecurity assessor based in Tampa, Fl. “But what they really need to do is fundamentally rethink how data is managed and accessed,” he says.

IT Business Intelligence Sales Key Performance Indicator

How Amazon Devices scaled and optimized real-time demand and supply forecasts using serverless analytics

AWS Big Data

FEBRUARY 1, 2023

With data volumes exhibiting a double-digit percentage growth rate year on year and the COVID pandemic disrupting global logistics in 2021, it became more critical to scale and generate near-real-time data. You can visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes.

Optimization

Optimization Forecasting Data Lake Metadata

Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

AWS Big Data

NOVEMBER 16, 2023

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x

Enterprise

Enterprise Data Warehouse Snapshot Cost-Benefit

The Enduring Significance of Data Modeling in the Modern Data-Driven Enterprise

erwin

AUGUST 31, 2023

Improved Decision Making: Well-modeled data provides insights that drive informed decision-making across various business domains, resulting in enhanced strategic planning. Reduced Data Redundancy: By eliminating data duplication, it optimizes storage and enhances data quality, reducing errors and discrepancies.

Data-driven

Data-driven Modeling Enterprise Structured Data

Data democratization: How data architecture can drive business decisions and AI initiatives

IBM Big Data Hub

AUGUST 4, 2023

Architecture for data democratization: Data democratization requires a move away from traditional “data at rest” architecture, which is meant for storing static data. Traditionally, data was seen as information to be put on reserve, only called upon during customer interactions or executing a program.

Data Architecture

Data Architecture Data Lake Machine Learning Data Governance

Clean up your Excel and CSV files without writing code using AWS Glue DataBrew

AWS Big Data

NOVEMBER 15, 2023

As the organization receives data from multiple external vendors, it often arrives in different formats, typically Excel or CSV files, with each vendor using their own unique data layout and structure. DataBrew is an excellent tool for data quality and preprocessing. Select the FoodMartSales-AllUp dataset.

Metadata

Metadata Sales Data Lake Big Data

Data Mesh 101: What it is and Why You Should Care

Ontotext

FEBRUARY 12, 2024

It proposes a technological, architectural, and organizational approach to solving data management problems by breaking up the monolithic data platform and de-centralizing data management across different domain teams and services. Once these domains interact and share data with each other, the mesh emerges.

IT Metadata Data Quality Data Lake

Why Invest Now? Three Investors Share the Story Behind Alation’s Series E

Alation

NOVEMBER 2, 2022

“At Databricks, we’re focused on enabling customers to adopt the data lakehouse, and that’s an open data architecture that combines the best of the data warehouse and the data lake into one platform,” Ferguson says. “[The …] And data governance is critical to driving adoption.”

Data Governance

Data Governance Marketing Finance Data Lake

Migrate workloads from AWS Data Pipeline

AWS Big Data

JULY 25, 2024

With AWS Glue, you can discover and connect to hundreds of different data sources and manage your data in a centralized data catalog. You can visually create, run, and monitor ETL pipelines to load data into your data lakes.

Visualization

Visualization Management Data Integration Testing

Build Write-Audit-Publish pattern with Apache Iceberg branching and AWS Glue Data Quality

AWS Big Data

DECEMBER 9, 2024

Equally crucial is the ability to segregate and audit problematic data, not just for maintaining data integrity, but also for regulatory compliance, error analysis, and potential data recovery. One of its key features is the ability to manage data using branches. It specifies which data is clean and ready to be provided.

Data Quality

Data Quality Publishing Snapshot Data Lake
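
The Write-Audit-Publish idea named in this title can be sketched with a toy in-memory table. This is an illustration under stated assumptions, not the Apache Iceberg or AWS Glue Data Quality API: `Table`, `write_audit_publish`, and the `audit` callable are hypothetical, and branches are modeled as plain Python lists.

```python
class Table:
    """Toy table whose branches are named lists of rows (hypothetical model)."""

    def __init__(self):
        self.branches = {"main": []}

    def write(self, branch, rows):
        # A new branch starts from the current state of main.
        base = self.branches.get(branch, self.branches["main"])
        self.branches[branch] = base + list(rows)

    def publish(self, branch):
        # Point main at the audited branch state.
        self.branches["main"] = self.branches[branch]


def write_audit_publish(table, rows, audit, branch="audit"):
    """Write to a side branch, audit it, and publish only if the audit passes."""
    table.write(branch, rows)
    if audit(table.branches[branch]):
        table.publish(branch)
        return True
    return False  # rows stay on the side branch for inspection
```

Readers of `main` never see rows that failed the audit; the failed batch remains on the side branch, where it can be examined or discarded.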

What is Data Mesh?

Ontotext

NOVEMBER 16, 2023

Data mesh solves this by promoting data autonomy, allowing users to make decisions about domains without a centralized gatekeeper. It also improves development velocity with better data governance and access with improved data quality aligned with business needs.

Metadata

Metadata Data-driven Data Quality Data Architecture

Alation Earns 8 Top Rankings in BARC’s The Data Management Survey 23

Alation

OCTOBER 19, 2022

“Alation’s modern, web-based, user-friendly interfaces provide a good set of workflows and functions to interact with the system intuitively,” the report reads. Alation’s usability goes well beyond data discovery (used by 81 percent of our customers), data governance (74 percent), and data stewardship / data quality management (74 percent).

Management

Management KPI Data Governance Reporting

Scale knowledge management use cases with generative AI

IBM Big Data Hub

JULY 27, 2023

Data quality strongly impacts the quality and usefulness of content produced by an AI model, underscoring the significance of addressing data challenges. Summarization can help employees by providing them a brief of the customer’s problem and previous interactions with the company.

Management

Management Enterprise Modeling Data Quality

How Data Governance Supports Analytics

Alation

JANUARY 27, 2022

What Are the Top Data Challenges to Analytics? The proliferation of data sources means there is an increase in data volume that must be analyzed. Large volumes of data have led to the development of data lakes, data warehouses, and data management systems. Establishes Trust in Data.

Data Governance

Data Governance Analytics Cost-Benefit Data-driven

AWS Glue streaming application to process Amazon MSK data using AWS Glue Schema Registry

AWS Big Data

JUNE 12, 2023

Therefore, it’s crucial to keep the schema definition in the Schema Registry and the Data Catalog table in sync. To avoid this, it’s recommended to use a data quality check mechanism to identify such anomalies and take appropriate action in case of unexpected behavior. The following diagram illustrates our solution architecture.

Management

Management Metadata Internet of Things Testing

Fact-based Decision-making

Peter James Thomas

AUGUST 12, 2018

However, often the biggest stumbling block is a human one, getting people to buy in to the idea that the care and attention they pay to data capture will pay dividends later in the process. These and other areas are covered in greater detail in an older article, Using BI to drive improvements in data quality.

Statistics

Statistics Metrics Data Quality Measurement

Showpad accelerates data maturity to unlock innovation using Amazon QuickSight

AWS Big Data

APRIL 5, 2023

Showpad built new customer-facing embedded dashboards within Showpad eOS™ and migrated its legacy dashboards to Amazon QuickSight, a unified BI service providing modern interactive dashboards, natural language querying, paginated reports, machine learning (ML) insights, and embedded analytics at scale.

Dashboards

Dashboards Reporting Cost-Benefit Visualization

Top Graph Use Cases and Enterprise Applications (with Real World Examples)

Ontotext

MARCH 8, 2023

Graphs reconcile such data continuously crawled from diverse sources to support interactive queries and provide a graphic representation or model of the elements within supply chain, aiding in pathfinding and the ability to semantically enrich complex machine learning (ML) algorithms and decision making.

Enterprise

Enterprise Knowledge Discovery Risk Machine Learning

The Gartner 2021 Leadership Vision for Data & Analytics Leaders Webinar Q&A

Andrew White

JANUARY 11, 2021

Will the data warehouse as a software tool play a role in the future of Data & Analytics strategy? You cannot get away from a formalized delivery capability focused on regular, scheduled, structured and reasonably governed data. Data lakes don’t offer this nor should they. E.g. Data Lakes in Azure – as SaaS.

Data Analytics

Data Analytics Analytics Data-driven Finance
