A DataOps Approach to Data Quality. The Growing Complexity of Data Quality. Data quality issues are widespread, affecting organizations across industries, from manufacturing to healthcare and financial services. According to IDC, 73% of data practitioners do not trust their data.
The need for streamlined data transformations. As organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown. Such tools save time and effort, especially for teams looking to minimize infrastructure management and focus solely on data modeling.
As technology and business leaders, your strategic initiatives, from AI-powered decision-making to predictive insights and personalized experiences, are all fueled by data. Yet, despite growing investments in advanced analytics and AI, organizations continue to grapple with a persistent and often underestimated challenge: poor data quality.
Organizations are taking data they’ve historically used for analytics or business reporting and putting it to work in machine learning (ML) models and AI-powered applications. Amazon SageMaker Unified Studio (Preview) solves this challenge by providing an integrated authoring experience to use all your data and tools for analytics and AI.
Data governance is the process of ensuring the integrity, availability, usability, and security of an organization’s data. Due to the volume, velocity, and variety of data being ingested in data lakes, it can get challenging to develop and maintain policies and procedures to ensure data governance at scale for your data lake.
Today, customers are embarking on data modernization programs by migrating on-premises data warehouses and data lakes to the AWS Cloud to take advantage of the scale and advanced analytical capabilities of the cloud. Some customers build custom in-house data parity frameworks to validate data during migration.
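As a rough illustration of what such an in-house parity framework checks, the sketch below compares row counts, columns, and an order-insensitive checksum between a source and a migrated target dataset. The function names and the pandas-based checksum strategy are assumptions for illustration, not any specific framework's API.

```python
# Minimal sketch of a data parity check between a source table and its
# migrated target. Names and checksum strategy are illustrative assumptions.
import hashlib

import pandas as pd


def frame_checksum(df: pd.DataFrame) -> str:
    """Order-insensitive checksum: hash each row, sort, then combine."""
    canonical = df.astype(str).apply("|".join, axis=1)
    row_hashes = sorted(hashlib.sha256(r.encode()).hexdigest() for r in canonical)
    return hashlib.sha256("".join(row_hashes).encode()).hexdigest()


def check_parity(source: pd.DataFrame, target: pd.DataFrame) -> dict:
    return {
        "row_count_match": len(source) == len(target),
        "column_match": list(source.columns) == list(target.columns),
        "checksum_match": frame_checksum(source) == frame_checksum(target),
    }


if __name__ == "__main__":
    src = pd.DataFrame({"id": [1, 2], "amount": [10.0, 20.0]})
    tgt = pd.DataFrame({"id": [2, 1], "amount": [20.0, 10.0]})
    print(check_parity(src, tgt))  # all True: same rows, row order ignored
```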
Unlocking the true value of data often gets impeded by siloed information. Traditional data management, wherein each business unit ingests raw data in separate data lakes or warehouses, hinders visibility and cross-functional analysis. Amazon DataZone natively supports data sharing for Amazon Redshift data assets.
We are excited to announce the general availability of AWS Glue Data Quality. Our journey started by working backward from our customers who create, manage, and operate data lakes and data warehouses for analytics and machine learning. It takes days for data engineers to identify and implement data quality rules.
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.
In addition to real-time analytics and visualization, the data needs to be shared for long-term data analytics and machine learning applications. To achieve this, EUROGATE designed an architecture that uses Amazon DataZone to publish specific digital twin data sets, enabling access to them with SageMaker in a separate AWS account.
First-generation – expensive, proprietary enterprise data warehouse and business intelligence platforms maintained by a specialized team drowning in technical debt. Second-generation – gigantic, complex data lake maintained by a specialized team drowning in technical debt.
In modern data architectures, Apache Iceberg has emerged as a popular table format for data lakes, offering key features including ACID transactions and concurrent write support. When combined with well-timed maintenance operations, these patterns help build resilient data pipelines that can handle concurrent writes reliably.
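For a sense of what those well-timed maintenance operations look like, here is a sketch using Iceberg's documented Spark procedures rewrite_data_files and expire_snapshots from PySpark. The catalog name (glue_catalog) and table name are assumptions; the actual Spark session configuration depends on your catalog setup.

```python
# Sketch of routine Apache Iceberg maintenance from PySpark.
# glue_catalog and analytics.orders are placeholder names.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-maintenance").getOrCreate()

# Compact the small files produced by frequent concurrent writes.
spark.sql("""
    CALL glue_catalog.system.rewrite_data_files(
        table => 'analytics.orders'
    )
""")

# Expire old snapshots to bound metadata growth and storage cost,
# keeping enough history for time travel and in-flight readers.
spark.sql("""
    CALL glue_catalog.system.expire_snapshots(
        table => 'analytics.orders',
        retain_last => 10
    )
""")
```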
To address the flood of data and the needs of enterprise businesses to store, sort, and analyze that data, a new storage solution has evolved: the data lake. What’s in a Data Lake? Data warehouses do a great job of standardizing data from disparate sources for analysis. Taking a Dip.
Domain ownership recognizes that the teams generating the data have the deepest understanding of it and are therefore best suited to manage, govern, and share it effectively. This principle makes sure data accountability remains close to the source, fostering higher data quality and relevance.
However, they do contain effective data management, organization, and integrity capabilities. As a result, users can easily find what they need, and organizations avoid the operational and cost burdens of storing unneeded or duplicate data copies. On the other hand, they don’t support transactions or enforce data quality.
Poor-quality data can lead to incorrect insights, bad decisions, and lost opportunities. AWS Glue Data Quality measures and monitors the quality of your dataset. It supports both data quality at rest and data quality in AWS Glue extract, transform, and load (ETL) pipelines.
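To make this concrete, here is a minimal sketch of defining a Glue Data Quality ruleset with boto3. DQDL (Data Quality Definition Language) is the rule syntax the service uses; the database, table, and specific rules below are illustrative assumptions.

```python
# Sketch: register a DQDL ruleset against a Glue catalog table with boto3.
# sales_db / orders are hypothetical names.
import boto3

glue = boto3.client("glue")

ruleset = """
Rules = [
    IsComplete "order_id",
    Uniqueness "order_id" > 0.99,
    ColumnValues "status" in ["PENDING", "SHIPPED", "DELIVERED"]
]
"""

glue.create_data_quality_ruleset(
    Name="orders-basic-checks",
    Ruleset=ruleset,
    TargetTable={
        "DatabaseName": "sales_db",  # hypothetical Glue database
        "TableName": "orders",       # hypothetical Glue table
    },
)
```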
In the era of big data, data lakes have emerged as a cornerstone for storing vast amounts of raw data in its native format. They support structured, semi-structured, and unstructured data, offering a flexible and scalable environment for data ingestion from multiple sources.
Poor data quality is one of the top barriers faced by organizations aspiring to be more data-driven. Ill-timed business decisions, misinformed business processes, missed revenue opportunities, failed business initiatives, and overly complex data systems can all stem from data quality issues.
Q: Is data modeling cool again? In today’s fast-paced digital landscape, data reigns supreme. The data-driven enterprise relies on accurate, accessible, and actionable information to make strategic decisions and drive innovation. A: It always was, and it’s getting cooler!
Data is your generative AI differentiator, and a successful generative AI implementation depends on a robust data strategy incorporating a comprehensive data governance approach. Data governance is a critical building block across all these approaches, and we see two emerging areas of focus.
Data must be laboriously collected, curated, and labeled with task-specific annotations to train AI models. Building a model requires specialized, hard-to-find skills, and each new task requires repeating the process. These large models have lowered the cost and labor involved in automation.
The following are the key components of the Bluestone Data Platform: Data mesh architecture – Bluestone adopted a data mesh architecture, a paradigm that distributes data ownership across different business units. This enables data-driven decision-making across the organization.
This would be a straightforward task were it not for the fact that, in the digital era, there has been an explosion of data, collected and stored everywhere, much of it poorly governed, ill-understood, and irrelevant. Many organisations focus too heavily on fine-tuning their computational models in pursuit of ‘quick wins.’
Business intelligence is moving away from the traditional engineering model: analysis, design, construction, testing, and implementation. In the traditional model, communication between developers and business users is not a priority. The more collaborative alternative is known as model storming, one of the practices in agile analytics development.
Finding similar columns in a data lake has important applications in data cleaning and annotation, schema matching, data discovery, and analytics across multiple data sources. The workflow begins with an AWS Glue job that converts the CSV files into Apache Parquet data format.
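A minimal sketch of that first conversion step, written as a plain PySpark job, is below. The S3 paths are placeholders; the original workflow runs the equivalent logic inside an AWS Glue job.

```python
# Sketch: convert raw CSV files to Parquet for cheaper downstream scans.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-to-parquet").getOrCreate()

df = (
    spark.read
    .option("header", "true")        # first row holds column names
    .option("inferSchema", "true")   # derive column types from the data
    .csv("s3://my-bucket/raw/csv/")  # placeholder input prefix
)

# Parquet's columnar layout makes column-similarity analysis and
# selective scans far cheaper than row-oriented CSV.
df.write.mode("overwrite").parquet("s3://my-bucket/curated/parquet/")
```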
While most continue to struggle with data quality issues and cumbersome manual processes, best-in-class companies are making improvements with commercial automation tools. The data vault has strong adherents among best-in-class companies, even though its usage lags the alternative approaches of third normal form and star schema.
One of the bank’s key challenges related to strict cybersecurity requirements is to implement field-level encryption for personally identifiable information (PII), Payment Card Industry (PCI) data, and data that is classified as high privacy risk (HPR). Only users with the required permissions are allowed to access data in clear text.
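As a generic illustration of field-level encryption (not the bank's actual implementation), the sketch below encrypts only designated PII columns with the cryptography library's Fernet construction. The field names are hypothetical, and in practice the key would come from a KMS or HSM rather than being generated in process.

```python
# Generic field-level encryption sketch: encrypt PII columns only.
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # in production, fetch from a KMS, never inline
cipher = Fernet(key)

PII_FIELDS = {"ssn", "card_number"}  # hypothetical high-privacy-risk fields


def encrypt_record(record: dict) -> dict:
    """Encrypt designated PII fields, leaving everything else in clear text."""
    return {
        k: cipher.encrypt(v.encode()).decode() if k in PII_FIELDS else v
        for k, v in record.items()
    }


record = {"name": "Jane Doe", "ssn": "123-45-6789", "card_number": "4111111111111111"}
protected = encrypt_record(record)
# Only holders of the key can recover the original values:
assert cipher.decrypt(protected["ssn"].encode()).decode() == record["ssn"]
```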
Part Two of the Digital Transformation Journey… In our last blog on driving digital transformation, we explored how enterprise architecture (EA) and business process (BP) modeling are pivotal factors in a viable digital transformation strategy. With automation, data quality is systemically assured.
Privacy protection: The first step in AI and gen AI projects is always to get the right data. “In cases where privacy is essential, we try to anonymize as much as possible and then move on to training the model,” says University of Florence technologist Vincenzo Laveglia. “A balance between privacy and utility is needed.”
According to Kari Briski, VP of AI models, software, and services at Nvidia, successfully implementing gen AI hinges on effective data management and evaluating how different models work together to serve a specific use case. But some IT leaders are getting it right because they focus on three key aspects.
Data governance is increasingly top-of-mind for customers as they recognize data as one of their most important assets. Effective data governance enables better decision-making by improving data quality, reducing data management costs, and ensuring secure access to data for stakeholders.
Amazon SageMaker Unified Studio (preview) provides a unified experience for using data, analytics, and AI capabilities. You can use familiar AWS services for model development, generative AI, data processing, and analytics, all within a single, governed environment. To use Amazon Bedrock FMs, grant access to base models.
Figure 2: Example data pipeline with DataOps automation. In this project, I automated data extraction from SFTP, the public websites, and the email attachments. The automated orchestration published the data to an AWS S3 data lake. All the code, the Talend job, and the BI report are version controlled using Git.
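A condensed sketch of that extraction step is shown below: pull a file from SFTP and land it in the data lake's raw zone on S3. Hostnames, credentials, paths, and bucket names are placeholders; the original pipeline used Talend for orchestration.

```python
# Sketch: fetch a file over SFTP and publish it to an S3 raw zone.
# All endpoints and paths below are placeholders.
import boto3
import paramiko

transport = paramiko.Transport(("sftp.example.com", 22))
transport.connect(username="etl_user", password="...")  # prefer key auth in practice
sftp = paramiko.SFTPClient.from_transport(transport)

local_path = "/tmp/daily_extract.csv"
sftp.get("/outbound/daily_extract.csv", local_path)
sftp.close()
transport.close()

# Land the raw file in the data lake for downstream processing.
s3 = boto3.client("s3")
s3.upload_file(local_path, "my-datalake-bucket", "raw/sftp/daily_extract.csv")
```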
Hence the drive to provide ML as a service to the Data & Tech team’s internal customers. “All they would have to do is just build their model and run with it,” he says. That step, primarily undertaken by developers and data architects, established data governance and data integration. The offensive side?
Aligning business goals to data usage is a key foundation. “There is no digital transformation without business-aligned data control,” Capgemini executive Steve Jones adds. “The architectural model of big data is fundamental to this.” Everything comes down to what organizations do with data.
Most innovation platforms make you rip the data out of your existing applications and move it to some other environment (a data warehouse, data lake, data lakehouse, or data cloud) before you can do any innovation.
Solution: To address the challenge, ATPCO sought inspiration from a modern data mesh architecture. Implementation: Now, we’ll walk through how ATPCO implemented their solution to solve the challenges of analysts discovering, getting access to, and using data quickly to help their airline customers. Select Create environment profile.
After countless open-source innovations ushered in the Big Data era, including the first commercial distribution of the Apache Hadoop Distributed File System (HDFS), the core of the framework commonly referred to as Hadoop, the two companies joined forces, giving birth to an entire ecosystem of technology and tech companies.
At a time when AI is exploding in popularity and finding its way into nearly every facet of business operations, data has arguably never been more valuable. More recently, that value has been made clear by the emergence of AI-powered technologies like generative AI (GenAI) and the use of Large Language Models (LLMs).
With each game release and update, the amount of unstructured data being processed grows exponentially, Konoval says. “This volume of data poses serious challenges in terms of storage and efficient processing,” he says. To address this problem, RetroStyle Games invested in data lakes. Quality is job one.
Foundation: When teams don’t have a robust enough skillset, it can take more time to model and process data, causing longer latency and lower data quality. To address this, the foundation role provides an already processed dataset as a generic data model for common use cases shared by many consumers.
Over the next decade, the companies that beat their competitors will be “model-driven” businesses. These companies often undertake large data science efforts in order to shift from “data-driven” to “model-driven” operations, and to provide model-underpinned insights to the business (for example, anomaly detection).
In fact, AMA collects a huge amount of structured and unstructured data from bins, collection vehicles, facilities, and user reports; until now, this data has remained disconnected, managed through disparate systems, interfaces, and Excel spreadsheets.
In addition to the tracking of relationships and quality metrics, DataOps Observability journeys allow users to establish baselines: concrete expectations for run schedules, run durations, data quality, and upstream and downstream dependencies. And she’ll know when newer data will arrive.
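As a toy illustration of the baseline idea (a generic sketch, not the DataOps Observability product's API), the snippet below compares an observed pipeline run against concrete expectations for start time and duration, the same kind of checks the excerpt describes.

```python
# Toy sketch: flag a pipeline run that deviates from its baseline.
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Baseline:
    expected_start_hour: int  # run should begin at this hour (UTC)
    max_duration: timedelta   # runs longer than this are anomalous


def check_run(start: datetime, end: datetime, baseline: Baseline) -> list[str]:
    alerts = []
    if start.hour != baseline.expected_start_hour:
        alerts.append(f"late start: began at {start:%H:%M} UTC")
    if end - start > baseline.max_duration:
        alerts.append(f"overrun: took {end - start}")
    return alerts


baseline = Baseline(expected_start_hour=2, max_duration=timedelta(minutes=45))
print(check_run(datetime(2024, 5, 1, 3, 10), datetime(2024, 5, 1, 4, 20), baseline))
# -> ['late start: began at 03:10 UTC', 'overrun: took 1:10:00']
```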