Data Lake, Data Quality and Strategy

Data’s dark secret: Why poor quality cripples AI and growth

CIO Business Intelligence

APRIL 8, 2025

As technology and business leaders, your strategic initiatives, from AI-powered decision-making to predictive insights and personalized experiences, are all fueled by data. Yet, despite growing investments in advanced analytics and AI, organizations continue to grapple with a persistent and often underestimated challenge: poor data quality.

Data Quality

Data Quality Data-driven Key Performance Indicator Metadata

Visualize data quality scores and metrics generated by AWS Glue Data Quality

AWS Big Data

JUNE 6, 2023

AWS Glue Data Quality allows you to measure and monitor the quality of data in your data repositories. It’s important for business users to be able to see quality scores and metrics to make confident business decisions and debug data quality issues. An AWS Glue crawler crawls the results.

Data Quality

Data Quality Metrics Visualization Dashboards

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Data Quality

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

AWS Big Data

DECEMBER 4, 2024

This post explores how the shift to a data product mindset is being implemented, the challenges faced, and the early wins that are shaping the future of data management in the Institutional Division. This principle makes sure data accountability remains close to the source, fostering higher data quality and relevance.

Metadata

Metadata Data Governance Data Quality Data-driven

Data architecture strategy for data quality

IBM Big Data Hub

JANUARY 5, 2023

Poor data quality is one of the top barriers faced by organizations aspiring to be more data-driven. Ill-timed business decisions and misinformed business processes, missed revenue opportunities, failed business initiatives and complex data systems can all stem from data quality issues.

Data Architecture

Data Architecture Data Quality Strategy Data Lake

Steps taken to build Sevita’s first enterprise data platform

CIO Business Intelligence

OCTOBER 23, 2024

But because of the infrastructure, employees spent hours on manual data analysis and spreadsheet jockeying. We had plenty of reporting, but very little data insight, and no real semblance of a data strategy. Second, the manual spreadsheet work resulted in significant manual data entry.

Enterprise

Enterprise Dashboards KPI Data Lake

Manage concurrent write conflicts in Apache Iceberg on the AWS Glue Data Catalog

AWS Big Data

APRIL 8, 2025

In modern data architectures, Apache Iceberg has emerged as a popular table format for data lakes, offering key features including ACID transactions and concurrent write support. It includes exponential backoff and jitter strategy by adding a random delay of 025% to each retry interval.

Snapshot

Snapshot Management Metadata Big Data

Building a Beautiful Data Lakehouse

CIO Business Intelligence

MARCH 9, 2022

However, they do contain effective data management, organization, and integrity capabilities. As a result, users can easily find what they need, and organizations avoid the operational and cost burdens of storing unneeded or duplicate data copies. On the other hand, they don’t support transactions or enforce data quality.

Data Lake

Data Lake Unstructured Data Data Warehouse Big Data

Constructing A Digital Transformation Strategy: Putting the Data in Digital Transformation

erwin

JULY 17, 2019

Having a clearly defined digital transformation strategy is an essential best practice for successful digital transformation. But what makes a viable digital transformation strategy? Constructing A Digital Transformation Strategy: Data Enablement. With automation, data quality is systemically assured.

Digital Transformation

Digital Transformation Strategy Metadata Data-driven

Data Architecture and Strategy in the AI Era

Cloudera

MARCH 28, 2024

But, even with the backdrop of an AI-dominated future, many organizations still find themselves struggling with everything from managing data volumes and complexity to security concerns to rapidly proliferating data silos and governance challenges. The benefits are clear, and there’s plenty of potential that comes with AI adoption.

Data Architecture

Data Architecture Strategy Data Lake Data-driven

Analyzing the business-case approach Perdue Farms takes to derive value from data

CIO Business Intelligence

SEPTEMBER 20, 2023

Mark Booth: We have a growth strategy to improve our business, and to support that, we’re driving a transformation in technology and business processes. But the more challenging work is in making our processes as efficient as possible so we capture the right data in our desire to become a more data-driven business.

Data Lake

Data Lake Data-driven Dashboards Risk

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

A Gartner Marketing survey found only 14% of organizations have successfully implemented a C360 solution, due to lack of consensus on what a 360-degree view means, challenges with data quality, and lack of cross-functional governance structure for customer data.

Data Strategy

Data Strategy Strategy Data Warehouse Prescriptive Analytics

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

AWS Big Data

JULY 29, 2024

In the era of big data, data lakes have emerged as a cornerstone for storing vast amounts of raw data in its native format. They support structured, semi-structured, and unstructured data, offering a flexible and scalable environment for data ingestion from multiple sources.

Metadata

Metadata Snapshot Data Lake Metrics

Navigating the Chaos of Unruly Data: Solutions for Data Teams

DataKitchen

NOVEMBER 10, 2023

The core issue plaguing many organizations is the presence of out-of-control databases or data lakes characterized by: Unrestrained Data Changes: Numerous users and tools incessantly alter data, leading to a tumultuous environment. Monitor freshness, schema changes, volume, and column health are standard.

Data Quality

Data Quality Testing Data Lake Data Integration

Data governance in the age of generative AI

AWS Big Data

FEBRUARY 29, 2024

Data is your generative AI differentiator, and a successful generative AI implementation depends on a robust data strategy incorporating a comprehensive data governance approach. As part of the transformation, the objects need to be treated to ensure data privacy (for example, PII redaction).

Data Governance

Data Governance Unstructured Data Metadata Data Lake

Empowering data-driven excellence: How the Bluestone Data Platform embraced data mesh for success

AWS Big Data

FEBRUARY 27, 2024

The following are the key components of the Bluestone Data Platform: Data mesh architecture – Bluestone adopted a data mesh architecture, a paradigm that distributes data ownership across different business units. This enables data-driven decision-making across the organization.

Data-driven

Data-driven Data Lake Data Quality Data Governance

Use open table format libraries on AWS Glue 5.0 for Apache Spark

AWS Big Data

DECEMBER 4, 2024

These formats, exemplified by Apache Iceberg, Apache Hudi, and Delta Lake, addresses persistent challenges in traditional data lake structures by offering an advanced combination of flexibility, performance, and governance capabilities. For more information, refer to What are deletion vectors?

Snapshot

Snapshot Metadata Data Lake Optimization

What is Dark Data, Why Does it Matter, and Why Are Humans Still Needed?

Timo Elliott

JANUARY 3, 2022

It’s stored in corporate data warehouses, data lakes, and a myriad of other locations – and while some of it is put to good use, it’s estimated that around 73% of this data remains unexplored. Improving data quality. Unexamined and unused data is often of poor quality. Data augmentation.

IT

IT Unstructured Data Data Quality Machine Learning

Accomplish Agile Business Intelligence & Analytics For Your Business

datapine

APRIL 15, 2020

When it comes to implementing and managing a successful BI strategy we have always proclaimed: start small, use the right BI tools , and involve your team. You need to determine if you are going with an on-premise or cloud-hosted strategy. You want an organization-wide buy-in of your business intelligence strategy.

Business Intelligence

Business Intelligence Analytics Testing Dashboards

Decentralize LF-tag management with AWS Lake Formation

AWS Big Data

NOVEMBER 16, 2023

One of the core features of AWS Lake Formation is the delegation of permissions on a subset of resources such as databases, tables, and columns in AWS Glue Data Catalog to data stewards, empowering them make decisions regarding who should get access to their resources and helping you decentralize the permissions management of your data lakes.

Management

Management Data Lake Sales Machine Learning

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

AWS Big Data

AUGUST 1, 2024

Migrating to Amazon Redshift offers organizations the potential for improved price-performance, enhanced data processing, faster query response times, and better integration with technologies such as machine learning (ML) and artificial intelligence (AI). For more details, see Strangler Fig Application.

Data Warehouse

Data Warehouse KPI Optimization Cost-Benefit

How BMO improved data security with Amazon Redshift and AWS Lake Formation

AWS Big Data

MARCH 1, 2024

As they continue to implement their Digital First strategy for speed, scale and the elimination of complexity, they are always seeking ways to innovate, modernize and also streamline data access control in the Cloud. BMO has accumulated sensitive financial data and needed to build an analytic environment that was secure and performant.

Data Lake

Data Lake Data Warehouse Management Risk

Don’t Fear Artificial Intelligence; Embrace it Through Data Governance

CIO Business Intelligence

APRIL 29, 2022

This would be straightforward task were it not for the fact that, during the digital-era, there has been an explosion of data – collected and stored everywhere – much of it poorly governed, ill-understood, and irrelevant. Further, data management activities don’t end once the AI model has been developed. Addressing the Challenge.

Data Governance

Data Governance IT Data Lake Risk

AWS Lake Formation 2022 year in review

AWS Big Data

JANUARY 31, 2023

Data governance is increasingly top-of-mind for customers as they recognize data as one of their most important assets. Effective data governance enables better decision-making by improving data quality, reducing data management costs, and ensuring secure access to data for stakeholders.

Data Lake

Data Lake Data Governance Data Architecture Machine Learning

CIOs rise to the ESG reporting challenge

CIO Business Intelligence

JANUARY 30, 2024

Birgit Fridrich, who joined Allianz as sustainability manager responsible for ESG reporting in late 2022, spends many hours validating data in the company’s Microsoft Sustainability Manager tool. Data quality is key, but if we’re doing it manually there’s the potential for mistakes.

Reporting

Reporting Data Quality Strategy Data-driven

Your 5-Step Journey from Analytics to AI

CIO Business Intelligence

MARCH 22, 2022

To companies entrenched in decades-old business and IT processes, data fiefdoms, and legacy systems, the task may seem insurmountable. Develop a strategy to liberate data . Which type(s) of storage consolidation you use depends on the data you generate and collect. . Just starting out with analytics?

Analytics

Analytics Key Performance Indicator Data Warehouse Data-driven

Governing data in relational databases using Amazon DataZone

AWS Big Data

MAY 7, 2024

Amazon DataZone allows you to simply and securely govern end-to-end data assets stored in your Amazon Redshift data warehouses or data lakes cataloged with the AWS Glue data catalog. Note that a managed data asset is an asset for which Amazon DataZone can manage permissions.

Metadata

Metadata Data Lake Data Processing Data-driven

Putting the Business Back Into Business Innovation

Timo Elliott

DECEMBER 14, 2022

Most innovation platforms make you rip the data out of your existing applications and move it to some another environment—a data warehouse, or data lake, or data lake house or data cloud—before you can do any innovation. The analysts call this a data mesh or data fabric strategy.

Data Lake

Data Lake Recreation/Entertainment Data Warehouse Metadata

Straumann Group is transforming dentistry with data, AI

CIO Business Intelligence

FEBRUARY 16, 2023

Selling the value of data transformation Iyengar and his team are 18 months into a three- to five-year journey that started by building out the data layer — corralling data sources such as ERP, CRM, and legacy databases into data warehouses for structured data and data lakes for unstructured data.

Unstructured Data

Unstructured Data Data Lake Prescriptive Analytics Data Warehouse

How AWS helped Altron Group accelerate their vision for optimized customer engagement

AWS Big Data

JULY 13, 2023

This allows for transparency, speed to action, and collaboration across the group while enabling the platform team to evangelize the use of data: Altron engaged with AWS to seek advice on their data strategy and cloud modernization to bring their vision to fruition.

Optimization

Optimization B2B Data Quality Sales

Making the gen AI and data connection work

CIO Business Intelligence

AUGUST 9, 2024

The complexities of compliance In May, the Italian Data Protection Authority highlighted how training models on which gen AI systems are based always require a huge amount of data, often obtained by web scraping, or a massive and indiscriminate collection carried out on the web, it says.

Risk

Risk Measurement Data Lake Data Collection

How Fujitsu implemented a global data mesh architecture and democratized data

AWS Big Data

MAY 1, 2024

In addition, by properly separating data and processing, it becomes effortless for the teams and organizations to share, manage, and inherit processes that were traditionally confined to individual PCs. Here, the foundation role takes the lead in compiling the knowledge of domain experts and making data suitable for analysis.

Dashboards

Dashboards Publishing Data-driven Cost-Benefit

Addressing the Elephant in the Room – Welcome to Today’s Cloudera

Cloudera

JUNE 13, 2024

After countless open-source innovations ushered in the Big Data era, including the first commercial distribution of HDFS (Apache Hadoop Distributed File System), commonly referred to as Hadoop, the two companies joined forces, giving birth to an entire ecosystem of technology and tech companies.

Big Data

Big Data Machine Learning Contextual Data Data Lake

Better, faster decisions: Why businesses thrive on real-time data

CIO Business Intelligence

SEPTEMBER 8, 2022

In Foundry’s 2022 Data & Analytics Study , 88% of IT decision-makers agree that data collection and analysis have the potential to fundamentally change their business models over the next three years. The ability to pivot quickly to address rapidly changing customer or market demands is driving the need for real-time data.

Cost-Benefit

Cost-Benefit Internet of Things Data-driven Data Lake

HEMA accelerates their data governance journey with Amazon DataZone

AWS Big Data

DECEMBER 19, 2024

Data has become an invaluable asset for businesses, offering critical insights to drive strategic decision-making and operational optimization. Delta tables technical metadata is stored in the Data Catalog, which is a native source for creating assets in the Amazon DataZone business catalog.

Data Governance

Data Governance Publishing Data-driven Metadata

Differentiate generative AI applications with your data using AWS analytics and managed databases

AWS Big Data

SEPTEMBER 12, 2024

The application gets prompt templates from an S3 data lake and creates the engineered prompt. The user interaction is stored in a data lake for downstream usage and BI analysis. The application sends the prompt to Amazon Bedrock and retrieves the LLM output.

Management

Management Analytics Data Lake Interactive

A data strategy checklist for the journey to the data-driven enterprise

BI-Survey

DECEMBER 22, 2020

Managers see data as relevant in the context of digitalization, but often think of data-related problems as minor details that have little strategic importance. Thus, it is taken for granted that companies should have a data strategy. But what is the scope of an effective strategy and who is affected by it?

Data-driven

Data-driven Data Strategy Strategy Enterprise

8 tips for unleashing the power of unstructured data

CIO Business Intelligence

NOVEMBER 28, 2023

Making the most of enterprise data is a top concern for IT leaders today. With organizations seeking to become more data-driven with business decisions, IT leaders must devise data strategies gear toward creating value from data no matter where — or in what form — it resides. Quality is job one.

Unstructured Data

Unstructured Data Data-driven Visualization Data Quality

How data literacy allows gen AI to drive productivity at Dow

CIO Business Intelligence

JULY 31, 2024

For Melanie Kalmar, the answer is data literacy and a strong foundation in tech. How do data and digital technologies impact your business strategy? At the core, digital at Dow is about changing how we work, which includes how we interact with systems, data, and each other to be more productive and to grow.

Manufacturing

Manufacturing Cost-Benefit Digital Transformation Forecasting

A comparative assessment of digital transformation in Italy

CIO Business Intelligence

APRIL 24, 2024

In fact, AMA collects a huge amount of structured and unstructured data from bins, collection vehicles, facilities, and user reports, and until now, this data has remained disconnected, managed by disparate systems and interfaces, through Excel spreadsheets.

Digital Transformation

Digital Transformation Business Intelligence Unstructured Data Data Lake

Data Mesh 101: How Data Mesh Helps Organizations Be Data-Driven and Achieve Velocity

Ontotext

FEBRUARY 12, 2024

Usually, organizations will combine different domain topologies, depending on the trade-offs, and choose to focus on specific aspects of data mesh. Once accomplished, an effective implementation spurs a mindset in which organizations prioritize and value data for decision-making, formulating strategies, and day-to-day operations.

Data-driven

Data-driven Data Lake Data Quality Business Objectives

6 BI challenges IT teams must address

CIO Business Intelligence

DECEMBER 21, 2022

Every day, organizations of every description are deluged with data from a variety of sources, and attempting to make sense of it all can be overwhelming. So a strong business intelligence (BI) strategy can help organize the flow and ensure business users have access to actionable business insights. “By

IT

IT Business Intelligence Sales Key Performance Indicator

How Data Management and Big Data Analytics Speed Up Business Growth

BizAcuity

APRIL 14, 2022

Big Data technology in today’s world. Did you know that the big data and business analytics market is valued at $198.08 Or that the US economy loses up to $3 trillion per year due to poor data quality? quintillion bytes of data which means an average person generates over 1.5 megabytes of data every second?

Big Data

Big Data Data Analytics Management Analytics

Modern Data Architecture for Telecommunications

Cloudera

SEPTEMBER 6, 2022

Previously, there were three types of data structures in telco: . Entity data sets — i.e. marketing data lakes . The result has been an extraordinary volume of data redundancy across the business, leading to disaggregated data strategy, unknown compliance exposures, and inconsistencies in data-based processes. .

Data Architecture

Data Architecture Cost-Benefit Digital Transformation Business Driver

Data’s dark secret: Why poor quality cripples AI and growth

Visualize data quality scores and metrics generated by AWS Glue Data Quality

Webinars

Trending Sources

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

Webinars

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

Data architecture strategy for data quality

Steps taken to build Sevita’s first enterprise data platform

Manage concurrent write conflicts in Apache Iceberg on the AWS Glue Data Catalog

Building a Beautiful Data Lakehouse

Constructing A Digital Transformation Strategy: Putting the Data in Digital Transformation

Data Architecture and Strategy in the AI Era

Analyzing the business-case approach Perdue Farms takes to derive value from data

Create an end-to-end data strategy for Customer 360 on AWS

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

Navigating the Chaos of Unruly Data: Solutions for Data Teams

Data governance in the age of generative AI

Empowering data-driven excellence: How the Bluestone Data Platform embraced data mesh for success

Use open table format libraries on AWS Glue 5.0 for Apache Spark

What is Dark Data, Why Does it Matter, and Why Are Humans Still Needed?

Accomplish Agile Business Intelligence & Analytics For Your Business

Decentralize LF-tag management with AWS Lake Formation

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

How BMO improved data security with Amazon Redshift and AWS Lake Formation

Don’t Fear Artificial Intelligence; Embrace it Through Data Governance

AWS Lake Formation 2022 year in review

CIOs rise to the ESG reporting challenge

Your 5-Step Journey from Analytics to AI

Governing data in relational databases using Amazon DataZone

Putting the Business Back Into Business Innovation

Straumann Group is transforming dentistry with data, AI

How AWS helped Altron Group accelerate their vision for optimized customer engagement

Making the gen AI and data connection work

How Fujitsu implemented a global data mesh architecture and democratized data

Addressing the Elephant in the Room – Welcome to Today’s Cloudera

Better, faster decisions: Why businesses thrive on real-time data

HEMA accelerates their data governance journey with Amazon DataZone

Differentiate generative AI applications with your data using AWS analytics and managed databases

A data strategy checklist for the journey to the data-driven enterprise

8 tips for unleashing the power of unstructured data

How data literacy allows gen AI to drive productivity at Dow

A comparative assessment of digital transformation in Italy

Data Mesh 101: How Data Mesh Helps Organizations Be Data-Driven and Achieve Velocity

6 BI challenges IT teams must address

How Data Management and Big Data Analytics Speed Up Business Growth

Modern Data Architecture for Telecommunications

Stay Connected