We also examine how centralized, hybrid, and decentralized data architectures support scalable, trustworthy ecosystems. As data-centric AI, automated metadata management, and privacy-aware data sharing mature, the opportunity to embed data quality into the enterprise's core has never been more significant.
Need for a data mesh architecture: Because entities in the EUROGATE group generate vast amounts of data from various sources, spanning departments, locations, and technologies, the traditional centralized data architecture struggles to keep up with the demands for real-time insights, agility, and scalability.
To improve the way they model and manage risk, institutions must modernize their data management and data governance practices. Implementing a modern data architecture makes it possible for financial institutions to break down legacy data silos, simplifying data management, governance, and integration, and driving down costs.
This post describes how HPE Aruba automated their Supply Chain management pipeline, and re-architected and deployed their data solution by adopting a modern data architecture on AWS. Each file arrives as a pair with a tail metadata file in CSV format containing the size and name of the file.
First, you must understand the existing challenges of the data team, including the data architecture and end-to-end toolchain. Monitoring Job Metadata. Monitoring and tracking is an essential feature that many data teams are looking to add to their pipelines. Second, you must establish a definition of “done.”
This solution only replicates metadata in the Data Catalog, not the actual underlying data. To have a redundant data lake using Lake Formation and AWS Glue in an additional Region, we recommend replicating the Amazon S3-based storage using S3 replication, S3 sync, aws-s3-copy-sync-using-batch, or the S3 Batch Replication process.
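As a concrete starting point, here is a hedged boto3 sketch of enabling S3 replication to a second Region; the bucket names and IAM role ARN are placeholders, and both buckets must already have versioning enabled:

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_replication(
        Bucket="primary-data-lake",  # hypothetical source bucket
        ReplicationConfiguration={
            "Role": "arn:aws:iam::123456789012:role/S3ReplicationRole",  # placeholder
            "Rules": [{
                "ID": "replicate-lake",
                "Priority": 1,
                "Filter": {},  # empty filter: replicate every object
                "Status": "Enabled",
                "Destination": {"Bucket": "arn:aws:s3:::dr-data-lake"},
                "DeleteMarkerReplication": {"Status": "Disabled"},
            }],
        },
    )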
The challenge today is to think more broadly about what these data things could or should be. It’s important to realize that we need visibility into lineage and relationships between all data and data-related assets, including business terms, metric definitions, policies, quality rules, access controls, algorithms, etc.
The AWS Glue Data Catalog is a metastore of the location, schema, and runtime metrics of your data. AWS Glue Data Catalog stores information as metadata tables, where each table specifies a single data store. Running the crawler on a schedule updates AWS Glue Data Catalog with new partitions and metadata.
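By way of illustration, a hedged boto3 sketch of creating a crawler on a schedule; the crawler name, role ARN, database, and S3 path are all placeholders:

    import boto3

    glue = boto3.client("glue")

    # Create a crawler that catalogs an S3 prefix and runs daily at 02:00 UTC,
    # so new partitions and metadata land in the Data Catalog automatically.
    glue.create_crawler(
        Name="sales_crawler",  # hypothetical crawler name
        Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # placeholder role ARN
        DatabaseName="sales_db",  # target Data Catalog database
        Targets={"S3Targets": [{"Path": "s3://example-bucket/sales/"}]},
        Schedule="cron(0 2 * * ? *)",
    )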
While traditional extract, transform, and load (ETL) processes have long been a staple of data integration due to their flexibility, for common use cases such as replication and ingestion they often prove time-consuming, complex, and less adaptable to the fast-changing demands of modern data architectures.
Amazon SageMaker Lakehouse provides an open data architecture that reduces data silos and unifies data across Amazon Simple Storage Service (Amazon S3) data lakes, Redshift data warehouses, and third-party and federated data sources, along with connection testing, metadata retrieval, and data preview.
BladeBridge offers a comprehensive suite of tools that automate much of the complex conversion work, allowing organizations to quickly and reliably transition their data analytics capabilities to the scalable Amazon Redshift data warehouse. Amazon Redshift is a fully managed data warehouse service offered by Amazon Web Services (AWS).
Since Apache Iceberg is well supported by AWS data services and Cloudinary was already using Spark on Amazon EMR, they could integrate writing to the Data Catalog and start an additional Spark cluster to handle data maintenance and compaction. For example, for certain queries, Athena runtime was 2x–4x faster than Snowflake.
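As an illustration of that maintenance work, a minimal PySpark sketch using Iceberg's built-in table procedures, assuming the Spark session is configured with an Iceberg catalog named glue_catalog (the database and table names are hypothetical):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("iceberg-maintenance").getOrCreate()

    # Compact small data files into larger ones to keep query planning fast.
    spark.sql("CALL glue_catalog.system.rewrite_data_files(table => 'media_db.assets')")

    # Expire old snapshots afterwards to reclaim storage.
    spark.sql("CALL glue_catalog.system.expire_snapshots(table => 'media_db.assets')")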
At the same time, they need to optimize operational costs to unlock the value of this data for timely insights, and to do so with consistent performance. With this massive data growth, data proliferation across your data stores, data warehouses, and data lakes can become equally challenging.
When conducted manually, however, which was the normal mode of operation before companies adopted automation or machine learning data lineage solutions, data lineage can be extremely tedious and time-consuming for BI and analytics teams. A key piece of legislation that emerged from that crisis was BCBS 239.
Invest in maturing and improving your enterprise business metrics and metadata repositories, a multitiered data architecture, continuously improving data quality, and managing data acquisitions. Then back this up by embedding compliance and security protocols throughout the insights generation cycle.
It is a replicated, highly available service that is responsible for managing the metadata for all objects stored in Ozone. As Ozone scales to exabytes of data, it is important to ensure that Ozone Manager can perform at scale. The tool reads only the metadata for objects in a cluster with around 100 million keys.
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. Metadata tables offer insights into the physical data storage layout of the tables and offer the convenience of querying them with Athena version 3.
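For example, a hedged boto3 sketch of querying an Iceberg metadata table from Athena engine version 3; the database, table, and query-results location are placeholders:

    import boto3

    athena = boto3.client("athena")

    # Iceberg exposes metadata tables such as "table$files" that Athena v3 can query
    # directly, revealing the physical layout of the data files.
    athena.start_query_execution(
        QueryString='SELECT file_path, record_count FROM "analytics_db"."events$files"',
        ResultConfiguration={"OutputLocation": "s3://example-query-results/"},
    )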
While there are many factors that led to this event, one critical dynamic was the inadequacy of the data architectures supporting banks and their risk management systems. It required banks to maintain a data architecture supporting risk aggregation at all times. These tools extract, sort, and integrate thousands of metrics.
It seamlessly consolidates data from various data sources within AWS, including AWS Cost Explorer (and forecasting with Cost Explorer ), AWS Trusted Advisor , and AWS Compute Optimizer. Data providers and consumers are the two fundamental users of a CDH dataset.
Many organizations already use AWS Glue Data Quality to define and enforce data quality rules on their data, validate data against predefined rules, track data quality metrics, and monitor data quality over time using artificial intelligence (AI). The excerpt ends with a truncated Spark read, which is completed in the sketch below.
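A minimal PySpark completion of the truncated option(...) chain, assuming the input is a CSV file; the S3 path is a placeholder:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Read a CSV whose first row holds column names, letting Spark infer column types.
    df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("s3://example-bucket/raw/input.csv"))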
A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.
The Analytics specialty practice of AWS Professional Services (AWS ProServe) helps customers across the globe with modern data architecture implementations on the AWS Cloud. The File Manager Lambda function consumes those messages, parses the metadata, and inserts the metadata into the DynamoDB table odpf_file_tracker.
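A hedged sketch of that Lambda's core logic: consume queued messages, parse the file metadata, and record it in the odpf_file_tracker table. The message shape and attribute names are assumptions; only the table name comes from the excerpt:

    import json
    import boto3

    table = boto3.resource("dynamodb").Table("odpf_file_tracker")

    def handler(event, context):
        for record in event["Records"]:        # assumed SQS batch
            meta = json.loads(record["body"])  # assumed JSON message body
            table.put_item(Item={
                "file_name": meta["file_name"],  # hypothetical attribute names
                "file_size": meta["file_size"],
                "status": "RECEIVED",
            })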
As a reminder, here’s Gartner’s definition of data fabric: “A design concept that serves as an integrated layer (fabric) of data and connecting processes. In this blog, we will focus on the “integrated layer” part of this definition by examining each of the key layers of a comprehensive data fabric in more detail.
The program recognizes organizations that are using Cloudera’s platform and services to unlock the power of data, with massive business and social impact. Cloudera’s data superheroes design modern data architectures that work across hybrid and multi-cloud and solve complex data management and analytic use cases spanning from the Edge to AI.
Parameters of success: Acast succeeded in bootstrapping and scaling a new team- and domain-oriented data product and its corresponding infrastructure and setup, resulting in less friction in gathering insights and happier users and consumers. In this approach, teams responsible for generating data are referred to as producers.
Overview of solution: As a data-driven company, smava relies on the AWS Cloud to power their analytics use cases. smava ingests data from various external and internal data sources into a landing stage on the data lake based on Amazon Simple Storage Service (Amazon S3).
You can plan a period of parallel runs, where the legacy and new systems run in parallel and the data is compared daily. Use functional queries to compare high-level aggregated business metrics between the source on-premises database and the target data lake, as in the sketch below.
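A generic Python sketch of that daily check, assuming both systems expose a DB-API-style execute() helper; the query, connection objects, and tolerance are hypothetical:

    def compare_daily_revenue(legacy_conn, lake_conn, tolerance=0.001):
        # Run the same aggregate against both systems and flag drift.
        query = "SELECT SUM(amount) FROM orders WHERE order_date = CURRENT_DATE"
        legacy_total = legacy_conn.execute(query).fetchone()[0]
        lake_total = lake_conn.execute(query).fetchone()[0]
        drift = abs(legacy_total - lake_total) / max(abs(legacy_total), 1e-9)
        if drift > tolerance:
            raise ValueError(
                f"Parallel-run mismatch: legacy={legacy_total}, lake={lake_total}")
        return drift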
Profile aggregation: when you’ve uniquely identified a customer, you can build applications in Managed Service for Apache Flink to consolidate all their metadata, from name to interaction history. Then, you transform this data into a concise format.
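The production pipeline described runs in Managed Service for Apache Flink; the plain-Python sketch below only illustrates the consolidation logic, and every field name is an assumption:

    from collections import defaultdict

    def consolidate_profiles(events):
        profiles = defaultdict(lambda: {"name": None, "interactions": []})
        for e in events:
            p = profiles[e["customer_id"]]
            p["name"] = p["name"] or e.get("name")
            p["interactions"].append({"ts": e["timestamp"], "type": e["event_type"]})
        # Concise format: keep only the five most recent interactions per customer.
        for p in profiles.values():
            p["interactions"] = sorted(p["interactions"], key=lambda i: i["ts"])[-5:]
        return dict(profiles)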
Kinesis Data Streams has native integrations with other AWS services such as AWS Glue and Amazon EventBridge to build real-time streaming applications on AWS. Refer to Amazon Kinesis Data Streams integrations for additional details. Lambda is good for event-based and stateless processing.
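As a minimal illustration of that event-based, stateless processing, a hedged Python Lambda handler that decodes Kinesis records; the stream wiring and payload format are assumptions:

    import base64
    import json

    def handler(event, context):
        # Kinesis delivers record data base64-encoded inside the Lambda event.
        for record in event["Records"]:
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            print(f"partition key {record['kinesis']['partitionKey']}: {payload}")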
With fast and fine-grained scaling in EMR Serverless, if a pipeline runs daily and needs to process 1 GB of data one day and 100 GB of data another day, EMR Serverless automatically scales to handle that load. Monitoring – EMR Serverless sends metrics to Amazon CloudWatch at the application and job level every minute.
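For context, submitting work to EMR Serverless can look like the following hedged boto3 sketch; the application ID, role ARN, and script path are placeholders:

    import boto3

    emr = boto3.client("emr-serverless")

    # EMR Serverless provisions and scales workers for this run automatically,
    # whether the day's input is 1 GB or 100 GB.
    emr.start_job_run(
        applicationId="00example123",  # placeholder application ID
        executionRoleArn="arn:aws:iam::123456789012:role/EmrServerlessJobRole",
        jobDriver={"sparkSubmit": {"entryPoint": "s3://example-bucket/jobs/etl.py"}},
    )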
Stream processing, however, can enable the chatbot to access real-time data and adapt to changes in availability and price, providing the best guidance to the customer and enhancing the customer experience. When the model finds an anomaly or abnormal metric value, it should immediately produce an alert and notify the operator.
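A generic sketch of that alerting behavior: flag a metric value that deviates sharply from its rolling history. The window size and threshold are assumptions, and the alert is simply printed:

    from collections import deque
    from statistics import mean, stdev

    window = deque(maxlen=60)  # rolling window of the last 60 observations

    def check_metric(value, threshold=3.0):
        # Alert when the new value sits more than `threshold` standard
        # deviations away from the rolling mean.
        if len(window) >= 30 and stdev(window) > 0:
            z = (value - mean(window)) / stdev(window)
            if abs(z) > threshold:
                print(f"ALERT: value {value} is {z:.1f} sigma from the rolling mean")
        window.append(value)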
Think of it as something that houses the metrics used to power daily, weekly, or monthly business KPIs (roll-ups of many rows of data). Modeling Your Data for Performance. Data architecture. The data landscape has changed significantly over the last two decades. Redshift is a type of OLAP database.
More specifically, it describes the process of creating, administering, and adapting a comprehensive plan for how an organization’s data will be managed. In this way, data governance has implications for a wide range of data management disciplines, including dataarchitecture, quality, security, metadata, and more.
This is the same for scope, outcomes/metrics, practices, organization/roles, and technology. Check this out: The Foundation of an Effective Data and Analytics Operating Model — Presentation Materials. Most D&A concerns and activities are handled within EA in the Info/Data architecture domain/phases.
As a result, end users can better view shared metrics (backed by accurate data), which ultimately drives performance. When treating a patient, a doctor may wish to study the patient’s vital metrics in comparison to those of their peer group. Visual analytics: users are given data from which they can uncover new insights.
On the other hand, DataOps Observability refers to understanding the state and behavior of data as it flows through systems. It allows organizations to see how data is being used, where it is coming from, and how it is being transformed. Data lineage is static and often lags by weeks or months. Are there problems with data tests?