Business Intelligence, Data Architecture and Metadata

Run Apache XTable in AWS Lambda for background conversion of open table formats

AWS Big Data

NOVEMBER 26, 2024

This post was co-written with Dipankar Mazumdar, Staff Data Engineering Advocate with AWS Partner OneHouse. Data architecture has evolved significantly to handle growing data volumes and diverse workloads. In practice, open table formats (OTFs) are used in a broad range of analytical workloads, from business intelligence to machine learning.

Metadata, Data Lake, Snapshot, Data Warehouse

What is a Data Mesh?

DataKitchen

AUGUST 3, 2021

The data mesh design pattern breaks giant, monolithic enterprise data architectures into subsystems or domains, each managed by a dedicated team. The past decades of enterprise data platform architectures can be summarized in 69 words. Introduction to Data Mesh. Source: Thoughtworks.

Data Architecture, Data Lake, Cost-Benefit, Data Warehouse

Cloudera and Snowflake Partner to Deliver the Most Comprehensive Open Data Lakehouse

Cloudera

OCTOBER 23, 2024

In August, we wrote about how in a future where distributed data architectures are inevitable, unifying and managing operational and business metadata is critical to successfully maximizing the value of data, analytics, and AI.

Metadata, Data Lake, Dashboards, Interactive

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

In this post, we show you how EUROGATE uses AWS services, including Amazon DataZone, to make data discoverable by data consumers across different business units so that they can innovate faster. From here, the metadata is published to Amazon DataZone by using AWS Glue Data Catalog.

IoT, Machine Learning, Metadata, Data-driven

How Metadata Makes Data Meaningful

erwin

DECEMBER 12, 2019

Metadata is an important part of data governance, and as a result, most nascent data governance programs are rife with project plans for assessing and documenting metadata. But in many scenarios, it seems that the underlying driver of metadata collection projects is that it’s just something you do for data governance.

Metadata, Data Governance, Digital Transformation, Data Quality

How HPE Aruba Supply Chain optimized cost and performance by migrating to an AWS modern data architecture

AWS Big Data

SEPTEMBER 11, 2024

This post describes how HPE Aruba automated their Supply Chain management pipeline, and re-architected and deployed their data solution by adopting a modern data architecture on AWS. Each file arrives as a pair with a tail metadata file in CSV format containing the size and name of the file.

Data Architecture, Optimization, Data Warehouse, Metadata

Data’s dark secret: Why poor quality cripples AI and growth

CIO Business Intelligence

APRIL 8, 2025

We also examine how centralized, hybrid and decentralized data architectures support scalable, trustworthy ecosystems. As data-centric AI, automated metadata management and privacy-aware data sharing mature, the opportunity to embed data quality into the enterprise’s core has never been more significant.

Data Quality, Data-driven, Key Performance Indicator, Metadata

Breaking State and Local Data Silos with Modern Data Architectures

Cloudera

AUGUST 30, 2022

Modern data architectures. To eliminate or integrate these silos, the public sector needs to adopt robust data management solutions that support modern data architectures (MDAs). Deploying modern data architectures. Lack of sharing hinders the elimination of fraud, waste, and abuse (Forrester).

Data Architecture, Data Lake, Data Warehouse, Metadata

5 Ways Data Modeling Is Critical to Data Governance

erwin

JANUARY 9, 2020

That’s because it’s the best way to visualize metadata, and metadata is now the heart of enterprise data management and data governance/intelligence efforts. So here’s why data modeling is so critical to data governance. erwin Data Modeler: Where the Magic Happens.

Data Governance, Modeling, Metadata, Unstructured Data

Top 10 Metadata Management Influencers, Sites, and Blogs You Must Follow in 2021

Octopai

APRIL 19, 2021

Aptly named, metadata management is the process in which BI and Analytics teams manage metadata, which is the data that describes other data. In other words, data is the content and metadata is the context. Without metadata, BI teams are unable to understand the data’s full story.

Metadata, Management, Business Intelligence, Data Governance

Data architecture strategy for data quality

IBM Big Data Hub

JANUARY 5, 2023

But there’s another factor of data quality that doesn’t get the recognition it deserves: your data architecture. How the right data architecture improves data quality. What does a modern data architecture do for your business? Reduce data duplication and fragmentation.

Data Architecture, Data Quality, Strategy, Data Lake

What is a data architect? Skills, salaries, and how to become a data framework master

CIO Business Intelligence

OCTOBER 13, 2023

Data architecture is a complex and varied field and different organizations and industries have unique needs when it comes to their data architects. Solutions data architect: These individuals design and implement data solutions for specific business needs, including data warehouses, data marts, and data lakes.

Data Architecture, Data Warehouse, Statistics, Visualization

Making OT-IT integration a reality with new data architectures and generative AI

CIO Business Intelligence

FEBRUARY 20, 2024

Here, industrial knowledge graphs are going to prove vital by enabling manufacturers to combine structured and unstructured data from a wide range of operational and enterprise software systems to drive better decision-making, problem-solving and more advanced automation.”

Data Architecture, Unstructured Data, Manufacturing, IT

What is data governance? Best practices for managing data assets

CIO Business Intelligence

MARCH 24, 2023

The program must introduce and support standardization of enterprise data. Programs must support proactive and reactive change management activities for reference data values and the structure/use of master data and metadata.

Data Governance, Management, Metadata, Data Quality

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

Over the years, data lakes on Amazon Simple Storage Service (Amazon S3) have become the default repository for enterprise data and are a common choice for a large set of users who query data for a variety of analytics and machine learning use cases. Analytics use cases on data lakes are always evolving.

Data Lake, Metadata, Snapshot, Recreation/Entertainment

A Day in the Life of a DataOps Engineer

DataKitchen

OCTOBER 11, 2021

First, you must understand the existing challenges of the data team, including the data architecture and end-to-end toolchain. Monitoring Job Metadata. Monitoring and tracking is an essential feature that many data teams are looking to add to their pipelines. Second, you must establish a definition of “done.”

Testing, Metadata, Dashboards, Statistics

Top analytics announcements of AWS re:Invent 2024

AWS Big Data

FEBRUARY 26, 2025

Amazon SageMaker Lakehouse provides an open data architecture that reduces data silos and unifies data across Amazon Simple Storage Service (Amazon S3) data lakes, Redshift data warehouses, and third-party and federated data sources. connection testing, metadata retrieval, and data preview.

Analytics, Data Lake, Metadata, Data Warehouse

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

Despite the potential separation of storage and compute in terms of architecture, they are often effectively fused together. This amalgamation empowers vendors with authority over a diverse range of workloads by virtue of owning the data. execute(). Remove old metadata files: Iceberg keeps track of table metadata using JSON files.

Data Lake, Metadata, Snapshot, Analytics

SAP enhances Datasphere and SAC for AI-driven transformation

CIO Business Intelligence

MARCH 6, 2024

The company is expanding its partnership with Collibra to integrate Collibra’s AI Governance platform with SAP data assets to facilitate data governance for non-SAP data assets in customer environments. “We are also seeing customers bringing in other data assets from other apps or data sources.

Unstructured Data, Dashboards, Business Intelligence, Data Governance

SAP Datasphere review: turning data from a technical problem to a business data product.

Jen Stirrup

MARCH 29, 2023

SAP Datasphere helps eliminate hidden data debt within organizations, enabling customers to build a business data fabric architecture that quickly delivers meaningful data with business context and logic intact. Business Intelligence is often a search problem in disguise.

Data Warehouse, Metadata, Data Integration, Business Intelligence

Data democratization: How data architecture can drive business decisions and AI initiatives

IBM Big Data Hub

AUGUST 4, 2023

Today, the way businesses use data is much more fluid; data-literate employees use data across hundreds of apps, analyze data for better decision-making, and access data from numerous locations. It uses knowledge graphs, semantics and AI/ML technology to discover patterns in various types of metadata.

Data Architecture, Data Lake, Machine Learning, Data Governance

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

Cloudera

JUNE 30, 2022

With Cloudera’s vision of hybrid data, enterprises adopting an open data lakehouse can easily get application interoperability and portability to and from on-premises environments and any public cloud without worrying about data scaling. Why integrate Apache Iceberg with Cloudera Data Platform?

Data Lake, Data Warehouse, Data Architecture, Metadata

Building a Beautiful Data Lakehouse

CIO Business Intelligence

MARCH 9, 2022

They conveniently store data in a flat architecture that can be queried in aggregate and offer the speed and lower cost required for big data analytics. On the other hand, they don’t support transactions or enforce data quality. If only there were a best-of-both-worlds compromise.

Data Lake, Unstructured Data, Data Warehouse, Big Data

How the right data and AI foundation can empower a successful ESG strategy

IBM Big Data Hub

APRIL 10, 2023

A well-designed data architecture should support business intelligence and analysis, automation, and AI—all of which can help organizations to quickly seize market opportunities, build customer value, drive major efficiencies, and respond to risks such as supply chain disruptions.

Strategy, Data Architecture, Cost-Benefit, Reporting

Extracting key insights from Amazon S3 access logs with AWS Glue for Ray

AWS Big Data

SEPTEMBER 7, 2023

AWS Glue Data Catalog stores information as metadata tables, where each table specifies a single data store. The AWS Glue crawler writes metadata to the Data Catalog by classifying the data to determine the format, schema, and associated properties of the data.

Metadata, Dashboards, Metrics, Visualization

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

AWS Big Data

AUGUST 31, 2023

Amazon Redshift is a fast, fully managed petabyte-scale cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing business intelligence (BI) tools. Iceberg stores the metadata pointer for all the metadata files.

Data Lake, Data Warehouse, Metadata, Data Architecture

The Future Is Hybrid Data, Embrace It

CIO Business Intelligence

JUNE 23, 2022

Companies can now capitalize on the value in all their data, by delivering a hybrid data platform for modern data architectures with data anywhere. Cloudera Data Platform (CDP) is designed to address the critical requirements for modern data architectures today and tomorrow.

IT, Data Architecture, Unstructured Data, Big Data

Lay the groundwork now for advanced analytics and AI

CIO Business Intelligence

AUGUST 3, 2023

“You had to be an expert in the programming language that interacts with that data, and understand the relationships of each data element within each data source, let alone understand its relation to elements in other data sources,” he says. Without those templates, it’s hard to add such information after the fact.”

Analytics, Data Lake, Metadata, Cost-Benefit

Enterprise Data Management — Driving Large-Scale Change in Your Organization

Sisense

JULY 6, 2020

First off, this involves defining workflows for every business process within the enterprise: the what, how, why, who, when, and where aspects of data. These regulations, ultimately, ensure key business values: data consistency, quality, and trustworthiness. Benefits of enterprise data management.

Enterprise, Management, Data Architecture, Data-driven

The Future of the Data Lakehouse – Open

CIO Business Intelligence

JUNE 23, 2022

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission-critical, large-scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. On data warehouses and data lakes.

Data Lake, Data Warehouse, Machine Learning, Data-driven

Choosing an open table format for your transactional data lake on AWS

AWS Big Data

JUNE 9, 2023

A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.

Data Lake, Metadata, Statistics, Optimization

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

However, as data processing at scale solutions grow, organizations need to build more and more features on top of their data lakes. Apache Iceberg overview Iceberg is an open-source table format that brings the power of SQL tables to big data files. The Iceberg table is synced with the AWS Glue Data Catalog.

Data Lake, Sales, Data Warehouse, Snapshot

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

In 2013, Amazon Web Services revolutionized the data warehousing industry by launching Amazon Redshift, the first fully-managed, petabyte-scale, enterprise-grade cloud data warehouse. Amazon Redshift made it simple and cost-effective to efficiently analyze large volumes of data using existing business intelligence tools.

Data Warehouse, Analytics, Data Lake, Machine Learning

Design a data mesh on AWS that reflects the envisioned organization

AWS Big Data

JANUARY 22, 2024

The majority of data produced by these accounts is used downstream for business intelligence (BI) purposes and in Amazon Athena, by hundreds of business users every day. The solution Acast implemented is a data mesh, architected on AWS.

Data-driven, Advertising, Metadata, Data Architecture

The Future of the Data Lakehouse – Open

Cloudera

JUNE 18, 2022

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission-critical, large-scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. On data warehouses and data lakes.

Data Lake, Data Warehouse, Machine Learning, Data-driven

How Finance is Leveraging Automated Data Lineage for Regulations Compliance

Octopai

APRIL 8, 2020

While there are many factors that led to this event, one critical dynamic was the inadequacy of the data architectures supporting banks and their risk management systems. Let’s examine how these processes failed, what regulations were put in place as a result, and what this means for business intelligence teams today.

Finance, Cost-Benefit, Metadata, Data Architecture

Usability and Connecting Threads: How Data Fabric Makes Sense Out of Disparate Data

Ontotext

AUGUST 4, 2023

A data fabric utilizes an integrated data layer over existing, discoverable, and inferenced metadata assets to support the design, deployment, and utilization of data across enterprises, including hybrid and multi-cloud platforms. It also helps capture and connect data based on business or domains.

Metadata, Data-driven, Data Architecture, Data Quality

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

Overview of solution As a data-driven company, smava relies on the AWS Cloud to power their analytics use cases. smava ingests data from various external and internal data sources into a landing stage on the data lake based on Amazon Simple Storage Service (Amazon S3). This is the Data Mart stage.

Data Lake, Data Warehouse, Data-driven, B2B

The Key to Faster Impact Analysis: Automated Data Lineage

Octopai

JUNE 15, 2020

Business intelligence databases are dynamic repositories that must often be adjusted based on organizational and/or regulatory requirements. With the insurance company’s current data architecture, the process would have no chance of being completed in time for the change. How can BI teams reduce this laborious process?

Insurance, Metadata, Business Intelligence, Data Warehouse

How Huron built an Amazon QuickSight Asset Catalogue with AWS CDK Based Deployment Pipeline

AWS Big Data

APRIL 26, 2023

Having an accurate and up-to-date inventory of all technical assets helps an organization ensure it can keep track of all its resources with metadata information such as their assigned owners, last updated date, used by whom, how frequently and more. This is a guest blog post co-written with Corey Johnson from Huron.

Metadata, Dashboards, Visualization, Consulting

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. These data pipelines generate valuable insights and curated data that are stored in Apache Iceberg tables for downstream usage.

Data Lake, Analytics, Snapshot, Data Quality

The Cloud Connection: How Governance Supports Security

Alation

APRIL 14, 2022

In today’s AI/ML-driven world of data analytics, explainability needs a repository just as much as those doing the explaining need access to metadata, e.g., information about the data being used. The Cloud Data Migration Challenge. On-premises business intelligence and databases. Cloud governance.

Metadata, Data Governance, Data-driven, Modeling

Data integrity vs. data quality: Is there a difference?

IBM Big Data Hub

JULY 13, 2023

The more complete, accurate and consistent a dataset is, the more informed business intelligence and business processes become. Geocoding Geocoding is the process of adding location metadata to an organization’s datasets. Learn more about designing the right data architecture to elevate your data quality here.

Data Quality, Data Integration, Metadata, Cost-Benefit
