We want to publish this data to Amazon DataZone as discoverable S3 data, following the custom subscription workflow shown in the architecture diagram. To implement the solution, we complete the following steps: as a data producer, publish an unstructured S3-based data asset as S3ObjectCollectionType to Amazon DataZone.
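That publishing step can be scripted against the DataZone API. Below is a minimal sketch using boto3; the domain, project, and form identifiers are placeholders, and it assumes the custom S3ObjectCollectionType asset type is already registered in the domain.

```python
import boto3

datazone = boto3.client("datazone")

# Publish an unstructured S3-based asset using a custom asset type.
# All identifiers below are placeholders for values from your own domain.
response = datazone.create_asset(
    domainIdentifier="dzd_example123",          # hypothetical domain ID
    owningProjectIdentifier="prj_example456",   # hypothetical producer project
    name="raw-sensor-objects",
    typeIdentifier="S3ObjectCollectionType",    # custom type from this post
    externalIdentifier="s3://example-bucket/sensor-data/",
    formsInput=[
        {
            "formName": "S3ObjectCollectionForm",  # assumed custom form name
            "typeIdentifier": "S3ObjectCollectionFormType",
            "content": '{"bucketArn": "arn:aws:s3:::example-bucket"}',
        }
    ],
)
print(response["id"])
```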
To achieve this, EUROGATE designed an architecture that uses Amazon DataZone to publish specific digital twin data sets, enabling access to them with SageMaker in a separate AWS account. From here, the metadata is published to Amazon DataZone through the AWS Glue Data Catalog. This process is shown in the following figure.
Next, we focus on building the enterprise data platform where the accumulated data will be hosted. The enterprise data platform is used to host and analyze the sales data and identify customer demand. Business analysts enhance the data with business metadata and glossaries and publish it as data assets or data products.
For instance, Domain A will have the flexibility to create data products that can be published to the divisional catalog, while also maintaining the autonomy to develop data products that are exclusively accessible to teams within the domain. A data portal lets consumers discover data products and access their associated metadata.
Solution overview AWS AppSync creates serverless GraphQL and pub/sub APIs that simplify application development through a single endpoint to securely query, update, or publish data. For simplicity, we use the Hosting with Amplify Console and Manual Deployment options. All the resources are now deployed on AWS and ready for use.
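To illustrate the single-endpoint model, this hypothetical sketch posts a GraphQL query to an AppSync API using API-key auth; the endpoint, key, and schema field are assumptions, not values from this solution.

```python
import requests

# Hypothetical AppSync endpoint and API key; replace with your own.
APPSYNC_URL = "https://example.appsync-api.us-east-1.amazonaws.com/graphql"
API_KEY = "da2-examplekey"

query = """
query ListItems {
  listItems {        # assumes a listItems field exists in your schema
    id
    name
  }
}
"""

resp = requests.post(
    APPSYNC_URL,
    json={"query": query},
    headers={"x-api-key": API_KEY},
    timeout=10,
)
resp.raise_for_status()
print(resp.json()["data"])
```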
Kinesis Data Streams not only offers the flexibility to use many out-of-box integrations to process the data published to the streams, but also provides the capability to build custom stream processing applications that can be deployed on your compute fleet. KCL uses DynamoDB to store metadata such as shard-worker mapping and checkpoints.
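For contrast with KCL, here is a minimal custom consumer sketch using boto3 that reads a single shard directly; a real KCL application adds the DynamoDB-backed shard leasing and checkpointing described above. The stream name is a placeholder.

```python
import time
import boto3

kinesis = boto3.client("kinesis")
STREAM = "example-stream"  # placeholder stream name

# Read a single shard from the oldest available record. KCL would instead
# distribute shards across workers and checkpoint progress in DynamoDB.
shard_id = kinesis.list_shards(StreamName=STREAM)["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=STREAM,
    ShardId=shard_id,
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]

while iterator:
    out = kinesis.get_records(ShardIterator=iterator, Limit=100)
    for record in out["Records"]:
        print(record["Data"])
    iterator = out.get("NextShardIterator")
    time.sleep(1)  # stay under the per-shard read limits
```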
The CDH is used to create, discover, and consume data products through a central metadata catalog, while enforcing permission policies and tightly integrating data engineering, analytics, and machine learning services to streamline the user journey from data to insight.
The retail team, acting as the data producer, publishes the necessary data assets to Amazon DataZone, allowing you, as a consumer, to discover and subscribe to these assets. Publish data assets – As the data producer from the retail team, you must ingest individual data assets into Amazon DataZone.
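The consumer side can also go through the API. A minimal sketch of a subscription request with boto3, with placeholder domain, listing, and project identifiers:

```python
import boto3

datazone = boto3.client("datazone")

# Request a subscription to a published listing; identifiers are placeholders.
response = datazone.create_subscription_request(
    domainIdentifier="dzd_example123",
    requestReason="Analytics team needs retail sales data",
    subscribedListings=[{"identifier": "listing_example789"}],
    subscribedPrincipals=[{"project": {"identifier": "prj_consumer001"}}],
)
print(response["status"])
```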
With the ability to browse metadata, you can understand the structure and schema of the data source, identify relevant tables and fields, and discover useful data assets you may not be aware of. For Host, enter the host name of your Aurora PostgreSQL database cluster. On your project, in the navigation pane, choose Data.
The sample solution relies on access to a public S3 bucket hosted for this blog, so egress rules and permissions modifications may be required if you use S3 endpoints. In a series of follow-up posts, we will review the source code and walk through published examples of the Lambda ingestion framework in the AWS Samples GitHub repo.
This post explains how you can extend the governance capabilities of Amazon DataZone to data assets hosted in relational databases based on MySQL, PostgreSQL, Oracle, or SQL Server engines. Second, the data producer needs to consolidate the data asset’s metadata in the business catalog and enrich it with business metadata.
The following diagram illustrates an indexing flow involving a metadata update in OR1. During indexing operations, individual documents are indexed into Lucene and also appended to a write-ahead log, also known as a translog. The primary copy then uploads the resulting segments to remote storage, and the replica copies subsequently download the newer segments and make them searchable.
But there’s a host of new challenges when it comes to managing AI projects: more unknowns, non-deterministic outcomes, new infrastructures, new processes and new tools. You might establish a baseline by replicating collaborative filtering models published by teams that built recommenders for MovieLens, Netflix, and Amazon.
In this blog, we discuss the technical challenges faced by Cargotec in replicating their AWS Glue metadata across AWS accounts, and how they navigated these challenges successfully to enable cross-account data sharing. Solution overview Cargotec required a single catalog per account that contained metadata from their other AWS accounts.
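A simplified way to picture the replication is copying table definitions between catalogs with boto3, as in the sketch below; Cargotec's actual solution is more involved, and the profiles and database name here are placeholders.

```python
import boto3

# Clients for the source and target accounts; in practice these would use
# cross-account IAM roles rather than two local profiles.
source_glue = boto3.Session(profile_name="source-account").client("glue")
target_glue = boto3.Session(profile_name="target-account").client("glue")

DATABASE = "sales_db"  # placeholder database name

# Copy each table definition from the source catalog to the target catalog.
paginator = source_glue.get_paginator("get_tables")
for page in paginator.paginate(DatabaseName=DATABASE):
    for table in page["TableList"]:
        table_input = {
            key: table[key]
            for key in ("Name", "StorageDescriptor", "PartitionKeys",
                        "TableType", "Parameters")
            if key in table
        }
        target_glue.create_table(DatabaseName=DATABASE, TableInput=table_input)
```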
Each service is hosted in a dedicated AWS account and is built and maintained by a product owner and a development team, as illustrated in the following figure. Delta tables' technical metadata is stored in the Data Catalog, which is a native source for creating assets in the Amazon DataZone business catalog.
Datasets used for generating insights are curated using materialized views inside the database and published for business intelligence (BI) reporting. The second streaming data source constitutes metadata information about the call center organization and agents that gets refreshed throughout the day.
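As an illustration of that curation step, here is a minimal sketch that creates a materialized view through the Redshift Data API, assuming the database is Amazon Redshift Serverless; all names are placeholders.

```python
import boto3

# A sketch of curating a dataset as a materialized view; the workgroup,
# database, and table names are hypothetical.
rsd = boto3.client("redshift-data")

sql = """
CREATE MATERIALIZED VIEW agent_call_summary AS
SELECT agent_id, COUNT(*) AS calls, AVG(duration_sec) AS avg_duration
FROM call_events
GROUP BY agent_id;
"""

rsd.execute_statement(
    WorkgroupName="example-workgroup",  # placeholder serverless workgroup
    Database="dev",
    Sql=sql,
)
```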
Data and Metadata: Data inputs and data outputs produced based on the application logic, together with the business and technical metadata for both, which enables data discovery and cross-organizational consensus on the definitions of data assets.
Organizations have multiple Hive data warehouses across EMR clusters, where the metadata gets generated. Centralized catalog for published data – Multiple producers release data currently governed by their respective entities. For consumer access, a centralized catalog where producers can publish their data assets is necessary.
Amazon’s Open Data Sponsorship Program allows organizations to host their data free of charge on AWS. Solution overview Each day, the UK Met Office produces up to 300 TB of weather and climate data, a portion of which is published to ASDI. These datasets are distributed across the world and hosted for public use.
Migration of metadata such as security roles and dashboard objects will be covered in a subsequent post. Update the following information for the source: uncomment hosts and specify the endpoint of the existing OpenSearch Service domain. For now, you can leave the default minimum as 1 and maximum as 4.
An AWS Glue crawler scans data on the S3 bucket and populates table metadata on the AWS Glue Data Catalog. Publish the QuickSight dashboard When the analysis is ready, complete the following steps to publish the dashboard: Choose PUBLISH. Select Publish new dashboard as , and enter GlueObservabilityDashboard.
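The crawler itself can be provisioned programmatically; a minimal boto3 sketch with placeholder names and role:

```python
import boto3

glue = boto3.client("glue")

# Create and start a crawler that scans the S3 bucket and writes table
# metadata to the Data Catalog; bucket, role, and names are placeholders.
glue.create_crawler(
    Name="observability-crawler",
    Role="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    DatabaseName="observability_db",
    Targets={"S3Targets": [{"Path": "s3://example-bucket/metrics/"}]},
)
glue.start_crawler(Name="observability-crawler")
```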
Its cloud-hosted tool manages customer communications to deliver the right messages at times when they can be absorbed. Its platform supports both publishers and advertisers so both can understand which creative work delivers the best results. Pega builds a low-code platform for designing and executing digital marketing campaigns.
This enabled producers to publish data products that were curated and authoritative assets for their domain. The FinAuto team built AWS Cloud Development Kit (AWS CDK), AWS CloudFormation, and API tools to maintain a metadata store that ingests from domain owner catalogs into the global catalog.
Hydro is powered by Amazon MSK and other tools with which teams can move, transform, and publish data at low latency using event-driven architectures. In each environment, Hydro manages a single MSK cluster that hosts multiple tenants with differing workload requirements.
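As a rough illustration of publishing events to such a cluster, here is a minimal producer using the kafka-python library; the broker address and topic are placeholders, and a real MSK client would also configure TLS or IAM authentication.

```python
import json
from kafka import KafkaProducer  # pip install kafka-python

# Placeholder MSK bootstrap broker; real clusters need security settings too.
producer = KafkaProducer(
    bootstrap_servers="b-1.example.kafka.us-east-1.amazonaws.com:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Publish an event to a tenant topic; topic name is hypothetical.
producer.send("tenant-a.orders", {"order_id": 42, "status": "created"})
producer.flush()
```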
It involves reviewing data in detail, comparing and contrasting the data to its own metadata, running statistical models, and producing data quality reports. According to information recently published by Gartner, poor data quality costs businesses an average of $12.9 million a year. Metadata management: good data quality control starts with metadata management.
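To make those checks concrete, here is a minimal profiling sketch in pandas that compares a dataset against hypothetical expected metadata:

```python
import pandas as pd

# A minimal sketch of profiling data against its expected metadata;
# the file, column names, and expected types here are hypothetical.
expected_schema = {"order_id": "int64", "amount": "float64", "region": "object"}

df = pd.read_csv("orders.csv")  # placeholder input file

report = {
    "row_count": len(df),
    "null_counts": df.isna().sum().to_dict(),
    # Compare actual dtypes to the metadata's expected types.
    "type_mismatches": {
        col: str(df[col].dtype)
        for col, expected in expected_schema.items()
        if col in df.columns and str(df[col].dtype) != expected
    },
    "missing_columns": [c for c in expected_schema if c not in df.columns],
}
print(report)
```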
During the query phase of a search request, the coordinator determines the shards to be queried and sends a request to the data node hosting the shard copy. OpenSearch Service utilizes an internal node-to-node communication protocol for replicating write traffic and coordinating metadata updates through an elected leader.
Processors – The intermediate processing units that can filter, transform, and enrich records into a desired format before publishing them to the sink. The processor is an optional component of a pipeline. Sink – The output component of a pipeline. It defines one or more destinations to which a pipeline publishes records.
Amazon API Gateway is a fully managed service that makes it straightforward for developers to create, publish, maintain, monitor, and secure APIs at any scale. The workflow includes the following steps: The end-user accesses the movie search web application, hosted on CloudFront and Amazon S3, from their browser or mobile device.
At a high level, the core of Langley’s architecture is based on a set of Amazon Simple Queue Service (Amazon SQS) queues and AWS Lambda functions, and a dedicated RDS database to store ETL job data and metadata. Web UI Amazon MWAA comes with a managed web server that hosts the Airflow UI.
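A stripped-down version of one such Lambda function might look like the sketch below; the message shape is hypothetical, and a real worker would persist job metadata to the RDS database rather than print it.

```python
import json

def handler(event, context):
    """Process a batch of ETL job messages delivered by Amazon SQS.

    The message fields are hypothetical; a real Langley worker would
    write job metadata to the dedicated RDS database.
    """
    for record in event["Records"]:
        job = json.loads(record["body"])
        # e.g. INSERT INTO etl_jobs (job_id, status) VALUES (%s, %s)
        print(f"processing job {job.get('job_id')} with status {job.get('status')}")
    return {"processed": len(event["Records"])}
```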
SnapLogic published Eight Data Management Requirements for the Enterprise Data Lake. They are: Storage and Data Formats. Ingest and Delivery. Metadata and Governance. The company also recently hosted a webinar on Democratizing the Data Lake with Constellation Research and published 2 whitepapers from Mark Madsen.
erwin also provides data governance, metadata management and data lineage software called erwin Data Intelligence by Quest. It requires discipline, and information in the form of metadata about those being governed so that remedial action can be taken to hold people to account and ensure policies are being followed.
He invited everyone to contribute to Transactions on Graph Data and Knowledge (TGDK) – an open-access journal publishing research about graph data and knowledge. Both speakers talked about common metadata standards and adequate language resources as key enablers of efficient interoperable, multilingual projects.
Non-governmental bodies are also publishing guidance useful to public sector agencies. This year, the World Economic Forum’s AI Governance Alliance published the Presidio AI Framework (PDF). For example, New York City published its own AI Action Plan in October 2023, and formalized its AI principles in March 2024.
Content Enrichment and Metadata Management. The value of metadata for content providers is well-established. When that metadata is connected within a knowledge graph, a powerful mechanism for content enrichment is unlocked. Ontotext Platform can be employed for a number of applications within an enterprise.
The data product is not just the data itself, but a bunch of metadata that surrounds it; the simple stuff like schema is a given. It is also agnostic to where the different domains are hosted. The teams would then “publish” specific tables within their namespaces as publicly referenceable.
Iceberg employs internal metadata management that keeps track of data and empowers a set of rich features at scale. Data can be organized into three different zones, as shown in the following figure. The transformed zone is an enterprise-wide zone that hosts cleaned and transformed data in order to serve multiple teams and use cases.
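To make the metadata layer concrete, the sketch below inspects an Iceberg table's schema and current snapshot with the pyiceberg library; the Glue catalog configuration and table name are placeholders.

```python
from pyiceberg.catalog import load_catalog  # pip install pyiceberg

# Load an Iceberg table through a Glue-backed catalog; names are placeholders.
catalog = load_catalog("default", **{"type": "glue"})
table = catalog.load_table("transformed_zone.orders")

# Iceberg's own metadata tracks schema, partitioning, and snapshots.
print(table.schema())
snapshot = table.current_snapshot()
if snapshot is not None:
    print(snapshot.snapshot_id, snapshot.timestamp_ms)
```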
This enables our customers to work with a rich, user-friendly toolset to manage a graph composed of billions of edges hosted in data centers around the world. PoolParty also ensures that developing, publishing, or connecting content goes smoothly by making it easy to create tags with more precision.
Now users seek methods that allow them to get even more relevant results through semantic understanding, or even to search by visual similarity of images instead of textual search over metadata. To foster an open ecosystem, we created a framework to empower partners to easily build and publish AI connectors.
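A bare-bones version of semantic (rather than keyword) matching can be sketched with sentence embeddings; the model and documents below are illustrative assumptions, not the connector framework itself.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

# Embed documents and a query into the same vector space, then rank by
# cosine similarity instead of keyword overlap. Model choice is illustrative.
model = SentenceTransformer("all-MiniLM-L6-v2")
docs = ["sunset over the ocean", "quarterly sales report", "beach at dusk"]
doc_vecs = model.encode(docs, normalize_embeddings=True)
query_vec = model.encode(["photo of an evening seashore"], normalize_embeddings=True)[0]

scores = np.dot(doc_vecs, query_vec)  # cosine similarity (vectors are normalized)
for doc, score in sorted(zip(docs, scores), key=lambda p: -p[1]):
    print(f"{score:.3f}  {doc}")
```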
Leveraging an open-source solution like Apache Ozone, which is specifically designed to handle exabyte-scale data by distributing metadata throughout the entire system, not only facilitates scalability in data management but also ensures resilience and availability at scale.
Developers can use the support in Amazon Location Service for publishing device position updates to Amazon EventBridge to build a near-real-time data pipeline that stores locations of tracked assets in Amazon Simple Storage Service (Amazon S3). This solution uses distance-based filtering to reduce costs and jitter.
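Wiring that pipeline together might start with an EventBridge rule like the sketch below; the aws.geo source and detail-type are our assumption about how Amazon Location emits position events, and the target ARN is a placeholder.

```python
import json
import boto3

events = boto3.client("events")

# Assumed event pattern for Amazon Location device position updates;
# verify the source/detail-type against the current documentation.
events.put_rule(
    Name="tracker-position-updates",
    EventPattern=json.dumps({
        "source": ["aws.geo"],
        "detail-type": ["Location Device Position Event"],
    }),
)

# Route matching events to a processing Lambda (placeholder ARN) that
# applies distance-based filtering before writing to Amazon S3.
events.put_targets(
    Rule="tracker-position-updates",
    Targets=[{
        "Id": "position-processor",
        "Arn": "arn:aws:lambda:us-east-1:123456789012:function:example",
    }],
)
```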
That’s a lot of priorities, especially when you group together closely related items such as data lineage and metadata management, which rank nearby.
Priority 2 logs, such as operating system security logs, firewall, identity provider (IdP), email metadata, and AWS CloudTrail, are ingested into Amazon OpenSearch Service to enable the following capabilities. Previously, P2 logs were ingested into the SIEM.
Today a modern catalog hosts a wide range of users (like business leaders, data scientists, and engineers) and supports an even wider set of use cases (like data governance, self-service, and cloud migration). Yet lately, a few analysts have started publishing evaluations of data catalogs for specific use cases.