With the growing emphasis on data, organizations are constantly seeking more efficient and agile ways to integrate their data, especially from a wide variety of applications. We take care of the ETL for you by automating the creation and management of data replication. AWS Glue ETL offers customer-managed data ingestion.
In 2017, we published “How Companies Are Putting AI to Work Through Deep Learning,” a report based on a survey we ran aiming to help leaders better understand how organizations are applying AI through deep learning. Data scientists and data engineers are in demand.
Not Every Graph is a Knowledge Graph: Schemas and Semantic Metadata Matter. To be able to automate these operations and maintain sufficient data quality, enterprises have started implementing so-called data fabrics, which employ diverse metadata sourced from different systems. Examples include provenance (e.g.
Plug-and-play integration: A seamless, plug-and-play integration between data producers and consumers should facilitate rapid use of new data sets and enable quick proofs of concept, such as within data science teams. As part of the required data, CHE data is shared using Amazon DataZone.
We also examine how centralized, hybrid, and decentralized data architectures support scalable, trustworthy ecosystems. As data-centric AI, automated metadata management, and privacy-aware data sharing mature, the opportunity to embed data quality into the enterprise’s core has never been more significant.
While cleaning up our archive recently, I found an old article published in 1976 about data dictionary/directory systems (DD/DS). Nowadays, we no longer use the term DD/DS, but “data catalog” or simply “metadata system”. It was written by L.
Let’s briefly describe the capabilities of the AWS services we referred to above: AWS Glue is a fully managed, serverless, and scalable extract, transform, and load (ETL) service that simplifies the process of discovering, preparing, and loading data for analytics. Amazon Athena is used to query and explore the data.
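To make the Athena step concrete, here is a minimal sketch using boto3 of submitting an ad hoc query against a Glue-cataloged table; the database, table, and output-bucket names are placeholders, not taken from the article:

```python
import boto3

# Run an ad hoc Athena query against a Glue-cataloged table.
# Database, table, and result-bucket names below are illustrative.
athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString="SELECT * FROM my_database.my_table LIMIT 10",
    QueryExecutionContext={"Database": "my_database"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print("Query execution ID:", response["QueryExecutionId"])
```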
In this post, we discuss how the reimagined data flow works with OR1 instances and how it can provide high indexing throughput and durability using a new physical replication protocol. We also dive deep into some of the challenges we solved to maintain correctness and data integrity.
We have enhanced data sharing performance with improved metadata handling, resulting in first-query execution for data sharing that is up to four times faster when the data sharing producer’s data is being updated.
Third, some services require you to set up and manage compute resources used for federated connectivity, and capabilities like connection testing and data preview aren’t available in all services. To solve these challenges, we launched Amazon SageMaker Lakehouse unified data connectivity.
For this, Cargotec built an Amazon Simple Storage Service (Amazon S3) data lake and cataloged the data assets in AWS Glue Data Catalog. They chose AWS Glue as their preferred data integration tool due to its serverless nature, low maintenance, ability to control compute resources in advance, and scale when needed.
This unified catalog enables engineers, data scientists, and analysts to securely discover and access approved data and models using semantic search with generative AI-created metadata. Collaboration is seamless, with straightforward publishing and subscribing workflows, fostering a more connected and efficient work environment.
A data fabric is an architectural approach that enables organizations to simplify data access and data governance across a hybrid multicloud landscape for better 360-degree views of the customer and enhanced MLOps and trustworthy AI.
The second one is the Linked Open Data (LOD): a cloud of interlinked structured datasets published without centralized control across thousands of servers. There are more than 80 million pages with semantic, machine-interpretable metadata, according to the Schema.org standard. Take this restaurant, for example.
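For the restaurant example, semantic metadata of this kind is typically embedded as Schema.org JSON-LD. The sketch below builds an illustrative record in Python; the restaurant details are made up, but "Restaurant" and its properties are standard Schema.org vocabulary:

```python
import json

# An invented restaurant described with standard Schema.org terms.
restaurant = {
    "@context": "https://schema.org",
    "@type": "Restaurant",
    "name": "Example Bistro",
    "servesCuisine": "French",
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Paris",
    },
}
# JSON-LD a crawler could interpret as machine-readable metadata.
print(json.dumps(restaurant, indent=2))
```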
What, then, should users look for in a data modeling product to support their governance/intelligence requirements in the data-driven enterprise? Nine steps to data modeling include providing metadata and schema visualization regardless of where data is stored, along with naming and database standards, formatting options, and so on.
As I recently noted , the term “data intelligence” has been used by multiple providers across analytics and data for several years and is becoming more widespread as software providers respond to the need to provide enterprises with a holistic view of data production and consumption.
We will partition and format the server access logs with Amazon Web Services (AWS) Glue, a serverless data integration service, to generate a catalog for access logs and create dashboards for insights. Both the user data and logs buckets must be in the same AWS Region and owned by the same account.
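As a rough illustration of the partitioning step (the bucket paths and the log-parsing regex are assumptions, not taken from the article), a PySpark job could derive a date column from each access-log line and write partitioned Parquet for a crawler to catalog:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import regexp_extract, to_date

# Parse the timestamp out of raw S3 server access log lines (format like
# "[06/Feb/2019:00:00:38 +0000]"), derive a date column, and write
# date-partitioned Parquet. Bucket paths are placeholders.
spark = SparkSession.builder.appName("access-logs").getOrCreate()

raw = spark.read.text("s3://my-logs-bucket/raw/")
logs = raw.withColumn(
    "event_date",
    to_date(regexp_extract("value", r"\[(\d{2}/\w{3}/\d{4})", 1), "dd/MMM/yyyy"),
)
logs.write.partitionBy("event_date").mode("append").parquet(
    "s3://my-logs-bucket/partitioned/"
)
```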
In this blog, I will demonstrate the value of Cloudera DataFlow (CDF), the edge-to-cloud streaming data platform available on the Cloudera Data Platform (CDP), as a data integration and democratization fabric. Data and Metadata: Data inputs and data outputs produced based on the application logic.
S3 Tables integration with the AWS Glue Data Catalog is in preview, allowing you to stream, query, and visualize data, including Amazon S3 Metadata tables, using AWS analytics services such as Amazon Data Firehose, Amazon Athena, Amazon Redshift, Amazon EMR, and Amazon QuickSight.
Data integrity constraints: Many databases don’t allow for strange or unrealistic combinations of input variables, and this could potentially thwart watermarking attacks. Applying data integrity constraints on live, incoming data streams could have the same benefits. Disparate impact analysis: see section 1.
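A minimal sketch of what such a constraint check could look like on an incoming batch; the column names and rules are made up for illustration:

```python
import pandas as pd

# Reject records whose field combinations violate simple integrity
# constraints, as a cheap guard against implausible inputs.
def violates_constraints(row: pd.Series) -> bool:
    if row["age"] < 0 or row["age"] > 120:
        return True
    if row["employment_years"] > row["age"] - 14:  # can't work before age 14
        return True
    return False

incoming = pd.DataFrame(
    {"age": [35, 22, 9], "employment_years": [10, 30, 1]}
)
clean = incoming[~incoming.apply(violates_constraints, axis=1)]
# Drops the 22-year-old with 30 working years and the 9-year-old worker.
print(clean)
```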
Gartner defines a data fabric as “a design concept that serves as an integrated layer of data and connecting processes.” The data fabric architectural approach can simplify data access in an organization and facilitate self-service data consumption at scale.
And each of these gains requires data integration across business lines and divisions. Limiting growth by (data integration) complexity: Most operational IT systems in an enterprise have been developed to serve a single business function, and they use the simplest possible model for this. We call this the Bad Data Tax.
In most companies, an incredible amount of data flows from multiple sources in a variety of formats and is constantly being moved and federated across a changing system landscape. With an automation framework, data professionals can meet these needs at a fraction of the cost of the traditional manual way. Governing metadata.
Multi-channel publishing of data services. Agile BI and Reporting, Single Customer View, Data Services, and Web and Cloud Computing Integration are scenarios where Data Virtualization offers feasible and more efficient alternatives to traditional solutions. Does Data Virtualization support web data integration?
Its platform supports both publishers and advertisers so both can understand which creative work delivers the best results. Publishers find a privacy-safe way to deliver first-party information to advertisers while advertisers get the information they need to track performance across all of the publishing platforms in the open web.
AWS Transfer Family seamlessly integrates with other AWS services, automates transfers, and makes sure data is protected with encryption and access controls. Each file arrives as a pair with a tail metadata file in CSV format containing the size and name of the file. Roughly 2 GB arrives in the landing zone daily.
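Assuming a layout where each data file is paired with a tail CSV holding its expected name and size (the exact tail format here is an assumption), a landing-zone check might look like this:

```python
import csv
import os

# Verify a data file against its tail metadata CSV before processing.
# The tail file is assumed to hold one row: expected_name,expected_size.
def verify_pair(data_path: str, tail_path: str) -> bool:
    with open(tail_path, newline="") as f:
        name, expected_size = next(csv.reader(f))
    actual_size = os.path.getsize(data_path)
    return os.path.basename(data_path) == name and actual_size == int(expected_size)

if verify_pair("landing/orders_20240101.dat", "landing/orders_20240101.tail.csv"):
    print("pair verified; safe to process")
```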
Ontotext’s GraphDB is an enterprise-ready semantic graph database (also called an RDF triplestore, as it stores data in RDF triples). It provides the core infrastructure for solutions where modeling agility, data integration, relationship exploration, and cross-enterprise data publishing and consumption are critical.
Organizations have multiple Hive data warehouses across EMR clusters, where the metadata gets generated. To address this challenge, organizations can deploy a data mesh using AWS Lake Formation that connects the multiple EMR clusters. The data resides on Amazon S3, which reduces the storage costs significantly.
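A hedged sketch of the Lake Formation side of such a mesh, granting a consumer account SELECT on a cataloged table via boto3; the account IDs, database, and table names are placeholders:

```python
import boto3

# Grant a consumer account SELECT on a table registered in the
# central catalog of a Lake Formation data mesh.
lakeformation = boto3.client("lakeformation")

lakeformation.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "111122223333"},  # consumer account
    Resource={
        "Table": {
            "CatalogId": "444455556666",  # producer/catalog account
            "DatabaseName": "sales_db",
            "Name": "orders",
        }
    },
    Permissions=["SELECT"],
)
```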
As we’ve said again and again, we believe that knowledge graphs are the next-generation tool for helping businesses make critical decisions, based on harmonized knowledge models and data derived from siloed source systems. But these tasks are only part of the story. That covers the two webinars that we wanted to present to you today.
You can slice data by different dimensions like job name, see anomalies, and share reports securely across your organization. With these insights, teams have the visibility to make dataintegration pipelines more efficient. An AWS Glue crawler scans data on the S3 bucket and populates table metadata on the AWS Glue Data Catalog.
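Creating and starting such a crawler with boto3 might look like the following sketch; the crawler name, IAM role, database, and S3 path are placeholders:

```python
import boto3

# Create a Glue crawler that scans an S3 prefix and populates table
# metadata in the Glue Data Catalog, then kick it off.
glue = boto3.client("glue")

glue.create_crawler(
    Name="s3-logs-crawler",
    Role="arn:aws:iam::111122223333:role/GlueCrawlerRole",
    DatabaseName="logs_db",
    Targets={"S3Targets": [{"Path": "s3://my-logs-bucket/partitioned/"}]},
)
glue.start_crawler(Name="s3-logs-crawler")
```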
Business units can simply share data and collaborate by publishing and subscribing to the data assets. The Central IT team (Spoke N) subscribes to the data from individual business units and consumes it using Redshift Spectrum. Similarly, individual business units produce their own domain-specific data.
We offer two different PowerPacks – Agile Data Integration and High-Performance Tagging. The High-Performance Tagging PowerPack bundle: The High-Performance Tagging PowerPack is designed to satisfy taxonomy and metadata management needs by allowing enterprise tagging at scale.
The engines must facilitate the advanced data integration and metadata management scenarios where an EKG is used for data fabrics or otherwise serves as a data hub between diverse data and content management systems. Need a reliable and robust RDF graph database for your use case?
Under the Transparency in Coverage (TCR) rule, hospitals and payors are required to publish their pricing data in a machine-readable format. The data ingestion process copies the machine-readable files from the hospitals, validates the data, and keeps the validated files available for analysis.
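A simplified validation step might check a downloaded file for required top-level fields; the field names below loosely follow the CMS Transparency in Coverage schemas and should be treated as illustrative, not authoritative:

```python
import json

# Basic sanity check on a machine-readable pricing file before
# admitting it for analysis. Field names are illustrative.
REQUIRED_FIELDS = {"reporting_entity_name", "reporting_entity_type", "last_updated_on"}

def validate_pricing_file(path: str) -> bool:
    with open(path) as f:
        doc = json.load(f)
    missing = REQUIRED_FIELDS - doc.keys()
    if missing:
        print(f"{path}: missing fields {sorted(missing)}")
        return False
    return True
```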
Published as a special topic article in AI Magazine, Volume 43, Issue 1, Spring 2022, the paper introduces KnowWhereGraph (KWG) as a solution to the ever-growing challenge of integrating heterogeneous data and building services on top of already existing open data. The catalog stores the asset’s metadata in RDF.
Instead, it creates a unified way, sometimes called a data fabric, of accessing an organization’s data as well as third-party or global data in a seamless manner. Data is represented in a holistic, human-friendly, and meaningful way. For efficient drug discovery, linked data is key.
What is FAIR data? In 2016, a group of scientists published a paper in Nature magazine discussing the need for a set of principles to govern the discovery, management, and reuse of scientific data. Each of the four FAIR principles calls for data and metadata to be easily found, accessed, understood, exchanged, and reused.
It delivers the ability to capture and unify the business and technical perspectives of data assets, enables effective collaboration between a variety of stakeholders, and delivers metadata-driven automation to accelerate the creation and maintenance of data sources on virtually any data management platform.
Knowledge graphs have greatly helped to successfully enhance business-critical enterprise applications, especially those where high-performance tagging and agile data integration are needed. Enterprises generate an enormous amount of data and content every minute. How can you build knowledge graphs for enterprise applications?
For example, GPS, social media, and cell phone handoffs are modeled as graphs, while data catalogs, data lineage, and MDM tools leverage knowledge graphs for linking metadata with semantics. LPG lacks schema and semantics, which makes it inappropriate for publishing and sharing of data. This makes LPGs inflexible.
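To illustrate why an RDF model suits publishing and sharing, here is a small rdflib sketch that links catalog metadata with typed, semantic edges; the vocabulary URIs and dataset names are invented for the example:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

# An invented namespace standing in for a real metadata vocabulary.
EX = Namespace("http://example.org/catalog/")

g = Graph()
dataset = URIRef(EX["customer_orders"])
g.add((dataset, RDF.type, EX.Dataset))
g.add((dataset, RDFS.label, Literal("Customer orders")))
g.add((dataset, EX.derivedFrom, EX["raw_orders"]))  # lineage as a typed edge

# Publishing/sharing is straightforward: serialize to a standard format.
print(g.serialize(format="turtle"))
```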
The Magic Quadrant (MQ) is an established, widely-referenced series of research reports published by the analyst firm Gartner, Inc. The January 2019 “Magic Quadrant for Data Management Solutions for Analytics” provides valuable insights into the status, direction, and players in the DMSA market.
SnapLogic published Eight Data Management Requirements for the Enterprise Data Lake. They include storage and data formats, and metadata and governance. The company also recently hosted a webinar on Democratizing the Data Lake with Constellation Research and published two whitepapers from Mark Madsen.
Acting as a bridge between producer and consumer apps, it enforces the schema, reduces the data footprint in transit, and safeguards against malformed data. AWS Glue is an ideal solution for running stream consumer applications, discovering, extracting, transforming, loading, and integrating data from multiple sources.
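As a stand-in for a managed schema registry (such as AWS Glue Schema Registry), the sketch below uses the jsonschema library to show the producer-side enforcement idea: reject malformed records before they reach the stream. The schema and record fields are made up:

```python
import json
from jsonschema import validate, ValidationError

# Illustrative schema a registry would enforce for an "orders" stream.
ORDER_SCHEMA = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string"},
        "amount": {"type": "number", "minimum": 0},
    },
    "required": ["order_id", "amount"],
}

def publish(record: dict) -> None:
    try:
        validate(instance=record, schema=ORDER_SCHEMA)
    except ValidationError as err:
        raise ValueError(f"malformed record rejected: {err.message}")
    print("sending:", json.dumps(record))  # replace with your stream producer

publish({"order_id": "A-100", "amount": 42.5})
```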