With the growing emphasis on data, organizations are constantly seeking more efficient and agile ways to integrate their data, especially from a wide variety of applications. We take care of the ETL for you by automating the creation and management of data replication. Glue ETL offers customer-managed data ingestion.
Through its metadata layer on top of Parquet, Iceberg offers distinct advantages such as improved data management, performance optimization, and integration with various query engines. Unlike direct Amazon S3 access, Iceberg supports these operations on petabyte-scale data lakes without requiring complex custom code.
It addresses many of the shortcomings of traditional data lakes by providing features such as ACID transactions, schema evolution, row-level updates and deletes, and time travel. In this blog post, we’ll discuss how the metadata layer of Apache Iceberg can be used to make data lakes more efficient.
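To make that concrete, here is a minimal PySpark sketch of those features, assuming Spark 3.3+ with the Iceberg runtime on the classpath; the catalog name (glue_catalog), table name (db.events), warehouse path, and snapshot ID are all placeholders, not values from the post.

```python
# Minimal sketch, assuming Spark 3.3+ with the Iceberg Spark runtime available.
# The catalog, table, S3 path, and snapshot ID below are illustrative placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-demo")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.warehouse", "s3://my-bucket/warehouse/")
    .getOrCreate()
)

# Row-level delete, committed as an ACID transaction against the table metadata.
spark.sql("DELETE FROM glue_catalog.db.events WHERE event_type = 'test'")

# Schema evolution: add a column without rewriting existing data files.
spark.sql("ALTER TABLE glue_catalog.db.events ADD COLUMN region STRING")

# Time travel: query the table as of an earlier snapshot (placeholder snapshot ID).
spark.sql("SELECT * FROM glue_catalog.db.events VERSION AS OF 1234567890123456789").show()
```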
“The challenge that a lot of our customers have is that requires you to copy that data, store it in Salesforce; you have to create a place to store it; you have to create an object or field in which to store it; and then you have to maintain that pipeline of data synchronization and make sure that data is updated,” Carlson said.
Collect, curate, and catalog the individual items (i.e., granules) of the data collection for fast search, access, and retrieval; this is also important for efficient orchestration and delivery of the data that fuels AI, automation, and machine learning operations. This is accomplished through tags, annotations, and metadata (TAM).
An extract, transform, and load (ETL) process using AWS Glue is triggered once a day to extract the required data and transform it into the required format and quality, following the data product principle of data mesh architectures. From here, the metadata is published to Amazon DataZone by using AWS Glue Data Catalog.
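For orientation, a skeleton of such a Glue PySpark job might look like the following; the database, table, column, and S3 path names are placeholders rather than the architecture described in the post, and publishing to Amazon DataZone happens separately via the Data Catalog.

```python
# Sketch of an AWS Glue PySpark job; "analytics", "sales_raw", "order_id",
# and the output path are hypothetical names used for illustration only.
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Extract from a table already registered in the AWS Glue Data Catalog.
source = glue_context.create_dynamic_frame.from_catalog(
    database="analytics", table_name="sales_raw"
)

# Transform: drop obviously incomplete records as a simple data-product quality gate.
clean = source.filter(lambda row: row["order_id"] is not None)

# Load as Parquet to S3; the cataloged output is what downstream services
# (such as Amazon DataZone) can then discover and govern.
glue_context.write_dynamic_frame.from_options(
    frame=clean,
    connection_type="s3",
    connection_options={"path": "s3://my-data-products/sales/"},
    format="parquet",
)
job.commit()
```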
We also examine how centralized, hybrid, and decentralized data architectures support scalable, trustworthy ecosystems. As data-centric AI, automated metadata management, and privacy-aware data sharing mature, the opportunity to embed data quality into the enterprise's core has never been more significant.
If you suddenly see unexpected patterns in your social data, that may mean adversaries are attempting to poison your data sources. Anomaly detection may have originated in finance, but it is becoming a part of every data scientist’s toolkit. Tim Kraska on “How machine learning will accelerate data management systems”.
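A minimal sketch of that toolkit idea, using scikit-learn's IsolationForest on synthetic traffic volumes; the data, contamination rate, and threshold are illustrative only.

```python
# Toy anomaly-detection sketch on synthetic data; parameters are illustrative.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
normal_traffic = rng.normal(loc=100, scale=10, size=(500, 1))  # typical daily volumes
poisoned_batch = rng.normal(loc=300, scale=5, size=(5, 1))     # suspicious spike
data = np.vstack([normal_traffic, poisoned_batch])

model = IsolationForest(contamination=0.01, random_state=0).fit(data)
labels = model.predict(data)  # -1 flags likely anomalies, 1 marks inliers
print(f"{(labels == -1).sum()} points flagged for review")
```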
Let’s briefly describe the capabilities of the AWS services referred to above: AWS Glue is a fully managed, serverless, and scalable extract, transform, and load (ETL) service that simplifies the process of discovering, preparing, and loading data for analytics. Amazon Athena is used to query and explore the data.
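As a rough sketch of how querying with Athena can be scripted, the snippet below uses boto3; the region, database, table, and results bucket are placeholders, not resources from the post.

```python
# Hedged boto3 sketch: run an Athena query and print the results.
# The database, table, and S3 result location are hypothetical placeholders.
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

execution = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) AS requests FROM access_logs GROUP BY status",
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
query_id = execution["QueryExecutionId"]

# Poll until the query completes, then fetch the result rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```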
The only question is, how do you ensure effective ways of breaking down data silos and bringing data together for self-service access? It starts by modernizing your data integration capabilities – ensuring disparate data sources and cloud environments can come together to deliver data in real time and fuel AI initiatives.
In this post, we discuss how the reimagined data flow works with OR1 instances and how it can provide high indexing throughput and durability using a new physical replication protocol. We also dive deep into some of the challenges we solved to maintain correctness and data integrity.
It assigns unique identifiers to each data item (referred to as a ‘payload’) related to each event. By offering real-time tracking mechanisms and sending targeted alerts to specific consumers, a Payload DJ can immediately notify them of any changes, delays, or issues affecting their data.
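The idea can be illustrated with a toy Python sketch (this is not the vendor's implementation): tag each event payload with a unique identifier and notify only the consumers subscribed to that event type.

```python
# Toy illustration of the concept: assign each payload a traceable ID and send
# targeted alerts to subscribed consumers. All names here are hypothetical.
import uuid
from typing import Callable, Dict, List

subscribers: Dict[str, List[Callable[[dict], None]]] = {}

def publish(event_type: str, payload: dict) -> dict:
    payload = {**payload, "payload_id": str(uuid.uuid4())}  # unique, traceable identifier
    for notify in subscribers.get(event_type, []):
        notify(payload)  # targeted alert to interested consumers only
    return payload

subscribers["order_delayed"] = [lambda p: print(f"ALERT order_delayed: {p['payload_id']}")]
publish("order_delayed", {"order": 1234})
```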
When we talk about data integrity, we’re referring to the overarching completeness, accuracy, consistency, accessibility, and security of an organization’s data. Together, these factors determine the reliability of the organization’s data. In short, yes.
And if it isn’t changing, it’s likely not being used within our organizations, so why would we use stagnant data to facilitate our use of AI? The key is understanding not IF, but HOW, our data fluctuates, and data observability can help us do just that. Let’s give a for instance.
This is part of Ontotext’s AI-in-Action initiative aimed at enabling data scientists and engineers to benefit from the AI capabilities of our products. Ontotext’s Relation and Event Detector (RED) is designed to assess and analyze the impact of market-moving events. Why do risk and opportunity events matter?
This premier event showcased groundbreaking advancements, keynotes from AWS leadership, hands-on technical sessions, and exciting product launches. Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights.
Enterprises and organizations across the globe want to harness the power of data to make better decisions by putting data at the center of every decision-making process. However, throughout history, data services have held dominion over their customers’ data. This concept makes Iceberg extremely versatile.
We will partition and format the server access logs with Amazon Web Services (AWS) Glue, a serverless data integration service, to generate a catalog for access logs and create dashboards for insights. These logs can track activity, such as data access patterns, lifecycle and management activity, and security events.
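A simplified PySpark sketch of that partitioning step is shown below; it is not the post's actual job, the paths are placeholders, and the regular expressions only pull two of the many fields in an S3 server access log line.

```python
# Simplified sketch: parse a couple of fields from S3 server access logs and
# write date-partitioned Parquet. Paths and regexes are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql.functions import regexp_extract

spark = SparkSession.builder.appName("access-log-partitioning").getOrCreate()

raw = spark.read.text("s3://my-logging-bucket/access-logs/")

parsed = raw.select(
    regexp_extract("value", r"\[(\d{2}/\w{3}/\d{4})", 1).alias("dt"),        # request date
    regexp_extract("value", r"\] \S+ \S+ \S+ (\S+)", 1).alias("operation"),  # e.g. REST.GET.OBJECT
)

# Partitioning by date keeps Athena dashboards scanning only the days they need.
parsed.write.partitionBy("dt").mode("overwrite").parquet("s3://my-curated-bucket/access-logs/")
```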
The program must introduce and support standardization of enterprise data. Programs must support proactive and reactive change management activities for reference data values and the structure/use of master data and metadata.
For this, Cargotec built an Amazon Simple Storage Service (Amazon S3) data lake and cataloged the data assets in AWS Glue Data Catalog. They chose AWS Glue as their preferred data integration tool due to its serverless nature, low maintenance, ability to control compute resources in advance, and scale when needed.
In this blog, I will demonstrate the value of Cloudera DataFlow (CDF), the edge-to-cloud streaming data platform available on the Cloudera Data Platform (CDP), as a data integration and democratization fabric. Data and Metadata: Data inputs and data outputs produced based on the application logic.
Many large organizations, in their desire to modernize with technology, have acquired several different systems with various data entry points and transformation rules for data as it moves into and across the organization. Who are the data owners? Data lineage offers proof that the data provided is reflected accurately.
Agile BI and Reporting, Single Customer View, Data Services, Web and Cloud Computing Integration are scenarios where Data Virtualization offers feasible and more efficient alternatives to traditional solutions. Does Data Virtualization support web data integration? In forecasting future events.
KGs bring the Semantic Web paradigm to enterprises, by introducing semantic metadata to drive data management and content management to new levels of efficiency and breaking silos to let them synergize with various forms of knowledge management. Take this restaurant, for example. Enterprise Knowledge Graphs and the Semantic Web.
Data engineers use Apache Iceberg because it’s fast, efficient, and reliable at any scale and keeps records of how datasets change over time. Apache Iceberg offers integrations with popular data processing frameworks such as Apache Spark, Apache Flink, Apache Hive, Presto, and more.
After navigating the complexity of multiple systems and stages to bring data to its end-use case, the final product’s value becomes the ultimate yardstick for measuring success. By diligently testing and monitoring data in use, you uphold data integrity and provide tangible value to end-users.
With knowledge graphs , additional facts and figures can be threaded into the collection items and the metadata related to them. Imagine a curiosity cabinet with items attached to threads (strings) of well-described semantic information, linking them to other artifacts, events, people, institutions, you name it.
AWS Transfer Family seamlessly integrates with other AWS services, automates transfer, and makes sure data is protected with encryption and access controls. Each file arrives as a pair with a tail metadata file in CSV format containing the size and name of the file. To achieve this, Aruba used Amazon S3 Event Notifications.
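A hedged boto3 sketch of wiring up S3 Event Notifications for that kind of trigger is shown below; the bucket name, Lambda ARN, and suffix filter are placeholders, not Aruba's actual configuration.

```python
# Hedged sketch: configure S3 Event Notifications so new uploads trigger
# downstream processing. Bucket name and Lambda ARN are placeholders.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_notification_configuration(
    Bucket="example-landing-bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:process-upload",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": ".csv"}]}
                },
            }
        ]
    },
)
```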
This multiplicity of data leads to the growth of silos, which in turn increases the cost of integration. The purpose of weaving a Data Fabric is to remove the friction and cost from accessing and sharing data in the distributed ICT environment that is the norm. Knowledge Graphs are the Warp and Weft of a Data Fabric.
The event held space for presentations, discussions, and one-on-one meetings, where more than 20 partners and 1,064 registrants from 41 countries, spanning 25 industries, came together. Sumit started his talk by laying out the problems in today’s data landscapes. Abstract art and knowledge graphs: embracing your mess!
Within seconds of data being written into Aurora, you can use Amazon Redshift to do near-real-time analytics and ML on petabytes of data. Amazon DataZone is being used by companies like Guardant Health and Bristol Myers Squibb to catalog, discover, share, and govern data across their organization.
The data ingestion process copies the machine-readable files from the hospitals, validates the data, and keeps the validated files available for analysis. Data analysis – In this stage, the files are transformed using AWS Glue and stored in the AWS Glue Data Catalog.
“Any enterprise CEO really ought to be able to ask a question that involves connecting data across the organization, be able to run a company effectively, and especially to be able to respond to unexpected events. Most organizations are missing this ability to connect all the data together.”
The particular episode we recommend looks at how WeWork struggled with understanding their data lineage, so they created a metadata repository to increase visibility. Another podcast we think is worth a listen is Agile Data. Currently, he is in charge of the Technical Operations team at MIT Open Learning.
Ozone is also highly available — the Ozone metadata is replicated by Apache Ratis, an implementation of the Raft consensus algorithm for high-performance replication. Since Ozone supports both the Hadoop FileSystem interface and the Amazon S3 interface, frameworks like Apache Spark, YARN, Hive, and Impala can automatically use Ozone to store data.
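Because of that FileSystem compatibility, reading and writing Ozone from Spark can look like any other filesystem path; the sketch below assumes a cluster with the Ozone client configured, and the service, volume, and bucket names are placeholders.

```python
# Minimal sketch, assuming the cluster provides Ozone client configs and the
# ofs:// filesystem. The service/volume/bucket names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ozone-demo").getOrCreate()

df = spark.read.parquet("ofs://ozone-service/vol1/bucket1/events/")
df.groupBy("event_type").count().write.mode("overwrite").parquet(
    "ofs://ozone-service/vol1/bucket1/event_counts/"
)
```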
Data ingestion – You have to build ingestion pipelines based on factors like types of data sources (on-premises data stores, files, SaaS applications, third-party data), and flow of data (unbounded streams or batch data). Data processing – Raw data is often cluttered with duplicates and irregular formats.
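To illustrate the cleanup part, here is a small pandas sketch that normalizes irregular formats and drops duplicates; the column names and values are invented for the example.

```python
# Illustrative cleanup of duplicated, irregularly formatted records with pandas.
# Column names and values are hypothetical.
import pandas as pd

raw = pd.DataFrame(
    {
        "customer_id": ["001", "001", "002"],
        "country": ["us", "US", " us "],
        "amount": ["10.5", "10.5", "7"],
    }
)

clean = (
    raw.assign(
        country=lambda df: df["country"].str.strip().str.upper(),  # normalize casing/whitespace
        amount=lambda df: pd.to_numeric(df["amount"]),              # coerce string amounts to numbers
    )
    .drop_duplicates()  # identical records collapse once formats are normalized
)
print(clean)
```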
“How does this region/event compare to other regions/events?” To do so, KWG draws from over 30 fully integrated and semantically homogenized data layers. The catalog stores the asset’s metadata in RDF. As a result of these data quality issues, the need for integrity checks arises.
CDW is fully integrated with streaming, data engineering, and machine learning analytics. It has a consistent framework that secures and provides governance for all data and metadata on private clouds, multiple public clouds, or hybrid clouds. Smart DwH Mover helps accelerate data warehouse migration.
A confluence of events in the data management and AI landscape is bearing down on companies, no matter their size, industry or geographical location. Some of these, such as the continued sprawl of data across multicloud environments, have been looming for years, if not decades. Multicloud data integration.
All are ideally qualified to help their customers achieve and maintain the highest standards for data integrity, including absolute control over data access, transparency and visibility into the provider’s operation, the knowledge that their information is managed appropriately, and access to VMware’s growing ecosystem of sovereign cloud solutions.
Analyzing XML files can help organizations gain insights into their data, allowing them to make better decisions and improve their operations. Analyzing XML files can also help in data integration, because many applications and systems use XML as a standard data format. This approach optimizes the use of your XML files.
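As a small, hedged example of that kind of analysis, the snippet below flattens an XML document into records using Python's standard library; the <orders> structure is invented for illustration.

```python
# Flatten a simple XML document into row-like records with the standard library.
# The <orders> schema here is hypothetical.
import xml.etree.ElementTree as ET

xml_doc = """
<orders>
  <order id="1001"><customer>Acme</customer><total>250.00</total></order>
  <order id="1002"><customer>Globex</customer><total>99.90</total></order>
</orders>
"""

root = ET.fromstring(xml_doc)
rows = [
    {
        "order_id": order.get("id"),
        "customer": order.findtext("customer"),
        "total": float(order.findtext("total")),
    }
    for order in root.findall("order")
]
print(rows)  # flattened records, ready to load into a table or data frame
```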
Due to the convergence of events in the data analytics and AI landscape, many organizations are at an inflection point. IBM Cloud Pak for Data Express solutions offer clients a simple on-ramp to start realizing the business value of a modern architecture. Data governance. Data integration. Start a trial.
This data can come from a diverse range of sources, including Internet of Things (IoT) devices, user applications, and logging and telemetry information from applications, to name a few. By harnessing the power of streaming data, organizations are able to stay ahead of real-time events and make quick, informed decisions.
With the new REST API, you can now invoke DAG runs, manage datasets, or get the status of Airflow’s metadata database, trigger, and scheduler—all without relying on the Airflow web UI or CLI. This script automates the process of sending a specific number of requests per second to your web server, enabling you to trigger an auto scaling event.
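For a rough idea of what invoking a DAG run through the stable REST API looks like, see the sketch below; the endpoint URL, credentials, and DAG id are placeholders, and the auth mechanism differs on managed environments such as Amazon MWAA.

```python
# Hedged sketch: trigger a DAG run and check environment health via Airflow's
# stable REST API. URL, credentials, and DAG id are placeholders.
import requests

AIRFLOW_URL = "https://my-airflow.example.com/api/v1"
DAG_ID = "daily_ingest"

response = requests.post(
    f"{AIRFLOW_URL}/dags/{DAG_ID}/dagRuns",
    auth=("airflow_user", "airflow_password"),
    json={"conf": {"source": "load_test"}},
    timeout=30,
)
response.raise_for_status()
print(response.json()["dag_run_id"])

# The same API exposes metadata database and scheduler health.
health = requests.get(f"{AIRFLOW_URL}/health", timeout=30)
print(health.json())
```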