With the growing emphasis on data, organizations are constantly seeking more efficient and agile ways to integrate their data, especially from a wide variety of applications. We take care of the ETL for you by automating the creation and management of data replication. Glue ETL offers customer-managed data ingestion.
Iceberg offers distinct advantages through its metadata layer over Parquet, such as improved data management, performance optimization, and integration with various query engines. Unlike direct Amazon S3 access, Iceberg supports these operations on petabyte-scale data lakes without requiring complex custom code.
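As a hedged illustration of what that metadata layer enables, the sketch below lists an Iceberg table's snapshot history and time-travels to an earlier snapshot with PySpark. It assumes a Spark session already configured with the Iceberg runtime; the catalog name, table name, and snapshot ID are hypothetical.

```python
# Minimal sketch: working with Iceberg's metadata layer from PySpark.
# Catalog ("my_catalog"), table ("db.events"), and the snapshot ID are
# hypothetical; the Iceberg Spark runtime is assumed to be on the classpath.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-metadata-demo").getOrCreate()

# Snapshot history comes straight from table metadata, no S3 listing needed.
spark.sql(
    "SELECT snapshot_id, committed_at, operation "
    "FROM my_catalog.db.events.snapshots"
).show()

# Time travel: read the table as of a specific (hypothetical) snapshot.
df = (spark.read
      .option("snapshot-id", 4138713869091708212)
      .format("iceberg")
      .load("my_catalog.db.events"))
df.show()
```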
As artificial intelligence (AI) and machine learning (ML) continue to reshape industries, robust data management has become essential for organizations of all sizes. This means organizations must cover their bases in all areas surrounding data management, including security, regulations, efficiency, and architecture.
Despite their advantages, traditional data lake architectures often grapple with challenges such as understanding how a table has drifted from its optimal state over time, identifying issues in data pipelines, and monitoring a large number of tables. Addressing these challenges is essential for optimizing read and write performance.
Amazon OpenSearch Service recently introduced the OpenSearch Optimized Instance family (OR1), which delivers up to 30% price-performance improvement over existing memory optimized instances in internal benchmarks, and uses Amazon Simple Storage Service (Amazon S3) to provide 11 9s of durability.
With graph databases, representing relationships as data makes it possible to better represent data in real time, accommodating newly discovered types of data and relationships. Relational databases benefit from decades of tweaks and optimizations to deliver performance. Metadata about Relationships Comes in Handy.
Some challenges include data infrastructure that allows scaling and optimizing for AI; data management to inform AI workflows where data lives and how it can be used; and associated data services that help data scientists protect AI workflows and keep their models clean. Seamless data integration.
For container terminal operators, data-driven decision-making and efficient data sharing are vital to optimizing operations and boosting supply chain efficiency. From here, the metadata is published to Amazon DataZone using the AWS Glue Data Catalog. This process is shown in the following figure.
We also examine how centralized, hybrid, and decentralized data architectures support scalable, trustworthy ecosystems. As data-centric AI, automated metadata management, and privacy-aware data sharing mature, the opportunity to embed data quality into the enterprise’s core has never been more significant.
First query response times for dashboard queries have significantly improved through optimized code execution and reduced compilation overhead. We have enhanced autonomics algorithms to generate and implement smarter, quicker recommendations for optimal data layout (distribution and sort keys), further improving performance.
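To make the distribution and sort key idea concrete, here is a minimal sketch that issues such DDL through the Redshift Data API with boto3. The workgroup, database, table, and column names are assumptions for illustration only.

```python
# Hypothetical sketch: declaring a distribution key and sort key on a
# Redshift table via the Redshift Data API. All names are assumed.
import boto3

client = boto3.client("redshift-data")

ddl = """
CREATE TABLE sales (
    sale_id  BIGINT,
    store_id INT,
    sold_at  TIMESTAMP,
    amount   DECIMAL(10,2)
)
DISTKEY (store_id)  -- co-locate rows that join on store_id
SORTKEY (sold_at);  -- speed up time-range scans
"""

client.execute_statement(
    WorkgroupName="my-serverless-workgroup",  # assumed workgroup name
    Database="dev",
    Sql=ddl,
)
```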
Then there’s unstructured data with no contextual framework to govern data flows across the enterprise, not to mention time-consuming manual data preparation and limited views of data lineage. So here’s why data modeling is so critical to data governance. erwin Data Modeler: Where the Magic Happens.
The only question is, how do you ensure effective ways of breaking down data silos and bringing data together for self-service access? It starts by modernizing your data integration capabilities, ensuring disparate data sources and cloud environments can come together to deliver data in real time and fuel AI initiatives.
We won’t be writing code to optimize scheduling in a manufacturing plant; we’ll be training ML algorithms to find optimum performance based on historical data. With machine learning, the challenge isn’t writing the code; the algorithms are implemented in a number of well-known and highly optimized libraries.
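As a small illustration of that shift, the sketch below fits an off-the-shelf regressor from scikit-learn to synthetic "historical" data rather than hand-coding an optimization; the features and target are invented for the example.

```python
# Illustrative sketch: let a well-known library find the relationship in
# historical data instead of hand-writing scheduling logic. Data is synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((1000, 3))  # e.g. machine load, queue depth, shift length
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.1, 1000)  # throughput

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor().fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))
```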
AWS Glue is a serverless data integration service that makes it simple to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development. MongoDB Atlas is a developer data service from AWS technology partner MongoDB, Inc.
S3 Tables are specifically optimized for analytics workloads, resulting in up to 3 times faster query throughput and up to 10 times higher transactions per second compared to self-managed tables. These metadata tables are stored in S3 Tables, the new S3 storage offering optimized for tabular data. With AWS Glue 5.0,
These tools include enterprise service bus (ESB) products; data integration tools; extract, transform, and load (ETL) tools; procedural code; application programming interfaces (APIs); file transfer protocol (FTP) processes; and even business intelligence (BI) reports that further aggregate and transform data.
Cloudinary is a cloud-based media management platform that provides a comprehensive set of tools and services for managing, optimizing, and delivering images, videos, and other media assets on websites and mobile applications. This concept makes Iceberg extremely versatile.
Let’s briefly describe the capabilities of the AWS services we referred to above: AWS Glue is a fully managed, serverless, and scalable extract, transform, and load (ETL) service that simplifies the process of discovering, preparing, and loading data for analytics. As stated earlier, the first step involves data ingestion.
AWS Transfer Family seamlessly integrates with other AWS services, automates transfers, and makes sure data is protected with encryption and access controls. Each file arrives as a pair with a tail metadata file in CSV format containing the size and name of the file. Around 2 GB arrives in the landing zone daily.
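A hedged sketch of how such a pair might be validated on arrival follows; the paths and the assumed two-column layout of the tail metadata CSV (file name, size in bytes) are illustrative, not the exact format of the original pipeline.

```python
# Hypothetical sketch: check an arriving data file against its tail metadata
# CSV before processing. Assumes one row of "name,size" per metadata file.
import csv
import os

def validate_pair(data_path: str, tail_path: str) -> bool:
    with open(tail_path, newline="") as f:
        name, size = next(csv.reader(f))  # single-row metadata file
    return (os.path.basename(data_path) == name
            and os.path.getsize(data_path) == int(size))

if validate_pair("landing/batch_001.dat", "landing/batch_001.tail.csv"):
    print("file is complete; safe to process")
```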
Not surprisingly, data integration and ETL were among the top responses, with 60% currently building or evaluating solutions in this area. In an age of data-hungry algorithms, everything really begins with collecting and aggregating data, the metadata and artifacts needed for audits, and managed services in the cloud.
At DataKitchen, we think of this as a ‘meta-orchestration’ of the code and tools acting upon the data. Data Pipeline Observability: Optimizes pipelines by monitoring data quality, detecting issues, tracing data lineage, and identifying anomalies using live and historical metadata.
Data governance principles According to the Data Governance Institute, eight principles are at the center of all successful data governance and stewardship programs: All participants must have integrity in their dealings with each other. The program must introduce and support standardization of enterprise data.
Denodo also offers query optimization and acceleration capabilities to deliver high-performance analytics, as well as support for business semantics and security and access controls. The breadth and depth of Denodo Platform’s functionality is illustrated by its designation as a Leader in Capability in our 2024 Data Integration Buyers Guide.
Through their unique position in ports, at sea, and on roads, they optimize global cargo flows and create sustainable customer value. Cargotec captures terabytes of IoT telemetry data from their machinery operated by numerous customers across the globe. An AWS Glue job (metadata exporter) runs daily on the source account.
Agile BI and Reporting, Single Customer View, Data Services, and Web and Cloud Computing Integration are scenarios where Data Virtualization offers feasible and more efficient alternatives to traditional solutions, for example in improving operational processes. Does Data Virtualization support web data integration?
A recipe for trustworthy data: As the compute stack becomes more distributed across constrained environments, companies need the ability to prove data integrity through a trust fabric to unlock data insights they can rely on. Specifically, what the DCF does is capture metadata related to the application and compute stack.
What, then, should users look for in a data modeling product to support their governance/intelligence requirements in the data-driven enterprise? Nine Steps to Data Modeling. Provide metadata and schema visualization regardless of where data is stored. naming and database standards, formatting options, and so on.
KGs bring the Semantic Web paradigm to enterprises by introducing semantic metadata to drive data management and content management to new levels of efficiency, breaking silos to let them synergize with various forms of knowledge management. Take this restaurant, for example. Enterprise Knowledge Graphs and the Semantic Web.
We will partition and format the server access logs with Amazon Web Services (AWS) Glue, a serverless data integration service, to generate a catalog for access logs and create dashboards for insights, as sketched below. Both the user data and logs buckets must be in the same AWS Region and owned by the same account.
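The sketch below shows the shape of that partition-and-format step as a PySpark job of the kind that could run on AWS Glue; the bucket names and the handful of parsed fields are assumptions, not the full access-log schema.

```python
# Minimal sketch: convert raw S3 server access logs (space-delimited text)
# to Parquet partitioned by date. Buckets and parsed fields are assumed.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("access-log-partitioner").getOrCreate()

raw = spark.read.text("s3://my-logs-bucket/access-logs/")  # assumed bucket
parsed = raw.select(
    F.regexp_extract("value", r"\[([^\]]+)\]", 1).alias("ts"),
    F.regexp_extract("value", r'"(\w+) ', 1).alias("http_method"),
).withColumn(
    "dt", F.to_date(F.to_timestamp("ts", "dd/MMM/yyyy:HH:mm:ss Z"))
)

(parsed.write
       .mode("append")
       .partitionBy("dt")
       .parquet("s3://my-logs-bucket/access-logs-parquet/"))
```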
Hudi provides tables , transactions , efficient upserts and deletes , advanced indexes , streaming ingestion services , data clustering and compaction optimizations, and concurrency control , all while keeping your data in open source file formats. This effectively provides change streams to enable incremental data pipelines.
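A minimal sketch of one of those capabilities, an upsert, is shown below with PySpark; it assumes Spark was launched with the Hudi bundle on the classpath, and the table name, record key, and base path are hypothetical.

```python
# Minimal sketch of a Hudi upsert: rows with an existing record key are
# updated, new keys are inserted. All names and paths are assumed.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-upsert-demo").getOrCreate()

updates = spark.createDataFrame(
    [("id-1", "2024-01-02", 42.0)], ["record_id", "event_ts", "amount"]
)

hudi_options = {
    "hoodie.table.name": "events",
    "hoodie.datasource.write.recordkey.field": "record_id",
    "hoodie.datasource.write.precombine.field": "event_ts",
    "hoodie.datasource.write.operation": "upsert",
}

(updates.write.format("hudi")
        .options(**hudi_options)
        .mode("append")
        .save("s3://my-bucket/hudi/events/"))  # assumed base path
```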
L1 is usually the raw, unprocessed data ingested directly from various sources; L2 is an intermediate layer featuring data that has undergone some form of transformation or cleaning; and L3 contains highly processed, optimized data that is typically ready for analytics and decision-making.
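As a hedged sketch of that layering, the PySpark snippet below moves hypothetical order data from L1 (raw) through L2 (cleaned) to L3 (aggregated); all paths and column names are assumptions.

```python
# Illustrative sketch of the L1 -> L2 -> L3 flow. Paths and columns assumed.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("layered-lake-demo").getOrCreate()

# L1: raw, unprocessed data exactly as ingested.
l1 = spark.read.json("s3://lake/l1/orders/")

# L2: cleaned and conformed -- drop malformed rows, normalize types.
l2 = (l1.dropna(subset=["order_id"])
        .withColumn("order_ts", F.to_timestamp("order_ts")))
l2.write.mode("overwrite").parquet("s3://lake/l2/orders/")

# L3: aggregated, analytics-ready output.
l3 = (l2.groupBy(F.to_date("order_ts").alias("order_date"))
        .agg(F.sum("amount").alias("daily_revenue")))
l3.write.mode("overwrite").parquet("s3://lake/l3/daily_revenue/")
```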
Despite soundings on this from leading thinkers such as Andrew Ng , the AI community remains largely oblivious to the important data management capabilities, practices, and – importantly – the tools that ensure the success of AI development and deployment. Further, data management activities don’t end once the AI model has been developed.
Figure 1: Apache Iceberg fits the next-generation data architecture by abstracting the storage layer from the analytics layer while introducing net-new capabilities like time travel and partition evolution. #1: Apache Iceberg enables seamless integration between different streaming and processing engines while maintaining data integrity between them.
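To illustrate partition evolution specifically, the hedged sketch below changes an Iceberg table's partition spec via Spark SQL without rewriting existing data files; it assumes the Iceberg SQL extensions are enabled, and the catalog, table, and column names are hypothetical.

```python
# Sketch of Iceberg partition evolution: switch from daily to hourly
# partitioning. Existing files keep the old spec; new writes use the new
# one, and queries transparently span both. Names are assumed.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-evolution-demo").getOrCreate()

spark.sql("ALTER TABLE my_catalog.db.events DROP PARTITION FIELD days(event_ts)")
spark.sql("ALTER TABLE my_catalog.db.events ADD PARTITION FIELD hours(event_ts)")
```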
This introduces the need for both polling and pushing the data to access and analyze in near-real time. From an operational standpoint, we designed a new shared responsibility model for data ingestion using AWS Glue instead of internal services (REST APIs) designed on Amazon EC2 to extract the data.
Modernizing analytics for scale, performance, and reliability: “Our migration from a legacy on-premises platform to Amazon Redshift allows us to ingest data 88% faster, query data 3x faster, and load daily data to the cloud 6x faster.” Here are a couple of highlights from this week; for the full list, see below.
In this blog post, I will demonstrate the value of Cloudera DataFlow (CDF), the edge-to-cloud streaming data platform available on the Cloudera Data Platform (CDP), as a data integration and democratization fabric. Data and Metadata: Data inputs and data outputs produced based on the application logic.
During implementation, the LINQ team worked with OpenSearch Service specialists to tune the OpenSearch Service cluster configuration, maximizing performance while minimizing cost. This results in an optimized record for each product for quick and efficient search in OpenSearch Service.
Get a closer look at how scaling for data warehousing works in AWS with the latest introduction of AI-driven scaling and optimizations in Amazon Redshift Serverless, enabling better price-performance for your workloads. Discover how you can use Amazon Redshift to build a data mesh architecture to analyze your data.
Addressing big data challenges – Big data comes with unique challenges, like managing large volumes of rapidly evolving data across multiple platforms. Effective permission management helps tackle these challenges by controlling how data is accessed and used, providing data integrity and minimizing the risk of data breaches.
Ontotext’s GraphDB is an enterprise-ready semantic graph database (also called an RDF triplestore, as it stores data in RDF triples). It provides the core infrastructure for solutions where modeling agility, data integration, relationship exploration, and cross-enterprise data publishing and consumption are critical.
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence (BI) tools. It provides secure, real-time access to Redshift data without copying, keeping enterprise data in place.
While this approach provides isolation, it creates another significant challenge: duplication of data, metadata, and security policies, or a ‘split-brain’ data lake. Now the admins need to synchronize multiple copies of the data and metadata and ensure that users across the many clusters are not viewing stale information.
The Iceberg specification allows seamless table evolution such as schema and partition evolution, and its design is optimized for usage on Amazon S3. The Iceberg catalog stores a pointer to the table’s current metadata file. In this post, we use the Yellow taxi public dataset from NYC Taxi & Limousine Commission as our source data.
A market in need of more interoperability Systems integrators and cloud services teams have stepped in to remedy some of multicloud’s interoperability hurdles, but the optimal solution is for public cloud providers to build APIs directly into the cloud stack layer, Gartner’s Nag says.