With the growing emphasis on data, organizations are constantly seeking more efficient and agile ways to integrate their data, especially from a wide variety of applications. SageMaker Lakehouse gives you the flexibility to access and query your data in place with all Apache Iceberg-compatible tools and engines.
A high hurdle many enterprises have yet to overcome is accessing mainframe data via the cloud. Mainframes hold an enormous amount of critical and sensitive business data including transactional information, healthcare records, customer data, and inventory metrics.
A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to structure it first, and run different types of analytics on it for better business insights.
Apache Iceberg is an Apache-licensed, 100% open-source data table format that helps simplify data processing on large datasets stored in data lakes. Data engineers use Apache Iceberg because it’s fast, efficient, and reliable at any scale and keeps records of how datasets change over time.
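Because Iceberg keeps a history of table snapshots, earlier versions of a table remain queryable. Below is a minimal PySpark sketch of such a time-travel read; the catalog, database, and table names (glue_catalog.sales_db.orders) and the timestamp are placeholders, and the session is assumed to already be configured with an Iceberg catalog.

```python
from pyspark.sql import SparkSession

# Assumes the session is already configured with an Iceberg catalog
# named "glue_catalog" (catalog settings omitted for brevity).
spark = SparkSession.builder.appName("iceberg-time-travel").getOrCreate()

# Current state of the (hypothetical) orders table.
current_df = spark.table("glue_catalog.sales_db.orders")

# The same table as it existed at an earlier point in time,
# resolved from Iceberg's snapshot history (timestamp in epoch milliseconds).
historical_df = (
    spark.read
    .format("iceberg")
    .option("as-of-timestamp", "1704067200000")  # placeholder timestamp
    .load("glue_catalog.sales_db.orders")
)

print(current_df.count(), historical_df.count())
```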
In the era of big data, data lakes have emerged as a cornerstone for storing vast amounts of raw data in its native format. They support structured, semi-structured, and unstructured data, offering a flexible and scalable environment for data ingestion from multiple sources.
Today, Amazon Redshift is used by customers across all industries for a variety of use cases, including data warehouse migration and modernization, near real-time analytics, self-service analytics, data lake analytics, machine learning (ML), and data monetization.
Iceberg offers distinct advantages through its metadata layer over Parquet, such as improved data management, performance optimization, and integration with various query engines. Unlike direct Amazon S3 access, Iceberg supports these operations on petabyte-scale data lakes without requiring complex custom code.
When evolving such a partition definition, the data in the table prior to the change is unaffected, as is its metadata. Only data that is written to the table after the evolution is partitioned with the new definition, and the metadata for this new set of data is kept separately.
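As a rough illustration of that behavior, the Spark SQL sketch below evolves a hypothetical glue_catalog.sales_db.orders table from monthly to daily partitioning on a ts column; it assumes the Iceberg SQL extensions are enabled on the Spark session and that the table was originally partitioned by months(ts).

```python
from pyspark.sql import SparkSession

# Assumes spark.sql.extensions includes Iceberg's SparkSessionExtensions
# and an Iceberg catalog named "glue_catalog" is configured.
spark = SparkSession.builder.appName("iceberg-partition-evolution").getOrCreate()

# Evolve the partition spec: drop the old monthly field, add a daily one.
spark.sql("ALTER TABLE glue_catalog.sales_db.orders DROP PARTITION FIELD months(ts)")
spark.sql("ALTER TABLE glue_catalog.sales_db.orders ADD PARTITION FIELD days(ts)")

# Data written before this change keeps its monthly layout and metadata;
# only rows inserted from now on are laid out by days(ts).
# staging_orders is a placeholder source table.
spark.sql("INSERT INTO glue_catalog.sales_db.orders SELECT * FROM staging_orders")
```

Queries continue to work across both layouts because Iceberg plans them from table metadata rather than from the physical directory structure.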
The importance of publishing only high-quality data can’t be overstated; it’s the foundation for accurate analytics, reliable machine learning (ML) models, and sound decision-making. AWS Glue is a serverless data integration service that you can use to effectively monitor and manage data quality through AWS Glue Data Quality.
An extract, transform, and load (ETL) process using AWS Glue is triggered once a day to extract the required data and transform it into the required format and quality, following the data product principle of data mesh architectures. From here, the metadata is published to Amazon DataZone by using AWS Glue Data Catalog.
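The excerpt doesn't show its scheduling code, but a once-a-day Glue trigger could look roughly like this boto3 sketch; the trigger name, job name, region, and cron expression are hypothetical.

```python
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Run the (hypothetical) "build-data-product" Glue job once a day at 02:00 UTC.
glue.create_trigger(
    Name="daily-data-product-refresh",
    Type="SCHEDULED",
    Schedule="cron(0 2 * * ? *)",
    Actions=[{"JobName": "build-data-product"}],
    StartOnCreation=True,
)
```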
Amazon Redshift enables you to directly access data stored in Amazon Simple Storage Service (Amazon S3) using SQL queries and join data across your data warehouse and data lake. With Amazon Redshift, you can query the data in your S3 data lake using a central AWS Glue metastore from your Redshift data warehouse.
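For illustration, the sketch below maps a Glue Data Catalog database into Redshift as an external schema and joins it with a local table; the cluster endpoint, credentials, IAM role ARN, and schema/table names are placeholders, and it assumes the redshift_connector Python driver.

```python
import redshift_connector

# All connection details and the IAM role ARN below are placeholders.
conn = redshift_connector.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    database="dev",
    user="awsuser",
    password="********",
)
cur = conn.cursor()

# Expose a Glue Data Catalog database as an external schema ("lake"),
# so S3-backed tables can be queried next to local Redshift tables.
cur.execute("""
    CREATE EXTERNAL SCHEMA IF NOT EXISTS lake
    FROM DATA CATALOG
    DATABASE 'sales_db'
    IAM_ROLE 'arn:aws:iam::123456789012:role/my-spectrum-role'
""")

# Join warehouse data with data lake data in a single query.
cur.execute("""
    SELECT c.customer_id, SUM(o.amount) AS total_spend
    FROM local_schema.customers c
    JOIN lake.orders o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
""")
print(cur.fetchall())
conn.close()
```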
“The challenge that a lot of our customers have is that [it] requires you to copy that data, store it in Salesforce; you have to create a place to store it; you have to create an object or field in which to store it; and then you have to maintain that pipeline of data synchronization and make sure that data is updated,” Carlson said.
Amazon SageMaker Lakehouse, now generally available, unifies all your data across Amazon Simple Storage Service (Amazon S3) data lakes and Amazon Redshift data warehouses, helping you build powerful analytics and AI/ML applications on a single copy of data. Having confidence in your data is key.
Unlocking the true value of data often gets impeded by siloed information. Traditional data management—wherein each business unit ingests raw data in separate data lakes or warehouses—hinders visibility and cross-functional analysis. Amazon DataZone natively supports data sharing for Amazon Redshift data assets.
Organizations have multiple Hive data warehouses across EMR clusters, where the metadata gets generated. To address this challenge, organizations can deploy a data mesh using AWS Lake Formation that connects the multiple EMR clusters. An entity can act both as a producer of data assets and as a consumer of data assets.
We often see requests from customers who have started their data journey by building data lakes on Microsoft Azure, to extend access to the data to AWS services. In such scenarios, data engineers face challenges in connecting and extracting data from storage containers on Microsoft Azure.
Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. From enhancing data lakes to empowering AI-driven analytics, AWS unveiled new tools and services that are set to shape the future of data and analytics.
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.
For data lake customers who need to discover petabytes of data, AWS Glue crawlers are a popular way to discover and catalog data in the background. This allows users to search and find relevant data from multiple data sources. Choose the table to view the schema and other metadata.
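As a sketch of how that background discovery can be driven programmatically, the boto3 snippet below creates and starts a crawler; the bucket path, IAM role ARN, database, and crawler names are placeholders.

```python
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Bucket path, role ARN, and names are placeholders.
glue.create_crawler(
    Name="lake-discovery-crawler",
    Role="arn:aws:iam::123456789012:role/my-glue-crawler-role",
    DatabaseName="lake_db",
    Targets={"S3Targets": [{"Path": "s3://my-data-lake/raw/"}]},
    Schedule="cron(0 3 * * ? *)",  # re-crawl nightly to pick up new data
)
glue.start_crawler(Name="lake-discovery-crawler")

# Once the crawler finishes, the inferred tables (schema, partitions, location)
# appear in the Glue Data Catalog and can be listed programmatically.
for table in glue.get_tables(DatabaseName="lake_db")["TableList"]:
    print(table["Name"], table["StorageDescriptor"]["Location"])
```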
These tools range from enterprise service bus (ESB) products and data integration tools to extract, transform, and load (ETL) tools, procedural code, application programming interfaces (APIs), file transfer protocol (FTP) processes, and even business intelligence (BI) reports that further aggregate and transform data.
In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.
Third, some services require you to set up and manage compute resources used for federated connectivity, and capabilities like connection testing and data preview aren’t available in all services. To solve these challenges, we launched Amazon SageMaker Lakehouse unified data connectivity.
Apache Hudi is an open table format that brings database and data warehouse capabilities to data lakes. Apache Hudi helps data engineers manage complex challenges, such as managing continuously evolving datasets with transactions while maintaining query performance.
First we had data warehouses, then came data lakes, and now the new kid on the block is the data lakehouse. But what is a data lakehouse, and why should we develop one? In a way, the name describes what it is.
This post is co-authored by Vijay Gopalakrishnan, Director of Product, Salesforce Data Cloud. In today’s data-driven business landscape, organizations collect a wealth of data across various touch points and unify it in a central data warehouse or a data lake to deliver business insights.
The ability to discover and access data via Denodo Platform is enabled by Denodo Data Catalog, which provides a search-based interface for finding data sources based on metadata or content, as well as metrics related to data popularity and usage.
Zero-ETL integration also enables you to load and analyze data from multiple operational database clusters in a new or existing Amazon Redshift instance to derive holistic insights across many applications. Learn more about the zero-ETL integrations, data lake performance enhancements, and other announcements below.
However, enterprise data generated from siloed sources combined with the lack of a data integration strategy creates challenges for provisioning the data for generative AI applications. As part of the transformation, the objects need to be treated to ensure data privacy (for example, PII redaction).
This data needs to be ingested into a data lake, transformed, and made available for analytics, machine learning (ML), and visualization. For this, Cargotec built an Amazon Simple Storage Service (Amazon S3) data lake and cataloged the data assets in AWS Glue Data Catalog.
All this data arrives by the terabyte, and a data management platform can help marketers make sense of it all. Marketing-focused or not, DMPs excel at negotiating with a wide array of databases, data lakes, or data warehouses, ingesting their streams of data and then cleaning, sorting, and unifying the information therein.
In this post, we show how Ruparupa implemented an incrementally updated data lake to get insights into their business using Amazon Simple Storage Service (Amazon S3), AWS Glue, Apache Hudi, and Amazon QuickSight. An AWS Glue ETL job, using the Apache Hudi connector, updates the S3 data lake hourly with incremental data.
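The job itself isn't reproduced here, but an hourly incremental upsert into a Hudi table on S3 might look like this PySpark sketch; the S3 paths, table name, and key columns are assumptions, and it presumes the Hudi connector is available on the Glue/Spark job's classpath.

```python
from pyspark.sql import SparkSession

# Assumes the Hudi libraries are on the job's classpath (for example,
# via the AWS Glue Hudi connector); paths and column names are placeholders.
spark = SparkSession.builder.appName("hourly-hudi-upsert").getOrCreate()

# Latest hour of changed records staged as Parquet.
incremental_df = spark.read.parquet("s3://my-staging-bucket/orders/latest/")

hudi_options = {
    "hoodie.table.name": "orders",
    "hoodie.datasource.write.recordkey.field": "order_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.partitionpath.field": "order_date",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.hive_sync.enable": "true",       # keep the catalog in sync
    "hoodie.datasource.hive_sync.database": "lake_db",
    "hoodie.datasource.hive_sync.table": "orders",
}

(incremental_df.write
    .format("hudi")
    .options(**hudi_options)
    .mode("append")  # "append" with upsert merges changes into existing records
    .save("s3://my-data-lake/orders/"))
```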
This would be a straightforward task were it not for the fact that, during the digital era, there has been an explosion of data – collected and stored everywhere – much of it poorly governed, ill-understood, and irrelevant. Further, data management activities don’t end once the AI model has been developed. Addressing the Challenge.
These stewards monitor the input and output of data integrations and workflows to ensure data quality. Their focus is on master data management, data lakes/warehouses, and ensuring the trackability of data using audit trails and metadata. How to Get Started with Information Stewardship.
Gartner defines a data fabric as “a design concept that serves as an integrated layer of data and connecting processes.” The data fabric architectural approach can simplify data access in an organization and facilitate self-service data consumption at scale.
Data ingestion: You have to build ingestion pipelines based on factors like the types of data sources (on-premises data stores, files, SaaS applications, third-party data) and the flow of data (unbounded streams or batch data). Then, you transform this data into a concise format.
In most enterprises, data teams lack a data map and a data asset inventory and are often unaware of data that exists across the organization, along with its associated profile, quality, and metadata. Teams can’t access data to build their business use cases. For example, a product data tag is basic metadata.
You also need services to store data for analysis and machine learning (ML) like Amazon Simple Storage Service (Amazon S3). Customers have created hundreds of thousands of data lakes on Amazon S3. Amazon DataZone uses ML to automatically add metadata to your data catalog, making all of your data more discoverable.
Thoughtworks says data mesh is key to moving beyond a monolithic data lake. Spoiler alert: data fabric and data mesh are independent design concepts that are, in fact, quite complementary.
Loading complex multi-point datasets into a dimensional model, identifying issues, and validating data integrity of the aggregated and merged data points are the biggest challenges that clinical quality management systems face. Although data lakes resemble data vaults, a data vault provides more features of a data warehouse.
While this approach provides isolation, it creates another significant challenge: duplication of data, metadata, and security policies, or ‘split-brain’ data lake. Now the admins need to synchronize multiple copies of the data and metadata and ensure that users across the many clusters are not viewing stale information.
Figure 1: Apache Iceberg fits the next-generation data architecture by abstracting the storage layer from the analytics layer while introducing net new capabilities like time travel and partition evolution. #1: Apache Iceberg enables seamless integration between different streaming and processing engines while maintaining data integrity between them.
You can slice data by different dimensions like job name, see anomalies, and share reports securely across your organization. With these insights, teams have the visibility to make data integration pipelines more efficient. An AWS Glue crawler scans data on the S3 bucket and populates table metadata on the AWS Glue Data Catalog.
With data volumes exhibiting a double-digit percentage growth rate year on year and the COVID pandemic disrupting global logistics in 2021, it became more critical to scale and generate near-real-time data. You can visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes.