Data Architecture, Data Integration and Data Lake

What is data architecture? A framework to manage data

CIO Business Intelligence

DECEMBER 20, 2024

Data architecture definition: Data architecture describes the structure of an organization's logical and physical data assets and data management resources, according to The Open Group Architecture Framework (TOGAF). An organization's data architecture is the purview of data architects.

Data Architecture

Data Architecture Management Consulting Internet of Things

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

AWS Big Data

DECEMBER 4, 2024

While traditional extract, transform, and load (ETL) processes have long been a staple of data integration due to their flexibility, for common use cases such as replication and ingestion they often prove time-consuming, complex, and less adaptable to the fast-changing demands of modern data architectures.

Data Integration

Data Integration Data Lake Statistics Data-driven

Amazon Web Services named a Leader in the 2024 Gartner Magic Quadrant for Data Integration Tools

AWS Big Data

FEBRUARY 26, 2025

Amazon Web Services (AWS) has been recognized as a Leader in the 2024 Gartner Magic Quadrant for Data Integration Tools. This recognition, we feel, reflects our ongoing commitment to innovation and excellence in data integration, demonstrating our continued progress in providing comprehensive data management solutions.

Data Integration

Data Integration Data Lake Data Warehouse Unstructured Data

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure it, and run different types of analytics for better business insights.

Data Lake

Data Lake Metadata Snapshot Recreation/Entertainment

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

APRIL 3, 2024

They understand that a one-size-fits-all approach no longer works, and recognize the value in adopting scalable, flexible tools and open data formats to support interoperability in a modern data architecture to accelerate the delivery of new solutions.

Data Lake

Data Lake Snapshot Metadata Data Architecture

Load data incrementally from transactional data lakes to data warehouses

AWS Big Data

OCTOBER 19, 2023

Data lakes and data warehouses are two of the most important data storage and management technologies in a modern data architecture. Data lakes store all of an organization’s data, regardless of its format or structure.

Data Lake

Data Lake Data Warehouse Visualization Snapshot

Laying the Foundation for Modern Data Architecture

Cloudera

MAY 28, 2024

It’s not enough for businesses to implement and maintain a data architecture. The unpredictability of market shifts and the evolving use of new technologies mean businesses need more trustworthy data than ever to stay agile and make the right decisions.

Data Architecture

Data Architecture Data Lake Data Warehouse Cost-Benefit

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

AWS Big Data

DECEMBER 4, 2024

Amazon SageMaker Lakehouse, now generally available, unifies all your data across Amazon Simple Storage Service (Amazon S3) data lakes and Amazon Redshift data warehouses, helping you build powerful analytics and AI/ML applications on a single copy of data. The tools to transform your business are here.

Data Analytics

Data Analytics Analytics Data Lake Data Quality

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

AWS Big Data

AUGUST 31, 2023

Amazon Redshift enables you to directly access data stored in Amazon Simple Storage Service (Amazon S3) using SQL queries and join data across your data warehouse and data lake. With Amazon Redshift, you can query the data in your S3 data lake using a central AWS Glue metastore from your Redshift data warehouse.

Data Lake

Data Lake Data Warehouse Metadata Data Architecture

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

AWS Big Data

JUNE 10, 2024

Use cases for Hive metastore federation for Amazon EMR: Hive metastore federation for Amazon EMR is applicable to the following use cases. Governance of Amazon EMR-based data lakes: producers generate data within their AWS accounts using an Amazon EMR-based data lake supported by EMRFS on Amazon Simple Storage Service (Amazon S3) and HBase.

Data Lake

Data Lake Metadata Data Warehouse Data Processing

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

Solving the small file problem and improving query performance: In modern data architectures, stream processing engines such as Amazon EMR are often used to ingest continuous streams of data into data lakes using Apache Iceberg. Iceberg provides several maintenance operations to keep your tables in good shape.

Data Lake

Data Lake Metadata Snapshot Analytics
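The small file problem in the Cloudinary excerpt is usually handled by compaction: many small data files are rewritten into fewer files close to a target size. Iceberg exposes this as a maintenance procedure (`rewrite_data_files`, whose default strategy is bin-packing). The sketch below does not call Iceberg; it only illustrates, under assumed inputs (a hypothetical list of file sizes in bytes and a hypothetical 128 MiB target), the first-fit-decreasing bin-packing idea behind planning such a rewrite.

```python
from typing import List

TARGET_BYTES = 128 * 1024 * 1024  # hypothetical target output file size (128 MiB)


def plan_compaction(file_sizes: List[int], target: int = TARGET_BYTES) -> List[List[int]]:
    """Group small files into bins whose total size does not exceed `target`.

    First-fit decreasing: sort files largest first, then put each file into
    the first bin that still has room, opening a new bin if none does.
    Each bin stands for one output file of a compaction rewrite.
    """
    bins: List[List[int]] = []
    remaining: List[int] = []  # free space left in each bin
    for size in sorted(file_sizes, reverse=True):
        for i, free in enumerate(remaining):
            if size <= free:
                bins[i].append(size)
                remaining[i] -= size
                break
        else:
            # No existing bin fits; a file larger than `target` gets its own bin.
            bins.append([size])
            remaining.append(max(target - size, 0))
    return bins


if __name__ == "__main__":
    MiB = 1024 * 1024
    small_files = [10 * MiB] * 30 + [60 * MiB, 90 * MiB]
    groups = plan_compaction(small_files)
    print(len(small_files), "files ->", len(groups), "files")
```

Bin-packing only merges files; a sort-based rewrite would additionally reorder rows to improve data skipping, at a higher rewrite cost.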

Synchronize data lakes with CDC-based UPSERT using open table format, AWS Glue, and Amazon MSK

AWS Big Data

JULY 31, 2024

In the current industry landscape, data lakes have become a cornerstone of modern data architecture, serving as repositories for vast amounts of structured and unstructured data. Maintaining data consistency and integrity across distributed data lakes is crucial for decision-making and analytics.

Data Lake

Data Lake Marketing Data Processing Management
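A CDC-based UPSERT, as in the title above, applies a stream of change records to a table keyed by primary key: inserts and updates become upserts, and deletes remove the key. Open table formats typically do this with a `MERGE INTO` statement. The minimal sketch below uses an in-memory dict as a stand-in for the table; the event fields `op`, `key`, `seq`, and `row` and the op codes `"I"`, `"U"`, `"D"` are hypothetical, not the schema used in the post.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class ChangeEvent:
    op: str                     # hypothetical op codes: "I" insert, "U" update, "D" delete
    key: str                    # primary key of the affected row
    seq: int                    # source ordering, e.g. a log sequence number
    row: Optional[dict] = None  # full row image for I/U; None for D


def apply_cdc(table: Dict[str, dict], events: Iterable[ChangeEvent]) -> Dict[str, dict]:
    """Return a new table with the batch of change events applied.

    Events are first deduplicated per key so that only the latest change
    (highest seq) in the batch wins, then upserted or deleted.
    """
    latest: Dict[str, ChangeEvent] = {}
    for ev in events:
        if ev.key not in latest or ev.seq > latest[ev.key].seq:
            latest[ev.key] = ev

    result = dict(table)  # leave the caller's table untouched
    for key, ev in latest.items():
        if ev.op == "D":
            result.pop(key, None)  # delete: drop the row if present
        else:
            result[key] = ev.row   # insert or update: upsert the row image
    return result


if __name__ == "__main__":
    current = {"1": {"id": "1", "status": "new"}}
    batch = [
        ChangeEvent("U", "1", 5, {"id": "1", "status": "shipped"}),
        ChangeEvent("I", "2", 6, {"id": "2", "status": "new"}),
        ChangeEvent("D", "2", 7),
        ChangeEvent("U", "1", 3, {"id": "1", "status": "paid"}),  # older change arriving late
    ]
    print(apply_cdc(current, batch))
```

This simplification only orders events within one batch. A real pipeline would also persist the sequence number per row so that a late event from an earlier batch cannot overwrite newer data.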

Modern Data Architecture: Data Warehousing, Data Lakes, and Data Mesh Explained

Data Virtualization

OCTOBER 5, 2022

At the heart of every organization lies a data architecture, determining how data is accessed, organized, and used. For this reason, organizations must periodically revisit their data architectures to ensure that they are aligned with current business goals.

Data Lake

Data Lake Data Architecture Data Integration Management

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Data Quality

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

Need for a data mesh architecture: Because entities in the EUROGATE group generate vast amounts of data from various sources (across departments, locations, and technologies), the traditional centralized data architecture struggles to keep up with the demands for real-time insights, agility, and scalability.

IoT

IoT Machine Learning Metadata Data-driven

Why Every Organization Needs a Data Marketplace

Data Virtualization

APRIL 30, 2025

Data is often hailed as the most valuable asset, but for many organizations, it's still locked behind technical barriers and organizational bottlenecks. Modern data architectures like data lakehouses and cloud-native ecosystems were supposed to solve this, promising centralized access and scalability.

Data Architecture

Data Architecture Data Integration Management IT

Data’s dark secret: Why poor quality cripples AI and growth

CIO Business Intelligence

APRIL 8, 2025

We also examine how centralized, hybrid and decentralized data architectures support scalable, trustworthy ecosystems. As data-centric AI, automated metadata management and privacy-aware data sharing mature, the opportunity to embed data quality into the enterprise's core has never been more significant.

Data Quality

Data Quality Data-driven Key Performance Indicator Metadata

Data architecture strategy for data quality

IBM Big Data Hub

JANUARY 5, 2023

Several factors determine the quality of your enterprise data, such as accuracy, completeness, and consistency. But there’s another factor of data quality that doesn’t get the recognition it deserves: your data architecture. How the right data architecture improves data quality.

Data Architecture

Data Architecture Data Quality Strategy Data Lake

Data democratization: How data architecture can drive business decisions and AI initiatives

IBM Big Data Hub

AUGUST 4, 2023

Today, the way businesses use data is much more fluid; data-literate employees use data across hundreds of apps, analyze data for better decision-making, and access data from numerous locations. Then, it applies these insights to automate and orchestrate the data lifecycle.

Data Architecture

Data Architecture Data Lake Machine Learning Data Governance

Detect, mask, and redact PII data using AWS Glue before loading into Amazon OpenSearch Service

AWS Big Data

JANUARY 12, 2024

Ingestion: data lake batch, micro-batch, and streaming. Many organizations land their source data in their data lake in various ways, including batch, micro-batch, and streaming jobs. Amazon AppFlow can be used to transfer data from different SaaS applications to a data lake.

Data Lake

Data Lake Cost-Benefit Visualization Structured Data
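The micro-batch option in the ingestion excerpt sits between batch and streaming: records are buffered and written in small groups. The sketch below shows one common way to cut a stream into micro-batches, flushing on a record-count limit or a time limit. The `(timestamp, payload)` record shape and the `max_size` / `max_wait_s` parameters are hypothetical, and no specific AWS service API is implied.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[float, dict]  # hypothetical shape: (event time in seconds, payload)


def micro_batches(records: Iterable[Record], max_size: int = 100,
                  max_wait_s: float = 5.0) -> Iterator[List[dict]]:
    """Group a record stream into micro-batches.

    A batch is flushed when it reaches `max_size` records, or when the next
    record arrives more than `max_wait_s` seconds after the batch's first
    record. Leftover records are flushed at the end of the stream.
    """
    batch: List[dict] = []
    batch_start = 0.0
    for ts, payload in records:
        if batch and ts - batch_start > max_wait_s:
            yield batch        # time limit exceeded: flush before adding
            batch = []
        if not batch:
            batch_start = ts   # first record opens a new batch window
        batch.append(payload)
        if len(batch) >= max_size:
            yield batch        # size limit reached
            batch = []
    if batch:
        yield batch
```

Here the time limit is checked only when a new record arrives, using event time; a streaming engine would normally also flush on a wall-clock timer so that a quiet source does not hold a partial batch indefinitely.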

Top analytics announcements of AWS re:Invent 2024

AWS Big Data

FEBRUARY 26, 2025

Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. From enhancing data lakes to empowering AI-driven analytics, AWS unveiled new tools and services that are set to shape the future of data and analytics.

Analytics

Analytics Data Lake Metadata Data Warehouse

Databricks’ new data lakehouse aims at media, entertainment sector

CIO Business Intelligence

APRIL 25, 2022

The other 10% represents the effort of initial deployment, data-loading, configuration and the setup of administrative tasks and analysis that is specific to the customer, Henschen said. The joint solution with Labelbox is targeted toward media companies and is expected to help firms derive more value out of unstructured data.

Recreation/Entertainment

Recreation/Entertainment Data Lake Data Warehouse Unstructured Data

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

Zero-ETL integration also enables you to load and analyze data from multiple operational database clusters in a new or existing Amazon Redshift instance to derive holistic insights across many applications. Learn more about the zero-ETL integrations, data lake performance enhancements, and other announcements below.

Data Warehouse

Data Warehouse Analytics Data Lake Machine Learning

Modernizing the Data Warehouse: Challenges and Benefits

BI-Survey

AUGUST 21, 2020

The primary modernization approach is data warehouse/ETL automation, which helps promote broad usage of the data warehouse but can only partially improve efficiency in data management processes. However, an automation approach alone is of limited usefulness when data management processes are inefficient.

Data Warehouse

Data Warehouse Data Lake Data Governance Data Architecture

How GamesKraft uses Amazon Redshift data sharing to support growing analytics workloads

AWS Big Data

NOVEMBER 13, 2023

Amazon Redshift is a fully managed data warehousing service that offers both provisioned and serverless options, making it more efficient to run and scale analytics without having to manage your data warehouse. Additionally, data is extracted from vendor APIs that include data related to product, marketing, and customer experience.

Data Warehouse

Data Warehouse Analytics Data Lake Data Science

Your guide to AWS Analytics at AWS re:Invent 2023

AWS Big Data

NOVEMBER 13, 2023

11:30 AM – 12:30 PM (PDT) Caesars Forum ANT318 | Accelerate innovation with end-to-end serverless data architecture.
4:30 PM – 5:30 PM (PDT) Wynn ANT207 | Understand your data with business context.
1:00 PM – 2:00 PM (PDT) Venetian ANT201 | Accelerate innovation with real-time data.

Analytics

Analytics Data Lake Data Warehouse Data-driven

Snowflake: Data Ingestion Using Snowpipe and AWS Glue

BizAcuity

NOVEMBER 22, 2022

In today’s largely data-driven world, organizations depend on data for their success and survival and therefore need a robust, scalable data architecture to handle their data needs. This typically requires a data warehouse for analytics that can ingest and handle huge volumes of real-time data.

Data Warehouse

Data Warehouse Cost-Benefit Data Lake Internet of Things

Ingest data from Google Analytics 4 and Google Sheets to Amazon Redshift using Amazon AppFlow

AWS Big Data

JANUARY 6, 2025

Amazon Redshift is a fast, scalable, and fully managed cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. It also helps you securely access your data in operational databases, data lakes, or third-party datasets with minimal movement or copying of data.

Analytics

Analytics Data Warehouse Big Data Metrics

The Lakehouse Isn’t The End Game — Here’s What Comes Next

Data Virtualization

MAY 22, 2025

The data lakehouse has emerged as a powerful and popular data architecture, combining the scale of data lakes with the management features of data warehouses. It promises a unified platform for storing and analyzing structured and unstructured data, particularly for.

Data Lake

Data Lake Unstructured Data Data Warehouse Data Architecture

Augmented data management: Data fabric versus data mesh

IBM Big Data Hub

APRIL 27, 2022

Data fabric and data mesh are emerging data management concepts that are meant to address the organizational change and complexities of understanding, governing and working with enterprise data in a hybrid multicloud ecosystem. The good news is that both data architecture concepts are complementary.

Management

Management Metadata Data Architecture Data Lake

Choose Both: Data Fabric and Data Lakehouse

Cloudera

SEPTEMBER 12, 2022

Combining and analyzing both structured and unstructured data is a whole new challenge to come to grips with, let alone doing so across different infrastructures. Both obstacles can be overcome using modern data architectures, specifically data fabric and data lakehouse. Unified data fabric.

Unstructured Data

Unstructured Data Data Architecture Data Lake Snapshot

AWS re:Invent 2023 Amazon Redshift Sessions Recap

AWS Big Data

DECEMBER 18, 2023

Amazon Redshift powers data-driven decisions for tens of thousands of customers every day with a fully managed, AI-powered cloud data warehouse, delivering the best price-performance for your analytics workloads. Discover how you can use Amazon Redshift to build a data mesh architecture to analyze your data.

Data Warehouse

Data Warehouse Machine Learning Data-driven Data Lake

CIO Ryan Snyder on the benefits of interpreting data as a layer cake

CIO Business Intelligence

AUGUST 2, 2023

So Thermo Fisher Scientific CIO Ryan Snyder and his colleagues have built a data layer cake based on a cascading series of discussions that allow IT and business partners to act as one team. Martha Heller: What are the business drivers behind the data architecture ecosystem you’re building at Thermo Fisher Scientific?

Manufacturing

Manufacturing Data Architecture Data Strategy Strategy

Dive deep into AWS Glue 4.0 for Apache Spark

AWS Big Data

MAY 18, 2023

It’s even harder when your organization is dealing with silos that impede data access across different data stores. Seamless data integration is a key requirement in a modern data architecture to break down data silos. AWS Glue released the version 4.0 runtime.

Testing

Testing Data Lake Cost-Benefit Data Integration

Extract data from SAP ERP using AWS Glue and the SAP SDK

AWS Big Data

FEBRUARY 8, 2023

Vyaire developed a custom data integration platform, iDataHub, powered by AWS services such as AWS Glue, AWS Lambda, and Amazon API Gateway. In this post, we share how we extracted data from SAP ERP using AWS Glue and the SAP SDK. Prahalathan M is the Data Integration Architect at Vyaire Medical Inc.

Testing

Testing Data Integration Data Lake Enterprise

5 Reasons to Use Apache Iceberg on Cloudera Data Platform (CDP)

Cloudera

MARCH 23, 2022

In fact, we recently announced the integration with our cloud ecosystem, bringing the benefits of Iceberg to enterprises as they make their journey to the public cloud and as they adopt more converged architectures like the Lakehouse. 1: Multi-function analytics. Flexible and open file formats.

Metadata

Metadata Data Architecture Machine Learning Cost-Benefit

Modernizing Data Analytics Architecture with the Denodo Platform on Azure

Data Virtualization

JANUARY 19, 2023

Today, many businesses are modernizing their on-premises data warehouses or cloud-based data lakes using Microsoft Azure Synapse Analytics. Unfortunately, with data spread.

Data Analytics

Data Analytics Data Lake Data Warehouse Analytics

Accelerate Amazon Redshift secure data use with Satori – Part 1

AWS Big Data

SEPTEMBER 21, 2023

Satori accelerates implementing data security controls on data warehouses like Amazon Redshift, is straightforward to integrate, and doesn’t require any changes to your Amazon Redshift data, schema, or how your users interact with data. To learn more, start a free trial or request a demo meeting.

Data Warehouse

Data Warehouse Interactive Data Architecture Data-driven

Accelerate Cloud Data Integration with Data Virtualization in the Cloud

Data Virtualization

JULY 8, 2020

In my last post, I covered some of the latest best practices for enhancing data management capabilities in the cloud. Despite the increasing popularity of cloud services, enterprises continue to struggle with creating and implementing a comprehensive cloud strategy that.

Data Integration

Data Integration Strategy Enterprise Management

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

Data ingestion: You have to build ingestion pipelines based on factors such as the types of data sources (on-premises data stores, files, SaaS applications, third-party data) and the flow of data (unbounded streams or batch data). Data exploration: Data exploration helps unearth inconsistencies, outliers, or errors.

Data Strategy

Data Strategy Strategy Data Warehouse Prescriptive Analytics
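The Customer 360 excerpt says data exploration helps unearth inconsistencies, outliers, or errors. One common first pass, not necessarily the one the post uses, is to profile each numeric column: count missing values and flag outliers with Tukey's fences (values beyond `k` interquartile ranges from the quartiles). The sketch below uses only Python's standard `statistics` module; the function name `profile_column` and the default `k = 1.5` are illustrative assumptions.

```python
import statistics
from typing import Dict, List, Optional


def profile_column(values: List[Optional[float]], k: float = 1.5) -> Dict[str, object]:
    """Profile one numeric column: count missing values and flag outliers.

    Outliers are values below Q1 - k*IQR or above Q3 + k*IQR (Tukey's fences).
    """
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    if len(present) < 2:
        # statistics.quantiles needs at least two data points on most Python versions.
        return {"missing": missing, "outliers": []}
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in present if v < low or v > high]
    return {"missing": missing, "low": low, "high": high, "outliers": outliers}


if __name__ == "__main__":
    order_totals = [10, 12, 11, 13, 12, None, 500]
    print(profile_column(order_totals))
```

Flagged values are candidates for review, not automatic deletions: in customer data an extreme value may be a genuine high-value order rather than an error.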

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Cloudera

OCTOBER 7, 2021

In this blog, I will demonstrate the value of Cloudera DataFlow (CDF), the edge-to-cloud streaming data platform available on the Cloudera Data Platform (CDP), as a data integration and democratization fabric. How CDF enables successful Data Mesh Architectures. Introduction.

Metadata

Metadata Cost-Benefit Enterprise Interactive

Go Fast and Far Using Data Virtualization

Data Virtualization

JANUARY 20, 2022

We are always focused on making things “Go Fast,” but how do we make sure we future-proof our data architecture and ensure that we can “Go Far”? Technologies change constantly within organizations, and having a flexible architecture is key.

Data Architecture

Data Architecture Data Integration Technology Management

Are Data Silos Undermining Digital Transformation?

BI-Survey

NOVEMBER 23, 2021

Thus, alternative data architecture concepts have emerged, such as the data lake and the data lakehouse. Which data architecture is right for the data-driven enterprise remains a subject of ongoing debate. Data black holes: the high cost of supposed flexibility.

Digital Transformation

Digital Transformation Data Warehouse Data Lake Data-driven