Amazon Q data integration, introduced in January 2024, allows you to use natural language to author extract, transform, and load (ETL) jobs and operations on DynamicFrame, the AWS Glue-specific data abstraction. In this post, we discuss how Amazon Q data integration transforms ETL workflow development.
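To make the DynamicFrame abstraction concrete, here is a minimal sketch of the kind of AWS Glue script such a natural-language prompt might produce; the catalog database, table name, and S3 path are hypothetical.

    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.transforms import DropNullFields

    # Hypothetical names throughout: a prompt like "read the orders table from
    # the Glue Data Catalog, drop null fields, and write Parquet to S3" might
    # generate a script along these lines.
    sc = SparkContext()
    glue_context = GlueContext(sc)

    orders = glue_context.create_dynamic_frame.from_catalog(
        database="sales_db",    # hypothetical catalog database
        table_name="orders",    # hypothetical table
    )
    cleaned = DropNullFields.apply(frame=orders)

    glue_context.write_dynamic_frame.from_options(
        frame=cleaned,
        connection_type="s3",
        connection_options={"path": "s3://example-bucket/orders-clean/"},
        format="parquet",
    )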
Amazon Web Services (AWS) has been recognized as a Leader in the 2024 Gartner Magic Quadrant for Data Integration Tools. This recognition, we feel, reflects our ongoing commitment to innovation and excellence in data integration, demonstrating our continued progress in providing comprehensive data management solutions.
With the growing emphasis on data, organizations are constantly seeking more efficient and agile ways to integrate their data, especially from a wide variety of applications. SageMaker Lakehouse gives you the flexibility to access and query your data in place with all Apache Iceberg-compatible tools and engines.
Data lakes and data warehouses are two of the most important data storage and management technologies in a modern data architecture. Data lakes store all of an organization's data, regardless of its format or structure.
Unlocking the true value of data often gets impeded by siloed information. Traditional data management, wherein each business unit ingests raw data in separate data lakes or warehouses, hinders visibility and cross-functional analysis. Business units access clean, standardized data.
Amazon Redshift, launched in 2013, has undergone significant evolution since its inception, allowing customers to expand the horizons of data warehousing and SQL analytics. Amazon Redshift offers industry-leading price-performance: up to three times better than alternative cloud data warehouses.
Talend is a data integration and management software company that offers applications for cloud computing, big data integration, application integration, data quality, and master data management. Its code generation architecture uses a visual interface to create Java or SQL code.
Amazon SageMaker Lakehouse, a unified, open, and secure data lakehouse built on Apache Iceberg open standards, provides unified access to your data. Now, customers are able to build and collaborate with their data and tools available in one experience, dramatically reducing time-to-value.
Amazon SageMaker Lakehouse, now generally available, unifies all your data across Amazon Simple Storage Service (Amazon S3) data lakes and Amazon Redshift data warehouses, helping you build powerful analytics and AI/ML applications on a single copy of data. The tools to transform your business are here.
Today, we're excited to announce the general availability of Amazon Q data integration in AWS Glue. Amazon Q data integration, a new generative AI-powered capability of Amazon Q Developer, enables you to build data integration pipelines using natural language.
Apache Iceberg is an Apache-licensed, 100% open-source data table format that helps simplify data processing on large datasets stored in data lakes. Data engineers use Apache Iceberg because it's fast, efficient, and reliable at any scale and keeps records of how datasets change over time.
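As a minimal sketch of that change tracking, the snippet below creates an Iceberg table with Spark and queries its snapshot history; it assumes a Spark session already configured with an Iceberg catalog named demo, and the database and table names are hypothetical.

    from pyspark.sql import SparkSession

    # Assumes an Iceberg catalog named "demo" is configured on the session.
    spark = SparkSession.builder.appName("iceberg-demo").getOrCreate()

    spark.sql("CREATE TABLE IF NOT EXISTS demo.db.events (id BIGINT, ts TIMESTAMP) USING iceberg")
    spark.sql("INSERT INTO demo.db.events VALUES (1, current_timestamp())")

    # Every write produces a new snapshot, so the table's history stays queryable.
    spark.sql("SELECT snapshot_id, committed_at, operation FROM demo.db.events.snapshots").show()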
Amazon Redshift is a fast, fully managed, petabyte-scale cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing business intelligence (BI) tools. Amazon Redshift also supports querying nested data with complex data types such as struct, array, and map.
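As an illustration, here is a hedged sketch of querying nested data held in a SUPER column with PartiQL-style navigation, using the redshift_connector Python driver; the table, columns, and connection details are all hypothetical.

    import redshift_connector

    # Hypothetical cluster and schema: a "customers" table with a SUPER column
    # "orders" holding an array of structs.
    conn = redshift_connector.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        database="dev",
        user="awsuser",
        password="example-password",
    )
    cur = conn.cursor()
    # PartiQL-style navigation: unnest the "orders" array and reach into its structs.
    cur.execute("""
        SELECT c.name, o.total
        FROM customers c, c.orders o
        WHERE o.total > 100
    """)
    print(cur.fetchall())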
But what are the right measures to make the data warehouse and BI fit for the future? Can the basic nature of the data be proactively improved? The following insights came from a global BARC survey into the current status of data warehouse modernization. What role do technology and IT infrastructure play?
Currently, a handful of startups offer "reverse" extract, transform, and load (ETL), in which they copy data from a customer's data warehouse or data platform back into systems of engagement where business users do their work. Sharing Customer 360 insights back without data replication.
The post The Data Warehouse is Dead, Long Live the Data Warehouse, Part I appeared first on Data Virtualization blog - Data Integration and Modern Data Management Articles, Analysis and Information. In times of potentially troublesome change, the apparent paradox and inner poetry of these.
First we had data warehouses, then came data lakes, and now the new kid on the block is the data lakehouse. But what is a data lakehouse and why should we develop one? In a way, the name describes what.
Effective data analytics relies on seamlessly integrating data from disparate systems through identifying, gathering, cleansing, and combining relevant data into a unified format. Reverse ETL use cases are also supported, allowing you to write data back to Salesforce.
Beyond breaking down silos, modern data architectures need to provide interfaces that make it easy for users to consume data using tools fit for their jobs. Data must be able to freely move to and from data warehouses, data lakes, and data marts, and interfaces must make it easy for users to consume that data.
The importance of publishing only high-quality data can't be overstated; it's the foundation for accurate analytics, reliable machine learning (ML) models, and sound decision-making. AWS Glue is a serverless data integration service that you can use to effectively monitor and manage data quality through AWS Glue Data Quality.
One of the key challenges in modern big data management is facilitating efficient data sharing and access control across multiple EMR clusters. Organizations have multiple Hive data warehouses across EMR clusters, where the metadata gets generated. Test access using SageMaker Studio in the consumer account.
Solving the small file problem and improving query performance: in modern data architectures, stream processing engines such as Amazon EMR are often used to ingest continuous streams of data into data lakes using Apache Iceberg. Iceberg provides several maintenance operations to keep your tables in good shape.
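As a sketch of those maintenance operations, the snippet below calls Iceberg's rewrite_data_files and expire_snapshots Spark procedures; the catalog and table names and the retention value are illustrative assumptions, not recommendations.

    from pyspark.sql import SparkSession

    # Assumes an Iceberg catalog named "demo"; the table name is hypothetical.
    spark = SparkSession.builder.appName("iceberg-maintenance").getOrCreate()

    # Compact many small files into fewer, larger ones.
    spark.sql("CALL demo.system.rewrite_data_files(table => 'db.events')")

    # Expire old snapshots so unreferenced data files can be cleaned up.
    spark.sql("CALL demo.system.expire_snapshots(table => 'db.events', retain_last => 10)")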
The infrastructure provides an analytics experience to hundreds of in-house analysts, data scientists, and student-facing frontend specialists. The data engineering team is on a mission to modernize its data integration platform to be agile, adaptive, and straightforward to use.
In the current industry landscape, data lakes have become a cornerstone of modern data architecture, serving as repositories for vast amounts of structured and unstructured data. Maintaining data consistency and integrity across distributed data lakes is crucial for decision-making and analytics.
cycle_end"', "sagemakedatalakeenvironment_sub_db", ctas_approach=False) A similar approach is used to connect to shared data from Amazon Redshift, which is also shared using Amazon DataZone. AWS Database Migration Service (AWS DMS) is used to securely transfer the relevant data to a central Amazon Redshift cluster.
Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. From enhancing data lakes to empowering AI-driven analytics, AWS unveiled new tools and services that are set to shape the future of data and analytics.
Amazon AppFlow automatically encrypts data in motion, and allows you to restrict data from flowing over the public internet for SaaS applications that are integrated with AWS PrivateLink, reducing exposure to security threats. He has worked with building data warehouses and big data solutions for over 13 years.
In 2013, Amazon Web Services revolutionized the data warehousing industry by launching Amazon Redshift, the first fully managed, petabyte-scale, enterprise-grade cloud data warehouse. Amazon Redshift made it simple and cost-effective to efficiently analyze large volumes of data using existing business intelligence tools.
This post is co-authored by Vijay Gopalakrishnan, Director of Product, Salesforce Data Cloud. In today's data-driven business landscape, organizations collect a wealth of data across various touch points and unify it in a central data warehouse or a data lake to deliver business insights.
Many companies identify and label PII through manual, time-consuming, and error-prone reviews of their databases, data warehouses, and data lakes, thereby rendering their sensitive data unprotected and vulnerable to regulatory penalties and breach incidents. Load data from Amazon S3 to the Redshift data warehouse.
An origin is a point of data entry in a given pipeline. Examples of an origin include storage systems like data lakes and data warehouses, and data sources such as IoT devices, transaction processing applications, APIs, or social media. The final point to which the data has to be eventually transferred is a destination.
In today's data-driven business environment, organizations face the challenge of efficiently preparing and transforming large amounts of data for analytics and data science purposes. Businesses need to build data warehouses and data lakes based on operational data.
Amazon Redshift is a fully managed data warehousing service that offers both provisioned and serverless options, making it more efficient to run and scale analytics without having to manage your data warehouse. These upstream data sources constitute the data producer components.
This week SnapLogic posted a presentation of the 10 Modern Data Integration Platform Requirements on the company's blog. They are: Application integration is done primarily through REST & SOAP services. Large-volume data integration is available to Hadoop-based data lakes or cloud-based data warehouses.
My previous post explained that, in my mind, the data lakehouse differs hardly at all from the traditional data warehouse architectural design pattern (ADP). It consists largely of the application of new cloud-based technology to the same requirements and constraints.
The data lakehouse is a relatively new data architecture concept, first championed by Cloudera, which offers both storage and analytics capabilities as part of the same solution. This contrasts with the data lake and the data warehouse, which, respectively, store data in its native format and store structured data, often in SQL format.
This form of architecture can handle data in all forms (structured, semi-structured, and unstructured), blending capabilities from data warehouses and data lakes into data lakehouses.
This typically requires a data warehouse for analytics needs that is able to ingest and handle real-time data of huge volumes. Snowflake is a cloud-native platform that eliminates the need for separate data warehouses, data lakes, and data marts, allowing secure data sharing across the organization.
All this data arrives by the terabyte, and a data management platform can help marketers make sense of it all. Marketing-focused or not, DMPs excel at negotiating with a wide array of databases, data lakes, or data warehouses, ingesting their streams of data and then cleaning, sorting, and unifying the information therein.
AWS has invested in a zero-ETL (extract, transform, and load) future so that builders can focus more on creating value from data, instead of having to spend time preparing data for analysis. You can send data from your streaming source to this resource for ingesting the data into a Redshift data warehouse.
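As an illustration of that streaming path, here is a minimal sketch of Redshift streaming ingestion from Amazon Kinesis Data Streams; the cluster endpoint, credentials, stream name, and IAM role are all hypothetical.

    import redshift_connector

    # Hypothetical connection details throughout.
    conn = redshift_connector.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        database="dev",
        user="awsuser",
        password="example-password",
    )
    cur = conn.cursor()
    # Map an external schema onto a Kinesis stream (hypothetical IAM role).
    cur.execute("""
        CREATE EXTERNAL SCHEMA kds
        FROM KINESIS
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-streaming-role'
    """)
    # A materialized view over the stream ingests records as they arrive.
    cur.execute("""
        CREATE MATERIALIZED VIEW clickstream_mv AUTO REFRESH YES AS
        SELECT approximate_arrival_timestamp,
               JSON_PARSE(FROM_VARBYTE(kinesis_data, 'utf-8')) AS payload
        FROM kds."example-clickstream"
    """)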
Today, we are pleased to announce new AWS Glue connectors for Azure Blob Storage and Azure Data Lake Storage that allow you to move data bi-directionally between Azure Blob Storage, Azure Data Lake Storage, and Amazon Simple Storage Service (Amazon S3).
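A read from Azure Blob Storage might look like the following sketch, which completes the truncated call from the original post; it assumes a Spark session whose Hadoop configuration already carries the Azure storage account key, and that the source file is CSV (implied by the header option).

    from pyspark.sql import SparkSession

    # Assumes the Azure storage account key is already set in the Hadoop config.
    spark = SparkSession.builder.appName("azure-blob-read").getOrCreate()

    df = (
        spark.read.format("csv")    # CSV is an assumption
        .option("header", "true")
        .load("wasbs://yourblob@youraccountname.blob.core.windows.net/loadingtest-input/100mb")
    )
    df.show(5)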
This integration expands the possibilities for AWS analytics and machine learning (ML) solutions, making the data warehouse accessible to a broader range of applications. These tables are then joined with tables from the Enterprise Data Lake (EDL) at runtime.
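The truncated read in the original post appears to pull from the warehouse through a Spark connector; a possible reconstruction follows, assuming the community Spark-Redshift connector, with every configuration value and the pushed-down query being hypothetical.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("redshift-read").getOrCreate()

    # All values here are hypothetical placeholders.
    read_config = {
        "url": "jdbc:redshift://example-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev",
        "tempdir": "s3://example-bucket/redshift-temp/",
        "aws_iam_role": "arn:aws:iam::123456789012:role/redshift-spark-role",
    }
    df = (
        spark.read.format("io.github.spark_redshift_community.spark.redshift")
        .options(**read_config)
        .option("query", "SELECT id, name FROM warehouse_table")  # hypothetical query
        .load()
        .dropDuplicates()
    )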
Data lakes, by combining the flexibility of object storage with the scalability and agility of cloud platforms, are becoming an increasingly popular choice as an enterprise data repository. Whether you are on Amazon Web Services (AWS) and leverage AWS S3.
Keerthi Chadalavada is a Senior Software Development Engineer at AWS Glue, focusing on combining generative AI and data integration technologies to design and build comprehensive solutions for customers' data and analytics needs. In his spare time, he enjoys cycling with his new road bike.