Amazon Q data integration, introduced in January 2024, allows you to use natural language to author extract, transform, and load (ETL) jobs and operations in DynamicFrame, the AWS Glue-specific data abstraction. In this post, we discuss how Amazon Q data integration transforms ETL workflow development.
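Authoring in natural language still produces a regular Glue script under the hood. Below is a minimal sketch of the kind of DynamicFrame job such a prompt might generate; the database, table, field, and bucket names are hypothetical placeholders, not anything the post prescribes.

```python
# A minimal sketch of a generated AWS Glue DynamicFrame job. All names are hypothetical.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a cataloged source table into a DynamicFrame
orders = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="orders"
)

# Apply simple transforms: drop an internal field and rename a timestamp column
cleaned = orders.drop_fields(["internal_id"]).rename_field("ord_ts", "order_timestamp")

# Write the result back to Amazon S3 as Parquet
glue_context.write_dynamic_frame.from_options(
    frame=cleaned,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/clean/orders/"},
    format="parquet",
)
job.commit()
```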
With the growing emphasis on data, organizations are constantly seeking more efficient and agile ways to integrate their data, especially from a wide variety of applications. SageMaker Lakehouse gives you the flexibility to access and query your data in place with all Apache Iceberg-compatible tools and engines.
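To make that in-place access concrete, here is a minimal sketch of querying a Lakehouse table from one Iceberg-compatible engine (PySpark); the catalog name, warehouse path, and table are assumptions for illustration, not the service's fixed names.

```python
# A minimal sketch of in-place Iceberg querying from PySpark. Catalog, warehouse,
# and table names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-query")
    # Register an Iceberg catalog backed by the AWS Glue Data Catalog
    .config("spark.sql.catalog.lakehouse", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lakehouse.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.lakehouse.warehouse", "s3://example-bucket/warehouse/")
    .getOrCreate()
)

# Query the table where it lives; no copies or loads required
spark.sql(
    "SELECT region, SUM(amount) FROM lakehouse.sales.orders GROUP BY region"
).show()
```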
This week on the keynote stages at AWS re:Invent 2024, you heard Matt Garman, CEO of AWS, and Swami Sivasubramanian, VP of AI and Data at AWS, speak about the next generation of Amazon SageMaker, the center for all of your data, analytics, and AI. The relationship between analytics and AI is rapidly evolving.
Today, we’re excited to announce the general availability of Amazon Q data integration in AWS Glue. Amazon Q data integration, a new generative AI-powered capability of Amazon Q Developer, enables you to build data integration pipelines using natural language.
Today, Amazon Redshift is used by customers across all industries for a variety of use cases, including data warehouse migration and modernization, near-real-time analytics, self-service analytics, data lake analytics, machine learning (ML), and data monetization.
We often see requests from customers who have started their data journey by building data lakes on Microsoft Azure and want to extend access to that data to AWS services. In such scenarios, data engineers face challenges in connecting to and extracting data from storage containers on Microsoft Azure.
In this blog post, we dive into different data aspects and how Cloudinary addresses the twin concerns of vendor lock-in and cost-efficient data analytics by using Apache Iceberg, Amazon Simple Storage Service (Amazon S3), Amazon Athena, Amazon EMR, and AWS Glue.
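A minimal sketch of the query path this architecture enables: submitting an Athena query against an Iceberg table on S3 with boto3. The database, table, and result location are hypothetical placeholders.

```python
# A minimal sketch of querying an Iceberg table on Amazon S3 through Amazon Athena
# with boto3. Database, table, and output locations are hypothetical.
import boto3

athena = boto3.client("athena")

# Submit the query; Athena scans the Iceberg table directly in S3
response = athena.start_query_execution(
    QueryString="SELECT asset_type, COUNT(*) FROM assets GROUP BY asset_type",
    QueryExecutionContext={"Database": "media_lake"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)
print("Query execution id:", response["QueryExecutionId"])
```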
At AWS re:Invent 2024, we announced the next generation of Amazon SageMaker, the center for all your data, analytics, and AI. Unified access to your data is provided by Amazon SageMaker Lakehouse, a unified, open, and secure data lakehouse built on Apache Iceberg open standards.
The rapid adoption of software as a service (SaaS) solutions has led to data silos across various platforms, presenting challenges in consolidating insights from diverse sources. Introducing the Salesforce connector for AWS Glue: to meet the demands of diverse data integration use cases, AWS Glue now supports SaaS connectivity for Salesforce.
DataOps improves the robustness, transparency, and efficiency of data workflows through automation. For example, DataOps can be used to automate data integration. Previously, the consulting team had been using a patchwork of ETL processes to consolidate data from disparate sources into a data lake.
In addition to real-time analytics and visualization, the data needs to be shared for long-term data analytics and machine learning applications. This approach supports both the immediate needs of visualization tools such as Tableau and the long-term demands of digital twin and IoT data analytics.
Hive metastore federation for Amazon EMR is applicable to several use cases, including governance of Amazon EMR-based data lakes, where producers generate data within their AWS accounts using an Amazon EMR-based data lake backed by EMRFS on Amazon Simple Storage Service (Amazon S3) and HBase.
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS analytics services.
Now you can author data preparation transformations and edit them with the AWS Glue Studio visual editor. The AWS Glue Studio visual editor is a graphical interface that enables you to create, run, and monitor data integration jobs in AWS Glue. She is passionate about helping customers build data lakes using ETL workloads.
With the rapid growth of technology, more and more data is arriving in many different formats: structured, semi-structured, and unstructured. Data analytics on operational data in near-real time is becoming a common need. Enter a new version of AWS Glue that accelerates data integration workloads in AWS.
About the Authors: Samir Patel is a Senior Data Architect at Amazon Web Services, where he specializes in OpenSearch, data analytics, and cutting-edge generative AI technologies. Samir works directly with enterprise customers to design and build customized solutions catered to their data analytics and cybersecurity needs.
AWS has invested in a zero-ETL (extract, transform, and load) future so that builders can focus more on creating value from data instead of having to spend time preparing data for analysis. This means you no longer have to create an external schema in Amazon Redshift to use the data lake tables cataloged in the Data Catalog.
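A minimal sketch of what that looks like in practice, using the Redshift Data API from Python. The workgroup, database, and table names are hypothetical, and the auto-mounted `awsdatacatalog` database is an assumption based on how Redshift exposes Data Catalog tables.

```python
# A minimal sketch: query a Glue Data Catalog table from Amazon Redshift Serverless
# via the Redshift Data API, with no external schema. All names are hypothetical.
import boto3

client = boto3.client("redshift-data")

# "awsdatacatalog" exposes Data Catalog databases that Redshift auto-mounts
resp = client.execute_statement(
    WorkgroupName="example-workgroup",
    Database="dev",
    Sql='SELECT COUNT(*) FROM "awsdatacatalog"."lake_db"."events";',
)
print("Statement id:", resp["Id"])
```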
Kaplan data engineers empower data analytics using Amazon Redshift and Tableau. The infrastructure provides an analytics experience to hundreds of in-house analysts, data scientists, and student-facing frontend specialists. He works with data engineers at Kaplan to build data lakes using AWS services.
Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. This zero-ETL integration reduces the complexity and operational burden of data replication to let you focus on deriving insights from your data.
Customers have been using data warehousing solutions to perform their traditional analytics tasks. Traditional batch ingestion and processing pipelines that involve operations such as data cleaning and joining with reference data are straightforward to create and cost-efficient to maintain.
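The excerpt ends with a stray fragment of a Spark write call (`options(**additional_options).mode("append").save(s3_output_folder)`); a hedged reconstruction follows, assuming a DataFrame `df`, an options dict, and an output path stand in for the elided context.

```python
# A hedged completion of the truncated Spark write call from the excerpt.
# `df`, `additional_options`, and `s3_output_folder` are placeholder assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("batch-ingest").getOrCreate()

df = spark.read.parquet("s3://example-bucket/raw/")   # hypothetical source
additional_options = {"compression": "snappy"}        # hypothetical write options
s3_output_folder = "s3://example-bucket/processed/"   # hypothetical target

(
    df.write.format("parquet")
    .options(**additional_options)
    .mode("append")
    .save(s3_output_folder)
)
```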
Today, many businesses are modernizing their on-premises data warehouses or cloud-based data lakes using Microsoft Azure Synapse Analytics. Unfortunately, with data spread…
We have defined all layers and components of our design in line with the AWS Well-Architected Framework Data Analytics Lens. Ingestion (data lake batch, micro-batch, and streaming): many organizations land their source data into their data lake in various ways, including batch, micro-batch, and streaming jobs.
Amazon Redshift is a fully managed data warehousing service that offers both provisioned and serverless options, making it more efficient to run and scale analytics without having to manage your data warehouse. These upstream data sources constitute the data producer components.
Customers often want to augment and enrich SAP source data with other non-SAP source data. Such analytic use cases can be enabled by building a data warehouse or data lake. Customers can now use the AWS Glue SAP OData connector to extract data from SAP.
In this post, we show how Ruparupa implemented an incrementally updated data lake to get insights into their business using Amazon Simple Storage Service (Amazon S3), AWS Glue, Apache Hudi, and Amazon QuickSight. An AWS Glue ETL job, using the Apache Hudi connector, updates the S3 data lake hourly with incremental data.
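A minimal sketch of such an hourly incremental write with Apache Hudi from a Spark-based Glue job; the table name, record key, and S3 paths are hypothetical placeholders, not Ruparupa's actual configuration.

```python
# A minimal sketch of an incremental (upsert) write to an S3 data lake with
# Apache Hudi. Table name, key fields, and paths are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-incremental").getOrCreate()

# Hypothetical staging location holding the latest hourly batch of changes
incremental_df = spark.read.parquet("s3://example-bucket/staging/orders/")

hudi_options = {
    "hoodie.table.name": "orders",
    "hoodie.datasource.write.recordkey.field": "order_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.operation": "upsert",
}

(
    incremental_df.write.format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save("s3://example-bucket/lake/orders/")
)
```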
In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.
Amazon Redshift is a fast, scalable, and fully managed cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. Conclusion: in this post, we walked you through the process of using Amazon AppFlow to integrate data from Google Ads and Google Sheets.
As organizations increasingly rely on data stored across various platforms, such as Snowflake, Amazon Simple Storage Service (Amazon S3), and software as a service (SaaS) applications, the challenge of bringing these disparate data sources together has never been more pressing.
As the volume and complexity of analytics workloads continue to grow, customers are looking for more efficient and cost-effective ways to ingest and analyze data. AWS Glue provides both visual and code-based interfaces to make data integration effortless.
Which type(s) of storage consolidation you use depends on the data you generate and collect. One option is a data lake, on-premises or in the cloud, that stores unprocessed data in any type of format, structured or unstructured, and can be queried in aggregate. Set up unified data governance rules and processes.
In the data analytics space, organizations often deal with many tables across different databases and file formats that hold data for different business functions. Apache Hudi supports ACID transactions and CRUD operations on a data lake, so you don’t have to maintain separate queries for the data lake.
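Complementing the upsert sketch above, here is a hedged example of the delete side of CRUD with Hudi; the table, key field, and paths are again hypothetical placeholders.

```python
# A minimal sketch of a Hudi delete operation. All names and paths are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-delete").getOrCreate()

# Records to remove, identified by the table's record key
deletes_df = spark.read.parquet("s3://example-bucket/staging/cancelled_orders/")

(
    deletes_df.write.format("hudi")
    .option("hoodie.table.name", "orders")
    .option("hoodie.datasource.write.recordkey.field", "order_id")
    .option("hoodie.datasource.write.precombine.field", "updated_at")
    .option("hoodie.datasource.write.operation", "delete")
    .mode("append")
    .save("s3://example-bucket/lake/orders/")
)
```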
But when it comes to getting the most value out of hybrid cloud, one of the most crucial capabilities required is data replication and synchronization—what enables businesses to efficiently capture data changes and unify various data stores while ensuring low latency, high availability, and data integrity.
Query> DataOps. Query> Write an essay on DataOps. ChatGPT> DataOps, or data operations, is a set of practices and technologies that organizations use to improve the speed, quality, and reliability of their data analytics processes. Overall, DataOps is an essential component of modern data-driven organizations.
In today’s data-driven world, seamless integration and transformation of data across diverse sources into actionable insights is paramount. It enables you to visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes. Kamen Sharlandjiev is a Sr.
Sessions: ANT203 | What’s new in Amazon Redshift. Watch this session to learn about the newest innovations within Amazon Redshift, the petabyte-scale AWS Cloud data warehousing solution. Easily build and train machine learning models using SQL within Amazon Redshift to generate predictive analytics and propel data-driven decision-making.
Data integration is the foundation of robust data analytics. It encompasses the discovery, preparation, and composition of data from diverse sources. In the modern data landscape, accessing, integrating, and transforming data from diverse sources is a vital process for data-driven decision-making.
Let’s go through the ten Azure data pipeline tools. Azure Data Factory: this cloud-based data integration service allows you to create data-driven workflows for orchestrating and automating data movement and transformation. You can use it for big data analytics and machine learning workloads.
However, enterprise data generated from siloed sources, combined with the lack of a data integration strategy, creates challenges for provisioning data for generative AI applications. As part of the transformation, the objects need to be treated to ensure data privacy (for example, PII redaction).
Many customers need an ACID (atomic, consistent, isolated, durable) transactional data lake that can log change data capture (CDC) from operational data sources. There is also demand for merging real-time data into batch data. The Delta Lake framework provides these two capabilities.
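The excerpt trails off mid-snippet (`option("header",True).schema(schema).load("s3://"+`); a hedged completion follows, assuming a CSV CDC feed, a `schema` definition, and a Delta table path stand in for the elided context.

```python
# A hedged completion of the truncated snippet: read a CSV CDC batch and merge it
# into a Delta table. Schema, bucket, and key column are assumed placeholders.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("delta-cdc-merge").getOrCreate()

schema = StructType([
    StructField("id", StringType()),
    StructField("status", StringType()),
    StructField("updated_at", TimestampType()),
])
bucket = "example-bucket"  # hypothetical

cdc_df = (
    spark.read.format("csv")
    .option("header", True)
    .schema(schema)
    .load("s3://" + bucket + "/cdc/orders/")
)

# Upsert the CDC batch into the Delta table (ACID merge)
target = DeltaTable.forPath(spark, "s3://" + bucket + "/lake/orders/")
(
    target.alias("t")
    .merge(cdc_df.alias("s"), "t.id = s.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```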
The data fabric architectural approach can simplify data access in an organization and facilitate self-service data consumption at scale. Read: the first capability of a data fabric is a semantic knowledge data catalog, but what are the other five core capabilities of a data fabric? (11 May 2021)
In this post, we explore how to use the AWS Glue native connector for Teradata Vantage to streamline data integrations and unlock the full potential of your data. Businesses often rely on Amazon Simple Storage Service (Amazon S3) for storing large amounts of data from various data sources in a cost-effective and secure manner.
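A hedged sketch of reading from Teradata Vantage in a Glue job is below. The `create_dynamic_frame.from_options` call is standard Glue, but the connection type string and option names here are assumptions based on Glue connector conventions; check the connector documentation for the exact names.

```python
# A hedged sketch of reading from Teradata Vantage in an AWS Glue job. The
# connection_type value and connection_options keys are assumptions, not
# confirmed by the excerpt.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

teradata_df = glue_context.create_dynamic_frame.from_options(
    connection_type="teradata",                    # assumed connector identifier
    connection_options={
        "connectionName": "teradata-connection",   # Glue connection holding credentials
        "dbtable": "sales.orders",                 # hypothetical source table
    },
)
print("Row count:", teradata_df.count())
```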
Third, AWS continues adding support for more data sources, including connections to software as a service (SaaS) applications, on-premises applications, and other clouds, so organizations can act on their data. Visit Data integration with AWS to learn more.
The application gets prompt templates from an S3 data lake and creates the engineered prompt. The user interaction is stored in a data lake for downstream usage and BI analysis. Angel is an EMEA Data & AI PSA based in Madrid. In his current role, Angel helps partners develop businesses centered on data and AI.
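A minimal sketch of the described flow: fetch a prompt template from S3, fill it in, and log the interaction back to the data lake. The bucket, keys, and template placeholder are all hypothetical.

```python
# A minimal sketch of the template-fetch-and-log flow. All names are hypothetical.
import json

import boto3

s3 = boto3.client("s3")
BUCKET = "example-bucket"

# 1. Get the prompt template from the S3 data lake
template = (
    s3.get_object(Bucket=BUCKET, Key="prompts/summarize_v1.txt")["Body"]
    .read()
    .decode("utf-8")
)

# 2. Create the engineered prompt (template assumed to contain a {document} slot)
prompt = template.format(document="user-supplied text goes here")

# 3. Store the interaction for downstream BI analysis
record = {"prompt": prompt, "template": "summarize_v1"}
s3.put_object(
    Bucket=BUCKET,
    Key="interactions/2024/01/01/event.json",
    Body=json.dumps(record).encode("utf-8"),
)
```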
It’s even harder when your organization is dealing with silos that impede data access across different data stores. Seamless data integration is a key requirement in a modern data architecture to break down data silos. He is passionate about distributed computing and anything and everything about data.