While traditional extract, transform, and load (ETL) processes have long been a staple of data integration due to their flexibility, for common use cases such as replication and ingestion they often prove time-consuming, complex, and less adaptable to the fast-changing demands of modern data architectures.
Amazon Web Services (AWS) has been recognized as a Leader in the 2024 Gartner Magic Quadrant for Data Integration Tools. This recognition, we feel, reflects our ongoing commitment to innovation and excellence in data integration, demonstrating our continued progress in providing comprehensive data management solutions.
Operations data: Data generated from a set of operations such as orders, online transactions, competitor analytics, sales data, point-of-sale data, pricing data, etc. The enormous growth of structured, unstructured, and semi-structured data is referred to as big data. Big Data Ingestion.
This post describes how HPE Aruba automated their Supply Chain management pipeline, and re-architected and deployed their data solution by adopting a modern data architecture on AWS. The new solution has helped Aruba integrate data from multiple sources, along with optimizing their cost, performance, and scalability.
Need for a data mesh architecture Because entities in the EUROGATE group generate vast amounts of data from various sources (across departments, locations, and technologies), the traditional centralized data architecture struggles to keep up with the demands for real-time insights, agility, and scalability.
SageMaker brings together widely adopted AWS ML and analytics capabilities—virtually all of the components you need for data exploration, preparation, and integration; petabyte-scale big data processing; fast SQL analytics; model development and training; governance; and generative AI development.
Conclusion In this post, we walked you through the process of using Amazon AppFlow to integrate data from Google Ads and Google Sheets. We demonstrated how the complexities of data integration are minimized so you can focus on deriving actionable insights from your data.
Data is considered by some to be the world’s most valuable resource. Going far beyond the limitations of physical resources, data has wide applications for education, automation, and governance. It is perhaps no surprise then, that the value of all the world’s data is projected to reach $280 billion by 2025.
Several factors determine the quality of your enterprise data: accuracy, completeness, and consistency, to name a few. But there’s another factor of data quality that doesn’t get the recognition it deserves: your data architecture. How the right data architecture improves data quality.
The construction of big data applications based on open source software has become increasingly straightforward since the advent of projects like Data on EKS, an open source project from AWS that provides blueprints for building data and machine learning (ML) applications on Amazon Elastic Kubernetes Service (Amazon EKS).
The only question is, how do you ensure effective ways of breaking down data silos and bringing data together for self-service access? It starts by modernizing your data integration capabilities – ensuring disparate data sources and cloud environments can come together to deliver data in real time and fuel AI initiatives.
They understand that a one-size-fits-all approach no longer works, and recognize the value in adopting scalable, flexible tools and open data formats to support interoperability in a modern data architecture to accelerate the delivery of new solutions. Andries has over 20 years of experience in the field of data and analytics.
When we talk about data integrity, we’re referring to the overarching completeness, accuracy, consistency, accessibility, and security of an organization’s data. Together, these factors determine the reliability of the organization’s data. In short, yes.
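As a rough illustration of those dimensions, completeness and consistency can be expressed as simple predicates over records. This is a minimal sketch with hypothetical field names, not any particular product’s validation API:

```python
# Minimal sketch of record-level integrity checks. Field names and the
# sample records are illustrative assumptions; real integrity programs
# cover far more dimensions (accuracy, accessibility, security, ...).

REQUIRED_FIELDS = {"id", "email", "created_at"}

def completeness(record: dict) -> bool:
    """Every required field is present and non-empty."""
    return all(record.get(f) not in (None, "") for f in REQUIRED_FIELDS)

def consistency(records: list) -> bool:
    """No two records share the same primary key."""
    ids = [r.get("id") for r in records]
    return len(ids) == len(set(ids))

records = [
    {"id": 1, "email": "a@example.com", "created_at": "2024-01-01"},
    {"id": 2, "email": "", "created_at": "2024-01-02"},           # incomplete
    {"id": 1, "email": "c@example.com", "created_at": "2024-01-03"},  # duplicate id
]

complete = [r for r in records if completeness(r)]
print(len(complete))          # 2 of the 3 records are complete
print(consistency(records))   # False: duplicate id
```

In practice these checks would run inside a data quality framework rather than ad hoc scripts, but the predicates themselves stay this simple.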
Reading Time: 3 minutes At the heart of every organization lies a data architecture, determining how data is accessed, organized, and used. For this reason, organizations must periodically revisit their data architectures to ensure that they are aligned with current business goals.
This blog post presents an architecture solution that allows customers to extract key insights from Amazon S3 access logs at scale. We will partition and format the server access logs with Amazon Web Services (AWS) Glue, a serverless data integration service, to generate a catalog for access logs and create dashboards for insights.
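The parse-and-partition step that AWS Glue performs at scale can be sketched locally in a few lines. The regex below covers only the first three fields of the S3 server access log format, and the date-based partition scheme is an assumption for illustration:

```python
import re
from datetime import datetime

# Sketch of the parse/partition logic: pull the bucket and timestamp out
# of an S3 server access log line and derive a partition path. The regex
# matches only the leading fields (owner, bucket, timestamp); a real job
# would parse the full format and write partitioned output with Glue.

LOG_RE = re.compile(r'^(?P<owner>\S+) (?P<bucket>\S+) \[(?P<ts>[^\]]+)\]')

def partition_for(line: str) -> str:
    m = LOG_RE.match(line)
    if not m:
        raise ValueError("unrecognized access log line")
    ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return (f"bucket={m.group('bucket')}/"
            f"year={ts.year}/month={ts.month:02d}/day={ts.day:02d}")

line = ("79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be "
        "awsexamplebucket1 [06/Feb/2019:00:00:38 +0000] 192.0.2.3 ...")
print(partition_for(line))
# bucket=awsexamplebucket1/year=2019/month=02/day=06
```

Partitioning by date like this is what lets the downstream catalog and dashboards prune most of the log volume per query.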
The Business Application Research Center (BARC) warns that data governance is a highly complex, ongoing program, not a “big bang initiative,” and it runs the risk of participants losing trust and interest over time.
She is passionate about designing and building end-to-end solutions to address customer data integration and analytics needs. Big Data Architect. Gal Heyne is a Product Manager for AWS Glue with a strong focus on AI/ML, data engineering and BI. Zach Mitchell is a Sr.
Governments must ensure that the data used for training AI models is of high quality, accurately representing the diverse range of scenarios and demographics it seeks to address. It is vital to establish stringent data governance practices to maintain data integrity, privacy, and compliance with regulatory requirements.
In this post, we delve into the key aspects of using Amazon EMR for modern data management, covering topics such as data governance, data mesh deployment, and streamlined data discovery. Organizations have multiple Hive data warehouses across EMR clusters, where the metadata gets generated.
Create a unified view of the local table and historical data in Amazon Redshift As a modern data architecture strategy, you can organize historical data or less frequently accessed data in the data lake and keep frequently accessed data in the Redshift data warehouse.
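A minimal sketch of that unified-view pattern, using Python’s built-in sqlite3 as a stand-in for Redshift (where the cold side would actually be a Spectrum external table over the data lake; table and column names here are hypothetical):

```python
import sqlite3

# Hot rows live in a local (warehouse) table, older rows in a cheaper
# store, and a UNION ALL view exposes both as one queryable table.
# sqlite3 stands in for Redshift; in Redshift, sales_cold would be an
# external (Spectrum) table over data lake files.

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales_hot  (sale_id INT, amount REAL, sale_date TEXT);
    CREATE TABLE sales_cold (sale_id INT, amount REAL, sale_date TEXT);
    INSERT INTO sales_hot  VALUES (3, 30.0, '2024-06-01'), (4, 40.0, '2024-06-02');
    INSERT INTO sales_cold VALUES (1, 10.0, '2022-01-05'), (2, 20.0, '2023-03-09');
    CREATE VIEW sales_all AS
        SELECT * FROM sales_hot
        UNION ALL
        SELECT * FROM sales_cold;
""")

# Queries against the view see hot and historical rows together.
total = con.execute("SELECT COUNT(*), SUM(amount) FROM sales_all").fetchone()
print(total)  # (4, 100.0)
```

The payoff of the pattern is that queries never need to know where a given row physically lives; tiering data between warehouse and lake becomes an operational detail behind the view.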
Vyaire developed a custom data integration platform, iDataHub, powered by AWS services such as AWS Glue, AWS Lambda, and Amazon API Gateway. In this post, we share how we extracted data from SAP ERP using AWS Glue and the SAP SDK. Prahalathan M is the Data Integration Architect at Vyaire Medical Inc.
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. By decoupling storage and compute, data lakes promote cost-effective storage and processing of big data. Why did Orca choose Apache Iceberg?
It’s even harder when your organization is dealing with silos that impede data access across different data stores. Seamless data integration is a key requirement in a modern data architecture to break down data silos. Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team.
We think that by automating the undifferentiated parts, we can help our customers increase the pace of their data-driven innovation by breaking down data silos and simplifying data integration.
AWS Glue: A data integration service, AWS Glue consolidates major data integration capabilities into a single service. These include data discovery, modern ETL, cleansing, transforming, and centralized cataloging. It’s also serverless, which means there’s no infrastructure to manage.
It also provides timely refreshes of data in your data warehouse. He has helped customers build scalable data warehousing and big data solutions for over 16 years. He has worked with building databases and data warehouse solutions for over 15 years.
Data lakes and data warehouses are two of the most important data storage and management technologies in a modern data architecture. Data lakes store all of an organization’s data, regardless of its format or structure. Various data stores are supported in AWS Glue; for example, AWS Glue 4.0
Unified, governed data can also be put to use for various analytical, operational and decision-making purposes. This process is known as data integration, one of the key components to a strong data fabric. The remote execution engine is a fantastic technical development which takes data integration to the next level.
Linked Data and Volume. Speaking about data and volume, it seems apt to start this with the famous saying that “most companies think they have ‘Big Data’ problems while they actually have big ‘data problems’”. Linked Data and Information Retrieval.
Data fabric and data mesh are emerging data management concepts that are meant to address the organizational change and complexities of understanding, governing and working with enterprise data in a hybrid multicloud ecosystem. The good news is that both data architecture concepts are complementary.
In the current industry landscape, data lakes have become a cornerstone of modern data architecture, serving as repositories for vast amounts of structured and unstructured data. With a decade of experience, he excels in aiding customers with their big data workloads, focusing on data processing and analytics.
Apache Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for processing engines such as Apache Spark, Trino, Apache Flink, Presto, Apache Hive, and Impala to safely work with the same tables at the same time.
As Gameskraft’s portfolio of gaming products increased, it led to approximately fivefold growth of its dedicated data analytics and data science teams. Consequently, there was a fivefold rise in data integrations and a fivefold increase in ad hoc queries submitted to the Redshift cluster.
Over the years, data lakes on Amazon Simple Storage Service (Amazon S3) have become the default repository for enterprise data and are a common choice for a large set of users who query data for a variety of analytics and machine learning use cases. Analytics use cases on data lakes are always evolving.
Reading Time: 3 minutes One of the biggest challenges for organizations is to integrate data from various sources. Despite modern advancements such as big data technologies and cloud, data often ends up in organized silos, but this means that cloud data is separated from.
Data ingestion You have to build ingestion pipelines based on factors like types of data sources (on-premises data stores, files, SaaS applications, third-party data), and flow of data (unbounded streams or batch data). Data exploration Data exploration helps unearth inconsistencies, outliers, or errors.
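One common exploration check mentioned above, outlier detection, can be sketched with a simple z-score rule. The threshold and the sample latency data are illustrative assumptions:

```python
from statistics import mean, stdev

# Flag values whose z-score (distance from the mean, in standard
# deviations) exceeds a threshold. A crude but useful first pass when
# exploring a new data source for errors and outliers.

def outliers(values, threshold=2.5):
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]

latencies_ms = [12, 14, 13, 15, 11, 14, 13, 12, 250]  # one suspect reading
print(outliers(latencies_ms))  # [250]
```

Note that with a sample standard deviation, a single extreme value in a small batch caps the attainable z-score, so thresholds near 3 can silently miss it; production exploration tools typically prefer robust statistics such as the median absolute deviation.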
Here are some benefits of metadata management for data governance use cases: Better Data Quality: Data issues and inconsistencies within integrated data sources or targets are identified in real time to improve overall data quality by reducing time to insight and/or repair by up to 70 percent.
Amazon SageMaker Lakehouse provides an open data architecture that reduces data silos and unifies data across Amazon Simple Storage Service (Amazon S3) data lakes, Redshift data warehouses, and third-party and federated data sources. With AWS Glue 5.0 and Apache Iceberg 1.6.1,
Maximize value with comprehensive analytics and ML capabilities “Amazon Redshift is one of the most important tools we had in growing Jobcase as a company.” – Ajay Joshi, Distinguished Engineer, Jobcase With all your data integrated and available, you can easily build and run near real-time analytics and AI/ML/generative AI applications.
In a modern data architecture, unified analytics enable you to access the data you need, whether it’s stored in a data lake or a data warehouse. One of the most common use cases for data preparation on Amazon Redshift is to ingest and transform data from different data stores into an Amazon Redshift data warehouse.
The journey starts with having a multimodal data governance framework that is underpinned by a robust data architecture like data fabric. Without a data catalog, data can remain hidden or unused and become impossible to manage.
Organizations are leveraging cloud analytics to extract useful insights from big data, which draws from a variety of sources such as mobile phones, Internet of. Organizations all over the world are migrating their IT infrastructures and applications to the cloud.
This solution is suitable for customers who don’t require real-time ingestion to OpenSearch Service and plan to use dataintegration tools that run on a schedule or are triggered through events. Before data records land on Amazon S3, we implement an ingestion layer to bring all data streams reliably and securely to the data lake.