This is part two of a three-part series in which we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue.
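As a rough illustration of that pattern, here is a minimal PySpark sketch of a Glue job that reads a SQL Server table over a JDBC connection and appends it to an Iceberg table. The connection, catalog, and table names are hypothetical, and it assumes a Glue 4.0 job already configured for Iceberg (for example, with --datalake-formats set to iceberg and the catalog settings in the job parameters).

```python
# Hypothetical sketch: read from SQL Server via a Glue JDBC connection and
# append into an Iceberg table registered in the AWS Glue Data Catalog.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session

# Read the source table through JDBC (connection and table names are placeholders).
src = glue_context.create_dynamic_frame.from_options(
    connection_type="sqlserver",
    connection_options={
        "useConnectionProperties": "true",
        "connectionName": "my-sqlserver-connection",  # placeholder Glue connection
        "dbtable": "dbo.orders",                      # placeholder source table
    },
).toDF()

# Append into an existing Iceberg table; "glue_catalog" must match the
# catalog name configured for the job.
src.writeTo("glue_catalog.analytics.orders").append()
```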
The need for streamlined data transformations As organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown. This enables you to extract insights from your data without the complexity of managing infrastructure.
This post was co-written with Dipankar Mazumdar, Staff Data Engineering Advocate with AWS Partner OneHouse. Data architecture has evolved significantly to handle growing data volumes and diverse workloads. In practice, open table formats (OTFs) are used in a broad range of analytical workloads, from business intelligence to machine learning.
Over the years, this customer-centric approach has led to the introduction of groundbreaking features such as zero-ETL, data sharing, streaming ingestion, data lake integration, Amazon Redshift ML, Amazon Q generative SQL, and transactional data lake capabilities.
Over the years, organizations have invested in creating purpose-built, cloud-based data lakes that are siloed from one another. A major challenge is enabling cross-organization discovery and access to data across these multiple data lakes, each built on different technology stacks.
Enterprises and organizations across the globe want to harness the power of data to make better decisions by putting data at the center of every decision-making process. However, throughout history, data services have held dominion over their customers’ data.
While traditional extract, transform, and load (ETL) processes have long been a staple of data integration due to their flexibility, for common use cases such as replication and ingestion they often prove time-consuming, complex, and less adaptable to the fast-changing demands of modern data architectures.
They understand that a one-size-fits-all approach no longer works, and recognize the value in adopting scalable, flexible tools and open data formats to support interoperability in a modern data architecture to accelerate the delivery of new solutions.
A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.
In the current industry landscape, data lakes have become a cornerstone of modern data architecture, serving as repositories for vast amounts of structured and unstructured data. Maintaining data consistency and integrity across distributed data lakes is crucial for decision-making and analytics.
The Analytics specialty practice of AWS Professional Services (AWS ProServe) helps customers across the globe with modern data architecture implementations on the AWS Cloud. Of those tables, some are larger (in terms of record volume) than others, and some are updated more frequently than others.
Overview of the auto-copy feature in Amazon Redshift The auto-copy feature in Amazon Redshift uses S3 event integration to simplify data loading, automatically ingesting files from Amazon S3 into Amazon Redshift with a simple SQL command. Prior to AWS, he built data warehouse solutions at Amazon.com.
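As an illustration, a minimal sketch of creating such an auto-copy job through the open-source redshift_connector driver might look like the following; the cluster endpoint, bucket, IAM role, and table names are placeholders, and the exact COPY options depend on your file format.

```python
# Hypothetical sketch: create a Redshift auto-copy job with a single
# COPY ... JOB CREATE statement via the redshift_connector driver.
import redshift_connector

conn = redshift_connector.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder endpoint
    database="dev",
    user="awsuser",
    password="...",  # use Secrets Manager or IAM auth in practice
)
cur = conn.cursor()
# AUTO ON tells Redshift to keep ingesting new files as they land in the prefix.
cur.execute("""
    COPY sales.orders
    FROM 's3://my-bucket/orders/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    FORMAT AS CSV
    JOB CREATE orders_autocopy_job
    AUTO ON;
""")
conn.commit()
```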
Organizations have chosen to build data lakes on top of Amazon Simple Storage Service (Amazon S3) for many years. A data lake is the most popular choice for organizations to store all their organizational data generated by different teams, across business domains, in all different formats, and even over its entire history.
We also examine how centralized, hybrid, and decentralized data architectures support scalable, trustworthy ecosystems. As data-centric AI, automated metadata management, and privacy-aware data sharing mature, the opportunity to embed data quality into the enterprise's core has never been more significant.
As organizations across the globe are modernizing their data platforms with data lakes on Amazon Simple Storage Service (Amazon S3), handling slowly changing dimensions (SCDs) in data lakes can be challenging.
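One common approach is a Type 2 update, which closes out the current dimension row and inserts a new version. A minimal Spark SQL sketch against an Iceberg table follows; the table, the tracked column, and the staged_updates view are hypothetical.

```python
# Hypothetical SCD Type 2 sketch on an Iceberg table with Spark SQL. Assumes an
# existing `spark` session with an Iceberg catalog named glue_catalog and a
# temp view staged_updates(customer_id, address, effective_date).

# Step 1: close out current rows whose tracked attribute changed.
spark.sql("""
    MERGE INTO glue_catalog.dim.customer t
    USING staged_updates s
    ON t.customer_id = s.customer_id AND t.is_current = true
    WHEN MATCHED AND t.address <> s.address THEN
      UPDATE SET t.is_current = false, t.end_date = s.effective_date
""")

# Step 2: insert a new current row for changed and brand-new customers
# (after step 1, changed customers no longer have a current row).
spark.sql("""
    INSERT INTO glue_catalog.dim.customer
    SELECT s.customer_id, s.address, s.effective_date AS start_date,
           CAST(NULL AS date) AS end_date, true AS is_current
    FROM staged_updates s
    LEFT JOIN glue_catalog.dim.customer t
      ON t.customer_id = s.customer_id AND t.is_current = true
    WHERE t.customer_id IS NULL
""")
```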
Need for a data mesh architecture Because entities in the EUROGATE group generate vast amounts of data from various sources (across departments, locations, and technologies), the traditional centralized data architecture struggles to keep up with the demands for real-time insights, agility, and scalability.
In modern data architectures, Apache Iceberg has emerged as a popular table format for data lakes, offering key features including ACID transactions and concurrent write support. Write conflicts are particularly common in large-scale data cleanup operations.
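For instance, a cleanup job can lose the commit race to another writer; a simple backoff-and-retry loop is one common mitigation. The sketch below is illustrative only, with a hypothetical table name, and in practice you would catch Iceberg's specific commit exception class rather than inspecting the message.

```python
# Hypothetical sketch: retry a large Iceberg DELETE when a concurrent writer
# wins the commit race and Spark surfaces a commit conflict.
import time

def delete_with_retry(spark, predicate, table="glue_catalog.db.events", retries=3):
    for attempt in range(retries):
        try:
            spark.sql(f"DELETE FROM {table} WHERE {predicate}")
            return
        except Exception as exc:  # narrow to Iceberg's commit exception in practice
            if "CommitFailedException" not in str(exc) or attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)  # exponential backoff before retrying
```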
At the same time, they need to optimize operational costs to unlock the value of this data for timely insights, and do so with consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.
This solution only replicates metadata in the Data Catalog, not the actual underlying data. To have a redundant data lake using Lake Formation and AWS Glue in an additional Region, we recommend replicating the Amazon S3-based storage using S3 replication, S3 sync, aws-s3-copy-sync-using-batch, or the S3 Batch Replication process.
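As a sketch of the S3 replication option, the boto3 call below enables a replication rule on the primary bucket; the bucket names and IAM role are placeholders, and both buckets must have versioning enabled beforehand.

```python
# Hypothetical sketch: enable S3 replication so data lake objects are copied
# to a bucket in the standby Region.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_replication(
    Bucket="primary-datalake-bucket",  # placeholder source bucket
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",  # placeholder
        "Rules": [
            {
                "ID": "replicate-datalake",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": ""},  # empty prefix = replicate everything
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": "arn:aws:s3:::standby-datalake-bucket"},
            }
        ],
    },
)
```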
This premier event showcased groundbreaking advancements, keynotes from AWS leadership, hands-on technical sessions, and exciting product launches. Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights.
After he walked his executive team through the data hops, flows, integrations, and processing across different ingestion software, databases, and analytical platforms, they were shocked by the complexity of their current data architecture and technology stack. It isn't easy.
One modern data platform solution that provides simplicity and flexibility to grow is Snowflake's data cloud and platform. These Snowflake accelerators reduce the time to analytics for your users at all levels so you can make data-driven decisions faster. Security data lake. Overall data architecture and strategy.
She decided to bring Resultant in to assist, starting with the firm's strategic data assessment (SDA) framework, which evaluates a client's data challenges in terms of people and processes, data models and structures, data architecture and platforms, visual analytics and reporting, and advanced analytics.
We have collected some of the key talks and solutions on data governance, data mesh, and modern data architecture published and presented at AWS re:Invent 2022, and a few data lake solutions built by customers and AWS Partners for easy reference. Starting with Amazon EMR release 6.7.0,
Ingestion: Data lake batch, micro-batch, and streaming Many organizations land their source data into their data lake in various ways, including batch, micro-batch, and streaming jobs. Amazon AppFlow can be used to transfer data from different SaaS applications to a data lake.
Even Forbes Tech Council has written about the benefits of data lakes in Fortnite. The game's parent company, Epic Games, processes millions of events each minute, and its mountain of data grows steadily. As it turns out, Epic uses a data lake for this massive undertaking.
Unlike many other events, which consist of multiple racing teams and manufacturers, Porsche Carrera Cup Brasil provides and maintains all 75 cars used in the race. "If I don't do predictive maintenance, if I have to do corrective maintenance at events, a lot of money is wasted."
For example, in a chatbot, data events could pertain to an inventory of flights and hotels or price changes that are constantly ingested to a streaming storage engine. Furthermore, data events are filtered, enriched, and transformed to a consumable format using a stream processor.
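A minimal sketch of those three steps is shown below; the event shape, the lookup catalog, and the 5% significance rule are illustrative assumptions, not a specific engine's API.

```python
# Hypothetical sketch of the filter -> enrich -> transform steps a stream
# processor applies to each raw data event before it reaches consumers.
import json
from typing import Optional

PRICE_CHANGE_THRESHOLD = 0.05  # assumed rule: only 5%+ moves are significant

def process(raw_record: bytes, catalog: dict) -> Optional[dict]:
    """Filter, enrich, and transform one raw data event; None means dropped."""
    event = json.loads(raw_record)                     # decode the incoming event
    if event.get("type") != "price_change":            # filter: ignore other events
        return None
    item = catalog.get(event["item_id"], {})           # enrich: join a lookup table
    event["item_name"] = item.get("name", "unknown")
    old, new = event["old_price"], event["new_price"]  # transform: derive pct change
    event["pct_change"] = round((new - old) / old, 4)
    return event if abs(event["pct_change"]) >= PRICE_CHANGE_THRESHOLD else None
```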
Data governance is the process of ensuring the integrity, availability, usability, and security of an organization's data. Due to the volume, velocity, and variety of data being ingested in data lakes, it can get challenging to develop and maintain policies and procedures to ensure data governance at scale for your data lake.
Visit us at the AWS Analytics Kiosk in the AWS Village at the Expo to discover the AWS Analytics Superhero in you, participate in a playful quiz, and attend AWS book signing events. 11:30 AM – 12:30 PM (PDT) Caesars Forum ANT318 | Accelerate innovation with end-to-end serverless data architecture.
We had been talking about "Agile Analytic Operations," "DevOps for Data Teams," and "Lean Manufacturing For Data," but the concept was hard to get across and communicate. I spent much time de-categorizing DataOps: we are not discussing ETL, data lakes, or data science.
The CIO delights in detailing the work of Re/Max's technology team, which is building the pipelines and cloud-native applications to deliver agents in the field the most refined and insightful data from more than 500 MLS listing services in the US and Canada as quickly as possible.
Amazon AppFlow is a fully managed integration service that you can use to securely transfer data from software as a service (SaaS) applications, such as Google BigQuery, Salesforce, SAP, HubSpot, and ServiceNow, to Amazon Web Services (AWS) services such as Amazon Simple Storage Service (Amazon S3) and Amazon Redshift, in just a few clicks.
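For example, a flow that has already been configured in AppFlow (source connector, S3 destination, field mapping) can be triggered and inspected from code; a minimal boto3 sketch follows, with a hypothetical flow name.

```python
# Hypothetical sketch: run an existing AppFlow flow on demand and check the
# status of its most recent execution.
import boto3

appflow = boto3.client("appflow")

run = appflow.start_flow(flowName="salesforce-accounts-to-s3")  # placeholder name
print("execution id:", run["executionId"])

desc = appflow.describe_flow(flowName="salesforce-accounts-to-s3")
details = desc.get("lastRunExecutionDetails", {})
print("last run status:", details.get("mostRecentExecutionStatus"))
```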
To bring their customers the best deals and user experience, smava follows modern data architecture principles, with a data lake as a scalable, durable data store and purpose-built data stores for analytical processing and data consumption.
Amazon Redshift is a fully managed data warehousing service that offers both provisioned and serverless options, making it more efficient to run and scale analytics without having to manage your data warehouse. Additionally, data related to product, marketing, and customer experience is extracted from vendor APIs.
It aims to provide a framework to create low-latency streaming applications on the AWS Cloud using Amazon Kinesis Data Streams and AWS purpose-built data analytics services. In this post, we will review the common architectural patterns of two use cases: time series data analysis and event-driven microservices.
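As a taste of the producer side, a minimal boto3 sketch of writing one event to a Kinesis data stream follows; the stream name and payload are placeholders, and a real producer would batch with put_records and handle throttling.

```python
# Hypothetical sketch: write one low-latency event to Amazon Kinesis Data Streams.
import json
import boto3

kinesis = boto3.client("kinesis")
event = {"sensor_id": "s-42", "temperature": 21.7, "ts": "2024-01-01T00:00:00Z"}
kinesis.put_record(
    StreamName="telemetry-stream",           # placeholder stream name
    Data=json.dumps(event).encode("utf-8"),  # payload must be bytes
    PartitionKey=event["sensor_id"],         # keeps one device's events on one shard
)
```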
Amazon Redshift enables data warehousing by seamlessly integrating with other data stores and services in the modern data organization through features such as zero-ETL, data sharing, streaming ingestion, data lake integration, and Redshift ML.
Those decentralization efforts appeared under different monikers through time, e.g., data marts versus data warehousing implementations (a popular architectural debate in the era of structured data), then enterprise-wide data lakes versus smaller, typically BU-specific, "data ponds."
First, you must understand the existing challenges of the data team, including the data architecture and end-to-end toolchain. The DataOps pipeline you have built has enough automated tests to catch errors, and error events are tied to some form of real-time alerts. Figure 2: Example data pipeline with DataOps automation.
You might be modernizing your data architecture using Amazon Redshift to enable access to your data lake and data in your data warehouse, and are looking for a centralized and scalable way to define and manage the data access based on IdP identities. Select the data share and choose Authorize.
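The same authorization step can be scripted; a minimal boto3 sketch follows, with a placeholder datashare ARN and consumer account ID.

```python
# Hypothetical sketch: the SDK equivalent of selecting a datashare and
# choosing Authorize in the console.
import boto3

redshift = boto3.client("redshift")
redshift.authorize_data_share(
    DataShareArn=(
        "arn:aws:redshift:us-east-1:123456789012:"
        "datashare:cluster-ns/sales_share"  # placeholder datashare ARN
    ),
    ConsumerIdentifier="210987654321",  # placeholder consumer AWS account ID
)
```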
July brings summer vacations, holiday gatherings, and for the first time in two years, the return of the Massachusetts Institute of Technology (MIT) Chief Data Officer symposium as an in-person event. A key area of focus for the symposium this year was the design and deployment of modern data platforms. What is a data fabric?
There were thousands of attendees at the event – lining up for book signings and meetings with recruiters to fill the endless job openings for developers experienced with MapReduce and managing Big Data. This was the gold rush of the 21st century, except the gold was data.
The simple answer: the customer journey is evolving beyond single data clusters, single clouds, and simple infrastructures into robust, fault-tolerant architectures that can survive a failure event and keep the customer running.
During the first-ever virtual broadcast of our annual Data Impact Awards (DIA) ceremony, we had the great pleasure of announcing this year’s finalists and winners. In a year marked by unusual events, and disruption to our “normal” lives, it was a pleasure to recognize our customers’ most impressive achievements.