In modern data architectures, Apache Iceberg has emerged as a popular table format for data lakes, offering key features including ACID transactions and concurrent write support. The AWS Glue Data Catalog provides the functionality of the Iceberg catalog. Each transaction determines its changes and writes new data files.
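As a minimal sketch of what using the Glue Data Catalog as an Iceberg catalog can look like, the following shows the Spark configuration keys Iceberg documents for its Glue integration. The catalog name `glue_catalog` and the S3 warehouse path are illustrative assumptions, not values from the post.

```python
# Hypothetical Spark settings for registering the AWS Glue Data Catalog
# as an Apache Iceberg catalog. The catalog name ("glue_catalog") and
# the warehouse bucket are placeholders.
iceberg_glue_conf = {
    # Enable Iceberg's SQL extensions (MERGE INTO, CALL procedures, etc.)
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    # Register a Spark catalog named "glue_catalog" backed by Glue
    "spark.sql.catalog.glue_catalog": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.glue_catalog.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
    "spark.sql.catalog.glue_catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    "spark.sql.catalog.glue_catalog.warehouse": "s3://example-bucket/iceberg-warehouse/",
}
```

These settings would typically be passed one by one to `SparkSession.builder.config(...)` before the session is created.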
This post was co-written with Dipankar Mazumdar, Staff Data Engineering Advocate with AWS Partner OneHouse. Data architecture has evolved significantly to handle growing data volumes and diverse workloads. Querying all snapshots, we can see that three snapshots with overwrites were created after the initial one.
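Iceberg exposes a table's snapshot history through a `snapshots` metadata table that can be queried with plain SQL. The helper below builds such a query; the catalog and table names are assumptions for illustration.

```python
def snapshots_query(table: str) -> str:
    """Build a Spark SQL query against an Iceberg table's `snapshots`
    metadata table. `table` is a fully qualified name such as
    "glue_catalog.db.orders" (an illustrative placeholder)."""
    return (
        f"SELECT committed_at, snapshot_id, parent_id, operation "
        f"FROM {table}.snapshots ORDER BY committed_at"
    )

# This string would be passed to spark.sql(...) in a live session.
sql = snapshots_query("glue_catalog.db.orders")
```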
Over the past decade, the successful deployment of large-scale data platforms at our customers has acted as a big data flywheel, driving demand to bring in even more data, apply more sophisticated analytics, and onboard many new data practitioners, from business analysts to data scientists.
Over the years, data lakes on Amazon Simple Storage Service (Amazon S3) have become the default repository for enterprise data and are a common choice for a large set of users who query data for a variety of analytics and machine learning use cases. Analytics use cases on data lakes are always evolving.
Tracking data changes and rollback. Build your transactional data lake on AWS. You can build your modern data architecture with a scalable data lake that integrates seamlessly with an Amazon Redshift powered cloud warehouse. Athena provides a simplified, flexible way to analyze petabytes of data where it lives.
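Rolling an Iceberg table back to an earlier snapshot is done with the `rollback_to_snapshot` stored procedure exposed through Spark SQL's `CALL` syntax. The sketch below builds that statement; the catalog name `glue_catalog` is an assumption.

```python
def rollback_call(table: str, snapshot_id: int) -> str:
    # Iceberg's Spark procedure for rolling a table back to an earlier
    # snapshot; "glue_catalog" is an assumed catalog name, and the
    # snapshot_id would come from the table's snapshots metadata.
    return f"CALL glue_catalog.system.rollback_to_snapshot('{table}', {snapshot_id})"

# In a live session this statement would be run via spark.sql(...).
stmt = rollback_call("db.orders", 8744736658442914487)
```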
Data lakes and data warehouses are two of the most important data storage and management technologies in a modern data architecture. Data lakes store all of an organization’s data, regardless of its format or structure. Run the first cell to set up an AWS Glue interactive session.
In this blog, we will walk through how we can apply existing enterprise data to better understand and estimate Scope 1 carbon footprint using Amazon Simple Storage Service (S3) and Amazon Athena , a serverless interactive analytics service that makes it easy to analyze data using standard SQL.
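A hedged sketch of what an Athena query submission can look like through boto3's `start_query_execution` call. The database name, the emissions query, and the output bucket are illustrative assumptions, not values from the post.

```python
# Hypothetical parameters for boto3's Athena start_query_execution call.
# Table/column names and the S3 output location are placeholders.
athena_params = {
    "QueryString": (
        "SELECT fuel_type, SUM(co2_kg) AS scope1_co2_kg "
        "FROM emissions GROUP BY fuel_type"
    ),
    "QueryExecutionContext": {"Database": "sustainability_db"},
    "ResultConfiguration": {"OutputLocation": "s3://example-bucket/athena-results/"},
}

# With AWS credentials configured, this would be submitted as:
# boto3.client("athena").start_query_execution(**athena_params)
```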
This connection allows ETL processes to interact with the Redshift cluster by establishing a JDBC connection. He has over 20 years of experience in software engineering, software architecture, and cloud architecture. He has over 25 years of experience in enterprise data architecture, databases, and data warehousing.
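The JDBC URL for a Redshift cluster follows a standard format. The helper below assembles one; the cluster endpoint and database name shown are placeholders, not values from the post.

```python
def redshift_jdbc_url(host: str, port: int = 5439, database: str = "dev") -> str:
    # Standard Redshift JDBC URL shape: jdbc:redshift://<host>:<port>/<db>.
    # 5439 is Redshift's default port; host and database are placeholders.
    return f"jdbc:redshift://{host}:{port}/{database}"

url = redshift_jdbc_url("my-cluster.abc123.us-east-1.redshift.amazonaws.com")
```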
Furthermore, data events are filtered, enriched, and transformed to a consumable format using a stream processor. The result is made available to the application by querying the latest snapshot. AWS Glue can interact with streaming data services such as Kinesis Data Streams and Amazon MSK for processing and transforming CDC data.
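As a sketch of the Glue-to-Kinesis side of this, the options below follow the connection-option names AWS Glue documents for its Kinesis source (used with `glueContext.create_data_frame.from_options`). The stream ARN is a placeholder assumption.

```python
# Hypothetical connection options for reading CDC records from a
# Kinesis data stream with AWS Glue streaming. The stream ARN is a
# placeholder; option names follow Glue's documented Kinesis source.
kinesis_options = {
    "streamARN": "arn:aws:kinesis:us-east-1:123456789012:stream/cdc-stream",
    "startingPosition": "TRIM_HORIZON",  # read from the oldest record
    "inferSchema": "true",
    "classification": "json",
}
```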
The Analytics specialty practice of AWS Professional Services (AWS ProServe) helps customers across the globe with modern data architecture implementations on the AWS Cloud. Amazon Athena is used for interactive querying and AWS Lake Formation is used for access controls.
Any code or connection interacts with the interface of the gateway only. Suvojit Dasgupta is a Principal Data Architect at Amazon Web Services. He leads a team of skilled engineers in designing and building scalable data solutions for AWS customers. In this case, the resource is the EMR on EKS clusters running Spark.
Success criteria alignment by all stakeholders (producers, consumers, operators, auditors) is key for a successful transition to a new Amazon Redshift modern data architecture. The success criteria are the key performance indicators (KPIs) for each component of the data workflow. The following figure shows a daily usage KPI.
By analyzing the historical report snapshot, you can identify areas for improvement, implement changes, and measure the effectiveness of those changes. In Apache Spark, a SparkSession is the entry point for interacting with DataFrames and Spark’s built-in functions. Following pydeequ’s standard setup, the session is configured with config("spark.jars.packages", pydeequ.deequ_maven_coord).config("spark.jars.excludes", pydeequ.f2j_maven_coord) before calling getOrCreate().
Many customers migrate their data warehousing workloads to Amazon Redshift and benefit from the rich capabilities it offers, such as the following: Amazon Redshift seamlessly integrates with broader data, analytics, and AI or machine learning (ML) services on AWS , enabling you to choose the right tool for the right job.
With these frameworks and related open-source projects, you can process data for analytics purposes and BI workloads. Sakti Mishra is a Principal Solutions Architect at AWS, where he helps customers modernize their data architecture and define their end-to-end data strategy, including data security, accessibility, governance, and more.
Apache Iceberg, together with the REST Catalog, dramatically simplifies the enterprise data architecture, reducing time to value, time to market, and overall TCO, and driving greater ROI. It provides real-time metadata access by integrating directly with the Iceberg-compatible metastore. It also supports Iceberg replication for disaster recovery.
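For comparison with a Glue-backed catalog, Iceberg's REST catalog is configured in Spark with a small set of documented keys. The catalog name `rest` and the endpoint URI below are placeholder assumptions.

```python
# Hypothetical Spark settings for an Iceberg REST catalog; the catalog
# name ("rest") and the endpoint URI are placeholders.
rest_catalog_conf = {
    "spark.sql.catalog.rest": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.rest.type": "rest",
    "spark.sql.catalog.rest.uri": "https://rest-catalog.example.com/api",
}
```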