Analytics, Data Warehouse and Snapshot

Run Apache XTable in AWS Lambda for background conversion of open table formats

AWS Big Data

NOVEMBER 26, 2024

This post was co-written with Dipankar Mazumdar, Staff Data Engineering Advocate with AWS Partner OneHouse. Data architecture has evolved significantly to handle growing data volumes and diverse workloads. In practice, OTFs are used in a broad range of analytical workloads, from business intelligence to machine learning.

Metadata

Metadata Data Lake Snapshot Data Warehouse

Building end-to-end data lineage for one-time and complex queries using Amazon Athena, Amazon Redshift, Amazon Neptune and dbt

AWS Big Data

DECEMBER 12, 2024

One-time and complex queries are two common scenarios in enterprise data analytics. Complex queries, on the other hand, refer to large-scale data processing and in-depth analysis based on petabyte-level data warehouses in massive data scenarios. Here, data modeling uses dbt on Amazon Redshift.

Snapshot

Snapshot Recreation/Entertainment Experimentation Data Lake

Manage your data warehouse cost allocations with Amazon Redshift Serverless tagging

AWS Big Data

MARCH 27, 2023

Amazon Redshift Serverless makes it simple to run and scale analytics without having to manage your data warehouse infrastructure. For Filter by resource type , you can filter by Workgroup , Namespace , Snapshot , and Recovery Point. For more details on tagging, refer to Tagging resources overview.

Data Warehouse

Data Warehouse Management Snapshot Data Lake

Webinars

Automation, Evolved: Your New Playbook For Smarter Knowledge Work

MORE WEBINARS

Open Data Lakehouse powered by Iceberg for all your Data Warehouse needs

Cloudera

APRIL 3, 2023

Cloudera Contributors: Ayush Saxena, Tamas Mate, Simhadri Govindappa Since we announced the general availability of Apache Iceberg in Cloudera Data Platform (CDP), we are excited to see customers testing their analytic workloads on Iceberg. We will publish follow up blogs for other data services.

Data Warehouse

Data Warehouse Snapshot Metadata Cost-Benefit

How Gupshup built their multi-tenant messaging analytics platform on Amazon Redshift

AWS Big Data

FEBRUARY 12, 2024

Objective Gupshup wanted to build a messaging analytics platform that provided: Build a platform to get detailed insights, data, and reports about WhatsApp/SMS campaigns and track the success of every text message sent by the end customers. Additionally, extract, load, and transform (ELT) data processing is sped up and made easier.

Analytics

Analytics Data Warehouse Snapshot Cost-Benefit

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

In this blog post, we dive into different data aspects and how Cloudinary breaks the two concerns of vendor locking and cost efficient data analytics by using Apache Iceberg, Amazon Simple Storage Service (Amazon S3 ), Amazon Athena , Amazon EMR , and AWS Glue. SparkActions.get().expireSnapshots(iceTable).expireOlderThan(TimeUnit.DAYS.toMillis(7)).execute()

Data Lake

Data Lake Metadata Snapshot Analytics

Cloud Data Warehouse Migration 101: Expert Tips

Alation

JULY 28, 2022

There was a time when most CIOs would never consider putting their crown jewels — AKA customer data and associated analytics — into the cloud. But today, there is a magic quadrant for cloud databases and warehouses comprising more than 20 vendors. The cloud is no longer synonymous with risk. What do you migrate, how, and when?

Data Warehouse

Data Warehouse Cost-Benefit Data-driven Data Governance

Implement historical record lookup and Slowly Changing Dimensions Type-2 using Apache Iceberg

AWS Big Data

DECEMBER 9, 2024

product_id product_name price _change_type 00001 Heater 250 INSERT 00001 Heater 250 UPDATE_BEFORE 00001 Heater 500 UPDATE_AFTER This capability not only simplifies historical analysis but also opens possibilities for advanced time-based analytics, auditing, and data governance. Initialize the SparkSession with Iceberg settings.

Snapshot

Snapshot Data Warehouse Data Lake Data Quality

Build an Amazon Redshift data warehouse using an Amazon DynamoDB single-table design

AWS Big Data

JUNE 21, 2023

Deriving business insights by identifying year-on-year sales growth is an example of an online analytical processing (OLAP) query. These types of queries are suited for a data warehouse. Amazon Redshift is fully managed, scalable, cloud data warehouse. This dimensional model will be built in Amazon Redshift.

Data Warehouse

Data Warehouse Data Lake OLAP Cost-Benefit

Achieve near real time operational analytics using Amazon Aurora PostgreSQL zero-ETL integration with Amazon Redshift

AWS Big Data

APRIL 10, 2024

When data is used to improve customer experiences and drive innovation, it can lead to business growth,” – Swami Sivasubramanian , VP of Database, Analytics, and Machine Learning at AWS in With a zero-ETL approach, AWS is helping builders realize near-real-time analytics.

Data Warehouse

Data Warehouse Analytics Metrics Snapshot

Implement disaster recovery with Amazon Redshift

AWS Big Data

JUNE 27, 2024

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. You can start with just a few hundred gigabytes of data and scale to a petabyte or more. This enables you to use your data to acquire new insights for your business and customers. For additional details, refer to Automated snapshots.

Snapshot

Snapshot Data Warehouse Data Processing Strategy

Enhance your security posture by storing Amazon Redshift admin credentials without human intervention using AWS Secrets Manager integration

AWS Big Data

OCTOBER 18, 2023

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. You can start with just a few hundred gigabytes of data and scale to a petabyte or more. Restore a snapshot New warehouses can be launched from both serverless and provisioned snapshots.

Snapshot

Snapshot Management Data Warehouse Dashboards

Migrate Amazon Redshift from DC2 to RA3 to accommodate increasing data volumes and analytics demands

AWS Big Data

AUGUST 9, 2024

With the ever-increasing volume of data available, Dafiti faces the challenge of effectively managing and extracting valuable insights from this vast pool of information to gain a competitive edge and make data-driven decisions that align with company business objectives. TB of data. We started with 115 dc2.large

Data Lake

Data Lake Analytics Data Warehouse Data-driven

Migrate Microsoft Azure Synapse Analytics to Amazon Redshift using AWS SCT

AWS Big Data

OCTOBER 18, 2023

Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse that provides the flexibility to use provisioned or serverless compute for your analytical workloads. Modern analytics is much wider than SQL-based data warehousing. Fault tolerance is built in. Any hardware failures are automatically replaced.

Analytics

Analytics Data Warehouse Dashboards Testing

Evaluating sample Amazon Redshift data sharing architecture using Redshift Test Drive and advanced SQL analysis

AWS Big Data

SEPTEMBER 10, 2024

With the launch of Amazon Redshift Serverless and the various provisioned instance deployment options , customers are looking for tools that help them determine the most optimal data warehouse configuration to support their Amazon Redshift workloads. About the Authors Ayan Majumder is an Analytics Specialist Solutions Architect at AWS.

Testing

Testing Snapshot Data Warehouse Metrics

Excellent Analytics Tip #17: Calculate Customer Lifetime Value

Occam's Razor

APRIL 5, 2010

Take a snapshot of your customer database for the past 2 years and it may look like this: That is an average. You'll work with your acquisition team or your finance team to get the cost data. Not Omniture's Site Catalyst, WebTrends Analytics, Coremetrics, Google Analytics or Unica or whatever. Look 'em up.

Analytics

Analytics Marketing Measurement Metrics

Getting started guide for near-real time operational analytics using Amazon Aurora zero-ETL integration with Amazon Redshift

AWS Big Data

JUNE 28, 2023

In this post, we provide step-by-step guidance on how to get started with near-real time operational analytics using this feature. There are two broad approaches to analyzing operational data for these use cases: Analyze the data in-place in the operational database (e.g.

Data Warehouse

Data Warehouse Analytics Metrics Dashboards

Implement data warehousing solution using dbt on Amazon Redshift

AWS Big Data

NOVEMBER 17, 2023

Amazon Redshift is a cloud data warehousing service that provides high-performance analytical processing based on a massively parallel processing (MPP) architecture. Building and maintaining data pipelines is a common challenge for all enterprises. For more information, refer SQL models.

Snapshot

Snapshot Data Processing Testing Data Warehouse

Architectural patterns for real-time analytics using Amazon Kinesis Data Streams, part 1

AWS Big Data

JANUARY 8, 2024

This is the first post to a blog series that offers common architectural patterns in building real-time data streaming infrastructures using Kinesis Data Streams for a wide range of use cases. In this post, we will review the common architectural patterns of two use cases: Time Series Data Analysis and Event Driven Microservices.

Analytics

Analytics IoT Data-driven Snapshot

Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

AWS Big Data

NOVEMBER 16, 2023

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x

Enterprise

Enterprise Data Warehouse Snapshot Cost-Benefit

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

AWS Big Data

DECEMBER 4, 2024

With this new functionality, customers can create up-to-date replicas of their data from applications such as Salesforce, ServiceNow, and Zendesk in an Amazon SageMaker Lakehouse and Amazon Redshift. SageMaker Lakehouse gives you the flexibility to access and query your data in-place with all Apache Iceberg compatible tools and engines.

Data Integration

Data Integration Data Lake Statistics Data-driven

Use Amazon Athena with Spark SQL for your open-source transactional table formats

AWS Big Data

JANUARY 24, 2024

AWS-powered data lakes, supported by the unmatched availability of Amazon Simple Storage Service (Amazon S3), can handle the scale, agility, and flexibility required to combine different data and analytics approaches. It will never remove files that are still required by a non-expired snapshot.

Snapshot

Snapshot Data Lake Metadata Optimization

How OLX Group migrated to Amazon Redshift RA3 for simpler, faster, and more cost-effective analytics

AWS Big Data

FEBRUARY 13, 2023

This is a guest post by Miguel Chin, Data Engineering Manager at OLX Group and David Greenshtein, Specialist Solutions Architect for Analytics, AWS. We live in a data-producing world, and as companies want to become data driven, there is the need to analyze more and more data. Take snapshot from 6 x RA3.4xlarge.

Snapshot

Snapshot Data Warehouse Analytics Testing

How to Use Apache Iceberg in CDP’s Open Lakehouse

Cloudera

AUGUST 8, 2022

The general availability covers Iceberg running within some of the key data services in CDP, including Cloudera Data Warehouse ( CDW ), Cloudera Data Engineering ( CDE ), and Cloudera Machine Learning ( CML ). Cloudera Data Engineering (Spark 3) with Airflow enabled. Cloudera Machine Learning . group by year.

Snapshot

Snapshot Data Warehouse Machine Learning Cost-Benefit

What is business intelligence? Transforming data into business insights

CIO Business Intelligence

JANUARY 20, 2023

BI tools access and analyze data sets and present analytical findings in reports, summaries, dashboards, graphs, charts, and maps to provide users with detailed intelligence about the state of the business. Whereas BI studies historical data to guide business decision-making, business analytics is about looking forward.

Business Intelligence

Business Intelligence Dashboards Data mining OLAP

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

APRIL 3, 2024

They understand that a one-size-fits-all approach no longer works, and recognize the value in adopting scalable, flexible tools and open data formats to support interoperability in a modern data architecture to accelerate the delivery of new solutions. Snowflake integrates with AWS Glue Data Catalog to retrieve the snapshot location.

Data Lake

Data Lake Snapshot Metadata Data Architecture

Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

MARCH 2, 2023

Apache Iceberg is an open table format for very large analytic datasets, which captures metadata information on the state of datasets as they evolve and change over time. Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback.

Data Lake

Data Lake Data Processing Metadata Snapshot

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

They enable transactions on top of data lakes and can simplify data storage, management, ingestion, and processing. These transactional data lakes combine features from both the data lake and the data warehouse. The Data Catalog provides a central location to govern and keep track of the schema and metadata.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Materialized Views in Hive for Iceberg Table Format

Cloudera

FEBRUARY 8, 2024

Apache Iceberg is a high-performance open table format for petabyte-scale analytic datasets. It brings the reliability and simplicity of SQL tables to big data while enabling engines like Hive, Impala, Spark, Trino, Flink, and Presto to work with the same tables at the same time. Starting from the CDW Public Cloud DWX-1.6.1

Snapshot

Snapshot Metadata Cost-Benefit Data Warehouse

Blending Art and Science: Using Data to Forecast and Manage Your Sales Pipeline

Sisense

JANUARY 6, 2020

Analytics and sales should partner to forecast new business revenue and manage pipeline, because sales teams that have an analyst dedicated to their data and trends, drive insights that optimize workflows and decision making. This is not to say that data modeling should be focused specifically on sales.

Sales

Sales Forecasting Snapshot Management

Unlock insights on Amazon RDS for MySQL data with zero-ETL integration to Amazon Redshift

AWS Big Data

MARCH 21, 2024

In this post, we provide step-by-step guidance on how to get started with near real-time operational analytics using this feature. This post is a continuation of the zero-ETL series that started with Getting started guide for near-real time operational analytics using Amazon Aurora zero-ETL integration with Amazon Redshift.

Data Warehouse

Data Warehouse Metrics Statistics Optimization

Break data silos and stream your CDC data with Amazon Redshift streaming and Amazon MSK

AWS Big Data

DECEMBER 13, 2023

Traditionally, customers used batch-based approaches for data movement from operational systems to analytical systems. A batch-based approach can introduce latency in data movement and reduce the value of data for analytics. usually a data warehouse) needs to reflect those changes in near real-time.

Data Warehouse

Data Warehouse Snapshot Data Processing Internet of Things

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

AWS Big Data

AUGUST 1, 2024

Large-scale data warehouse migration to the cloud is a complex and challenging endeavor that many organizations undertake to modernize their data infrastructure, enhance data management capabilities, and unlock new business opportunities. This makes sure the new data platform can meet current and future business goals.

Data Warehouse

Data Warehouse KPI Optimization Cost-Benefit

How the Edge Is Changing Data-First Modernization

CIO Business Intelligence

MAY 16, 2022

The advent of distributed workforces, smart devices, and internet-of-things (IoT) applications is creating a deluge of data generated and consumed outside of traditional centralized data warehouses. billion connected IoT devices by 2025, generating almost 80 billion zettabytes of data at the edge. “The

IoT

IoT Internet of Things Data Warehouse Machine Learning

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

AWS Big Data

MAY 24, 2023

RIO is really great",date("2023-04-06"),2023)""") You can check the new snapshot is created after this append operation by querying the Iceberg snapshot: spark.sql("""SELECT * FROM dev.db.amazon_reviews_iceberg.snapshots""").show() In that case, we have to query the table with the snapshot-id corresponding to the deleted row.

Data Lake

Data Lake Snapshot Metadata Optimization

Introducing Apache Hudi support with AWS Glue crawlers

AWS Big Data

NOVEMBER 22, 2023

Apache Hudi is an open table format that brings database and data warehouse capabilities to data lakes. Apache Hudi helps data engineers manage complex challenges, such as managing continuously evolving datasets with transactions while maintaining query performance.

Data Lake

Data Lake Snapshot Metadata Optimization

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouses (such as Amazon Redshift ) customers who are looking to keep their data transform logic separate from storage and engine.

Data Lake

Data Lake Management Metrics Data Warehouse

Simplifying data processing at Capitec with Amazon Redshift integration for Apache Spark

AWS Big Data

NOVEMBER 10, 2023

Amazon Redshift offers seamless integration with Apache Spark, allowing you to easily access your Redshift data on both Amazon Redshift provisioned clusters and Amazon Redshift Serverless. These tables are then joined with tables from the Enterprise Data Lake (EDL) at runtime. Connect with him on LinkedIn.

Data Processing

Data Processing Data Lake Data Warehouse Optimization

Perform upserts in a data lake using Amazon Athena and Apache Iceberg

AWS Big Data

APRIL 27, 2023

Apache Iceberg is an open table format for data lakes that manages large collections of files as tables. It supports modern analytical data lake operations such as create table as select (CTAS), upsert and merge, and time travel queries. However, this requires knowledge of a table’s current snapshots.

Data Lake

Data Lake Snapshot Optimization Data Transformation

Empower Your Cyber Defenders with Real-Time Analytics

Cloudera

NOVEMBER 15, 2024

However, there is a fundamental challenge standing in the way of being successful: data. By breaking down data silos and integrating log data from multiple sources, Cloudera empowers defenders with the real-time analytics to respond to threats swiftly.

Analytics

Analytics Metadata Data-driven Snapshot

Find the best Amazon Redshift configuration for your workload using Redshift Test Drive

AWS Big Data

JULY 27, 2023

Amazon Redshift is a widely used, fully managed, petabyte-scale cloud data warehouse. Tens of thousands of customers use Amazon Redshift to process exabytes of data every day to power their analytics workloads. Take a snapshot of the source Redshift data warehouse.

Testing

Testing Data Warehouse Data Processing Snapshot

Configure monitoring, limits, and alarms in Amazon Redshift Serverless to keep costs predictable

AWS Big Data

JULY 25, 2023

Amazon Redshift Serverless makes it simple to run and scale analytics in seconds. It automatically provisions and intelligently scales data warehouse compute capacity to deliver fast performance, and you pay only for what you use. The following screenshot shows the metrics available at the snapshot storage level.

Metrics

Metrics Data Warehouse Dashboards Snapshot

AI at Scale isn’t Magic, it’s Data – Hybrid Data

Cloudera

OCTOBER 11, 2022

Al needs machine learning (ML), ML needs data science. Data science needs analytics. And they all need lots of data. The takeaway – businesses need control over all their data in order to achieve AI at scale and digital business transformation. But it isn’t just aggregating data for models.

Data Science

Data Science Snapshot Data Warehouse Metadata

Run Apache XTable in AWS Lambda for background conversion of open table formats

Building end-to-end data lineage for one-time and complex queries using Amazon Athena, Amazon Redshift, Amazon Neptune and dbt

Webinars

Trending Sources

Manage your data warehouse cost allocations with Amazon Redshift Serverless tagging

Webinars

Open Data Lakehouse powered by Iceberg for all your Data Warehouse needs

How Gupshup built their multi-tenant messaging analytics platform on Amazon Redshift

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

Cloud Data Warehouse Migration 101: Expert Tips

Implement historical record lookup and Slowly Changing Dimensions Type-2 using Apache Iceberg

Build an Amazon Redshift data warehouse using an Amazon DynamoDB single-table design

Achieve near real time operational analytics using Amazon Aurora PostgreSQL zero-ETL integration with Amazon Redshift

Implement disaster recovery with Amazon Redshift

Enhance your security posture by storing Amazon Redshift admin credentials without human intervention using AWS Secrets Manager integration

Migrate Amazon Redshift from DC2 to RA3 to accommodate increasing data volumes and analytics demands

Migrate Microsoft Azure Synapse Analytics to Amazon Redshift using AWS SCT

Evaluating sample Amazon Redshift data sharing architecture using Redshift Test Drive and advanced SQL analysis

Excellent Analytics Tip #17: Calculate Customer Lifetime Value

Getting started guide for near-real time operational analytics using Amazon Aurora zero-ETL integration with Amazon Redshift

Implement data warehousing solution using dbt on Amazon Redshift

Architectural patterns for real-time analytics using Amazon Kinesis Data Streams, part 1

Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

Use Amazon Athena with Spark SQL for your open-source transactional table formats

How OLX Group migrated to Amazon Redshift RA3 for simpler, faster, and more cost-effective analytics

How to Use Apache Iceberg in CDP’s Open Lakehouse

What is business intelligence? Transforming data into business insights

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

Use Apache Iceberg in a data lake to support incremental data processing

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

Materialized Views in Hive for Iceberg Table Format

Blending Art and Science: Using Data to Forecast and Manage Your Sales Pipeline

Unlock insights on Amazon RDS for MySQL data with zero-ETL integration to Amazon Redshift

Break data silos and stream your CDC data with Amazon Redshift streaming and Amazon MSK

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

How the Edge Is Changing Data-First Modernization

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

Introducing Apache Hudi support with AWS Glue crawlers

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

Top 20 most-asked questions about Amazon RDS for Db2 answered

Simplifying data processing at Capitec with Amazon Redshift integration for Apache Spark

Perform upserts in a data lake using Amazon Athena and Apache Iceberg

Empower Your Cyber Defenders with Real-Time Analytics

Find the best Amazon Redshift configuration for your workload using Redshift Test Drive

Configure monitoring, limits, and alarms in Amazon Redshift Serverless to keep costs predictable

AI at Scale isn’t Magic, it’s Data – Hybrid Data

Stay Connected