Analytics, Reference and Snapshot

Take manual snapshots and restore in a different domain spanning across various Regions and accounts in Amazon OpenSearch Service

AWS Big Data

OCTOBER 11, 2024

Snapshots are crucial for data backup and disaster recovery in Amazon OpenSearch Service. These snapshots allow you to generate backups of your domain indexes and cluster state at specific moments and save them in a reliable storage location such as Amazon Simple Storage Service (Amazon S3). Snapshots are not instantaneous.

Snapshot

Snapshot Dashboards Management Testing

Achieve data resilience using Amazon OpenSearch Service disaster recovery with snapshot and restore

AWS Big Data

NOVEMBER 11, 2024

OpenSearch is an open-source, distributed search and analytics engine. OpenSearch Service seamlessly integrates with other AWS offerings, providing a robust solution for building scalable and resilient search and analytics applications in the cloud.

Snapshot

Snapshot Strategy Dashboards Data Lake

Manage concurrent write conflicts in Apache Iceberg on the AWS Glue Data Catalog

AWS Big Data

APRIL 8, 2025

Metadata layer: Contains metadata files that track table history, schema evolution, and snapshot information. In many operations (like OVERWRITE, MERGE, and DELETE), the query engine needs to know which files or rows are relevant, so it reads the current table snapshot. This is optional for operations like INSERT.

Snapshot

Snapshot Management Metadata Big Data

Use open table format libraries on AWS Glue 5.0 for Apache Spark

AWS Big Data

DECEMBER 4, 2024

By providing a standardized framework for data representation, open table formats break down data silos, enhance data quality, and accelerate analytics at scale. For more details, refer to Iceberg Release 1.6.1. Branching: Branches are independent lineages of snapshot history that point to the head of each lineage.

Snapshot

Snapshot Metadata Data Lake Optimization

Building end-to-end data lineage for one-time and complex queries using Amazon Athena, Amazon Redshift, Amazon Neptune and dbt

AWS Big Data

DECEMBER 12, 2024

One-time and complex queries are two common scenarios in enterprise data analytics. One-time queries are flexible and suitable for instant analysis and exploratory research. Complex queries, on the other hand, refer to large-scale data processing and in-depth analysis based on petabyte-level data warehouses in massive data scenarios.

Snapshot

Snapshot Recreation/Entertainment Experimentation Data Lake

Run Apache XTable in AWS Lambda for background conversion of open table formats

AWS Big Data

NOVEMBER 26, 2024

Initially, data warehouses were the go-to solution for structured data and analytical workloads but were limited by proprietary storage formats and their inability to handle unstructured data. In practice, OTFs are used in a broad range of analytical workloads, from business intelligence to machine learning.

Metadata

Metadata Data Lake Snapshot Data Warehouse

Unleash the power of Snapshot Management to take automated snapshots using Amazon OpenSearch Service

AWS Big Data

OCTOBER 18, 2023

In Amazon OpenSearch Service, we introduced Snapshot Management, which automates the process of taking snapshots of your domain. Snapshot Management helps you create point-in-time backups of your domain using OpenSearch Dashboards, including both data and configuration settings (for visualizations and dashboards).

Snapshot

Snapshot Management Dashboards Data Processing

Build a high-performance quant research platform with Apache Iceberg

AWS Big Data

JANUARY 9, 2025

In this post, we use the term vanilla Parquet to refer to Parquet files stored directly in Amazon S3 and accessed through standard query engines like Apache Spark, without the additional features provided by table formats such as Iceberg. When a user requests a time travel query, the typical workflow involves querying a specific snapshot.

Metadata

Metadata Snapshot Cost-Benefit Optimization

Your Introduction To CFO Dashboards & Reports In The Digital Age

datapine

JUNE 23, 2020

CFO dashboards exist to enhance the strategic as well as the analytical efforts related to every financial aspect of your business. In essence, a CFO dashboard is the analytical nerve center for all of your most invaluable financial data. If a CFO KPI dashboard is the analytical framework, the reports are your analytical eyes.

Dashboards

Dashboards Reporting KPI Metrics

The AWS Glue Data Catalog now supports storage optimization of Apache Iceberg tables

AWS Big Data

SEPTEMBER 12, 2024

Iceberg creates a new version called a snapshot for every change to the data in the table. Iceberg has features like time travel and rollback that allow you to query data lake snapshots or roll back to previous versions. The Glue Data Catalog honors retention policies for Iceberg branches and tags referencing snapshots.

Optimization

Optimization Snapshot Metadata Metrics

Use Amazon Athena with Spark SQL for your open-source transactional table formats

AWS Big Data

JANUARY 24, 2024

AWS-powered data lakes, supported by the unmatched availability of Amazon Simple Storage Service (Amazon S3), can handle the scale, agility, and flexibility required to combine different data and analytics approaches. For more information, refer to Amazon S3: Allows read and write access to objects in an S3 Bucket.

Snapshot

Snapshot Data Lake Metadata Optimization

Architectural patterns for real-time analytics using Amazon Kinesis Data Streams, part 1

AWS Big Data

JANUARY 8, 2024

It aims to provide a framework to create low-latency streaming applications on the AWS Cloud using Amazon Kinesis Data Streams and AWS purpose-built data analytics services. The collected data is available in milliseconds to allow real-time analytics use cases, such as real-time dashboards, real-time anomaly detection, and dynamic pricing.

Analytics

Analytics IoT Data-driven Snapshot

Modernize a legacy real-time analytics application with Amazon Managed Service for Apache Flink

AWS Big Data

OCTOBER 11, 2023

Organizations with legacy, on-premises, near-real-time analytics solutions typically rely on self-managed relational databases as their data store for analytics workloads. Near-real-time streaming analytics captures the value of operational data and metrics to provide new insights to create business opportunities.

Management

Management Metadata Analytics Dashboards

Implement data warehousing solution using dbt on Amazon Redshift

AWS Big Data

NOVEMBER 17, 2023

Amazon Redshift is a cloud data warehousing service that provides high-performance analytical processing based on a massively parallel processing (MPP) architecture. For more information, refer to SQL models. Snapshots – These implement type-2 slowly changing dimensions (SCDs) over mutable source tables.

Snapshot

Snapshot Data Processing Testing Data Warehouse

Implement disaster recovery with Amazon Redshift

AWS Big Data

JUNE 27, 2024

With built-in features such as automated snapshots and cross-Region replication, you can enhance your disaster resilience with Amazon Redshift. Amazon Redshift supports two kinds of snapshots: automatic and manual, which can be used to recover data. Snapshots are point-in-time backups of the Redshift data warehouse.

Snapshot

Snapshot Data Warehouse Data Processing Strategy

How OLX Group migrated to Amazon Redshift RA3 for simpler, faster, and more cost-effective analytics

AWS Big Data

FEBRUARY 13, 2023

This is a guest post by Miguel Chin, Data Engineering Manager at OLX Group and David Greenshtein, Specialist Solutions Architect for Analytics, AWS. To do this, we required the following: A reference cluster snapshot – This ensures that we can replay any tests starting from the same state. Take snapshot from 6 x RA3.4xlarge.

Snapshot

Snapshot Data Warehouse Analytics Testing

Apache Iceberg optimization: Solving the small files problem in Amazon EMR

AWS Big Data

OCTOBER 3, 2023

For more information on streaming applications on AWS, refer to Real-time Data Streaming and Analytics. To learn more about the available optimize data executors and catalog properties, refer to the README file in the GitHub repo. For instructions to set up an EMR notebook, refer to Amazon EMR Studio overview.

Optimization

Optimization Snapshot Data Lake Metadata

Use AWS Glue ETL to perform merge, partition evolution, and schema evolution on Apache Iceberg

AWS Big Data

MARCH 4, 2024

As enterprises collect increasing amounts of data from various sources, the structure and organization of that data often need to change over time to meet evolving analytical needs. For example, an ecommerce company may add new customer demographic attributes or order status flags to enrich analytics.

Snapshot

Snapshot Data Lake Metadata Recreation/Entertainment

Deprecation of Lake Formation’s Governed Tables Feature

AWS Big Data

OCTOBER 2, 2024

In this case, refer to Use CTAS and INSERT INTO to work around the 100 partition limit. If you specify partitions or buckets as part of the Apache Iceberg table definition, then you may run into the 100 partition per bucket limitation.

Snapshot

Snapshot Metadata Big Data Analytics

Achieve near real time operational analytics using Amazon Aurora PostgreSQL zero-ETL integration with Amazon Redshift

AWS Big Data

APRIL 10, 2024

“When data is used to improve customer experiences and drive innovation, it can lead to business growth,” – Swami Sivasubramanian, VP of Database, Analytics, and Machine Learning at AWS, in With a zero-ETL approach, AWS is helping builders realize near-real-time analytics. You can also connect via a client and create the database.

Data Warehouse

Data Warehouse Analytics Metrics Snapshot

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

AWS Big Data

DECEMBER 4, 2024

Amazon SageMaker Lakehouse unifies all your data across Amazon S3 data lakes and Amazon Redshift data warehouses, helping you build powerful analytics and AI/ML applications on a single copy of data. About the authors: Shovan Kanjilal is a Senior Analytics and Machine Learning Architect with Amazon Web Services.

Data Integration

Data Integration Data Lake Statistics Data-driven

Amazon OpenSearch Service H1 2023 in review

AWS Big Data

AUGUST 23, 2023

With managed domains, you can use advanced capabilities at no extra cost such as cross-cluster search, cross-cluster replication, anomaly detection, semantic search, security analytics, and more. At release, you could create search and time series collections for full-text search and log analytics use cases, respectively.

Snapshot

Snapshot Dashboards Visualization Metrics

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

APRIL 3, 2024

Customers are using AWS and Snowflake to develop purpose-built data architectures that provide the performance required for modern analytics and artificial intelligence (AI) use cases. Snowflake integrates with AWS Glue Data Catalog to access the Iceberg table catalog and the files on Amazon S3 for analytical queries.

Data Lake

Data Lake Snapshot Metadata Data Architecture

Your Definitive Guide To KPI Tracking By Utilizing Modern Software & Tools

datapine

APRIL 2, 2020

To track KPIs and set actionable benchmarks, today’s most forward-thinking businesses use what is often referred to as a KPI tracking system or a key performance indicator report. Key performance provides a panoramic snapshot of your business’s essential activities. So, what do most companies use to track KPIs?

KPI

KPI Key Performance Indicator Software Dashboards

Optimize checkpointing in your Amazon Managed Service for Apache Flink applications with buffer debloating and unaligned checkpoints – Part 2

AWS Big Data

SEPTEMBER 14, 2023

We’ve already discussed how checkpoints, when triggered by the job manager, signal all source operators to snapshot their state, which is then broadcast as a special record called a checkpoint barrier. When barriers from all upstream partitions have arrived, the sub-task takes a snapshot of its state.

Snapshot

Snapshot Broadcasting Optimization Management

Improve the resilience of Amazon Managed Service for Apache Flink application with system-rollback feature

AWS Big Data

AUGUST 14, 2024

For example, when the application scales up but runs into issues restoring from a savepoint due to operator mismatch between the snapshot and the Flink job graph. You may also receive a snapshot compatibility error when upgrading to a new Apache Flink version. For troubleshooting information, refer to documentation.

Management

Management Snapshot Testing Dashboards

Evaluating sample Amazon Redshift data sharing architecture using Redshift Test Drive and advanced SQL analysis

AWS Big Data

SEPTEMBER 10, 2024

Tens of thousands of customers use Amazon Redshift to process exabytes of data every day to power their analytics workloads. Refer to the Workload Replicator README and the Configuration Comparison README for more detailed instructions to execute a replay using the respective tool. The following image shows the process flow.

Testing

Testing Snapshot Data Warehouse Metrics

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

AWS Big Data

MAY 24, 2023

For more information, refer to Retry Amazon S3 requests with EMRFS. To learn more about how to create an EMR cluster with Iceberg and use Amazon EMR Studio, refer to Use an Iceberg cluster with Spark and the Amazon EMR Studio Management Guide, respectively. We expire the old snapshots from the table and keep only the last two.

Data Lake

Data Lake Snapshot Metadata Optimization

Perform upserts in a data lake using Amazon Athena and Apache Iceberg

AWS Big Data

APRIL 27, 2023

It supports modern analytical data lake operations such as create table as select (CTAS), upsert and merge, and time travel queries. Athena also supports the ability to create views and perform VACUUM (snapshot expiration) on Apache Iceberg tables to optimize storage and performance.

Data Lake

Data Lake Snapshot Optimization Data Transformation

Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

MARCH 2, 2023

Apache Iceberg is an open table format for very large analytic datasets, which captures metadata information on the state of datasets as they evolve and change over time. Apache Iceberg integration is supported by AWS analytics services including Amazon EMR, Amazon Athena, and AWS Glue. The snapshot points to the manifest list.

Data Lake

Data Lake Data Processing Metadata Snapshot

Migrate Microsoft Azure Synapse Analytics to Amazon Redshift using AWS SCT

AWS Big Data

OCTOBER 18, 2023

Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse that provides the flexibility to use provisioned or serverless compute for your analytical workloads. Modern analytics is much wider than SQL-based data warehousing. Solution overview: AWS SCT uses a service account to connect to your Azure Synapse Analytics.

Analytics

Analytics Data Warehouse Dashboards Testing

Getting started guide for near-real time operational analytics using Amazon Aurora zero-ETL integration with Amazon Redshift

AWS Big Data

JUNE 28, 2023

For more details, refer to the What’s New Post. In this post, we provide step-by-step guidance on how to get started with near-real time operational analytics using this feature. The transactional data from the source gets refreshed in near-real time on the destination, which processes analytical queries.

Data Warehouse

Data Warehouse Analytics Metrics Dashboards

Data Observability and Monitoring with DataOps

DataKitchen

MAY 10, 2021

When analytics and dashboards are inaccurate, business leaders may not be able to solve problems and pursue opportunities. If you have been in the data profession for any length of time, you probably know what it means to face a mob of stakeholders who are angry about inaccurate or late analytics. Data errors impact decision-making.

Testing

Testing Manufacturing Data Quality Statistics

Manage your data warehouse cost allocations with Amazon Redshift Serverless tagging

AWS Big Data

MARCH 27, 2023

Amazon Redshift Serverless makes it simple to run and scale analytics without having to manage your data warehouse infrastructure. For Filter by resource type, you can filter by Workgroup, Namespace, Snapshot, and Recovery Point. For more details on tagging, refer to Tagging resources overview. Choose Save changes.

Data Warehouse

Data Warehouse Management Snapshot Data Lake

Move Beyond Excel, PowerPoint And Static Business Reporting with Powerful Interactive Dashboards

datapine

OCTOBER 14, 2020

Sometimes referred to as nested charts, they are especially useful in tables, where you can access additional drilldown options such as aggregated data for categories/breakdowns. For example, you may have different SQL databases, Google Analytics, and sales data in a CSV. In general, drilldowns can be added to any type of chart.

Dashboards

Dashboards Interactive Reporting KPI

Five actionable steps to GDPR compliance (Right to be forgotten) with Amazon Redshift

AWS Big Data

JULY 28, 2023

Many customers are looking for best practices to keep their Amazon Redshift analytics environment compliant and to be able to respond to GDPR right to be forgotten requests. Redshift resources, such as namespaces, workgroups, snapshots, and clusters, can be tagged. Tags provide metadata about resources at a glance.

Snapshot

Snapshot Metadata Measurement Data Warehouse

Join a streaming data source with CDC data for real-time serverless data analytics using AWS Glue, AWS DMS, and Amazon DynamoDB

AWS Big Data

MAY 30, 2023

Customers have been using data warehousing solutions to perform their traditional analytics tasks. Recently, data lakes have gained a lot of traction to become the foundation for analytical solutions, because they come with benefits such as scalability, fault tolerance, and support for structured, semi-structured, and unstructured datasets.

Data Lake

Data Lake Data Analytics Analytics Data Processing

Real-time cost savings for Amazon Managed Service for Apache Flink

AWS Big Data

MARCH 11, 2024

The third cost component is durable application backups, or snapshots. This is entirely optional and its impact on the overall cost is small, unless you retain a very large number of snapshots. The cost of durable application backup (snapshots) is $0.023 per GB per month. … per hour, and attached application storage costs $0.10 …

Management

Management Snapshot Metrics Cost-Benefit

What is business intelligence? Transforming data into business insights

CIO Business Intelligence

JANUARY 20, 2023

BI tools access and analyze data sets and present analytical findings in reports, summaries, dashboards, graphs, charts, and maps to provide users with detailed intelligence about the state of the business. Whereas BI studies historical data to guide business decision-making, business analytics is about looking forward.

Business Intelligence

Business Intelligence Dashboards Data mining OLAP

Embed Amazon OpenSearch Service dashboards in your application

AWS Big Data

AUGUST 19, 2024

Customers across diverse industries rely on Amazon OpenSearch Service for interactive log analytics, real-time application monitoring, website search, vector database, deriving meaningful insights from data, and visualizing these insights using OpenSearch Dashboards. Under Generate the link as, select Snapshot and choose Copy iFrame code.

Dashboards

Dashboards Data Processing Visualization Snapshot

An automated approach to perform an in-place engine upgrade in Amazon OpenSearch Service

AWS Big Data

OCTOBER 26, 2023

Refer to OpenSearch language clients for a list of all supported client libraries. For more information, refer to Starting an upgrade (CLI) and Starting an upgrade (SDK). Take a manual snapshot of your domain. This snapshot serves as a backup that you can restore on a new domain if you want to return to using the prior version.

Dashboards

Dashboards Snapshot Testing Data-driven

Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

AWS Big Data

NOVEMBER 16, 2023

Data Vault overview: For a brief review of the core Data Vault premise and concepts, refer to the first post in this series. For more information, refer to Amazon Redshift database encryption. Automated snapshots retain all of the data required to restore a data warehouse from a snapshot. … model in Amazon Redshift.

Enterprise

Enterprise Data Warehouse Snapshot Cost-Benefit

Configure monitoring, limits, and alarms in Amazon Redshift Serverless to keep costs predictable

AWS Big Data

JULY 25, 2023

Amazon Redshift Serverless makes it simple to run and scale analytics in seconds. For more information, refer to Granting access to monitor queries. For a complete list of system views and their uses, refer to Monitoring views. For more information, refer to WLM query monitoring rules.

Metrics

Metrics Data Warehouse Dashboards Snapshot

Exploring real-time streaming for generative AI Applications

AWS Big Data

MARCH 25, 2024

The result is made available to the application by querying the latest snapshot. The snapshot constantly updates through stream processing; therefore, the up-to-date data is provided in the context of a user prompt to the model. For more information, refer to Notions of Time: Event Time and Processing Time.

Data Lake

Data Lake Unstructured Data Management Snapshot
