Accelerate your migration to Amazon OpenSearch Service with Reindexing-from-Snapshot

AWS Big Data

In this post, we introduce a new mechanism called Reindexing-from-Snapshot (RFS) and explain how it can address your concerns and simplify migrating to OpenSearch. Documents are parsed from the snapshot and then reindexed to the target cluster, so the performance impact on the source cluster is minimized during migration.
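As a rough illustration of the idea (not the RFS implementation itself), the sketch below bulk-indexes documents that have already been parsed out of a snapshot into the target cluster using the OpenSearch Python client; the endpoint, index name, and load_documents_from_snapshot stub are hypothetical placeholders.

```python
# Conceptual sketch of Reindexing-from-Snapshot: documents come from a snapshot
# copy of the data, not from the live source cluster, and are bulk-indexed into
# the target. The snapshot-parsing step is stubbed out; RFS handles it for you.
from opensearchpy import OpenSearch, helpers

target = OpenSearch(
    hosts=[{"host": "my-target-domain.example.com", "port": 443}],
    use_ssl=True,
)

def load_documents_from_snapshot(snapshot_dir, index_name):
    """Hypothetical stub: yield documents parsed from a restored snapshot."""
    yield from []  # placeholder for the snapshot-parsing logic

actions = (
    {"_index": "my-index", "_source": doc}
    for doc in load_documents_from_snapshot("/snapshots/source", "my-index")
)

# Bulk-index into the target cluster; the source cluster is never queried.
helpers.bulk(target, actions)
```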

Building end-to-end data lineage for one-time and complex queries using Amazon Athena, Amazon Redshift, Amazon Neptune and dbt

AWS Big Data

The author has many years of experience in big data, enterprise digital transformation research and development, consulting, and project management across the telecommunications, entertainment, and financial industries.

Use AWS Glue ETL to perform merge, partition evolution, and schema evolution on Apache Iceberg

AWS Big Data

However, altering schema and table partitions in traditional data lakes can be a disruptive and time-consuming task, requiring renaming or recreating entire tables and reprocessing large datasets. Iceberg creates snapshots of the table contents; each snapshot is a complete set of the data files in the table at a point in time.
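By contrast, Iceberg treats these changes as metadata-only operations. A minimal PySpark sketch of what an AWS Glue ETL job might run, assuming the job is configured with the Iceberg connector and SQL extensions, and with illustrative catalog, database, and table names:

```python
# Illustrative Iceberg schema evolution, partition evolution, and merge from an
# AWS Glue ETL (Spark) job; assumes Iceberg SQL extensions are enabled and
# glue_catalog.db.sales / db.sales_updates are placeholder names.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Schema evolution: metadata-only, no existing data files are rewritten.
spark.sql("ALTER TABLE glue_catalog.db.sales ADD COLUMNS (discount_pct double)")

# Partition evolution: new data uses the new spec; old files keep the old one.
spark.sql("ALTER TABLE glue_catalog.db.sales ADD PARTITION FIELD days(order_ts)")

# Merge (upsert) changes into the table, producing a new snapshot.
spark.sql("""
    MERGE INTO glue_catalog.db.sales AS t
    USING glue_catalog.db.sales_updates AS s
    ON t.order_id = s.order_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```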

Comparing DynamoDB and MongoDB for Big Data Management

Smart Data Collective

The two systems are relatively similar when it comes to backup. MongoDB offers on-demand, continuous backups, as does DynamoDB. But MongoDB also offers filesystem snapshot backups and queryable backups. You don't get queryable backups on DynamoDB, and you might need to manually recreate many configurations that are not backed up.
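For context, both DynamoDB backup modes mentioned here are single boto3 calls; the table and backup names below are illustrative.

```python
# DynamoDB's two backup modes via boto3; names are illustrative.
import boto3

dynamodb = boto3.client("dynamodb")

# On-demand backup: a snapshot of the table at this moment.
dynamodb.create_backup(TableName="orders", BackupName="orders-pre-migration")

# Continuous backups (point-in-time recovery): restore to any second within the
# retention window. Table settings such as auto scaling policies and tags are
# not captured and must be recreated after a restore.
dynamodb.update_continuous_backups(
    TableName="orders",
    PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
)
```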

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

In this method, the metadata is recreated in an isolated environment and colocated with the existing data files. An in-place migration can be performed in either of two ways: Using add_files: This procedure adds existing data files to an existing Iceberg table with a new snapshot that includes the files. This can save time.
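add_files is an Iceberg Spark stored procedure; a minimal sketch of calling it from PySpark, with placeholder catalog, table, and S3 path:

```python
# In-place migration with Iceberg's add_files procedure: existing Parquet files
# are registered in a new snapshot of an existing Iceberg table without being
# copied or rewritten. Catalog, table, and path are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
    CALL glue_catalog.system.add_files(
        table => 'db.sales',
        source_table => '`parquet`.`s3://my-bucket/legacy/sales/`'
    )
""")
```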

Implement disaster recovery with Amazon Redshift

AWS Big Data

With built-in features such as automated snapshots and cross-Region replication, you can enhance your disaster resilience with Amazon Redshift. Snapshots are point-in-time backups of the Redshift data warehouse. Amazon Redshift supports two kinds of snapshots, automatic and manual, and both can be used to recover data.
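Both building blocks, a manual snapshot and cross-Region snapshot copy, are single boto3 calls; the cluster identifier, snapshot name, and Regions below are placeholders.

```python
# Manual snapshot plus cross-Region snapshot copy for Amazon Redshift disaster
# recovery; identifiers and Regions are placeholders.
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

# Manual, point-in-time snapshot of the data warehouse.
redshift.create_cluster_snapshot(
    SnapshotIdentifier="dw-prod-manual-2024-06-01",
    ClusterIdentifier="dw-prod",
)

# Copy automated and manual snapshots to a second Region so the warehouse can
# be restored there if the primary Region becomes unavailable.
redshift.enable_snapshot_copy(
    ClusterIdentifier="dw-prod",
    DestinationRegion="us-west-2",
    RetentionPeriod=7,
)
```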

Evaluating sample Amazon Redshift data sharing architecture using Redshift Test Drive and advanced SQL analysis

AWS Big Data

Configure the Amazon Redshift data warehouse: Create a snapshot following the guidance in the Amazon Redshift Management Guide. Launch the baseline replica by restoring a 2-node ra3.4xlarge provisioned cluster from the snapshot. Enable audit logging following the guidance in the Amazon Redshift Management Guide.
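The restore and logging steps map to two boto3 calls; the cluster identifier, snapshot name, and S3 bucket below are placeholders.

```python
# Restore the baseline replica and enable audit logging for the data sharing
# evaluation; identifiers, snapshot name, and bucket are placeholders.
import boto3

redshift = boto3.client("redshift")

# Launch the baseline replica: a 2-node ra3.4xlarge provisioned cluster
# restored from the snapshot.
redshift.restore_from_cluster_snapshot(
    ClusterIdentifier="baseline-replica",
    SnapshotIdentifier="dw-baseline-snapshot",
    NodeType="ra3.4xlarge",
    NumberOfNodes=2,
)

# Send audit logs to S3 so the workload can later be analyzed with SQL.
redshift.enable_logging(
    ClusterIdentifier="baseline-replica",
    BucketName="my-audit-log-bucket",
    S3KeyPrefix="baseline/",
)
```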
