Remove Data Lake Remove Experimentation Remove Snapshot Remove Testing
article thumbnail

Backtesting index rebalancing arbitrage with Amazon EMR and Apache Iceberg

AWS Big Data

Terminology Let’s first discuss some of the terminology used in this post: Research data lake on Amazon S3 – A data lake is a large, centralized repository that allows you to manage all your structured and unstructured data at any scale.

article thumbnail

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

AWS Big Data

When you build your transactional data lake using Apache Iceberg to solve your functional use cases, you need to focus on operational use cases for your S3 data lake to optimize the production environment. You can use either the AWS Glue Data Catalog (recommended) or a Hive catalog for Iceberg tables.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

MLOps and DevOps: Why Data Makes It Different

O'Reilly on Data

It has far-reaching implications as to how such applications should be developed and by whom: ML applications are directly exposed to the constantly changing real world through data, whereas traditional software operates in a simplified, static, abstract world which is directly constructed by the developer. This approach is not novel.

IT 352
article thumbnail

Unleashing the power of Presto: The Uber case study

IBM Big Data Hub

Uber understood that digital superiority required the capture of all their transactional data, not just a sampling. They stood up a file-based data lake alongside their analytical database. Because much of the work done on their data lake is exploratory in nature, many users want to execute untested queries on petabytes of data.

OLAP 87