Data Lake, Marketing and Snapshot

Build a high-performance quant research platform with Apache Iceberg

AWS Big Data

JANUARY 9, 2025

Iceberg provides time travel and snapshotting capabilities out of the box to manage lookahead bias that could be embedded in the data (such as delayed data delivery). Simplified data corrections and updates: Iceberg enhances data management for quants in capital markets through its robust insert, delete, and update capabilities.

Metadata

Metadata Snapshot Cost-Benefit Optimization

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

APRIL 3, 2024

Apache Iceberg is an Apache-licensed, 100% open-source data table format that helps simplify data processing on large datasets stored in data lakes. Data engineers use Apache Iceberg because it’s fast, efficient, and reliable at any scale, and because it keeps records of how datasets change over time.

Data Lake

Data Lake Snapshot Metadata Data Architecture

Choosing an open table format for your transactional data lake on AWS

AWS Big Data

JUNE 9, 2023

A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.

Data Lake

Data Lake Metadata Statistics Optimization

Backtesting index rebalancing arbitrage with Amazon EMR and Apache Iceberg

AWS Big Data

JULY 3, 2023

Major market indexes, such as the S&P 500, are subject to periodic inclusions and exclusions for reasons beyond the scope of this post (for an example, refer to “CoStar Group, Invitation Homes Set to Join S&P 500; Others to Join S&P 100, S&P MidCap 400, and S&P SmallCap 600”).

Snapshot

Snapshot Data Lake Testing Strategy

Use Amazon Athena with Spark SQL for your open-source transactional table formats

AWS Big Data

JANUARY 24, 2024

AWS-powered data lakes, supported by the unmatched availability of Amazon Simple Storage Service (Amazon S3), can handle the scale, agility, and flexibility required to combine different data and analytics approaches. Snapshot expiration never removes files that are still required by a non-expired snapshot.

Snapshot

Snapshot Data Lake Metadata Optimization

Manage your data warehouse cost allocations with Amazon Redshift Serverless tagging

AWS Big Data

MARCH 27, 2023

Amazon Redshift Serverless makes it simple to run and scale analytics without having to manage your data warehouse infrastructure. Solution overview: Let’s say that your company has two departments, marketing and finance. For Filter by resource type, you can filter by Workgroup, Namespace, Snapshot, and Recovery Point.

Data Warehouse

Data Warehouse Management Snapshot Data Lake

How Gupshup built their multi-tenant messaging analytics platform on Amazon Redshift

AWS Big Data

FEBRUARY 12, 2024

Across verticals, thousands of large and small businesses in emerging markets use Gupshup to build conversational experiences across marketing, sales, and support. Amazon Redshift makes it fast, simple, and cost-effective to analyze all your data using standard SQL and your existing business intelligence (BI) tools.

Analytics

Analytics Data Warehouse Snapshot Cost-Benefit

Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

AWS Big Data

NOVEMBER 16, 2023

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x

Enterprise

Enterprise Data Warehouse Snapshot Cost-Benefit

Interview with Dominic Sartorio, Senior Vice President for Products & Development, Protegrity

Corinium

APRIL 25, 2019

Most of my days focus on understanding what’s happening in the market, defining overall product strategy and direction, and translating that strategy into execution across the various teams. Why should Chief Data & Analytics Officers care about data security? In data-driven organizations, data is flowing.

Insurance

Insurance Risk IoT Data-driven

Architectural patterns for real-time analytics using Amazon Kinesis Data Streams, part 1

AWS Big Data

JANUARY 8, 2024

In the subsequent post in our series, we will explore the architectural patterns in building streaming pipelines for real-time BI dashboards, contact center agent, ledger data, personalized real-time recommendation, log analytics, IoT data, Change Data Capture, and real-time marketing data.

Analytics

Analytics IoT Data-driven Snapshot

Five actionable steps to GDPR compliance (Right to be forgotten) with Amazon Redshift

AWS Big Data

JULY 28, 2023

Tagging: Consider tagging your Amazon Redshift resources to quickly identify which clusters and snapshots contain PII data, who owns them, their data retention policy, and so on. Amazon Redshift resources such as namespaces, workgroups, snapshots, and clusters can be tagged. Tags provide metadata about resources at a glance.

Snapshot

Snapshot Metadata Measurement Data Warehouse

How Amazon Devices scaled and optimized real-time demand and supply forecasts using serverless analytics

AWS Big Data

FEBRUARY 1, 2023

Every day, Amazon Devices processes and analyzes billions of transactions from global shipping, inventory, capacity, supply, sales, marketing, producers, and customer service teams. This data is used to procure device inventory that meets Amazon customers’ demands. Then we chose Amazon Athena as our query service.

Optimization

Optimization Forecasting Data Lake Metadata

Chose Both: Data Fabric and Data Lakehouse

Cloudera

SEPTEMBER 12, 2022

A data lakehouse that enables multiple engines to run on the same data improves speed to market and user productivity. Cloudera has supported data lakehouses for over five years. Applying the Iceberg table format to all the organization’s data in the data lake makes it more performant and usable at scale.

Unstructured Data

Unstructured Data Data Architecture Data Lake Snapshot

Snowflake and Domino: Better Together

Domino Data Lab

JANUARY 11, 2021

Data Science works best with a high degree of data granularity, when the data offers the closest possible representation of what happened during actual events, as in financial transactions, medical consultations, or marketing campaign results.

Recreation/Entertainment

Recreation/Entertainment Data Science Data Warehouse Publishing

How Tricentis unlocks insights across the software development lifecycle at speed and scale using Amazon Redshift

AWS Big Data

MARCH 3, 2023

From detailed design to a beta release, Tricentis had customers expecting to consume data from a data lake containing only their own data, including all of the data generated over more than a decade. Data export: As stated earlier, some customers want to get an export of their test data and create their own data lake.

Software

Software Data Lake Testing Cost-Benefit

Enrich your customer data with geospatial insights using Amazon Redshift, AWS Data Exchange, and Amazon QuickSight

AWS Big Data

MARCH 18, 2024

Namespaces group together all of the resources you use in Redshift Serverless, such as schemas, tables, users, datashares, and snapshots. First, we need to give our Redshift namespace permission via AWS Identity and Access Management (IAM) to access subscriptions on AWS Data Exchange. Workgroup – A collection of compute resources.

Data Warehouse

Data Warehouse Visualization Snapshot Data-driven

Dimensional modeling in Amazon Redshift

AWS Big Data

JULY 19, 2023

We show how to perform extract, load, and transform (ELT), an integration process focused on getting the raw data from a data lake into a staging layer to perform the modeling. The data (business process) needs to be integrated across various departments; in this case, marketing can access the sales data.

Modeling

Modeling Sales Data Warehouse Snapshot

Accelerating revenue growth with real-time analytics: Poshmark’s journey

AWS Big Data

MARCH 20, 2023

The two main approaches organizations employ to increase revenue are to expand geographically to enter new markets and to increase market share within a market by improving customer experience (CX). Improving CX is a well-known guideline to attract and retain customers and thereby increase the market share.

Analytics

Analytics Slice and Dice Data Processing Data Lake

AWS Glue streaming application to process Amazon MSK data using AWS Glue Schema Registry

AWS Big Data

JUNE 12, 2023

One notable trend in the streaming solutions market is the widespread use of Apache Kafka for data ingestion and Apache Spark for streaming processing across industries. Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed Apache Kafka service that offers a seamless way to ingest and process streaming data.

Management

Management Metadata Internet of Things Testing

Jumia builds a next-generation data platform with metadata-driven specification frameworks

AWS Big Data

DECEMBER 20, 2024

Jumia is listed on the NYSE and has a market cap of $554 million. In this post, we share part of the journey that Jumia took with AWS Professional Services to modernize its data platform, moving it from a Hadoop distribution to AWS serverless-based solutions. This process brings data from all the different layers of the data lake.

Metadata

Metadata Data-driven Snapshot Data Lake

Streamline AWS WAF log analysis with Apache Iceberg and Amazon Data Firehose

AWS Big Data

FEBRUARY 18, 2025

To optimize their security operations, organizations are adopting modern approaches that combine real-time monitoring with scalable data analytics. They are using data lake architectures and Apache Iceberg to efficiently process large volumes of security data while minimizing operational overhead.

Snapshot

Snapshot Optimization Data Lake Metadata

Stream real-time data into Apache Iceberg tables in Amazon S3 using Amazon Data Firehose

AWS Big Data

NOVEMBER 6, 2024

Second, because traditional data warehousing approaches are unable to keep up with the volume, velocity, and variety of data, engineering teams are building data lakes and adopting open data formats such as Parquet and Apache Iceberg to store their data.

Metadata

Metadata Data Lake Management Internet of Things

Secure Data Sharing and Interoperability Powered by Iceberg REST Catalog

Cloudera

DECEMBER 3, 2024

Constant data duplication, complex Extract, Transform & Load (ETL) pipelines, and sprawling infrastructure lead to prohibitively expensive solutions, adversely impacting the Time to Value, Time to Market, overall Total Cost of Ownership (TCO), and Return on Investment (ROI) for the business.

Metadata

Metadata Data Warehouse ROI Snapshot
