Apache Iceberg is an Apache-licensed, 100% open-source data table format that helps simplify data processing on large datasets stored in data lakes. Data engineers use Apache Iceberg because it's fast, efficient, and reliable at any scale, and because it keeps records of how datasets change over time.
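As a hedged illustration of that change tracking, the sketch below creates a small Iceberg table from PySpark and lists its snapshots. It assumes a Spark session already configured with the Iceberg runtime and a catalog named glue_catalog; the namespace, table, and column names are illustrative.

```python
# Minimal sketch, assuming an Iceberg-enabled Spark session and a configured
# catalog called "glue_catalog". All object names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-example").getOrCreate()

spark.sql("CREATE NAMESPACE IF NOT EXISTS glue_catalog.demo")
spark.sql("""
    CREATE TABLE IF NOT EXISTS glue_catalog.demo.orders (
        order_id BIGINT,
        amount   DOUBLE,
        order_ts TIMESTAMP
    ) USING iceberg
""")
spark.sql("INSERT INTO glue_catalog.demo.orders VALUES (1, 19.99, current_timestamp())")

# Iceberg records every commit as a snapshot, so earlier versions stay queryable.
spark.sql("SELECT snapshot_id, committed_at FROM glue_catalog.demo.orders.snapshots").show()
```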
Data lakes and data warehouses are two of the most important data storage and management technologies in a modern data architecture. Data lakes store all of an organization's data, regardless of its format or structure.
In the context of comprehensive data governance, Amazon DataZone offers organization-wide data lineage visualization using Amazon Web Services (AWS) services, while dbt provides project-level lineage through model analysis and supports cross-project integration between data lakes and warehouses.
In the era of big data, data lakes have emerged as a cornerstone for storing vast amounts of raw data in its native format. They support structured, semi-structured, and unstructured data, offering a flexible and scalable environment for data ingestion from multiple sources.
With this new functionality, customers can create up-to-date replicas of their data from applications such as Salesforce, ServiceNow, and Zendesk in an Amazon SageMaker Lakehouse and Amazon Redshift. SageMaker Lakehouse gives you the flexibility to access and query your data in-place with all Apache Iceberg compatible tools and engines.
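One common way to query such replicated tables in place is through Amazon Athena. The sketch below is a minimal, hedged example using boto3; it assumes the table is registered in the data catalog, and the database, table, and results bucket names are illustrative.

```python
# Minimal sketch: run an in-place SQL query against a lakehouse table with Athena.
# Database, table, and output bucket are placeholders.
import boto3

athena = boto3.client("athena")

response = athena.start_query_execution(
    QueryString="SELECT * FROM sales.orders LIMIT 10",
    QueryExecutionContext={"Database": "sales"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(response["QueryExecutionId"])  # poll get_query_execution with this ID for status
```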
In the first post of this series, we described how AWS Glue for Apache Spark works with Apache Hudi, Linux Foundation Delta Lake, and Apache Iceberg tables using the native support for those data lake formats. Even without prior experience using Hudi, Delta Lake, or Iceberg, you can easily achieve typical use cases.
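As a hedged sketch of that native support, the snippet below shows a Glue for Spark job writing a Hudi table. It assumes a Glue job created with the Hudi data lake format enabled; the bucket, table, and key fields are illustrative, not from the original post.

```python
# Sketch of a Glue for Spark job script writing a Hudi table (assumes the job
# runs in the Glue environment with Hudi support enabled; names are placeholders).
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

df = spark.read.json("s3://my-bucket/raw/orders/")

hudi_options = {
    "hoodie.table.name": "orders",
    "hoodie.datasource.write.recordkey.field": "order_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.operation": "upsert",
}
df.write.format("hudi").options(**hudi_options).mode("append").save("s3://my-bucket/hudi/orders/")
```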
Apache Hudi is an open table format that brings database and data warehouse capabilities to data lakes. It helps data engineers tackle complex challenges, such as managing continuously evolving datasets with transactions while maintaining query performance. Create an S3 bucket if you do not already have one.
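For the bucket-creation step, a minimal boto3 sketch is shown below; the bucket name and Region are placeholders, and bucket names must be globally unique.

```python
# Minimal sketch: create the S3 bucket referenced above with boto3.
import boto3

region = "us-west-2"
s3 = boto3.client("s3", region_name=region)
s3.create_bucket(
    Bucket="my-hudi-data-lake-demo",  # placeholder; must be globally unique
    CreateBucketConfiguration={"LocationConstraint": region},  # omit for us-east-1
)
```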
These processes retrieve data from around 90 different data sources, resulting in updates to roughly 2,000 tables in the data warehouse and 3,000 external tables in Parquet format, accessed through Amazon Redshift Spectrum and a data lake on Amazon Simple Storage Service (Amazon S3).
Amazon Redshift Serverless makes it simple to run and scale analytics without having to manage your data warehouse infrastructure. For Filter by resource type, you can filter by Workgroup, Namespace, Snapshot, and Recovery Point. For this post, we don't include any tag filters, so we can view all the resources across our account.
Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization's Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more, all while providing up to 7.9x better price performance than other cloud data warehouses.
When setting out to build a data warehouse, it's a common pattern to have a data lake as the source of the data warehouse. The data lake in this context serves a number of important functions: it acts as a central source for multiple applications, not just exclusively for data warehousing purposes.
With Kinesis Data Streams, customers can continuously capture terabytes of time series data from thousands of sources for cleaning, enrichment, storage, analysis, and visualization. The raw data can be streamed to Amazon S3 for archiving. Grafana supports data sources that are storage backends for time series data.
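As a hedged sketch of that ingestion path, the snippet below publishes a single time series reading to a Kinesis data stream with boto3. The stream name and payload fields are illustrative and assume the stream already exists.

```python
# Minimal sketch: put one time series record onto a Kinesis data stream.
import json
import time
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

record = {"sensor_id": "sensor-42", "temperature_c": 21.7, "ts": int(time.time())}
kinesis.put_record(
    StreamName="timeseries-ingest",            # placeholder: an existing stream
    Data=json.dumps(record).encode("utf-8"),
    PartitionKey=record["sensor_id"],           # spreads sensors across shards
)
```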
The data architecture diagram below shows an example of how you could use AWS services to calculate and visualize an organization's estimated carbon footprint, starting from utility readings such as the usage_therms and usage_scf columns of a gas utilization table. Customers have the flexibility to choose the services in each stage of the data pipeline based on their use case.
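As a rough, hedged illustration of the calculation step only, the pandas sketch below estimates emissions from those gas usage columns. The emission factor and unit conversion are placeholders and should be replaced with published values for your region and gas supply.

```python
# Sketch only: estimate CO2e from gas usage readings. Factors are placeholders.
import pandas as pd

KG_CO2E_PER_THERM = 5.3   # placeholder emission factor; substitute a published value
THERMS_PER_SCF = 0.01     # placeholder heat-content conversion; verify for your supply

gas = pd.DataFrame({
    "usage_therms": [120.0, None],
    "usage_scf": [None, 9230.0],
})

# Prefer the therms reading; fall back to converting scf when therms are missing.
therms = gas["usage_therms"].fillna(gas["usage_scf"] * THERMS_PER_SCF)
gas["estimated_kg_co2e"] = therms * KG_CO2E_PER_THERM
print(gas)
```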
Furthermore, data events are filtered, enriched, and transformed to a consumable format using a stream processor. The result is made available to the application by querying the latest snapshot. OpenSearch Service offers visualization capabilities powered by OpenSearch Dashboards and Kibana (1.5 to 7.10 versions).
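A minimal sketch of that "query the latest snapshot" pattern with the opensearch-py client is shown below; the endpoint, index, and field names are illustrative, and authentication is omitted for brevity.

```python
# Sketch: fetch the most recent enriched event from an OpenSearch index.
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],  # placeholder endpoint
    use_ssl=True,  # authentication/signing omitted for brevity
)

latest = client.search(
    index="enriched-events",
    body={"size": 1, "sort": [{"event_ts": {"order": "desc"}}]},
)
print(latest["hits"]["hits"][0]["_source"])
```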
Namespaces group together all of the resources you use in Redshift Serverless, such as schemas, tables, users, datashares, and snapshots. First, we need to give our Redshift namespace permission via AWS Identity and Access Management (IAM) to access subscriptions on AWS Data Exchange. Select Directly query your data.
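For readers who prefer to inspect these resources programmatically rather than in the console, the hedged sketch below lists Redshift Serverless workgroups and namespaces with boto3, assuming credentials with the relevant permissions are configured.

```python
# Sketch: list Redshift Serverless workgroups and namespaces in the account.
import boto3

rs = boto3.client("redshift-serverless")

for wg in rs.list_workgroups()["workgroups"]:
    print("workgroup:", wg["workgroupName"], wg["status"])

for ns in rs.list_namespaces()["namespaces"]:
    print("namespace:", ns["namespaceName"])
```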
Data mapping involves identifying and documenting the flow of personal data in an organization. It helps organizations understand how personal data moves through their systems, where it is stored, and how it is processed. Note that putting a comprehensive data strategy in place is not in scope for this post.
With data volumes exhibiting a double-digit percentage growth rate year on year and the COVID pandemic disrupting global logistics in 2021, it became more critical to scale and generate near-real-time data. You can visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes.
This gives a visual representation of peaks and lows in your consumption over your chosen period of time. Choose your level of metrics to monitor: Workgroup, Namespace, or Snapshot storage. If we select Workgroup, we can choose from the workgroup-level metrics shown in the following screenshot.
We show how to perform extract, load, and transform (ELT), an integration process focused on getting the raw data from a data lake into a staging layer to perform the modeling. We use our data mart to visually present the facts in the form of a dashboard, and we use this dataset to build that data mart.
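A minimal sketch of the "load before transform" step is shown below: raw data lake files are copied into a staging prefix as-is, leaving modeling for downstream layers. The paths are illustrative.

```python
# Sketch: copy raw extracts from the data lake into a staging layer untouched.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("elt-staging").getOrCreate()

raw = spark.read.parquet("s3://my-datalake/raw/sales/")   # untouched source extract
(raw.write
    .mode("overwrite")
    .parquet("s3://my-datalake/staging/sales/"))           # staging copy for later modeling
```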
Building data lakes from the continuously changing transactional data of databases, and keeping them up to date, is a complex task and can be an operational challenge. You can then apply transformations and store data in Delta format for managing inserts, updates, and deletes.
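As a hedged sketch of handling those inserts, updates, and deletes, the snippet below applies a batch of CDC changes to a Delta table with a MERGE. It assumes a Spark session with Delta Lake configured (for example, a Glue job with the Delta format enabled); paths, keys, and the "op" change-type column are illustrative.

```python
# Sketch: merge CDC changes (I/U/D) into a Delta table. Names are placeholders.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # assumes Delta Lake is configured on the session

changes = spark.read.json("s3://my-bucket/cdc/customers/")            # CDC batch with an "op" column
target = DeltaTable.forPath(spark, "s3://my-bucket/delta/customers")

(target.alias("t")
    .merge(changes.alias("c"), "t.customer_id = c.customer_id")
    .whenMatchedDelete(condition="c.op = 'D'")
    .whenMatchedUpdateAll(condition="c.op = 'U'")
    .whenNotMatchedInsertAll(condition="c.op = 'I'")
    .execute())
```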
We can determine the following are needed: an open data format ingestion architecture that processes the source dataset and refines the data in the S3 data lake. This requires a dedicated team of 3–7 members building a serverless data lake for all data sources.
A Better Way Forward: Cloudera's Open Data Lakehouse Cloudera offers a solution to these challenges with its open data lakehouse, which combines the flexibility and scalability of data lake storage with data warehouse functionality to unify and simplify the management of cyber log data.
For example, a Jupyter notebook in CML can use a Spark or Python framework to directly access an Iceberg table to build a forecast model, while new data is ingested via NiFi flows and a SQL analyst monitors revenue targets using Data Visualization. Open formats. Financial regulation. Reproducibility for ML Ops.
While aggregating, summarizing, and aligning to a common information model, the transformations must not affect the integrity of the data from its source. The solution, Tricentis Analytics, aims to address the challenges of high-volume, near-real-time, and visually appealing reporting and analytics across the entire Tricentis product portfolio.
Today, many customers build data quality validation pipelines using AWS Glue Data Quality's Data Quality Definition Language (DQDL) because, with static rules, dynamic rules, and anomaly detection capabilities, it's fairly straightforward. One of its key features is the ability to manage data using branches.
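For context, a small hedged sketch of registering a DQDL ruleset with AWS Glue Data Quality via boto3 is shown below; the rule thresholds, table, and database names are illustrative.

```python
# Sketch: register a simple DQDL ruleset against a Glue Data Catalog table.
import boto3

glue = boto3.client("glue")

ruleset = """
Rules = [
    IsComplete "order_id",
    ColumnValues "amount" > 0,
    RowCount > 1000
]
"""

glue.create_data_quality_ruleset(
    Name="orders-basic-checks",
    Ruleset=ruleset,
    TargetTable={"TableName": "orders", "DatabaseName": "sales_db"},  # placeholders
)
```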