The need for streamlined data transformations
As organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown.
Data lakes are among the most complex and sophisticated data storage and processing facilities available today. Analytics Magazine notes that data lakes are among the most useful tools an enterprise can have at its disposal when aiming to outpace competitors through innovation.
Combining a data lake with a serverless paradigm brings significant cost and performance benefits. By monitoring application logs, you can gain insight into job execution and troubleshoot issues promptly, ensuring the overall health and reliability of data pipelines.
A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.
Today we are pleased to announce a new class of Amazon CloudWatch metrics reported with your pipelines built on top of AWS Glue for Apache Spark jobs. The new metrics provide aggregate and fine-grained insights into the health and operations of your job runs and the data being processed. In one example run, workerUtilization showed 1.0.
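Below is a rough sketch of pulling one such metric with boto3. The namespace, metric name, dimensions, and job name are assumptions rather than confirmed identifiers; verify them against what CloudWatch actually reports for your job runs.

    # Sketch: fetch the workerUtilization observability metric for a Glue job.
    # Namespace, metric name, and dimension values are assumptions -- confirm
    # them in the CloudWatch console for your own job runs.
    import datetime
    import boto3

    cloudwatch = boto3.client("cloudwatch")

    response = cloudwatch.get_metric_statistics(
        Namespace="Glue",                             # assumed namespace
        MetricName="glue.driver.workerUtilization",   # assumed metric name
        Dimensions=[
            {"Name": "JobName", "Value": "my-etl-job"},  # hypothetical job
            {"Name": "JobRunId", "Value": "ALL"},
        ],
        StartTime=datetime.datetime.utcnow() - datetime.timedelta(hours=6),
        EndTime=datetime.datetime.utcnow(),
        Period=300,
        Statistics=["Average", "Maximum"],
    )

    for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
        # A sustained average near 1.0 suggests the job is using all workers.
        print(point["Timestamp"], point["Average"], point["Maximum"])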
Since Apache Iceberg is well supported by AWS data services and Cloudinary was already using Spark on Amazon EMR, they could integrate writing to the Data Catalog and start an additional Spark cluster to handle data maintenance and compaction. For example, for certain queries, Athena runtime was 2x–4x faster than Snowflake.
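As a hedged illustration of that maintenance pattern, Iceberg ships a rewrite_data_files procedure that a Spark cluster can invoke to compact small files; the catalog, database, and table names below are hypothetical placeholders, and the session must already be configured with an Iceberg catalog.

    # Sketch: compact small files in an Iceberg table with the
    # rewrite_data_files maintenance procedure (Iceberg Spark procedures).
    # Catalog/database/table names are hypothetical placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("iceberg-maintenance").getOrCreate()

    spark.sql("""
        CALL glue_catalog.system.rewrite_data_files(
            table => 'analytics_db.events',
            options => map('target-file-size-bytes', '536870912')
        )
    """).show()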
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.
A modern data architecture is an evolutionary architecture pattern designed to integrate a data lake, data warehouse, and purpose-built stores with a unified governance model. The company wanted the ability to continue processing operational data in the secondary Region in the rare event of primary Region failure.
With this new functionality, customers can create up-to-date replicas of their data from applications such as Salesforce, ServiceNow, and Zendesk in an Amazon SageMaker Lakehouse and Amazon Redshift. SageMaker Lakehouse gives you the flexibility to access and query your data in place with all Apache Iceberg-compatible tools and engines.
AWS Glue has made this more straightforward with the launch of AWS Glue job observability metrics, which provide valuable insights into your data integration pipelines built on AWS Glue. This post walks through how to integrate AWS Glue job observability metrics with Grafana using Amazon Managed Grafana. Choose Administration.
In the era of big data, data lakes have emerged as a cornerstone for storing vast amounts of raw data in its native format. They support structured, semi-structured, and unstructured data, offering a flexible and scalable environment for data ingestion from multiple sources. The default output is log-based.
According to a study from Rocket Software and Foundry , 76% of IT decision-makers say challenges around accessing mainframe data and contextual metadata are a barrier to mainframe data usage, while 64% view integrating mainframe data with cloud data sources as the primary challenge.
In 2022, data organizations will institute robust automated processes around their AI systems to make them more accountable to stakeholders. Quality test suites will enforce “equity,” like any other performance metric. For example, a Hub-Spoke architecture could integrate data from a multitude of sources into a data lake.
AI and ML are the only ways to derive value from massive data lakes, cloud-native data warehouses, and other huge stores of information. There just aren’t enough AI and data science practitioners to go around to tackle this lofty goal. Apply that metric to any other business-critical function.
In Part 2 of this series, we discussed how to enable AWS Glue job observability metrics and integrate them with Grafana for real-time monitoring. In this post, we explore how to connect QuickSight to Amazon CloudWatch metrics and build graphs to uncover trends in AWS Glue job observability metrics.
We pulled these people together and defined use cases we could all agree were the best to demonstrate our new data capability. Once they were identified, we had to determine whether we had the right data. Then we migrated the data to our new data lake and stood up the new platform.
To address the flood of data and the needs of enterprise businesses to store, sort, and analyze that data, a new storage solution has evolved: the data lake. What’s in a Data Lake? Data warehouses do a great job of standardizing data from disparate sources for analysis. Taking a Dip.
It shows the aggregate metrics of the files that have been processed by an auto-copy job. Eren Baydemir, a Technical Product Manager at AWS, has 15 years of experience in building customer-facing products and is currently focusing on data lake and file ingestion topics in the Amazon Redshift team.
AWS Glue Data Quality allows you to measure and monitor the quality of data in your data repositories. It’s important for business users to be able to see quality scores and metrics to make confident business decisions and debug data quality issues. An AWS Glue crawler crawls the results.
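For a sense of how such checks can be wired up, here is a minimal sketch using boto3 to define a DQDL ruleset and start an evaluation run; the database, table, role ARN, and rules are hypothetical, and the calls should be verified against the current AWS Glue SDK.

    # Sketch: define a DQDL ruleset and kick off an evaluation run against a
    # Data Catalog table. Database, table, role, and rules are hypothetical.
    import boto3

    glue = boto3.client("glue")

    glue.create_data_quality_ruleset(
        Name="orders-quality",
        Ruleset='Rules = [ IsComplete "order_id", Uniqueness "order_id" > 0.99 ]',
        TargetTable={"DatabaseName": "sales_db", "TableName": "orders"},
    )

    run = glue.start_data_quality_ruleset_evaluation_run(
        DataSource={"GlueTable": {"DatabaseName": "sales_db",
                                  "TableName": "orders"}},
        Role="arn:aws:iam::123456789012:role/GlueDataQualityRole",  # hypothetical
        RulesetNames=["orders-quality"],
    )
    print("Evaluation run:", run["RunId"])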
cycle_end"', "sagemakedatalakeenvironment_sub_db", ctas_approach=False) A similar approach is used to connect to shared data from Amazon Redshift, which is also shared using Amazon DataZone. datazone_env_twinsimsilverdata"."cycle_end";') She can reached via LinkedIn. Siamak Nariman is a Senior Product Manager at AWS.
At the same time, they need to optimize operational costs to unlock the value of this data for timely insights, and do so with consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.
The data can also help us enrich our commodity products. How are you populating your data lake? We’ve decided to take a practical approach, led by Kyle Benning, who runs our data function. Then our analytics team, an IT group, makes sure we build the data lake in the right sequence.
Today, customers are embarking on data modernization programs by migrating on-premises data warehouses and data lakes to the AWS Cloud to take advantage of the scale and advanced analytical capabilities of the cloud. Compare ongoing data that is replicated from the source on-premises database to the target S3 data lake.
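One simple validation of that comparison, sketched below under assumed connection details and table names, is to reconcile row counts between the source database and the replicated copy queried through Athena:

    # Sketch: compare a source-database row count with the replicated S3 data
    # lake copy via Athena. Endpoints and names are hypothetical placeholders.
    import awswrangler as wr
    import pymysql

    # Row count on the on-premises source (hypothetical MySQL endpoint).
    conn = pymysql.connect(host="onprem-db.example.com", user="readonly",
                           password="...", database="sales")
    with conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM orders")
        source_count = cur.fetchone()[0]

    # Row count on the replicated copy, queried through Athena.
    target = wr.athena.read_sql_query(
        "SELECT COUNT(*) AS n FROM orders", database="sales_lake")
    target_count = int(target["n"].iloc[0])

    print("source:", source_count, "target:", target_count,
          "in sync" if source_count == target_count else "drift detected")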
When it was no longer a hard requirement that a physical data model be created upon the ingestion of data, there was a resulting drop in richness of the description and consistency of the data stored in Hadoop. You did not have to understand or prepare the data to get it into Hadoop, so people rarely did.
Traditional relational databases provide certain benefits, but they are not suited to handling large and varied data. That is when data lake products started gaining popularity, and since then, more companies have introduced lake solutions as part of their data infrastructure. How to improve indexing.
Amazon Redshift is a fast, scalable, and fully managed cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. It also helps you securely access your data in operational databases, data lakes, or third-party datasets with minimal movement or copying of data.
In today’s data-driven world, organizations are constantly seeking efficient ways to process and analyze vast amounts of information across data lakes and warehouses. This post will showcase how this data can also be queried by other data teams using Amazon Athena. Verify that you have Python version 3.7
They also built an Azure-based data lake to provide global visibility of the company’s data to its 13,000-strong workforce. Digital transformation projects have always been about creating a data-driven business. For example, optimizing water usage in agriculture is a key metric.
As enterprises collect increasing amounts of data from various sources, the structure and organization of that data often need to change over time to meet evolving analytical needs. Schema evolution enables adding, deleting, renaming, or modifying columns without needing to rewrite existing data.
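As a concrete illustration, table formats such as Apache Iceberg expose schema evolution as metadata-only DDL from Spark; none of the table or column names below come from the excerpt, so treat this as a hypothetical sketch.

    # Sketch: metadata-only schema evolution on an Iceberg table from Spark.
    # Table and column names are hypothetical placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("schema-evolution").getOrCreate()

    # Add a column; existing data files are not rewritten.
    spark.sql("ALTER TABLE glue_catalog.analytics_db.events "
              "ADD COLUMN referrer STRING")

    # Rename a column; Iceberg readers resolve columns by ID, not by name.
    spark.sql("ALTER TABLE glue_catalog.analytics_db.events "
              "RENAME COLUMN ts TO event_ts")

    # Widen a type (int -> bigint is one of the supported promotions).
    spark.sql("ALTER TABLE glue_catalog.analytics_db.events "
              "ALTER COLUMN visit_count TYPE BIGINT")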
Some of the work is very foundational, such as building an enterprise data lake and migrating it to the cloud, which enables other more direct value-added activities such as self-service. Measure user adoption and engagement metrics to not just understand product take-up, but also to enhance the overall product propositions.
In this post, we show how Ruparupa implemented an incrementally updated data lake to get insights into their business using Amazon Simple Storage Service (Amazon S3), AWS Glue, Apache Hudi, and Amazon QuickSight. An AWS Glue ETL job, using the Apache Hudi connector, updates the S3 data lake hourly with incremental data.
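A minimal sketch of what such an hourly incremental write can look like, with hypothetical paths, keys, and Hudi options standing in for Ruparupa's actual configuration:

    # Sketch: upsert an incremental batch into a Hudi table on S3.
    # Paths, keys, and table names are hypothetical placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hudi-incremental").getOrCreate()
    incremental_df = spark.read.parquet("s3://staging-bucket/orders/latest/")

    hudi_options = {
        "hoodie.table.name": "orders",
        "hoodie.datasource.write.recordkey.field": "order_id",
        "hoodie.datasource.write.precombine.field": "updated_at",
        "hoodie.datasource.write.partitionpath.field": "order_date",
        "hoodie.datasource.write.operation": "upsert",
    }

    (incremental_df.write.format("hudi")
        .options(**hudi_options)
        .mode("append")
        .save("s3://datalake-bucket/orders/"))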
Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. From enhancing data lakes to empowering AI-driven analytics, AWS unveiled new tools and services that are set to shape the future of data and analytics.
There’s a recent trend toward people creating data lake or data warehouse patterns and calling it data enablement or a data hub. DataOps expands upon this approach by focusing on the processes and workflows that create data enablement and business analytics. DataOps Process Hub.
The two things we are most excited about are: First, DataOps is distinct from all data analytics tools. As founders, we sat in a room eight years ago (when all the rage was Hadoop, data prep, and data lakes) and debated — will there ever be an ‘ops’ layer that sits next to all the current data tools?
Jon Pruitt, director of IT at Hartsfield-Jackson Atlanta International Airport, and his team crafted a visual business intelligence dashboard for a top executive in its Emergency Response Team to provide key metrics at a glance, including weather status, terminal occupancy, concessions operations, and parking capacity.
Data governance is the process of ensuring the integrity, availability, usability, and security of an organization’s data. Due to the volume, velocity, and variety of data being ingested in data lakes, it can get challenging to develop and maintain policies and procedures to ensure data governance at scale for your data lake.
Tens of thousands of customers use Amazon Redshift every day to run analytics, processing exabytes of data for business insights. Amazon Redshift is built for scale and delivers up to 7.9 times better price performance than other cloud data warehouses.
Configure OpenSearch Service alerts to send notifications to PagerDuty
We can monitor OpenSearch cluster health in two different ways: using the OpenSearch Dashboards alerting plugin by setting up a per-cluster metrics monitor. This provides a query to retrieve metrics related to the cluster health. Choose Preview query.
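As a rough sketch of the underlying signal, the cluster health that such a monitor evaluates can be polled directly; the endpoint and credentials below are placeholders, not values from the excerpt.

    # Sketch: poll OpenSearch cluster health, the same signal the alerting
    # plugin's per-cluster metrics monitor evaluates. Endpoint and credentials
    # are hypothetical placeholders.
    import requests

    resp = requests.get(
        "https://my-domain.es.amazonaws.com/_cluster/health",  # placeholder
        auth=("admin", "..."),  # replace with your auth (e.g. SigV4)
        timeout=10,
    )
    health = resp.json()
    print(health["status"], health["number_of_nodes"],
          health["unassigned_shards"])

    # A monitor would notify when a condition like this holds:
    if health["status"] == "red":
        print("cluster is red -- this is where PagerDuty would be paged")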
For its order-entry automation module, Northstar leans on AI and RPA to optimize data recognition and verification, and to reduce errors and accelerate order cycle times. The team also built a centralized data lake on AWS, Databricks, and Power BI. Catalyzing change.
In general, it’s been straightforward to quantify the business impact of automation initiatives, given they typically have clear before-and-after business metrics. “This engine will be deeply integrated into our data lake to enable truly individualized student support at the right time, through the best channel,” he adds.
The ability to discover and access data via Denodo Platform is enabled by Denodo Data Catalog , which provides a search-based interface for finding data sources based on metadata or content, as well as metrics related to data popularity and usage.
These processes retrieve data from around 90 different data sources, resulting in updating roughly 2,000 tables in the data warehouse and 3,000 external tables in Parquet format, accessed through Amazon Redshift Spectrum and a data lake on Amazon Simple Storage Service (Amazon S3). We started with 115 dc2.large
The Perilous State of Today’s Data Environments
Data teams often navigate a labyrinth of chaos within their databases. Extrinsic Control Deficit: Many of these changes stem from tools and processes beyond the immediate control of the data team. Identifying Anomalies: Use advanced algorithms to detect anomalies in data patterns.
If you haven’t heard about metrics stores yet, they’re “newish,” so you likely will. They are interesting to an extent, but mostly, they feel like a late-night re-run and remind me that data work is hard. So, what is a metrics store? Most of the young vendors trying to create this category will tell you that […]