A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.
In the era of big data, data lakes have emerged as a cornerstone for storing vast amounts of raw data in its native format. They support structured, semi-structured, and unstructured data, offering a flexible and scalable environment for data ingestion from multiple sources.
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.
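To make the Iceberg piece concrete, here is a minimal PySpark sketch of creating and updating a transactional table on S3. It assumes the Iceberg Spark runtime is on the classpath; the catalog name, bucket, and table are hypothetical stand-ins, not Orca's actual setup.

```python
# Minimal sketch: a transactional Iceberg table on S3 with PySpark.
# Assumes the Iceberg Spark runtime JAR is on the classpath; the catalog
# name, bucket, and table below are hypothetical stand-ins.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-demo")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "s3://example-bucket/warehouse")
    .getOrCreate()
)

# DDL and DML are ACID: Iceberg layers snapshot isolation over plain S3 objects.
spark.sql("""
    CREATE TABLE IF NOT EXISTS lake.db.events (
        event_id BIGINT,
        payload  STRING,
        ts       TIMESTAMP
    ) USING iceberg
""")

# Row-level updates are transactional; no manual partition rewrites needed.
spark.sql("UPDATE lake.db.events SET payload = 'patched' WHERE event_id = 42")
```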
Data governance is a critical building block across all these approaches, and we see two emerging areas of focus. First, many LLM use cases rely on enterprise knowledge that needs to be drawn from unstructured data such as documents, transcripts, and images, in addition to structured data from data warehouses.
At the same time, they need to optimize operational costs to unlock the value of this data for timely insights, and do so with consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.
Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. From enhancing data lakes to empowering AI-driven analytics, AWS unveiled new tools and services that are set to shape the future of data and analytics.
Jon Pruitt, director of IT at Hartsfield-Jackson Atlanta International Airport, and his team crafted a visual business intelligence dashboard for a top executive on the airport’s Emergency Response Team to provide key metrics at a glance, including weather status, terminal occupancy, concessions operations, and parking capacity.
Every enterprise is trying to collect and analyze data to get better insights into its business. Whether consuming log files, sensor metrics, or other unstructured data, most enterprises manage and deliver data to the data lake and leverage various applications like ETL tools, search engines, and databases for analysis.
Stream processing, however, can enable the chatbot to access real-time data and adapt to changes in availability and price, providing the best guidance to the customer and enhancing the customer experience. When the model finds an anomaly or abnormal metric value, it should immediately produce an alert and notify the operator.
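As a rough illustration of that alerting pattern, the sketch below watches a metric stream and flags values that deviate sharply from the recent window; the stream source and notify() target are hypothetical stand-ins for a real pipeline.

```python
# Minimal sketch of the alerting pattern: watch a metric stream and notify
# an operator when a value deviates sharply from the recent window.
# The stream source and notify() target are hypothetical stand-ins.
from collections import deque
from statistics import mean, stdev

WINDOW = 60        # sliding window of recent observations
THRESHOLD = 3.0    # alert on values more than 3 standard deviations out

def notify(msg: str) -> None:
    print(f"ALERT: {msg}")  # in practice: a pager, Slack webhook, SNS topic, etc.

def watch(metric_stream) -> None:
    window = deque(maxlen=WINDOW)
    for value in metric_stream:
        if len(window) >= 10 and stdev(window) > 0:
            z = abs(value - mean(window)) / stdev(window)
            if z > THRESHOLD:
                notify(f"value {value:.2f} is {z:.1f} sigma from the recent mean")
        window.append(value)

# Steady oscillation around 10, then a spike that should trigger the alert.
watch([10 + (0.5 if i % 2 else -0.5) for i in range(60)] + [25.0])
```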
Data warehouse weaknesses: inflexible schema, poor for unstructured or real-time data. Data lake: raw storage for all types of structured and unstructured data. Strengths: low cost, flexibility, captures diverse data sources. Weaknesses: easy to lose control, risk of becoming a data swamp. Best for: exploratory analytics, raw and diverse data types.
In this post, we show how Ruparupa implemented an incrementally updated data lake to get insights into their business using Amazon Simple Storage Service (Amazon S3), AWS Glue, Apache Hudi, and Amazon QuickSight. An AWS Glue ETL job, using the Apache Hudi connector, updates the S3 data lake hourly with incremental data.
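The hourly upsert pattern can be sketched as follows, assuming a Glue PySpark job with the Apache Hudi connector available; the bucket, record key, and table name are illustrative, not Ruparupa's actual configuration.

```python
# Minimal sketch of the hourly incremental update, assuming a Glue PySpark
# job with the Apache Hudi connector; the bucket, keys, and table name are
# illustrative, not Ruparupa's actual configuration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-incremental").getOrCreate()

# Each run picks up only the latest batch of changed records.
incremental_df = spark.read.json("s3://example-bucket/raw/orders/latest/")

hudi_options = {
    "hoodie.table.name": "orders",
    "hoodie.datasource.write.recordkey.field": "order_id",     # dedup key
    "hoodie.datasource.write.precombine.field": "updated_at",  # newest wins
    "hoodie.datasource.write.operation": "upsert",
}

# Upsert merges inserts and updates into the existing S3 data lake table.
(incremental_df.write.format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save("s3://example-bucket/lake/orders/"))
```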
Data science is an area of expertise that combines many disciplines, such as mathematics, computer science, software engineering, and statistics. It focuses on data collection and management of large-scale structured and unstructured data for various academic and business applications.
Building an optimal data system: as data grows at an extraordinary rate, data proliferation across your data stores, data warehouse, and data lakes can become a challenge. This performance innovation allows Nasdaq to have a multi-use data lake between teams.
As data skyrockets, especially unstructured data, organizations need an intentional plan and ongoing approach to protect, manage, and optimize data for monetization. The ways modern data is used, processed, and analyzed are continuously evolving as machine learning technology becomes better at these tasks.
Stream ingestion – The stream ingestion layer is responsible for ingesting data into the stream storage layer. It provides the ability to collect data from tens of thousands of data sources and ingest it in real time. You can use Amazon EMR for streaming data processing with your favorite open-source big data frameworks.
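For illustration, a producer writing into the stream storage layer might look like the following boto3 sketch against Amazon Kinesis Data Streams; the stream name and record shape are hypothetical.

```python
# Minimal sketch of the stream ingestion layer: a producer pushing records
# into Amazon Kinesis Data Streams via boto3. The stream name and record
# shape are hypothetical.
import json
import boto3

kinesis = boto3.client("kinesis")

def ingest(reading: dict) -> None:
    # The partition key spreads records across shards; keying by device
    # keeps each device's readings ordered within a shard.
    kinesis.put_record(
        StreamName="sensor-metrics",
        Data=json.dumps(reading).encode("utf-8"),
        PartitionKey=reading["device_id"],
    )

ingest({"device_id": "sensor-42", "temp_c": 21.7, "ts": "2024-01-01T00:00:00Z"})
```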
Regardless of the division or use case it is related to, dimensional data models can be used to store data obtained from tracking various processes like patient encounters, provider practice metrics, aftercare surveys, and more. Although data lakes resemble data vaults, a data vault provides more features of a data warehouse.
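As a small illustration of a dimensional model for encounter tracking, here is a star-schema sketch using sqlite3 for portability; the table and column names are illustrative, not drawn from the article.

```python
# Minimal star-schema sketch for patient-encounter tracking, using sqlite3
# for portability; table and column names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_provider (
        provider_key  INTEGER PRIMARY KEY,
        provider_name TEXT,
        specialty     TEXT
    );
    CREATE TABLE fact_encounter (
        encounter_key  INTEGER PRIMARY KEY,
        provider_key   INTEGER REFERENCES dim_provider(provider_key),
        encounter_date TEXT,
        duration_min   INTEGER  -- additive measure, easy to roll up
    );
""")
conn.execute("INSERT INTO dim_provider VALUES (1, 'Dr. Example', 'Cardiology')")
conn.execute("INSERT INTO fact_encounter VALUES (1, 1, '2024-01-01', 30)")

# Provider practice metrics fall out of a simple join-and-group query.
for row in conn.execute("""
    SELECT d.provider_name, COUNT(*) AS encounters, AVG(f.duration_min)
    FROM fact_encounter f JOIN dim_provider d USING (provider_key)
    GROUP BY d.provider_name
"""):
    print(row)  # ('Dr. Example', 1, 30.0)
```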
The reasons for this are simple: before you can start analyzing data, huge datasets like data lakes must be modeled or transformed to be usable. According to a recent survey conducted by IDC, 43% of respondents were drawing intelligence from 10 to 30 data sources in 2020, with a jump to 64% in 2021!
To fully realize data’s value, organizations in the travel industry need to dismantle data silos so that they can securely and efficiently leverage analytics across their organizations. What is big data in the travel and tourism industry? Using Alation, ARC automated the data curation and cataloging process.
The key components of a data pipeline are typically: Data Sources: the origin of the data, such as a relational database, data warehouse, data lake, file, API, or other data store. This can include tasks such as data ingestion, cleansing, filtering, aggregation, or standardization.
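To ground those stages, here is a minimal, self-contained sketch of an ingest-cleanse-aggregate pipeline; the record shape and field names are hypothetical.

```python
# Minimal, self-contained sketch of the stages named above:
# ingest -> cleanse -> aggregate. Record shape and fields are hypothetical.
from collections import defaultdict

raw_records = [                                    # stand-in for a real source
    {"region": "us-east", "amount": "12.50"},
    {"region": "us-east", "amount": "bad-value"},  # dropped during cleansing
    {"region": "eu-west", "amount": "7.25"},
]

def cleanse(records):
    """Standardize types and drop rows that fail validation."""
    for r in records:
        try:
            yield {"region": r["region"], "amount": float(r["amount"])}
        except (KeyError, ValueError):
            continue

def aggregate(records):
    """Roll amounts up by region."""
    totals = defaultdict(float)
    for r in records:
        totals[r["region"]] += r["amount"]
    return dict(totals)

print(aggregate(cleanse(raw_records)))  # {'us-east': 12.5, 'eu-west': 7.25}
```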
Cloud-native data lakes and warehouses simplify analytics by integrating structured and unstructured data. Enhanced interoperability between tools enables seamless data sharing and collaborative decision-making across teams.
Leading-edge: Does it provide data quality or anomaly detection features to enrich metadata with quality metrics and insights, proactively identifying potential issues? It enhances Gen AI model accuracy with richer training datasets and provides more contextual insights for applications using unstructured data at inference time.
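One simple form such quality enrichment can take is profiling a column for completeness and range violations, as in the sketch below; the metric names and thresholds are illustrative.

```python
# Minimal sketch of enriching column metadata with quality metrics:
# completeness plus a simple range check. Metric names and thresholds
# are illustrative.
def profile_column(values, lo=None, hi=None):
    total = len(values)
    nulls = sum(1 for v in values if v is None)
    out_of_range = sum(
        1 for v in values
        if v is not None
        and ((lo is not None and v < lo) or (hi is not None and v > hi))
    )
    return {
        "completeness": (total - nulls) / total if total else None,
        "out_of_range": out_of_range,
    }

# Body temperatures in Celsius: one missing value, one implausible reading.
print(profile_column([36.6, 37.1, None, 45.0], lo=30, hi=43))
# {'completeness': 0.75, 'out_of_range': 1}
```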
Consider a simple use case example like email marketing, where an agent can devise a plan that executes tasks across enterprise systems to access structured and unstructured data, transactional systems, APIs, and document management systems. Agentic AI is here to stay and will gain tremendous momentum in 2024.