As part of its storytelling ethos, the flight-status LLM will specify, for example, which precise weather event may be affecting a delayed flight and provide quick and useful information to customers about next actions.
Initially, data warehouses were the go-to solution for structured data and analytical workloads, but they were limited by proprietary storage formats and their inability to handle unstructured data. For each table that is to be converted, the process invokes the converter Lambda function through an event.
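The excerpt doesn't say how that event is wired up, but an asynchronous, event-style Lambda invocation might look roughly like this minimal sketch (the function name and payload shape are hypothetical, not details from the source):

```python
import json
import boto3

lambda_client = boto3.client("lambda")

def request_conversion(table_name: str) -> None:
    """Invoke a converter Lambda asynchronously with an event payload."""
    lambda_client.invoke(
        FunctionName="table-format-converter",  # hypothetical function name
        InvocationType="Event",  # asynchronous, fire-and-forget invocation
        Payload=json.dumps({"table": table_name}).encode("utf-8"),
    )

request_conversion("sales_orders")
```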
I previously wrote about the importance of open table formats to the evolution of data lakes into data lakehouses. The concept of the data lake was initially proposed as a single environment where data could be combined from multiple sources to be stored and processed to enable analysis by multiple users for multiple purposes.
Organizations are collecting and storing vast amounts of structured and unstructured data like reports, whitepapers, and research documents. By consolidating this information, analysts can discover and integrate data from across the organization, creating valuable data products based on a unified dataset.
A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.
In the current industry landscape, data lakes have become a cornerstone of modern data architecture, serving as repositories for vast amounts of structured and unstructured data. Maintaining data consistency and integrity across distributed data lakes is crucial for decision-making and analytics.
This premier event showcased groundbreaking advancements, keynotes from AWS leadership, hands-on technical sessions, and exciting product launches. Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights.
Different types of information are more suited to being stored in a structured or unstructured format. Read on to explore more about structured vs. unstructured data, why the difference between structured and unstructured data matters, and how cloud data warehouses deal with them both.
With the rapid growth of technology, more and more data is arriving in many different formats: structured, semi-structured, and unstructured. Near-real-time analytics on operational data is becoming a common need. We can then query the data with Amazon Athena and visualize it in Amazon QuickSight.
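For illustration, a minimal sketch of running such an Athena query from Python with boto3; the database, table, and results-bucket names are assumptions, not details from the source:

```python
import time
import boto3

athena = boto3.client("athena")

# Hypothetical database, table, and query-results bucket.
query = athena.start_query_execution(
    QueryString="SELECT order_id, status FROM orders LIMIT 10",
    QueryExecutionContext={"Database": "sales"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
query_id = query["QueryExecutionId"]

# Poll until the query finishes, then fetch the result rows.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    results = athena.get_query_results(QueryExecutionId=query_id)
    for row in results["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```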
Every enterprise is trying to collect and analyze data to get better insights into its business. Whether it is log files, sensor metrics, or other unstructured data, most enterprises manage and deliver data to the data lake and leverage various applications like ETL tools, search engines, and databases for analysis.
For instance, a Data Cloud-triggered flow could update an account manager in Slack when shipments in an external data lake are marked as delayed. Sharing Customer 360 insights back without data replication. Currently, Data Cloud leverages live SQL queries to access data from external data platforms via zero copy.
In the era of big data, data lakes have emerged as a cornerstone for storing vast amounts of raw data in its native format. They support structured, semi-structured, and unstructured data, offering a flexible and scalable environment for data ingestion from multiple sources.
At the same time, they need to optimize operational costs to unlock the value of this data for timely insights, and do so with consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.
For example, in a chatbot, data events could pertain to an inventory of flights and hotels or price changes that are constantly ingested to a streaming storage engine. Furthermore, data events are filtered, enriched, and transformed to a consumable format using a stream processor.
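As a minimal, generic sketch of that filter-enrich-transform pattern (plain Python, not any specific stream processor; the lookup table and event shape are invented for the example):

```python
from typing import Iterable, Iterator

# Hypothetical lookup table used for enrichment.
CATALOG = {"FL-101": "flight", "HT-202": "hotel"}

def process(events: Iterable[dict]) -> Iterator[dict]:
    """Filter, enrich, and transform raw price-change events."""
    for raw in events:
        # Filter: drop events for items we don't track.
        if raw["item_id"] not in CATALOG:
            continue
        # Enrich: attach the item category; transform: emit a flat record.
        yield {
            "item_id": raw["item_id"],
            "category": CATALOG[raw["item_id"]],
            "price": round(raw["price"], 2),
        }

for record in process([{"item_id": "FL-101", "price": 199.999}]):
    print(record)  # {'item_id': 'FL-101', 'category': 'flight', 'price': 200.0}
```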
Azure Data Explorer is used to store and query data in services such as Microsoft Purview, Microsoft Defender for Endpoint, Microsoft Sentinel, and Log Analytics in Azure Monitor. Azure Data Lake Analytics. Data warehouses are designed for questions you already know you want to ask about your data, again and again.
Data warehouse: inflexible schema; poor for unstructured or real-time data. Data lake: raw storage for all types of structured and unstructured data; low cost and flexibility, captures diverse data sources; easy to lose control, with a risk of becoming a data swamp; suited to exploratory analytics on raw and diverse data types.
Terminology Let’s first discuss some of the terminology used in this post: Research data lake on Amazon S3 – A data lake is a large, centralized repository that allows you to manage all your structured and unstructured data at any scale. This is where the tagging feature in Apache Iceberg comes in handy.
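A minimal sketch of Iceberg's tagging feature via Spark SQL, assuming a Spark session already configured with the Iceberg extensions; the catalog, table, and tag names here are illustrative assumptions:

```python
from pyspark.sql import SparkSession

# Assumes the Iceberg Spark runtime and a catalog named "lake" are configured.
spark = SparkSession.builder.appName("iceberg-tags").getOrCreate()

# Create a named tag pointing at the table's current snapshot.
spark.sql("ALTER TABLE lake.research.documents CREATE TAG `q1_baseline`")

# Later, query the table exactly as it looked when the tag was created.
df = spark.sql(
    "SELECT * FROM lake.research.documents VERSION AS OF 'q1_baseline'"
)
df.show()
```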
In this post, we show how Ruparupa implemented an incrementally updated data lake to get insights into their business using Amazon Simple Storage Service (Amazon S3), AWS Glue, Apache Hudi, and Amazon QuickSight. An AWS Glue ETL job, using the Apache Hudi connector, updates the S3 data lake hourly with incremental data.
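A rough sketch of the kind of incremental Hudi upsert such a Glue job might perform; the paths, record key, and table name are assumptions for illustration, not Ruparupa's actual configuration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-incremental").getOrCreate()

# Hypothetical incremental batch read from a staging location.
updates = spark.read.parquet("s3://staging-bucket/orders/latest/")

hudi_options = {
    "hoodie.table.name": "orders",
    "hoodie.datasource.write.recordkey.field": "order_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.operation": "upsert",
}

# Upsert the batch into the Hudi table on S3: existing keys are
# updated in place, new keys are inserted.
updates.write.format("hudi").options(**hudi_options).mode("append").save(
    "s3://datalake-bucket/orders/"
)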
It aims to provide a framework to create low-latency streaming applications on the AWS Cloud using Amazon Kinesis Data Streams and AWS purpose-built data analytics services. In this post, we will review the common architectural patterns of two use cases: Time Series Data Analysis and Event-Driven Microservices.
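By way of illustration, a minimal Kinesis Data Streams producer in Python with boto3; the stream name and event shape are assumptions made for the example:

```python
import json
import boto3

kinesis = boto3.client("kinesis")

def put_event(stream: str, event: dict) -> None:
    """Write one event to a Kinesis data stream for downstream consumers."""
    kinesis.put_record(
        StreamName=stream,
        Data=json.dumps(event).encode("utf-8"),
        # Records sharing a partition key land on the same shard,
        # preserving per-key ordering.
        PartitionKey=event["device_id"],
    )

put_event("sensor-events", {"device_id": "dev-42", "temp_c": 21.5})
```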
“The only thing we have on premises, I believe, is a data server with a bunch of unstructured data on it for our legal team,” says Grady Ligon, who was named Re/Max’s first CIO in October 2022. Finally, the IT team developed a digital market center that offers event management as well as training and education content.
Many organizations are building data lakes to store and analyze large volumes of structured, semi-structured, and unstructured data. In addition, many teams are moving towards a data mesh architecture, which requires them to expose their data sets as easily consumable data products.
Those decentralization efforts appeared under different monikers through time, e.g., data marts versus data warehousing implementations (a popular architectural debate in the era of structured data), then enterprise-wide data lakes versus smaller, typically BU-specific, “data ponds”.
Scope could be: data; information (processed data); records (files, or what you might call unstructured data); events or transactions; images; or the analytic itself. Analytical stewardship, downstream in the analytics pipeline, is a missing link in analytics, BI, and data science.
Data science is an area of expertise that combines many disciplines such as mathematics, computer science, software engineering and statistics. It focuses on data collection and management of large-scale structured and unstructured data for various academic and business applications.
July brings summer vacations, holiday gatherings, and for the first time in two years, the return of the Massachusetts Institute of Technology (MIT) Chief Data Officer symposium as an in-person event. A key area of focus for the symposium this year was the design and deployment of modern data platforms.
The alleviation of infrastructure and computational constraints associated with solely on-premises data platforms; Data Products can now use different deployment models (e.g., data warehousing). The proliferation of real-time processing by deploying event-driven architectures (e.g., Deep Java Learning, Apache Spark 3.x).
By adopting a custom-developed application based on the Cloudera ecosystem, Carrefour has combined the legacy systems into one platform that provides access to customer data in a single data lake. transactions per day and processing information at a rate of 1k events per second. Cloud Innovation.
To drive this point home, Yonatan Dolan, an Analytics Specialist from AWS, introduced AWS’ new Lake House architecture. This cutting-edge service integrates the capabilities of a data lake, a data warehouse, and purpose-built stores, to enable unified governance and easy data movement.
This approach also relates to monitoring internal fiduciary risk by tying separate events together, such as a large position (relative to historic norms) being taken immediately after the risk model that would have flagged it was modified in a separate system. Market data: Coordinated trading among multiple parties.
At the heart of all data warehousing is integration, and this layer contains integrated data from multiple sources built around the enterprise-wide business keys. Although data lakes resemble data vaults, a data vault provides more features of a data warehouse.
Data modernization is the process of transferring data to modern cloud-based databases from outdated or siloed legacy databases, including structured and unstructured data. In that sense, data modernization is synonymous with cloud migration. Efficient Data Processing. Enhanced Accessibility.
The reasons for this are simple: Before you can start analyzing data, huge datasets like data lakes must be modeled or transformed to be usable. According to a recent survey conducted by IDC, 43% of respondents were drawing intelligence from 10 to 30 data sources in 2020, with a jump to 64% in 2021!
Today, transactional data is the largest segment, which includes streaming and data flows. EXTRACTING VALUE FROM DATA. One of the biggest challenges presented by having massive volumes of disparate unstructured data is extracting usable information and insights.
The key components of a data pipeline are typically: Data Sources: the origin of the data, such as a relational database, data warehouse, data lake, file, API, or other data store. This can include tasks such as data ingestion, cleansing, filtering, aggregation, or standardization.
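As a toy illustration of those stages in sequence (a generic sketch; the record shape and rules are invented for the example):

```python
from statistics import mean

# Hypothetical raw records ingested from a source system.
raw = [
    {"region": "EU", "amount": "120.50"},
    {"region": "EU", "amount": None},  # dropped by the cleansing step
    {"region": "US", "amount": "99.00"},
]

# Cleansing/filtering: drop records with missing amounts.
clean = [r for r in raw if r["amount"] is not None]

# Standardization: coerce amount strings to floats.
for r in clean:
    r["amount"] = float(r["amount"])

# Aggregation: average amount per region.
regions = {r["region"] for r in clean}
summary = {
    reg: mean(r["amount"] for r in clean if r["region"] == reg)
    for reg in regions
}
print(summary)  # {'EU': 120.5, 'US': 99.0}
```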
Continued global digitalization is creating huge quantities of data for modern organizations. To have any hope of generating value from growing data sets, enterprise organizations must turn to the latest technology. Since then, technology has improved in leaps and bounds and data management has become more complicated.
Trino allows users to run ad hoc queries across massive datasets, making real-time decision-making a reality without needing extensive data transformations. This is particularly valuable for teams that require instant answers from their data. Data Lake Analytics: Trino doesn’t just stop at databases.
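A minimal sketch of an ad hoc Trino query from Python using the `trino` client library; the coordinator host, catalog, schema, and table are assumptions for illustration:

```python
import trino

# Hypothetical coordinator endpoint and catalog/schema names.
conn = trino.dbapi.connect(
    host="trino.example.com",
    port=8080,
    user="analyst",
    catalog="hive",
    schema="sales",
)

cur = conn.cursor()
# Ad hoc aggregation executed by the engine; no upfront ETL required.
cur.execute("SELECT region, count(*) FROM orders GROUP BY region")
for region, n in cur.fetchall():
    print(region, n)
```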
Over time, the worlds of data lakes and data warehouses collided. Databricks introduced the concept of a data lakehouse, adding Databricks SQL as well as open table formats. Databricks was also rated Exemplary in our Data Intelligence, Data Integration and Data Governance Buyers Guides.
AWS’s annual re:Invent developer conference concluded last week. In addition to technical advancements, the event highlighted strategic initiatives that resonate with CIOs, including cost optimization, workflow efficiency, and accelerated AI application development.
Real-time data integration at scale. Real-time data integration is crucial for businesses like e-commerce and finance, where speed is critical. In the years to come, advancements in event-driven architectures and technologies like change data capture (CDC) will enable seamless data synchronization across systems with minimal lag.
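A toy sketch of applying CDC-style change events to keep a replica in sync; the event format is a generic invention for the example, not any specific CDC tool's wire format:

```python
# Minimal CDC applier: replay insert/update/delete events onto a replica.
replica: dict[int, dict] = {}

def apply_change(event: dict) -> None:
    """Apply one change event, keyed by primary key, to the replica."""
    op, key = event["op"], event["pk"]
    if op in ("insert", "update"):
        replica[key] = event["row"]
    elif op == "delete":
        replica.pop(key, None)

events = [
    {"op": "insert", "pk": 1, "row": {"status": "pending"}},
    {"op": "update", "pk": 1, "row": {"status": "shipped"}},
    {"op": "delete", "pk": 1},
]
for e in events:
    apply_change(e)
print(replica)  # {} — inserted, updated, then deleted
```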