Data Lake, IoT and Metadata - Data Leaders Brief

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

EUROGATE's terminal operations rely heavily on seamless data flows and the management of vast volumes of data. Recently, EUROGATE has developed a digital twin for its Container Terminal Hamburg (CTH), generating millions of data points every second from Internet of Things (IoT) devices attached to its container handling equipment (CHE).

IoT

IoT Machine Learning Metadata Data-driven

Migrate Delta tables from Azure Data Lake Storage to Amazon S3 using AWS Glue

AWS Big Data

SEPTEMBER 10, 2024

We often see requests from customers who started their data journey by building data lakes on Microsoft Azure and now want to extend access to that data to AWS services. In such scenarios, data engineers face challenges in connecting to and extracting data from storage containers on Microsoft Azure.

Data Lake

Data Lake Metadata Management Software

Webinars

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Trending Sources

Data Lakes: What Are They and Who Needs Them?

Jet Global

JULY 2, 2019

To address the flood of data and the needs of enterprise businesses to store, sort, and analyze that data, a new storage solution has evolved: the data lake. Data warehouses do a great job of standardizing data from disparate sources for analysis.

Data Lake

Data Lake Data Warehouse Big Data Machine Learning

Apache Iceberg optimization: Solving the small files problem in Amazon EMR

AWS Big Data

OCTOBER 3, 2023

In our previous post, Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes, we discussed how you can implement solutions to improve operational efficiencies of your Amazon Simple Storage Service (Amazon S3) data lake that uses the Apache Iceberg open table format and runs on the Amazon EMR big data platform.

Optimization

Optimization Snapshot Data Lake Metadata
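
As a rough illustration of the small files problem named in the entry above, the sketch below groups undersized files into batches that a compaction job could rewrite as one larger file each. It is a generic first-fit-decreasing bin-packing plan, not the Apache Iceberg or Amazon EMR implementation; the function name `plan_compaction` and the 128 MiB target are assumptions chosen for the example.

```python
from typing import List

# Assumed target size for a compacted file; not taken from the post.
TARGET_BYTES = 128 * 1024 * 1024


def plan_compaction(file_sizes: List[int], target: int = TARGET_BYTES) -> List[List[int]]:
    """Group small file sizes into bins of at most `target` bytes.

    Uses first-fit decreasing: largest files are placed first, each into
    the first bin that still has room for it.
    """
    bins: List[List[int]] = []
    totals: List[int] = []
    for size in sorted(file_sizes, reverse=True):
        if size >= target:
            # Already at or above the target size; leave it untouched.
            continue
        for i, total in enumerate(totals):
            if total + size <= target:
                bins[i].append(size)
                totals[i] += size
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([size])
            totals.append(size)
    # Rewriting a bin that holds a single file would gain nothing.
    return [b for b in bins if len(b) > 1]


if __name__ == "__main__":
    print(plan_compaction([60, 50, 40, 30, 200], target=100))
```

Each returned group stands for one rewrite task; a real table format would also have to account for partitions, delete files, and snapshot commits, which this sketch ignores.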

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

AWS Big Data

DECEMBER 16, 2024

This approach simplifies your data journey and helps you meet your security requirements. The SageMaker Lakehouse data connection testing capability boosts your confidence in established connections. About the Authors Chiho Sugimoto is a Cloud Support Engineer on the AWS Big Data Support team.

Visualization

Visualization Data Processing Testing Publishing

How gaming companies can use Amazon Redshift Serverless to build scalable analytical applications faster and easier

AWS Big Data

MARCH 7, 2023

A data hub contains data at multiple levels of granularity and is often not integrated. It differs from a data lake by offering data that is pre-validated and standardized, allowing for simpler consumption by users. Data hubs and data lakes can coexist in an organization, complementing each other.

Analytics

Analytics Data Warehouse Data Lake Metadata

How Cargotec uses metadata replication to enable cross-account data sharing

AWS Big Data

JUNE 7, 2023

Cargotec captures terabytes of IoT telemetry data from their machinery operated by numerous customers across the globe. This data needs to be ingested into a data lake, transformed, and made available for analytics, machine learning (ML), and visualization.

Metadata

Metadata Data Lake Machine Learning Big Data

Architectural patterns for real-time analytics using Amazon Kinesis Data Streams, part 1

AWS Big Data

JANUARY 8, 2024

In the subsequent post in our series, we will explore the architectural patterns in building streaming pipelines for real-time BI dashboards, contact center agent, ledger data, personalized real-time recommendation, log analytics, IoT data, Change Data Capture, and real-time marketing data.

Analytics

Analytics IoT Data-driven Snapshot

Operational Database Security – Part 2

Cloudera

SEPTEMBER 23, 2020

Access audits are mastered centrally in Apache Ranger, which provides a comprehensive, non-repudiable audit log for every access event to every resource, with rich access event metadata such as: IP. Both fine-grained access control of database objects and access to metadata are provided. Sensitive data identification.

Data Lake

Data Lake Metadata IoT Enterprise

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Cloudera

OCTOBER 7, 2021

Those decentralization efforts appeared under different monikers through time, e.g., data marts versus data warehousing implementations (a popular architectural debate in the era of structured data), then enterprise-wide data lakes versus smaller, typically BU-specific, “data ponds”.

Metadata

Metadata Cost-Benefit Enterprise Interactive

Modernizing Data Architectures

Data Virtualization

AUGUST 26, 2020

Recently, we have seen the rise of new technologies like big data, the Internet of things (IoT), and data lakes. But we have not seen many developments in the way that data gets delivered. Modernizing the data infrastructure is the.

Data Architecture

Data Architecture Internet of Things Data Lake IoT

A hybrid approach in healthcare data warehousing with Amazon Redshift

AWS Big Data

FEBRUARY 21, 2023

At the heart of all data warehousing is integration, and this layer contains integrated data from multiple sources built around the enterprise-wide business keys. Although data lakes resemble data vaults, a data vault provides more features of a data warehouse. What is a hybrid model?

Data Warehouse

Data Warehouse Data Lake Cost-Benefit Metadata

Data platform trinity: Competitive or complementary?

IBM Big Data Hub

JANUARY 18, 2023

In another decade, the internet and mobile started to generate data of unforeseen volume, variety and velocity. It required a different data platform solution. Hence, the data lake emerged, which handles unstructured and structured data at huge volume. Data fabric promotes data discoverability.

Data Lake

Data Lake Data Warehouse Data-driven Metadata

Announcing the 2021 Data Impact Awards

Cloudera

MAY 12, 2021

This category is open to organizations that have tackled transformative business use cases by connecting multiple parts of the data lifecycle to enrich, report, serve, and predict. DATA FOR ENTERPRISE AI. Industry Transformation: Telkomsel, ingesting 25TB of data daily to provide advanced customer analytics in real time.

Digital Transformation

Digital Transformation Machine Learning Optimization Data Lake

Why You Need a Data Catalog & How to Choose One

Octopai

MAY 30, 2019

At the most basic level, data catalogs help you organize your company’s massive datasets. Most enterprises have huge data lakes with millions of touchpoints all living in the dark. It’s not enough to simply store customer data in siloed systems; companies need to be able to locate specific metadata points when needed.

Metadata

Metadata Data Governance Data Lake IoT

Why We Started the Data Intelligence Project

Alation

JULY 7, 2022

In 2013 I joined American Family Insurance as a metadata analyst. I had always been fascinated by how people find, organize, and access information, so a metadata management role after school was a natural choice. The use cases for metadata are boundless, offering opportunities for innovation in every sector.

Metadata

Metadata Data-driven Insurance Statistics

AWS Glue streaming application to process Amazon MSK data using AWS Glue Schema Registry

AWS Big Data

JUNE 12, 2023

Organizations across the world are increasingly relying on streaming data, and there is a growing need for real-time data analytics, considering the growing velocity and volume of data being collected. For more information about checkpointing, see the appendix at the end of this post.

Management

Management Metadata Internet of Things Testing

The Cloud Connection: How Governance Supports Security

Alation

APRIL 14, 2022

In today’s AI/ML-driven world of data analytics, explainability needs a repository just as much as those doing the explaining need access to metadata, e.g., information about the data being used. The Cloud Data Migration Challenge. Pushing data to a data lake and assuming it is ready for use is shortsighted.

Metadata

Metadata Data Governance Data-driven Modeling

Big Data Fabric Weaves Together Automation, Scalability, and Intelligence

Cloudera

JANUARY 22, 2019

Forrester describes Big Data Fabric as, “A unified, trusted, and comprehensive view of business data produced by orchestrating data sources automatically, intelligently, and securely, then preparing and processing them in big data platforms such as Hadoop and Apache Spark, data lakes, in-memory, and NoSQL.”

Big Data

Big Data Data Lake Internet of Things Enterprise

A Few 2016 Technology Predictions

In(tegrate) the Clouds

DECEMBER 21, 2015

Aside from the Internet of Things, which of the following software areas will experience the most change in 2016 – big data solutions, analytics, security, customer success/experience, sales & marketing approach or something else? 2016 will be the year of the data lake.

Technology

Technology Internet of Things Digital Transformation Big Data

How to Build a Customer Centric Business: The Complete Guide

Alation

AUGUST 2, 2022

Customer centricity requires modernized data and IT infrastructures. Too often, companies manage data in spreadsheets or individual databases. This means that you’re likely missing valuable insights that could be gleaned from data lakes and data analytics. Data discovery was conducted 67% faster.

Cost-Benefit

Cost-Benefit Metrics Strategy Data Lake

Accelerate queries on Apache Iceberg tables through AWS Glue auto compaction

AWS Big Data

DECEMBER 19, 2024

Data lakes were originally designed to store large volumes of raw, unstructured, or semi-structured data at a low cost, primarily serving big data and analytics use cases. By using features like Iceberg's compaction, open table formats (OTFs) streamline maintenance, making it straightforward to manage object and metadata versioning at scale.

Data Lake

Data Lake IoT Metadata Testing

Achieve the best price-performance in Amazon Redshift with elastic histograms for selectivity estimation

AWS Big Data

OCTOBER 25, 2024

Amazon Redshift is a fast, scalable, and fully managed cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. It also helps you securely access your data in operational databases, data lakes, or third-party datasets with minimal movement or copying of data.

Statistics

Statistics Data Warehouse Metadata Data Lake

Stream real-time data into Apache Iceberg tables in Amazon S3 using Amazon Data Firehose

AWS Big Data

NOVEMBER 6, 2024

Second, because traditional data warehousing approaches are unable to keep up with the volume, velocity, and variety of data, engineering teams are building data lakes and adopting open data formats such as Parquet and Apache Iceberg to store their data. The post's sample code decodes each incoming record with b64decode(record['data']).decode('utf-8').

Metadata

Metadata Data Lake Management Internet of Things
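
The b64decode(record['data']).decode('utf-8') fragment in the excerpt above matches the usual shape of an Amazon Data Firehose transformation Lambda, where each record arrives base64-encoded and must be returned with its recordId, a result status, and re-encoded data. The sketch below shows that round trip under stated assumptions: the payloads are assumed to be JSON, and the `processed` field is a hypothetical enrichment added only for illustration. It is not the code from the original post.

```python
import base64
import json


def lambda_handler(event, context):
    """Decode, enrich, and re-encode each record in a Firehose batch."""
    output = []
    for record in event["records"]:
        # Firehose delivers each payload base64-encoded.
        payload = base64.b64decode(record["data"]).decode("utf-8")
        try:
            doc = json.loads(payload)  # assumption: payloads are JSON
        except json.JSONDecodeError:
            # Hand unparseable records back unchanged, flagged as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue
        doc["processed"] = True  # hypothetical enrichment field
        encoded = base64.b64encode(json.dumps(doc).encode("utf-8")).decode("utf-8")
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": encoded,
        })
    return {"records": output}
```

Returning every recordId exactly once, with a status of Ok, Dropped, or ProcessingFailed, is what lets Firehose account for each record in the batch; any routing metadata needed for a specific Iceberg table is left out of this sketch.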
