This is part two of a three-part series showing how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue.
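To make the Glue-to-Iceberg step concrete, here is a minimal PySpark sketch of the kind of job the post describes, assuming a Spark 3.x runtime (such as AWS Glue 4.0) with the Iceberg runtime and a SQL Server JDBC driver on the classpath; the bucket, connection string, database, and table names are placeholders, not the post's actual configuration.

```python
# Hedged sketch: copy a table from SQL Server into an Iceberg table registered in the
# AWS Glue Data Catalog. All names and connection details below are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.warehouse", "s3://example-bucket/warehouse/")  # placeholder
    .getOrCreate()
)

# Read the legacy table over JDBC (credentials would normally come from Secrets Manager).
source_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://legacy-host:1433;databaseName=sales")  # placeholder
    .option("dbtable", "dbo.orders")
    .option("user", "etl_user")
    .option("password", "***")
    .load()
)

# Write into a transactional Iceberg table; later runs could append or MERGE instead.
source_df.writeTo("glue_catalog.sales_lake.orders").using("iceberg").createOrReplace()
```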
Under the hood, UniForm generates Iceberg metadata files (including metadata and manifest files) that are required for Iceberg clients to access the underlying data files in Delta Lake tables. Both Delta Lake and Iceberg metadata files reference the same data files.
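As a rough illustration, UniForm is enabled through Delta table properties; the snippet below is a sketch assuming a Spark session already configured for Delta Lake, with a placeholder table and columns (the exact property names can vary by Delta Lake version).

```python
# Hedged sketch: create a Delta table with UniForm enabled so Iceberg metadata is
# generated next to the Delta log. Table name and columns are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # assumes Delta Lake extensions are configured

spark.sql("""
    CREATE TABLE sales.orders_uniform (order_id BIGINT, amount DOUBLE)
    USING DELTA
    TBLPROPERTIES (
      'delta.enableIcebergCompatV2' = 'true',
      'delta.universalFormat.enabledFormats' = 'iceberg'
    )
""")
# Iceberg clients can then read the same underlying Parquet data files via the
# generated Iceberg metadata.
```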
In this post, we focus on data management implementation options such as accessing data directly in Amazon Simple Storage Service (Amazon S3), using popular data formats like Parquet, or using open table formats like Iceberg. Data management is the foundation of quantitative research.
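For the direct-access option with Parquet, something as small as the following sketch is enough, assuming pandas with pyarrow and s3fs installed; the bucket and partition path are placeholders.

```python
# Hedged sketch: read a Parquet dataset directly from Amazon S3 with pandas.
# Credentials come from the standard AWS credential chain (env vars, profile, or role).
import pandas as pd

prices = pd.read_parquet("s3://example-research-bucket/prices/trade_date=2024-01-02/")  # placeholder path
print(prices.head())
```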
A modern data architecture is an evolutionary architecture pattern designed to integrate a data lake, data warehouse, and purpose-built stores with a unified governance model. The company wanted the ability to continue processing operational data in the secondary Region in the rare event of a primary Region failure.
This approach simplifies your data journey and helps you meet your security requirements. The SageMaker Lakehouse data connection testing capability boosts your confidence in established connections. Next, you will query the data in this table using SageMaker Unified Studio's SQL query book feature. Choose Save changes.
Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. From enhancing data lakes to empowering AI-driven analytics, AWS unveiled new tools and services that are set to shape the future of data and analytics.
But most important of all, the assumed dormant value in the unstructured data is a question mark, which can only be answered after these sophisticated techniques have been applied. Therefore, there is a need to be able to analyze and extract value from the data economically and flexibly. The solution integrates data in three tiers.
To bring their customers the best deals and user experience, smava follows the modern data architecture principles with a data lake as a scalable, durable data store and purpose-built data stores for analytical processing and data consumption.
Data lakes are designed for storing vast amounts of raw, unstructured, or semi-structured data at a low cost, and organizations share those datasets across multiple departments and teams. Queries on these large datasets read vast amounts of data and can perform complex join operations across multiple datasets.
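A representative analytical query over such shared datasets might look like the following PySpark sketch, with placeholder paths, columns, and filter values; filtering before the join limits how much data is scanned.

```python
# Hedged sketch: join two large data lake datasets and aggregate the result.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

orders = spark.read.parquet("s3://example-lake/orders/")        # placeholder paths
customers = spark.read.parquet("s3://example-lake/customers/")

revenue_by_segment = (
    orders.filter(F.col("order_date") >= "2024-01-01")  # prune data before the join
          .join(customers, "customer_id")
          .groupBy("segment")
          .agg(F.sum("amount").alias("total_amount"))
)
revenue_by_segment.show()
```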
This dynamic integration of streaming data enables generative AI applications to respond promptly to changing conditions, improving their adaptability and overall performance in various tasks. To better understand this, imagine a chatbot that helps travelers book their travel.
You have a specific book in mind, but you have no idea where to find it. You enter the title of the book into the computer and the library's digital inventory system tells you the exact section and aisle where the book is located. A data catalog plays the same role for your organization: it uses metadata and data management tools to organize all data assets.
Stream Processing – An application created with Amazon Managed Service for Apache Flink can read the records from the data stream to detect and clean any errors in the time series data and enrich the data with specific metadata to optimize operational analytics.
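As a rough sketch of that pattern, a PyFlink Table API job could read from a Kinesis data stream, drop malformed readings, and attach a metadata attribute before the results flow to analytics; the stream name, schema, valid-value range, and connector options below are placeholders and depend on the connector version in use.

```python
# Hedged sketch of a PyFlink job of the kind that could run on Amazon Managed Service
# for Apache Flink: clean a time series stream and enrich it with static metadata.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Placeholder Kinesis source; connector option names vary by connector version.
t_env.execute_sql("""
    CREATE TABLE sensor_readings (
        sensor_id STRING,
        reading DOUBLE,
        event_time TIMESTAMP(3)
    ) WITH (
        'connector' = 'kinesis',
        'stream' = 'example-timeseries-stream',
        'aws.region' = 'us-east-1',
        'format' = 'json'
    )
""")

# Keep only plausible readings and tag each record with a (hypothetical) site code.
cleaned = t_env.sql_query("""
    SELECT sensor_id, reading, event_time, 'plant-a' AS site
    FROM sensor_readings
    WHERE reading IS NOT NULL AND reading BETWEEN -50 AND 150
""")
cleaned.execute().print()  # in a real job this would be written to a sink table instead
```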
Curious to know, like, what keeps you busy apart from data lakes and technologies, what we just discussed? Prinkan: So I spend quite a lot of time reading books of different kinds, as it gives me, you know, different perspectives. That's all, I think, beyond the professional hustle we have been discussing.
Why is data analytics important for travel organizations? When it embarked on a digital transformation and modernization initiative in 2018, the company migrated all its data to an Amazon S3 data lake and the Snowflake Data Cloud to make data accessible to all users.
Then down the long hallway that led into the Google X auditorium, there were stacks of amazing books written by notable scientists we were going to spend a weekend with in close quarters: talking, eating, debating, ideating, drinking, laughing, exchanging, planning how we collaborate going forward, etc. “Nothing Spreads Like Fear”.
In this blog, I’ll address some of the questions we did not have time to answer live, pulling from both Dr. Reichental’s book as well as my own experience as a data governance leader for 30+ years. Can you have proper data management without establishing a formal data governance program? Where do you govern?
What are the best practices for analyzing cloud ERP data? Data management: How do we create a data warehouse or data lake in the cloud using our cloud ERP? How do I access the legacy data from my previous ERP? Self-service BI: How can we rapidly build BI reports on cloud ERP data without any help from IT?
Source-to-target mapping integration tasks vary in complexity, depending on data hierarchy and structure. Business applications use metadata and semantic rules to ensure seamless data transfer without loss. Next, identify the data sources that will be involved in the mapping.
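As a toy illustration of source-to-target mapping, the sketch below renames hypothetical source fields to target fields and applies one type conversion; real integration tools drive this from metadata and semantic rules rather than hard-coded dictionaries.

```python
# Hedged sketch: an illustrative field-level source-to-target mapping in plain Python.
# All field names and rules are hypothetical.
FIELD_MAP = {
    "CUST_NO": "customer_id",    # simple rename
    "CUST_NM": "customer_name",
    "ORD_AMT": "order_amount",
}

def map_record(source_row: dict) -> dict:
    """Rename mapped source fields and enforce the target data type for one field."""
    target = {FIELD_MAP[k]: v for k, v in source_row.items() if k in FIELD_MAP}
    target["order_amount"] = float(target["order_amount"])  # type conversion, no data loss
    return target

print(map_record({"CUST_NO": "1001", "CUST_NM": "Acme", "ORD_AMT": "42.50"}))
```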
Data lakes were originally designed to store large volumes of raw, unstructured, or semi-structured data at a low cost, primarily serving big data and analytics use cases. By using features like Iceberg's compaction, OTFs streamline maintenance, making it straightforward to manage object and metadata versioning at scale.
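For example, Iceberg's compaction and snapshot expiration can be invoked from Spark with its built-in procedures; the sketch below assumes an Iceberg-enabled session (with the Iceberg SQL extensions configured) and a catalog named glue_catalog, with placeholder database and table names.

```python
# Hedged sketch: compact small data files and expire old snapshots on an Iceberg table.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # assumes Iceberg catalog + SQL extensions configured

spark.sql("CALL glue_catalog.system.rewrite_data_files(table => 'analytics.events')")
spark.sql("CALL glue_catalog.system.expire_snapshots(table => 'analytics.events', retain_last => 5)")
```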
Use existing AWS Glue tables: this section has the following prerequisite: a data lake administrator user, set up by following Create a data lake administrator. For detailed instructions, see Revoking permission using the Lake Formation console. Choose AWS Glue (Lakehouse) for Data source type.