This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue.
A modern data strategy redefines and enables sharing of data across the enterprise, allowing both reading and writing of a single instance of the data using an open table format. When such a partition definition evolves, data written to the table before the change is unaffected, as is its metadata.
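The mechanics of non-destructive partition evolution can be pictured with a small self-contained sketch (plain Python with illustrative names, not a real table-format API): each data file records the ID of the partition spec it was written under, so changing the spec is a pure metadata operation that never rewrites existing files.

```python
# Minimal in-memory model of partition-spec evolution, loosely inspired by
# how open table formats version partition specs in table metadata.
# All class and field names here are hypothetical, for illustration only.

class Table:
    def __init__(self, spec):
        self.specs = [spec]          # history of partition specs
        self.current_spec_id = 0     # newest spec is the active one
        self.files = []              # (path, spec_id) for every data file

    def write(self, path):
        # New files are tagged with the spec active at write time.
        self.files.append((path, self.current_spec_id))

    def evolve_spec(self, new_spec):
        # Evolution only appends metadata: old files are untouched.
        self.specs.append(new_spec)
        self.current_spec_id = len(self.specs) - 1

t = Table(spec=["day"])
t.write("data/f1.parquet")
t.evolve_spec(["day", "region"])     # change the partitioning scheme
t.write("data/f2.parquet")

# The old file still belongs to the old spec; only new data uses the new one.
print(t.files)  # [('data/f1.parquet', 0), ('data/f2.parquet', 1)]
```

Query engines use the per-file spec ID to plan scans correctly across both generations of data, which is why the pre-existing data and its metadata remain valid after the change.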
Data silos across various business units and countries had led to inefficiencies in data governance and access control. To address this, they aimed to break down those silos and centralize the data into the BMW Cloud Data Hub (CDH).
Under the hood, UniForm generates the Iceberg metadata (including metadata, manifest list, and manifest files) that Iceberg clients require to access the underlying data files in Delta Lake tables. Both the Delta Lake and Iceberg metadata files reference the same data files.
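A rough way to picture this dual-metadata arrangement (a toy model, not Delta Lake's or Iceberg's actual file layouts): two independent metadata layers each list the same underlying Parquet files, so a client reading through either layer sees identical data with no copying or conversion.

```python
# Toy illustration of one set of data files shared by two metadata layers.
# The file names and dictionary structures are made up for illustration.

data_files = ["part-000.parquet", "part-001.parquet"]

# A Delta-style transaction log entry and an Iceberg-style manifest both
# reference the same physical files; neither duplicates the data itself.
delta_log = {"add": list(data_files)}
iceberg_manifest = {"entries": [{"file_path": f} for f in data_files]}

delta_view = set(delta_log["add"])
iceberg_view = {e["file_path"] for e in iceberg_manifest["entries"]}
print(delta_view == iceberg_view)  # True
```

The design choice this models is that interoperability lives entirely in the metadata layer: keeping one physical copy of the data avoids both storage duplication and the consistency problems of synchronized copies.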
Amazon DataZone now supports authentication through the Amazon Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets via popular business intelligence (BI) and analytics tools like Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more.
Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as central repositories to store structured and unstructured data at any scale and in various formats.
Open table formats are emerging in the rapidly evolving domain of big data management, fundamentally altering the landscape of data storage and analysis. By providing a standardized framework for data representation, open table formats break down data silos, enhance data quality, and accelerate analytics at scale.
A modern data architecture is an evolutionary architecture pattern designed to integrate a data lake, data warehouse, and purpose-built stores with a unified governance model. The company wanted the ability to continue processing operational data in the secondary Region in the rare event of primary Region failure.
This post explores how the shift to a data product mindset is being implemented, the challenges faced, and the early wins that are shaping the future of data management in the Institutional Division. The following diagram illustrates the building blocks of the Institutional Data & AI Platform.
We also examine how centralized, hybrid, and decentralized data architectures support scalable, trustworthy ecosystems. As data-centric AI, automated metadata management, and privacy-aware data sharing mature, the opportunity to embed data quality into the enterprise's core has never been more significant.
Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. From enhancing data lakes to empowering AI-driven analytics, AWS unveiled new tools and services that are set to shape the future of data and analytics.
But most important of all, the assumed dormant value in the unstructured data is a question mark, which can only be answered after these sophisticated techniques have been applied. Therefore, there is a need to be able to analyze and extract value from the data economically and flexibly. The solution integrates data in three tiers.
A Gartner Marketing survey found only 14% of organizations have successfully implemented a C360 solution, due to lack of consensus on what a 360-degree view means, challenges with data quality, and lack of cross-functional governance structure for customer data. Then, you transform this data into a concise format.
Data is your generative AI differentiator, and a successful generative AI implementation depends on a robust data strategy incorporating a comprehensive data governance approach. As part of the transformation, the objects need to be treated to ensure data privacy (for example, PII redaction).
For Shared database’s region, choose the Data Catalog view source Region. The Shared database and Shared database’s owner ID fields are populated manually from the database metadata. The resource link appears on the Databases page on the Lake Formation console, as shown in the following screenshot.
Data architect Armando Vázquez identifies eight common types of data architects: Enterprise data architect: These data architects oversee an organization’s overall data architecture, defining data architecture strategy and designing and implementing architectures.
Artificial intelligence (AI) is now at the forefront of how enterprises work with data to help reinvent operations, improve customer experiences, and maintain a competitive advantage. It’s no longer a nice-to-have, but an integral part of a successful data strategy. All of this supports the use of AI.
We have collected some of the key talks and solutions on data governance, data mesh, and modern data architecture published and presented at AWS re:Invent 2022, along with a few data lake solutions built by customers and AWS Partners, for easy reference.
Reading Time: 11 minutes The post Data Strategies for Getting Greater Business Value from Distributed Data appeared first on Data Management Blog - Data Integration and Modern Data Management Articles, Analysis and Information.
Consumers prioritized data discoverability, fast data access, low latency, and high accuracy of data. These inputs reinforced the need for a unified data strategy across the FinOps teams. We decided to build a scalable data management product that is based on the best practices of modern data architecture.
The business end-users were given a tool to discover data assets produced within the mesh and seamlessly self-serve on their data sharing needs. The integration of Databricks Delta tables into Amazon DataZone is done using the AWS Glue Data Catalog. The following figure illustrates the data mesh architecture.
With data volumes exhibiting a double-digit percentage growth rate year on year and the COVID pandemic disrupting global logistics in 2021, it became more critical to scale and generate near-real-time data. You can visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes.
Various databases, plus one or more data warehouses, have been the state-of-the-art data management infrastructure in companies for years. The emergence of various new concepts, technologies, and applications such as Hadoop, Tableau, R, Power BI, or data lakes indicates that changes are under way.
The first generation of data architectures represented by enterprise data warehouse and business intelligence platforms were characterized by thousands of ETL jobs, tables, and reports that only a small group of specialized data engineers understood, resulting in an under-realized positive impact on the business.
By creating visual representations of data flows, organizations can gain a clear understanding of the lifecycle of personal data and identify potential vulnerabilities or compliance gaps. Note that putting a comprehensive data strategy in place is not in scope for this post.
Implementing the right data strategy spurs innovation and outstanding business outcomes by recognizing data as a critical asset that provides insights for better and more informed decision-making. Integrating data across this hybrid ecosystem can be time consuming and expensive. The volume of data assets.
They are expected to understand the entire data landscape and generate business-moving insights while facing the voracious needs of different teams and the constraints of technology architecture and compliance. Evolution of data approaches: The data strategies we’ve had so far have led to a lot of challenges and pain points.
By leveraging data services and APIs, a data fabric can also pull together data from legacy systems, data lakes, data warehouses, and SQL databases, providing a holistic view into business performance. It uses knowledge graphs, semantics, and AI/ML technology to discover patterns in various types of metadata.
Today, the brightest minds in our industry are targeting the massive proliferation of data volumes and the accompanying but hard-to-find value locked within all that data. We chatted about industry trends, why decentralization has become a hot topic in the data world, and how metadata drives many data-centric use cases.
How do you provide access and connect the right people to the right data? AWS has created a way to manage policies and access, but this is only for data lake formation. What about other data sources? In summary, AWS powers next-generation analytics with the best of both data lakes and purpose-built data stores.
Data governance shows up as the fourth-most-popular kind of solution that enterprise teams were adopting or evaluating during 2019. That’s a lot of priorities, especially when you group together closely related items such as data lineage and metadata management, which rank nearby.
Data-in-motion is predominantly about streaming data, so enterprises typically have two different, binary ways of looking at data. The governance aspect is perhaps even more important, and businesses need to be able to understand where the data comes from.
This involves unifying and sharing a single copy of data and metadata across IBM® watsonx.data™, IBM® Db2®, IBM® Db2® Warehouse, and IBM® Netezza®, using native integrations and supporting open formats, all without the need for migration or recataloging.
The next stops on the MLDC World Tour include Data Transparency in Washington, Gartner Symposium/ITxpo in Orlando, Teradata Analytics Universe in Las Vegas, Tableau in New Orleans, Big Data LDN in London, TDWI in Orlando, and Forrester Data Strategy & Insights, also in Orlando. Data Catalogs Are the New Black.
When it embarked on a digital transformation and modernization initiative in 2018, the company migrated all its data to an Amazon S3 data lake and the Snowflake Data Cloud to provide data accessibility to all users. Using Alation, ARC automated the data curation and cataloging process.
As data initiatives mature, the Alation data catalog is becoming central to an expanding set of use cases. Governing Data Lakes to Find Opportunities for Customers. At Munich Re, our data strategy is geared to offer new and better risk-related services to our customers.
Rich metadata and semantic modeling continue to drive the matching of 50K training materials to specific curricula, leading new, data-driven, audience-based marketing efforts that demonstrate how the recommender service is achieving increased engagement and performance from over 2.3 million users.
451 Research begins its paper on the “unstoppable rise of the Data Catalog” with the following: Could the data catalog be the most important data management breakthrough to have emerged in the last decade?[1] The data catalog is indeed the […].
Reading Time: 5 minutes The data landscape has evolved and become more complex as organizations recognize the need to leverage data and analytics. Generative artificial intelligence has further put pressure on organizations to manage this complexity. At TDWI, we see companies collecting traditional structured.
Reading Time: 5 minutes Today, many applications call themselves “data catalogs.” The idea seems, on the face of it, easy to understand: a data catalog is simply a centralized inventory of the data assets within an organization. Data catalogs also seek to be the.
Data Swamp vs. Data Lake. When you imagine a lake, it’s likely an idyllic image of a tree-ringed body of reflective water amid singing birds and dabbling ducks. I’ll take the lake, thank you very much. Many organizations have built a data lake to solve their data storage, access, and utilization challenges.
The revised ZTMM is organized into five categories, or pillars: identity, devices, networks, applications and workloads, and data; and four levels of maturity: traditional, initial, advanced, and optimal. It operates independently from compute and storage layers, offering integrated security and governance based on metadata.
In the upcoming years, augmented data management solutions will drive efficiency and accuracy across multiple domains, from data cataloguing to anomaly detection. AI-driven platforms process vast datasets to identify patterns, automating tasks like metadata tagging, schema creation and data lineage mapping.