Data Governance, Data Lake and Publishing

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

AWS Big Data

OCTOBER 30, 2024

Amazon DataZone has launched authentication support through the Amazon Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets via popular business intelligence (BI) and analytics tools like Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more.

Visualization

Visualization Data Lake Testing Data Governance

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

Data landscape in EUROGATE and current challenges faced in data governance: The EUROGATE Group is a conglomerate of container terminals and service providers, providing container handling, intermodal transports, maintenance and repair, and seaworthy packaging services. Eliminate centralized bottlenecks and complex data pipelines.

IoT

IoT Machine Learning Metadata Data-driven

How Volkswagen streamlined access to data across multiple data lakes using Amazon DataZone – Part 1

AWS Big Data

JULY 18, 2024

Over the years, organizations have invested in creating purpose-built, cloud-based data lakes that are siloed from one another. A major challenge is enabling cross-organization discovery and access to data across these multiple data lakes, each built on different technology stacks.

Data Lake

Data Lake Publishing Metadata Data-driven

Seamless integration of data lake and data warehouse using Amazon Redshift Spectrum and Amazon DataZone

AWS Big Data

AUGUST 15, 2024

Unlocking the true value of data often gets impeded by siloed information. Traditional data management—wherein each business unit ingests raw data in separate data lakes or warehouses—hinders visibility and cross-functional analysis. Amazon DataZone natively supports data sharing for Amazon Redshift data assets.

Data Lake

Data Lake Data Warehouse Data Governance Publishing

HEMA accelerates their data governance journey with Amazon DataZone

AWS Big Data

DECEMBER 19, 2024

Initially, the data inventories of different services were siloed within isolated environments, making data discovery and sharing across services manual and time-consuming for all teams involved. Implementing robust data governance is challenging. The following figure illustrates the data mesh architecture.

Data Governance

Data Governance Publishing Data-driven Metadata

2021 Gift Giving Guide for Data Nerds

DataKitchen

DECEMBER 7, 2021

This book is not available until January 2022, but considering all the hype around the data mesh, we expect it to be a best seller. In the book, author Zhamak Dehghani reveals that, despite the time, money, and effort poured into them, data warehouses and data lakes fail when applied at the scale and speed of today’s organizations.

Data-driven

Data-driven Data Governance Big Data Data Science

How BMW streamlined data access using AWS Lake Formation fine-grained access control

AWS Big Data

OCTOBER 29, 2024

However, the initial version of CDH supported only coarse-grained access control to entire data assets, and hence it was not possible to scope access to data asset subsets. This led to inefficiencies in data governance and access control. After filter packages have been created and published, they can be requested.

Data Lake

Data Lake Sales Metadata Machine Learning

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

AWS Big Data

JUNE 10, 2024

In this post, we delve into the key aspects of using Amazon EMR for modern data management, covering topics such as data governance, data mesh deployment, and streamlined data discovery. Organizations have multiple Hive data warehouses across EMR clusters, where the metadata gets generated.

Data Lake

Data Lake Metadata Data Warehouse Data Processing

Automated data governance with AWS Glue Data Quality, sensitive data detection, and AWS Lake Formation

AWS Big Data

OCTOBER 10, 2023

Data governance is the process of ensuring the integrity, availability, usability, and security of an organization’s data. Due to the volume, velocity, and variety of data being ingested in data lakes, it can get challenging to develop and maintain policies and procedures to ensure data governance at scale for your data lake.

Data Quality

Data Quality Data Governance Data Lake Testing

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

AWS Big Data

DECEMBER 4, 2024

Under the federated mesh architecture, each divisional mesh functions as a node within the broader enterprise data mesh, maintaining a degree of autonomy in managing its data products. These nodes can implement analytical platforms like data lake houses, data warehouses, or data marts, all united by producing data products.

Metadata

Metadata Data Governance Data Quality Data-driven

Demystify data sharing and collaboration patterns on AWS: Choosing the right tool for the job

AWS Big Data

OCTOBER 21, 2024

However, enterprises often encounter challenges with data silos, insufficient access controls, poor governance, and quality issues. Embracing data as a product is the key to address these challenges and foster a data-driven culture.

Sales

Sales Data-driven Data Processing Key Performance Indicator

AWS Lake Formation 2023 year in review

AWS Big Data

JANUARY 18, 2024

AWS Lake Formation and the AWS Glue Data Catalog form an integral part of a data governance solution for data lakes built on Amazon Simple Storage Service (Amazon S3) with multiple AWS analytics services integrating with them. We realized that your use cases need more flexibility in data governance.

Data Lake

Data Lake Metadata Data Governance Statistics

Cross-account data collaboration with Amazon DataZone and AWS analytical tools

AWS Big Data

MARCH 5, 2025

Solution overview: This solution provides a streamlined way to enable cross-account data collaboration using Amazon DataZone domain association while maintaining security and governance. Data publishers: Users in producer AWS accounts. Data subscribers: Users in consumer AWS accounts.

Analytics

Analytics Publishing Metadata Sales

AWS Lake Formation 2022 year in review

AWS Big Data

JANUARY 31, 2023

Data governance is the collection of policies, processes, and systems that organizations use to ensure the quality and appropriate handling of their data throughout its lifecycle for the purpose of generating business value.

Data Lake

Data Lake Data Governance Data Architecture Machine Learning

Amazon DataZone announces custom blueprints for AWS services

AWS Big Data

JUNE 26, 2024

New feature: Custom AWS service blueprints. Previously, Amazon DataZone provided default blueprints that created AWS resources required for data lake, data warehouse, and machine learning use cases. You can build projects and subscribe to both unstructured and structured data assets within the Amazon DataZone portal.

Data Lake

Data Lake Data Warehouse Unstructured Data Data Governance

How Novanta’s CIO mobilized its data-driven transformation

CIO Business Intelligence

MAY 10, 2023

We could do all that mapping and validation with you, but if the underlying data isn’t accurate, it has nothing to do with the mechanism which provides that. On data governance: We have 17 different ERP systems, and Novanta is a very acquisitive company, so it’s an ongoing challenge. It’s the clean-up effort.

Data-driven

Data-driven IT Digital Transformation Data Governance

Top analytics announcements of AWS re:Invent 2024

AWS Big Data

FEBRUARY 26, 2025

Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. From enhancing data lakes to empowering AI-driven analytics, AWS unveiled new tools and services that are set to shape the future of data and analytics.

Analytics

Analytics Data Lake Metadata Data Warehouse

Unlock data across organizational boundaries using Amazon DataZone – now generally available

AWS Big Data

OCTOBER 4, 2023

Collaboration – Analysts, data scientists, and data engineers often own different steps within the end-to-end analytics journey but do not have a simple way to collaborate on the same governed data, using the tools of their choice. This is more than mere data; it’s our dynamic journey.”

Metadata

Metadata Data Lake Publishing Data Governance

5 Ways Data Engineers Can Support Data Governance

Alation

JANUARY 26, 2023

These data requirements could be satisfied with a strong data governance strategy. Governance can — and should — be the responsibility of every data user, though how that’s achieved will depend on the role within the organization. How can data engineers address these challenges directly?

Data Governance

Data Governance Strategy Data Quality Data Collection

How Fujitsu implemented a global data mesh architecture and democratized data

AWS Big Data

MAY 1, 2024

For those reasons, it was extremely difficult for Fujitsu to manage and utilize data at scale with Excel. Solution overview. OneData defines three personas: Publisher – This role includes the organizational and management team of systems that serve as data sources. It is crucial in data governance and data management.

Dashboards

Dashboards Publishing Data-driven Cost-Benefit

Governing data in relational databases using Amazon DataZone

AWS Big Data

MAY 7, 2024

Data governance is a key enabler for teams adopting a data-driven culture and operational model to drive innovation with data. Amazon DataZone allows you to simply and securely govern end-to-end data assets stored in your Amazon Redshift data warehouses or data lakes cataloged with the AWS Glue data catalog.

Metadata

Metadata Data Lake Data Processing Data-driven

Why Spreadsheets Are Your Secret Weapon for Efficient Data Governance

Alation

APRIL 6, 2023

Data governance is traditionally applied to structured data assets that are most often found in databases and information systems. The ability to connect straight to the source allows knowledge workers to work natively in spreadsheets, pulling data directly from true data sources like the data warehouse or data lake.

Data Governance

Data Governance Metadata Cost-Benefit Structured Data

Augmented data management: Data fabric versus data mesh

IBM Big Data Hub

APRIL 27, 2022

The data fabric architectural approach can simplify data access in an organization and facilitate self-service data consumption at scale. Read: The first capability of a data fabric is a semantic knowledge data catalog, but what are the other 5 core capabilities of a data fabric? 11 May 2021.

Management

Management Metadata Data Architecture Data Lake

How gaming companies can use Amazon Redshift Serverless to build scalable analytical applications faster and easier

AWS Big Data

MARCH 7, 2023

A data hub is a center of data exchange that constitutes a hub of data repositories and is supported by data engineering, data governance, security, and monitoring services. A data hub contains data at multiple levels of granularity and is often not integrated.

Analytics

Analytics Data Warehouse Data Lake Metadata

Amazon DataZone introduces OpenLineage-compatible data lineage visualization in preview

AWS Big Data

JULY 8, 2024

In Amazon DataZone, lineage not only shares the story of data movement outside it, but it also represents the lineage of activities inside Amazon DataZone, such as asset creation, curation, publishing, and subscription. To learn more, refer to Creating inventory and published data in Amazon DataZone.

Visualization

Visualization Metadata Publishing Sales

The Enduring Significance of Data Modeling in the Modern Data-Driven Enterprise

erwin

AUGUST 31, 2023

1960s – Pre-Relational Era: IBM develops the Information Management System (IMS), a hierarchical database management system (DBMS), which organized data in a tree-like structure. Codd, a computer scientist at IBM, publishes a groundbreaking paper titled “A Relational Model of Data for Large Shared Data Banks.”

Data-driven

Data-driven Modeling Enterprise Structured Data

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Cloudera

OCTOBER 7, 2021

Those decentralization efforts appeared under different monikers through time, e.g., data marts versus data warehousing implementations (a popular architectural debate in the era of structured data) then enterprise-wide data lakes versus smaller, typically BU-Specific, “data ponds”.

Metadata

Metadata Cost-Benefit Enterprise Interactive

Announcing the 2020 Data Impact Award Winners

Cloudera

NOVEMBER 18, 2020

We also celebrated the first-ever winner of the Data Impact Achievement Award, a new award category that recognizes one customer who has consistently achieved transformation across their business, pursuing a diverse set of use cases and creating a culture of data-driven innovation. Data Impact Achievement Award.

Internet Publishing and Broadcasting

Internet Publishing and Broadcasting Data-driven Broadcasting Digital Transformation

How The CIO Can Become The CMO’s Best Ally In The Use Of Data

CIO Business Intelligence

SEPTEMBER 21, 2022

“The good news for many CIOs is that they’ve already laid the groundwork through investments in data governance and migration to the cloud,” LiveRamp noted in a recent report. Inconsistent data, which can result in inaccuracies in interacting with customers, and affect the internal operational use of data.

Data Lake

Data Lake Risk Marketing Data Warehouse

Foundational blocks of Amazon SageMaker Unified Studio: An admin’s guide to implement unified access to all your data, analytics, and AI

AWS Big Data

FEBRUARY 13, 2025

This enables domain users to publish and consume data from these AWS accounts. This enables the user to create a data lake environment with AWS Glue database and Athena workgroup to query the data. The status changes to Activated only when the user logs in to the SageMaker Unified Studio URL.

Data Analytics

Data Analytics Analytics Modeling Management

The hidden history of Db2

IBM Big Data Hub

JULY 5, 2022

Back in the 1960s and 70s, vast amounts of data were stored in the world’s new mainframe computers (many of them IBM System/360 machines) and had become a problem. Finally, 13 years after Codd published his paper, IBM Db2 on z/OS was born, and 10 years after that the first IBM Db2 database for LUW was released. They were expensive.

Data Lake

Data Lake Data Warehouse Publishing Structured Data

Introducing data products in Amazon DataZone: Simplify discovery and subscription with business use case based grouping

AWS Big Data

AUGUST 5, 2024

In this post, we highlight the key benefits of data products, outline their essential features and workflows, and demonstrate how customers can use these features for easier publishing, discovery, and subscription. Curate data product – The data publisher adds a readme, glossaries, and metadata forms to the data product.

Metadata

Metadata Sales Data Lake Publishing

Get started with the new Amazon DataZone enhancements for Amazon Redshift

AWS Big Data

JULY 29, 2024

Amazon DataZone is a powerful data management service that empowers data engineers, data scientists, product managers, analysts, and business users to seamlessly catalog, discover, analyze, and govern data across organizational boundaries, AWS accounts, data lakes, and data warehouses.

Data Warehouse

Data Warehouse Sales Metadata Publishing

Celebrating Data Superheroes: The 2021 Data Impact Awards Winners

Cloudera

NOVEMBER 18, 2021

So, without further ado, it is with great delight that we officially publish the 2021 Data Impact Award winners! Data Lifecycle Connection. This allows for an omni-channel view of the customer and enables real-time data streaming and a safe zone to test machine learning models using Cloudera Data Science Workbench (CDSW).

Data Lake

Data Lake Cost-Benefit Digital Transformation Risk

Themes and Conferences per Pacoid, Episode 8

Domino Data Lab

APRIL 3, 2019

Paco Nathan’s latest column dives into data governance. This month’s article features updates from one of the early data conferences of the year, Strata Data Conference – which was held just last week in San Francisco. In particular, here’s my Strata SF talk “Overview of Data Governance” presented in article form.

Machine Learning

Machine Learning Data Governance Metadata Data Science

Data Mesh 101: How Data Mesh Can Be Used in an Organization

Ontotext

FEBRUARY 12, 2024

Determine ownership by making sure all teams involved in the data mesh own the quality of their domain data, ensure service-level agreements are met, and share that data with data contracts. Domain teams should continually monitor for data errors with data validation checks and incorporate data lineage to track usage.

Data Quality

Data Quality Data-driven Data Lake Data Governance

How Amazon Finance Automation built a data mesh to support distributed data ownership and centralize governance

AWS Big Data

JULY 14, 2023

In this post, we discuss how the Amazon Finance Automation team used AWS Lake Formation and the AWS Glue Data Catalog to build a data mesh architecture that simplified data governance at scale and provided seamless data access for analytics, AI, and machine learning (ML) use cases.

Finance

Finance Metadata Big Data Recreation/Entertainment

Of Muffins and Machine Learning Models

Cloudera

FEBRUARY 16, 2022

In the case of CDP Public Cloud, this includes virtual networking constructs and the data lake as provided by a combination of a Cloudera Shared Data Experience (SDX) and the underlying cloud storage. Each project consists of a declarative series of steps or operations that define the data science workflow.

Machine Learning

Machine Learning Modeling Metadata Recreation/Entertainment

The Audience for Data Catalogs and Data Intelligence

Alation

JUNE 21, 2022

The audience grew to include data scientists (who were even more scarce and expensive) and their supporting resources (e.g., After that came data governance, privacy, and compliance staff. Power business users and other non-purely-analytic data citizens came after that. Data engineers want to catalog data pipelines.

Metadata

Metadata Data Quality Visualization Data Lake

Themes and Conferences per Pacoid, Episode 12

Domino Data Lab

AUGUST 8, 2019

In this episode I’ll cover themes from Sci Foo and important takeaways that data science teams should be tracking. First and foremost: there’s substantial overlap between what the scientific community is working toward for scholarly infrastructure and some of the current needs of data governance in industry. We did it again.”

Data Science

Data Science Machine Learning Data Governance Statistics

What is Data Mesh?

Ontotext

NOVEMBER 16, 2023

Data mesh solves this by promoting data autonomy, allowing users to make decisions about domains without a centralized gatekeeper. It also improves development velocity with better data governance and access with improved data quality aligned with business needs.

Metadata

Metadata Data-driven Data Quality Data Architecture

Turnkey Cloud DataOps: Solution from Alation and Accenture

Alation

MARCH 22, 2022

This produces end-to-end lineage so business and technology users alike can understand the state of a data lake and/or lake house. The table details are extracted from the IDF pipeline information, which then syncs details like column, table, business, and technical metadata.

Metadata

Metadata Cost-Benefit Data Quality Data Lake

What Is Alation Connected Sheets? Q&A with the Creators

Alation

NOVEMBER 28, 2022

But refreshing this analysis with the latest data was impossible… unless you were proficient in SQL or Python. We wanted to make it easy for anyone to pull data and self service without the technical know-how of the underlying database or data lake. Sathish and I met in 2004 when we were working for Oracle.

Metadata

Metadata Enterprise Cost-Benefit Data Quality
