For container terminal operators, data-driven decision-making and efficient data sharing are vital to optimizing operations and boosting supply chain efficiency. Two use cases illustrate how this can be applied for business intelligence (BI) and data science applications, using AWS services such as Amazon Redshift and Amazon SageMaker.
This book is not available until January 2022, but considering all the hype around the data mesh, we expect it to be a best seller. In the book, author Zhamak Dehghani reveals that, despite the time, money, and effort poured into them, data warehouses and data lakes fail when applied at the scale and speed of today’s organizations.
Initially, the data inventories of different services were siloed within isolated environments, making data discovery and sharing across services manual and time-consuming for all teams involved. Implementing robust data governance is challenging. The following figure illustrates the data mesh architecture.
Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.
But unlocking value from data requires multiple analytics workloads, data science tools, and machine learning algorithms to run against the same diverse data sets. In our ongoing benchmark research project, we are researching the ways in which organizations work with big data and the challenges they face.
Beyond breaking down silos, modern data architectures need to provide interfaces that make it easy for users to consume data using tools fit for their jobs. Data must be able to freely move to and from data warehouses, data lakes, and data marts, and interfaces must make it easy for users to consume that data.
Reading Time: 6 minutes Data governance as a concept and practice has been around for as long as data management has been around. It is, however, gaining prominence and interest in recent years due to the increasing volume of data that needs to be managed.
Data lakes have been around for well over a decade now, supporting the analytic operations of some of the world’s largest corporations. Such data volumes are not easy to move, migrate, or modernize. The challenges of a monolithic data lake architecture: data lakes are, at a high level, single repositories of data at scale.
The original proof of concept was to have one data repository ingesting data from 11 sources, including flat files and data stored via APIs on premises and in the cloud, Pruitt says. “There are a lot of variables that determine what should go into the data lake and what will probably stay on premises,” Pruitt says.
In today’s data-driven world, organizations are constantly seeking efficient ways to process and analyze vast amounts of information across data lakes and warehouses. This post will showcase how this data can also be queried by other data teams using Amazon Athena. Verify that you have Python version 3.7.
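Since the excerpt mentions querying shared data with Amazon Athena from Python, here is a minimal, hedged sketch of that pattern using boto3: submit a SQL query, poll until it completes, then read the results. The database, table, and S3 output bucket below are hypothetical placeholders, not names from the original post.

```python
import time

import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Submit the query; Athena writes result files to the given S3 location.
response = athena.start_query_execution(
    QueryString="SELECT team, COUNT(*) FROM shared_events GROUP BY team",   # hypothetical table
    QueryExecutionContext={"Database": "analytics_db"},                     # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"}, # hypothetical bucket
)
query_id = response["QueryExecutionId"]

# Poll until the query reaches a terminal state.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

# Print each result row (the first row holds the column headers).
if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows:
        print([col.get("VarCharValue") for col in row["Data"]])
```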
The data architect also “provides a standard common business vocabulary, expresses strategic requirements, outlines high-level integrated designs to meet those requirements, and aligns with enterprise strategy and related business architecture,” according to DAMA International’s Data Management Body of Knowledge.
One of the core features of AWS Lake Formation is the delegation of permissions on a subset of resources such as databases, tables, and columns in the AWS Glue Data Catalog to data stewards, empowering them to make decisions regarding who should get access to their resources and helping you decentralize the permissions management of your data lakes.
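As a rough illustration of that delegation model, the boto3 sketch below grants a hypothetical steward role the right to manage SELECT access on a single Glue Data Catalog table; the account ID, role name, database, and table are placeholders, not values from the original article.

```python
import boto3

lf = boto3.client("lakeformation", region_name="us-east-1")

# Grant SELECT on one table to a data steward role, with the grant option so
# the steward can in turn decide who else gets access to the table.
lf.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/DataStewardRole"  # hypothetical role
    },
    Resource={
        "Table": {
            "DatabaseName": "sales_db",  # hypothetical Glue Data Catalog database
            "Name": "orders",            # hypothetical table
        }
    },
    Permissions=["SELECT"],
    PermissionsWithGrantOption=["SELECT"],  # lets the steward re-grant SELECT
)
```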
We had been talking about “Agile Analytic Operations,” “DevOps for Data Teams,” and “Lean Manufacturing for Data,” but the concept was hard to get across and communicate. I spent much time de-categorizing DataOps: we are not discussing ETL, data lakes, or data science.
Data, of course, has been all the rage the past decade, having been declared the “new oil” of the digital economy. And yes, data has enormous potential to create value for your business, making its accrual and the analysis of it, aka data science, very exciting.
A data hub is a center of data exchange that constitutes a hub of data repositories and is supported by data engineering, data governance, security, and monitoring services. A data hub contains data at multiple levels of granularity and is often not integrated.
This past week, I had the pleasure of hosting Data Governance for Dummies author Jonathan Reichental for a fireside chat, along with Denise Swanson, Data Governance lead at Alation. Can you have proper data management without establishing a formal data governance program?
Data governance is a key enabler for teams adopting a data-driven culture and operational model to drive innovation with data. Amazon DataZone allows you to simply and securely govern end-to-end data assets stored in your Amazon Redshift data warehouses or data lakes cataloged with the AWS Glue Data Catalog.
Modak Nabu automates repetitive tasks in the data preparation process and thus accelerates data preparation by 4x. Modak Nabu reliably curates datasets for any line of business and personas, from business analysts to data scientists. Customers using Modak Nabu with CDP today have deployed data lakes and.
To ensure maximum momentum and flawless service, the Experian BIS Data Enrichment team decided to use the power of big data by utilizing Cloudera’s Data Science Workbench. This enabled Merck KGaA to control and maintain secure data access, and greatly increase business agility for multiple users.
To keep pace as banking becomes increasingly digitized in Southeast Asia, OCBC was looking to utilize AI/ML to make more data-driven decisions to improve customer experience and mitigate risks. Lastly, data security is paramount, especially in the finance industry.
The outline of the call went as follows: I was talking to a central state agency that was organizing a data governance initiative (in their words) across three other state agencies. All four agencies had reported an independent but identical experience with data governance in the past: an expensive consulting engagement.
Paco Nathan’s latest column dives into data governance. This month’s article features updates from one of the early data conferences of the year, the Strata Data Conference, which was held just last week in San Francisco. In particular, here’s my Strata SF talk “Overview of Data Governance,” presented in article form.
The data fabric architectural approach can simplify data access in an organization and facilitate self-service data consumption at scale. Read: The first capability of a data fabric is a semantic knowledge data catalog, but what are the other 5 core capabilities of a data fabric? 11 May 2021.
A data lakehouse is an emerging data management architecture that converges data warehouse and data lake capabilities, driven by the need to improve efficiency and obtain critical insights faster. Let’s start with why data lakehouses are becoming increasingly important.
In this post, we discuss how you can use purpose-built AWS services to create an end-to-end data strategy for C360 to unify and govern customer data in a way that addresses these challenges. The AWS modern data architecture shows a way to build a purpose-built, secure, and scalable data platform in the cloud.
Paco Nathan’s latest monthly article covers Sci Foo as well as why data science leaders should rethink hiring and training priorities for their data science teams. In this episode I’ll cover themes from Sci Foo and important takeaways that data science teams should be tracking.
By adopting a custom-developed application based on the Cloudera ecosystem, Carrefour has combined the legacy systems into one platform that provides access to customer data in a single data lake. In doing so, Bank of the West has modernized and centralized its Big Data platform in just one year.
Combining AWS data integration services like AWS Glue with data platforms like Snowflake allows you to build scalable, secure data lakes and pipelines to power analytics, BI, data science, and ML use cases. This unlocks scalable analytics while maintaining data governance, compliance, and access control.
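As a sketch of what such a pipeline can look like, the Glue job script below reads Parquet files from an S3 data lake and appends them to a Snowflake table through the Snowflake Spark connector (bundled with recent Glue versions). The bucket, table, and connection parameters are hypothetical assumptions, and in practice the credentials would come from AWS Secrets Manager or a Glue connection rather than literals.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue job bootstrap.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw data from the data lake (hypothetical S3 path).
df = glue_context.spark_session.read.parquet("s3://example-data-lake/raw/orders/")

# Snowflake connection options (hypothetical values; use Secrets Manager in practice).
sf_options = {
    "sfURL": "example_account.snowflakecomputing.com",
    "sfUser": "GLUE_LOADER",
    "sfPassword": "***",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "LOAD_WH",
}

# Append the frame to a Snowflake table via the Snowflake Spark connector.
(df.write
    .format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "ORDERS")
    .mode("append")
    .save())

job.commit()
```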
With each game release and update, the amount of unstructured data being processed grows exponentially, Konoval says. “This volume of data poses serious challenges in terms of storage and efficient processing,” he says. To address this problem, RetroStyle Games invested in data lakes.
Use cases could include but are not limited to: optimizing healthcare processes to save lives, data analysis for emergency resource management, building smarter cities with data science, using data and analytics to fight climate change, tackling the food crisis, prioritizing actions against poverty, and more.
Data curation is important in today’s world of data sharing and self-service analytics, but I think it is a frequently misused term. When speaking and consulting, I often hear people refer to data in their data lakes and data warehouses as curated data, believing that it is curated because it is stored as shareable data.
In the case of CDP Public Cloud, this includes virtual networking constructs and the data lake as provided by a combination of a Cloudera Shared Data Experience (SDX) and the underlying cloud storage. Each project consists of a declarative series of steps or operations that define the data science workflow.
Reading Time: 5 minutes For years, organizations have been managing data by consolidating it into a single data repository, such as a cloud data warehouse or data lake, so it can be analyzed and delivered to business users. Unfortunately, organizations struggle to get this.
This highlights the two companies’ shared vision on self-service data discovery with an emphasis on collaboration and data governance. 2) When data becomes information, many (incremental) use cases surface. He is creating information services for his clients, an emerging use case for SSDP.
In this post, we discuss how the Amazon Finance Automation team used AWS Lake Formation and the AWS Glue Data Catalog to build a data mesh architecture that simplified data governance at scale and provided seamless data access for analytics, AI, and machine learning (ML) use cases.
Modern data catalogs, which originated to help data analysts find and evaluate data, continue to meet the needs of analysts, but they have expanded their reach. They are now central to data stewardship, data curation, and data governance, all metadata-dependent activities.
If your team has easy-to-use tools and features, you are much more likely to experience the user adoption you want and to improve data literacy and data democratization across the organization.
The right data architecture can help your organization improve data quality because it provides the framework that determines how data is collected, transported, stored, secured, used, and shared for business intelligence and data science use cases. Perform data quality monitoring based on pre-configured rules.
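To make “pre-configured rules” concrete, here is a minimal, self-contained sketch of rule-based data quality monitoring over a pandas DataFrame; the column names and rules are illustrative assumptions, and a real platform would load the rules from configuration rather than hard-code them.

```python
import pandas as pd

# Illustrative dataset with deliberate quality problems (hypothetical columns).
df = pd.DataFrame({
    "customer_id": [1, 2, 2, None],
    "email": ["a@example.com", "b@example.com", None, "d@example.com"],
})

# Each pre-configured rule maps a name to a check that returns the violating rows.
rules = {
    "customer_id_not_null": lambda d: d[d["customer_id"].isna()],
    "customer_id_unique": lambda d: d[d["customer_id"].duplicated(keep=False)],
    "email_present": lambda d: d[d["email"].isna()],
}

# Evaluate every rule and report pass/fail with a violation count.
for name, check in rules.items():
    violations = check(df)
    status = "PASS" if violations.empty else f"FAIL ({len(violations)} rows)"
    print(f"{name}: {status}")
```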
Reading Time: 2 minutes The data lakehouse attempts to combine the best parts of the data warehouse with the best parts of data lakes while avoiding all of the problems inherent in both. However, the data lakehouse is not the last word in data.
“Subsequent actions to improve data quality can be both process-oriented and application-oriented, and include defining an organizational model around data governance, assigning clear roles and responsibilities to the various figures involved (data scientists, data engineers, data owners, data stewards, and so on).”
Semantics, context, and how data is tracked and used mean even more as you stretch to reach post-migration goals. This is why, when data moves, it’s imperative for organizations to prioritize data discovery. Data discovery is also critical for data governance, which, when ineffective, can actually hinder organizational growth.
Leverage of Data to generate Insight. In this second area we have disciplines such as Analytics and Data Science. The objective here is to use a variety of techniques to tease out findings from available data (both internal and external) that go beyond the explicit purpose for which it was captured. Watch this space. [2].