Data collections are the ones and zeroes that encode the actionable insights (patterns, trends, relationships) that we seek to extract from our data through machine learning and data science. Datasphere is not just for data managers. As you would guess, maintaining context relies on metadata.
For container terminal operators, data-driven decision-making and efficient data sharing are vital to optimizing operations and boosting supply chain efficiency. Two use cases illustrate how this can be applied for business intelligence (BI) and data science applications, using AWS services such as Amazon Redshift and Amazon SageMaker.
They don’t have the resources they need to clean up data quality problems. The building blocks of data governance are often lacking within organizations. These include the basics, such as metadata creation and management, data provenance, data lineage, and other essentials. And that’s just the beginning.
In other words, could we see a roadmap for transitioning from legacy use cases (perhaps some business intelligence) toward data science practices, and from there into the tooling required for more substantial AI adoption? Data scientists and data engineers are in demand.
Data governance definition: Data governance is a system for defining who within an organization has authority and control over data assets and how those data assets may be used. It encompasses the people, processes, and technologies required to manage and protect data assets.
Beyond investments in narrowing the skills gap, companies are beginning to put processes in place for their data science projects, for example creating analytics centers of excellence that centralize capabilities and share best practices. Automation in data science and data. Burgeoning IoT technologies.
Initially, the data inventories of different services were siloed within isolated environments, making data discovery and sharing across services manual and time-consuming for all teams involved. Implementing robust data governance is challenging.
What enables you to use all those gigabytes and terabytes of data you’ve collected? Metadata is the pertinent, practical details about data assets: what they are, what to use them for, what to use them with. Without metadata, data is just a heap of numbers and letters collecting dust. Where does metadata come from?
You also need solutions that let you understand what data you have and who can access it. About a third of the respondents in the survey indicated they are interested in data governance systems and data catalogs. 58% of survey respondents indicated they are building or evaluating data science platforms.
Whether it’s controlling for common risk factors—bias in model development, missing or poorly conditioned data, the tendency of models to degrade in production—or instantiating formal processes to promote data governance, adopters will have their work cut out for them as they work to establish reliable AI production lines.
A few years ago, we started publishing articles (see “Related resources” at the end of this post) on the challenges facing data teams as they start taking on more machine learning (ML) projects. Metadata and artifacts needed for audits: as an example, the output from the components of MLflow will be very pertinent for audits.
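To make that concrete, here is a minimal sketch (not from the original post) of how run parameters, tags, and artifacts might be logged with MLflow so they can later be retrieved for an audit trail; the experiment name, parameters, and file names are illustrative assumptions.

```python
# Hedged sketch: recording MLflow run metadata and artifacts that an audit could pull later.
# Experiment name, parameters, tags, and the artifact file are invented for illustration.
import json
import mlflow

mlflow.set_experiment("credit-risk-model")           # hypothetical experiment name

with open("schema.json", "w") as f:                  # example artifact to attach to the run
    json.dump({"columns": ["id", "amount"]}, f)

with mlflow.start_run(run_name="audit-demo") as run:
    mlflow.log_param("max_depth", 6)                 # parameters become queryable run metadata
    mlflow.log_metric("auc", 0.87)
    mlflow.set_tags({"data_source": "s3://example-bucket/train.parquet",
                     "owner": "risk-team"})          # governance context captured as tags
    mlflow.log_artifact("schema.json")               # artifact stored alongside the run

# An auditor can later retrieve everything recorded for the run.
client = mlflow.tracking.MlflowClient()
print(client.get_run(run.info.run_id).data.params)
```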
Good data governance has always involved dealing with errors and inconsistencies in datasets, as well as indexing and classifying that structured data by removing duplicates, correcting typos, standardizing and validating the format and type of data, and augmenting incomplete information or detecting unusual and impossible variations in the data.
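As a rough illustration of those cleanup tasks, the following pandas sketch deduplicates rows, standardizes formats, validates values against an expected pattern, and flags impossible variations; the DataFrame and column names are invented for the example.

```python
# Illustrative sketch of the cleanup tasks described above, using pandas.
# The DataFrame and column names ("email", "age", "country") are assumptions.
import pandas as pd

df = pd.DataFrame({
    "email":   ["A@X.COM", "a@x.com", "bad-address", "b@y.org"],
    "age":     [34, 34, -5, 210],
    "country": ["us", "US", "Us", "de"],
})

df = df.assign(email=df["email"].str.strip().str.lower(),   # standardize formats
               country=df["country"].str.upper())
df = df.drop_duplicates()                                    # remove duplicates
df["email_valid"] = df["email"].str.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # validate format
df["age_suspect"] = ~df["age"].between(0, 120)               # flag impossible values
print(df)
```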
We need to do more than automate model building with autoML; we need to automate tasks at every stage of the data pipeline. In a previous post, we talked about applications of machine learning (ML) to software development, which included a tour through sample tools in data science and for managing data infrastructure.
This means that there is out-of-the-box support for Ozone storage in services like Apache Hive, Apache Impala, Apache Spark, and Apache NiFi, as well as in Private Cloud experiences like Cloudera Machine Learning (CML) and Data Warehousing Experience (DWX). Data ingestion through ‘s3’. Ozone Namespace Overview.
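For a sense of what ingestion “through s3” can look like in practice, here is a hedged sketch of reading Ozone data from Spark via Ozone’s S3-compatible gateway; the endpoint, credentials, and bucket path are placeholders rather than values from the original article.

```python
# Hedged sketch: reading a dataset from Apache Ozone through its S3-compatible
# gateway using Spark's s3a connector. The gateway endpoint, credentials, and
# bucket/path below are placeholders, not values from the original post.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("ozone-s3-ingest")
         .config("spark.hadoop.fs.s3a.endpoint", "http://ozone-s3g.example.com:9878")
         .config("spark.hadoop.fs.s3a.access.key", "OZONE_ACCESS_KEY")
         .config("spark.hadoop.fs.s3a.secret.key", "OZONE_SECRET_KEY")
         .config("spark.hadoop.fs.s3a.path.style.access", "true")
         .getOrCreate())

# Read Parquet files from an Ozone bucket exposed through the S3 gateway.
df = spark.read.parquet("s3a://example-volume-bucket/events/")
df.show(5)
```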
A combined, interoperable suite of tools for data team productivity, governance, and security for large and small data teams. Ultimately, there will be an interoperable toolset for running the data team, just like a more focused toolset (ELT/Data Science/BI) for acting upon data.
Execution of this mission requires the contribution of several groups: data center/IT, data engineering, data science, data visualization, and data governance. Each of the roles mentioned above views the world through a preferred set of tools: Data Center/IT – Servers, storage, software.
If storage costs are escalating in a particular area, you may have found a good source of dark data. Analyze your metadata. If you’ve been properly managing your metadata as part of a broader data governance policy, you can use metadata management explorers to reveal silos of dark data in your landscape.
The data architect also “provides a standard common business vocabulary, expresses strategic requirements, outlines high-level integrated designs to meet those requirements, and aligns with enterprise strategy and related business architecture,” according to DAMA International’s Data Management Body of Knowledge.
In other words, using metadata about data science work to generate code. In this case, code gets generated for data preparation, where so much of the “time and labor” in data science work is concentrated. BTW, videos for Rev2 are up: [link]. On deck this time ’round the Moon: program synthesis.
In an earlier blog, I defined a data catalog as “a collection of metadata, combined with data management and search tools, that helps analysts and other data users to find the data that they need, serves as an inventory of available data, and provides information to evaluate fitness of data for intended uses.”
Yet high-volume collection makes keeping that foundation sound a challenge, as the amount of data collected by businesses is greater than ever before. An effective data governance strategy is critical for unlocking the full benefits of this information. Data governance requires a system.
Data governance is a key enabler for teams adopting a data-driven culture and operational model to drive innovation with data. Amazon DataZone allows you to simply and securely govern end-to-end data assets stored in your Amazon Redshift data warehouses or data lakes cataloged with the AWS Glue data catalog.
As I write this, I can almost hear you wail “No, no, we don’t have too much metadata, we don’t have nearly enough! We have several projects in flight to expand our use of metadata.” Sorry, I’m going to have to disagree with you there. You are on a fool’s errand that will just provide […].
This past week, I had the pleasure of hosting Data Governance for Dummies author Jonathan Reichental for a fireside chat, along with Denise Swanson, Data Governance lead at Alation. Can you have proper data management without establishing a formal data governance program?
Analytics reference architecture for gaming organizations In this section, we discuss how gaming organizations can use a data hub architecture to address the analytical needs of an enterprise, which requires the same data at multiple levels of granularity and in different formats, standardized for faster consumption.
Data mesh is a modern, distributed data architecture in which different domain-based data products are owned by different groups within an organization. And data fabric is a self-service data layer that is supported in an orchestrated fashion to serve.
Paco Nathan’s latest column dives into data governance. This month’s article features updates from one of the early data conferences of the year, Strata Data Conference – which was held just last week in San Francisco. In particular, here’s my Strata SF talk “Overview of Data Governance” presented in article form.
The post will include details on how to perform read/write data operations against Amazon S3 tables with AWS Lake Formation managing metadata and underlying data access using temporary credential vending. Create a user-defined IAM role following the instructions in Requirements for roles used to register locations.
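As a rough sketch of the credential-vending idea (not the exact steps from that post, which concern S3 Tables and a registered IAM role), the boto3 call below requests temporary, Lake Formation-scoped credentials for a Glue table; the table ARN, region, and permissions are placeholders, and the exact parameters may differ for the S3 Tables integration.

```python
# Hedged sketch of Lake Formation credential vending via boto3.
# The table ARN, region, and permissions are placeholders; this is not the
# original post's S3 Tables walkthrough, only an illustration of the idea.
import boto3

lf = boto3.client("lakeformation", region_name="us-east-1")

resp = lf.get_temporary_glue_table_credentials(
    TableArn="arn:aws:glue:us-east-1:123456789012:table/sales_db/orders",
    Permissions=["SELECT"],
    DurationSeconds=900,
)

# The vended, short-lived credentials can then back an S3 client that is only
# authorized for the data Lake Formation allows this principal to read.
s3 = boto3.client(
    "s3",
    aws_access_key_id=resp["AccessKeyId"],
    aws_secret_access_key=resp["SecretAccessKey"],
    aws_session_token=resp["SessionToken"],
)
```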
Gartner defines a data fabric as “a design concept that serves as an integrated layer of data and connecting processes.” The data fabric architectural approach can simplify data access in an organization and facilitate self-service data consumption at scale. “Exposing The Data Mesh Blind Side,” Forrester.
In this post, we discuss how the Amazon Finance Automation team used AWS Lake Formation and the AWS Glue Data Catalog to build a data mesh architecture that simplified data governance at scale and provided seamless data access for analytics, AI, and machine learning (ML) use cases.
Paco Nathan’s latest monthly article covers Sci Foo as well as why data science leaders should rethink hiring and training priorities for their data science teams. In this episode I’ll cover themes from Sci Foo and important takeaways that data science teams should be tracking. Introduction.
The outline of the call went as follows: I was talking to a central state agency that was organizing a data governance initiative (in their words) across three other state agencies. All four agencies had reported an independent but identical experience with data governance in the past. An expensive consulting engagement.
As data drives more and more of the modern economy, data governance and data management are racing to keep up with an ever-expanding range of requirements, constraints and opportunities. Prior to the Big Data revolution, companies were inward-looking in terms of data. THE NEED FOR METADATA TOOLS.
In this post, we discuss how you can use purpose-built AWS services to create an end-to-end data strategy for C360 to unify and govern customer data that address these challenges. Then, you transform this data into a concise format.
This data supports all kinds of use cases within organizations, from helping production analysts understand how production is progressing, to allowing research scientists to look at the results of a set of treatments across different trials and cross-sections of the population.
We continue to make deep investments in governance, including new capabilities in the Stewardship Workbench, a core part of the Data Governance App. Centralization of metadata. A decade ago, metadata was everywhere. Consequently, useful metadata was unfindable and unusable. Then Alation came along.
Essential components of a data lakehouse architecture and what makes an open data lakehouse. The core of a data lakehouse architecture includes the storage, the metadata service, and the query engine, typically alongside a data governance component made up of a policy engine and a data dictionary.
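As one concrete (and assumed) example of those components, the sketch below wires up object storage, a metadata/catalog service, and a query engine using Apache Iceberg on Spark; the catalog name, warehouse path, and package version are placeholders, and Iceberg is used here only to illustrate the pattern.

```python
# Sketch of the lakehouse components named above, using Apache Iceberg on Spark
# as one concrete example (not necessarily the stack the article describes).
# Catalog name, warehouse path, and the runtime package version are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("lakehouse-demo")
         # Query engine: Spark SQL with the Iceberg extensions (version is an assumption).
         .config("spark.jars.packages",
                 "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0")
         .config("spark.sql.extensions",
                 "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
         # Metadata service: an Iceberg catalog (here a Hive Metastore-backed one).
         .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
         .config("spark.sql.catalog.lake.type", "hive")
         # Storage: table data and metadata files live in object storage.
         .config("spark.sql.catalog.lake.warehouse", "s3a://example-warehouse/")
         .getOrCreate())

spark.sql("CREATE TABLE IF NOT EXISTS lake.db.events (id BIGINT, ts TIMESTAMP) USING iceberg")
spark.sql("SELECT * FROM lake.db.events").show()
```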
The platform converges data cataloging, data ingestion, data profiling, data tagging, data discovery, and data exploration into a unified platform, driven by metadata. Modak Nabu automates repetitive tasks in the data preparation process and thus accelerates the data preparation by 4x.
IBM Cloud Pak for Data Express solutions offer clients a simple on ramp to start realizing the business value of a modern architecture. Data governance. The data governance capability of a data fabric focuses on the collection, management and automation of an organization’s data. Start a trial.
Gartner: Magic Quadrant for Metadata Management Solutions. Named in the Magic Quadrant for Metadata Management Solutions based on ability to execute and completeness of vision. Today, metadata management has become a critical business driver as data leaders seek to govern and maximize the value from the influx of data at their disposal.
Making datasets easy to find, understand, and access is the purpose of data curation—a purpose that demands well-described datasets. Data curation is a metadata management activity and data catalogs are essential data curation technology. Who Are the Data Curators? What about Data Stewards?
The solution generates a list of data products, product attributes, and the associated probability scores to show joinability. We use Valentine, a data science algorithm for comparing datasets, to improve data product recommendations. The data science algorithm Valentine is an effective tool for this.
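Since the Valentine library’s exact API isn’t reproduced here, the stand-in sketch below scores candidate join columns by simple value overlap to illustrate what a joinability score between two data products might look like; the datasets, column names, and scoring function are invented for the example.

```python
# Stand-in sketch (not the Valentine library itself): score candidate join
# columns between two datasets by value overlap, a rough proxy for the
# joinability probabilities described above. Datasets and columns are invented.
from itertools import product
import pandas as pd

def jaccard(a: pd.Series, b: pd.Series) -> float:
    sa, sb = set(a.dropna()), set(b.dropna())
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0

products_df  = pd.DataFrame({"sku": ["A1", "A2", "A3"], "name": ["x", "y", "z"]})
shipments_df = pd.DataFrame({"item_code": ["A2", "A3", "A4"], "qty": [5, 2, 7]})

scores = {(c1, c2): jaccard(products_df[c1], shipments_df[c2])
          for c1, c2 in product(products_df.columns, shipments_df.columns)}

# Rank column pairs by overlap; high scores suggest likely join keys.
for (c1, c2), s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{c1:>6} <-> {c2:<10} score={s:.2f}")
```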
June 2017: Dresner Advisory Services names Alation the #1 data catalog in its inaugural Data Catalog End-User Market Study. August 2017: Alation debuts as a leader in the Gartner MQ for Metadata Management Solutions. August 2018: Gartner names Alation a 2X Leader in the MQ for Metadata Management Solutions.
The root cause is firmly entrenched in legacy systems and traditional data governance challenges that not only result in data silos but also the misguided belief that data privacy is diametrically opposed to effective exploration of information. Data scientists are the ultimate users of multi-disciplinary analytics.