Becoming a machine learning company means investing in foundational technologies

O'Reilly on Data

Use ML to unlock new data types. Consider, for example, deep learning, a specific form of machine learning that resurfaced in 2011/2012 due to record-setting models in speech and computer vision. You also need solutions that let you understand what data you have and who can access it, along with the metadata and artifacts needed for audits.

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

AWS Big Data

In this post, we delve into the key aspects of using Amazon EMR for modern data management, covering topics such as data governance, data mesh deployment, and streamlined data discovery. Many organizations run multiple Hive data warehouses across EMR clusters, and each of those clusters generates its own metadata.

How Volkswagen streamlined access to data across multiple data lakes using Amazon DataZone – Part 1

AWS Big Data

The current method is largely manual and relies on emails and general communication. This not only increases overhead but also makes data governance vary from one use case to another. Data domain producers publish data assets to Amazon DataZone in the Central Governance account by using a data source run.

Orchestrate an end-to-end ETL pipeline using Amazon S3, AWS Glue, and Amazon Redshift Serverless with Amazon MWAA

AWS Big Data

This approach lets the team process the raw data extracted from Account A in Account B, which is dedicated to data handling tasks. This keeps raw and processed data securely separated across multiple accounts, if required, for enhanced data governance and security.

Seamless integration of data lake and data warehouse using Amazon Redshift Spectrum and Amazon DataZone

AWS Big Data

This streamlined architecture offers several advantages. Single source of truth: the Central IT team acts as the custodian of the combined and curated data from all business units, providing a unified and consistent dataset. Individual business units, in turn, produce their own domain-specific data.

How Novo Nordisk built distributed data governance and control at scale

AWS Big Data

The first post of this series describes the overall architecture and how Novo Nordisk built a decentralized data mesh architecture, including Amazon Athena as the data query engine. The third post will show how end users can consume data from their tool of choice without compromising data governance.

Themes and Conferences per Pacoid, Episode 8

Domino Data Lab

Paco Nathan's latest column dives into data governance. This month's article features updates from one of the early data conferences of the year, Strata Data Conference, which was held just last week in San Francisco. In particular, here's my Strata SF talk "Overview of Data Governance" presented in article form.