How Cargotec uses metadata replication to enable cross-account data sharing

AWS Big Data

This is a guest blog post co-written with Sumesh M R from Cargotec and Tero Karttunen from Knowit Finland. In this blog, we discuss the technical challenges faced by Cargotec in replicating their AWS Glue metadata across AWS accounts, and how they navigated these challenges successfully to enable cross-account data sharing.

How Volkswagen Autoeuropa built a data mesh to accelerate digital transformation using Amazon DataZone

AWS Big Data

This is a joint blog post co-authored with Martin Mikoleizig from Volkswagen Autoeuropa. Among the challenges described is the absence of a data catalog and metadata management: data had no metadata associated with it, so use cases couldn’t consume the data without further explanation from the data source owners and specialists.

Top 10 Data Lineage Podcasts, Blogs, and Magazines

Octopai

Our list of Top 10 Data Lineage Podcasts, Blogs, and Websites To Follow in 2021. The host is Tobias Macey, an engineer with many years of experience. The particular episode we recommend looks at how WeWork struggled to understand their data lineage, so they created a metadata repository to increase visibility. Agile Data.

Accelerate your migration to Amazon OpenSearch Service with Reindexing-from-Snapshot

AWS Big Data

Each Lucene index (and, therefore, each OpenSearch shard) represents a completely independent search and storage capability hosted on a single machine. How RFS works: OpenSearch and Elasticsearch snapshots are a directory tree that contains both data and metadata. The following is an example for the structure of an Elasticsearch 7.10

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

This means the data files in the data lake aren’t modified during the migration and all Apache Iceberg metadata files (manifests, manifest files, and table metadata files) are generated outside the purview of the data. In this method, the metadata are recreated in an isolated environment and colocated with the existing data files.

Boosting Object Storage Performance with Ozone Manager

Cloudera

Ozone Manager is a replicated, highly available service that is responsible for managing the metadata for all objects stored in Ozone. In this blog post, we will highlight the work done recently to improve the performance of Ozone Manager to scale to exabytes of data. The hardware specifications are included at the end of this blog.

Introducing erwin Data Intelligence 14: Dive into data quality, ensure data reliability and leverage new deployment flexibility

erwin

Leveraging the metadata within the erwin Data Intelligence data catalog, erwin Data Quality automates data profiling and quality assessment and then leverages the resulting quality scoring to provide intelligence-integrated data quality visibility throughout erwin Data Intelligence.