Remove Data Processing Remove Definition Remove Metadata
article thumbnail

CIOs are (still) closer than ever to their dream data lakehouse

CIO Business Intelligence

“The data catalog is critical because it’s where business manages its metadata,” said Venkat Rajaji, Senior Vice President of Product Management at Cloudera. But the metadata turf war is just getting started.” That put them in a better position to keep data under management – and possibly to host processing as well.

Metadata 117
article thumbnail

How Cargotec uses metadata replication to enable cross-account data sharing

AWS Big Data

In this blog, we discuss the technical challenges faced by Cargotec in replicating their AWS Glue metadata across AWS accounts, and how they navigated these challenges successfully to enable cross-account data sharing. Solution overview Cargotec required a single catalog per account that contained metadata from their other AWS accounts.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Disaster recovery strategies for Amazon MWAA – Part 2

AWS Big Data

The solution for this post is hosted on GitHub. Backup and restore architecture The backup and restore strategy involves periodically backing up Amazon MWAA metadata to Amazon Simple Storage Service (Amazon S3) buckets in the primary Region. This is the bucket where you host all of your DAGs for your environment. [1.b]

article thumbnail

Use Amazon Kinesis Data Streams to deliver real-time data to Amazon OpenSearch Service domains with Amazon OpenSearch Ingestion

AWS Big Data

OpenSearch Ingestion supports up to 96 OCUs per pipeline, and 24,000 characters per pipeline definition file (see OpenSearch Ingestion quotas ). The IAM role ARN must be the same for both the OpenSearch Servicer sink definition and the Kinesis Data Streams source definition.

article thumbnail

Data governance beyond SDX: Adding third party assets to Apache Atlas

Cloudera

In this blog, we’ll highlight the key CDP aspects that provide data governance and lineage and show how they can be extended to incorporate metadata for non-CDP systems from across the enterprise. Atlas provides open metadata management and governance capabilities to build a catalog of all assets, and also classify and govern these assets.

article thumbnail

How Amazon GTTS runs large-scale ETL jobs on AWS using Amazon MWAA

AWS Big Data

At a high level, the core of Langley’s architecture is based on a set of Amazon Simple Queue Service (Amazon SQS) queues and AWS Lambda functions, and a dedicated RDS database to store ETL job data and metadata. Web UI Amazon MWAA comes with a managed web server that hosts the Airflow UI.

article thumbnail

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

BMS’s EDLS platform hosts over 5,000 jobs and is growing at 15% YoY (year over year). EDLS job steps and metadata Every EDLS job comprises one or more job steps chained together and run in a predefined order orchestrated by the custom ETL framework. It retrieves the specified files and available metadata to show on the UI.