Plug-and-play integration: A seamless, plug-and-play integration between data producers and consumers should facilitate rapid use of new data sets and enable quick proofs of concept, such as in data science teams. As part of the required data, CHE data is shared using Amazon DataZone.
For instance, Domain A will have the flexibility to create data products that can be published to the divisional catalog, while also maintaining the autonomy to develop data products that are exclusively accessible to teams within the domain. Consumer feedback and demand drive the creation and maintenance of the data product.
We also examine how centralized, hybrid, and decentralized data architectures support scalable, trustworthy ecosystems. As data-centric AI, automated metadata management, and privacy-aware data sharing mature, the opportunity to embed data quality into the enterprise's core has never been more significant.
To achieve this, you need access to sales orders, shipment details, and customer data owned by the retail team. The retail team, acting as the data producer, publishes the necessary data assets to Amazon DataZone, allowing you, as a consumer, to discover and subscribe to these assets.
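To make that producer/consumer flow concrete, here is a minimal sketch using the boto3 datazone client; the domain and project identifiers are placeholders, and the exact shape of the search_listings response used below is an assumption:

import boto3

datazone = boto3.client("datazone")

# As the consumer, discover retail assets published to the domain catalog.
listings = datazone.search_listings(
    domainIdentifier="dzd_exampledomain",   # placeholder domain ID
    searchText="sales orders",
)

# Request a subscription to the first matching listing for your project.
datazone.create_subscription_request(
    domainIdentifier="dzd_exampledomain",
    subscribedListings=[{"identifier": listings["items"][0]["assetListing"]["listingId"]}],
    subscribedPrincipals=[{"project": {"identifier": "prj_exampleproject"}}],
    requestReason="Joining sales orders with shipment data",
)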
This person (or group of individuals) ensures that the theory behind data quality is communicated to the development team. 2 – Data profiling. Data profiling is an essential process in the DQM lifecycle: it verifies that the data contains no unintended errors and that each value corresponds to its appropriate designation (e.g., a date field actually holds dates).
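A first profiling pass can be as simple as the following pandas sketch; the file and column names are hypothetical:

import pandas as pd

df = pd.read_csv("customers.csv")  # hypothetical input

# Basic profile: type, null rate, and distinct count per column.
profile = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "null_pct": (df.isna().mean() * 100).round(1),
    "distinct": df.nunique(),
})
print(profile)

# Spot-check designations, e.g. that an email column really holds emails.
bad_emails = df[~df["email"].astype(str).str.contains("@", na=False)]
print(f"{len(bad_emails)} rows with malformed email values")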
With the ability to browse metadata, you can understand the structure and schema of the data source, identify relevant tables and fields, and discover useful data assets you may not be aware of. This will save the current SQL query book, and the status of the notebook will change from Draft to Saved. Choose Run all.
Traditionally, such a legacy call center analytics platform would be built on a relational database that stores data from streaming sources. Data transformation through stored procedures, plus materialized views to curate datasets and generate insights, is a well-known pattern with relational databases.
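As a sketch of that legacy pattern, assuming a PostgreSQL-style database reached through psycopg2 (table, view, and connection details are hypothetical):

import psycopg2

# Connect to the relational store that receives the streamed call center data.
conn = psycopg2.connect("dbname=callcenter user=analytics")
cur = conn.cursor()

# Curate a dataset with a materialized view, as described above.
cur.execute("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS agent_daily_stats AS
    SELECT agent_id,
           date_trunc('day', call_start)  AS call_day,
           count(*)                       AS calls_handled,
           avg(duration_seconds)          AS avg_duration
    FROM raw_calls
    GROUP BY 1, 2
""")

# Refresh periodically (e.g. from a stored procedure or a scheduler)
# so the curated view keeps up with the incoming stream.
cur.execute("REFRESH MATERIALIZED VIEW agent_daily_stats")
conn.commit()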
There are countless examples of big data transforming many different industries. There is no disputing the fact that the collection and analysis of massive amounts of unstructured data has been a huge breakthrough. Data virtualization is ideal in any situation where the following is necessary: information coming from diverse data sources.
Once a draft has been created or opened, developers use the visual Designer to build their data flow logic and validate it using interactive test sessions. Managing drafts outside the Catalog keeps a clean distinction between phases of the development cycle, leaving only those flows that are ready for deployment published in the Catalog.
Under the Transparency in Coverage (TCR) rule, hospitals and payors are required to publish their pricing data in a machine-readable format. Because this data is relatively low in complexity, the solution uses AWS serverless services to ingest it, transform it, and make it available for analytics.
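As a sketch of the serverless transform step, assuming the machine-readable files land in Amazon S3 as JSON and a Lambda function flattens them for analytics (the S3 trigger event and the pricing schema below are assumptions):

import csv
import io
import json

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Triggered by an S3 upload of a machine-readable pricing file.
    bucket = event["Records"][0]["s3"]["bucket"]["name"]
    key = event["Records"][0]["s3"]["object"]["key"]
    pricing = json.loads(s3.get_object(Bucket=bucket, Key=key)["Body"].read())

    # Flatten nested rate entries into rows ready for querying.
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["billing_code", "payer", "negotiated_rate"])
    for item in pricing.get("in_network", []):           # hypothetical schema
        for rate in item.get("negotiated_rates", []):
            writer.writerow([item["billing_code"], rate["payer"], rate["amount"]])

    s3.put_object(Bucket=bucket, Key=key.replace(".json", ".csv"),
                  Body=out.getvalue())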
The solution uses the TPC-DS dataset and unmodified data schema and table relationships, but derives queries from TPC-DS to support the SparkSQL test cases. It is not comparable to other published TPC-DS benchmark results.
Developers need to onboard new data sources, chain multiple data transformation steps together, and explore data as it travels through the flow. With NiFi you can configure your source processor and run it independently of any other processors to retrieve data. Enabling self-service for developers.
Developers can use the support in Amazon Location Service for publishing device position updates to Amazon EventBridge to build a near-real-time data pipeline that stores locations of tracked assets in Amazon Simple Storage Service (Amazon S3). Athena is used to run geospatial queries on the location data stored in the S3 buckets.
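For example, a geofencing query against the stored positions might look like the following sketch, started through boto3; the database, table, and column names are hypothetical:

import boto3

athena = boto3.client("athena")

# Find tracked assets whose position falls inside a bounding polygon.
query = """
    SELECT device_id, sample_time
    FROM asset_positions
    WHERE ST_Contains(
        ST_Polygon('polygon ((-122.5 37.6, -122.3 37.6, -122.3 37.9, -122.5 37.9, -122.5 37.6))'),
        ST_Point(longitude, latitude))
"""
athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "location_analytics"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)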
In this blog, we’ll delve into the critical role of governance and data modeling tools in supporting a seamless data mesh implementation and explore how erwin tools can be used in that role. erwin also provides data governance, metadata management and data lineage software called erwin Data Intelligence by Quest.
dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by customers of data warehouses (such as Amazon Redshift) who are looking to keep their data transform logic separate from storage and engine.
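dbt's Python support means a model can be a plain function; a minimal sketch follows (Python models are supported on some warehouse adapters, and the upstream model name is hypothetical):

# models/orders_enriched.py -- a dbt Python model
def model(dbt, session):
    # Configure how dbt materializes this transform in the warehouse.
    dbt.config(materialized="table")

    # dbt.ref resolves the upstream model; the concrete DataFrame type
    # returned depends on the adapter (Snowpark, PySpark, etc.).
    orders = dbt.ref("stg_orders")

    # Transform logic lives in the project, versioned separately
    # from the storage and engine.
    return orders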
For example, GPS traces, social media interactions, and cell phone handoffs are modeled as graphs, while data catalogs, data lineage, and MDM tools leverage knowledge graphs for linking metadata with semantics. LPGs lack schema and semantics, which makes them inflexible and ill-suited to publishing and sharing data.
More specifically, IDF has been integrated with Alation at an API level; this means that all generated pipeline code, metadata attributes, configuration files, and lineage are automatically synced (representing a huge time savings). They can better understand data transformations, checks, and normalization. Transparency is key.
For data pipeline orchestration, the Apache Airflow UI is a user-friendly tool that provides detailed views into your data pipeline. When it comes to pipeline health management, each service that your tasks interact with could be storing or publishing logs to different locations, such as an S3 bucket or Amazon CloudWatch Logs.
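For orientation, a minimal Airflow DAG looks like the following sketch; the task logic is placeholder:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull from source")       # placeholder task logic

def transform():
    print("apply transformations")  # placeholder task logic

with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",              # 'schedule_interval' on older Airflow 2.x
    catchup=False,
):
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task  # each task logs wherever its operator runs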
With data governance, public sector entities can publish data ethically, using open-data portals and giving citizens visibility into how their data is used. Modernizes Data Systems. Data governance enables structured and secure data exchanges for systems built on the premise of privacy-by-design.
This adds an additional ETL step, making the data even more stale. The data lakehouse was created to solve these problems. The data warehouse storage layer is removed from lakehouse architectures; instead, continuous data transformation is performed within the BLOB storage. Data discoverability.
Watsonx.data uses foundation models to help users discover, augment, and enrich data with natural language. It is built on three core integrated components: multiple query engines, a catalog that keeps track of metadata, and storage and relational data sources which the query engines directly access.
The following AWS services are used for data ingestion, processing, and load: Amazon AppFlow is a fully managed integration service that enables you to securely transfer data between SaaS applications like Salesforce, SAP, Marketo, Slack, and ServiceNow, and AWS services like Amazon S3 and Amazon Redshift , in just a few clicks.
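Flows are typically configured in the console, but they can also be run programmatically; a minimal sketch with boto3, using a placeholder flow name:

import boto3

appflow = boto3.client("appflow")

# Trigger an on-demand run of a flow that copies Salesforce records to S3.
response = appflow.start_flow(flowName="salesforce-accounts-to-s3")
print("Started execution:", response["executionId"])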
This field guide to data mapping will explore how data mapping connects volumes of data for enhanced decision-making. Why Data Mapping is Important: Data mapping is a critical element of any data management initiative, such as data integration, data migration, data transformation, data warehousing, or automation.
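At its simplest, a data mapping is a source-to-target field dictionary applied to each record; a minimal sketch with hypothetical field names:

# Source-to-target field mapping, the core artifact of data mapping.
FIELD_MAP = {
    "cust_nm": "customer_name",
    "cust_no": "customer_id",
    "ord_dt": "order_date",
}

def map_record(source_record: dict) -> dict:
    """Rename source fields to their target-schema equivalents."""
    return {target: source_record.get(source) for source, target in FIELD_MAP.items()}

print(map_record({"cust_nm": "Acme", "cust_no": "42", "ord_dt": "2024-05-01"}))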
Data Connectivity Enhancements: Data and content authors are the first users in the app-building process, creating the infrastructure and content. It is important for our customers to have access to advanced connectors and data transformation features so they can build a robust data layer.
Data Lineage and Documentation Jet Analytics simplifies the process of documenting data assets and tracking data lineage in Fabric. It offers a transparent and accurate view of how data flows through the system, ensuring robust compliance.
Requirements:
Multi-Source Data Blending: Data from multiple sources is compiled, and the output is a single view, metric, or visualization (see the sketch after this list).
Data Transformation and Enrichment: Data can be enriched for analysis.
Metadata: Self-service analysis is made easy with user-friendly naming conventions for tables and columns.
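A sketch of blending and enrichment with pandas, using hypothetical extracts and column names:

import pandas as pd

# Compile two sources into a single view.
crm = pd.read_csv("crm_accounts.csv")
billing = pd.read_csv("billing_invoices.csv")
blended = crm.merge(billing, on="account_id", how="left")

# Enrichment: derive a metric for analysis.
blended["total_billed"] = blended.groupby("account_id")["invoice_amount"].transform("sum")

# Metadata: apply user-friendly column names for self-service analysis.
blended = blended.rename(columns={"acct_nm": "Account Name"})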
In our examples, we use Kinesis Data Generator, a sample application to generate and publish data streams to Firehose. You can also set up Firehose to use other data sources for your real-time streams. We set up Firehose to deliver the stream into Iceberg tables in the Data Catalog. Records arrive base64-encoded, so a payload is decoded with b64decode(record['data']).decode('utf-8').
We use the built-in features of Data Firehose, including AWS Lambda for necessary datatransformation and Amazon Simple Notification Service (Amazon SNS) for near real-time alerts. APIs act as the entry point for applications to access data, business logic, or functionality from your backend services.
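The Firehose transformation contract for such a Lambda function is well defined: decode each record, transform it, and return it under the same recordId; the enrichment below is a placeholder:

import base64
import json

def handler(event, context):
    output = []
    for record in event["records"]:
        # Records arrive base64-encoded, as noted above.
        payload = json.loads(base64.b64decode(record["data"]).decode("utf-8"))
        payload["processed"] = True  # placeholder enrichment

        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(
                json.dumps(payload).encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}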