Data Integration, Data Transformation and Metadata

Data Integration

Data Transformation

Metadata

Bridging the gap between mainframe data and hybrid cloud environments

CIO Business Intelligence

FEBRUARY 27, 2025

A high hurdle many enterprises have yet to overcome is accessing mainframe data via the cloud. Mainframes hold an enormous amount of critical and sensitive business data including transactional information, healthcare records, customer data, and inventory metrics. Four key challenges prevent them from doing so: 1.

Metadata

Metadata Data Lake Cost-Benefit Forecasting

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

An extract, transform, and load (ETL) process using AWS Glue is triggered once a day to extract the required data and transform it into the required format and quality, following the data product principle of data mesh architectures. This process is shown in the following figure.

IoT

IoT Machine Learning Metadata Data-driven

Join 42,000+

Insiders

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Trending Sources

Data’s dark secret: Why poor quality cripples AI and growth

CIO Business Intelligence

APRIL 8, 2025

We also examine how centralized, hybrid and decentralized data architectures support scalable, trustworthy ecosystems. As data-centric AI, automated metadata management and privacy-aware data sharing mature, the opportunity to embed data quality into the enterprises core has never been more significant.

Data Quality

Data Quality Data-driven Key Performance Indicator Metadata

Webinars

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

AWS Big Data

DECEMBER 16, 2024

Third, some services require you to set up and manage compute resources used for federated connectivity, and capabilities like connection testing and data preview arent available in all services. To solve for these challenges, we launched Amazon SageMaker Lakehouse unified data connectivity.

Visualization

Visualization Data Processing Testing Publishing

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

DECEMBER 13, 2023

In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.

Metadata

Metadata Data Lake Visualization Data Quality

Available Now! Automated Testing for Data Transformations

Wayne Yaddow

FEBRUARY 18, 2025

Selecting the strategies and tools for validating data transformations and data conversions in your data pipelines. Introduction Data transformations and data conversions are crucial to ensure that raw data is organized, processed, and ready for useful analysis.

Testing

Testing Data Transformation Data-driven Data Quality

From Raw Inputs to Polished Outputs: The Art of Testing Data Transformations

Wayne Yaddow

MARCH 5, 2025

The goal is to examine five major methods of verifying and validating data transformations in data pipelines with an eye toward high-quality data deployment. First, we look at how unit and integration tests uncover transformation errors at an early stage. Applicability by Transformation Type 2.

Testing

Testing Data Transformation Statistics Metadata

Ensuring Data Transformation Quality with dbt Core

Wayne Yaddow

MARCH 14, 2025

How dbt Core aids data teams test, validate, and monitor complex data transformations and conversions Photo by NASA on Unsplash Introduction dbt Core, an open-source framework for developing, testing, and documenting SQL-based data transformations, has become a must-have tool for modern data teams as the complexity of data pipelines grows.

Data Transformation

Data Transformation Testing Unstructured Data Data Quality

What is Data Lineage? Top 5 Benefits of Data Lineage

erwin

APRIL 29, 2020

Many large organizations, in their desire to modernize with technology, have acquired several different systems with various data entry points and transformation rules for data as it moves into and across the organization. Who are the data owners? What are the transformation rules? Data Governance.

Key Performance Indicator

Key Performance Indicator Metadata Data Governance Data Quality

Biggest Trends in Data Visualization Taking Shape in 2022

Smart Data Collective

OCTOBER 13, 2021

There are countless examples of big data transforming many different industries. There is no disputing the fact that the collection and analysis of massive amounts of unstructured data has been a huge breakthrough. Does Data Virtualization support web data integration?

Visualization

Visualization Cost-Benefit Big Data Prescriptive Analytics

The importance of data ingestion and integration for enterprise AI

IBM Big Data Hub

JANUARY 9, 2024

The entire generative AI pipeline hinges on the data pipelines that empower it, making it imperative to take the correct precautions. 4 key components to ensure reliable data ingestion Data quality and governance: Data quality means ensuring the security of data sources, maintaining holistic data and providing clear metadata.

Enterprise

Enterprise Data Integration Data Quality Contextual Data

How Infomedia built a serverless data pipeline with change data capture using AWS Glue and Apache Hudi

AWS Big Data

MARCH 15, 2023

The API retrieves data at runtime from an Amazon Aurora PostgreSQL-Compatible Edition database for end-user consumption. To populate the database, the Infomedia team developed a data pipeline using Amazon Simple Storage Service (Amazon S3) for data storage, AWS Glue for data transformations, and Apache Hudi for CDC and record-level updates.

Cost-Benefit

Cost-Benefit Data Processing Optimization Data-driven

Why Enterprise Data Lineage is Critical for the Success of Your Modern Data Stack

Octopai

NOVEMBER 13, 2022

The modern data stack is a data management system built out of cloud-based data systems. A given modern data stack will usually include components for data ingestion from your data sources, data transformation, data storage, data analysis and reporting.

Enterprise

Enterprise Data Warehouse Reporting Metadata

How healthcare organizations can analyze and create insights using price transparency data

AWS Big Data

OCTOBER 11, 2023

Due to this low complexity, the solution uses AWS serverless services to ingest the data, transform it, and make it available for analytics. The data ingestion process copies the machine-readable files from the hospitals, validates the data, and keeps the validated files available for analysis.

Visualization

Visualization Dashboards Data-driven Gap analysis

Fabrics, Meshes & Stacks, oh my! Q&A with Sanjeev Mohan

Alation

AUGUST 11, 2022

We chatted about industry trends, why decentralization has become a hot topic in the data world, and how metadata drives many data-centric use cases. But, through it all, Mohan says it’s critical to view everything through the same lens: gaining business value from data. Data fabric is a technology architecture.

Metadata

Metadata Data Warehouse Data Quality Data Lake

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouses (such as Amazon Redshift ) customers who are looking to keep their data transform logic separate from storage and engine.

Data Lake

Data Lake Management Metrics Data Warehouse

Data Preparation and Data Mapping: The Glue Between Data Management and Data Governance to Accelerate Insights and Reduce Risks

erwin

JANUARY 11, 2019

Organizations have spent a lot of time and money trying to harmonize data across diverse platforms , including cleansing, uploading metadata, converting code, defining business glossaries, tracking data transformations and so on. And there’s control of that landscape to facilitate insight and collaboration and limit risk.

Data Governance

Data Governance Risk Metadata Management

Addressing the Three Scalability Challenges in Modern Data Platforms

Cloudera

NOVEMBER 22, 2021

Rise in polyglot data movement because of the explosion in data availability and the increased need for complex data transformations (due to, e.g., different data formats used by different processing frameworks or proprietary applications). As a result, alternative data integration technologies (e.g.,

Data Processing

Data Processing Data Warehouse Enterprise Visualization

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

Specifically, the system uses Amazon SageMaker Processing jobs to process the data stored in the data lake, employing the AWS SDK for Pandas (previously known as AWS Wrangler) for various data transformation operations, including cleaning, normalization, and feature engineering.

Data Lake

Data Lake Analytics Snapshot Data Quality

Choosing A Graph Data Model to Best Serve Your Use Case

Ontotext

MARCH 27, 2024

For example, GPS, social media, cell phone handoffs are modeled as graphs while data catalogs, data lineage and MDM tools leverage knowledge graphs for linking metadata with semantics. RDF is used extensively for data publishing and data interchange and is based on W3C and other industry standards.

Modeling

Modeling Metadata Data Quality Enterprise

Sure, Trust Your Data… Until It Breaks Everything: How Automated Data Lineage Saves the Day

Octopai

JUNE 9, 2024

By embracing automated, multidimensional data lineage, HealthCo was able to maintain a consistent and reliable data environment across their hybrid systems. Solving the data lineage problem directly supported their data products by ensuring data integrity and reliability. This is where Octopai excels.

IT Data-driven Predictive Analytics Data Strategy

The Modern Data Stack Explained: What The Future Holds

Alation

JANUARY 17, 2023

What if, experts asked, you could load raw data into a warehouse, and then empower people to transform it for their own unique needs? Today, data integration platforms like Rivery do just that. By pushing the T to the last step in the process, such products have revolutionized how data is understood and analyzed.

Data Warehouse

Data Warehouse Cost-Benefit Data Science Data Transformation

What is Data Mapping?

Jet Global

FEBRUARY 23, 2024

Data mapping is essential for integration, migration, and transformation of different data sets; it allows you to improve your data quality by preventing duplications and redundancies in your data fields. Data mapping is important for several reasons.

Data Warehouse

Data Warehouse Reporting Data Transformation Visualization

A Stitch in Time: How Jet Analytics Boosts Microsoft Fabric Time-to-Value

Jet Global

MARCH 14, 2024

Jet streamlines many aspects of data administration, greatly improving data solutions built on Microsoft Fabric. It enhances analytics capabilities, streamlines migration, and enhances data integration. Through Jet’s integration with Fabric, your organization can better handle, process, and use your data.

Analytics

Analytics Management Reporting Data Quality

What Is Embedded Analytics?

Jet Global

MAY 1, 2023

Requirement Multi-Source Data Blending Data from multiple sources is compiled and the output is a single view, metric, or visualization. Data Transformation and Enrichment Data can be enriched for analysis. addresses).

Analytics

Analytics Cost-Benefit Visualization Dashboards

Introducing the HubSpot connector for AWS Glue

AWS Big Data

DECEMBER 2, 2024

More companies have realized there is an opportunity to integrate, enhance, and present this SaaS data to improve internal operations and gain valuable insights on their data. From there, they can perform meaningful analytics, gain valuable insights, and optionally push enriched data back to external SaaS platforms.

Data Lake

Data Lake Testing Data Integration Metadata

Automating Data Warehouses in the Era of AI, Data Products and Data Lakehouses

BI-Survey

MARCH 6, 2025

While efficiency is a priority, data quality and security remain non-negotiable. Developing and maintaining data transformation pipelines are among the first tasks to be targeted for automation. However, caution is advised since accuracy, timeliness, and other aspects of data quality depend on the quality of data pipelines.

Data Warehouse

Data Warehouse Metadata Unstructured Data Data-driven

Unleashing GenAI — Ensuring Data Quality at Scale (Part 2)

Wayne Yaddow

MARCH 28, 2025

First: It is critical to set up a thorough data inventory and assessment procedure. Organizations must do a comprehensive inventory of their current data repositories, recording the data sources, kind, structure, and quality before starting data integration.

Data Quality

Data Quality Data Integration Data Governance Metadata

Hybrid big data analytics with Amazon EMR on AWS Outposts

AWS Big Data

JANUARY 29, 2025

To avoid this situation, Oktank aims to decouple compute from storage, allowing them to scale down compute nodes and repurpose them for other workloads without compromising data integrity and accessibility. The Data Catalog is a centralized metadata repository for all your data assets across various data sources.

Big Data

Big Data Data Analytics Analytics Interactive

“You Complete Me,” said Data Lineage to DataOps Observability.

DataKitchen

JANUARY 23, 2023

To capture a more complete picture of the data’s journey, it is important to have a DataOps Observability system in place. Data lineage is static and often lags by weeks or months. Data lineage is often considered static because it is typically based on snapshots of data and metadata taken at a specific time.

Testing

Testing Data Quality Data Governance Data-driven

Data Leaders Brief

Bridging the gap between mainframe data and hybrid cloud environments

How EUROGATE established a data mesh architecture using Amazon DataZone

Webinars

Trending Sources

Data’s dark secret: Why poor quality cripples AI and growth

Webinars

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

Available Now! Automated Testing for Data Transformations

From Raw Inputs to Polished Outputs: The Art of Testing Data Transformations

Ensuring Data Transformation Quality with dbt Core

What is Data Lineage? Top 5 Benefits of Data Lineage

Biggest Trends in Data Visualization Taking Shape in 2022

The importance of data ingestion and integration for enterprise AI

How Infomedia built a serverless data pipeline with change data capture using AWS Glue and Apache Hudi

Why Enterprise Data Lineage is Critical for the Success of Your Modern Data Stack

How healthcare organizations can analyze and create insights using price transparency data

Fabrics, Meshes & Stacks, oh my! Q&A with Sanjeev Mohan

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

Data Preparation and Data Mapping: The Glue Between Data Management and Data Governance to Accelerate Insights and Reduce Risks

Addressing the Three Scalability Challenges in Modern Data Platforms

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

Choosing A Graph Data Model to Best Serve Your Use Case

Sure, Trust Your Data… Until It Breaks Everything: How Automated Data Lineage Saves the Day

The Modern Data Stack Explained: What The Future Holds

What is Data Mapping?

A Stitch in Time: How Jet Analytics Boosts Microsoft Fabric Time-to-Value

What Is Embedded Analytics?

Introducing the HubSpot connector for AWS Glue

Automating Data Warehouses in the Era of AI, Data Products and Data Lakehouses

Unleashing GenAI — Ensuring Data Quality at Scale (Part 2)

Hybrid big data analytics with Amazon EMR on AWS Outposts

“You Complete Me,” said Data Lineage to DataOps Observability.

Stay Connected