Data Transformation, Metadata and Software

Bridging the gap between mainframe data and hybrid cloud environments

CIO Business Intelligence

FEBRUARY 27, 2025

A high hurdle many enterprises have yet to overcome is accessing mainframe data via the cloud. Connecting mainframe data to the cloud also has financial benefits as it leads to lower mainframe CPU costs by leveraging cloud computing for data transformations. Four key challenges prevent them from doing so: 1.

Metadata

Metadata Data Lake Cost-Benefit Forecasting

Data’s dark secret: Why poor quality cripples AI and growth

CIO Business Intelligence

APRIL 8, 2025

We also examine how centralized, hybrid and decentralized data architectures support scalable, trustworthy ecosystems. As data-centric AI, automated metadata management and privacy-aware data sharing mature, the opportunity to embed data quality into the enterprise's core has never been more significant.

Data Quality

Data Quality Data-driven Key Performance Indicator Metadata

The Ultimate Guide to Modern Data Quality Management (DQM) For An Effective Data Quality Control Driven by The Right Metrics

datapine

SEPTEMBER 29, 2022

He/she assists the organization by providing clarity and insight into advanced data technology solutions. As quality issues are often highlighted with the use of dashboard software, the change manager plays an important role in the visualization of data quality. 2 – Data profiling. How Do You Measure Data Quality?

Data Quality

Data Quality Metrics Data-driven Management

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

AWS Big Data

DECEMBER 16, 2024

With the ability to browse metadata, you can understand the structure and schema of the data source, identify relevant tables and fields, and discover useful data assets you may not be aware of. Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team. He loves exploring different cultures and cuisines.

Visualization

Visualization Data Processing Testing Publishing

Ensuring Data Transformation Quality with dbt Core

Wayne Yaddow

MARCH 14, 2025

How dbt Core helps data teams test, validate, and monitor complex data transformations and conversions. Introduction: dbt Core, an open-source framework for developing, testing, and documenting SQL-based data transformations, has become a must-have tool for modern data teams as the complexity of data pipelines grows.

Data Transformation

Data Transformation Testing Unstructured Data Data Quality

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

AWS Big Data

OCTOBER 30, 2024

Publish data assets – As the data producer from the retail team, you must ingest individual data assets into Amazon DataZone. For this use case, create a data source and import the technical metadata of six data assets (customers, order_items, orders, products, reviews, and shipments) from AWS Glue Data Catalog.

Visualization

Visualization Data Lake Testing Data Governance

Top 6 Benefits of Automating End-to-End Data Lineage

erwin

SEPTEMBER 17, 2020

Data automation reduces the loss of time in collecting, processing and storing large chunks of data because it replaces manual processes (and human errors) with intelligent processes, software and artificial intelligence (AI). Automating data capture frees up resources to focus on more strategic and useful tasks.

Cost-Benefit

Cost-Benefit Data Governance Metadata Reporting

Biggest Trends in Data Visualization Taking Shape in 2022

Smart Data Collective

OCTOBER 13, 2021

There are countless examples of big data transforming many different industries. There is no disputing the fact that the collection and analysis of massive amounts of unstructured data has been a huge breakthrough. Data virtualization is becoming more popular due to its huge benefits. Maximizing customer engagement.

Visualization

Visualization Cost-Benefit Big Data Prescriptive Analytics

Lay the groundwork now for advanced analytics and AI

CIO Business Intelligence

AUGUST 3, 2023

In response, Lenovo launched a new line of entry-level gaming laptops and desktops it now brands as Lenovo LOQ that caters to a new gamer’s first foray into gaming, says Girish Hoogar, global head of engineering for Lenovo’s cloud and software business in its Intelligent Devices Group.

Analytics

Analytics Data Lake Metadata Cost-Benefit

Making OT-IT integration a reality with new data architectures and generative AI

CIO Business Intelligence

FEBRUARY 20, 2024

The data transformation imperative: What Denso and other industry leaders realise is that for IT-OT convergence to be achieved, and the benefits of AI unlocked, data transformation is vital. The company can also unify its knowledge base and promote search and information use that better meets its needs.

Data Architecture

Data Architecture Unstructured Data Manufacturing IT

How Your Finance Team Can Lead Your Enterprise Data Transformation

Alation

OCTOBER 26, 2021

Building a Data Culture Within a Finance Department. Our finance users tell us that their first exposure to the Alation Data Catalog often comes soon after the launch of organization-wide data transformation efforts. After all, finance is one of the greatest consumers of data within a business. Don’t overthink it.

Finance

Finance Data Transformation Enterprise Metrics

Cloudera’s Open Data Lakehouse Supercharged with dbt Core™

Cloudera

OCTOBER 7, 2022

dbt allows data teams to produce trusted data sets for reporting, ML modeling, and operational workflows using SQL, with a simple workflow that follows software engineering best practices like modularity, portability, and continuous integration/continuous delivery (CI/CD). The Open Data Lakehouse. Introduction.

Data Warehouse

Data Warehouse Data Transformation Machine Learning Data Lake

The What & Why of Data Governance

erwin

MARCH 4, 2021

In addition to drivers like digital transformation and compliance, it’s really important to look at the effect of poor data on enterprise efficiency/productivity. The Benefits of erwin Data Intelligence.

Data Governance

Data Governance Digital Transformation Data-driven Cost-Benefit

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

Cloudera

JUNE 30, 2022

We are excited to announce the general availability of Apache Iceberg in Cloudera Data Platform (CDP). Iceberg is a 100% open table format, developed through the Apache Software Foundation, and helps users avoid vendor lock-in. Why integrate Apache Iceberg with Cloudera Data Platform? This is a huge accelerator to adoption.

Data Lake

Data Lake Data Warehouse Data Architecture Metadata

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouse customers (such as users of Amazon Redshift) who want to keep their data transform logic separate from storage and engine.

Data Lake

Data Lake Management Metrics Data Warehouse

Build a data lake with Apache Flink on Amazon EMR

AWS Big Data

JANUARY 27, 2023

With a unified data catalog, you can quickly search datasets and figure out data schema, data format, and location. The AWS Glue Data Catalog provides a uniform repository where disparate systems can store and find metadata to keep track of data in data silos. Refer to Catalogs for more information.

Data Lake

Data Lake Metadata Business Analysis Data-driven

Amazon EMR on EKS widens the performance gap: Run Apache Spark workloads 5.37 times faster and at 4.3 times lower cost

AWS Big Data

APRIL 12, 2023

Metadata store – We use Spark’s in-memory data catalog to store metadata for TPC-DS databases and tables: spark.sql.catalogImplementation is set to the default value in-memory. Her areas of interest are open-source frameworks and automation, data engineering and DataOps. test: EMR release – EMR 6.10.0

Testing

Testing Big Data Metadata Optimization

How to modernize data lakes with a data lakehouse architecture

IBM Big Data Hub

JULY 5, 2023

Data may be stored in its raw original form or optimized into a different format suitable for consumption by specialized engines. Data could be persisted in open data formats, democratizing its consumption, as well as replicated automatically, which helps sustain high availability.

Data Lake

Data Lake Metadata Cost-Benefit Data Warehouse

Empowering data mesh: The tools to deliver BI excellence

erwin

APRIL 16, 2024

In this blog, we’ll delve into the critical role of governance and data modeling tools in supporting a seamless data mesh implementation and explore how erwin tools can be used in that role. erwin also provides data governance, metadata management and data lineage software called erwin Data Intelligence by Quest.

Metadata

Metadata Data Quality Data Governance Modeling

Enhance your analytics embedding experience with the new Amazon QuickSight JavaScript SDK

AWS Big Data

MARCH 9, 2023

break; } } } const frameOptions = { url: ' ', container: document.getElementById("dashboardContainer"), width: "100%", height: "AutoFit", loadingHeight: "200px", withIframePlaceholder: true, onChange: (changeEvent, metadata) => { switch (changeEvent.eventName) { case 'ERROR': { document.getElementById("dashboardContainer").append('Unable

Slice and Dice

Slice and Dice Dashboards Analytics Interactive

How to use foundation models and trusted governance to manage AI workflow risk

IBM Big Data Hub

OCTOBER 16, 2023

It includes processes that trace and document the origin of data, models and associated metadata and pipelines for audits. A data store lets a business connect existing data with new data and discover new insights with real-time analytics and business intelligence. Track models and drive transparent processes.

Risk

Risk Modeling Management Metadata

Empower your Jira data in a data lake with Amazon AppFlow and AWS Glue

AWS Big Data

AUGUST 1, 2023

In the world of software engineering and development, organizations use project management tools like Atlassian Jira Cloud. This post shows you how to use Amazon AppFlow and AWS Glue to create a fully automated data ingestion pipeline that will synchronize your Jira data into your data lake. Choose Update.

Data Lake

Data Lake Data Transformation Data-driven Cost-Benefit

Gain insights from historical location data using Amazon Location Service and AWS analytics services

AWS Big Data

MARCH 13, 2024

You can also use the data transformation feature of Data Firehose to invoke a Lambda function to perform data transformation in batches. Athena is used to run geospatial queries on the location data stored in the S3 buckets. Choose Run.

Analytics

Analytics IoT Metadata Internet of Things
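The Data Firehose excerpt above mentions invoking a Lambda function to transform records in batches. A minimal sketch of such a handler follows, assuming JSON payloads with hypothetical `lat` and `lon` fields and an illustrative rounding step; neither the field names nor the transformation come from the original post. The sketch follows the documented Firehose contract in which each incoming record carries a `recordId` and base64-encoded `data`, and each returned record carries the same `recordId`, a `result` status, and base64-encoded `data`.

```python
import base64
import json


def lambda_handler(event, context):
    """Transform one Firehose batch and return every record with a status."""
    output = []
    for record in event["records"]:
        raw = record["data"]  # base64-encoded payload as delivered by Firehose
        try:
            payload = json.loads(base64.b64decode(raw))
            # Hypothetical transformation: normalise coordinates to 5 decimals.
            payload["lat"] = round(float(payload["lat"]), 5)
            payload["lon"] = round(float(payload["lon"]), 5)
            line = json.dumps(payload) + "\n"  # newline-delimited JSON for S3
            data = base64.b64encode(line.encode("utf-8")).decode("utf-8")
            output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
        except (KeyError, ValueError, TypeError):
            # Hand the untouched payload back and flag it as failed.
            output.append(
                {"recordId": record["recordId"], "result": "ProcessingFailed", "data": raw}
            )
    return {"records": output}
```

Returning every `recordId` exactly once matters here: records that cannot be parsed are marked `ProcessingFailed` rather than dropped silently.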

5 best open source data flow lineage tools

Octopai

AUGUST 11, 2024

By reverse-engineering, parsing, and converting scripts, Octopai seamlessly connects all data points within and across organizational systems. While open-source tools such as Apache Atlas, OpenMetadata, Egeria, Spline, and OpenLineage offer valuable capabilities, they come with their own sets of pros and cons.

Metadata

Metadata Visualization Data Quality Data Governance

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

To ingest the data, smava uses a set of popular third-party customer data platforms complemented by custom scripts. After the data lands in Amazon S3, smava uses the AWS Glue Data Catalog and crawlers to automatically catalog the available data, capture the metadata, and provide an interface that allows querying all data assets.

Data Lake

Data Lake Data Warehouse Data-driven B2B

Databricks’ Data+AI Summit 2022: A Show of Partner “Unity”

Alation

JULY 18, 2022

To ensure you can deliver on this world-changing vision of data, Alation helps you maximize the value of your data lake with integrations to the Unity Catalog. Alation will leverage the Databricks Unity Catalog so users can easily integrate metadata from multiple workspaces, powering discovery, governance, and insights inside Alation.

ROI

ROI Metadata Data Lake Digital Transformation

Tackling AI’s data challenges with IBM databases on AWS

IBM Big Data Hub

MARCH 14, 2024

This involves unifying and sharing a single copy of data and metadata across IBM® watsonx.data™, IBM® Db2®, IBM® Db2® Warehouse and IBM® Netezza®, using native integrations and supporting open formats, all without the need for migration or recataloging.

Cost-Benefit

Cost-Benefit Metadata Optimization Management

Automate discovery of data relationships using ML and Amazon Neptune graph technology

AWS Big Data

APRIL 19, 2023

We took this a step further by creating a blueprint to create smart recommendations by linking similar data products using graph technology and ML. In this post, we showed how an organization can augment a data catalog with additional metadata by using ML and Neptune with an automated process.

Technology

Technology Data-driven Machine Learning Sales

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

Specifically, the system uses Amazon SageMaker Processing jobs to process the data stored in the data lake, employing the AWS SDK for Pandas (previously known as AWS Wrangler) for various data transformation operations, including cleaning, normalization, and feature engineering.

Data Lake

Data Lake Analytics Snapshot Data Quality

Cross-account integration between SaaS platforms using Amazon AppFlow

AWS Big Data

APRIL 25, 2023

Implementing an effective data sharing strategy that satisfies compliance and regulatory requirements is complex. Customers often need to share data between disparate software as a service (SaaS) platforms within their organization or across organizations.

Sales

Sales Visualization Software Metadata

Run Apache Hive workloads using Spark SQL with Amazon EMR on EKS

AWS Big Data

OCTOBER 18, 2023

FINRA centralizes all its data in Amazon Simple Storage Service (Amazon S3) with a remote Hive metastore on Amazon Relational Database Service (Amazon RDS) to manage their metadata information. Melody Yang is a Senior Big Data Solutions Architect for Amazon EMR at AWS. or later installed.

Big Data

Big Data Data Processing Interactive Testing

Exploring the AI and data capabilities of watsonx

IBM Big Data Hub

JULY 17, 2023

IBM software products are embedding watsonx capabilities across digital labor, IT automation, security, sustainability, and application modernization to help unlock new levels of business value for clients. foundation models to help users discover, augment, and enrich data with natural language.

Machine Learning

Machine Learning Data Warehouse Modeling Cost-Benefit

The Modern Data Stack Explained: What The Future Holds

Alation

JANUARY 17, 2023

In this article, I will explain the modern data stack in detail, list some benefits, and discuss what the future holds. What Is the Modern Data Stack? The modern data stack is a combination of various software tools used to collect, process, and store data on a well-integrated cloud-based data platform.

Data Warehouse

Data Warehouse Cost-Benefit Data Science Data Transformation

Data Mesh 101: How Data Mesh Helps Organizations Be Data-Driven and Achieve Velocity

Ontotext

FEBRUARY 12, 2024

For many organizations, a centralized data platform will fall short as it gives data teams much less autonomy over managing increasingly diverse and voluminous datasets. Centralized teams also adopted an auditing mechanism to verify data accuracy and adherence to SLAs and to ensure data quality. Intuit, a U.S.

Data-driven

Data-driven Data Lake Data Quality Business Objectives

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

AWS Big Data

JANUARY 17, 2024

Incremental query refers to a query strategy that focuses on processing and analyzing only the new or updated data within a data lake since the last query. The key idea behind incremental queries is to use metadata or change tracking mechanisms to identify the new or modified data since the last query.

Data Lake

Data Lake Snapshot Big Data Data-driven
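The incremental query idea described above, reading only what changed since the last query, can be illustrated with a small sketch. It assumes a simplified commit log in which each commit records a timestamp and the files it added; the `Commit` class and `incremental_files` function are hypothetical names for illustration and do not reflect the metadata layout of any particular open table format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Commit:
    """Hypothetical change-tracking entry: when it landed and what it added."""
    commit_time: datetime
    files: tuple


def incremental_files(commits, last_checkpoint):
    """Return files committed after last_checkpoint and the next checkpoint."""
    new_commits = sorted(
        (c for c in commits if c.commit_time > last_checkpoint),
        key=lambda c: c.commit_time,
    )
    files = [f for c in new_commits for f in c.files]
    # If nothing is new, the checkpoint stays where it was.
    next_checkpoint = max((c.commit_time for c in new_commits), default=last_checkpoint)
    return files, next_checkpoint


log = [
    Commit(datetime(2024, 1, 1, tzinfo=timezone.utc), ("a.parquet",)),
    Commit(datetime(2024, 1, 2, tzinfo=timezone.utc), ("b.parquet", "c.parquet")),
    Commit(datetime(2024, 1, 3, tzinfo=timezone.utc), ("d.parquet",)),
]
files, checkpoint = incremental_files(log, datetime(2024, 1, 1, tzinfo=timezone.utc))
print(files)  # only files from commits after the checkpoint
```

Persisting the returned checkpoint between runs is what keeps each query from rescanning data it has already processed.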

Optimize data layout by bucketing with Amazon Athena and AWS Glue to accelerate downstream queries

AWS Big Data

APRIL 25, 2024

Alternatively, you can use AWS Glue for Apache Spark, which provides built-in support for bucketing configurations during the data transformation process. AWS Glue allows you to define bucketing parameters, such as the number of buckets and the columns to bucket on, providing an optimized data layout for efficient querying with Athena.

Optimization

Optimization Data Lake Cost-Benefit Reporting
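The bucketing parameters mentioned above (a bucket count and a bucketing column) amount to hashing the column value and taking the remainder. The sketch below shows that idea only; it uses `zlib.crc32` as a stand-in hash, which is an assumption for illustration and is not the hash function that Athena, Hive, or Spark apply, so its bucket numbers will not match files written by those engines. The function names are hypothetical.

```python
import zlib
from collections import defaultdict


def bucket_for(value, num_buckets):
    """Map a column value to a bucket index in [0, num_buckets)."""
    # crc32 is a deterministic stand-in; real engines use their own hash.
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


def bucketize(rows, column, num_buckets):
    """Group rows so that equal values of `column` share one bucket."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_for(row[column], num_buckets)].append(row)
    return dict(buckets)


rows = [
    {"customer_id": "c1", "amount": 10},
    {"customer_id": "c2", "amount": 25},
    {"customer_id": "c1", "amount": 7},
]
layout = bucketize(rows, "customer_id", num_buckets=4)
```

Because every row with the same `customer_id` lands in the same bucket, a query filtering on that column only needs to read one bucket instead of the whole dataset.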

What Is Embedded Analytics?

Jet Global

MAY 1, 2023

This is in contrast to traditional BI, which extracts insight from data outside of the app. Commercial vs. Internal Apps: Any organization that develops or deploys a software application often needs to embed analytics inside its application. These capabilities are to be made available inside the applications people use every day.

Analytics

Analytics Cost-Benefit Visualization Dashboards

What is Data Mapping?

Jet Global

FEBRUARY 23, 2024

This field guide to data mapping will explore how data mapping connects volumes of data for enhanced decision-making. Why Data Mapping is Important: Data mapping is a critical element of any data management initiative, such as data integration, data migration, data transformation, data warehousing, or automation.

Data Warehouse

Data Warehouse Reporting Data Transformation Visualization

Partners in Innovation: Voice of the Customer Enhancements to Logi Symphony

Jet Global

JULY 19, 2023

Data Connectivity Enhancements: Data and content authors are the first users in the app building infrastructure and content. It is important for our customers to access advanced connectors and data transformation features so they can build a robust data layer.

Dashboards

Dashboards Visualization Reporting Interactive

A Stitch in Time: How Jet Analytics Boosts Microsoft Fabric Time-to-Value

Jet Global

MARCH 14, 2024

Data Lineage and Documentation: Jet Analytics simplifies the process of documenting data assets and tracking data lineage in Fabric. It offers a transparent and accurate view of how data flows through the system, ensuring robust compliance.

Analytics

Analytics Management Reporting Data Quality

Automating Data Warehouses in the Era of AI, Data Products and Data Lakehouses

BI-Survey

MARCH 6, 2025

While efficiency is a priority, data quality and security remain non-negotiable. Developing and maintaining data transformation pipelines are among the first tasks to be targeted for automation. However, caution is advised since accuracy, timeliness, and other aspects of data quality depend on the quality of data pipelines.

Data Warehouse

Data Warehouse Metadata Unstructured Data Data-driven

Streamline AWS WAF log analysis with Apache Iceberg and Amazon Data Firehose

AWS Big Data

FEBRUARY 18, 2025

These include managing complex extract, transform, and load (ETL) processes, handling schema validation, providing reliable delivery, and maintaining custom code for data transformations. Firehose delivers streaming data with configurable buffering options that can be optimized for near-zero latency.

Snapshot

Snapshot Optimization Data Lake Metadata

Introducing the HubSpot connector for AWS Glue

AWS Big Data

DECEMBER 2, 2024

Most companies have adopted a diverse set of software as a service (SaaS) platforms to support various applications. The rapid adoption has enabled them to quickly streamline operations, enhance collaboration, and gain more accessible, scalable solutions for managing their critical data and workflows.

Data Lake

Data Lake Testing Data Integration Metadata

“You Complete Me,” said Data Lineage to DataOps Observability.

DataKitchen

JANUARY 23, 2023

Data lineage is typically stored in separate systems from the data itself and can be difficult to keep up-to-date. Five on DataOps Observability: DataOps Observability is the ability to understand the state and behavior of data and the software and hardware that carries and transforms it as it flows through systems.

Testing

Testing Data Governance Data Quality Data-driven
