These data processing and analytical services support Structured Query Language (SQL) to interact with the data. Writing SQL queries requires not just remembering SQL syntax rules but also knowing the table metadata: the table schemas, the relationships among tables, and the possible column values.
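As a rough illustration of why table metadata matters when writing SQL, the sketch below (using SQLite and a made-up orders table, not anything from the article) inspects a table's schema before composing a query against it:

```python
import sqlite3

# Hypothetical example: inspect table metadata before writing a query against it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")

# Schema metadata: column names, types, and primary-key flags.
for cid, name, col_type, notnull, default, pk in conn.execute("PRAGMA table_info(orders)"):
    print(f"{name}: {col_type} (pk={bool(pk)})")

# With the schema known, composing the SQL itself is the easy part.
rows = conn.execute(
    "SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id"
).fetchall()
```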
Data collections are the ones and zeroes that encode the actionable insights (patterns, trends, relationships) that we seek to extract from our data through machine learning and data science. Source: [link] SAP also announced key partners that further enhance Datasphere as a powerful business data fabric.
This middleware consists of custom code that runs data flows to stitch data transformations, search queries, and AI enrichments in varying combinations tailored to use cases, datasets, and requirements. Ingest flows are created to enrich data as it is added to an index. An index is constructed from the processed documents.
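A toy sketch of the idea (the enrichment step and the in-memory index below are illustrative stand-ins, not the middleware described above):

```python
from collections import defaultdict

def enrich(doc: dict) -> dict:
    # Example enrichment step applied as data is ingested.
    doc = dict(doc)
    doc["word_count"] = len(doc.get("text", "").split())
    return doc

def ingest(docs, index):
    # Ingest flow: enrich each document, then build an index from the processed documents.
    for doc in docs:
        enriched = enrich(doc)
        for token in enriched["text"].lower().split():
            index[token].append(enriched["id"])
    return index

index = ingest([{"id": 1, "text": "data flows stitch transformations"}], defaultdict(list))
print(dict(index))
```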
We also examine how centralized, hybrid and decentralized data architectures support scalable, trustworthy ecosystems. As data-centric AI, automated metadata management and privacy-aware data sharing mature, the opportunity to embed data quality into the enterprise's core has never been more significant.
For example, automatically importing mappings from developers’ Excel sheets, flat files, Access and ETL tools into a comprehensive mappings inventory, complete with auto-generated and meaningful documentation of the mappings, is a powerful way to support overall data governance. Data quality is crucial to every organization.
This person (or group of individuals) ensures that the theory behind data quality is communicated to the development team. 2 – Data profiling. Data profiling is an essential process in the DQM lifecycle. These processes could include reports, campaigns, or financial documentation.
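For a sense of what data profiling involves in practice, here is a minimal sketch with pandas and hypothetical column names:

```python
import pandas as pd

# Rough sketch of typical profiling checks; the columns are made-up examples.
df = pd.DataFrame({"customer_id": [1, 2, 2, None], "country": ["US", "DE", "DE", "US"]})

profile = {
    "row_count": len(df),
    "null_counts": df.isna().sum().to_dict(),       # completeness
    "distinct_counts": df.nunique().to_dict(),      # cardinality
    "duplicate_rows": int(df.duplicated().sum()),   # uniqueness
}
print(profile)
```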
How dbt Core helps data teams test, validate, and monitor complex data transformations and conversions. Introduction: dbt Core, an open-source framework for developing, testing, and documenting SQL-based data transformations, has become a must-have tool for modern data teams as the complexity of data pipelines grows.
This is where metadata, or the data about data, comes into play. Having a data catalog is the cornerstone of your data governance strategy, but what supports your data catalog? Your metadata management framework provides the underlying structure that makes your data accessible and manageable.
An understanding of the data’s origins and history helps answer questions about the origin of data in Key Performance Indicator (KPI) reports, including: How are the report tables and columns defined in the metadata? Who are the data owners? What are the transformation rules? Data Governance.
Publish data assets – As the data producer from the retail team, you must ingest individual data assets into Amazon DataZone. For this use case, create a data source and import the technical metadata of the data assets—customers, order_items, orders, products, reviews, and shipments—from the AWS Glue Data Catalog.
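As a hedged sketch of pulling technical metadata for assets like these from the AWS Glue Data Catalog with boto3 (the database name below is a placeholder, not from the post):

```python
import boto3

glue = boto3.client("glue")

# List tables and their columns from a Glue Data Catalog database ("retail_db" is hypothetical).
response = glue.get_tables(DatabaseName="retail_db")
for table in response["TableList"]:
    columns = [c["Name"] for c in table["StorageDescriptor"]["Columns"]]
    print(table["Name"], columns)
```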
Selecting the strategies and tools for validating data transformations and data conversions in your data pipelines. Introduction: Data transformations and data conversions are crucial to ensure that raw data is organized, processed, and ready for useful analysis.
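A minimal sketch of what such validation can look like, assuming a small transformed table with hypothetical columns:

```python
import pandas as pd

# Illustrative post-transformation checks: uniqueness, completeness, and a value-range rule.
transformed = pd.DataFrame({"order_id": [1, 2, 3], "amount_usd": [10.0, 25.5, 7.25]})

assert transformed["order_id"].is_unique, "order_id must be unique"
assert transformed["order_id"].notna().all(), "order_id must not be null"
assert (transformed["amount_usd"] >= 0).all(), "amounts must be non-negative"
```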
Data analysts and engineers use dbt to transform, test, and document data in the cloud data warehouse. Yet every dbt transformation contains vital metadata that is not captured – until now. Data Transformation in the Modern Data Stack. How did the data transform, exactly?
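One common way to get at that metadata is dbt's compilation artifact, target/manifest.json; the sketch below assumes the commonly documented manifest layout, so treat the keys as assumptions and adjust for your dbt version:

```python
import json
from pathlib import Path

# Read dbt's manifest and print each model's upstream dependencies.
manifest = json.loads(Path("target/manifest.json").read_text())

for node_id, node in manifest["nodes"].items():
    if node["resource_type"] == "model":
        print(node["name"], "depends on", node["depends_on"]["nodes"])
```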
Solution overview: The following diagram illustrates the solution architecture. The solution uses AWS Glue as an ETL engine to extract data from the source Amazon RDS database. Built-in data transformations then scrub columns containing PII using pre-defined masking functions. This saves time over manually defining schemas.
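The sketch below is not the Glue job from the post, just a generic illustration of masking a PII column with a one-way hash so non-PII columns stay usable:

```python
import hashlib
import pandas as pd

# Hypothetical data: mask the PII column, leave the rest untouched.
df = pd.DataFrame({"email": ["a@example.com", "b@example.com"], "total": [10, 20]})

def mask(value: str) -> str:
    # One-way hash keeps values joinable downstream without exposing the original PII.
    return hashlib.sha256(value.encode()).hexdigest()[:12]

df["email"] = df["email"].map(mask)
print(df)
```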
Building a Data Culture Within a Finance Department. Our finance users tell us that their first exposure to the Alation Data Catalog often comes soon after the launch of organization-wide data transformation efforts. After all, finance is one of the greatest consumers of data within a business.
We’re excited to announce the general availability of the open source adapters for dbt for all the engines in CDP — Apache Hive, Apache Impala, and Apache Spark, with added support for Apache Livy and Cloudera Data Engineering. The Open Data Lakehouse. Cloudera builds dbt adapters for all engines in the open data lakehouse.
dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by customers of data warehouses (such as Amazon Redshift) who are looking to keep their data transform logic separate from storage and engine.
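As a loose illustration of SQL-first templating (plain Jinja here rather than dbt's own ref() and macros), a parameterized transform rendered to SQL:

```python
from jinja2 import Template

# Render a parameterized SQL transform; the table and date are placeholder values.
sql = Template("""
SELECT customer_id, SUM(amount) AS total_amount
FROM {{ source_table }}
WHERE order_date >= '{{ start_date }}'
GROUP BY customer_id
""").render(source_table="raw.orders", start_date="2024-01-01")

print(sql)  # the rendered SQL can then be submitted to whichever engine you use
```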
It includes processes that trace and document the origin of data, models and associated metadata and pipelines for audits. A data store lets a business connect existing data with new data and discover new insights with real-time analytics and business intelligence. Increase trust in AI outcomes.
The entire generative AI pipeline hinges on the data pipelines that empower it, making it imperative to take the correct precautions. Four key components ensure reliable data ingestion. Data quality and governance: Data quality means ensuring the security of data sources, maintaining holistic data, and providing clear metadata.
This can be done using the initiatePrint action: embeddedDashboard.initiatePrint(); The following code sample shows a loading animation, SDK code status, and dashboard interaction monitoring, along with initiating dashboard print from the application.
Organizations have spent a lot of time and money trying to harmonize data across diverse platforms, including cleansing, uploading metadata, converting code, defining business glossaries, tracking data transformations and so on. And there’s control of that landscape to facilitate insight and collaboration and limit risk.
You can also use the data transformation feature of Data Firehose to invoke a Lambda function to perform data transformation in batches. Athena is used to run geospatial queries on the location data stored in the S3 buckets. Choose Run.
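A hedged sketch of such a transformation Lambda; the record and response shape follows the Firehose transformation contract, while the enrichment itself is a made-up example:

```python
import base64
import json

def lambda_handler(event, context):
    # Firehose delivers a batch of records; each must be returned with the same recordId.
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        payload["processed"] = True  # example transformation
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(json.dumps(payload).encode()).decode(),
        })
    return {"records": output}
```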
Developers need to onboard new data sources, chain multiple data transformation steps together, and explore data as it travels through the flow. With NiFi you can configure your source processor and run it independently of any other processors to retrieve data. Enabling self-service for developers.
This allows for a new way of thinking and new organizational elements—namely, a modern data community. However, today’s data mesh platform contains largely independent data products. Even with well-documented data products, knowing how to connect or join data products is a time-consuming job.
A critical feature for every developer, however, is to get instantaneous feedback like configuration validations or performance metrics, as well as previewing data transformations for each step of their data flow. Attributes contain key metadata like the source directory of a file or the source topic of a Kafka message.
Accurate data lineage rebuilt trust among decision-makers. HealthCo’s leadership could confidently rely on data-driven insights, knowing the data’s journey was well-documented and reliable. This trust empowered them to deploy new data products without fear of inaccuracies, driving innovation and operational improvements.
A well-governed data landscape enables data users in the public sector to better understand the driving forces and needs to support public policy – and measure impact once a change is made. Efficient Access To Data. Citizens, companies, and government employees need access to data and documents.
These encoder-only architecture models are fast and effective for many enterprise NLP tasks, such as classifying customer feedback and extracting information from large documents. While they require task-specific labeled data for fine-tuning, they also offer clients the best cost-performance trade-off for non-generative use cases.
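As an illustrative example of an encoder-only model on a non-generative task (the specific model choice here is an assumption, not from the article):

```python
from transformers import pipeline

# Sentiment classification of customer feedback with an encoder-only (DistilBERT) model.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("The new billing portal is confusing and slow."))
# e.g. [{'label': 'NEGATIVE', 'score': 0.99}]
```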
It’s for that reason that even as the first BCBS-239 implementation deadline came into effect a few years ago, McKinsey reported that one-third of Global Systemically Important Banks had focused on “documenting data lineage up to the level of provisioning data elements and including data transformation.”
These help data analysts visualize key insights that can help you make better data-backed decisions. ELT Data Transformation Tools: ELT data transformation tools are used to extract, load, and transform your data. Examples of data transformation tools include dbt and dataform.
This field guide to data mapping will explore how data mapping connects volumes of data for enhanced decision-making. Why Data Mapping is Important: Data mapping is a critical element of any data management initiative, such as data integration, data migration, data transformation, data warehousing, or automation.
Modern Data Sources: Painlessly connect with modern data sources such as streaming, search, big data, NoSQL, cloud, and document-based sources. Quickly link all your data from Amazon Redshift, MongoDB, Hadoop, Snowflake, Apache Solr, Elasticsearch, Impala, and more.
This straightforward and user-friendly access to source data makes it easier for your business users to examine and extract insights from your core data systems. Data Lineage and Documentation: Jet Analytics simplifies the process of documenting data assets and tracking data lineage in Fabric.
While enabling organization-wide efficiency, the team also applied these principles to the data architecture, making sure that CLEA itself operates frugally. After evaluating various tools, we built a serverless data transformation pipeline using Amazon Athena and dbt. The Source stage maintains raw data in its original form.
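A hedged sketch of submitting a transformation query to Athena with boto3; the bucket, database, and query below are placeholders rather than details of CLEA:

```python
import boto3

athena = boto3.client("athena")

# Submit a serverless SQL transformation and capture its execution ID for polling.
response = athena.start_query_execution(
    QueryString="SELECT account_id, SUM(cost) AS total_cost FROM raw_costs GROUP BY account_id",
    QueryExecutionContext={"Database": "cost_analytics"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)
print(response["QueryExecutionId"])
```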
While efficiency is a priority, data quality and security remain non-negotiable. Developing and maintaining data transformation pipelines are among the first tasks to be targeted for automation. However, caution is advised since accuracy, timeliness, and other aspects of data quality depend on the quality of data pipelines.
DataOps Observability includes monitoring and testing the data pipeline, data quality, data testing, and alerting. Data testing is an essential aspect of DataOps Observability; it helps to ensure that data is accurate, complete, and consistent with its specifications, documentation, and end-user requirements.