Writing SQL queries requires not just remembering SQL syntax rules, but also knowledge of the tables' metadata: data about table schemas, relationships among the tables, and possible column values. We discuss the challenges in maintaining that metadata as well as ways to overcome those challenges and enrich the metadata.
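As a concrete illustration of the kind of metadata involved, here is a minimal sketch that pulls table and column definitions from information_schema in a PostgreSQL-compatible database; the connection string, schema name, and output structure are placeholders, not part of the original discussion.

```python
# A minimal sketch of collecting table metadata to support SQL query writing.
# Assumes a PostgreSQL-compatible database reachable via psycopg2; the DSN and
# schema name below are hypothetical.
import psycopg2

def collect_table_metadata(conn, schema="public"):
    """Return {table: [(column, data_type), ...]} from information_schema."""
    query = """
        SELECT table_name, column_name, data_type
        FROM information_schema.columns
        WHERE table_schema = %s
        ORDER BY table_name, ordinal_position
    """
    metadata = {}
    with conn.cursor() as cur:
        cur.execute(query, (schema,))
        for table_name, column_name, data_type in cur.fetchall():
            metadata.setdefault(table_name, []).append((column_name, data_type))
    return metadata

if __name__ == "__main__":
    conn = psycopg2.connect("dbname=analytics user=reader")  # hypothetical DSN
    for table, columns in collect_table_metadata(conn).items():
        print(table, columns)
```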
Data collections are the ones and zeroes that encode the actionable insights (patterns, trends, relationships) that we seek to extract from our data through machine learning and data science. The insights are used to produce informative content for stakeholders (decision-makers, business users, and clients).
According to a study from Rocket Software and Foundry, 76% of IT decision-makers say challenges around accessing mainframe data and contextual metadata are a barrier to mainframe data usage, while 64% view integrating mainframe data with cloud data sources as the primary challenge.
For container terminal operators, data-driven decision-making and efficient data sharing are vital to optimizing operations and boosting supply chain efficiency. In addition to real-time analytics and visualization, the data needs to be shared for long-term data analytics and machine learning applications.
Institutional Data & AI Platform architecture: The Institutional Division has implemented a self-service data platform to enable the domain teams to build and manage data products autonomously. The following diagram illustrates the building blocks of the Institutional Data & AI Platform.
We also examine how centralized, hybrid and decentralized data architectures support scalable, trustworthy ecosystems. As data-centric AI, automated metadata management and privacy-aware data sharing mature, the opportunity to embed data quality into the enterprise's core has never been more significant.
These innovations run AI search flows to uncover relevant information through semantic, cross-language, and content understanding; adapt information ranking to individual behaviors; and enable guided conversations to pinpoint answers. Ingest flows are created to enrich data as it's added to an index.
However, all good things come with challenges, and businesses often struggle to manage their information correctly. Oftentimes, the data being collected and used is incomplete or damaged, leading to many other issues that can considerably harm the company. Enter data quality management.
In addition to using native managed AWS services that BMS didn't need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.
Traditionally, such a legacy call center analytics platform would be built on a relational database that stores data from streaming sources. Data transformations through stored procedures and the use of materialized views to curate datasets and generate insights are a known pattern with relational databases.
Collecting and using data to make informed decisions is the new foundation for businesses. The key term here is usable: Anyone can be data rich and collect vast troves of data. This is where metadata, or the data about data, comes into play. What is a Metadata Management Framework?
Selecting the strategies and tools for validating data transformations and data conversions in your data pipelines. Data transformations and data conversions are crucial to ensure that raw data is organized, processed, and ready for useful analysis.
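As a hedged illustration of what such validation can look like, the sketch below runs a few generic checks (row counts, required columns, type conversions) on pandas DataFrames; the column names and thresholds are assumptions, not a prescribed toolset.

```python
# A minimal sketch of post-transformation validation checks, assuming the data
# fits in memory as pandas DataFrames; column names are illustrative.
import pandas as pd

def validate_transformation(source: pd.DataFrame, target: pd.DataFrame) -> list[str]:
    """Return a list of human-readable validation failures (empty if all pass)."""
    failures = []
    # Row-count reconciliation: the transform should not silently drop rows.
    if len(target) != len(source):
        failures.append(f"row count changed: {len(source)} -> {len(target)}")
    # Null checks on columns expected to be fully populated after conversion.
    for column in ("customer_id", "order_date"):
        if column in target.columns and target[column].isna().any():
            failures.append(f"nulls found in required column '{column}'")
    # Conversion check: amounts should be numeric after the conversion step.
    if "amount" in target.columns and not pd.api.types.is_numeric_dtype(target["amount"]):
        failures.append("column 'amount' is not numeric after conversion")
    return failures
```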
Data analysts and engineers use dbt to transform, test, and document data in the cloud data warehouse. Yet every dbt transformation contains vital metadata that is not captured – until now. Data Transformation in the Modern Data Stack. How did the data transform exactly?
Duplicating data from a production database to a lower or lateral environment and masking personally identifiable information (PII) to comply with regulations enables development, testing, and reporting without impacting critical systems or exposing sensitive customer data. See AWS Glue: How it works for further details.
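A minimal sketch of the masking idea follows, assuming the copied data is staged as a pandas DataFrame; the PII column list and the hashing scheme are illustrative only, and real masking should follow your approved compliance policy.

```python
# A hedged sketch of masking PII while copying data to a lower environment.
# Column names and the hashing scheme are assumptions for illustration.
import hashlib
import pandas as pd

PII_COLUMNS = ["email", "phone_number", "ssn"]  # hypothetical PII columns

def mask_value(value: str) -> str:
    """Deterministically pseudonymize a value so joins still work downstream."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]

def mask_pii(df: pd.DataFrame) -> pd.DataFrame:
    masked = df.copy()
    for column in PII_COLUMNS:
        if column in masked.columns:
            masked[column] = masked[column].astype(str).map(mask_value)
    return masked
```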
There are countless examples of big data transforming many different industries. There is no disputing the fact that the collection and analysis of massive amounts of unstructured data has been a huge breakthrough. Data Virtualization allows accessing them from a single point, replicating them only when strictly necessary.
An understanding of the data's origins and history helps answer questions about the origin of data in Key Performance Indicator (KPI) reports, including: How are the report tables and columns defined in the metadata? Who are the data owners? What are the transformation rules? Data Governance.
It’s a set of HTTP endpoints to perform operations such as invoking Directed Acyclic Graphs (DAGs), checking task statuses, retrieving metadata about workflows, managing connections and variables, and even initiating dataset-related events, without directly accessing the Airflow web interface or command line tools.
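For illustration, here is a hedged sketch of calling two of those endpoints from Python against the Airflow 2.x stable REST API; the base URL, credentials, and DAG id are placeholders, and the authentication scheme depends on how your Airflow deployment is configured.

```python
# A minimal sketch of triggering a DAG and checking task statuses through the
# Airflow stable REST API. Endpoint base, auth, and DAG id are hypothetical.
import requests

BASE_URL = "https://airflow.example.com/api/v1"   # hypothetical endpoint
AUTH = ("api_user", "api_password")               # hypothetical credentials

def trigger_dag(dag_id: str, conf: dict) -> str:
    resp = requests.post(f"{BASE_URL}/dags/{dag_id}/dagRuns",
                         json={"conf": conf}, auth=AUTH, timeout=30)
    resp.raise_for_status()
    return resp.json()["dag_run_id"]

def task_states(dag_id: str, dag_run_id: str) -> dict:
    resp = requests.get(f"{BASE_URL}/dags/{dag_id}/dagRuns/{dag_run_id}/taskInstances",
                        auth=AUTH, timeout=30)
    resp.raise_for_status()
    return {ti["task_id"]: ti["state"] for ti in resp.json()["task_instances"]}

run_id = trigger_dag("daily_ingest", {"run_date": "2024-01-01"})
print(task_states("daily_ingest", run_id))
```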
But before consolidating the required data, Lenovo had to overcome concerns around sharing potentially sensitive information. Hoogar’s staff helped relieve such fears by educating employees that information included in the solution, such as notices of bug fixes or software updates, was already public.
Replace manual and recurring tasks for fast, reliable data lineage and overall data governance. It’s paramount that organizations understand the benefits of automating end-to-end data lineage. Critically, it makes it easier to get a clear view of how information is created and flows into, across and outside an enterprise.
For more information on this foundation, refer to A Detailed Overview of the Cost Intelligence Dashboard. It seamlessly consolidates data from various data sources within AWS, including AWS Cost Explorer (and forecasting with Cost Explorer ), AWS Trusted Advisor , and AWS Compute Optimizer.
You can see the decompressed data has metadata information such as logGroup, logStream, and subscriptionFilters, and the actual data is included within the message field under logEvents (the following example shows CloudTrail events in CloudWatch Logs).
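As a hedged sketch of working with that payload, the snippet below base64-decodes and gunzips a subscription record and reads the logGroup, logStream, subscriptionFilters, and logEvents fields; the sample payload is fabricated purely for illustration.

```python
# A minimal sketch of decoding a CloudWatch Logs subscription record, which
# arrives base64-encoded and gzip-compressed. The sample payload is fabricated.
import base64
import gzip
import json

def decode_cloudwatch_record(b64_payload) -> dict:
    decompressed = gzip.decompress(base64.b64decode(b64_payload))
    return json.loads(decompressed)

# Build a fake record so the sketch is self-contained.
sample = {
    "logGroup": "CloudTrail/DefaultLogGroup",
    "logStream": "123456789012_CloudTrail_us-east-1",
    "subscriptionFilters": ["example-filter"],
    "logEvents": [{"id": "1", "timestamp": 0, "message": "{\"eventName\": \"ExampleEvent\"}"}],
}
encoded = base64.b64encode(gzip.compress(json.dumps(sample).encode("utf-8")))

payload = decode_cloudwatch_record(encoded)
print(payload["logGroup"], payload["logStream"], payload["subscriptionFilters"])
for event in payload["logEvents"]:
    print(event["timestamp"], event["message"])
```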
With a unified data catalog, you can quickly search datasets and figure out data schema, data format, and location. The AWS Glue Data Catalog provides a uniform repository where disparate systems can store and find metadata to keep track of data in data silos. Refer to Catalogs for more information.
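A minimal sketch of that lookup with boto3 follows, assuming a hypothetical database and table name; the search text and field access follow the Data Catalog's table structure.

```python
# A hedged sketch of finding a table in the AWS Glue Data Catalog and reading
# its schema, format, and location. Database and table names are placeholders.
import boto3

glue = boto3.client("glue")

# Free-text search across the catalog for candidate tables.
results = glue.search_tables(SearchText="orders")
for table in results["TableList"]:
    print(table["DatabaseName"], table["Name"])

# Fetch schema, format, and S3 location for one table.
table = glue.get_table(DatabaseName="sales_db", Name="orders")["Table"]
storage = table["StorageDescriptor"]
print(storage["Location"])                       # e.g. an s3:// path
print(storage.get("InputFormat"))                # storage/input format
print([c["Name"] for c in storage["Columns"]])   # column names
```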
Building a Data Culture Within a Finance Department. Our finance users tell us that their first exposure to the Alation Data Catalog often comes soon after the launch of organization-wide data transformation efforts. After all, finance is one of the greatest consumers of data within a business. Don't overthink it.
The company can also unify its knowledge base and promote search and information use that better meets its needs. The data transformation imperative: What Denso and other industry leaders realise is that for IT-OT convergence to be realised, and the benefits of AI unlocked, data transformation is vital.
But when IT-driven data management and business-oriented data governance work together in terms of personnel, processes and technology, decisions can be made and their impacts determined based on a full inventory of reliable information. Virginia residents also would be able to opt out of data collection.
This feature enables users to save calculations from a Tableau dashboard directly to Tableau’s metrics layer so they can monitor and track the information over time. It could tell the user whether the data is trending in a positive direction or what’s driving a trend, for instance. Metrics Bootstrapping.
In fact, the LIBOR transition program marks one of the largest data transformation obstacles ever seen in financial services. Building an inventory of what will be affected is a huge undertaking across all of the data, reports, and structures that must be accounted for. Automated Data Lineage for Your LIBOR Project.
Metadata store – We use Spark's in-memory data catalog to store metadata for TPC-DS databases and tables; spark.sql.catalogImplementation is set to the default value, in-memory. The Amazon EMR on EKS uplift calculation is based on the hourly billing information provided by AWS Cost Explorer. test: EMR release – EMR 6.10.0
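For illustration, here is a minimal PySpark sketch of pinning that setting when building the session; the application name is a placeholder.

```python
# A minimal sketch of setting the Spark catalog implementation explicitly,
# as described above; the app name is illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("tpcds-benchmark")
    # Use Spark's transient in-memory catalog instead of an external Hive metastore.
    .config("spark.sql.catalogImplementation", "in-memory")
    .getOrCreate()
)
print(spark.conf.get("spark.sql.catalogImplementation"))
```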
The platform converges data cataloging, data ingestion, data profiling, data tagging, data discovery, and data exploration into a unified platform, driven by metadata. Modak Nabu automates repetitive tasks in the data preparation process and thus accelerates the data preparation by 4x.
The QuickSight SDK v2.0 offers increased visibility by providing new experience-specific information and warnings within the SDK. Additionally, SDK v2.0 supports a new callback onChange, which returns eventNames along with corresponding eventCodes to indicate errors, warnings, or information from the SDK.
By combining historical vehicle location data with information from other sources, the company can devise empirical approaches for better decision-making. For example, the company’s procurement team can use this information to make decisions about which vehicles to prioritize for replacement before policy changes go into effect.
Under the Transparency in Coverage (TCR) rule, hospitals and payors are required to publish their pricing data in a machine-readable format. With this move, patients can compare prices between different hospitals and make informed healthcare decisions. The Data Catalog now contains references to the machine-readable data.
Now, joint users will get an enhanced view into cloud and data transformations, with valuable context to guide smarter usage. At the heart of this release is the need to empower people with the right information at the right time. To build effective data pipelines, they need context (or metadata) on every source.
The entire generative AI pipeline hinges on the data pipelines that empower it, making it imperative to take the correct precautions. 4 key components to ensure reliable data ingestion. Data quality and governance: Data quality means ensuring the security of data sources, maintaining holistic data and providing clear metadata.
OntoRefine is a data transformation tool that lets you unite plenty of data formats and get them into your triplestore. Now that the data is in the database, we can start benefiting from the RDF technology's strengths. One of the core upsides of storing your data in that format is inference.
In this blog, we’ll delve into the critical role of governance and data modeling tools in supporting a seamless data mesh implementation and explore how erwin tools can be used in that role. erwin also provides data governance, metadata management and data lineage software called erwin Data Intelligence by Quest.
We chatted about industry trends, why decentralization has become a hot topic in the data world, and how metadata drives many data-centric use cases. But, through it all, Mohan says it’s critical to view everything through the same lens: gaining business value from data. Data fabric is a technology architecture.
Developers need to onboard new data sources, chain multiple data transformation steps together, and explore data as it travels through the flow. The side panel is context-sensitive and instantly displays relevant configuration information as you navigate through your flow components.
We use Valentine, a data science algorithm for comparing datasets, to improve data product recommendations. Neptune, the managed AWS graph database service, stores information about explicit connections between datasets, improving the recommendations. This reduces the time to discover, analyze, and create new insights.
Specifically, the system uses Amazon SageMaker Processing jobs to process the data stored in the data lake, employing the AWS SDK for Pandas (previously known as AWS Wrangler) for various data transformation operations, including cleaning, normalization, and feature engineering.
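A hedged sketch of such a transformation step with awswrangler (the AWS SDK for pandas) follows; the S3 paths and column names are placeholders, not the system's actual layout.

```python
# A minimal sketch of a cleaning/normalization step with awswrangler, of the
# kind that could run inside a SageMaker Processing job. Paths and column
# names are hypothetical.
import awswrangler as wr

df = wr.s3.read_parquet(path="s3://example-data-lake/raw/events/")

# Cleaning: drop duplicates and rows missing the key identifier.
df = df.drop_duplicates().dropna(subset=["event_id"])

# Normalization: scale a numeric feature to the [0, 1] range.
df["amount_norm"] = (df["amount"] - df["amount"].min()) / (df["amount"].max() - df["amount"].min())

wr.s3.to_parquet(df=df, path="s3://example-data-lake/processed/events/", dataset=True)
```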
dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouse customers (such as Amazon Redshift) who are looking to keep their data transform logic separate from storage and engine.
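As a loosely hedged illustration of keeping transform logic in a model file, below is a skeleton of a dbt Python model; note that Python models are only supported on certain warehouse adapters (for example Snowflake, Databricks, and BigQuery), and the model and reference names here are hypothetical.

```python
# A hedged sketch of a dbt Python model file (e.g. models/stg_orders.py).
# The available DataFrame API depends on the warehouse adapter in use.
def model(dbt, session):
    # dbt.ref() returns a DataFrame-like object; its concrete type depends on
    # the adapter (for example a Snowpark or PySpark DataFrame).
    orders = dbt.ref("raw_orders")
    # Transformation logic lives here, separate from storage and engine.
    return orders
```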
It includes processes that trace and document the origin of data, models and associated metadata and pipelines for audits. Most of today’s largest foundation models, including the large language model (LLM) powering ChatGPT, have been trained on information culled from the internet. Trustworthiness is critical.
Public sector departments and agencies traditionally collect data so that they can support citizens and deliver services. In today’s analytics-driven society, the public sector can transform this historic information to reduce operational costs and improve public service to better address the needs of a given community.
When each service in the workflow needs to log information, it can include this correlation ID, thereby ensuring you can track a full request from start to finish. So even if you use the correlation ID to query the different CloudWatch log groups, you won’t get any information about the run of the Spark job.
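A minimal sketch of attaching such a correlation ID to every log line a service emits, using Python's standard logging module; the format string and field name are assumptions, not the original system's configuration.

```python
# A minimal sketch of correlation-ID-aware logging so that entries across
# different CloudWatch log groups can be joined on the same ID.
import logging
import uuid

correlation_id = str(uuid.uuid4())  # generated once per request/workflow run

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s [correlation_id=%(correlation_id)s] %(message)s",
)
logger = logging.LoggerAdapter(logging.getLogger("workflow"),
                               {"correlation_id": correlation_id})

logger.info("starting Spark job submission")
logger.info("Spark job submitted")
```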