Enterprise data is brought into data lakes and data warehouses to carry out analytical, reporting, and data science use cases using AWS analytical services like Amazon Athena, Amazon Redshift, Amazon EMR, and so on. Table metadata is fetched from AWS Glue, and the generated Athena SQL query is run.
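As a rough sketch of that flow using boto3 (the database, table, and output bucket names here are hypothetical):

```python
import boto3

# Hypothetical names for illustration; substitute your own database, table, and bucket.
GLUE_DATABASE = "sales_db"
ATHENA_OUTPUT = "s3://my-athena-results/queries/"

glue = boto3.client("glue")
athena = boto3.client("athena")

# Fetch table metadata from the AWS Glue Data Catalog.
table = glue.get_table(DatabaseName=GLUE_DATABASE, Name="orders")["Table"]
columns = [c["Name"] for c in table["StorageDescriptor"]["Columns"]]

# Build and run a simple Athena query against the catalogued table.
query = f"SELECT {', '.join(columns[:3])} FROM {GLUE_DATABASE}.orders LIMIT 10"
response = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": GLUE_DATABASE},
    ResultConfiguration={"OutputLocation": ATHENA_OUTPUT},
)
print(response["QueryExecutionId"])
```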
Data collections are the ones and zeroes that encode the actionable insights (patterns, trends, relationships) that we seek to extract from our data through machine learning and data science. Source: [link]. SAP also announced key partners that further enhance Datasphere as a powerful business data fabric.
We also examine how centralized, hybrid, and decentralized data architectures support scalable, trustworthy ecosystems. As data-centric AI, automated metadata management, and privacy-aware data sharing mature, the opportunity to embed data quality into the enterprise's core has never been more significant.
In the following section, two use cases demonstrate how the data mesh is established with Amazon DataZone to better facilitate machine learning for an IoT-based digital twin, as well as BI dashboards and reporting using Tableau. From here, the metadata is published to Amazon DataZone by using the AWS Glue Data Catalog.
Look for the Metadata. In order to perform accurate data lineage mapping, every process in the system that transforms or touches the data must be recorded. This metadata (read: data about your data) is key to tracking your data. Data Lineage by Tagging or Self-Contained Data Lineage.
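As a minimal illustration of the tagging approach, here is a hypothetical self-contained lineage record carried alongside the data as it moves; the record and function names are invented:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Illustrative only: a lineage tag recorded every time a process touches the data.
@dataclass
class LineageTag:
    step: str    # name of the process that transformed the data
    source: str  # where the data came from
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def transform(records: list[dict], lineage: list[LineageTag]) -> tuple[list[dict], list[LineageTag]]:
    """Apply a transformation and append it to the lineage trail."""
    cleaned = [r for r in records if r.get("amount") is not None]
    lineage.append(LineageTag(step="drop_null_amounts", source="orders_raw"))
    return cleaned, lineage
```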
In this article, we will detail everything that is at stake when we talk about DQM: why it is essential, how to measure data quality, the pillars of good quality management, and some data quality control techniques. But first, let’s define what data quality actually is. 2 – Data profiling. 4 – Data reporting.
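As a taste of what data profiling involves, a minimal sketch using pandas (the sample frame is invented):

```python
import pandas as pd

# A minimal profiling pass: per-column dtype, null rate, and distinct count.
def profile(df: pd.DataFrame) -> pd.DataFrame:
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "null_rate": df.isna().mean(),
        "distinct": df.nunique(),
    })

df = pd.DataFrame({"customer_id": [1, 2, 2, None], "country": ["US", "DE", "DE", "US"]})
print(profile(df))
```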
Publish data assets – As the data producer from the retail team, you must ingest individual data assets into Amazon DataZone. For this use case, create a data source and import the technical metadata of six data assets—customers, order_items, orders, products, reviews, and shipments—from the AWS Glue Data Catalog.
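Amazon DataZone performs the import itself; the sketch below only shows the technical metadata that such a data source run would pick up from the Glue Data Catalog, assuming a hypothetical retail_db database:

```python
import boto3

glue = boto3.client("glue")

# List the technical metadata (table and column names) held in the Glue Data Catalog.
paginator = glue.get_paginator("get_tables")
for page in paginator.paginate(DatabaseName="retail_db"):  # hypothetical database
    for table in page["TableList"]:
        cols = [c["Name"] for c in table.get("StorageDescriptor", {}).get("Columns", [])]
        print(table["Name"], cols)
```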
According to erwin’s “2020 State of Data Governance and Automation” report , close to 70 percent of data professional respondents say they spend an average of 10 or more hours per week on data-related activities, and most of that time is spent searching for and preparing data.
In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.
Traditionally, such a legacy call center analytics platform would be built on a relational database that stores data from streaming sources. Curating datasets and generating insights through stored procedures and materialized views is a well-known pattern with relational databases.
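A compact sketch of that pattern, assuming PostgreSQL and invented table and column names, run here through psycopg2:

```python
import psycopg2

# Hypothetical connection string; the pattern, not the credentials, is the point.
conn = psycopg2.connect("dbname=callcenter user=analyst")
with conn, conn.cursor() as cur:
    # Curate the dataset once as a materialized view...
    cur.execute("""
        CREATE MATERIALIZED VIEW IF NOT EXISTS daily_call_stats AS
        SELECT date_trunc('day', call_start) AS call_day,
               count(*) AS calls,
               avg(duration_seconds) AS avg_duration
        FROM calls
        GROUP BY 1
    """)
    # ...then refresh it on a schedule instead of recomputing per report query.
    cur.execute("REFRESH MATERIALIZED VIEW daily_call_stats")
```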
These tools range from enterprise service bus (ESB) products and data integration tools to extract, transform and load (ETL) tools, procedural code, application programming interfaces (APIs), file transfer protocol (FTP) processes, and even business intelligence (BI) reports that further aggregate and transform data.
This is where metadata, or the data about data, comes into play. Having a data catalog is the cornerstone of your data governance strategy, but what supports your data catalog? Your metadata management framework provides the underlying structure that makes your data accessible and manageable.
How dbt Core helps data teams test, validate, and monitor complex data transformations and conversions. dbt Core, an open-source framework for developing, testing, and documenting SQL-based data transformations, has become a must-have tool for modern data teams as the complexity of data pipelines grows.
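dbt expresses such checks declaratively as schema tests in YAML; as a rough Python analogue (not dbt's actual API) of its built-in not_null and unique tests:

```python
import pandas as pd

# Rough Python analogues of dbt's built-in not_null and unique schema tests.
def test_not_null(df: pd.DataFrame, column: str) -> None:
    assert df[column].notna().all(), f"{column} contains nulls"

def test_unique(df: pd.DataFrame, column: str) -> None:
    assert df[column].is_unique, f"{column} contains duplicates"

orders = pd.DataFrame({"order_id": [1, 2, 3], "status": ["paid", "paid", "open"]})
test_not_null(orders, "order_id")
test_unique(orders, "order_id")
```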
The goal is to examine five major methods of verifying and validating data transformations in data pipelines, with an eye toward high-quality data deployment. First, we look at how unit and integration tests uncover transformation errors at an early stage.
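For example, a unit test can pin down a single transformation's behavior before it ships; the function below is hypothetical:

```python
# A unit test that fixes a transformation's expected behavior (pytest-discoverable).
def normalize_currency(amount_cents: int) -> float:
    """Convert integer cents to a rounded dollar amount."""
    return round(amount_cents / 100, 2)

def test_normalize_currency():
    assert normalize_currency(1999) == 19.99
    assert normalize_currency(0) == 0.0
```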
“You had to be an expert in the programming language that interacts with that data, and understand the relationships of each data element within each data source, let alone understand its relation to elements in other data sources,” he says. Without those templates, it’s hard to add such information after the fact.”
They need trusted data to drive reliable reporting, decision-making, and risk reduction. A Strong Data Culture Supports Strategic Decision Making. Our successful customers invest in and infuse data and analytics throughout the enterprise. After all, finance is one of the greatest consumers of data within a business.
Duplicating data from a production database to a lower or lateral environment and masking personally identifiable information (PII) to comply with regulations enables development, testing, and reporting without impacting critical systems or exposing sensitive customer data. This saves time over manually defining schemas.
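One common masking approach is deterministic hashing, sketched below; the salt handling and token format are illustrative, not a prescription:

```python
import hashlib

# Deterministic masking: equal inputs map to equal tokens, so joins still work
# in the lower environment, while the original value stays out of reach.
def mask_email(email: str, salt: str = "env-specific-salt") -> str:  # hypothetical salt
    digest = hashlib.sha256((salt + email).encode()).hexdigest()[:12]
    return f"user_{digest}@masked.example"

print(mask_email("jane.doe@example.com"))
```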
It seamlessly consolidates data from various data sources within AWS, including AWS Cost Explorer (and forecasting with Cost Explorer ), AWS Trusted Advisor , and AWS Compute Optimizer. Data providers and consumers are the two fundamental users of a CDH dataset. You might notice that this differs slightly from traditional ETL.
And when you talk about that question at a high level, he says, you get a very simple answer, which is: “the only thing we want to have is the right data with the right quality to the right person at the right time at the right cost.” The Why: Data Governance Drivers. Why should companies care about data governance?
There are countless examples of big data transforming many different industries. There is no disputing the fact that the collection and analysis of massive amounts of unstructured data has been a huge breakthrough. Data virtualization is ideal in any situation where the following is necessary: information coming from diverse data sources.
Legacy data management is holding back manufacturing transformation. Until now, however, this vision has remained out of reach. The data transformation imperative: what Denso and other industry leaders realise is that for IT-OT convergence to be realised, and the benefits of AI unlocked, data transformation is vital.
In fact, the LIBOR transition program marks one of the largest data transformation obstacles ever seen in financial services. Building an inventory of what will be affected is a huge undertaking across all of the data, reports, and structures that must be accounted for. Automated Data Lineage for Your LIBOR Project.
dbt allows data teams to produce trusted data sets for reporting, ML modeling, and operational workflows using SQL, with a simple workflow that follows software engineering best practices like modularity, portability, and continuous integration/continuous deployment (CI/CD). The Open Data Lakehouse.
The modern data stack is a data management system built out of cloud-based data systems. A given modern data stack will usually include components for data ingestion from your data sources, data transformation, data storage, and data analysis and reporting.
These tools empower analysts and data scientists to easily collaborate on the same data, with their choice of tools and analytic engines. No more lock-in, unnecessary data transformations, or data movement across tools and clouds just to extract insights out of the data.
OntoRefine is a data transformation tool that lets you unify many data formats and get them into your triplestore. One of the core upsides of storing your data in that format is inference. You can think about that as metadata about the data, describing its relationships, from which new knowledge can be inferred.
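A small illustration with rdflib (the namespace and class names are invented): declaring one schema-level relationship lets an RDFS reasoner derive facts that were never loaded:

```python
from rdflib import Graph, Namespace, RDF, RDFS

# Illustrative triples only.
EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.Order, RDFS.subClassOf, EX.Transaction))  # schema-level relationship
g.add((EX.order42, RDF.type, EX.Order))             # one instance-level fact

# With RDFS inference enabled in the triplestore, ex:order42 is also
# an ex:Transaction, even though that triple was never asserted.
```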
The platform converges data cataloging, data ingestion, data profiling, data tagging, data discovery, and data exploration into a unified platform, driven by metadata. Modak Nabu automates repetitive tasks in the data preparation process and thus accelerates data preparation by 4x.
The data products used inside the company include insights from user journeys, operational reports, and marketing campaign results, among others. The data platform serves on average 60 thousand queries per day. The data volume is in double-digit TBs with steady growth as business and data sources evolve.
It includes processes that trace and document the origin of data, models and associated metadata and pipelines for audits. A data store lets a business connect existing data with new data and discover new insights with real-time analytics and business intelligence. Increase trust in AI outcomes.
Although Jira Cloud provides reporting capability, loading this data into a data lake will facilitate enrichment with other business data, as well as support the use of business intelligence (BI) tools and artificial intelligence (AI) and machine learning (ML) applications. For InitialRunFlag, choose Setup.
Data Vault 2.0 allows for the following: agile data warehouse development; parallel data ingestion; a scalable approach to handling multiple data sources, even on the same entity; a high level of automation; historization; and full lineage support. However, Data Vault 2.0
Amazon QuickSight is a fully managed, cloud-native business intelligence (BI) service that makes it easy to connect to your data, create interactive dashboards and reports, and share these with tens of thousands of users, either within QuickSight or embedded in your application or website.
With Octopai’s support and analysis of Azure Data Factory, enterprises can now view complete end-to-end data lineage from Azure Data Factory all the way through to reporting for the first time ever.
Now, joint users will get an enhanced view into cloud and data transformations, with valuable context to guide smarter usage. Integrating helpful metadata into user workflows gives all people, from data scientists to analysts, the context they need to use data more effectively.
You can also use the data transformation feature of Data Firehose to invoke a Lambda function to perform data transformation in batches. Athena is used to run geospatial queries on the location data stored in the S3 buckets. Choose Run.
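For reference, a Data Firehose transformation Lambda follows a fixed contract; the one-field transformation below is purely illustrative:

```python
import base64
import json

# Each output record must echo the incoming recordId with a result of
# "Ok", "Dropped", or "ProcessingFailed", plus the re-encoded payload.
def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        payload["processed"] = True  # illustrative transformation
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(json.dumps(payload).encode()).decode(),
        })
    return {"records": output}
```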
dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouse customers (such as those on Amazon Redshift) who are looking to keep their data transform logic separate from storage and engine.
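A hedged sketch of a dbt Python model, assuming a Spark-backed adapter (Python models are supported only on certain warehouses) and a hypothetical upstream model name:

```python
# dbt supplies `dbt` and `session`, resolves ref() against the warehouse,
# and materializes whatever DataFrame the function returns.
def model(dbt, session):
    orders = dbt.ref("stg_orders")          # hypothetical upstream model
    return orders.where("status = 'paid'")  # transform logic stays here, not in storage
```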
Organizations have spent a lot of time and money trying to harmonize data across diverse platforms, including cleansing, uploading metadata, converting code, defining business glossaries, tracking data transformations and so on. Creating a High-Quality Data Pipeline.
This data is then used by various applications for streaming analytics, business intelligence, and reporting. In addition, using Apache Iceberg’s metadata tables proved to be very helpful in identifying issues related to the physical layout of Iceberg’s tables, which can directly impact query performance.
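For instance, Iceberg's files metadata table can be queried like any other table; the catalog and table names below are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # assumes an Iceberg-enabled catalog

# The `files` metadata table lists every data file with its size and record
# count, which makes skewed or tiny-file physical layouts easy to spot.
spark.sql("""
    SELECT file_path, file_size_in_bytes, record_count
    FROM analytics.db.events.files
    ORDER BY file_size_in_bytes
""").show(10, truncate=False)
```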
It’s for that reason that even as the first BCBS-239 implementation deadline came into effect a few years ago, McKinsey reported that one-third of Global Systemically Important Banks had focused on “documenting data lineage up to the level of provisioning data elements and including data transformation.”
This involves unifying and sharing a single copy of data and metadata across IBM® watsonx.data ™, IBM® Db2 ®, IBM® Db2® Warehouse and IBM® Netezza ®, using native integrations and supporting open formats, all without the need for migration or recataloging. With Netezza support for 1.2
The task_id used can be the name of your choice; here we use add_steps:

```python
# EMR steps to be executed by the EMR cluster
SPARK_TEST_STEPS = [{
    'Name': 'Run Spark',
    'ActionOnFailure': 'CANCEL_AND_WAIT',
    'HadoopJarStep': {
        'Jar': 'command-runner.jar',
        'Args': [
            'spark-submit',
            '/home/hadoop/aggregations.py',
            's3://{}/data/transformed/green'.format(S3_BUCKET_NAME),
```
Data platform architecture has an interesting history. Towards the turn of the millennium, enterprises started to realize that reporting and business intelligence workloads required a different solution than the transactional applications could provide. A read-optimized platform that can integrate data from multiple applications emerged.
These help data analysts visualize key insights that can help you make better data-backed decisions. ELT Data Transformation Tools: ELT data transformation tools are used to extract, load, and transform your data. Examples of data transformation tools include dbt and Dataform.
Alternatively, you can use AWS Glue for Apache Spark, which provides built-in support for bucketing configurations during the data transformation process. AWS Glue allows you to define bucketing parameters, such as the number of buckets and the columns to bucket on, providing an optimized data layout for efficient querying with Athena.
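In a Glue for Apache Spark job this is standard Spark bucketing; a sketch with hypothetical paths, names, and bucket count:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
df = spark.read.parquet("s3://my-bucket/raw/orders/")  # hypothetical input

# Bucketing requires saveAsTable; the bucket count and column are illustrative.
(df.write
   .format("parquet")
   .bucketBy(16, "customer_id")   # number of buckets and bucketing column
   .sortBy("customer_id")
   .option("path", "s3://my-bucket/curated/orders/")
   .saveAsTable("curated.orders"))
```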