These data processing and analytical services support Structured Query Language (SQL) to interact with the data. Writing SQL queries requires not just remembering SQL syntax rules, but also knowledge of the table metadata: the data about table schemas, relationships among the tables, and possible column values.
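As a small illustration of why table metadata matters, here is a sketch using Python's built-in sqlite3 module; the `orders` table and its columns are invented for the example:

```python
import sqlite3

# Sketch only: the orders table and its columns are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, total REAL)")

# PRAGMA table_info exposes the table metadata: column names, types, constraints.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(orders)")]
print(columns)  # [('order_id', 'INTEGER'), ('customer', 'TEXT'), ('total', 'REAL')]

# With the schema known, the query itself is easy to write correctly.
conn.execute("INSERT INTO orders VALUES (1, 'acme', 99.5)")
total = conn.execute("SELECT SUM(total) FROM orders").fetchone()[0]
```

Knowing relationships among tables (foreign keys, join paths) and typical column values would similarly be looked up from metadata before writing joins or filters.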
1) What Is Data Quality Management? 4) Data Quality Best Practices. 5) How Do You Measure Data Quality? 6) Data Quality Metrics Examples. 7) Data Quality Control: Use Case. 8) The Consequences Of Bad Data Quality. 9) 3 Sources Of Low-Quality Data. 10) Data Quality Solutions: Key Attributes.
Their terminal operations rely heavily on seamless data flows and the management of vast volumes of data. With the addition of these technologies alongside existing systems like terminal operating systems (TOS) and SAP, the number of data producers has grown substantially. This process is shown in the following figure.
A high hurdle many enterprises have yet to overcome is accessing mainframe data via the cloud. Mainframes hold an enormous amount of critical and sensitive business data including transactional information, healthcare records, customer data, and inventory metrics. Four key challenges prevent them from doing so: 1.
Organizations with legacy, on-premises, near-real-time analytics solutions typically rely on self-managed relational databases as their data store for analytics workloads. Near-real-time streaming analytics captures the value of operational data and metrics to provide new insights to create business opportunities.
Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a fully managed service that builds upon Apache Airflow, offering its benefits while eliminating the need for you to set up, operate, and maintain the underlying infrastructure, reducing operational overhead while increasing security and resilience.
The real challenge lies in getting people to access, manage, and search for it appropriately. This is where metadata, or the data about data, comes into play. Having a data catalog is the cornerstone of your data governance strategy, but what supports your data catalog? This information is metadata.
Look for the Metadata. In order to perform accurate data lineage mapping, every process in the system that transforms or touches the data must be recorded. This metadata (read: data about your data) is key to tracking your data. Data Lineage by Tagging or Self-Contained Data Lineage.
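The tagging approach can be sketched in a few lines of Python; the `tag_lineage` helper, the `_lineage` field, and the dataset shape are all invented for illustration:

```python
# Sketch only: the tag_lineage helper and the dataset shape are invented.
def tag_lineage(dataset, step, details):
    """Record each process that touches the data inside the data itself."""
    dataset.setdefault("_lineage", []).append({"step": step, "details": details})
    return dataset

data = {"rows": [1, 2, 3]}
data = tag_lineage(data, "ingest", "loaded from source system")
data["rows"] = [r * 2 for r in data["rows"]]
data = tag_lineage(data, "transform", "doubled each value")

# The lineage now travels with the data, so mapping it later needs no
# external tracking system.
steps = [t["step"] for t in data["_lineage"]]
print(steps)  # ['ingest', 'transform']
```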
In collaboration with AWS, BMS identified a business need to migrate and modernize their custom extract, transform, and load (ETL) platform to a native AWS solution to reduce complexities, resources, and investment to upgrade when new Spark, Python, or AWS Glue versions are released.
For each service, you need to learn the supported authorization and authentication methods, data access APIs, and framework to onboard and test data sources. This fragmented, repetitive, and error-prone experience for data connectivity is a significant obstacle to data integration, analysis, and machine learning (ML) initiatives.
How dbt Core helps data teams test, validate, and monitor complex data transformations and conversions. dbt Core, an open-source framework for developing, testing, and documenting SQL-based data transformations, has become a must-have tool for modern data teams as the complexity of data pipelines grows.
Selecting the strategies and tools for validating data transformations and data conversions in your data pipelines. Data transformations and data conversions are crucial to ensure that raw data is organized, processed, and ready for useful analysis.
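As a minimal sketch of what such validation looks like, the checks below loosely mirror dbt's built-in not-null, unique, and accepted-range tests; the input records and rules are invented:

```python
# Sketch only: the input records and validation rules are invented, loosely
# mirroring dbt's not_null, unique, and accepted-range style checks.
raw = [
    {"id": 1, "amount": "10.5"},
    {"id": 2, "amount": "3.0"},
    {"id": 3, "amount": "7.25"},
]

# Conversion step: amount arrives as a string and must become a float.
transformed = [{**r, "amount": float(r["amount"])} for r in raw]

# Validation of the transformation:
assert len(transformed) == len(raw)                    # no rows dropped
ids = [r["id"] for r in transformed]
assert len(ids) == len(set(ids))                       # key is unique
assert all(r["id"] is not None for r in transformed)   # key is not null
assert all(r["amount"] >= 0 for r in transformed)      # values in range
```

In a real pipeline these checks would run automatically after each transformation, failing the run before bad data reaches downstream consumers.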
It seamlessly consolidates data from various data sources within AWS, including AWS Cost Explorer (and forecasting with Cost Explorer), AWS Trusted Advisor, and AWS Compute Optimizer. At the BMW Group, Cloud Data Hub (CDH) is the central platform for managing company-wide data and data solutions.
Regulatory compliance places greater transparency demands on firms when it comes to tracing and auditing data. Business terms and data policies should be implemented through standardized and documented business rules. Some data needs to be backed up daily while some types of data demand weekly or monthly backups.
AWS Glue Studio visual editor provides a low-code graphic environment to build, run, and monitor extract, transform, and load (ETL) scripts. There’s no infrastructure to manage, so you can focus on rapidly building compliant data flows between key systems. An AWS Identity and Access Management (IAM) role is used for AWS Glue.
AI governance refers to the practice of directing, managing and monitoring an organization’s AI activities. It includes processes that trace and document the origin of data, models and associated metadata and pipelines for audits.
An understanding of the data’s origins and history helps answer questions about the data behind Key Performance Indicator (KPI) reports, including: How are the report tables and columns defined in the metadata? Who are the data owners? What are the transformation rules?
There are countless examples of big data transforming many different industries. There is no disputing the fact that the collection and analysis of massive amounts of unstructured data has been a huge breakthrough. How does Data Virtualization manage data quality requirements?
Where all data – structured, semi-structured, and unstructured – is sourced, unified, and exploited in automated processes, AI tools, and by highly skilled, but over-stretched, employees. Legacy data management is holding back manufacturing transformation. Until now, however, this vision has remained out of reach.
Building a Data Culture Within a Finance Department. Our finance users tell us that their first exposure to the Alation Data Catalog often comes soon after the launch of organization-wide data transformation efforts. After all, finance is one of the greatest consumers of data within a business. Don’t overthink it.
Nearly every data leader I talk to is in the midst of a data transformation. As businesses look for ways to increase sales, improve customer experience, and stay ahead of the competition, they are realizing that data is their competitive advantage and the key to achieving their goals. And it’s no surprise, really.
You can see the decompressed data has metadata such as logGroup, logStream, and subscriptionFilters, and the actual data is included within the message field under logEvents (the following shows an example of CloudTrail events in CloudWatch Logs).
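As a sketch of how such a record is decoded, assuming the standard gzipped, base64-encoded JSON shape that CloudWatch Logs subscriptions deliver; the payload values below are fabricated:

```python
import base64
import gzip
import json

# Fabricated stand-in for a record delivered by a CloudWatch Logs
# subscription: gzipped JSON carrying metadata plus the logEvents payload.
payload = {
    "messageType": "DATA_MESSAGE",
    "logGroup": "/aws/cloudtrail/example",
    "logStream": "stream-1",
    "subscriptionFilters": ["example-filter"],
    "logEvents": [
        {"id": "1", "timestamp": 0, "message": "{\"eventName\": \"ConsoleLogin\"}"}
    ],
}
record = base64.b64encode(gzip.compress(json.dumps(payload).encode()))

# A consumer decodes, decompresses, and parses to reach the actual events.
decoded = json.loads(gzip.decompress(base64.b64decode(record)))
event = json.loads(decoded["logEvents"][0]["message"])
print(decoded["logGroup"], event["eventName"])
```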
However, while digital transformation and other data-driven initiatives are desired outcomes, few organizations know what data they have or where it is, and they struggle to integrate known data in various formats and numerous systems – especially if they don’t have a way to automate those processes.
Naturally, this will produce a massive growth in data management needs. Particularly in financial services, each company’s data team will have significant work to do to keep up with these developments, and this is on top of the regulatory challenges that they already face on a daily basis.
Organizations have spent a lot of time and money trying to harmonize data across diverse platforms, including cleansing, uploading metadata, converting code, defining business glossaries, tracking data transformations, and so on.
We’re excited to announce the general availability of the open source adapters for dbt for all the engines in CDP — Apache Hive, Apache Impala, and Apache Spark, with added support for Apache Livy and Cloudera Data Engineering. Cloudera builds dbt adapters for all engines in the open data lakehouse.
A combination of Amazon Redshift Spectrum and COPY commands is used to ingest the survey data stored as CSV files. For files with unknown structures, AWS Glue crawlers are used to extract metadata and create table definitions in the Data Catalog.
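A Glue crawler's job (inferring a table definition from files of unknown structure) can be sketched in plain Python; the sample CSV and the type-inference rules below are simplified stand-ins, not the crawler's actual logic:

```python
import csv
import io

# Simplified stand-in for crawler behavior: infer column names and types
# from a sample file. The CSV content and inference rules are invented.
sample = "survey_id,score,comment\n1,4.5,good\n2,3.0,ok\n"
reader = csv.DictReader(io.StringIO(sample))
rows = list(reader)

def infer_type(values):
    """Try progressively looser casts, mimicking crawler type inference."""
    for cast, type_name in ((int, "bigint"), (float, "double")):
        try:
            [cast(v) for v in values]
            return type_name
        except ValueError:
            continue
    return "string"

schema = {col: infer_type([r[col] for r in rows]) for col in reader.fieldnames}
print(schema)  # {'survey_id': 'bigint', 'score': 'double', 'comment': 'string'}
```

The real crawler samples many files, handles compressed formats, and writes the resulting table definition into the Data Catalog rather than returning a dict.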
These tools empower analysts and data scientists to easily collaborate on the same data, with their choice of tools and analytic engines. No more lock-in, unnecessary data transformations, or data movement across tools and clouds just to extract insights out of the data.
EMR on EKS simplifies your infrastructure management, maximizes resource utilization, and reduces your cost. We have been continually improving the Spark performance in each Amazon EMR release to further shorten job runtime and optimize users’ spending on their Amazon EMR big data workloads.
Amazon QuickSight is a fully managed, cloud-native business intelligence (BI) service that makes it easy to connect to your data, create interactive dashboards and reports, and share these with tens of thousands of users, either within QuickSight or embedded in your application or website.
This post shows how you can use Amazon Location, EventBridge, Lambda, Amazon Data Firehose , and Amazon S3 to build a location-aware data pipeline, and use this data to drive meaningful insights using AWS Glue and Athena. Overview of solution This is a fully serverless solution for location-based asset management.
Moreover, running advanced analytics and ML on disparate data sources proved challenging. To overcome these issues, Orca decided to build a data lake. By decoupling storage and compute, data lakes promote cost-effective storage and processing of big data. This ensures that the data is suitable for training purposes.
We chatted about industry trends, why decentralization has become a hot topic in the data world, and how metadata drives many data-centric use cases. But, through it all, Mohan says it’s critical to view everything through the same lens: gaining business value from data. Data fabric is a technology architecture.
Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x
Developers need to onboard new data sources, chain multiple data transformation steps together, and explore data as it travels through the flow. What if developers didn’t have to manage their own Apache NiFi installation, and platform administrators didn’t have to carry that burden either?
This is done by visualizing the Azure Data Factory pipelines with full column-level, source-to-target traceability through different data transformations at the most detailed level. Octopai can fully map the BI landscape and trace metadata movement in a mixed environment, including complex multi-vendor landscapes.
Due to this low complexity, the solution uses AWS serverless services to ingest the data, transform it, and make it available for analytics. The architecture uses AWS Lambda , a serverless, event-driven compute service that lets you run code without provisioning or managing servers. On the Datasets page, choose New data set.
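A rough sketch of what such a Lambda transform step might look like; the event shape, field names (DeviceId, Value), and output format are invented for illustration:

```python
import json

def handler(event, context):
    """Sketch of a serverless transform step: normalize raw device records
    for analytics. The event shape and field names are invented."""
    raw_records = event.get("records", [])
    transformed = [
        {"device_id": r["DeviceId"], "reading": float(r["Value"])}
        for r in raw_records
    ]
    # A real flow would write the result to S3 for downstream analytics.
    return {"statusCode": 200, "body": json.dumps(transformed)}
```

Invoked with a sample event such as `{"records": [{"DeviceId": "d1", "Value": "3.2"}]}`, it returns the normalized records in the response body.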
It guides data professionals through the complexities of data management, much like a reliable navigation app guides drivers through the complexities of road travel. As a global leader in automated data flow lineage, Octopai brings extensive experience to the deepest levels of data management.
Data mesh is a new approach to data management. Companies across industries are using a data mesh to decentralize data management to improve data agility and get value from data. A modern data architecture is critical in order to become a data-driven organization.
Businesses face significant hurdles when preparing data for artificial intelligence (AI) applications. The existence of data silos and duplication, alongside apprehensions regarding data quality, presents a multifaceted environment for organizations to manage.
In fact, DataOps (and its underlying foundation, Data Fabric ), has ranked as a top area of inquiry and interest with analysts over the past twelve months. DataOps requires an array of technology to automate the design, development, deployment, and management of data delivery, with governance sprinkled on for good measure.
FINRA’s technology group has developed a robust set of big data tools in the AWS Cloud to look for fraud, market manipulation, insider trading, and abuse. Go to the Amazon EC2 console and connect to the master node through Session Manager.
Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed orchestration service for Apache Airflow that makes it simple to set up and operate end-to-end data pipelines in the cloud at scale.
These data products were intended to enhance patient outcomes, streamline hospital operations, and provide actionable insights for decision-making. This strategic choice justified further investment into their data team, infrastructure, management, and science. The complexity of their data ecosystem became a major obstacle.