Writing SQL queries requires not just remembering the SQL syntax rules, but also knowledge of table metadata: data about table schemas, relationships among the tables, and possible column values. We discuss the challenges in maintaining that metadata, as well as ways to overcome those challenges and enrich it.
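Most databases expose that metadata through queryable system catalogs. As a minimal sketch, using Python's built-in sqlite3 module and a hypothetical `orders` table (both chosen here for illustration, not taken from the post), a tool might pull schema metadata before generating SQL:

```python
import sqlite3

# In-memory database with a hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
for cid, name, col_type, notnull, default, pk in conn.execute("PRAGMA table_info(orders)"):
    print(f"column={name} type={col_type} primary_key={bool(pk)}")
```

On other engines the same idea usually means querying INFORMATION_SCHEMA views rather than PRAGMA statements.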
A high hurdle many enterprises have yet to overcome is accessing mainframe data via the cloud. Mainframes hold an enormous amount of critical and sensitive business data including transactional information, healthcare records, customer data, and inventory metrics.
Their terminal operations rely heavily on seamless data flows and the management of vast volumes of data. Recently, EUROGATE has developed a digital twin for its Container Terminal Hamburg (CTH), generating millions of data points every second from Internet of Things (IoT) devices attached to its container handling equipment (CHE).
Ever reflect on what it would be like to be a piece of data entering your BI system? It ain’t easy being data. Then again, it ain’t easy being a BI developer trying to track data through a stream of twists, turns, transformations, and multiple BI systems. The trick is to look for the metadata.
What Is Data Quality Management (DQM)? Data quality management is a set of practices that aim to maintain a high quality of information. It spans everything from the acquisition of data and the implementation of advanced data processes to the effective distribution of data.
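As a toy illustration of what such practices can look like in code (the rule names, fields, and thresholds below are invented, not from the post), a validation pass might flag records that fail basic quality rules before they are distributed downstream:

```python
# A minimal, assumption-laden sketch of rule-based data quality checks.
records = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": None, "age": 34},
    {"id": 3, "email": "c@example.com", "age": -5},
]

# Each rule maps a name to a predicate that a valid record must satisfy.
rules = {
    "email_present": lambda r: r["email"] is not None,
    "age_in_range": lambda r: 0 <= r["age"] <= 120,
}

for record in records:
    failures = [name for name, check in rules.items() if not check(record)]
    if failures:
        print(f"record {record['id']} failed: {failures}")
```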
How dbt Core helps data teams test, validate, and monitor complex data transformations and conversions. dbt Core, an open-source framework for developing, testing, and documenting SQL-based data transformations, has become a must-have tool for modern data teams as the complexity of data pipelines grows.
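As a hedged sketch of how that testing fits into automation (the model selector and project layout here are assumptions), a team might wire dbt Core into CI by shelling out to its CLI and failing the build when tests fail:

```python
import subprocess
import sys

# Run dbt tests for a hypothetical model; --select is a real dbt Core flag.
result = subprocess.run(
    ["dbt", "test", "--select", "stg_orders"],
    capture_output=True,
    text=True,
)
print(result.stdout)

# dbt exits non-zero when any test fails, so CI can gate on it directly.
if result.returncode != 0:
    sys.exit("dbt tests failed; blocking the pipeline")
```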
Manufacturers have long held a data-driven vision for the future of their industry: one where near real-time data flows seamlessly between IT and operational technology (OT) systems. Until now, however, this vision has remained out of reach, with legacy data management holding back manufacturing transformation.
In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.
This is where metadata, or the data about data, comes into play. Having a data catalog is the cornerstone of your data governance strategy, but what supports your data catalog? Your metadata management framework provides the underlying structure that makes your data accessible and manageable.
In this post, we’ll see the fundamental procedures, tools, and techniques that data engineers, data scientists, and QA/testing teams use to ensure high-quality data as soon as it’s deployed. First, we look at how unit and integration tests uncover transformation errors at an early stage (for example, using Docker or local runners).
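For example, a unit test can pin down a single transformation's behavior long before the pipeline runs against production data. A minimal pytest-style sketch, with the `normalize_amount` function invented here for illustration:

```python
# test_transforms.py -- run with `pytest`

def normalize_amount(raw: str) -> float:
    """Hypothetical transformation: parse '1,234.50 USD' into a float."""
    return float(raw.replace(",", "").replace("USD", "").strip())

def test_normalize_amount_strips_currency_and_commas():
    assert normalize_amount("1,234.50 USD") == 1234.50

def test_normalize_amount_plain_number():
    assert normalize_amount("42") == 42.0
```

Integration tests then exercise the same logic end to end against a disposable database or a containerized copy of the warehouse.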
The lift and shift migration approach is limited in its ability to transform businesses because it relies on outdated, legacy technologies and architectures that limit flexibility and slow down productivity. The example architecture shows a call center streaming data source that sends the latest call center feed every 15 seconds.
Data analysts and engineers use dbt to transform, test, and document data in the cloud data warehouse. Yet every dbt transformation contains vital metadata that is not captured – until now. How did the data transform, exactly?
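One place that metadata does surface is dbt's artifact files: each invocation writes `target/run_results.json` and `target/manifest.json`. A sketch of harvesting them (the field names follow dbt's published artifact schemas, though exact contents vary by dbt version):

```python
import json

# After `dbt run` or `dbt test`, per-node outcomes land in run_results.json.
with open("target/run_results.json") as f:
    run_results = json.load(f)

for result in run_results["results"]:
    # unique_id identifies the model; status and execution_time describe the run.
    print(result["unique_id"], result["status"], result.get("execution_time"))
```

A lineage tool can join these results against `manifest.json`, whose `nodes` section records each model's dependencies.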
It’s paramount that organizations understand the benefits of automating end-to-end data lineage: replacing manual, recurring tasks enables fast, reliable data lineage and overall data governance. The importance of end-to-end data lineage is widely understood, and ignoring it is risky business.
Data lineage is the journey data takes from its creation through its transformations over time. Tracing the source of data is an arduous task. With all these diverse data sources, and especially when systems are integrated, it is difficult to understand the complicated data web they form, much less get a simple visual flow of it.
The Airflow REST API is a set of HTTP endpoints for performing operations such as invoking Directed Acyclic Graphs (DAGs), checking task statuses, retrieving metadata about workflows, managing connections and variables, and even initiating dataset-related events, without directly accessing the Airflow web interface or command line tools.
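For example, triggering a DAG run through Airflow's stable REST API is a single authenticated POST. A sketch in which the host, credentials, and DAG id are all placeholders:

```python
import requests

AIRFLOW_URL = "http://localhost:8080"  # placeholder host
DAG_ID = "example_etl"                 # placeholder DAG id

# Airflow's stable API: POST /api/v1/dags/{dag_id}/dagRuns
response = requests.post(
    f"{AIRFLOW_URL}/api/v1/dags/{DAG_ID}/dagRuns",
    auth=("admin", "admin"),           # basic auth; real deployments vary
    json={"conf": {"run_date": "2024-01-01"}},
)
response.raise_for_status()
print(response.json()["dag_run_id"])
```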
It seamlessly consolidates data from various data sources within AWS, including AWS Cost Explorer (and forecasting with Cost Explorer), AWS Trusted Advisor, and AWS Compute Optimizer. At the BMW Group, the Cloud Data Hub (CDH) is the central platform for managing company-wide data and data solutions.
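As a sketch of what pulling from one of those cost sources looks like with boto3's Cost Explorer client (the dates and metric are illustrative, and credentials and IAM permissions are assumed):

```python
import boto3

ce = boto3.client("ce")  # AWS Cost Explorer

response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-01-01", "End": "2024-02-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
)

# Each entry covers one period with the requested metric totals.
for period in response["ResultsByTime"]:
    print(period["TimePeriod"], period["Total"]["UnblendedCost"]["Amount"])
```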
You will learn how to prepare a multi-account environment to access the databases from AWS Glue, and how to model an ETL data flow that automatically masks PII as part of the transfer process, so that no sensitive information is copied to the target database in its original form.
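The core masking step can be as simple as replacing PII columns with salted hashes before the copy. A simplified, standalone sketch of the idea (the column names and salt handling are assumptions, not the post's actual Glue job):

```python
import hashlib

PII_COLUMNS = {"name", "email"}    # hypothetical PII columns
SALT = "load-from-a-secret-store"  # never hard-code a real salt

def mask_value(value: str) -> str:
    # Salted one-way hash: stable for joins, irreversible for readers.
    return hashlib.sha256((SALT + value).encode()).hexdigest()[:16]

def mask_row(row: dict) -> dict:
    return {k: mask_value(v) if k in PII_COLUMNS else v for k, v in row.items()}

print(mask_row({"id": 7, "name": "Ada", "email": "ada@example.com"}))
```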
With the ability to browse metadata, you can understand the structure and schema of the data source, identify relevant tables and fields, and discover useful data assets you may not be aware of. Before you begin, make sure you have the following prerequisite: an AWS account.
There are countless examples of big data transforming many different industries. There is no disputing the fact that the collection and analysis of massive amounts of unstructured data has been a huge breakthrough. We would like to talk about data visualization and its role in the big data movement.
Organizations that build an organization-wide data culture enjoy clear and compelling benefits. In our recent State of Data Culture Report, Alation found that nearly every organization (86%) with a top-tier data culture met or exceeded its revenue targets.
Data governance has no standard definition. Still, modern data governance is a strategic, ongoing and collaborative practice that enables organizations to discover and track their data, understand what it means within a business context, and maximize its security, quality and value.
The LIBOR transition is, first and foremost, a data challenge, and it’s one with a concrete deadline that companies can’t afford to miss. In fact, the LIBOR transition program marks one of the largest data transformation challenges ever seen in financial services. Like any data project, it boils down to the details.
We’re excited to announce the general availability of the open source adapters for dbt for all the engines in CDP (Apache Hive, Apache Impala, and Apache Spark), with added support for Apache Livy and Cloudera Data Engineering. Cloudera builds dbt adapters for all engines in the open data lakehouse.
HealthCo, like many forward-thinking organizations, recognized early on that data is not just a valuable asset but a strategic imperative. They put data at the forefront of their business, integrating it into decision-making processes, products, and services. The lack of trust in data created inertia.
These tools empower analysts and data scientists to easily collaborate on the same data, with their choice of tools and analytic engines. No more lock-in, unnecessary data transformations, or data movement across tools and clouds just to extract insights out of the data.
“This frees up our local computer space, greatly automates the survey cleaning and analysis step, and allows our clients to easily access the data results.” – Harman Singh Dhodi, Analyst at HR&A Advisors, Inc. A combination of Amazon Redshift Spectrum and COPY commands is used to ingest the survey data stored as CSV files.
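The COPY side of that pattern is a single SQL statement issued against the cluster. A sketch using psycopg2, with the connection details, table, bucket, and IAM role all placeholders:

```python
import psycopg2

# Connection parameters are placeholders.
conn = psycopg2.connect(host="redshift-cluster.example.com", port=5439,
                        dbname="analytics", user="loader", password="...")

copy_sql = """
    COPY survey.responses
    FROM 's3://example-bucket/survey/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT AS CSV
    IGNOREHEADER 1;
"""

# Redshift loads the CSV files in parallel directly from S3.
with conn, conn.cursor() as cur:
    cur.execute(copy_sql)
```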
The solution uses the TPC-DS dataset with an unmodified data schema and table relationships, but derives queries from TPC-DS to support the SparkSQL test cases. Metadata store: we use Spark’s in-memory data catalog to store metadata for TPC-DS databases and tables; spark.sql.catalogImplementation is set to its default value, in-memory.
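A sketch of pinning that catalog setting when building the session (the app name is arbitrary, and the static config must be set before the session is created):

```python
from pyspark.sql import SparkSession

# spark.sql.catalogImplementation defaults to "in-memory" when Hive support
# is not enabled; setting it explicitly documents the benchmark's intent.
spark = (
    SparkSession.builder
    .appName("tpcds-benchmark")
    .config("spark.sql.catalogImplementation", "in-memory")
    .getOrCreate()
)
print(spark.conf.get("spark.sql.catalogImplementation"))
```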
The emergence of generative AI prompted several prominent companies to restrict its use because of the mishandling of sensitive internal data. Currently, no standardized process exists for overcoming the challenges of data ingestion, even though the model’s accuracy depends on it. One such challenge is increased variance: variance measures consistency.
Now, joint users will get an enhanced view into cloud and data transformations, with valuable context to guide smarter usage. Integrating helpful metadata into user workflows gives all people, from data scientists to analysts, the context they need to use data more effectively: how was the data used in the past?
The solution consists of the following interfaces: IoT or mobile application – a mobile application or an Internet of Things (IoT) device allows the tracking of a company vehicle while it is in use and transmits its current location securely to the data ingestion layer in AWS. The ingestion approach is not in scope of this post.
Picture this: you start with the perfect use case for your data analytics product. You make a great pitch and you sell well. Then you step onto the market, and if you don’t keep your data, there’s no knowing where you might be swept off to. [1] Sounds unlikely? Nowadays, data analytics doesn’t exist on its own.
Amazon QuickSight is a fully managed, cloud-native business intelligence (BI) service that makes it easy to connect to your data, create interactive dashboards and reports, and share these with tens of thousands of users, either within QuickSight or embedded in your application or website.
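Embedding typically means generating a short-lived URL server-side and handing it to an iframe. A sketch with boto3, in which the account id, user ARN, and dashboard id are placeholders:

```python
import boto3

qs = boto3.client("quicksight")

response = qs.generate_embed_url_for_registered_user(
    AwsAccountId="123456789012",
    UserArn="arn:aws:quicksight:us-east-1:123456789012:user/default/analyst",
    ExperienceConfiguration={
        "Dashboard": {"InitialDashboardId": "11111111-2222-3333-4444-555555555555"}
    },
    SessionLifetimeInMinutes=60,
)
print(response["EmbedUrl"])  # hand this URL to the embedding iframe
```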
DataOps sprang up to connect data sources to data consumers. The data warehouse and analytical data stores moved to the cloud and disaggregated into the data mesh. We chatted about industry trends, why decentralization has become a hot topic in the data world, and how metadata drives many data-centric use cases.
It includes processes that trace and document the origin of data, models and associated metadata and pipelines for audits. Curated foundation models, such as those created by IBM or Microsoft, help enterprises scale and accelerate the use and impact of the most advanced AI capabilities using trusted data.
Due to this low complexity, the solution uses AWS serverless services to ingest the data, transform it, and make it available for analytics. AWS Glue is a serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, ML, and application development.
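Inside a Glue job, the serverless engine hands you a GlueContext on top of Spark. A skeleton of the read-transform-write shape, runnable only within a Glue job environment, with the database, table, dropped field, and S3 path all placeholders:

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read from the Glue Data Catalog (database/table are placeholders).
source = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="raw_orders"
)

# A trivial transformation: drop a column before writing out.
transformed = source.drop_fields(["internal_notes"])

# Write the result to S3 as Parquet for analytics.
glue_context.write_dynamic_frame.from_options(
    frame=transformed,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/orders/"},
    format="parquet",
)
```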
Octopai is the first BI intelligence platform in the industry to support Azure Data Factory, providing full lineage of advanced BI tools. This is done by visualizing the Azure Data Factory pipelines with full column-level, source-to-target traceability through different data transformations at the most detailed level.
Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more, all while providing up to 7.9x better price performance.
Just as a navigation app provides a detailed map of roads, guiding you from your starting point to your destination while highlighting every turn and intersection, data flow lineage offers a comprehensive view of data movement and transformations throughout its lifecycle.
Cloudera has been providing enterprise support for Apache NiFi since 2015, helping hundreds of organizations take control of their data movement pipelines on premises and in the public cloud. Developers need to onboard new data sources, chain multiple data transformation steps together, and explore data as it travels through the flow.
The Orca Platform is powered by a state-of-the-art anomaly detection system that uses cutting-edge ML algorithms and big data capabilities to detect potential security threats and alert customers in real time, ensuring maximum security for their cloud environment. Why did Orca build a data lake, and why did it choose Apache Iceberg?
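Orca's actual models aren't described here, but as a generic illustration of the shape of such detection, a z-score flag over an event-rate metric (the counts and threshold are invented):

```python
import statistics

# Hypothetical per-minute event counts; the last value is anomalous.
event_counts = [102, 98, 105, 99, 101, 97, 104, 100, 350]

mean = statistics.mean(event_counts[:-1])
stdev = statistics.stdev(event_counts[:-1])
latest = event_counts[-1]

# Flag the latest observation if it deviates far from the baseline.
z_score = (latest - mean) / stdev
if abs(z_score) > 3:  # common rule-of-thumb threshold
    print(f"alert: latest count {latest} has z-score {z_score:.1f}")
```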
We just announced the general availability of Cloudera DataFlow Designer , bringing self-service data flow development to all CDP Public Cloud customers. In this blog post we will put these capabilities in context and dive deeper into how the built-in, end-to-end data flow life cycle enables self-service data pipeline development.
This allows data consumers to easily identify new datasets and provides agility and innovation without spending hours on analysis and research. A successful data-driven organization recognizes data as a key enabler of increased and sustained innovation. It follows what is called a distributed system architecture.
Organizations have spent a lot of time and money trying to harmonize data across diverse platforms, including cleansing, uploading metadata, converting code, defining business glossaries, tracking data transformations and so on. So questions linger about whether transformed data can be trusted.
As the latest iteration in this pursuit of high-quality data sharing, DataOps combines a range of disciplines. It synthesizes all we’ve learned about agile, data quality, and ETL/ELT. Simply put, IDF standardizes data engineering processes.