Analytics, Data Transformation and Metadata

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

AWS Big Data

OCTOBER 14, 2024

Enterprise data is brought into data lakes and data warehouses to carry out analytical, reporting, and data science use cases using AWS analytical services like Amazon Athena, Amazon Redshift, Amazon EMR, and so on. Table metadata is fetched from AWS Glue. The generated Athena SQL query is run.
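The excerpt above says only that table metadata is fetched from AWS Glue before the Athena SQL is generated and run. As a rough sketch of what that enrichment step might look like, the snippet below turns a dictionary shaped like a Glue `get_table` response into a text block that could be placed in a text-to-SQL prompt. The function names, the prompt layout, and the sample table are hypothetical and are not taken from the article.

```python
# Hypothetical helpers: format Glue-style table metadata as prompt context.
# The input mirrors the shape of a Glue get_table response
# (Table -> StorageDescriptor -> Columns, plus PartitionKeys).

SAMPLE_RESPONSE = {
    "Table": {
        "Name": "orders",
        "Description": "One row per customer order",
        "StorageDescriptor": {
            "Columns": [
                {"Name": "order_id", "Type": "string",
                 "Comment": "Unique order identifier"},
                {"Name": "amount", "Type": "double"},  # no comment recorded
            ]
        },
        "PartitionKeys": [
            {"Name": "order_date", "Type": "date",
             "Comment": "Partition by order date"},
        ],
    }
}


def format_table_metadata(glue_response):
    """Return a plain-text description of one table and its columns."""
    table = glue_response["Table"]
    lines = [f"Table: {table['Name']}"]
    if table.get("Description"):
        lines.append(f"Description: {table['Description']}")
    columns = table.get("StorageDescriptor", {}).get("Columns", [])
    partitions = table.get("PartitionKeys", [])
    for col in columns + partitions:
        # Columns without a comment are flagged so the gap is visible.
        comment = col.get("Comment", "no description")
        lines.append(f"- {col['Name']} ({col['Type']}): {comment}")
    return "\n".join(lines)


def build_prompt(question, glue_response):
    """Combine the metadata block with the user's question (assumed layout)."""
    context = format_table_metadata(glue_response)
    return f"Schema:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("What was the total order amount yesterday?",
                       SAMPLE_RESPONSE))
```

In this sketch, columns without a comment show up as "no description", which makes it easy to spot where the catalog would need richer metadata before the model can be expected to produce accurate SQL.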

Metadata

Metadata Data Lake Modeling Data Warehouse

SAP Datasphere Powers Business at the Speed of Data

Rocket-Powered Data Science

MARCH 20, 2023

Here are just 10 of the many key features of Datasphere that were covered during the launch day announcements: Datasphere works with the SAP Analytics Cloud and runs on the existing SAP BTP (Business Technology Platform), with all the essential features: security, access control, high availability. Datasphere is not just for data managers.

Data Warehouse

Data Warehouse Metadata Digital Transformation Machine Learning

Bridging the gap between mainframe data and hybrid cloud environments

CIO Business Intelligence

FEBRUARY 27, 2025

A high hurdle many enterprises have yet to overcome is accessing mainframe data via the cloud. Mainframes hold an enormous amount of critical and sensitive business data including transactional information, healthcare records, customer data, and inventory metrics.

Metadata

Metadata Data Lake Cost-Benefit Forecasting


How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

For container terminal operators, data-driven decision-making and efficient data sharing are vital to optimizing operations and boosting supply chain efficiency. Enhance agility by localizing changes within business domains and clear data contracts. Eliminate centralized bottlenecks and complex data pipelines.

IoT

IoT Machine Learning Metadata Data-driven

Data’s dark secret: Why poor quality cripples AI and growth

CIO Business Intelligence

APRIL 8, 2025

As technology and business leaders, your strategic initiatives, from AI-powered decision-making to predictive insights and personalized experiences, are all fueled by data. Yet, despite growing investments in advanced analytics and AI, organizations continue to grapple with a persistent and often underestimated challenge: poor data quality.

Data Quality

Data Quality Data-driven Key Performance Indicator Metadata

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

AWS Big Data

DECEMBER 4, 2024

These nodes can implement analytical platforms like data lake houses, data warehouses, or data marts, all united by producing data products. The Institutional Data & AI platform adopts a federated approach to data while centralizing the metadata to facilitate simpler discovery and sharing of data products.

Metadata

Metadata Data Governance Data Quality Data-driven

Modernize a legacy real-time analytics application with Amazon Managed Service for Apache Flink

AWS Big Data

OCTOBER 11, 2023

Organizations with legacy, on-premises, near-real-time analytics solutions typically rely on self-managed relational databases as their data store for analytics workloads. Near-real-time streaming analytics captures the value of operational data and metrics to provide new insights to create business opportunities.

Management

Management Metadata Analytics Dashboards

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

AWS Big Data

OCTOBER 30, 2024

Amazon DataZone has launched authentication support through the Amazon Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets via popular business intelligence (BI) and analytics tools like Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more.

Visualization

Visualization Data Lake Testing Data Governance

Lay the groundwork now for advanced analytics and AI

CIO Business Intelligence

AUGUST 3, 2023

When global technology company Lenovo started using data analytics, the analytics helped identify a new market niche for its gaming laptops and powered remote diagnostics so its customers got the most from their servers and other devices.

Analytics

Analytics Data Lake Metadata Cost-Benefit

BMW Cloud Efficiency Analytics powered by Amazon QuickSight and Amazon Athena

AWS Big Data

NOVEMBER 15, 2023

BMW Cloud Efficiency Analytics (CLEA) is a homegrown tool developed within the BMW FinOps CoE (Center of Excellence) aiming to optimize and reduce costs across all these accounts. In this post, we explore how the BMW Group FinOps CoE implemented their Cloud Efficiency Analytics tool (CLEA), powered by Amazon QuickSight and Amazon Athena.

Dashboards

Dashboards Analytics Metadata Data Warehouse

Tableau further democratizes analytics with AI-fueled features

CIO Business Intelligence

APRIL 30, 2024

At Tableau Conference 2024 in San Diego today, Tableau announced new AI features for Tableau Pulse and Einstein Copilot for Tableau, along with several platform improvements aimed at democratizing data insights. Tableau pitched its unveiling of Tableau Pulse last year as the harbinger of a new era of proactive analytics.

Analytics

Analytics Metrics Visualization Dashboards

The Ultimate Guide to Modern Data Quality Management (DQM) For An Effective Data Quality Control Driven by The Right Metrics

datapine

SEPTEMBER 29, 2022

6) Data Quality Metrics Examples. 7) Data Quality Control: Use Case. 8) The Consequences Of Bad Data Quality. 9) 3 Sources Of Low-Quality Data. 10) Data Quality Solutions: Key Attributes. This person (or group of individuals) ensures that the theory behind data quality is communicated to the development team.

Data Quality

Data Quality Metrics Data-driven Management

How to Build a Successful Metadata Management Framework

Alation

JUNE 28, 2022

This is where metadata, or the data about data, comes into play. Having a data catalog is the cornerstone of your data governance strategy, but what supports your data catalog? Your metadata management framework provides the underlying structure that makes your data accessible and manageable.

Metadata

Metadata Management Data Governance Machine Learning

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

DECEMBER 13, 2023

In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.

Metadata

Metadata Data Lake Visualization Data Quality

From Raw Inputs to Polished Outputs: The Art of Testing Data Transformations

Wayne Yaddow

MARCH 5, 2025

In this post, we'll see the fundamental procedures, tools, and techniques that data engineers, data scientists, and QA/testing teams use to ensure high-quality data as soon as it's deployed. First, we look at how unit and integration tests uncover transformation errors at an early stage. using Docker or local runners).

Testing

Testing Data Transformation Statistics Metadata

Ensuring Data Transformation Quality with dbt Core

Wayne Yaddow

MARCH 14, 2025

How dbt Core helps data teams test, validate, and monitor complex data transformations and conversions. dbt Core, an open-source framework for developing, testing, and documenting SQL-based data transformations, has become a must-have tool for modern data teams as the complexity of data pipelines grows.

Data Transformation

Data Transformation Testing Unstructured Data Data Quality

Alation and dbt Unlock Metadata and Increase Modern Data Stack Visibility

Alation

OCTOBER 18, 2022

Data analysts and engineers use dbt to transform, test, and document data in the cloud data warehouse. Yet every dbt transformation contains vital metadata that is not captured – until now. Data Transformation in the Modern Data Stack. How did the data transform exactly?

Metadata

Metadata Metrics Recreation/Entertainment Data Quality

Gain insights from historical location data using Amazon Location Service and AWS analytics services

AWS Big Data

MARCH 13, 2024

Data analytics – Business analysts gather operational insights from multiple data sources, including the location data collected from the vehicles. You can also use the data transformation feature of Data Firehose to invoke a Lambda function to perform data transformation in batches.
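The Data Firehose transformation mentioned above hands a Lambda function a batch of base64-encoded records and expects each one back with the same `recordId`, a `result` status, and re-encoded `data`. A minimal sketch of such a handler follows; the rounding of `lat` and `lon` fields is a made-up transformation for illustration, and the field names are assumptions rather than details from the article.

```python
import base64
import json


def lambda_handler(event, context):
    """Transform a batch of Firehose records and return them in the
    response shape that Firehose expects from a transformation Lambda."""
    output = []
    for record in event["records"]:
        try:
            # Firehose delivers each payload base64-encoded.
            payload = json.loads(base64.b64decode(record["data"]))

            # Hypothetical transformation: trim coordinate precision.
            payload["lat"] = round(payload["lat"], 4)
            payload["lon"] = round(payload["lon"], 4)

            # Newline-delimit the output so downstream readers can split it.
            encoded = base64.b64encode(
                (json.dumps(payload) + "\n").encode("utf-8")
            ).decode("utf-8")
            output.append(
                {"recordId": record["recordId"], "result": "Ok", "data": encoded}
            )
        except (ValueError, KeyError, TypeError):
            # Malformed input is returned unchanged and marked as failed.
            output.append(
                {
                    "recordId": record["recordId"],
                    "result": "ProcessingFailed",
                    "data": record["data"],
                }
            )
    return {"records": output}
```

Every incoming `recordId` is returned exactly once, since Firehose matches results to inputs by that ID. Records marked `ProcessingFailed` are treated by Firehose as unsuccessfully processed, so they are not mixed in with the transformed output.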

Analytics

Analytics IoT Metadata Internet of Things

Enhance your analytics embedding experience with the new Amazon QuickSight JavaScript SDK

AWS Big Data

MARCH 9, 2023

break; } } } const frameOptions = { url: ' ', container: document.getElementById("dashboardContainer"), width: "100%", height: "AutoFit", loadingHeight: "200px", withIframePlaceholder: true, onChange: (changeEvent, metadata) => { switch (changeEvent.eventName) { case 'ERROR': { document.getElementById("dashboardContainer").append('Unable

Slice and Dice

Slice and Dice Dashboards Analytics Interactive

How HR&A uses Amazon Redshift spatial analytics on Amazon Redshift Serverless to measure digital equity in states across the US

AWS Big Data

DECEMBER 5, 2023

This cut down significantly on analytical turnaround times. The CARTO Analytics Toolbox for Redshift is composed of a set of user-defined functions and procedures organized in a set of modules based on the functionality they offer. These table definitions are used as the metadata repository for external tables in Amazon Redshift.

Measurement

Measurement Dashboards Data Warehouse Analytics

What is Data Lineage? Top 5 Benefits of Data Lineage

erwin

APRIL 29, 2020

An understanding of the data’s origins and history helps answer questions about the origin of data in Key Performance Indicator (KPI) reports, including: How are the report tables and columns defined in the metadata? Who are the data owners? What are the transformation rules? Data Governance.

Metadata

Metadata Key Performance Indicator Data Governance Data Quality

Biggest Trends in Data Visualization Taking Shape in 2022

Smart Data Collective

OCTOBER 13, 2021

There are countless examples of big data transforming many different industries. There is no disputing the fact that the collection and analysis of massive amounts of unstructured data has been a huge breakthrough. If we talk about Big Data, data visualization is crucial to more successfully drive high-level decision making.

Visualization

Visualization Cost-Benefit Big Data Prescriptive Analytics

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

One key component that plays a central role in modern data architectures is the data lake, which allows organizations to store and analyze large amounts of data in a cost-effective manner and run advanced analytics and machine learning (ML) at scale. To overcome these issues, Orca decided to build a data lake.

Data Lake

Data Lake Analytics Snapshot Data Quality

How Your Finance Team Can Lead Your Enterprise Data Transformation

Alation

OCTOBER 26, 2021

Because of the criticality of the data they deal with, we think that finance teams should lead the enterprise adoption of data and analytics solutions. Recent articles extol the benefits of supercharging analytics for finance departments. A Strong Data Culture Supports Strategic Decision Making.

Finance

Finance Data Transformation Enterprise Metrics

Introducing simplified interaction with the Airflow REST API in Amazon MWAA

AWS Big Data

OCTOBER 23, 2024

It’s a set of HTTP endpoints to perform operations such as invoking Directed Acyclic Graphs (DAGs), checking task statuses, retrieving metadata about workflows, managing connections and variables, and even initiating dataset-related events, without directly accessing the Airflow web interface or command line tools.

Interactive

Interactive Testing Data-driven Data Lake

Copy and mask PII between Amazon RDS databases using visual ETL jobs in AWS Glue Studio

AWS Big Data

AUGUST 26, 2024

Solution overview The following diagram illustrates the solution architecture: The solution uses AWS Glue as an ETL engine to extract data from the source Amazon RDS database. Built-in data transformations then scrub columns containing PII using pre-defined masking functions. This saves time over manually defining schemas.

Visualization

Visualization Metadata Data Transformation Testing

Deliver decompressed Amazon CloudWatch Logs to Amazon S3 and Splunk using Amazon Data Firehose

AWS Big Data

APRIL 2, 2024

You can use Amazon Data Firehose to aggregate and deliver log events from your applications and services captured in Amazon CloudWatch Logs to your Amazon Simple Storage Service (Amazon S3) bucket and Splunk destinations, for use cases such as data analytics, security analysis, and application troubleshooting.

Metadata

Metadata Marketing Analytics Data Transformation

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

AWS Big Data

DECEMBER 16, 2024

With the ability to browse metadata, you can understand the structure and schema of the data source, identify relevant tables and fields, and discover useful data assets you may not be aware of.

Visualization

Visualization Data Processing Testing Publishing

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

Cloudera

JUNE 30, 2022

These tools empower analysts and data scientists to easily collaborate on the same data, with their choice of tools and analytic engines. No more lock-in, unnecessary data transformations, or data movement across tools and clouds just to extract insights out of the data.

Data Lake

Data Lake Data Warehouse Data Architecture Metadata

Cloudera’s Open Data Lakehouse Supercharged with dbt Core(tm)

Cloudera

OCTOBER 7, 2022

We’re excited to announce the general availability of the open source adapters for dbt for all the engines in CDP: Apache Hive, Apache Impala, and Apache Spark, with added support for Apache Livy and Cloudera Data Engineering. The Open Data Lakehouse. Cloudera builds dbt adapters for all engines in the open data lakehouse.

Data Warehouse

Data Warehouse Data Transformation Machine Learning Data Lake

The What & Why of Data Governance

erwin

MARCH 4, 2021

Why should companies care about data governance? erwin’s 2020 State of Data Governance and Automation report found that better decision-making is the primary driver for data governance (62 percent), with analytics secondary (51 percent), and regulatory compliance coming in third (48 percent).

Data Governance

Data Governance Digital Transformation Data-driven Cost-Benefit

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouse customers (such as those on Amazon Redshift) who are looking to keep their data transform logic separate from storage and engine.

Data Lake

Data Lake Management Metrics Data Warehouse

Accelerate Your Data Mesh in the Cloud with Cloudera Data Engineering and Modak NabuTM

Cloudera

OCTOBER 11, 2021

Modak’s Nabu is a born in the cloud, cloud-neutral integrated data engineering platform designed to accelerate the journey of enterprises to the cloud. Modak empowers organizations to maximize their ROI from existing analytics infrastructure through interoperability. Modak Nabu TM and CDE’s Spark-on-Kubernetes.

Data Lake

Data Lake Cost-Benefit Data-driven Dashboards

Build a data lake with Apache Flink on Amazon EMR

AWS Big Data

JANUARY 27, 2023

With a unified data catalog, you can quickly search datasets and figure out data schema, data format, and location. The AWS Glue Data Catalog provides a uniform repository where disparate systems can store and find metadata to keep track of data in data silos. Refer to Catalogs for more information.

Data Lake

Data Lake Metadata Business Analysis Data-driven

Empower your Jira data in a data lake with Amazon AppFlow and AWS Glue

AWS Big Data

AUGUST 1, 2023

Although Jira Cloud provides reporting capability, loading this data into a data lake will facilitate enrichment with other business data, as well as support the use of business intelligence (BI) tools and artificial intelligence (AI) and machine learning (ML) applications. For InitialRunFlag, choose Setup. Choose Update.

Data Lake

Data Lake Data Transformation Data-driven Cost-Benefit

How healthcare organizations can analyze and create insights using price transparency data

AWS Big Data

OCTOBER 11, 2023

The data in the machine-readable files can provide valuable insights to understand the true cost of healthcare services and compare prices and quality across hospitals. The availability of machine-readable files opens up new possibilities for data analytics, allowing organizations to analyze large amounts of pricing data.

Visualization

Visualization Dashboards Data-driven Gap analysis

From Disparate Data to Visualized Knowledge Part I: Moving from Spreadsheets to an RDF Database

Ontotext

NOVEMBER 18, 2021

Picture this – you start with the perfect use case for your data analytics product. And all of them are asking hard questions: “Can you integrate my data, with my particular format?”, “How well can you scale?”, “How many visualizations do you offer?”. Nowadays, data analytics doesn’t exist on its own.

Visualization

Visualization Reporting Metadata Enterprise

How to modernize data lakes with a data lakehouse architecture

IBM Big Data Hub

JULY 5, 2023

Data lakes have been around for well over a decade now, supporting the analytic operations of some of the world's largest corporations. This was, without question, a significant departure from traditional analytic environments, which often meant vendor lock-in and the inability to work with data at scale.

Data Lake

Data Lake Metadata Cost-Benefit Data Warehouse

Alation 2023.1: Easing Self-Service for the Modern Data Stack with Databricks and dbt Labs

Alation

APRIL 4, 2023

Alation delivers extended connectivity for Databricks Unity Catalog, the lakehouse company, and new connectivity for dbt Cloud by dbt Labs, the pioneer in analytics engineering. Now, joint users will get an enhanced view into cloud and data transformations, with valuable context to guide smarter usage.

Metadata

Metadata Cost-Benefit Data Transformation Predictive Modeling

Fabrics, Meshes & Stacks, oh my! Q&A with Sanjeev Mohan

Alation

AUGUST 11, 2022

DataOps sprung up to connect data sources to data consumers. The data warehouse and analytical data stores moved to the cloud and disaggregated into the data mesh. But, through it all, Mohan says it’s critical to view everything through the same lens: gaining business value from data.

Metadata

Metadata Data Warehouse Data Quality Data Lake

Amazon EMR on EKS widens the performance gap: Run Apache Spark workloads 5.37 times faster and at 4.3 times lower cost

AWS Big Data

APRIL 12, 2023

Metadata store – We use Spark’s in-memory data catalog to store metadata for TPC-DS databases and tables: spark.sql.catalogImplementation is set to the default value, in-memory. About the Authors Melody Yang is a Senior Big Data Solution Architect for Amazon EMR at AWS. test: EMR release – EMR 6.10.0

Testing

Testing Big Data Metadata Optimization

NEW: Octopai Announces Support of Microsoft Azure Data Factory

Octopai

JANUARY 19, 2021

This is done by visualizing the Azure Data Factory pipelines’ full column-level with source-to-target traceability through different data transformations at the most detailed level. Octopai can fully map the BI landscape and trace metadata movement in a mixed environment including complex multi-vendor landscapes.

Metadata

Metadata ROI Machine Learning Data Quality

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

To speed up self-service analytics and foster innovation based on data, a solution was needed to allow any team to create data products on their own in a decentralized manner. To create and manage the data products, smava uses Amazon Redshift, a cloud data warehouse.

Data Lake

Data Lake Data Warehouse Data-driven B2B

Addressing the Three Scalability Challenges in Modern Data Platforms

Cloudera

NOVEMBER 22, 2021

In legacy analytical systems such as enterprise data warehouses, the scalability challenges of a system were primarily associated with computational scalability, i.e., the ability of a data platform to handle larger volumes of data in an agile and cost-efficient way. Introduction. CRM platforms).

Data Processing

Data Processing Data Warehouse Enterprise Visualization
