These data processing and analytical services support Structured Query Language (SQL) to interact with the data. Writing SQL queries requires not just remembering the SQL syntax rules, but also knowledge of the table metadata: data about table schemas, relationships among the tables, and possible column values.
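As a minimal sketch of why that metadata matters, the snippet below uses an in-memory SQLite database as a stand-in for such a service: it first reads the table's schema, then writes an aggregate query against it. The table and column names are illustrative.

```python
# Minimal sketch: why table metadata matters when writing SQL.
# Uses an in-memory SQLite database as a stand-in for a cloud analytical
# service; table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
conn.execute("INSERT INTO orders VALUES (1, 'EMEA', 120.0), (2, 'APAC', 75.5)")

# Step 1: discover the schema (the table metadata) before writing the query.
columns = conn.execute("PRAGMA table_info(orders)").fetchall()
print([(c[1], c[2]) for c in columns])  # [('order_id', 'INTEGER'), ('region', 'TEXT'), ('amount', 'REAL')]

# Step 2: only with that knowledge can we write a correct aggregate query.
for row in conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region"):
    print(row)
```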
Data collections are the ones and zeroes that encode the actionable insights (patterns, trends, relationships) that we seek to extract from our data through machine learning and data science. Live online presentations, demos, and customer testimonials were complemented with new content posted at sap.com/datasphere.
A high hurdle many enterprises have yet to overcome is accessing mainframe data via the cloud. Mainframes hold an enormous amount of critical and sensitive business data, including transactional information, healthcare records, customer data, and inventory metrics. Four key challenges prevent them from doing so.
Institutional Data & AI Platform architecture
The Institutional Division has implemented a self-service data platform to enable the domain teams to build and manage data products autonomously. The following diagram illustrates the building blocks of the Institutional Data & AI Platform.
The Airflow REST API facilitates a wide range of use cases, from centralizing and automating administrative tasks to building event-driven, data-aware data pipelines. In this post, we discuss the enhancement and present several use cases it unlocks for your Amazon MWAA environment.
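As a hedged illustration of the kind of automation this enables, the snippet below triggers a DAG run through Airflow's stable REST API (POST /api/v1/dags/{dag_id}/dagRuns). The host name, access token, and DAG id are placeholders; how you authenticate against the web server depends on your Amazon MWAA setup.

```python
# A minimal sketch of triggering a DAG run through the Airflow stable REST API.
# The endpoint URL, token, and DAG id below are hypothetical placeholders.
import requests

AIRFLOW_HOST = "https://your-environment.airflow.example.com"  # assumption
TOKEN = "<web-server-access-token>"                            # assumption

resp = requests.post(
    f"{AIRFLOW_HOST}/api/v1/dags/daily_ingest/dagRuns",
    headers={"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"},
    json={"conf": {"source": "event-driven-trigger"}},  # optional run configuration
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["dag_run_id"])
```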
This person (or group of individuals) ensures that the theory behind data quality is communicated to the development team. 2 – Data profiling. Data profiling is an essential process in the data quality management (DQM) lifecycle. Data quality management best practices: here, it all comes down to the data transformation error rate.
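As a rough illustration of data profiling and of measuring a transformation error rate, the pandas snippet below profiles a tiny hypothetical dataset and counts the rows a date-parsing step rejects; the column names and data are invented for the example.

```python
# A small illustration of data profiling, assuming pandas and a hypothetical
# "orders" dataset; the error rate simply counts rows that fail a
# transformation (here, parsing a date column).
import pandas as pd

df = pd.DataFrame({
    "order_date": ["2024-01-03", "2024-02-29", "not-a-date", None],
    "amount": [10.0, None, 42.5, 7.0],
})

# Basic profile: null rate and distinct-value count per column.
profile = pd.DataFrame({
    "null_rate": df.isna().mean(),
    "distinct": df.nunique(),
})
print(profile)

# Data transformation error rate: share of non-null rows the date parse rejects.
parsed = pd.to_datetime(df["order_date"], errors="coerce")
error_rate = (parsed.isna() & df["order_date"].notna()).mean()
print(f"transformation error rate: {error_rate:.0%}")
```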
It seamlessly consolidates data from various data sources within AWS, including AWS Cost Explorer (and forecasting with Cost Explorer), AWS Trusted Advisor, and AWS Compute Optimizer. Data providers and consumers are the two fundamental users of a CDH dataset. You might notice that this differs slightly from traditional ETL.
Data Vault 2.0 allows for the following: agile data warehouse development, parallel data ingestion, a scalable approach to handling multiple data sources even on the same entity, a high level of automation, historization, and full lineage support.
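For a flavor of how a Data Vault model keeps loading automatable, here is a minimal sketch of building a hub row from a business key, assuming MD5 hash keys plus standard load-date and record-source columns; the table and column names are illustrative, not from the original post.

```python
# A minimal sketch of a Data Vault-style hub load: an MD5 hash key computed
# from the business key plus standard load metadata columns.
import hashlib
from datetime import datetime, timezone

def hub_customer_row(customer_number: str, record_source: str) -> dict:
    """Build one row for a hypothetical HUB_CUSTOMER table from a business key."""
    hash_key = hashlib.md5(customer_number.strip().upper().encode()).hexdigest()
    return {
        "customer_hash_key": hash_key,
        "customer_number": customer_number,
        "load_date": datetime.now(timezone.utc),
        "record_source": record_source,
    }

print(hub_customer_row("C-1001", "crm_export"))
```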
The data science algorithm Valentine is an effective tool for this. Valentine is presented in the paper Valentine: Evaluating Matching Techniques for Dataset Discovery (2021, Koutras et al.). This solution solves the interoperability and linkage problem for data products. We focus on the former.
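As a simplified stand-in for what such schema-matching techniques do (not the actual Valentine matchers from the paper), the snippet below scores candidate column pairs from two hypothetical data products by the Jaccard overlap of their values.

```python
# A simplified illustration of schema matching in the spirit of Valentine:
# score candidate column pairs by Jaccard overlap of their values.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

product_a = {"cust_id": ["1", "2", "3"], "country": ["NL", "DE", "FR"]}  # hypothetical
product_b = {"customer": ["2", "3", "4"], "nation": ["DE", "FR", "ES"]}  # hypothetical

matches = sorted(
    ((col_a, col_b, jaccard(vals_a, vals_b))
     for col_a, vals_a in product_a.items()
     for col_b, vals_b in product_b.items()),
    key=lambda m: m[2], reverse=True,
)
for col_a, col_b, score in matches[:2]:
    print(f"{col_a} <-> {col_b}: {score:.2f}")
```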
You can also use the data transformation feature of Data Firehose to invoke a Lambda function to perform data transformation in batches. Athena is used to run geospatial queries on the location data stored in the S3 buckets. Choose Run. You can repeat this exercise using the lambda table.
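A minimal sketch of such a transformation Lambda is shown below: Firehose delivers a batch of base64-encoded records and expects each one back with its recordId, a result status, and the transformed payload. The enrichment applied here is a placeholder, not the geospatial processing from the original post.

```python
# A minimal sketch of a Data Firehose transformation Lambda. Firehose batches
# records and expects each one returned with a recordId, a result status, and
# base64-encoded data.
import base64
import json

def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        payload["processed"] = True  # hypothetical transformation step
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",  # or "Dropped" / "ProcessingFailed"
            "data": base64.b64encode(json.dumps(payload).encode()).decode(),
        })
    return {"records": output}
```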
We chatted about industry trends, why decentralization has become a hot topic in the data world, and how metadata drives many data-centric use cases. But, through it all, Mohan says it’s critical to view everything through the same lens: gaining business value from data. Data fabric is a technology architecture.
Developers need to onboard new data sources, chain multiple data transformation steps together, and explore data as it travels through the flow. With NiFi you can configure your source processor and run it independently of any other processors to retrieve data. Enabling self-service for developers.
To ensure you can deliver on this world-changing vision of data, Alation helps you maximize the value of your data lake with integrations to the Unity catalog. Alation will leverage the Databricks Unity Catalog so users can easily integrate metadata from multiple workspaces, powering discovery, governance, and insights inside Alation.
In addition, more data is becoming available for processing and enrichment of existing and new use cases; for example, we have recently experienced rapid growth in data collection at the edge and an increase in the availability of frameworks for processing that data. As a result, alternative data integration technologies (e.g.,
We explore why Orca chose to build a transactional data lake and examine the key considerations that guided the selection of Apache Iceberg as the preferred table format. Lastly, we discuss the challenges encountered throughout the project, present the solutions used to address them, and share valuable lessons learned.
Businesses face significant hurdles when preparing data for artificial intelligence (AI) applications. Data silos and duplication, along with concerns about data quality, create a multifaceted environment for organizations to manage.
The run_id is present as part of the Airflow task logs. So even if you use the correlation ID to query the different CloudWatch log groups, you won’t get any information about the run of the Spark job.
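One common way around this (an assumption here, not necessarily the fix chosen in the original post) is to pass the Airflow run_id into the Spark job arguments so the same identifier appears in both log groups. The operator parameters, Airflow Variables, and script path below are illustrative.

```python
# A sketch of propagating the Airflow run_id into a Spark job so the same
# correlation ID can be searched in both CloudWatch log groups. Uses standard
# Airflow Jinja templating; the Variables and script path are placeholders.
from datetime import datetime
from airflow import DAG
from airflow.providers.amazon.aws.operators.emr import EmrServerlessStartJobOperator

with DAG(dag_id="spark_with_correlation_id", start_date=datetime(2024, 1, 1),
         schedule=None, catchup=False) as dag:
    run_spark = EmrServerlessStartJobOperator(
        task_id="run_spark_job",
        application_id="{{ var.value.emr_serverless_app_id }}",   # hypothetical Variable
        execution_role_arn="{{ var.value.emr_exec_role_arn }}",   # hypothetical Variable
        job_driver={
            "sparkSubmit": {
                "entryPoint": "s3://my-bucket/scripts/transform.py",          # placeholder
                "entryPointArguments": ["--correlation-id", "{{ run_id }}"],  # propagate run_id
            }
        },
    )
```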
Effective data governance for the public sector enables entities to ensure data quality, enhance security, protect privacy, and meet compliance requirements. With so much focus on compliance, democratizing data for self-service analytics can present a challenge. Balance Defensive And Offensive Data Strategy.
It’s for that reason that even as the first BCBS-239 implementation deadline came into effect a few years ago, McKinsey reported that one-third of Global Systemically Important Banks had focused on “documenting data lineage up to the level of provisioning data elements and including data transformation.”
Data ingestion – Steps 1 and 2 use AWS DMS, which connects to the source database and moves full and incremental (CDC) data to Amazon S3 in Parquet format. Let’s refer to this S3 bucket as the raw layer. Data transformation – Steps 3 and 4 represent an EMR Serverless Spark application (Amazon EMR 6.9).
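A hedged PySpark sketch of that transformation step follows: it reads the DMS Parquet output from the raw layer and rewrites it, partitioned, into a curated bucket. The bucket names, table list, and partition keys are assumptions; the stray fragment in the excerpt suggests the job validates that the table and partition-key lists have matching lengths, which the sketch mirrors.

```python
# A hedged sketch of the raw-to-curated transformation step. Bucket names,
# the table list, and partition keys are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("raw-to-curated").getOrCreate()

tables = ["orders", "customers"]             # assumption
partition_keys = ["order_date", "country"]   # assumption, one key per table

if len(tables) != len(partition_keys):
    raise ValueError("each table needs exactly one partition key")

for table, key in zip(tables, partition_keys):
    df = spark.read.parquet(f"s3://my-raw-bucket/{table}/")    # raw layer
    (df.dropDuplicates()
       .write.mode("overwrite")
       .partitionBy(key)
       .parquet(f"s3://my-curated-bucket/{table}/"))           # curated layer
```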
Discuss, don’t present. Present your business case. To support your case, present findings from the State of Embedded Analytics study. Information delivery: the main reason software providers take on an embedded analytics project is to improve how data is presented. It is now most definitely a need-to-have.
This allows you to fully utilize your Fabric-based systems and overcome typical obstacles related to complex data environments. Bridge functional gaps: Fabric has shifted away from traditional relational database management systems (RDBMS), presenting users with a new challenge.
The data is stored in Apache Parquet format, with the AWS Glue Data Catalog providing metadata management. While this architecture supported NI's analytical needs, it lacked the flexibility required for a truly open and adaptable data platform. The gold layer was coupled only with query engines that supported Hive and the AWS Glue Data Catalog.
While efficiency is a priority, data quality and security remain non-negotiable. Developing and maintaining data transformation pipelines are among the first tasks to be targeted for automation. However, caution is advised since accuracy, timeliness, and other aspects of data quality depend on the quality of data pipelines.
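As a small example of the kind of guard rails such automated pipelines need, the snippet below runs two checks, one for accuracy (null business keys) and one for timeliness (data freshness), on a hypothetical pandas DataFrame; thresholds and column names are illustrative.

```python
# A sketch of guard-rail checks attached to an automated transformation
# pipeline, covering accuracy and timeliness. Thresholds and column names
# are illustrative.
from datetime import datetime, timedelta, timezone
import pandas as pd

def check_quality(df: pd.DataFrame) -> list[str]:
    issues = []
    # Accuracy: the business key must never be null.
    if df["order_id"].isna().any():
        issues.append("accuracy: null order_id values found")
    # Timeliness: the newest record must be less than 24 hours old.
    newest = pd.to_datetime(df["loaded_at"]).max()
    if newest < datetime.now(timezone.utc) - timedelta(hours=24):
        issues.append("timeliness: newest record is older than 24 hours")
    return issues

df = pd.DataFrame({
    "order_id": [1, 2, None],
    "loaded_at": [datetime.now(timezone.utc)] * 3,
})
print(check_quality(df))  # ['accuracy: null order_id values found']
```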
Streaming pipelines used Spark Streaming to ingest real-time data from Kafka, writing raw datasets to an Amazon Simple Storage Service (Amazon S3) data lake while simultaneously loading them into BigQuery and Google Cloud Storage to build logical data layers. Some of AppsFlyer's workloads used earlier versions.
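For a flavor of that ingestion pattern, here is a hedged sketch using Spark Structured Streaming (the current Spark streaming API) to read from Kafka and write raw Parquet files to S3; broker addresses, the topic name, and paths are placeholders rather than AppsFlyer's actual configuration.

```python
# A hedged sketch of streaming ingestion: Spark Structured Streaming reading
# from Kafka and landing raw Parquet files in an S3 data lake. Requires the
# spark-sql-kafka connector on the classpath; all names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-s3-raw").getOrCreate()

events = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")  # placeholder brokers
    .option("subscribe", "app-events")                    # placeholder topic
    .option("startingOffsets", "latest")
    .load()
    .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "timestamp"))

query = (events.writeStream
    .format("parquet")
    .option("path", "s3://my-data-lake/raw/app-events/")               # placeholder
    .option("checkpointLocation", "s3://my-data-lake/checkpoints/app-events/")
    .trigger(processingTime="1 minute")
    .start())

query.awaitTermination()
```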
The rapid adoption has enabled them to quickly streamline operations, enhance collaboration, and gain more accessible, scalable solutions for managing their critical data and workflows. AWS Glue establishes a secure connection to HubSpot using OAuth for authorization and TLS for data encryption in transit.