Data Transformation, Metadata and Strategy

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

AWS Big Data

OCTOBER 14, 2024

These data processing and analytical services support Structured Query Language (SQL) to interact with the data. Writing SQL queries requires not only remembering the SQL syntax rules but also knowing the table metadata, which is data about table schemas, relationships among the tables, and possible column values.

Metadata

Metadata Data Lake Modeling Data Warehouse

The Ultimate Guide to Modern Data Quality Management (DQM) For An Effective Data Quality Control Driven by The Right Metrics

datapine

SEPTEMBER 29, 2022

Effective DQM is recognized as essential to any consistent data analysis, as the quality of data is crucial for deriving actionable and, more importantly, accurate insights from your information. There are a lot of strategies that you can use to improve the quality of your information.

Data Quality

Data Quality Metrics Data-driven Management

How to Build a Successful Metadata Management Framework

Alation

JUNE 28, 2022

This is where metadata, or the data about data, comes into play. Having a data catalog is the cornerstone of your data governance strategy, but what supports your data catalog? Your metadata management framework provides the underlying structure that makes your data accessible and manageable.

Metadata

Metadata Management Data Governance Machine Learning

Ensuring Data Transformation Quality with dbt Core

Wayne Yaddow

MARCH 14, 2025

How dbt Core helps data teams test, validate, and monitor complex data transformations and conversions. dbt Core, an open-source framework for developing, testing, and documenting SQL-based data transformations, has become a must-have tool for modern data teams as the complexity of data pipelines grows.

Data Transformation

Data Transformation Testing Unstructured Data Data Quality

Available Now! Automated Testing for Data Transformations

Wayne Yaddow

FEBRUARY 18, 2025

Selecting the strategies and tools for validating data transformations and data conversions in your data pipelines. Data transformations and data conversions are crucial to ensure that raw data is organized, processed, and ready for useful analysis.

Testing

Testing Data Transformation Data-driven Data Quality

From Raw Inputs to Polished Outputs: The Art of Testing Data Transformations

Wayne Yaddow

MARCH 5, 2025

The goal is to examine five major methods of verifying and validating data transformations in data pipelines with an eye toward high-quality data deployment. First, we look at how unit and integration tests uncover transformation errors at an early stage.

Testing

Testing Data Transformation Statistics Metadata
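
As a minimal sketch of the unit-testing idea mentioned in the excerpt above, the following Python example tests one small transformation in isolation. The function normalize_amount, its field names, and the sample rows are hypothetical and are not taken from the article; they only illustrate how a unit test can surface a transformation error early.

```python
import unittest


def normalize_amount(row: dict) -> dict:
    """Hypothetical transformation: convert cents to dollars and clean the currency code."""
    return {
        "order_id": row["order_id"],
        "amount": round(row["amount_cents"] / 100, 2),
        "currency": row["currency"].strip().upper(),
    }


class NormalizeAmountTest(unittest.TestCase):
    def test_converts_cents_and_cleans_currency(self):
        result = normalize_amount(
            {"order_id": 7, "amount_cents": 1999, "currency": " usd "}
        )
        self.assertEqual(
            result, {"order_id": 7, "amount": 19.99, "currency": "USD"}
        )

    def test_missing_field_fails_loudly(self):
        # A row without amount_cents should raise instead of passing bad data on.
        with self.assertRaises(KeyError):
            normalize_amount({"order_id": 7, "currency": "usd"})
```

Saved as a module, the tests can be run with python -m unittest. An integration test would follow the same pattern but exercise several transformation steps together against a small fixture dataset.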

Top 6 Benefits of Automating End-to-End Data Lineage

erwin

SEPTEMBER 17, 2020

Business terms and data policies should be implemented through standardized and documented business rules. Compliance with these business rules can be tracked through data lineage, incorporating auditability and validation controls across data transformations and pipelines to generate alerts when there are non-compliant data instances.

Cost-Benefit

Cost-Benefit Data Governance Metadata Reporting

Biggest Trends in Data Visualization Taking Shape in 2022

Smart Data Collective

OCTOBER 13, 2021

There are countless examples of big data transforming many different industries. There is no disputing the fact that the collection and analysis of massive amounts of unstructured data has been a huge breakthrough. How does Data Virtualization complement Data Warehousing and SOA Architectures?

Visualization

Visualization Cost-Benefit Big Data Prescriptive Analytics

Data Catalog Role in Data Transformation and Governance

TDAN

AUGUST 15, 2023

Nearly every data leader I talk to is in the midst of a data transformation. As businesses look for ways to increase sales, improve customer experience, and stay ahead of the competition, they are realizing that data is their competitive advantage and the key to achieving their goals. And it’s no surprise, really.

Data Transformation

Data Transformation Sales Metadata Data Governance

Deliver decompressed Amazon CloudWatch Logs to Amazon S3 and Splunk using Amazon Data Firehose

AWS Big Data

APRIL 2, 2024

You can see the decompressed data has metadata information such as logGroup, logStream, and subscriptionFilters, and the actual data is included within the message field under logEvents (the following example shows CloudTrail events in CloudWatch Logs).

Metadata

Metadata Marketing Analytics Data Transformation
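
To make the record layout described in the CloudWatch Logs excerpt above more concrete, here is a rough Python sketch that walks one decompressed record and pulls out the message of each log event. Only the field names logGroup, logStream, subscriptionFilters, logEvents, and message come from the excerpt; the sample values and the helper extract_messages are invented for illustration.

```python
import json

# Hypothetical decompressed record; all values are made up for illustration.
SAMPLE_RECORD = json.dumps({
    "logGroup": "example-log-group",
    "logStream": "example-log-stream",
    "subscriptionFilters": ["example-filter"],
    "logEvents": [
        {"id": "1", "timestamp": 1712016000000,
         "message": '{"eventName": "ListBuckets"}'},
        {"id": "2", "timestamp": 1712016001000,
         "message": '{"eventName": "GetObject"}'},
    ],
})


def extract_messages(raw_record: str) -> list:
    """Return one dict per log event, tagged with its log group and stream."""
    record = json.loads(raw_record)
    return [
        {
            "logGroup": record["logGroup"],
            "logStream": record["logStream"],
            "message": event["message"],
        }
        for event in record.get("logEvents", [])
    ]


rows = extract_messages(SAMPLE_RECORD)
```

Each entry in rows keeps the metadata next to the message, which is often the shape a downstream query or search tool wants.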

How HR&A uses Amazon Redshift spatial analytics on Amazon Redshift Serverless to measure digital equity in states across the US

AWS Big Data

DECEMBER 5, 2023

A combination of Amazon Redshift Spectrum and COPY commands is used to ingest the survey data stored as CSV files. For the files with unknown structures, AWS Glue crawlers are used to extract metadata and create table definitions in the Data Catalog. The first image shows the dashboard without any active filters.

Measurement

Measurement Dashboards Data Warehouse Analytics

Automate discovery of data relationships using ML and Amazon Neptune graph technology

AWS Big Data

APRIL 19, 2023

Data-driven organizations are transitioning to a data product way of thinking. Utilizing strategies like data mesh generates value on a large scale. We took this a step further by creating a blueprint to create smart recommendations by linking similar data products using graph technology and ML.

Technology

Technology Data-driven Machine Learning Sales

How healthcare organizations can analyze and create insights using price transparency data

AWS Big Data

OCTOBER 11, 2023

Due to this low complexity, the solution uses AWS serverless services to ingest the data, transform it, and make it available for analytics. The Data Catalog now contains references to the machine-readable data. Use the Data Catalog and transform the hospital price transparency data.

Visualization

Visualization Dashboards Data-driven Gap analysis

NEW: Octopai Announces Support of Microsoft Azure Data Factory

Octopai

JANUARY 19, 2021

Octopai is the first BI Intelligence platform to analyze Azure Data Factory in hybrid BI environments, providing automated data lineage and discovery. It will continue to announce early support for more platforms as part of an overall strategy to have one centralized view of the entire BI landscape.

Metadata

Metadata ROI Machine Learning Data Quality

Gain insights from historical location data using Amazon Location Service and AWS analytics services

AWS Big Data

MARCH 13, 2024

You can also use the data transformation feature of Data Firehose to invoke a Lambda function to perform data transformation in batches. Athena is used to run geospatial queries on the location data stored in the S3 buckets. Use DeviceId as an additional prefix to write the objects to the bucket. Choose Run.

Analytics

Analytics IoT Metadata Internet of Things

Fabrics, Meshes & Stacks, oh my! Q&A with Sanjeev Mohan

Alation

AUGUST 11, 2022

We chatted about industry trends, why decentralization has become a hot topic in the data world, and how metadata drives many data-centric use cases. But, through it all, Mohan says it’s critical to view everything through the same lens: gaining business value from data. Data fabric is a technology architecture.

Metadata

Metadata Data Warehouse Data Quality Data Lake

Data Preparation and Data Mapping: The Glue Between Data Management and Data Governance to Accelerate Insights and Reduce Risks

erwin

JANUARY 11, 2019

Organizations have spent a lot of time and money trying to harmonize data across diverse platforms, including cleansing, uploading metadata, converting code, defining business glossaries, tracking data transformations and so on. Creating a High-Quality Data Pipeline.

Data Governance

Data Governance Risk Metadata Management

Sure, Trust Your Data… Until It Breaks Everything: How Automated Data Lineage Saves the Day

Octopai

JUNE 9, 2024

This challenge is especially critical for executives responsible for data strategy and operations. Here’s how automated data lineage can transform these challenges into opportunities, as illustrated by the journey of a health services company we’ll call “HealthCo.” This is where Octopai excels.

IT

IT Data-driven Predictive Analytics Data Strategy

Tackling AI’s data challenges with IBM databases on AWS

IBM Big Data Hub

MARCH 14, 2024

This involves unifying and sharing a single copy of data and metadata across IBM® watsonx.data™, IBM® Db2®, IBM® Db2® Warehouse and IBM® Netezza®, using native integrations and supporting open formats, all without the need for migration or recataloging.

Cost-Benefit

Cost-Benefit Metadata Optimization Management

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

Specifically, the system uses Amazon SageMaker Processing jobs to process the data stored in the data lake, employing the AWS SDK for Pandas (previously known as AWS Wrangler) for various data transformation operations, including cleaning, normalization, and feature engineering.

Data Lake

Data Lake Analytics Snapshot Data Quality

Cross-account integration between SaaS platforms using Amazon AppFlow

AWS Big Data

APRIL 25, 2023

Implementing an effective data sharing strategy that satisfies compliance and regulatory requirements is complex. Customers often need to share data between disparate software as a service (SaaS) platforms within their organization or across organizations.

Sales

Sales Visualization Software Metadata

Exploring the AI and data capabilities of watsonx

IBM Big Data Hub

JULY 17, 2023

foundation models to help users discover, augment, and enrich data with natural language. Watsonx.data is built on 3 core integrated components: multiple query engines, a catalog that keeps track of metadata, and storage and relational data sources which the query engines directly access.

Machine Learning

Machine Learning Data Warehouse Modeling Cost-Benefit

How Data Lineage Improves Data Compliance

Octopai

DECEMBER 11, 2022

It’s for that reason that even as the first BCBS-239 implementation deadline came into effect a few years ago, McKinsey reported that one-third of Global Systemically Important Banks had focused on “documenting data lineage up to the level of provisioning data elements and including data transformation.”

Insurance

Insurance Risk Metadata Visualization

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

AWS Big Data

JANUARY 17, 2024

Another popular transaction data lake use case is incremental query. Incremental query refers to a query strategy that focuses on processing and analyzing only the new or updated data within a data lake since the last query. Melody Yang is a Senior Big Data Solution Architect for Amazon EMR at AWS.

Data Lake

Data Lake Snapshot Big Data Data-driven

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

AWS Big Data

MARCH 3, 2023

Data ingestion – Steps 1 and 2 use AWS DMS, which connects to the source database and moves full and incremental data (CDC) to Amazon S3 in Parquet format. Data transformation – Steps 3 and 4 represent an EMR Serverless Spark application (Amazon EMR 6.9 Let’s refer to this S3 bucket as the raw layer.

Data Lake

Data Lake Dashboards Metrics Metadata

The Modern Data Stack Explained: What The Future Holds

Alation

JANUARY 17, 2023

These help data analysts visualize key insights that can help you make better data-backed decisions. ELT Data Transformation Tools: ELT data transformation tools are used to extract, load, and transform your data. Examples of data transformation tools include dbt and dataform.

Data Warehouse

Data Warehouse Cost-Benefit Data Science Data Transformation

Optimize data layout by bucketing with Amazon Athena and AWS Glue to accelerate downstream queries

AWS Big Data

APRIL 25, 2024

Alternatively, you can use AWS Glue for Apache Spark, which provides built-in support for bucketing configurations during the data transformation process. Given this scenario, it would be a good idea to partition the data by report_type and bucket it by station_id. There are two folders: data and metadata.

Optimization

Optimization Data Lake Cost-Benefit Reporting

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

AWS Big Data

DECEMBER 4, 2024

This post explores how the shift to a data product mindset is being implemented, the challenges faced, and the early wins that are shaping the future of data management in the Institutional Division. The following diagram illustrates the building blocks of the Institutional Data & AI Platform.

Metadata

Metadata Data Governance Data Quality Data-driven

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

AWS Big Data

OCTOBER 30, 2024

Publish data assets – As the data producer from the retail team, you must ingest individual data assets into Amazon DataZone. For this use case, create a data source and import the technical metadata of four data assets (customers, order_items, orders, products, reviews, and shipments) from AWS Glue Data Catalog.

Visualization

Visualization Data Lake Testing Data Governance

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouse (such as Amazon Redshift) customers who are looking to keep their data transform logic separate from storage and engine.

Data Lake

Data Lake Management Metrics Data Warehouse

How to modernize data lakes with a data lakehouse architecture

IBM Big Data Hub

JULY 5, 2023

This was, without question, a significant departure from traditional analytic environments, which often meant vendor lock-in and the inability to work with data at scale. Another unexpected challenge was the introduction of Spark as a processing framework for big data. Comprehensive data security and data governance (i.e.

Data Lake

Data Lake Metadata Cost-Benefit Data Warehouse

Why The Public Sector Needs Data Governance

Alation

NOVEMBER 22, 2022

Identifying structured and unstructured data. Setting data management policies, like tagging data. A comprehensive data governance strategy ensures that you have quality data so you can leverage insights for data-driven decision making. Why Is Data Governance In The Public Sector Important?

Data Governance

Data Governance Metadata Data-driven Unstructured Data

Data Mesh 101: How Data Mesh Helps Organizations Be Data-Driven and Achieve Velocity

Ontotext

FEBRUARY 12, 2024

Usually, organizations will combine different domain topologies, depending on the trade-offs, and choose to focus on specific aspects of data mesh. Once accomplished, an effective implementation spurs a mindset in which organizations prioritize and value data for decision-making, formulating strategies, and day-to-day operations.

Data-driven

Data-driven Data Lake Data Quality Business Objectives

Tableau further democratizes analytics with AI-fueled features

CIO Business Intelligence

APRIL 30, 2024

“Your AI strategy is only as good as your data strategy,” Tableau CMO Elizabeth Maxon said in a press conference Monday. “But to us, it’s more than just having a data strategy; it’s also about building a great foundation of a data culture.”

Analytics

Analytics Metrics Visualization Dashboards

Stream real-time data into Apache Iceberg tables in Amazon S3 using Amazon Data Firehose

AWS Big Data

NOVEMBER 6, 2024

To learn more about how to process Firehose records using Lambda, see Transform source data in Amazon Data Firehose. After executing your Lambda function, Firehose looks for routing information and operations in the metadata fields (in the following format) provided by your Lambda function. b64decode(record['data']).decode('utf-8')

Metadata

Metadata Data Lake Management Internet of Things
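
The Lambda processing step described in the Firehose excerpt above can be sketched roughly as follows. The handler decodes each record the way the excerpt's b64decode fragment suggests, then returns it with routing metadata. Treat the metadata layout as an assumption: the otfMetadata key and its destinationDatabaseName, destinationTableName, and operation fields reflect my understanding of the Firehose format for Iceberg destinations and should be checked against the Firehose documentation. The database name, the fallback table name, and the "table" payload key are hypothetical.

```python
import base64
import json


def lambda_handler(event, context):
    """Decode each Firehose record and return it with routing metadata."""
    output = []
    for record in event["records"]:
        # Firehose hands the payload to Lambda base64-encoded.
        payload = json.loads(base64.b64decode(record["data"]).decode("utf-8"))

        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            # Re-encode the (here unchanged) payload for delivery.
            "data": base64.b64encode(
                json.dumps(payload).encode("utf-8")
            ).decode("utf-8"),
            # Assumed routing metadata layout; names below are illustrative.
            "metadata": {
                "otfMetadata": {
                    "destinationDatabaseName": "example_db",
                    "destinationTableName": payload.get("table", "example_table"),
                    "operation": "insert",
                }
            },
        })
    return {"records": output}
```

Routing on a field inside the payload keeps the decision next to the data, so a single stream can feed several tables without separate delivery streams.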

Automating Data Warehouses in the Era of AI, Data Products and Data Lakehouses

BI-Survey

MARCH 6, 2025

While efficiency is a priority, data quality and security remain non-negotiable. Developing and maintaining data transformation pipelines are among the first tasks to be targeted for automation. However, caution is advised since accuracy, timeliness, and other aspects of data quality depend on the quality of data pipelines.

Data Warehouse

Data Warehouse Metadata Unstructured Data Data-driven

Streamline AWS WAF log analysis with Apache Iceberg and Amazon Data Firehose

AWS Big Data

FEBRUARY 18, 2025

These include managing complex extract, transform, and load (ETL) processes, handling schema validation, providing reliable delivery, and maintaining custom code for data transformations. Firehose delivers streaming data with configurable buffering options that can be optimized for near-zero latency.

Snapshot

Snapshot Optimization Data Lake Metadata

What Is Embedded Analytics?

Jet Global

MAY 1, 2023

Other money-making strategies include adding users in a per-seat structure or achieving price dominance in the market due. This strategy will ultimately increase sales, and prove a competitive advantage. Data Transformation and Enrichment Data can be enriched for analysis. addresses).

Analytics

Analytics Cost-Benefit Visualization Dashboards

What is Data Mapping?

Jet Global

FEBRUARY 23, 2024

This field guide to data mapping will explore how data mapping connects volumes of data for enhanced decision-making. Why Data Mapping is Important: Data mapping is a critical element of any data management initiative, such as data integration, data migration, data transformation, data warehousing, or automation.

Data Warehouse

Data Warehouse Reporting Data Transformation Visualization

A Stitch in Time: How Jet Analytics Boosts Microsoft Fabric Time-to-Value

Jet Global

MARCH 14, 2024

Data Lineage and Documentation: Jet Analytics simplifies the process of documenting data assets and tracking data lineage in Fabric. It offers a transparent and accurate view of how data flows through the system, ensuring robust compliance.

Analytics

Analytics Management Reporting Data Quality
