1) What Is Data Quality Management?
4) Data Quality Best Practices.
5) How Do You Measure Data Quality?
6) Data Quality Metrics Examples.
7) Data Quality Control: Use Case.
8) The Consequences Of Bad Data Quality.
9) 3 Sources Of Low-Quality Data.
10) Data Quality Solutions: Key Attributes.
Inventory management is a critical function for any business that deals with physical products. The primary challenge businesses face with inventory management is balancing the cost of holding inventory with the need to ensure that products are available when customers demand them.
Amazon DataZone is a data management service that makes it faster and easier for customers to catalog, discover, share, and govern data stored across AWS, on premises, and from third-party sources. Refer to the detailed blog post on how you can use it to connect through various other tools.
Amazon Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that you can use to analyze your data at scale. Reusing database sessions simplifies the connection management logic in your API implementation, reducing the complexity of the code and making it more straightforward to maintain and scale.
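A minimal sketch of what session reuse can look like with the Redshift Data API, assuming a Redshift Serverless workgroup named api-wg (the workgroup, database, and SQL below are placeholders):

```python
import boto3

# Session reuse via the Redshift Data API keeps connection handling out of
# the API code; "api-wg" and "dev" are hypothetical names.
client = boto3.client("redshift-data", region_name="us-east-1")

first = client.execute_statement(
    WorkgroupName="api-wg",
    Database="dev",
    Sql="CREATE TEMP TABLE stage (id INT)",
    SessionKeepAliveSeconds=300,  # keep the session open for follow-up calls
)

# Subsequent statements reuse the session (and its temp tables) by ID,
# so the workgroup and database don't need to be specified again.
followup = client.execute_statement(
    SessionId=first["SessionId"],
    Sql="INSERT INTO stage VALUES (1)",
)
```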
Organizations with legacy, on-premises, near-real-time analytics solutions typically rely on self-managed relational databases as their data store for analytics workloads. Near-real-time streaming analytics captures the value of operational data and metrics, providing new insights that create business opportunities.
In this post, we show you how to establish the data ingestion pipeline between Google Analytics 4, Google Sheets, and an Amazon Redshift Serverless workgroup. With Amazon AppFlow, you can run data flows at nearly any scale and at the frequency you choose: on a schedule, in response to a business event, or on demand.
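To illustrate the on-demand option, a flow that has already been configured in AppFlow can be triggered through its API; the flow name below is hypothetical:

```python
import boto3

# Trigger an on-demand run of a pre-configured AppFlow flow.
# "ga4-to-redshift" is a placeholder for your flow's name.
appflow = boto3.client("appflow", region_name="us-east-1")
response = appflow.start_flow(flowName="ga4-to-redshift")
print(response["executionId"])
```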
As the world gradually becomes more dependent on data, the services, tools, and infrastructure around it are all the more important for businesses in every sector. Data management has become a fundamental business concern, especially for businesses going through a digital transformation. What is data management?
This means you can refine your ETL jobs through natural follow-up questions, starting with a basic data pipeline and progressively adding transformations, filters, and business logic through conversation. The DataFrame code generation now extends beyond AWS Glue DynamicFrame to support a broader range of data processing scenarios.
This middleware consists of custom code that runs data flows to stitch data transformations, search queries, and AI enrichments in varying combinations tailored to use cases, datasets, and requirements. The visual designer is recommended for helping you manage workflow projects. But what happens if we search for NBA?
Since software engineers manage to build ordinary software without experiencing as much pain as their counterparts in the ML department, it raises the question: should we just start treating ML projects as software engineering projects as usual, maybe educating ML practitioners about the existing best practices? Orchestration. Versioning.
For each service, you need to learn the supported authorization and authentication methods, data access APIs, and framework to onboard and test data sources. This fragmented, repetitive, and error-prone experience for data connectivity is a significant obstacle to data integration, analysis, and machine learning (ML) initiatives.
dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by customers of data warehouses (such as Amazon Redshift) who are looking to keep their data transform logic separate from storage and engine.
Within seconds of transactional data being written into Amazon Aurora (a fully managed modern relational database service offering performance and high availability at scale), the data is seamlessly made available in Amazon Redshift for analytics and machine learning. Create dbt models in dbt Cloud. Choose Create.
Granting Anthropic’s Claude permissions on Amazon Bedrock: Have an AWS account and sign in using the AWS Management Console. Change the AWS Region to US West (Oregon). Navigate to the AWS Service Catalog console and choose Amazon Bedrock. Choose Manage model access.
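Once model access is granted, invoking Claude from code is a few lines with boto3; a minimal sketch, where the model ID and prompt are assumptions to be checked against what is enabled in your account:

```python
import json

import boto3

# Invoke Claude on Amazon Bedrock in the US West (Oregon) Region.
# The model ID below is an assumption; confirm it in the Bedrock console.
bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Summarize the benefits of data catalogs."}],
}
resp = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps(body),
)
print(json.loads(resp["body"].read())["content"][0]["text"])
```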
Common challenges and practical mitigation strategies for reliable data transformations. Introduction: Data transformations are important processes in data engineering, enabling organizations to structure, enrich, and integrate data for analytics, reporting, and operational decision-making.
Managing tests of complex data transformations when automated data testing tools lack important features? Introduction: Data transformations are at the core of modern business intelligence, blending and converting disparate datasets into coherent, reliable outputs.
Additionally, this forecasting system needs to provide data enrichment steps including byproducts, serve as the master data for semiconductor management, and enable further use cases at the BMW Group. To enable this use case, we used the BMW Group’s cloud-native data platform called the Cloud Data Hub.
Comparing Amazon CloudSearch and Amazon OpenSearch Service CloudSearch is a fully managed service in the cloud that makes it straightforward to set up, manage, and scale a search solution for your website or application. We recommend that you use Amazon OpenSearch Ingestion to ingest data.
Amazon OpenSearch Ingestion is a fully managed serverless pipeline that allows you to ingest, filter, transform, enrich, and route data to an Amazon OpenSearch Service domain or Amazon OpenSearch Serverless collection. When you create a new pipeline on the OpenSearch Service console, you’re presented with a new catalog page.
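As a rough sketch of what creating a pipeline looks like outside the console, the snippet below calls the OpenSearch Ingestion API with a minimal Data Prepper definition; the domain endpoint, IAM role, and YAML body are illustrative placeholders, not a production configuration:

```python
import boto3

# Illustrative Data Prepper pipeline: HTTP source -> OpenSearch sink.
# The domain endpoint and role ARN are placeholders.
pipeline_yaml = """
version: "2"
log-pipeline:
  source:
    http:
      path: "/logs"
  sink:
    - opensearch:
        hosts: ["https://search-my-domain.us-east-1.es.amazonaws.com"]
        index: "app-logs"
        aws:
          sts_role_arn: "arn:aws:iam::111122223333:role/osis-pipeline-role"
          region: "us-east-1"
"""

osis = boto3.client("osis", region_name="us-east-1")
osis.create_pipeline(
    PipelineName="log-pipeline",
    MinUnits=1,
    MaxUnits=4,
    PipelineConfigurationBody=pipeline_yaml,
)
```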
For more information on this foundation, refer to A Detailed Overview of the Cost Intelligence Dashboard. Overview of the BMW Cloud Data Hub At the BMW Group, Cloud Data Hub (CDH) is the central platform for managing company-wide data and data solutions.
The goal of DataOps is to help organizations make better use of their data to drive business decisions and improve outcomes. ChatGPT> DataOps is a term that refers to the set of practices and tools that organizations use to improve the quality and speed of data analytics and machine learning.
I, thankfully, learned this early in my career, at a time when I could still refer to myself as a software developer. Related to the previous point, a company could go from “raw data” to “it’s serving predictions on live data” in a single work day. You have other work to do.
You can easily deliver data to supported destinations using the Amazon Kinesis Data Firehose integration with VPC flow logs. Kinesis Data Firehose is a fully managed service for delivering near-real-time streaming data to various destinations for storage and performing near-real-time analytics.
Amazon Q Developer can now generate complex data integration jobs with multiple sources, destinations, and data transformations. Generated jobs can use a variety of data transformations, including filter, project, union, join, and custom user-supplied SQL. Matt Su is a Senior Product Manager on the AWS Glue team.
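The generated jobs follow the standard AWS Glue script shape; here is a hand-written sketch of a job with a filter and a join, where the catalog database, tables, and S3 path are hypothetical:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import Filter, Join
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_ctx = GlueContext(SparkContext())
job = Job(glue_ctx)
job.init(args["JOB_NAME"], args)

# Hypothetical Data Catalog tables: read both sides of the join.
orders = glue_ctx.create_dynamic_frame.from_catalog(database="sales", table_name="orders")
customers = glue_ctx.create_dynamic_frame.from_catalog(database="sales", table_name="customers")

# Filter recent orders, then join on the shared customer key.
recent = Filter.apply(frame=orders, f=lambda r: r["order_year"] >= 2023)
joined = Join.apply(recent, customers, "customer_id", "customer_id")

# Write the enriched result to S3 as Parquet.
glue_ctx.write_dynamic_frame.from_options(
    frame=joined,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/enriched/"},
    format="parquet",
)
job.commit()
```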
Oracle GoldenGate for Oracle Database and Big Data adapters: Oracle GoldenGate is a real-time data integration and replication tool used for disaster recovery, data migrations, and high availability. An AWS Identity and Access Management (IAM) user. For more details, refer to Operating System Requirements.
The AWS Glue job uses the secure connection established by the VPC endpoints to access Snowflake data. Snowflake credentials are securely stored in AWS Secrets Manager. The AWS Glue job retrieves these credentials at runtime to authenticate and connect to Snowflake, providing secure access management.
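A minimal sketch of that runtime credential lookup, assuming a secret named snowflake/etl-credentials that stores a JSON document (both the secret name and its keys are placeholders):

```python
import json

import boto3

# Fetch Snowflake credentials at runtime instead of hardcoding them.
secrets = boto3.client("secretsmanager", region_name="us-east-1")
resp = secrets.get_secret_value(SecretId="snowflake/etl-credentials")
creds = json.loads(resp["SecretString"])

# The Glue job would pass these into its Snowflake connection options.
sf_options = {
    "sfUrl": creds["url"],
    "sfUser": creds["user"],
    "sfPassword": creds["password"],
    "sfDatabase": creds["database"],
}
```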
AI governance refers to the practice of directing, managing and monitoring an organization’s AI activities. It includes processes that trace and document the origin of data, models and associated metadata and pipelines for audits.
Despite logistics challenges caused by the global pandemic, the company managed to rapidly scale up its team to over 1,000 people in a period of only 11 months.
Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed orchestration service for Apache Airflow that makes it simple to set up and operate end-to-end data pipelines in the cloud at scale. To run the scripts, refer to the Amazon MWAA analytics workshop. Refer to the GitHub repo for the complete DAG code.
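For readers new to Airflow, a minimal sketch of the kind of DAG Amazon MWAA schedules (the task bodies are stubs, and all names are hypothetical):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull source data")

def transform():
    print("apply transformations")

# Two-step daily pipeline: extract, then transform.
with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t1 >> t2
```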
The Amazon Redshift integration for Apache Spark combined with AWS Glue or Amazon EMR performs transformations before loading data into Amazon Redshift. Finally, data can be loaded into Amazon Redshift with popular ETL tools like Informatica, Matillion, and dbt Labs. AWS Glue 4.0
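A minimal read-side sketch of the Spark integration, assuming a SparkSession on AWS Glue 4.0 or Amazon EMR where the Redshift connector JARs are preinstalled (the endpoint, table, bucket, and role are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("redshift-read").getOrCreate()

# Read a Redshift table into a DataFrame via the community connector,
# staging data through S3; all names below are placeholders.
df = (
    spark.read.format("io.github.spark_redshift_community.spark.redshift")
    .option("url", "jdbc:redshift://my-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev")
    .option("dbtable", "public.orders")
    .option("tempdir", "s3://my-bucket/redshift-temp/")
    .option("aws_iam_role", "arn:aws:iam::111122223333:role/redshift-copy-role")
    .load()
)
df.show(5)
```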
This feature enables users to compare progress on a metric with a set benchmark or goal, allowing a sales manager to track their pipeline versus targets, for example. Einstein Copilot for Tableau remains in beta, but Tableau announced two new features for the AI assistant as well: AI-assisted data transformation.
With these settings, you can now seamlessly ingest decompressed CloudWatch log data into Splunk using Firehose. Pricing: The Firehose decompression feature decompresses the data and charges per GB of decompressed data. To understand decompression pricing, refer to Amazon Data Firehose pricing.
This post is a continuation of How SOCAR built a streaming data pipeline to process IoT data for real-time analytics and control. SOCAR wanted to design and build a solution for a new Fleet Management System (FMS). The following figure shows an example of the data flow at SOCAR.
Diagram 1: Overall architecture of the solution, using AWS Step Functions, Amazon Redshift, and Amazon S3. The following AWS services were used to shape our new ETL architecture: Amazon Redshift, a fully managed, petabyte-scale data warehouse service in the cloud. It’s also serverless, which means there’s no infrastructure to manage.
Infomedia Ltd (ASX:IFM) is a leading global provider of DaaS and SaaS solutions that empowers the data-driven automotive ecosystem. Infomedia’s solutions help OEMs, NSCs, dealerships, and third-party partners manage the vehicle and customer lifecycle.
In the world of software engineering and development, organizations use project management tools like Atlassian Jira Cloud. Managing projects with Jira leads to rich datasets, which can provide historical and predictive insights about project and development efforts. An AWS account and a login with access to the AWS Management Console.
Data mesh is a new approach to data management. Companies across industries are using a data mesh to decentralize data management to improve data agility and get value from data. A modern data architecture is critical in order to become a data-driven organization.
With a data pipeline, which is a set of tasks used to automate the movement and transformation of data between different systems, you can reduce the time and effort needed to gain insights from the data. Apache Airflow and Snowflake have emerged as powerful technologies for data management and analysis.
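A minimal sketch of the pairing, using the Snowflake provider's operator in an hourly DAG; the connection ID, schemas, and SQL are assumptions:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

# Hourly refresh of a summary table in Snowflake; all names are placeholders.
with DAG(
    dag_id="snowflake_transform",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    refresh = SnowflakeOperator(
        task_id="refresh_summary",
        snowflake_conn_id="snowflake_default",
        sql="""
            CREATE OR REPLACE TABLE analytics.daily_summary AS
            SELECT order_date, SUM(amount) AS total
            FROM raw.orders
            GROUP BY order_date
        """,
    )
```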
Analytics is the means for discovering those insights, and doing it well requires the right tools for ingesting and preparing data, enriching and tagging it, building and sharing reports, and managing and protecting your data and insights. Azure Data Factory. Azure Synapse Analytics.
In this post, we dive deep into the tool, walking through all steps from log ingestion, transformation, and visualization to architecture design to calculate TCO. The tool provides a YARN log collector that connects to the Hadoop Resource Manager to collect YARN logs. About the authors: Sungyoul Park is a Senior Practice Manager at AWS ProServe.
Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x better price-performance.
Under the Transparency in Coverage (TCR) rule, hospitals and payors are required to publish their pricing data in a machine-readable format. For more information, refer to Delivering Consumer-friendly Healthcare Transparency in Coverage On AWS. The Data Catalog now contains references to the machine-readable data.
The AWS Glue Data Catalog provides a uniform repository where disparate systems can store and find metadata to keep track of data in data silos. Apache Flink is a widely used data processing engine for scalable streaming ETL, analytics, and event-driven applications. Refer to Catalogs for more information.
Amazon Athena supports the MERGE command on Apache Iceberg tables, which allows you to perform inserts, updates, and deletes in your data lake at scale using familiar SQL statements that are compliant with ACID (Atomic, Consistent, Isolated, Durable). Navigate to the Athena console and choose Query editor.
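A minimal sketch of running such a MERGE through the Athena API, assuming two Iceberg tables in a database named lake (all names and columns are placeholders):

```python
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Upsert staged changes into the target Iceberg table.
merge_sql = """
MERGE INTO customers AS t
USING customers_updates AS s
  ON t.customer_id = s.customer_id
WHEN MATCHED THEN UPDATE SET email = s.email, updated_at = s.updated_at
WHEN NOT MATCHED THEN INSERT (customer_id, email, updated_at)
  VALUES (s.customer_id, s.email, s.updated_at)
"""

resp = athena.start_query_execution(
    QueryString=merge_sql,
    QueryExecutionContext={"Database": "lake"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(resp["QueryExecutionId"])
```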