Higher model accuracy often begins with the first steps of data transformation. This guide explains the difference between the key feature scaling methods of standardization and normalization, and demonstrates when and how to apply each approach.
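A minimal sketch of the two approaches, assuming scikit-learn and a small made-up array (neither of which comes from the excerpt): standardization rescales each column to zero mean and unit variance, while min-max normalization maps values into the range [0, 1].

```python
# Minimal sketch contrasting standardization and min-max normalization.
# The toy array is illustrative only.
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Standardization: (x - mean) / std per column -> zero mean, unit variance
standardized = StandardScaler().fit_transform(X)

# Min-max normalization: (x - min) / (max - min) per column -> values in [0, 1]
normalized = MinMaxScaler().fit_transform(X)

print(standardized)
print(normalized)
```

Which one to use depends on the model: distance- and gradient-based methods usually benefit from standardization, while bounded inputs (for example, pixel intensities) are a natural fit for min-max scaling.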
Introduction Transformers have revolutionized various domains of machine learning, notably in natural language processing (NLP) and computer vision. Their ability to capture long-range dependencies and handle sequential data effectively has made them a staple in every AI researcher and practitioner’s toolbox.
Introduction to Data Engineering The volume of data produced by innumerable sources is increasing drastically day by day, so processing and storing that data has become highly demanding. The post Data Engineering – A Journal with Pragmatic Blueprint appeared first on Analytics Vidhya.
At Atlanta’s Hartsfield-Jackson International Airport, an IT pilot has led to a wholesale data journey destined to transform operations at the world’s busiest airport, fueled by machine learning and generative AI. They’re trying to get a handle on their data estate right now.
The following requirements were essential in the decision to adopt a modern data mesh architecture: Domain-oriented ownership and data-as-a-product: EUROGATE aims to enable scalable and straightforward data sharing across organizational boundaries and eliminate centralized bottlenecks and complex data pipelines.
Think about what the model results tell you: “Maybe a random forest isn’t the best tool to split this data, but XLNet is.” If none of your models performed well, that tells you that your dataset (your choice of raw data, feature selection, and feature engineering) is not amenable to machine learning.
Big Data is the Key to Hospital Management. Big data is changing the scope of hospital management. Healthcare providers are using machine learning, predictive analytics, and other big data technologies to trim costs and improve the quality of care. However, not all big data solutions are created equal.
Introduction Have you ever struggled with managing complex data transformations? In today’s data-driven world, extracting, transforming, and loading (ETL) data is crucial for gaining valuable insights. While many ETL tools exist, dbt (data build tool) is emerging as a game-changer.
In the fast-evolving landscape of data science and machine learning, efficiency is not just desirable; it’s essential. Imagine a world where every data practitioner, from seasoned data scientists to budding developers, has an intelligent assistant at their fingertips.
According to the TDWI survey, more than a third (nearly 37%) of respondents have expressed dissatisfaction with their ability to access and integrate complex data streams. Why is Data Integration a Challenge for Enterprises? As complexities in big data increase each day, data integration is becoming a challenge.
Much has been written about the struggles of deploying machine learning projects to production. As with many burgeoning fields and disciplines, we don’t yet have a shared canonical infrastructure stack or best practices for developing and deploying data-intensive applications. However, the concept is quite abstract.
How dbt Core helps data teams test, validate, and monitor complex data transformations and conversions. Introduction dbt Core, an open-source framework for developing, testing, and documenting SQL-based data transformations, has become a must-have tool for modern data teams as the complexity of data pipelines grows.
Data quality rules are codified into structured Expectation Suites by Great Expectations instead of relying on ad hoc scripts or manual checks. The framework ensures that your data transformations comply with rigorous specifications from the moment they are created through every iteration of your pipeline.
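As a rough sketch of that idea, the snippet below uses Great Expectations’ classic pandas-based API (method availability varies by version, and the DataFrame and column names here are hypothetical) to attach a couple of expectations to a dataset and collect them into a suite:

```python
# A rough sketch of codifying data quality rules as expectations, using
# Great Expectations' classic pandas API. Names and thresholds are
# illustrative; newer versions of the library use a different entry point.
import pandas as pd
import great_expectations as ge

orders = ge.from_pandas(pd.DataFrame({
    "order_id": [1, 2, 3],
    "amount": [10.5, 99.0, 42.0],
}))

# Declare the rules instead of writing ad hoc assertions
orders.expect_column_values_to_not_be_null("order_id")
orders.expect_column_values_to_be_between("amount", min_value=0, max_value=10_000)

# Bundle the declared rules into a reusable Expectation Suite
suite = orders.get_expectation_suite()
print(suite)
```

The suite can then be saved and re-validated on every pipeline run, which is what turns one-off checks into an ongoing contract.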
You can use it for big data analytics and machine learning workloads. Azure Databricks Delta Live Tables: These provide a more straightforward way to build and manage Data Pipelines for the latest, high-quality data in Delta Lake. Azure Blob Storage serves as the data lake to store raw data.
AI is transforming how senior data engineers and data scientists validate data transformations and conversions. Artificial intelligence-based verification approaches aid in the detection of anomalies, the enforcement of data integrity, and the optimization of pipelines for improved efficiency.
Managing tests of complex data transformations when automated data testing tools lack important features? Introduction Data transformations are at the core of modern business intelligence, blending and converting disparate datasets into coherent, reliable outputs.
Although CRISP-DM is not perfect, the CRISP-DM framework offers a pathway for machine learning using AzureML for Microsoft Data Platform professionals. AI vs ML vs Data Science vs Business Intelligence. They may also learn from evidence, but the data and the modelling fundamentally come from humans in some way.
The exam covers everything from fundamental to advanced data science concepts such as big data best practices, business strategies for data, building cross-organizational support, machine learning, natural language processing, stochastic modeling, and more.
We speak a lot about the ways we can use data, transform it, and create powerful models based on advanced machine learning techniques, but we sometimes forget where the data comes from initially.
How to Perform Motion Detection Using Python • The Complete Collection of Data Science Projects - Part 2 • What Does ETL Have to Do with Machine Learning? • Data Transformation: Standardization vs Normalization • The Evolution From Artificial Intelligence to Machine Learning to Data Science.
In this post, we’ll see the fundamental procedures, tools, and techniques that data engineers, data scientists, and QA/testing teams use to ensure high-quality data as soon as it’s deployed. First, we look at how unit and integration tests uncover transformation errors at an early stage.
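A minimal sketch of what such a unit test can look like, assuming a pandas-based transformation and pytest; the function and column names are hypothetical, not taken from the post:

```python
# test_transform.py -- a minimal pytest sketch for a pandas transformation.
# The transformation and its column names are made-up examples.
import pandas as pd


def add_total_price(df: pd.DataFrame) -> pd.DataFrame:
    """Example transformation under test: derive a total_price column."""
    out = df.copy()
    out["total_price"] = out["quantity"] * out["unit_price"]
    return out


def test_add_total_price_computes_expected_values():
    raw = pd.DataFrame({"quantity": [2, 3], "unit_price": [5.0, 1.5]})
    result = add_total_price(raw)
    assert list(result["total_price"]) == [10.0, 4.5]


def test_add_total_price_does_not_mutate_input():
    raw = pd.DataFrame({"quantity": [1], "unit_price": [2.0]})
    add_total_price(raw)
    assert "total_price" not in raw.columns
```

Integration tests then exercise the same transformations against realistic source and target systems, catching errors that only appear once components are wired together.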
ChatGPT> DataOps is a term that refers to the set of practices and tools that organizations use to improve the quality and speed of data analytics and machine learning. It involves bringing together people, processes, and technology to enable data-driven decision making and improve the efficiency of data-related workflows.
Where DataOps fits Enterprises today are increasingly injecting machine learning into a vast array of products and services, and DataOps is an approach geared toward supporting the end-to-end needs of machine learning. “The DataOps approach is not limited to machine learning,” they add.
Within seconds of transactional data being written into Amazon Aurora (a fully managed modern relational database service offering performance and high availability at scale), the data is seamlessly made available in Amazon Redshift for analytics and machine learning.
This does away with the need for analysts to repeatedly perform data extraction, enrichment, or transformation against the required source systems, all but eliminating the substantial amount of time analysts and business users spend routinely on data preparation.
Data analytics draws from a range of disciplines — including computer programming, mathematics, and statistics — to perform analysis on data in an effort to describe, predict, and improve performance. What are the four types of data analytics? Data analytics methods and techniques.
Taking the broadest possible interpretation of data analytics, Azure offers more than a dozen services, and that’s before you include Power BI, with its AI-powered analysis and new datamart option, or governance-oriented approaches such as Microsoft Purview. Azure Data Factory. Azure Synapse Analytics. Datamarts in Power BI.
What is the difference between business analytics and data analytics? Business analytics is a subset of data analytics. Data analytics is used across disciplines to find trends and solve problems using data mining, data cleansing, data transformation, data modeling, and more.
When we announced the GA of Cloudera Data Engineering back in September of last year, a key vision we had was to simplify the automation of data transformation pipelines at scale. Typically, users need to ingest data, transform it into an optimal format with quality checks, and optimize querying of the data by visual analytics tools.
Secure storage, together with data transformation, monitoring, auditing, and a compliance layer, increases the complexity of the system. AI projects can break budgets. Because AI and machine learning are data intensive, these projects can greatly increase cloud costs.
In the beginning, CDP ran only on AWS with a set of services that supported a handful of use cases and workload types: CDP Data Warehouse: a Kubernetes-based service that allows business analysts to deploy data warehouses with secure, self-service access to enterprise data. Predict – Data Engineering (Apache Spark).
Before the data is put into the model comes a process called feature engineering: transforming the original data columns to impose certain business assumptions or simply increase model accuracy. The post Bringing MMM to 21st Century with Machine Learning and Automation? Want to See DataRobot in Action?
“My vision is that I can give the keys to my businesses to manage their data and run their data on their own, as opposed to the Data & Tech team being at the center and helping them out,” says Iyengar, director of Data & Tech at Straumann Group North America. The company’s Findability.ai
AWS Step Functions With AWS Step Functions, you can create workflows, also called state machines, to build distributed applications, automate processes, orchestrate microservices, and create data and machine learning pipelines. The following Diagram 2 shows this workflow.
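For a sense of what a state machine looks like, here is a bare-bones sketch that defines a single-task workflow in the Amazon States Language and registers it with boto3; the state machine name, Lambda ARN, and IAM role ARN are placeholders, and running it requires valid AWS credentials and resources:

```python
# Minimal sketch: define a one-step Step Functions state machine and
# create it with boto3. All ARNs and names below are placeholders.
import json
import boto3

definition = {
    "StartAt": "TransformData",
    "States": {
        "TransformData": {
            "Type": "Task",
            # Placeholder Lambda function that would perform the transformation
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:transform",
            "End": True,
        }
    },
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="example-data-pipeline",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/example-stepfunctions-role",
)
```

Real pipelines chain many such states (parallel branches, retries, error handlers), which is where the orchestration value of Step Functions shows up.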
We’re excited to announce the general availability of the open source adapters for dbt for all the engines in CDP — Apache Hive, Apache Impala, and Apache Spark, with added support for Apache Livy and Cloudera Data Engineering. The Open Data Lakehouse. Cloudera builds dbt adapters for all engines in the open data lakehouse.
Features are the inputs to machine learning models. The most efficient way to use them across an organization is in a feature store that automates the data transformations, stores them, and makes them available for training and inference.
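As a loose sketch of the core idea (not any particular feature store product such as Feast or SageMaker Feature Store, which add storage, versioning, and point-in-time correctness), a registered transformation can be written once and reused for both batch training data and online inference; every name below is illustrative:

```python
# Toy sketch of the feature-store idea: register feature transformations
# once, then apply them uniformly for training and inference.
from typing import Callable, Dict
import pandas as pd

_FEATURES: Dict[str, Callable[[pd.DataFrame], pd.Series]] = {}


def feature(name: str):
    """Decorator that registers a feature transformation under a name."""
    def register(fn: Callable[[pd.DataFrame], pd.Series]):
        _FEATURES[name] = fn
        return fn
    return register


@feature("order_value")
def order_value(df: pd.DataFrame) -> pd.Series:
    # Hypothetical feature: total value of an order
    return df["quantity"] * df["unit_price"]


def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Apply every registered transformation -- the same code path serves
    batch training sets and single-row inference requests."""
    return pd.DataFrame({name: fn(df) for name, fn in _FEATURES.items()})
```

Centralizing the definitions this way is what prevents training/serving skew, since there is no second, hand-written copy of the feature logic in the serving path.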
OpenSearch Ingestion can ingest data from a wide variety of sources, such as Amazon Simple Storage Service (Amazon S3) buckets and HTTP endpoints, and has a rich ecosystem of built-in processors to take care of your most complex data transformation needs.
They can use their own toolsets or rely on provided blueprints to ingest the data from source systems. Once released, consumers use datasets from different providers for analysis, machine learning (ML) workloads, and visualization. The difference lies in when and where data transformation takes place.
Third, some services require you to set up and manage compute resources used for federated connectivity, and capabilities like connection testing and data preview aren’t available in all services. To solve these challenges, we launched Amazon SageMaker Lakehouse unified data connectivity.
Airflow has been adopted by many Cloudera Data Platform (CDP) customers in the public cloud as the next-generation orchestration service to set up and operationalize complex data pipelines. Some of them include: More operators – as we mentioned earlier, there is a small set of highly used operators.
AI and machine learning (ML) are not just catchy buzzwords; they’re vital to the future of our planet and your business. Doing it right can mean the difference between thriving in the new world of data and disappearing from it. What data transformations are needed from your data scientists to prepare the data?
With Amazon AppFlow, you can run data flows at nearly any scale and at the frequency you choose: on a schedule, in response to a business event, or on demand. You can configure data transformation capabilities such as filtering and validation to generate rich, ready-to-use data as part of the flow itself, without additional steps.
We also split the data transformation into several modules (Data Aggregation, Data Filtering, and Data Preparation) to make the system more transparent and easier to maintain. Although each module is specific to a data source or a particular data transformation, we utilize reusable blocks inside of every job.
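A bare-bones sketch of that modular layout, with invented column names and logic: each stage is a small, independently testable function, and the pipeline simply chains them in order.

```python
# Bare-bones sketch of splitting a transformation into modules
# (aggregation, filtering, preparation). All names are illustrative.
import pandas as pd


def aggregate(df: pd.DataFrame) -> pd.DataFrame:
    """Data Aggregation: roll events up to one row per customer."""
    return df.groupby("customer_id", as_index=False)["amount"].sum()


def filter_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Data Filtering: keep only customers with a positive total."""
    return df[df["amount"] > 0]


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Data Preparation: final renaming before loading downstream."""
    return df.rename(columns={"amount": "total_amount"})


def run_pipeline(df: pd.DataFrame) -> pd.DataFrame:
    """Chain the modules; each one stays reusable across jobs."""
    for step in (aggregate, filter_rows, prepare):
        df = step(df)
    return df
```

Keeping stages this small makes it easier to reuse blocks across jobs and to pinpoint which module introduced a regression.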
In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.