Data collections are the ones and zeroes that encode the actionable insights (patterns, trends, relationships) that we seek to extract from our data through machine learning and data science. This incorporates the business context of the data and data products that are being recommended and delivered.
Think about what the model results tell you: “Maybe a random forest isn’t the best tool to split this data, but XLNet is.” If none of your models performed well, that tells you that your dataset (your choice of raw data, feature selection, and feature engineering) is not amenable to machine learning.
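As a minimal sketch of this kind of model comparison in scikit-learn (the synthetic dataset and model choices here are illustrative placeholders, not the article's):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a real dataset
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Compare candidate models; uniformly poor scores suggest the features,
# not the model family, are the problem.
for model in (LogisticRegression(max_iter=1000), RandomForestClassifier(random_state=0)):
    scores = cross_val_score(model, X, y, cv=5)
    print(type(model).__name__, round(scores.mean(), 3))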
Within seconds of transactional data being written into Amazon Aurora (a fully managed modern relational database service offering performance and high availability at scale), the data is seamlessly made available in Amazon Redshift for analytics and machine learning. If this number is 0, then the test is successful.
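A rough sketch of that kind of replication check (connection details and table names are hypothetical; assumes the redshift_connector driver):

import redshift_connector

conn = redshift_connector.connect(
    host="my-workgroup.012345678901.us-east-1.redshift-serverless.amazonaws.com",
    database="dev",
    user="admin",
    password="<password>",
)
cur = conn.cursor()
# Hypothetical test: rows present in the source snapshot but missing from
# the replicated table; a count of 0 means the test is successful.
cur.execute("""
    SELECT count(*)
    FROM source_orders_snapshot s
    LEFT JOIN replicated_orders r ON s.order_id = r.order_id
    WHERE r.order_id IS NULL
""")
missing = cur.fetchone()[0]
print("test passed" if missing == 0 else f"{missing} rows missing")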
In the fast-evolving landscape of data science and machine learning, efficiency is not just desirable—it’s essential. Imagine a world where every data practitioner, from seasoned data scientists to budding developers, has an intelligent assistant at their fingertips.
Build data validation rules directly into ingestion layers so that insufficient data is stopped at the gate and not detected after damage is done. Use lineage tooling to trace data from source to report. Understanding how data transforms and where it breaks is crucial for auditability and root-cause resolution.
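A minimal sketch of such an ingestion gate in plain Python (the field rules here are hypothetical):

def violations(record: dict) -> list:
    """Return rule violations; an empty list means the record may pass the gate."""
    errors = []
    if record.get("order_id") is None:
        errors.append("missing order_id")
    amount = record.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        errors.append("amount must be a non-negative number")
    return errors

batch = [{"order_id": 1, "amount": 9.5}, {"amount": -2.0}]
accepted = [r for r in batch if not violations(r)]
rejected = [(r, violations(r)) for r in batch if violations(r)]
print(len(accepted), "accepted;", len(rejected), "rejected at the gate")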
How dbt Core helps data teams test, validate, and monitor complex data transformations and conversions. dbt Core, an open-source framework for developing, testing, and documenting SQL-based data transformations, has become a must-have tool for modern data teams as the complexity of data pipelines grows.
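For instance, dbt tests can be wired into a pipeline programmatically; a minimal sketch, assuming dbt-core 1.5+ and an existing dbt project in the working directory:

from dbt.cli.main import dbtRunner

runner = dbtRunner()
# Run the project's schema and data tests; fail the pipeline if any test fails
result = runner.invoke(["test"])
if not result.success:
    raise SystemExit("dbt tests failed; blocking downstream deployment")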
With CloudSearch, you can search large collections of data such as webpages, document files, forum posts, or product information. You send your documents to OpenSearch Serverless, which indexes them for search using the OpenSearch REST API. With OpenSearch Serverless, you get improved, out-of-the-box, hands-free operation.
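A rough sketch of sending a document for indexing (the endpoint, region, and index name are placeholders; assumes the opensearch-py client with SigV4 signing):

import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAuth(credentials, "us-east-1", "aoss")  # "aoss" = OpenSearch Serverless

client = OpenSearch(
    hosts=[{"host": "my-collection.us-east-1.aoss.amazonaws.com", "port": 443}],
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)

# Index a sample product document; OpenSearch makes it searchable
client.index(index="products", body={"title": "Trail shoes", "price": 89.0})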
Great Expectations can be integrated directly into existing data pipelines to define, test, and document expectations about how transformed or converted data should look. Instead of relying on ad-hoc scripts or manual checks, Great Expectations codifies data quality rules into structured Expectation Suites.
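A minimal sketch using the classic Great Expectations API (the column names and rules are hypothetical):

import pandas as pd
import great_expectations as ge

df = pd.DataFrame({"customer_id": [1, 2, None], "amount": [10.0, 5.5, 3.2]})

ge_df = ge.from_pandas(df)
# Codified rules instead of ad-hoc checks; each returns a structured result
not_null = ge_df.expect_column_values_to_not_be_null("customer_id")
positive = ge_df.expect_column_values_to_be_between("amount", min_value=0)
print(not_null.success, positive.success)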
Amazon Redshift is used to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes. Amazon EMR provides a big data environment for data processing, interactive analysis, and machine learning using open source frameworks such as Apache Spark, Apache Hive, and Presto.
We’re excited to announce the general availability of the open source dbt adapters for all the engines in CDP — Apache Hive, Apache Impala, and Apache Spark, with added support for Apache Livy and Cloudera Data Engineering. Cloudera builds dbt adapters for all engines in the open data lakehouse.
AI and machine learning (ML) are not just catchy buzzwords; they’re vital to the future of our planet and your business. Doing it right can mean the difference between thriving in the new world of data and disappearing from it. Take Grammarly as an example: This popular program checks the grammar, tone, and style of documents.
With Amazon AppFlow, you can run data flows at nearly any scale and at the frequency you choose: on a schedule, in response to a business event, or on demand. You can configure data transformation capabilities such as filtering and validation to generate rich, ready-to-use data as part of the flow itself, without additional steps.
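Running a configured flow on demand can look like this (the flow name is hypothetical; assumes boto3 and an existing flow):

import boto3

appflow = boto3.client("appflow")
# Trigger a hypothetical, already-configured flow on demand
response = appflow.start_flow(flowName="salesforce-to-s3-daily")
print(response["flowStatus"], response["executionId"])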
Given the importance of sharing information among diverse disciplines in the era of digital transformation, this concept is arguably as important as ever. The aim is to normalize and aggregate data that originates in various pockets of the enterprise, and eventually make it available to analysts across the organization.
By leveraging Hive to apply Ranger FGAC, Spark obtains secure access to the data in a protected staging area. Since Spark has direct access to the staged data, any Spark APIs can be used, from complex data transformations to data science and machine learning, so stay tuned!
Note that Lambda is a general-purpose serverless engine; it has not been specifically designed for heavy data transformation tasks. Step Functions helps developers use AWS services to build distributed applications, automate processes, orchestrate microservices, and create data and machine learning (ML) pipelines.
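Kicking off such a pipeline from code might look like this (the state machine ARN and input are placeholders):

import json
import boto3

sfn = boto3.client("stepfunctions")
# Start a hypothetical ETL state machine with a JSON payload
response = sfn.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:etl-pipeline",
    input=json.dumps({"run_date": "2024-01-01"}),
)
print("started:", response["executionArn"])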
However, when a data producer shares data products on a data mesh self-serve web portal, it’s neither intuitive nor easy for a data consumer to know which data products they can join to create new insights. This is especially true in a large enterprise with thousands of data products.
There are three technological advances driving this data consumption and, in turn, the ability for employees to leverage this data to deliver business value: 1) exploding data production, 2) scalable big data computation, and 3) the accessibility of advanced analytics, machine learning (ML), and artificial intelligence (AI).
An AI governance framework ensures the ethical, responsible, and transparent use of AI and machine learning (ML). It includes processes that trace and document the origin of data, models, and associated metadata and pipelines for audits. Capture and document model metadata for report generation.
By using AWS Glue to integrate data from Snowflake, Amazon S3, and SaaS applications, organizations can unlock new opportunities in generative artificial intelligence (AI), machine learning (ML), business intelligence (BI), and self-service analytics, or feed data to underlying applications.
Detailed Data and Model Lineage Tracking: Ensures comprehensive tracking and documentation of data transformations and model lifecycle events, enhancing reproducibility and auditability. Developers are provided with open inference protocol APIs for traditional ML models and an OpenAI-compatible API for LLMs.
In today’s data-driven world, seamless integration and transformation of data across diverse sources into actionable insights is paramount. With AWS Glue, you can discover and connect to hundreds of diverse data sources and manage your data in a centralized data catalog.
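Browsing such a centralized catalog programmatically might look like this (the database name is a placeholder; assumes boto3):

import boto3

glue = boto3.client("glue")
# List tables registered in a hypothetical Data Catalog database
paginator = glue.get_paginator("get_tables")
for page in paginator.paginate(DatabaseName="analytics"):
    for table in page["TableList"]:
        location = table.get("StorageDescriptor", {}).get("Location", "n/a")
        print(table["Name"], "->", location)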
We all want to solve the interesting data challenges, build analytics, generate graph embeddings, and train smart machine learning models over our knowledge graph data. This leads to lots of small data fetches to and from GraphDB over the network. Custom code also tends to over-fetch data that is not required.
This enterprise-ready, next-generation studio for AI builders brings together traditional machine learning (ML) and new generative AI capabilities powered by foundation models. While they require task-specific labeled data for fine-tuning, they also offer clients the best cost-performance trade-off for non-generative use cases.
Migrating to Amazon Redshift offers organizations the potential for improved price-performance, enhanced data processing, faster query response times, and better integration with technologies such as machine learning (ML) and artificial intelligence (AI). Platform architects define a well-architected platform.
A metadata management framework combines organizational structure and a set of tools to create a data asset taxonomy. Document type: describes creation, storage, and use during business processes. Scale effectively: Leverage taxonomies to ensure consistent modeling outcomes when introducing new data sets or changing business demands.
This can be done using the initiatePrint action: embeddedDashboard.initiatePrint(); The accompanying "Embedding demo" code sample, built around $(document).ready(), shows a loading animation, SDK code status, and dashboard interaction monitoring, along with initiating dashboard print from the application.
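On the server side, the URL that such an embedded dashboard loads can be generated with boto3 (the account ID, user ARN, and dashboard ID below are placeholders):

import boto3

quicksight = boto3.client("quicksight")
response = quicksight.generate_embed_url_for_registered_user(
    AwsAccountId="123456789012",
    UserArn="arn:aws:quicksight:us-east-1:123456789012:user/default/analyst",
    ExperienceConfiguration={"Dashboard": {"InitialDashboardId": "my-dashboard-id"}},
)
print(response["EmbedUrl"])  # handed to the embedding SDK in the browser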
With Snowflake’s newest feature release, Snowpark, developers can now quickly build and scale data-driven pipelines and applications in their programming language of choice, taking full advantage of Snowflake’s highly performant and scalable processing engine that accelerates the traditional data engineering and machine learning life cycles.
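A minimal Snowpark for Python sketch (connection parameters and table names are placeholders):

from snowflake.snowpark import Session
from snowflake.snowpark.functions import col

session = Session.builder.configs({
    "account": "<account>", "user": "<user>", "password": "<password>",
    "warehouse": "<warehouse>", "database": "<database>", "schema": "<schema>",
}).create()

# Transformations are pushed down to Snowflake's engine rather than run locally
orders = session.table("ORDERS")
orders.filter(col("AMOUNT") > 100).group_by("REGION").count().show()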
Overview of AWS Glue AWS Glue is a serverless data integration service that makes it easier to discover, prepare, and combine data for analytics, machine learning (ML), and application development. Follow the documentation to clean up the Google resources.
Ronobijay: Sure, I think it would, you know, what used to be anathema till a few months back, you know, data transformation is real now, right? We would have to visit a branch possibly, you know, multiple locations, submit multiple documents. So earlier customers would spend a week or two trying to open a bank account.
Redshift Serverless automatically provisions and intelligently scales data warehouse capacity to deliver fast performance for even the most demanding and unpredictable workloads, and you pay only for what you use. Solution overview: The integration of Talend with Amazon Redshift adds new features and capabilities.
As data inconsistencies grew, so did skepticism about the accuracy of the data. Decision-makers hesitated to rely on data-driven insights, fearing the consequences of potential errors. Ensuring compliance with healthcare regulations became a daunting task. Accurate data lineage rebuilt trust among decision-makers.
Many thanks to AWP Pearson for the permission to excerpt “Manual Feature Engineering: Manipulating Data for Fun and Profit” from the book Machine Learning with Python for Everyone by Mark E. Fenner. Missing values can be filled in based on expert knowledge, heuristics, or by some machine learning techniques.
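For example, a simple median imputation with scikit-learn (the data here is illustrative):

import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan]])
# Replace missing values with each column's median; other strategies
# (mean, most_frequent, constant) encode different domain heuristics
imputer = SimpleImputer(strategy="median")
print(imputer.fit_transform(X))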
A well-governed data landscape enables data users in the public sector to better understand the driving forces and needs to support public policy – and measure impact once a change is made. Efficient access to data: Citizens, companies, and government employees need access to data and documents.
This concludes creating data sources on the AWS Glue job canvas. Next, we add transformations by combining data from these different tables. Transform the data: Complete the following steps to add data transformations: On the AWS Glue job canvas, choose the plus sign. Choose Run to run the job.
This field guide to data mapping will explore how data mapping connects volumes of data for enhanced decision-making. Why data mapping is important: Data mapping is a critical element of any data management initiative, such as data integration, data migration, data transformation, data warehousing, or automation.
Modern Data Sources: Painlessly connect with modern data sources such as streaming, search, big data, NoSQL, cloud, and document-based sources. Quickly link all your data from Amazon Redshift, MongoDB, Hadoop, Snowflake, Apache Solr, Elasticsearch, Impala, and more.
Enterprise organizations collect massive volumes of unstructured data, such as images, handwritten text, documents, and more. They also still capture much of this data through manual processes. The way to leverage this for business insight is to digitize that data.
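One way to digitize such documents is an OCR service; a rough sketch with Amazon Textract (the file name is a placeholder; assumes boto3):

import boto3

textract = boto3.client("textract")
with open("scanned_invoice.png", "rb") as f:
    response = textract.detect_document_text(Document={"Bytes": f.read()})

# Pull out the recognized lines of text for downstream analysis
lines = [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]
print("\n".join(lines))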