Data Transformation, Machine Learning and Visualization

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

The following requirements were essential in deciding to adopt a modern data mesh architecture: Domain-oriented ownership and data-as-a-product: EUROGATE aims to: Enable scalable and straightforward data sharing across organizational boundaries. Eliminate centralized bottlenecks and complex data pipelines.

IoT

IoT Machine Learning Metadata Data-driven

Data transformation takes flight at Atlanta’s Hartsfield-Jackson airport

CIO Business Intelligence

AUGUST 9, 2024

At Atlanta’s Hartsfield-Jackson International Airport, an IT pilot has led to a wholesale data journey destined to transform operations at the world’s busiest airport, fueled by machine learning and generative AI. That enables the analytics team using Power BI to create a single visualization for the GM.”

Data Transformation

Data Transformation Machine Learning Data Lake Dashboards

Amazon OpenSearch Service launches flow builder to empower rapid AI search innovation

AWS Big Data

MAY 2, 2025

Through a visual designer, you can configure custom AI search flows: a series of AI-driven data enrichments performed during ingestion and search. You can use the flow builder through APIs or a visual designer. The visual designer is recommended for helping you manage workflow projects.

Machine Learning

Machine Learning Visualization Dashboards Metadata

SAP Datasphere Powers Business at the Speed of Data

Rocket-Powered Data Science

MARCH 20, 2023

Data collections are the ones and zeroes that encode the actionable insights (patterns, trends, relationships) that we seek to extract from our data through machine learning and data science. incorporates the business context of the data and data products that are being recommended and delivered).

Data Warehouse

Data Warehouse Metadata Digital Transformation Machine Learning

Automating the Automators: Shift Change in the Robot Factory

O'Reilly on Data

JANUARY 17, 2023

Think about what the model results tell you: “Maybe a random forest isn’t the best tool to split this data, but XLNet is.” If none of your models performed well, that tells you that your dataset (your choice of raw data, feature selection, and feature engineering) is not amenable to machine learning.

Machine Learning

Machine Learning Predictive Modeling Software Modeling

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

AWS Big Data

DECEMBER 16, 2024

Third, some services require you to set up and manage compute resources used for federated connectivity, and capabilities like connection testing and data preview aren't available in all services. To solve these challenges, we launched Amazon SageMaker Lakehouse unified data connectivity. Under Create job, choose Visual ETL.

Visualization

Visualization Data Processing Testing Publishing

What is data analytics? Analyzing and managing data for decisions

CIO Business Intelligence

JUNE 7, 2022

Data analytics draws from a range of disciplines — including computer programming, mathematics, and statistics — to perform analysis on data in an effort to describe, predict, and improve performance. What are the four types of data analytics? Data analytics methods and techniques.

Data Analytics

Data Analytics Diagnostic Analytics Management Analytics

What is business analytics? Using data to improve business outcomes

CIO Business Intelligence

JULY 5, 2022

While quantitative analysis, operational analysis, and data visualizations are key components of business analytics, the goal is to use the insights gained to shape business decisions. What is the difference between business analytics and data analytics? Business analytics is a subset of data analytics.

Business Analytics

Business Analytics Prescriptive Analytics Data mining Diagnostic Analytics

The Ten Standard Tools To Develop Data Pipelines In Microsoft Azure

DataKitchen

JULY 27, 2023

You can use it for big data analytics and machine learning workloads. Azure Databricks Delta Live Tables: These provide a more straightforward way to build and manage data pipelines for the latest, high-quality data in Delta Lake. Azure Blob Storage serves as the data lake to store raw data.

Machine Learning

Machine Learning Cost-Benefit Data Transformation Testing

12 data science certifications that will pay off

CIO Business Intelligence

JANUARY 19, 2024

The exam covers everything from fundamental to advanced data science concepts such as big data best practices, business strategies for data, building cross-organizational support, machine learning, natural language processing, stochastic modeling, and more.

Data Science

Data Science Machine Learning Predictive Modeling Forecasting

Data Engineers Are Using AI to Verify Data Transformations

Wayne Yaddow

FEBRUARY 26, 2025

AI is transforming how senior data engineers and data scientists validate data transformations and conversions. Artificial intelligence-based verification approaches aid in the detection of anomalies, the enforcement of data integrity, and the optimization of pipelines for improved efficiency.

Data Transformation

Data Transformation Testing Data-driven Data Quality

Happy Birthday, CDP Public Cloud

Cloudera

OCTOBER 13, 2020

In the beginning, CDP ran only on AWS with a set of services that supported a handful of use cases and workload types: CDP Data Warehouse: a Kubernetes-based service that allows business analysts to deploy data warehouses with secure, self-service access to enterprise data. (1) Currently available on AWS only. (2)

Data Warehouse

Data Warehouse Machine Learning Visualization Data Lake

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

DECEMBER 13, 2023

In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.

Metadata

Metadata Data Lake Visualization Data Transformation

From Raw Inputs to Polished Outputs: The Art of Testing Data Transformations

Wayne Yaddow

MARCH 5, 2025

In this post, we'll see the fundamental procedures, tools, and techniques that data engineers, data scientists, and QA/testing teams use to ensure high-quality data as soon as it's deployed. First, we look at how unit and integration tests uncover transformation errors at an early stage. Statistical tests (e.g.,

Testing

Testing Data Transformation Statistics Metadata

7 key Microsoft Azure analytics services (plus one extra)

CIO Business Intelligence

JUNE 29, 2022

Taking the broadest possible interpretation of data analytics, Azure offers more than a dozen services, and that’s before you include Power BI, with its AI-powered analysis and new datamart option, or governance-oriented approaches such as Microsoft Purview. Azure Data Factory. Everything is visual. Azure Synapse Analytics.

Data Lake

Data Lake Analytics Data Warehouse Machine Learning

Unveiling the Top 10 Data Visualization Companies of 2024

FineReport

JUNE 7, 2024

In 2024, data visualization companies play a pivotal role in transforming complex data into captivating narratives. This blog provides an insightful exploration of the leading entities shaping the data visualization landscape.

Visualization

Visualization Predictive Analytics Dashboards Predictive Modeling

Tableau further democratizes analytics with AI-fueled features

CIO Business Intelligence

APRIL 30, 2024

Together the technologies aim to help business users and “novice” data analysts explore their data and gain insights without having to resort to data experts. “This is really empowering everyone to be a data expert,” Maxon said. “It Shared Dimensions and Composable Data Sources.

Analytics

Analytics Metrics Visualization Dashboards

Automating Data Pipelines in CDP with CDE Managed Airflow Service

Cloudera

AUGUST 17, 2021

When we announced the GA of Cloudera Data Engineering back in September of last year, a key vision we had was to simplify the automation of data transformation pipelines at scale. Typically, users need to ingest data, transform it into an optimal format with quality checks, and optimize querying of the data by visual analytics tools.

Management

Management Cost-Benefit Data Transformation Optimization

How healthcare organizations can analyze and create insights using price transparency data

AWS Big Data

OCTOBER 11, 2023

The availability of machine-readable files opens up new possibilities for data analytics, allowing organizations to analyze large amounts of pricing data. Using machine learning (ML) and data visualization tools, these datasets can be transformed into actionable insights that can inform decision-making.

Visualization

Visualization Dashboards Data-driven Gap analysis

BMW Cloud Efficiency Analytics powered by Amazon QuickSight and Amazon Athena

AWS Big Data

NOVEMBER 15, 2023

They can use their own toolsets or rely on provided blueprints to ingest the data from source systems. Once released, consumers use datasets from different providers for analysis, machine learning (ML) workloads, and visualization. The difference lies in when and where data transformation takes place.

Analytics

Analytics Dashboards Metadata Data Warehouse

Use AWS Glue to streamline SFTP data processing

AWS Big Data

AUGUST 13, 2024

AWS Glue is a serverless data integration service that helps analytics users to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development. For this example, you use AWS Glue Studio to develop a visual ETL pipeline. Choose the Job details tab.

Data Processing

Data Processing Visualization Data Lake

Unlock scalable analytics with AWS Glue and Google BigQuery

AWS Big Data

OCTOBER 27, 2023

Overview of AWS Glue: AWS Glue is a serverless data integration service that makes it easier to discover, prepare, and combine data for analytics, machine learning (ML), and application development. AWS Glue provides both visual and code-based interfaces to make data integration easier. Open the AWS Glue console.

Analytics

Analytics Visualization Data Integration Cost-Benefit

Accelerate analytics on Amazon OpenSearch Service with AWS Glue through its native connector

AWS Big Data

DECEMBER 21, 2023

Movement of data across data lakes, data warehouses, and purpose-built stores is achieved by extract, transform, and load (ETL) processes using data integration services such as AWS Glue. AWS Glue provides both visual and code-based interfaces to make data integration effortless.

Analytics

Analytics IT Data Lake Visualization

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

AWS Big Data

DECEMBER 4, 2024

Her special areas of interest are data analytics, machine learning/AI, and application modernization. Rada Stanic is a Chief Technologist at Amazon Web Services, where she helps ANZ customers across different segments solve their business problems using AWS Cloud technologies.

Metadata

Metadata Data Governance Data Quality Data-driven

Extract time series from satellite weather data with AWS Lambda

AWS Big Data

JULY 6, 2023

It has not been specifically designed for heavy data transformation tasks. Step Functions helps developers use AWS services to build distributed applications, automate processes, orchestrate microservices, and create data and machine learning (ML) pipelines. Add a data source block.

Machine Learning

Machine Learning Visualization IoT Digital Transformation

Scale your AWS Glue for Apache Spark jobs with new larger worker types G.4X and G.8X

AWS Big Data

MAY 9, 2023

Hundreds of thousands of customers use AWS Glue, a serverless data integration service, to discover, prepare, and combine data for analytics, machine learning (ML), and application development. AWS Glue for Apache Spark jobs work with your code and your configuration of the number of data processing units (DPUs).

Data Lake

Data Lake Cost-Benefit Data Integration Data Transformation

Unlock scalable analytics with a secure connectivity pattern in AWS Glue to read from or write to Snowflake

AWS Big Data

AUGUST 19, 2024

By using AWS Glue to integrate data from Snowflake, Amazon S3, and SaaS applications, organizations can unlock new opportunities in generative artificial intelligence (AI), machine learning (ML), business intelligence (BI), and self-service analytics or feed data to underlying applications. Choose the Job details tab.

Analytics

Analytics Data-driven Data Integration Data Lake

Advanced reporting and analytics for the Post Call Analytics (PCA) solution with Amazon QuickSight

AWS Big Data

JANUARY 27, 2023

The Post Call Analytics (PCA) solution uses AWS machine learning (ML) services like Amazon Transcribe and Amazon Comprehend to extract insights from contact center call audio recordings uploaded after the call, or from integration with our companion Live Call Analytics (LCA) solution. You can use filters to specify your criteria.

Analytics

Analytics Reporting Dashboards Visualization

Using COD and CML to build applications that predict stock data

Cloudera

FEBRUARY 8, 2021

Continuing from my previous blog post about how awesome and easy it is to develop web-based applications backed by Cloudera Operational Database (COD), I started a small project to integrate COD with another CDP cloud experience, Cloudera Machine Learning (CML). b) Basic data transformation. Go to runner.py and run it.

Machine Learning

Machine Learning Statistics Dashboards Modeling

Simplify data transfer: Google BigQuery to Amazon S3 using Amazon AppFlow

AWS Big Data

OCTOBER 5, 2023

With Amazon AppFlow, you can run data flows at nearly any scale at the frequency you choose—on a schedule, in response to a business event, or on demand. You can configure data transformation capabilities such as filtering and validation to generate rich, ready-to-use data as part of the flow itself, without additional steps.

Data Warehouse

Data Warehouse Machine Learning Data Integration Data-driven

How to Use Apache Iceberg in CDP’s Open Lakehouse

Cloudera

AUGUST 8, 2022

The general availability covers Iceberg running within some of the key data services in CDP, including Cloudera Data Warehouse (CDW), Cloudera Data Engineering (CDE), and Cloudera Machine Learning (CML). Cloudera Data Engineering (Spark 3) with Airflow enabled. Cloudera Machine Learning.

Snapshot

Snapshot Data Warehouse Machine Learning Cost-Benefit

Apache Spark on Kubernetes: How Apache YuniKorn (Incubating) helps

Cloudera

OCTOBER 14, 2020

Apache Spark unifies batch processing, real-time processing, stream analytics, machine learning, and interactive query in one platform. Also, such a concept helps admins visualize the jobs that are scheduled, for debugging purposes. Background. Why choose K8s for Apache Spark. Acknowledgments.

Machine Learning

Machine Learning Management Big Data Optimization

Connect your data for faster decisions with AWS

AWS Big Data

NOVEMBER 7, 2023

Second, organizations still need transformations like cleansing, deduplication, and combining datasets for analysis and machine learning (ML). For these, AWS Glue provides fast, scalable data transformation. This integration empowers users to go from data to predictions and visualizations faster than ever.

Dashboards

Dashboards Data-driven Data Integration Data Lake

Addressing the Three Scalability Challenges in Modern Data Platforms

Cloudera

NOVEMBER 22, 2021

In addition, data pipelines include more and more stages, thus making it difficult for data engineers to compile, manage, and troubleshoot those analytical workloads. As a result, alternative data integration technologies (e.g.,

Data Processing

Data Processing Data Warehouse Enterprise Visualization

Exploring the AI and data capabilities of watsonx

IBM Big Data Hub

JULY 17, 2023

is our enterprise-ready next-generation studio for AI builders, bringing together traditional machine learning (ML) and new generative AI capabilities powered by foundation models. Automated development: Automates data preparation, model development, feature engineering and hyperparameter optimization using AutoAI.

Machine Learning

Machine Learning Data Warehouse Modeling Cost-Benefit

Amazon Redshift data ingestion options

AWS Big Data

SEPTEMBER 5, 2024

If storing operational data in a data warehouse is a requirement, synchronization of tables between operational data stores and Amazon Redshift tables is supported. In scenarios where data transformation is required, you can use Redshift stored procedures to modify data in Redshift tables. AWS Glue 4.0

IoT

IoT Data Warehouse Cost-Benefit Reporting

Monitor data pipelines in a serverless data lake

AWS Big Data

AUGUST 9, 2023

The advent of rapid adoption of serverless data lake architectures—with ever-growing datasets that need to be ingested from a variety of sources, followed by complex data transformation and machine learning (ML) pipelines—can present a challenge.

Data Lake

Data Lake Metrics Testing Cost-Benefit

End-to-end development lifecycle for data engineers to build a data integration pipeline using AWS Glue

AWS Big Data

JULY 26, 2023

Every time the business requirement changes (such as adding data sources or changing data transformation logic), you make changes on the AWS Glue app stack and re-provision the stack to reflect your changes. Prerequisites You need the following resources: Python 3.9 jobs locally using a Docker container. aws:/home/glue_user/.aws

Data Integration

Data Integration Snapshot Testing Visualization

Harnessing Streaming Data: Insights at the Speed of Life

Sisense

OCTOBER 15, 2020

As real-time analytics and machine learning stream processing are growing rapidly, they introduce a new set of technological and conceptual challenges. These experts can define transformations from streams to tables and govern the processing progress using a visual, SQL-based interface.

Dashboards

Dashboards IoT Optimization Internet of Things

Empower your Jira data in a data lake with Amazon AppFlow and AWS Glue

AWS Big Data

AUGUST 1, 2023

Although Jira Cloud provides reporting capability, loading this data into a data lake will facilitate enrichment with other business data, as well as support the use of business intelligence (BI) tools and artificial intelligence (AI) and machine learning (ML) applications. Choose Update.

Data Lake

Data Lake Data Transformation Data-driven Cost-Benefit

Best BI Tools For 2024 You Need to Know

FineReport

MARCH 31, 2024

In 2024, business intelligence (BI) software has undergone significant advancements, revolutionizing data management and decision-making processes. Harnessing the power of advanced APIs, automation, and AI, these tools simplify data compilation, organization, and visualization, empowering users to extract actionable insights effortlessly.

Dashboards

Dashboards Visualization Data mining Data-driven

Deep dive into the AWS ProServe Hadoop Migration Delivery Kit TCO tool

AWS Big Data

FEBRUARY 6, 2023

In this post, we dive deep into the tool, walking through all steps from log ingestion, transformation, visualization, and architecture design to calculate TCO. With QuickSight, you can visualize YARN log data and conduct analysis against the datasets generated by pre-built dashboard templates and a widget.

Dashboards

Dashboards Optimization Data Lake Cost-Benefit

Cross-account integration between SaaS platforms using Amazon AppFlow

AWS Big Data

APRIL 25, 2023

The following AWS services are used for data ingestion, processing, and load: Amazon AppFlow is a fully managed integration service that enables you to securely transfer data between SaaS applications like Salesforce, SAP, Marketo, Slack, and ServiceNow, and AWS services like Amazon S3 and Amazon Redshift, in just a few clicks.

Sales

Sales Visualization Software Metadata

NEW: Octopai Announces Support of Microsoft Azure Data Factory

Octopai

JANUARY 19, 2021

With Octopai’s support and analysis of Azure Data Factory, enterprises can now view complete end-to-end data lineage from Azure Data Factory all the way through to reporting for the first time ever.

Metadata

Metadata ROI Machine Learning Data Quality
