This improvement streamlines access to and management of your Airflow environments and their integration with external systems, and allows you to interact with your workflows programmatically. The Airflow REST API is a programmatic interface to Airflow's core functionality.
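As a minimal sketch of that programmatic interaction, the following uses Airflow's stable REST API to list DAGs and trigger a run. The base URL, credentials, and DAG id are placeholders, and it assumes the basic-auth API backend is enabled on the deployment.

```python
import requests

# Assumed local Airflow webserver with basic-auth enabled.
BASE_URL = "http://localhost:8080/api/v1"
AUTH = ("admin", "admin")

# List the DAGs registered in this environment.
resp = requests.get(f"{BASE_URL}/dags", auth=AUTH, timeout=30)
resp.raise_for_status()
for dag in resp.json()["dags"]:
    print(dag["dag_id"], "paused:", dag["is_paused"])

# Trigger a run of a specific (hypothetical) DAG programmatically.
resp = requests.post(
    f"{BASE_URL}/dags/example_dag/dagRuns",
    auth=AUTH,
    json={"conf": {}},
    timeout=30,
)
resp.raise_for_status()
print("Triggered run:", resp.json()["dag_run_id"])
```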
As organizations increasingly adopt cloud-based data lakes and warehouses, the need for streamlined data transformations has grown, along with the demand for efficient data transformation tools. Next, use the dbt Cloud interactive development environment (IDE) to deploy your project.
Together with price-performance, Amazon Redshift offers capabilities such as a serverless architecture, machine learning integration within your data warehouse, and secure data sharing across the organization. dbt Cloud is a hosted service that helps data teams productionize dbt deployments. To verify the setup, choose Test Connection.
How dbt Core helps data teams test, validate, and monitor complex data transformations and conversions. dbt Core, an open-source framework for developing, testing, and documenting SQL-based data transformations, has become a must-have tool for modern data teams as the complexity of data pipelines grows.
The rise of SaaS business intelligence tools is answering that need, providing a dynamic vessel for presenting and interacting with essential insights in a way that is digestible and accessible. The future is bright for logistics companies that are willing to take advantage of big data.
Extrinsic Control Deficit: Many of these changes stem from tools and processes beyond the immediate control of the data team. Unregulated ETL/ELT Processes: The absence of stringent data quality tests in ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform) processes further exacerbates the problem.
Ideally this is verified at the source (from the business interactions), but if that is not available, then through independent confirmation techniques. It indicates whether data is free of significant errors. Integrity, also known as data validation, refers to the structural testing of data to ensure that it complies with procedures.
Amazon Athena provides an interactive analytics service for analyzing data in Amazon Simple Storage Service (Amazon S3). Amazon Redshift is used to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes.
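To illustrate that interactive querying, here is a hedged boto3 sketch that runs an Athena query over S3 data and prints the results. The database, table, and results bucket names are assumptions.

```python
import time

import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Submit an interactive query; names here are placeholders.
response = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) AS n FROM web_logs GROUP BY status",
    QueryExecutionContext={"Database": "analytics_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
query_id = response["QueryExecutionId"]

# Poll until the query finishes.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)[
        "QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows:
        print([col.get("VarCharValue") for col in row["Data"]])
```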
Developers need to onboard new data sources, chain multiple data transformation steps together, and explore data as it travels through the flow, with interactivity when needed while saving costs. To meet this need, we've introduced a new concept called test sessions in the DataFlow Designer.
It's also an analytics suite that you can use to perform interactive log analytics, real-time application monitoring, security analytics, and more. OpenSearch also includes capabilities to ingest and analyze data. For example, you can create a collection called test with one shard and no replicas, as sketched below.
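The original code sample isn't reproduced in this excerpt; a minimal sketch of the equivalent request against a self-managed OpenSearch endpoint might look like the following, where the endpoint URL and credentials are assumptions for a local development instance.

```python
import requests

# Assumed local OpenSearch endpoint and dev credentials.
ENDPOINT = "https://localhost:9200"
AUTH = ("admin", "admin")

# Create an index named "test" with one shard and no replicas.
resp = requests.put(
    f"{ENDPOINT}/test",
    auth=AUTH,
    json={"settings": {"number_of_shards": 1, "number_of_replicas": 0}},
    verify=False,  # acceptable only for local dev certificates
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # expect {"acknowledged": true, ...}
```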
A modern data platform entails maintaining data across multiple layers, targeting diverse platform capabilities like high performance, ease of development, cost-effectiveness, and DataOps features such as CI/CD, lineage, and unit testing. It does this by helping teams handle the T in ETL (extract, transform, and load) processes.
Airflow has been adopted by many Cloudera Data Platform (CDP) customers in the public cloud as the next-generation orchestration service to set up and operationalize complex data pipelines. It was critical to make the interactions as intuitive as possible to avoid slowing down the user's flow.
They use various AWS analytics services, such as Amazon EMR, to enable their analysts and data scientists to apply advanced analytics techniques to interactively develop and test new surveillance patterns and improve investor protection. The accompanying fragments, starts_with(OutputKey,'eksclusterEKSConfig')].OutputValue and OutputKey=='HiveSecretName'].OutputValue, are truncated --query filters for reading CloudFormation stack outputs; a hedged reconstruction follows.
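The following boto3 sketch shows what those truncated filters most likely do: pull specific outputs from a CloudFormation stack. The stack name is an assumption, and the list comprehensions mirror the JMESPath expressions.

```python
import boto3

cfn = boto3.client("cloudformation")

# Hypothetical stack name; the real one is not shown in the excerpt.
stack = cfn.describe_stacks(StackName="emr-eks-stack")["Stacks"][0]
outputs = stack.get("Outputs", [])

# Equivalent of [?starts_with(OutputKey,'eksclusterEKSConfig')].OutputValue
eks_config = [o["OutputValue"] for o in outputs
              if o["OutputKey"].startswith("eksclusterEKSConfig")]

# Equivalent of [?OutputKey=='HiveSecretName'].OutputValue
hive_secret = [o["OutputValue"] for o in outputs
               if o["OutputKey"] == "HiveSecretName"]

print(eks_config, hive_secret)
```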
Allows them to iteratively develop processing logic and test it with as little overhead as possible. Plays nicely with existing CI/CD processes to promote a data pipeline to production. Provides monitoring, alerting, and troubleshooting for production data pipelines.
Each CDH dataset has three processing layers: source (raw data), prepared (transformed data in Parquet), and semantic (combined datasets). Stages (DEV, INT, PROD) can be defined in each layer to allow structured releases and testing without affecting PROD.
To grow the power of data at scale for the long term, it’s highly recommended to design an end-to-end development lifecycle for your data integration pipelines. The following are common asks from our customers: Is it possible to develop and test AWS Glue data integration jobs on my local laptop?
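On the laptop question, a common pattern is to run the Glue libraries locally, for example inside the amazon/aws-glue-libs Docker image, and iterate on a job script against sample files before deploying. The sketch below assumes that environment; the file paths and column name are placeholders.

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

# GlueContext wraps a local SparkContext when run inside the
# aws-glue-libs development container.
sc = SparkContext.getOrCreate()
glue_context = GlueContext(sc)
spark = glue_context.spark_session

# Read a small local sample instead of production data while developing.
df = spark.read.json("sample_data/events.json")

# A trivial transformation step to iterate on locally before deploying.
cleaned = df.dropDuplicates().na.drop(subset=["event_id"])
cleaned.write.mode("overwrite").parquet("output/events_cleaned")
```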
However, you might face significant challenges when planning for a large-scale data warehouse migration. This will enable right-sizing the Redshift data warehouse to meet workload demands cost-effectively. Additional considerations – Factor in additional tasks beyond schema conversion.
As creators of and experts in Apache Druid, Rill understands the data store's importance as the engine for real-time, highly interactive analytics, complementing Cloudera Data Warehouse capabilities such as efficient batch data processing, complex data transformations, Apache Hive, and windowing functions. (Figure 1: Rill and Cloudera Architecture.)
We introduce you to Amazon Managed Service for Apache Flink Studio and get started querying streaming data interactively using Amazon Kinesis Data Streams. Traditionally, such a legacy call center analytics platform would be built on a relational database that stores data from streaming sources.
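As a hedged companion sketch, the following pushes simulated call-center events into a Kinesis data stream that a Flink Studio notebook could then query interactively. The stream name and event fields are assumptions.

```python
import json
import random
import time

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Write simulated events to a hypothetical stream.
for i in range(100):
    event = {
        "call_id": i,
        "agent_id": random.randint(1, 10),
        "duration_seconds": random.randint(30, 600),
        "timestamp": int(time.time()),
    }
    kinesis.put_record(
        StreamName="call-center-events",
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=str(event["agent_id"]),
    )
```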
The problem is that a new unique identifier of a test example won’t be anywhere in the tree. Feature extraction means moving from low-level features that are unsuitable for learning—practically speaking, we get poor testing results—to higher-level features which are useful for learning. Separate out a hold-out test set.
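Separating the hold-out set before any feature extraction is what keeps the testing results honest. A minimal sketch with scikit-learn, using synthetic data as a stand-in for a real dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic placeholder data: 1,000 examples, 10 raw features.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)

# Carve out the hold-out test set first; fit feature extraction
# only on the training portion afterwards.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)  # (800, 10) (200, 10)
```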
Be sure your test cases represent the diversity of app users. As an AI product manager, here are some important data-related questions to ask yourself: What is the problem you're trying to solve? What data transformations do your data scientists need to prepare the data?
Comprehensive safeguards, including authentication and authorization, ensure that only users with configured access can interact with the model endpoint. The service also meets enterprise-grade security and compliance standards, recording all model interactions for governance and audit.
DataBrew is a visual data preparation tool that enables you to clean and normalize data without writing any code. The over 200 transformations it provides are now available in AWS Glue Studio visual jobs. To create a DataBrew recipe, start by registering the data store for the claims file.
The general availability covers Iceberg running within some of the key data services in CDP, including Cloudera Data Warehouse ( CDW ), Cloudera Data Engineering ( CDE ), and Cloudera Machine Learning ( CML ). In the final stage of our ETL pipeline, we load new data into this partition. Using CDW with Iceberg.
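As a hedged sketch of that final load step, the following PySpark snippet appends a new batch into an Iceberg table; Iceberg routes rows to the correct partitions on write. The catalog, table, and staging path names are assumptions, and the Spark session must already be configured with an Iceberg catalog.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-load").getOrCreate()

# Read the newly arrived batch from a hypothetical staging location.
new_data = spark.read.parquet("s3://staging-bucket/new_batch/")
new_data.createOrReplaceTempView("new_batch")

# Append into the Iceberg table; partition routing is automatic.
spark.sql("""
    INSERT INTO warehouse.db.events
    SELECT * FROM new_batch
""")
```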
Apache Spark unifies batch processing, real-time processing, stream analytics, machine learning, and interactive query in one platform. The Test and Development queues have fixed resource limits; all other queues are limited only by the size of the cluster.
For data pipeline orchestration, the Apache Airflow UI is a user-friendly tool that provides detailed views into your data pipeline. When it comes to pipeline health management, each service that your tasks interact with could be storing or publishing logs to different locations, such as an S3 bucket or Amazon CloudWatch Logs.
Building a starter version of anything can often be straightforward, but building something with enterprise-grade scale, security, resiliency, and performance typically requires knowledge of and adherence to battle-tested best practices, and using the right tools and features in the right scenario. Data Vault 2.0 is a case in point.
You can also use the data transformation feature of Data Firehose to invoke a Lambda function to perform data transformation in batches. This solution includes a Lambda function that continuously updates the Amazon Location tracker with simulated location data from fictitious journeys.
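A hedged sketch of such a transformation function follows. Firehose delivers base64-encoded records in batches and expects each one back with its recordId, a result status, and re-encoded data; the payload field being renamed is an assumption about the record shape.

```python
import base64
import json

def lambda_handler(event, context):
    """Transform a batch of Data Firehose records."""
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))

        # Example transformation (assumed payload shape):
        # normalize a field name before delivery.
        payload["deviceId"] = payload.pop("device_id", None)

        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(
                json.dumps(payload).encode("utf-8")
            ).decode("utf-8"),
        })
    return {"records": output}
```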
In this post, we discuss why AWS recommends moving from Kinesis Data Analytics for SQL Applications to Amazon Kinesis Data Analytics for Apache Flink to take advantage of Apache Flink’s advanced streaming capabilities. Kinesis Data Analytics Studio allows us to create a notebook, which is a web-based development environment.
Within a large enterprise, there is a huge amount of data accumulated over the years – many decisions have been made and different methods have been tested. This is one of the main diagnostic tests. The doctor needs to know how to collect the data from this image. This process requires great expertise.
Few actors in the modern data stack have inspired as much enthusiasm and fervent support as dbt. This data transformation tool enables data analysts and engineers to transform, test, and document data in the cloud data warehouse. Jason: What's the value of using dbt with the data catalog?
As data science grows in popularity and importance, organizations that use it need to pay more attention to picking the right tools. An example of a data science tool is Dataiku. Business Intelligence Tools: Business intelligence (BI) tools are used to visualize your data.
For these workloads, data lake vendors usually recommend extracting data into flat files to be used solely for model training and testing purposes. This adds an extra ETL step, making the data even more stale. The data lakehouse was created to solve these problems. Each node can be different from the others.
SafeGraph is a geospatial data company that curates over 41 million global points of interest (POIs) with detailed attributes, such as brand affiliation, advanced category tagging, and open hours, as well as how people interact with those places. These versions are all exposed to users via the UI.
While they require task-specific labeled data for fine-tuning, they also offer clients the best cost-performance trade-off for non-generative use cases. watsonx.ai offers a Prompt Lab, where users can interact with different prompts using prompt engineering on generative AI models for both zero-shot and few-shot prompting.
Before the data is put into the model comes a process called feature engineering: transforming the original data columns to impose certain business assumptions or simply increase model accuracy. The classical approach is to assume an adstock function (typically linear) and test out various values of its decay parameter.
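A minimal sketch of a geometric adstock transformation, under the common assumption that each period's effective exposure is the current spend plus a decayed carryover of past spend; the decay value here is illustrative and would be tuned against the data.

```python
import numpy as np

def adstock(spend: np.ndarray, decay: float) -> np.ndarray:
    """Geometric adstock: carryover[t] = spend[t] + decay * carryover[t-1]."""
    carryover = np.zeros_like(spend, dtype=float)
    carryover[0] = spend[0]
    for t in range(1, len(spend)):
        carryover[t] = spend[t] + decay * carryover[t - 1]
    return carryover

# Hypothetical weekly media spend.
weekly_spend = np.array([100, 0, 0, 50, 0], dtype=float)
print(adstock(weekly_spend, decay=0.5))  # [100., 50., 25., 62.5, 31.25]
```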
dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by customers of data warehouses (such as Amazon Redshift) who are looking to keep their data transform logic separate from storage and engine.
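For the Python side, here is a hedged sketch of a dbt Python model. Note that Python models run only on certain adapters (such as Databricks, Snowflake, and BigQuery, not all warehouses); the upstream model name stg_orders and the column names are assumptions, and the dataframe API shown assumes a PySpark-backed adapter.

```python
def model(dbt, session):
    # Materialize this model as a table in the warehouse.
    dbt.config(materialized="table")

    # dbt.ref() resolves another model in the project, keeping transform
    # logic separate from storage and engine, as described above.
    orders = dbt.ref("stg_orders")

    # Keep only completed orders (hypothetical column and value).
    return orders.filter(orders["status"] == "completed")
```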
The initiative has enhanced coordination: automation APIs facilitate interaction with security tools, streamline coordination, and enhance mitigation responses. This is a new way to interact with the web and search. This enabled the team to expose the technology to a small group of senior leaders for testing.
After the data lands in Amazon S3, smava uses the AWS Glue Data Catalog and crawlers to automatically catalog the available data, capture the metadata, and provide an interface that allows querying all data assets. The data products from the Business Vault and Data Mart stages are now available for consumers.
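A hedged boto3 sketch of that cataloging step: register the S3 landing location with a Glue crawler so new data is discovered and its metadata captured automatically. The crawler name, IAM role, database, and bucket path are all assumptions.

```python
import boto3

glue = boto3.client("glue")

# Create a crawler over a hypothetical landing-zone prefix.
glue.create_crawler(
    Name="landing-zone-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="data_lake",
    Targets={"S3Targets": [{"Path": "s3://my-landing-bucket/raw/"}]},
)

# Run it so the tables appear in the Glue Data Catalog.
glue.start_crawler(Name="landing-zone-crawler")
```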
Modak Nabu relies on a framework of "Botworks", a series of micro-jobs that accomplish various data transformation steps from ingestion to profiling and indexing. Cloudera Data Engineering within CDP provides a fully managed Spark-on-Kubernetes service that hides the complexity of running production DE workloads at scale.
Through meticulous testing and research, we've curated a list of the ten best BI tools, ensuring accessibility and efficacy for businesses of all sizes. In essence, the core capabilities of the best BI tools revolve around four essential functions: data integration, data transformation, data visualization, and reporting.
Showpad built new customer-facing embedded dashboards within Showpad eOSTM and migrated its legacy dashboards to Amazon QuickSight , a unified BI service providing modern interactive dashboards, natural language querying, paginated reports, machine learning (ML) insights, and embedded analytics at scale.
AWS Glue establishes a secure connection to HubSpot using OAuth for authorization and TLS for data encryption in transit. AWS Glue also supports applying complex data transformations, enabling efficient data integration and preparation to meet your needs. In the console wizard, choose Next, choose Connect App, and choose Next again.
This is in contrast to traditional BI, which extracts insight from data outside of the app. As rich, data-driven user experiences are increasingly intertwined with our daily lives, end users are demanding new standards for how they interact with their business data. Yes—but basic dashboards won’t be enough.