Let's start by considering the job of a non-ML software engineer: traditional software deals with well-defined, narrowly scoped inputs, which the engineer can exhaustively and cleanly model in the code. ML is different: not only is data larger, but models, deep learning models in particular, are much larger than before.
Together with price-performance, Amazon Redshift offers capabilities such as serverless architecture, machine learning integration within your data warehouse and secure data sharing across the organization. dbt Cloud is a hosted service that helps data teams productionize dbt deployments. Create dbt models in dbt Cloud.
The need for streamlined data transformations: as organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown. Such tools save time and effort, especially for teams looking to minimize infrastructure management and focus solely on data modeling.
Given that, what would you say is the job of a data scientist (or ML engineer, or any other such title)? Building Models. A common task for a data scientist is to build a predictive model. You know the drill: pull some data, carve it up into features, feed it into one of scikit-learn’s various algorithms.
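For concreteness, here is a minimal sketch of that drill in Python; the CSV path, column names, and choice of algorithm are all hypothetical:

```python
# A minimal sketch of the pull-data / build-features / fit-model loop.
# The CSV path, column names, and the algorithm chosen are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("customers.csv")  # pull some data

# Carve it up into features and a target.
X = df[["tenure_months", "monthly_spend", "support_tickets"]]
y = df["churned"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Feed it into one of scikit-learn's various algorithms.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```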
Building this single source of truth was the only way the airport would have the capacity to augment the data with a digital twin, IoT sensor data, and predictive analytics, he says. "It's a big win for us, being able to look at all of our data in one repository and build machine learning models off of that," he adds.
How dbt Core helps data teams test, validate, and monitor complex data transformations and conversions. dbt Core, an open-source framework for developing, testing, and documenting SQL-based data transformations, has become a must-have tool for modern data teams as the complexity of data pipelines grows.
AI is transforming how senior data engineers and data scientists validate data transformations and conversions. AI-based verification approaches help detect anomalies, enforce data integrity, and optimize pipelines for greater efficiency.
Writing SQL queries requires not just remembering SQL syntax rules but also knowing the tables' metadata: data about table schemas, relationships among the tables, and possible column values. Generative AI models can translate natural language questions into valid SQL queries, a capability known as text-to-SQL generation.
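As an illustration, here is a hedged sketch of text-to-SQL generation; it assumes the openai Python client, and the schema, question, and model name are placeholders:

```python
# A minimal text-to-SQL sketch: supply table metadata in the prompt and ask
# the model to translate a natural-language question into SQL. The schema,
# question, and model name are all placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

schema = """
Table orders(order_id int, customer_id int, order_date date, total numeric)
Table customers(customer_id int, name text, region text)
"""

question = "What were total sales by region last month?"

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "Translate the question into a single valid SQL query "
                    "using only these tables:\n" + schema},
        {"role": "user", "content": question},
    ],
)

# The generated SQL, to be reviewed before running against the warehouse.
print(response.choices[0].message.content)
```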
They may also learn from evidence, but the data and the modeling fundamentally come from humans in some way. Data Science – the field of study that combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data.
Azure Databricks, a big data analytics platform built on Apache Spark, performs the actual data transformations. The cleaned and transformed data can then be stored in Azure Blob Storage or moved to Azure Synapse Analytics for further analysis and reporting. Some tools are excellent for batch processing.
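To make the transformation step concrete, here is a minimal PySpark sketch of the kind of clean-and-transform job such a platform might run; the paths and column names are hypothetical:

```python
# A minimal PySpark clean-and-transform sketch of the kind Databricks runs;
# the input/output paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("clean-transform").getOrCreate()

raw = spark.read.json("/mnt/raw/events/")  # hypothetical landing path

cleaned = (
    raw.dropDuplicates(["event_id"])                     # remove duplicate events
       .filter(F.col("event_ts").isNotNull())            # drop malformed rows
       .withColumn("event_date", F.to_date("event_ts"))  # derive a partition column
)

# Write transformed data out (e.g., for Synapse or downstream reporting).
cleaned.write.mode("overwrite").partitionBy("event_date").parquet("/mnt/curated/events/")
```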
Business analytics is the practical application of statistical analysis and technologies on business data to identify and anticipate trends and predict business outcomes. Data analytics is used across disciplines to find trends and solve problems using data mining, data cleansing, data transformation, data modeling, and more.
Business/Data Analyst: The business analyst is all about the "meat and potatoes" of the business. Business needs are quantified into data models for acquisition and delivery. This person (or group of individuals) ensures that the theory behind data quality is communicated to the development team.
The data organization wants to run the Value Pipeline as robustly as a Six Sigma factory, and it must be able to implement and deploy process improvements as rapidly as a Silicon Valley start-up. The data engineer builds data transformations. Their product is the data. Create tests. Run the factory.
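As one illustration of the "create tests" step, here is a minimal Python sketch of a data test gating a transformation; the table shape and rules are hypothetical:

```python
# A minimal sketch of a data test that gates a transformation before data
# moves down the Value Pipeline. Column names and rules are hypothetical.
import pandas as pd

def transform(orders: pd.DataFrame) -> pd.DataFrame:
    # Example transformation: keep completed orders, derive a revenue column.
    out = orders[orders["status"] == "completed"].copy()
    out["revenue"] = out["quantity"] * out["unit_price"]
    return out

def test_transform():
    orders = pd.DataFrame({
        "status": ["completed", "cancelled"],
        "quantity": [2, 1],
        "unit_price": [10.0, 99.0],
    })
    result = transform(orders)
    assert len(result) == 1                   # cancelled orders filtered out
    assert (result["revenue"] >= 0).all()     # no negative revenue
    assert result["revenue"].iloc[0] == 20.0  # arithmetic is correct

test_transform()  # run in CI on every change, like a factory quality gate
print("all checks passed")
```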
Your Chance: Want to test professional logistics analytics software? Use our 14-day free trial today and transform your supply chain! Big data enables automated systems by intelligently routing many data sets and data streams.
DataOps Observability can help you ensure that your complex data pipelines and processes are accurate and that they deliver as designed. Observability also validates that your data transformations, models, and reports are performing as expected, letting you monitor your data operations without replacing staff or systems.
Data analytics draws from a range of disciplines — including computer programming, mathematics, and statistics — to perform analysis on data in an effort to describe, predict, and improve performance. What are the four types of data analytics? Data analytics includes the tools and techniques used to perform data analysis.
The exam tests general knowledge of the platform and applies to multiple roles, including administrator, developer, data analyst, data engineer, data scientist, and system architect. The exam is designed for seasoned, high-achieving data science thought and practice leaders.
As part of the migration, reconsider your data model. In examining your data model, you can find efficiencies that dramatically improve your search latencies and throughput. Poor data modeling doesn't only cause search performance problems; its effects extend to other areas.
A modern data platform entails maintaining data across multiple layers, targeting diverse platform capabilities like high performance, ease of development, cost-effectiveness, and DataOps features such as CI/CD, lineage, and unit testing. It does this by helping teams handle the T in ETL (extract, transform, and load) processes.
The goal of DataOps Observability is to provide visibility of every journey that data takes from source to customer value across every tool, environment, data store, data and analytic team, and customer so that problems are detected, localized and raised immediately. That data then fills several database tables.
dbt allows data teams to produce trusted data sets for reporting, ML modeling, and operational workflows using SQL, with a simple workflow that follows software engineering best practices like modularity, portability, and continuous integration/continuous deployment (CI/CD).
Each CDH dataset has three processing layers: source (raw data), prepared (transformed data in Parquet), and semantic (combined datasets). It is possible to define stages (DEV, INT, PROD) in each layer to allow structured releases and testing without affecting PROD.
Be sure test cases represent the diversity of app users. As an AI product manager, here are some important data-related questions you should ask yourself: What is the problem you're trying to solve? What data transformations are needed from your data scientists to prepare the data?
However, you might face significant challenges when planning for a large-scale data warehouse migration. This will enable right-sizing the Redshift data warehouse to meet workload demands cost-effectively. Dependency analysis: understanding dependencies between objects is crucial for a successful migration.
To grow the power of data at scale for the long term, it’s highly recommended to design an end-to-end development lifecycle for your data integration pipelines. The following are common asks from our customers: Is it possible to develop and test AWS Glue data integration jobs on my local laptop?
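As a sketch of what local development of a Glue job can look like, here is a minimal script that runs both locally (for example, inside the aws-glue-libs Docker image) and in the Glue service; the database, table, and bucket names are hypothetical:

```python
# A minimal Glue job sketch that can run locally (e.g., in the aws-glue-libs
# Docker image) and in the Glue service. Catalog database/table and the S3
# bucket are hypothetical.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read from the Glue Data Catalog, apply a transformation, write back to S3.
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="raw_orders"
)
dyf = dyf.drop_fields(["internal_notes"])  # example transformation

glue_context.write_dynamic_frame.from_options(
    frame=dyf,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/orders/"},
    format="parquet",
)
job.commit()
```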
If you can show ROI on a DW, it would be a good use of your money to go with Omniture Discover, WebTrends Data Mart, or Coremetrics Explore. If you have evolved to a stage where you need behavior targeting, then get Omniture Test and Target or SiteSpect, and embrace Multiplicity.
The complexities of modern data workflows often translate into countless hours spent coding, debugging, and optimizing models. Recognizing this pain point, we set out to redefine the data science experience with AI-driven innovation. This practical support speeds up project initiation and maintains consistent coding practices.
Data Warehouse – in addition to a number of performance optimizations, DW has added new features for better scalability, monitoring, and reliability to enable self-service access with security and performance.
Incorporate PMML integration within augmented analytics to easily manage predictive models! PMML, the Predictive Model Markup Language, is an interchange format that lets analytical applications and software describe and exchange predictive models.
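By way of example, here is a minimal sketch of producing PMML from a scikit-learn model using the sklearn2pmml package (one of several routes to PMML; it requires a local Java runtime); the feature names are hypothetical:

```python
# A minimal sketch of exchanging a model via PMML using sklearn2pmml.
# Feature names and the tiny training set are hypothetical; the package
# needs a Java runtime installed to do the conversion.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

X = pd.DataFrame({"age": [25, 40, 31, 58],
                  "income": [30_000, 82_000, 51_000, 97_000]})
y = pd.Series([0, 1, 0, 1], name="purchased")

pipeline = PMMLPipeline([("classifier", LogisticRegression())])
pipeline.fit(X, y)

# Serialize to PMML so any PMML-aware application can score the model.
sklearn2pmml(pipeline, "purchase_model.pmml")
```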
Amazon Redshift ML is a feature of Amazon Redshift that enables you to build, train, and deploy machine learning (ML) models directly within the Redshift environment. Generative AI models can derive new features from your data and enhance decision-making. Create a materialized view to load the raw streaming data.
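For illustration, here is a hedged sketch of issuing a Redshift ML CREATE MODEL statement from Python via the Redshift Data API; the workgroup, table, IAM role, and S3 bucket are hypothetical:

```python
# A minimal sketch of creating a Redshift ML model via the Redshift Data
# API. The workgroup, database, source table, IAM role ARN, and S3 bucket
# are all hypothetical placeholders.
import boto3

client = boto3.client("redshift-data")

create_model_sql = """
CREATE MODEL churn_model
FROM (SELECT tenure_months, monthly_spend, churned FROM customer_activity)
TARGET churned
FUNCTION predict_churn
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftMLRole'
SETTINGS (S3_BUCKET 'example-ml-artifacts');
"""

client.execute_statement(
    WorkgroupName="example-workgroup",  # or ClusterIdentifier for provisioned clusters
    Database="dev",
    Sql=create_model_sql,
)
```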
Feature engineering is useful for data scientists when assessing tradeoff decisions regarding the impact of their ML models. It provides both a framework for approaching ML and techniques for extracting features from raw data for use within models.
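Here is a minimal sketch of what such feature extraction can look like in practice; the raw columns and derived features are hypothetical:

```python
# A minimal feature-engineering sketch: deriving model-ready features from
# raw transaction rows. Column names and features are hypothetical.
import pandas as pd

raw = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2],
    "amount": [20.0, 35.0, 5.0, 12.0, 8.0],
    "ts": pd.to_datetime([
        "2024-01-02", "2024-01-20", "2024-01-05", "2024-01-06", "2024-01-30",
    ]),
})

features = raw.groupby("user_id").agg(
    txn_count=("amount", "size"),   # activity level
    avg_amount=("amount", "mean"),  # typical spend
    days_active=("ts", lambda s: (s.max() - s.min()).days),  # recency span
).reset_index()

print(features)
```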
This service supports a range of optimized AI models, enabling seamless and scalable AI inference. By 2023, the focus had shifted toward experimentation: enterprise developers began exploring proofs of concept (POCs) for generative AI applications, leveraging API services and open models such as Llama 2 and Mistral.
Our approach: the migration initiative consisted of two main parts, building the new architecture and migrating data pipelines from the existing tool to the new architecture. Often, we would work on both in parallel, testing one component of the architecture while developing another.
According to Evanta's 2022 CIO Leadership Perspectives study, CIOs' second-highest priority within the IT function is data and analytics, with CIOs seeing advancing organizational use of data as key to reaching enterprise objectives. To get there, Angel-Johnson has embarked on a master data management initiative.
Cloudera's Shared Data Experience (SDX) provides all these capabilities, allowing seamless data sharing across all the Data Services, including CDE. This enabled new use cases with customers that were using a mix of Spark and Hive to perform data transformations.
Cloudera users can securely connect Rill to a source of event stream data, such as Cloudera DataFlow, model data into Rill's cloud-based Druid service, and share live operational dashboards within minutes via Rill's interactive metrics dashboard or any connected BI solution.
As with all AWS services, Amazon Redshift is a customer-obsessed service that recognizes there isn't a one-size-fits-all approach for customers when it comes to data models, which is why Amazon Redshift supports multiple data models such as star schemas, snowflake schemas, and Data Vault 2.0.
Duplicating data from a production database to a lower or lateral environment and masking personally identifiable information (PII) to comply with regulations enables development, testing, and reporting without impacting critical systems or exposing sensitive customer data. PII detection and scrubbing.
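As a sketch only, the following masks two common PII patterns (emails and US-style phone numbers) in a copied dataset; production pipelines would use dedicated detection tooling:

```python
# A minimal sketch of scrubbing PII before copying production data to a
# lower environment. These patterns cover emails and US-style phone numbers
# only; real pipelines use dedicated PII-detection tools.
import re

import pandas as pd

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

def scrub(text: str) -> str:
    text = EMAIL.sub("<EMAIL>", text)
    return PHONE.sub("<PHONE>", text)

prod = pd.DataFrame({"notes": [
    "Call Jane at 555-123-4567",
    "Refund sent to jane.doe@example.com",
]})

# Apply the masking before the copy lands in the dev/test environment.
masked = prod.assign(notes=prod["notes"].map(scrub))
print(masked)
```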
Development Environment for Data Scientists – isolated, containerized, and elastic. Production ML Toolkit – deploying, serving, monitoring, and governance of ML models. Simple, drag-and-drop building of dashboards and apps with Cloudera Data Visualization. Now, let's test our model and run it.
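Testing a deployed model typically means calling its REST endpoint. Here is a hedged sketch; the URL, access key, and payload shape are hypothetical (Cloudera Machine Learning, for instance, serves models behind an HTTP endpoint):

```python
# A minimal sketch of testing a deployed model over its REST endpoint.
# The URL, access key, and payload shape are hypothetical placeholders.
import requests

response = requests.post(
    "https://ml.example.com/model",        # hypothetical serving endpoint
    json={"accessKey": "YOUR_ACCESS_KEY",  # placeholder credential
          "request": {"tenure_months": 12, "monthly_spend": 42.5}},
    timeout=10,
)
response.raise_for_status()
print(response.json())  # e.g. a prediction such as a churn probability
```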
Data transforms businesses. That's where the data lifecycle comes into play. Managing data and its flow, from the edge to the cloud, is one of the most important tasks in the process of gaining data intelligence.
When you start the process of designing your data model for Amazon Keyspaces, it's essential to possess a comprehensive understanding of your access patterns, similar to the approach used in other NoSQL databases. Additionally, you can configure OpenSearch Ingestion to apply data transformations before delivery.
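To illustrate access-pattern-first design, here is a minimal sketch using the Python Cassandra driver; the keyspace, table, and query pattern are hypothetical, and the Keyspaces connection details (TLS on port 9142, credentials) are assumed to be configured per the documentation:

```python
# A minimal sketch of access-pattern-first modeling for Amazon Keyspaces
# (Cassandra-compatible). The keyspace, table, and "readings by device and
# day" pattern are hypothetical; TLS and auth setup required by Keyspaces
# is assumed to be configured per the service documentation.
from cassandra.cluster import Cluster

cluster = Cluster(["cassandra.us-east-1.amazonaws.com"], port=9142)  # plus TLS/auth setup
session = cluster.connect("iot")  # hypothetical keyspace

# The partition key matches the dominant query ("all readings for a device
# on a given day"); the clustering column orders rows within the partition.
session.execute("""
    CREATE TABLE IF NOT EXISTS readings_by_device_day (
        device_id text,
        day date,
        ts timestamp,
        value double,
        PRIMARY KEY ((device_id, day), ts)
    ) WITH CLUSTERING ORDER BY (ts DESC)
""")

# The query the table was designed for -- a single-partition read.
rows = session.execute(
    "SELECT ts, value FROM readings_by_device_day "
    "WHERE device_id = %s AND day = %s",
    ("sensor-42", "2024-01-15"),
)
```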
A source of unpredictable workloads is dbt Cloud, which SafetyCulture uses to manage data transformations in the form of models. Whenever models are created or modified, a dbt Cloud CI job is triggered to test the models by materializing them in Amazon Redshift.
In some cases, they work to deploy data science models into production with an eye towards optimization, scalability, and maintainability. Data architects and data modelers who specialize in areas such as schema design, identifying query access patterns, and building and maintaining data warehouses.