Why: Data Makes It Different. Unlike traditional software, a defining feature of ML-powered applications is that they are directly exposed to a large amount of messy, real-world data that is too complex to be understood and modeled by hand. However, the concept is quite abstract. Can’t we just fold it into existing DevOps best practices?
Creating a test variable.
response = client.create(key="test", value="Test value", description="Test description")
print(response)
print("\nListing all variables.")
variables = client.list()
print(variables)
print("\nGetting the test variable.")
The data in Amazon Redshift is transactionally consistent, and updates are propagated automatically and continuously. Alongside its price-performance, Amazon Redshift offers capabilities such as a serverless architecture, machine learning integration within your data warehouse, and secure data sharing across the organization.
The need for streamlined data transformations: As organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown. This approach helps manage storage costs while maintaining the flexibility to analyze historical trends when needed.
Complex Data Transformations: Test Planning Best Practices. Ensuring data accuracy with structured testing and best practices. Data transformations and conversions are crucial for data pipelines, enabling organizations to process, integrate, and refine raw data into meaningful insights.
In this post, we’ll see the fundamental procedures, tools, and techniques that data engineers, data scientists, and QA/testing teams use to ensure high-quality data as soon as it’s deployed. First, we look at how unit and integration tests uncover transformation errors at an early stage (e.g., PyTest, JUnit, NUnit).
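As a sketch of that first step, here is a minimal PyTest-style unit test; normalize_prices and its columns are hypothetical stand-ins for a real pipeline transformation:

# test_transformations.py -- a minimal PyTest sketch; normalize_prices is a
# hypothetical transformation used only for illustration.
import pandas as pd

def normalize_prices(df: pd.DataFrame) -> pd.DataFrame:
    """Convert a cents column to dollars and drop negative rows."""
    out = df[df["price_cents"] >= 0].copy()
    out["price_usd"] = out["price_cents"] / 100.0
    return out

def test_converts_cents_to_dollars():
    df = pd.DataFrame({"price_cents": [1000, 250]})
    assert normalize_prices(df)["price_usd"].tolist() == [10.0, 2.5]

def test_drops_negative_prices():
    df = pd.DataFrame({"price_cents": [-5, 100]})
    assert len(normalize_prices(df)) == 1

A failed assertion here costs seconds to diagnose; the same defect found in production costs an incident.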
Managing tests of complex data transformations when automated data testing tools lack important features? Data transformations are at the core of modern business intelligence, blending and converting disparate datasets into coherent, reliable outputs.
Common challenges and practical mitigation strategies for reliable data transformations. Data transformations are important processes in data engineering, enabling organizations to structure, enrich, and integrate data for analytics, reporting, and operational decision-making.
At Atlanta’s Hartsfield-Jackson International Airport, an IT pilot has led to a wholesale data journey destined to transform operations at the world’s busiest airport, fueled by machine learning and generative AI. Applying AI to elevate ROI: Pruitt and Databricks recently finished a pilot test with Microsoft called Smart Flow.
When hospitals adopt modern big data technologies, the stakes are health care and saving lives. These systems rely heavily on big data to improve efficiency and cost-effectiveness, and outsourcing IT services is a simple and convenient way to get a well-built big data solution.
How dbt Core helps data teams test, validate, and monitor complex data transformations and conversions. dbt Core, an open-source framework for developing, testing, and documenting SQL-based data transformations, has become a must-have tool for modern data teams as the complexity of data pipelines grows.
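For a concrete flavor: dbt-core 1.5+ exposes a programmatic runner, so tests can be triggered from Python as well as from the CLI. This sketch assumes an existing dbt project in the working directory; the model name is hypothetical:

# Run dbt tests programmatically; requires dbt-core 1.5+ (which ships dbtRunner).
from dbt.cli.main import dbtRunner

runner = dbtRunner()
# "orders_enriched" is a hypothetical model; --select scopes the run to its tests.
result = runner.invoke(["test", "--select", "orders_enriched"])
print("tests passed" if result.success else "tests failed")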
How GX helps data teams validate, test, and monitor complex data pipelines. Data flows from diverse sources, and transformations are becoming increasingly complex. Great Expectations can enable a wide range of data transformation and conversion operations.
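A minimal sketch of the idea, using the classic pandas-backed Great Expectations API (v0.x); newer GX releases expose a different fluent API, so treat the exact calls as illustrative:

import great_expectations as ge
import pandas as pd

# Wrap a transformed output frame so expectation methods become available.
df = ge.from_pandas(pd.DataFrame({"order_id": [1, 2, 3], "amount": [10.0, 99.5, 42.0]}))

not_null = df.expect_column_values_to_not_be_null("order_id")
in_range = df.expect_column_values_to_be_between("amount", min_value=0, max_value=1000)
print(not_null.success, in_range.success)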
Ever-increasing demands for transformation. Growing cybersecurity and data privacy threats. According to Evanta’s 2022 CIO Leadership Perspectives study, CIOs’ second top priority within the IT function is data and analytics, with CIOs seeing advancing organizational use of data as key to reaching enterprise objectives.
For each service, you need to learn the supported authorization and authentication methods, data access APIs, and frameworks to onboard and test data sources. This approach simplifies your data journey and helps you meet your security requirements. Choose Add data. For Database, enter your database name.
A common task for a data scientist is to build a predictive model. You know the drill: pull some data, carve it up into features, feed it into one of scikit-learn’s various algorithms. Collectively, your attempts teach you about your data and its relation to the problem you’re trying to solve.
It’s Essential: Verifying Data Transformations (Part 4). Uncovering the leading problems in data transformation workflows and practical ways to detect and prevent them. In Parts 1–3 of this series, categories of data transformations were identified as among the top causes of data quality defects in data pipeline workflows.
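A simple reconciliation check illustrates the kind of defect these posts target: compare row counts and a column checksum between the source extract and the transformed target. The file paths and column names below are hypothetical:

import pandas as pd

source = pd.read_parquet("source_orders.parquet")  # hypothetical extract
target = pd.read_parquet("target_orders.parquet")  # hypothetical transformed output

checks = {
    "row_count_matches": len(source) == len(target),
    "amount_sum_matches": abs(source["amount"].sum() - target["amount"].sum()) < 1e-6,
    "no_duplicate_keys": target["order_id"].is_unique,
}
failed = [name for name, ok in checks.items() if not ok]
if failed:
    raise AssertionError(f"Reconciliation failed: {failed}")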
AI is transforming how senior data engineers and data scientists validate data transformations and conversions. Artificial intelligence-based verification approaches aid in the detection of anomalies, the enforcement of data integrity, and the optimization of pipelines for improved efficiency.
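At its simplest, anomaly detection on pipeline metrics can be a statistical outlier check; the counts and the 3-sigma threshold below are illustrative assumptions:

import statistics

daily_row_counts = [10_120, 10_340, 9_980, 10_205, 10_410, 3_150]  # last load looks suspect

baseline = daily_row_counts[:-1]
mean = statistics.mean(baseline)
stdev = statistics.stdev(baseline)
z = (daily_row_counts[-1] - mean) / stdev
if abs(z) > 3:
    print(f"Anomaly: latest load is {z:.1f} standard deviations from the baseline")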
Table of Contents: 1) Benefits of Big Data in Logistics; 2) 10 Big Data in Logistics Use Cases. Big data is revolutionizing many fields of business, and logistics analytics is no exception. The complex and ever-evolving nature of logistics makes it an essential use case for big data applications. Did you know?
Your generated jobs can use a variety of data transformations, including filters, projections, unions, joins, and aggregations, giving you the flexibility to handle complex data processing requirements. In this post, we discuss how Amazon Q data integration transforms ETL workflow development.
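Expressed in plain PySpark (the engine AWS Glue runs on), those transformation types look roughly like this; the tables and columns are hypothetical:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()
orders = spark.createDataFrame(
    [(1, "US", 120.0), (2, "DE", 80.0), (3, "US", 45.0)],
    ["order_id", "country", "amount"],
)
customers = spark.createDataFrame([(1, "a@x.com"), (2, "b@y.com")], ["order_id", "email"])

result = (
    orders
    .filter(F.col("amount") > 50)                    # filter
    .select("order_id", "country", "amount")         # projection
    .join(customers, "order_id", "left")             # join
    .groupBy("country")
    .agg(F.sum("amount").alias("revenue"))           # aggregation
)
result.show()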
What Is Data Quality Management (DQM)? Data quality management is a set of practices that aim at maintaining a high quality of information. It goes all the way from the acquisition of data and the implementation of advanced data processes, to an effective distribution of data.
Data operations (or data production) is a series of pipeline procedures that take raw data through a series of processing and transformation steps and output finished products in the form of dashboards, predictions, data warehouses, or whatever the business requires. Their product is the data.
Observability is a methodology for providing visibility into every journey that data takes, from source to customer value, across every tool, environment, data store, team, and customer, so that problems are detected and addressed immediately.
While working in Azure with our customers, we have noticed several standard Azure tools people use to develop data pipelines and ETL or ELT processes. We counted ten ‘standard’ ways to transform and set up batch data pipelines in Microsoft Azure. You can use it for big data analytics and machine learning workloads.
Extrinsic Control Deficit: Many of these changes stem from tools and processes beyond the immediate control of the data team. Unregulated ETL/ELT Processes: The absence of stringent data quality tests in ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform) processes further exacerbates the problem.
They may also learn from evidence, but the data and the modelling fundamentally come from humans in some way. Data science is the field of study that combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data.
Data-driven companies sense change through data analytics. Companies turn to their data organization to provide the analytics that stimulates creative problem-solving. Adapt or face decline. – Leon C.
Building a data platform involves various approaches, each with its unique blend of complexities and solutions. In this post, we delve into a case study for a retail use case, exploring how the Data Build Tool (dbt) was used effectively within an AWS environment to build a high-performing, efficient, and modern data platform.
We live in a world of data: There’s more of it than ever before, in a ceaselessly expanding array of forms and locations. Dealing with Data is your window into the ways data teams are tackling the challenges of this new world to help their companies and their customers thrive. What is data integrity?
What is the difference between business analytics and data analytics? Business analytics is a subset of data analytics. Data analytics is used across disciplines to find trends and solve problems using data mining, data cleansing, data transformation, data modeling, and more.
The main driving factors include lower total cost of ownership, scalability, stability, improved ingestion connectors (such as Data Prepper , Fluent Bit, and OpenSearch Ingestion), elimination of external cluster managers like Zookeeper, enhanced reporting, and rich visualizations with OpenSearch Dashboards.
What is data analytics? Data analytics is a discipline focused on extracting insights from data. It comprises the processes, tools and techniques of data analysis and management, including the collection, organization, and storage of data. What are the four types of data analytics?
Observability is a methodology for providing visibility into every journey that data takes, from source to customer value, across every tool, environment, data store, team, and customer, so that problems are detected and addressed immediately. Data journey observability is the first step in implementing DataOps.
Many data pipeline failures and quality issues that are detected by data observability tools in production could have been prevented earlier in the pipeline lifecycle with better pre-production testing strategies. Testing early helps identify transformation errors and data quality issues before deployment, minimizing risk.
Cloudera has been providing enterprise support for Apache NiFi since 2015, helping hundreds of organizations take control of their data movement pipelines on premises and in the public cloud. Developers need to onboard new data sources, chain multiple data transformation steps together, and explore data as it travels through the flow.
In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service that non-technical business users could use to visually compose data transformation workflows and run them seamlessly on the AWS Glue Apache Spark-based serverless data integration engine.
An expansive view of what it means to use data online, both from the type-of-data perspective and the desired-impact perspective. What is the first thing you want when you think about web analytics? Of course: tools. What to do, where to start, what's cool: recommending tools for the complete web analytics 2.0 spectrum.
We’re excited to announce the general availability of the open source adapters for dbt for all the engines in CDP: Apache Hive, Apache Impala, and Apache Spark, with added support for Apache Livy and Cloudera Data Engineering. The Open Data Lakehouse: Cloudera builds dbt adapters for all engines in the open data lakehouse.
Data scientist is one of the hottest jobs in IT. Companies are increasingly eager to hire data professionals who can make sense of the wide array of data the business collects. Candidates for the exam are tested on ML, AI solutions, NLP, computer vision, and predictive analytics.
We also share a Spark benchmark solution that suits all Amazon EMR deployment options, so you can replicate the process in your environment for your own performance test cases. The solution uses the TPC-DS dataset and unmodified data schema and table relationships, but derives queries from TPC-DS to support the SparkSQL test cases.
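The same idea can be reproduced with a bare-bones timing harness; the stand-in table and query below are placeholders, not the TPC-DS workload itself:

import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("benchmark-sketch").getOrCreate()
spark.range(1_000_000).createOrReplaceTempView("store_sales")  # stand-in table

start = time.perf_counter()
spark.sql("SELECT COUNT(*), SUM(id) FROM store_sales").collect()  # force execution
print(f"query completed in {time.perf_counter() - start:.2f}s")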
Amazon Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing ETL (extract, transform, and load), business intelligence (BI), and reporting tools. All columns should be masked for them.
To grow the power of data at scale for the long term, it’s highly recommended to design an end-to-end development lifecycle for your data integration pipelines. The following are common asks from our customers: Is it possible to develop and test AWS Glue data integration jobs on my local laptop?
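One common pattern that makes local development workable is factoring the transformation logic into a plain PySpark function so it can be unit-tested without the Glue runtime; transform() and its columns are illustrative assumptions:

from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F

def transform(df: DataFrame) -> DataFrame:
    """Pure transformation: testable with a local SparkSession."""
    return df.filter(F.col("status") == "active").withColumn(
        "amount_usd", F.col("amount_cents") / 100
    )

if __name__ == "__main__":
    spark = SparkSession.builder.master("local[*]").getOrCreate()
    sample = spark.createDataFrame(
        [("active", 1250), ("inactive", 300)], ["status", "amount_cents"]
    )
    transform(sample).show()  # in the real Glue job, input comes from the Data Catalog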
Airflow has been adopted by many Cloudera Data Platform (CDP) customers in the public cloud as the next-generation orchestration service to set up and operationalize complex data pipelines. Making the most commonly used ones as readily available as possible is critical to reducing development friction.
Writing SQL queries requires not just remembering the SQL syntax rules, but also knowledge of the tables’ metadata: data about table schemas, relationships among the tables, and possible column values. As a result, the model used the correct column names and data types and restricted the DATE casting to a literal string value.
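A sketch of that grounding step: embed the schema in the prompt so the model sees real column names and types. The schema, question, and client call mentioned in the comment are hypothetical:

SCHEMA = """
Table: orders
  order_id   INT          primary key
  customer   VARCHAR
  order_date DATE
  amount     DECIMAL(10,2)
"""

question = "Total order amount per customer in January 2024"
prompt = (
    "Given this schema:\n"
    f"{SCHEMA}\n"
    "Write a SQL query answering the question. Cast string literals to DATE "
    "explicitly when comparing against order_date.\n"
    f"Question: {question}"
)
print(prompt)  # pass this to your model client, e.g. client.generate(prompt)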