Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a fully managed service that builds upon Apache Airflow, offering its benefits while eliminating the need for you to set up, operate, and maintain the underlying infrastructure, reducing operational overhead while increasing security and resilience.
A lot of the emphasis so far has been on the use of big data to better engage with external third parties, but big data can be equally valuable for managing internal hospital systems.
Since software engineers manage to build ordinary software without experiencing as much pain as their counterparts in the ML department, it raises the question: should we simply treat ML projects as ordinary software engineering projects, perhaps educating ML practitioners about existing best practices?
1) What Is Data Quality Management? 4) Data Quality Best Practices. 5) How Do You Measure Data Quality? 6) Data Quality Metrics Examples. 7) Data Quality Control: Use Case. 8) The Consequences Of Bad Data Quality. 9) 3 Sources Of Low-Quality Data. 10) Data Quality Solutions: Key Attributes.
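Two of the most common data quality metrics, completeness and uniqueness, can be sketched in a few lines. This is a minimal illustration; the record fields (`id`, `email`) are hypothetical examples, not from the article.

```python
# Minimal sketch of two common data quality metrics: completeness
# (share of non-null values in a field) and uniqueness (share of
# distinct values in a field). Records are plain dicts here.

def completeness(records, field):
    """Fraction of records where `field` is present and not None."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)

def uniqueness(records, field):
    """Fraction of non-null values of `field` that are distinct."""
    values = [r.get(field) for r in records if r.get(field) is not None]
    if not values:
        return 0.0
    return len(set(values)) / len(values)

rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "a@example.com"},
]
print(completeness(rows, "email"))  # 2 of 3 emails filled
print(uniqueness(rows, "id"))       # all ids distinct -> 1.0
```

In practice these ratios are tracked over time and compared against thresholds, which is where the "metrics" framing above comes in.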
This integration enables data teams to efficiently transform and manage data using Athena with dbt Cloud’s robust features, enhancing the overall data workflow experience. It lets you extract insights from your data without the complexity of managing infrastructure.
Within seconds of transactional data being written into Amazon Aurora (a fully managed modern relational database service offering performance and high availability at scale), the data is seamlessly made available in Amazon Redshift for analytics and machine learning.
Selecting the strategies and tools for validating data transformations and data conversions in your data pipelines. Introduction: Data transformations and data conversions are crucial to ensure that raw data is organized, processed, and ready for useful analysis.
Common challenges and practical mitigation strategies for reliable data transformations. Introduction: Data transformations are important processes in data engineering, enabling organizations to structure, enrich, and integrate data for analytics, reporting, and operational decision-making.
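One practical mitigation strategy for unreliable transformations is a reconciliation check: compare row counts and a control total before and after the transform. The sketch below is illustrative; the transform and the `amount` field are hypothetical.

```python
# Sketch of reconciliation checks for a data transformation:
# verify that the row count and a control total survive the transform.

def transform(records):
    """Example transform: normalize country codes to upper case."""
    return [{**r, "country": r["country"].upper()} for r in records]

def reconcile(before, after, amount_field="amount"):
    """Return a list of human-readable check failures (empty = pass)."""
    failures = []
    if len(before) != len(after):
        failures.append(f"row count changed: {len(before)} -> {len(after)}")
    total_before = sum(r[amount_field] for r in before)
    total_after = sum(r[amount_field] for r in after)
    if total_before != total_after:
        failures.append(f"control total changed: {total_before} -> {total_after}")
    return failures

source = [{"country": "us", "amount": 10}, {"country": "de", "amount": 5}]
result = transform(source)
print(reconcile(source, result))  # [] means both checks passed
```

Checks like these catch accidental row drops or duplicated joins, two of the most common transformation failures.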
Managing tests of complex data transformations when automated data testing tools lack important features? Introduction: Data transformations are at the core of modern business intelligence, blending and converting disparate datasets into coherent, reliable outputs.
What is data analytics? Data analytics is a discipline focused on extracting insights from data. It comprises the processes, tools and techniques of data analysis and management, including the collection, organization, and storage of data. What are the four types of data analytics?
The dashboard now in production uses Databricks’ Azure data lake to ingest, clean, store, and analyze the data, and Microsoft’s Power BI to generate graphical analytics that present critical operational data in a single view, such as the number of flights coming into domestic and international terminals and average security wait times.
With this launch of JDBC connectivity, Amazon DataZone expands its support for data users, including analysts and scientists, allowing them to work in their preferred environments—whether it’s SQL Workbench, Domino, or Amazon-native solutions—while ensuring secure, governed access within Amazon DataZone.
Amazon DataZone is a data management service that makes it faster and easier for customers to catalog, discover, share, and govern data stored across AWS, on premises, and from third-party sources. In the JDBC parameters dialog box, select Using IDC auth and copy the JDBC URL. The following screenshot shows the dialog box.
In early April 2021, DataKitchen sat down with Jonathan Hodges, VP Data Management & Analytics at Workiva; Chuck Smith, VP of R&D Data Strategy at GlaxoSmithKline (GSK); and Chris Bergh, CEO and Head Chef at DataKitchen, to find out about their enterprise DataOps transformation journey, including key successes and lessons learned.
As the world gradually becomes more dependent on data, the services, tools, and infrastructure are all the more important for businesses in every sector. Data management has become a fundamental business concern, especially for businesses going through a digital transformation. What is data management?
We also examine how centralized, hybrid, and decentralized data architectures support scalable, trustworthy ecosystems. As data-centric AI, automated metadata management, and privacy-aware data sharing mature, the opportunity to embed data quality into the enterprise's core has never been more significant.
This means you can refine your ETL jobs through natural follow-up questions, starting with a basic data pipeline and progressively adding transformations, filters, and business logic through conversation. The DataFrame code generation now extends beyond AWS Glue DynamicFrame to support a broader range of data processing scenarios.
How dbt Core helps data teams test, validate, and monitor complex data transformations and conversions. Introduction: dbt Core, an open-source framework for developing, testing, and documenting SQL-based data transformations, has become a must-have tool for modern data teams as the complexity of data pipelines grows.
Benefits of big data in logistics: before we look at our selection of practical examples and applications, let’s look at the benefits of big data in logistics, starting with the (not so) small matter of costs.
For each service, you need to learn the supported authorization and authentication methods, data access APIs, and framework to onboard and test data sources. This approach simplifies your data journey and helps you meet your security requirements.
Organizations with legacy, on-premises, near-real-time analytics solutions typically rely on self-managed relational databases as their data store for analytics workloads. Near-real-time streaming analytics captures the value of operational data and metrics to provide new insights to create business opportunities.
Azure Databricks Delta Live Tables: these provide a more straightforward way to build and manage data pipelines for the latest, high-quality data in Delta Lake. The platform provides data prep, management, and enterprise data warehousing tools. It has a data pipeline tool as well. It does the job.
The Value Pipeline represents data operations where data progresses on its journey to charts, graphs, and other analytics that create value for the organization. The Innovation Pipeline includes analytics development, QA, deployment, and the rest of the change management processes for the Value Pipeline.
dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouse customers (such as those on Amazon Redshift) who are looking to keep their data transform logic separate from storage and engine.
Be sure test cases represent the diversity of app users. Accurately prepared data is the base of AI. As an AI product manager, here are some important data-related questions you should ask yourself: What is the problem you’re trying to solve? Can a chatbot help improve relations? What are the business consequences?
Upload your data, click through a workflow, walk away. If you’re a professional data scientist, you already have the knowledge and skills to test these models. Related to the previous point, a company could go from “raw data” to “it’s serving predictions on live data” in a single work day.
What is the difference between business analytics and data analytics? Business analytics is a subset of data analytics. Data analytics is used across disciplines to find trends and solve problems using data mining, data cleansing, data transformation, data modeling, and more.
A modern data platform entails maintaining data across multiple layers, targeting diverse platform capabilities like high performance, ease of development, cost-effectiveness, and DataOps features such as CI/CD, lineage, and unit testing. It does this by helping teams handle the T in ETL (extract, transform, and load) processes.
This initiative alone has generated an explosion in the quantity and complexity of data the company collects, stores, and analyzes for insights. Moving more quickly from ideas to insights has aided new drug development and the clinical trials used for testing new products, accelerating drug discovery and clinical trials.
Data science certifications give you an opportunity to not only develop skills that are hard to find in your desired industry, but also validate your data science know-how so recruiters and hiring managers know what they get if they hire you.
Uncomfortable truth incoming: most people in your organization don’t think about the quality of their data from intake to production of insights. However, as a data team member, you know how important data integrity (and a whole host of other aspects of data management) is. Data integrity: a process and a state.
Developers need to onboard new data sources, chain multiple data transformation steps together, and explore data as it travels through the flow. What if developers didn’t have to manage their own Apache NiFi installation, without shifting that burden onto platform administrators?
The goal of DataOps Observability is to provide visibility of every journey that data takes from source to customer value across every tool, environment, data store, data and analytic team, and customer so that problems are detected, localized and raised immediately. A data journey spans and tracks multiple pipelines.
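The core mechanic of a data journey is that every step along the path emits a status event, so a failure can be localized to the exact step where it occurred. A minimal sketch of that idea, with hypothetical step names and a toy payload:

```python
# Sketch of a "data journey": run pipeline steps in order, record a
# (step_name, status) event for each, and stop at the first failure
# so the problem is localized immediately.

def ingest(data):
    """Toy ingest step: append a newly arrived record."""
    return data + [4]

def validate(data):
    """Toy validation step: reject non-positive values."""
    if any(x <= 0 for x in data):
        raise ValueError("non-positive value found")
    return data

def publish(data):
    """Toy publish step: pass the data through unchanged."""
    return data

def run_journey(steps, payload):
    """Run steps in order, recording (step_name, status) events."""
    events = []
    for name, fn in steps:
        try:
            payload = fn(payload)
            events.append((name, "ok"))
        except Exception as exc:
            events.append((name, f"failed: {exc}"))
            break  # stop at the failing step to localize the problem
    return events, payload

steps = [("ingest", ingest), ("validate", validate), ("publish", publish)]
events, result = run_journey(steps, [1, 2, 3])
print(events)  # every step reports "ok"
```

A real observability platform attaches these events to tools, environments, and teams across many pipelines; the sketch only shows the per-step event trail.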
OpenSearch is an open source, distributed search engine suitable for a wide array of use cases such as ecommerce search, enterprise search (content management search, document search, knowledge management search, and so on), site search, application search, and semantic search. You use the schema API to manage the schema.
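For ecommerce or site search, the typical entry point is a full-text `match` query in the OpenSearch query DSL. The request body below is a minimal sketch; the index and field names (`products`, `title`) are illustrative assumptions, and you would POST this JSON to your cluster's `_search` endpoint.

```python
import json

# Minimal OpenSearch full-text search request body using the query
# DSL's `match` clause. Sent as e.g. POST /products/_search.
query = {
    "query": {
        "match": {
            "title": "wireless headphones",
        }
    },
    "size": 10,  # cap the number of hits returned
}

print(json.dumps(query, indent=2))
```

The same body shape extends naturally to `bool` queries, filters, and aggregations as the search use case grows.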
Airflow has been adopted by many Cloudera Data Platform (CDP) customers in the public cloud as the next-generation orchestration service to set up and operationalize complex data pipelines. This intermediate definition can easily be integrated with source code management, such as Git, as needed.
In collaboration with AWS, BMS identified a business need to migrate and modernize their custom extract, transform, and load (ETL) platform to a native AWS solution to reduce complexities, resources, and investment to upgrade when new Spark, Python, or AWS Glue versions are released.
We’re excited to announce the general availability of the open source adapters for dbt for all the engines in CDP — Apache Hive , Apache Impala , and Apache Spark, with added support for Apache Livy and Cloudera Data Engineering. This variety can result in a lack of standardization, leading to data duplication and inconsistency.
Overview of the BMW Cloud Data Hub: At the BMW Group, Cloud Data Hub (CDH) is the central platform for managing company-wide data and data solutions. Each CDH dataset has three processing layers: source (raw data), prepared (transformed data in Parquet), and semantic (combined datasets).
Large-scale data warehouse migration to the cloud is a complex and challenging endeavor that many organizations undertake to modernize their data infrastructure, enhance datamanagement capabilities, and unlock new business opportunities.
This trend is no exception for Dafiti , an ecommerce company that recognizes the importance of using data to drive strategic decision-making processes. Amazon Redshift is widely used for Dafiti’s data analytics, supporting approximately 100,000 daily queries from over 400 users across three countries.
In working with thousands of customers deploying Spark applications, we saw significant challenges with managing Spark as well as automating, delivering, and optimizing secure data pipelines. We wanted to develop a service tailored to the data engineering practitioner built on top of a true enterprise hybrid data service platform.
The rapid adoption of serverless data lake architectures, with ever-growing datasets that need to be ingested from a variety of sources, followed by complex data transformation and machine learning (ML) pipelines, can present a challenge.
Granting Anthropic’s Claude permissions on Amazon Bedrock: have an AWS account and sign in using the AWS Management Console, then choose Manage model access. Prompt with no metadata: for the first test, we used a basic prompt containing just the SQL-generating instructions and no table metadata.
Machine learning has grown from a collaborative workbench to an end-to-end production ML platform that enables data scientists to deploy a model or an application to production in minutes, with production-level monitoring, governance, and performance tracking. Enrich: data engineering (Apache Spark and Apache Hive).