With this launch of JDBC connectivity, Amazon DataZone expands its support for data users, including analysts and scientists, allowing them to work in their preferred environments—whether it’s SQL Workbench, Domino, or Amazon-native solutions—while ensuring secure, governed access within Amazon DataZone. Choose Test connection.
GSK had been pursuing DataOps capabilities such as automation, containerization, automated testing and monitoring, and reusability for several years. DataOps provides the continuous delivery equivalent for machine learning and enables teams to manage the complexities around continuous training, A/B testing, and deploying without downtime.
For each service, you need to learn the supported authentication and authorization methods, data access APIs, and frameworks to onboard and test data sources. This approach simplifies your data journey and helps you meet your security requirements. Now, let's start running queries in your notebook. Choose Run all.
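As a rough illustration of what such a notebook query cell does once the connection is in place (the connection string, table, and column names below are hypothetical), a query can be run and pulled into a DataFrame like this:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical JDBC/ODBC-style endpoint and credentials; in practice these come
# from the connection details surfaced for your project environment.
engine = create_engine("postgresql+psycopg2://analyst:secret@warehouse.example.com:5432/sales")

# Run a query against the governed source and load the result into a DataFrame
df = pd.read_sql("SELECT region, SUM(revenue) AS revenue FROM orders GROUP BY region", engine)
print(df.head())
```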
Build data validation rules directly into ingestion layers so that incomplete or invalid data is stopped at the gate rather than detected after the damage is done. Use lineage tooling to trace data from source to report. Understanding how data transforms and where it breaks is crucial for auditability and root-cause resolution.
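A minimal sketch of that idea, assuming a simple dictionary-per-record ingestion layer (the required fields and rejection handling are hypothetical):

```python
from typing import Iterable, Iterator

REQUIRED_FIELDS = {"event_id", "timestamp", "user_id"}  # hypothetical schema

def validate_at_ingestion(records: Iterable[dict]) -> Iterator[dict]:
    """Reject malformed records before they reach downstream transforms."""
    for record in records:
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            # In practice the rejected record would go to a dead-letter queue
            # along with the reason, so it can be traced and fixed upstream.
            print(f"rejected record, missing fields: {sorted(missing)}")
            continue
        yield record

# Usage: only the complete record passes through
clean = list(validate_at_ingestion([
    {"event_id": "e1", "timestamp": "2024-01-01T00:00:00Z", "user_id": "u1"},
    {"event_id": "e2", "user_id": "u2"},  # missing timestamp, stopped at the gate
]))
```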
They give data scientists tools to instantiate development sandboxes on demand. They automate the data operations pipeline and create platforms used to test and monitor data from ingestion to published charts and graphs.
For example, data can be filtered so that the investigation can be focused more precisely. There are a number of data transformation modules that help in this area. That said, it's often better to clean the data further upstream, closer to the source, rather than at the end of a spoke.
Also known as data validation, integrity refers to the structural testing of data to ensure that the data complies with procedures. This means there are no unintended data errors and that the data corresponds to its appropriate designation. Here, it all comes down to the data transformation error rate.
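A toy illustration of that metric (the records and rules are invented): the error rate is simply the number of records failing integrity checks divided by the total processed.

```python
# Hypothetical post-transformation records; a record fails the integrity check
# if a required field is missing.
records = [
    {"order_id": "A1", "amount": 120.0},
    {"order_id": "A2", "amount": None},   # fails: missing amount
    {"order_id": None, "amount": 75.5},   # fails: missing key
]

failures = sum(1 for r in records if r["order_id"] is None or r["amount"] is None)
error_rate = failures / len(records)
print(f"data transformation error rate: {error_rate:.1%}")  # 66.7%
```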
Developers need to onboard new data sources, chain multiple data transformation steps together, and explore data as it travels through the flow. This allows developers to make changes to their processing logic on the fly while running some test data through their flow and validating that their changes work as intended.
Allows them to iteratively develop processing logic and test with as little overhead as possible. Plays nice with existing CI/CD processes to promote a data pipeline to production. Provides monitoring, alerting, and troubleshooting for production data pipelines.
The goal of DataOps Observability is to provide visibility into every journey that data takes from source to customer value, across every tool, environment, data store, data and analytics team, and customer, so that problems are detected, localized, and raised immediately. That data then fills several database tables.
We also share a Spark benchmark solution that suits all Amazon EMR deployment options, so you can replicate the process in your environment for your own performance test cases. The solution uses the TPC-DS dataset and unmodified data schema and table relationships, but derives queries from TPC-DS to support the SparkSQL test cases.
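As a hedged sketch of how such a test case might be driven (this is not the benchmark solution itself; the query file path is hypothetical), a TPC-DS-derived SparkSQL query can be timed with PySpark:

```python
import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tpcds-sparksql-benchmark").getOrCreate()

# Hypothetical path to a query derived from TPC-DS for the SparkSQL test cases
query = open("queries/q3.sql").read()

start = time.time()
spark.sql(query).collect()  # force full execution so the timing is meaningful
print(f"q3 elapsed: {time.time() - start:.1f}s")
```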
Our approach The migration initiative consisted of two main parts: building the new architecture and migrating data pipelines from the existing tool to the new architecture. Often, we would work on both in parallel, testing one component of the architecture while developing another at the same time.
This enabled new use cases with customers that were using a mix of Spark and Hive to perform data transformations. Second, instead of being tied to the embedded Airflow within CDE, we wanted any customer using Airflow (even outside of CDE) to be able to tap into the CDP platform, which is why we published our Cloudera provider package.
dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouse customers (such as Amazon Redshift customers) who are looking to keep their data transformation logic separate from storage and engine.
However, you might face significant challenges when planning for a large-scale data warehouse migration. This will enable right-sizing the Redshift data warehouse to meet workload demands cost-effectively. A loading team builds a producer-consumer architecture in Amazon Redshift to process concurrent near real-time publishing of data.
Kinesis Data Firehose is a fully managed service for delivering near-real-time streaming data to various destinations for storage and performing near-real-time analytics. You can perform analytics on VPC flow logs delivered from your VPC using the Kinesis Data Firehose integration with Datadog as a destination.
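For example, a producer can push a flow-log-like record into a delivery stream with boto3 (the stream name and record fields below are hypothetical); Firehose then handles buffering and delivery to the Datadog destination:

```python
import json
import boto3

firehose = boto3.client("firehose")

record = {"srcaddr": "10.0.0.5", "dstaddr": "10.0.1.9", "action": "ACCEPT"}  # hypothetical fields

firehose.put_record(
    DeliveryStreamName="vpc-flow-logs-to-datadog",  # hypothetical stream name
    Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
)
```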
To grow the power of data at scale for the long term, it’s highly recommended to design an end-to-end development lifecycle for your data integration pipelines. The following are common asks from our customers: Is it possible to develop and test AWS Glue data integration jobs on my local laptop?
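One way to make local development practical is to keep the job script as plain Glue/PySpark boilerplate, so the same script runs in a local Glue container image and in the service. A minimal sketch (the S3 paths and filter are hypothetical):

```python
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Hypothetical transformation: filter raw events and write them out as Parquet
df = spark.read.json("s3://my-raw-bucket/events/")
df.filter("status = 'ok'").write.mode("overwrite").parquet("s3://my-curated-bucket/events/")

job.commit()
```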
Data is decompressed and stored in a different S3 bucket (transformed data can be stored in the same S3 bucket where data was ingested, but for simplicity, we're using two separate S3 buckets). The transformed data is then made accessible to Snowflake for data analysis. Set the protocol to Email.
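A minimal sketch of that decompression step using boto3 and gzip (bucket names and keys are hypothetical):

```python
import gzip
import boto3

s3 = boto3.client("s3")

# Read the compressed object from the ingestion bucket (hypothetical names)
obj = s3.get_object(Bucket="ingest-bucket", Key="raw/data.json.gz")
decompressed = gzip.decompress(obj["Body"].read())

# Write the decompressed payload to the separate "transformed" bucket
s3.put_object(Bucket="transformed-bucket", Key="decompressed/data.json", Body=decompressed)
```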
Cloudera Data Warehouse). Efficient batch data processing. Complex data transformations. Triton Digital, for example, uses Rill to deploy self-serve reporting for hundreds of digital media publishers with little or no training. Apache Hive. Large-scale, high-throughput analytics. Joins and subqueries. Apache Druid.
Traditionally, such a legacy call center analytics platform would be built on a relational database that stores data from streaming sources. Data transformations through stored procedures and the use of materialized views to curate datasets and generate insights are a known pattern with relational databases.
DataBrew is a visual data preparation tool that enables you to clean and normalize data without writing any code. The over 200 transformations it provides are now available to be used in an AWS Glue Studio visual job. Now that you have addressed all data quality issues identified on the sample, publish the project as a recipe.
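Publishing can also be scripted once the project's recipe exists; a sketch with boto3, assuming the DataBrew recipe created by the project is named as below (the name is hypothetical):

```python
import boto3

databrew = boto3.client("databrew")

# Publish the working version of the recipe so jobs can reference it
response = databrew.publish_recipe(
    Name="marketing-data-cleanup-recipe",  # hypothetical recipe name
    Description="Recipe built interactively in the DataBrew project",
)
print(response)
```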
For example, they may give applicants access to an API and ask them to query data that satisfies some criteria, or they may share a large dataset and ask applicants to perform some sort of data transformation. Each submission is run through a series of tests to ensure that the desired output is produced.
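As a toy illustration of that kind of automated check (the transformation and expected output are invented), a test might assert on the submission's result like this:

```python
import pandas as pd

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Stand-in for a candidate's submission: keep active users, add a full_name column."""
    out = df[df["active"]].copy()
    out["full_name"] = out["first_name"] + " " + out["last_name"]
    return out

def test_transform_produces_expected_output():
    raw = pd.DataFrame({
        "first_name": ["Ada", "Alan"],
        "last_name": ["Lovelace", "Turing"],
        "active": [True, False],
    })
    result = transform(raw)
    assert list(result["full_name"]) == ["Ada Lovelace"]  # inactive users dropped
```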
Developers can use the support in Amazon Location Service for publishing device position updates to Amazon EventBridge to build a near-real-time data pipeline that stores locations of tracked assets in Amazon Simple Storage Service (Amazon S3). You can test this solution yourself using the AWS Samples GitHub repository.
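A hedged sketch of the final hop: a Lambda function behind the EventBridge rule writes each position update's detail to S3 (the bucket name is hypothetical, and the DeviceId field access is an assumption about the event shape):

```python
import json
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
BUCKET = "tracked-asset-positions"  # hypothetical bucket name

def handler(event, context):
    # EventBridge delivers the tracker update in event["detail"];
    # DeviceId as a detail field is an assumption for this sketch.
    detail = event.get("detail", {})
    device_id = detail.get("DeviceId", "unknown")
    ts = datetime.now(timezone.utc).strftime("%Y/%m/%d/%H%M%S%f")
    key = f"positions/{device_id}/{ts}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(detail).encode("utf-8"))
    return {"stored": key}
```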
Tricentis is the global leader in continuous testing for DevOps, cloud, and enterprise applications. Speed changes everything, and continuous testing across the entire CI/CD lifecycle is the key. Tricentis instills that confidence by providing software tools that enable Agile Continuous Testing (ACT) at scale.
For data pipeline orchestration, the Apache Airflow UI is a user-friendly tool that provides detailed views into your data pipeline. When it comes to pipeline health management, each service that your tasks are interacting with could be storing or publishing logs to different locations, such as an S3 bucket or Amazon CloudWatch logs.
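For context, a minimal Airflow DAG with two dependent tasks looks like this (the task logic is placeholder; each task's logs land wherever your logging backend, such as CloudWatch or S3, is configured):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pulling source data")

def load():
    print("loading into the warehouse")

with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task  # load runs only after extract succeeds
```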
Within a large enterprise, there is a huge amount of data accumulated over the years – many decisions have been made and different methods have been tested. Milena Yankova: What we did for the BBC in the previous Olympics was that we helped journalists publish their reports faster. I think artists can relax.
In this post, we discuss why AWS recommends moving from Kinesis Data Analytics for SQL Applications to Amazon Kinesis Data Analytics for Apache Flink to take advantage of Apache Flink’s advanced streaming capabilities. View the stream data. Transform and enrich the data. Manipulate the data with Python.
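As a rough taste of manipulating a stream with Python (this sketch uses PyFlink's local datagen connector instead of a Kinesis source, purely for illustration; the column names are invented):

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# A datagen source standing in for a real stream; in Kinesis Data Analytics for
# Apache Flink this would be a Kinesis connector table instead.
t_env.execute_sql("""
    CREATE TABLE readings (
        sensor_id INT,
        temperature DOUBLE
    ) WITH (
        'connector' = 'datagen',
        'rows-per-second' = '5'
    )
""")

# Transform and enrich the stream with SQL, then print results to stdout
t_env.sql_query(
    "SELECT sensor_id, temperature, temperature > 30.0 AS is_hot FROM readings"
).execute().print()
```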
Few actors in the modern data stack have inspired as much enthusiasm and fervent support as dbt. This data transformation tool enables data analysts and engineers to transform, test, and document data in the cloud data warehouse. But what does this mean from a practitioner's perspective?
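From that practitioner perspective, a small sketch of the transform-and-test loop, assuming dbt-core 1.5+ and an existing dbt project in the working directory:

```python
# Programmatic invocation is available in dbt-core >= 1.5
from dbt.cli.main import dbtRunner

dbt = dbtRunner()

# Equivalent to running `dbt run` and then `dbt test` from the project directory
for args in (["run"], ["test"]):
    result = dbt.invoke(args)
    if not result.success:
        raise RuntimeError(f"dbt {args[0]} failed")
```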
The following AWS services are used for data ingestion, processing, and load: Amazon AppFlow is a fully managed integration service that enables you to securely transfer data between SaaS applications like Salesforce, SAP, Marketo, Slack, and ServiceNow, and AWS services like Amazon S3 and Amazon Redshift , in just a few clicks.
This produces end-to-end lineage so business and technology users alike can understand the state of a data lake and/or lakehouse. They can better understand data transformations, checks, and normalization. They can better grasp the purpose and use of specific data (and improve the pipeline!). Transparency is key.
Real-time analytics and BI: Combine data from existing sources with new data to unlock new, faster insights without the cost and complexity of duplicating and moving data across different environments. How you can get started today: Test out watsonx.ai and watsonx.data for yourself with our watsonx trial experience.
For these workloads, data lake vendors usually recommend extracting data into flat files to be used solely for model training and testing purposes. This adds an additional ETL step, making the data even more stale. Data lakehouse was created to solve these problems. Data discoverability.
You simply configure your data sources to send information to OpenSearch Ingestion, which then automatically delivers the data to your specified destination. Additionally, you can configure OpenSearch Ingestion to apply data transformations before delivery. Choose the Test tab. For Method type, choose POST.
This field guide to data mapping will explore how data mapping connects volumes of data for enhanced decision-making. Why Data Mapping is Important: Data mapping is a critical element of any data management initiative, such as data integration, data migration, data transformation, data warehousing, or automation.
By providing a consistent and stable backend, Apache Iceberg ensures that data remains immutable and query performance is optimized, thus enabling businesses to trust and rely on their BI tools for critical insights. It provides a stable schema, supports complex data transformations, and ensures atomic operations.
Strategic Objective: Create a complete, user-friendly view of the data by preparing it for analysis. Requirements: Multi-Source Data Blending (data from multiple sources is compiled, and the output is a single view, metric, or visualization); Data Transformation and Enrichment (data can be enriched for analysis).
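A minimal sketch of multi-source blending and enrichment with pandas (the file names and columns are hypothetical):

```python
import pandas as pd

# Hypothetical sources: a CRM export and a web analytics extract
crm = pd.read_csv("crm_accounts.csv")          # columns: account_id, segment
web = pd.read_parquet("web_sessions.parquet")  # columns: account_id, sessions

# Blend into a single analysis-ready view, then enrich with a derived metric
blended = crm.merge(web, on="account_id", how="left")
blended["sessions"] = blended["sessions"].fillna(0)
blended["is_engaged"] = blended["sessions"] >= 5
```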
In our examples, we use Kinesis Data Generator, a sample application to generate and publish data streams to Firehose. You can also set up Firehose to use other data sources for your real-time streams. We set up Firehose to deliver the stream into Iceberg tables in the Data Catalog. Choose Send data.
We use the built-in features of Data Firehose, including AWS Lambda for necessary data transformation and Amazon Simple Notification Service (Amazon SNS) for near real-time alerts. APIs act as the entry point for applications to access data, business logic, or functionality from your backend services.
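For the Lambda transformation piece, Firehose invokes the function with batches of base64-encoded records and expects the same record IDs back with a result status; a minimal sketch (the latency_ms field and alert threshold are hypothetical):

```python
import base64
import json

def handler(event, context):
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        # Hypothetical enrichment: flag slow requests so SNS alerting can key off it
        payload["alert"] = payload.get("latency_ms", 0) > 500
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode((json.dumps(payload) + "\n").encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```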
Tableau's certifications, in particular, focus on performance-based testing rather than theory in an effort to verify a candidate's ability to apply the subject matter in a real work environment. The Tableau Certified Data Analyst title is active for two years from the date achieved. The certification does not expire.