The need for streamlined data transformations

As organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown. This approach helps manage storage costs while maintaining the flexibility to analyze historical trends when needed.
New advancements in GenAI technology are set to create more transformative opportunities for tech-savvy enterprises and organisations. These developments come as data shows that while the GenAI boom is real and optimism is high, not every organisation is generating tangible value so far.
Table of Contents: 1) Benefits Of Big Data In Logistics 2) 10 Big Data In Logistics Use Cases

Big data is revolutionizing many fields of business, and logistics analytics is no exception. The complex and ever-evolving nature of logistics makes it an essential use case for big data applications.
Expense optimization and clearly defined workload selection criteria will determine which workloads go to the public cloud and which to the private cloud, he says. Secure storage, together with data transformation, monitoring, auditing, and a compliance layer, increases the complexity of the system.
Complex Data Transformations: Test Planning Best Practices. Ensuring data accuracy with structured testing and best practices.

Introduction

Data transformations and conversions are crucial for data pipelines, enabling organizations to process, integrate, and refine raw data into meaningful insights.
Common challenges and practical mitigation strategies for reliable data transformations.

Introduction

Data transformations are important processes in data engineering, enabling organizations to structure, enrich, and integrate data for analytics, reporting, and operational decision-making.
For container terminal operators, data-driven decision-making and efficient data sharing are vital to optimizing operations and boosting supply chain efficiency. With a unified catalog, enhanced analytics capabilities, and efficient data transformation processes, we're laying the groundwork for future growth.
AI is transforming how senior data engineers and data scientists validate data transformations and conversions. Artificial intelligence-based verification approaches aid in the detection of anomalies, the enforcement of data integrity, and the optimization of pipelines for improved efficiency.
With these five layers, we can present a highly productive, data-centric software interface that enables iterative development of large-scale data-intensive applications. However, none of these layers help with modeling and optimization. Model Operations. Model Development.
Selecting the strategies and tools for validating data transformations and data conversions in your data pipelines.

Introduction

Data transformations and data conversions are crucial to ensure that raw data is organized, processed, and ready for useful analysis.
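A common validation pattern is to assert structural and semantic invariants after each transformation step. The following is a minimal sketch using pandas; the column names and data are made up for illustration, not taken from any specific pipeline.

```python
import pandas as pd

# Hypothetical raw extract: amounts arrive as strings
raw = pd.DataFrame({"id": [1, 2, 3], "amount": ["10", "20", "30"]})

# Transformation under test: convert amount to integers
transformed = raw.assign(amount=raw["amount"].astype("int64"))

# Validation checks commonly applied to a transformation step:
assert len(transformed) == len(raw)        # no rows dropped or duplicated
assert transformed["amount"].notna().all()  # no nulls introduced
assert (transformed["amount"] >= 0).all()   # domain constraint holds
# Totals reconcile between source and target
assert transformed["amount"].sum() == sum(int(x) for x in raw["amount"])
```

In a real pipeline the same assertions would run against source and target tables rather than in-memory frames, but the reconciliation idea (counts, null checks, domain constraints, checksums) is the same.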
Maintaining reusable database sessions to help optimize the use of database connections, preventing the API server from exhausting the available connections and improving overall system scalability.
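The session-reuse idea can be sketched with a fixed-size connection pool. This is a minimal, stdlib-only illustration using sqlite3 as a stand-in for a real database driver; the class and method names are made up, not from any specific framework.

```python
import queue
import sqlite3

class ConnectionPool:
    """Keep a fixed set of open connections and hand them out on demand,
    so an API server never exceeds the database's connection limit."""

    def __init__(self, factory, size):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())  # open all connections up front

    def acquire(self, timeout=5.0):
        # Blocks (up to `timeout`) instead of opening a new connection,
        # which caps the total number of connections at `size`.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)  # return the session for reuse

pool = ConnectionPool(lambda: sqlite3.connect(":memory:"), size=3)
conn = pool.acquire()
result = conn.execute("SELECT 1").fetchone()[0]
pool.release(conn)
```

Production code would normally use the pooling built into the driver or ORM (for example SQLAlchemy's engine-level pool) rather than hand-rolling one, but the mechanism is the same: a bounded set of reusable sessions.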
Data lakes provide a centralized repository for data from various sources, enabling organizations to unlock valuable insights and drive data-driven decision-making. However, as data volumes continue to grow, optimizing data layout and organization becomes crucial for efficient querying and analysis.
When we announced the GA of Cloudera Data Engineering back in September of last year, a key vision we had was to simplify the automation of data transformation pipelines at scale. Typically, users need to ingest data, transform it into an optimal format with quality checks, and optimize querying of the data by visual analytics tools.
Together with price-performance, Amazon Redshift offers capabilities such as serverless architecture, machine learning integration within your data warehouse and secure data sharing across the organization. dbt Cloud is a hosted service that helps data teams productionize dbt deployments.
There are countless examples of big data transforming many different industries. There is no disputing the fact that the collection and analysis of massive amounts of unstructured data has been a huge breakthrough. How is Data Virtualization performance optimized? By improving operational processes.
Let’s go through the ten Azure data pipeline tools.

Azure Data Factory: This cloud-based data integration service allows you to create data-driven workflows for orchestrating and automating data movement and transformation. You can use it for big data analytics and machine learning workloads.
BMW Group uses 4,500 AWS Cloud accounts across the entire organization but is faced with the challenge of reducing unnecessary costs, optimizing spend, and having a central place to monitor costs. The ultimate goal is to raise awareness of cloud efficiency and optimize cloud utilization in a cost-effective and sustainable manner.
However, you might face significant challenges when planning for a large-scale data warehouse migration. This includes the ETL processes that capture source data, the functional refinement and creation of data products, the aggregation for business metrics, and the consumption from analytics, business intelligence (BI), and ML.
Adding data transformation details to metadata can be challenging because this information is dispersed across data processing pipelines, making it difficult to extract and incorporate into table-level metadata.

Enriching the prompt

You can enhance the prompts with query optimization rules like partition pruning.
Accurately predicting demand for products allows businesses to optimize inventory levels, minimize stockouts, and reduce holding costs. Solution overview In today’s highly competitive business landscape, it’s essential for retailers to optimize their inventory management processes to maximize profitability and improve customer satisfaction.
Auto-copy enhances the COPY command by adding jobs for automatic ingestion of data. If storing operational data in a data warehouse is a requirement, synchronization of tables between operational data stores and Amazon Redshift tables is supported.
The main driving factors include lower total cost of ownership, scalability, stability, improved ingestion connectors (such as Data Prepper , Fluent Bit, and OpenSearch Ingestion), elimination of external cluster managers like Zookeeper, enhanced reporting, and rich visualizations with OpenSearch Dashboards.
This means there are no unintended data errors, and the data corresponds to its appropriate designation (e.g., date, month, and year). Here, it all comes down to the data transformation error rate. Timeliness, in turn, measures the time between when data is expected and the moment when it is readily available for use.
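The two metrics mentioned here, a transformation error rate and a data timeliness lag, are simple to compute. The numbers and variable names below are illustrative assumptions, not from any specific system.

```python
from datetime import datetime

# Transformation error rate: failed records over records processed
records_processed = 10_000
records_failed = 25
error_rate = records_failed / records_processed   # 0.0025, i.e. 0.25%

# Timeliness: gap between when data was expected and when it became available
expected_at = datetime(2024, 1, 1, 6, 0)    # SLA: ready by 06:00
available_at = datetime(2024, 1, 1, 6, 12)  # actually landed at 06:12
delay_minutes = (available_at - expected_at).total_seconds() / 60  # 12.0
```

In practice these would be emitted per pipeline run and tracked over time, with alerts when either metric breaches its threshold.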
By providing real-time visibility into the performance and behavior of data-related systems, DataOps observability enables organizations to identify and address issues before they become critical, and to optimize their data-related workflows for maximum efficiency and effectiveness.
If you want deeper control over your infrastructure for cost and latency optimization, you can choose OpenSearch Service’s managed clusters deployment option. With managed clusters, you get granular control over the instances you would like to use, indexing and data-sharding strategy, and more.
What is the difference between business analytics and data analytics? Business analytics is a subset of data analytics. Data analytics is used across disciplines to find trends and solve problems using data mining, data cleansing, data transformation, data modeling, and more.
The certification consists of several exams that cover topics such as machine learning, natural language processing, computer vision, and model forecasting and optimization. You should also have experience with pattern detection, experimentation in business optimization techniques, and time-series forecasting.
Amazon EMR on EKS provides a deployment option for Amazon EMR that allows organizations to run open-source big data frameworks on Amazon Elastic Kubernetes Service (Amazon EKS). This performance-optimized runtime offered by Amazon EMR makes your Spark jobs run fast and cost-effectively.
After the read query validation stage was complete and we were satisfied with the performance, we reconnected our orchestrator so that the data transformation queries could be run in the new cluster. At this point, only one-time queries and those made by Amazon QuickSight reached the new cluster.
Data Warehouse: in addition to a number of performance optimizations, DW has added a number of new features for better scalability, monitoring, and reliability to enable self-service access with security and performance.
A vast majority of clicks coming from search engines continue to be organic clicks (which is why I love and adore search engine optimization).

Google Website Optimizer

Here's a free guide (26 pages) to using the website optimizer optimally: PDF Download: The Techie Guide to Google Website Optimizer.
In this post, we provide a detailed overview of streaming messages with Amazon Managed Streaming for Apache Kafka (Amazon MSK) and Amazon ElastiCache for Redis , covering technical aspects and design considerations that are essential for achieving optimal results. We also discuss the key features, considerations, and design of the solution.
But to augment its various businesses with ML and AI, Iyengar’s team first had to break down data silos within the organization and transform the company’s data operations. “Digitizing was our first stake at the table in our data journey,” he says.
With exponentially growing data sources and data lakes, customers want to run more data integration workloads, including their most demanding transforms, aggregations, joins, and queries. For workloads such as data transforms, joins, and queries, you can use G.1X (1 DPU) and G.2X workers.
It supports modern analytical data lake operations such as create table as select (CTAS), upsert and merge, and time travel queries. Athena also supports the ability to create views and perform VACUUM (snapshot expiration) on Apache Iceberg tables to optimize storage and performance. You can perform bulk load using a CTAS statement.
Pattern 1: Data transformation, load, and unload

Several of our data pipelines included significant data transformation steps, which were primarily performed through SQL statements executed by Amazon Redshift. The following Diagram 2 shows this workflow.
Since the release of Cloudera Data Engineering (CDE) more than a year ago, our number one goal has been operationalizing Spark pipelines at scale with first-class tooling designed to streamline automation and observability. This enabled new use cases with customers that were using a mix of Spark and Hive to perform data transformations.
With Amazon AppFlow, you can run data flows at nearly any scale and at the frequency you choose: on a schedule, in response to a business event, or on demand. You can configure data transformation capabilities such as filtering and validation to generate rich, ready-to-use data as part of the flow itself, without additional steps.
Additionally, a TCO calculator generates the TCO estimation of an optimized EMR cluster for facilitating the migration. For optimizing EMR cluster cost effectiveness, the following table provides general guidelines of choosing the proper type of EMR cluster and Amazon Elastic Compute Cloud (Amazon EC2) family.
This service supports a range of optimized AI models, enabling seamless and scalable AI inference. By leveraging the NVIDIA NeMo platform and optimized versions of open-source models like Llama 3 and Mistral, businesses can harness the latest advancements in natural language processing, computer vision, and other AI domains.
Notably, a partner with global reach can be particularly valuable to an organisation with a global presence, since the structure of most multinational organisations is optimised to support their core business rather than initiatives like digital transformation.
Let’s look at a few ways that different industries take advantage of streaming data.

How industries can benefit from streaming data

Automotive: Monitoring connected, autonomous cars in real time to optimize routes to avoid traffic and to diagnose mechanical issues.
Specifically, the system uses Amazon SageMaker Processing jobs to process the data stored in the data lake, employing the AWS SDK for pandas (previously known as AWS Wrangler) for various data transformation operations, including cleaning, normalization, and feature engineering. Orca addressed this in several ways.
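Cleaning and normalization steps of this kind can be sketched with plain pandas. The sample data and column names below are made up; the original pipeline reads from the data lake via the AWS SDK for pandas inside a SageMaker Processing job.

```python
import pandas as pd

# Made-up sample standing in for data read from the data lake
raw = pd.DataFrame({
    "email": [" A@X.COM ", None, "b@y.com"],
    "score": [10.0, 20.0, 30.0],
})

# Cleaning: drop rows missing the key field, canonicalize strings
clean = raw.dropna(subset=["email"]).copy()
clean["email"] = clean["email"].str.strip().str.lower()

# Normalization (feature engineering): min-max scale score to [0, 1]
lo, hi = clean["score"].min(), clean["score"].max()
clean["score_norm"] = (clean["score"] - lo) / (hi - lo)
```

Inside a Processing job the same logic runs against S3-backed datasets, with the transformed output written back for downstream training or analytics.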
Data Vault 2.0 allows for the following:
- Agile data warehouse development
- Parallel data ingestion
- A scalable approach to handle multiple data sources, even on the same entity
- A high level of automation
- Historization
- Full lineage support