Table of Contents: 1) Benefits Of Big Data In Logistics 2) 10 Big Data In Logistics Use Cases
Big data is revolutionizing many fields of business, and logistics analytics is no exception. The complex and ever-evolving nature of logistics makes it an ideal use case for big data applications.
Expense optimization and clearly defined workload selection criteria will determine which workloads go to the public cloud and which to the private cloud, he says. By moving applications back on premises, or by using on-premises or hosted private cloud services, CIOs can avoid multi-tenancy while ensuring data privacy. But should you?
The currently available choices include the Amazon Redshift COPY command, which can load data from Amazon Simple Storage Service (Amazon S3), Amazon EMR, Amazon DynamoDB, or remote hosts over SSH. This native Amazon Redshift feature uses massively parallel processing (MPP) to load objects directly from data sources into Redshift tables.
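As a rough illustration, a COPY statement can also be issued programmatically. The sketch below uses the boto3 Redshift Data API; the cluster identifier, database, table, S3 path, and IAM role are placeholders rather than values from the article.

```python
# Minimal sketch: run a COPY command against Redshift via the Redshift Data API.
import boto3

client = boto3.client("redshift-data")

copy_sql = """
    COPY sales FROM 's3://example-bucket/sales/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-role'
    FORMAT AS PARQUET;
"""

response = client.execute_statement(
    ClusterIdentifier="example-cluster",  # placeholder cluster
    Database="dev",
    DbUser="awsuser",
    Sql=copy_sql,
)

# The call is asynchronous; poll describe_statement with this ID for status.
print(response["Id"])
```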
This means there are no unintended data errors and that the data corresponds to its appropriate designation (e.g., date, month, and year). Here, it all comes down to the data transformation error rate. A related measure is timeliness: the time between when data is expected and the moment when it is readily available for use.
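To make the two measures concrete, here is a toy Python calculation of a transformation error rate and a timeliness lag; all numbers are invented for illustration.

```python
# Hypothetical illustration of the two metrics described above.
from datetime import datetime

records_processed = 10_000        # assumed total records in the batch
records_failed = 25               # records that failed transformation

error_rate = records_failed / records_processed
print(f"Transformation error rate: {error_rate:.2%}")   # 0.25%

expected_at = datetime(2024, 1, 1, 6, 0)     # when the data was expected
available_at = datetime(2024, 1, 1, 6, 45)   # when it became usable
print(f"Timeliness lag: {available_at - expected_at}")  # 0:45:00
```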
In this post, we explore how AWS Glue can serve as the data integration service that brings data from Snowflake into your data integration strategy, enabling you to harness the power of your data ecosystem and drive meaningful outcomes across various use cases. Store the extracted and transformed data in Amazon S3.
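A minimal PySpark sketch of that pattern might look like the following. It assumes the Snowflake Spark connector JARs are available to the Glue job and uses placeholder account, credential, table, and bucket names; the original post may instead use Glue's built-in Snowflake connection.

```python
# Sketch: read a table from Snowflake with the Spark connector, land it in S3.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("snowflake-to-s3").getOrCreate()

sf_options = {
    "sfURL": "example_account.snowflakecomputing.com",  # placeholder account URL
    "sfUser": "EXAMPLE_USER",
    "sfPassword": "EXAMPLE_PASSWORD",                   # prefer Secrets Manager in practice
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "COMPUTE_WH",
}

df = (
    spark.read.format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "ORDERS")
    .load()
)

# Light transformation, then store the extracted and transformed data in S3 as Parquet.
df.filter("ORDER_STATUS = 'COMPLETE'").write.mode("overwrite").parquet(
    "s3://example-bucket/snowflake/orders/"
)
```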
This service supports a range of optimized AI models, enabling seamless and scalable AI inference. By leveraging the NVIDIA NeMo platform and optimized versions of open-source models like Llama 3 and Mistral, businesses can harness the latest advancements in natural language processing, computer vision, and other AI domains.
Access to an SFTP server with permissions to upload and download data. If the SFTP server is hosted on Amazon Elastic Compute Cloud (Amazon EC2) , we recommend that the network communication between the SFTP server and the AWS Glue job happens within the virtual private cloud (VPC) as pictured in the preceding architecture diagram.
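If the Glue job itself performs the SFTP transfer, a hedged sketch using the paramiko library (assumed to be packaged with the job) could look like this; the hostname, credentials, and paths are placeholders.

```python
# Sketch: pull a file from the SFTP server into local job storage before processing.
import paramiko

transport = paramiko.Transport(("sftp.example.internal", 22))  # reachable inside the VPC
transport.connect(username="glue-user", password="example-password")
sftp = paramiko.SFTPClient.from_transport(transport)

# Download the source file; a real job might then upload it to Amazon S3.
sftp.get("/uploads/daily_extract.csv", "/tmp/daily_extract.csv")

sftp.close()
transport.close()
```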
However, you might face significant challenges when planning for a large-scale data warehouse migration. A migration of this scale covers the ETL processes that capture source data, the functional refinement and creation of data products, the aggregation for business metrics, and the consumption from analytics, business intelligence (BI), and ML.
Customers rely on data from different sources such as mobile applications, clickstream events from websites, historical data, and more to deduce meaningful patterns to optimize their products, services, and processes. To create the connection string, the Snowflake host and account name are required. Choose Next.
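For illustration only, the snippet below shows where the account name and host fit when connecting from Python with the snowflake-connector-python package; every value shown is a placeholder, and the host can usually be derived from the account identifier.

```python
# Sketch: the account and host pieces a Snowflake connection needs.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_org-example_account",            # Snowflake account name
    host="example_account.snowflakecomputing.com",    # Snowflake host (often derived)
    user="EXAMPLE_USER",
    password="EXAMPLE_PASSWORD",
    warehouse="COMPUTE_WH",
    database="ANALYTICS",
)
cur = conn.cursor()
cur.execute("SELECT CURRENT_VERSION()")
print(cur.fetchone())
conn.close()
```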
The system ingests data from various sources such as cloud resources, cloud activity logs, and API access logs, and processes billions of messages, resulting in terabytes of data daily. This data is sent to Apache Kafka, which is hosted on Amazon Managed Streaming for Apache Kafka (Amazon MSK).
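As a rough sketch of the producer side, the following publishes a JSON event to a topic on the MSK cluster using the kafka-python package; the broker address, TLS setup, and topic name are assumptions, not details from the article.

```python
# Sketch: publish an ingested event to a Kafka topic hosted on Amazon MSK.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["b-1.example.kafka.us-east-1.amazonaws.com:9094"],  # placeholder broker
    security_protocol="SSL",                       # MSK TLS listener
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

event = {"source": "cloudtrail", "action": "CreateUser", "region": "us-east-1"}
producer.send("security-events", value=event)
producer.flush()
```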
Achieving a successful data transformation requires the right choice of an analytics and BI platform to ensure you get the most value from your data. It can also help you see ROI from your digital transformation sooner. Empowering customers with embedded analytics: Tessitura Network.
Oracle GoldenGate for Oracle Database and Big Data adapters: Oracle GoldenGate is a real-time data integration and replication tool used for disaster recovery, data migrations, and high availability. Refer to Amazon EBS-optimized instance types for more information.
Together with price-performance, Amazon Redshift offers capabilities such as serverless architecture, machine learning integration within your data warehouse, and secure data sharing across the organization. dbt Cloud is a hosted service that helps data teams productionize dbt deployments. Choose Create.
For Host, enter the Redshift Serverless endpoint's host URL. In addition to Talend Cloud for enterprise-level data transformation needs, you could also use Talend Stitch to handle data ingestion and data replication to Redshift Serverless.
This method uses GZIP compression to optimize storage consumption and query performance. You can also use the data transformation feature of Data Firehose to invoke a Lambda function to perform data transformation in batches. You're now ready to query the tables using Athena.
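A minimal example of such a transformation Lambda follows. It uses the record format Firehose passes to the function (base64-encoded data plus a recordId) and applies a purely illustrative enrichment.

```python
# Sketch: Data Firehose transformation Lambda that decodes, modifies, and re-encodes records.
import base64
import json

def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        payload["processed"] = True  # illustrative transformation only
        encoded = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("utf-8")
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": encoded,
        })
    return {"records": output}
```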
Additionally, there are major rewrites to deliver developer-focused improvements, including static type checking, enhanced runtime validation, strong consistency in call patterns, and optimized event chaining. The following eventNames and eventCodes are returned as part of the onChange callback when there is a change in the SDK code status.
Sean Im, CEO, Samsung SDS America: “In the field of generative AI and foundation models, watsonx is a platform that will enable us to meet our customers’ requirements in terms of optimization and security, while allowing them to benefit from the dynamism and innovations of the open-source community.”
We use Apache Spark as our main data processing engine and have over 1,000 Spark applications running over massive amounts of data every day. These Spark applications implement our business logic, ranging from data transformation and machine learning (ML) model inference to operational tasks. Their costs were climbing.
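A simplified PySpark sketch of that mix of workloads follows, with a hypothetical input path and a stand-in scoring function instead of a real ML model.

```python
# Sketch: a transformation step followed by batch "inference" via a UDF.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.appName("transform-and-score").getOrCreate()

events = spark.read.parquet("s3://example-bucket/events/")  # assumed input
cleaned = events.filter(col("amount") > 0).withColumn("amount_usd", col("amount") / 100)

def score(amount_usd):
    # Stand-in for a real model; a production job would load a trained artifact.
    if amount_usd is None:
        return None
    return min(amount_usd / 1000.0, 1.0)

score_udf = udf(score, DoubleType())
scored = cleaned.withColumn("risk_score", score_udf(col("amount_usd")))
scored.write.mode("overwrite").parquet("s3://example-bucket/scored-events/")
```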
When you start designing your data model for Amazon Keyspaces, it's essential to have a comprehensive understanding of your access patterns, similar to the approach used in other NoSQL databases. Additionally, you can configure OpenSearch Ingestion to apply data transformations before delivery.
The Delta tables created by the EMR Serverless application are exposed through the AWS Glue Data Catalog and can be queried through Amazon Athena. Data ingestion – Steps 1 and 2 use AWS DMS, which connects to the source database and moves full and incremental data (CDC) to Amazon S3 in Parquet format.
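To illustrate the query side, the following boto3 sketch runs an Athena query against a catalog table and polls for completion; the database, table, and output-location names are placeholders.

```python
# Sketch: query a Data Catalog table through Athena and wait for the result.
import time
import boto3

athena = boto3.client("athena")

start = athena.start_query_execution(
    QueryString="SELECT * FROM cdc_db.orders_delta LIMIT 10",
    QueryExecutionContext={"Database": "cdc_db"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)
execution_id = start["QueryExecutionId"]

while True:
    state = athena.get_query_execution(QueryExecutionId=execution_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=execution_id)["ResultSet"]["Rows"]
    print(f"Fetched {len(rows)} rows")
```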
Although we explored the option of using AWS managed notebooks to streamline the provisioning process, we have decided to continue hosting these components on our on-premises infrastructure for now. Joel has led data transformation projects on fraud analytics, claims automation, and Master Data Management.
It defines how data can be collected and used within an organization, and empowers data teams to: Maintain compliance, even as laws change. Uncover intelligence from data. Protect data at the source. Put data into action to optimize the patient experience and adapt to changing business models.
Having the right tools is essential for any successful data product manager focused on enterprise data transformation. When choosing the tools for a project, whether it be the CIO, CDO, or data product managers themselves, the buyers must see the big picture. They can quickly understand complex systems and technology.
Customers often use many SQL scripts to select and transform data in relational databases hosted either in an on-premises environment or on AWS, and they use custom workflows to manage their ETL. AWS Glue is a serverless data integration and ETL service with the ability to scale on demand. Wait for all the jobs to complete.
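A hedged sketch of orchestrating that from Python with boto3, starting several Glue jobs and waiting for all of them to finish (the job names are hypothetical):

```python
# Sketch: start multiple AWS Glue jobs and poll until every run reaches a terminal state.
import time
import boto3

glue = boto3.client("glue")
job_names = ["extract-orders", "transform-customers", "load-warehouse"]  # placeholder jobs

runs = {name: glue.start_job_run(JobName=name)["JobRunId"] for name in job_names}

pending = set(job_names)
while pending:
    for name in list(pending):
        state = glue.get_job_run(JobName=name, RunId=runs[name])["JobRun"]["JobRunState"]
        if state in ("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR"):
            print(f"{name}: {state}")
            pending.discard(name)
    if pending:
        time.sleep(30)
```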
In addition, more data is becoming available for processing and enrichment of existing and new use cases; for example, we have recently experienced rapid growth in data collection at the edge and increased availability of frameworks for processing that data. As a result, alternative data integration technologies (e.g.,
To create and manage the data products, smava uses Amazon Redshift , a cloud data warehouse. In this post, we show how smava optimized their data platform by using Amazon Redshift Serverless and Amazon Redshift data sharing to overcome right-sizing challenges for unpredictable workloads and further improve price-performance.
But Barnett, who started work on a strategy in 2023, wanted to continue using Baptist Memorial's on-premises data center for financial, security, and continuity reasons, so he and his team explored options that allowed for keeping that data center as part of the mix.
The data lakehouse architecture combines the flexibility, scalability and cost advantages of data lakes with the performance, functionality and usability of data warehouses to deliver optimal price-performance for a variety of data, analytics and AI workloads.
Within the ANZ enterprise data mesh strategy, aligning data mesh nodes with the ANZ Group's divisional structure provides optimal alignment between data mesh principles and organizational structure, as shown in the following diagram. Consumer feedback and demand drive the creation and maintenance of the data product.
Furthermore, these tools boast customization options, allowing users to tailor data sources to address areas critical to their business success, thereby generating actionable insights and customizable reports. Best BI Tools for Data Analysts. Key features: an extensive library of pre-built connectors for diverse data sources.
Reports: in formats that are both static and interactive, these showcase tabular views of data. Strategic objective: provide an optimal user experience regardless of where and how users prefer to access information. Ideally, your primary data source should belong in this group. Do what you expect your customers to do.
This field guide to data mapping will explore how data mapping connects volumes of data for enhanced decision-making. Why data mapping is important: data mapping is a critical element of any data management initiative, such as data integration, data migration, data transformation, data warehousing, or automation.
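As a toy illustration of field-level data mapping, the snippet below renames hypothetical source fields to target schema fields during a migration; every field name is invented.

```python
# Sketch: map source field names onto a target schema during integration or migration.
FIELD_MAP = {
    "cust_nm": "customer_name",
    "cust_dob": "date_of_birth",
    "addr_1": "street_address",
}

def map_record(source_record: dict) -> dict:
    """Rename source fields to their target names, dropping unmapped fields."""
    return {
        target: source_record[source]
        for source, target in FIELD_MAP.items()
        if source in source_record
    }

print(map_record({"cust_nm": "Ada Lovelace", "cust_dob": "1815-12-10", "addr_1": "12 St James Sq"}))
```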
Optimized Resource Allocation: Finance teams can strategically allocate resources in a hybrid ERP environment. This optimization leads to improved efficiency, reduced operational costs, and better resource utilization. Cost Optimization: The hybrid model allows finance teams to balance their expenses effectively.