The need for streamlined data transformations: As organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown. This approach helps manage storage costs while maintaining the flexibility to analyze historical trends when needed.
Together with price-performance, Amazon Redshift offers capabilities such as a serverless architecture, machine learning integration within your data warehouse, and secure data sharing across the organization. dbt Cloud is a hosted service that helps data teams productionize dbt deployments.
For container terminal operators, data-driven decision-making and efficient data sharing are vital to optimizing operations and boosting supply chain efficiency. With a unified catalog, enhanced analytics capabilities, and efficient data transformation processes, we're laying the groundwork for future growth.
Maintaining reusable database sessions to help optimize the use of database connections, preventing the API server from exhausting the available connections and improving overall system scalability. Building event-driven applications with Amazon EventBridge and Lambda.
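As a hedged illustration of that reusable-session idea, here is a minimal Python sketch built on psycopg2's connection pool; the host, database, credentials, and helper function are placeholders, not the post's actual code.

```python
# A minimal sketch of reusable database sessions via a connection pool.
# Connection details here are illustrative placeholders.
import os
from psycopg2 import pool

# Create the pool once at API-server startup, not per request, so at most
# `maxconn` connections are ever opened against the database.
db_pool = pool.SimpleConnectionPool(
    minconn=1,
    maxconn=10,
    host="db.example.internal",
    dbname="appdb",
    user="api_user",
    password=os.environ["DB_PASSWORD"],
)

def run_query(sql, params=None):
    """Borrow a pooled connection, run a query, and return it to the pool."""
    conn = db_pool.getconn()
    try:
        with conn.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall()
    finally:
        db_pool.putconn(conn)  # return for reuse instead of closing
```

Capping the pool at `maxconn` is what keeps a busy API server from exhausting the database's available connections.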
There are countless examples of big data transforming many different industries. There is no disputing the fact that the collection and analysis of massive amounts of unstructured data has been a huge breakthrough. How is Data Virtualization performance optimized?
Build data validation rules directly into ingestion layers so that insufficient data is stopped at the gate and not detected after damage is done. Use lineage tooling to trace data from source to report. Understanding how data transforms and where it breaks is crucial for auditability and root-cause resolution.
Within the ANZ enterprise data mesh strategy, aligning data mesh nodes with the ANZ Group’s divisional structure provides optimal alignment between data mesh principles and organizational structure, as shown in the following diagram.
We used AWS Step Functions state machines to define, orchestrate, and execute our data pipelines. Amazon EventBridge: We used Amazon EventBridge, the serverless event bus service, to define the event-based rules and schedules that trigger our AWS Step Functions state machines.
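As a rough illustration of that wiring, here is a hedged boto3 sketch that creates a scheduled EventBridge rule targeting a Step Functions state machine; the rule name, cron expression, and ARNs are placeholders, not the pipeline's actual values.

```python
import boto3

events = boto3.client("events")

# A rule can be schedule-based (as here) or event-pattern-based.
events.put_rule(
    Name="nightly-pipeline-trigger",
    ScheduleExpression="cron(0 2 * * ? *)",  # 02:00 UTC daily
    State="ENABLED",
)

# Point the rule at the state machine; RoleArn must allow states:StartExecution.
events.put_targets(
    Rule="nightly-pipeline-trigger",
    Targets=[{
        "Id": "data-pipeline-sm",
        "Arn": "arn:aws:states:us-east-1:123456789012:stateMachine:DataPipeline",
        "RoleArn": "arn:aws:iam::123456789012:role/EventBridgeInvokeStepFunctions",
    }],
)
```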
Let’s look at a few ways that different industries take advantage of streaming data. Automotive: monitoring connected, autonomous cars in real time to optimize routes, avoid traffic, and diagnose mechanical issues.
Accurately predicting demand for products allows businesses to optimize inventory levels, minimize stockouts, and reduce holding costs. Solution overview: In today’s highly competitive business landscape, it’s essential for retailers to optimize their inventory management processes to maximize profitability and improve customer satisfaction.
Auto-copy enhances the COPY command by adding jobs that ingest data automatically. If storing operational data in a data warehouse is a requirement, synchronization of tables between operational data stores and Amazon Redshift tables is supported.
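As a hedged sketch of what that looks like in practice, the following Python snippet submits an auto-copy COPY JOB through the Redshift Data API; the table, S3 prefix, IAM role, and workgroup are placeholders, and the exact JOB syntax should be checked against the Redshift documentation.

```python
import boto3

rsd = boto3.client("redshift-data")

# A COPY JOB watches the S3 prefix and ingests newly arriving files
# automatically; names and ARNs below are invented for the example.
copy_job_sql = """
COPY public.sales
FROM 's3://my-ingest-bucket/sales/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS CSV
JOB CREATE sales_auto_copy_job
AUTO ON;
"""

rsd.execute_statement(
    WorkgroupName="my-serverless-workgroup",  # or ClusterIdentifier for provisioned
    Database="dev",
    Sql=copy_job_sql,
)
```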
You will load the event data from the SFTP site, join it to the venue data stored on Amazon S3, apply transformations, and store the data in Amazon S3. The event and venue files are from the TICKIT dataset. For Node parents, select Rename Venue data and Rename Event data.
Using EventBridge integration, filtered positional updates are published to an EventBridge event bus. Amazon Location device position events arrive on the EventBridge default bus with source: ["aws.geo"] and detail-type: ["Location Device Position Event"]. In this model, the Lambda function is invoked for each incoming event.
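A minimal boto3 sketch of that rule-plus-Lambda wiring, using the source and detail-type values quoted above; the rule name and function ARN are invented for the example.

```python
import json

import boto3

events = boto3.client("events")

# Match Amazon Location device position events on the default bus.
events.put_rule(
    Name="location-position-updates",
    EventPattern=json.dumps({
        "source": ["aws.geo"],
        "detail-type": ["Location Device Position Event"],
    }),
    State="ENABLED",
)

# Invoke a (placeholder) Lambda function for each matching event.
# EventBridge also needs lambda:InvokeFunction permission on the target
# (granted via lambda add_permission), omitted here for brevity.
events.put_targets(
    Rule="location-position-updates",
    Targets=[{
        "Id": "position-handler",
        "Arn": "arn:aws:lambda:us-east-1:123456789012:function:HandlePositionEvent",
    }],
)
```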
What is the difference between business analytics and data analytics? Business analytics is a subset of data analytics. Data analytics is used across disciplines to find trends and solve problems using data mining, data cleansing, data transformation, data modeling, and more.
Problem statement: To keep up with the rapid movement of fraudsters, our decision platform must continuously monitor user events and respond in real time. However, our legacy data warehouse-based solution was not equipped for this challenge. Amazon DynamoDB is another data source for our Streaming 2.0
By providing real-time visibility into the performance and behavior of data-related systems, DataOps observability enables organizations to identify and address issues before they become critical, and to optimize their data-related workflows for maximum efficiency and effectiveness.
This means there are no unintended data errors, and it corresponds to its appropriate designation (e.g., date, month, and year). Here, it all comes down to the data transformation error rate. In other words, it measures the time between when data is expected and the moment when it is readily available for use.
You can use Amazon Data Firehose to aggregate and deliver log events from your applications and services captured in Amazon CloudWatch Logs to your Amazon Simple Storage Service (Amazon S3) bucket and Splunk destinations, for use cases such as data analytics, security analysis, and application troubleshooting.
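For illustration, a hedged boto3 sketch of the CloudWatch Logs side of that pipeline: a subscription filter that streams log events into a Firehose delivery stream. The log group, stream, and role ARNs are placeholders.

```python
import boto3

logs = boto3.client("logs")

# Stream matching log events from a CloudWatch Logs group into the Firehose
# delivery stream; the role must allow firehose:PutRecord on that stream.
logs.put_subscription_filter(
    logGroupName="/aws/app/payments",  # placeholder log group
    filterName="to-firehose",
    filterPattern="",                  # empty pattern forwards everything
    destinationArn="arn:aws:firehose:us-east-1:123456789012:deliverystream/app-logs",
    roleArn="arn:aws:iam::123456789012:role/CWLtoFirehoseRole",
)
```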
Amazon AppFlow is a fully managed integration service that you can use to securely transfer data from software as a service (SaaS) applications, such as Google BigQuery, Salesforce, SAP, HubSpot, and ServiceNow, to Amazon Web Services (AWS) services such as Amazon Simple Storage Service (Amazon S3) and Amazon Redshift, in just a few clicks.
In the second blog of the Universal Data Distribution blog series , we explored how Cloudera DataFlow for the Public Cloud (CDF-PC) can help you implement use cases like data lakehouse and data warehouse ingest, cybersecurity, and log optimization, as well as IoT and streaming data collection.
Furthermore, it allows for necessary actions to be taken, such as rectifying errors in the data source, refining data transformation processes, and updating data quality rules. An EventBridge rule receives an event notification from AWS Glue Data Quality evaluations, including the results.
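A minimal sketch of such a rule in boto3; the source and detail-type strings are our assumption of the AWS Glue Data Quality event shape and should be verified against the events your account actually emits.

```python
import json

import boto3

events = boto3.client("events")

# Assumed event shape for Glue Data Quality results; verify before relying on it.
events.put_rule(
    Name="glue-dq-results",
    EventPattern=json.dumps({
        "source": ["aws.glue-dataquality"],
        "detail-type": ["Data Quality Evaluation Results Available"],
    }),
    State="ENABLED",
)
```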
Cloudera users can securely connect Rill to a source of event stream data, such as Cloudera DataFlow, model data into Rill’s cloud-based Druid service, and share live operational dashboards within minutes via Rill’s interactive metrics dashboard or any connected BI solution.
To create and manage the data products, smava uses Amazon Redshift , a cloud data warehouse. In this post, we show how smava optimized their data platform by using Amazon Redshift Serverless and Amazon Redshift data sharing to overcome right-sizing challenges for unpredictable workloads and further improve price-performance.
This service supports a range of optimized AI models, enabling seamless and scalable AI inference. By leveraging the NVIDIA NeMo platform and optimized versions of open-source models like Llama 3 and Mistral, businesses can harness the latest advancements in natural language processing, computer vision, and other AI domains.
Our customers must also have secure access to their data from anywhere – from on-premises to hybrid clouds and multiple public clouds. We must integrate and optimize the end-to-end data lifecycle for our customers, empowering them to focus on what really matters – extracting value from their data.
Additionally, there are major rewrites to deliver developer-focused improvements, including static type checking, enhanced runtime validation, strong consistency in call patterns, and optimized event chaining. is modernized by using promises for all actions, so developers can use async and await functions for better event management.
Different communication infrastructure types, such as mesh networks and cellular, can be used to send load information on a pre-defined schedule or event data in real time to the backend servers residing in the Utility Data Network (UDN).
If you can’t make sense of your business data, you’re effectively flying blind. Insights hidden in your data are essential for optimizing business operations, fine-tuning your customer experience, and developing new products — or new lines of business, like predictive maintenance.
It supports modern analytical data lake operations such as create table as select (CTAS), upsert and merge, and time travel queries. Athena also supports the ability to create views and perform VACUUM (snapshot expiration) on Apache Iceberg tables to optimize storage and performance.
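To make those operations concrete, here is a hedged Python sketch that submits an Iceberg time travel query and a VACUUM through the Athena API; the database, table, and results bucket are placeholders, and the SQL syntax should be checked against the Athena Iceberg documentation.

```python
import boto3

athena = boto3.client("athena")

def run(sql):
    """Submit a query; results land in the (placeholder) S3 output location."""
    return athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "lakehouse"},
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )

# Time travel: read the Iceberg table as of a past point in time.
run("SELECT * FROM sales FOR TIMESTAMP AS OF TIMESTAMP '2024-01-01 00:00:00 UTC'")

# Snapshot expiration: reclaim storage and speed up query planning.
run("VACUUM sales")
```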
Once a draft has been created or opened, developers use the visual Designer to build their data flow logic and validate it using interactive test sessions. In the Designer, you have the ability to start and stop each step of the data pipeline, resulting in events being queued up in the connections that link the processing steps together.
Data transformation plays a pivotal role in providing the necessary data insights for businesses in any organization, small and large. To gain these insights, customers often perform ETL (extract, transform, and load) jobs from their source systems and output an enriched dataset.
Due to this low complexity, the solution uses AWS serverless services to ingest the data, transform it, and make it available for analytics. The serverless architecture features auto scaling, high availability, and a pay-as-you-go billing model to increase agility and optimize costs.
Oracle GoldenGate for Oracle Database and Big Data adapters: Oracle GoldenGate is a real-time data integration and replication tool used for disaster recovery, data migrations, and high availability. GoldenGate provides special tools called S3 event handlers to integrate with Amazon S3 for data replication.
YuniKorn is designed for big data workloads, and it natively supports running Spark, Flink, TensorFlow, and other frameworks efficiently on Kubernetes. YuniKorn is optimized for performance, making it suitable for high-throughput, large-scale environments. The YuniKorn scheduler provides an optimal solution for managing resource quotas by using resource queues.
It’s a pantry because all the data one needs is readily available and easily accessible, with labels that are immediately recognized and understood by the users of the application. In tech speak, this means the semantic layer is optimized for the intended audience.
Amazon Redshift enables you to use SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and machine learning (ML) to deliver the best price-performance at scale. Shashank Tewari is a Senior Technical Account Manager at AWS.
AWS Glue is a serverless data discovery, load, and transformation service that prepares data for consumption in BI and AI/ML activities. Solution overview: This solution uses Amazon AppFlow to retrieve data from the Jira Cloud. Parquet is a columnar format that optimizes subsequent querying.
Additionally, a TCO calculator generates the TCO estimation of an optimized EMR cluster to facilitate the migration. Transform the YARN job history logs from JSON to CSV: After obtaining YARN logs, you run a YARN log organizer, yarn-log-organizer.py, a parser that transforms JSON-based logs into CSV files.
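The actual yarn-log-organizer.py is not shown here, but a minimal sketch of that JSON-to-CSV flattening step might look like the following; the field names are invented for the example.

```python
# Hypothetical re-creation of what a YARN log organizer does:
# flatten JSON job-history records into a CSV file.
import csv
import json

def json_logs_to_csv(in_path, out_path, fields):
    with open(in_path) as src, open(out_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        for line in src:              # assumes one JSON record per line
            writer.writerow(json.loads(line))

json_logs_to_csv(
    "yarn-job-history.json",
    "yarn-job-history.csv",
    fields=["jobId", "user", "queue", "startTime", "finishTime", "state"],
)
```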
Stored procedures are commonly used to encapsulate logic for data transformation, data validation, and business-specific logic. The procedure has an EXCEPTION block that inserts a record into the procedure_log table in the event of an exception.
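As a hedged sketch of that pattern, the following creates a Redshift stored procedure whose EXCEPTION block writes to procedure_log; it assumes a NONATOMIC procedure (which Redshift requires for exception handling), and the schema, table, and column names are illustrative.

```python
import boto3

rsd = boto3.client("redshift-data")

# Illustrative DDL, not the post's actual procedure: on any error, the
# EXCEPTION block records the failure in procedure_log instead of aborting.
ddl = """
CREATE OR REPLACE PROCEDURE etl.load_daily_sales() NONATOMIC AS $$
BEGIN
    INSERT INTO analytics.daily_sales
    SELECT sale_date, SUM(amount) FROM staging.sales GROUP BY sale_date;
EXCEPTION WHEN OTHERS THEN
    INSERT INTO etl.procedure_log (proc_name, error_message, logged_at)
    VALUES ('load_daily_sales', SQLERRM, GETDATE());
END;
$$ LANGUAGE plpgsql;
"""

rsd.execute_statement(
    ClusterIdentifier="my-cluster", Database="dev", DbUser="etl_user", Sql=ddl
)
```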
A database is, by definition, ‘any collection of data organized for storage, accessibility, and retrieval.’ Databases usually consist of information arranged in rows, columns, and tables, organized mainly for easy input and collection of different events. Columns hold the attributes of each record, while rows contain the individual events and trades themselves.
Infomedia was looking to build a cloud-based data platform to take advantage of highly scalable data storage with flexible and cloud-native processing tools to ingest, transform, and deliver datasets to their SaaS applications. Performance and scalability of both the data pipeline and API endpoint were key success criteria.
When it comes to data modeling, function determines form. Let’s say you want to subject a dataset to some form of anomaly detection; your model might take the form of a singular event stream that can be read by an anomaly detection service, typically after routine transformations (filling in nulls, changing time zones, formatting strings, conditional logic, etc.).
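A small pandas sketch of those shaping steps on an invented event table; every column name here is made up for the example.

```python
import pandas as pd

# Illustrative shaping of raw events into a single stream an anomaly
# detector could consume.
events = pd.DataFrame({
    "ts": ["2024-05-01T12:00:00+00:00", "2024-05-01T12:05:00+00:00"],
    "value": [10.0, None],
    "label": [" ok ", "SPIKE"],
})

events["ts"] = pd.to_datetime(events["ts"]).dt.tz_convert("UTC")  # change time zones
events["value"] = events["value"].fillna(0.0)                     # fill in nulls
events["label"] = events["label"].str.strip().str.lower()         # format strings
events["flagged"] = events["value"] > 100                         # conditional logic
```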
As part of the solution workflow, EventBridge receives an event for each PCA solution analysis output file. Kinesis Data Firehose uses Lambda to perform data transformation and compression, storing the file in a compressed columnar format (Parquet) in the target S3 bucket.
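For reference, a hedged sketch of the standard Firehose transformation Lambda contract that such a function follows; the PCA field names are assumptions, and the Parquet conversion itself is configured on the delivery stream rather than performed in this code.

```python
import base64
import json

# Firehose passes batches of base64-encoded records; each must be returned
# with its recordId and a result of "Ok", "Dropped", or "ProcessingFailed".
def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        transformed = {
            "call_id": payload.get("CallId"),          # hypothetical PCA fields
            "sentiment": payload.get("SentimentScore"),
        }
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(
                (json.dumps(transformed) + "\n").encode()
            ).decode(),
        })
    return {"records": output}
```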
With EMR Serverless, you don’t have to configure, optimize, secure, or operate clusters to run applications with these frameworks. You can run analytics workloads at any scale with automatic scaling that resizes resources in seconds to meet changing data volumes and processing requirements.