The need for streamlined data transformations: As organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown. This approach helps manage storage costs while maintaining the flexibility to analyze historical trends when needed.
Together with price-performance, Amazon Redshift offers capabilities such as a serverless architecture, machine learning integration within your data warehouse, and secure data sharing across the organization. dbt Cloud is a hosted service that helps data teams productionize dbt deployments.
For container terminal operators, data-driven decision-making and efficient data sharing are vital to optimizing operations and boosting supply chain efficiency. With a unified catalog, enhanced analytics capabilities, and efficient data transformation processes, we're laying the groundwork for future growth.
Maintaining reusable database sessions helps optimize the use of database connections, preventing the API server from exhausting the available connections and improving overall system scalability. Building event-driven applications with Amazon EventBridge and Lambda.
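As a rough illustration of that session-reuse pattern, the sketch below assumes a PostgreSQL backend and the psycopg2 driver (both hypothetical choices, not named in the excerpt) and creates the connection outside the Lambda handler so warm invocations reuse it instead of opening a new one.

```python
import os
import psycopg2

# Created once per execution environment; warm invocations reuse this
# connection instead of exhausting the database's connection slots.
_conn = None

def get_connection():
    global _conn
    if _conn is None or _conn.closed:
        _conn = psycopg2.connect(
            host=os.environ["DB_HOST"],
            dbname=os.environ["DB_NAME"],
            user=os.environ["DB_USER"],
            password=os.environ["DB_PASSWORD"],
        )
    return _conn

def handler(event, context):
    # Each request borrows the long-lived session rather than opening its own.
    with get_connection().cursor() as cur:
        cur.execute("SELECT 1")
        return {"status": cur.fetchone()[0]}
```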
There are countless examples of big data transforming many different industries. There is no disputing the fact that the collection and analysis of massive amounts of unstructured data has been a huge breakthrough. How is data virtualization performance optimized?
Accurately predicting demand for products allows businesses to optimize inventory levels, minimize stockouts, and reduce holding costs. Solution overview: In today’s highly competitive business landscape, it’s essential for retailers to optimize their inventory management processes to maximize profitability and improve customer satisfaction.
We used AWS Step Functions state machines to define, orchestrate, and execute our data pipelines. Amazon EventBridge: We used Amazon EventBridge, the serverless event bus service, to define the event-based rules and schedules that trigger our AWS Step Functions state machines.
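To make the scheduling side concrete, here is a minimal boto3 sketch of an EventBridge rule that starts a Step Functions state machine on a schedule; the rule name, ARNs, and cron expression are placeholders, not values from the excerpt.

```python
import boto3

events = boto3.client("events")

# Placeholder ARN for the state machine to be triggered.
sfn_arn = "arn:aws:states:us-east-1:123456789012:stateMachine:nightly-pipeline"

# A scheduled rule that fires once a day at 02:00 UTC.
events.put_rule(
    Name="nightly-pipeline-trigger",
    ScheduleExpression="cron(0 2 * * ? *)",
    State="ENABLED",
)

# Attach the state machine as the rule target; RoleArn must allow states:StartExecution.
events.put_targets(
    Rule="nightly-pipeline-trigger",
    Targets=[{
        "Id": "start-nightly-pipeline",
        "Arn": sfn_arn,
        "RoleArn": "arn:aws:iam::123456789012:role/eventbridge-invoke-sfn",  # placeholder
    }],
)
```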
With auto-copy, the COPY command is extended with jobs that automatically ingest new data as it arrives. If storing operational data in a data warehouse is a requirement, synchronization of tables between operational data stores and Amazon Redshift is supported.
Let’s look at a few ways that different industries take advantage of streaming data. How industries can benefit from streaming data: Automotive: monitoring connected, autonomous cars in real time to optimize routes, avoid traffic, and diagnose mechanical issues.
You will load the event data from the SFTP site, join it to the venue data stored on Amazon S3, apply transformations, and store the data in Amazon S3. The event and venue files are from the TICKIT dataset. For Node parents, select Rename Venue data and Rename Event data.
Using EventBridge integration, filtered positional updates are published to an EventBridge event bus. Amazon Location device position events arrive on the EventBridge default bus with source: ["aws.geo"] and detail-type: ["Location Device Position Event"]. In this model, the Lambda function is invoked for each incoming event.
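To show that routing concretely, here is a minimal sketch of a rule matching those device position events plus a Lambda handler that reads the position from the event detail. The rule name and handler are illustrative, and the detail field names are assumptions to be verified against the Amazon Location event schema.

```python
import boto3

events = boto3.client("events")

# Rule on the default bus matching Amazon Location device position events,
# using the source and detail-type quoted in the excerpt.
events.put_rule(
    Name="device-position-updates",  # illustrative name
    EventPattern='{"source": ["aws.geo"], "detail-type": ["Location Device Position Event"]}',
    State="ENABLED",
)

def handler(event, context):
    """Invoked once per incoming device position event."""
    detail = event.get("detail", {})
    device_id = detail.get("DeviceId")   # field names assumed; verify against the event schema
    position = detail.get("Position")    # typically [longitude, latitude]
    print(f"Device {device_id} reported position {position}")
    return {"status": "processed"}
```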
What is the difference between business analytics and data analytics? Business analytics is a subset of data analytics. Data analytics is used across disciplines to find trends and solve problems using data mining, data cleansing, data transformation, data modeling, and more.
You can use Amazon Data Firehose to aggregate and deliver log events from your applications and services, captured in Amazon CloudWatch Logs, to your Amazon Simple Storage Service (Amazon S3) bucket and Splunk destinations for use cases such as data analytics, security analysis, and application troubleshooting.
Furthermore, it allows necessary actions to be taken, such as rectifying errors in the data source, refining data transformation processes, and updating data quality rules. An EventBridge rule receives an event notification from the AWS Glue Data Quality evaluations, including the results.
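A minimal sketch of such a rule is shown below. The event source and detail-type strings are assumptions about the Glue Data Quality integration rather than values from the excerpt, so confirm the exact names in the AWS Glue documentation before relying on them.

```python
import json
import boto3

events = boto3.client("events")

# Assumed event pattern for Glue Data Quality result notifications; verify the
# source and detail-type against the AWS Glue documentation.
pattern = {
    "source": ["aws.glue-dataquality"],
    "detail-type": ["Data Quality Evaluation Results Available"],
}

events.put_rule(
    Name="glue-dq-results",            # illustrative rule name
    EventPattern=json.dumps(pattern),
    State="ENABLED",
)
```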
Amazon AppFlow is a fully managed integration service that you can use to securely transfer data from software as a service (SaaS) applications, such as Google BigQuery, Salesforce, SAP, HubSpot, and ServiceNow, to Amazon Web Services (AWS) services such as Amazon Simple Storage Service (Amazon S3) and Amazon Redshift, in just a few clicks.
By providing real-time visibility into the performance and behavior of data-related systems, DataOps observability enables organizations to identify and address issues before they become critical, and to optimize their data-related workflows for maximum efficiency and effectiveness.
This means there are no unintended data errors, and it corresponds to its appropriate designation (e.g., date, month, and year). Here, it all comes down to the data transformation error rate. In other words, it measures the time between when data is expected and the moment when it is readily available for use.
In the second blog of the Universal Data Distribution blog series , we explored how Cloudera DataFlow for the Public Cloud (CDF-PC) can help you implement use cases like data lakehouse and data warehouse ingest, cybersecurity, and log optimization, as well as IoT and streaming data collection.
Cloudera users can securely connect Rill to a source of event stream data, such as Cloudera DataFlow, model data into Rill’s cloud-based Druid service, and share live operational dashboards within minutes via Rill’s interactive metrics dashboard or any connected BI solution.
Additionally, there are major rewrites to deliver developer-focused improvements, including static type checking, enhanced runtime validation, strong consistency in call patterns, and optimized event chaining. It is modernized by using promises for all actions, so developers can use async and await functions for better event management.
Different communication infrastructure types, such as mesh networks and cellular, can be used to send load information on a pre-defined schedule or event data in real time to the backend servers residing in the utility UDN (Utility Data Network).
This service supports a range of optimized AI models, enabling seamless and scalable AI inference. By leveraging the NVIDIA NeMo platform and optimized versions of open-source models like Llama 3 and Mistral, businesses can harness the latest advancements in natural language processing, computer vision, and other AI domains.
Our customers must also have secure access to their data from anywhere – from on-premises to hybrid clouds and multiple public clouds. We must integrate and optimize the end-to-end data lifecycle for our customers, empowering them to focus on what really matters – extracting value from their data.
It supports modern analytical data lake operations such as create table as select (CTAS), upsert and merge, and time travel queries. Athena also supports the ability to create views and perform VACUUM (snapshot expiration) on Apache Iceberg tables to optimize storage and performance.
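As a quick illustration of querying an Iceberg table from Athena, the sketch below runs a time-travel query via boto3; the database, table, and output location are placeholders, and the FOR TIMESTAMP AS OF clause assumes Athena engine version 3, so check the Athena documentation for your setup.

```python
import boto3

athena = boto3.client("athena")

# Placeholder database, table, and output location. The time-travel clause
# reads the Iceberg table as it existed one day ago.
query = """
SELECT *
FROM sales_iceberg
FOR TIMESTAMP AS OF (current_timestamp - interval '1' day)
LIMIT 10
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "analytics_db"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)
```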
Data transformation plays a pivotal role in providing the necessary data insights for businesses in any organization, small or large. To gain these insights, customers often perform ETL (extract, transform, and load) jobs from their source systems and output an enriched dataset.
If you can’t make sense of your business data, you’re effectively flying blind. Insights hidden in your data are essential for optimizing business operations, fine-tuning your customer experience, and developing new products, or even new lines of business, like predictive maintenance.
Oracle GoldenGate for Oracle Database and Big Data adapters: Oracle GoldenGate is a real-time data integration and replication tool used for disaster recovery, data migrations, and high availability. GoldenGate provides special tools called S3 event handlers to integrate with Amazon S3 for data replication.
Due to this low complexity, the solution uses AWS serverless services to ingest the data, transform it, and make it available for analytics. The serverless architecture features auto scaling, high availability, and a pay-as-you-go billing model to increase agility and optimize costs.
Additionally, a TCO calculator generates the TCO estimate for an optimized EMR cluster to facilitate the migration. Transform the YARN job history logs from JSON to CSV: After obtaining the YARN logs, you run a YARN log organizer, yarn-log-organizer.py, a parser that transforms JSON-based logs into CSV files.
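The excerpt does not include the parser itself; as a rough illustration of the JSON-to-CSV step, the sketch below flattens a JSON-lines log into CSV. The field names and file paths are invented for illustration and are not taken from yarn-log-organizer.py.

```python
import csv
import json

# Illustrative only: flatten a JSON-lines YARN job history log into CSV.
# Field names and paths are hypothetical, not those used by yarn-log-organizer.py.
fields = ["applicationId", "user", "queue", "startedTime", "finishedTime", "memorySeconds"]

with open("yarn-job-history.json") as src, open("yarn-job-history.csv", "w", newline="") as dst:
    writer = csv.DictWriter(dst, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for line in src:
        record = json.loads(line)
        writer.writerow({field: record.get(field, "") for field in fields})
```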
Once a draft has been created or opened, developers use the visual Designer to build their data flow logic and validate it using interactive test sessions. In the Designer, you can start and stop each step of the data pipeline, resulting in events being queued up in the connections that link the processing steps together.
Amazon Redshift enables you to use SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and machine learning (ML) to deliver the best price-performance at scale.
YuniKorn is designed for big data workloads and natively supports running Spark, Flink, TensorFlow, and similar frameworks efficiently on Kubernetes. YuniKorn is optimized for performance, making it suitable for high-throughput, large-scale environments. The YuniKorn scheduler provides an optimal solution for managing resource quotas by using resource queues.
It’s a pantry because all the data one needs is readily available and easily accessible, with labels that are immediately recognized and understood by the users of the application. In tech speak, this means the semantic layer is optimized for the intended audience.
Stored procedures are commonly used to encapsulate logic for data transformation, data validation, and business-specific operations. The procedure has an EXCEPTION block and inserts a record into the procedure_log table in the event of an exception.
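A minimal sketch of that pattern is shown below, submitted through the Redshift Data API from Python. The procedure, table, and column names are hypothetical, and exception-handling syntax and support vary by engine (for example, Amazon Redshift has specific rules for exception blocks), so verify against your warehouse documentation.

```python
import boto3

redshift_data = boto3.client("redshift-data")

# Hypothetical procedure: names, tables, and columns are illustrative only.
create_proc = """
CREATE OR REPLACE PROCEDURE load_sales()
LANGUAGE plpgsql
AS $$
BEGIN
    INSERT INTO sales_clean
    SELECT * FROM sales_staging WHERE amount IS NOT NULL;
EXCEPTION
    WHEN OTHERS THEN
        -- Log the failure instead of letting it pass silently.
        INSERT INTO procedure_log (procedure_name, error_message, logged_at)
        VALUES ('load_sales', SQLERRM, CURRENT_TIMESTAMP);
END;
$$;
"""

redshift_data.execute_statement(
    WorkgroupName="example-workgroup",   # placeholder Redshift Serverless workgroup
    Database="dev",
    Sql=create_proc,
)
```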
As part of the solution workflow, EventBridge receives an event for each PCA solution analysis output file. Kinesis Data Firehose uses Lambda to perform data transformation and compression, storing the file in a compressed columnar format (Parquet) in the target S3 bucket.
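A Firehose transformation Lambda follows a fixed record contract (recordId, result, data). The sketch below is a minimal pass-through transformer; the actual enrichment logic is a placeholder, and Parquet conversion is typically configured on the delivery stream itself rather than in the function.

```python
import base64
import json

def handler(event, context):
    """Minimal Firehose transformation Lambda: decode each record, apply a
    placeholder transformation, and return it in the shape Firehose expects."""
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))

        # Placeholder transformation: real logic would enrich, filter,
        # or reshape the PCA output here.
        payload["processed"] = True

        output.append({
            "recordId": record["recordId"],
            "result": "Ok",  # or "Dropped" / "ProcessingFailed"
            "data": base64.b64encode(json.dumps(payload).encode()).decode(),
        })
    return {"records": output}
```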
With EMR Serverless, you don’t have to configure, optimize, secure, or operate clusters to run applications with these frameworks. You can run analytics workloads at any scale with automatic scaling that resizes resources in seconds to meet changing data volumes and processing requirements.
When it comes to data modeling, function determines form. Let’s say you want to subject a dataset to some form of anomaly detection; your model might take the form of a singular event stream that can be read by an anomaly detection service (after transformations such as filling in nulls, changing time zones, formatting strings, conditional logic, etc.).
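A quick pandas sketch of the kinds of row-level transformations listed above; the column names and values are invented purely for illustration.

```python
import numpy as np
import pandas as pd

# Invented event data; only the transformation patterns matter here.
events = pd.DataFrame({
    "event_time": pd.to_datetime(["2024-01-01 12:00:00", "2024-01-01 13:30:00"], utc=True),
    "sensor_id": [" s-001 ", "S-002"],
    "reading": [42.0, np.nan],
})

events["reading"] = events["reading"].fillna(events["reading"].mean())          # fill nulls
events["event_time"] = events["event_time"].dt.tz_convert("America/New_York")   # change time zones
events["sensor_id"] = events["sensor_id"].str.strip().str.upper()               # format strings
events["is_outlier"] = np.where(events["reading"] > 100, True, False)           # conditional logic
```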
We use Apache Spark as our main data processing engine and have over 1,000 Spark applications running over massive amounts of data every day. These Spark applications implement our business logic, ranging from data transformation and machine learning (ML) model inference to operational tasks. Their costs were climbing.
DataBrew is a visual data preparation tool that enables you to clean and normalize data without writing any code. The over 200 transformations it provides are now available to be used in an AWS Glue Studio visual job. We can use knowledge of the data to optimize the join by filtering down to the data we really need.
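To illustrate the filter-before-join idea outside the visual editor, here is a small Spark DataFrame sketch; the S3 paths, columns, and filter conditions are invented and are not taken from the excerpt's Glue Studio job.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("filter-before-join").getOrCreate()

# Invented paths and columns; the point is to filter before joining so the
# join processes only the rows that are actually needed.
events = spark.read.parquet("s3://example-bucket/events/")
venues = spark.read.parquet("s3://example-bucket/venues/")

recent_events = events.filter(F.col("event_date") >= "2024-01-01")
active_venues = venues.filter(F.col("is_active") == True)

joined = recent_events.join(active_venues, on="venue_id", how="inner")
joined.write.mode("overwrite").parquet("s3://example-bucket/enriched-events/")
```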
When you start the process of designing your data model for Amazon Keyspaces, it’s essential to possess a comprehensive understanding of your access patterns, similar to the approach used in other NoSQL databases. Additionally, you can configure OpenSearch Ingestion to apply data transformations before delivery.
The key idea behind incremental queries is to use metadata or change tracking mechanisms to identify the new or modified data since the last query. By identifying these changes, the query engine can optimize the query to process only the relevant data, significantly reducing the processing time and resource requirements.
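A generic illustration of that watermark pattern is below, using SQLite as a stand-in for any SQL engine; the table, column, and checkpoint names are made up and the approach is not tied to a particular query engine.

```python
import sqlite3

# In-memory stand-in for a warehouse table plus a checkpoint of the last run.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, amount REAL, updated_at TEXT);
    INSERT INTO orders VALUES
        (1, 10.0, '2024-01-01 00:00:00'),
        (2, 25.0, '2024-03-01 09:30:00');
    CREATE TABLE etl_checkpoint (job TEXT, last_updated_at TEXT);
    INSERT INTO etl_checkpoint VALUES ('orders', '2024-02-01 00:00:00');
""")

# Read the high-water mark, then scan only rows modified since the last query.
watermark = conn.execute(
    "SELECT last_updated_at FROM etl_checkpoint WHERE job = 'orders'"
).fetchone()[0]
changed = conn.execute(
    "SELECT * FROM orders WHERE updated_at > ?", (watermark,)
).fetchall()
print(changed)  # only order 2, the row changed after the checkpoint

# Advance the checkpoint so the next run starts from the new high-water mark.
conn.execute(
    "UPDATE etl_checkpoint SET last_updated_at = '2024-03-01 09:30:00' WHERE job = 'orders'"
)
```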
Curated foundation models, such as those created by IBM or Microsoft, help enterprises scale and accelerate the use and impact of the most advanced AI capabilities using trusted data. In addition to natural language, models are trained on various modalities, such as code, time-series, tabular, geospatial, and IT events data.
Finally, CFM uses an AWS Graviton architecture to further optimize cost and performance (as highlighted in the screenshot below). Another advantage was improved security, particularly the ability to contain the potential impact area in the event of credential leaks or account compromises.
Additionally, the scale is significant because the multi-tenant data sources provide a continuous stream of testing activity, and our users require quick data refreshes as well as historical context for up to a decade due to compliance and regulatory demands. Finally, data integrity is of paramount importance.