This article was published as a part of the Data Science Blogathon. Introduction In this article, we are going to talk about data streaming with Apache Spark in Python, with code examples. We will also talk about how to persist our streaming data into MongoDB.
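As a rough sketch of that idea, the snippet below uses Spark Structured Streaming's foreachBatch sink together with pymongo to write each micro-batch into MongoDB. The socket source, database name, and connection string are placeholders, not the article's exact setup.

```python
# Minimal sketch: stream text lines from a socket source and persist each micro-batch
# to MongoDB. Assumes a local MongoDB instance and `pip install pymongo`.
from pyspark.sql import SparkSession
from pymongo import MongoClient

spark = SparkSession.builder.appName("stream-to-mongo").getOrCreate()

# Unbounded streaming DataFrame (here: lines arriving on a local socket).
lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

def write_to_mongo(batch_df, batch_id):
    """Called once per micro-batch; converts rows to dicts and inserts them."""
    docs = [row.asDict() for row in batch_df.collect()]
    if docs:
        client = MongoClient("mongodb://localhost:27017")  # placeholder URI
        client["streaming_demo"]["events"].insert_many(docs)
        client.close()

query = lines.writeStream.foreachBatch(write_to_mongo).start()
query.awaitTermination()
```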
This article was published as a part of the Data Science Blogathon. Introduction When we mention Big Data, one of the types of data usually talked about is streaming data. Streaming data is generated continuously by multiple data sources, say, sensors, server logs, stock prices, etc.
Introduction Companies can access a large pool of data in the modern business environment, and using this data in real time may produce insightful results that can spur corporate success. Real-time dashboards built on platforms such as GCP provide strong data visualization and actionable information for decision-makers.
In this post, we show how to use Amazon Kinesis Data Streams to buffer and aggregate real-time streaming data for delivery into Amazon OpenSearch Service domains and collections using Amazon OpenSearch Ingestion. This decoupling provides advantages over traditional architectures.
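The OpenSearch Ingestion pipeline itself is configured on the AWS side, but the producer half of such an architecture can be as small as the boto3 sketch below; the stream name, region, and event shape are illustrative assumptions only.

```python
# Illustrative producer: push JSON events into a Kinesis data stream that an
# OpenSearch Ingestion pipeline could later read from. Stream name is a placeholder.
import json
import time
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

def send_event(event: dict, stream_name: str = "clickstream-demo") -> None:
    kinesis.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=str(event.get("user_id", "anonymous")),
    )

if __name__ == "__main__":
    for i in range(10):
        send_event({"user_id": i % 3, "action": "page_view", "ts": time.time()})
```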
In this guide, we’ll walk through how streaming real-time intent data can supercharge your ABM strategy, including how streaming real-time intent works, the benefits of real-time intent in your ABM strategy, and how you can box out the competition. Learn how capturing buyers’ search behavior in real time can shorten your sales cycle.
Overview Streaming data is a thriving concept in the machine learning space. Learn how to use a machine learning model (such as logistic regression) to make predictions on streaming data. The post How to use a Machine Learning Model to Make Predictions on Streaming Data using PySpark appeared first on Analytics Vidhya.
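A condensed sketch of that pattern: train (or load) a pyspark.ml pipeline offline, then apply it with transform() to a streaming DataFrame. The schema, input directory, and saved-model path below are assumptions, not the post's exact code.

```python
# Sketch: score a streaming DataFrame with a previously trained pyspark.ml pipeline.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, DoubleType
from pyspark.ml import PipelineModel

spark = SparkSession.builder.appName("streaming-predictions").getOrCreate()

schema = StructType([
    StructField("feature_1", DoubleType()),
    StructField("feature_2", DoubleType()),
])

# New CSV files dropped into this directory are treated as an unbounded stream.
stream_df = spark.readStream.schema(schema).csv("/tmp/incoming_csvs")

# A pipeline (e.g., VectorAssembler + LogisticRegression) trained and saved earlier.
model = PipelineModel.load("/tmp/models/logreg_pipeline")

predictions = model.transform(stream_df).select("feature_1", "feature_2", "prediction")

query = predictions.writeStream.format("console").outputMode("append").start()
query.awaitTermination()
```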
Introduction Welcome to our comprehensive data analysis blog that delves deep into the world of Netflix. As one of the leading streaming platforms globally, Netflix has revolutionized how we consume entertainment. With its vast library of movies and TV shows, it offers an abundance of choices for viewers around the world.
Overview Learn about viewing data as streams of immutable events, in contrast to mutable containers, and understand how Apache Kafka captures real-time data through events. The post Apache Kafka: A Metaphorical Introduction to Event Streaming for Data Scientists and Data Engineers appeared first on Analytics Vidhya.
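To make the "immutable events" framing concrete, here is a minimal producer/consumer pair using the kafka-python package; the broker address and topic name are assumptions for a local setup.

```python
# Minimal sketch with kafka-python (`pip install kafka-python`): append immutable
# events to a topic and read them back. Broker and topic names are placeholders.
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("page-views", {"user": "alice", "url": "/pricing"})
producer.flush()

consumer = KafkaConsumer(
    "page-views",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    consumer_timeout_ms=5000,  # stop iterating if no new events arrive
)
for event in consumer:
    print(event.value)  # each record is an immutable fact about something that happened
```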
Confluent Platform is a streaming platform built by the original creators of Apache Kafka. It enables organizations to organize and manage streaming data from various sources. Confluent launched its IPO in June 2021 and raised $828 million to further expand its business.
A recent Calabrio research study of more than 1,000 C-Suite executives has revealed leaders are missing a key data stream – voice of the customer data. Download the report to learn how executives can find and use VoC data to make more informed business decisions.
Enterprises worldwide are harboring massive amounts of data. Although data has always accumulated naturally, the result of ever-growing consumer and business activity, data growth is expanding exponentially, opening opportunities for organizations to monetize unprecedented amounts of information.
This article was published as a part of the Data Science Blogathon. Introduction This article is a continuation of Part 1, where we discussed the data. The post Image Classification with Tensorflow: Data Augmentation on Streaming Data (Part 2) appeared first on Analytics Vidhya.
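As a generic sketch of on-the-fly augmentation (not necessarily the exact approach in the post), Keras preprocessing layers can be mapped over a tf.data pipeline so each batch is augmented as it streams in; the directory path, image size, and augmentation choices are placeholders.

```python
# Sketch: augment images on the fly as they stream through a tf.data pipeline.
import tensorflow as tf

data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
])

train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=(180, 180), batch_size=32
)

# Apply augmentation per batch; prefetch keeps the input pipeline streaming ahead of training.
train_ds = (train_ds
            .map(lambda x, y: (data_augmentation(x, training=True), y),
                 num_parallel_calls=tf.data.AUTOTUNE)
            .prefetch(tf.data.AUTOTUNE))
```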
Amazon Kinesis Data Analytics for SQL is a data stream processing engine that helps you run your own SQL code against streaming sources to perform time series analytics, feed real-time dashboards, and create real-time metrics. Apache Flink is a distributed open source engine for processing data streams.
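On the Apache Flink side, a minimal local PyFlink job (independent of any particular Kinesis Data Analytics application, with made-up data) looks roughly like this:

```python
# Minimal local PyFlink sketch: build a small data stream, transform it, and print it.
# Requires `pip install apache-flink`; the readings below are made up for illustration.
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()

readings = env.from_collection([
    ("sensor-1", 21.5),
    ("sensor-2", 19.8),
    ("sensor-1", 22.1),
])

# Convert readings from Celsius to Fahrenheit and keep the sensor id.
readings.map(lambda r: (r[0], r[1] * 9 / 5 + 32)).print()

env.execute("celsius-to-fahrenheit")
```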
Introduction We are aware of the massive amounts of data being produced each day. This humongous data holds lots of insights and hidden trends. The post Analysing Streaming Tweets with Python and PostgreSQL appeared first on Analytics Vidhya.
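The storage half of that pipeline can be sketched with psycopg2; the table layout and the stand-in tweet source below are assumptions (the article itself pulls live tweets from the Twitter API).

```python
# Sketch: persist a stream of tweet-like records into PostgreSQL with psycopg2.
# Connection details and table schema are placeholders; fake_tweet_stream stands in
# for a real Twitter API client.
import psycopg2

def fake_tweet_stream():
    yield {"user": "alice", "text": "streaming is fun"}
    yield {"user": "bob", "text": "postgres + python"}

conn = psycopg2.connect(host="localhost", dbname="tweets_db",
                        user="postgres", password="postgres")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS tweets (
            id SERIAL PRIMARY KEY,
            username TEXT,
            body TEXT
        )
    """)
    for tweet in fake_tweet_stream():
        cur.execute("INSERT INTO tweets (username, body) VALUES (%s, %s)",
                    (tweet["user"], tweet["text"]))
conn.close()
```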
While data platforms, artificial intelligence (AI), machine learning (ML), and programming platforms have evolved to leverage big data and streaming data, the front-end user experience has not kept up. Traditional Business Intelligence (BI) tools aren’t built for modern data platforms and don’t work on modern architectures.
Amazon Kinesis Data Streams is used by many customers to capture, process, and store data streams at any scale. This level of unparalleled scale is enabled by dividing each data stream into multiple shards. Each shard in a stream has a write throughput limit of 1 MB/s or 1,000 records per second.
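A small illustration of what those per-shard limits imply: listing a stream's shards with boto3 and estimating aggregate write capacity. The stream name is a placeholder, and pagination of the shard list is ignored for brevity.

```python
# Sketch: list the shards of a Kinesis data stream and estimate total write capacity
# from the per-shard limits (1 MB/s or 1,000 records/s per shard).
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")
stream_name = "clickstream-demo"  # placeholder

shards = kinesis.list_shards(StreamName=stream_name)["Shards"]  # ignores NextToken pagination
print(f"{stream_name} has {len(shards)} shard(s)")
print(f"~{len(shards)} MB/s or {len(shards) * 1000} records/s of aggregate write throughput")
```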
Apache Spark is a powerful big data engine used for large-scale data analytics. You can use Apache Spark to process streaming data from a variety of streaming sources, including Amazon Kinesis Data Streams, for use cases like clickstream analysis, fraud detection, and more.
In modern data architectures, Apache Iceberg has emerged as a popular table format for data lakes, offering key features including ACID transactions and concurrent write support. Consider a common scenario: A streaming pipeline continuously writes data to an Iceberg table while scheduled maintenance jobs perform compaction operations.
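A rough sketch of the streaming-writer half of that scenario, using Spark Structured Streaming's Iceberg sink; the catalog configuration, warehouse path, and table name are placeholders and assume the Iceberg Spark runtime package is on the classpath.

```python
# Sketch: continuously append a rate-source stream to an Iceberg table while separate
# maintenance jobs (not shown) compact it. Names and paths are illustrative only.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("iceberg-streaming-writer")
         .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
         .config("spark.sql.catalog.demo.type", "hadoop")
         .config("spark.sql.catalog.demo.warehouse", "/tmp/iceberg-warehouse")
         .getOrCreate())

spark.sql("CREATE TABLE IF NOT EXISTS demo.db.events "
          "(timestamp TIMESTAMP, value BIGINT) USING iceberg")

stream = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

query = (stream.writeStream
         .format("iceberg")
         .outputMode("append")
         .option("checkpointLocation", "/tmp/iceberg-checkpoints/events")
         .toTable("demo.db.events"))
query.awaitTermination()
```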
In a streaming architecture, you may have event producers, stream storage, and event consumers in a single account or spread across different accounts depending on your business and IT requirements. Amazon Kinesis Data Streams enables real-time processing of streaming data at scale.
Leading brands and local businesses alike are tapping into varied business and consumer data to power their products and meet consumers’ ever-evolving needs. But companies need to remember that a product can only be as good as the data that powers it. Learn the criteria you should use to vet available data sources.
Introduction Data is the fuel of the IT industry and of data science projects in today’s online world. IT industries rely heavily on real-time insights derived from streaming data sources. Handling and processing streaming data is some of the hardest work in data analysis.
We’re living in the age of real-time data and insights, driven by low-latency data streaming applications. The volume of time-sensitive data produced is increasing rapidly, with different formats of data being introduced across new businesses and customer use cases.
Learn a modern approach to stream real-time data in Jupyter Notebook. This guide covers dynamic visualizations, a quantitative finance use case in Python, and Bollinger Bands analysis with live data.
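Independent of any particular notebook tool, the Bollinger Bands calculation itself is easy to sketch with pandas: a rolling mean plus or minus k rolling standard deviations. The prices below are synthetic stand-ins; in a live notebook the series would grow as new ticks stream in.

```python
# Sketch: compute Bollinger Bands (20-period rolling mean +/- 2 standard deviations)
# over a synthetic price series.
import numpy as np
import pandas as pd

prices = pd.Series(100 + np.random.randn(500).cumsum(), name="price")

window, k = 20, 2
middle = prices.rolling(window).mean()
std = prices.rolling(window).std()
bands = pd.DataFrame({
    "price": prices,
    "middle": middle,
    "upper": middle + k * std,
    "lower": middle - k * std,
})
print(bands.tail())
```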
Real-time data streaming has become prominent in today’s world of instantaneous digital experiences. Processing these data streams in real time is key to delivering responsive and personalized solutions, and it maximizes the value of data by processing it as close to the event time as possible.
Speaker: Azmat Tanauli, Senior Director of Product Strategy at Birst
How much potential revenue is hidden in your data? In a recent Economist survey of 476 senior executives worldwide, 60% are already generating revenue from their data, and a whopping 83% have used data to make existing products or services more profitable.
This article was published as a part of the Data Science Blogathon. Introduction to Apache Flume Apache Flume is a data ingestion mechanism for gathering, aggregating, and transmitting huge amounts of streaming data from diverse sources, such as log files, events, and so on, to a centralized data storage.
Amazon Kinesis Data Streams is a serverless data streaming service that makes it straightforward to capture and store streaming data at any scale. By abstracting away low-level concerns such as shard management and checkpointing, the Kinesis Client Library (KCL) allows developers to focus on what matters most: implementing their core business logic for processing streaming data.
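To show what KCL saves you from, here is a deliberately low-level boto3 consumer sketch that iterates a single shard by hand; a real KCL application would instead handle shard discovery, checkpointing, and worker coordination for you. The stream name is a placeholder.

```python
# Low-level sketch (NOT KCL): manually iterate one shard of a Kinesis stream with boto3.
# KCL exists precisely so you don't have to write loops like this for every shard.
import time
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")
stream_name = "clickstream-demo"  # placeholder

shard_id = kinesis.list_shards(StreamName=stream_name)["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=stream_name, ShardId=shard_id, ShardIteratorType="LATEST"
)["ShardIterator"]

while True:
    resp = kinesis.get_records(ShardIterator=iterator, Limit=100)
    for record in resp["Records"]:
        print(record["Data"])  # business logic would go here
    iterator = resp["NextShardIterator"]
    time.sleep(1)  # stay under the per-shard read limits
```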
Introduction Starting with the fundamentals: What is a data stream, also referred to as an event stream or streaming data? At its heart, a data stream is a conceptual framework representing a dataset that is perpetually open-ended and expanding.
This article was published as a part of the Data Science Blogathon. Introduction Amazon Kinesis is one of the best managed services that scales particularly flexibly, especially for processing real-time data at massive scale.
Speaker: Javier Ramírez, Senior AWS Developer Advocate, AWS
You have lots of data, and you are probably thinking of using the cloud to analyze it. But how will you move data into the cloud? How will you validate and prepare the data? What about streaming data? Can data scientists discover and use the data? Is your data secure? In which format?
Financial services customers are using data from different sources that originate at different frequencies, including real-time, batch, and archived datasets. Additionally, they need streaming architectures to handle growing trade volumes, market volatility, and regulatory demands.
This article was published as a part of the Data Science Blogathon. Introduction Data acclimates to countless shapes and sizes to complete its journey from a source to a destination. Be it a streaming job or a batch job, ETL and ELT are irreplaceable.
I was recently asked to identify key modern data architecture trends. Data architectures have changed significantly to accommodate larger volumes of data as well as new types of data such as streaming and unstructured data. Here are some of the trends I see continuing to impact data architectures.
Towards the end of 2022, AWS announced the general availability of real-time streaming ingestion to Amazon Redshift for Amazon Kinesis Data Streams and Amazon Managed Streaming for Apache Kafka (Amazon MSK), eliminating the need to stage streaming data in Amazon Simple Storage Service (Amazon S3) before ingesting it into Amazon Redshift.
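The Redshift side of that feature boils down to an external schema plus a materialized view over the stream. The sketch below runs those statements through the redshift_connector package; the IAM role ARN, stream name, and connection details are placeholders, and the exact SQL should be checked against the current Redshift streaming ingestion documentation.

```python
# Sketch: set up Redshift streaming ingestion from a Kinesis stream via SQL,
# executed with redshift_connector. All identifiers below are placeholders.
import redshift_connector

conn = redshift_connector.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    database="dev", user="awsuser", password="********",
)
cur = conn.cursor()
cur.execute("""
    CREATE EXTERNAL SCHEMA IF NOT EXISTS kinesis_schema
    FROM KINESIS
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-streaming-role'
""")
cur.execute("""
    CREATE MATERIALIZED VIEW clickstream_mv AUTO REFRESH YES AS
    SELECT approximate_arrival_timestamp,
           JSON_PARSE(kinesis_data) AS payload
    FROM kinesis_schema."clickstream-demo"
""")
conn.commit()
conn.close()
```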
This article was published as a part of the Data Science Blogathon. Introduction As learners, we are mostly at the stage of analyzing data in CSV format. Still, we need to understand that at the enterprise level, most of the work is done in real time, where we need the skills to stream live data. […].
Introduction Apache Kafka is a framework for dealing with many real-time data streams in a distributed way. It was created at LinkedIn and open-sourced in 2011.
This article was published as a part of the Data Science Blogathon. Introduction Artificial intelligence (AI) is the most dynamic stream in the world. Humans have always been curious about their abilities to predict, understand, act, and make decisions.
Amazon Redshift Serverless is a fully managed, scalable cloud data warehouse that accelerates your time to insights with fast, simple, and secure analytics at scale. Amazon Redshift data sharing allows you to share data within and across organizations, AWS Regions, and even third-party providers, without moving or copying the data.
Databricks is a data engineering and analytics cloud platform built on top of Apache Spark that processes and transforms huge volumes of data and offers data exploration capabilities through machine learning models. The platform supports streaming data, SQL queries, graph processing and machine learning.
Whether we are analyzing IoT data streams, managing scheduled events, processing document uploads, or responding to database changes, Azure Functions allow developers […] The post How to Develop Serverless Code Using Azure Functions?
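For orientation, a minimal HTTP-triggered Azure Function in Python (v2 programming model) looks roughly like the sketch below; the route name is arbitrary, and the surrounding function app scaffolding (host.json, local.settings.json) is assumed to already exist.

```python
# Sketch: a minimal HTTP-triggered Azure Function using the Python v2 programming model.
# Run locally with the Azure Functions Core Tools; the route name is a placeholder.
import azure.functions as func

app = func.FunctionApp()

@app.route(route="hello", auth_level=func.AuthLevel.ANONYMOUS)
def hello(req: func.HttpRequest) -> func.HttpResponse:
    # Read an optional query parameter and return a plain-text greeting.
    name = req.params.get("name", "world")
    return func.HttpResponse(f"Hello, {name}!")
```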
Introduction Apache Flume is a tool/service/data ingestion mechanism for gathering, aggregating, and delivering huge amounts of streaming data from diverse sources, such as log files, events, and so on, to centralized data storage. Flume is a tool that is very dependable, distributed, and customizable.
Data warehousing, business intelligence, data analytics, and AI services are all coming together under one roof at Amazon Web Services. The combined offering brings together SQL analytics, data processing, AI development, data streaming, business intelligence, and search analytics.
This article was published as a part of the Data Science Blogathon. Introduction One of the major problems everyone faces when they first […] The post Setting up Real-time Structured Streaming with Spark and Kafka on Windows OS appeared first on Analytics Vidhya.
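The core of such a setup (independent of the Windows-specific configuration the post walks through) is a streaming read from Kafka; the broker address, topic name, and the spark-sql-kafka connector version are assumptions.

```python
# Sketch: read a Kafka topic with Spark Structured Streaming and print messages to the
# console. Requires the spark-sql-kafka connector matching your Spark version on the
# classpath; broker and topic names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (SparkSession.builder
         .appName("kafka-structured-streaming")
         .getOrCreate())

kafka_df = (spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", "localhost:9092")
            .option("subscribe", "page-views")
            .option("startingOffsets", "latest")
            .load())

# Kafka delivers keys/values as binary; cast to strings for display.
messages = kafka_df.select(col("key").cast("string"), col("value").cast("string"))

query = (messages.writeStream
         .format("console")
         .outputMode("append")
         .start())
query.awaitTermination()
```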
This article was published as a part of the Data Science Blogathon. Introduction on Apache Flume Apache Flume is a platform for aggregating, collecting, and transporting massive volumes of log data quickly and effectively. Its design is simple, based on streaming data flows, and written in the Java programming […].