Data Leaders Brief

Build multi-Region resilient Apache Kafka applications with identical topic names using Amazon MSK and Amazon MSK Replicator

AWS Big Data

MARCH 25, 2025

Amazon Managed Streaming for Apache Kafka (Amazon MSK) is deployed across multiple Availability Zones and provides resilience within an AWS Region. This post explains how to use MSK Replicator for cross-cluster data replication and details the failover and failback processes while keeping the same topic name across Regions.

Metrics Testing Management Risk

Migrate from Amazon Kinesis Data Analytics for SQL to Amazon Managed Service for Apache Flink and Amazon Managed Service for Apache Flink Studio

AWS Big Data

OCTOBER 17, 2024

Amazon Kinesis Data Analytics for SQL is a data stream processing engine that helps you run your own SQL code against streaming sources to perform time series analytics, feed real-time dashboards, and create real-time metrics. Customers running SQL queries typically select Amazon Managed Service for Apache Flink Studio.

Data Analytics Management Analytics Recreation/Entertainment

What is the Difference Between Data Science and Machine Learning?

Analytics Vidhya

JUNE 26, 2023

“Data Science” and “Machine Learning” are prominent technology topics in the 21st century. They are used by a wide range of people and organizations, from novice computer science students to major companies like Netflix and Amazon.

Machine Learning Data Science Big Data Measurement

How Does Amazon SNS Work?

Analytics Vidhya

AUGUST 21, 2022

Amazon Simple Notification Service (SNS) is a managed service that delivers messages from publishers to subscribers (also known as producers and consumers). Publishers communicate asynchronously by sending messages to a topic, which serves as a logical access point and communication channel for […]

Data Science Publishing Management Analytics

Fitch Group achieves multi-Region resiliency for mission-critical Kafka infrastructure with Amazon MSK Replicator

AWS Big Data

DECEMBER 23, 2024

Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed service that allows you to build and run production Kafka applications. At the heart of this ecosystem lies Kafka, specifically Amazon MSK, which serves as the backbone for Fitch Group’s data integration systems.

Data-driven Management Risk Big Data

5 key areas for tech leaders to watch in 2020

O'Reilly on Data

FEBRUARY 18, 2020

O’Reilly online learning contains information about the trends, topics, and issues tech leaders need to watch and explore. It’s also the data source for our annual usage study, which examines the most-used topics and the top search terms [1]. Until 2017, the ML+AI topic had been among the fastest-growing topics on the platform.

Data-driven Software Statistics Marketing

Simplify data ingestion from Amazon S3 to Amazon Redshift using auto-copy

AWS Big Data

OCTOBER 30, 2024

Amazon Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that makes it simple and cost-effective to analyze your data using standard SQL and your existing business intelligence (BI) tools. Data ingestion is the process of getting data to Amazon Redshift.

Data Warehouse Sales Data Lake Recreation/Entertainment

Elevate your search and analytics skills with the new Amazon OpenSearch Service YouTube channel

AWS Big Data

OCTOBER 17, 2024

We’re thrilled to announce the launch of the official Amazon OpenSearch Service YouTube channel, a comprehensive resource for anyone looking to master Amazon OpenSearch Service. Amazon OpenSearch Service is a managed service that makes it straightforward to deploy, operate, and scale OpenSearch domains in AWS.

Analytics Optimization Data-driven Data Architecture

Amazon OpenSearch Service launches the next-generation OpenSearch UI

AWS Big Data

NOVEMBER 7, 2024

Amazon OpenSearch Service launches a modernized operational analytics experience that can provide comprehensive observability spanning multiple data sources, so that you can gain insights from OpenSearch and other integrated data sources in one place. You can add collaborators by their IAM Amazon Resource Name (ARN) or IDC username.

Dashboards Visualization Data-driven Management

How REA Group approaches Amazon MSK cluster capacity planning

AWS Big Data

DECEMBER 5, 2024

REA Group, a digital business that specializes in real estate property, solved this problem using Amazon Managed Streaming for Apache Kafka (Amazon MSK) and a data streaming platform called Hydro. REA Group’s team of more than 3,000 people is guided by our purpose: to change the way the world experiences property.

Metrics Dashboards Testing Optimization

Migrate from Standard brokers to Express brokers in Amazon MSK using Amazon MSK Replicator

AWS Big Data

FEBRUARY 13, 2025

Amazon Managed Streaming for Apache Kafka (Amazon MSK) now offers a new broker type called Express brokers. To learn more about Express brokers, refer to Introducing Express brokers for Amazon MSK to deliver high throughput and faster scaling for your Kafka clusters.

Metrics Metadata Strategy Management

Use Amazon Kinesis Data Streams to deliver real-time data to Amazon OpenSearch Service domains with Amazon OpenSearch Ingestion

AWS Big Data

NOVEMBER 11, 2024

In this post, we show how to use Amazon Kinesis Data Streams to buffer and aggregate real-time streaming data for delivery into Amazon OpenSearch Service domains and collections using Amazon OpenSearch Ingestion. For the Amazon S3 log use case, see Using an OpenSearch Ingestion pipeline with Amazon S3.

Metadata Metrics Analytics Data Processing

Infor’s Velocity Summit Highlights Multiple Advances and Enhancements

David Menninger's Analyst Perspectives

NOVEMBER 12, 2024

Infor introduced its original AI and machine learning capabilities in 2017 in the form of Coleman, which uses its Infor AI/ML platform built on Amazon SageMaker to create predictive and prescriptive analytics. It also offered a chatbot that used Amazon Lex.

Finance Prescriptive Analytics Cost-Benefit Manufacturing

Unbundling the Graph in GraphRAG

O'Reilly on Data

NOVEMBER 19, 2024

Large-scale production recommenders, search engines, and other discovery processes also have a long history of leveraging knowledge graphs, such as at Amazon, Alphabet, Microsoft, LinkedIn, eBay, Pinterest, and so on. What is GraphRAG? Graph technologies help reveal nonintuitive connections within data.

Unstructured Data Structured Data Statistics Modeling

How EchoStar ingests terabytes of data daily across its 5G Open RAN network in near real-time using Amazon Redshift Serverless Streaming Ingestion

AWS Big Data

JULY 8, 2024

Amazon Redshift Serverless is a fully managed, scalable cloud data warehouse that accelerates your time to insights with fast, simple, and secure analytics at scale. Amazon Redshift data sharing allows you to share data within and across organizations, AWS Regions, and even third-party providers, without moving or copying the data.

Data Warehouse IT Recreation/Entertainment Cost-Benefit

Take Your SQL Skills To The Next Level With These Popular SQL Books

datapine

SEPTEMBER 27, 2022

These businesses include eBay, Autotrader, and Amazon. In other words, “Sams Teach Yourself SQL in 10 Minutes” teaches the parts of SQL you need to know: starting with simple data retrieval and quickly going on to more complex topics including the use of SQL joins, subqueries, stored procedures, cursors, triggers, and table constraints.

Business Intelligence Data Warehouse Data Processing Data mining

Improve Apache Kafka scalability and resiliency using Amazon MSK tiered storage

AWS Big Data

AUGUST 2, 2024

Since the launch of tiered storage for Amazon Managed Streaming for Apache Kafka (Amazon MSK), customers have embraced this feature for its ability to optimize storage costs and improve performance. New messages are initially written to Amazon EBS for fast performance. Older, closed log segments are then moved to the lower-cost tiered storage, which frees up space on the EBS volumes for new messages.

Metrics Testing Cost-Benefit Management

How to Set AI Goals

O'Reilly on Data

SEPTEMBER 15, 2020

User stakeholders are interested in benefiting from the platform’s functionality: staying up-to-date, quickly finding new people and topics to follow, and engaging with family and friends. Customer stakeholders are the people and companies that advertise on the platform, and are most concerned with ROI on their ad spend.

Advertising Cost-Benefit ROI Machine Learning

How VMware Tanzu CloudHealth migrated from self-managed Kafka to Amazon MSK

AWS Big Data

MARCH 14, 2024

… to Amazon Managed Streaming for Apache Kafka (Amazon MSK) running version 2.6.2. Why we migrated to Amazon MSK: For us, migrating to Amazon MSK came down to three key decision points. Simplified technical operations – running Kafka on self-managed infrastructure was an operational overhead for us.

Management Insurance Optimization Strategy

AI will evolve the role of the CIO

CIO Business Intelligence

NOVEMBER 4, 2024

Jacknis advises CIOs to focus on the three reasons why AI is such a hot topic. AI bias has already got organizations such as online retailer Amazon into hot water, and here again, the CIO must play a pivotal role in protecting the business. It’s back to Moore’s Law. We’ve already seen that AI depends on a lot of compute power.

Business Driver Advertising Data-driven Modeling

OpenSearch UI: Six months in review

AWS Big Data

MAY 23, 2025

With OpenSearch UI, you can have a unified interface to gain actionable insights across multiple data sources, including Amazon OpenSearch Service domains, Amazon OpenSearch Serverless collections, and AWS services such as Amazon CloudWatch and Amazon Security Lake.

Visualization Enterprise Dashboards Management

Stream multi-tenant data with Amazon MSK

AWS Big Data

JUNE 20, 2024

AWS helps SaaS vendors by providing the building blocks needed to implement a streaming application with Amazon Kinesis Data Streams and Amazon Managed Streaming for Apache Kafka (Amazon MSK), and real-time processing applications with Amazon Managed Service for Apache Flink. In particular, we focus on Amazon MSK.

Modeling Internet of Things Risk Management

Introducing Amazon MSK as a source for Amazon OpenSearch Ingestion

AWS Big Data

AUGUST 31, 2023

Ingesting a high volume of streaming data has been a defining characteristic of operational analytics workloads with Amazon OpenSearch Service. Many of these workloads involve either self-managed Apache Kafka or Amazon Managed Streaming for Apache Kafka (Amazon MSK) to satisfy their data streaming needs.

Testing Data Processing Dashboards Management

Core technologies and tools for AI, big data, and cloud computing

O'Reilly on Data

FEBRUARY 11, 2019

The resource examples I’ll cite will be drawn from the upcoming Strata Data conference in San Francisco, where leading companies and speakers will share their learnings on the topics covered in this post. AI and machine learning in the enterprise. Security and Privacy.

Big Data Technology Machine Learning Deep Learning

The Top 20 Data Visualization Books That Should Be On Your Bookshelf

datapine

SEPTEMBER 16, 2022

A mere Amazon search of this topic returns over 15k items. Though printed in 1983, it remains a classic and a bestseller on Amazon. Boasting near flawless reader reviews on Amazon, this graphically-driven book on data visualization makes an excellent companion when it comes to thriving in the digital age.

Visualization Dashboards Data-driven Statistics

Simplify data streaming ingestion for analytics using Amazon MSK and Amazon Redshift

AWS Big Data

FEBRUARY 21, 2024

Towards the end of 2022, AWS announced the general availability of real-time streaming ingestion to Amazon Redshift for Amazon Kinesis Data Streams and Amazon Managed Streaming for Apache Kafka (Amazon MSK), eliminating the need to stage streaming data in Amazon Simple Storage Service (Amazon S3) before ingesting it into Amazon Redshift.

Analytics Data-driven Management Data Integration

Build an end-to-end serverless streaming pipeline with Apache Kafka on Amazon MSK using Python

AWS Big Data

MARCH 21, 2024

AWS offers multiple serverless services like Amazon Managed Streaming for Apache Kafka (Amazon MSK), Amazon Data Firehose, Amazon DynamoDB, and AWS Lambda that scale automatically depending on your needs. Create a serverless Kafka cluster on Amazon MSK: We use Amazon MSK to ingest real-time telemetry data from modems.

Data Lake Management Modeling Optimization

I Actually Chatted with ChatGPT

O'Reilly on Data

JANUARY 16, 2024

I’m personally interested in this topic since I am a professor who researches human-computer interaction, user experience design, and cognitive science, so AI voice interfaces are fascinating to me. That in turn got me curious about the concept of syndication in the television business, so ChatGPT dived more into this topic.

Interactive Recreation/Entertainment Visualization IT

Stitch Fix seamless migration: Transitioning from self-managed Kafka to Amazon MSK

AWS Big Data

SEPTEMBER 22, 2023

In this post, we describe how and why we decided to migrate from self-managed Kafka to Amazon Managed Streaming for Apache Kafka (Amazon MSK). We’ll start with an overview of our self-managed Kafka, why we chose to migrate to Amazon MSK, and ultimately how we did it. … in “newkafka.”

Management Metrics Cost-Benefit Data Lake

Build event-driven architectures with Amazon MSK and Amazon EventBridge

AWS Big Data

SEPTEMBER 28, 2023

In event-driven architectures (EDAs), modern event brokers such as Amazon EventBridge and Apache Kafka play a key role in publishing and subscribing to events. There are two ways to send events from Apache Kafka to EventBridge: Amazon EventBridge Pipes (the preferred method) or the EventBridge sink connector for Kafka Connect.

Data-driven Metrics Publishing Management

How SOCAR handles large IoT data with Amazon MSK and Amazon ElastiCache for Redis

AWS Big Data

MAY 3, 2023

In this post, we provide a detailed overview of streaming messages with Amazon Managed Streaming for Apache Kafka (Amazon MSK) and Amazon ElastiCache for Redis, covering technical aspects and design considerations that are essential for achieving optimal results. The following figure shows an example of the data flow at SOCAR.

IoT Internet of Things Data Transformation Management

Amazon MSK IAM authentication now supports all programming languages

AWS Big Data

NOVEMBER 13, 2023

The AWS Identity and Access Management (IAM) authentication feature in Amazon Managed Streaming for Apache Kafka (Amazon MSK) now supports all programming languages. Both Amazon MSK provisioned and serverless cluster types support the new Amazon MSK IAM expansion to all programming languages.

Testing Management Consulting IT

Break data silos and stream your CDC data with Amazon Redshift streaming and Amazon MSK

AWS Big Data

DECEMBER 13, 2023

This post showcases how to use streaming ingestion to bring data to Amazon Redshift. It’s simple to set up, and it ingests streaming data directly into your data warehouse from Amazon Kinesis Data Streams and Amazon Managed Streaming for Apache Kafka (Amazon MSK) without the need to stage it in Amazon Simple Storage Service (Amazon S3).

Data Warehouse Snapshot Data Processing Internet of Things

Webinar Summary: Agile, DataOps, and Data Team Excellence

DataKitchen

APRIL 11, 2024

The webinar, hosted by Christopher Bergh with Gil Benghiat from DataKitchen, covered a comprehensive range of topics centered on improving the performance and efficiency of data teams through Agile and DataOps methodologies.

Statistics Manufacturing Data Processing Data Quality

Create a modern data platform using the Data Build Tool (dbt) in the AWS Cloud

AWS Big Data

NOVEMBER 9, 2023

dbt enables you to write SQL select statements, and then it manages turning these select statements into tables or views in Amazon Redshift. Queues and topics – Queues and topics come from various integration applications that generate data in real time.

Data Warehouse Testing Data Quality Reporting

Amazon Redshift data ingestion options

AWS Big Data

SEPTEMBER 5, 2024

Amazon Redshift, a data warehousing service, offers a variety of options for ingesting data from diverse sources into its high-performance, scalable environment. This native feature of Amazon Redshift uses massively parallel processing (MPP) to load objects directly from data sources into Redshift tables.

IoT Data Warehouse Cost-Benefit Reporting

Securely process near-real-time data from Amazon MSK Serverless using an AWS Glue streaming ETL job with IAM authentication

AWS Big Data

SEPTEMBER 13, 2023

To address these issues effectively, we propose using Amazon Managed Streaming for Apache Kafka (Amazon MSK), a fully managed Apache Kafka service that offers a seamless way to ingest and process streaming data. Following the data processing, the streaming job stores data in Amazon S3 and generates a Data Catalog table.

Data Processing Management Interactive Metadata

Introducing self-managed data sources for Amazon OpenSearch Ingestion

AWS Big Data

JULY 1, 2024

Enterprise customers increasingly adopt Amazon OpenSearch Ingestion (OSI) to bring data into Amazon OpenSearch Service for various use cases. These sources can either be on Amazon Elastic Compute Cloud (Amazon EC2) or on-premises environments. Name resolution for data sources – OSI uses an Amazon Route 53 resolver.

Management Data Processing Publishing Analytics

Nexthink scales to trillions of events per day with Amazon MSK

AWS Big Data

MARCH 29, 2024

In this post, Nexthink shares how Amazon Managed Streaming for Apache Kafka (Amazon MSK) empowered them to achieve massive scale in event processing. With Amazon MSK, Nexthink now seamlessly processes trillions of events per day, reaching over 5 GB per second of aggregated throughput.

Data-driven Cost-Benefit Metrics Management

Introducing support for Apache Kafka on Raft mode (KRaft) with Amazon MSK clusters

AWS Big Data

MAY 29, 2024

Organizations are adopting Apache Kafka and Amazon Managed Streaming for Apache Kafka (Amazon MSK) to capture and analyze data in real time. Since its inception, Apache Kafka has depended on Apache ZooKeeper for storing and replicating the metadata of Kafka brokers and topics. Starting from Apache Kafka version 3.3, …

Metadata Cost-Benefit Management Big Data

Build streaming data pipelines with Amazon MSK Serverless and IAM authentication

AWS Big Data

SEPTEMBER 6, 2023

Amazon’s serverless Apache Kafka offering, Amazon Managed Streaming for Apache Kafka (Amazon MSK) Serverless, is attracting a lot of interest. At the time of writing, the Amazon MSK library for IAM is exclusive to Kafka libraries in Java, creating a challenge for users of other programming languages.

Testing Metadata Cost-Benefit Internet of Things

Deep dive on Amazon MSK tiered storage

AWS Big Data

JUNE 6, 2023

This post explains how the underlying infrastructure affects Kafka performance when you use Amazon Managed Streaming for Apache Kafka (Amazon MSK) tiered storage. We delve deep into the core components of Amazon MSK tiered storage and address questions such as: How does read and write work in a tiered storage-enabled cluster?

Metadata Optimization Management Metrics

Externalize Amazon MSK Connect configurations with Terraform

AWS Big Data

SEPTEMBER 19, 2023

Managing configurations for Amazon MSK Connect, a feature of Amazon Managed Streaming for Apache Kafka (Amazon MSK), can become challenging, especially as the number of topics and configurations grows. The challenges lie in the overhead of managing configurations, as well as dealing with patching and upgrades.

Data-driven Management Optimization Data Processing

AWS Glue mutual TLS authentication for Amazon MSK

AWS Big Data

AUGUST 7, 2024

Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed Apache Kafka service. For more information about using IAM authentication, refer to Securely process near-real-time data from Amazon MSK Serverless using an AWS Glue streaming ETL job with IAM authentication. Create a Kafka connection in AWS Glue.

Metadata Visualization Internet of Things Management
