Blog - Data Leaders Brief

category apache-kafka


The DataOps Vendor Landscape, 2021

DataKitchen

APRIL 13, 2021

Read the complete blog below for a more detailed description of the vendors and their capabilities. Because it is such a new category, both overly narrow and overly broad definitions of DataOps abound. Apache Oozie — An open-source workflow scheduler system to manage Apache Hadoop jobs. DataOps is a hot topic in 2021.

Testing, Machine Learning, Consulting, Data Science

Stitch Fix seamless migration: Transitioning from self-managed Kafka to Amazon MSK

AWS Big Data

SEPTEMBER 22, 2023

In our infrastructure, Apache Kafka has emerged as a powerful tool for managing event streams and facilitating real-time data processing. At Stitch Fix, we have used Kafka extensively as part of our data infrastructure to support various needs across the business for over six years.

Management, Metrics, Cost-Benefit, Data Lake


Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration


Trending Sources

An Overview of Real Time Data Warehousing on Cloudera

Cloudera

NOVEMBER 2, 2020

As an example of this, in this post we look at Real Time Data Warehousing (RTDW), a category of use cases that customers are building on Cloudera and that is becoming more and more common among our customers. Deep Dive into General Purpose RTDW, featuring Apache Kudu, Apache Impala, and Apache NiFi.

Data Warehouse, Dashboards, Optimization, Interactive


NiFi as a Function in DataFlow Service

Cloudera

NOVEMBER 16, 2021

With the general availability of Cloudera DataFlow for the Public Cloud (CDF-PC), our customers can now self-serve deployments of Apache NiFi data flows on Kubernetes clusters in a cost-effective way, with auto scaling, resource isolation, and monitoring with KPI-based alerting. Functions as a Service.

KPI, Data-driven, IoT, Optimization

Nexthink scales to trillions of events per day with Amazon MSK

AWS Big Data

MARCH 29, 2024

In this post, Nexthink shares how Amazon Managed Streaming for Apache Kafka (Amazon MSK) empowered them to achieve massive scale in event processing. Furthermore, the absence of a streaming platform like Kafka created dependencies between teams through tight HTTP/gRPC coupling.

Data-driven, Cost-Benefit, Metrics, Management

5 Key Takeaways from #Current2023

Cloudera

OCTOBER 17, 2023

Recently, Confluent hosted Current 2023 (formerly Kafka Summit) in San Jose on Sept 26th and 27th. This blog is for anyone who was interested but unable to attend the conference, or anyone interested in a quick summary of what happened there. More of a Confluent conference now than a Kafka conference. Flink is here to stay.

Data-driven, Enterprise, IoT, Data Warehouse

Streaming Edge Data Collection and Global Data Distribution

Cloudera

JUNE 9, 2022

In the first blog of the Universal Data Distribution blog series, we discussed the emerging need within enterprise organizations to take control of their data flows. In this second installment of the series, we will discuss a few different data distribution use cases and take a deep dive into one of them.

Data Collection, IoT, Data Lake, Unstructured Data

Data governance beyond SDX: Adding third party assets to Apache Atlas

Cloudera

MARCH 9, 2021

In this blog, we’ll highlight the key CDP aspects that provide data governance and lineage and show how they can be extended to incorporate metadata for non-CDP systems from across the enterprise. Apache Atlas as a fundamental part of SDX. The example 1_typedef-server.json describes the server typedef used in this blog.

Data Governance, Metadata, Enterprise, Data Processing

Real-time inference using deep learning within Amazon Kinesis Data Analytics for Apache Flink

AWS Big Data

JUNE 1, 2023

Apache Flink is a framework and distributed processing engine for stateful computations over data streams. Amazon Kinesis Data Analytics for Apache Flink is a fully managed service that enables you to use an Apache Flink application to process streaming data. Window the images into a collection of records.

Deep Learning, Data Analytics, Analytics, Machine Learning

What’s new in CDP Private Cloud Base 7.1.6?

Cloudera

APRIL 15, 2021

In this blog we will cover the new features in 7.1.6, which delivers benefits in the following categories: Better Upgrade Support. Added support for standalone NiFi/Kafka clusters. Operational Database – Apache Phoenix 5.1. We’ve released Apache Phoenix 5.1. Full support for Apache Omid. ... and HDP 2.6.5.

Data Warehouse, Cost-Benefit, Management, Data Processing

DevOps Interview Prep Guide

Insight

AUGUST 12, 2019

For a good overview of what DevOps entails and how to transition, check out this blog post. The activities within each category are ranked more or less in order of importance as well. Example questions: Given an Apache web server log, how many requests are made per day? How do you ace your DevOps interview?
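The example question about Apache web server logs can be approached in a few lines of code. The following is only a minimal sketch, assuming the log uses the Common Log Format with a bracketed timestamp such as `[10/Oct/2019:13:55:36 -0700]`; the function name `requests_per_day` and the sample lines are hypothetical and do not come from the original guide.

```python
import re
from collections import Counter

# Assumption: timestamps look like [10/Oct/2019:13:55:36 -0700] (Common Log Format).
# The pattern captures only the date part, which serves as the grouping key.
DATE_PATTERN = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}):")


def requests_per_day(lines):
    """Count log lines per calendar day; lines without a timestamp are skipped."""
    counts = Counter()
    for line in lines:
        match = DATE_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample data, for illustration only.
sample = [
    '127.0.0.1 - - [10/Oct/2019:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326',
    '127.0.0.1 - - [10/Oct/2019:14:01:02 -0700] "GET /about.html HTTP/1.1" 200 512',
    '127.0.0.1 - - [11/Oct/2019:09:12:45 -0700] "POST /login HTTP/1.1" 302 0',
    "a malformed line without a timestamp",
]

for day, count in requests_per_day(sample).items():
    print(day, count)
```

For a real log file, the same function could be fed an open file handle, since it only iterates over lines. In an interview setting, a shell pipeline built from standard text tools would be an equally reasonable answer.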

Software, Data-driven, Testing, Interactive

2020 Data Impact Award Winner Spotlight: Globe Telecom

Cloudera

DECEMBER 9, 2020

Entrants in this award category are so important to recognize because of how they tie every piece of their data strategy together.

Predictive Modeling, Data Warehouse, Enterprise, Data Strategy

Melting the ice — How Natural Intelligence simplified a data lake migration to Apache Iceberg

AWS Big Data

APRIL 28, 2025

However, migrating an existing data lake to a new table format such as Apache Iceberg can bring significant technical and organizational challenges. Natural Intelligence (NI) is a world leader in multi-category marketplaces. Recently, NI embarked on a journey to transition their legacy data lake from Apache Hive to Apache Iceberg.

Data Lake, Metadata, Cost-Benefit, Snapshot
