In our previous blog, we talked about the four paths to Cloudera Data Platform. If you haven’t read that yet, we invite you to take a moment and run through the scenarios in that blog. As we touched on in the previous blog, the decision to upgrade or migrate may seem difficult to evaluate at first glance. In-place Upgrade.
Apache Flink is a framework and distributed processing engine for stateful computations over data streams. Amazon Kinesis Data Analytics for Apache Flink is a fully managed service that enables you to use an Apache Flink application to process streaming data. Window the images into a collection of records.
Apache Iceberg is an open table format for very large analytic datasets, which captures metadata information on the state of datasets as they evolve and change over time. It adds tables to compute engines including Spark, Trino, PrestoDB, Flink, and Hive using a high-performance table format that works just like a SQL table.
This blog is for anyone who was interested but unable to attend the conference, or anyone interested in a quick summary of what happened there. Flink is here to stay. It makes perfect sense that Apache Flink has emerged as the standard. I will cover key takeaways from Current 2023 and offer Cloudera’s perspective.
Flink SQL is a data processing language that enables rapid prototyping and development of event-driven and streaming applications. Flink SQL combines the performance and scalability of Apache Flink, a popular distributed streaming platform, with the simplicity and accessibility of SQL. You can view the code here.
As an example of this, in this post we look at Real Time Data Warehousing (RTDW), a category of use cases that customers are building on Cloudera and that is becoming increasingly common. Deep Dive into General Purpose RTDW, featuring Apache Kudu, Apache Impala, and Apache NiFi.
However, migrating an existing data lake to a new table format such as Apache Iceberg can bring significant technical and organizational challenges. Natural Intelligence (NI) is a world leader in multi-category marketplaces. Recently, NI embarked on a journey to transition their legacy data lake from Apache Hive to Apache Iceberg.