Multi-threaded Apache Kafka Consumer



Why do we need multi-thread consumer model? 

Suppose we implement a notification module which allow users to subscribe for notifications from other users, other applications.

Our module reads messages which will be written by other users, applications to a Kafka clusters. In this case, we can get all notifications of the others written to a Kafka topic and our module will create a consumer to subscribe to that topic.  

Everything seems to be fine at the beginning. However, what will happen if the number of notifications produced by other applications, users is increased fast and exceed the rate that can be processed by our module?  

All the messages/notifications that haven’t been processed by our module, are still in the Kafka topic. However, things get more danger when the number of messages is too much. Some of them will be lost when the retention policy is met (Note that Kafka retention policy can be time-based, partition size-based, key-based). And more important, when our notification module falls very far behind the income notifications/messages, it is not a true “notification” module anymore.

It’s time to think about the multi-thread consumer model.


Multi-threaded Apache Kafka consumer model

There are 2 possible models.  

  • Multiple consumers with their own threads - model 1

  • Single consumer, multiple worker processing threads - model 2


Test Multi-threaded Apache Kafka Consumer


Conclusion 

Multiple consumers with their own threads. The model is easy to implement. The total of consumers is limited the total partitions of the topic. The redundant consumers may not be used.

Single consumer, multiple worker processing threads. The model is flexible in scale out the number of processing thread. But it’s not easy to implement in-order processing on per partition. Let’s say there are 2 messages on the same partitions being processed by 2 different threads. To guarantee the order, those 2 threads must be coordinated somehow.

Comments

Popular posts from this blog

Debug Java Stream in Intellij Idea

Creating Efficient Docker Images with Spring Boot 2.3

CRUD Goes Even Easier With JPABuddy