Handling Different Kafka Message Versions

I was in a job interview a while ago and one of the problems raised there was handling different message versions in the same topic.

The problem was described as follows:

You have a producer of version 1.0. It sends a message in a topic that gets to a consumer of version 1.0.

A need arise to upgrade our message to version 2 (added features – our company’s growing). But we want to have the same topic and not create a new topic for every new version.

We also need to support old messages, because some customers do not update to the new version.

The outcome is a system that sends messages with multiple versions (1.0, 2.0, 2.2, 3.0 etc.).

Figure 1 illustrates how this looks like.

Figure 1: Multiple producers to multiple consumer versions. Consumer is actually a consumer group, so they scale in order to take the load.

While that’s not a problem in itself, a small problem arise. Assuming each message is handled by a different consumer (consumer being a nodejs microservice for instance) – how would you know to which consumer to send each message?

If you send the same message to all – it is relevant only to one of the consumers eventually. The heavy load on all of them might make your load balancer work overtime because each consumer group will receive all messages even though they are not relevant.

While the “bad” consumers would just quit when they find out they did not get a relevant message, creating an error handler for such a system can become quite a mess. In this case we are waiting for a single “I handled it” response from multiple consumer groups. The simpler case we would prefer would be to wait for a single response from one consumer stating if the process succeeded or not.

Let’s summarise our problems. Kafka just sends the topic to all consumers, even those that cannot handle that message. The naive solution would be to create multiple consumes – one per version – that handle the messages.

This creates another problem in regards to traffic and resources – the raising of so many consumers. In addition, we increase the complexity in error handling.

Kafka has no mechanism (at least not one I heard of) that directs traffic according to content.

One Consumer – Dynamic Handlers

The solution turned out to be pretty simple

Instead of creating multiple consumers – we create one consumer that can dynamically handle the correct version. This way, we just need one consumer group, and the message is always handled correctly (in regards to version).

Here’s a simple code that does that:

function handleMessage(msg) {
      const { version } = msg.properties.headers;
    
      let handler;
      switch (version) {
        case '1.0':
          handler = require('./handlers/1.0');
          break;
        case '1.1':
          handler = require('./handlers/1.1');
          break;
        case '2.0':
          handler = require('./handlers/2.0');
          break;
        case '2.2':
          handler = require('./handlers/2.2');
          break;
        case '3.0':
          handler = require('./handlers/3.0');
          break;
        default:
          return notifyErrorAndExit(new Error('unknown version'), msg);
      }
      return handler(version, msg);
    }

The handle message function gets the correct handler according to the version – and activates it on the message received.

Here’s a simple example of using this handler:

kafkaConsumer.start({
    groupId,
    topicsList: [TOPIC_1],
    onData: handleMessage,
    onError: notifyErrorAndExit
  })

The kafka consumer starts, listens to topic 1 and when data is received, it uses handle message to handle the data.

Figure 2 shows the much simpler architecture we have now.

Figure 2: Our multiple consumers are now one dynamic consumer inside a load balanced group.

Summary

In this article we saw how we can reduce the complexity of a system using a very simple mechanism.

Our need was to change message versions while keeping backward compatibility. The naive approach of creating a new service that handles the new message created some problems and added a lot of complexity.

Eventually, the solution was to keep to one service that dynamically called the correct data handler.

Thanks to Piotr from xFAANG for the kind review. Remember to visit their project AskQL 😁

Sign up to my newsletter to enjoy more content:

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Inline Feedbacks
View all comments