Kafka Consumer CLI Error: Timed out waiting for a node assignment

This post is more of a personal note than a complete analysis. I applied a lot of workarounds and still could not identify exactly what caused the issue. I am writing it up as a future reference in case a similar issue occurs, and I will add more details as I learn them.

We were adding monitoring for Kafka consumer lag using kafka-consumer-groups.sh, which is located in the bin directory of the Kafka installation. We were able to get the lag in every environment except one, where the script ($KAFKA_BIN/kafka-consumer-groups.sh --describe --all-groups --bootstrap-server localhost:9092) failed with the error below:

java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: metadata
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165)
at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.getLogEndOffsets(ConsumerGroupCommand.scala:638)
at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.$anonfun$collectGroupsOffsets$7(ConsumerGroupCommand.scala:411)
at scala.collection.immutable.List.flatMap(List.scala:293)
at scala.collection.immutable.List.flatMap(List.scala:79)
at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.$anonfun$collectGroupsOffsets$2(ConsumerGroupCommand.scala:567)
at scala.collection.Iterator$$anon$9.next(Iterator.scala:575)
at scala.collection.mutable.Growable.addAll(Growable.scala:62)
at scala.collection.mutable.Growable.addAll$(Growable.scala:57)
at scala.collection.mutable.HashMap.addAll(HashMap.scala:117)
at scala.collection.mutable.HashMap$.from(HashMap.scala:589)
at scala.collection.mutable.HashMap$.from(HashMap.scala:582)
at scala.collection.MapOps$WithFilter.map(Map.scala:381)
at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.collectGroupsOffsets(ConsumerGroupCommand.scala:560)
at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.describeGroups(ConsumerGroupCommand.scala:367)
at kafka.admin.ConsumerGroupCommand$.run(ConsumerGroupCommand.scala:72)
at kafka.admin.ConsumerGroupCommand$.main(ConsumerGroupCommand.scala:59)
at kafka.admin.ConsumerGroupCommand.main(ConsumerGroupCommand.scala)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: metadata
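
For a monitoring job, the lag check above can be wrapped so that a timeout shows up as a clean failure rather than a stack trace. This is only a minimal sketch, not the script we actually ran: the function name check_lag, the 30000 ms default and the /opt/kafka/bin fallback path are assumptions made for illustration. The --timeout option is the tool's own (in milliseconds, default 5000); we have not verified that a larger value helps with this particular error.

```shell
# Sketch of a lag check for a monitoring job. Assumptions: check_lag, the
# 30000 ms default and the /opt/kafka/bin fallback are illustrative only.
KAFKA_BIN="${KAFKA_BIN:-/opt/kafka/bin}"
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"

check_lag() {
  local timeout_ms="${1:-30000}"
  local out status
  # Capture stdout and stderr together so a stack trace can be detected.
  out="$("$KAFKA_BIN/kafka-consumer-groups.sh" --describe --all-groups \
          --bootstrap-server "$BOOTSTRAP" --timeout "$timeout_ms" 2>&1)"
  status=$?
  # Check the output as well as the exit status, in case the tool prints
  # the exception but still exits with 0.
  if [ "$status" -ne 0 ] || printf '%s\n' "$out" | grep -q 'TimeoutException'; then
    echo "lag check failed against $BOOTSTRAP" >&2
    return 1
  fi
  printf '%s\n' "$out"
}
```

Calling check_lag 60000 would retry the same describe with a one-minute timeout; the function prints the lag table on success and returns 1 otherwise.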

A quick internet search suggested changing the listener configuration in server.properties, but the listener configuration was correct in our case. On further checking, we found a few under-replicated topics. The cause was missing brokers: this being a dev environment, brokers had been added and removed a few months earlier as part of a POC. Because a couple of the brokers to which partitions were assigned were no longer available, these topics were under-replicated. Strangely, rebalancing the partitions did not work either. To rule this out as the cause, we deleted these topics and restarted the cluster along with ZooKeeper. The consumer script still failed with the same error.
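
The under-replication check described above can be sketched roughly as follows. This is an illustration rather than the exact commands we ran: the function names under_replicated and missing_replicas and the /opt/kafka/bin fallback are assumptions, and the parsing assumes the describe output has "Replicas:" and "Isr:" columns, which may differ between Kafka versions. Older releases of kafka-topics.sh take --zookeeper instead of --bootstrap-server.

```shell
# Sketch: list under-replicated partitions and the broker ids that are
# assigned as replicas but are not in the in-sync replica set.
KAFKA_BIN="${KAFKA_BIN:-/opt/kafka/bin}"     # assumed install path
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"

under_replicated() {
  "$KAFKA_BIN/kafka-topics.sh" --describe --under-replicated-partitions \
      --bootstrap-server "$BOOTSTRAP"
}

# Reads describe output on stdin, prints one broker id per line.
missing_replicas() {
  awk '
    /Partition: / {
      replicas = ""; isr = ""
      for (i = 1; i < NF; i++) {
        if ($i == "Replicas:") replicas = $(i + 1)
        if ($i == "Isr:")      isr = $(i + 1)
      }
      n = split(replicas, r, ",")
      m = split(isr, s, ",")
      split("", in_sync)                      # portable way to clear an array
      for (j = 1; j <= m; j++) in_sync[s[j]] = 1
      for (j = 1; j <= n; j++) if (!(r[j] in in_sync)) missing[r[j]] = 1
    }
    END { for (b in missing) print b }
  ' | sort -n
}
```

Running under_replicated | missing_replicas prints the broker ids that hold replicas but are out of sync. Comparing that list with the live broker ids (for example from ./zookeeper-shell.sh localhost:2181 ls /brokers/ids) shows which of them no longer exist.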

We found a lot of ZooKeeper-based consumers in this environment that were missing from the other environments; we did this analysis using the Kafka Manager UI. This was strange, as we were using Kafka-based consumers. We deleted the ZooKeeper-based consumers using the ZooKeeper CLI (cd $KAFKA_BIN; ./zookeeper-shell.sh localhost:2181; deleteall /consumers/<consumerName>) and restarted the whole cluster, including ZooKeeper. This improved the situation a bit: we could now get individual consumer group details ($KAFKA_BIN/kafka-consumer-groups.sh --describe --group <groupName> --bootstrap-server localhost:9092) for most groups, but not all. For the problematic groups, the script failed with the error below:

java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Call(callName=listOffsets on broker 464, deadlineMs=1660909481213, tries=1, nextAllowedTryMs=1660909481320) timed out at 1660909481220 after 1 attempt(s)
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165)
at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.getLogEndOffsets(ConsumerGroupCommand.scala:638)
at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.$anonfun$collectGroupsOffsets$7(ConsumerGroupCommand.scala:411)
at scala.collection.immutable.List.flatMap(List.scala:293)
at scala.collection.immutable.List.flatMap(List.scala:79)
at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.$anonfun$collectGroupsOffsets$2(ConsumerGroupCommand.scala:567)
at scala.collection.Iterator$$anon$9.next(Iterator.scala:575)
at scala.collection.mutable.Growable.addAll(Growable.scala:62)
at scala.collection.mutable.Growable.addAll$(Growable.scala:57)
at scala.collection.mutable.HashMap.addAll(HashMap.scala:117)
at scala.collection.mutable.HashMap$.from(HashMap.scala:589)
at scala.collection.mutable.HashMap$.from(HashMap.scala:582)
at scala.collection.MapOps$WithFilter.map(Map.scala:381)
at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.collectGroupsOffsets(ConsumerGroupCommand.scala:560)
at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.describeGroups(ConsumerGroupCommand.scala:367)
at kafka.admin.ConsumerGroupCommand$.run(ConsumerGroupCommand.scala:72)
at kafka.admin.ConsumerGroupCommand$.main(ConsumerGroupCommand.scala:59)
at kafka.admin.ConsumerGroupCommand.main(ConsumerGroupCommand.scala)
Caused by: org.apache.kafka.common.errors.TimeoutException: Call(callName=listOffsets on broker 464, deadlineMs=1660909481213, tries=1, nextAllowedTryMs=1660909481320) timed out at 1660909481220 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.DisconnectException: Cancelled listOffsets on broker 464 request with correlation id 10416 due to node 464 being disconnected
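
To find out which groups are the problematic ones, each group can be described individually and the failures collected. The sketch below is one way to do that, not the procedure we followed at the time: the function name failing_groups and the /opt/kafka/bin fallback are assumptions, and it assumes that --list itself succeeds and prints one group name per line.

```shell
# Sketch: print the consumer groups whose --describe call fails or times out.
KAFKA_BIN="${KAFKA_BIN:-/opt/kafka/bin}"     # assumed install path
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"

failing_groups() {
  "$KAFKA_BIN/kafka-consumer-groups.sh" --list --bootstrap-server "$BOOTSTRAP" |
  while IFS= read -r group; do
    [ -n "$group" ] || continue
    # </dev/null keeps the inner command from consuming the list on stdin.
    out="$("$KAFKA_BIN/kafka-consumer-groups.sh" --describe --group "$group" \
            --bootstrap-server "$BOOTSTRAP" 2>&1 </dev/null)"
    status=$?
    # Check the output too, in case the tool prints the exception but exits 0.
    if [ "$status" -ne 0 ] || printf '%s\n' "$out" | grep -q 'TimeoutException'; then
      printf '%s\n' "$group"
    fi
  done
}
```

The output is a plain list of group names, so it can be fed into further checks or kept as a record of which groups were affected.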

We tried to delete these consumer groups, but the deletion failed with the error below:

$KAFKA_BIN/kafka-consumer-groups.sh --delete --group <groupName> --bootstrap-server kafka-dev-app-a1:9092
Error: Deletion of some consumer groups failed:
Group 'NotificationEmailGroup' could not be deleted due to: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.GroupNotEmptyException: The group is not empty.

Because consumers still had active connections, we were unable to delete the groups. Unfortunately, we did not have contacts for the consuming applications to ask them to stop their consumers. As a quick workaround, we took a tcpdump on the broker nodes, identified the IP addresses of these consumers, and blocked them using the Linux iptables command. After another cluster restart, we were able to delete the consumer groups, and the kafka-consumer-groups.sh script completed successfully.
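
The tcpdump and iptables workaround can be sketched along these lines. We did not keep the exact commands, so treat this as an approximation: the function names client_ips, capture_clients and block_ip, the 1000-packet default and the use of port 9092 for all listeners are assumptions. The capture also sees other brokers and producers talking to the same port, so the list has to be reviewed by hand before anything is blocked.

```shell
# Sketch: find the source IPs of clients connecting to the broker port and
# block a chosen one. tcpdump and iptables both need root.
BROKER_PORT="${BROKER_PORT:-9092}"

# Reads "tcpdump -nn" text output on stdin and prints the unique IPv4
# source addresses of packets whose destination is the broker port.
client_ips() {
  awk -v port="$BROKER_PORT" '
    {
      for (i = 1; i + 3 <= NF; i++) {
        # Line shape: ... IP <src>.<sport> > <dst>.<dport>: Flags ...
        if ($i == "IP" && $(i + 3) ~ ("\\." port ":$")) {
          src = $(i + 1)
          sub(/\.[0-9]+$/, "", src)   # strip the source port
          print src
        }
      }
    }
  ' | sort -u
}

# Capture N packets (default 1000) sent to the broker port and list sources.
capture_clients() {
  tcpdump -nn -i any -c "${1:-1000}" "tcp dst port $BROKER_PORT" 2>/dev/null | client_ips
}

# Drop incoming broker traffic from one address. Undo with the same rule
# and -D instead of -I.
block_ip() {
  iptables -I INPUT -s "$1" -p tcp --dport "$BROKER_PORT" -j DROP
}
```

After reviewing the output of capture_clients, block_ip can be called once per consumer address; once the group has no active members, the --delete command from earlier should go through.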

As mentioned earlier, this is not a complete write-up and will keep evolving. We will update it when we find more useful information about the issue.
