Kafka topic has partitions with leader -1 (Kafka leader election) while the nodes are up and running


I have a Kafka cluster with 3 members, on which the __consumer_offsets topic has 50 partitions.

Here is the output of the describe command:

root@kafka-cluster-0:~# kafka-topics.sh --zookeeper localhost:2181 --describe
Topic:__consumer_offsets    PartitionCount:50   ReplicationFactor:1 Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
    Topic: __consumer_offsets   Partition: 0    Leader: 1   Replicas: 1 Isr: 1
    Topic: __consumer_offsets   Partition: 1    Leader: -1  Replicas: 2 Isr: 2
    Topic: __consumer_offsets   Partition: 2    Leader: 0   Replicas: 0 Isr: 0
    Topic: __consumer_offsets   Partition: 3    Leader: 1   Replicas: 1 Isr: 1
    Topic: __consumer_offsets   Partition: 4    Leader: -1  Replicas: 2 Isr: 2
    Topic: __consumer_offsets   Partition: 5    Leader: 0   Replicas: 0 Isr: 0
    ...
    ...

The members are nodes 0, 1 and 2.

Apparently, on the partitions whose replica is 2, no leader is set and their leader is -1.

I would like to know what caused this. I did restart the Kafka service on the second member, but I never expected it to have this side effect.
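For reference, kafka-topics.sh can filter the describe output down to exactly these broken partitions. A minimal sketch, assuming the same localhost:2181 ZooKeeper endpoint used above:

# Print only the partitions whose leader is currently unavailable (Leader: -1)
kafka-topics.sh --zookeeper localhost:2181 --describe --unavailable-partitions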

Now all the nodes have been up and running for hours; this is the result of ls /brokers/ids:

/home/kafka/bin/zookeeper-shell.sh localhost:2181 <<< "ls /brokers/ids"
Connecting to localhost:2181
Welcome to ZooKeeper!
JLine support is disabled

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[0, 1, 2]

Also, there are many topics in the cluster, and node 2 is not the leader of any of them. Wherever it is the only broker holding the data (replication-factor=1 and the partition hosted on that node), the leader is -1, as shown below.
Here, node 2 is in the ISR, but never a leader, since replication-factor=2.
Topic:upstream-t2   PartitionCount:20   ReplicationFactor:2 Configs:retention.ms=172800000,retention.bytes=536870912
    Topic: upstream-t2  Partition: 0    Leader: 1   Replicas: 1,2   Isr: 1,2
    Topic: upstream-t2  Partition: 1    Leader: 0   Replicas: 2,0   Isr: 0
    Topic: upstream-t2  Partition: 2    Leader: 0   Replicas: 0,1   Isr: 0
    Topic: upstream-t2  Partition: 3    Leader: 0   Replicas: 1,0   Isr: 0
    Topic: upstream-t2  Partition: 4    Leader: 1   Replicas: 2,1   Isr: 1,2
    Topic: upstream-t2  Partition: 5    Leader: 0   Replicas: 0,2   Isr: 0
    Topic: upstream-t2  Partition: 6    Leader: 1   Replicas: 1,2   Isr: 1,2


Here, node 2 is the only replica some chunks of data are hosted on, yet leader=-1.
Topic:upstream-t20  PartitionCount:10   ReplicationFactor:1 Configs:retention.ms=172800000,retention.bytes=536870912
    Topic: upstream-t20 Partition: 0    Leader: 1   Replicas: 1 Isr: 1
    Topic: upstream-t20 Partition: 1    Leader: -1  Replicas: 2 Isr: 2
    Topic: upstream-t20 Partition: 2    Leader: 0   Replicas: 0 Isr: 0
    Topic: upstream-t20 Partition: 3    Leader: 1   Replicas: 1 Isr: 1
    Topic: upstream-t20 Partition: 4    Leader: -1  Replicas: 2 Isr: 2

Any help in getting the leaders elected again is greatly appreciated. It is also important for me to know what effect this may have on my brokers' behaviour.
EDIT ---
Kafka version: 1.1.0 (2.12-1.1.0). There is plenty of space available, e.g. 800 GB of free disk. The log files look fairly normal; below are the last lines of the log file on node 2. Let me know if there is anything specific to look for.
[2018-12-18 10:31:43,828] INFO [Log partition=upstream-t14-1, dir=/var/lib/kafka] Rolled new log segment at offset 79149636 in 2 ms. (kafka.log.Log)
[2018-12-18 10:32:03,622] INFO Updated PartitionLeaderEpoch. New: {epoch:10, offset:6435}, Current: {epoch:8, offset:6386} for Partition: upstream-t41-8. Cache now contains 7 entries. (kafka.server.epoch.LeaderEpochFileCache)
[2018-12-18 10:32:03,693] INFO Updated PartitionLeaderEpoch. New: {epoch:10, offset:6333}, Current: {epoch:8, offset:6324} for Partition: upstream-t41-3. Cache now contains 7 entries. (kafka.server.epoch.LeaderEpochFileCache)
[2018-12-18 10:38:38,554] INFO [GroupMetadataManager brokerId=2] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-12-18 10:40:04,831] INFO Updated PartitionLeaderEpoch. New: {epoch:10, offset:6354}, Current: {epoch:8, offset:6340} for Partition: upstream-t41-9. Cache now contains 7 entries. (kafka.server.epoch.LeaderEpochFileCache)
[2018-12-18 10:48:38,554] INFO [GroupMetadataManager brokerId=2] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-12-18 10:58:38,554] INFO [GroupMetadataManager brokerId=2] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-12-18 11:05:50,770] INFO [ProducerStateManager partition=upstream-t4-17] Writing producer snapshot at offset 3086815 (kafka.log.ProducerStateManager)
[2018-12-18 11:05:50,772] INFO [Log partition=upstream-t4-17, dir=/var/lib/kafka] Rolled new log segment at offset 3086815 in 2 ms. (kafka.log.Log)
[2018-12-18 11:07:16,634] INFO [ProducerStateManager partition=upstream-t4-11] Writing producer snapshot at offset 3086497 (kafka.log.ProducerStateManager)
[2018-12-18 11:07:16,635] INFO [Log partition=upstream-t4-11, dir=/var/lib/kafka] Rolled new log segment at offset 3086497 in 1 ms. (kafka.log.Log)
[2018-12-18 11:08:15,803] INFO [ProducerStateManager partition=upstream-t4-5] Writing producer snapshot at offset 3086616 (kafka.log.ProducerStateManager)
[2018-12-18 11:08:15,804] INFO [Log partition=upstream-t4-5, dir=/var/lib/kafka] Rolled new log segment at offset 3086616 in 1 ms. (kafka.log.Log)
[2018-12-18 11:08:38,554] INFO [GroupMetadataManager brokerId=2] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)

EDIT 2 ----

I have stopped the leader Zookeeper instance, and now the second Zookeeper instance has been elected as leader! With that, the no-elected-leader issue is now resolved!

I still don't know what could have gone wrong, though, so any idea on "why changing the Zookeeper leader would fix the not-elected-leader problem" is very welcome!
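In case it helps anyone reproduce the workaround, the sketch below shows one way to locate the ZooKeeper leader before stopping it. The host names kafka-cluster-1 and kafka-cluster-2 are placeholders for the other two members, and the stop command assumes the ZooKeeper scripts bundled with Kafka:

# Each ZooKeeper node answers the "stat" four-letter command with its role;
# the node that prints "Mode: leader" is the current ensemble leader.
for host in kafka-cluster-0 kafka-cluster-1 kafka-cluster-2; do
    echo -n "$host: "; echo stat | nc "$host" 2181 | grep Mode
done

# Stopping ZooKeeper on that node forces the remaining instances to elect
# a new ensemble leader.
/home/kafka/bin/zookeeper-server-stop.sh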

Thanks!


Which Kafka version? Is there free space on broker 2's drives? What do the log files show? If this is a production cluster, you should consider increasing the replication factor of __consumer_offsets. - Gery
Did you explicitly set broker.id in server.properties? How did you install Kafka? Maybe showing how the second broker's config file differs from the others would help someone reproduce the problem. - OneCricketeer
@SpiXel Grep the log files for errors. Identify the controller and check its controller.log; maybe the leader election task crashed. Check whether broker 2 is still reachable from the other brokers. - Gery
@cricket_007 Yes, they are explicitly set in server.properties. The Kafka binaries were downloaded from the website and extracted into the user's home directory, and everything is run with the provided scripts. The only differences between the config files are broker.id and advertised.listeners. Restarting the leader Zookeeper service (which changed the leader) did fix the election process, but I'm having a hard time figuring out why the cluster could get into that state. - SpiXel
I'm not sure. Sometimes we install brokers and they report a differently generated, auto-generated cluster ID; then we restart them once more and everything is fine. - OneCricketeer
For what it's worth, I'd suggest increasing the replication factor of the offsets topic. - OneCricketeer
1 Answer

Although the root cause was never identified, the asker appears to have found a workaround:

I have stopped the leader Zookeeper instance, and now the second Zookeeper instance has been elected as leader! With that, the no-elected-leader issue is now resolved!
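Two follow-ups may help readers who hit the same symptom; neither comes from the original poster, and both assume the localhost:2181 ZooKeeper connection string used in the question. First, partition leader election is driven by the broker that currently holds the controller role, so it is worth checking which broker that is and, once all brokers are registered again, asking it to move leadership back to the preferred replicas:

# Show which broker is currently the controller (the JSON contains "brokerid")
/home/kafka/bin/zookeeper-shell.sh localhost:2181 <<< "get /controller"

# Trigger a preferred replica election for all partitions, so leadership
# returns to the first replica in each partition's replica list
kafka-preferred-replica-election.sh --zookeeper localhost:2181

Second, as the comments point out, __consumer_offsets should not stay at replication factor 1 on a three-node cluster. The usual way to raise it is a manual reassignment; increase-rf.json is a hypothetical file name, and the two entries below are only an illustration that would have to be extended to all 50 partitions:

# increase-rf.json -- desired replica set per partition, e.g.
# {"version":1,"partitions":[
#   {"topic":"__consumer_offsets","partition":0,"replicas":[1,2,0]},
#   {"topic":"__consumer_offsets","partition":1,"replicas":[2,0,1]},
#   ...
# ]}
kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file increase-rf.json --execute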

