ZooKeeper server not running


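I'm trying to start an HBase Master from Ambari.

It fails to start because it can't connect to the ZooKeeper servers.

Ambari marks all ZooKeeper servers (3 nodes) as running.

The application server (Tomcat?) running the ZooKeeper server application seems to be up; at least there is a service listening on the configured port.

But the application can't connect to the other nodes and doesn't seem to start.

All connections are closed with the error message "ZooKeeperServer not running", and the client reports "zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket".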
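One way to tell "something is listening on the port" apart from "the peer is actually serving" is ZooKeeper's srvr four-letter command on the client port. This sketch assumes ZooKeeper 3.4.x (as shipped with HDP at the time), where four-letter words are enabled by default; on 3.5+ they must be whitelisted with 4lw.commands.whitelist. The helpers four_letter_word and classify are hypothetical, and the "not serving" string is the reply a 3.4 peer that has not joined a quorum should give.

```python
import socket

# Reply that ZooKeeper 3.4.x sends to "srvr"/"stat" while the peer has not joined a quorum.
NOT_SERVING = "This ZooKeeper instance is not currently serving requests"


def four_letter_word(host: str, port: int = 2181, cmd: bytes = b"srvr",
                     timeout: float = 5.0) -> str:
    """Send a four-letter command to a ZooKeeper client port and return the reply text."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd)
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the socket after answering
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def classify(reply: str) -> str:
    """Map a srvr/stat reply to a coarse state: leader, follower, standalone, ..."""
    if NOT_SERVING in reply:
        return "listening but not in quorum"
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return "unknown"
```

For example, classify(four_letter_word("sg1.imatiasl.lan")) would be expected to return "listening but not in quorum" for a node stuck in LOOKING, as in this question, and "leader" or "follower" once the ensemble has formed.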

This is the ZooKeeper server log output on these nodes (the log is identical on all nodes; only the node names differ):

2016-03-31 16:15:34,550 - INFO  [main:QuorumPeerConfig@103] - Reading configuration from: /usr/hdp/current/zookeeper-server/conf/zoo.cfg
2016-03-31 16:15:34,553 - INFO  [main:QuorumPeerConfig@338] - Defaulting to majority quorums
2016-03-31 16:15:34,557 - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 30
2016-03-31 16:15:34,557 - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 24
2016-03-31 16:15:34,558 - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
2016-03-31 16:15:34,565 - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
2016-03-31 16:15:34,566 - INFO  [main:QuorumPeerMain@127] - Starting quorum peer
2016-03-31 16:15:34,573 - INFO  [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:2181
2016-03-31 16:15:34,582 - INFO  [main:QuorumPeer@992] - tickTime set to 2000
2016-03-31 16:15:34,582 - INFO  [main:QuorumPeer@1012] - minSessionTimeout set to -1
2016-03-31 16:15:34,582 - INFO  [main:QuorumPeer@1023] - maxSessionTimeout set to -1
2016-03-31 16:15:34,582 - INFO  [main:QuorumPeer@1038] - initLimit set to 10
2016-03-31 16:15:34,598 - INFO  [Thread-2:QuorumCnxManager$Listener@506] - My election bind port: sg1.imatiasl.lan/127.0.0.1:3888
2016-03-31 16:15:34,607 - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer@747] - LOOKING
2016-03-31 16:15:34,608 - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@815] - New election. My id =  1, proposed zxid=0x0
2016-03-31 16:15:34,609 - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 1 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x0 (n.peerEpoch) LOOKING (my state)
2016-03-31 16:15:34,612 - WARN  [WorkerSender[myid=1]:QuorumCnxManager@383] - Cannot open channel to 2 at election address sg2.imatiasl.lan/10.7.0.93:3888
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
        at java.lang.Thread.run(Thread.java:745)
2016-03-31 16:15:34,614 - WARN  [WorkerSender[myid=1]:QuorumCnxManager@383] - Cannot open channel to 3 at election address sg3.imatiasl.lan/10.7.0.94:3888
java.net.ConnectException: Conexión rehusada
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
        at java.lang.Thread.run(Thread.java:745)
2016-03-31 16:15:34,812 - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383] - Cannot open channel to 2 at election address sg2.imatiasl.lan/10.7.0.93:3888
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
        at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:795)
2016-03-31 16:15:34,813 - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383] - Cannot open channel to 3 at election address sg3.imatiasl.lan/10.7.0.94:3888
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
        at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:795)
2016-03-31 16:15:34,813 - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 400
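The two telling lines in a log like the one above are the election bind address and the "Cannot open channel" warnings, and a small regex scan can pull them out. This is a hypothetical helper (summarize is not part of ZooKeeper or Ambari); it assumes the exact message formats and IPv4 addresses shown in this log.

```python
import ipaddress
import re

# Patterns for two QuorumCnxManager messages as they appear in the log above.
BIND_RE = re.compile(
    r"My election bind port: (?P<host>[^/\s]+)/(?P<ip>[0-9.]+):(?P<port>\d+)")
REFUSED_RE = re.compile(
    r"Cannot open channel to (?P<sid>\d+) at election address "
    r"(?P<host>[^/\s]+)/(?P<ip>[0-9.]+):(?P<port>\d+)")


def summarize(log_text: str) -> dict:
    """Return the local election bind IP, whether it is loopback, and unreachable peers."""
    bind = BIND_RE.search(log_text)
    # A set removes the repeated warnings logged on every election round.
    unreachable = sorted({(int(m["sid"]), m["host"], m["ip"])
                          for m in REFUSED_RE.finditer(log_text)})
    return {
        "bind_ip": bind["ip"] if bind else None,
        "bind_is_loopback": bool(bind) and ipaddress.ip_address(bind["ip"]).is_loopback,
        "unreachable": unreachable,
    }
```

On this log it should report a loopback bind IP (127.0.0.1) together with peers 2 and 3 as unreachable, which is the combination the first answer below points at.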

When a client tries to connect:

2016-03-31 16:15:35,086 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.7.0.93:55914
2016-03-31 16:15:35,130 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2016-03-31 16:15:35,130 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /10.7.0.93:55914 (no session established for client)

What else could be wrong?

How can I fix this?

4 Answers

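Your election port is bound to sgX.imatiasl.lan/127.0.0.1:3888 on every node, so when a peer tries to connect to sgY.imatiasl.lan/10.7.0.93:3888 the connection is refused.
The election port should bind to 0.0.0.0:3888 or to each node's real IP address, but for some reason the hostnames resolve to 127.0.0.1. You can confirm this by running netstat -patun on each node and checking which IP:port the ZooKeeper process listens on.
Most likely the problem is in your /etc/hosts file. See: https://unix.stackexchange.com/questions/240506/zookeeper-dns-name-problems-with-leader-elections-when-migrating-from-windows-to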
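To check the /etc/hosts theory on each node, one helper can flag lines that map a cluster hostname to a loopback address (anything in 127.0.0.0/8, which includes the 127.0.1.1 that some distributions add for the machine's own name), and another can ask the resolver what a name actually resolves to. Both function names are hypothetical; the resolver check only looks at IPv4 results, and /etc/hosts is not necessarily the only source the resolver consults (nsswitch.conf decides that).

```python
from __future__ import annotations

import ipaddress
import socket


def loopback_entries(hosts_text: str, hostnames: set[str]) -> list[tuple[str, str]]:
    """Return (ip, name) pairs where one of `hostnames` is mapped to a loopback IP."""
    found = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments; skip blank lines
        if not line:
            continue
        ip, *names = line.split()
        try:
            loopback = ipaddress.ip_address(ip).is_loopback
        except ValueError:  # malformed or unsupported entry: ignore it
            continue
        found.extend((ip, n) for n in names if loopback and n in hostnames)
    return found


def resolves_to_loopback(hostname: str) -> bool:
    """True if any IPv4 address the resolver returns for `hostname` is loopback."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET)
    return any(ipaddress.ip_address(info[4][0]).is_loopback for info in infos)
```

On a node you might run loopback_entries(open("/etc/hosts").read(), {socket.getfqdn(), socket.gethostname()}); a non-empty result suggests the election port will end up bound to loopback, which matches the "My election bind port: .../127.0.0.1:3888" line in the log.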

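Thanks! In the /etc/hosts of each sgX I had sgX bound to 127.0.0.1 (it worked around some issue during cluster setup, though I don't even remember which one), and that was the problem. - NotGaeL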
Now I can start ZooKeeper and the HBase RegionServers without problems, but the HBase Master still resists. I get UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 58: ordinal not in range(128) on checked_call['curl -sS -L -w '%{http_code}' -X GET 'http://sg1.imatiasl.lan:50070/webhdfs/v1/apps/hbase/data?op=GETFILESTATUS&user.name=hdfs''] {'logoutput': None, 'user': 'hdfs', 'stderr': -1, 'quiet': False}. Any idea how I can fix it? - NotGaeL
The HBase Master fails to start because of non-ASCII input in a RESTful API call. - NotGaeL
I don't use Ambari and don't know Python, but this looks related to 'ñ' and accented characters. Try removing them everywhere, or use unicode u'..' strings. I can't infer more than that :( http://salvatorelab.es/2013/12/unicodedecodeerror-ascii-codec-cant-decode-byte-0xc3-in-position-2-ordinal-not-in-range128/ - Alfonso Nishikawa
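As NotGaeL said, note that if you bind a hostname to a loopback IP (e.g. 127.0.1.1) in /etc/hosts, ZooKeeper will not be able to establish server-to-server connections, even with multiple servers. - quazardous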
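The UnicodeDecodeError above is the generic symptom of decoding UTF-8 bytes as ASCII: 0xc3 is the lead byte UTF-8 uses for characters such as 'ñ', 'ó' or 'é'. These machines appear to run a Spanish locale ("Conexión rehusada" shows up in the ZooKeeper log), so some localized output may contain such characters; that is an assumption, not something the comments confirm. The Python 3 sketch below only reproduces the mechanism and shows how the "position N" in the message maps to exc.start; decode_reply is a hypothetical helper, not a patch for Ambari (which ran on Python 2 at the time).

```python
def decode_reply(data: bytes) -> str:
    """Decode command/HTTP output, reporting where strict ASCII decoding breaks."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError as exc:
        # exc.start is the "position N" from the error message; exc.object holds the bytes.
        print(f"non-ASCII byte {exc.object[exc.start]:#04x} at position {exc.start}")
        return data.decode("utf-8")  # assumption: the producer emitted UTF-8
```

Decoding "año".encode("utf-8") strictly as ASCII fails on byte 0xc3 at position 1, the same shape as the error above; decoding it as UTF-8 gives back "año".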


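I ran into a similar problem in a Kubernetes environment. I couldn't find a proper explanation of why it happens, but restarting all ZooKeeper instances at once solved it for me.

If anyone has better insight, I'd be glad to hear it.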



2016-03-31 16:15:34,813 - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383] - Cannot open channel to 3 at election address sg3.imatiasl.la

One of your nodes is refusing connections.


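All my nodes refuse connections. This node can't connect to the other two, and those two can't connect to this one or to each other. This happens when I try to start ZooKeeper: Ambari shows it starting correctly, but as you can see in the log, it doesn't actually start. Any idea why? - NotGaeL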
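When every node refuses every other, probing each election port from each node narrows things down. A minimal sketch, assuming election port 3888 as in this cluster's log; probe is a hypothetical helper. A "refused" result against a peer's real IP means the host answered but nothing listens on that IP:port, which is what a peer with 3888 bound to 127.0.0.1 would produce; "timeout" more often points at a firewall or an unreachable host, although that mapping is a heuristic.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connect and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"  # host answered with a reset: nothing listening on that IP:port
    except socket.timeout:
        return "timeout"  # no answer: often a firewall drop or a down host
    except OSError as exc:
        return f"error: {exc}"  # DNS failure, unreachable network, ...
```

For instance, probe("10.7.0.93", 3888) run from sg1 would be expected to return "refused" while sg2 binds its election port to loopback, and "open" once it listens on the routable address.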


Try running jps on your nodes to see whether the ZooKeeper service is up; if it isn't, start it.


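I ran jps -l as user zookeeper and got 612 org.apache.zookeeper.server.quorum.QuorumPeerMain, so I guess that's it. What do I do now? - NotGaeL
Can you restart it, or kill those processes? I assume you want only one ZooKeeper service running. Did you run any script that initializes ZK? Maybe it contains a faulty loop. - 15412s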
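Since jps -l prints plain "PID main-class" lines, checking each node for duplicate QuorumPeerMain processes is easy to script. A sketch with hypothetical helper names; note that a single running QuorumPeerMain only proves the JVM is up, and, as the log in the question shows, the peer can still be stuck in LOOKING.

```python
from __future__ import annotations

import subprocess

QPM = "org.apache.zookeeper.server.quorum.QuorumPeerMain"


def quorum_peer_pids(jps_output: str) -> list[int]:
    """Return the PIDs of ZooKeeper QuorumPeerMain processes in `jps -l` output."""
    pids = []
    for line in jps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[1].strip() == QPM:
            pids.append(int(parts[0]))
    return pids


def local_quorum_peer_pids() -> list[int]:
    # jps generally lists only JVMs owned by the current user, so run this as "zookeeper".
    out = subprocess.run(["jps", "-l"], capture_output=True, text=True, check=True).stdout
    return quorum_peer_pids(out)
```

More than one PID on a node would support the duplicate-start theory; exactly one, as in NotGaeL's output, points back at networking or name resolution rather than at the process itself.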
