Elasticsearch cluster 'master_not_discovered_exception'

Question

22

I have installed Elasticsearch 2.2.3 and configured it as a two-node cluster.

Node 1 (elasticsearch.yml)

cluster.name: my-cluster
node.name: node1
bootstrap.mlockall: true
discovery.zen.ping.unicast.hosts: ["ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com", "ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com"]
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.multicast.enabled: false
indices.fielddata.cache.size: "30%"
indices.cache.filter.size: "30%"
node.master: true
node.data: true
http.cors.enabled: true
script.inline: false
script.indexed: false
network.bind_host: 0.0.0.0

Node 2 (elasticsearch.yml)

cluster.name: my-cluster
node.name: node2
bootstrap.mlockall: true
discovery.zen.ping.unicast.hosts: ["ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com", "ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com"]
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.multicast.enabled: false
indices.fielddata.cache.size: "30%"
indices.cache.filter.size: "30%"
node.master: false
node.data: true
http.cors.enabled: true
script.inline: false
script.indexed: false
network.bind_host: 0.0.0.0

If I run curl -XGET 'http://localhost:9200/_cluster/state?pretty', I get the following result:

{
  "error" : {
    "root_cause" : [ {
      "type" : "master_not_discovered_exception",
      "reason" : null
    } ],
    "type" : "master_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}

The log on node 1 shows:

[2016-06-22 13:33:56,167][INFO ][cluster.service          ] [node1] new_master {node1}{Vwj4gI3STr6saeTxKkSqEw}{127.0.0.1}{127.0.0.1:9300}{master=true}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2016-06-22 13:33:56,210][INFO ][http                     ] [node1] publish_address {127.0.0.1:9200}, bound_addresses {[::]:9200}
[2016-06-22 13:33:56,210][INFO ][node                     ] [node1] started
[2016-06-22 13:33:56,221][INFO ][gateway                  ] [node1] recovered [0] indices into cluster_state

The log on node 2 shows:

[2016-06-22 13:34:38,419][INFO ][discovery.zen            ] [node2] failed to send join request to master [{node1}{Vwj4gI3STr6saeTxKkSqEw}{127.0.0.1}{127.0.0.1:9300}{master=true}], reason [RemoteTransportException[[node2][127.0.0.1:9300][internal:discovery/zen/join]]; nested: IllegalStateException[Node [{node2}{_YUbBNx9RUuw854PKFe1CA}{127.0.0.1}{127.0.0.1:9300}{master=false}] not master for join request]; ]

Where is the error?

- hellb0y77

I checked with Netcat and it reported: Ncat: No route to host. - Andre Leon Rangel

9 Answers

11

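Note: Elasticsearch reserves ports 9300-9400 for communication between cluster nodes, while ports 9200-9300 are used for access to the Elasticsearch HTTP API.

The root cause of the master not discovered exception is that the nodes cannot reach each other on port 9300. This has to be verified in both directions: node1 must be able to reach node2 on port 9300, and vice versa.

A simple telnet is enough to confirm this. Starting on node1, run telnet node2 9300

If that succeeds, run telnet node1 9300 from node2

When the master not discovered exception occurs, at least one of these two telnet attempts will fail.

If you do not have telnet installed, you can run the same test with curl.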
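
If neither telnet nor curl is available, the same reachability check can be sketched in Python using only the standard library. This is a minimal sketch, not part of the original answer: the function name can_reach is hypothetical, and the host names in the usage comment are placeholders for your own nodes.

```python
import socket


def can_reach(host: str, port: int = 9300, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only proves that the port accepts connections, which is the
    same thing the telnet test checks.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, "No route to host"
        # and host names that cannot be resolved.
        return False


# Example usage (placeholder host names, run once on each node):
#   can_reach("node2")   # on node1
#   can_reach("node1")   # on node2
```

Both directions need to return True; a single False points to a firewall, security group or routing problem between the nodes.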

Hope this helps.

- Sandeep Kanabar

6

This may be why the master node cannot be discovered. If the EC2 instances are in the same VPC, provide the private IP in /etc/elasticsearch/elasticsearch.yml as follows:

cluster.initial_master_nodes: ["<PRIVATE-IP>"]

Note: after making the configuration change above, restart the Elasticsearch service. On Ubuntu, for example, run sudo service elasticsearch stop followed by sudo service elasticsearch start.

- Abhishek katiyar

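The parameter should list all nodes that belong to the cluster. Assuming node-1, node-2 and node-3 are your nodes, this is the setting you can use: cluster.initial_master_nodes: ["node-1","node-2","node-3"] - Vishwas M.R

Thanks, this helped with v7.16: cluster.initial_master_nodes: [_local_, _site_]. - SiZE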

6

If you are using Elasticsearch 7

Update the elasticsearch.yml file located in /etc/elasticsearch:

node.name: "node-1"

network.host: ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com

http.port: 9200

cluster.initial_master_nodes: ["node-1"]

Here node.name and the first value of cluster.initial_master_nodes must be the same.

- pratsy

Seems to work for me (note: with network.host commented out, for local access only). - mike rodent

Wow! This is great! - Tebe

2

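There are a lot of settings here, some of which you probably do not want (such as fielddata) or do not need. Also, you are clearly using AWS EC2 instances, so you should use the cloud-aws plugin (split into separate plugins in ES 5.x). It provides a new discovery model that you can use instead of zen.

So, on each node, you need to install the cloud-aws plugin (assuming you are on ES 2.x):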

$ bin/plugin install cloud-aws

Once it is installed on every node, you can use it to take advantage of the discovery-ec2 component:

# Guarantee that the plugin is installed
plugin.mandatory: cloud-aws

# Discovery / AWS EC2 Settings
discovery:
  type: ec2
  ec2:
    availability_zones: [ "us-east-1a", "us-east-1b" ]
    groups: [ "my_security_group1", "my_security_group2" ]

cloud:
  aws:
    access_key: AKVAIQBF2RECL7FJWGJQ
    secret_key: vExyMThREXeRMm/b/LRzEB8jWwvzQeXgjqMX+6br
    region: us-east-1
  node.auto_attributes: true

# Bind to the network on whatever IP you want to allow connections on.
# You _should_ only want to allow connections from within the network
# so you only need to bind to the private IP
network.host: _ec2:privateIp_

# You can bind to all hosts that are possible to communicate with the
# node but advertise it to other nodes via the private IP (less
# relevant because of the type of discovery used, but not a bad idea).
#network:
#  bind_host: [ _ec2:privateIp_, _ec2:publicIp_, _ec2:publicDns_ ]
#  publish_host: _ec2:privateIp_

# Node-specific settings (note: nodes default to be master and data nodes)
node:
  name: node1
  master: true
  data: true

# Constant settings
cluster.name: my-cluster
bootstrap.mlockall: true

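Ultimately, your problem is that the master election cannot succeed, probably because of network connectivity issues. The configuration above should address those, but you have another critical problem: you have set discovery.zen.minimum_master_nodes incorrectly. You have two master-eligible nodes, yet you tell Elasticsearch that a single one is enough for any election. This means each master-eligible node can independently decide that it has a majority and elect itself (giving you two masters and, effectively, two clusters). This is bad.

You must always set this setting to a quorum: (M / 2) + 1, rounded down, where M is the number of master-eligible nodes. So: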

M = 2
(2 / 2) + 1 = (1) + 1 = 2

If you had 3, 4, or 5 master-eligible nodes, it would be:

M = 3
(3 / 2) + 1 = (1.5) + 1 = 2.5 => 2

M = 4
(4 / 2) + 1 = (2) + 1 = 3

M = 5
(5 / 2) + 1 = (2.5) + 1 = 3.5 => 3
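
The same arithmetic can be written as a small helper. This is only an illustrative sketch: the function name minimum_master_nodes is made up here to mirror the setting name, and integer division stands in for "divide by two and round down".

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum for M master-eligible nodes: (M / 2) + 1, rounded down."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    # Integer division performs the rounding down.
    return master_eligible // 2 + 1


# Reproduce the worked examples for M = 2, 3, 4 and 5.
for m in (2, 3, 4, 5):
    print(f"M = {m}: discovery.zen.minimum_master_nodes: {minimum_master_nodes(m)}")
```

For M = 2, 3, 4 and 5 this yields 2, 2, 3 and 3, matching the hand calculations.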

So, in your case, you should also set the following:

discovery.zen.minimum_master_nodes: 2

Note that you can add this as a separate line, or you can modify the discovery block above (it really comes down to YAML style):

discovery:
  type: ec2
  ec2:
    availability_zones: [ "us-east-1a", "us-east-1b" ]
    groups: [ "my_security_group1", "my_security_group2" ]
  zen.minimum_master_nodes: 2

- pickypg

1

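Thanks for the information about the cloud-aws plugin, but for now I need to use this configuration. I have set discovery.zen.minimum_master_nodes: 2, but now node 1 shows the same error as node 2. - hellb0y77

I am not sure why you need to use that configuration, but you _may_ need to fix your discovery.zen.ping.unicast.hosts list to use the private IPs of those EC2 instances. - pickypg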

1

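Thanks a lot! Changing discovery.zen.minimum_master_nodes worked for me. I was using an old ES configuration that had a different minimum_master_nodes value, and I have corrected it. - Srikanth Jeeva

ES >= 8.0: "The discovery.zen settings have been removed. [...] All settings under the discovery.zen namespace are no longer supported." - Gerold Broser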

1

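You can also run into this error if the master node created its indices with an older version of Elasticsearch while the other nodes initialized empty indices with a newer version of Elasticsearch.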

- Ryabchenko Alexander

1

My system firewall was on, and once I turned it off everything worked. So make sure your firewall is off.

- baj9032

0

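Sandeep's answer made me realize that the nodes could not talk to each other. After digging deeper, I found that the EC2 security group was missing an inbound rule for TCP port 9300. I added the rule, restarted the elasticsearch service on all nodes, and it started working.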

- avp

0

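I was using AWS EC2 instances running CentOS 7.

My problem was that there was no IP route. I had to open some firewall ports as shown below, which solved the problem.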

sudo firewall-cmd --permanent --add-port=8080/tcp
sudo firewall-cmd --permanent --add-port=9200/tcp
sudo firewall-cmd --permanent --add-port=9300/tcp
# --permanent rules only take effect after a reload
sudo firewall-cmd --reload

- Andre Leon Rangel

- hellb0y77 · Accepted Answer

I solved the problem with the following line:

network.publish_host: ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com

Every elasticsearch.yml configuration file must have this line, set to that node's own host name.
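
The logs quoted in the question show both nodes advertising {127.0.0.1}{127.0.0.1:9300}, which would explain why setting network.publish_host helps: a loopback address cannot be reached from the other machine. The sketch below, which is an illustration and not part of the original answer, scans log text for such addresses. The function name and the regular expression are assumptions based only on the log format shown in the question.

```python
import ipaddress
import re
from typing import List

# Matches the "{host}{ip:port}" pair printed for a node in the quoted logs,
# e.g. "{127.0.0.1}{127.0.0.1:9300}". Assumes IPv4 addresses only.
NODE_ADDRESS = re.compile(r"\{[^{}]*\}\{(\d{1,3}(?:\.\d{1,3}){3}):(\d+)\}")


def loopback_publish_addresses(log_text: str) -> List[str]:
    """Return the distinct loopback ip:port pairs found in log_text."""
    found: List[str] = []
    for ip, port in NODE_ADDRESS.findall(log_text):
        try:
            is_loopback = ipaddress.ip_address(ip).is_loopback
        except ValueError:
            continue  # not a valid IPv4 address, ignore it
        entry = f"{ip}:{port}"
        if is_loopback and entry not in found:
            found.append(entry)
    return found
```

A non-empty result for a node's log suggests that the node is publishing an address other nodes cannot connect to, so network.publish_host (or network.host) is worth checking on that node.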