Abort an Ansible playbook if any host is unreachable

5

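Is there a reasonable way to require that every host the tasks target is actually reachable?

I am currently dealing with an update that can be painful if the related nodes are not all updated in sync.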

4 Answers

7

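I was about to post a question of my own when I saw this one. The answer Duncan suggested does not work, at least in my case, where the hosts are unreachable. All of my playbooks specify a max_fail_percentage of 0.

Even so, Ansible happily executes every task on the hosts it is able to reach. What I really wanted was for no task to run at all if any host is unreachable.

I found a simple solution that might be considered hacky; better answers are welcome.

The first step in running a playbook is to gather facts for all hosts. If a host is unreachable, no facts can be gathered for it. So I wrote a simple play at the very beginning of my playbook that uses a fact for every host. If any host is unreachable, the task fails with an "Undefined variable error". If all hosts are reachable, the task is just a dummy that always passes.

See the example below: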

- name: Check Ansible connectivity to all hosts
  hosts: host_all
  user: "{{ remote_user }}"
  become: "{{ sudo_required }}"
  become_user: root
  connection: ssh # or paramiko
  max_fail_percentage: 0
  tasks:
    # Referencing a gathered fact for every host in the group raises an
    # undefined-variable error when facts are missing for any of them.
    - name: check connectivity to hosts (Dummy task)
      shell: echo " {{ hostvars[item]['ansible_hostname'] }}"
      with_items: "{{ groups['host_all'] }}"
      register: cmd_output

    - name: debug ...
      debug: var=cmd_output

If a host is unreachable, you will get an error like the following:
TASK: [c.. ***************************************************** 
fatal: [172.22.191.160] => One or more undefined variables: 'dict object'    has no attribute 'ansible_hostname' 
fatal: [172.22.191.162] => One or more undefined variables: 'dict object' has no attribute 'ansible_hostname'

FATAL: all hosts have already failed -- aborting

Note: if your host group is not called host_all, you must change the dummy task to reflect the actual group name.


Thanks, I ended up using this as a pre-task; see the Gist. - Jacob Evans

3
You can combine any_errors_fatal: true or max_fail_percentage: 0 with gather_facts: false, and then run a task that fails if a host is offline. Putting something like the following at the top of the playbook should do what you need:
- hosts: all
  gather_facts: false
  max_fail_percentage: 0
  tasks:
    - action: ping

One advantage is that this also works when you use the -l SUBSET option to limit the matched hosts.

1
Why is the gather_facts setting needed? - hbogert
1
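By default, Ansible only operates on reachable hosts, and it determines reachability while gathering facts. A subsequent "ping" would then always succeed, because Ansible only tries to run the playbook on hosts already known to be up. - wilkystyle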
2
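Ansible's run behavior changed in version 2.0, so this approach no longer works. - hbogert
Actually, I am only sure about version 2.1, not 2.0 (I can no longer edit the previous comment). - hbogert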
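
For releases where the ping-only play no longer aborts the run, one possible sketch compares two of Ansible's special variables: ansible_play_hosts_all (every host the play targeted) and ansible_play_hosts (hosts that have not failed or become unreachable). This is an illustration rather than a tested recipe, and it assumes an Ansible release that defines both variables.

```yaml
# Sketch only: assumes a release that provides both
# ansible_play_hosts_all and ansible_play_hosts.
- hosts: all
  gather_facts: false
  tasks:
    # Hosts that cannot be contacted are marked unreachable here
    # and drop out of ansible_play_hosts.
    - name: Try to contact every host
      action: ping

    # Each remaining host compares the targeted list with the active list
    # and fails if the two differ.
    - name: Abort when any targeted host has dropped out
      assert:
        that:
          - ansible_play_hosts_all | difference(ansible_play_hosts) | length == 0
        msg: "Unreachable or failed hosts: {{ ansible_play_hosts_all | difference(ansible_play_hosts) }}"
```

Because every reachable host fails the assertion, no host is left to continue, and the run stops before any later play starts.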

1

Inspired by another question/answer: https://dev59.com/CYzda4cB1Zd3GeqPpaoE#55219490

Using ansible-playbook 2.7.8.

Checking whether each required host has any ansible_facts feels more explicit.

# my-playbook.yml
- hosts: myservers
  tasks:
    - name: Check ALL hosts are reacheable before doing the release
      fail:
        msg: >
          [REQUIRED] ALL hosts to be reachable, so flagging {{ inventory_hostname }} as failed,
          because host {{ item }} has no facts, meaning it is UNREACHABLE.
      when: "hostvars[item].ansible_facts|list|length == 0"
      with_items: "{{ groups.myservers }}"

    - debug:
        msg: "Will only run if all hosts are reacheable"

$ ansible-playbook -i my-inventory.yml my-playbook.yml

PLAY [myservers] *************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] *********************************************************************************************************************************************************************************************************
fatal: [my-host-03]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname my-host-03: Name or service not known", "unreachable": true}
fatal: [my-host-04]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname my-host-04: Name or service not known", "unreachable": true}
ok: [my-host-02]
ok: [my-host-01]

TASK [Check ALL hosts are reacheable before doing the release] ********************************************************************************************************************************************************************************************************************
failed: [my-host-01] (item=my-host-03) => {"changed": false, "item": "my-host-03", "msg": "[REQUIRED] ALL hosts to be reachable, so flagging my-host-01 as failed, because host my-host-03 has no facts, meaning it is UNREACHABLE."}
failed: [my-host-01] (item=my-host-04) => {"changed": false, "item": "my-host-04", "msg": "[REQUIRED] ALL hosts to be reachable, so flagging my-host-01 as failed, because host my-host-04 has no facts, meaning it is UNREACHABLE."}
failed: [my-host-02] (item=my-host-03) => {"changed": false, "item": "my-host-03", "msg": "[REQUIRED] ALL hosts to be reachable, so flagging my-host-02 as failed, because host my-host-03 has no facts, meaning it is UNREACHABLE."}
failed: [my-host-02] (item=my-host-04) => {"changed": false, "item": "my-host-04", "msg": "[REQUIRED] ALL hosts to be reachable, so flagging my-host-02 as failed, because host my-host-04 has no facts, meaning it is UNREACHABLE."}
skipping: [my-host-01] => (item=my-host-01)
skipping: [my-host-01] => (item=my-host-02)
skipping: [my-host-02] => (item=my-host-01)
skipping: [my-host-02] => (item=my-host-02)
        to retry, use: --limit @./my-playbook.retry

PLAY RECAP *********************************************************************************************************************************************************************************************************************
my-host-01 : ok=1    changed=0    unreachable=0    failed=1
my-host-02 : ok=1    changed=0    unreachable=0    failed=1
my-host-03 : ok=0    changed=0    unreachable=1    failed=0
my-host-04 : ok=0    changed=0    unreachable=1    failed=0

If you are using roles, you also need to put this check in pre_tasks. - Julien
With ansible-playbook 2.1.1.0, consider using when: "'ansible_system' not in hostvars[item]" instead. - Julien
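
Put together, the two comments above might look like the following sketch. The group name myservers comes from the answer; the role name common is only a placeholder, and the shortened message text is illustrative.

```yaml
# Sketch only: "myservers" and "common" are placeholder names.
- hosts: myservers
  pre_tasks:
    # pre_tasks run before roles, so the check guards the roles as well.
    - name: Check ALL hosts are reachable before applying roles
      fail:
        msg: "Host {{ item }} has no facts, so it is treated as unreachable."
      # Condition suggested for ansible-playbook 2.1.1.0: test for one
      # gathered fact instead of the ansible_facts dictionary.
      when: "'ansible_system' not in hostvars[item]"
      with_items: "{{ groups.myservers }}"
  roles:
    - common
```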

1
You can add max_fail_percentage to your playbook, something like this:
- hosts: all_boxes
  max_fail_percentage: 0
  roles:
    - common
  pre_tasks:
    - include_tasks: roles/common/tasks/start-time.yml
    - include_tasks: roles/common/tasks/debug.yml

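This way you can decide how many failures you are willing to tolerate. The relevant section of the Ansible documentation reads:

By default, Ansible will continue executing actions as long as there are hosts in the group that have not yet failed. In some situations, such as with the rolling updates described above, it may be desirable to abort the play when a certain threshold of failures has been reached. As of version 1.3, you can set a maximum failure percentage on a play as follows:

- hosts: webservers
  max_fail_percentage: 30
  serial: 10

In the above example, if more than 3 of the 10 servers in the group were to fail, the rest of the play would be aborted.

Note: the percentage set must be exceeded, not equaled. For example, if serial were set to 4 and you wanted the task to abort when 2 of the systems failed, the percentage should be set to 49 rather than 50.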
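
The arithmetic in that note can be spelled out in a short sketch. The group name webservers is reused from the quoted documentation, and the ping task is only a stand-in for real work.

```yaml
# Sketch only: illustrates the "must be exceeded" rule with a batch of 4.
- hosts: webservers
  serial: 4                  # each batch contains 4 hosts
  max_fail_percentage: 49    # 2 of 4 failed = 50%, which exceeds 49%, so the play aborts
  # With max_fail_percentage: 50, two failures (exactly 50%) would not
  # exceed the threshold; the play would abort only once a third host failed.
  tasks:
    - action: ping
```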
