I deployed Selenium Grid using Docker Swarm.
The `docker-compose.yml` file:
version: '3.7'

services:
  hub:
    image: selenium/hub:3.141.59-mercury
    ports:
      - "4444:4444"
    volumes:
      - /dev/shm:/dev/shm
    privileged: true
    environment:
      HUB_HOST: hub
      HUB_PORT: 4444
    deploy:
      resources:
        limits:
          memory: 5000M
      restart_policy:
        condition: on-failure
        window: 240s
    healthcheck:
      test: ["CMD", "curl", "-I", "http://127.0.0.1:4444/wd/hub/status"]
      interval: 1m
      timeout: 60s
      retries: 3
      start_period: 300s

  chrome:
    image: selenium/node-chrome:latest
    volumes:
      - /dev/shm:/dev/shm
    privileged: true
    environment:
      HUB_HOST: hub
      HUB_PORT: 4444
      NODE_MAX_INSTANCES: 5
      NODE_MAX_SESSION: 5
    deploy:
      resources:
        limits:
          memory: 2800M
      replicas: 10
    entrypoint: bash -c 'SE_OPTS="-host $$HOSTNAME" /opt/bin/entry_point.sh'
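The problem is that Swarm almost never restarts `hub` when its status is `unhealthy`. I have only seen it restart successfully a few times. As far as I understand, it should keep being restarted until the `healthcheck` succeeds, or forever, but the container just keeps running in the `unhealthy` state.
I tried removing `restart_policy` entirely, in case it conflicts with Swarm mode, but that had no effect.
Also: it seems that when `hub` does restart successfully, the `chrome` containers (all replicas) restart as well. This relationship is not specified in `docker-compose.yml`, so why does it happen?
What could be wrong with my setup?
Update:
When I try to inspect the container (in the `unhealthy` state with no retries left), e.g. `docker container inspect $container_id --format '{{json .State.Health}}' | jq .`, or run almost any other command against the container, it fails with the following output: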
docker container inspect 1abfa546cc26 --format '{{json .State.Health}}' | jq .
runtime/cgo: pthread_create failed: Resource temporarily unavailable
SIGABRT: abort
PC=0x7fa114765fff m=0 sigcode=18446744073709551610
goroutine 0 [idle]:
runtime: unknown pc 0x7fa114765fff
stack: frame={sp:0x7ffe5e0f1a08, fp:0x0} stack=[0x7ffe5d8f2fc8,0x7ffe5e0f1ff0)
00007ffe5e0f1908: 73752f3a6e696273 732f3a6e69622f72
00007ffe5e0f1918: 6e69622f3a6e6962 2a3a36333b30303d
00007ffe5e0f1928: 3b30303d616b6d2e 33706d2e2a3a3633
00007ffe5e0f1938: 2a3a36333b30303d 3b30303d63706d2e
00007ffe5e0f1948: 67676f2e2a3a3633 2a3a36333b30303d
00007ffe5e0f1958: 333b30303d61722e 3d7661772e2a3a36
00007ffe5e0f1968: 2e2a3a36333b3030 333b30303d61676f
00007ffe5e0f1978: 7375706f2e2a3a36 2a3a36333b30303d
00007ffe5e0f1988: 3b30303d7870732e 0000000000000000
00007ffe5e0f1998: 3a36333b30303d66 2a3a36333b30303d
00007ffe5e0f19a8: 3b30303d616b6d2e 33706d2e2a3a3633
00007ffe5e0f19b8: 2a3a36333b30303d 3b30303d63706d2e
00007ffe5e0f19c8: 67676f2e2a3a3633 2a3a36333b30303d
00007ffe5e0f19d8: 333b30303d61722e 3d7661772e2a3a36
00007ffe5e0f19e8: 2e2a3a36333b3030 333b30303d61676f
00007ffe5e0f19f8: 7375706f2e2a3a36 0000000000000002
00007ffe5e0f1a08: <8000000000000006 fffffffe7fffffff
00007ffe5e0f1a18: ffffffffffffffff ffffffffffffffff
00007ffe5e0f1a28: ffffffffffffffff ffffffffffffffff
00007ffe5e0f1a38: ffffffffffffffff ffffffffffffffff
00007ffe5e0f1a48: ffffffffffffffff ffffffffffffffff
00007ffe5e0f1a58: ffffffffffffffff ffffffffffffffff
00007ffe5e0f1a68: ffffffffffffffff ffffffffffffffff
00007ffe5e0f1a78: ffffffffffffffff ffffffffffffffff
00007ffe5e0f1a88: ffffffffffffffff 00007fa114acd6e0
00007ffe5e0f1a98: 00007fa11476742a 0000000000000020
00007ffe5e0f1aa8: 0000000000000000 0000000000000000
00007ffe5e0f1ab8: 0000000000000000 0000000000000000
00007ffe5e0f1ac8: 0000000000000000 0000000000000000
00007ffe5e0f1ad8: 0000000000000000 0000000000000000
00007ffe5e0f1ae8: 0000000000000000 0000000000000000
00007ffe5e0f1af8: 0000000000000000 0000000000000000
runtime: unknown pc 0x7fa114765fff
stack: frame={sp:0x7ffe5e0f1a08, fp:0x0} stack=[0x7ffe5d8f2fc8,0x7ffe5e0f1ff0)
00007ffe5e0f1908: 73752f3a6e696273 732f3a6e69622f72
00007ffe5e0f1918: 6e69622f3a6e6962 2a3a36333b30303d
00007ffe5e0f1928: 3b30303d616b6d2e 33706d2e2a3a3633
00007ffe5e0f1938: 2a3a36333b30303d 3b30303d63706d2e
00007ffe5e0f1948: 67676f2e2a3a3633 2a3a36333b30303d
00007ffe5e0f1958: 333b30303d61722e 3d7661772e2a3a36
00007ffe5e0f1968: 2e2a3a36333b3030 333b30303d61676f
00007ffe5e0f1978: 7375706f2e2a3a36 2a3a36333b30303d
00007ffe5e0f1988: 3b30303d7870732e 0000000000000000
00007ffe5e0f1998: 3a36333b30303d66 2a3a36333b30303d
00007ffe5e0f19a8: 3b30303d616b6d2e 33706d2e2a3a3633
00007ffe5e0f19b8: 2a3a36333b30303d 3b30303d63706d2e
00007ffe5e0f19c8: 67676f2e2a3a3633 2a3a36333b30303d
00007ffe5e0f19d8: 333b30303d61722e 3d7661772e2a3a36
00007ffe5e0f19e8: 2e2a3a36333b3030 333b30303d61676f
00007ffe5e0f19f8: 7375706f2e2a3a36 0000000000000002
00007ffe5e0f1a08: <8000000000000006 fffffffe7fffffff
00007ffe5e0f1a18: ffffffffffffffff ffffffffffffffff
00007ffe5e0f1a28: ffffffffffffffff ffffffffffffffff
00007ffe5e0f1a38: ffffffffffffffff ffffffffffffffff
00007ffe5e0f1a48: ffffffffffffffff ffffffffffffffff
00007ffe5e0f1a58: ffffffffffffffff ffffffffffffffff
00007ffe5e0f1a68: ffffffffffffffff ffffffffffffffff
00007ffe5e0f1a78: ffffffffffffffff ffffffffffffffff
00007ffe5e0f1a88: ffffffffffffffff 00007fa114acd6e0
00007ffe5e0f1a98: 00007fa11476742a 0000000000000020
00007ffe5e0f1aa8: 0000000000000000 0000000000000000
00007ffe5e0f1ab8: 0000000000000000 0000000000000000
00007ffe5e0f1ac8: 0000000000000000 0000000000000000
00007ffe5e0f1ad8: 0000000000000000 0000000000000000
00007ffe5e0f1ae8: 0000000000000000 0000000000000000
00007ffe5e0f1af8: 0000000000000000 0000000000000000
goroutine 1 [running, locked to thread]:
runtime.systemstack_switch()
/usr/local/go/src/runtime/asm_amd64.s:311 fp=0xc00009c720 sp=0xc00009c718 pc=0x565171ddf910
runtime.newproc(0x565100000000, 0x56517409ab70)
/usr/local/go/src/runtime/proc.go:3243 +0x71 fp=0xc00009c768 sp=0xc00009c720 pc=0x565171dbdea1
runtime.init.5()
/usr/local/go/src/runtime/proc.go:239 +0x37 fp=0xc00009c788 sp=0xc00009c768 pc=0x565171db6447
runtime.init()
<autogenerated>:1 +0x6a fp=0xc00009c798 sp=0xc00009c788 pc=0x565171ddf5ba
runtime.main()
/usr/local/go/src/runtime/proc.go:147 +0xc2 fp=0xc00009c7e0 sp=0xc00009c798 pc=0x565171db6132
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc00009c7e8 sp=0xc00009c7e0 pc=0x565171de1a11
rax 0x0
rbx 0x6
rcx 0x7fa114765fff
rdx 0x0
rdi 0x2
rsi 0x7ffe5e0f1990
rbp 0x5651736b13d5
rsp 0x7ffe5e0f1a08
r8 0x0
r9 0x7ffe5e0f1990
r10 0x8
r11 0x246
r12 0x565175ae21a0
r13 0x11
r14 0x565173654be8
r15 0x0
rip 0x7fa114765fff
rflags 0x246
cs 0x33
fs 0x0
gs 0x0
To work around this, I tried applying the following solution: https://success.docker.com/article/how-to-reserve-resource-temporarily-unavailable-errors-due-to-tasksmax-setting
However, it had no effect, so I suspect the cause is different. On my system, the `journalctl -u docker` log is full of records like this: level=warning msg="Health check for container c427cfd49214d394cee8dd2c9019f6f319bc6637cfb53f0c14de70e1147b5fa6 error: context deadline exceeded"
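For context, `pthread_create failed: Resource temporarily unavailable` corresponds to `EAGAIN`, which generally means that some thread or process limit was reached. Below is a minimal sketch for printing the limits that may be involved. It assumes a Linux host with `/proc` mounted; `report_limits` is a hypothetical helper name, and the `docker.service` unit name is an assumption about how Docker is installed.

```shell
#!/bin/sh
# Sketch: print limits that can lead to
# "pthread_create failed: Resource temporarily unavailable" (EAGAIN).
# Assumes Linux with /proc; the systemd part is skipped if systemctl is absent.

report_limits() {
    # Kernel-wide ceilings on the number of threads and on PID values.
    echo "threads-max: $(cat /proc/sys/kernel/threads-max)"
    echo "pid_max: $(cat /proc/sys/kernel/pid_max)"

    # Per-user process/thread limit (RLIMIT_NPROC) of the current shell.
    grep 'Max processes' /proc/self/limits

    # Field 4 of /proc/loadavg is "runnable/total" scheduling entities;
    # the part after the slash counts existing processes and threads.
    echo "tasks in use: $(awk '{split($4, a, "/"); print a[2]}' /proc/loadavg)"

    # systemd task limit for the Docker unit (unit name is an assumption).
    if command -v systemctl >/dev/null 2>&1; then
        systemctl show docker.service --property=TasksMax 2>/dev/null || true
    fi
}

report_limits
```

Comparing "tasks in use" with `threads-max`, `pid_max` and `TasksMax` while the hub is `unhealthy` may show which limit, if any, is being exhausted.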
The `chrome` node containers are not restarted. - user1935987