Docker - 启动容器失败:ID 已被使用(重启后)

5
我在Ubuntu 18.04.1 LTS上运行多个docker容器,使用restart=always。物理服务器每天早上2点通过cronjob执行reboot now重启。
到目前为止,在过去的5到6个月中,我没有在这个特定的设置下遇到任何问题。
但是今天,在每日重新启动后,容器没有启动。 docker ps输出为空,所有容器的状态均为“Exited”。
为什么突然发生这种情况? 我的设置是否从一开始就配置错误了,或者最近的docker-ce软件包升级有影响吗?
以下是重启前后的日志,以及docker.service单元和版本信息:
root@skprov2:~# journalctl -b -1 -x -u docker
Nov 15 02:00:02 skprov2 systemd[1]: Stopping Docker Application Container Engine...
-- Subject: Unit docker.service has begun shutting down
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit docker.service has begun shutting down.
Nov 15 02:00:02 skprov2 dockerd[1504]: time="2018-11-15T02:00:02.189764841+01:00" level=info msg="Processing signal 'terminated'"
Nov 15 02:00:02 skprov2 dockerd[1504]: time="2018-11-15T02:00:02.595098434+01:00" level=info msg="shim reaped" id=c929d444a6eb59a69a0da738ca782a9feb92ac1f80e5c4576bf85376c3d4c17a
Nov 15 02:00:02 skprov2 dockerd[1504]: time="2018-11-15T02:00:02.601217756+01:00" level=info msg="shim reaped" id=98a8c1b99cf986e6a889474f0fc28fe3635e466b21f8a37ef3c10a1050495c78
Nov 15 02:00:02 skprov2 dockerd[1504]: time="2018-11-15T02:00:02.604880385+01:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Nov 15 02:00:02 skprov2 dockerd[1504]: time="2018-11-15T02:00:02.670918937+01:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Nov 15 02:00:02 skprov2 dockerd[1504]: time="2018-11-15T02:00:02.732991633+01:00" level=info msg="shim reaped" id=9b3badc752786df08d00138c0222042a6bd80bb2c971f5a96b71e57105cea95c
Nov 15 02:00:02 skprov2 dockerd[1504]: time="2018-11-15T02:00:02.748732351+01:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Nov 15 02:00:02 skprov2 dockerd[1504]: time="2018-11-15T02:00:02.843982385+01:00" level=info msg="shim reaped" id=ae7531405113db8b4754491a12c2ababf09fa0c8f501bfe6f1b33e3ff18b6462
Nov 15 02:00:02 skprov2 dockerd[1504]: time="2018-11-15T02:00:02.869023019+01:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Nov 15 02:00:03 skprov2 dockerd[1504]: time="2018-11-15T02:00:03.863568729+01:00" level=info msg="shim reaped" id=b335536f5f07b1db3f32ba4452fc4aadacc02c6184cef7fc9df619ab81bbf002
Nov 15 02:00:04 skprov2 dockerd[1504]: time="2018-11-15T02:00:04.279347144+01:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Nov 15 02:00:12 skprov2 dockerd[1504]: time="2018-11-15T02:00:12.233635995+01:00" level=info msg="Container 77e00eabebea97357f05f564597d167acad8f2596b25d295b4366baf08ef3127 failed to exit within 10 seconds of signal 15 - using the force"
Nov 15 02:00:12 skprov2 dockerd[1504]: time="2018-11-15T02:00:12.253563540+01:00" level=info msg="Container 7f7d2a92bcdbb240a9400942c9301f5cd77bf9d3fbde1d38f41a2bd1226f9b09 failed to exit within 10 seconds of signal 15 - using the force"
Nov 15 02:00:12 skprov2 dockerd[1504]: time="2018-11-15T02:00:12.253563179+01:00" level=info msg="Container f6b49cc85eb7f9226ac192498b1e319d68e0de2faff6b4e3e67adabba43a093a failed to exit within 10 seconds of signal 15 - using the force"
Nov 15 02:00:12 skprov2 dockerd[1504]: time="2018-11-15T02:00:12.654403249+01:00" level=info msg="shim reaped" id=7f7d2a92bcdbb240a9400942c9301f5cd77bf9d3fbde1d38f41a2bd1226f9b09
Nov 15 02:00:12 skprov2 dockerd[1504]: time="2018-11-15T02:00:12.679675304+01:00" level=info msg="shim reaped" id=f6b49cc85eb7f9226ac192498b1e319d68e0de2faff6b4e3e67adabba43a093a
Nov 15 02:00:12 skprov2 dockerd[1504]: time="2018-11-15T02:00:12.680699340+01:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Nov 15 02:00:12 skprov2 dockerd[1504]: time="2018-11-15T02:00:12.689801078+01:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Nov 15 02:00:13 skprov2 dockerd[1504]: time="2018-11-15T02:00:13.088891655+01:00" level=info msg="shim reaped" id=77e00eabebea97357f05f564597d167acad8f2596b25d295b4366baf08ef3127
Nov 15 02:00:13 skprov2 dockerd[1504]: time="2018-11-15T02:00:13.111193244+01:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Nov 15 02:00:15 skprov2 dockerd[1504]: time="2018-11-15T02:00:15.233286510+01:00" level=info msg="stopping event stream following graceful shutdown" error="<nil>" module=libcontainerd namespace=moby
Nov 15 02:00:15 skprov2 dockerd[1504]: time="2018-11-15T02:00:15.233684167+01:00" level=info msg="stopping healthcheck following graceful shutdown" module=libcontainerd
Nov 15 02:00:15 skprov2 dockerd[1504]: time="2018-11-15T02:00:15.233695697+01:00" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=plugins.moby
Nov 15 02:00:15 skprov2 dockerd[1504]: time="2018-11-15T02:00:15.234287398+01:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4209ec610, TRANSIENT_FAILURE" module=grpc
Nov 15 02:00:15 skprov2 dockerd[1504]: time="2018-11-15T02:00:15.234328545+01:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4209ec610, CONNECTING" module=grpc
Nov 15 02:00:16 skprov2 systemd[1]: Stopped Docker Application Container Engine.
-- Subject: Unit docker.service has finished shutting down
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit docker.service has finished shutting down.



==================================================================================
==================================================================================
==================================================================================



root@skprov2:~# journalctl -b 0 -x -u docker
-- Logs begin at Thu 2018-07-05 13:16:23 CEST, end at Thu 2018-11-15 08:16:31 CET. --
Nov 15 02:04:00 skprov2 systemd[1]: Starting Docker Application Container Engine...
-- Subject: Unit docker.service has begun start-up
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit docker.service has begun starting up.
Nov 15 02:04:12 skprov2 dockerd[1690]: time="2018-11-15T02:04:12.152961544+01:00" level=info msg="systemd-resolved is running, so using resolvconf: /run/systemd/resolve/resolv.conf"
Nov 15 02:04:12 skprov2 dockerd[1690]: time="2018-11-15T02:04:12.432271212+01:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Nov 15 02:04:12 skprov2 dockerd[1690]: time="2018-11-15T02:04:12.432315437+01:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Nov 15 02:04:12 skprov2 dockerd[1690]: time="2018-11-15T02:04:12.439772198+01:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Nov 15 02:04:12 skprov2 dockerd[1690]: time="2018-11-15T02:04:12.439800208+01:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Nov 15 02:04:12 skprov2 dockerd[1690]: time="2018-11-15T02:04:12.471564855+01:00" level=info msg="ccResolverWrapper: sending new addresses to cc: [{unix:///run/containerd/containerd.sock 0  <nil>}]" module=grpc
Nov 15 02:04:12 skprov2 dockerd[1690]: time="2018-11-15T02:04:12.471618580+01:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Nov 15 02:04:12 skprov2 dockerd[1690]: time="2018-11-15T02:04:12.471653422+01:00" level=info msg="ccResolverWrapper: sending new addresses to cc: [{unix:///run/containerd/containerd.sock 0  <nil>}]" module=grpc
Nov 15 02:04:12 skprov2 dockerd[1690]: time="2018-11-15T02:04:12.475146270+01:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Nov 15 02:04:12 skprov2 dockerd[1690]: time="2018-11-15T02:04:12.475678777+01:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc420abc010, CONNECTING" module=grpc
Nov 15 02:04:12 skprov2 dockerd[1690]: time="2018-11-15T02:04:12.475795536+01:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc42080b0b0, CONNECTING" module=grpc
Nov 15 02:04:12 skprov2 dockerd[1690]: time="2018-11-15T02:04:12.475769194+01:00" level=info msg="blockingPicker: the picked transport is not ready, loop back to repick" module=grpc
Nov 15 02:04:12 skprov2 dockerd[1690]: time="2018-11-15T02:04:12.476273893+01:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc42080b0b0, READY" module=grpc
Nov 15 02:04:12 skprov2 dockerd[1690]: time="2018-11-15T02:04:12.476346309+01:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc420abc010, READY" module=grpc
Nov 15 02:04:14 skprov2 dockerd[1690]: time="2018-11-15T02:04:14.769703354+01:00" level=info msg="[graphdriver] using prior storage driver: overlay2"
Nov 15 02:04:23 skprov2 dockerd[1690]: time="2018-11-15T02:04:23.247573731+01:00" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Nov 15 02:04:23 skprov2 dockerd[1690]: time="2018-11-15T02:04:23.247926863+01:00" level=warning msg="Your kernel does not support swap memory limit"
Nov 15 02:04:23 skprov2 dockerd[1690]: time="2018-11-15T02:04:23.247998928+01:00" level=warning msg="Your kernel does not support cgroup rt period"
Nov 15 02:04:23 skprov2 dockerd[1690]: time="2018-11-15T02:04:23.248016977+01:00" level=warning msg="Your kernel does not support cgroup rt runtime"
Nov 15 02:04:23 skprov2 dockerd[1690]: time="2018-11-15T02:04:23.254944197+01:00" level=info msg="Loading containers: start."
Nov 15 02:04:25 skprov2 dockerd[1690]: time="2018-11-15T02:04:25.856323528+01:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Nov 15 02:04:35 skprov2 dockerd[1690]: time="2018-11-15T02:04:35.182112549+01:00" level=error msg="Failed to start container c929d444a6eb59a69a0da738ca782a9feb92ac1f80e5c4576bf85376c3d4c17a: id already in use"
Nov 15 02:04:35 skprov2 dockerd[1690]: time="2018-11-15T02:04:35.206030890+01:00" level=error msg="Failed to start container b335536f5f07b1db3f32ba4452fc4aadacc02c6184cef7fc9df619ab81bbf002: id already in use"
Nov 15 02:04:35 skprov2 dockerd[1690]: time="2018-11-15T02:04:35.235647072+01:00" level=error msg="Failed to start container ae7531405113db8b4754491a12c2ababf09fa0c8f501bfe6f1b33e3ff18b6462: id already in use"
Nov 15 02:04:35 skprov2 dockerd[1690]: time="2018-11-15T02:04:35.374241415+01:00" level=error msg="Failed to start container 9b3badc752786df08d00138c0222042a6bd80bb2c971f5a96b71e57105cea95c: id already in use"
Nov 15 02:04:35 skprov2 dockerd[1690]: time="2018-11-15T02:04:35.410173049+01:00" level=error msg="Failed to start container 7f7d2a92bcdbb240a9400942c9301f5cd77bf9d3fbde1d38f41a2bd1226f9b09: id already in use"
Nov 15 02:04:36 skprov2 dockerd[1690]: time="2018-11-15T02:04:36.171600568+01:00" level=error msg="Failed to start container 98a8c1b99cf986e6a889474f0fc28fe3635e466b21f8a37ef3c10a1050495c78: id already in use"
Nov 15 02:04:36 skprov2 dockerd[1690]: time="2018-11-15T02:04:36.970077586+01:00" level=error msg="Failed to start container f6b49cc85eb7f9226ac192498b1e319d68e0de2faff6b4e3e67adabba43a093a: id already in use"
Nov 15 02:04:36 skprov2 dockerd[1690]: time="2018-11-15T02:04:36.993993749+01:00" level=error msg="Failed to start container 77e00eabebea97357f05f564597d167acad8f2596b25d295b4366baf08ef3127: id already in use"
Nov 15 02:04:36 skprov2 dockerd[1690]: time="2018-11-15T02:04:36.994202774+01:00" level=info msg="Loading containers: done."
Nov 15 02:04:37 skprov2 dockerd[1690]: time="2018-11-15T02:04:37.492457742+01:00" level=info msg="Docker daemon" commit=4d60db4 graphdriver(s)=overlay2 version=18.09.0
Nov 15 02:04:37 skprov2 dockerd[1690]: time="2018-11-15T02:04:37.494916840+01:00" level=info msg="Daemon has completed initialization"
Nov 15 02:04:37 skprov2 dockerd[1690]: time="2018-11-15T02:04:37.669139526+01:00" level=info msg="API listen on /var/run/docker.sock"
Nov 15 02:04:37 skprov2 systemd[1]: Started Docker Application Container Engine.
-- Subject: Unit docker.service has finished start-up
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit docker.service has finished starting up.
--
-- The start-up result is RESULT.



==================================================================================
==================================================================================
==================================================================================



root@skprov2:~# cat /lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
BindsTo=containerd.service
After=network-online.target firewalld.service
Wants=network-online.target

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd -H unix://
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always

# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3

# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity

# Comment TasksMax if your systemd version does not supports it.
# Only systemd 226 and above support this option.
TasksMax=infinity

# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes

# kill only the docker process, not all processes in the cgroup
KillMode=process

[Install]
WantedBy=multi-user.target



==================================================================================
==================================================================================
==================================================================================



root@skprov2:~# docker info && docker version
Containers: 14
 Running: 5
 Paused: 0
 Stopped: 9
Images: 61
Server Version: 18.09.0
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: c4446665cb9c30056f4998ed953e6d4ff22c7c39
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.15.0-39-generic
Operating System: Ubuntu 18.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 31.39GiB
Name: skprov2
ID: EDC2:AGFH:BHKP:P4HS:M5DA:ZPXM:AU6B:TV6E:6KIU:YC4S:F3NN:35A4
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 localhost:5000
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

WARNING: No swap limit support
Client:
 Version:           18.09.0
 API version:       1.39
 Go version:        go1.10.4
 Git commit:        4d60db4
 Built:             Wed Nov  7 00:49:01 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.0
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.4
  Git commit:       4d60db4
  Built:            Wed Nov  7 00:16:44 2018
  OS/Arch:          linux/amd64
  Experimental:     false

1
今天早上我遇到了完全相同的问题:重启后,我的所有容器都没有重新启动,并出现了与您相同的“ID已在使用”的错误列表。如果找到任何解决方案或者有关此问题的错误报告被打开,请告诉我。 - Gwendal
我们也有同样的问题。而且它也是不可预测的,不知道什么时候会发生。 - Shrikrishna Holla
2个回答

2

我遇到了同样的问题。 在我的情况下,问题是由于Docker没有正确清理自己而导致的。 如Docker日志所示:

time="2018-12-31T17:38:54.330555181+02:00" level=error msg="2089c8095e62011b0dc05e66c51ae59d648d909ca7a8e806af0fdf39b2e3006c cleanup: failed to delete container from containerd: transport is closing: unknown"

这些id是由于restart=always在启动时使用的。因此,Docker如下所述:
time="2018-12-31T17:40:04.648261275+02:00" level=error msg="Failed to start container 2089c8095e62011b0dc05e66c51ae59d648d909ca7a8e806af0fdf39b2e3006c: id already in use"

看起来Docker守护进程关闭的速度比容器清理得更快(可能是因为容器忽略信号或其他原因)。

所以,对我来说解决方案似乎是在Docker守护进程配置文件中更改守护进程的shutdown-timeout。默认值为10秒左右,我将其更改为60秒,现在我不再遇到这些问题了。

尽管如此,我仍然认为这是一个应该开箱即用的合法场景。


1

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接