How to restore an etcd cluster from a snapshot in a CoreOS-based Docker image?

I have a Kubernetes cluster (version v1.5.6) running on VMware with three etcd nodes (version 3.1.5). The etcd nodes run in three Docker containers, each on a different host, on CoreOS.
I back up etcd with the following command:
docker run --rm --net=host -v /tmp:/etcd_backup -e ETCDCTL_API=3 quay.io/coreos/etcd:v3.1.5 etcdctl --endpoints=[1.1.1.1:2379,2.2.2.2:2379,3.3.3.3:2379] snapshot save etcd_backup/snapshot.db

The backup completes successfully.
I want to rebuild this Kubernetes cluster from scratch in another VMware environment, which means restoring etcd from the snapshot.
So far I have not found a working procedure for etcd running in Docker containers.
I attempted the restore below, but unfortunately it did not work.
First, I created a new etcd node by running:
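Before attempting a restore, it can help to verify the snapshot file itself. A minimal sketch using the same image, assuming the snapshot was saved to /tmp/snapshot.db on the host:

```shell
# Print the snapshot's hash, revision, total key count, and size.
# Assumes /tmp/snapshot.db exists on the host running this command.
docker run --rm -v /tmp:/etcd_backup -e ETCDCTL_API=3 \
  quay.io/coreos/etcd:v3.1.5 \
  etcdctl snapshot status etcd_backup/snapshot.db --write-out=table
```

If this prints a sane revision and key count, the snapshot file is intact and the problem lies in the restore/start procedure, not in the backup.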
docker run --rm --net=host -v /tmp/etcd_bak:/etcd_backup -e ETCDCTL_API=3 registry:5000/quay.io/coreos/etcd:v3.1.5 etcdctl snapshot restore etcd_backup/snapshot.db --name etcd0 --initial-cluster etcd0=http://etcd0:2380,etcd1=http://etcd1:2380,etcd2=http://etcd2:2380 --initial-cluster-token etcd-cluster-1 --initial-advertise-peer-urls http://etcd0:2380

The result:

2018-06-04 09:25:52.314747 I | etcdserver/membership: added member 7ff5c9c6942f82e [http://etcd0:2380] to cluster 5d1b637f4b7740d5
2018-06-04 09:25:52.314940 I | etcdserver/membership: added member 91b417e7701c2eeb [http://etcd2:2380] to cluster 5d1b637f4b7740d5
2018-06-04 09:25:52.315096 I | etcdserver/membership: added member faeb78734ee4a93d [http://etcd1:2380] to cluster 5d1b637f4b7740d5

Unfortunately, nothing happens after that.

What is a good way to restore an etcd backup?

How do I create an empty etcd cluster/node, and how do I restore the snapshot into it?

2 Answers

According to etcd's disaster recovery documentation, you need to restore all three etcd nodes from the snapshot using a command similar to yours, and then run the three nodes with something like this:
etcd \
  --name m1 \
  --listen-client-urls http://host1:2379 \
  --advertise-client-urls http://host1:2379 \
  --listen-peer-urls http://host1:2380 &

You can also extract etcdctl from the image like this:
docker run --rm -v /opt/bin:/opt/bin registry:5000/quay.io/coreos/etcd:v3.1.5 cp /usr/local/bin/etcdctl /opt/bin

Then restore the snapshot with etcdctl:
# ETCDCTL_API=3 ./etcdctl snapshot restore snapshot.db \
  --name m1 \
  --initial-cluster m1=http://host1:2380,m2=http://host2:2380,m3=http://host3:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-advertise-peer-urls http://host1:2380 \
  --data-dir /var/lib/etcd

This restores the snapshot into the /var/lib/etcd directory. Then start etcd with Docker; don't forget to mount /var/lib/etcd into the container and pass it as the --data-dir argument.
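Put together, the start step for the first containerized member might look like the sketch below. The host names, image, cluster token, and data path are assumptions carried over from the restore command above:

```shell
# Hypothetical: start the first restored member in a container.
# /var/lib/etcd on the host holds the data dir written by `snapshot restore`.
docker run -d --net=host --name etcd-m1 \
  -v /var/lib/etcd:/var/lib/etcd \
  registry:5000/quay.io/coreos/etcd:v3.1.5 \
  etcd --name m1 \
    --data-dir /var/lib/etcd \
    --listen-client-urls http://host1:2379 \
    --advertise-client-urls http://host1:2379 \
    --listen-peer-urls http://host1:2380 \
    --initial-advertise-peer-urls http://host1:2380 \
    --initial-cluster m1=http://host1:2380,m2=http://host2:2380,m3=http://host3:2380 \
    --initial-cluster-token etcd-cluster-1
```

Because the restored data directory already contains the cluster membership written by `snapshot restore`, the member should come up with that membership once the other two nodes are started the same way with their own names and URLs.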

However, my etcd nodes run in Docker containers, not natively on CoreOS. In that case, I cannot restore the snapshot if the Docker image isn't running. On the other hand, how do I run the same Docker image with a different command? Unfortunately, this part is not clear to me. - almi
I restored the cluster successfully following your instructions, thank you very much. Unfortunately, my v3 backup contains no v2 keys (only events). My minions run an older CoreOS with etcdctl 2.3.7 (and my k8s cluster is 1.5.6), so I have to try the v2 backup/restore procedure. - almi


In Kubernetes, etcd runs in Docker containers. Here are the steps I took to restore the cluster:

  • retrieve etcd cluster metadata

    docker inspect etcd1
    

    you'll get something like this:

    "Binds": [
        "/etc/ssl/certs:/etc/ssl/certs:ro",
        "/etc/ssl/etcd/ssl:/etc/ssl/etcd/ssl:ro",
        "/var/lib/etcd:/var/lib/etcd:rw"
    ],
    ...
    "Env": [
        "ETCD_DATA_DIR=/var/lib/etcd",
        "ETCD_ADVERTISE_CLIENT_URLS=https://172.16.60.1:2379",
        "ETCD_INITIAL_ADVERTISE_PEER_URLS=https://172.16.60.1:2380",
        "ETCD_INITIAL_CLUSTER_STATE=existing",
        "ETCD_METRICS=basic",
        "ETCD_LISTEN_CLIENT_URLS=https://172.16.60.1:2379,https://127.0.0.1:2379",
        "ETCD_ELECTION_TIMEOUT=5000",
        "ETCD_HEARTBEAT_INTERVAL=250",
        "ETCD_INITIAL_CLUSTER_TOKEN=k8s_etcd",
        "ETCD_LISTEN_PEER_URLS=https://172.16.60.1:2380",
        "ETCD_NAME=etcd1",
        "ETCD_PROXY=off",
        "ETCD_INITIAL_CLUSTER=etcd1=https://172.16.60.1:2380,etcd2=https://172.16.60.2:2380,etcd3=https://172.16.60.2:2380",
        "ETCD_AUTO_COMPACTION_RETENTION=8",
        "ETCD_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem",
        "ETCD_CERT_FILE=/etc/ssl/etcd/ssl/member-node01.pem",
        "ETCD_KEY_FILE=/etc/ssl/etcd/ssl/member-node01-key.pem",
        "ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem",
        "ETCD_PEER_CERT_FILE=/etc/ssl/etcd/ssl/member-node01.pem",
        "ETCD_PEER_KEY_FILE=/etc/ssl/etcd/ssl/member-node01-key.pem",
        "ETCD_PEER_CLIENT_CERT_AUTH=true",
        "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    ],
    "Cmd": [
        "/usr/local/bin/etcd"
    ],
    
  • copy the etcd snapshot db to the other etcd nodes

    scp snapshotdb_20180913 node02:/root/  
    scp snapshotdb_20180913 node03:/root/  
    
  • rebuild the cluster with the original configuration

    # etcd1
    docker stop etcd1
    rm -rf /var/lib/etcd
    
    ETCDCTL_API=3 etcdctl snapshot restore snapshotdb_20180913 \
      --cacert /etc/ssl/etcd/ssl/ca.pem \
      --cert /etc/ssl/etcd/ssl/member-node01.pem \
      --key /etc/ssl/etcd/ssl/member-node01-key.pem \
      --name etcd1 \
      --initial-cluster etcd1=https://node01:2380,etcd2=https://node02:2380,etcd3=https://node03:2380 \
      --initial-cluster-token k8s_etcd \
      --initial-advertise-peer-urls https://node01:2380 \
      --data-dir /var/lib/etcd
    
    # etcd2
    docker stop etcd2
    rm -rf /var/lib/etcd
    
    ETCDCTL_API=3 etcdctl snapshot restore snapshotdb_20180913 \
      --cacert /etc/ssl/etcd/ssl/ca.pem \
      --cert /etc/ssl/etcd/ssl/member-node02.pem \
      --key /etc/ssl/etcd/ssl/member-node02-key.pem \
      --name etcd2 \
      --initial-cluster etcd1=https://node01:2380,etcd2=https://node02:2380,etcd3=https://node03:2380 \
      --initial-cluster-token k8s_etcd \
      --initial-advertise-peer-urls https://node02:2380 \
      --data-dir /var/lib/etcd
    
    # etcd3
    docker stop etcd3
    rm -rf /var/lib/etcd
    
    ETCDCTL_API=3 etcdctl snapshot restore snapshotdb_20180913 \
      --cacert /etc/ssl/etcd/ssl/ca.pem \
      --cert /etc/ssl/etcd/ssl/member-node03.pem \
      --key /etc/ssl/etcd/ssl/member-node03-key.pem \
      --name etcd3 \
      --initial-cluster etcd1=https://node01:2380,etcd2=https://node02:2380,etcd3=https://node03:2380 \
      --initial-cluster-token k8s_etcd \
      --initial-advertise-peer-urls https://node03:2380 \
      --data-dir /var/lib/etcd
    
  • start containers and check cluster status

    cd /etc/ssl/etcd/ssl
    etcdctl \
      --endpoints=https://node01:2379 \
      --ca-file=./ca.pem \
      --cert-file=./member-node01.pem \
      --key-file=./member-node01-key.pem \
      member list
    
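Assuming the containers were created with the configuration shown by `docker inspect` above, restarting the existing containers on each node should be enough, since the restored data directory under /var/lib/etcd already carries the new cluster's membership. A sketch:

```shell
# On node01 (repeat on node02/node03 with etcd2/etcd3):
docker start etcd1

# Follow the logs until the member reports it has established
# connections to its peers and elected a leader.
docker logs -f etcd1
```

Only after all three members are up should `member list` report the full cluster as healthy.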

Thanks for posting this; material on etcd backup and restore is pretty scarce. By the way, this might help you: https://labs.consol.de/kubernetes/2018/05/25/kubeadm-backup.html. It's about automating etcd backups with a cronjob. Heptio Ark is another cool disaster recovery tool worth checking out. - neokyle
