How to get kube-dns working in a Vagrant cluster built with kubeadm and Weave

I deployed a couple of VMs using Vagrant to test Kubernetes:
Master: 4 CPUs, 4 GB RAM
Node 1: 4 CPUs, 8 GB RAM
Base image: centos/7.
Networking: bridged.
Host OS: CentOS 7.2

I deployed Kubernetes with kubeadm, following the instructions of the kubeadm getting started guide. After adding the node to the cluster and installing Weave Net, I unfortunately cannot get kube-dns up and running, as it stays stuck in the ContainerCreating state:

[vagrant@master ~]$ kubectl get pods --all-namespaces
NAMESPACE     NAME                             READY     STATUS              RESTARTS   AGE
kube-system   etcd-master                      1/1       Running             0          1h
kube-system   kube-apiserver-master            1/1       Running             0          1h
kube-system   kube-controller-manager-master   1/1       Running             0          1h
kube-system   kube-discovery-982812725-0tiiy   1/1       Running             0          1h
kube-system   kube-dns-2247936740-46rcz        0/3       ContainerCreating   0          1h
kube-system   kube-proxy-amd64-4d8s7           1/1       Running             0          1h
kube-system   kube-proxy-amd64-sqea1           1/1       Running             0          1h
kube-system   kube-scheduler-master            1/1       Running             0          1h
kube-system   weave-net-h1om2                  2/2       Running             0          1h
kube-system   weave-net-khebq                  1/2       CrashLoopBackOff    17         1h
I assume the problem is related to the weave-net pod on node-1, which is in CrashLoopBackOff state:
[vagrant@master ~]$ kubectl describe pods --namespace=kube-system weave-net-khebq
Name:       weave-net-khebq
Namespace:  kube-system
Node:       node-1/10.0.2.15
Start Time: Wed, 05 Oct 2016 07:10:39 +0000
Labels:     name=weave-net
Status:     Running
IP:     10.0.2.15
Controllers:    DaemonSet/weave-net
Containers:
  weave:
    Container ID:   docker://4976cd0ec6f971397aaf7fbfd746ca559322ab3d8f4ee217dd6c8bd3f6ed4f76
    Image:      weaveworks/weave-kube:1.7.0
    Image ID:       docker://sha256:1ac5304168bd9dd35c0ecaeb85d77d26c13a7d077aa8629b2a1b4e354cdffa1a
    Port:       
    Command:
      /home/weave/launch.sh
    Requests:
      cpu:      10m
    State:      Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 05 Oct 2016 08:18:51 +0000
      Finished:     Wed, 05 Oct 2016 08:18:51 +0000
    Ready:      False
    Restart Count:  18
    Liveness:       http-get http://127.0.0.1:6784/status delay=30s timeout=1s period=10s #success=1 #failure=3
    Volume Mounts:
      /etc from cni-conf (rw)
      /host_home from cni-bin2 (rw)
      /opt from cni-bin (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-kir36 (ro)
      /weavedb from weavedb (rw)
    Environment Variables:
      WEAVE_VERSION:    1.7.0
  weave-npc:
    Container ID:   docker://feef7e7436d2565182d99c9021958619f65aff591c576a0c240ac0adf9c66a0b
    Image:      weaveworks/weave-npc:1.7.0
    Image ID:       docker://sha256:4d7f0bd7c0e63517a675e352146af7687a206153e66bdb3d8c7caeb54802b16a
    Port:       
    Requests:
      cpu:      10m
    State:      Running
      Started:      Wed, 05 Oct 2016 07:11:04 +0000
    Ready:      True
    Restart Count:  0
    Volume Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-kir36 (ro)
    Environment Variables:  <none>
Conditions:
  Type      Status
  Initialized   True 
  Ready     False 
  PodScheduled  True 
Volumes:
  weavedb:
    Type:   EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium: 
  cni-bin:
    Type:   HostPath (bare host directory volume)
    Path:   /opt
  cni-bin2:
    Type:   HostPath (bare host directory volume)
    Path:   /home
  cni-conf:
    Type:   HostPath (bare host directory volume)
    Path:   /etc
  default-token-kir36:
    Type:   Secret (a volume populated by a Secret)
    SecretName: default-token-kir36
QoS Class:  Burstable
Tolerations:    dedicated=master:Equal:NoSchedule
Events:
  FirstSeen LastSeen    Count   From            SubobjectPath       Type        Reason      Message
  --------- --------    -----   ----            -------------       --------    ------      -------
  1h        3m      19  {kubelet node-1}    spec.containers{weave}  Normal      Pulling     pulling image "weaveworks/weave-kube:1.7.0"
  1h        3m      19  {kubelet node-1}    spec.containers{weave}  Normal      Pulled      Successfully pulled image "weaveworks/weave-kube:1.7.0"
  55m       3m      11  {kubelet node-1}    spec.containers{weave}  Normal      Created     (events with common reason combined)
  55m       3m      11  {kubelet node-1}    spec.containers{weave}  Normal      Started     (events with common reason combined)
  1h        14s     328 {kubelet node-1}    spec.containers{weave}  Warning     BackOff     Back-off restarting failed docker container
  1h        14s     300 {kubelet node-1}                Warning     FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "weave" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=weave pod=weave-net-khebq_kube-system(d1feb9c1-8aca-11e6-8d4f-525400c583ad)"
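The events above only show the back-off; the reason the weave container keeps exiting is usually in its own logs. A quick way to pull the output of the last failed attempt (a hedged suggestion, assuming kubectl is configured on the master) is:

kubectl logs --namespace=kube-system weave-net-khebq -c weave --previous
# --previous returns the logs of the terminated container rather than the currently restarting one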

Listing the containers running on node-1:

[vagrant@node-1 ~]$ sudo docker ps
CONTAINER ID        IMAGE                                              COMMAND                  CREATED             STATUS              PORTS               NAMES
feef7e7436d2        weaveworks/weave-npc:1.7.0                         "/usr/bin/weave-npc"     About an hour ago   Up About an hour                        k8s_weave-npc.e6299282_weave-net-khebq_kube-system_d1feb9c1-8aca-11e6-8d4f-525400c583ad_0f0517cf
762cd80d491e        gcr.io/google_containers/pause-amd64:3.0           "/pause"                 About an hour ago   Up About an hour                        k8s_POD.d8dbe16c_weave-net-khebq_kube-system_d1feb9c1-8aca-11e6-8d4f-525400c583ad_cda766ac
8c3395959d0e        gcr.io/google_containers/kube-proxy-amd64:v1.4.0   "/usr/local/bin/kube-"   About an hour ago   Up About an hour                        k8s_kube-proxy.64a0bb96_kube-proxy-amd64-4d8s7_kube-system_909e6ae1-8aca-11e6-8d4f-525400c583ad_48e7eb9a
d0fbb716bbf3        gcr.io/google_containers/pause-amd64:3.0           "/pause"                 About an hour ago   Up About an hour                        k8s_POD.d8dbe16c_kube-proxy-amd64-4d8s7_kube-system_909e6ae1-8aca-11e6-8d4f-525400c583ad_d6b232ea

The logs of the first container show some connection errors:

[vagrant@node-1 ~]$ sudo docker logs feef7e7436d2
E1005 08:46:06.368703       1 reflector.go:214] /home/awh/workspace/weave-npc/cmd/weave-npc/main.go:154: Failed to list *api.Pod: Get https://100.64.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 100.64.0.1:443: getsockopt: connection refused
E1005 08:46:06.370119       1 reflector.go:214] /home/awh/workspace/weave-npc/cmd/weave-npc/main.go:155: Failed to list *extensions.NetworkPolicy: Get https://100.64.0.1:443/apis/extensions/v1beta1/networkpolicies?resourceVersion=0: dial tcp 100.64.0.1:443: getsockopt: connection refused
E1005 08:46:06.473779       1 reflector.go:214] /home/awh/workspace/weave-npc/cmd/weave-npc/main.go:153: Failed to list *api.Namespace: Get https://100.64.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 100.64.0.1:443: getsockopt: connection refused
E1005 08:46:07.370451       1 reflector.go:214] /home/awh/workspace/weave-npc/cmd/weave-npc/main.go:154: Failed to list *api.Pod: Get https://100.64.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 100.64.0.1:443: getsockopt: connection refused
E1005 08:46:07.371308       1 reflector.go:214] /home/awh/workspace/weave-npc/cmd/weave-npc/main.go:155: Failed to list *extensions.NetworkPolicy: Get https://100.64.0.1:443/apis/extensions/v1beta1/networkpolicies?resourceVersion=0: dial tcp 100.64.0.1:443: getsockopt: connection refused
E1005 08:46:07.474991       1 reflector.go:214] /home/awh/workspace/weave-npc/cmd/weave-npc/main.go:153: Failed to list *api.Namespace: Get https://100.64.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 100.64.0.1:443: getsockopt: connection refused
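These errors indicate that the weave-npc container cannot reach the API server through the kubernetes service VIP (100.64.0.1). As a sanity check (hedged; assumes curl is available on the node), that VIP can be probed directly from node-1:

curl -k --connect-timeout 5 https://100.64.0.1:443/version
# "connection refused" here confirms that traffic to the service VIP is not being forwarded to the real API server address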

I lack the experience with Kubernetes and container networking to troubleshoot these issues any further, so any hints are much appreciated. Observation: all pods/nodes report their IP as 10.0.2.15, which is the local Vagrant NAT address, not the VMs' actual IP address.
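To see which addresses the VMs actually have, the interfaces can be compared on each VM (a sketch; on Vagrant/VirtualBox boxes the NAT interface is typically eth0 and the bridged network appears as a second interface such as eth1, but the names are an assumption):

ip -4 addr show
# 10.0.2.15 on eth0 is the Vagrant NAT address; the bridged address appears on the second interface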


I used the kubeadm init --api-advertise-addresses option and pointed it at the non-NAT address (which needs to be defined in the Vagrantfile). While it allowed me to get further, it did not fix the weave-net problem. I have reported the issue here: https://github.com/kubernetes/kubernetes/issues/34101 - Andrew
I wonder whether there is any working, step-by-step guide for getting a Kubernetes cluster up and running on a set of VMs? I would happily downgrade the Kubernetes version if that helps. I am quite confused by all the options: kube-deploy, kube-up and the rest... - Andrew
Thanks. I guess we can conclude that both the Ubuntu and the CentOS deployments have issues. During init I specified the correct master address with --api-advertise-addresses, but I could not find a way to set a similar flag (if one is needed) for the nodes. - bach
2 Answers


Here is the recipe that worked for me (as of March 19, 2017, using Vagrant and VirtualBox). The cluster is made of 3 nodes: 1 master and 2 worker nodes.

1) Make sure you explicitly set the IP of the master node at init time:

kubeadm init --api-advertise-addresses=10.30.3.41
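To confirm the master really advertises that address after init, a quick hedged check (assuming kubectl is configured on the master) is:

kubectl cluster-info
# the "Kubernetes master is running at https://10.30.3.41:..." line should show the advertised IP, not the NAT address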

2) Manually or during provisioning, add the exact IP address you are configuring to the /etc/hosts file of every node. Here is a line you can add in your Vagrantfile (the node naming convention I use is k8node-$i):

192.168.33.$i k8node-$i

config.vm.provision :shell, inline: "sed 's/127\.0\.0\.1.*k8node.*/10.30.3.4#{i} k8node-#{i}/' -i /etc/hosts"

Example:

vagrant@k8node-1:~$ cat /etc/hosts
10.30.3.41 k8node-1
127.0.0.1   localhost
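To double-check that the node's own hostname no longer resolves to 127.0.0.1, a quick hedged check (getent is assumed to be available on the box) is:

getent hosts k8node-1
# should print the configured address (10.30.3.41 in this example), not 127.0.0.1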

3) Finally, all the nodes will try to reach the master using the cluster's public IP (not sure why this happens...). Here is the fix for that.

First, run the following command on the master to find that public IP (i.e., the cluster IP of the kubernetes service):

kubectl get svc
NAME         CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   10.96.0.1    <none>        443/TCP   1h
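If you need this cluster IP in a provisioning script, a hedged one-liner to extract it is:

CLUSTER_IP=$(kubectl get svc kubernetes -o jsonpath='{.spec.clusterIP}')
echo "$CLUSTER_IP"    # 10.96.0.1 in this example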

For every node, make sure any process using 10.96.0.1 (in my case) is routed to the master, which is at 10.30.3.41.

So on each node (you can skip the master), use route to set up the redirect:

route add 10.96.0.1 gw 10.30.3.41
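Note that a route added this way does not survive a reboot. To verify it is in place (a hedged check; the exact output format may vary):

ip route get 10.96.0.1
# expected to report something like: 10.96.0.1 via 10.30.3.41 dev eth1

If you provision with Vagrant, the same route add line can also be placed in a config.vm.provision :shell block so it is re-applied automatically.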

After that, everything should work fine:

vagrant@k8node-1:~$ kubectl get pods --all-namespaces
NAMESPACE     NAME                               READY     STATUS    RESTARTS   AGE
kube-system   dummy-2088944543-rnl2f             1/1       Running   0          1h
kube-system   etcd-k8node-1                      1/1       Running   0          1h
kube-system   kube-apiserver-k8node-1            1/1       Running   0          1h
kube-system   kube-controller-manager-k8node-1   1/1       Running   0          1h
kube-system   kube-discovery-1769846148-g8g85    1/1       Running   0          1h
kube-system   kube-dns-2924299975-7wwm6          4/4       Running   0          1h
kube-system   kube-proxy-9dxsb                   1/1       Running   0          46m
kube-system   kube-proxy-nx63x                   1/1       Running   0          1h
kube-system   kube-proxy-q0466                   1/1       Running   0          1h
kube-system   kube-scheduler-k8node-1            1/1       Running   0          1h
kube-system   weave-net-2nc8d                    2/2       Running   0          46m
kube-system   weave-net-2tphv                    2/2       Running   0          1h
kube-system   weave-net-mp6s0                    2/2       Running   0          1h


vagrant@k8node-1:~$ kubectl get nodes
NAME       STATUS         AGE
k8node-1   Ready,master   1h
k8node-2   Ready          1h
k8node-3   Ready          48m

