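I am currently working with a mixed Linux/Windows Kubernetes cluster. It has 4 nodes, running as virtual machines in a VMware cluster on a single physical server:
- 3 Linux nodes running Debian stretch, set up with kubeadm
- 1 Windows Server 2019 (1809) node, set up following the Microsoft documentation.
For networking I use flannel in host-gw mode, as recommended by Microsoft. IP addresses are assigned correctly to pods and services within their respective ranges (10.244.0.0/16 for pods, 10.96.0.0/12 for services).
The whole cluster runs Kubernetes 1.13, upgraded from 1.12.3, and I have just downloaded the latest flannel binaries from Microsoft/SDN.
Windows PowerShell command used to start the services: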
.\start.ps1 -ManagementIP 10.71.145.37 -ClusterCIDR 10.244.0.0/16 -ServiceCIDR 10.96.0.0/12 -KubeDnsServiceIP 10.96.0.10
What works?
- Linux pod -> Linux pod: yes
- Linux pod -> Windows pod: yes
- Windows pod -> Linux pod: yes
- Windows pod -> Windows pod: yes
- Linux pod -> Linux service: yes
- Linux pod -> Windows service: no
- Windows pod -> Linux service: no
- Windows pod -> Windows service: no
- Linux host -> Linux pod: yes
- Linux host -> Windows pod: yes
- Windows host -> Linux pod: yes
- Windows host -> Windows pod: yes
- Linux host -> Linux service: yes
- Linux host -> Windows service: no
- Windows host -> Linux service: no
- Windows host -> Windows service: no
In short: direct connections to pods work in both directions between Windows and Linux, but service connections only work for Linux services (services backed by Linux pods), and only from Linux pods or hosts.
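For anyone wanting to reproduce a matrix like the one above, here is a minimal sketch of a TCP probe that could be run from each pod or host. It is not the method used to build the list in this post. The helper names `tcp_probe` and `probe_matrix` are made up for illustration, and the ports in the example (443 for the kubernetes service, 53 for kube-dns) are assumed defaults rather than values taken from this cluster. A TCP connect to the service port may be more telling than ping, since with kube-proxy in iptables mode a ClusterIP is typically only translated for the service's own protocol and port.

```python
import socket


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def probe_matrix(targets, timeout=2.0):
    """Map each label to "yes"/"no" depending on whether its target answers."""
    return {
        label: "yes" if tcp_probe(host, port, timeout) else "no"
        for label, (host, port) in targets.items()
    }


if __name__ == "__main__":
    # Service IPs are from the post; the ports are assumed defaults.
    targets = {
        "kubernetes service": ("10.96.0.1", 443),
        "kube-dns service": ("10.96.0.10", 53),
    }
    for label, result in probe_matrix(targets).items():
        print(f"{label}: {result}")
```

Running the same script from a Linux pod, a Windows pod and both kinds of host gives one row of the matrix per run.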
DNS resolution also works, but I cannot resolve service.namespace on Windows pods; only the bare hostname or the FQDN works, nothing in between.
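To illustrate why the intermediate form might fail, here is a simplified sketch of how a resolver expands a short name using a DNS search list. The function `candidate_names` is hypothetical and only models the usual search-list and `ndots` behaviour. The three-entry list is what a Linux pod in the `default` namespace normally gets, assuming the cluster domain is `cluster.local`. The one-entry list is an assumption about what the Windows pod might be receiving, and should be checked with `ipconfig /all` inside the pod.

```python
def candidate_names(name, search, ndots=5):
    """Return the fully qualified names a resolver would try, in order (simplified)."""
    if name.endswith("."):
        # Already fully qualified: no search-list expansion.
        return [name]
    expanded = [f"{name}.{suffix}." for suffix in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search list.
        return [absolute] + expanded
    return expanded + [absolute]


if __name__ == "__main__":
    linux_search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
    assumed_windows_search = ["default.svc.cluster.local"]  # assumption, not verified

    for label, search in (("linux", linux_search), ("windows?", assumed_windows_search)):
        print(label)
        for fqdn in candidate_names("kube-dns.kube-system", search):
            print("  ", fqdn)
```

With the full list, one of the candidates is `kube-dns.kube-system.svc.cluster.local.`, which is the name the cluster DNS actually serves. With only the namespace suffix, that candidate is never generated, which would match the symptom described above.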
Routing tables from the Linux nodes:
# host linux-node-1: 10.71.144.71
root@linux-node-1:~# route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default 10.71.144.1 0.0.0.0 UG 0 0 0 ens32
10.71.144.0 0.0.0.0 255.255.252.0 U 0 0 0 ens32
10.244.0.0 0.0.0.0 255.255.255.0 U 0 0 0 cni0
10.244.1.0 linux-node-2 255.255.255.0 UG 0 0 0 ens32
10.244.2.0 linux-node-3 255.255.255.0 UG 0 0 0 ens32
10.244.5.0 windows-node-1 255.255.255.0 UG 0 0 0 ens32
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
# host linux-node-2: 10.71.147.15
root@linux-node-2:~# route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default 10.71.144.1 0.0.0.0 UG 0 0 0 ens32
10.71.144.0 0.0.0.0 255.255.252.0 U 0 0 0 ens32
10.244.0.0 linux-node-1 255.255.255.0 UG 0 0 0 ens32
10.244.1.0 0.0.0.0 255.255.255.0 U 0 0 0 cni0
10.244.2.0 linux-node-3 255.255.255.0 UG 0 0 0 ens32
10.244.5.0 windows-node-1 255.255.255.0 UG 0 0 0 ens32
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
# host linux-node-3: 10.71.144.123
root@linux-node-3:~# route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default 10.71.144.1 0.0.0.0 UG 0 0 0 ens32
10.71.144.0 0.0.0.0 255.255.252.0 U 0 0 0 ens32
10.244.0.0 linux-node-1 255.255.255.0 UG 0 0 0 ens32
10.244.1.0 linux-node-2 255.255.255.0 UG 0 0 0 ens32
10.244.2.0 0.0.0.0 255.255.255.0 U 0 0 0 cni0
10.244.5.0 windows-node-1 255.255.255.0 UG 0 0 0 ens32
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
Routing table from the Windows node:
PS C:\k> route print
===========================================================================
Interface List
9...00 50 56 89 69 ce ......Hyper-V Virtual Ethernet Adapter #2
21...00 15 5d 8d 98 26 ......Hyper-V Virtual Ethernet Adapter #3
1...........................Software Loopback Interface 1
12...00 15 5d 84 c0 c9 ......Hyper-V Virtual Ethernet Adapter
===========================================================================
IPv4 Route Table
===========================================================================
Active Routes:
Network Destination Netmask Gateway Interface Metric
0.0.0.0 0.0.0.0 10.71.144.1 10.71.145.37 25
0.0.0.0 0.0.0.0 10.244.5.1 10.244.5.2 281
10.71.144.0 255.255.252.0 On-link 10.71.145.37 281
10.71.145.37 255.255.255.255 On-link 10.71.145.37 281
10.71.145.37 255.255.255.255 10.71.144.1 10.71.145.37 125
10.71.147.255 255.255.255.255 On-link 10.71.145.37 281
10.244.0.0 255.255.255.0 10.71.144.71 10.71.145.37 281
10.244.1.0 255.255.255.0 10.71.147.15 10.71.145.37 281
10.244.2.0 255.255.255.0 10.71.144.123 10.71.145.37 281
10.244.5.0 255.255.255.0 On-link 10.244.5.2 281
10.244.5.2 255.255.255.255 On-link 10.244.5.2 281
10.244.5.255 255.255.255.255 On-link 10.244.5.2 281
127.0.0.0 255.0.0.0 On-link 127.0.0.1 331
127.0.0.1 255.255.255.255 On-link 127.0.0.1 331
127.255.255.255 255.255.255.255 On-link 127.0.0.1 331
172.27.80.0 255.255.240.0 On-link 172.27.80.1 5256
172.27.80.1 255.255.255.255 On-link 172.27.80.1 5256
172.27.95.255 255.255.255.255 On-link 172.27.80.1 5256
224.0.0.0 240.0.0.0 On-link 127.0.0.1 331
224.0.0.0 240.0.0.0 On-link 172.27.80.1 5256
224.0.0.0 240.0.0.0 On-link 10.71.145.37 281
224.0.0.0 240.0.0.0 On-link 10.244.5.2 281
255.255.255.255 255.255.255.255 On-link 127.0.0.1 331
255.255.255.255 255.255.255.255 On-link 172.27.80.1 5256
255.255.255.255 255.255.255.255 On-link 10.71.145.37 281
255.255.255.255 255.255.255.255 On-link 10.244.5.2 281
===========================================================================
Persistent Routes:
Network Address Netmask Gateway Address Metric
0.0.0.0 0.0.0.0 10.244.5.1 Default
10.244.0.0 255.255.255.0 10.71.144.71 Default
10.244.1.0 255.255.255.0 10.71.147.15 Default
10.244.2.0 255.255.255.0 10.71.144.123 Default
0.0.0.0 0.0.0.0 10.244.5.2 Default
10.71.145.37 255.255.255.255 10.71.144.1 100
===========================================================================
Traceroute from a Windows pod to kube-dns:
C:\>tracert -4 -d -h 10 10.96.0.10
Tracing route to 10.96.0.10 over a maximum of 10 hops
1 * * * Request timed out.
2 * * * Request timed out.
3 * * * Request timed out.
4 * * * Request timed out.
5 * * * Request timed out.
6 * * * Request timed out.
7 * * * Request timed out.
8 * * * Request timed out.
9 * * * Request timed out.
10 * * * Request timed out.
Trace complete.
Traceroute from a Linux pod to kube-dns:
root@deb:/# traceroute -4 -n 10.96.0.10
traceroute to 10.96.0.10 (10.96.0.10), 30 hops max, 60 byte packets
1 10.244.2.1 0.396 ms 0.336 ms 0.314 ms
2 10.71.144.1 7.044 ms 9.939 ms 10.062 ms
3 10.71.144.2 1.727 ms 1.917 ms 10.71.144.3 1.233 ms
4 10.68.132.166 6.985 ms 10.68.132.162 7.934 ms 8.404 ms
5 10.103.4.246 203.807 ms 203.405 ms 203.777 ms
6 10.103.4.245 209.431 ms 209.348 ms 209.772 ms
7 10.96.108.86 496.457 ms 502.957 ms 494.978 ms
8 10.96.0.10 211.666 ms * *
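Hop 1 is the pod network address, hops 2 and 3 are the Linux hosts' default gateway (VRRP), hop 7 is a switch in the physical network, hop 8 is the kube-dns service, and the remaining hops (4-6) are probably Cisco routers in the physical network.
DNS queries work, and I can ping 10.96.0.1 (the kubernetes service) and 10.96.0.10 (kube-dns) from the hosts, which leads me to believe that routing works. However, I cannot ping any other service address, and from the Windows host I get no response from the ingress controller with curl or similar tools.
Turning off the Windows firewall makes no difference either.
I have run out of ideas for what to check, and searching Google turns up almost nothing applicable.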
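The path in the Linux trace can be cross-checked against the routing table with a small longest-prefix-match sketch. The routes below are transcribed from the `route` output of linux-node-3 shown earlier, with netmasks rewritten as prefix lengths. The `lookup` helper is made up for illustration and models only the kernel routing table, not any NAT rules that kube-proxy installs, which are not shown in this post.

```python
import ipaddress

# (destination in CIDR form, gateway, interface) from linux-node-3's routing table.
# A gateway of "0.0.0.0" means the network is directly attached.
LINUX_NODE_3_ROUTES = [
    ("0.0.0.0/0", "10.71.144.1", "ens32"),
    ("10.71.144.0/22", "0.0.0.0", "ens32"),
    ("10.244.0.0/24", "linux-node-1", "ens32"),
    ("10.244.1.0/24", "linux-node-2", "ens32"),
    ("10.244.2.0/24", "0.0.0.0", "cni0"),
    ("10.244.5.0/24", "windows-node-1", "ens32"),
    ("172.17.0.0/16", "0.0.0.0", "docker0"),
]


def lookup(routes, dst):
    """Return (network, gateway, iface) of the most specific matching route, or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for cidr, gateway, iface in routes:
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway, iface)
    return best


if __name__ == "__main__":
    for dst in ("10.96.0.10", "10.244.5.7", "10.244.2.9"):
        net, gateway, iface = lookup(LINUX_NODE_3_ROUTES, dst)
        print(f"{dst} -> {net} via {gateway} dev {iface}")
```

For 10.96.0.10 the only matching entry is the default route via 10.71.144.1, which agrees with hop 2 of the trace: no route in the table covers 10.96.0.0/12. One possible reading is that the traceroute probes never matched a service rule and simply left through the default gateway, but that is an assumption rather than something the output proves.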