Why does Redis keep restarting in Kubernetes?


The Redis container keeps restarting. How can I find out the cause of this behavior?

I figured the resource quota should be increased, but I don't know what the best CPU/RAM ratio is. And why are there no crash events or logs?

Here is the container:

> kubectl get pods
    redis-master-5d9cfb54f8-8pbgq                     1/1     Running     33         3d16h

Here are the logs:

> kubectl logs --follow redis-master-5d9cfb54f8-8pbgq
[1] 08 Sep 07:02:12.152 # Server started, Redis version 2.8.19
[1] 08 Sep 07:02:12.153 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
[1] 08 Sep 07:02:12.153 * The server is now ready to accept connections on port 6379
[1] 08 Sep 07:03:13.085 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 07:03:13.085 * Background saving started by pid 8
[8] 08 Sep 07:03:13.101 * DB saved on disk
[8] 08 Sep 07:03:13.101 * RDB: 0 MB of memory used by copy-on-write
[1] 08 Sep 07:03:13.185 * Background saving terminated with success
[1] 08 Sep 07:04:14.018 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 07:04:14.018 * Background saving started by pid 9
...
[93] 08 Sep 08:38:30.160 * DB saved on disk
[93] 08 Sep 08:38:30.164 * RDB: 2 MB of memory used by copy-on-write
[1] 08 Sep 08:38:30.259 * Background saving terminated with success
[1] 08 Sep 08:39:31.072 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 08:39:31.074 * Background saving started by pid 94

Here are the previous logs for the same pod:


> kubectl logs --previous --follow redis-master-5d9cfb54f8-8pbgq
[1] 08 Sep 09:41:46.057 * Background saving terminated with success
[1] 08 Sep 09:42:47.073 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 09:42:47.076 * Background saving started by pid 140
[140] 08 Sep 09:43:14.398 * DB saved on disk
[140] 08 Sep 09:43:14.457 * RDB: 1 MB of memory used by copy-on-write
[1] 08 Sep 09:43:14.556 * Background saving terminated with success
[1] 08 Sep 09:44:15.073 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 09:44:15.077 * Background saving started by pid 141
[1 | signal handler] (1599558267) Received SIGTERM scheduling shutdown...
[1] 08 Sep 09:44:28.052 # User requested shutdown...
[1] 08 Sep 09:44:28.052 # There is a child saving an .rdb. Killing it!
[1] 08 Sep 09:44:28.052 * Saving the final RDB snapshot before exiting.
[1] 08 Sep 09:44:49.592 * DB saved on disk
[1] 08 Sep 09:44:49.592 # Redis is now ready to exit, bye bye...

Here is the pod's description. You can see the limits (100m CPU, 250Mi memory), but I can't see the threshold that triggers the restarts.

> kubectl describe pod redis-master-5d9cfb54f8-8pbgq
Name:           redis-master-5d9cfb54f8-8pbgq
Namespace:      cryptoman
Priority:       0
Node:           gke-my-cluster-default-pool-818613a8-smmc/10.172.0.28
Start Time:     Fri, 04 Sep 2020 18:52:17 +0300
Labels:         app=redis
                pod-template-hash=5d9cfb54f8
                role=master
                tier=backend
Annotations:    <none>
Status:         Running
IP:             10.36.2.124
IPs:            <none>
Controlled By:  ReplicaSet/redis-master-5d9cfb54f8
Containers:
  master:
    Container ID:   docker://3479276666a41df502f1f9eb9bb2ff9cfa592f08a33e656e44179042b6233c6f
    Image:          k8s.gcr.io/redis:e2e
    Image ID:       docker-pullable://k8s.gcr.io/redis@sha256:f066bcf26497fbc55b9bf0769cb13a35c0afa2aa42e737cc46b7fb04b23a2f25
    Port:           6379/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Wed, 09 Sep 2020 10:27:56 +0300
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    0
      Started:      Wed, 09 Sep 2020 07:34:18 +0300
      Finished:     Wed, 09 Sep 2020 10:27:55 +0300
    Ready:          True
    Restart Count:  42
    Limits:
      cpu:     100m
      memory:  250Mi
    Requests:
      cpu:        100m
      memory:     250Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-5tds9 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-5tds9:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-5tds9
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason          Age                   From                                                Message
  ----    ------          ----                  ----                                                -------
  Normal  SandboxChanged  52m (x42 over 4d13h)  kubelet, gke-my-cluster-default-pool-818613a8-smmc  Pod sandbox changed, it will be killed and re-created.
  Normal  Killing         52m (x42 over 4d13h)  kubelet, gke-my-cluster-default-pool-818613a8-smmc  Stopping container master
  Normal  Created         52m (x43 over 4d16h)  kubelet, gke-my-cluster-default-pool-818613a8-smmc  Created container master
  Normal  Started         52m (x43 over 4d16h)  kubelet, gke-my-cluster-default-pool-818613a8-smmc  Started container master
  Normal  Pulled          52m (x42 over 4d13h)  kubelet, gke-my-cluster-default-pool-818613a8-smmc  Container image "k8s.gcr.io/redis:e2e" already present on machine
4 Answers


These limits are the cause of the restarts: when the container exceeds its CPU limit it gets throttled, and when it exceeds its memory limit it gets OOM-killed.

    Limits:
      cpu:     100m
      memory:  250Mi

Reason: OOMKilled

  1. Remove the requests and limits
  2. Run the pod and make sure it doesn't restart
  3. If you already have Prometheus, run the VPA Recommender to check how many resources the pod needs. Or just use any monitoring stack (GKE Prometheus, prometheus-operator, DataDog, etc.) to check the actual resource consumption and adjust the limits accordingly.
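Before reaching for a full monitoring stack, a quick way to confirm the OOM kill and watch actual usage is from the command line. This is a sketch using the pod name and namespace from the question (the `top` command only works if metrics-server is installed in the cluster):

```shell
# Read the last termination reason straight from the pod status;
# for the pod in the question this should print "OOMKilled".
kubectl -n cryptoman get pod redis-master-5d9cfb54f8-8pbgq \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'

# Watch the actual CPU/memory consumption (requires metrics-server)
kubectl -n cryptoman top pod redis-master-5d9cfb54f8-8pbgq
```

If usage climbs steadily toward the 250Mi limit between restarts, that confirms the limit, not a crash, is what kills the container.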

Or just increase the memory limit to a value that suits his environment. - Sergio Tanaka
If you remove the limits, Redis will keep consuming memory. And if Redis keeps taking more memory, Kubernetes will kill the pod and mark it "Evicted" once the host node runs out of memory, as you can see in this answer: https://dev59.com/VlMHtIcB2Jgan1zn_taG#63857449 So this is a bad solution that will only bring more problems. - jotacor

The main problem is that you haven't limited the redis application itself. So redis keeps increasing its memory usage, and when it reaches the pod's limits.memory of 250Mi it gets OOM-killed and restarted. And if you remove limits.memory instead, redis will keep using memory until the node doesn't have enough to run its other processes, at which point K8s kills the pod and marks it "Evicted".
So configure the memory limit in the redis application itself, in the redis.conf file, and set an LRU or LFU policy to evict keys according to your needs (https://redis.io/topics/lru-cache):
    maxmemory 256mb
    maxmemory-policy allkeys-lfu

And set the pod's memory limit to roughly twice redis's maxmemory, to leave some headroom for the rest of the processes and objects redis keeps besides the dataset:

    resources:
      limits:
        cpu:     100m
        memory:  512Mi
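Putting both pieces together, a sketch of the container spec (the name and image follow the question's deployment; this assumes the image's entrypoint is redis-server, which also accepts redis.conf settings as `--` command-line flags, so no separate config file is needed):

```yaml
containers:
  - name: master
    image: k8s.gcr.io/redis:e2e
    # redis.conf settings passed as command-line flags
    args: ["--maxmemory", "256mb", "--maxmemory-policy", "allkeys-lfu"]
    resources:
      requests:
        cpu: 100m
        memory: 512Mi
      limits:
        cpu: 100m
        memory: 512Mi
```

Keeping requests equal to limits preserves the Guaranteed QoS class the pod already has, which makes it one of the last candidates for eviction under node memory pressure.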

Max's answer is very complete. But if you don't have Prometheus installed, or don't want to install it, there is another simple way to check actual resource consumption: install the metrics-server project in the cluster. Once it's installed, you can run kubectl top node to check CPU and memory usage on the nodes, and kubectl top pod to check consumption per pod. I use it and find it very useful.
Alternatively, you can just increase the CPU and memory limits, but then you won't know how many resources the container actually needs. Basically, you'd be wasting resources.
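A minimal sketch of the install-and-check workflow described above, assuming the project's standard release manifest (some clusters, e.g. local dev clusters, need extra flags such as `--kubelet-insecure-tls` on the metrics-server deployment):

```shell
# Install metrics-server from its official release manifest
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# After a minute or so, check node-level consumption...
kubectl top node

# ...and per-pod consumption in the question's namespace
kubectl top pod -n cryptoman
```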


Now the pods are being evicted. Can I find out the reason?

NAME                                              READY   STATUS             RESTARTS   AGE
redis-master-7d97765bbb-7kjwn                     0/1     Evicted            0          38h
redis-master-7d97765bbb-kmc9g                     1/1     Running            0          30m
redis-master-7d97765bbb-sf2ss                     0/1     Evicted            0          30m

Can you post the kubectl describe of the evicted pods? - Daniel Marques
You should edit your original post to add this new information, instead of posting follow-up questions as an answer. - jotacor
The pods are being evicted because the Kubernetes cluster's node is running out of memory. See my comment here: https://dev59.com/VlMHtIcB2Jgan1zn_taG#HvUwoYgBc1ULPQZF8b9B - jotacor
