EC2 userdata script runs very slowly on the CentOS 7 AMI

On the CentOS 7 AMI from the AWS Marketplace, userdata scripts appear to incur a 25-second delay every time they touch the disk.

Here is my script:

#!/bin/bash -ex
echo "[TIMER] START $(date +%s.%N)"
current_user=$(whoami)
echo "Running as: $current_user"
sudo id -u myuser &>/dev/null || sudo useradd myuser
echo "[TIMER] CreatedUser $(date +%s.%N)"
time sudo yum update -y
echo "[TIMER] Yum Update $(date +%s.%N)"
sudo mkdir -p /opt/myuser/resources
echo "[TIMER] Create /opt/myuser/resources $(date +%s.%N)"

sudo bash -c "cat > /etc/systemd/system/my-service.service" <<EOF
[Unit]
Description=My Service
After=network-online.target

[Service]
User=myuser
Group=myuser
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/bash -ex -c 'echo "Hello World"'

[Install]
Alias=my-service
WantedBy=default.target
EOF

echo "[TIMER] Make my-service.service $(date +%s.%N)"
sudo chmod 644 /etc/systemd/system/my-service.service
echo "[TIMER] Chmod $(date +%s.%N)"
sudo systemctl daemon-reload
echo "[TIMER] daemon-reload $(date +%s.%N)"
sudo systemctl enable my-service
echo "[TIMER] enable $(date +%s.%N)"
sudo systemctl start my-service
echo "[TIMER] END: my-service $(date +%s.%N)"

I launched a c5.large instance from this AMI: https://aws.amazon.com/marketplace/pp/B00O7WM7QW using the userdata script above. Timer results:
[TIMER] START 1546978269.809559549
[TIMER] CreatedUser 1546978320.472706964
[TIMER] Yum Update 1546978356.991642552
[TIMER] Create /opt/myuser/resources 1546978382.033044767
[TIMER] Make my-service.service 1546978407.074353857
[TIMER] Chmod 1546978432.111791937
[TIMER] daemon-reload 1546978457.195078083
[TIMER] enable 1546978482.265036318
[TIMER] END: my-service 1546978507.313735938

| Log (CentOS 7)                       | Timestamp            | Seconds     |
|--------------------------------------|----------------------|-------------|
| [TIMER] START                        | 1546978269.809559549 |             |
| [TIMER] CreatedUser                  | 1546978320.472706964 | 50.66315007 |
| [TIMER] Yum Update                   | 1546978356.991642552 | 36.51893997 |
| [TIMER] Create /opt/myuser/resources | 1546978382.033044767 | 25.04139996 |
| [TIMER] Make my-service.service      | 1546978407.074353857 | 25.04131007 |
| [TIMER] Chmod                        | 1546978432.111791937 | 25.03743982 |
| [TIMER] daemon-reload                | 1546978457.195078083 | 25.08328009 |
| [TIMER] enable                       | 1546978482.265036318 | 25.06995988 |
| [TIMER] END: my-service              | 1546978507.313735938 | 25.04870009 |
|                                      | Total (s)            | 237.50418   |
|                                      | Total (min)          | 3.958402999 |
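
The "Seconds" column is just the difference between consecutive timestamps. A minimal sketch of how those deltas could be computed from the raw `[TIMER]` lines is shown here; the function name `timer_deltas` is made up for illustration, and it assumes each line starts with `[TIMER]` and ends with the `date +%s.%N` value.

```bash
#!/bin/bash
# Hypothetical helper: read "[TIMER] <label> <epoch.nanos>" lines on stdin
# and print "<label><TAB><seconds since the previous TIMER line>".
timer_deltas() {
  awk '/^\[TIMER\]/ {
    ts = $NF                         # last field is the timestamp
    label = $0
    sub(/^\[TIMER\] /, "", label)    # drop the "[TIMER] " prefix
    sub(/ [0-9.]+$/, "", label)      # drop the trailing timestamp
    if (seen) printf "%s\t%.3f\n", label, ts - prev
    prev = ts
    seen = 1
  }'
}
```

For example, saving the timer lines above to a file (say `timers.txt`, a name chosen here) and running `timer_deltas < timers.txt` prints one label and delta per step after START.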

If you scroll to the right in my ASCII table, you'll see that simple commands like mkdir, chmod, and useradd take 25 seconds each. Why does this happen?

Edit:

Contents of /etc/hosts:

$ hostname
ip-172-31-40-213.us-west-2.compute.internal
$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

Here is a sample from /var/log/messages; the systemd entries show that creating the sudo session takes 25 seconds:

Jan  9 23:50:32 ip-172-31-35-166 cloud-init: + echo '[TIMER] Make my-service.service 1547077832.899069408'
Jan  9 23:50:32 ip-172-31-35-166 cloud-init: [TIMER] Make my-service.service 1547077832.899069408
Jan  9 23:50:32 ip-172-31-35-166 cloud-init: + sudo chmod 644 /etc/systemd/system/my-service.service
Jan  9 23:50:32 ip-172-31-35-166 systemd: Removed slice User Slice of root.
Jan  9 23:50:32 ip-172-31-35-166 systemd: Created slice User Slice of root.
Jan  9 23:50:32 ip-172-31-35-166 systemd: Started Session c3 of user root.
Jan  9 23:50:57 ip-172-31-35-166 cloud-init: ++ date +%s.%N
Jan  9 23:50:57 ip-172-31-35-166 cloud-init: + echo '[TIMER] Chmod 1547077857.946078493'
Jan  9 23:50:57 ip-172-31-35-166 cloud-init: [TIMER] Chmod 1547077857.946078493

The journalctl log shows the likely culprit:

Jan 09 23:50:32 ip-172-31-35-166.us-west-2.compute.internal cloud-init[1197]: + echo '[TIMER] Make my-service.service 1547077832.899069408'
Jan 09 23:50:32 ip-172-31-35-166.us-west-2.compute.internal cloud-init[1197]: [TIMER] Make my-service.service 1547077832.899069408
Jan 09 23:50:32 ip-172-31-35-166.us-west-2.compute.internal cloud-init[1197]: + sudo chmod 644 /etc/systemd/system/my-service.service
Jan 09 23:50:32 ip-172-31-35-166.us-west-2.compute.internal systemd[1]: Removed slice User Slice of root.
Jan 09 23:50:32 ip-172-31-35-166.us-west-2.compute.internal sudo[13392]:     root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/chmod 644 /etc/systemd/system/my-service.service
Jan 09 23:50:32 ip-172-31-35-166.us-west-2.compute.internal systemd[1]: Created slice User Slice of root.
Jan 09 23:50:32 ip-172-31-35-166.us-west-2.compute.internal systemd[1]: Started Session c3 of user root.
Jan 09 23:50:57 ip-172-31-35-166.us-west-2.compute.internal sudo[13392]: pam_systemd(sudo:session): Failed to create session: Connection timed out
Jan 09 23:50:57 ip-172-31-35-166.us-west-2.compute.internal sudo[13392]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jan 09 23:50:57 ip-172-31-35-166.us-west-2.compute.internal sudo[13392]: pam_unix(sudo:session): session closed for user root
Jan 09 23:50:57 ip-172-31-35-166.us-west-2.compute.internal cloud-init[1197]: ++ date +%s.%N
Jan 09 23:50:57 ip-172-31-35-166.us-west-2.compute.internal cloud-init[1197]: + echo '[TIMER] Chmod 1547077857.946078493'
Jan 09 23:50:57 ip-172-31-35-166.us-west-2.compute.internal cloud-init[1197]: [TIMER] Chmod 1547077857.946078493
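
To pull just the failing lines out of a long journal, a small filter along these lines may help. It is only a sketch: the function name `pam_session_timeouts` is invented here, and the pattern is based solely on the `pam_systemd(...:session): Failed to create session` message visible in the log above.

```bash
#!/bin/bash
# Hypothetical helper: keep only journal lines where pam_systemd failed
# to create a session (the message seen in the log above).
pam_session_timeouts() {
  grep -E 'pam_systemd\([^)]*:session\): Failed to create session'
}

# Possible usage on the instance (current boot only):
#   journalctl -b | pam_session_timeouts
```

Each matching line carries the timestamp at which the 25-second wait ended, so it can be lined up against the `[TIMER]` output.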

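After more Googling I found https://github.com/systemd/systemd/issues/2863. The issue has been fixed in later systemd versions, but CentOS on AWS EC2 ships systemd 219 and I can't upgrade it myself. Any suggestions? Is there some configuration that avoids the problem? I can remove most of the sudo calls from my userdata script, but I do need it for things like this: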
sudo -H -u myuser bash -ex <<EOF
  ... commands
EOF

It's worth mentioning that Amazon Linux 2 uses the same systemd version but does not show this behavior.
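
One avenue that might be worth testing for the run-as-another-user case is `runuser` from util-linux, which root can use to run a command as a different user. This is an untested sketch, not a confirmed fix: whether it avoids the delay depends on whether the PAM stack in /etc/pam.d/runuser on this AMI invokes pam_systemd, which has not been verified here. The wrapper name `run_as_user` is made up for illustration.

```bash
#!/bin/bash
# Hypothetical wrapper: run a command as another user via runuser.
# Assumption: the caller is already root (as cloud-init userdata is),
# and runuser's PAM configuration does not hit the pam_systemd timeout.
run_as_user() {
  local user=$1
  shift
  runuser -u "$user" -- "$@"
}

# Possible usage inside the userdata script:
#   run_as_user myuser bash -ex <<EOF
#     ... commands
#   EOF
```

Timing one such call with the same `[TIMER]` echo lines would show quickly whether the 25-second wait is gone.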


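This is because of sudo. In your case it is waiting on some timeout before executing the command. Can you try running the same commands with and without sudo? Can you post the contents of the machine's /etc/hosts file? - helloV
@helloV, that did fix it. I've added the contents of /etc/hosts to my post. Searching the internet suggests this could be a reverse DNS lookup timeout; is that right? Can you explain the root cause in more depth? - Lev Dubinets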
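Sorry, I don't understand the solution. This isn't the right way to use Stack Overflow. You've clearly benefited from SO; could you give back to the community by explaining how you solved the problem? - Bruno Bronosky
1 Answer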

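The problem and its solution are documented by Red Hat: https://access.redhat.com/solutions/5692661. In summary, running commands via sudo in a userdata script is not normal usage, so the default policy does not allow it. That causes the 25-second delay: sudo tries to run pam_systemd, which then fails on the 25-second dbus timeout.
In my case I was trying to run su <user> -c "command". I found the error by running journalctl -b (-b limits the output to the current boot). The relevant log entry looks like this:
pam_systemd(su:session): Failed to create session: Connection timed out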
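
Since the journal lines in the question show sudo being invoked by root (`root : TTY=unknown ; ... USER=root`), most of the sudo calls in the script are redundant. A minimal sketch of a guard that only falls back to sudo when the script is not already root is given here; the function name `as_root` is an illustration, not something from the Red Hat article.

```bash
#!/bin/bash
# Hypothetical helper: run the command directly when already root,
# and only go through sudo (and its PAM session setup) otherwise.
as_root() {
  if [ "$(id -u)" -eq 0 ]; then
    "$@"
  else
    sudo "$@"
  fi
}

# Possible usage in the userdata script:
#   as_root mkdir -p /opt/myuser/resources
#   as_root chmod 644 /etc/systemd/system/my-service.service
#   as_root systemctl daemon-reload
```

When the script runs as root under cloud-init, each call then executes without opening a sudo session, which is the step that was timing out.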
