Bash: wait with timeout

Question

64

In a Bash script, I would like to do something similar to:

app1 &
pidApp1=$!
app2 &
pidApp2=$!

timeout 60 wait $pidApp1 $pidApp2
kill -9 $pidApp1 $pidApp2

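That is, launch two applications in the background and give them 60 seconds to complete their work. Then, if they do not finish within that interval, kill them.

Unfortunately, the above does not work, since timeout is an executable while wait is a shell command. I tried changing it to: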

timeout 60 bash -c wait $pidApp1 $pidApp2

But this still does not work, since wait can only be called on a PID launched within the same shell.

Any ideas?
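The restriction described above can be reproduced in a few lines. This is a minimal sketch, assuming Bash and a sleep command; the variable names foreign_status and own_status are made up for the illustration, and sleep 1 stands in for a real application.

```bash
#!/bin/bash
sleep 1 &
pid=$!

# A separate bash process is not the parent of $pid,
# so its wait builtin refuses to wait for it.
bash -c 'wait "$1"' _ "$pid" 2>/dev/null
foreign_status=$?

# The shell that started the job can wait for it normally.
wait "$pid"
own_status=$?

echo "foreign wait: $foreign_status, own wait: $own_status"
```

Bash reports a PID that is not one of its children with status 127, which is why wrapping wait in timeout or bash -c cannot work.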

- user1202136

Could you just use sleep 60 instead? Not efficient, but simpler and easier to understand. - Shahbaz

2

"60" has to be the upper bound on execution time. The applications may well finish much sooner. So no, that is too inefficient for me. - user1202136

If these programs really require you to use kill -9, they are broken. See also http://www.iki.fi/era/unix/award.html#kill - tripleee

1

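Bash's wait does not support a timeout because it is implemented with the system call wait() or waitpid(), depending on whether a PID was passed to it. Neither system call natively supports a timeout. It might be possible to use a signal handler and alarm to break out of waiting for a child process without actually waiting for it, but I have not tested whether that really works. - Mikko Rantalainen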
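One way the signal idea from this comment could look in Bash itself is sketched below. It relies on the documented behaviour that wait returns early, with a status above 128, when a signal with a trap arrives. The function name wait_interruptible and the choice of USR1 are assumptions for illustration, and the sketch overwrites any USR1 trap the caller already had.

```bash
#!/bin/bash
wait_interruptible() {
  # Usage: wait_interruptible SECONDS PID   (hypothetical helper)
  local secs=$1 pid=$2 self=$BASHPID timer rc
  trap ':' USR1            # wait is only interrupted by signals that have a trap
  # Timer: after SECONDS, poke this shell with USR1.
  ( sleep "$secs"; kill -USR1 "$self" ) >/dev/null 2>&1 &
  timer=$!
  wait "$pid"              # returns the child's status, or >128 if USR1 arrived first
  rc=$?
  kill "$timer" 2>/dev/null
  wait "$timer" 2>/dev/null   # make sure the timer is gone before dropping the trap
  trap - USR1
  return "$rc"
}
```

A status above 128 is ambiguous here: it can mean the timer fired or that the child itself was killed by a signal, so a caller that needs to tell them apart could follow up with kill -0 on the PID.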

11 Answers

28

Write the PIDs to files and start the apps like this:

pidFile=...
( app ; rm $pidFile ; ) &
pid=$!
echo $pid > $pidFile
( sleep 60 ; if [[ -e $pidFile ]]; then killChildrenOf $pid ; fi ; ) &
killerPid=$!

wait $pid
kill $killerPid

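This creates another process that sleeps for the timeout and then kills the process if it has not finished by then.

If the process finishes sooner, the PID file is deleted and the killer process is terminated.

killChildrenOf is a script that fetches all processes and kills all children of a certain PID. See the answers to this question for different ways to implement it: Best way to kill all child processes

If you want to step outside of BASH, you could write PIDs and timeouts into a directory and watch that directory. Every minute or so, read the entries and check which processes are still around and whether they have timed out.

EDIT If you want to know whether the process is gone, you can use kill -0 $pid (it fails once the process no longer exists)

EDIT2 Or you can try process groups. kevinarpe said: To get the PGID for a PID (146322):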

ps -fjww -p 146322 | tail -n 1 | awk '{ print $4 }'

In my case: 145974. Then the PGID can be used with a special option of kill to terminate all processes in the group: kill -- -145974
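The process-group idea can also be tried without parsing ps output. The sketch below assumes Bash with job control available (set -m), under which each background job gets its own process group whose PGID equals the PID of the job's first process. The subshell with two sleep commands is only a stand-in for an application that spawns children, and group_status is a made-up variable name.

```bash
#!/bin/bash
set -m                          # job control: each background job gets its own process group
( sleep 30 & sleep 30 ) &       # stand-in for an app that starts a child of its own
pgid=$!                         # PID of the job leader, which is also the group's PGID
set +m

sleep 1                         # give the stand-in time to start its child
kill -- "-$pgid"                # negative ID: signal every process in that group
wait "$pgid"
group_status=$?
echo "leader exit status: $group_status"
```

The leader's status of 143 (128 + 15) reflects the SIGTERM delivered to the group.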

- Aaron Digulla

1

You should not use kill -9. - l0b0

I would add a sleep 5 or something similar after killChildrenOf $pid, to make sure that kill $killerPid really kills what you want it to kill. - Cassie Dee

3

You can do that, or try kill -0 $pid to check whether the process is still running. - Aaron Digulla

3

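I noticed one issue with my approach... Have you considered the process group technique? Here is one way to get the PGID for a PID (146322): ps -fjww -p 146322 | tail -n 1 | awk '{ print $4 }'. (In my case the output is 145974.) Then a special form of the kill command can be used to terminate all processes in the group: kill -- -145974. - kevinarpe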

1

A similar solution was already given in the question https://dev59.com/0Kzka4cB1Zd3GeqP7FdZ . (I find your idea quite elegant and clever; I would not have thought of it myself.) - imz -- Ivan Zakharyaschev

9

Here is a simplified version of Aaron Digulla's answer, which uses the kill -0 trick that Aaron Digulla mentions in a comment:

app &
pidApp=$!
( sleep 60 ; echo 'timeout'; kill $pidApp ) &
killerPid=$!

wait $pidApp
kill -0 $killerPid && kill $killerPid
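The same watchdog pattern can be extended to the two-application case from the question. The following is a sketch under assumptions, not part of the original answer: wait_or_kill is a hypothetical helper name, it sends plain SIGTERM, and like every PID-based approach it cannot rule out PID reuse on a very busy system.

```bash
#!/bin/bash
wait_or_kill() {
  # Usage: wait_or_kill SECONDS PID...   (hypothetical helper)
  local secs=$1; shift
  local killer pid rc=0
  # Watchdog: after SECONDS, send SIGTERM to whatever is still running.
  ( sleep "$secs"; kill "$@" 2>/dev/null ) >/dev/null 2>&1 &
  killer=$!
  for pid in "$@"; do
    wait "$pid" || rc=$?        # remember the last non-zero status
  done
  kill "$killer" 2>/dev/null    # everything finished: stop the watchdog
  wait "$killer" 2>/dev/null
  return "$rc"
}
```

With this in place the question's script could read app1 & pidApp1=$! then app2 & pidApp2=$! followed by wait_or_kill 60 $pidApp1 $pidApp2, and a return value of 143 would point to a job that had to be terminated.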

In my case, I wanted to be both set -e -x safe and return the status code, so I used:

set -e -x
app &
pidApp=$!
( sleep 45 ; echo 'timeout'; kill $pidApp ) &
killerPid=$!

wait $pidApp
status=$?
(kill -0 $killerPid && kill $killerPid) || true

exit $status

An exit status of 143 indicates SIGTERM, almost certainly from our timeout.
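The number 143 is 128 plus the signal number of SIGTERM (15). Bash's kill -l accepts such an exit status and prints the signal name, which allows a small decoder like the sketch below; describe_status is a hypothetical helper name, and a program can of course also exit with a value above 128 on its own.

```bash
#!/bin/bash
describe_status() {
  # Usage: describe_status EXIT_STATUS   (hypothetical helper)
  local status=$1
  if (( status > 128 )); then
    # kill -l maps an exit status such as 143 back to the signal name (TERM)
    printf 'killed by SIG%s\n' "$(kill -l "$status")"
  else
    printf 'exited with status %d\n' "$status"
  fi
}

describe_status 143   # prints: killed by SIGTERM
describe_status 0     # prints: exited with status 0
```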

- Bryan Larsen

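A similar solution was given in the question https://dev59.com/0Kzka4cB1Zd3GeqP7FdZ . (I find this way of solving the problem quite elegant and clever; I would not have come up with it myself.) - imz -- Ivan Zakharyaschev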

2

To add my two cents, we can boil Teixeira's solution down to:

try_wait() {
    # Usage: [PID]...
    for ((i = 0; i < $#; i += 1)); do
        kill -0 $@ && sleep 0.001 || return 0
    done
    return 1 # timeout or no PIDs
} &>/dev/null

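The sleep command accepts fractional seconds (GNU sleep does, at least), and 0.001 s = 1 ms = 1 kHz, which is plenty of time. However, UNIX has no loopholes when it comes to files and processes. try_wait accomplishes very little.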

$ cat &
[1] 16574
$ try_wait %1 && echo 'exited' || echo 'timeout'
timeout
$ kill %1
$ try_wait %1 && echo 'exited' || echo 'timeout'
exited

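To get any further we have to answer some hard questions.

Why does wait have no timeout parameter? Maybe because the timeout, kill -0, wait and wait -n commands can tell the machine more precisely what we want.

Why is wait built into Bash in the first place, so that timeout wait PID does not work? Maybe only so that Bash can implement proper signal handling.

Consider: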

$ timeout 30s cat &
[1] 6680
$ jobs
[1]+    Running   timeout 30s cat &
$ kill -0 %1 && echo 'running'
running
$ # now meditate a bit and then...
$ kill -0 %1 && echo 'running' || echo 'vanished'
bash: kill: (NNN) - No such process
vanished

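Whether in the material world or inside machines, we need some ground to run on, and likewise we need some ground to wait on.

When kill fails, you hardly know why. Unless you wrote the process yourself, or its manual spells out the circumstances, there is no way to determine a reasonable timeout value.

When you have written the process yourself, you can implement a proper TERM handler, or even have it respond to an "Auf Wiedersehen!" message sent to it through a named pipe. Then you have some ground to stand on, even for something like try_wait :-)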

- Andreas Spindler

This answer is as precise and elegant as a scalpel. - David Golembiowski

2

I wrote a Bash function that waits until the PIDs have finished or until the timeout expires. If the timeout is reached, it returns non-zero and prints all PIDs that have not finished.

function wait_timeout {
  local limit=${@:1:1}
  local pids=${@:2}
  local count=0
  while true
  do
    local have_to_wait=false
    for pid in ${pids}; do
      if kill -0 ${pid} &>/dev/null; then
        have_to_wait=true
      else
        pids=`echo ${pids} | sed -e "s/${pid}//g"`
      fi
    done
    if ${have_to_wait} && (( $count < $limit )); then
      count=$(( count + 1 ))
      sleep 1
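    else
      echo ${pids}
      # Report failure only if something was still running when we stopped waiting
      if ${have_to_wait}; then return 1; fi
      return 0
    fi
  done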
}

To use it, just call: wait_timeout $timeout $PID1 $PID2 ...

- JonatasTeixeira

1

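Note that if you are unlucky enough to have, say, $pid equal to 123 while the list of all PIDs contains PID 1231, sed will remove the wrong PID number. You end up with a mangled list that waits for PID 1, which obviously never goes away. - Mikko Rantalainen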
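One way to sidestep the substring problem described in this comment is to keep the pending PIDs in a Bash array and rebuild it on each pass instead of editing a string with sed. The sketch below is a variation on the polling idea, not the answer author's code; wait_timeout_safe is a hypothetical name, and it polls once per second like the original.

```bash
#!/bin/bash
wait_timeout_safe() {
  # Usage: wait_timeout_safe LIMIT_SECONDS PID...   (hypothetical helper)
  local limit=$1; shift
  local -a pending=("$@") alive
  local pid count=0
  while :; do
    alive=()
    for pid in "${pending[@]}"; do
      # Keep only PIDs that still exist; whole-word handling, no sed.
      kill -0 "$pid" 2>/dev/null && alive+=("$pid")
    done
    pending=("${alive[@]}")
    (( ${#pending[@]} == 0 )) && return 0     # everything finished in time
    if (( count >= limit )); then
      echo "${pending[@]}"                    # print what is still running
      return 1
    fi
    sleep 1
    count=$(( count + 1 ))
  done
}
```

Called as wait_timeout_safe 60 $pidApp1 $pidApp2, it returns 0 as soon as both PIDs are gone and otherwise prints the survivors and returns 1 after roughly 60 seconds.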

1

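You can use the timeout option of the "read" builtin.

The following stops waiting for unfinished jobs after at most 60 seconds and displays the names of the completed jobs: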

( (job1; echo -n "job1 ")& (job2; echo -n "job2 ")&) | (read -t 60 -a jobarr; echo ${jobarr[*]} ${#jobarr[*]} )

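The trick is to create a subshell that contains all the background jobs and to read that subshell's output into a Bash array variable, which can then be used as needed (here, by printing the array plus its element count).

Make sure to reference ${jobarr} in the same subshell as the read command (hence the parentheses), otherwise ${jobarr} will be empty.

All the subshells are automatically silenced (not killed) once the read command terminates. You have to kill them yourself.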
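The building block of this technique is the exit status of read -t, which is greater than 128 when the timeout expires. The sketch below isolates that behaviour for a single newline-terminated message; wait_for_line is a hypothetical helper, and it is a simplification of the one-liner above rather than a replacement for it.

```bash
#!/bin/bash
wait_for_line() {
  # Usage: some_job | wait_for_line SECONDS   (hypothetical helper)
  local line rc
  read -r -t "$1" line
  rc=$?
  if (( rc == 0 )); then
    echo "finished: $line"       # a complete line arrived in time
  elif (( rc > 128 )); then
    echo "timed out"             # read -t gave up waiting
  else
    echo "input closed"          # writer exited without sending a line
  fi
}

( sleep 3; echo late ) | wait_for_line 1   # prints: timed out
echo job1 | wait_for_line 5                # prints: finished: job1
```

As in the answer, the writer on the left of the pipe keeps running after the timeout; it is only disconnected from its reader.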

- Lismatro

1

app1 &
app2 &
sleep 60 &

wait -n

- user1931823

0

Yet another timeout bash script

Run many subprocesses with an overall timeout. Using recent bash features, I wrote this:

#!/bin/bash
maxTime=5.0 jobs=() pids=() cnt=1 Started=${EPOCHREALTIME/.}
if [[ $1 == -m ]] ;then maxTime=$2; shift 2; fi

for cmd ;do  # $cmd is unquoted in order to use strings as command + args
    $cmd &
    jobs[$!]=$cnt pids[cnt++]=$!
done

printf -v endTime %.6f $maxTime
endTime=$(( Started + 10#${endTime/.} ))
exec {pio}<> <(:) # Pseudo FD for "builtin sleep" by using "read -t" 
while ((${#jobs[@]})) && (( ${EPOCHREALTIME/.} < endTime ));do
    for cnt in ${jobs[@]};do
        if ! jobs $cnt &>/dev/null;then
            Elap=00000$(( ${EPOCHREALTIME/.} - Started ))
            printf 'Job %d (%d) ended after %.4f secs.\n' \
                   $cnt ${pids[cnt]} ${Elap::-6}.${Elap: -6}
            unset jobs[${pids[cnt]}] pids[cnt]
        fi
    done
    read -ru $pio -t .02 _
done
if ((${#jobs[@]})) ;then
    Elap=00000$(( ${EPOCHREALTIME/.} - Started ))
    for cnt in ${jobs[@]};do
        printf 'Job %d (%d) killed after %.4f secs.\n' \
               $cnt ${pids[cnt]} ${Elap::-6}.${Elap: -6}
    done
    kill ${pids[@]}
fi

Sample run:

Commands with arguments can be submitted as strings.
The -m switch lets you choose a float as the maximum time in seconds.

$ ./execTimeout.sh -m 2.3 "sleep 1" 'sleep 2' sleep\ {3,4}  'cat /dev/tty'
Job 1 (460668) ended after 1.0223 secs.
Job 2 (460669) ended after 2.0424 secs.
Job 3 (460670) killed after 2.3100 secs.
Job 4 (460671) killed after 2.3100 secs.
Job 5 (460672) killed after 2.3100 secs.
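The script's timing relies on EPOCHREALTIME (available from Bash 5.0), whose value looks like 1700000000.123456; removing the decimal separator turns it into an integer number of microseconds that shell arithmetic can handle. The two helpers below are a sketch of that arithmetic with made-up names (now_us, format_us); the character class also covers locales that print a comma instead of a dot.

```bash
#!/bin/bash
now_us() {
  # Current time in microseconds since the epoch (needs Bash 5.0+ for EPOCHREALTIME)
  local t=$EPOCHREALTIME
  printf '%s\n' "${t/[.,]/}"     # drop the decimal separator
}

format_us() {
  # Usage: format_us MICROSECONDS  -> "S.UUUUUU"
  local us=$1
  printf '%d.%06d\n' $(( us / 1000000 )) $(( us % 1000000 ))
}

format_us 2310000   # prints: 2.310000
```

An elapsed time is then simply the difference of two now_us values passed through format_us.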

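To test this, I wrote this script, which:

Chooses a random duration between 1.0000 and 9.9999 seconds.
Outputs a random number of lines, between 0 and 8 (it may output nothing at all).
Prints lines containing the process ID ($$), the number of lines left to print, and the total duration in seconds.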

#!/bin/bash

tslp=$RANDOM lnes=${RANDOM: -1}
printf -v tslp %.6f ${tslp::1}.${tslp:1}
slp=00$((${tslp/.}/($lnes?$lnes:1)))
printf -v slp %.6f ${slp::-6}.${slp: -6}
# echo >&2 Slp $lnes x $slp == $tslp
exec {dummy}<> <(: -O)
while read -rt $slp -u $dummy; ((--lnes>0)); do
    echo $$ $lnes $tslp
done

Running this script 5 times, with a timeout of 5.0 seconds:

$ ./execTimeout.sh -m 5.0 ./tstscript.sh{,,,,}
2869814 6 2.416700
2869815 5 3.645000
2869814 5 2.416700
2869814 4 2.416700
2869815 4 3.645000
2869814 3 2.416700
2869813 5 8.414000
2869812 1 3.408000
2869814 2 2.416700
2869815 3 3.645000
2869814 1 2.416700
2869815 2 3.645000
Job 3 (2869814) ended after 2.4511 secs.
2869813 4 8.414000
2869815 1 3.645000
Job 1 (2869812) ended after 3.4518 secs.
Job 4 (2869815) ended after 3.6757 secs.
2869813 3 8.414000
Job 2 (2869813) killed after 5.0159 secs.
Job 5 (2869816) killed after 5.0159 secs.

- F. Hauri - Give Up GitHub

0

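Some processes do not work well when invoked through timeout. I ran into a problem where I needed to put a timeout trap around a qemu instance; if you call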

timeout 900 qemu

it will hang forever.

My solution

./qemu_cmd &
qemuPid=$!
timeout 900 tail --pid=$qemuPid -f /dev/null
ret=$?
if [ "$ret" != "0" ]; then
   allpids=()
   descendent_pids $qemuPid   # the function below must be defined before this line runs
   for pids in ${allpids[@]};do
      kill -9 $pids
   done
fi

descendent_pids(){
   allpids=("${allpids[@]}" $1)
   pids=$(pgrep -P $1)
   for pid in $pids; do
      descendent_pids $pid
   done
}

It is also worth noting that timeout does not always kill descendant processes, depending on how complex the cmd you spawn from timeout is.
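The core of this approach is that timeout wraps tail --pid rather than the program itself, so the program keeps its normal relationship to the shell. A minimal sketch of that piece as a reusable function follows, assuming GNU coreutils (both timeout and tail --pid come from there); wait_pid_timeout is a hypothetical name.

```bash
#!/bin/bash
wait_pid_timeout() {
  # Usage: wait_pid_timeout SECONDS PID   (hypothetical helper, GNU coreutils assumed)
  # tail exits once PID is gone; timeout ends tail after SECONDS and returns 124.
  timeout "$1" tail --pid="$2" -f /dev/null
}
```

A status of 0 means the process went away by itself and 124 means the time limit was hit, at which point the caller decides what to kill. Since tail checks the PID periodically (about once per second by default), the return can lag slightly behind the process exit.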

- watermjm

Facing the same problem, I found that it was caused by qemu's -nographic or -serial stdio option. - undefined

0

Better late than never: here is a solution that uses wait without polling (although it is still a loop) and stops as soon as possible.

app1 &
pidApp1=$!
app2 &
pidApp2=$!

# timeout 60 wait $pidApp1 $pidApp2
declare -A pidApps=( [$pidApp1]=running [$pidApp2]=running )
{ sleep 60; echo "stop"; } | read &
pidTmout=$!
while [[ ${#pidApps[@]} -gt 0 ]]; do
    wait -np pidStop    # -p stores the PID of the job that finished; needs Bash 5.1+
    [[ $pidStop == $pidTmout ]] && break
    unset pidApps[$pidStop]
done
[[ ${pidApps[$pidApp1]} == running ]] && kill -9 $pidApp1
[[ ${pidApps[$pidApp2]} == running ]] && kill -9 $pidApp2

- joheirba

- Adrian Frühwirth · Accepted Answer

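Both your example and the accepted answer are overly complicated; why not simply use timeout, since this is exactly its use case? The timeout command even has a built-in option (-k) to send SIGKILL if the command is still running some time after the initial signal (SIGTERM by default) was sent to terminate it (see man timeout).

If the script does not necessarily need to wait and resume control flow after waiting, it is simply a matter of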

timeout -k 60s 60s app1 &
timeout -k 60s 60s app2 &
# [...]

If it does, however, just save the timeout PIDs instead:

pids=()
timeout -k 60s 60s app1 &
pids+=($!)
timeout -k 60s 60s app2 &
pids+=($!)
wait "${pids[@]}"
# [...]

E.g.

$ cat t.sh
#!/bin/bash

echo "$(date +%H:%M:%S): start"
pids=()
timeout 10 bash -c 'sleep 5; echo "$(date +%H:%M:%S): job 1 terminated successfully"' &
pids+=($!)
timeout 2 bash -c 'sleep 5; echo "$(date +%H:%M:%S): job 2 terminated successfully"' &
pids+=($!)
wait "${pids[@]}"
echo "$(date +%H:%M:%S): done waiting. both jobs terminated on their own or via timeout; resuming script"

.

$ ./t.sh
08:59:42: start
08:59:47: job 1 terminated successfully
08:59:47: done waiting. both jobs terminated on their own or via timeout; resuming script
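When the script also needs to know which job ran into its limit, waiting for each saved PID separately gives the exit status of each timeout process; GNU timeout exits with 124 when the command timed out. The sketch below assumes GNU timeout and uses sleep as a stand-in for app1 and app2; the statuses array is a made-up name.

```bash
#!/bin/bash
pids=()
timeout 5 sleep 1 &      # finishes well within its limit
pids+=($!)
timeout 1 sleep 5 &      # runs into its limit
pids+=($!)

statuses=()
for pid in "${pids[@]}"; do
  wait "$pid"            # waiting per PID yields that job's own exit status
  statuses+=($?)
done
echo "${statuses[@]}"    # prints: 0 124
```

If -k is used and the follow-up SIGKILL had to be sent, the status would be 137 (128 + 9) rather than 124.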