Get exit code of a background process

Question

linux shell unix process

177

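I have a command CMD, called from my main Bourne shell script, that takes a very long time to finish.

I want to modify the script as follows:

1. Run the command CMD in parallel as a background process (CMD &).
2. In the main script, have a loop that monitors the spawned command every few seconds. The loop also echoes some messages to stdout indicating the progress of the script.
3. Exit the loop when the spawned command terminates.
4. Capture and report the exit code of the spawned process.

Can someone give me pointers on how to accomplish this?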

- bob

See also: How to wait in bash for several subprocesses to finish, and return exit code !=0 when any subprocess ends with code !=0? - Gabriel Staples

14 Answers

81

This is how I solved it when I had a similar need:

# Some function that takes a long time to process
longprocess() {
        # Sleep up to 14 seconds
        sleep $((RANDOM % 15))
        # Randomly exit with 0 or 1
        exit $((RANDOM % 2))
}

pids=""
# Run five concurrent processes
for i in {1..5}; do
        ( longprocess ) &
        # store PID of process
        pids+=" $!"
done

# Wait for all processes to finish, will take max 14s
# as it waits in order of launch, not order of finishing
for p in $pids; do
        if wait $p; then
                echo "Process $p success"
        else
                echo "Process $p fail"
        fi
done

- Bjorn

6

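This solution does not satisfy requirement #2: a monitoring loop for each background process. The "wait" calls make the script block until the very end of (each) process. - Dima Korobskiy

A simple and nice approach. I had been searching for this solution for quite some time. - Santosh Kumar Arjunan

This doesn't work, or doesn't do what you want: it doesn't check the exit statuses of the background processes? - conny

1

@conny, yes, it does check the exit status of the background processes. The "wait" command returns the exit status of the process. In the example shown here, that is demonstrated by the "Process $p success/fail" messages. - Bjorn

Whether or not it answers the original question, this helped me write a script that checks parallelized background processes for completion and failure, so thank you! For anyone interested, I have saved the resulting test script as a gist here: https://gist.github.com/therightstuff/f4cc70db21d8e21d7277a414dbefa0a6 - therightstuff

@Bjorn, absolutely the best answer. No other solution was able to retrieve the exit codes of my background function calls. I suggest emphasizing the intentional subshell invocation (line 12) used to call the function block; without it, because of background process cleanup, I got a bunch of "wait: pid #### is not a child of this shell" errors. Thank you! - codejedi365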

21

The PID of a background child process is stored in $!. You can store the PIDs of all child processes in an array, e.g. PIDS[].

wait [-n] [jobspec or pid …]

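Wait until the child process specified by each process ID pid or job specification jobspec exits and return the exit status of the last command waited for. If a job spec is given, all processes in the job are waited for. If no arguments are given, all currently active child processes are waited for, and the return status is zero. If the -n option is supplied, wait waits for any job to terminate and returns its exit status. If neither jobspec nor pid specifies an active child process of the shell, the return status is 127.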
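The -n behavior quoted above can be illustrated with a minimal sketch. It assumes bash 4.3 or newer (where `wait -n` is available); the three subshells and their exit codes are made up for the demonstration and are not part of the original answer.

```bash
#!/bin/bash
# Sketch only: requires bash 4.3+ for `wait -n`; the jobs below are invented.

( sleep 0.2; exit 3 ) &
( sleep 0.6; exit 0 ) &
( sleep 1.0; exit 7 ) &

statuses=()
for n in 1 2 3; do
    wait -n          # blocks until any one background job terminates
    rc=$?            # exit status of the job that just finished
    statuses+=("$rc")
    echo "a job finished with status $rc"
done

echo "statuses in order of completion: ${statuses[*]}"
```

Note that `wait -n` reports a status but not which PID produced it, so it suits "fail as soon as anything fails" logic better than per-process reporting.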

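Using the wait command, you can wait for all child processes to finish, and at the same time read each child's exit status from $? and store it in STATUS[]. You can then act on each status.

I tried the following two solutions and both work well. solution01 is more concise, while solution02 is slightly more complicated.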

solution01

#!/bin/bash

# start 3 child processes concurrently, and store each pid into array PIDS[].
process=(a.sh b.sh c.sh)
for app in ${process[@]}; do
  ./${app} &
  PIDS+=($!)
done

# wait for all processes to finish, and store each process's exit code into array STATUS[].
for pid in ${PIDS[@]}; do
  echo "pid=${pid}"
  wait ${pid}
  STATUS+=($?)
done

# after all processes finish, check their exit codes in STATUS[].
i=0
for st in ${STATUS[@]}; do
  if [[ ${st} -ne 0 ]]; then
    echo "$i failed"
  else
    echo "$i finish"
  fi
  ((i+=1))
done

solution02

#!/bin/bash

# start 3 child processes concurrently, and store each pid into array PIDS[].
i=0
process=(a.sh b.sh c.sh)
for app in ${process[@]}; do
  ./${app} &
  pid=$!
  PIDS[$i]=${pid}
  ((i+=1))
done

# wait for all processes to finish, and store each process's exit code into array STATUS[].
i=0
for pid in ${PIDS[@]}; do
  echo "pid=${pid}"
  wait ${pid}
  STATUS[$i]=$?
  ((i+=1))
done

# after all processes finish, check their exit codes in STATUS[].
i=0
for st in ${STATUS[@]}; do
  if [[ ${st} -ne 0 ]]; then
    echo "$i failed"
  else
    echo "$i finish"
  fi
  ((i+=1))
done

- Terry

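I have tried it and verified that it works well. You can read my explanation in the code. - Terry

2

Please read "How do I write a good answer?", where you will find the following: **... try to mention any limitations, assumptions or simplifications in your answer. Brevity is acceptable, but fuller explanations are better.** Your answer is therefore acceptable, but you have a better chance of getting upvotes if you elaborate on the problem and your solution. :-) - Noel Widmer

3

pid=$!; PIDS[$i]=${pid}; ((i+=1)) can be simplified to PIDS+=($!), which appends to the array without needing a separate variable for the index or for the pid. The same applies to the STATUS array. - codeforester

1

@codeforester, thank you for the suggestion. I have reworked my initial code into solution01, which looks more concise. - Terry

The same thing applies to the other places where you add elements to an array. - codeforester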

13

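I see that almost all answers use external utilities (mostly ps) to poll the state of the background process. There is a more Unix-like solution: catching the SIGCHLD signal. The signal handler has to check which child process has stopped. This can be done with the kill -0 <PID> built-in (universal), by checking for the existence of the /proc/<PID> directory (Linux only), or with the jobs built-in (bash specific; jobs -l also reports the pid, and in that case the third field of the output can be Stopped | Running | Done | Exit).

Here is my example.

The launched process is called loop.sh. It accepts -x or a number as its argument. With -x it exits with exit code 1. With a number it waits num * 5 seconds. Every 5 seconds it prints its PID.

The launcher process is called launch.sh: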

#!/bin/bash

handle_chld() {
    local tmp=()
    for((i=0;i<${#pids[@]};++i)); do
        if [ ! -d /proc/${pids[i]} ]; then
            wait ${pids[i]}
            echo "Stopped ${pids[i]}; exit code: $?"
        else tmp+=(${pids[i]})
        fi
    done
    pids=(${tmp[@]})
}

set -o monitor
trap "handle_chld" CHLD

# Start background processes
./loop.sh 3 &
pids+=($!)
./loop.sh 2 &
pids+=($!)
./loop.sh -x &
pids+=($!)

# Wait until all background processes are stopped
while [ ${#pids[@]} -gt 0 ]; do echo "WAITING FOR: ${pids[@]}"; sleep 2; done
echo STOPPED

For more explanation see: Starting a process from bash script failed

- TrueY

2

Since we are talking about Bash, the for loop could be written with parameter expansion as: for i in ${!pids[@]};. - PlasmaBinturong

11

#!/bin/bash

#pgm to monitor
tail -f /var/log/messages >> /tmp/log&
# background cmd pid
pid=$!
# loop to monitor running background cmd
while :
do
    ps ax | grep $pid | grep -v grep
    ret=$?
    if test "$ret" != "0"
    then
        echo "Monitored pid ended"
        break
    fi
    sleep 5

done

wait $pid
echo $?

- Abu Aqil

2

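Here is a trick to avoid the grep -v. You can limit the search to the beginning of the line: grep '^'$pid. You can also use ps p $pid -o pid=. Also, tail -f is not going to end until you kill it, so I don't think it is a very good way to demo this (at least not without pointing that out). You may want to redirect the output of the ps command to /dev/null, otherwise it goes to the screen on every iteration. Your exit causes the wait to be skipped; it should be a break. But aren't the while/ps and the wait redundant? - Dennis Williamson

5

Why does everybody forget about kill -0 $pid? It does not actually send any signal; it only checks that the process is alive, using a shell built-in instead of external processes. - ephemient

3

Because you can only kill a process that you own: bash: kill: (1) - Operation not permitted. - curious_prism

2

The loop is redundant. Just wait. Less code => fewer edge cases. - Brais Gabin

1

@Brais Gabin The monitoring loop is requirement #2 of the question. - Dima Korobskiy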

5

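I would change your approach slightly. Rather than checking every few seconds whether the command is still alive and reporting a message, have another process that reports every few seconds that the command is still running, and then kill that process when the command finishes. For example: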

#!/bin/sh
cmd() { sleep 5; exit 24; }
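cmd &   # Run the long-running process
pid=$!  # Record the pid
# Spawn a process that continually reports that the command is still running
while echo "$(date): $pid is still running"; do sleep 1; done &
echoer=$!
# Set a trap to kill the reporter when the script exits
trap 'kill $echoer' 0
# Wait for the process to finish
if wait $pid; then
    echo "cmd succeeded"
else
    echo "cmd FAILED!! (returned $?)"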
fi

- William Pursell

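Great template, thanks for sharing! I believe that instead of the trap we could also use while kill -0 $pid 2> /dev/null; do X; done. I hope this is useful for someone else reading this in the future ;) - punkbit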

3

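Our team had the same need with a script executed remotely over SSH, which timed out after 25 minutes of inactivity. Here is a solution in which the monitoring loop checks the background process every second but prints only every 10 minutes, which is enough to suppress the inactivity timeout.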

long_running.sh & 
pid=$!

# Wait on a background job completion. Query status every 10 minutes.
declare -i elapsed=0
# `ps -p ${pid}` works on macOS and CentOS. On both OSes `ps ${pid}` works as well.
while ps -p ${pid} >/dev/null; do
  sleep 1
  if ((++elapsed % 600 == 0)); then
    echo "Waiting for the completion of the main script. $((elapsed / 60))m and counting ..."
  fi
done

# Return the exit code of the terminated background process. This works in Bash 4.4 despite what Bash docs say:
# "If neither jobspec nor pid specifies an active child process of the shell, the return status is 127."
wait ${pid}

- Dima Korobskiy

2

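Another solution is to monitor processes via the proc filesystem (safer than the ps/grep combo); when you start a process, it gets a corresponding directory at /proc/$pid, so the solution could be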

#!/bin/bash
....
doSomething &
local pid=$!
while [ -d /proc/$pid ]; do # While directory exists, the process is running
    doSomethingElse
    ....
else # when directory is removed from /proc, process has ended
    wait $pid
    local exit_status=$?
done
....

Now you can use the $exit_status variable however you like.

- Iskren

Doesn't work in bash? "Syntax error: else unexpected (expecting "done")" - benjaoming
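The syntax error reported in the comment comes from the `else` branch, which a `while` loop does not have; `local` is also only valid inside a function. A minimal corrected sketch of the same /proc idea follows. It is Linux-specific, and `do_something` and the `echo` inside the loop are hypothetical stand-ins for the elided parts of the answer.

```bash
#!/bin/bash
# Sketch only: Linux-specific because it relies on /proc.
# do_something is a hypothetical stand-in for the real background job.
do_something() { sleep 2; return 3; }

do_something &
pid=$!

# While /proc/<pid> exists, the shell has not yet reaped the process.
while [ -d "/proc/$pid" ]; do
    echo "still running: $pid"   # placeholder for the other work
    sleep 1
done

# The directory is gone, so the process has ended; collect its status.
wait "$pid"
exit_status=$?
echo "exit status: $exit_status"
```

The status is collected after the loop rather than in a loop branch, because `wait` still returns the saved status of a child that has already finished.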

1

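With this method, your script does not have to wait for the background process; you only have to monitor a temporary file for the exit status.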

FUNCmyCmd() { sleep 3;return 6; };

export retFile=$(mktemp); 
FUNCexecAndWait() { FUNCmyCmd;echo $? >$retFile; }; 
FUNCexecAndWait&

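Now your script can do anything else while you simply keep monitoring the contents of retFile (it can also contain any other information you want, such as the exit time).

PS: by the way, I wrote this with bash in mind.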
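The monitoring side is not shown in the answer. One possible way to do it, as a sketch only, is to poll until the file becomes non-empty and then read the status from it. The names `my_cmd`, `ret_file` and `status` are hypothetical.

```bash
#!/bin/bash
# Sketch only: my_cmd, ret_file and status are hypothetical names.
my_cmd() { sleep 2; return 6; }

ret_file=$(mktemp)
# Run the command in the background and record its status when it finishes.
{ my_cmd; echo $? > "$ret_file"; } &

# The main script is free to do other work; here it just polls the file.
while [ ! -s "$ret_file" ]; do
    echo "still waiting for my_cmd"
    sleep 1
done

status=$(cat "$ret_file")
rm -f "$ret_file"
echo "my_cmd exited with status $status"
```

The `-s` test is true only once the file has content, so the loop does not mistake the freshly created empty file for a result.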

- Aquarius Power

1

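A simple example, similar to the solutions above. It does not require monitoring any process output. The next example uses tail to follow the output.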

$ echo '#!/bin/bash' > tmp.sh
$ echo 'sleep 30; exit 5' >> tmp.sh
$ chmod +x tmp.sh
$ ./tmp.sh &
[1] 7454
$ pid=$!
$ wait $pid
[1]+  Exit 5                  ./tmp.sh
$ echo $?
5

Use tail to follow the process output and quit when the process completes.

$ echo '#!/bin/bash' > tmp.sh
$ echo 'i=0; while let "$i < 10"; do sleep 5; echo "$i"; let i=$i+1; done; exit 5;' >> tmp.sh
$ chmod +x tmp.sh
$ ./tmp.sh
0
1
2
^C
$ ./tmp.sh > /tmp/tmp.log 2>&1 &
[1] 7673
$ pid=$!
$ tail -f --pid $pid /tmp/tmp.log
0
1
2
3
4
5
6
7
8
9
[1]+  Exit 5                  ./tmp.sh > /tmp/tmp.log 2>&1
$ wait $pid
$ echo $?
5

- Darren Weber

- mob · Accepted Answer

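1: In bash, $! holds the PID of the last background process that was executed. That tells you which process to monitor.

4: wait <n> waits until the process with PID <n> is complete (it blocks until the process completes, so you might not want to call it until you are sure the process is done), and then returns the exit code of the completed process.

2, 3: ps or ps | grep " $! " can tell you whether the process is still running. How you interpret the output and decide how close the process is to finishing is up to you. (ps | grep is not foolproof. If you have time, you can come up with a more robust way to tell whether the process is still running.)

Here is a skeleton script: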

# simulate a long process that will have an identifiable exit code
(sleep 15 ; /bin/false) &
my_pid=$!

while   ps | grep " $my_pid "     # might also need  | grep -v grep  here
do
    echo $my_pid is still in the ps output. Must still be running.
    sleep 3
done

echo Oh, it looks like the process is done.
wait $my_pid
# The variable $? always holds the exit code of the last command to finish.
# Here it holds the exit code of $my_pid, since wait exits with that code. 
my_status=$?
echo The exit status of the process was $my_status
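As one possible "more robust" liveness check than ps | grep, several commenters in this thread suggest kill -0. The sketch below applies that suggestion to the skeleton above. It assumes the monitored process is a child of the current shell, so the permission problem mentioned in the comments does not arise; the subshell and its exit code 4 are invented for the demonstration.

```bash
#!/bin/bash
# Sketch only: the subshell below stands in for the real CMD.
( sleep 3; exit 4 ) &
my_pid=$!

# kill -0 sends no signal; it only reports whether the PID can be signalled.
while kill -0 "$my_pid" 2>/dev/null; do
    echo "$my_pid is still running"
    sleep 1
done

# The process is gone, so wait returns immediately with its exit code.
wait "$my_pid"
my_status=$?
echo "The exit status of the process was $my_status"
```

Unlike the grep pattern, this cannot match an unrelated line of ps output, although in principle a recycled PID could still fool it on a very busy system.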