Read subprocess stdout line by line

Question

311

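My Python script uses subprocess to call a Linux utility that is very noisy. I want to store all of the output in a log file and show some of it to the user. I thought the following would work, but the output does not show up in my application until the utility has produced a significant amount of output.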

# fake_utility.py, just generates lots of output over time
import time
i = 0
while True:
    print(hex(i)*512)
    i += 1
    time.sleep(0.5)

In the parent process:

import subprocess

proc = subprocess.Popen(['python', 'fake_utility.py'], stdout=subprocess.PIPE)
for line in proc.stdout:
    # the real code does filtering here
    print("test:", line.rstrip())

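The behavior I really want is for the filter script to print each line as it is received from the subprocess, like tee does but within Python code.

What am I missing? Is this even possible?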

- deft_code

5

You could use print line, instead of print line.rstrip() (note: comma at the end). - jfs

Related: Python: read streaming input from subprocess.communicate() - jfs

2

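Update 2 states that it works with Python 3.0+, but it uses the old print statement, so it does not work on Python 3.0+. - Rooky

None of the answers listed here worked for me, but https://dev59.com/J2035IYBdhLWcg3wc_sm#5413588 did! - boxed

Interesting that the code that only works in Python 3.0+ uses 2.7 syntax for print. - thang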

1

The update does not work. You are just printing line by line, not receiving the lines one by one. - Vaidøtas I.

13 Answers

97

Bit late to the party, but I was surprised not to see what I think is the simplest solution here:

import io
import subprocess

proc = subprocess.Popen(["prog", "arg"], stdout=subprocess.PIPE)
for line in io.TextIOWrapper(proc.stdout, encoding="utf-8"):  # or another encoding
    # do something with line; printing it is just a placeholder
    print(line, end="")

(This requires Python 3.)

- jbg

29

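I wanted to use this answer, but I am getting this error: AttributeError: 'file' object has no attribute 'readable'. I am using Python 2.7. - Dan Garthwaite

7

Works with Python 3. - matanster

11

If you are writing a library that still needs to support Python 2, then don't use this code. But many people have the luxury of being able to use software released more recently than a decade ago. If you try to read from a closed file you will get that exception whether you use TextIOWrapper or not. You can simply handle the exception, and none of this makes the answer "invalid". - jbg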

3

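@Ammad the \n is the newline character. It is conventional in Python for the newline not to be removed when splitting into lines; you will see the same behavior if you iterate over a file's lines or use the readlines() method. You can use line[:-1] to get the line without it (TextIOWrapper uses "universal newlines" mode by default, so even if you are on Windows and the line ends with \r\n, you will only have \n at the end, so -1 works). You can also use line.rstrip() if you don't mind any other whitespace-like characters at the end of the line being removed as well. - jbg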
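To make the comment above concrete, here is a small sketch that feeds Windows-style line endings through TextIOWrapper. It reads from an in-memory io.BytesIO rather than a real pipe, which is an assumption made only to keep the example self-contained; the sample bytes are made up.

```python
import io

# Stand-in for proc.stdout: two lines with Windows line endings,
# the second one with trailing spaces.
raw = io.BytesIO(b"alpha\r\nbeta  \r\n")

# Default newline handling ("universal newlines") turns \r\n into \n.
lines = list(io.TextIOWrapper(raw, encoding="utf-8"))

sliced = [line[:-1] for line in lines]        # drops only the final \n
stripped = [line.rstrip() for line in lines]  # also drops trailing spaces
```

Here sliced keeps the trailing spaces of the second line while stripped removes them, which is the trade-off the comment describes.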

2

I got "AttributeError: 'file' object has no attribute 'readable'" in Python 3.7, but that was because I used subprocess.run instead of subprocess.Popen. - cowlinator


27

Indeed, once you have sorted out the iterator, buffering may now be your problem. You can tell the Python in the subprocess not to buffer its output.

proc = subprocess.Popen(['python','fake_utility.py'],stdout=subprocess.PIPE)

becomes

proc = subprocess.Popen(['python','-u', 'fake_utility.py'],stdout=subprocess.PIPE)

I have needed this when calling Python from within Python.
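A minimal sketch of the same idea, assuming the child is itself a Python script. Besides -u, the PYTHONUNBUFFERED environment variable has the same effect and may be handier when the interpreter is started indirectly. The inline child script and the helper name stream_unbuffered are made up for illustration, and text=True assumes Python 3.7+.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for fake_utility.py: prints three lines with pauses.
CHILD = "import time\nfor i in range(3):\n    print('line', i)\n    time.sleep(0.1)\n"


def stream_unbuffered(use_env=False):
    """Run the child with unbuffered stdout and return the lines that were read."""
    cmd = [sys.executable, "-c", CHILD]
    env = dict(os.environ)
    if use_env:
        env["PYTHONUNBUFFERED"] = "1"  # same effect as passing -u
    else:
        cmd.insert(1, "-u")            # python -u -c ...
    lines = []
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, env=env, text=True) as proc:
        for line in proc.stdout:       # each line arrives as the child prints it
            lines.append(line.rstrip("\n"))
    return lines
```

Neither option helps when the child is not a Python program; such tools have their own buffering rules.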

- Steve Carter

20

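A function that allows iterating over both stdout and stderr concurrently, in real time, line by line.

In case you need to get the output streams of both stdout and stderr at the same time, you can use the following function.

The function uses queues to merge both Popen pipes into a single iterator.

Here we create the function read_popen_pipes():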

from queue import Queue, Empty
from concurrent.futures import ThreadPoolExecutor


def enqueue_output(file, queue):
    for line in iter(file.readline, ''):
        queue.put(line)
    file.close()


def read_popen_pipes(p):

    with ThreadPoolExecutor(2) as pool:
        q_stdout, q_stderr = Queue(), Queue()

        pool.submit(enqueue_output, p.stdout, q_stdout)
        pool.submit(enqueue_output, p.stderr, q_stderr)

        while True:

            if p.poll() is not None and q_stdout.empty() and q_stderr.empty():
                break

            out_line = err_line = ''

            try:
                out_line = q_stdout.get_nowait()
            except Empty:
                pass
            try:
                err_line = q_stderr.get_nowait()
            except Empty:
                pass

            yield (out_line, err_line)

read_popen_pipes() in use:

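import subprocess as sp


def run(my_cmd):
    # run() is a hypothetical wrapper so that the return below is valid;
    # my_cmd is the argument list of the command to execute
    with sp.Popen(my_cmd, stdout=sp.PIPE, stderr=sp.PIPE, text=True) as p:

        for out_line, err_line in read_popen_pipes(p):

            # Do stuff with each line, e.g.:
            print(out_line, end='')
            print(err_line, end='')

        return p.poll()  # return status-code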
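As a variation on the function above, not the answer's own method, the sketch below puts both pipes into one queue and tags each line with its stream name, so the consumer can block on the queue instead of polling two queues. The name iter_tagged_lines and the None sentinel are choices made for this illustration.

```python
import subprocess
import sys
import threading
from queue import Queue


def iter_tagged_lines(proc):
    """Yield (stream_name, line) pairs from proc.stdout and proc.stderr."""
    q = Queue()

    def pump(name, pipe):
        for line in pipe:
            q.put((name, line))
        q.put((name, None))  # sentinel: this pipe reached EOF

    for name, pipe in (("stdout", proc.stdout), ("stderr", proc.stderr)):
        threading.Thread(target=pump, args=(name, pipe), daemon=True).start()

    open_pipes = 2
    while open_pipes:
        name, line = q.get()  # blocks until either pipe produces something
        if line is None:
            open_pipes -= 1
        else:
            yield name, line


# Hypothetical child that writes one line to each stream.
CHILD = "import sys\nprint('out')\nprint('err', file=sys.stderr)\n"

with subprocess.Popen([sys.executable, "-c", CHILD], stdout=subprocess.PIPE,
                      stderr=subprocess.PIPE, text=True) as p:
    tagged = sorted(iter_tagged_lines(p))
```

The relative order of stdout and stderr lines is not guaranteed, which is why the example sorts the result before looking at it.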

- Rotareti

18

You want to pass these extra parameters to subprocess.Popen:

bufsize=1, universal_newlines=True

Then you can iterate as in your example. (Tested with Python 3.5)
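Putting these parameters together with the question's goal (log everything, show only some of it) might look like the sketch below. The helper name tee_filtered, the keep callback and the inline child script are assumptions for illustration. Note that bufsize=1 only makes the parent's side line-buffered; the child here flushes explicitly.

```python
import subprocess
import sys

# Hypothetical noisy tool: prints five lines and flushes each one.
CHILD = "for i in range(5):\n    print('item', i, flush=True)\n"


def tee_filtered(cmd, log_path, keep):
    """Write every line to log_path; print and return only lines where keep(line) is true."""
    shown = []
    with open(log_path, "w") as log, subprocess.Popen(
        cmd, stdout=subprocess.PIPE, bufsize=1, universal_newlines=True
    ) as proc:
        for line in proc.stdout:
            log.write(line)                       # the log gets everything
            if keep(line):
                print("test:", line.rstrip())     # the user sees a subset
                shown.append(line.rstrip())
    return shown
```

A call such as tee_filtered([sys.executable, "-c", CHILD], "tool.log", lambda line: "2" in line) would log all five lines and show only one of them.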

- user1747134

2

@nicoulaj It should work if you use the subprocess32 package. - Quantum7

6

You can also read the lines without a loop. Works in Python 3.6.

import os
import subprocess

process = subprocess.Popen(command, stdout=subprocess.PIPE)
list_of_byte_strings = process.stdout.readlines()

- aiven

1

Or, to convert to strings: list_of_strings = [x.decode('utf-8').rstrip('\n') for x in iter(process.stdout.readlines())] - ndtreviv

3

If you want the output as strings, you can pass text=True to Popen or use its "encoding" keyword argument; there is no need to convert it yourself. - Bobby Impollonia
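A short sketch of the difference this comment describes, assuming Python 3.7+ for text=True. The helper first_line and the one-line child command are made up for the example.

```python
import subprocess
import sys

# Hypothetical child that prints a single ASCII line.
cmd = [sys.executable, "-c", "print('hello')"]


def first_line(**popen_kwargs):
    """Return the first line of the child's stdout with the given Popen options."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, **popen_kwargs) as proc:
        return proc.stdout.readline()


raw = first_line()                      # bytes, since no text mode was requested
as_text = first_line(text=True)         # str, decoded with the locale's default encoding
as_utf8 = first_line(encoding="utf-8")  # str, decoded explicitly as UTF-8
```

Passing encoding explicitly is the safer choice when the child's output encoding is known and may differ from the locale default.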

4

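Python 3.5 added the run() function to the subprocess module; it returns a CompletedProcess object (call() is older and returns only the exit code). With run() you can simply use proc.stdout.splitlines():

import subprocess

# capture_output and text require Python 3.7+
proc = subprocess.run(command, shell=True, capture_output=True, text=True, check=True)
for line in proc.stdout.splitlines():
    print("stdout:", line)

See also How to Execute Shell Commands in Python Using the Subprocess Run Method.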

- StefanQ

8

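This solution is short and effective. One problem, compared with the original question: it does not print each line "as it is received", which I take to mean printing messages in real time, just as if the process were run directly from the command line. Instead it only prints the output after the process has finished running. - sfuqua

2

Thanks @sfuqua for mentioning that. I use pipes extensively and rely on streaming data, and I would have made the wrong choice if I had only considered brevity. - Sridhar Sarnobat

This does not answer the question. It buffers the entire output of the subprocess in memory. - undefined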
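The difference these comments point out can be checked with a small sketch: with Popen, the first line can be read while the child is still running, whereas run() hands over the output only after the child exits. The child script and the function name are assumptions for illustration; the child flushes explicitly so its own buffering does not hide the effect.

```python
import subprocess
import sys

# Hypothetical child: one line, a one-second pause, then another line.
CHILD = (
    "import time\n"
    "print('first', flush=True)\n"
    "time.sleep(1)\n"
    "print('second', flush=True)\n"
)


def first_line_arrives_early():
    """Read one line via Popen and report whether the child was still running."""
    with subprocess.Popen([sys.executable, "-c", CHILD],
                          stdout=subprocess.PIPE, text=True) as proc:
        first = proc.stdout.readline()
        still_running = proc.poll() is None  # the child is sleeping at this point
        rest = proc.stdout.read()            # blocks until the child finishes
    return first, still_running, rest
```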

2

The subprocess module has come a long way since 2010, and most of the answers here are quite outdated.

Here is a simple way that works with modern Python versions:

from subprocess import Popen, PIPE, STDOUT

with Popen(args, stdout=PIPE, stderr=STDOUT, text=True) as proc:
    for line in proc.stdout:
        print(line)
rc = proc.returncode

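About using Popen as a context manager: on exit from the with block, the standard file descriptors are closed and the process is waited for, which sets the returncode attribute.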
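A small sketch that makes this behavior observable, assuming Python 3.7+ for text=True. The child script, which exits with status 3, is made up for the example.

```python
import subprocess
import sys

# Hypothetical child: prints one line and exits with a non-zero status.
CHILD = "import sys\nprint('working')\nsys.exit(3)\n"

with subprocess.Popen([sys.executable, "-c", CHILD], stdout=subprocess.PIPE,
                      stderr=subprocess.STDOUT, text=True) as proc:
    lines = [line.rstrip("\n") for line in proc.stdout]

# After the with block the pipe is closed and the exit status is available.
rc = proc.returncode
closed = proc.stdout.closed
```

Reading returncode inside the with block, before the child has exited and been waited for, would give None instead.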

- wim

1

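I tried this with Python 3 and it worked (source). When you use Popen to spawn the new process, you tell the operating system to pipe the child's stdout so that the parent process can read it; here stderr is redirected into that same pipe (stderr=subprocess.STDOUT). In output_reader we read the child's stdout line by line by wrapping readline in an iterator, which yields each line as soon as it is ready.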

import subprocess
import threading
import time


def output_reader(proc):
    # iter(readline, b'') keeps calling readline() until it returns b'' at EOF
    for line in iter(proc.stdout.readline, b''):
        print('got line: {0}'.format(line.decode('utf-8')), end='')


def main():
    proc = subprocess.Popen(['python', 'fake_utility.py'],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)

    # read the child's output in a background thread
    t = threading.Thread(target=output_reader, args=(proc,))
    t.start()

    try:
        # let the child run for a while; the duration is an arbitrary choice
        time.sleep(2)
    finally:
        proc.terminate()
        try:
            proc.wait(timeout=0.2)
            print('== subprocess exited with rc =', proc.returncode)
        except subprocess.TimeoutExpired:
            print('subprocess did not terminate in time')
    t.join()


if __name__ == '__main__':
    main()

- shakram02

1

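This is great, but it seems to use a plain Popen. You should really describe what makes it different and what it does, rather than just showing a piece of code. There is a lot in there that may surprise readers, and we should follow the principle of least surprise. - Maarten Bodewes

Thanks @MaartenBodewes, I have added more detail to the answer; please let me know if you have further comments. - shakram02

Much better, upvoted. I will remove my comment, and you can do the same :) - Maarten Bodewes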

0

I was having a problem with the argument list of Popen when updating servers; the following code resolves it.

import getpass
from subprocess import Popen, PIPE

username = 'user1'
ip = '127.0.0.1'

print ('What is the password?')
password = getpass.getpass()
cmd1 = f"""sshpass -p {password} ssh {username}@{ip}"""
cmd2 = f"""echo {password} | sudo -S apt update"""
cmd3 = " && "
cmd4 = f"""echo {password} | sudo -S apt upgrade -y"""
cmd5 = " && "
cmd6 = "exit"
commands = [cmd1, cmd2, cmd3, cmd4, cmd5, cmd6]

command = " ".join(commands)

cmd = command.split()

with Popen(cmd, stdout=PIPE, bufsize=1, universal_newlines=True) as p:
    for line in p.stdout:
        print(line, end='')

To run the update on a local computer, the following code example can be used.

import getpass
from subprocess import Popen, PIPE

print ('What is the password?')
password = getpass.getpass()

cmd1_local = f"""apt update"""
cmd2_local = f"""apt upgrade -y"""
commands = [cmd1_local, cmd2_local]

with Popen(['echo', password], stdout=PIPE) as auth:
    for cmd in commands:
        cmd = cmd.split()
        with Popen(['sudo','-S'] + cmd, stdin=auth.stdout, stdout=PIPE, bufsize=1, universal_newlines=True) as p:
            for line in p.stdout:
                print(line, end='')

- Stan S.


- Rômulo Ceccon · Accepted Answer

I think the problem is with the statement for line in proc.stdout, which reads the entire input before iterating over it. The solution is to use readline() instead:

#filters output
import subprocess
proc = subprocess.Popen(['python','fake_utility.py'],stdout=subprocess.PIPE)
while True:
  line = proc.stdout.readline()
  if not line:
    break
  #the real code does filtering here
  print "test:", line.rstrip()

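Of course you still have to deal with the subprocess's own buffering.

Note: according to the documentation, the solution with an iterator should be equivalent to using readline(), except for the read-ahead buffer, but (or exactly because of this) the proposed change did produce different results for me (Python 2.5 on Windows XP).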