How do you use a multiprocessing queue in Python?

156

I'm having a lot of trouble trying to understand the multiprocessing queue in Python and how to implement it. Let's say I have two Python modules that access data from a shared file; call these two modules a writer and a reader. My plan is to have both the reader and the writer put requests into two separate multiprocessing queues, and then have a third process pop these requests in a loop and execute them accordingly.

My main problem is that I really don't know how to implement multiprocessing.Queue correctly: you can't really instantiate the object for each process, since then they would be separate queues. How do you make sure that all processes relate to a shared queue (or, in this case, queues)?
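For illustration only, here is a minimal sketch of the layout described above (the names writer_module, reader_module and central_worker are placeholders, not taken from the question): both queues are created once in the parent process and handed to every child as a constructor argument, which is what keeps them shared.

import queue
from multiprocessing import Process, Queue


def writer_module(write_requests):
    # Stands in for the module that issues write requests
    write_requests.put(("write", "some data"))


def reader_module(read_requests):
    # Stands in for the module that issues read requests
    read_requests.put(("read", None))


def central_worker(read_requests, write_requests):
    # Third process: pop requests from both queues and act on them
    handled = 0
    while handled < 2:  # exactly two requests are sent in this toy example
        for q in (read_requests, write_requests):
            try:
                request = q.get(timeout=0.1)
            except queue.Empty:
                continue
            print("handling", request)
            handled += 1


if __name__ == "__main__":
    read_q, write_q = Queue(), Queue()  # created once, in the parent
    procs = [
        Process(target=writer_module, args=(write_q,)),
        Process(target=reader_module, args=(read_q,)),
        Process(target=central_worker, args=(read_q, write_q)),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()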


7
Pass the queue as an argument to each process class when you instantiate them in the parent process. - Joel Cornett
7 Answers

185

Short summary

As of 2023, the technique described in this answer is quite dated. These days, you should use concurrent.futures.ProcessPoolExecutor() instead of the multiprocessing below...
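For reference, a minimal sketch of the ProcessPoolExecutor() approach mentioned above (my addition, not part of the original answer); the executor hides all of the queue handling:

import concurrent.futures


def square(n):
    return n * n


if __name__ == "__main__":
    # The executor manages the worker processes and the underlying queues for you
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as pool:
        for result in pool.map(square, range(10)):
            print(result)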

Original answer

"My main problem is that I really don't know how to implement multiprocessing.queue correctly, you cannot really instantiate the object for each process since they will be separate queues, how do you make sure that all processes relate to a shared queue (or in this case, queues)"

Here is a simple example of a reader and a writer sharing a single queue... The writer sends a bunch of integers to the reader; when the writer runs out of numbers, it sends 'DONE', which lets the reader know to break out of the read loop.

You can spawn as many reader processes as you like...

from multiprocessing import Process, Queue
import time
import sys


def reader_proc(queue):
    """Read from the queue; this spawns as a separate Process"""
    while True:
        msg = queue.get()  # Read from the queue and do nothing
        if msg == "DONE":
            break


def writer(count, num_of_reader_procs, queue):
    """Write integers into the queue.  A reader_proc() will read them from the queue"""
    for ii in range(0, count):
        queue.put(ii)  # Put 'count' numbers into queue

    ### Tell all readers to stop...
    for ii in range(0, num_of_reader_procs):
        queue.put("DONE")


def start_reader_procs(qq, num_of_reader_procs):
    """Start the reader processes and return all in a list to the caller"""
    all_reader_procs = list()
    for ii in range(0, num_of_reader_procs):
        ### reader_p() reads from qq as a separate process...
        ###    you can spawn as many reader_p() as you like
        ###    however, there is usually a point of diminishing returns
        reader_p = Process(target=reader_proc, args=((qq),))
        reader_p.daemon = True
        reader_p.start()  # Launch reader_p() as another proc

        all_reader_procs.append(reader_p)

    return all_reader_procs


if __name__ == "__main__":
    num_of_reader_procs = 2
    qq = Queue()  # writer() writes to qq from _this_ process
    for count in [10**4, 10**5, 10**6]:
        assert 0 < num_of_reader_procs < 4
        all_reader_procs = start_reader_procs(qq, num_of_reader_procs)

        writer(count, len(all_reader_procs), qq)  # Queue stuff to all reader_p()
        print("All reader processes are pulling numbers from the queue...")

        _start = time.time()
        for idx, a_reader_proc in enumerate(all_reader_procs):
            print("    Waiting for reader_p.join() index %s" % idx)
            a_reader_proc.join()  # Wait for a_reader_proc() to finish

            print("        reader_p() idx:%s is done" % idx)

        print(
            "Sending {0} integers through Queue() took {1} seconds".format(
                count, (time.time() - _start)
            )
        )
        print("")

25
Great example. Just to address the OP's confusion with a bit of extra info... this example shows that a shared queue needs to originate from the master process, which is then passed to all of its subprocesses. For two completely unrelated processes to share data, they must communicate over some central or associated network device (sockets, for example). Something has to coordinate the information. - jdi
6
Nice example. I'm new to this topic as well. If I have multiple processes running the same target function (with different arguments), how do I make sure they don't clash when putting data into the queue? Is a lock necessary? - WYSIWYG
4
According to the multiprocessing module documentation, Queue is implemented with a few locks/semaphores. So when you use the get() and put(object) queue methods, the queue will block if another process/thread is trying to get from or put onto the queue. So you don't have to worry about locking it manually. - almel
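A tiny sketch (my addition, not from the comment) of that point: several processes can put() into the same Queue concurrently without any explicit Lock, because the Queue serializes access internally.

from multiprocessing import Process, Queue


def worker(q, worker_id):
    # All workers write to the same queue; no manual locking needed
    for i in range(3):
        q.put((worker_id, i))


if __name__ == "__main__":
    q = Queue()
    procs = [Process(target=worker, args=(q, w)) for w in range(4)]
    for p in procs:
        p.start()
    for _ in range(4 * 3):  # drain the known number of items
        print(q.get())
    for p in procs:
        p.join()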
3
An explicit stop condition is better than an implicit one. - Mike Pennington
7
Qsize can drop to zero if the queue readers outpace the queue writers. Note that Qsize here refers to the size of the queue. - Mike Pennington

32
Here is a dead simple usage of multiprocessing.Queue and multiprocessing.Process that allows callers to send an "event" plus arguments to a separate process, which dispatches the event to a "do_" method on the process. (Python 3.4+)
import multiprocessing as mp
import collections

Msg = collections.namedtuple('Msg', ['event', 'args'])

class BaseProcess(mp.Process):
    """A process backed by an internal queue for simple one-way message passing.
    """
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.queue = mp.Queue()

    def send(self, event, *args):
        """Puts the event and args as a `Msg` on the queue
        """
        msg = Msg(event, args)
        self.queue.put(msg)

    def dispatch(self, msg):
        event, args = msg

        handler = getattr(self, "do_%s" % event, None)
        if not handler:
            raise NotImplementedError("Process has no handler for [%s]" % event)

        handler(*args)

    def run(self):
        while True:
            msg = self.queue.get()
            self.dispatch(msg)

Usage:

class MyProcess(BaseProcess):
    def do_helloworld(self, arg1, arg2):
        print(arg1, arg2)

if __name__ == "__main__":
    process = MyProcess()
    process.start()
    process.send('helloworld', 'hello', 'world')
send happens in the parent process, do_* happens in the child process.
I left out any exception handling that would obviously interrupt the run loop and exit the child process. You can also customize it by overriding run to control blocking or whatever else.
This is really only useful in situations where you have a single worker process, but I think it's a relevant answer to this question to demonstrate a common scenario with a bit more object-orientation.
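One (hypothetical) way to fill in the shutdown and error handling that was left out: treat a dedicated event name as a stop signal inside run. A sketch, building on the BaseProcess class above:

class StoppableProcess(BaseProcess):
    """A BaseProcess variant with a 'stop' event and basic error handling."""
    def run(self):
        while True:
            msg = self.queue.get()
            if msg.event == "stop":
                break  # leave the loop; the child process then exits
            try:
                self.dispatch(msg)
            except Exception as exc:
                # Report and keep serving instead of killing the worker
                print("error handling %r: %s" % (msg.event, exc))

The parent would then call process.send('stop') to shut the child down cleanly.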

3
Fantastic answer! Thank you. +50 :) - kmiklas

23

I had a look at multiple answers across Stack Overflow and the web while trying to set up a way of doing multiprocessing using queues for passing around large pandas dataframes. It seemed to me that every answer was re-iterating the same kind of solution without any consideration of the multitude of edge cases one will definitely come across when setting up calculations like these. The problem is that many things are at play at the same time: the number of tasks, the number of workers, the duration of each task, and possible exceptions during task execution. All of these make synchronization tricky, and most answers do not address how to go about it. So this is my take after fiddling around for a few hours; hopefully it will be general enough for most people to find it useful.

Some thoughts before giving any coding example. Since queue.empty() and queue.qsize(), or any other similar method, are unreliable for flow control, any code like the following

while True:
    try:
        task = pending_queue.get_nowait()
    except queue.Empty:
        break

is bogus. It will kill the worker even if, milliseconds later, another task turns up in the queue. The worker will not recover, and after a while ALL the workers will be gone as they each randomly find the queue momentarily empty. The end result is that the main multiprocessing function (the one with the join() on the processes) will return without all the tasks having completed. Nice. Good luck debugging that if you have thousands of tasks and a few are missing.
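For contrast, a minimal sketch (mine, not the answer author's) of the more defensive polling loop that the full do_work() below uses: an empty queue is treated as a transient state rather than a reason to exit; the explicit exit signal is the subject of the next point.

import queue
import time


def patient_worker(pending_queue):
    """Keep polling; a momentary queue.Empty is not treated as end-of-work."""
    while True:
        try:
            task = pending_queue.get_nowait()
        except queue.Empty:
            time.sleep(0.01)  # back off briefly and try again
            continue
        print('processing', task)  # placeholder for real work; an explicit exit check goes here (see the sentinel discussion below)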

The other thing is the use of sentinel values. Many people have suggested adding a sentinel value to the queue to flag the end of the queue. But to flag it to whom exactly? If there are N workers, assuming N is the number of cores available, then a single sentinel value will only flag the end of the queue to one worker. All the other workers will sit waiting for more work when there is none left. A typical example I've seen is:

while True:
    task = pending_queue.get()
    if task == SOME_SENTINEL_VALUE:
        break

One of the workers will get the sentinel value while the rest will wait indefinitely. No post I came across mentioned that you need to submit the sentinel value to the queue AT LEAST as many times as you have workers, so that ALL of them get it.

Another issue is the handling of exceptions during task execution. Again, these should be caught and handled. Moreover, if you have a completed_tasks queue, you should independently count, in a deterministic way, how many items are in the queue before you decide that the job is done. Once again, relying on queue sizes is bound to fail and return unexpected results.

In the example below, the par_proc() function receives a list of jobs including the functions with which the tasks should be executed, alongside any named arguments and values.

import multiprocessing as mp
import dill as pickle
import queue
import time
import psutil

SENTINEL = None


def do_work(tasks_pending, tasks_completed):
    # Get the current worker's name
    worker_name = mp.current_process().name

    while True:
        try:
            task = tasks_pending.get_nowait()
        except queue.Empty:
            print(worker_name + ' found an empty queue. Sleeping for a while before checking again...')
            time.sleep(0.01)
        else:
            try:
                if task == SENTINEL:
                    print(worker_name + ' no more work left to be done. Exiting...')
                    break

                print(worker_name + ' received some work... ')
                time_start = time.perf_counter()
                work_func = pickle.loads(task['func'])
                result = work_func(**task['task'])
                tasks_completed.put({work_func.__name__: result})
                time_end = time.perf_counter() - time_start
                print(worker_name + ' done in {} seconds'.format(round(time_end, 5)))
            except Exception as e:
                print(worker_name + ' task failed. ' + str(e))
                tasks_completed.put({work_func.__name__: None})


def par_proc(job_list, num_cpus=None):

    # Get the number of cores
    if not num_cpus:
        num_cpus = psutil.cpu_count(logical=False)

    print('* Parallel processing')
    print('* Running on {} cores'.format(num_cpus))

    # Set-up the queues for sending and receiving data to/from the workers
    tasks_pending = mp.Queue()
    tasks_completed = mp.Queue()

    # Gather processes and results here
    processes = []
    results = []

    # Count tasks
    num_tasks = 0

    # Add the tasks to the queue
    for job in job_list:
        for task in job['tasks']:
            expanded_job = {}
            num_tasks = num_tasks + 1
            expanded_job.update({'func': pickle.dumps(job['func'])})
            expanded_job.update({'task': task})
            tasks_pending.put(expanded_job)

    # Use as many workers as there are cores (usually chokes the system so better use less)
    num_workers = num_cpus

    # We need as many sentinels as there are worker processes so that ALL processes exit when there is no more
    # work left to be done.
    for c in range(num_workers):
        tasks_pending.put(SENTINEL)

    print('* Number of tasks: {}'.format(num_tasks))

    # Set-up and start the workers
    for c in range(num_workers):
        p = mp.Process(target=do_work, args=(tasks_pending, tasks_completed))
        p.name = 'worker' + str(c)
        processes.append(p)
        p.start()

    # Gather the results
    completed_tasks_counter = 0
    while completed_tasks_counter < num_tasks:
        results.append(tasks_completed.get())
        completed_tasks_counter = completed_tasks_counter + 1

    for p in processes:
        p.join()

    return results

And here is a test you can run against the code above

def test_parallel_processing():
    def heavy_duty1(arg1, arg2, arg3):
        return arg1 + arg2 + arg3

    def heavy_duty2(arg1, arg2, arg3):
        return arg1 * arg2 * arg3

    task_list = [
        {'func': heavy_duty1, 'tasks': [{'arg1': 1, 'arg2': 2, 'arg3': 3}, {'arg1': 1, 'arg2': 3, 'arg3': 5}]},
        {'func': heavy_duty2, 'tasks': [{'arg1': 1, 'arg2': 2, 'arg3': 3}, {'arg1': 1, 'arg2': 3, 'arg3': 5}]},
    ]

    results = par_proc(task_list)

    job1 = sum([y for x in results if 'heavy_duty1' in x.keys() for y in list(x.values())])
    job2 = sum([y for x in results if 'heavy_duty2' in x.keys() for y in list(x.values())])

    assert job1 == 15
    assert job2 == 21

plus another one with some exceptions

def test_parallel_processing_exceptions():
    def heavy_duty1_raises(arg1, arg2, arg3):
        raise ValueError('Exception raised')
        return arg1 + arg2 + arg3

    def heavy_duty2(arg1, arg2, arg3):
        return arg1 * arg2 * arg3

    task_list = [
        {'func': heavy_duty1_raises, 'tasks': [{'arg1': 1, 'arg2': 2, 'arg3': 3}, {'arg1': 1, 'arg2': 3, 'arg3': 5}]},
        {'func': heavy_duty2, 'tasks': [{'arg1': 1, 'arg2': 2, 'arg3': 3}, {'arg1': 1, 'arg2': 3, 'arg3': 5}]},
    ]

    results = par_proc(task_list)

    job1 = sum([y for x in results if 'heavy_duty1' in x.keys() for y in list(x.values())])
    job2 = sum([y for x in results if 'heavy_duty2' in x.keys() for y in list(x.values())])

    assert not job1
    assert job2 == 21

Hope that helps.


9
In "from queue import Queue" there is no module called queue; multiprocessing should be used instead. Therefore, it should look like "from multiprocessing import Queue".

19
Though years late, using multiprocessing.Queue is correct. The normal Queue.Queue is meant for Python threads. When you try to use Queue.Queue with multiprocessing, copies of the Queue object are created in each child process, and the child processes are never updated. Basically, Queue.Queue works through a globally shared object, while multiprocessing.Queue works through IPC (inter-process communication). See: https://dev59.com/NnNA5IYBdhLWcg3whuZ_ - Michael Guffre
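To make the distinction concrete, a small sketch (my addition): only multiprocessing.Queue actually moves items between processes.

from multiprocessing import Process, Queue  # NOT queue.Queue


def child(q):
    q.put("visible to the parent")  # travels over IPC rather than shared module state


if __name__ == "__main__":
    q = Queue()
    p = Process(target=child, args=(q,))
    p.start()
    print(q.get())  # prints the child's message
    p.join()
    # Swapping in queue.Queue here would not work: with the 'fork' start method the
    # child writes to its own copy, and with 'spawn' the object cannot even be
    # pickled across the process boundary.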

4

Here is a simple and general example showing how to pass a message over a queue between two standalone programs. It doesn't directly answer the question, but it should be clear enough to illustrate the concept.

Server:

multiprocessing-queue-manager-server.py

import asyncio
import concurrent.futures
import multiprocessing
import multiprocessing.managers
import queue
import sys
import threading
from typing import Any, AnyStr, Dict, Union


class QueueManager(multiprocessing.managers.BaseManager):

    def get_queue(self, ident: Union[AnyStr, int, type(None)] = None) -> multiprocessing.Queue:
        pass


def get_queue(ident: Union[AnyStr, int, type(None)] = None) -> multiprocessing.Queue:
    global q

    if not ident in q:
        q[ident] = multiprocessing.Queue()

    return q[ident]


q: Dict[Union[AnyStr, int, type(None)], multiprocessing.Queue] = dict()
delattr(QueueManager, 'get_queue')


def init_queue_manager_server():
    if not hasattr(QueueManager, 'get_queue'):
        QueueManager.register('get_queue', get_queue)


def serve(no: int, term_ev: threading.Event):
    manager: QueueManager
    with QueueManager(authkey=QueueManager.__name__.encode()) as manager:
        print(f"Server address {no}: {manager.address}")

        while not term_ev.is_set():
            try:
                item: Any = manager.get_queue().get(timeout=0.1)
                print(f"Client {no}: {item} from {manager.address}")
            except queue.Empty:
                continue


async def main(n: int):
    init_queue_manager_server()
    term_ev: threading.Event = threading.Event()
    executor: concurrent.futures.ThreadPoolExecutor = concurrent.futures.ThreadPoolExecutor()

    i: int
    for i in range(n):
        asyncio.ensure_future(asyncio.get_running_loop().run_in_executor(executor, serve, i, term_ev))

    # Gracefully shut down
    try:
        await asyncio.get_running_loop().create_future()
    except asyncio.CancelledError:
        term_ev.set()
        executor.shutdown()
        raise


if __name__ == '__main__':
    asyncio.run(main(int(sys.argv[1])))

Client:

multiprocessing-queue-manager-client.py

import multiprocessing
import multiprocessing.managers
import os
import sys
from typing import AnyStr, Union


class QueueManager(multiprocessing.managers.BaseManager):

    def get_queue(self, ident: Union[AnyStr, int, type(None)] = None) -> multiprocessing.Queue:
        pass


delattr(QueueManager, 'get_queue')


def init_queue_manager_client():
    if not hasattr(QueueManager, 'get_queue'):
        QueueManager.register('get_queue')


def main():
    init_queue_manager_client()

    manager: QueueManager = QueueManager(sys.argv[1], authkey=QueueManager.__name__.encode())
    manager.connect()

    message = f"A message from {os.getpid()}"
    print(f"Message to send: {message}")
    manager.get_queue().put(message)


if __name__ == '__main__':
    main()

Usage

Server:

$ python3 multiprocessing-queue-manager-server.py N

N is an integer indicating how many servers should be created. Copy one of the <server-address-N> values printed by the server and make it the first argument of each multiprocessing-queue-manager-client.py.

Client:

python3 multiprocessing-queue-manager-client.py <server-address-1>

Result

Server:

Client 1: <item> from <server-address-1>

Gist: https://gist.github.com/89062d639e40110c61c2f88018a8b0e5


Update: A package has been created here.

Server:

import ipcq


with ipcq.QueueManagerServer(address=ipcq.Address.AUTO, authkey=ipcq.AuthKey.AUTO) as server:
    server.get_queue().get()

Client:

import ipcq


client = ipcq.QueueManagerClient(address=ipcq.Address.AUTO, authkey=ipcq.AuthKey.AUTO)
client.get_queue().put('a message')



Getting this kind of error: type object 'Address' has no attribute 'DEFAULT'. - Akshay J
It has been renamed to 'AUTO'; I just updated the answer. Thanks. - Johann Chang

3

An example with multiple producers and multiple consumers, verified. It should be easy to modify to cover other cases, such as single/multiple producers and single/multiple consumers.

from multiprocessing import Process, JoinableQueue
import time
import os

q = JoinableQueue()

def producer():
    for item in range(30):
        time.sleep(2)
        q.put(item)
    pid = os.getpid()
    print(f'producer {pid} done')


def worker():
    while True:
        item = q.get()
        pid = os.getpid()
        print(f'pid {pid} Working on {item}')
        print(f'pid {pid} Finished {item}')
        q.task_done()

for i in range(5):
    p = Process(target=worker, daemon=True).start()

# send thirty task requests to the worker
producers = []
for i in range(2):
    p = Process(target=producer)
    producers.append(p)
    p.start()

# make sure producers done
for p in producers:
    p.join()

# block until all workers are done
q.join()
print('All work completed')

Explanation:

  1. This example has two producers and five consumers.
  2. JoinableQueue is used to make sure all elements stored in the queue will be processed. 'task_done' is for a worker to notify that an element is done. 'q.join()' waits for all elements that have been marked as done.
  3. Because of point 2, there is no need to wait for every worker to finish.
  4. But it is important to wait for every producer to finish storing its elements into the queue. Otherwise, the program exits immediately.

If I can afford at most 20 processes and I start with 15 producers and 5 workers, can the producers be switched over to workers once all the tasks have been put on the queue? I.e., in this case with 2 producers and 5 workers, once the producers have produced all the tasks and added them to the queue, do those 2 processes then act as workers, giving 7 workers in total? - Prish
If the consumers are too fast and empty the queue, they will all finish before the producers have enqueued all the tasks. That's why the accepted answer is better: it uses a sentinel. - Michael Currie
Also, you should call .start() on each process object in a separate loop after first creating them all; otherwise they will run sequentially. - Michael Currie

3
We implemented two versions of this: a simple multi-threaded pool that can execute many types of callables, making our lives much easier, and a second version that uses processes, which is less flexible in terms of callables and requires an extra call to dill.
Setting frozen_pool to true will freeze execution until finish_pool_queue is called in either class.
Thread version:
'''
Created on Nov 4, 2019

@author: Kevin
'''
from threading import Lock, Thread
from queue import Queue
import traceback
from helium.loaders.loader_retailers import print_info
from time import sleep
import signal
import os

class ThreadPool(object):
    def __init__(self, queue_threads, *args, **kwargs):
        self.frozen_pool = kwargs.get('frozen_pool', False)
        self.print_queue = kwargs.get('print_queue', True)
        self.pool_results = []
        self.lock = Lock()
        self.queue_threads = queue_threads
        self.queue = Queue()
        self.threads = []

        for i in range(self.queue_threads):
            t = Thread(target=self.make_pool_call)
            t.daemon = True
            t.start()
            self.threads.append(t)

    def make_pool_call(self):
        while True:
            if self.frozen_pool:
                #print '--> Queue is frozen'
                sleep(1)
                continue

            item = self.queue.get()
            if item is None:
                break

            call = item.get('call', None)
            args = item.get('args', [])
            kwargs = item.get('kwargs', {})
            keep_results = item.get('keep_results', False)

            try:
                result = call(*args, **kwargs)

                if keep_results:
                    self.lock.acquire()
                    self.pool_results.append((item, result))
                    self.lock.release()

            except Exception as e:
                self.lock.acquire()
print(e)
                traceback.print_exc()
                self.lock.release()
                os.kill(os.getpid(), signal.SIGUSR1)

            self.queue.task_done()

    def finish_pool_queue(self):
        self.frozen_pool = False

        while self.queue.unfinished_tasks > 0:
            if self.print_queue:
                print_info('--> Thread pool... %s' % self.queue.unfinished_tasks)
            sleep(5)

        self.queue.join()

        for i in range(self.queue_threads):
            self.queue.put(None)

        for t in self.threads:
            t.join()

        del self.threads[:]

    def get_pool_results(self):
        return self.pool_results

    def clear_pool_results(self):
        del self.pool_results[:]

Process version:

'''
Created on Nov 4, 2019

@author: Kevin
'''
import traceback
from helium.loaders.loader_retailers import print_info
from time import sleep
import signal
import os
from multiprocessing import Queue, Process, Value, Array, JoinableQueue, Lock,\
    RawArray, Manager
import dill
import ctypes
from helium.misc.utils import ignore_exception
from mem_top import mem_top
import gc

class ProcessPool(object):
    def __init__(self, queue_processes, *args, **kwargs):
        self.frozen_pool = Value(ctypes.c_bool, kwargs.get('frozen_pool', False))
        self.print_queue = kwargs.get('print_queue', True)
        self.manager = Manager()
        self.pool_results = self.manager.list()
        self.queue_processes = queue_processes
        self.queue = JoinableQueue()
        self.processes = []

        for i in range(self.queue_processes):
            p = Process(target=self.make_pool_call)
            p.start()
            self.processes.append(p)

        print('Processes', self.queue_processes)

    def make_pool_call(self):
        while True:
            if self.frozen_pool.value:
                sleep(1)
                continue

            item_pickled = self.queue.get()

            if item_pickled is None:
                #print '--> Ending'
                self.queue.task_done()
                break

            item = dill.loads(item_pickled)

            call = item.get('call', None)
            args = item.get('args', [])
            kwargs = item.get('kwargs', {})
            keep_results = item.get('keep_results', False)

            try:
                result = call(*args, **kwargs)

                if keep_results:
                    self.pool_results.append(dill.dumps((item, result)))
                else:
                    del call, args, kwargs, keep_results, item, result

            except Exception as e:
                print(e)
                traceback.print_exc()
                os.kill(os.getpid(), signal.SIGUSR1)

            self.queue.task_done()

    def finish_pool_queue(self, callable=None):
        self.frozen_pool.value = False

        while self.queue._unfinished_tasks.get_value() > 0:
            if self.print_queue:
                print_info('--> Process pool... %s' % (self.queue._unfinished_tasks.get_value()))

            if callable:
                callable()

            sleep(5)

        for i in range(self.queue_processes):
            self.queue.put(None)

        self.queue.join()
        self.queue.close()

        for p in self.processes:
            with ignore_exception: p.join(10)
            with ignore_exception: p.terminate()

        with ignore_exception: del self.processes[:]

    def get_pool_results(self):
        return self.pool_results

    def clear_pool_results(self):
        del self.pool_results[:]

def test(eg):
    print('EG', eg)

Call with either of the following:

tp = ThreadPool(queue_threads=2)
tp.queue.put({'call': test, 'args': [random.randint(0, 100)]})
tp.finish_pool_queue()

or

pp = ProcessPool(queue_processes=2)
pp.queue.put(dill.dumps({'call': test, 'args': [random.randint(0, 100)]}))
pp.queue.put(dill.dumps({'call': test, 'args': [random.randint(0, 100)]}))
pp.finish_pool_queue()
