Logging threads when using ThreadPoolExecutor

Question

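I am using ThreadPoolExecutor from Python's concurrent.futures to parallelize scraping and write the results to a database. However, I found that if a thread fails, I get no information about it at all. How can I properly find out which threads failed and why (including the 'normal' traceback)? A minimal working example follows.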

import logging
logging.basicConfig(format='%(asctime)s  %(message)s', 
    datefmt='%y-%m-%d %H:%M:%S', level=logging.INFO)
from concurrent.futures import ThreadPoolExecutor

def worker_bee(seed):
    # sido is not defined intentionally to break the code
    result = seed + sido
    return result

# uncomment next line, and you will get the usual traceback
# worker_bee(1)

# ThreadPoolExecutor will not provide any traceback
logging.info('submitting all jobs to the queue')
with ThreadPoolExecutor(max_workers=4) as executor:
    for seed in range(0,10):
        executor.submit(worker_bee, seed)
    logging.info('submitted, waiting for threads to finish')

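If I import logging inside worker_bee() and direct messages to the root logger, I can see those messages in the final log. But I only see the log messages I defined myself, not the traceback showing where the code actually failed.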

- gochristoph

1 Answer

- martineau · Accepted Answer

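You can get the "normal traceback" by retrieving the result of each future returned by executor.submit(). Calling result() waits for the task to finish and, if the worker raised an exception, re-raises it in the calling thread.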

Here's what I mean:

from concurrent.futures import ThreadPoolExecutor
import logging

logging.basicConfig(format='%(asctime)s  %(message)s',
                    datefmt='%y-%m-%d %H:%M:%S', level=logging.INFO)

def worker_bee(seed):
    # sido is not defined intentionally to break the code
    result = seed + sido
    return result

logging.info('submitting all jobs to the queue')
with ThreadPoolExecutor(max_workers=4) as executor:
    results = []
    for seed in range(10):
        result = executor.submit(worker_bee, seed)
        results.append(result)
    logging.info('submitted, waiting for threads to finish')

for result in results:
    print(result.result())

Output:

20-03-21 16:21:24  submitting all jobs to the queue
20-03-21 16:21:24  submitted, waiting for threads to finish
Traceback (most recent call last):
  File "logging-threads-when-using-threadpoolexecutor.py", line 24, in <module>
    print(result.result())
  File "C:\Python3\lib\concurrent\futures\_base.py", line 432, in result
    return self.__get_result()
  File "C:\Python3\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
  File "C:\Python3\lib\concurrent\futures\thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "logging-threads-when-using-threadpoolexecutor.py", line 12, in worker_bee
    result = seed + sido
NameError: name 'sido' is not defined
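
Note that the loop above stops at the first future whose result() raises, so the remaining failures are never reported. If the goal is to know which jobs failed and why, one option is to catch the exception per future and log it. The following is a minimal sketch, not part of the original answer: the helper name run_all is hypothetical, and worker_bee has been changed here to fail only for odd seeds so that both outcomes show up. It uses as_completed() and logging.exception(), which records the message together with the full traceback of the exception being handled.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logging.basicConfig(format='%(asctime)s  %(message)s',
                    datefmt='%y-%m-%d %H:%M:%S', level=logging.INFO)

def worker_bee(seed):
    # Illustrative variant: fails for odd seeds so both outcomes are exercised.
    if seed % 2:
        raise ValueError(f'bad seed {seed}')
    return seed * 10

def run_all(seeds, max_workers=4):
    """Hypothetical helper: run worker_bee per seed, return (results, failures)."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its input so a failure can be attributed.
        future_to_seed = {executor.submit(worker_bee, seed): seed
                          for seed in seeds}
        for future in as_completed(future_to_seed):
            seed = future_to_seed[future]
            try:
                results[seed] = future.result()
            except Exception as exc:
                # Logs the message plus the traceback, including the
                # frame inside worker_bee where the error was raised.
                logging.exception('worker_bee(%s) failed', seed)
                failures[seed] = exc
    return results, failures

results, failures = run_all(range(10))
logging.info('%d succeeded, %d failed', len(results), len(failures))
```

Keeping the future-to-seed mapping is what lets each log entry name the input that failed; every job is still allowed to finish, and the failures dictionary can be inspected or retried afterwards.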