Logging threads when using ThreadPoolExecutor


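I am using ThreadPoolExecutor from Python's concurrent.futures to parallelize scraping and writing the results to a database. While doing so, I realized that I get no information at all if one of the threads fails. How can I properly find out which threads failed and why (that is, with a 'normal' traceback)? Below is a minimal working example.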

import logging
logging.basicConfig(format='%(asctime)s  %(message)s', 
    datefmt='%y-%m-%d %H:%M:%S', level=logging.INFO)
from concurrent.futures import ThreadPoolExecutor

def worker_bee(seed):
    # sido is not defined intentionally to break the code
    result = seed + sido
    return result

# uncomment next line, and you will get the usual traceback
# worker_bee(1)

# ThreadPoolExecutor will not provide any traceback
logging.info('submitting all jobs to the queue')
with ThreadPoolExecutor(max_workers=4) as executor:
    for seed in range(0,10):
        executor.submit(worker_bee, seed)
    logging.info(f'submitted, waiting for threads to finish')

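If I import logging inside worker_bee() and direct messages to the root logger, I can see those messages in the final log. However, I only see the log messages that I define myself, not the traceback showing where the code actually failed.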

1 Answer

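You can get a "normal traceback" by retrieving the result of each Future returned by executor.submit(). Calling result() blocks until that job has finished and re-raises any exception the worker raised, so the failure surfaces in the main thread.
Here is what I mean: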
from concurrent.futures import ThreadPoolExecutor
import logging

logging.basicConfig(format='%(asctime)s  %(message)s',
                    datefmt='%y-%m-%d %H:%M:%S', level=logging.INFO)

def worker_bee(seed):
    # sido is not defined intentionally to break the code
    result = seed + sido
    return result

logging.info('submitting all jobs to the queue')
with ThreadPoolExecutor(max_workers=4) as executor:
    results = []
    for seed in range(10):
        result = executor.submit(worker_bee, seed)
        results.append(result)
    logging.info(f'submitted, waiting for threads to finish')

for result in results:
    print(result.result())

Output:

20-03-21 16:21:24  submitting all jobs to the queue
20-03-21 16:21:24  submitted, waiting for threads to finish
Traceback (most recent call last):
  File "logging-threads-when-using-threadpoolexecutor.py", line 24, in <module>
    print(result.result())
  File "C:\Python3\lib\concurrent\futures\_base.py", line 432, in result
    return self.__get_result()
  File "C:\Python3\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
  File "C:\Python3\lib\concurrent\futures\thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "logging-threads-when-using-threadpoolexecutor.py", line 12, in worker_bee
    result = seed + sido
NameError: name 'sido' is not defined
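
Note that the loop above stops at the first result() call that raises, so only one failing job is reported. To record every failure with its traceback, one option is to catch the exception per future and pass it to logging.exception(). The following is a minimal sketch, not part of the original answer: the helper name run_all and the failed list are hypothetical, introduced only for illustration.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logging.basicConfig(format='%(asctime)s  %(message)s',
                    datefmt='%y-%m-%d %H:%M:%S', level=logging.INFO)

def worker_bee(seed):
    # sido is not defined intentionally to break the code
    result = seed + sido
    return result

def run_all(seeds):
    """Hypothetical helper: run worker_bee for each seed and log every failure."""
    failed = []
    with ThreadPoolExecutor(max_workers=4) as executor:
        # Map each future back to its seed so the log says which job failed.
        futures = {executor.submit(worker_bee, seed): seed for seed in seeds}
        for future in as_completed(futures):
            seed = futures[future]
            try:
                future.result()  # re-raises the worker's exception, if any
            except Exception:
                # logging.exception() appends the full traceback to the message.
                logging.exception('worker_bee(%s) failed', seed)
                failed.append(seed)
    return sorted(failed)

failed_seeds = run_all(range(10))
```

Because each result() call sits in its own try block, one failing job does not hide the others, and the logged traceback still includes the frame inside worker_bee.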


I think print(result.result()) needs to be indented under the with context, since it is only defined there. - kareem_emad
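@Kareem: No, it doesn't, because a with statement does not introduce a new scope. But your comment made me notice something else... - martineau
The print() call is only there to invoke the result() method; otherwise it is not important or meaningful. Sorry, I don't know of a way to automatically send all exceptions to the log file, but I'm no logging expert. - martineau
See the question How do I log a Python error with debug information?, which describes the only way I know of to log exceptions. It requires adding an exception handler and calling logging.exception(), so it is hardly "automatic". - martineau
This answer to a related question looks like it can log unhandled exceptions automatically. - martineau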
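
The comments above point to logging.exception() with an explicit handler. A way to get closer to "automatic" logging, sketched here as an assumption rather than as the approach from the linked answers, is to attach a done-callback to each future. The names log_failure and failures are hypothetical and exist only for this illustration.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

logging.basicConfig(format='%(asctime)s  %(message)s',
                    datefmt='%y-%m-%d %H:%M:%S', level=logging.INFO)

failures = []  # hypothetical: exception type names, kept only for inspection

def worker_bee(seed):
    # sido is not defined intentionally to break the code
    result = seed + sido
    return result

def log_failure(future):
    """Hypothetical callback: log the traceback of any failed job."""
    if future.cancelled():
        return  # exception() would raise CancelledError on a cancelled future
    exc = future.exception()
    if exc is not None:
        failures.append(type(exc).__name__)
        # Passing the exception instance as exc_info logs its traceback.
        logging.error('job failed', exc_info=exc)

with ThreadPoolExecutor(max_workers=4) as executor:
    for seed in range(10):
        executor.submit(worker_bee, seed).add_done_callback(log_failure)
```

The callback runs as soon as each job finishes, so the main thread never has to call result() just to find out about errors.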