Python multithreading timer: setting a time limit for when the program times out

Question

python multithreading timer timeout

4

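I have a question about setting a maximum running time for a function in Python. Specifically, I want to use pdfminer to convert .pdf files to .txt files.

The problem is that quite often some files cannot be decoded and take a very long time. So I want to use threading.Timer() to limit the conversion of each file to 5 seconds. Also, I am running on Windows, so I cannot use the signal module.

I have successfully run the conversion code with pdfminer.convert_pdf_to_txt() ("c" in my code), but I am not sure whether threading.Timer() works in the code below. (I don't think it correctly limits the time of each conversion.)

In short, I want to:

1. Convert PDF to TXT.
2. Limit each conversion to 5 seconds; if the limit is exceeded, raise an exception and save an empty file.
3. Save all the txt files in the same folder.
4. If there is any exception/error, still save the file, but with empty content.

Here is the current code: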

import converter as c
import os
import timeit
import time
import threading
import thread

yourpath = 'D:/hh/'

def iftimesout():
    print("no")

    with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
        newfile.write("")


for root, dirs, files in os.walk(yourpath, topdown=False):
    for name in files:
        try:
            timer = threading.Timer(5.0,iftimesout)
            timer.start()
            t=os.path.split(os.path.dirname(os.path.join(root, name)))[1]
            a=str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])
            g=str(a.split("\\")[1])

            with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
                newfile.write(c.convert_pdf_to_txt(os.path.join(root, name)))
                print("yes")

            timer.cancel()

        except KeyboardInterrupt:
            raise

        except:
            for name in files:
                t=os.path.split(os.path.dirname(os.path.join(root, name)))[1]
                a=str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])

                g=str(a.split("\\")[1])
                with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
                    newfile.write("")

- SXC88

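Thinking it over again :) - linusg

@linusg That would be great! Thanks :)) - SXC88

Finally done, this should work :) - linusg

@SXC88, I haven't used pdfminer, but I checked it and found that it contains neither a convert_pdf_to_txt() method nor converter.convert_pdf_to_txt()... Did you mean pdfminer.PDFConverter? - Andersson

Hi, I just posted the converter.convert_pdf_to_txt() function below, if you want to take a look. Actually, I can convert all these files without any problem, but as soon as I try to add a time limit, the code stops working properly... @Andersson - SXC88

@SXC88 - I finally got it working. Take a look at my completely updated answer! - linusg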

2 Answers

0

Please check the following code and let me know if there are any problems. Also, do you still want to keep the forced-termination feature (KeyboardInterrupt)?

import os
import time
from multiprocessing import Process

# Assumption: convert_pdf_to_txt comes from the asker's own converter module
from converter import convert_pdf_to_txt

path_to_pdf = "C:\\Path\\To\\Main\\PDFs" # No "\\" at the end of path!
path_to_text = "C:\\Path\\To\\Save\\Text\\" # There is "\\" at the end of path!
TIMEOUT = 5  # seconds
TIME_TO_CHECK = 1  # seconds


# Save PDF content into text file or save empty file in case of conversion timeout
def convert(path_to, my_pdf):
    my_txt = text_file_name(my_pdf)
    with open(my_txt, "w") as my_text_file:
        try:
            my_text_file.write(convert_pdf_to_txt(os.path.join(path_to, my_pdf)))
        except Exception:
            print("Error. %s file wasn't converted" % my_pdf)


# Convert file_name.pdf from PDF folder to file_name.txt in Text folder
def text_file_name(pdf_file):
    # splitext keeps file names that contain extra dots intact
    return path_to_text + os.path.splitext(pdf_file)[0] + ".txt"


if __name__ == "__main__":
    # for each pdf file in PDF folder
    for root, dirs, files in os.walk(path_to_pdf, topdown=False):
        for my_file in files:
            count = 0
            p = Process(target=convert, args=(root, my_file,))
            p.start()
            # some delay to be sure that text file created
            while not os.path.isfile(text_file_name(my_file)):
                time.sleep(0.001)
            while True:
                # if not run out of $TIMEOUT and file still empty: wait for $TIME_TO_CHECK,
                # else: close file and start new iteration
                if count < TIMEOUT and os.stat(text_file_name(my_file)).st_size == 0:
                    count += TIME_TO_CHECK
                    time.sleep(TIME_TO_CHECK)
                else:
                    p.terminate()
                    break

- Andersson

Hi, I have a new post, if you want to take a look ;)) https://dev59.com/sVkR5IYBdhLWcg3wygHp @Andersson - SXC88

- linusg · Accepted Answer

I finally figured it out!

First, define a function that calls another function with a limited timeout:

import multiprocessing

def call_timeout(timeout, func, args=(), kwargs=None):
    # Validate the arguments before starting anything
    if type(timeout) not in [int, float] or timeout <= 0.0:
        print("Invalid timeout!")

    elif not callable(func):
        print("{} is not callable!".format(type(func)))

    else:
        p = multiprocessing.Process(target=func, args=args, kwargs=kwargs or {})
        p.start()
        p.join(timeout)  # wait at most `timeout` seconds

        if p.is_alive():
            # Still running after the timeout: stop the child process
            p.terminate()
            p.join()
            return False
        else:
            return True

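What does this function do?

- Checks that the timeout and the function are valid
- Starts the given function in a new process, which has some advantages over threads
- Blocks the program for x seconds (p.join()) and lets the function execute during that time
- After the timeout expires, checks whether the function is still running
  - Yes: terminate it and return False
  - No: good, no timeout! Return True

We can test it with time.sleep().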

import time

if __name__ == "__main__":  # needed on Windows, where child processes re-import this module
    finished = call_timeout(2, time.sleep, args=(1, ))
    if finished:
        print("No timeout")
    else:
        print("Timeout")

We run a function that takes one second to finish, with the timeout set to two seconds:

No timeout

If we run time.sleep(10) with the timeout set to two seconds:

finished = call_timeout(2, time.sleep, args=(10, ))

Result:

Timeout

Note that the program stops waiting after two seconds, without the called function having finished.
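For contrast with the threading.Timer attempt in the question, here is a minimal sketch of why a timer thread alone probably did not limit anything: the timer only runs its callback in a second thread, while the slow call keeps running in the main thread. The names slow_task, on_timeout and run_with_timer are hypothetical helpers for this demo, and time.sleep stands in for the PDF conversion.

```python
import threading
import time


def slow_task(duration, log):
    """Stand-in for a long conversion (hypothetical helper)."""
    time.sleep(duration)
    log.append("task finished")


def on_timeout(log):
    """Timer callback: it can only record the timeout, not stop slow_task."""
    log.append("timer fired")


def run_with_timer(limit, duration):
    """Run slow_task in the calling thread while a Timer counts down."""
    log = []
    timer = threading.Timer(limit, on_timeout, args=(log,))
    timer.start()
    slow_task(duration, log)  # the timer thread cannot interrupt this call
    timer.cancel()            # has no effect if the timer already fired
    timer.join()
    return log


if __name__ == "__main__":
    # The limit (0.1 s) is shorter than the task (0.4 s), yet the task still completes.
    print(run_with_timer(0.1, 0.4))
```

In this sketch the callback fires, but the task is never interrupted, which is why the answer moves the work into a separate process that can be terminated.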

Your final code will look like this:

import converter as c
import os
import multiprocessing

yourpath = 'D:/hh/'

def call_timeout(timeout, func, args=(), kwargs=None):
    # Validate the arguments before starting anything
    if type(timeout) not in [int, float] or timeout <= 0.0:
        print("Invalid timeout!")

    elif not callable(func):
        print("{} is not callable!".format(type(func)))

    else:
        p = multiprocessing.Process(target=func, args=args, kwargs=kwargs or {})
        p.start()
        p.join(timeout)  # wait at most `timeout` seconds

        if p.is_alive():
            # Still running after the timeout: stop the child process
            p.terminate()
            p.join()
            return False
        else:
            return True

def convert(root, name, g, t):
    with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
        newfile.write(c.convert_pdf_to_txt(os.path.join(root, name)))

if __name__ == "__main__":  # needed on Windows, where child processes re-import this module
    for root, dirs, files in os.walk(yourpath, topdown=False):
        for name in files:
            try:
                t=os.path.split(os.path.dirname(os.path.join(root, name)))[1]
                a=str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])
                g=str(a.split("\\")[1])
                finished = call_timeout(5, convert, args=(root, name, g, t))

                if finished:
                    print("yes")
                else:
                    print("no")

                    # Timed out: overwrite any partial output with an empty file
                    with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
                        newfile.write("")

            except KeyboardInterrupt:
                raise

            except Exception:
                # Any other error: still save an empty file, but only for the current PDF
                t=os.path.split(os.path.dirname(os.path.join(root, name)))[1]
                a=str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])
                g=str(a.split("\\")[1])
                with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
                    newfile.write("")

The code should be easy to understand; if not, feel free to ask.
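One thing call_timeout does not report is whether the child process crashed: an exception raised inside the child is not re-raised in the parent, so the function returns True as soon as the process is no longer alive. A minimal sketch, assuming you want to tell the three outcomes apart, could look at Process.exitcode. The names run_with_status and always_fails, and the optional ctx parameter, are hypothetical and not part of the code above.

```python
import multiprocessing
import time


def run_with_status(timeout, func, args=(), ctx=None):
    """Run func(*args) in a child process and report how it ended.

    Returns "finished", "failed" (non-zero exit code, e.g. an uncaught
    exception in the child) or "timeout".
    ctx is an optional multiprocessing context (hypothetical parameter);
    the multiprocessing module itself is used when it is omitted.
    """
    ctx = ctx or multiprocessing
    p = ctx.Process(target=func, args=args)
    p.start()
    p.join(timeout)  # wait at most `timeout` seconds

    if p.is_alive():
        p.terminate()
        p.join()  # wait for the terminated child to be cleaned up
        return "timeout"

    # exitcode is 0 only when the target returned normally
    return "finished" if p.exitcode == 0 else "failed"


def always_fails():
    """Stand-in for a conversion that raises (hypothetical helper)."""
    raise ValueError("simulated conversion error")


if __name__ == "__main__":  # needed on Windows
    print(run_with_status(2, time.sleep, args=(0.1,)))
    print(run_with_status(2, always_fails))
    print(run_with_status(0.5, time.sleep, args=(10,)))
```

With a status like this, the main loop could write the empty file for both "failed" and "timeout", which seems to match points 2 and 4 of the question.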

I really hope this helps (since it took us a while to get it right ;)!