"Not implemented" exception when using pywin32 to control Adobe Acrobat

12

I wrote a Python script with pywin32 that saves PDF files as text. It worked fine until recently. I use a similar approach with Excel. Here is the code:

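import os
import traceback
import win32com.client

def __pdf2Txt(self, pdf, fileformat="com.adobe.acrobat.accesstext"):
    # Write the text file next to the source PDF, with a .txt extension.
    outputLoc = os.path.dirname(pdf)
    outputLoc = os.path.join(outputLoc, os.path.splitext(os.path.basename(pdf))[0] + '.txt')

    # Pre-bind the names so the finally block is safe if an early call fails.
    adobe = pdDoc = jObject = None
    try:
        win32com.client.gencache.EnsureModule('{E64169B3-3592-47d2-816E-602C5C13F328}', 0, 1, 1)
        adobe = win32com.client.DispatchEx('AcroExch.App')
        pdDoc = win32com.client.DispatchEx('AcroExch.PDDoc')
        pdDoc.Open(pdf)
        jObject = pdDoc.GetJSObject()
        jObject.SaveAs(outputLoc, "com.adobe.acrobat.accesstext")
        return True
    except Exception:
        traceback.print_exc()
        return False
    finally:
        # Release the COM objects in reverse order of creation.
        jObject = None
        if pdDoc is not None:
            pdDoc.Close()
        if adobe is not None:
            adobe.Exit()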

However, this code suddenly stopped working and now prints the following:

Traceback (most recent call last):
  File "C:\Documents and Settings\ablishen\workspace\HooverKeyCreator\src\HooverKeyCreator.py", line 38, in __pdf2Txt
    jObject.SaveAs(outputLoc, "com.adobe.acrobat.accesstext")
  File "C:\Python27\lib\site-packages\win32com\client\dynamic.py", line 505, in __getattr__
    ret = self._oleobj_.Invoke(retEntry.dispid,0,invoke_type,1)
com_error: (-2147467263, 'Not implemented', None, None)
False

I wrote similar code in VB and it works correctly, so my guess is that the COM interface is not binding to the right functions? (My COM knowledge is patchy.)
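As a side note on reading the traceback: the number in the com_error tuple is a signed 32-bit HRESULT. Here is a minimal sketch that converts it to the usual hexadecimal form; the helper name hresult_to_hex is made up for illustration and is not part of pywin32.

```python
def hresult_to_hex(code):
    """Return the unsigned 32-bit hex form of a signed HRESULT value."""
    # Masking with 0xFFFFFFFF reinterprets the negative number as unsigned.
    return "0x{:08X}".format(code & 0xFFFFFFFF)


# Value copied from the com_error line in the traceback above.
print(hresult_to_hex(-2147467263))  # prints 0x80004001
```

0x80004001 is E_NOTIMPL, the standard COM "Not implemented" code, which matches the message in the traceback.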


2
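Does this PDF file have Save usage rights? (I am guessing from the documentation, which says this method is available in Adobe Reader for documents that have Save usage rights.) - Steven Rumbalski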
1
That did not seem to help: I enabled them and still get the same error. Also, I am running the code against Adobe Acrobat. - Blish
2 Answers

8
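Blish, this thread holds the key to the solution you are looking for: https://mail.python.org/pipermail/python-win32/2002-March/000260.html I admit the post above is not easy to find (perhaps Google ranks it low because of its age?).
Specifically, applying this piece of advice will get things running for you: https://mail.python.org/pipermail/python-win32/2002-March/000265.html For reference, here is the full code, which does not require patching dynamic.py by hand (the snippet should run almost as-is):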
# gets all files under ROOT_INPUT_PATH with FILE_EXTENSION and tries to extract text from them into ROOT_OUTPUT_PATH with same filename as the input file but with INPUT_FILE_EXTENSION replaced by OUTPUT_FILE_EXTENSION
from win32com.client import Dispatch
from win32com.client.dynamic import ERRORS_BAD_CONTEXT

import winerror

# try importing scandir and, if found, use it, as it can be much faster than the stock os.walk
try:
    from scandir import walk
except ImportError:
    from os import walk

import fnmatch

import sys
import os

ROOT_INPUT_PATH = None
ROOT_OUTPUT_PATH = None
INPUT_FILE_EXTENSION = "*.pdf"
OUTPUT_FILE_EXTENSION = ".txt"

def acrobat_extract_text(f_path, f_path_out, f_basename, f_ext):
    avDoc = Dispatch("AcroExch.AVDoc") # Connect to Adobe Acrobat

    # Open the input file (as a pdf)
    ret = avDoc.Open(f_path, f_path)
    assert(ret) # FIXME: Documentation says "-1 if the file was opened successfully, 0 otherwise", but this is a bool in practice?

    pdDoc = avDoc.GetPDDoc()

    dst = os.path.join(f_path_out, ''.join((f_basename, f_ext)))

    # Adobe documentation says "For that reason, you must rely on the documentation to know what functionality is available through the JSObject interface. For details, see the JavaScript for Acrobat API Reference"
    jsObject = pdDoc.GetJSObject()

    # Here you can save as many other types by using, for instance: "com.adobe.acrobat.xml"
    jsObject.SaveAs(dst, "com.adobe.acrobat.accesstext")

    pdDoc.Close()
    avDoc.Close(True) # We want this to close Acrobat, as otherwise Acrobat is going to refuse processing any further files after a certain threshold of open files are reached (for example 50 PDFs)
    del pdDoc

if __name__ == "__main__":
    assert(5 == len(sys.argv)), sys.argv # <script name>, <script_file_input_path>, <script_file_input_extension>, <script_file_output_path>, <script_file_output_extension>

    #$ python get.txt.from.multiple.pdf.py 'C:\input' '*.pdf' 'C:\output' '.txt'

    ROOT_INPUT_PATH = sys.argv[1]
    INPUT_FILE_EXTENSION = sys.argv[2]
    ROOT_OUTPUT_PATH = sys.argv[3]
    OUTPUT_FILE_EXTENSION = sys.argv[4]

    # tuples are of schema (path_to_file, filename)
    matching_files = ((os.path.join(_root, filename), os.path.splitext(filename)[0]) for _root, _dirs, _files in walk(ROOT_INPUT_PATH) for filename in fnmatch.filter(_files, INPUT_FILE_EXTENSION))

    # Patch ERRORS_BAD_CONTEXT in place, as per https://mail.python.org/pipermail/python-win32/2002-March/000265.html
    # (no "global" statement is needed: append() mutates the same list object that win32com.client.dynamic uses)
    ERRORS_BAD_CONTEXT.append(winerror.E_NOTIMPL)

    for filename_with_path, filename_without_extension in matching_files:
        print("Processing '{}'".format(filename_without_extension))
        acrobat_extract_text(filename_with_path, ROOT_OUTPUT_PATH, filename_without_extension, OUTPUT_FILE_EXTENSION)

I have tested this on WinPython x64 2.7.6.3 and Acrobat X Pro.
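If you only want the fix itself, it boils down to a single list append. Below is a minimal sketch of applying it once and only once, so that calling it repeatedly does not pile up duplicate entries. The helper names allow_not_implemented and patch_pywin32 are made up for this example. The constant is the value from the traceback in the question, which is the same number as winerror.E_NOTIMPL. As far as I understand dynamic.py, an error listed in ERRORS_BAD_CONTEXT makes the dynamic dispatch wrapper retry the attribute as a method instead of raising, but treat that explanation as my reading of the code rather than documented behaviour.

```python
# Same value as winerror.E_NOTIMPL (0x80004001 as a signed 32-bit integer).
E_NOTIMPL = -2147467263


def allow_not_implemented(bad_context_errors, code=E_NOTIMPL):
    """Append code to the given list unless it is already there.

    Returns True if the list was changed, False otherwise.
    """
    if code in bad_context_errors:
        return False
    bad_context_errors.append(code)
    return True


def patch_pywin32():
    """Apply the fix to pywin32's own list (needs pywin32, Windows only)."""
    # Imported lazily so the helper above can be tried without pywin32.
    from win32com.client import dynamic
    return allow_not_implemented(dynamic.ERRORS_BAD_CONTEXT)
```

Call patch_pywin32() once at startup, before the first GetJSObject() call. The list is mutated in place, so every dynamic dispatch object created afterwards sees the change.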


2
Adding winerror.E_NOTIMPL to the ERRORS_BAD_CONTEXT list in dynamic.py worked. Thanks a lot! - Blish
1
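Hi, I am doing the same thing with Python and Acrobat Reader Pro. Even after doing what the previous commenter did, this code currently gives the following error: "NotAllowedError: Security settings prevent access to this property or method". Do you know what causes it? Thanks. - dasen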
2
I cannot upvote you enough for the line ERRORS_BAD_CONTEXT.append(winerror.E_NOTIMPL). - Fenikso
This is amazing. It also explains why jsobject works in VBA but not in PowerShell... - Yiping

2

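makepy.py is a script that ships with the win32com Python package.

Running it "hooks" Python up to COM/OLE objects in Windows. Below are excerpts of some code I use to talk to Excel and do things in it. This example gets the name of sheet 1 in the current workbook. If an exception is raised, it runs makepy automatically: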

import os
import re
import sys

import win32com.client
from win32com.client import selecttlb

def attachExcelCOM():
    # Run makepy for every registered type library whose description mentions Excel.
    makepyExe = r'python C:\Python25\Lib\site-packages\win32com\client\makepy.py'
    for tl in selecttlb.EnumTlbs():
        if re.match('^Microsoft.*Excel.*', tl.desc, re.IGNORECASE):
            os.system('%s -d "%s"' % (makepyExe, tl.desc))

def getSheetName(sheetNum):
    try:
        xl = win32com.client.Dispatch("Excel.Application")
        wb = xl.Workbooks.Item(sheetNum)
    except Exception as detail:
        print('There was a problem attaching to Excel, refreshing connect config...')
        print('%s: %s' % (type(detail).__name__, detail))
        attachExcelCOM()
        try:
            xl = win32com.client.Dispatch("Excel.Application")
            wb = xl.Workbooks.Item(sheetNum)
        except Exception:
            print('Could not attach to Excel...')
            sys.exit(-1)

    # Note: Workbooks.Item returns a workbook, so this is the workbook's name.
    wsName = wb.Name
    if wsName == 'PERSONAL.XLS':
        return None
    print('The target worksheet is:')
    print('       ' + wsName)
    sys.stdout.write('Is this correct? [Y/N] ')
    sys.stdout.flush()
    answer = sys.stdin.readline().strip().upper()
    if answer != 'Y':
        print('Sheet not identified correctly.')
        return None
    return (wb, wsName)

# -- Main --
sheetNum = 1  # assumption: the original excerpt never defines sheetNum
sheetInfo = getSheetName(sheetNum)
if sheetInfo is None:
    print('Sheet not found')
    sys.exit(-1)
else:
    (wb, wsName) = sheetInfo
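The excerpt above hard-codes the path to makepy.py. As a rough sketch of an alternative, the same command can be built from the running interpreter and makepy's module name, so it does not depend on where Python is installed. The function names and the sample library descriptions are made up for illustration; the -d flag is the one already used in the excerpt.

```python
import re
import sys


def excel_typelibs(descriptions):
    """Keep only the type library descriptions that look like Excel's."""
    pattern = re.compile(r"^Microsoft.*Excel.*", re.IGNORECASE)
    return [desc for desc in descriptions if pattern.match(desc)]


def makepy_command(typelib_desc, python_exe=sys.executable):
    """Build the argument list for running makepy on one type library."""
    # Running makepy as a module avoids spelling out the site-packages path.
    return [python_exe, "-m", "win32com.client.makepy", "-d", typelib_desc]


# Illustrative descriptions only; real ones come from selecttlb.EnumTlbs().
sample = ["Microsoft Excel 12.0 Object Library", "Some Other Type Library"]
for desc in excel_typelibs(sample):
    print(makepy_command(desc, python_exe="python"))
```

The resulting list can be handed to subprocess.call() in place of the os.system() string, which also sidesteps the quoting of descriptions that contain spaces.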
