Using PDFMiner (Python) with online PDF files. Encode the URL?

Question

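I would like to use PDFMiner to extract the contents of PDF files that are available online.

My code is based on the example in the documentation, which extracts the contents of a PDF file stored on the hard drive.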

from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument

# Open a PDF file.
fp = open('mypdf.pdf', 'rb')
# Create a PDF parser object associated with the file object.
parser = PDFParser(fp)
# Create a PDF document object that stores the document structure.
document = PDFDocument(parser)

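That works well with a few small changes.

Now I am trying to open an online PDF file with urllib2.urlopen, but it does not work. I get the error message: coercing to Unicode: need string or buffer, instance found.

How can I get a string (or something else) from urllib2.urlopen that is equivalent to what the open function gets when I give it the name of a PDF file instead of a URL?

Please let me know if my question is unclear.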

- tagoma

1 Answer

- tagoma · Accepted Answer

OK, I finally found a solution.

I used Request and StringIO, and dropped the open('my_file', 'rb') call.

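# Python 2: urllib2 and StringIO do not exist in Python 3.
from urllib2 import Request, urlopen
from StringIO import StringIO

from pdfminer.pdfparser import PDFParser

url = 'my_url'

# Download the PDF and wrap its bytes in a seekable in-memory file object.
pdf_data = urlopen(Request(url)).read()
memoryFile = StringIO(pdf_data)

parser = PDFParser(memoryFile)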

That way, Python treats the content behind the URL as a file, so to speak.
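
For readers on Python 3, where urllib2 and StringIO are no longer available, the same idea can be sketched with urllib.request and io.BytesIO. This is a minimal sketch, not a definitive implementation: the helper names url_to_memory_file and parse_pdf_from_url are hypothetical, and the pdfminer imports assume the pdfminer.six package, which provides pdfminer.pdfparser and pdfminer.pdfdocument.

```python
# Python 3 sketch: urllib.request and io.BytesIO stand in for the
# Python 2 modules urllib2 and StringIO.
from io import BytesIO
from urllib.request import Request, urlopen


def url_to_memory_file(url):
    """Download the resource at `url` and return it as a seekable in-memory file."""
    with urlopen(Request(url)) as response:
        data = response.read()  # raw bytes of the PDF
    return BytesIO(data)


def parse_pdf_from_url(url):
    """Hypothetical helper: build a PDFDocument from a PDF fetched by URL."""
    # Imported lazily so the download helper works without pdfminer installed.
    # Assumption: pdfminer.six is the installed PDFMiner distribution.
    from pdfminer.pdfdocument import PDFDocument
    from pdfminer.pdfparser import PDFParser

    memory_file = url_to_memory_file(url)
    parser = PDFParser(memory_file)
    return PDFDocument(parser)
```

The download is read fully into memory because the parser needs to seek within the file, which a raw HTTP response does not support. For very large PDFs, writing the bytes to a temporary file first may be the better trade-off.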