Unable to install textract on Windows

8
I have tried many approaches, but installing the textract package with pip on Windows still fails.
I get the following error message:

error

I don't know what to do, so I would really appreciate any advice. Thanks.


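It looks like the file is saved in an encoding that cannot be read. What happens when you try to find the README and open it in an editor? - James Parsons
@James_Parsons I can't even find that file. - Sebastian Wdowiarz
The location of the Python file that tries to read it is in the stack trace. You can inspect that file for context that may lead you to where the README is. - James Parsons
Hey @SebastianWdowiarz, did you get it working? If so, please accept my answer, or if you found another way, please post a new answer. - Marcus Mann
4 Answers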

12

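Stolen from here:

You first need to install swig from conda (miniconda).

conda install swig

Then download the EbookLib 0.15 zip file from the releases page.

https://github.com/aerkalov/ebooklib/releases
After unzipping, I manually removed the Unicode character from the README.md file using Notepad++ (the character is on line 44), then installed the module with pip.
cd to_unzipped_folder_path_here
pip install .

Finally: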

pip install textract
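
For what it's worth, the manual Notepad++ edit could probably be scripted. This is only a rough sketch and not part of the original answer: `strip_non_ascii` is a made-up helper name, the `ebooklib-0.15` folder is a guess at where the zip was extracted, and it assumes README.md is UTF-8. It also strips every non-ASCII character in the file rather than just the one on line 44.

```python
from pathlib import Path


def strip_non_ascii(readme_path):
    """Rewrite the file with every non-ASCII character removed.

    Returns the number of characters that were dropped.
    (Hypothetical helper, not part of EbookLib or textract.)
    """
    path = Path(readme_path)
    # Read as UTF-8 explicitly so the Windows default code page is not used.
    text = path.read_text(encoding="utf-8")
    cleaned = "".join(ch for ch in text if ord(ch) < 128)
    path.write_text(cleaned, encoding="utf-8")
    return len(text) - len(cleaned)


if __name__ == "__main__":
    # Assumed location of the unzipped EbookLib 0.15 sources; adjust as needed.
    readme = Path("ebooklib-0.15") / "README.md"
    if readme.exists():
        print("removed", strip_non_ascii(readme), "non-ASCII characters")
    else:
        print("README.md not found at", readme)
```

After running it, `pip install .` from the unzipped folder would proceed as described above.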

1

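The project now seems to have been taken over by someone else (who started updating it 3 months before I wrote this answer), so the solution is much simpler.

You can now go to https://github.com/deanmalmgren/textract/releases and download v1.6.2, which differs from v1.6.1 only in its updated requirements (this fixes the Unicode error), or v1.6.3, which is the latest version at the time of writing.

After downloading, unzip it, cd [extracted folder], then pip install .

Keep in mind that with updated requirements, malicious code could be slipped into the dependencies, so update at your own risk.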


1

(Windows 10, Python 3.7) I ran into more problems than the others did, but this builds on the previous answers:

  1. Make sure Microsoft Visual Studio C++ Compiler for Python is installed

  2. python -m pip install --upgrade pip setuptools wheel

  3. pip install six --upgrade

  4. Download EbookLib version 0.15:

    • Unzip the .zip file
    • To avoid the encoding error, edit the "long_description" variable assignment to "long_description = open('README.md',encoding="utf-8").read(),"
  5. Download Swig:

    • http://www.swig.org/download.html
    • Unzip the .zip file
    • Copy the swig.exe file into the Python path: e.g. "C:\Users\username\AppData\Local\Programs\Python\Python37"
    • Copy the "typemaps" folder into Python's "Lib" folder: e.g. "C:\Program Files\swigwin-4.0.0\Lib\typemaps" --> "C:\Users\username\AppData\Local\Programs\Python\Python37\Lib\"
    • Copy the "*.swg" files into Python's "Lib" folder: e.g. "C:\Program Files\swigwin-4.0.0\Lib\*.swg" --> "C:\Users\username\AppData\Local\Programs\Python\Python37\Lib\"
    • Copy all the swig python files into Python's "Lib" folder: e.g. "C:\Program Files\swigwin-4.0.0\Lib\python\*" --> "C:\Users\username\AppData\Local\Programs\Python\Python37\Lib\"
  6. From the command prompt, go to the unzipped EbookLib folder: e.g. C:\> cd "C:\Users\username\Desktop\ebooklib-0.15"

  7. Run the EbookLib install: pip install .

  8. Run the textract install: pip install textract

The output should be:

C:\Users\username\Desktop\ebooklib-0.15>pip install textract
Collecting textract
Requirement already satisfied: docx2txt==0.6 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from textract) (0.6)
Requirement already satisfied: beautifulsoup4==4.5.3 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from textract) (4.5.3)
Requirement already satisfied: EbookLib==0.15 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from textract) (0.15)
Requirement already satisfied: xlrd==1.0.0 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from textract) (1.0.0)
Requirement already satisfied: SpeechRecognition==3.6.3 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from textract) (3.6.3)
Requirement already satisfied: six==1.10.0 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from textract) (1.10.0)
Collecting pocketsphinx==0.1.3 (from textract)
  Using cached https://files.pythonhosted.org/packages/93/5f/a968e5d53d25e32deb78c3e169fd8612ecf53cc76e32cb40e19be35696af/pocketsphinx-0.1.3.tar.bz2
Requirement already satisfied: chardet==2.3.0 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from textract) (2.3.0)
Requirement already satisfied: argcomplete==1.8.2 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from textract) (1.8.2)
Requirement already satisfied: python-pptx==0.6.5 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from textract) (0.6.5)
Requirement already satisfied: lxml in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from EbookLib==0.15->textract) (4.3.3)
Requirement already satisfied: XlsxWriter>=0.5.7 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from python-pptx==0.6.5->textract) (1.1.8)
Requirement already satisfied: Pillow>=2.6.1 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from python-pptx==0.6.5->textract) (6.0.0)
Building wheels for collected packages: pocketsphinx
  Building wheel for pocketsphinx (setup.py) ... done
  Stored in directory: C:\Users\username\AppData\Local\pip\Cache\wheels\38\80\4f\ddc3e8c2b788f2c7f1d625ae870f6bafd3038ff04a3445a2f8
Successfully built pocketsphinx
Installing collected packages: pocketsphinx, textract
Successfully installed pocketsphinx-0.1.3 textract-1.6.1

C:\Users\username\Desktop\ebooklib-0.15>
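
If the pocketsphinx wheel does not build like this, it may be worth double-checking the copy steps from step 5 first. The following is just a sketch under my own assumptions, not something from the answer: `missing_swig_files` is a hypothetical helper, and it assumes the Python install directory is the folder containing python.exe (which is not true inside a virtualenv). The file names it looks for (`swig.swg`, `python.swg`) are the ones named in the SWIG error messages.

```python
import shutil
import sys
from pathlib import Path


def missing_swig_files(python_dir):
    """Return the SWIG files from step 5 that are absent under python_dir.

    (Hypothetical helper; the list only covers a few representative files.)
    """
    root = Path(python_dir)
    expected = [
        root / "swig.exe",            # copied next to python.exe
        root / "Lib" / "swig.swg",    # one of the *.swg files
        root / "Lib" / "python.swg",  # from swigwin's Lib\python folder
        root / "Lib" / "typemaps",    # the typemaps folder
    ]
    return [str(p) for p in expected if not p.exists()]


if __name__ == "__main__":
    # Assumption: python.exe sits directly in the install directory.
    python_dir = Path(sys.executable).parent
    print("swig on PATH:", shutil.which("swig"))
    for path in missing_swig_files(python_dir):
        print("missing:", path)
```

An empty "missing" list only means those few files are in place; it does not guarantee that the build will succeed.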

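At the time of writing, jsonschema has dependencies that conflict with textract. These are also the errors I ran into while working out the correct way to install it: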

ERROR: requests 2.22.0 has requirement chardet<3.1.0,>=3.0.2, but you'll have chardet 2.3.0 which is incompatible.
ERROR: camelot-py 0.7.2 has requirement chardet>=3.0.4, but you'll have chardet 2.3.0 which is incompatible.

ERROR: Command "python setup.py egg_info" failed with error code 1 in C:\Users\username\AppData\Local\Temp\pip-install-msmb9od3\EbookLib\
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 1671: character maps to <undefined>
error: command 'C:\\Users\\username\\AppData\\Local\\Programs\\Python\\Python37\\swig.exe' failed with exit status 1

ERROR: Failed building wheel for pocketsphinx
error: command 'swig.exe' failed: No such file or directory
  (1) : Error: Unable to find 'swig.swg'
  (3) : Error: Unable to find 'python.swg'
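
To show why the `encoding="utf-8"` change in step 4 gets rid of the 'charmap' UnicodeDecodeError, here is a small reproduction. It is only an illustration under my own assumptions: the sample text is made up (it is not the actual EbookLib README), and cp1252 is passed explicitly to stand in for the default encoding that `open()` typically picks up on a Western-locale Windows machine.

```python
import os
import tempfile

# "\u010d" is only a stand-in character: its UTF-8 encoding is the bytes
# C4 8D, and byte 0x8D has no mapping in Python's cp1252 codec.
sample = "Maintainer: \u010d\n"

with tempfile.TemporaryDirectory() as tmp:
    readme = os.path.join(tmp, "README.md")
    with open(readme, "w", encoding="utf-8") as fh:
        fh.write(sample)

    # Reading the UTF-8 file as cp1252 fails on the unmapped byte.
    try:
        with open(readme, encoding="cp1252") as fh:
            fh.read()
        cp1252_error = None
    except UnicodeDecodeError as exc:
        cp1252_error = exc

    # Naming the real encoding, as in the step 4 edit, reads it back fine.
    with open(readme, encoding="utf-8") as fh:
        utf8_text = fh.read()

print("cp1252:", cp1252_error)
print("utf-8 :", repr(utf8_text))
```

The first read fails the same way the install does, while the second succeeds, which is why passing the encoding explicitly to `open()` is enough.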

1
Not the most elegant solution, but it works!
pip install git+https://github.com/jpweytjens/textract

Thanks to jpweytjens.
