Unable to install textract using !pip install textract


I have been trying to install textract with the command !pip install textract, but I get the following error:

Collecting textract
Requirement already satisfied: docx2txt==0.6 in /home/UGI/akedia/.conda/envs/at3deploy/lib/python3.6/site-packages (from textract) (0.6)
Requirement already satisfied: argcomplete==1.8.2 in /home/UGI/akedia/.conda/envs/at3deploy/lib/python3.6/site-packages (from textract) (1.8.2)
Collecting six==1.10.0 (from textract)
  Using cached https://files.pythonhosted.org/packages/c8/0a/b6723e1bc4c516cb687841499455a8505b44607ab535be01091c0f24f079/six-1.10.0-py2.py3-none-any.whl
Collecting EbookLib==0.15 (from textract)
Collecting pocketsphinx==0.1.3 (from textract)
  Using cached https://files.pythonhosted.org/packages/93/5f/a968e5d53d25e32deb78c3e169fd8612ecf53cc76e32cb40e19be35696af/pocketsphinx-0.1.3.tar.bz2
Requirement already satisfied: beautifulsoup4==4.5.3 in /home/UGI/akedia/.conda/envs/at3deploy/lib/python3.6/site-packages (from textract) (4.5.3)
Requirement already satisfied: SpeechRecognition==3.6.3 in /home/UGI/akedia/.conda/envs/at3deploy/lib/python3.6/site-packages (from textract) (3.6.3)
Requirement already satisfied: chardet==2.3.0 in /home/UGI/akedia/.conda/envs/at3deploy/lib/python3.6/site-packages (from textract) (2.3.0)
Requirement already satisfied: python-pptx==0.6.5 in /home/UGI/akedia/.conda/envs/at3deploy/lib/python3.6/site-packages (from textract) (0.6.5)
Requirement already satisfied: xlrd==1.0.0 in /home/UGI/akedia/.conda/envs/at3deploy/lib/python3.6/site-packages (from textract) (1.0.0)
Requirement already satisfied: lxml in /home/UGI/akedia/.conda/envs/at3deploy/lib/python3.6/site-packages (from EbookLib==0.15->textract) (4.3.2)
Requirement already satisfied: XlsxWriter>=0.5.7 in /home/UGI/akedia/.conda/envs/at3deploy/lib/python3.6/site-packages (from python-pptx==0.6.5->textract) (1.1.5)
Requirement already satisfied: Pillow>=2.6.1 in /home/UGI/akedia/.conda/envs/at3deploy/lib/python3.6/site-packages (from python-pptx==0.6.5->textract) (5.3.0)
Building wheels for collected packages: pocketsphinx
  Building wheel for pocketsphinx (setup.py) ... error
  Complete output from command /home/UGI/akedia/.conda/envs/at3deploy/bin/python -u -c "import setuptools, tokenize;__file__='/backup/mltmp/pip-install-k3hazve3/pocketsphinx/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /backup/mltmp/pip-wheel-7t5e7pu4 --python-tag cp36:
  running bdist_wheel
  running build_ext
  building 'sphinxbase._ad' extension
  swigging swig/sphinxbase/ad.i to swig/sphinxbase/ad_wrap.c
  swig -python -modern -Ideps/sphinxbase/include -Ideps/sphinxbase/include/sphinxbase -Ideps/sphinxbase/include/android -Ideps/sphinxbase/swig -outdir sphinxbase -o swig/sphinxbase/ad_wrap.c swig/sphinxbase/ad.i
  unable to execute 'swig': No such file or directory
  error: command 'swig' failed with exit status 1

  ----------------------------------------
  Failed building wheel for pocketsphinx
  Running setup.py clean for pocketsphinx
Failed to build pocketsphinx
spacy 2.0.12 has requirement regex==2017.4.5, but you'll have regex 2018.7.11 which is incompatible.
Installing collected packages: six, EbookLib, pocketsphinx, textract
  Running setup.py install for pocketsphinx ... error
    Complete output from command /home/UGI/akedia/.conda/envs/at3deploy/bin/python -u -c "import setuptools, tokenize;__file__='/backup/mltmp/pip-install-k3hazve3/pocketsphinx/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /backup/mltmp/pip-record-ws03agmf/install-record.txt --single-version-externally-managed --compile:
    running install
    running build_ext
    building 'sphinxbase._ad' extension
    swigging swig/sphinxbase/ad.i to swig/sphinxbase/ad_wrap.c
    swig -python -modern -Ideps/sphinxbase/include -Ideps/sphinxbase/include/sphinxbase -Ideps/sphinxbase/include/android -Ideps/sphinxbase/swig -outdir sphinxbase -o swig/sphinxbase/ad_wrap.c swig/sphinxbase/ad.i
    unable to execute 'swig': No such file or directory
    error: command 'swig' failed with exit status 1

    ----------------------------------------
Command "/home/UGI/akedia/.conda/envs/at3deploy/bin/python -u -c "import setuptools, tokenize;__file__='/backup/mltmp/pip-install-k3hazve3/pocketsphinx/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /backup/mltmp/pip-record-ws03agmf/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /backup/mltmp/pip-install-k3hazve3/pocketsphinx/

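Do you have swig installed? The log says: unable to execute 'swig': No such file or directory. - FlyingTeller
Please see the edited answer, and leave a comment if it does not work. - anand_v.singh
1 Answer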


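As stated here, you cannot install it directly with pip install textract; you first need to run some commands that depend on your operating system. Based on your directory structure, I assume this is a Linux system, and since you used !pip, I believe you are installing from an IPython notebook or a Jupyter shell. In that case you need to run it in two parts. First run: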

!apt-get install python-dev libxml2-dev libxslt1-dev antiword unrtf poppler-utils pstotext tesseract-ocr flac ffmpeg lame libmad0 libsox-fmt-mp3 sox libjpeg-dev swig

Then run:

!apt-get install libpulse-dev
!pip install textract

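It looks like you also need libpulse-dev, which is not mentioned in the official installation guide. With it installed, the output is: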

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  libpulse-mainloop-glib0
The following NEW packages will be installed:
  libpulse-dev libpulse-mainloop-glib0
0 upgraded, 2 newly installed, 0 to remove and 10 not upgraded.
Need to get 104 kB of archives.
After this operation, 714 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 libpulse-mainloop-glib0 amd64 1:11.1-1ubuntu7.2 [22.1 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 libpulse-dev amd64 1:11.1-1ubuntu7.2 [81.5 kB]
Fetched 104 kB in 1s (192 kB/s)
Selecting previously unselected package libpulse-mainloop-glib0:amd64.
(Reading database ... 119263 files and directories currently installed.)
Preparing to unpack .../libpulse-mainloop-glib0_1%3a11.1-1ubuntu7.2_amd64.deb ...
Unpacking libpulse-mainloop-glib0:amd64 (1:11.1-1ubuntu7.2) ...
Selecting previously unselected package libpulse-dev:amd64.
Preparing to unpack .../libpulse-dev_1%3a11.1-1ubuntu7.2_amd64.deb ...
Unpacking libpulse-dev:amd64 (1:11.1-1ubuntu7.2) ...
Setting up libpulse-mainloop-glib0:amd64 (1:11.1-1ubuntu7.2) ...
Setting up libpulse-dev:amd64 (1:11.1-1ubuntu7.2) ...
Processing triggers for libc-bin (2.27-3ubuntu1) ...
Collecting textract
Requirement already satisfied: docx2txt==0.6 in /usr/local/lib/python3.6/dist-packages (from textract) (0.6)
Requirement already satisfied: argcomplete==1.8.2 in /usr/local/lib/python3.6/dist-packages (from textract) (1.8.2)
Requirement already satisfied: EbookLib==0.15 in /usr/local/lib/python3.6/dist-packages (from textract) (0.15)
Requirement already satisfied: python-pptx==0.6.5 in /usr/local/lib/python3.6/dist-packages (from textract) (0.6.5)
Requirement already satisfied: six==1.10.0 in /usr/local/lib/python3.6/dist-packages (from textract) (1.10.0)
Requirement already satisfied: beautifulsoup4==4.5.3 in /usr/local/lib/python3.6/dist-packages (from textract) (4.5.3)
Requirement already satisfied: chardet==2.3.0 in /usr/local/lib/python3.6/dist-packages (from textract) (2.3.0)
Requirement already satisfied: xlrd==1.0.0 in /usr/local/lib/python3.6/dist-packages (from textract) (1.0.0)
Collecting pocketsphinx==0.1.3 (from textract)
  Using cached https://files.pythonhosted.org/packages/93/5f/a968e5d53d25e32deb78c3e169fd8612ecf53cc76e32cb40e19be35696af/pocketsphinx-0.1.3.tar.bz2
Requirement already satisfied: SpeechRecognition==3.6.3 in /usr/local/lib/python3.6/dist-packages (from textract) (3.6.3)
Requirement already satisfied: lxml in /usr/local/lib/python3.6/dist-packages (from EbookLib==0.15->textract) (4.2.6)
Requirement already satisfied: Pillow>=2.6.1 in /usr/local/lib/python3.6/dist-packages (from python-pptx==0.6.5->textract) (4.1.1)
Requirement already satisfied: XlsxWriter>=0.5.7 in /usr/local/lib/python3.6/dist-packages (from python-pptx==0.6.5->textract) (1.1.5)
Requirement already satisfied: olefile in /usr/local/lib/python3.6/dist-packages (from Pillow>=2.6.1->python-pptx==0.6.5->textract) (0.46)
Building wheels for collected packages: pocketsphinx
  Building wheel for pocketsphinx (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/38/80/4f/ddc3e8c2b788f2c7f1d625ae870f6bafd3038ff04a3445a2f8
Successfully built pocketsphinx
Installing collected packages: pocketsphinx, textract
Successfully installed pocketsphinx-0.1.3 textract-1.6.1
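
If the build still fails, it may help to confirm from the notebook that the required command-line tools are actually visible on PATH before running pip again. The following is only a minimal sketch: the helper name missing_tools and the particular checklist are my own assumptions for illustration (swig is the tool named in the error above; pdftotext and tesseract are the executables provided by the poppler-utils and tesseract-ocr packages), not something taken from the textract documentation.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


# Hypothetical checklist: swig is taken from the error message in the
# question; the other names are executables shipped by packages in the
# apt-get line above. Adjust the list to the file formats you need.
required = ["swig", "antiword", "pdftotext", "tesseract"]

missing = missing_tools(required)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All listed tools were found on PATH.")
```

If swig shows up as missing here, pip will hit the same "unable to execute 'swig'" error when it tries to build pocketsphinx.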

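Do you actually understand what you copied? If so, why did you split that apt-get command line like this? - Nils Werner
@NilsWerner, I looked at the installation guide and got confused because that part was on a second line and flac is itself a command-line tool. Thanks for pointing out my obvious mistake; I should not have made it. Also, the official guide leads to a failed installation because one dependency is missing, so I have added that as well. Thanks again for pointing it out. - anand_v.singh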
