Error when installing tesseract-ocr

4
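I want to use pytesseract for OCR, so I installed it. But before that, I need to install tesseract-ocr. I am on Windows 8.1. I opened a command prompt and ran pip install tesseract-ocr. The output of that command is below.
I can't work out what is happening here. Please help me understand it and get tesseract installed on my machine.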
C:\Users\HarshLaptop>pip install tesseract-ocr
Collecting tesseract-ocr
  Using cached https://files.pythonhosted.org/packages/e2/0d/dcee3dd0fc4c7bcd181
25a98f8ba6d9db7aecaa40770595203e312649587/tesseract-ocr-0.0.1.tar.gz
Requirement already satisfied: cython in c:\users\harshlaptop\anaconda3\lib\site
-packages (from tesseract-ocr) (0.25.2)
Building wheels for collected packages: tesseract-ocr
  Running setup.py bdist_wheel for tesseract-ocr ... error
  Complete output from command c:\users\harshlaptop\anaconda3\python.exe -u -c "
import setuptools, tokenize;__file__='C:\\Users\\HARSHL~1\\AppData\\Local\\Temp\
\pip-install-x8nz3uhm\\tesseract-ocr\\setup.py';f=getattr(tokenize, 'open', open
)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __f
ile__, 'exec'))" bdist_wheel -d C:\Users\HARSHL~1\AppData\Local\Temp\pip-wheel-s
j29zfyo --python-tag cp36:
  running bdist_wheel
  running build
  running build_py
  file tesseract_ocr.py (for module tesseract_ocr) not found
  file tesseract_ocr.py (for module tesseract_ocr) not found
  running build_ext
  building 'tesseract_ocr' extension
  creating build
  creating build\temp.win-amd64-3.6
  creating build\temp.win-amd64-3.6\Release
  C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe /c
 /nologo /Ox /W3 /GL /DNDEBUG /MD -Ic:\users\harshlaptop\anaconda3\include -Ic:\
users\harshlaptop\anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual S
tudio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10
240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\8.1\include\shared" "-IC:\Pro
gram Files (x86)\Windows Kits\8.1\include\um" "-IC:\Program Files (x86)\Windows
Kits\8.1\include\winrt" /EHsc /Tptesseract_ocr.cpp /Fobuild\temp.win-amd64-3.6\R
elease\tesseract_ocr.obj
  tesseract_ocr.cpp
  tesseract_ocr.cpp(463): fatal error C1083: Cannot open include file: 'leptonic
a/allheaders.h': No such file or directory
  error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN
\\x86_amd64\\cl.exe' failed with exit status 2

  ----------------------------------------
  Failed building wheel for tesseract-ocr
  Running setup.py clean for tesseract-ocr
Failed to build tesseract-ocr
Installing collected packages: tesseract-ocr
  Running setup.py install for tesseract-ocr ... error
    Complete output from command c:\users\harshlaptop\anaconda3\python.exe -u -c
 "import setuptools, tokenize;__file__='C:\\Users\\HARSHL~1\\AppData\\Local\\Tem
p\\pip-install-x8nz3uhm\\tesseract-ocr\\setup.py';f=getattr(tokenize, 'open', op
en)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, _
_file__, 'exec'))" install --record C:\Users\HARSHL~1\AppData\Local\Temp\pip-rec
ord-vnlr99lk\install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    file tesseract_ocr.py (for module tesseract_ocr) not found
    file tesseract_ocr.py (for module tesseract_ocr) not found
    running build_ext
    building 'tesseract_ocr' extension
    creating build
    creating build\temp.win-amd64-3.6
    creating build\temp.win-amd64-3.6\Release
    C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe
/c /nologo /Ox /W3 /GL /DNDEBUG /MD -Ic:\users\harshlaptop\anaconda3\include -Ic
:\users\harshlaptop\anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual
 Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.
10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\8.1\include\shared" "-IC:\P
rogram Files (x86)\Windows Kits\8.1\include\um" "-IC:\Program Files (x86)\Window
s Kits\8.1\include\winrt" /EHsc /Tptesseract_ocr.cpp /Fobuild\temp.win-amd64-3.6
\Release\tesseract_ocr.obj
    tesseract_ocr.cpp
    tesseract_ocr.cpp(463): fatal error C1083: Cannot open include file: 'lepton
ica/allheaders.h': No such file or directory
    error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\B
IN\\x86_amd64\\cl.exe' failed with exit status 2

    ----------------------------------------
Command "c:\users\harshlaptop\anaconda3\python.exe -u -c "import setuptools, tok
enize;__file__='C:\\Users\\HARSHL~1\\AppData\\Local\\Temp\\pip-install-x8nz3uhm\
\tesseract-ocr\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.rea
d().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" insta
ll --record C:\Users\HARSHL~1\AppData\Local\Temp\pip-record-vnlr99lk\install-rec
ord.txt --single-version-externally-managed --compile" failed with error code 1
in C:\Users\HARSHL~1\AppData\Local\Temp\pip-install-x8nz3uhm\tesseract-ocr\

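Your Python comes from the Anaconda distribution. In that case it is usually better to use conda rather than pip. Have you tried conda install tesseract? - Alex Yu
Please read "Under what circumstances may I add 'urgent' or other similar phrases to my question, in order to obtain faster answers?" - the summary is that this is not an ideal way to address volunteers, and it is likely to be counterproductive to getting answers. Please refrain from adding this to your questions. - halfer
@Ingaz Package not found error - Harsh Vardhan
Try using pytesseract - Mooncrater
4 Answers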

7

I ran into exactly the same problem, on a Windows 10 machine with Visual Studio 2017 and Python 3.6 installed. Here is how I worked around it:

  1. Download and install the tesseract-ocr executable from https://github.com/UB-Mannheim/tesseract/wiki. (The script below assumes a Windows system with Tesseract installed to the suggested default location, i.e. C:\Program Files (x86)\Tesseract-OCR.) See https://github.com/tesseract-ocr/tesseract/wiki for more information on installing the pre-built binary package on other operating systems, including Windows.
  2. Make sure the Python Imaging Library ('PIL') or the 'pillow' package is installed for opening images. (Installing PIL didn't work in my setup, but pillow did, i.e. pip install pillow.) You need it because pytesseract requires it. See https://pypi.org/project/pytesseract/0.2.5/ for more information.
  3. Then, to use it in your code, set the tesseract_cmd path as follows:

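    from PIL import Image
    import pytesseract

    # Point pytesseract at the installed executable (default install location assumed).
    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract'

    try:
        # Placeholder path; replace it with your own image file.
        img = Image.open('path/to/image.png')
        text = pytesseract.image_to_string(img)
        print(text)
    except FileNotFoundError as err:
        print('Image not found:', err)

Hope it helps.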
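
If the install location is uncertain, the path-setting step can be made a little more defensive. The following is only a sketch: find_tesseract and CANDIDATE_PATHS are hypothetical names (not part of pytesseract), and the listed folders are assumed default install locations that may not match your machine.

```python
import os
import shutil

# Assumed default install locations on Windows; adjust to your own setup.
CANDIDATE_PATHS = [
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
]


def find_tesseract(candidates=CANDIDATE_PATHS):
    """Return a path to the tesseract executable, or None if none is found."""
    # Prefer whatever is already reachable through the PATH variable.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to the first candidate that exists on disk.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

The returned value, when it is not None, can be assigned to pytesseract.pytesseract.tesseract_cmd in place of a hard-coded string.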


0

You can install tesseract with:

  • pip install tesseract

pip install tesseract-ocr did not seem to work. So I downloaded the .traineddata file for the language I needed from https://github.com/tesseract-ocr/tesseract and pasted it into the tessdata folder on my local machine.
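
To confirm that a copied language file ended up in the right place, one option is to list what the data folder contains. This is a minimal sketch; installed_languages is a hypothetical helper, and the folder you pass in depends on where Tesseract was installed.

```python
from pathlib import Path


def installed_languages(tessdata_dir):
    """List language codes that have a .traineddata file in tessdata_dir."""
    folder = Path(tessdata_dir)
    if not folder.is_dir():
        # Folder missing: nothing is installed there.
        return []
    # "eng.traineddata" -> "eng"
    return sorted(p.stem for p in folder.glob("*.traineddata"))
```

For example, installed_languages(r"C:\Program Files (x86)\Tesseract-OCR\tessdata") would list the languages in the default location mentioned in this thread, assuming that is where your copy lives.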


0

You need to install leptonica; Tesseract requires it.


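What is leptonica? - Harsh Vardhan
Leptonica is a library and a dependency of tesseract. https://github.com/DanBloomberg/leptonica - A.s.e
I already have Visual Studio installed. Do I still need to install Leptonica? - Harsh Vardhan
Yes, my friend. It is an image-processing library that tesseract uses and depends on. - A.s.e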
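
The build output in the question shows this dependency directly: the compiler stops with error C1083 because it cannot open leptonica/allheaders.h. As a rough sketch, the missing header names can be pulled out of such a log even when the console has hard-wrapped the lines. missing_headers is a hypothetical helper, and it assumes the log was wrapped at a fixed width with no indentation on continuation lines, as in the paste above.

```python
import re


def missing_headers(log_text):
    """Return the header files named in MSVC C1083 errors in a build log."""
    # Undo console hard-wrapping so names split across lines are rejoined.
    unwrapped = log_text.replace("\r", "").replace("\n", "")
    pattern = r"C1083: Cannot open include file: '([^']+)'"
    # The same error can appear more than once, so de-duplicate.
    return sorted(set(re.findall(pattern, unwrapped)))
```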

0

To install leptonica, follow the instructions at this link.

conda install -c conda-forge leptonica

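However, this is not a complete solution and does not remove the error when installing Tesseract-OCR.
You need to install Tesseract with the Windows installer that can be downloaded here. Then install the Python wrapper as follows: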
pip install pytesseract

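Last but not least, after importing the pytesseract library you should also set the tesseract path in your script, as follows (don't forget that the install path may be different in your case!):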
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
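
Before relying on that path, it can help to check that the executable actually runs. A minimal sketch, assuming the path is the one you chose during installation; tesseract_version is a hypothetical helper that calls the tesseract command line with its --version flag.

```python
import subprocess


def tesseract_version(exe_path):
    """Return the first line printed by `tesseract --version`, or None on failure."""
    try:
        result = subprocess.run(
            [exe_path, "--version"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Executable missing, not runnable, or it hung.
        return None
    # Some builds print the version to stderr rather than stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None
```

If tesseract_version(r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe') returns None, the path is probably wrong or the installer did not finish.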
