OCR and character similarity


I am currently working on some kind of OCR (Optical Character Recognition) system. I have already written a script to extract each character from the text and clean up (most of) the irregularities. I also know the font. Examples of the images I now have:

M (http://i.imgur.com/oRfSOsJ.png (font) and http://i.imgur.com/UDEJZyV.png (scanned))

K (http://i.imgur.com/PluXtDz.png (font) and http://i.imgur.com/TRuDXSx.png (scanned))

C (http://i.imgur.com/wggsX6M.png (font) and http://i.imgur.com/GF9vClh.png (scanned))

For all of these images I already have a sort of binary matrix (1 for black, 0 for white). I was now wondering if there is some kind of mathematical, projection-like formula to see the similarity between these matrices. I do not want to rely on a library, because that is not what the task asks of me.

I know this question may be a bit vague and that there are similar questions, but I am looking for a method, not a package, and so far I have not found any comments regarding a method. The reason this question is vague is that I really have no clue where to start. What I want to do is actually described on Wikipedia:

Matrix matching involves comparing an image to a stored glyph on a pixel-by-pixel basis; it is also known as "pattern matching" or "pattern recognition"[9]. This relies on the input glyph being correctly isolated from the rest of the image, and on the stored glyph being in a similar font and at the same scale. This technique works best with typewritten text and does not work well when new fonts are encountered. This is the technique the early physical photocell-based OCR implemented, rather directly. http://en.wikipedia.org/wiki/Optical_character_recognition#Character_recognition
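A direct pixel-by-pixel comparison of two such binary matrices, as described in the quote above, can be sketched like this (a minimal illustration; the fixed glyph size `N` and the agreement score are assumptions, and both glyphs must already be normalized to the same scale):

```c
#include <stdio.h>

#define N 8  /* hypothetical normalized glyph size */

/* Fraction of pixels that agree between two binary glyphs (1 = black, 0 = white). */
double pixel_similarity(const int a[N][N], const int b[N][N])
{
    int match = 0;
    for (int y = 0; y < N; y++)
        for (int x = 0; x < N; x++)
            if (a[y][x] == b[y][x])
                match++;
    return (double)match / (N * N);
}
```

A score of 1.0 means the glyphs are identical; the Wikipedia caveat applies, since even a small shift or scale mismatch lowers the score sharply.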
1 Answer


For recognition or classification, most OCR systems use neural networks.

These must be set up properly for the task at hand: number of layers, internal interconnection architecture, and so on. Another problem with neural networks is that they must be trained properly, which is quite hard to do, because you need to know things such as the proper training dataset size (so it contains enough information without overtraining the network). If you do not have experience with neural networks, do not implement one yourself!

There are also other ways to compare patterns:

  1. vector approach

    • polygonize image (edges or border)
    • compare polygon similarity (surface area, perimeter, shape, ...)
  2. pixel approach

    You can compare images based on:

    • histogram
    • DFT/DCT spectral analysis
    • size
    • number of occupied pixels per each line
    • start position of the first occupied pixel in each line (from the left)
    • end position of the last occupied pixel in each line (from the right)
    • these 3 parameters can also be computed per row
    • points-of-interest list (points where there is some change, like an intensity bump, edge, ...)

    You create a feature list for each tested character and compare it to your font; the closest match is then your character. These feature lists can also be scaled to some fixed size (like 64x64) so the recognition becomes invariant to scaling.

    Here is a sample of the features I use for OCR:

    (image: OCR character features)

    In this case (the feature size is scaled to fit in NxN), each character has 6 arrays of N numbers, like:

     int row_pixels[N]; // 1st image
     int lin_pixels[N]; // 2nd image
     int row_y0[N];     // 3rd image, green
     int row_y1[N];     // 3rd image, red
     int lin_x0[N];     // 4th image, green
     int lin_x1[N];     // 4th image, red
    

    Now pre-compute all the features for each character in your font and for each character read. Then find the closest match from the font:

    • min distance between all feature vectors/arrays
    • not exceeding some threshold difference

    This is partially invariant to rotation and skew, up to a point. I do OCR for filled characters, so for an outlined font it may need some tweaking.
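The feature extraction and nearest-match search described above can be sketched as follows, using only the per-row variants (the per-column arrays are built the same way). This is a minimal sketch under assumptions: the glyph is already normalized to `N`x`N`, and `Features`, `extract_features`, `feature_distance`, and `best_match` are illustrative names, not from the answer:

```c
#include <limits.h>
#include <stdlib.h>

#define N 8  /* hypothetical normalized glyph size */

/* Per-row features of a binary glyph: occupied-pixel count, plus first and
 * last occupied column (-1 for an empty row). */
typedef struct {
    int pixels[N]; /* occupied pixels in each row         */
    int x0[N];     /* first occupied column from the left */
    int x1[N];     /* last occupied column from the right */
} Features;

void extract_features(const int img[N][N], Features *f)
{
    for (int y = 0; y < N; y++) {
        f->pixels[y] = 0;
        f->x0[y] = f->x1[y] = -1;
        for (int x = 0; x < N; x++)
            if (img[y][x]) {
                f->pixels[y]++;
                if (f->x0[y] < 0) f->x0[y] = x;
                f->x1[y] = x;
            }
    }
}

/* Sum of absolute differences between two feature sets (smaller = closer). */
int feature_distance(const Features *a, const Features *b)
{
    int d = 0;
    for (int y = 0; y < N; y++) {
        d += abs(a->pixels[y] - b->pixels[y]);
        d += abs(a->x0[y] - b->x0[y]);
        d += abs(a->x1[y] - b->x1[y]);
    }
    return d;
}

/* Index of the closest font glyph; -1 if nothing is within the threshold. */
int best_match(const Features *scanned, const Features font[], int count, int threshold)
{
    int best = -1, best_d = INT_MAX;
    for (int i = 0; i < count; i++) {
        int d = feature_distance(scanned, &font[i]);
        if (d < best_d) { best_d = d; best = i; }
    }
    return (best_d <= threshold) ? best : -1;
}
```

The threshold test is what lets the recognizer reject characters that match nothing in the font instead of always returning some glyph.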

[Note]

For the comparison you can use distance or a correlation coefficient.

