I am using dHash (http://www.hackerfactor.com/blog/index.php?url=archives/529-Kind-of-Like-That.html) on a very large collection of images. The default hash size is 8:
import numpy
from PIL import Image
from imagehash import ImageHash

def dhash(image, hash_size=8):
    """
    Difference Hash computation.
    following http://www.hackerfactor.com/blog/index.php?/archives/529-Kind-of-Like-That.html
    @image must be a PIL instance.
    """
    # PIL sizes are (width, height): hash_size + 1 columns, hash_size rows
    image = image.convert("L").resize((hash_size + 1, hash_size), Image.LANCZOS)
    pixels = numpy.asarray(image, dtype=float)  # shape: (hash_size, hash_size + 1)
    # compute differences between horizontally adjacent pixels
    diff = pixels[:, 1:] > pixels[:, :-1]
    return ImageHash(diff)
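If I apply this algorithm to a very large number of images, won't I run into collisions because the hash fingerprint is so short?
What is the best hash_size? Isn't a larger hash_size more accurate? Is there a particular reason 8 was chosen?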
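
To make the collision concern concrete, here is a rough sketch of the estimate I have in mind. It uses the standard birthday approximation and assumes every hash of hash_size * hash_size bits is equally likely, which is an idealization: perceptual hashes of real images are not uniformly distributed, so the real collision rate is probably higher. The function name birthday_collision_probability and the figure of 10 million images are only illustrative.

```python
import math

def birthday_collision_probability(n_images, hash_size=8):
    """Approximate P(at least one exact collision) among n_images hashes.

    Assumption: all 2**(hash_size * hash_size) hash values are equally likely.
    """
    bits = hash_size * hash_size
    # Birthday approximation: p ~= 1 - exp(-n * (n - 1) / (2 * 2**bits))
    exponent = -n_images * (n_images - 1) / (2.0 * 2.0 ** bits)
    # expm1 keeps precision when the probability is very small
    return -math.expm1(exponent)

# Compare a few hash sizes for a hypothetical collection of 10 million images
for size in (4, 8, 16):
    p = birthday_collision_probability(10_000_000, size)
    print(f"hash_size={size:2d} bits={size * size:3d} p_collision~{p:.3g}")
```

Under that uniformity assumption, a 64-bit hash (hash_size=8) looks comfortable against exact collisions for millions of images, so what I am really asking is how much real images deviate from it, and how hash_size affects near-duplicate matching.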