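I'm looking for a way to hash the contents of large files (possibly more than 2 GB on a 32-bit OS). Is there a simple solution, or do I have to read the file in pieces into a buffer myself?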
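The simplest route is probably to hand a `Stream` to `HashAlgorithm.ComputeHash` instead of a byte array. The algorithm then pulls the data through its own internal buffer, so no single array ever has to hold the whole file and the 2 GB array limit never comes into play. This is a minimal sketch that assumes SHA-256 is acceptable (the question does not name an algorithm); `FileHasher` and `HashFile` are hypothetical names.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class FileHasher
{
    // Hashes a file of any size. ComputeHash(Stream) reads the stream in
    // chunks internally, so memory use does not grow with the file size.
    public static byte[] HashFile(string path)
    {
        using (FileStream stream = File.OpenRead(path))
        using (SHA256 sha = SHA256.Create())
        {
            return sha.ComputeHash(stream);
        }
    }

    public static void Main(string[] args)
    {
        // Hypothetical usage: pass the file path as the first argument.
        if (args.Length == 0) { Console.WriteLine("usage: FileHasher <path>"); return; }
        byte[] hash = HashFile(args[0]);
        Console.WriteLine(BitConverter.ToString(hash).Replace("-", ""));
    }
}
```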
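"outputBuffer: A copy of part of the input array used to compute the hash code." Which part? The example does use this parameter, but what it passes is not a copy, it is an alias of the input array. What does it actually do? (The unclear documentation isn't your fault; I bet `TransformBlock` does the job, I just want to understand its contract before using it.) - Eamon Nerbonne
`offset += sha.TransformBlock(input, offset, size, input, offset);` can safely be replaced with `offset += sha.TransformBlock(input.Skip(offset).Take(size).ToArray(), 0, size, null, -24124512);`, so the last two parameters are evidently ignored. - Eamon Nerbonne
If you choose to use `TransformBlock`, you can safely ignore the last parameter and set `outputBuffer` to `null`. `TransformBlock` copies from the input array to the output array, but why copy bits around for no reason?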
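For the "read it in segments" route from the question, the pattern is a loop of `TransformBlock` calls with `outputBuffer` set to `null`, followed by a single `TransformFinalBlock`. This sketch assumes SHA-256 and a 1 MiB buffer (both arbitrary choices); `ChunkedHasher` and `HashStream` are hypothetical names. It passes `read`, not `buffer.Length`, because `Stream.Read` may return fewer bytes than requested, and the last read usually does.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class ChunkedHasher
{
    // Feeds the stream to the hash bufferSize bytes at a time.
    // outputBuffer is null, so TransformBlock copies nothing.
    public static byte[] HashStream(Stream stream, int bufferSize = 1 << 20)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] buffer = new byte[bufferSize];
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Only the first 'read' bytes of the buffer are valid data.
                sha.TransformBlock(buffer, 0, read, null, 0);
            }
            // An empty final block finishes the computation and fills sha.Hash.
            sha.TransformFinalBlock(buffer, 0, 0);
            return sha.Hash;
        }
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0) { Console.WriteLine("usage: ChunkedHasher <path>"); return; }
        using (FileStream fs = File.OpenRead(args[0]))
            Console.WriteLine(BitConverter.ToString(HashStream(fs)).Replace("-", ""));
    }
}
```

Since each call only consumes the bytes actually read, the result should match `ComputeHash` over the whole input regardless of buffer size, which is the property the test program further down checks for each algorithm.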
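Also, all of the mscorlib `HashAlgorithm` implementations behave as you would expect: the block size does not appear to affect the hash output. It makes no difference whether you pass the data in blocks by varying `inputOffset` within one array, or pass it as smaller separate arrays. I verified this with the following code (it's a bit long, but it lets people check for themselves that the `HashAlgorithm` implementations are sane):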
```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Security.Cryptography;
using System.Text;

public static class Program {
    public static void Main() {
        RandomNumberGenerator rnd = RandomNumberGenerator.Create();
        byte[] input = new byte[20];
        rnd.GetBytes(input);
        Console.WriteLine("Input Data: " + BytesToStr(input));
        // Every concrete, publicly constructible HashAlgorithm in the assembly
        // that defines HashAlgorithm (mscorlib on .NET Framework).
        var hashAlgoTypes = Assembly.GetAssembly(typeof(HashAlgorithm)).GetTypes()
            .Where(t => typeof(HashAlgorithm).IsAssignableFrom(t) && !t.IsAbstract
                        && t.IsPublic && t.GetConstructor(Type.EmptyTypes) != null);
        foreach (var hashType in hashAlgoTypes)
            new AlgoTester(hashType).AssertOkFor(input.ToArray());
    }

    public static string BytesToStr(byte[] bytes) {
        StringBuilder str = new StringBuilder();
        for (int i = 0; i < bytes.Length; i++)
            str.AppendFormat("{0:X2}", bytes[i]);
        return str.ToString();
    }

    public class AlgoTester {
        readonly byte[] key;   // shared key so keyed algorithms (HMAC*) give comparable hashes
        readonly Type type;

        public AlgoTester(Type type) {
            this.type = type;
            if (typeof(KeyedHashAlgorithm).IsAssignableFrom(type))
                using (var algo = (KeyedHashAlgorithm)Activator.CreateInstance(type))
                    key = algo.Key.ToArray();
        }

        public HashAlgorithm MakeAlgo() {
            HashAlgorithm algo = (HashAlgorithm)Activator.CreateInstance(type);
            if (key != null)
                ((KeyedHashAlgorithm)algo).Key = key;
            return algo;
        }

        // Reference result: one ComputeHash call over the whole array.
        public byte[] GetHash(byte[] input) {
            using (HashAlgorithm sha = MakeAlgo())
                return sha.ComputeHash(input);
        }

        // All data in a single TransformFinalBlock call.
        public byte[] GetHashOneBlock(byte[] input) {
            using (HashAlgorithm sha = MakeAlgo()) {
                sha.TransformFinalBlock(input, 0, input.Length);
                return sha.Hash;
            }
        }

        // MSDN style: blocks selected via inputOffset, outputBuffer aliasing the input.
        public byte[] GetHashMultiBlock(byte[] input, int size) {
            using (HashAlgorithm sha = MakeAlgo()) {
                int offset = 0;
                while (input.Length - offset >= size)
                    offset += sha.TransformBlock(input, offset, size, input, offset);
                sha.TransformFinalBlock(input, offset, input.Length - offset);
                return sha.Hash;
            }
        }

        // Blocks passed as separate small arrays; outputBuffer is null and the
        // nonsense outputOffset demonstrates that it is ignored.
        public byte[] GetHashMultiBlockInChunks(byte[] input, int size) {
            using (HashAlgorithm sha = MakeAlgo()) {
                int offset = 0;
                while (input.Length - offset >= size)
                    offset += sha.TransformBlock(input.Skip(offset).Take(size).ToArray(),
                                                 0, size, null, -24124512);
                sha.TransformFinalBlock(input.Skip(offset).ToArray(), 0,
                                        input.Length - offset);
                return sha.Hash;
            }
        }

        public void AssertOkFor(byte[] data) {
            var direct = GetHash(data);
            var indirect = GetHashOneBlock(data);
            // Collect every variant whose hash differs from the direct one.
            var outcomes =
                new[] { 1, 2, 3, 5, 10, 11, 19, 20, 21 }.SelectMany(i =>
                    new[] {
                        new { Hash = GetHashMultiBlock(data, i), Name = "ByMSDN" + i },
                        new { Hash = GetHashMultiBlockInChunks(data, i), Name = "InChunks" + i }
                    }).Concat(new[] { new { Hash = indirect, Name = "OneBlock" } })
                .Where(result => !result.Hash.SequenceEqual(direct)).ToArray();
            Console.Write("Testing: " + type);
            if (outcomes.Any()) {
                Console.WriteLine(" not OK.");
                Console.WriteLine(type.Name + " direct was: " + BytesToStr(direct));
            } else Console.WriteLine(" OK.");
            foreach (var outcome in outcomes)
                Console.WriteLine(type.Name + " differs with: " + outcome.Name + " "
                    + BytesToStr(outcome.Hash));
        }
    }
}
```
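To make the `outputBuffer` question from the comments concrete: with a distinct array, `TransformBlock` copies exactly the `inputCount` bytes it hashed into `outputBuffer`, starting at `outputOffset`; with `null` nothing is copied; and in no case does the buffer influence the hash. A small sketch assuming SHA-256; `OutputBufferDemo` and `HashWithOutput` are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;

public static class OutputBufferDemo
{
    // Hashes all of input in one TransformBlock call, directing the "copy"
    // to outputBuffer (which may be null). Returns the finished hash.
    public static byte[] HashWithOutput(byte[] input, byte[] outputBuffer, int outputOffset)
    {
        using (SHA256 sha = SHA256.Create())
        {
            sha.TransformBlock(input, 0, input.Length, outputBuffer, outputOffset);
            sha.TransformFinalBlock(input, 0, 0);
            return sha.Hash;
        }
    }

    public static void Main()
    {
        byte[] input = { 1, 2, 3, 4, 5 };

        // 1. A separate array: the hashed bytes land at outputOffset onward.
        byte[] copy = new byte[8];
        byte[] h1 = HashWithOutput(input, copy, 3);
        Console.WriteLine(BitConverter.ToString(copy)); // 00-00-00-01-02-03-04-05

        // 2. null: nothing is copied anywhere.
        byte[] h2 = HashWithOutput(input, null, 0);

        // 3. The input array itself at the same offset (the MSDN example's
        //    alias): any copy would be onto itself, so the data is unchanged.
        byte[] h3 = HashWithOutput(input, input, 0);

        // The output buffer never changes the hash.
        Console.WriteLine(h1.SequenceEqual(h2) && h2.SequenceEqual(h3)); // True
    }
}
```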
This overload of `ComputeHash` is definitely the simplest to use. I'll leave my answer up for completeness. - driis