大量项目的Dictionary.Add运行时间过长

Question

大量项目的Dictionary.Add运行时间过长

5

我有一个C#应用程序，可以将文本文件中的数据存储到字典对象中。要存储的数据量可能相当大，所以插入条目需要很长时间。如果字典中有许多项，则由于用于存储字典数据的内部数组的调整大小，情况会变得更糟。因此，我使用将添加的项数量初始化字典，但这对速度没有影响。

下面是我的函数：

private Dictionary<IdPair, Edge> AddEdgesToExistingNodes(HashSet<NodeConnection> connections)
{
  Dictionary<IdPair, Edge> resultSet = new Dictionary<IdPair, Edge>(connections.Count);

  foreach (NodeConnection con in connections)
  {
    ...
    resultSet.Add(nodeIdPair, newEdge);
  }

  return resultSet;
}

在我的测试中，我插入了约300k个项目。我使用ANTS Performance Profiler检查运行时间，并发现当我用所需大小初始化字典时，resultSet.Add(...)的平均时间并没有改变。与我使用new Dictionary()初始化字典时相同（每个Add平均约为0.256毫秒）。这绝对是由于字典中的数据量引起的（尽管我已使用所需的大小初始化它）。对于前20k个项目，每个项目的添加平均时间为0.03毫秒。

有任何想法如何使添加操作更快吗？

提前感谢， Frank

这是我的IdPair-Struct：

public struct IdPair
{
  public int id1;
  public int id2;

  public IdPair(int oneId, int anotherId)
  {
    if (oneId > anotherId)
    {
      id1 = anotherId;
      id2 = oneId;
    }
    else if (anotherId > oneId)
    {
      id1 = oneId;
      id2 = anotherId;
    }
    else
      throw new ArgumentException("The two Ids of the IdPair can't have the same value.");
  }
}

- Aaginor

6

你是否在IdPair类中重写了 Equals 和 GetHashCode 方法？如果是，你的 GetHashCode 算法是否能生成良好分布的哈希值？ - LukeH

IdPair只是一个带有构造函数的结构体。我已经将它添加到我的问题中。 - Aaginor

2个回答

7

IdPair 是一个结构体，你没有覆盖 Equals 或 GetHashCode 方法。这意味着默认实现会被使用。

对于值类型，Equals 和 GetHashCode 的默认实现使用反射，这可能导致性能较差。尝试提供自己的方法实现并查看是否有所帮助。

我的建议是提供以下实现，可能不完全符合你的需求：

public struct IdPair : IEquatable<IdPair>
{
    // ...

    public override bool Equals(object obj)
    {
        if (obj is IdPair)
            return Equals((IdPair)obj);

        return false;
    }

    public bool Equals(IdPair other)
    {
        return id1.Equals(other.id1)
            && id2.Equals(other.id2);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 269;
            hash = (hash * 19) + id1.GetHashCode();
            hash = (hash * 19) + id2.GetHashCode();
            return hash;
        }
    }
}

- LukeH

非常感谢，Luke。问题出在（标准）哈希函数上。有了你的解决方案，我平均每次添加操作的运行时间缩短到了约0.03毫秒。虽然比erikkallens的解决方案稍微慢一点，但比以前好多了。值得注意的是，预先设置字典的大小似乎根本没有任何（时间）影响。 - Aaginor

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- erikkallen · Accepted Answer

由于你有一个结构体，因此你可以获得Equals()和GetHashCode()的默认实现。正如其他人指出的那样，这并不是非常高效，因为它使用了反射，但我认为反射不是问题所在。

我猜测你的哈希码通过默认的GetHashCode()分布不均匀，这可能会发生，例如，如果默认实现返回所有成员的简单异或（在这种情况下，hash(a, b) == hash(b, a)）。我找不到任何关于ValueType.GetHashCode()如何实现的文档，但是尝试添加

public override int GetHashCode() {
    return oneId << 16 | (anotherId & 0xffff);
}

可能更好的选择。