unordered_map的哈希函数是确定性的吗？

Question

unordered_map的哈希函数是确定性的吗？

3

我正在尝试调试代码库中的缓存问题，我们使用 std::unordered_map 哈希一个结构体 custom_key，如下所定义，但我认为它应该只包含POD数据类型，因此我认为C++库实现的哈希函数将被使用。我想知道这个函数是否是确定性的？

struct custom_key {
  // bunch of uint8_t variables
  // some int8_t variables
  // some boolean variables
  another_struct params;
}

struct another_struct {
  // an enum variable
  // some int variables
  int dimensions[5];
}

我们的缓存算法是

// some_input_key is coming in from a stream
std::unordered_map<custom_key, some_output_type, custom_hasher<custom_key>, custom_equal<custom_key>> ht;
if (ht.find(some_input_key) != ht.end()) {
  // do something
} else {
  ht[some_input_key] = ....
}

在我们的输入流中，有一组“custom_keys”，其中可能有相同的值，但我们无法在哈希表中找到某些后续相同键。
例如，我们的流将是“key1，key2，key3，key4，...”，假设“key1”和“key4”是相同的，有时它将无法在哈希表中找到“key4”。我相当确定哈希函数是确定性的，所以我对问题的原因感到困惑。我已经打印了两个键的内容，它们看起来是相同的。
编辑：似乎我们确实有一个自定义哈希函数。

template <typename T>
// based on Fowler–Noll–Vo hash function
struct custom_hasher {
  static_assert(std::is_pod<T>::value, "T is not POD");

  size_t operator()(const T& input) const {
    auto ptr = reinterpret_cast<const uint8_t*>(&input);
    uint32_t value = 0x811C9DC5;
    // we have an implementation for irange that does something similar to the one in python
    for (const auto i : irange((int)sizeof(T))) {
      value ^= ptr[i];
      value *= 0x01000193;
    }
    return (size_t)value;
  }
};

template <typename T>
struct custom_equal {
  static_assert(std::is_pod<T>::value, "T is not POD");

  bool operator()(const T& lhs, const T& rhs) const {
    auto ptr1 = reinterpret_cast<const uint8_t*>(&lhs);
    auto ptr2 = reinterpret_cast<const uint8_t*>(&rhs);
    return memcmp(ptr1, ptr2, sizeof(T)) == 0;
  }
};

- 24n8

1

你尝试过打印两个键的哈希值并查看它们是否确实相同吗？默认情况下是std::hash<custom_key>，但如果您使用了未显示的自定义哈希函数，我们只能推测其影响。另外，您省略的//do something也可能没有产生您想要的效果。 - Nathan Pierson

1

键的==有没有实现？ - fabian

1

你的哈希函数看起来会受到custom_key数据字段之间填充的影响。同样，你的相等运算符也会受到影响。 - G.M.

2

“为什么结构体的sizeof不等于每个成员的sizeof之和？”这篇文章讨论了填充和对齐，并提供了一些示例。 - Retired Ninja

1

@24n8：class和struct之间唯一的区别在于它们默认的私有继承和成员，很少有情况需要使用其中一个关键字（例如，template <class T>是可以的，但template <struct T>不行），或者与您先前的声明保持一致。我在旧的SO答案中列出了所有的差异。无论如何，foo* p = new foo{}既是我的意思，也是我写的（在句子后面与foo* p = new foo;进行对比，后者是非零初始化版本）。谢谢！ - Tony Delroy

显示剩余19条评论

1个回答

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Captain Hatteras · Accepted Answer

cppreference.com:

对于两个相等的参数k1和k2，std::hash()(k1) == std::hash()(k2)。对于两个不相等的参数k1和k2，std::hash()(k1) == std::hash()(k2)的概率应该非常小，接近于1.0/std::numeric_limitsstd::size_t::max()。

std::unordered_map将在其键上使用哈希。具有相同哈希值的对象进入相同的桶中。但是，如果两个东西具有相同的值，则将忽略第二个键值对。如果key1和key4相同，则会丢弃key4及其值。

您要使用的是std::unordered_multimap（请参见此处）： cppreference.com:

无序多重映射是一种支持等效键（unordered_multimap可以包含每个键值的多个副本） （已加重点）的无序关联容器。

编辑：正如其他人所说，填充可能会影响您的哈希结果。一种替代方法是分别对结构体的每个成员进行哈希，然后组合哈希。