When should I use BitVector32?

Question


6

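I'm working on a project where at some point I need to show which days of a month are still available. There is a function that calculates which days are available. My colleague said: "Oh, we know, you should return a BitVector32. That's the most efficient way to work with a list of booleans." I would have used a List<bool> or something similar. To me, BitVector32 looks like something low-level for when you are working with bits.

So the question is: should you use a BitVector32 whenever you need a list of fewer than 32 booleans, or should you only use it for low-level work?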

- Matthijs Wessels

2 Answers

4

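BitVector32 is a wrapper (or you could call it an abstraction) around C#'s bit operations. For example, the following two expressions return the same result, 2: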

1 << 1
BitVector32.CreateMask(1)
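
As a quick check of that equivalence, here is a minimal sketch that chains CreateMask calls and compares each result with the matching shift. The variable names are illustrative only and are not part of the answer.

```csharp
using System;
using System.Collections.Specialized;

// CreateMask() with no argument returns the mask for the first bit (1).
int first = BitVector32.CreateMask();
// CreateMask(previous) returns the mask that follows `previous`, i.e. previous << 1.
int second = BitVector32.CreateMask(first);
int third = BitVector32.CreateMask(second);

Console.WriteLine(first == 1 << 0);   // True
Console.WriteLine(second == 1 << 1);  // True
Console.WriteLine(third == 1 << 2);   // True

// The indexer takes a mask (not a bit position) and reads or sets that bit.
var flags = new BitVector32(0);
flags[second] = true;
Console.WriteLine(flags.Data);        // 2
```

Note that the indexer argument is a mask such as 1 << 1, not the bit index 1.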

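Suppose we have an integer array that contains some repeated numbers, and we want to find all the duplicates. Of course, you could simply use the GroupBy function in Linq, but let's pretend we don't have Linq.

The first option is the brute-force approach, where each element is compared against every later element in the array: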

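for (int i = 0; i < list.Length; i++)
{
    // Start at i + 1 so an element is never compared with itself
    for (int j = i + 1; j < list.Length; j++)
    {
        if (list[i] == list[j])
        {
            // print this or store it in the result list
        }
    }
}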

Since the brute-force approach runs in O(n^2) time, which is pretty inefficient, we can use a HashSet instead, which provides constant-time, O(1), lookups:

HashSet<int> hashSet = new HashSet<int>();

foreach(int i in list)
{    
    if (hashSet.Contains(i))
    {
        // print the duplicate or add it to the result list
    }
    else
    {
        hashSet.Add(i);
    }
}

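This approach runs in linear time, O(n). However, it requires extra memory: at least n * 4 bytes for the stored values (assuming we are talking about 32-bit integers).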
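
The HashSet loop can be tightened a little because HashSet<T>.Add returns false when the value is already present. The sketch below is one way to package it; FindDuplicates is a hypothetical helper name, and unlike the loop above it reports each duplicated value only once by keeping a second set.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns each value that occurs more than once,
// in the order in which its first repeat is seen.
List<int> FindDuplicates(int[] values)
{
    var seen = new HashSet<int>();
    var reported = new HashSet<int>();
    var duplicates = new List<int>();
    foreach (int value in values)
    {
        // Add returns false when the value is already in the set.
        if (!seen.Add(value) && reported.Add(value))
        {
            duplicates.Add(value);
        }
    }
    return duplicates;
}

Console.WriteLine(string.Join(", ", FindDuplicates(new[] { 3, 1, 3, 7, 1, 3 })));  // 3, 1
```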

The third approach is similar to using a HashSet, except that it needs less memory because it uses a boolean array indexed by value:

// Assumes every value is in the range [0, list.Length)
bool[] masks = new bool[list.Length];

for (int i = 0; i < list.Length; i++)
{
    if (masks[list[i]])
    {
        // print or add to the result list
    }
    else
    {
        masks[list[i]] = true;
    }
}

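It uses a boolean array instead of a HashSet. It has the same O(n) running time but needs only 1/4 of the memory, because a bool takes 1 byte (8 bits) while an integer takes 4 bytes (32 bits).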

Finally, we can solve this problem using the BitVector32 class or the native bit shifting operations.

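// Assumes every value is in the range [0, 31]
int check = 0;
for (int i = 0; i < list.Length; i++)
{
    int mask = 1 << list[i];
    // Parentheses are required: == binds tighter than & in C#
    if ((check & mask) == mask)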
    {
        // print or add list[i] to the result list
    }
    else
    {
        check = check | mask;
    }
}

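This also runs in linear time while using only 32 bits of memory in total, so each tracked value costs one bit, 1/32 of what a 32-bit integer takes. Of course, this approach does not work if the largest value in the array is greater than 31, since the mask only has bits 0 through 31. We could use a 64-bit unsigned integer to get more slots in the mask, but the limit is still very small. In that case, if you create an array of BitVector32, you can carry the bits over into the BitVector32 at the next index. For example, the code would look like this: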

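// + 1 so that maxValue itself has a slot
BitVector32[] bitVectorArray = new BitVector32[maxValue / 32 + 1];
// The BitVector32 indexer takes a mask and sets that bit
bitVectorArray[list[i] / 32][1 << (list[i] % 32)] = true;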

This way you are no longer bound by the 32-bit limit. You can make the mask as large as memory allows. So, putting it all together:

// This code assumes every value in the array is in the range [0, maxValue]
BitVector32[] bitVectorArray = new BitVector32[maxValue / 32 + 1];

for (int i = 0; i < list.Length; i++)
{
    int index = list[i] / 32;        // which BitVector32 holds this value
    int mask = 1 << (list[i] % 32);  // which bit inside that BitVector32

    if (bitVectorArray[index][mask])
    {
        // print or add list[i] to the result list
    }
    else
    {
        bitVectorArray[index][mask] = true;
    }
}
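
To try the array-of-BitVector32 idea on values that cross a 32-bit boundary, here is a small runnable sketch. FindRepeats and its maxValue parameter are hypothetical names, and the sketch assumes every input value lies in [0, maxValue].

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Specialized;

// Hypothetical helper; assumes every value is in [0, maxValue].
List<int> FindRepeats(int[] values, int maxValue)
{
    var bits = new BitVector32[maxValue / 32 + 1];
    var repeats = new List<int>();
    foreach (int value in values)
    {
        int index = value / 32;        // which BitVector32 holds this value
        int mask = 1 << (value % 32);  // which bit inside that BitVector32
        if (bits[index][mask])
        {
            repeats.Add(value);        // bit already set: seen before
        }
        else
        {
            bits[index][mask] = true;
        }
    }
    return repeats;
}

Console.WriteLine(string.Join(", ", FindRepeats(new[] { 0, 31, 32, 64, 31, 64, 100 }, 100)));  // 31, 64
```

Setting a bit through bits[index][mask] works here because array elements are variables; with a List<BitVector32> the struct would have to be copied out, modified, and stored back.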

- Jin

1

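Thanks for your answer. One thing to point out: when initializing the BitVector32 array (and the byte array), you used the size of the input. You should really use the size of the domain you want to run the algorithm on. For Int32, that size is 2^32, which leads to much higher memory use. - Matthijs Wessels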

1

Also, I would initialize the HashSet with an initial capacity. Otherwise, since it has to grow its capacity every so often, you end up with O(n^2). - Matthijs Wessels

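I think you could fix it by using a Dictionary<int, BitVector32> instead of a BitVector32[] (and a Dictionary<int, bool> instead of the byte[]). I'm not sure what that does to memory use, but I don't think it would be any better than your HashSet<int> approach. - Matthijs Wessels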

1

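In this particular example a dictionary gives you no advantage in either running time or memory, because the keys are consecutive integers starting at 0 and can simply be replaced by array indices. A dictionary only adds overhead. What makes this better than the HashSet approach is that it uses each bit as a mask, so memory use is only 1/32. - Jin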

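I think we could use a dictionary when we don't know the range of the numbers and they are sparsely scattered. Unlike a HashSet, though, each BitVector32 gives you 32 masks instead of 1. - Jin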
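
The dictionary idea from these comments could look roughly like the sketch below. FindRepeatsSparse is a hypothetical name, and the sketch assumes non-negative values. Because BitVector32 is a struct, the block has to be copied out of the dictionary, modified, and stored back.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Specialized;

// Hypothetical helper for sparse, non-negative values of unknown range.
List<int> FindRepeatsSparse(int[] values)
{
    // Key: value / 32. Each BitVector32 covers 32 consecutive values.
    var blocks = new Dictionary<int, BitVector32>();
    var repeats = new List<int>();
    foreach (int value in values)
    {
        int key = value / 32;
        int mask = 1 << (value % 32);
        // A missing key leaves `block` as the default (all bits clear).
        blocks.TryGetValue(key, out BitVector32 block);
        if (block[mask])
        {
            repeats.Add(value);
        }
        else
        {
            block[mask] = true;
            blocks[key] = block;  // store the modified copy back
        }
    }
    return repeats;
}

Console.WriteLine(string.Join(", ", FindRepeatsSparse(new[] { 5, 1000000, 5, 2000000000, 1000000 })));  // 5, 1000000
```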


- CodesInChaos · Accepted Answer

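A list is easy to extend to other time periods. Say you want to show two months at once. Oh, that's more than 32. Now I need to change the return type and every place that uses it. Great! And BitVector32 doesn't even implement IEnumerable<T>.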

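Unless it's in a tight loop, readability and maintainability beat efficiency. And the overhead of allocating a list isn't significant unless you do it a million times per second.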

So I agree that you should only use BitVector32 in low-level code.
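
To make the trade-off concrete, here is a minimal sketch of the days-of-the-month case in both styles. IsAvailable, GetAvailability, and GetAvailabilityBits are hypothetical names, and the availability rule (even days are free) is invented purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Specialized;
using System.Linq;

// Hypothetical availability rule: even-numbered days count as available.
bool IsAvailable(int day) => day % 2 == 0;

// List-based version: index 0 is day 1. Works for any number of days.
List<bool> GetAvailability(int dayCount) =>
    Enumerable.Range(1, dayCount).Select(IsAvailable).ToList();

// BitVector32 version: bit (day - 1) is set when the day is available.
// It cannot represent more than 32 days.
BitVector32 GetAvailabilityBits(int dayCount)
{
    if (dayCount > 32) throw new ArgumentOutOfRangeException(nameof(dayCount));
    var bits = new BitVector32(0);
    for (int day = 1; day <= dayCount; day++)
    {
        bits[1 << (day - 1)] = IsAvailable(day);
    }
    return bits;
}

List<bool> month = GetAvailability(31);
BitVector32 monthBits = GetAvailabilityBits(31);
Console.WriteLine(month[1]);                   // True (day 2)
Console.WriteLine(monthBits[1 << 1]);          // True (day 2)
Console.WriteLine(GetAvailability(61).Count);  // 61
```

Asking GetAvailabilityBits for 61 days throws, which is the two-month problem described above; the list version needs no change.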