比较位集的最快方法是什么（在位集上使用 < 运算符）？

Question

比较位集的最快方法是什么（在位集上使用 < 运算符）？

17

什么是实现std::bitset的<运算符的最优方法，该运算符对应于无符号整数表示的比较（它应该适用于超过64位的位集）？

一个简单的实现方式如下：

template<std::size_t N>
bool operator<(const std::bitset<N>& x, const std::bitset<N>& y)
{
    for (int i = N-1; i >= 0; i--) {
        if (x[i] && !y[i]) return false;
        if (!x[i] && y[i]) return true;
    }
    return false;
}

当我说“最优化的方式”时，我指的是使用位运算和元编程技巧（以及类似的东西）来实现。

编辑：我想我已经找到了诀窍：使用模板元编程进行编译时递归，并使用右位移来将位集比较为多个unsigned long long整数。但还没有清楚的想法如何做到这一点...

- Vincent

关于您使用正确的位移操作的想法：这将创建大量中间对象，并且 to_ullong 将不得不检查每个检查中移位后的值是否适合 unsigned long long，从而创建相当多的开销。我怀疑它不会更快，尽管只有基准测试才能证明这一点。 - Daniel Frey

1

复制 std::bitset 的代码，重命名它，并给它一个按字访问的方法。 - brian beuning

1

如果您已经复制了代码，那么您可以提供一个具有访问内部的 operator<。 - Daniel Frey

@Vincent 我已经更新了运行时间：按位（当前最高票数）、块式和模板递归块式。 - user

9个回答

5

如果您愿意采用STL bitset的解决方案，您可以使用以下内容：

template<int n>
bool compare(bitset<n>& l, bitset<n>& r){
  if(n > 64){
  typedef array<long, (n/64)> AsArray;
  return *reinterpret_cast<AsArray*>(&l)
       < *reinterpret_cast<AsArray*>(&r);
    }//else
  return l.to_ulong() < r.to_ulong();
}

编译器会忽略if语句中无关的分支。

- Klaus

4

我刚才查看了源代码，但不幸的是（除非我弄错了），它们似乎没有为特定位块提供const & unsigned long 的原地访问。如果有的话，您可以执行模板递归，并有效比较每个unsigned long而不是在一个unsigned long中的每个位。

毕竟，如果A < B，那么最重要的每个位a <= b，也是每个最重要的块A[i] <= B[i]。

我不想这么说，但我可能会使用C++11的std::array上的递归来制作自己的东西。如果您可以访问块，则可以轻松制作模板递归函数以执行此操作（并且由于您正在请求元编程，因此我确定您知道）让编译器有很好的优化机会。

总的来说，答案并不好，但那就是我会做的事情。

顺便说一句，问题很棒。

===========

编辑

这将计时三种方法：当前投票最多的方法，我描述的块策略和模板递归变体。我使用位集填充向量，然后重复使用指定的比较器函数进行排序。

愉快的编程！

在我的计算机上输出：

RUNTIMES：编译g++-std = c ++ 11-Wall-g test.cpp std :: bitset 4530000（OP中原始的6000000）块对块900000 模板递归730000

编译g++-std = c ++ 11-Wall-g-O3 test.cpp 运行时间： std :: bitset 700000（OP中的740000）块对块470000 模板递归530000

C++11代码：

#include <iostream>
#include <bitset>
#include <algorithm>
#include <time.h>

/* Existing answer. Note that I've flipped the order of bit significance to match my own */
template<std::size_t N>
class BitByBitComparator
{
public:
  bool operator()(const std::bitset<N>& x, const std::bitset<N>& y) const
  {
    for (int i = 0; i < N; ++i) {
      if (x[i] ^ y[i]) return y[i];
    }
    return false;
  }
};

/* New simple bit set class (note: mostly untested). Also note bad
   design: should only allow read access via immutable facade. */
template<std::size_t N>
class SimpleBitSet
{
public:
  static const int BLOCK_SIZE = 64;
  static const int LOG_BLOCK_SIZE = 6;
  static constexpr int NUM_BLOCKS = N >> LOG_BLOCK_SIZE;
  std::array<unsigned long int, NUM_BLOCKS> allBlocks;
  SimpleBitSet()
  {
    allBlocks.fill(0);
  }
  void addItem(int itemIndex)
  {
    // TODO: can do faster
    int blockIndex = itemIndex >> LOG_BLOCK_SIZE;
    unsigned long int & block = allBlocks[blockIndex];
    int indexWithinBlock = itemIndex % BLOCK_SIZE;
    block |= (0x8000000000000000 >> indexWithinBlock);
  }
  bool getItem(int itemIndex) const
  {
    int blockIndex = itemIndex >> LOG_BLOCK_SIZE;
    unsigned long int block = allBlocks[blockIndex];
    int indexWithinBlock = itemIndex % BLOCK_SIZE;
    return bool((block << indexWithinBlock) & 0x8000000000000000);
  }
};

/* New comparator type 1: block-by-block. */
template<std::size_t N>
class BlockByBlockComparator
{
public:
  bool operator()(const SimpleBitSet<N>& x, const SimpleBitSet<N>& y) const
  {
    return ArrayCompare(x.allBlocks, y.allBlocks);
  }

  template <std::size_t S>
  bool ArrayCompare(const std::array<unsigned long int, S> & lhs, const std::array<unsigned long int, S> & rhs) const
  {
    for (int i=0; i<S; ++i)
      {
    unsigned long int lhsBlock = lhs[i];
    unsigned long int rhsBlock = rhs[i];
    if (lhsBlock < rhsBlock) return true;
    if (lhsBlock > rhsBlock) return false;
      }
    return false;
  }
};

/* New comparator type 2: template recursive block-by-block. */
template <std::size_t I, std::size_t S>
class TemplateRecursiveArrayCompare;

template <std::size_t S>
class TemplateRecursiveArrayCompare<S, S>
{
public:
  bool operator()(const std::array<unsigned long int, S> & lhs, const std::array<unsigned long int, S> & rhs) const
  {
    return false;
  }
};

template <std::size_t I, std::size_t S>
class TemplateRecursiveArrayCompare
{
public:
  bool operator()(const std::array<unsigned long int, S> & lhs, const std::array<unsigned long int, S> & rhs) const
  {
    unsigned long int lhsBlock = lhs[I];
    unsigned long int rhsBlock = rhs[I];
    if (lhsBlock < rhsBlock) return true;
    if (lhsBlock > rhsBlock) return false;

    return TemplateRecursiveArrayCompare<I+1, S>()(lhs, rhs);
  }
};

template<std::size_t N>
class TemplateRecursiveBlockByBlockComparator
{
public:
  bool operator()(const SimpleBitSet<N>& x, const SimpleBitSet<N>& y) const
  {
    return TemplateRecursiveArrayCompare<x.NUM_BLOCKS, x.NUM_BLOCKS>()(x.allBlocks, y.allBlocks);
  }
};

/* Construction, timing, and verification code */
int main()
{
  srand(0);

  const int BITSET_SIZE = 4096;

  std::cout << "Constructing..." << std::endl;

  // Fill a vector with random bitsets
  const int NUMBER_TO_PROCESS = 10000;
  const int SAMPLES_TO_FILL = BITSET_SIZE;
  std::vector<std::bitset<BITSET_SIZE> > allBitSets(NUMBER_TO_PROCESS);
  std::vector<SimpleBitSet<BITSET_SIZE> > allSimpleBitSets(NUMBER_TO_PROCESS);
  for (int k=0; k<NUMBER_TO_PROCESS; ++k)
    {
      std::bitset<BITSET_SIZE> bs;
      SimpleBitSet<BITSET_SIZE> homemadeBs;
      for (int j=0; j<SAMPLES_TO_FILL; ++j)
    {
      int indexToAdd = rand()%BITSET_SIZE;
      bs[indexToAdd] = true;
      homemadeBs.addItem(indexToAdd);
    }

      allBitSets[k] = bs;
      allSimpleBitSets[k] = homemadeBs;
    }

  clock_t t1,t2,t3,t4;
  t1=clock();

  std::cout << "Sorting using bit-by-bit compare and std::bitset..."  << std::endl;
  const int NUMBER_REPS = 100;
  for (int rep = 0; rep<NUMBER_REPS; ++rep)
    {
      auto tempCopy = allBitSets;
      std::sort(tempCopy.begin(), tempCopy.end(), BitByBitComparator<BITSET_SIZE>());
    }

  t2=clock();

  std::cout << "Sorting block-by-block using SimpleBitSet..."  << std::endl;
  for (int rep = 0; rep<NUMBER_REPS; ++rep)
    {
      auto tempCopy = allSimpleBitSets;
      std::sort(tempCopy.begin(), tempCopy.end(), BlockByBlockComparator<BITSET_SIZE>());
    }

  t3=clock();

  std::cout << "Sorting block-by-block w/ template recursion using SimpleBitSet..."  << std::endl;
  for (int rep = 0; rep<NUMBER_REPS; ++rep)
    {
      auto tempCopy = allSimpleBitSets;
      std::sort(tempCopy.begin(), tempCopy.end(), TemplateRecursiveBlockByBlockComparator<BITSET_SIZE>());
    }

  t4=clock();

  std::cout << std::endl << "RUNTIMES:" << std::endl;
  std::cout << "\tstd::bitset        \t" << t2-t1 << std::endl;
  std::cout << "\tBlock-by-block     \t" << t3-t2 << std::endl;
  std::cout << "\tTemplate recursive \t" << t4-t3 << std::endl;
  std::cout << std::endl;

  std::cout << "Checking result... ";
  std::sort(allBitSets.begin(), allBitSets.end(), BitByBitComparator<BITSET_SIZE>());
  auto copy = allSimpleBitSets;
  std::sort(allSimpleBitSets.begin(), allSimpleBitSets.end(), BlockByBlockComparator<BITSET_SIZE>());
  std::sort(copy.begin(), copy.end(), TemplateRecursiveBlockByBlockComparator<BITSET_SIZE>());
  for (int k=0; k<NUMBER_TO_PROCESS; ++k)
    {
      auto stdBitSet = allBitSets[k];
      auto blockBitSet = allSimpleBitSets[k];
      auto tempRecBlockBitSet = allSimpleBitSets[k];

      for (int j=0; j<BITSET_SIZE; ++j)
    if (stdBitSet[j] != blockBitSet.getItem(j) || blockBitSet.getItem(j) != tempRecBlockBitSet.getItem(j))
      std::cerr << "error: sorted order does not match" << std::endl;
    }
  std::cout << "success" << std::endl;

  return 0;
}

- user

使用最新的gcc编译器和-O3优化选项，第二个选项是最快的，第三个选项非常接近，而第一个选项比第二个选项慢1.5倍。 - Michaël

4

尽管您说了“位集合”，但实际上您是在谈论任意精度无符号整数比较。如果是这样，那么您可能不会比使用GMP更容易。

从他们的网站上可以看到:

GMP经过精心设计，旨在尽可能快地处理小操作数和大操作数。通过使用fullwords作为基本算术类型，采用快速算法，针对许多CPU的常见内部循环使用高度优化的汇编代码，并且总体上强调速度来实现速度。

请参阅{{link1：其整数函数}}。

- waTeim

3

如何检查XOR的最高位？

bool operator<(const std::bitset<N>& x, const std::bitset<N>& y)
{
    return y[fls(x^y)]
}

int fls(const std::bitset<N>& n) {
    // find the last set bit
}

这里可以找到一些关于fps的想法，http://uwfsucks.blogspot.be/2007/07/fls-implementation.html。

- Łukasz Kidziński

问题：优化 fls 需要与原始问题一样访问位集的内部。 - Ben Voigt

2

好吧，有一个老牌的memcmp函数。它很脆弱，因为它依赖于std::bitset的实现，因此可能无法使用。但是可以合理地假设该模板创建了一个不透明的int数组，并且没有其他的记录字段。

template<std::size_t N>
bool operator<(const std::bitset<N>& x, const std::bitset<N>& y)
{
    int cmp = std::memcmp(&x, &y, sizeof(x));
    return (cmp < 0);
}

这将为位集确定唯一的排序方式。但这可能不是人类直观的顺序。它取决于哪些位用于哪个集合成员索引。例如，索引0可以是第一个32位整数的LSB。或者它可以是第一个8位字节的LSB。

我强烈建议编写单元测试以确保它实际使用起来有效。;->

- timmer

0

在几乎所有的64位CPU、编译器等中，bitset的底层实现都使用uint64，因为有一种明智的方式来编写这个类的实现，以满足给定接口，这使得很容易找到一个“可移植”的hack。

所以假设你只想要“显而易见”的高效方法，并且你的代码不会被用来控制核武库，完全知道这将使你的保修失效，等等，这就是你要找的代码：

template <int N> bool operator<(const bitset<N> & a, const bitset<N> & b) {

    const uint64_t * p = (const uint64_t *)(&a);
    const uint64_t * q = (const uint64_t *)(&b);

    const uint64_t * r = p;

    int i= (sizeof(bitset<N>)-1)/sizeof(uint64_t);

    for (p+=i, q+=i; (p>=r) && (*p==*q); --p, --q) {}

    return *p<*q;
}

基本上将其转换为uint64数组，并按相反的顺序逐个比较元素，直到找到差异。

还要注意，这假定了x86-64字节序。

- paperclip optimizer

0

只有在两个位集合已经不同的情况下才执行按位比较，这样可以提高一些性能：

template<std::size_t N>
bool operator<(const std::bitset<N>& x, const std::bitset<N>& y)
{       if (x == y)
                return false;
        ….
}

- user12411795

它并不重要，如果它们总是不同的。 - Sopel

0

我知道这是一个有点老的问题，但如果你知道bitset的最大大小，你可以创建像这样的东西：

class Bitset{
    vector<bitset<64>> bits;
    /*
     * operators that you need
    */
};

这样可以让你将每个bitsets<64>转换为unsigned long long以进行快速比较。如果您想要获取特定的位（以便更改它或其他操作），可以执行bits[id / 64][id % 64]

- capi1500

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Daniel Frey · Accepted Answer

显然的优化方案是

template<std::size_t N>
bool operator<(const std::bitset<N>& x, const std::bitset<N>& y)
{
    for (int i = N-1; i >= 0; i--) {
        if (x[i] ^ y[i]) return y[i];
    }
    return false;
}

除此以外，由于没有符合标准的访问方法，更多比特数用于测试似乎是不可能的。您可以对 x.to_string() < y.to_string() 进行基准测试，并希望 to_string() 和字符串比较都能比位访问一个bitset 更好地进行优化，但这是一个冒险。