Splitting a string of bytes into a vector of BYTEs in C++

Question


5

I have a string of bytes that looks like the following:

"1,3,8,b,e,ff,10"

How should I split this string into a std::vector of BYTEs containing the following values:

[ 0x01, 0x03, 0x08, 0x0b, 0x0e, 0xff, 0x10 ]

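I tried splitting the string using ',' as the delimiter, but I am running into some problems. Can someone help me out and show me how to do this?

So I have tried this: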

    std::istringstream iss("1 3 8 b e ff 10");
    BYTE num = 0;
    while(iss >> num || !iss.eof()) 
    {
        if(iss.fail()) 
        {
            iss.clear();
            std::string dummy;
            iss >> dummy;
            continue;
        }
        dataValues.push_back(num);
    }

But this pushes the ASCII byte values into the vector:

49 //1
51 //3
56 //8
98 //b
101 //e
102 //f
102 //f
49 //1
48 //0

What I am trying to fill the vector with instead is:

 0x01
 0x03
 0x08
 0x0b
 0x0e
 0xff
 0x10

- user3330644

2

You should post the relevant parts of your code that are causing problems, so that people here can help you fix it. - Paul R

3

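Use std::istringstream together with the std::hex I/O manipulator. Skipping the , characters can be done as shown in this example. - πάντα ῥεῖ

@PaulR Just edited it. - user3330644

@πάνταῥεῖ I just tried that, but it does not pick up the correct values. I edited the post to explain what I mean. - user3330644

@user3330644 As I mentioned before, you need to call iss >> std::hex; before the while loop. Alternatively, you can write while(iss >> std::hex >> num || !iss.eof()). Also note that BYTE is just an alias for unsigned char, so you should read into an unsigned int first. - πάντα ῥεῖ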
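
To illustrate the point about BYTE, here is a minimal sketch of why the extraction target matters. The helper names read_as_byte and read_as_uint are hypothetical and only serve the comparison; the sketch assumes an ASCII execution character set.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: extracts into an unsigned char (which is what BYTE is).
// Stream extraction treats unsigned char as a character type, so this reads
// exactly one raw character and std::hex has no effect on it.
unsigned int read_as_byte(const std::string& text) {
    std::istringstream iss(text);
    unsigned char c = 0;
    iss >> std::hex >> c;
    return c;
}

// Hypothetical helper: extracts into an unsigned int, so std::hex applies
// and the whole token is parsed as one hexadecimal number.
unsigned int read_as_uint(const std::string& text) {
    std::istringstream iss(text);
    unsigned int n = 0;
    iss >> std::hex >> n;
    return n;
}
```

With the input "ff", read_as_byte returns 102 (the character code of 'f'), while read_as_uint returns 255. This matches the ASCII values seen in the question.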

3 Answers

0

A working example (tested with GCC 4.9.0 in C++11 mode):

The file "save.txt" contains "1,3,8,b,e,ff,10" as its first line.

Output:

1
3
8
b
e
ff
10

The idea is:

Use std::getline to read the file line by line.
Use boost::split to split the line at the delimiter.
Use std::stringstream to convert each hex string to an unsigned char.

Code:

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string/classification.hpp>

int main(int argc, char* argv[]) {
    std::ifstream ifs("e:\\save.txt");

    std::string line;
    std::vector<std::string> tokens;
    std::getline(ifs, line);
    boost::split(tokens, line, boost::is_any_of(","));

    std::vector<unsigned char> values;
    for (const auto& t : tokens) {
        unsigned int x = 0; // stays 0 if the token is not valid hex
        std::stringstream ss;
        ss << std::hex << t;
        ss >> x;

        values.push_back(static_cast<unsigned char>(x));
    }

    for (auto v : values) {
        std::cout << std::hex << (unsigned long)v << std::endl;
    }

    return 0;
}
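
The same three steps can also be sketched without Boost. The following is an illustrative variant, not part of the original answer: the function name split_hex_bytes is hypothetical, std::getline with a delimiter replaces boost::split, and std::stoul with base 16 replaces the stringstream conversion. Rejecting malformed or oversized tokens with an exception is an assumption about the desired error handling.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: splits `line` at `delimiter` and parses every token
// as one hexadecimal byte.
std::vector<unsigned char> split_hex_bytes(const std::string& line,
                                           char delimiter = ',') {
    std::vector<unsigned char> values;
    std::istringstream tokens(line);
    std::string token;
    while (std::getline(tokens, token, delimiter)) {
        std::size_t consumed = 0;
        // std::stoul itself throws std::invalid_argument for tokens that
        // do not start with a number (for example an empty token).
        const unsigned long value = std::stoul(token, &consumed, 16);
        // Reject trailing garbage and values that do not fit in one byte.
        if (consumed != token.size() || value > 0xff) {
            throw std::invalid_argument("not a single hex byte: " + token);
        }
        values.push_back(static_cast<unsigned char>(value));
    }
    return values;
}
```

Calling split_hex_bytes("1,3,8,b,e,ff,10") should yield the seven bytes from the question; an empty line yields an empty vector.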

- NetVipeC

0

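Just to demonstrate another, possibly faster, way of doing this, consider reading everything into an array and using a custom iterator to do the conversion.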

class ToHexIterator : public std::iterator<std::input_iterator_tag, int>{
    char* it_;
    char* end_;
    int current_;
    bool isHex(const char c){
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }
    char toUpperCase(const char c){
        if (c >= 'a' && c <= 'f'){
            return (c - 'a') + 'A';
        }
        return c;
    }
    int toNibble(const char c){
        auto x = toUpperCase(c);
        if (x >= '0' && x <= '9'){
            return x - '0';
        }
        else {
            return (x - 'A') + 10;
        }
    }
public:
    ToHexIterator() :it_{ nullptr }, end_{ nullptr }, current_{}{}                  //default constructed means end iterator
    ToHexIterator(char* begin, char* end) :it_{ begin }, end_{ end }, current_{}{
        while (it_ != end_ && !isHex(*it_)){ ++it_; }  //skip to the first hex digit; check the bound before dereferencing
        ++(*this);
    }
    bool operator==(const ToHexIterator &other) const {
        return it_ == nullptr && end_ == nullptr && other.it_ == nullptr && other.end_ == nullptr;
    }
    bool operator!=(const ToHexIterator &other) const {
        return !(*this == other);
    }
    int operator*() const {
        return current_;
    }
    ToHexIterator & operator++(){
        current_ = 0;
        if (it_ != end_) {
            while (it_ != end_ && isHex(*it_)){  //check the bound before dereferencing
                current_ <<= 4;
                current_ += toNibble(*it_);
                ++it_;
            }
            while (it_ != end_ && !isHex(*it_)){ ++it_; }
        }
        else {
            it_ = nullptr;
            end_ = nullptr;
        }
        return *this;
    }
    ToHexIterator operator++(int){
        ToHexIterator temp(*this);
        ++(*this);
        return temp;
    }
};

Basic usage looks like this:

char in[] = "1,3,8,b,e,ff,10,--";
std::vector<int> v;
std::copy(ToHexIterator{ std::begin(in), std::end(in) }, ToHexIterator{}, std::back_inserter(v));

Note that using a lookup table for the ASCII-to-hex-nibble conversion would probably be faster.
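
As a rough sketch of what such a lookup table could look like (this is not the table from the linked examples), the names make_nibble_table and to_nibble below are hypothetical. The table is built once at first use rather than at compile time, and it assumes that the digit and a-f/A-F character codes are contiguous, as they are in ASCII.

```cpp
#include <array>

// Hypothetical helper: builds a 256-entry table that maps a character code
// to its nibble value, or to -1 for characters that are not hex digits.
inline std::array<signed char, 256> make_nibble_table() {
    std::array<signed char, 256> table;
    table.fill(-1);
    for (int i = 0; i < 10; ++i) {
        table['0' + i] = static_cast<signed char>(i);
    }
    for (int i = 0; i < 6; ++i) {
        table['a' + i] = static_cast<signed char>(10 + i);
        table['A' + i] = static_cast<signed char>(10 + i);
    }
    return table;
}

// One indexed load replaces the range comparisons; a negative result also
// answers the "is this a hex digit?" question.
inline int to_nibble(char c) {
    static const std::array<signed char, 256> table = make_nibble_table();
    return table[static_cast<unsigned char>(c)];
}
```

For example, to_nibble('f') and to_nibble('F') both give 15, while to_nibble(',') gives -1. Whether this is actually faster than the comparisons depends on the compiler and platform, so it would need to be measured.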

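Speed can depend heavily on compiler optimizations and the platform. However, because some istringstream functions are implemented as virtual functions or function pointers (depending on the standard library implementation), the optimizer has a hard time with them. In my code there are no virtual functions or function pointers, and the only loop is inside the std::copy implementation, which the optimizer is used to handling. Looping until two addresses are equal is also generally much faster than looping until the thing a changing pointer points to equals something. At the end of the day this is all guesswork and voodoo, but with MSVC13 on my machine my version is about 10 times faster. Here is a live example: http://ideone.com/nuwu15. On GCC the speedup is somewhere between 3x and 10x, depending on the run and on which test runs first (probably due to some caching effects).

All in all, there is undoubtedly more room for optimization at this level of abstraction. Anyone who claims "my version is always faster" is selling snake oil.

Update: using a lookup table generated at compile time improves speed further: http://ideone.com/ady8GY (note that I increased the size of the input string to reduce noise, so this is not directly comparable to the example above).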

- odinthenerd

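What makes you sure that this code is faster than the standard stream operator>> and its hex parsing implementation? - πάντα ῥεῖ

I added more explanation and a live example that times my implementation against an approximation of yours (I used int instead of unsigned char in both implementations, but that should not change much). - odinthenerd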


- πάντα ῥεῖ · Accepted Answer

You just need to fix a few minor issues that come up when you apply the answer linked in my comment.

    std::istringstream iss("1,3,8,b,e,ff,10");
    std::vector<BYTE> dataValues;

    unsigned int num = 0; // read an unsigned int in 1st place
                          // BYTE is just a typedef for unsigned char
    while(iss >> std::hex >> num || !iss.eof()) {
        if(iss.fail()) {
            iss.clear();
            char dummy;
            iss >> dummy; // use char as dummy if no whitespaces 
                          // may occur as delimiters
            continue;
        }
        if(num <= 0xff) {
            dataValues.push_back(static_cast<BYTE>(num));
        }
        else {
            // Error single byte value expected
        }
    }

You can see a complete working example here on ideone.
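
For reference, the loop above can be wrapped into a self-contained function along the following lines. This is a sketch rather than the code from the linked example: the function name parse_hex_bytes is hypothetical, BYTE is declared locally on the assumption that it matches the Windows typedef for unsigned char, and values above 0xff are silently skipped where the original leaves error handling open.

```cpp
#include <sstream>
#include <string>
#include <vector>

typedef unsigned char BYTE;  // assumption: same as the Windows typedef

// Hypothetical wrapper around the loop from the answer.
std::vector<BYTE> parse_hex_bytes(const std::string& text) {
    std::istringstream iss(text);
    std::vector<BYTE> dataValues;

    unsigned int num = 0;  // read into an unsigned int, not into BYTE
    while (iss >> std::hex >> num || !iss.eof()) {
        if (iss.fail()) {
            // The extraction stopped at a delimiter: reset the stream
            // state and consume one character before retrying.
            iss.clear();
            char dummy;
            iss >> dummy;
            continue;
        }
        if (num <= 0xff) {
            dataValues.push_back(static_cast<BYTE>(num));
        }
        // Values above 0xff are skipped in this sketch.
    }
    return dataValues;
}
```

Calling parse_hex_bytes("1,3,8,b,e,ff,10") should return the seven bytes listed in the question.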