Converting a hex string to a byte array

Question

Converting a hex string to a byte array

54

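How do I convert a variable-length hex string (e.g. "01A1") into a byte array containing that data?

That is, converting this:

std::string hex = "01A1";

into this:

char* hexArray;
int hexLength;

or this:

std::vector<char> hexArray;

I want to write the result to a file and run hexdump -C on it, so that I get binary data containing 01A1.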

- oracal

16

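@alexvii That is not an answer to this question. - dhavenith

2

You can put std::streams into hex mode to read and write numbers in hexadecimal format. - πάντα ῥεῖ

@makulik I tried using streams and std::hex but could not get it to work. Could you give me an example? Thanks. - oracal

I don't think any ASCII subtraction is needed; just use the C API to convert to a char array, unless I have misunderstood the question. I pointed to the API in my answer below: http://stackoverflow.com/a/17273020/986760. - fkl

Based on your comment on another answer, I think you need to add something to the question saying what should happen when the input has an odd number of characters. Should the missing 0 be added at the start of the string or at the end? - Zan Lynx

@oracal Please see my answer, which uses the stringstream approach. - TheoretiCAL

23 Answers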


- ZachB · Answer 1

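If speed is your goal, I have an AVX2 SIMD implementation of an encoder and decoder here: https://github.com/zbjornson/fast-hex. In its benchmarks it is 12x faster than the fastest scalar implementation.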

- TooTone · Answer 2

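I suggest using the standard function sscanf to read the string into an unsigned integer, so that you already have the bytes you need in memory. On a big-endian machine you could write out the integer's memory directly (using memcpy), starting from the first non-zero byte. In general you cannot safely assume that byte order, so you need some bit masking and shifting to extract the bytes.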

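// Fragment: needs #include <cstdio> and belongs inside a function.
const char* src = "01A1";
char hexArray[256] = {0};
int hexLength = 0;

// Read in the string. The value must fit in one unsigned int
// (8 hex digits when unsigned int is 32 bits wide).
unsigned int hex = 0;
sscanf(src, "%x", &hex);

// Write it out, most significant byte first. Leading zero bytes are
// skipped, so "00A1" produces the single byte 0xA1.
for (unsigned int mask = 0xff000000, bitPos=24; mask; mask>>=8, bitPos-=8) {
    unsigned int currByte = hex & mask;
    if (currByte || hexLength) {
        hexArray[hexLength++] = currByte>>bitPos;
    }
}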
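The mask-and-shift idea above can be sketched so that leading zero bytes survive, by taking the byte count from the number of digits rather than from the value. This is only a sketch under stated assumptions: `hexWordToBytes` is a hypothetical name, it uses `strtoull` instead of `sscanf`, it handles at most 16 digits (one 64-bit word), and it returns an empty vector for input it cannot handle.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: parse up to 16 hex digits into one integer, then emit
// the bytes most significant first, keeping leading zero bytes.
std::vector<std::uint8_t> hexWordToBytes(const std::string& hex)
{
    std::vector<std::uint8_t> out;
    if (hex.empty() || hex.size() > 16)
        return out;                      // does not fit in one 64-bit word

    // strtoull would also accept whitespace, a sign or "0x", so validate first.
    for (unsigned char c : hex)
        if (!std::isxdigit(c))
            return out;

    const unsigned long long value = std::strtoull(hex.c_str(), nullptr, 16);

    // An odd digit count rounds up, which is the same as left-padding with '0'.
    const std::size_t nBytes = (hex.size() + 1) / 2;
    for (std::size_t i = 0; i < nBytes; ++i) {
        const std::size_t shift = 8 * (nBytes - 1 - i);
        out.push_back(static_cast<std::uint8_t>((value >> shift) & 0xFF));
    }
    return out;
}
```

Because the bytes are extracted with shifts instead of being copied from memory, the result does not depend on the machine's byte order.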

- Закиров Ришат · Answer 3

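This may be useful to someone. It is the logic for converting a set of bytes to a string and back again, and it handles the zero-character problem.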

#include <cstdio>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

std::string BytesToHex(const std::vector<char>& data, size_t len)
{
    std::stringstream ss;
    ss << std::hex << std::setfill('0');

    for(size_t index(0); index < len; ++index)
    {
        // Go through unsigned char first so bytes >= 0x80 are not sign-extended.
        ss << std::setw(2) << static_cast<unsigned short>(static_cast<unsigned char>(data[index]));
    }

    return ss.str();
}

std::vector<char> HexToBytes(const std::string& data)
{
    std::stringstream ss;
    ss << data;

    std::vector<char> resBytes;
    char hexNum[3] = {0};   // two digits plus the '\0' that sscanf needs
    // Each successful read consumes exactly one two-digit pair;
    // a trailing unpaired digit makes the read fail and ends the loop.
    while(ss.read(hexNum, 2))
    {
        unsigned short num = 0;
        if(sscanf(hexNum, "%2hX", &num) != 1)
            break;          // not a hex pair
        resBytes.push_back(static_cast<char>(num));
    }
    return resBytes;
}

- Cristei Gabriel · Answer 4

How to do this at compile time

#pragma once

#include <memory>
#include <iostream>
#include <string>
#include <array>
#include <cstdio>   // printf
#include <utility>  // std::move

#define DELIMITING_WILDCARD ' '

//  @sean :)
constexpr int _char_to_int( char ch )
{
    if( ch >= '0' && ch <= '9' )
        return ch - '0';

    if( ch >= 'A' && ch <= 'F' )
        return ch - 'A' + 10;

    return ch - 'a' + 10;
}

template <char wildcard, typename T, size_t N = sizeof( T )>
constexpr size_t _count_wildcard( T &&str )
{
    size_t count = 1u;
    for( const auto &character : str )
    {
        if( character == wildcard )
        {
            ++count;
        }
    }

    return count;
}

//  Combine two hex digits into one value and place it at make_count.
//  Change the factor from 16 to 256 if you want
//  sig[0] == 0xA && sig[1] == 0xB to give 0xA0B,
//  or leave it at 16 for the result to be 0xAB.
#define CONCATE_HEX_FACTOR 16
#define CONCATE_HEX(a, b) ( CONCATE_HEX_FACTOR * ( a ) + ( b ) )

template
<   char skip_wildcard,
    //  How many occurrences of a delimiting wildcard do we find in sig
    size_t delimiter_count,
    typename T, size_t N = sizeof( T )>
    constexpr auto _make_array( T &&sig )
{
    static_assert( delimiter_count > 0, "this is a logical error, delimiter count can't be of size 0" );
    static_assert( N > 1, "sig length must be bigger than 1" );

    //  Resulting byte array, for delimiter_count skips we should have delimiter_count integers
    std::array<int, delimiter_count> ret{};

    //  List of skips that point to the position of the delimiter wildcard in skip
    std::array<size_t, delimiter_count> skips{};

    //  Current skip
    size_t skip_count = 0u;

    //  Character count, traversed for skip
    size_t skip_traversed_character_count = 0u;
    for( size_t i = 0u; i < N; ++i )
    {
        if( sig[i] == DELIMITING_WILDCARD )
        {
            skips[skip_count] = skip_traversed_character_count;
            ++skip_count;
        }

        ++skip_traversed_character_count;
    }

    //  Finally traversed character count
    size_t traversed_character_count = 0u;

    //  Make count (we will supposedly have at least an instance in our return array)
    size_t make_count = 1u;

    //  Traverse signature
    for( size_t i = 0u; i < N; ++i )
    {
        //  Read before
        if( i == 0u )
        {
            //  We don't care about this, and we don't want to use 0
            if( sig[0u] == skip_wildcard )
            {
                ret[0u] = -1;
                continue;
            }

            ret[0u] = CONCATE_HEX( _char_to_int( sig[0u] ), _char_to_int( sig[1u] ) );
            continue;
        }

        //  Make result by skip data
        for( const auto &skip : skips )
        {
            if( ( skip == i ) && skip < N - 1u )
            {
                //  We don't care about this, and we don't want to use 0
                if( sig[i + 1u] == skip_wildcard )
                {
                    ret[make_count] = -1;
                    ++make_count;
                    continue;
                }

                ret[make_count] = CONCATE_HEX( _char_to_int( sig[i + 1u] ), _char_to_int( sig[i + 2u] ) );
                ++make_count;
            }
        }
    }

    return ret;
}

#define SKIP_WILDCARD '?'
#define BUILD_ARRAY(a) _make_array<SKIP_WILDCARD, _count_wildcard<DELIMITING_WILDCARD>( a )>( a )
#define BUILD_ARRAY_MV(a) _make_array<SKIP_WILDCARD, _count_wildcard<DELIMITING_WILDCARD>( std::move( a ) )>( std::move( a ) )

//  -----
//  usage
//  -----
template <int n>
constexpr int combine_two()
{
    constexpr auto numbers = BUILD_ARRAY( "55 8B EC 83 E4 F8 8B 4D 08 BA ? ? ? ? E8 ? ? ? ? 85 C0 75 12 ?" );
    constexpr int number = numbers[0];
    constexpr int number_now = n + number;
    return number_now;
}

int main()
{
    constexpr auto shit = BUILD_ARRAY( "?? AA BB CC DD ? ? ? 02 31 32" );
    for( const auto &hex : shit )
    {
        printf( "%x ", hex );
    }

    combine_two<3>();
    constexpr auto saaahhah = combine_two<3>();
    static_assert( combine_two<3>() == 88 );
    static_assert( combine_two<3>() == saaahhah );
    printf( "\n%d", saaahhah );
}

This approach also works at run time, but for that you would probably prefer something faster.
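For the plain case in the question (contiguous hex digits, no delimiters or wildcards), a much smaller compile-time sketch is possible. This is not the code above: `hexDigitValue` and `hexLiteralToBytes` are hypothetical names, and the sketch assumes C++17 and a string literal with an even number of digits.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: value of one hex digit, or -1 if it is not one.
constexpr int hexDigitValue(char c)
{
    return (c >= '0' && c <= '9') ? c - '0'
         : (c >= 'A' && c <= 'F') ? c - 'A' + 10
         : (c >= 'a' && c <= 'f') ? c - 'a' + 10
         : -1;
}

// N counts the literal's terminating '\0', so N - 1 digits make (N - 1) / 2 bytes.
template <std::size_t N>
constexpr std::array<std::uint8_t, (N - 1) / 2> hexLiteralToBytes(const char (&hex)[N])
{
    static_assert(N % 2 == 1, "expected an even number of hex digits");
    std::array<std::uint8_t, (N - 1) / 2> out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        const int hi = hexDigitValue(hex[2 * i]);
        const int lo = hexDigitValue(hex[2 * i + 1]);
        if (hi < 0 || lo < 0)
            throw "invalid hex digit";   // not a constant expression, so a bad
                                         // literal fails in a constexpr context
        out[i] = static_cast<std::uint8_t>(hi * 16 + lo);
    }
    return out;
}

// Evaluated entirely by the compiler.
constexpr auto kBytes = hexLiteralToBytes("01A1");
static_assert(kBytes.size() == 2 && kBytes[0] == 0x01 && kBytes[1] == 0xA1, "");
```

The array size is derived from the literal's length, so no separate counting pass is needed.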

- GeorgeMakarov · Answer 5

I modified TheoretiCAL's code.

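// Fragment: needs #include <cstdint> and <string>, and belongs inside a function.
uint8_t buf[32] = {};
std::string hex = "0123";
if (hex.length() % 2)
    hex = "0" + hex;    // left-pad odd-length input

// operator>> into a uint8_t extracts one character, not a hex number,
// so each two-digit pair is parsed explicitly instead.
for (size_t i = 0; i < sizeof(buf) && 2 * i < hex.length(); i++)
    buf[i] = static_cast<uint8_t>(std::stoul(hex.substr(2 * i, 2), nullptr, 16));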
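One pitfall with the stream approach: `operator>>` into a `uint8_t` extracts a single character rather than a number, so `std::hex` has no effect on it. A stream-based version therefore has to extract each two-digit pair into a wider integer type. The following is a minimal sketch of that idea; `hexToBytesViaStream` is a hypothetical name, and returning an empty vector on invalid input is an assumption made for the example.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: decode a hex string pair by pair using std::hex.
std::vector<std::uint8_t> hexToBytesViaStream(std::string hex)
{
    // Reject anything that is not a hex digit before handing it to the stream.
    for (unsigned char c : hex)
        if (!std::isxdigit(c))
            return {};

    if (hex.size() % 2 != 0)
        hex.insert(hex.begin(), '0');    // left-pad odd-length input

    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        std::istringstream pair(hex.substr(i, 2));
        unsigned int value = 0;          // wide type, so std::hex applies
        if (!(pair >> std::hex >> value))
            return {};
        out.push_back(static_cast<std::uint8_t>(value));
    }
    return out;
}
```

Creating one stream per pair is simple but not fast; for large inputs a manual digit-to-value conversion is likely the better choice.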

- nullptr · Answer 6

#include <iostream>

using byte = unsigned char;

static int charToInt(char c) {
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    if (c >= 'A' && c <= 'F') {
        return c - 'A' + 10;
    }
    if (c >= 'a' && c <= 'f') {
        return c - 'a' + 10;
    }
    return -1;
}

// Decodes specified HEX string to bytes array. Specified nBytes is length of bytes
// array. Returns -1 if fails to decode any of bytes. Returns number of bytes decoded
// on success. Maximum number of bytes decoded will be equal to nBytes. It is assumed
// that specified string is '\0' terminated.
int hexStringToBytes(const char* str, byte* bytes, int nBytes) {
    int nDecoded {0};
    for (int i {0}; str[i] != '\0' && nDecoded < nBytes; i += 2, nDecoded += 1) {
        if (str[i + 1] != '\0') {
            int m {charToInt(str[i])};
            int n {charToInt(str[i + 1])};
            if (m != -1 && n != -1) {
                bytes[nDecoded] = (m << 4) | n;
            } else {
                return -1;
            }
        } else {
            return -1;
        }
    }
    return nDecoded;
}

int main(int argc, char* argv[]) {
    if (argc < 2) {
        return 1;
    }

    byte bytes[0x100];
    int ret {hexStringToBytes(argv[1], bytes, 0x100)};
    if (ret < 0) {
        return 1;
    }
    std::cout << "number of bytes: " << ret << "\n" << std::hex;
    for (int i {0}; i < ret; ++i) {
        if (bytes[i] < 0x10) {
            std::cout << "0";
        }
        std::cout << (bytes[i] & 0xff);
    }
    std::cout << "\n";

    return 0;
}

- Willem van Ketwich · Answer 7

Very similar to the other answers; this is what I went with:

#include <cstdint>
#include <cstdlib>

typedef uint8_t BYTE;

// Returns a malloc'd buffer of ArrayLength / 2 bytes; the caller must free() it.
// A trailing unpaired digit (odd ArrayLength) is ignored.
BYTE* ByteUtils::HexStringToBytes(BYTE* HexString, int ArrayLength)
{
  BYTE* returnBytes = (BYTE*) malloc(ArrayLength / 2);
  if (returnBytes == NULL)
    return NULL;
  int j = 0;

  // Step through the input one digit pair at a time.
  for (int i = 0; i + 1 < ArrayLength; i += 2)
  {
    int valueHigh = ByteUtils::HexAsciiToDec(HexString[i]);
    int valueLow  = ByteUtils::HexAsciiToDec(HexString[i + 1]);
    returnBytes[j++] = (BYTE)(valueHigh * 16 + valueLow);
  }
  return returnBytes;
}

int ByteUtils::HexAsciiToDec(int value)
{
  if(value > 47 && value < 58)   // '0'..'9' (58 is ':', which is not a digit)
  {
    value -= 48;
  }
  else if(value > 96 && value < 103)
  {
    value -= 97;
    value += 10;
  }
  else if(value > 64 && value < 71)
  {
    value -= 65;
    value += 10;
  }
  else
  {
    value = 0;
  }
  return value;
}

- dontsov · Answer 8

Input: "303132", output: "012". The input string may have odd or even length.

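#include <stdexcept>
#include <string>

char char2int(char input)
{
    if (input >= '0' && input <= '9')
        return input - '0';
    if (input >= 'A' && input <= 'F')
        return input - 'A' + 10;
    if (input >= 'a' && input <= 'f')
        return input - 'a' + 10;

    throw std::runtime_error("Incorrect symbol in hex string");
}

std::string hex2str(const std::string &hex)
{
    std::string out;
    out.resize(hex.size() / 2 + hex.size() % 2);

    std::string::const_iterator it = hex.begin();
    std::string::iterator out_it = out.begin();
    // Odd length: the first digit stands alone as the low nibble of the first byte.
    if (hex.size() % 2 != 0) {
        *out_it++ = char2int(*it++);
    }

    // Read the two digits in separate statements: using *it++ and *it in one
    // expression has no guaranteed evaluation order.
    while (it != hex.end()) {
        char high = char2int(*it++);
        char low = char2int(*it++);
        *out_it++ = static_cast<char>(high << 4 | low);
    }

    return out;
}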

- Zephyr · Answer 9

#include <cctype>
#include <cstdlib>
#include <string>

// Appends the bytes decoded from data to buffer. Returns false if data is
// empty, has an odd length, or contains a non-hex character; in the last case
// buffer may already hold the bytes decoded before the error.
static bool Hexadec2xdigit(const std::string& data, std::string& buffer)
{
    if (data.empty() || data.size() % 2 != 0)
    {
        return false;
    }

    for (std::size_t i = 0; i < data.size(); i += 2)
    {
        if (!std::isxdigit(static_cast<unsigned char>(data[i])) ||
            !std::isxdigit(static_cast<unsigned char>(data[i + 1])))
        {
            return false;
        }

        // Copy the pair into a small null-terminated buffer for strtoul.
        // (Reading it through a uint16_t pointer would depend on alignment
        // and byte order.)
        const char pair[3] = { data[i], data[i + 1], '\0' };
        buffer += static_cast<char>(std::strtoul(pair, NULL, 16));
    }

    return true;
}

This code does not involve much copying.

- Christophe · Answer 10

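The difficulty in converting hex to char is that hex digits come in pairs, e.g. 3132 or A0FF, so an even number of hex digits is assumed. In practice, however, there may be an odd number of digits, such as 332 and AFF, which should be read as 0332 and 0AFF.

I propose an improvement to Niels Keurentjes's hex2bin() function. First we count the number of valid hex digits. Since we have to count anyway, let's check the buffer size at the same time.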

void hex2bin(const char* src, char* target, size_t size_target)
{
    size_t countdgts=0;    // count hex digits
    for (const char *p=src; *p && isxdigit((unsigned char)*p); p++)
        countdgts++;
    // (countdgts+1)/2 data bytes plus one for the terminating null
    if ((countdgts+1)/2+1>size_target)
        throw std::length_error("Risk of buffer overflow");   // needs #include <stdexcept>

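By the way, to use isxdigit() you need #include <cctype>.
Once we know how many digits there are, we can tell whether the first digit is a high digit (all digits pair up) or not (the first digit is unpaired).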

bool ishi = !(countdgts%2);

Then we can loop digit by digit, combining each pair with a binary shift and a binary OR, and toggling the "high" indicator on each iteration:

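    // Stop at the first non-hex character, matching the count above.
    // char2int maps one hex digit to its value 0..15 (not defined in this answer).
    for (*target=0; isxdigit((unsigned char)*src); ishi = !ishi)  {
        char tmp = char2int(*src++);    // hex digit in the 4 low bits
        if (ishi)
            *target = (tmp << 4);   // high: shift by 4
        else *target++ |= tmp;      // low: complete the current byte
    }
    *target=0;    // null-terminate target (if desired)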
}
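Put together, the steps above (count the digits, decide whether the first digit is a high or a low nibble, then toggle) can be sketched as a single function. This is an illustration, not the answer's own code: `hexDigitsToBytes` is a hypothetical name, and it returns a `std::vector` instead of writing into a caller-supplied buffer, so no size check is needed.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decode the leading run of hex digits in src.
// An odd digit count is treated as if a '0' were prepended.
std::vector<std::uint8_t> hexDigitsToBytes(const std::string& src)
{
    // Step 1: count the valid hex digits at the start of the string.
    std::size_t count = 0;
    while (count < src.size() && std::isxdigit(static_cast<unsigned char>(src[count])))
        ++count;

    std::vector<std::uint8_t> out;
    out.reserve((count + 1) / 2);

    // Step 2: with an odd count the first digit is a lone low nibble.
    bool isHigh = (count % 2 == 0);

    // Step 3: combine the nibbles, toggling the indicator on every digit.
    std::uint8_t current = 0;
    for (std::size_t i = 0; i < count; ++i, isHigh = !isHigh) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        const int value = std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10;
        if (isHigh) {
            current = static_cast<std::uint8_t>(value << 4);
        } else {
            out.push_back(static_cast<std::uint8_t>(current | value));
            current = 0;
        }
    }
    return out;
}
```

With this sketch, "332" decodes to the bytes 0x03 0x32 and "AFF" to 0x0A 0xFF, matching the reading of odd-length input described above.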