Examples of using reinterpret_cast that do not trigger undefined behavior

24

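Reading https://en.cppreference.com/w/cpp/language/reinterpret_cast, I wonder what the non-UB use cases of reinterpret_cast are in practice.

The description above lists many cases in which it is legal to convert a pointer to some other type and then back again. That seems to be of little use, though. Accessing an object through a reinterpret_cast pointer is mostly UB because it violates strict aliasing (and/or alignment), except for access through a char*/byte* pointer.

One helpful exception is casting an integer constant to a pointer and accessing the target object, which is useful for manipulating hardware registers (on a µC).

Can anyone give some real use cases of reinterpret_cast that are relevant in practice?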


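I would not cast registers/integers to objects. This may give you some ideas for a better approach: Handling a Family of Hardware Devices with a Single Implementation - Ben Saks - CppCon 2021 - Pepijn Kramer
By the way, converting void * to T * can be done with static_cast<T*>( void_pointer ) instead of reinterpret_cast, so in practice the cases that truly require reinterpret_cast should be few. - Louis Go
3 Answers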

32
Some examples that come to mind:
  • Reading/writing the object representation of a trivially-copyable object, for example to write the byte representation of the object to a file and read it back:

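    // T must be a trivially-copyable object type!
    T obj;
    
    //...
    
    std::ofstream out_file(/*...*/);
    // write() takes a const char*, so cast the address of obj, not obj itself
    out_file.write(reinterpret_cast<const char*>(&obj), sizeof(obj));
    
    //...
    
    std::ifstream in_file(/*...*/);
    // read() fills the object representation of obj byte by byte
    in_file.read(reinterpret_cast<char*>(&obj), sizeof(obj));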
    

    Technically it is currently not really specified how accessing the object representation should work aside from directly passing on the pointer to memcpy et al., but there is a current proposal for the standard to clarify at least how reading (but not writing) individual bytes in the object representation should work, see https://github.com/cplusplus/papers/issues/592.

  • Reinterpreting between signed and unsigned variants of the same integral type, especially char and unsigned char for strings, which may be useful if an API expects an unsigned string.

    auto str = "hello world!";
    auto unsigned_str = reinterpret_cast<const unsigned char*>(str);
    

    While this is allowed by the aliasing rules, technically pointer arithmetic on the resulting unsigned_str pointer is currently not defined by the standard. But I don't really see why it isn't.

  • Accessing objects nested within a byte buffer (especially on the stack):

    alignas(T) std::byte buf[42*sizeof(T)];
    new(buf+sizeof(T)) T;
    
    // later
    
    auto ptr = std::launder(reinterpret_cast<T*>(buf + sizeof(T)));
    

    This works as long as the address buf + sizeof(T) is suitably aligned for T, the buffer has type std::byte or unsigned char, and obviously is of sufficient size. The new expression also returns a pointer to the object, but one might not want to store that for each object. If all objects stored in the buffer are the same type, it would also be fine to use pointer arithmetic on a single such pointer.

  • Obtaining a pointer to a specific memory address. Whether and for which address values this is possible is implementation-defined, as is any possible use of the resulting pointer:

    auto ptr = reinterpret_cast<void*>(0x12345678);
    
  • Casting a void* returned by dlsym (or a similar function) to the actual type of a function located at that address. Whether this is possible and what exactly the semantics are is again implementation-defined:

    // my_func is a C linkage function with type `void()` in `my_lib.so`
    
    // error checking omitted!
    
    auto lib = dlopen("my_lib.so", RTLD_LAZY);
    
    auto my_func = reinterpret_cast<void(*)()>(dlsym(lib, "my_func"));
    
    my_func();
    
  • Various round-trip casts may be useful to store pointer values or for type erasure.

    Round-trip of an object pointer through void* requires only static_cast on both sides and reinterpret_cast on object pointers is defined in terms of a two-step static_cast through (cv-qualified)void* anyway.

    Round-trip of an object pointer through std::uintptr_t, std::intptr_t, or another integral type large enough to hold all pointer values may be useful for having a representation of the pointer value that can be serialized (although I am not sure how often that really is useful). It is however implementation-defined whether any of these types exist. Typically they will, but exotic platforms where memory addresses cannot be represented as single integer values or all integer types are too small to cover the address space are permitted by the standard. I would also be wary of pointer analysis of the compiler causing issues depending on how you use this, see e.g. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752 as just the first bug report I found. The standard isn't particularly clear on how the integer -> pointer cast is supposed to work, especially when considering pointer provenance. See for example https://open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2318r1.pdf and other documents linked therein.

    Round-trip of a function pointer through any arbitrary function pointer type (likely void(*)()) may be useful to erase the type from arbitrary functions, although again I am not sure how often that really is useful. void* type-erased arguments are common in C APIs when a function just passes through data, but type-erased function pointers like that are less common.

    A round-trip cast of a function pointer through void* may be used in a similar way as above, as dlsym essentially does with the additional dynamic library complication. This is conditionally-supported only, although it is effectively required for POSIX systems. (It is not generally supported, because object and function pointer values may have distinct representations, size, alignment etc. on some more exotic platforms.)
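
The round-trip cases above come without code, so here is a minimal sketch of three of them. The names add_one, through_integer, erase, restore and demo are made up for illustration, and the sketch assumes an implementation that provides std::uintptr_t (which, as noted, is optional):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical function, used only for illustration.
int add_one(int x) { return x + 1; }

using int_fn = int (*)(int);   // the real type of add_one
using erased_fn = void (*)();  // arbitrary function pointer type used for erasure

// Object pointer -> integer -> object pointer.
// Assumes the implementation provides std::uintptr_t (it is optional).
int* through_integer(int* p) {
    const std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<int*>(bits);
}

// Function pointer -> erased function pointer type. The erased pointer must
// not be called directly; it has to be cast back to its original type first.
erased_fn erase(int_fn fn) { return reinterpret_cast<erased_fn>(fn); }
int_fn restore(erased_fn fn) { return reinterpret_cast<int_fn>(fn); }

int demo() {
    int value = 41;

    // Object pointer -> void* -> object pointer needs only static_cast.
    void* erased_obj = &value;
    int* restored = static_cast<int*>(erased_obj);
    assert(restored == &value);

    int* from_bits = through_integer(restored);
    assert(from_bits == &value);

    int_fn fn = restore(erase(&add_one));
    assert(fn == &add_one);
    return fn(*from_bits);  // calls add_one(41)
}
```

In each case the pointer is only used after being cast back to its original type, which is what keeps these round trips within defined behavior.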


10

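Another real-world example of using reinterpret_cast is the family of networking functions that take a struct sockaddr * parameter, namely recvfrom(), bind(), and accept().

For example, here is the declaration of the recvfrom function: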

ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags,
                 struct sockaddr *src_addr, socklen_t *addrlen);

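Its fifth parameter is declared as struct sockaddr *src_addr, which serves as a generic interface that accepts a pointer to an address-family-specific structure (for example sockaddr_in or sockaddr_in6). Beej's Guide to Network Programming says:
"In memory, the struct sockaddr_in and struct sockaddr_in6 share the same beginning structure as struct sockaddr, and you can freely cast the pointer of one type to the other without any harm, except the possible end of the universe. Just kidding on that end-of-the-universe thing... if the universe does end when you cast a struct sockaddr_in* to a struct sockaddr*, I promise you it's pure coincidence and you shouldn't even worry about it. So, with that in mind, remember that whenever a function says it takes a struct sockaddr* you can cast your struct sockaddr_in*, struct sockaddr_in6*, or struct sockaddr_storage* to that type with ease and safety."
For example: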
int fd; // file descriptor value obtained elsewhere
struct sockaddr_in addr {};
socklen_t addr_len = sizeof(addr);
std::vector<std::uint8_t> buffer(4096);
    
const int bytes_recv = recvfrom(fd, buffer.data(), buffer.size(), 0,
                                reinterpret_cast<sockaddr*>(&addr), &addr_len);
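
If the address family is not known in advance, one cautious pattern is to receive into a sockaddr_storage (still passing reinterpret_cast<sockaddr*>(&storage) to recvfrom) and then copy the bytes into the concrete structure with std::memcpy rather than reading through a cast pointer. The following is only a sketch for a POSIX system; port_of, fake_received_ipv4 and self_check are hypothetical helpers, and fake_received_ipv4 merely stands in for the data that recvfrom would write:

```cpp
#include <arpa/inet.h>   // htons, ntohs
#include <netinet/in.h>  // sockaddr_in, sockaddr_in6
#include <sys/socket.h>  // sockaddr_storage, AF_INET, AF_INET6
#include <cassert>
#include <cstdint>
#include <cstring>       // std::memcpy

// Hypothetical helper: extracts the port (host byte order) from a filled-in
// sockaddr_storage. Returns false for address families it does not handle.
bool port_of(const sockaddr_storage& storage, std::uint16_t& port) {
    if (storage.ss_family == AF_INET) {
        sockaddr_in v4;
        std::memcpy(&v4, &storage, sizeof(v4));  // copy bytes, no aliasing access
        port = ntohs(v4.sin_port);
        return true;
    }
    if (storage.ss_family == AF_INET6) {
        sockaddr_in6 v6;
        std::memcpy(&v6, &storage, sizeof(v6));
        port = ntohs(v6.sin6_port);
        return true;
    }
    return false;
}

// Stand-in for what recvfrom would write into the storage (assumption:
// an IPv4 peer). Used here only so the sketch can run without a socket.
sockaddr_storage fake_received_ipv4(std::uint16_t port) {
    sockaddr_in v4{};
    v4.sin_family = AF_INET;
    v4.sin_port = htons(port);
    sockaddr_storage storage{};
    std::memcpy(&storage, &v4, sizeof(v4));
    return storage;
}

void self_check() {
    std::uint16_t port = 0;
    assert(port_of(fake_received_ipv4(8080), port) && port == 8080);
}
```

The reinterpret_cast is then confined to the call into the sockets API, and the C++ code itself only ever reads objects of their own type.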

1
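Note that, according to the standard, using the result of this cast as a sockaddr is undefined behavior. If you access it manually, the compiler may miscompile the code. It is safe here only because recvfrom is specified to be used this way, and it is up to its own implementation to guarantee that the call works with the cast pointer (in this case by simply forwarding to a system call). - user17732522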
1
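@user17732522: Alternatively, code written with these structures could specify that it is only suitable for implementations and configurations that extend the language to support such constructs, recognizing that the C++ Standard seeks to specify baseline minimum requirements for conforming implementations and expressly declines to define any category of conformance for programs. - supercat
@user17732522: The sendto system call could internally cast the pointer to char* and inspect the data, which should not be UB. In recvfrom this is not possible, because writing the data through a char* is UB. So the write has to be done in the kernel (and therefore outside the restrictions of C++). - wimalopaan
@user17732522: In C++ it is allowed to cast the pointer itself, but not to access the object through that pointer (in C++). In C, however, this is allowed if it is cast to a pointer to an integer type (e.g. char*). So inside a recvfrom implemented in C this should be fine. - wimalopaan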

3
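The most common use of reinterpret_cast is in dialects that extend the C++ language by specifying behavior in more cases than the Standard mandates. Although it would be conforming for an implementation to mishandle code that violates the aliasing rules, the Standard does not require it to do so. In the words of the C++ Standard itself: "Although this document states only requirements on C++ implementations, those requirements are often easier to understand if they are phrased as requirements on programs, parts of programs, or execution of programs. ... If a program contains a violation of a rule for which no diagnostic is required, this document places no requirement on implementations with respect to that program." Almost all implementations can extend the semantics of reinterpret_cast to cover the representations of various objects, regardless of whether the Standard requires it. In fact, reinterpret_cast provides a uniform syntax for the non-portable constructs that rely on such extensions, and that use is far more widely applicable than most of its "portable" uses.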
