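I'm experimenting with zero-sized types (ZSTs) because I'm curious how they are actually implemented under the hood. Given that ZSTs require no memory and that taking a raw pointer is a safe operation, I'm interested in which raw pointers I get from different kinds of ZST "allocations", and in how strange (for safe Rust) the results are.
My first attempt (test_stk.rs) takes const pointers to a few on-stack ZST instances: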
struct Empty;
struct EmptyAgain;

fn main() {
    let stk_ptr: *const Empty = &Empty;
    let stk_ptr_again: *const EmptyAgain = &EmptyAgain;
    let nested_stk_ptr = nested_stk();
    println!("Pointer to on-stack Empty: {:?}", stk_ptr);
    println!("Pointer to on-stack EmptyAgain: {:?}", stk_ptr_again);
    println!("Pointer to Empty in nested frame: {:?}", nested_stk_ptr);
}

fn nested_stk() -> *const Empty {
    &Empty
}
Compiling and running this gives the following result:
$ rustc test_stk.rs -o test_stk
$ ./test_stk
Pointer to on-stack Empty: 0x55ab86fc6000
Pointer to on-stack EmptyAgain: 0x55ab86fc6000
Pointer to Empty in nested frame: 0x55ab86fc6000
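A brief look at the process memory map showed that 0x55ab86fc6000 is not actually a stack allocation but the very beginning of the .rodata section. That seems logical: the compiler pretends that there is a single zero-sized value for each ZST, known at compile time, and each of these values resides in .rodata, just as compile-time constants do.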
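The behavior above is consistent with rvalue static promotion, where a shared reference to a constant expression is given a `'static` location instead of a stack slot. The following is a minimal sketch of that idea, not a statement about where the linker places the value; the function name `promoted` is hypothetical and used only for illustration.

```rust
struct Empty;

// Compiles because `&Empty` is a shared reference to a constant
// expression, which the compiler promotes to a `'static` location.
fn promoted() -> &'static Empty {
    &Empty
}

// By contrast, the mutable version is rejected by the borrow checker,
// because `&mut Empty` refers to a temporary and is not promoted:
// fn not_promoted() -> &'static mut Empty { &mut Empty }

fn main() {
    let p: *const Empty = promoted();
    println!("Promoted Empty lives at {:?}", p);
}
```

The printed address depends on the platform and build, so no particular value should be expected.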
The second attempt uses boxed ZSTs (test_box.rs):
struct Empty;
struct EmptyAgain;

fn main() {
    let ptr = Box::into_raw(Box::new(Empty));
    let ptr_again = Box::into_raw(Box::new(EmptyAgain));
    let nested_ptr = nested_box();
    println!("Pointer to boxed Empty: {:?}", ptr);
    println!("Pointer to boxed EmptyAgain: {:?}", ptr_again);
    println!("Pointer to boxed Empty in nested frame: {:?}", nested_ptr);
}

fn nested_box() -> *mut Empty {
    Box::into_raw(Box::new(Empty))
}
Running this snippet gives:
$ rustc test_box.rs -o test_box
$ ./test_box
Pointer to boxed Empty: 0x1
Pointer to boxed EmptyAgain: 0x1
Pointer to boxed Empty in nested frame: 0x1
Some quick debugging showed that this is how the allocator handles zero-sized types (Rust's liballoc/alloc.rs):
unsafe fn exchange_malloc(size: usize, align: usize) -> *mut u8 {
    if size == 0 {
        align as *mut u8
    } else {
        // ...
    }
}
According to the Nomicon, the minimum possible alignment is 1, so for ZSTs the box operator calls exchange_malloc(0, 1), and the resulting address is 0x1.
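The "alignment as address" rule can be observed through public API alone, without touching the internal exchange_malloc. The sketch below assumes a second ZST with a larger alignment; `AlignedEmpty` and `boxed_addr` are hypothetical names introduced for illustration. On the toolchains I'm aware of the printed addresses equal the alignment, but the only property relied on here is that the pointer is non-null and suitably aligned.

```rust
use std::mem::align_of;
use std::ptr::NonNull;

struct Empty;

// Hypothetical ZST with a larger alignment requirement.
#[repr(align(16))]
struct AlignedEmpty;

/// Boxes `value` and returns the address of the box as an integer.
fn boxed_addr<T>(value: T) -> usize {
    let raw = Box::into_raw(Box::new(value));
    let addr = raw as usize;
    // Rebuild the Box so the value is dropped normally.
    unsafe { drop(Box::from_raw(raw)) };
    addr
}

fn main() {
    println!("align_of Empty:        {}", align_of::<Empty>());
    println!("align_of AlignedEmpty: {}", align_of::<AlignedEmpty>());
    println!("boxed Empty:           {:#x}", boxed_addr(Empty));
    println!("boxed AlignedEmpty:    {:#x}", boxed_addr(AlignedEmpty));
    println!("dangling Empty:        {:?}", NonNull::<Empty>::dangling());
    println!("dangling AlignedEmpty: {:?}", NonNull::<AlignedEmpty>::dangling());
}
```

If the boxed address for `AlignedEmpty` differs from the one for `Empty`, that suggests the value is derived from the type's alignment rather than being a fixed sentinel.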
After noticing that into_raw() returns a mutable pointer, I decided to retry the previous on-stack test with mutable pointers (test_stk_mut.rs):
struct Empty;
struct EmptyAgain;

fn main() {
    let stk_ptr: *mut Empty = &mut Empty;
    let stk_ptr_again: *mut EmptyAgain = &mut EmptyAgain;
    let nested_stk_ptr = nested_stk();
    println!("Pointer to on-stack Empty: {:?}", stk_ptr);
    println!("Pointer to on-stack EmptyAgain: {:?}", stk_ptr_again);
    println!("Pointer to Empty in nested frame: {:?}", nested_stk_ptr);
}

fn nested_stk() -> *mut Empty {
    &mut Empty
}
Running this prints the following:
$ rustc test_stk_mut.rs -o test_stk_mut
$ ./test_stk_mut
Pointer to on-stack Empty: 0x7ffc3817b5e0
Pointer to on-stack EmptyAgain: 0x7ffc3817b5f0
Pointer to Empty in nested frame: 0x7ffc3817b580
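It turns out that this time I got real stack-allocated values, each with its own address! When I declared them sequentially (test_stk_seq.rs), I found that each value occupies eight bytes: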
struct Empty;

fn main() {
    let mut stk1 = Empty;
    let mut stk2 = Empty;
    let mut stk3 = Empty;
    let mut stk4 = Empty;
    let mut stk5 = Empty;
    let stk_ptr1: *mut Empty = &mut stk1;
    let stk_ptr2: *mut Empty = &mut stk2;
    let stk_ptr3: *mut Empty = &mut stk3;
    let stk_ptr4: *mut Empty = &mut stk4;
    let stk_ptr5: *mut Empty = &mut stk5;
    println!("Pointer to on-stack Empty: {:?}", stk_ptr1);
    println!("Pointer to on-stack Empty: {:?}", stk_ptr2);
    println!("Pointer to on-stack Empty: {:?}", stk_ptr3);
    println!("Pointer to on-stack Empty: {:?}", stk_ptr4);
    println!("Pointer to on-stack Empty: {:?}", stk_ptr5);
}
Running it:
$ rustc test_stk_seq.rs -o test_stk_seq
$ ./test_stk_seq
Pointer to on-stack Empty: 0x7ffdba303840
Pointer to on-stack Empty: 0x7ffdba303848
Pointer to on-stack Empty: 0x7ffdba303850
Pointer to on-stack Empty: 0x7ffdba303858
Pointer to on-stack Empty: 0x7ffdba303860
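Rather than reading the spacing off the hex output, the gaps between consecutive locals can be computed directly. This is only a measurement sketch: `zst_gaps` is a hypothetical helper, and the values it returns are expected to vary with the compiler version, target, and optimization level, so no output is shown.

```rust
struct Empty;

/// Returns the absolute address gaps between consecutive ZST locals.
fn zst_gaps() -> Vec<usize> {
    let mut a = Empty;
    let mut b = Empty;
    let mut c = Empty;
    let addrs = [
        &mut a as *mut Empty as usize,
        &mut b as *mut Empty as usize,
        &mut c as *mut Empty as usize,
    ];
    // Pairwise distance between neighbouring addresses.
    addrs.windows(2).map(|w| w[0].abs_diff(w[1])).collect()
}

fn main() {
    println!("Gaps between on-stack ZSTs: {:?}", zst_gaps());
}
```

Comparing the result of a debug build with an optimized build (for example `rustc -O`) is one way to see whether the spacing is a property of the type or of code generation.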
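So, here are the things I can't understand:
Why does boxed-ZST allocation use the dumb 0x1 address instead of something more meaningful, as in the case of "on-stack" values?
Why is there a need to allocate real space for on-stack ZST values when there are mutable raw pointers to them?
Why are exactly eight bytes used for a mutable on-stack allocation? Should I treat this size as "0 bytes of actual type size + 8 bytes of alignment"?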
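For the last question, the type's own size and alignment can be checked directly with std::mem. The sketch below uses only the Empty type from the tests above; it reports what the type system considers the layout to be, which is separate from how much stack a particular build happens to reserve.

```rust
use std::mem::{align_of, size_of, size_of_val};

struct Empty;

fn main() {
    // A field-less struct has size 0 and alignment 1.
    println!("size_of::<Empty>()  = {}", size_of::<Empty>());  // prints 0
    println!("align_of::<Empty>() = {}", align_of::<Empty>()); // prints 1

    // An array of ZSTs is itself zero-sized.
    let many = [Empty, Empty, Empty, Empty];
    println!("size_of_val(&many)  = {}", size_of_val(&many));  // prints 0
}
```

Since the reported alignment is 1, the eight-byte spacing seen on the stack does not come from the alignment requirement of Empty itself.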
Same as NonNull::dangling(), but there is some explanation here. - rodrigo