Is it legal to delete a const pointer (as opposed to a pointer to const)?

4

Edit 1: The code now follows the Rule of Five. The problem persists.
Edit 2: Only void* is now passed to printf's %p. The problem persists.
Edit 3: tl;dr: This is a GCC bug.

While tracking down a segmentation fault in some code, I noticed that with a line like

    Lexer* const lexer_;

for an attribute (data member), the code crashes, whereas without the const it works fine.

Is const allowed in the position shown above?

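For reference, below is C-Reduce'd C++ code from a larger program that exhibits the problem. Unfortunately, at some point C-Reduce starts obfuscating identifiers into single letters, so I stopped reducing there and tidied up the code as far as I could. I am compiling with g++ v11.3 on x86_64 Linux.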

> g++ main.cpp -o main.x -fsanitize=address -Werror=all -Werror=extra

When run, it prints:
0x602000000010 = new Lexer
0x602000000030 = new Token
0x7ffca90b51f0 = new Expression
0x7ffca90b51f0 = start delete Expression
0x602000000010 = start delete Lexer
0x602000000030 = delete Token
0x602000000010 = done delete Lexer
=================================================================
==1232849==ERROR: AddressSanitizer: heap-use-after-free on address 0x602000000030 at pc 0x556fc889953d bp 0x7ffca90b5190 sp 0x7ffca90b5180
READ of size 8 at 0x602000000030 thread T0
    #0 0x556fc889953c in ExpressionParser::Expression::~Expression() (.../main.x+0x153c)
    ...
0x602000000030 is located 0 bytes inside of 8-byte region [0x602000000030,0x602000000038)
freed by thread T0 here:
    #0 0x7f5258f6f22f in operator delete(void*, unsigned long) .../libsanitizer/asan/asan_new_delete.cpp:172
    #1 0x556fc889965f in ExpressionParser::Lexer::~Lexer() (.../main.x+0x165f)
    ...
previously allocated by thread T0 here:
    #0 0x7f5258f6e1c7 in operator new(unsigned long) .../libsanitizer/asan/asan_new_delete.cpp:99
    #1 0x556fc8899588 in ExpressionParser::Lexer::tokenize() (.../main.x+0x1588)
    ...
SUMMARY: AddressSanitizer: heap-use-after-free (/home/john/own/C/mp-gmp/const-problem/main-2.x+0x153c) in ExpressionParser::Expression::~Expression()
...

With -D CONST=, which makes lexer_ non-const, the code runs fine and prints:
0x602000000010 = new Lexer
0x602000000030 = new Token
0x7ffff44937e0 = new Expression
0x7ffff44937e0 = start delete Expression
0x602000000010 = start delete Lexer
0x602000000030 = delete Token
0x602000000010 = done delete Lexer
0x7ffff44937e0 = end delete Expression

Another thing that works is declaring virtual ~Lexer(); but that should not be necessary, since Lexer has no virtual methods.

Source

#include <cstdio>

#ifndef CONST
#define CONST const
#endif

class ExpressionParser
{
public:
    class Token;
    class Lexer;
    class Expression
    {
        friend ExpressionParser;
        Expression (Token *token) : expression_(token)
        {
            printf ("%p = new Expression\n", (void*) this);
        }
        Expression (const Expression&) = delete;
        Expression (Expression&&) = delete;
        void operator= (const Expression&) = delete;
        void operator= (Expression&&) = delete;
        ~Expression();
        Token *expression_;
    };
    static void eval();
};

using EP = ExpressionParser;

class EP::Lexer
{
public:
    Token *tokens_ = nullptr;
    Lexer()
    {
        printf ("%p = new Lexer\n", (void*) this);
    }
    Lexer (const Lexer&) = delete;
    Lexer (Lexer&&) = delete;
    void operator= (const Lexer&) = delete;
    void operator= (Lexer&&) = delete;
    ~Lexer();
    void tokenize();
};

class EP::Token
{
    friend ExpressionParser;
    Lexer * CONST lexer_;
    Token (Lexer *lexer) : lexer_(lexer)
    {
        printf ("%p = new Token\n", (void*) this);
    }
    Token (const Token&) = delete;
    Token (Token&&) = delete;
    void operator= (const Token&) = delete;
    void operator= (Token&&) = delete;
    ~Token()
    {
        printf ("%p = delete Token\n", (void*) this);
    }
};

void EP::eval()
{
    Lexer *lexer = new Lexer();
    lexer->tokenize();
    (void) Expression (lexer->tokens_);
}

EP::Expression::~Expression()
{
    printf ("%p = start delete Expression\n", (void*) this);
    delete expression_->lexer_;
    printf ("%p = end delete Expression\n", (void*) this);
}

void EP::Lexer::tokenize()
{
    tokens_= new Token (this);
}

EP::Lexer::~Lexer()
{
    printf ("%p = start delete Lexer\n", (void*) this);
    delete tokens_;
    printf ("%p = done delete Lexer\n", (void*) this);
}

int main (void)
{
    ExpressionParser::eval();
}

2 Answers

1
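According to the GCC C++ maintainers (as apple apple pointed out in a comment), this is a GCC bug that has been known since 2012 / v4.6, namely PR52339. It already occurs in v4.0, and it can also be reproduced with current master (future v14) or with v11.3. The cause is that the operand expression of the final delete is evaluated more than once, which conflicts with [expr.delete]: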

4 The cast-expression in a delete-expression shall be evaluated exactly once.

Test case:
struct Lexer;

struct Token
{
    Lexer* const lexer_;
    Token (Lexer *l) : lexer_(l) {}
    ~Token() = default;

    Token() = delete;
    Token (const Token&) = delete;
    Token (Token&&) = delete;
    void operator= (const Token&) = delete;
    void operator= (Token&&) = delete;
};

struct Lexer
{
    Token *token_;
    Lexer() = default;
    ~Lexer() { delete token_; }

    Lexer (const Lexer&) = delete;
    Lexer (Lexer&&) = delete;
    void operator= (const Lexer&) = delete;
    void operator= (Lexer&&) = delete;
};

int main()
{
    Lexer *lexer = new Lexer();
    Token *token = new Token (lexer);
    lexer->token_ = token;
    delete token->lexer_;
    // delete lexer; // is OK
}

Command line:

$ g++ main-3.cpp -O2 && ./a.out

It can also be triggered with -O0 -m32, though.


0

Since the question has been reopened, I'll post my comment as an answer:

Simplified version on godbolt: https://godbolt.org/z/n1qzrWsdq

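When delete token->lexer; is executed, the destructor ~Lexer() deletes its token, which in this case is the very token used in the delete statement. At that point you are deleting a pointer that is still in use, which results in undefined behavior.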

This can be seen in the assembly generated by a non-optimized build:

        mov     rax, QWORD PTR [rbp-16]
        mov     rax, QWORD PTR [rax]
        mov     rdi, rax
        call    Lexer::~Lexer() [complete object destructor]
        mov     rax, QWORD PTR [rbp-16]
        mov     rax, QWORD PTR [rax]
        mov     esi, 8
        mov     rdi, rax
        call    operator delete(void*, unsigned long)

(where [rbp-16] is the address of token). You can see that token is reloaded after the call to ~Lexer().


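As far as I understand, since delete is an operator and not a function call, the deletion is not sequenced relative to the dereference of token. That sounds a bit crazy, because in order to delete anything, token must already have been dereferenced. But the compiler is free to dereference it whenever it likes. Another odd thing is that this only happens with Lexer* const lexer; and not without the const, and g++ and clang do something similar. Can you see why the compiler would need to dereference token again after ~Lexer()? - emacs drives me nuts
There is no need for it, but as you said, the compiler is free to do so at any time. In debug mode (or without optimization) it is very common for the compiler to generate code like this. Once you turn on optimization, it has already detected the UB, so it can do whatever it wants. If you look at the code compiled with -O1 (or higher), it generates only one new but two deletes. If you remove the const, g++ compiles the code to mov eax, 0; ret, because it can, as there are no side effects. Oddly, though, it does not do that in the const case. - ChrisMM
This is a GCC bug, see the other answer. - emacs drives me nuts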
