How does the compiler generate code for virtual function calls?

Question

[Image: the asker's diagram of the CAT and SmallCat objects and their vtbls; not reproduced here]

CAT *p;
...
p->speak();
...

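Some books say the compiler translates p->speak() into:

(*p->vptr[i])(p); //i is the idx of speak in the vtbl

My question is: the real type of the object p points to is not known at compile time, so the compiler cannot know which vptr or vtbl will be used. How, then, does the compiler generate the correct code?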

[Edited]

For example:

void foo(CAT* c)
{
    c->speak();
    // if c points to a SmallCat,
    // this should translate to (*c->vptr[i])(c); // use the vtbl at 0x1234
    // if c points to a CAT,
    // this should translate to (*c->vptr[i])(c); // use the vtbl at 0x5678

    // ps and pc are both CAT*, so how can the compiler generate
    // different code for them at compile time?
}

...
CAT *ps,*pc;
ps = new SmallCat;  //suppose SmallCat's vtbl address is 0x1234;
pc = new CAT;       //suppose CAT's vtbl address is 0x5678;
...
foo(ps);
foo(pc);
...

Any ideas? Thanks.

- camino

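Ultimately, this is a compiler implementation detail... - πάντα ῥεῖ

@camino Now your new arrows make the diagram look right! - Sergey Kalinichenko

4 Answers

It is true that the vptr or vtbl to use cannot be known at the point of the method call. But when an object is constructed, the type of the object being constructed is known, and the compiler generates code in the constructor that initializes the vptr to point to the vtbl of the corresponding class. All later virtual method calls go through this vptr and therefore reach the method in the correct vtbl.

For more details on how this initialization works for base objects (when several constructors are called in sequence), see this answer to a similar question.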
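
The effect of each constructor installing its own class's vtbl can be observed directly. The sketch below uses hypothetical classes Base and Derived (they are not from the question) and records which override of name() runs inside each constructor. On a typical vtable-based implementation this corresponds to the vptr being set first by Base's constructor and then again by Derived's; the language itself only specifies the resulting behavior, not the mechanism.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical classes for illustration only.
struct Base {
    std::vector<std::string> log;  // records which override ran in each constructor
    Base() { log.push_back("in Base ctor: " + name()); }
    virtual ~Base() = default;
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    Derived() { log.push_back("in Derived ctor: " + name()); }
    std::string name() const override { return "Derived"; }
};

// Builds a Derived and returns what each constructor observed.
std::vector<std::string> construction_log() {
    Derived d;
    return d.log;
}

void demo_construction_order() {
    const std::vector<std::string> log = construction_log();
    assert(log.size() == 2);
    assert(log[0] == "in Base ctor: Base");        // Derived::name() is not used yet
    assert(log[1] == "in Derived ctor: Derived");  // now the override is active
}
```

Calling demo_construction_order() from main runs the checks. While Base's constructor is running, the virtual call resolves to Base::name(); once Derived's constructor body runs, the same call reaches Derived::name().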

- user3146587

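This is also why you shouldn't call virtual functions in a constructor. - Alexander Oh

@Alex: There is no reason to avoid it in C++: the behavior is safe (unlike Java, where uninitialized members might be accessed, which is why it is discouraged there). People who object to it in C++ are usually confusing the two languages, and that is a problem with those people, not with the language. - MSalters

@MSalters: People discourage it in C++ not because it is unsafe, but because it is unintuitive and therefore error-prone. - Mooing Duck

@MSalters I think people get confused if they don't know how C++ constructs its objects. In theory it could work the other way round: construct the derived object first and, once the vptr is set, go on to initialize the base class. In general this behavior can be surprising. If you know what the compiler is doing, it is perfectly valid. This is not about causing undefined behavior, it is about confusing other people who may read the code. - Alexander Oh

@Alex: You should know the construction order anyway, not only because of virtual functions but because all members are created in order from base class to derived class. C++ does its best to help you: this-> only gives access to members that have already been constructed. - MSalters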

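The compiler implicitly adds a pointer, called the vptr, to every class that has one or more virtual functions.

You can measure such a class with sizeof and see that it is 4 or 8 bytes larger than expected, depending on sizeof(void*).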
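
A quick way to try the sizeof measurement is to compare two otherwise identical types. This is a minimal sketch; Plain and Polymorphic are made-up names, and the exact overhead is an implementation detail. Mainstream compilers (GCC, Clang, MSVC) add exactly one pointer in this single-inheritance case, but the C++ standard does not require that.

```cpp
#include <cstddef>

// Hypothetical types for illustration: same data member, one has a virtual function.
struct Plain {
    void* data;
    void touch() {}           // ordinary member function, costs nothing per object
};

struct Polymorphic {
    void* data;
    virtual void touch() {}   // first virtual function: object gains a hidden vptr
};

// Extra bytes per object that the virtual function costs on this compiler.
constexpr std::size_t vptr_overhead() {
    return sizeof(Polymorphic) - sizeof(Plain);
}
```

On the compilers named above, vptr_overhead() is expected to equal sizeof(void*).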

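The compiler also adds implicit code to every constructor of such a class, which sets the vptr to point to a table of function pointers (the virtual function table, or V-Table).

When an object is instantiated, its type is named explicitly.

For example: A a(1) or A* p = new B(2).

So inside the constructor, at run time, the vptr can easily be set to point to the correct V-Table.

In the example above:

The vptr of a is set to point to the V-Table of class A.
The vptr of the object p points to is set to point to the V-Table of class B.

By the way, a constructor differs from all other functions in that it can only be called by explicitly naming the object type (which is why a constructor can never be declared virtual).

Here is how the compiler generates the correct code for the virtual call p->speak():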

CAT *p;
...
p = new SuperCat("SaberTooth",2); // p->vptr = SuperCat_Vtable
...
p->speak(); // See pseudo assembly code below

Ax = p               // Get the address of the instance
Bx = p->vptr         // Get the address of the instance's V-Table
Cx = Bx + CAT::speak // Add the index of speak's slot in the V-Table
Dx = *Cx             // Get the address of the appropriate function
Push Ax              // Push the address of the instance into the stack
Push Dx              // Push the address of the function into the stack
CallF                // Save some registers and jump to the beginning of the function

The compiler uses the same number (index) for every speak function in the CAT class hierarchy.
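
The pseudo assembly can be mimicked in ordinary C++ with an explicit table of function pointers. This is only a hand-written approximation of what a compiler might emit; every name in it (Cat, Method, kSpeak, cat_vtbl and so on) is hypothetical. The point it illustrates is that the call site uses a fixed slot index and never names a concrete class.

```cpp
#include <cassert>
#include <string>

// Hand-rolled stand-ins for compiler-generated structures; all names are hypothetical.
struct Cat;
using Method = std::string (*)(Cat*);   // every slot receives the object as "this"

enum Slot { kSpeak = 0, kEat = 1 };      // one fixed index per virtual function

struct Cat {
    const Method* vptr;                  // hidden pointer to the class's table
};

std::string cat_speak(Cat*)       { return "meow"; }
std::string cat_eat(Cat*)         { return "eat"; }
std::string small_cat_speak(Cat*) { return "squeak"; }

// One table per class; the "SmallCat" table replaces speak and reuses eat.
const Method cat_vtbl[]       = { cat_speak,       cat_eat };
const Method small_cat_vtbl[] = { small_cat_speak, cat_eat };

// Compiled once: load the vptr, index the table, call through the pointer.
std::string call_speak(Cat* p) { return p->vptr[kSpeak](p); }
std::string call_eat(Cat* p)   { return p->vptr[kEat](p); }

void demo_manual_dispatch() {
    Cat plain{cat_vtbl};
    Cat small{small_cat_vtbl};   // same layout, different table
    assert(call_speak(&plain) == "meow");
    assert(call_speak(&small) == "squeak");
    assert(call_eat(&small) == "eat");
}
```

Because kSpeak is the same index in both tables, call_speak needs no knowledge of which table the object carries.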

Here is how the compiler generates code for the non-virtual call p->eat():

p->eat(); // See pseudo assembly code below

Ax = p        // Get the address of the instance
Bx = CAT::eat // Get the address of the function
Push Ax       // Push the address of the instance into the stack
Push Bx       // Push the address of the function into the stack
CallF         // Save some registers and jump to the beginning of the function

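Because the address of the eat function is known at compile time, the assembly code is more efficient.

Finally, here is how the vptr is set to the correct V-Table at run time: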

class SmallCat
{
    void* vptr; // implicitly added by the compiler
    ...         // your explicit variables
    SmallCat()
    {
        vptr = (void*)0x1234; // implicitly added by the compiler
        ...                   // Your explicit code
    }
};

When you instantiate CAT* p = new SmallCat(), a new object is created with its vptr = 0x1234.

- barak manos

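At compile time we don't have a real object, so we can't use its vptr to get the corresponding vtbl, right? - camino

True, but as I said, the compiler can add code that sets the vptr to point to the correct V-Table at run time. - barak manos

I've added an example; I hope it makes my question clear. - camino

Thanks. I moved "pc->speak()" into a function; how does the compiler handle it in that case? - camino

Of course! c points to an object of type CAT or SmallCat. When you created that object, its vptr field was set to point to the correct V-Table. See the updated answer. - barak manos

When you write the following (I've replaced all user code with lowercase letters):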

class cat {
public:
    virtual void speak() {std::cout << "meow\n";}
    virtual void eat() {std::cout << "eat\n";}
    virtual void destructor() {std::cout << "destructor\n";}
};

the compiler magically generates all of this (all of my example compiler code is in uppercase):

class cat;
struct CAT_VTABLE_TYPE { //here's the cat's vtable type
    void(*speak)(cat* this); //contains a pointer for each virtual function
    void(*eat)(cat* this);
    void(*destructor)(cat* this);
};
extern const CAT_VTABLE_TYPE CAT_VTABLE; //defined below: one global vtable shared by all cats
class cat { //here's the class you typed
private:
    const CAT_VTABLE_TYPE* vptr; //but the compiler adds this magic member
public:
    cat() :vptr(&CAT_VTABLE) {} //the compiler initializes the vtable ptr
    ~cat() {vptr->destructor(this);} //redirects to the one you coded
    void speak() {vptr->speak(this);} //redirects to the one you coded
    void eat() {vptr->eat(this);} //redirects to the one you coded
};

//Here are the functions you programmed
void DEFAULT_CAT_SPEAK(cat* this) {std::cout << "meow\n";}
void DEFAULT_CAT_EAT(cat* this) {std::cout << "eat\n";}
void DEFAULT_CAT_DESTRUCTOR(cat* this) {std::cout << "destructor\n";}
//and the global cat vtable (shared by all cat objects)
const CAT_VTABLE_TYPE CAT_VTABLE = {
    DEFAULT_CAT_SPEAK, 
    DEFAULT_CAT_EAT, 
    DEFAULT_CAT_DESTRUCTOR};

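Well, that's quite a lot, isn't it? (I actually cheated a little, because I take the address of an object before it is defined, but this way is shorter and easier to follow, even if technically it won't compile.) You can see why they built this into the language. And for comparison, here is SmallCat before: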

class smallcat : public cat {
public:
    virtual void speak() {std::cout << "meow2\n";}
    virtual void destructor() {std::cout << "destructor2\n";}
};

And after:

class smallcat;
//here's the smallcat's vtable type
struct SMALLCAT_VTABLE_TYPE : public CAT_VTABLE_TYPE { 
     //contains no additional virtual functions that cat didn't have
};
extern const SMALLCAT_VTABLE_TYPE SMALLCAT_VTABLE; //defined below: one global vtable shared by all smallcats
class smallcat : public cat { //here's the class you typed
public:
    smallcat() :vptr(&SMALLCAT_VTABLE) {} //the compiler initializes the vtable ptr
    //The other functions already are virtual, nothing additional needed
};
//Here are the functions you programmed
void DEFAULT_SMALLCAT_SPEAK(cat* this) {std::cout << "meow2\n";}
void DEFAULT_SMALLCAT_DESTRUCTOR(cat* this) {std::cout << "destructor2\n";}
//and the global smallcat vtable (shared by all smallcat objects)
const SMALLCAT_VTABLE_TYPE SMALLCAT_VTABLE = {
    DEFAULT_SMALLCAT_SPEAK, 
    DEFAULT_CAT_EAT, //note: eat wasn't overridden
    DEFAULT_SMALLCAT_DESTRUCTOR};

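So, in case that was too much to read: the compiler creates one VTABLE object for each type, which points to the member functions of that particular type, and then stores a pointer to that VTABLE inside every instance.

When you create a smallcat object, the compiler first constructs the cat parent object, which sets vptr to point to the global CAT_VTABLE. Immediately afterwards it constructs the derived smallcat part, which overwrites the vptr member so that it points to the global SMALLCAT_VTABLE.

When you call c->speak(), the compiler generates a call to its own copy of cat::speak (which looks like this->vptr->speak(this);). The vptr member points to either the global CAT_VTABLE or the global SMALLCAT_VTABLE, so that table's speak pointer points to either DEFAULT_CAT_SPEAK (the code you put in cat::speak) or DEFAULT_SMALLCAT_SPEAK (the code you put in smallcat::speak). As a result, this->vptr->speak(this); ends up calling the function of the most derived type, whatever that type is.

All in all, this is admittedly very confusing, because the compiler has to magically rename functions at compile time. In reality it is even more complicated than what I have shown here, because of multiple inheritance.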
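
The construction sequence described above (the base constructor installs its table, then the derived constructor overwrites it) can be written as code that actually compiles. This is a simplified sketch of the idea only: the names CatObj, SmallCatObj, CatVTable and the field spoke_during_base_ctor are hypothetical, the functions return strings instead of printing, and the destructor slot is left out.

```cpp
#include <cassert>
#include <string>

struct CatObj;                                   // hypothetical names throughout
struct CatVTable {
    std::string (*speak)(CatObj* self);
    std::string (*eat)(CatObj* self);
};

std::string cat_speak_impl(CatObj*)      { return "meow"; }
std::string cat_eat_impl(CatObj*)        { return "eat"; }
std::string smallcat_speak_impl(CatObj*) { return "meow2"; }

// One shared, read-only table per type.
const CatVTable kCatVTable      = { cat_speak_impl,      cat_eat_impl };
const CatVTable kSmallCatVTable = { smallcat_speak_impl, cat_eat_impl };

struct CatObj {
    const CatVTable* vptr;
    std::string spoke_during_base_ctor;          // result of speak() mid-construction
    CatObj() : vptr(&kCatVTable) {               // base constructor installs cat's table
        spoke_during_base_ctor = speak();
    }
    std::string speak() { return vptr->speak(this); }  // forwards through the table
    std::string eat()   { return vptr->eat(this); }
};

struct SmallCatObj : CatObj {
    SmallCatObj() {                              // runs after CatObj()
        vptr = &kSmallCatVTable;                 // overwrite with smallcat's table
    }
};

void demo_vptr_overwrite() {
    SmallCatObj s;
    CatObj* c = &s;                              // static type is the base
    assert(s.spoke_during_base_ctor == "meow");  // cat's table was active then
    assert(c->speak() == "meow2");               // smallcat's table is active now
    assert(c->eat() == "eat");                   // slot that was not overridden
}
```

The forwarding functions speak() and eat() are ordinary non-virtual members; all run-time variation comes from which table vptr points to.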

- Mooing Duck

- Sergey Kalinichenko · Accepted Answer

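What your picture is missing is an arrow from each of the CAT and SmallCat objects to its corresponding vtbl. The compiler embeds the pointer to the vtbl in the object itself; you can think of it as a hidden member variable. That is why adding the first virtual function "costs" one pointer in memory footprint. The vtbl pointer is set by code in the constructor, so all a compiler-generated virtual call has to do is dereference the pointer to this to reach the object's vtbl at run time.

Of course, this gets more complicated with virtual and multiple inheritance: the compiler needs to generate slightly different code, but the basic process stays the same.

Here is your example explained in more detail: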

CAT *p1,*p2;
p1 = new SmallCat;  //suppose its vtbl address is 0x1234;
// The layout of SmallCat object includes a vptr as a hidden member.
// At this point, the value of this vptr is set to 0x1234.
p2 = new CAT;       //suppose its vtbl address is 0x5678;
// The layout of a CAT object also includes a vptr as a hidden member.
// At this point, the value of this vptr is set to 0x5678.
(*p1->vptr[i])(p1); //should use vtbl at 0x1234
// Compiler has enough information to do that, because it squirreled away 0x1234
// inside the SmallCat object at the time it was constructed.
(*p2->vptr[i])(p2); //should use vtbl at 0x5678
// Same deal - the constructor saved 0x5678 inside the CAT, so we're good.
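
For completeness, here is the foo example from the question as real C++. This is a minimal sketch in which the class bodies and return values are invented for illustration. foo is compiled exactly once, yet the two calls reach different functions, because the object passed in carries the information that selects the override. An optimizer may skip the table lookup when it can prove the dynamic type, but the observable result is the same.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Minimal stand-ins for the classes in the question; return values are made up.
class CAT {
public:
    virtual ~CAT() = default;
    virtual std::string speak() const { return "CAT"; }
};

class SmallCat : public CAT {
public:
    std::string speak() const override { return "SmallCat"; }
};

// Compiled once. The call does not name a concrete class; the function
// that runs is chosen at run time from the object c points to.
std::string foo(const CAT* c) { return c->speak(); }

void demo_foo() {
    std::unique_ptr<CAT> ps(new SmallCat);  // both pointers have static type CAT*
    std::unique_ptr<CAT> pc(new CAT);
    assert(foo(ps.get()) == "SmallCat");
    assert(foo(pc.get()) == "CAT");
}
```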