Learning C++: Optimization with final

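We all know that virtual functions behave polymorphically under reference semantics (calls through a pointer or reference), at the cost of one extra indirection through the vtable.

In practice, most of the performance cost comes from the fact that a virtual call prevents the compiler from inlining the callee.

First, let's look at the output of the following code: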

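#include <cstdint>
#include <iostream>

using std::cout;

class A {
public:
    virtual void f() {
        cout << "A::f()" << "\n";
    }
    virtual void g() {
        cout << "A::g()" << "\n";
    }
};

class B : public A {
public:
    void f() override {
        cout << "B::f()" << "\n";
    }
    void g() override {
        cout << "B::g()" << "\n";
    }
};

class C : public B {
public:
    void f() override {
        cout << "C::f()" << "\n";
    }
    void g() override {
        cout << "C::g()" << "\n";
    }
};

int main() {
    A* a = new A();
    a->f();
    a->g();

    B* b = new B();
    // Overwrite a's vptr with b's. This is undefined behavior and only
    // works when the vptr occupies the first 8 bytes of the object
    // (typical for 64-bit ABIs); it is done here purely for demonstration.
    *(std::uint64_t*)a = *(std::uint64_t*)b;
    a->f();
    a->g();

    A aa = *a;  // copy by value: aa is a plain A with A's vptr
    aa.f();
    aa.g();

    a = new C();
    a->f();
    a->g();
}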

A::f()
A::g()

B::f()
B::g()

A::f()
A::g()

C::f()
C::g()

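The first pair of outputs is easy to understand: plain calls on an A object. In the second pair, the vptr of the object a points to has been overwritten with B's, and in the fourth pair a points to a C object, so in both cases the call goes through a subclass vtable and the overriding functions run. The third pair uses value semantics: aa is copy-constructed as an A, so it carries A's vptr and shows no polymorphism. (Overwriting the vptr is undefined behavior, so the second pair is what this particular build produced, not something the language guarantees.)

Now let's look at another piece of code: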
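
The difference between reference semantics and value semantics can be isolated in a smaller sketch. The types `Base` and `Derived` and the helper functions below are hypothetical names introduced only for this illustration; they are not part of the listing above.

```cpp
#include <cassert>
#include <string>

// Hypothetical types for illustration only.
struct Base {
    virtual ~Base() = default;
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    std::string name() const override { return "Derived"; }
};

// Reference semantics: the call is dispatched on the dynamic type.
inline std::string via_reference(const Base& b) { return b.name(); }

// Value semantics: the parameter is a new Base object copy-constructed
// from the Base subobject of the argument (slicing), so it is a Base.
inline std::string via_value(Base b) { return b.name(); }

// Small self-check that can be called from main().
inline void slicing_demo() {
    Derived d;
    assert(via_reference(d) == "Derived");
    assert(via_value(d) == "Base");
}
```

Calling `slicing_demo()` from `main` exercises both paths. The by-value case mirrors `A aa = *a;` in the listing: the copy is a genuine base-class object, whatever the source object was.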

#include <cstdint>
#include <iostream>

using std::cout;

class A {
public:
    virtual void f() {
        cout << "A::f()" << "\n";
    }
    virtual void g() {
        cout << "A::g()" << "\n";
    }
};

class B : public A {
public:
    void f() override {
        cout << "B::f()" << "\n";
    }
    void g() override {
        cout << "B::g()" << "\n";
    }
};

class C : public B {
public:
    void f() override {
        cout << "C::f()" << "\n";
    }
    void g() override {
        cout << "C::g()" << "\n";
    }
};

int main() {
    A* a = new B();
    C* c = new C();

    a->f();
    a->g();

    c->f();
    c->g();

    // Overwrite the vptr of the B object with C's (undefined behavior,
    // for demonstration only; assumes the vptr is the first 8 bytes).
    *(std::uint64_t*)a = *(std::uint64_t*)c;
    a->f();
    a->g();

    static_cast<B*>(a)->f();
    static_cast<B*>(a)->g();
}

The output follows the same principle as before:

B::f()
B::g()

C::f()
C::g()

C::f()
C::g()

C::f()
C::g()

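Looking at the assembly makes the mechanism clear: every call goes through the vtable (this is a 64-bit build, so each slot is 8 bytes and g sits at offset +8).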

	a->f();
00007FF7E38F4F17 48 8B 03 mov rax,qword ptr [rbx]
00007FF7E38F4F1A 48 8B CB mov rcx,rbx
00007FF7E38F4F1D FF 10 call qword ptr [rax]
a->g();
00007FF7E38F4F1F 48 8B 03 mov rax,qword ptr [rbx]
00007FF7E38F4F22 48 8B CB mov rcx,rbx
00007FF7E38F4F25 FF 50 08 call qword ptr [rax+8]

c->f();
00007FF7E38F4F28 48 8B 07 mov rax,qword ptr [rdi]
00007FF7E38F4F2B 48 8B CF mov rcx,rdi
00007FF7E38F4F2E FF 10 call qword ptr [rax]
c->g();
00007FF7E38F4F30 48 8B 07 mov rax,qword ptr [rdi]
00007FF7E38F4F33 48 8B CF mov rcx,rdi
00007FF7E38F4F36 FF 50 08 call qword ptr [rax+8]

*(std::uint64_t*)a = *(std::uint64_t*)c;
00007FF7E38F4F39 48 8B 07 mov rax,qword ptr [rdi]
00007FF7E38F4F3C 48 89 03 mov qword ptr [rbx],rax
a->f();
00007FF7E38F4F3F 48 8B CB mov rcx,rbx
00007FF7E38F4F42 FF 10 call qword ptr [rax]
a->g();
00007FF7E38F4F44 48 8B 03 mov rax,qword ptr [rbx]
00007FF7E38F4F47 48 8B CB mov rcx,rbx
00007FF7E38F4F4A FF 50 08 call qword ptr [rax+8]

static_cast<B*>(a)->f();
00007FF7E38F4F4D 48 8B 03 mov rax,qword ptr [rbx]
00007FF7E38F4F50 48 8B CB mov rcx,rbx
00007FF7E38F4F53 FF 10 call qword ptr [rax]
static_cast<B*>(a)->g();
00007FF7E38F4F55 48 8B 03 mov rax,qword ptr [rbx]
00007FF7E38F4F58 48 8B CB mov rcx,rbx
00007FF7E38F4F5B FF 50 08 call qword ptr [rax+8]
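
The pattern in the listing (load the vptr from the object, then call through slot 0 or slot 1) can be imitated in well-defined C++ with an explicit table of function pointers. This is only a sketch of the idea: `Object`, `VTable`, `kVTableB`, `kVTableC`, and the other names are hypothetical, and real compilers are free to lay out vtables differently.

```cpp
#include <cassert>
#include <string>

struct Object;
using Method = std::string (*)(const Object&);

// Hand-written stand-in for a vtable (hypothetical layout).
struct VTable {
    Method f;  // slot 0, reached via "call [rax]" in the listing
    Method g;  // slot 1, reached via "call [rax+8]" on a 64-bit build
};

// The table pointer is the first member, like the hidden vptr.
struct Object {
    const VTable* vptr;
};

inline std::string b_f(const Object&) { return "B::f()"; }
inline std::string b_g(const Object&) { return "B::g()"; }
inline std::string c_f(const Object&) { return "C::f()"; }
inline std::string c_g(const Object&) { return "C::g()"; }

constexpr VTable kVTableB{b_f, b_g};
constexpr VTable kVTableC{c_f, c_g};

// A "virtual call": read the table pointer, then call through a slot.
inline std::string call_f(const Object& o) { return o.vptr->f(o); }
inline std::string call_g(const Object& o) { return o.vptr->g(o); }

// Small self-check that can be called from main().
inline void vtable_demo() {
    Object o{&kVTableB};
    assert(call_f(o) == "B::f()");
    o.vptr = &kVTableC;  // legal here, unlike overwriting a real vptr
    assert(call_f(o) == "C::f()");
    assert(call_g(o) == "C::g()");
}
```

Rebinding `o.vptr` plays the role of the `*(std::uint64_t*)a = *(std::uint64_t*)c;` line in the article, but without undefined behavior, because the table pointer is an ordinary data member in this emulation.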

Now let's modify the code slightly: mark B's f as final, and declare class C itself final.

#include <cstdint>
#include <iostream>

using std::cout;

class A {
public:
    virtual void f() {
        cout << "A::f()" << "\n";
    }
    virtual void g() {
        cout << "A::g()" << "\n";
    }
};

class B : public A {
public:
    void f() override final {
        cout << "B::f()" << "\n";
    }
    void g() override {
        cout << "B::g()" << "\n";
    }
};

class C final : public B {
public:
    // error: B::f is final and cannot be overridden
    //void f() override {
    //    cout << "C::f()" << "\n";
    //}

    void g() override {
        cout << "C::g()" << "\n";
    }
};

int main() {
    A* a = new B();
    C* c = new C();

    a->f();
    a->g();

    c->f();
    c->g();

    // Overwrite the vptr of the B object with C's (undefined behavior,
    // for demonstration only; assumes the vptr is the first 8 bytes).
    *(std::uint64_t*)a = *(std::uint64_t*)c;
    a->f();
    a->g();

    static_cast<B*>(a)->f();
    static_cast<B*>(a)->g();
}

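The output is shown below. It differs from before: because B::f is final, C can no longer override f, so every call to f now prints B::f().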

B::f()
B::g()

B::f()
C::g()

B::f()
C::g()

B::f()
C::g()

Going back to the assembly, it becomes clear what has happened:

	a->f();
00007FF619A64F07 48 8B 03 mov rax,qword ptr [rbx]
00007FF619A64F0A 48 8B CB mov rcx,rbx
00007FF619A64F0D FF 10 call qword ptr [rax]
a->g();
00007FF619A64F0F 48 8B 03 mov rax,qword ptr [rbx]
00007FF619A64F12 48 8B CB mov rcx,rbx
00007FF619A64F15 FF 50 08 call qword ptr [rax+8]

c->f();
00007FF619A64F18 48 8B CF mov rcx,rdi
00007FF619A64F1B E8 7D C9 FF FF call B::f (07FF619A6189Dh)
c->g();
00007FF619A64F20 48 8B CF mov rcx,rdi
00007FF619A64F23 E8 1A C5 FF FF call C::g (07FF619A61442h)

*(std::uint64_t*)a = *(std::uint64_t*)c;
00007FF619A64F28 48 8B 07 mov rax,qword ptr [rdi]
00007FF619A64F2B 48 89 03 mov qword ptr [rbx],rax
a->f();
00007FF619A64F2E 48 8B CB mov rcx,rbx
00007FF619A64F31 FF 10 call qword ptr [rax]
a->g();
00007FF619A64F33 48 8B 03 mov rax,qword ptr [rbx]
00007FF619A64F36 48 8B CB mov rcx,rbx
00007FF619A64F39 FF 50 08 call qword ptr [rax+8]

static_cast<B*>(a)->f();
00007FF619A64F3C 48 8B CB mov rcx,rbx
00007FF619A64F3F E8 59 C9 FF FF call B::f (07FF619A6189Dh)
static_cast<B*>(a)->g();
00007FF619A64F44 48 8B 03 mov rax,qword ptr [rbx]
00007FF619A64F47 48 8B CB mov rcx,rbx
00007FF619A64F4A FF 50 08 call qword ptr [rax+8]

For the first pair of outputs, the base-class pointer a points to a B object, so the calls go through the vtable, just as before.

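For the second pair, C is declared final, which guarantees that no class can derive from C. The target of any call through a C* is therefore known at compile time, and the vtable is no longer consulted: both calls become direct calls to B::f and C::g.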

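For the third pair, the object a points to now carries C's vptr, but the static type of a is still A*, so the calls keep going through the vtable.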

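For the fourth pair, we cast a to B*. Since f is marked final in B, no class derived from B can override it, so the call to f through a B* can also be resolved at compile time and skips the vtable. g is not final in B, so it still goes through the vtable.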

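From the examples above it is easy to summarize what happens when final is used:

  1. When the static type of the pointer is a base class whose function (or the class itself) is not final, the object may be of a derived type, so a virtual call still has to go through the vtable (the target cannot be determined at compile time).
  2. When the static type of the pointer is a final class, or the function is final in that type, the compiler can use final to optimize: once the target is known at compile time, the vtable is no longer consulted.

This optimization is called devirtualization.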
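
The two rules can be restated as a small sketch in which the static type of each parameter decides whether a direct call is possible. The types `Shape`, `Polygon`, and `Square` and the helper functions are hypothetical names used only here. Whether a given compiler actually emits a direct call depends on the compiler and optimization level; the code only demonstrates the language rules, using `std::is_final` from C++14.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical hierarchy for illustration only.
struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const { return "Shape"; }
    virtual std::string kind() const { return "generic"; }
};

struct Polygon : Shape {
    // final function: no class derived from Polygon may override name().
    std::string name() const final { return "Polygon"; }
    std::string kind() const override { return "polygon"; }
};

// final class: nothing may derive from Square.
struct Square final : Polygon {
    std::string kind() const override { return "square"; }
};

static_assert(std::is_final<Square>::value, "Square is a final class");
static_assert(!std::is_final<Polygon>::value, "Polygon is not final");

// Static type Shape: the target is unknown, dispatch must be dynamic.
inline std::string name_via_shape(const Shape& s) { return s.name(); }

// Static type Polygon: name() is final, so the compiler may call
// Polygon::name directly; kind() is not final and stays virtual.
inline std::string name_via_polygon(const Polygon& p) { return p.name(); }
inline std::string kind_via_polygon(const Polygon& p) { return p.kind(); }

// Static type Square: the class is final, so kind() may be called directly.
inline std::string kind_via_square(const Square& s) { return s.kind(); }

// Small self-check that can be called from main().
inline void final_demo() {
    Square sq;
    assert(name_via_shape(sq) == "Polygon");
    assert(kind_via_polygon(sq) == "square");
    assert(kind_via_square(sq) == "square");
}
```

The observable results are the same with or without devirtualization; final only gives the compiler enough information to replace the indirect call with a direct (and potentially inlined) one.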