A Brief Analysis of the C++ Exception Mechanism

What good can using exceptions do for me? The basic answer is: Using exceptions for error handling makes your code simpler, cleaner, and less likely to miss errors. But what’s wrong with “good old errno and if-statements”? The basic answer is: Using those, your error handling and your normal code are closely intertwined. That way, your code gets messy and it becomes hard to ensure that you have dealt with all errors.

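Because of the complexity of the C++ exception mechanism, writing exception-safe code is not easy. Exceptions make the language more expressive, but they also carry unavoidable overhead. This article briefly analyzes how the C++ exception mechanism is implemented and summarizes the related caveats.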

Stack Unwinding

[Figure: call-stack-layout]

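A program's Call Stack is made up of multiple Stack Frames. Each stack frame corresponds to a subroutine (function) call that is still in progress, and the frame is popped when the subroutine returns. The figure shows a simple call stack layout: the stack grows from bottom to top, and the routine DrawSquare calls the subroutine DrawLine.

On x86_64 the stack grows downward (toward lower addresses). The Stack Pointer is kept in the RSP register, and if a C/C++ program is compiled so that functions keep their Frame Pointer, that value is kept in the RBP register. As described in the appendix, right after a call instruction enters the callee, RSP points to the Return Address saved on the stack (an address in the code segment). If the stack frame layout can be derived from code addresses, the whole call stack can be walked back frame by frame. Broadly speaking, Stack Unwinding refers to this kind of parsing, unwinding, or modifying of the call stack. Stack unwinding mainly serves the following purposes:

  • Stack backtraces (for debugging, monitoring, perf, crash reports, and similar scenarios)
  • The exception mechanism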

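Frame Pointer Stack Backtrace

Backtracing via the frame pointer is the simplest and most general approach. It dedicates one register to holding the frame pointer and stores the related data at fixed positions in each stack frame, which effectively chains the stack frames together like a linked list. On x86_64, a function that can be backtraced through its frame pointer has instructions of the following form: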

push %rbp
mov %rsp, %rbp

...

pop %rbp
retq
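
The linked-list view of frame-pointer backtracing can be sketched in C++. This is only a model under stated assumptions: FrameRecord, walk_frame_chain, and demo_walk_depth are hypothetical names, and the three-frame stack is built by hand rather than read from real registers, so the walk does not depend on compiler flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The two words that the prologue above leaves at the base of each frame.
struct FrameRecord {
  const FrameRecord *caller_fp;  // saved %rbp of the caller, at 0x0(%rbp)
  const void *return_address;    // pushed by `call`, at 0x8(%rbp)
};

// Follow the saved-%rbp links like a singly linked list and collect the
// return address stored in each frame.
std::vector<const void *> walk_frame_chain(const FrameRecord *fp,
                                           std::size_t max_depth) {
  std::vector<const void *> return_addresses;
  while (fp != nullptr && return_addresses.size() < max_depth) {
    return_addresses.push_back(fp->return_address);
    fp = fp->caller_fp;
  }
  return return_addresses;
}

// Hypothetical three-frame stack (main -> f1 -> f2) built by hand.
std::size_t demo_walk_depth() {
  static const int code[3] = {0, 0, 0};  // stand-ins for code addresses
  const FrameRecord main_frame{nullptr, &code[0]};
  const FrameRecord f1_frame{&main_frame, &code[1]};
  const FrameRecord f2_frame{&f1_frame, &code[2]};
  const auto addrs = walk_frame_chain(&f2_frame, 16);
  assert(addrs.front() == &code[2]);  // innermost frame comes first
  return addrs.size();
}
```

A real unwinder would start from the live value of %rbp instead of a hand-built record; the walk breaks as soon as one frame in the chain did not save its frame pointer.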

test1

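backtrace() and backtrace_symbols() are the stack backtrace functions provided by glibc (declared in <execinfo.h>); their implementation does not rely entirely on the frame pointer. __builtin_return_address(n) walks back n levels from the current function and returns the return address found there; for n greater than 0 its implementation depends on the frame pointer.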

// test1.cpp

#include <cstdio>
#include <cstdlib>
#include <execinfo.h>

#ifndef NO_INLINE
#define NO_INLINE __attribute__((__noinline__))
#endif

using ULL = unsigned long long;

NO_INLINE void dump_traceback() {
  const int size = 200;
  void *buffer[size];
  int nptrs = backtrace(buffer, size);
  char **strings = backtrace_symbols(buffer, nptrs);
  if (strings) {
    for (int i = 0; i < nptrs; ++i) {
      printf("[%d] %s\n", i, strings[i]);
    }
    free(strings);
  }
}

template <int level> NO_INLINE void *f3() {
  dump_traceback();
  return __builtin_return_address(level);
}
template <int level> NO_INLINE void *f2() { return f3<level>(); }
template <int level> NO_INLINE void *f1() { return f2<level>(); }

int main(int argc, char **argv) {
  if (argc > 1) {
    f1<2>();
  } else {
    f1<0>();
  }
  return 0;
}

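At the -O3 optimization level the frame pointer is omitted by default, i.e. -fomit-frame-pointer (pass the compiler flag -fno-omit-frame-pointer to keep it). -rdynamic exports the symbol names so that they can be associated with the addresses in the backtrace.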

> clang test1.cpp -O3 -rdynamic && ./a.out

[0] ./a.out(_Z14dump_tracebackv+0x1e) [0x55ccb5c1218e]
[1] ./a.out(_Z2f3ILi0EEPvv+0x6) [0x55ccb5c12276]
[2] ./a.out(main+0x14) [0x55ccb5c12204]
[3] /usr/lib64/libc.so.6(+0x3feb0) [0x7f29f083feb0]
[4] /usr/lib64/libc.so.6(__libc_start_main+0x80) [0x7f29f083ff60]
[5] ./a.out(_start+0x25) [0x55ccb5c120a5]

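With the frame pointer omitted, backtrace() can still obtain the call stack of the current thread, but the builtin __builtin_return_address(2) cannot obtain the second-level caller (in theory an address inside main), and the program crashes instead: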

> clang test1.cpp -O3 -rdynamic && ./a.out l2

[0] ./a.out(_Z14dump_tracebackv+0x1e) [0x56397028f18e]
[1] ./a.out(_Z2f3ILi2EEPvv+0x9) [0x56397028f249]
[2] ./a.out(main+0xb) [0x56397028f1fb]
[3] /usr/lib64/libc.so.6(+0x3feb0) [0x7f552de3feb0]
[4] /usr/lib64/libc.so.6(__libc_start_main+0x80) [0x7f552de3ff60]
[5] ./a.out(_start+0x25) [0x56397028f0a5]
[1] 2757999 segmentation fault (core dumped) ./a.out l2

Analysis of test1

Dump the disassembly:

> clang test1.cpp -O3 -rdynamic && objdump -C -r -d a.out

0000000000001080 <_start>:
1080: f3 0f 1e fa endbr64
1084: 31 ed xor %ebp,%ebp
1086: 49 89 d1 mov %rdx,%r9
1089: 5e pop %rsi
108a: 48 89 e2 mov %rsp,%rdx
108d: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
1091: 50 push %rax
1092: 54 push %rsp
1093: 45 31 c0 xor %r8d,%r8d
1096: 31 c9 xor %ecx,%ecx
1098: 48 8d 3d 51 01 00 00 lea 0x151(%rip),%rdi # 11f0 <main>
109f: ff 15 33 2f 00 00 callq *0x2f33(%rip) # 3fd8 <__libc_start_main@GLIBC_2.34>
10a5: f4 hlt

0000000000001170 <dump_traceback()>:
1170: 41 57 push %r15
1172: 41 56 push %r14
1174: 41 54 push %r12
1176: 53 push %rbx
1177: 48 81 ec 48 06 00 00 sub $0x648,%rsp
117e: 49 89 e6 mov %rsp,%r14
1181: 4c 89 f7 mov %r14,%rdi
1184: be c8 00 00 00 mov $0xc8,%esi
1189: e8 c2 fe ff ff callq 1050 <backtrace@plt>
118e: 89 c3 mov %eax,%ebx
1190: 4c 89 f7 mov %r14,%rdi
1193: 89 c6 mov %eax,%esi
1195: e8 a6 fe ff ff callq 1040 <backtrace_symbols@plt>
119a: 48 85 c0 test %rax,%rax
119d: 74 41 je 11e0 <dump_traceback()+0x70>
119f: 49 89 c7 mov %rax,%r15
11a2: 85 db test %ebx,%ebx
11a4: 7e 32 jle 11d8 <dump_traceback()+0x68>
11a6: 41 89 dc mov %ebx,%r12d
11a9: 4c 8d 35 54 0e 00 00 lea 0xe54(%rip),%r14 # 2004 <_IO_stdin_used+0x4>
11b0: 31 db xor %ebx,%ebx
11b2: 66 66 66 66 66 2e 0f data16 data16 data16 data16 nopw %cs:0x0(%rax,%rax,1)
11b9: 1f 84 00 00 00 00 00
11c0: 49 8b 14 df mov (%r15,%rbx,8),%rdx
11c4: 4c 89 f7 mov %r14,%rdi
11c7: 89 de mov %ebx,%esi
11c9: 31 c0 xor %eax,%eax
11cb: e8 90 fe ff ff callq 1060 <printf@plt>
11d0: 48 ff c3 inc %rbx
11d3: 49 39 dc cmp %rbx,%r12
11d6: 75 e8 jne 11c0 <dump_traceback()+0x50>
11d8: 4c 89 ff mov %r15,%rdi
11db: e8 50 fe ff ff callq 1030 <free@plt>
11e0: 48 81 c4 48 06 00 00 add $0x648,%rsp
11e7: 5b pop %rbx
11e8: 41 5c pop %r12
11ea: 41 5e pop %r14
11ec: 41 5f pop %r15
11ee: c3 retq
11ef: 90 nop

00000000000011f0 <main>:
11f0: 50 push %rax
11f1: 83 ff 02 cmp $0x2,%edi
11f4: 7c 09 jl 11ff <main+0xf>
11f6: e8 15 00 00 00 callq 1210 <void* f1<2>()>
11fb: 31 c0 xor %eax,%eax
11fd: 59 pop %rcx
11fe: c3 retq
11ff: e8 1c 00 00 00 callq 1220 <void* f1<0>()>
1204: 31 c0 xor %eax,%eax
1206: 59 pop %rcx
1207: c3 retq
1208: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
120f: 00

0000000000001210 <void* f1<2>()>:
1210: e9 1b 00 00 00 jmpq 1230 <void* f2<2>()>
1215: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
121c: 00 00 00
121f: 90 nop

0000000000001220 <void* f1<0>()>:
1220: e9 3b 00 00 00 jmpq 1260 <void* f2<0>()>
1225: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
122c: 00 00 00
122f: 90 nop

0000000000001230 <void* f2<2>()>:
1230: e9 0b 00 00 00 jmpq 1240 <void* f3<2>()>
1235: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
123c: 00 00 00
123f: 90 nop

0000000000001240 <void* f3<2>()>:
1240: 55 push %rbp
1241: 48 89 e5 mov %rsp,%rbp
1244: e8 27 ff ff ff callq 1170 <dump_traceback()>
1249: 48 8b 45 00 mov 0x0(%rbp),%rax
124d: 48 8b 00 mov (%rax),%rax
1250: 48 8b 40 08 mov 0x8(%rax),%rax
1254: 5d pop %rbp
1255: c3 retq
1256: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
125d: 00 00 00

0000000000001260 <void* f2<0>()>:
1260: e9 0b 00 00 00 jmpq 1270 <void* f3<0>()>
1265: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
126c: 00 00 00
126f: 90 nop

0000000000001270 <void* f3<0>()>:
1270: 50 push %rax
1271: e8 fa fe ff ff callq 1170 <dump_traceback()>
1276: 48 8b 44 24 08 mov 0x8(%rsp),%rax
127b: 59 pop %rcx
127c: c3 retq

With the appendix as a reference, the code above can be read as follows:

0000000000001270 <void* f3<0>()>:

  • On entry, 0x0(%rsp) holds the return address
  • After push %rax, the return address is at 0x8(%rsp)
  • mov 0x8(%rsp),%rax loads the return address

0000000000001240 <void* f3<2>()>:

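  • After push %rbp and mov %rsp,%rbp have executed:
    • %rsp is the current stack top address
    • %rbp is the current frame base address
    • 0x0(%rbp) holds the previous frame base address (frame base + 0x0)
    • 0x8(%rbp) holds the return address (frame base + 0x8)
  • mov 0x0(%rbp),%rax loads the previous frame base address
  • mov (%rax),%rax loads the frame base address before that
    • the frame base addresses are resolved one level at a time, according to the requested depth
  • mov 0x8(%rax),%rax loads the final return address

__builtin_return_address(0) does not depend on rbp and works correctly.

Calls to __builtin_return_address(n) have several obvious drawbacks:

  • For levels greater than 0 it relies on the data in rbp, so it skips over calls whose frame pointer was omitted or that were turned into tail calls, and cannot return an accurate return address
  • During the backtrace, no register other than rsp can be recovered
  • Addresses cannot be mapped back to source code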
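
As a minimal sketch of the safe case, the snippet below only ever asks for level 0, which is read from the function's own frame. It assumes a GCC or Clang compiler (the builtin is a compiler extension); return_address_of_this_call and level0_is_usable are hypothetical helper names.

```cpp
#include <cassert>

#ifndef NO_INLINE
#define NO_INLINE __attribute__((__noinline__))
#endif

// Returns the address this call will return to. Level 0 is taken from the
// function's own frame, so no frame-pointer chain has to be followed.
NO_INLINE void *return_address_of_this_call() {
  return __builtin_return_address(0);
}

// Hypothetical helper: reports whether level 0 produced a usable address.
bool level0_is_usable() {
  return return_address_of_this_call() != nullptr;
}
```

Requesting a level greater than 0 in the same way is what crashed test1 above, so code that needs deeper frames is usually better served by backtrace() or an unwind library.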

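Summary of the Drawbacks of Frame Pointer Backtraces

  • It depends heavily on the frame pointer: if any frame in the call stack did not save its frame pointer, the walk fails
  • One register is reserved exclusively for the frame pointer, which may reduce performance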

DWARF (Debugging With Arbitrary Record Formats)

ref: DWARF Debugging Standard

DWARF is a debugging information file format used by many compilers and debuggers to support source level debugging. It addresses the requirements of a number of procedural languages, such as C, C++, and Fortran, and is designed to be extensible to other languages. DWARF is architecture independent and applicable to any processor or operating system. It is widely used on Unix, Linux and other operating systems, as well as in stand-alone environments.

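DWARF uses separate .debug_* sections to remove the dependency on the frame pointer, and describes call frames with CFI (Call Frame Information). The relevant part of the Linux standard is also derived from it (see below). Advantages of this approach:

  • Backtraces do not depend on rbp, and other registers can be recovered
  • The data lives in extra ELF sections, so there is no runtime performance overhead
    • The .debug_frame section records call frame information; the rest is split into sections such as .debug_info, .debug_abbrev, .debug_line, and .debug_str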

Exception Handling Frame

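The modern Linux standard LSB (Linux Standard Base) states that languages that support exceptions (such as C++) must provide additional information to the runtime environment, describing the call frames that have to be unwound during exception handling. This information lives in the special sections .eh_frame and .eh_frame_hdr.

.eh_frame is based on the .debug_frame of DWARF v2 and consists mainly of CIEs (Common Information Entry) and FDEs (Frame Description Entry). For each function definition the compiler embeds CFI directives in the assembly, and the assembler turns them into .eh_frame or .debug_frame. Which section is generated depends on the compiler flags:

Compiler flags                                                          Generated section
-fasynchronous-unwind-tables -fexceptions                               .eh_frame
-fno-asynchronous-unwind-tables -fexceptions                            .eh_frame
-fasynchronous-unwind-tables -fno-exceptions                            .eh_frame
-fno-asynchronous-unwind-tables -fno-exceptions (-g0 or no -g)          none
-fno-asynchronous-unwind-tables -fno-exceptions -g<level> (level > 0)   .debug_frame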

CFI Directives

Overview

  • Directives are named .cfi_*
  • .cfi_startproc and .cfi_endproc mark the region covered by one FDE
  • .cfi_def_cfa_offset changes the offset used to compute the CFA (Canonical Frame Address) while keeping the same register
  • .cfi_offset records where a register's previous value is saved, relative to the CFA
  • .cfi_def_cfa_* define the rule for computing the CFA

> echo 'void test() {__builtin_unwind_init();}' > test3.cpp && clang test3.cpp -S && cat test3.s

.text
.file "test3.cpp"
.globl _Z4testv # -- Begin function _Z4testv
.p2align 4, 0x90
.type _Z4testv,@function
_Z4testv: # @_Z4testv
.cfi_startproc
# %bb.0:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
pushq %r15
pushq %r14
pushq %r13
pushq %r12
pushq %rbx
.cfi_offset %rbx, -56
.cfi_offset %r12, -48
.cfi_offset %r13, -40
.cfi_offset %r14, -32
.cfi_offset %r15, -24
popq %rbx
popq %r12
popq %r13
popq %r14
popq %r15
popq %rbp
.cfi_def_cfa %rsp, 8
retq
.Lfunc_end0:
.size _Z4testv, .Lfunc_end0-_Z4testv
.cfi_endproc
# -- End function
.ident "Debian clang version 15.0.7"
.section ".note.GNU-stack","",@progbits
.addrsig

.eh_frame

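An example of .eh_frame content on x86_64 is shown below:

  • Every FDE has an associated CIE
  • Each FDE row records, for a given PC location, the CFA, where the callee-saved (nonvolatile) registers are stored, and the return address (ra)

Contents of the .eh_frame section: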

(FDE offset) (FDE length) (CIE the FDE belongs to) (start PC and end PC of the FDE's function)
00000030 0000000000000024 00000034 FDE cie=00000000 pc=0000000000001020..0000000000001080

(PC location) (CFA: the caller's stack top address) (where the callee-saved nonvolatile registers are stored) (where the return address is stored)
LOC CFA rbx r12 r14 r15 ra
0000000000001170 rsp+8 u u u u c-8
000000000000117e rsp+1648 c-40 c-32 c-24 c-16 c-8
00000000000011ee rsp+8 c-40 c-32 c-24 c-16 c-8

Call Frame Analysis of test1

Dump and parse the call frame definitions:

> clang test1.cpp -O3 -rdynamic && readelf -wF ./a.out

Contents of the .eh_frame section:


00000000 0000000000000014 00000000 CIE "zR" cf=1 df=-8 ra=16
LOC CFA ra
0000000000000000 rsp+8 u

00000018 0000000000000014 0000001c FDE cie=00000000 pc=0000000000001080..00000000000010ab

00000030 0000000000000014 00000000 CIE "zR" cf=1 df=-8 ra=16
LOC CFA ra
0000000000000000 rsp+8 c-8

00000048 0000000000000024 0000001c FDE cie=00000030 pc=0000000000001020..0000000000001070
LOC CFA ra
0000000000001020 rsp+16 c-8
0000000000001026 rsp+24 c-8
0000000000001030 exp c-8

00000070 0000000000000014 00000044 FDE cie=00000030 pc=0000000000001070..0000000000001078

00000088 0000000000000038 0000005c FDE cie=00000030 pc=0000000000001170..00000000000011ef
LOC CFA rbx r12 r14 r15 ra
0000000000001170 rsp+8 u u u u c-8
0000000000001172 rsp+16 u u u u c-8
0000000000001174 rsp+24 u u u u c-8
0000000000001176 rsp+32 u u u u c-8
0000000000001177 rsp+40 u u u u c-8
000000000000117e rsp+1648 c-40 c-32 c-24 c-16 c-8
00000000000011e7 rsp+40 c-40 c-32 c-24 c-16 c-8
00000000000011e8 rsp+32 c-40 c-32 c-24 c-16 c-8
00000000000011ea rsp+24 c-40 c-32 c-24 c-16 c-8
00000000000011ec rsp+16 c-40 c-32 c-24 c-16 c-8
00000000000011ee rsp+8 c-40 c-32 c-24 c-16 c-8

000000c4 000000000000001c 00000098 FDE cie=00000030 pc=00000000000011f0..0000000000001208
LOC CFA ra
00000000000011f0 rsp+8 c-8
00000000000011f1 rsp+16 c-8
00000000000011fe rsp+8 c-8
00000000000011ff rsp+16 c-8
0000000000001207 rsp+8 c-8

000000e4 0000000000000010 000000b8 FDE cie=00000030 pc=0000000000001210..0000000000001215

000000f8 0000000000000010 000000cc FDE cie=00000030 pc=0000000000001220..0000000000001225

0000010c 0000000000000010 000000e0 FDE cie=00000030 pc=0000000000001230..0000000000001235

00000120 000000000000001c 000000f4 FDE cie=00000030 pc=0000000000001240..0000000000001256
LOC CFA rbp ra
0000000000001240 rsp+8 u c-8
0000000000001241 rsp+16 c-16 c-8
0000000000001244 rbp+16 c-16 c-8
0000000000001255 rsp+8 c-16 c-8

00000140 0000000000000010 00000114 FDE cie=00000030 pc=0000000000001260..0000000000001265

00000154 0000000000000018 00000128 FDE cie=00000030 pc=0000000000001270..000000000000127d
LOC CFA ra
0000000000001270 rsp+8 c-8
0000000000001271 rsp+16 c-8
000000000000127c rsp+8 c-8

00000170 0000000000000044 00000144 FDE cie=00000030 pc=0000000000001280..00000000000012dd
LOC CFA rbx rbp r12 r13 r14 r15 ra
0000000000001280 rsp+8 u u u u u u c-8
0000000000001282 rsp+16 u u u u u c-16 c-8
0000000000001287 rsp+24 u u u u c-24 c-16 c-8
000000000000128c rsp+32 u u u c-32 c-24 c-16 c-8
0000000000001291 rsp+40 u u c-40 c-32 c-24 c-16 c-8
0000000000001299 rsp+48 u c-48 c-40 c-32 c-24 c-16 c-8
00000000000012a1 rsp+56 c-56 c-48 c-40 c-32 c-24 c-16 c-8
00000000000012a8 rsp+64 c-56 c-48 c-40 c-32 c-24 c-16 c-8
00000000000012d2 rsp+56 c-56 c-48 c-40 c-32 c-24 c-16 c-8
00000000000012d3 rsp+48 c-56 c-48 c-40 c-32 c-24 c-16 c-8
00000000000012d4 rsp+40 c-56 c-48 c-40 c-32 c-24 c-16 c-8
00000000000012d6 rsp+32 c-56 c-48 c-40 c-32 c-24 c-16 c-8
00000000000012d8 rsp+24 c-56 c-48 c-40 c-32 c-24 c-16 c-8
00000000000012da rsp+16 c-56 c-48 c-40 c-32 c-24 c-16 c-8
00000000000012dc rsp+8 c-56 c-48 c-40 c-32 c-24 c-16 c-8

000001b8 0000000000000010 0000018c FDE cie=00000030 pc=00000000000012e0..00000000000012e1

000001cc ZERO terminator

> readelf -wF /usr/lib64/libc.so.6

000000ac 0000000000000018 000000b0 FDE cie=00000000 pc=000000000003fe30..000000000003fedc
LOC CFA ra
000000000003fe30 rsp+8 c-8
000000000003fe31 rsp+16 c-8
000000000003fe32 rsp+8 c-8
000000000003fe39 rsp+160 c-8

000000c8 0000000000000030 000000cc FDE cie=00000000 pc=000000000003fee0..0000000000040028
LOC CFA rbx rbp r12 r13 r14 r15 ra
000000000003fee0 rsp+8 u u u u u u c-8
000000000003fee6 rsp+16 u u u u u c-16 c-8
000000000003feeb rsp+24 u u u u c-24 c-16 c-8
000000000003feed rsp+32 u u u c-32 c-24 c-16 c-8
000000000003fef2 rsp+40 u u c-40 c-32 c-24 c-16 c-8
000000000003fef6 rsp+48 u c-48 c-40 c-32 c-24 c-16 c-8
000000000003fef9 rsp+56 c-56 c-48 c-40 c-32 c-24 c-16 c-8
000000000003fefd rsp+80 c-56 c-48 c-40 c-32 c-24 c-16 c-8

000000fc 0000000000000010 00000100 FDE cie=00000000 pc=0000000000040030..000000000004004a

The analysis shows that backtrace() unwinds the call stack as follows:

  • dump_traceback()+0x1e
    • 118e:
      • ref 000000000000117e rsp+1648 c-40 c-32 c-24 c-16 c-8
      • 1640(%rsp) yields the return address 1276
  • void* f3<0>()+0x6
    • 1276:
      • ref 0000000000001271 rsp+16 c-8
      • 8(%rsp) yields the return address 1204
  • main()+0x14
    • 1204:
      • ref 00000000000011ff rsp+16 c-8
      • 8(%rsp) yields the address 3feb0 inside /usr/lib64/libc.so.6
  • /usr/lib64/libc.so.6(+0x3feb0)
    • ref 000000000003fe39 rsp+160 c-8
    • 152(%rsp) yields the address 3ff60 inside /usr/lib64/libc.so.6
  • /usr/lib64/libc.so.6(__libc_start_main+0x80)
    • ref 000000000003fefd rsp+80 c-56 c-48 c-40 c-32 c-24 c-16 c-8
    • 72(%rsp) yields the return address 10a5
  • _start+0x25
    • ref 0000000000000000 rsp+8 u
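
The arithmetic behind each step (pick the table row for the current PC, compute the CFA, then apply the ra rule) can be written out as a small sketch. UnwindRow and return_address_offset_from_rsp are hypothetical names; the rows are copied from the FDE of dump_traceback() shown earlier, and real unwinders handle many more rule kinds than the "rsp + offset" form assumed here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One row of the `readelf -wF` table: from `loc` onward, CFA = rsp + cfa_offset
// and the return address is stored at CFA + ra_offset ("c-8" means -8).
struct UnwindRow {
  std::uint64_t loc;
  std::int64_t cfa_offset;
  std::int64_t ra_offset;
};

// Rows copied from the FDE of dump_traceback() (pc=0x1170..0x11ef).
const std::vector<UnwindRow> kDumpTracebackRows = {
    {0x1170, 8, -8},  {0x1172, 16, -8}, {0x1174, 24, -8},   {0x1176, 32, -8},
    {0x1177, 40, -8}, {0x117e, 1648, -8}, {0x11e7, 40, -8}, {0x11e8, 32, -8},
    {0x11ea, 24, -8}, {0x11ec, 16, -8}, {0x11ee, 8, -8},
};

// Pick the last row whose `loc` is <= pc and return the offset from rsp at
// which the return address is stored. Assumes the rows are sorted by `loc`
// and that pc lies inside the FDE's range.
std::int64_t return_address_offset_from_rsp(const std::vector<UnwindRow> &rows,
                                            std::uint64_t pc) {
  assert(!rows.empty() && pc >= rows.front().loc);
  const UnwindRow *match = &rows.front();
  for (const UnwindRow &row : rows) {
    if (row.loc <= pc) match = &row;
  }
  return match->cfa_offset + match->ra_offset;
}
```

For pc = 0x118e the matching row is 0x117e, so the return address sits at rsp + 1648 - 8 = 1640(%rsp), which is the first step of the walk above.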

libunwind Stack Unwinding

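libunwind provides a portable and efficient API for determining a program's call chain. The API supports both local (same-process) and remote (cross-process) operation. It can manipulate the preserved (callee-saved) state of each call frame and resume execution at any point in the call chain. Typical uses of libunwind include exception handling, debuggers, call chain monitoring, and setjmp() / longjmp().

The relevant libunwind interface functions are listed below:

  • unw_init_local is mainly used to unwind the stack of the current process
  • unw_init_remote is usually applied to other processes

> nm -CD libunwind.so | grep 'unw_' | less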

0000000000007d84 W unw_getcontext
0000000000001d80 W unw_get_fpreg
0000000000001f30 W unw_get_proc_info
0000000000002030 W unw_get_proc_name
0000000000001bc0 W unw_get_reg
0000000000001ab0 W unw_init_local
00000000000020d0 W unw_is_fpreg
00000000000021d0 W unw_is_signal_frame
0000000000002240 W unw_iterate_dwarf_unwind_cache
000000000000e0c8 D unw_local_addr_space
0000000000002150 W unw_regname
0000000000001fc0 W unw_resume
0000000000001e20 W unw_set_fpreg
0000000000001c60 W unw_set_reg
0000000000001ec0 W unw_step

test2

test2 uses libunwind to backtrace the local call stack.

// test2.cpp

#include <assert.h>
#include <cstdio>
#include <cstdlib>
#include <cxxabi.h>

#define UNW_LOCAL_ONLY
#include <libunwind.h>

#ifndef NO_INLINE
#define NO_INLINE __attribute__((__noinline__))
#endif

NO_INLINE void dump_backtrace() {
  char buff[1024];
  size_t demangle_buff_size = 0;
  char *demangle_buff = nullptr;
  unw_cursor_t cursor;
  unw_context_t uc;

  unw_word_t offset{};
  unw_getcontext(&uc);
  unw_init_local(&cursor, &uc);

  // Step outward one frame at a time until the outermost frame is reached.
  while (unw_step(&cursor) > 0) {
    unw_word_t ip, sp;
    unw_get_reg(&cursor, UNW_REG_IP, &ip);
    unw_get_reg(&cursor, UNW_REG_SP, &sp);
    auto status = unw_get_proc_name(&cursor, buff, sizeof(buff), &offset);
    assert(!status);
    auto realname = buff;
    {
      // __cxa_demangle returns NULL on failure and leaves the buffer
      // untouched, so only adopt the result when demangling succeeded.
      int demangle_status = -1;
      char *demangled = abi::__cxa_demangle(
          buff, demangle_buff, &demangle_buff_size, &demangle_status);
      if (demangle_status == 0) {
        demangle_buff = demangled;
        realname = demangle_buff;
      }
    }
    printf("0x%016lx <%s+0x%lx>\n", ip, realname, offset);
  }
  if (demangle_buff) {
    free(demangle_buff);
  }
}

NO_INLINE void *f3() {
  dump_backtrace();
  return nullptr;
}
NO_INLINE void *f2() { return f3(); }
NO_INLINE void *f1() { return f2(); }

int main(int argc, char **argv) {
  f1();
  return 0;
}

> clang test2.cpp -O3 -L/usr/lib64 -lunwind -lc++ -lc++abi -stdlib=libc++ -std=gnu++20 -rdynamic && ./a.out

0x00005645e5d11306 <f3()+0x6>
0x00005645e5d11316 <f2()+0x6>
0x00005645e5d11326 <f1()+0x6>
0x00005645e5d11336 <main+0x6>
0x00007ff550c3feb0 <__libc_start_call_main+0x80>
0x00007ff550c3ff60 <__libc_start_main+0x80>
0x00005645e5d110f5 <_start+0x25>

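Here, because the symbols of f1(), f2(), f3(), and dump_backtrace() are externally visible by default, the compiler's optimize-sibling-calls optimization does not take effect. If the functions are declared static or placed in an anonymous namespace, these calls can be optimized.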

C++ Exception Handling

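C++ Exception Handling is a typical application of Stack Unwinding. Several exception handling ABIs exist; the most widely used is the Itanium C++ ABI: Exception Handling, which divides the C++ exception handling ABI into 3 levels:

  • Base ABI: language-independent common interfaces
  • C++ ABI: interfaces required to implement C++
  • Implementation conventions of a specific runtime

Landing Pad Definition

  • A landing-pad is user code that catches an exception or runs cleanup after an exception
  • During exception handling, the Personality Routine selectively transfers control to a landing-pad. After the landing-pad has run its logic, it either ends exception handling and returns to normal user code, continues handling the exception, or throws an exception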

test4

test4 uses a simple code example to show the path of an exception from throw to catch.

// test4.cpp

volatile int v = 0x0;

struct N {
  __attribute__((__noinline__)) N() { v = 0x1; }
  __attribute__((__noinline__)) ~N() { v = 0x2; }
};

void test(bool x) {
  N n;
  try {
    if (x)
      throw v;
    v = 0x4;
  } catch (int &e) {
    throw static_cast<double>(v);
  } catch (double &e) {
    v = 0x5;
  } catch (...) {
    v = 0x3;
  }
}

void test_noexcept(bool) noexcept {
  N n;
  throw v;
}

void test2(bool x) {
  N n;
  try {
    test(x);
  } catch (float &e) {
    v = 0x6;
  } catch (double &e) {
    v = 0x7;
  }
  v = 0x8;
}

> clang -c test4.cpp -O3 && objdump -C -r -d ./test4.o

Disassembly of section .text:

0000000000000000 <test(bool)>:
0: 53 push %rbx
1: 48 83 ec 10 sub $0x10,%rsp
5: 89 fb mov %edi,%ebx
7: 48 8d 7c 24 08 lea 0x8(%rsp),%rdi
c: e8 00 00 00 00 callq 11 <test(bool)+0x11>
d: R_X86_64_PLT32 N::N()-0x4
# test x
11: 85 db test %ebx,%ebx
13: 75 1a jne 2f <test(bool)+0x2f>

# x is 0
# normal path: finish the function call
15: c7 05 00 00 00 00 04 movl $0x4,0x0(%rip) # 1f <test(bool)+0x1f>
1c: 00 00 00
17: R_X86_64_PC32 v-0x8
1f: 48 8d 7c 24 08 lea 0x8(%rsp),%rdi
24: e8 00 00 00 00 callq 29 <test(bool)+0x29>
25: R_X86_64_PLT32 N::~N()-0x4
29: 48 83 c4 10 add $0x10,%rsp
2d: 5b pop %rbx
2e: c3 retq

# x is NOT 0
# __cxa_allocate_exception allocates space for the exception object
2f: bf 04 00 00 00 mov $0x4,%edi
34: e8 00 00 00 00 callq 39 <test(bool)+0x39>
35: R_X86_64_PLT32 __cxa_allocate_exception-0x4
# store the value of the exception object
39: 8b 0d 00 00 00 00 mov 0x0(%rip),%ecx # 3f <test(bool)+0x3f>
3b: R_X86_64_PC32 v-0x4
3f: 89 08 mov %ecx,(%rax)

# set the type of the exception object
41: 48 8b 35 00 00 00 00 mov 0x0(%rip),%rsi # 48 <test(bool)+0x48>
44: R_X86_64_REX_GOTPCRELX typeinfo for int-0x4
48: 48 89 c7 mov %rax,%rdi
4b: 31 d2 xor %edx,%edx

# __cxa_throw throws the exception
4d: e8 00 00 00 00 callq 52 <test(bool)+0x52>
4e: R_X86_64_PLT32 __cxa_throw-0x4

# landing pad: cleanup
52: eb 63 jmp b7 <test(bool)+0xb7>

# landing pad: exception catch
54: 48 89 d3 mov %rdx,%rbx
57: 48 89 c7 mov %rax,%rdi

# match an exception of type int
5a: 83 fb 03 cmp $0x3,%ebx
5d: 74 2c je 8b <test(bool)+0x8b>
5f: e8 00 00 00 00 callq 64 <test(bool)+0x64>
60: R_X86_64_PLT32 __cxa_begin_catch-0x4

# match an exception of type double
64: 83 fb 02 cmp $0x2,%ebx
67: 75 11 jne 7a <test(bool)+0x7a>
69: c7 05 00 00 00 00 88 movl $0x5,0x0(%rip) # 73 <test(bool)+0x73>
70: 88 00 00
6b: R_X86_64_PC32 v-0x8
73: e8 00 00 00 00 callq 78 <test(bool)+0x78>
74: R_X86_64_PLT32 __cxa_end_catch-0x4
78: eb a5 jmp 1f <test(bool)+0x1f>

# match the catch-all handler, catch (...)
7a: c7 05 00 00 00 00 03 movl $0x3,0x0(%rip) # 84 <test(bool)+0x84>
81: 00 00 00
7c: R_X86_64_PC32 v-0x8
84: e8 00 00 00 00 callq 89 <test(bool)+0x89>
85: R_X86_64_PLT32 __cxa_end_catch-0x4
89: eb 94 jmp 1f <test(bool)+0x1f>

# int matched: throw a new exception of type double
8b: e8 00 00 00 00 callq 90 <test(bool)+0x90>
8c: R_X86_64_PLT32 __cxa_begin_catch-0x4
90: bf 08 00 00 00 mov $0x8,%edi
95: e8 00 00 00 00 callq 9a <test(bool)+0x9a>
96: R_X86_64_PLT32 __cxa_allocate_exception-0x4
9a: f2 0f 2a 05 00 00 00 cvtsi2sdl 0x0(%rip),%xmm0 # a2 <test(bool)+0xa2>
a1: 00
9e: R_X86_64_PC32 v-0x4
a2: f2 0f 11 00 movsd %xmm0,(%rax)
a6: 48 8b 35 00 00 00 00 mov 0x0(%rip),%rsi # ad <test(bool)+0xad>
a9: R_X86_64_REX_GOTPCRELX typeinfo for double-0x4
ad: 48 89 c7 mov %rax,%rdi
b0: 31 d2 xor %edx,%edx
b2: e8 00 00 00 00 callq b7 <test(bool)+0xb7>
b3: R_X86_64_PLT32 __cxa_throw-0x4
b7: 48 89 c3 mov %rax,%rbx
ba: eb 08 jmp c4 <test(bool)+0xc4>

# landing pad for the exception thrown at `b2:`: cleanup
bc: 48 89 c3 mov %rax,%rbx
bf: e8 00 00 00 00 callq c4 <test(bool)+0xc4>
c0: R_X86_64_PLT32 __cxa_end_catch-0x4
c4: 48 8d 7c 24 08 lea 0x8(%rsp),%rdi
c9: e8 00 00 00 00 callq ce <test(bool)+0xce>
ca: R_X86_64_PLT32 N::~N()-0x4

# jump back into the cleanup phase
ce: 48 89 df mov %rbx,%rdi
d1: e8 00 00 00 00 callq d6 <test(bool)+0xd6>
d2: R_X86_64_PLT32 _Unwind_Resume-0x4
d6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
dd: 00 00 00

00000000000000e0 <test_noexcept(bool)>:
e0: 50 push %rax
e1: 48 89 e7 mov %rsp,%rdi
e4: e8 00 00 00 00 callq e9 <test_noexcept(bool)+0x9>
e5: R_X86_64_PLT32 N::N()-0x4
e9: bf 04 00 00 00 mov $0x4,%edi
ee: e8 00 00 00 00 callq f3 <test_noexcept(bool)+0x13>
ef: R_X86_64_PLT32 __cxa_allocate_exception-0x4
f3: 8b 0d 00 00 00 00 mov 0x0(%rip),%ecx # f9 <test_noexcept(bool)+0x19>
f5: R_X86_64_PC32 v-0x4
f9: 89 08 mov %ecx,(%rax)
fb: 48 8b 35 00 00 00 00 mov 0x0(%rip),%rsi # 102 <test_noexcept(bool)+0x22>
fe: R_X86_64_REX_GOTPCRELX typeinfo for int-0x4
102: 48 89 c7 mov %rax,%rdi
105: 31 d2 xor %edx,%edx
107: e8 00 00 00 00 callq 10c <test_noexcept(bool)+0x2c>
108: R_X86_64_PLT32 __cxa_throw-0x4

# landing pad: terminate the process
10c: 48 89 c7 mov %rax,%rdi
10f: e8 00 00 00 00 callq 114 <test_noexcept(bool)+0x34>
110: R_X86_64_PLT32 __clang_call_terminate-0x4
114: 66 66 66 2e 0f 1f 84 data16 data16 nopw %cs:0x0(%rax,%rax,1)
11b: 00 00 00 00 00

0000000000000120 <test2(bool)>:
120: 55 push %rbp
121: 53 push %rbx
122: 50 push %rax
123: 89 fb mov %edi,%ebx
125: 48 89 e7 mov %rsp,%rdi
128: e8 00 00 00 00 callq 12d <test2(bool)+0xd>
129: R_X86_64_PLT32 N::N()-0x4
12d: 89 df mov %ebx,%edi
12f: e8 00 00 00 00 callq 134 <test2(bool)+0x14>
130: R_X86_64_PLT32 test(bool)-0x4
134: c7 05 00 00 00 00 08 movl $0x8,0x0(%rip) # 13e <test2(bool)+0x1e>
13b: 00 00 00
136: R_X86_64_PC32 v-0x8
13e: 48 89 e7 mov %rsp,%rdi
141: e8 00 00 00 00 callq 146 <test2(bool)+0x26>
142: R_X86_64_PLT32 N::~N()-0x4
146: 48 83 c4 08 add $0x8,%rsp
14a: 5b pop %rbx
14b: 5d pop %rbp
14c: c3 retq

# landing pad: exception catch
14d: 48 89 c3 mov %rax,%rbx

# match an exception of type float
150: bd 06 00 00 00 mov $0x6,%ebp
155: 83 fa 02 cmp $0x2,%edx
158: 74 0a je 164 <test2(bool)+0x44>

# match an exception of type double
15a: bd 07 00 00 00 mov $0x7,%ebp
15f: 83 fa 01 cmp $0x1,%edx
162: 75 15 jne 179 <test2(bool)+0x59>
164: 48 89 df mov %rbx,%rdi
167: e8 00 00 00 00 callq 16c <test2(bool)+0x4c>
168: R_X86_64_PLT32 __cxa_begin_catch-0x4
16c: 89 2d 00 00 00 00 mov %ebp,0x0(%rip) # 172 <test2(bool)+0x52>
16e: R_X86_64_PC32 v-0x4
172: e8 00 00 00 00 callq 177 <test2(bool)+0x57>
173: R_X86_64_PLT32 __cxa_end_catch-0x4
177: eb bb jmp 134 <test2(bool)+0x14>

# no handler matched
# landing pad: clean up and continue unwinding
179: 48 89 e7 mov %rsp,%rdi
17c: e8 00 00 00 00 callq 181 <test2(bool)+0x61>
17d: R_X86_64_PLT32 N::~N()-0x4
181: 48 89 df mov %rbx,%rdi
184: e8 00 00 00 00 callq 189 <test2(bool)+0x69>
185: R_X86_64_PLT32 _Unwind_Resume-0x4

Disassembly of section .text._ZN1NC2Ev:

0000000000000000 <N::N()>:
0: c7 05 00 00 00 00 01 movl $0x1,0x0(%rip) # a <N::N()+0xa>
7: 00 00 00
2: R_X86_64_PC32 v-0x8
a: c3 retq

Disassembly of section .text._ZN1ND2Ev:

0000000000000000 <N::~N()>:
0: c7 05 00 00 00 00 02 movl $0x2,0x0(%rip) # a <N::~N()+0xa>
7: 00 00 00
2: R_X86_64_PC32 v-0x8
a: c3 retq

Disassembly of section .text.__clang_call_terminate:

0000000000000000 <__clang_call_terminate>:
0: 50 push %rax
1: e8 00 00 00 00 callq 6 <__clang_call_terminate+0x6>
2: R_X86_64_PLT32 __cxa_begin_catch-0x4
6: e8 00 00 00 00 callq b <__clang_call_terminate+0xb>
7: R_X86_64_PLT32 std::terminate()-0x4

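Analysis of test4

The main steps of throwing an exception:

  • Exception object construction: __cxa_allocate_exception allocates heap space for the exception object; the object is then constructed in place, and its RTTI (Run-Time Type Information) is passed along as the exception type
  • Throwing: __cxa_throw sets the current exception and runs _Unwind_RaiseException, which has 2 main phases, search and cleanup:
    • search phase: using the Personality Routine (on ELF mainly driven by the .gcc_except_table section and parsed by __gxx_personality_v0 or __gcc_personality_v0), the stack frames are unwound level by level to find a try{}catch{} block that matches the current exception type. If none is found the process is terminated; otherwise the cleanup phase begins
    • cleanup phase: the stack frames are unwound level by level again through the Personality Routine
      • When a frame with variables to clean up is found, the register state is restored and control jumps to that frame's landing-pad. The landing-pad ends by calling _Unwind_Resume to return to the cleanup phase.
      • When a frame that has to catch the exception is found, the register state is restored and control jumps to that frame's landing-pad. If the exception matches, __cxa_begin_catch is called, the catch code runs, and __cxa_end_catch finally ends exception handling and returns to normal code; if nothing matches, the remaining variables are cleaned up and _Unwind_Resume returns to the cleanup phase.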
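
The visible effect of the cleanup phase can be observed from plain C++: the destructor of a local in an intermediate frame runs (from a cleanup landing-pad) before the body of the matching handler. The sketch below is only an illustration; Tracer, thrower, and trace_unwind are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Records the order of events in a log string.
struct Tracer {
  std::string &log;
  explicit Tracer(std::string &l) : log(l) { log += "ctor;"; }
  ~Tracer() { log += "dtor;"; }  // runs from a cleanup landing-pad on throw
};

void thrower(std::string &log) {
  Tracer t(log);
  log += "throw;";
  throw 42;  // no handler in this frame: only the cleanup of `t` runs here
}

std::string trace_unwind() {
  std::string log;
  try {
    thrower(log);
  } catch (int &e) {  // this handler is located during the search phase
    log += "catch(" + std::to_string(e) + ");";
  }
  return log;
}
```

Calling trace_unwind() yields "ctor;throw;dtor;catch(42);": the destructor entry appears before the handler entry because the frame of thrower() is cleaned up before control reaches the catch block.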

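__cxa_* are the exception handling interfaces implemented by the C++ runtime; for the concrete behavior under clang, see Exception Handling in LLVM.

  • __cxa_begin_catch returns a pointer to the exception object
  • __cxa_end_catch decrements the reference count of the currently caught exception, or destroys the exception object

> nm -CD libstdc++.so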

00000000000917a0 T __cxa_allocate_exception
00000000000919f0 T __cxa_begin_catch
000000000009e0b0 T __cxa_demangle
0000000000091a60 T __cxa_end_catch
0000000000092b40 T __cxa_rethrow
0000000000092af0 T __cxa_throw

> objdump -C -r -d libstdc++.so.6

0000000000092af0 <__cxa_throw@@CXXABI_1.3>:
...
92b22: e8 b9 7a ff ff callq 8a5e0 <_Unwind_RaiseException@plt>
...
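
The __cxa_* interfaces are normally called only by compiler-generated code, but a few of them are declared in <cxxabi.h> and can be called directly. As a minimal sketch, assuming a toolchain that follows the Itanium C++ ABI (libstdc++ or libc++abi), abi::__cxa_current_exception_type() reports the type of the exception being handled inside a catch-all handler; caught_type_is_double is a hypothetical helper name.

```cpp
#include <cassert>
#include <cxxabi.h>
#include <typeinfo>

// Throws a double, catches it without naming its type, and asks the C++
// runtime which type is currently being handled.
bool caught_type_is_double() {
  try {
    throw 1.5;
  } catch (...) {
    const std::type_info *ti = abi::__cxa_current_exception_type();
    return ti != nullptr && *ti == typeid(double);
  }
  return false;  // not reached: the throw above always enters the handler
}
```

This is the same RTTI pointer that the generated code passed to __cxa_throw in the test4 disassembly (the "typeinfo for double" relocation).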

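Throwing an exception relies on the _Unwind_* interfaces of an unwind library. gcc ships its own default unwind library, libgcc_s.so.* / libgcc_eh.a; in addition there are libunwind libraries, typified by nongnu.org/libunwind and llvm-project/libunwind.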

> nm -CD libunwind.so | grep '_Unwind_' | less
0000000000007760 T _Unwind_Backtrace
0000000000007350 T _Unwind_DeleteException
00000000000076a0 T _Unwind_FindEnclosingFunction
00000000000078f0 T _Unwind_Find_FDE
0000000000007170 T _Unwind_ForcedUnwind
00000000000079d0 T _Unwind_GetCFA
00000000000075c0 T _Unwind_GetDataRelBase
00000000000073a0 T _Unwind_GetGR
0000000000007470 T _Unwind_GetIP
0000000000007a40 T _Unwind_GetIPInfo
0000000000007220 T _Unwind_GetLanguageSpecificData
00000000000072d0 T _Unwind_GetRegionStart
0000000000007630 T _Unwind_GetTextRelBase
0000000000006780 T _Unwind_RaiseException
0000000000006de0 T _Unwind_Resume
0000000000007530 T _Unwind_Resume_or_Rethrow
0000000000007420 T _Unwind_SetGR
00000000000074e0 T _Unwind_SetIP
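
Some of these _Unwind_* entry points are usable outside exception handling as well. The sketch below counts the frames of the current call stack with _Unwind_Backtrace and _Unwind_GetIP from <unwind.h>; it assumes a GCC or Clang toolchain whose default unwind library provides them, and count_frame / count_stack_frames are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <unwind.h>

namespace {
// Called once per frame by the unwinder; `arg` points at our counter.
_Unwind_Reason_Code count_frame(struct _Unwind_Context *ctx, void *arg) {
  if (_Unwind_GetIP(ctx) != 0) {
    ++*static_cast<std::size_t *>(arg);
  }
  return _URC_NO_REASON;  // keep walking toward the outermost frame
}
}  // namespace

// Walks the current call stack and returns the number of frames seen.
__attribute__((__noinline__)) std::size_t count_stack_frames() {
  std::size_t frames = 0;
  _Unwind_Backtrace(count_frame, &frames);
  return frames;
}
```

The walk is driven by the same .eh_frame data analyzed earlier, so it keeps working when the frame pointer is omitted.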

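C++ compilers normally assume that a function may throw (to disable exceptions entirely, add the compiler flag -fno-exceptions). A function marked with the noexcept keyword promises not to let exceptions escape, which matters at 2 levels:

  • Function definition: f(...) noexcept { ... f?(); ... } is roughly equivalent to f(...) { ... try { f?(); } catch (...) {std::terminate();} }, i.e. any exception that escapes from a function called inside it leads to terminate.
  • Function declaration: when a function declared noexcept is called, the compiler does not need to set up a landing-pad for that call path, which makes the caller easier to optimize. Common C / extern "C" base library functions usually carry a no-throw declaration such as throw() / noexcept(true) / noexcept.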
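
The declaration-level effect can be checked at compile time with the noexcept operator. In this sketch may_throw and never_throws are hypothetical functions; the static_asserts only confirm what the declarations promise, not what the bodies actually do.

```cpp
#include <cassert>
#include <stdexcept>

// Potentially throwing: callers must be prepared to unwind through this call.
int may_throw(int x) {
  if (x < 0) throw std::invalid_argument("negative");
  return x;
}

// Declared noexcept: callers need no landing-pad for this call. If an
// exception escaped from the body, std::terminate() would be called.
int never_throws(int x) noexcept { return x >= 0 ? may_throw(x) : 0; }

// The noexcept operator inspects the declarations at compile time.
constexpr bool kMayThrowIsNoexcept = noexcept(may_throw(0));
constexpr bool kNeverThrowsIsNoexcept = noexcept(never_throws(0));

static_assert(!kMayThrowIsNoexcept, "may_throw() is potentially throwing");
static_assert(kNeverThrowsIsNoexcept, "never_throws() is declared noexcept");
```

never_throws() guards its argument before calling may_throw(), so the terminate path is never taken here; without that guard the program would still compile, and a negative argument would end in std::terminate().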

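Impact of C++ Exceptions

The poor performance of throwing and handling C++ exceptions is a well-known problem and is not repeated here. However, in scenarios where exceptions are almost never thrown, using exceptions instead of return-value checks can further optimize the hot path.

In test5, f1() throws an exception when an error occurs, while f2() reports its status through the return value; test1() and test2() implement roughly the same functionality.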

// test5.cpp

#include <cstddef>
#include <cstdint>
#include <optional>

int64_t f1();

using N = std::optional<int64_t>;

N f2() noexcept;

int64_t test1(size_t n) {
  int64_t res = 0;
  for (size_t i = 0; i < n; ++i) {
    res += f1();
  }
  return res;
}

N test2(size_t n) noexcept {
  int64_t res = 0;
  for (size_t i = 0; i < n; ++i) {
    auto &&x = f2();
    if (!x) {
      return x;
    }
    res += *x;
  }
  return res;
}

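Judging from the disassembly, the code of test1() is more compact and uses fewer registers. When errors are extremely rare, the whole call chain saves the status-checking logic, which squeezes out a little more performance.

f2() returns the status and the result as two 8-byte fields of one structure, which can be passed back to the caller in the RAX and RDX registers; test2() then tests RDX (its low byte, %dl) and branches. If error statuses are rare, so that the CPU's branch predictor has a high hit rate, the overhead of this check is negligible. If the status-checking logic is more complex, the specific usage scenario has to be evaluated and optimized.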

> clang -c test5.cpp -O3 -std=gnu++17 && objdump -C -r -d test5.o

test5.o: file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <test1(unsigned long)>:
0: 41 56 push %r14
2: 53 push %rbx
3: 50 push %rax
4: 48 85 ff test %rdi,%rdi
7: 74 16 je 1f <test1(unsigned long)+0x1f>
9: 48 89 fb mov %rdi,%rbx
c: 45 31 f6 xor %r14d,%r14d
f: 90 nop
10: e8 00 00 00 00 callq 15 <test1(unsigned long)+0x15>
11: R_X86_64_PLT32 f1()-0x4
15: 49 01 c6 add %rax,%r14
18: 48 ff cb dec %rbx
1b: 75 f3 jne 10 <test1(unsigned long)+0x10>
1d: eb 03 jmp 22 <test1(unsigned long)+0x22>
1f: 45 31 f6 xor %r14d,%r14d
22: 4c 89 f0 mov %r14,%rax
25: 48 83 c4 08 add $0x8,%rsp
29: 5b pop %rbx
2a: 41 5e pop %r14
2c: c3 retq
2d: 0f 1f 00 nopl (%rax)

0000000000000030 <test2(unsigned long)>:
30: 55 push %rbp
31: 41 56 push %r14
33: 53 push %rbx
34: 41 b6 01 mov $0x1,%r14b
37: 48 85 ff test %rdi,%rdi
3a: 74 27 je 63 <test2(unsigned long)+0x33>
3c: 48 89 fd mov %rdi,%rbp
3f: 31 db xor %ebx,%ebx
41: 66 66 66 66 66 66 2e data16 data16 data16 data16 data16 nopw %cs:0x0(%rax,%rax,1)
48: 0f 1f 84 00 00 00 00
4f: 00
50: e8 00 00 00 00 callq 55 <test2(unsigned long)+0x25>
51: R_X86_64_PLT32 f2()-0x4
55: 84 d2 test %dl,%dl
57: 74 0e je 67 <test2(unsigned long)+0x37>
59: 48 01 c3 add %rax,%rbx
5c: 48 ff cd dec %rbp
5f: 75 ef jne 50 <test2(unsigned long)+0x20>
61: eb 0a jmp 6d <test2(unsigned long)+0x3d>
63: 31 db xor %ebx,%ebx
65: eb 06 jmp 6d <test2(unsigned long)+0x3d>
67: 45 31 f6 xor %r14d,%r14d
6a: 48 89 c3 mov %rax,%rbx
6d: 48 89 d8 mov %rbx,%rax
70: 44 89 f2 mov %r14d,%edx
73: 5b pop %rbx
74: 41 5e pop %r14
76: 5d pop %rbp
77: c3 retq

> ls -l test5.o
-rw-r--r-- 1 root root 1480 Oct 10 14:14 test5.o

> clang -c test5.cpp -O3 -std=gnu++17 -fno-exceptions
> ls -l test5.o
-rw-r--r-- 1 root root 1432 Oct 10 14:14 test5.o

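How exceptions affect code size:

  • With exceptions enabled, the compiler emits landing-pad code for every function call that may throw and for every try{}catch{} handler. It also stores each function's exception table in the .gcc_except_table section; the table describes the actions to perform when an exception is raised within a given region of the function's code.
  • Compared with disabling exceptions, enabling them increases binary size by roughly 10% ~ 20%. This is also why the LLVM Coding Standards disable exceptions.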
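As a rough illustration of where landing pads come from, the sketch below has a local object with a destructor that is alive across a call that may throw. The compiler has to emit cleanup code for that call site so the destructor still runs during unwinding. The names Guard, may_throw, with_local, run_once and g_destroyed are made up for this example and do not come from the tests above.

```cpp
#include <cassert>
#include <stdexcept>

// Counts destructor runs so the effect of unwinding is observable.
int g_destroyed = 0;

struct Guard {
    ~Guard() { ++g_destroyed; }
};

// A call that may throw.
void may_throw(bool fail) {
    if (fail) {
        throw std::runtime_error("failed");
    }
}

// The may_throw() call below needs a landing pad: if it throws, control
// lands there, ~Guard() runs, and unwinding resumes in the caller.
void with_local(bool fail) {
    Guard guard;
    may_throw(fail);
}

// Returns true if an exception escaped with_local().
bool run_once(bool fail) {
    try {
        with_local(fail);
    } catch (const std::runtime_error &) {
        return true;
    }
    return false;
}
```

Calling run_once(false) and then run_once(true) destroys guard once per call, on the normal path and on the unwinding path respectively. Compiling the same file with and without -fno-exceptions and comparing object sizes, as done for test5.o, is one way to see the cost of that extra path.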

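How exceptions affect compiler optimization:

  • In theory, the more advanced the compiler, the closer the generated code comes (when no exception is thrown) to the exceptions-disabled case. Early compilers were weak at this; modern mainstream compilers have improved considerably.
  • When the compiler reasons about the surrounding context, simpler and more explicit branches and behavior let it optimize better and also compile faster. So for functions that definitely do not throw, add noexcept to both the declaration and the definition.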
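A minimal sketch of the noexcept advice: the specifier goes on the declaration, which is what callers in other translation units see, and is repeated on the definition. The noexcept operator then shows what the compiler is allowed to assume at a call site. The function names sum_bytes and sum_bytes_may_throw are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Declared noexcept: call sites need no unwinding path for this call.
std::int64_t sum_bytes(const unsigned char *data, std::size_t n) noexcept;

// Declared without noexcept: callers must assume it may throw.
std::int64_t sum_bytes_may_throw(const unsigned char *data, std::size_t n);

// The noexcept operator is unevaluated; it only inspects the declarations.
static_assert(noexcept(sum_bytes(nullptr, 0)), "declared noexcept");
static_assert(!noexcept(sum_bytes_may_throw(nullptr, 0)), "may throw");

// The specifier is repeated on the definition.
std::int64_t sum_bytes(const unsigned char *data, std::size_t n) noexcept {
    std::int64_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}

std::int64_t sum_bytes_may_throw(const unsigned char *data, std::size_t n) {
    return sum_bytes(data, n);
}
```

Both functions do the same work; the only difference is what a caller that sees just the declaration may assume.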

Exceptions in C

C has no native exception mechanism, but a similar effect can be emulated with setjmp / longjmp.

ref: https://en.wikipedia.org/wiki/Setjmp.h

// test6.cpp

#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <pthread.h>
#include <setjmp.h>

static void first();
static void second();

/* Use a file scoped static variable for the exception stack so we can access
 * it anywhere within this translation unit. */
static jmp_buf exception_env;
static int exception_type;

struct TestFlag {
    TestFlag() { printf("construct %s \n", __func__); }
    ~TestFlag() {
        // not reachable: longjmp does not run destructors
        printf("destruct %s. SHOULD NOT HAPPEN\n", __func__);
        exit(-1);
    }
};

int main(void) {
    char *volatile mem_buffer = NULL;

    if (setjmp(exception_env)) {
        // if we get here there was an exception
        printf("first failed, exception type: %d\n", exception_type);
    } else {
        // Run code that may signal failure via longjmp.
        puts("calling first");
        first();

        mem_buffer = (char *)(malloc(300)); // allocate a resource
        printf("%s\n", strcpy(mem_buffer, "first succeeded")); // not reached
    }

    free(mem_buffer); // NULL can be passed to free, no operation is performed

    return 0;
}

static void first() {
    jmp_buf my_env;

    puts("entering first"); // reached

    TestFlag n;

    std::memcpy(my_env, exception_env, sizeof my_env);

    switch (setjmp(exception_env)) {
    case 3: // if we get here there was an exception.
        puts("second failed, exception type: 3; remapping to type 1");
        exception_type = 1;

    default: // fall through
        std::memcpy(exception_env, my_env,
                    sizeof exception_env); // restore exception stack
        longjmp(exception_env, exception_type); // continue handling the exception

    case 0: // normal, desired operation
        puts("calling second"); // reached
        second();
        puts("second succeeded"); // not reached
    }

    std::memcpy(exception_env, my_env,
                sizeof exception_env); // restore exception stack

    puts("leaving first"); // never reached
}

static void second() {
    puts("entering second"); // reached

    exception_type = 3;
    longjmp(exception_env, exception_type); // declare that the program has failed

    puts("leaving second"); // not reached
}
> clang test6.cpp -O3 -std=gnu++17 && ./a.out

calling first
entering first
construct TestFlag
calling second
entering second
second failed, exception type: 3; remapping to type 1
first failed, exception type: 1

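This approach is flexible and controllable, and its performance on the error path may beat C++ exceptions, but the drawbacks are just as clear:

  • You have to build your own resource management: after longjmp jumps, the stack frames are not unwound one by one and leftover data is not cleaned up; setjmp saves only register-related state, so objects on the stack are unprotected;
  • Code complexity rises sharply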
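One possible shape for such a hand-built resource mechanism is sketched below: allocations made inside the guarded region are recorded in a small registry, and the code at the setjmp site frees whatever the skipped frames left behind. This is an illustration under simplifying assumptions (single thread, one global jmp_buf, a fixed-size registry), and every name in it (tracked_malloc, release_all, worker, run_guarded, g_freed) is hypothetical.

```cpp
#include <cassert>
#include <csetjmp>
#include <cstddef>
#include <cstdlib>

// Hypothetical fixed-size registry of heap blocks owned by the guarded region.
constexpr int kMaxTracked = 8;
static void *g_tracked[kMaxTracked];
static int g_tracked_count = 0;
int g_freed = 0;  // how many blocks release_all() has freed so far

static std::jmp_buf g_env;

// Allocate a block and remember it so it can be freed later.
static void *tracked_malloc(std::size_t n) {
    if (g_tracked_count == kMaxTracked) {
        return nullptr;  // registry full: refuse rather than leak
    }
    void *p = std::malloc(n);
    if (p != nullptr) {
        g_tracked[g_tracked_count++] = p;
    }
    return p;
}

// Free everything registered so far; used on both the normal and error path.
static void release_all() {
    while (g_tracked_count > 0) {
        std::free(g_tracked[--g_tracked_count]);
        ++g_freed;
    }
}

// Only trivially destructible locals here: longjmp runs no destructors.
static void worker(bool fail) {
    tracked_malloc(64);
    tracked_malloc(128);
    if (fail) {
        std::longjmp(g_env, 1);  // plays the role of "throw"
    }
}

// Returns 0 on success, 1 if worker() bailed out via longjmp.
int run_guarded(bool fail) {
    if (setjmp(g_env) == 0) {
        worker(fail);
        release_all();
        return 0;
    }
    release_all();  // plays the role of "catch": clean up skipped frames
    return 1;
}
```

Both run_guarded(false) and run_guarded(true) end up freeing the two blocks that worker() allocated. The bookkeeping that C++ unwinding performs implicitly is explicit here, which is the added complexity the list above refers to.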

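Summary of C++ Exceptions

Broadly speaking, the exception mechanism sacrifices performance on the exceptional path to optimize the normal path

  • The cost of using exceptions is a larger binary, slower compilation, and extremely poor performance on the throw-and-catch path (even worse in multithreaded environments). Never use exceptions for ordinary control flow; prefer them for errors that occur with a probability below 0.1%.
    • If an error has to terminate the whole call chain, throwing an exception is reasonable, and the exception object should carry key information such as the function call chain. If the error is itself expected, exceptions are not recommended.
    • In short, an error that does not require analyzing the call stack most likely does not need an exception
  • Compared with disabling exceptions, modern mainstream C++ compilers can already generate code of similar performance (when no exception is thrown), and can even further improve performance on the shortest path.
    • Consider whether the code would benefit from replacing return statuses with exceptions to squeeze out performance. This mainly suits low-latency scenarios.
      • Alternatively, depending on the business scenario, build a more flexible error-handling mechanism semi-manually with setjmp / longjmp
    • If you do use return statuses, ideally return types such as std::optional< (1 ~ 8 byte) > / std::pair<long, (1 ~ 8 byte) > / long, so the value can be passed quickly in registers and the caller can check the result quickly.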
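The return-status recommendation can be checked at compile time. The sketch below assumes x86_64 System V with a recent libstdc++ or libc++, where a type of at most 16 bytes with a trivial copy constructor and destructor can come back in RAX:RDX. The names kRegisterFriendly, checked_div and sum_quotients are invented for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <limits>
#include <optional>
#include <type_traits>
#include <utility>

using OptResult = std::optional<std::int64_t>;
using PairResult = std::pair<long, bool>;

// Assumption: at most 16 bytes plus trivial copy construction and destruction
// is what lets the value travel in registers under this ABI.
template <typename T>
constexpr bool kRegisterFriendly =
    sizeof(T) <= 16 && std::is_trivially_copy_constructible_v<T> &&
    std::is_trivially_destructible_v<T>;

static_assert(kRegisterFriendly<OptResult>, "optional<int64_t> should fit");
static_assert(kRegisterFriendly<PairResult>, "pair<long, bool> should fit");

// Hypothetical status-returning callee: nullopt signals the error.
OptResult checked_div(std::int64_t a, std::int64_t b) noexcept {
    const bool overflow =
        a == std::numeric_limits<std::int64_t>::min() && b == -1;
    if (b == 0 || overflow) {
        return std::nullopt;
    }
    return a / b;
}

// The caller checks the flag after each call, in the style of test2().
OptResult sum_quotients(std::int64_t a,
                        std::initializer_list<std::int64_t> divisors) noexcept {
    std::int64_t total = 0;
    for (const std::int64_t d : divisors) {
        const OptResult q = checked_div(a, d);
        if (!q) {
            return q;  // propagate the failure unchanged
        }
        total += *q;
    }
    return total;
}
```

A call such as sum_quotients(12, {3, 4}) yields 7, while any zero divisor turns the whole result into std::nullopt. If one of the static_assert lines fails on a given toolchain, that is a hint the type would be returned through memory there.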

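For functions that definitely do not throw (basic functions such as memcmp / memcpy), add noexcept to both the declaration and the definition, so the compiler can better optimize the call sites.

  • For inline functions, or functions whose definition is visible, the compiler detects this automatically

The exception mechanism is friendly to the OOP style and can improve the expressiveness and compatibility of code. When you have to do the dirty work in a large codebase, exceptions are without doubt the fastest option.

  • A constructor that fails can throw to the upper layer; otherwise two-phase construction has to be introduced
  • Standard interfaces such as overloaded operators, or legacy interfaces that are hard to change
  • Quickly opening up a new error-handling path in tangled logic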
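To make the constructor point concrete, here is a small sketch contrasting a throwing constructor with two-phase construction. The classes Port and PortTwoPhase and the helper port_ctor_throws are hypothetical, chosen only to keep the example short.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Style 1: the constructor throws, so a constructed object is always valid.
class Port {
public:
    explicit Port(int number) : number_(number) {
        if (number < 1 || number > 65535) {
            throw std::out_of_range("port out of range: " +
                                    std::to_string(number));
        }
    }
    int number() const noexcept { return number_; }

private:
    int number_;
};

// Style 2: two-phase construction without exceptions. Every user must call
// init() and check its result before touching the object.
class PortTwoPhase {
public:
    PortTwoPhase() noexcept = default;

    bool init(int number) noexcept {
        if (number < 1 || number > 65535) {
            return false;
        }
        number_ = number;
        valid_ = true;
        return true;
    }
    bool valid() const noexcept { return valid_; }
    int number() const noexcept { return number_; }

private:
    int number_ = 0;
    bool valid_ = false;
};

// Helper for the example: reports whether constructing Port(number) throws.
bool port_ctor_throws(int number) {
    try {
        Port p(number);
        (void)p;
        return false;
    } catch (const std::out_of_range &) {
        return true;
    }
}
```

With Port, an invalid object never exists. With PortTwoPhase, a default-constructed object is observable in its invalid state, and each call site carries the extra init() check.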

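Appendix

Assembly Basics

  • push %? is equivalent to sub $0x8,%rsp + mov %?,0x0(%rsp)
  • pop %? is equivalent to mov 0x0(%rsp),%? + add $0x8,%rsp
  • callq ? <?()> is equivalent to push %rip + jmpq ? <?()>, where the pushed %rip is the address of the instruction after the call (the return address)
  • retq is equivalent to pop %rip: it pops the return address off the stack and jumps to it
  • Before callq ? <?()> executes, the stack-top address in rsp must be 16-byte aligned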

x86_64 calling conventions

x86_64-abi

An Application Binary Interface (ABI) is the interface between two binary program modules that work together. An ABI is a contract between pieces of binary code defining the mechanisms by which functions are invoked and how parameters are passed between the caller and callee.


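x86_64 Register Classes

  • volatile (caller-saved) registers: RAX, RCX, RDX, RDI, RSI, R8, R9, R10, R11, XMM*, YMM*

    • Typically hold temporaries, function return values, function arguments, and so on
  • nonvolatile (callee-saved) registers: RBX, RBP, RSP, R12, R13, R14, R15

    • Their contents must be the same before and after a function call
      • If a function needs to modify such a register, the usual practice is to save the original value on the stack and restore it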

System V Application Binary Interface AMD64 Architecture Processor Supplement: 3.2.3 Parameter Passing

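Parameter passing rules:

  • When a function has fewer than 7 non-floating-point arguments, they go into registers from left to right: RDI, RSI, RDX, RCX, R8, R9; with 7 or more, the arguments from the 7th onward are pushed onto the stack from right to left (stack addresses run from high to low);
  • XMM0 ~ XMM7 carry the first 8 floating-point arguments; the rest are passed in memory
  • The rules for passing struct arguments are more involved; see the document above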
F(a, b, c, d, e, f, g, h, double x0 ... double x7, double x8)
a: %rdi
b: %rsi
c: %rdx
d: %rcx
e: %r8
f: %r9
g: 0x8(%rsp)
h: 0x10(%rsp)
x0 ~ x7: xmm0 ~ xmm7
x8: 0x18(%rsp)

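Return value rules:

  • Up to 2 return values (each an 8-byte piece of the returned object): RAX holds the 1st and RDX the 2nd; XMM0 and XMM1 hold the first 2 floating-point return values;
  • More than 2 return values: the caller allocates the memory and passes its address as the 1st argument in RDI. The callee stores all return values into that memory in order, then returns the starting address in RAX.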
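The two cases above can be illustrated with a pair of structs, assuming x86_64 System V. SmallResult, BigResult, make_small and make_big are names made up for this sketch; the register assignments in the comments restate the rules above and are not something the C++ code itself can verify.

```cpp
#include <cassert>
#include <cstdint>

// 16 bytes, two integer eightbytes: returned in RAX (value) and RDX (status).
struct SmallResult {
    std::int64_t value;
    std::int64_t status;
};
static_assert(sizeof(SmallResult) == 16, "fits in RAX:RDX");

// 24 bytes: too large for the register pair. The caller reserves the storage
// and passes its address as a hidden first argument in RDI.
struct BigResult {
    std::int64_t value;
    std::int64_t status;
    std::int64_t detail;
};
static_assert(sizeof(BigResult) == 24, "returned through memory");

SmallResult make_small(std::int64_t v) noexcept { return {v, 0}; }

BigResult make_big(std::int64_t v) noexcept { return {v, 0, v * 2}; }
```

Compiling this with optimization and disassembling it with objdump, as was done for test5.o, should show make_big writing its fields through the pointer received in RDI, while make_small only sets RAX and RDX.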

Reference