What good can using exceptions do for me? The basic answer is: Using exceptions for error handling makes your code simpler, cleaner, and less likely to miss errors. But what’s wrong with “good old errno and if-statements”? The basic answer is: Using those, your error handling and your normal code are closely intertwined. That way, your code gets messy and it becomes hard to ensure that you have dealt with all errors.
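Because the C++ exception mechanism is complex, writing exception-safe code is not easy. Exceptions make the language more expressive, but they also bring unavoidable overhead. This article briefly analyzes how C++ exceptions are implemented and summarizes the related caveats.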
Stack Unwinding
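A program's call stack consists of multiple stack frames. Each stack frame corresponds to a subroutine (function) call that is still in progress; when the subroutine returns, its frame is popped. The accompanying figure shows a simple call-stack layout: the stack grows from bottom to top, and the routine DrawSquare calls the subroutine DrawLine.

On x86_64 the stack grows downward, toward lower addresses. The stack pointer is kept in the RSP register, and if a C/C++ program is compiled with frame pointers preserved, the frame pointer is kept in RBP. As the appendix describes, whenever a call instruction enters the callee, RSP points at the return address saved on the stack (an address in the code segment). If the stack-frame layout can be derived from that code address, the whole call stack can be walked frame by frame. In general, stack unwinding refers to this kind of parsing, unwinding, or modification of the call stack. Its main uses are:
- Stack backtraces (debugging, monitoring, perf, crash reports, and so on)
- Exception handling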
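Frame Pointer Backtrace
Walking the stack through the frame pointer is the simplest and most general approach. It dedicates one register to the frame pointer and stores the related data at a fixed position within each stack frame, which effectively chains the frames into a linked list. On x86_64, a function that can be unwound through the frame pointer has instructions of the form: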
    push %rbp
    mov %rsp, %rbp
    ...
    pop %rbp
    retq
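test1
backtrace() and backtrace_symbols() are the stack-backtrace functions that glibc provides in <execinfo.h> (a GNU extension, not part of standard C++); their implementation does not depend entirely on the frame pointer. __builtin_return_address(n) returns the return address n levels up from the current function, and for n > 0 its implementation relies on the frame pointer.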
    #include <cstdio>
    #include <cstdlib>
    #include <execinfo.h>

    #ifndef NO_INLINE
    #define NO_INLINE __attribute__((__noinline__))
    #endif

    using ULL = unsigned long long;

    // Print the current call stack using glibc's backtrace facilities.
    NO_INLINE void dump_traceback() {
        const int size = 200;
        void *buffer[size];
        int nptrs = backtrace(buffer, size);
        char **strings = backtrace_symbols(buffer, nptrs);
        if (strings) {
            for (int i = 0; i < nptrs; ++i) {
                printf("[%d] %s\n", i, strings[i]);
            }
            free(strings);
        }
    }

    template <int level>
    NO_INLINE void *f3() {
        dump_traceback();
        return __builtin_return_address(level);
    }

    template <int level>
    NO_INLINE void *f2() { return f3<level>(); }

    template <int level>
    NO_INLINE void *f1() { return f2<level>(); }

    int main(int argc, char **argv) {
        if (argc > 1) {
            f1<2>();
        } else {
            f1<0>();
        }
        return 0;
    }
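At the -O3 optimization level the frame pointer is omitted by default (-fomit-frame-pointer); add the flag -fno-omit-frame-pointer to keep it. The flag -rdynamic exports the symbols so that their names can be resolved in the backtrace.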
    > clang test1.cpp -O3 -rdynamic && ./a.out
    [0] ./a.out(_Z14dump_tracebackv+0x1e) [0x55ccb5c1218e]
    [1] ./a.out(_Z2f3ILi0EEPvv+0x6) [0x55ccb5c12276]
    [2] ./a.out(main+0x14) [0x55ccb5c12204]
    [3] /usr/lib64/libc.so.6(+0x3feb0) [0x7f29f083feb0]
    [4] /usr/lib64/libc.so.6(__libc_start_main+0x80) [0x7f29f083ff60]
    [5] ./a.out(_start+0x25) [0x55ccb5c120a5]
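With the frame pointer omitted, backtrace() can still obtain the current thread's call stack, but the builtin __builtin_return_address(2) cannot obtain the level-2 return address (which in theory should point into main's code), and the program crashes: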
    > clang test1.cpp -O3 -rdynamic && ./a.out l2
    [0] ./a.out(_Z14dump_tracebackv+0x1e) [0x56397028f18e]
    [1] ./a.out(_Z2f3ILi2EEPvv+0x9) [0x56397028f249]
    [2] ./a.out(main+0xb) [0x56397028f1fb]
    [3] /usr/lib64/libc.so.6(+0x3feb0) [0x7f552de3feb0]
    [4] /usr/lib64/libc.so.6(__libc_start_main+0x80) [0x7f552de3ff60]
    [5] ./a.out(_start+0x25) [0x56397028f0a5]
    [1] 2757999 segmentation fault (core dumped) ./a.out l2
test1 analysis
Dump the disassembly:
    > clang test1.cpp -O3 -rdynamic && objdump -C -r -d a.out

    0000000000001080 <_start>:
      1080: f3 0f 1e fa           endbr64
      1084: 31 ed                 xor %ebp,%ebp
      1086: 49 89 d1              mov %rdx,%r9
      1089: 5e                    pop %rsi
      108a: 48 89 e2              mov %rsp,%rdx
      108d: 48 83 e4 f0           and $0xfffffffffffffff0,%rsp
      1091: 50                    push %rax
      1092: 54                    push %rsp
      1093: 45 31 c0              xor %r8d,%r8d
      1096: 31 c9                 xor %ecx,%ecx
      1098: 48 8d 3d 51 01 00 00  lea 0x151(%rip),%rdi
      109f: ff 15 33 2f 00 00     callq *0x2f33(%rip)
      10a5: f4                    hlt

    0000000000001170 <dump_traceback()>:
      1170: 41 57                 push %r15
      1172: 41 56                 push %r14
      1174: 41 54                 push %r12
      1176: 53                    push %rbx
      1177: 48 81 ec 48 06 00 00  sub $0x648,%rsp
      117e: 49 89 e6              mov %rsp,%r14
      1181: 4c 89 f7              mov %r14,%rdi
      1184: be c8 00 00 00        mov $0xc8,%esi
      1189: e8 c2 fe ff ff        callq 1050 <backtrace@plt>
      118e: 89 c3                 mov %eax,%ebx
      1190: 4c 89 f7              mov %r14,%rdi
      1193: 89 c6                 mov %eax,%esi
      1195: e8 a6 fe ff ff        callq 1040 <backtrace_symbols@plt>
      119a: 48 85 c0              test %rax,%rax
      119d: 74 41                 je 11e0 <dump_traceback()+0x70>
      119f: 49 89 c7              mov %rax,%r15
      11a2: 85 db                 test %ebx,%ebx
      11a4: 7e 32                 jle 11d8 <dump_traceback()+0x68>
      11a6: 41 89 dc              mov %ebx,%r12d
      11a9: 4c 8d 35 54 0e 00 00  lea 0xe54(%rip),%r14
      11b0: 31 db                 xor %ebx,%ebx
      11b2: 66 66 66 66 66 2e 0f  data16 data16 data16 data16 nopw %cs:0x0(%rax,%rax,1)
      11b9: 1f 84 00 00 00 00 00
      11c0: 49 8b 14 df           mov (%r15,%rbx,8),%rdx
      11c4: 4c 89 f7              mov %r14,%rdi
      11c7: 89 de                 mov %ebx,%esi
      11c9: 31 c0                 xor %eax,%eax
      11cb: e8 90 fe ff ff        callq 1060 <printf@plt>
      11d0: 48 ff c3              inc %rbx
      11d3: 49 39 dc              cmp %rbx,%r12
      11d6: 75 e8                 jne 11c0 <dump_traceback()+0x50>
      11d8: 4c 89 ff              mov %r15,%rdi
      11db: e8 50 fe ff ff        callq 1030 <free@plt>
      11e0: 48 81 c4 48 06 00 00  add $0x648,%rsp
      11e7: 5b                    pop %rbx
      11e8: 41 5c                 pop %r12
      11ea: 41 5e                 pop %r14
      11ec: 41 5f                 pop %r15
      11ee: c3                    retq
      11ef: 90                    nop

    00000000000011f0 <main>:
      11f0: 50                    push %rax
      11f1: 83 ff 02              cmp $0x2,%edi
      11f4: 7c 09                 jl 11ff <main+0xf>
      11f6: e8 15 00 00 00        callq 1210 <void* f1<2>()>
      11fb: 31 c0                 xor %eax,%eax
      11fd: 59                    pop %rcx
      11fe: c3                    retq
      11ff: e8 1c 00 00 00        callq 1220 <void* f1<0>()>
      1204: 31 c0                 xor %eax,%eax
      1206: 59                    pop %rcx
      1207: c3                    retq
      1208: 0f 1f 84 00 00 00 00  nopl 0x0(%rax,%rax,1)
      120f: 00

    0000000000001210 <void* f1<2>()>:
      1210: e9 1b 00 00 00        jmpq 1230 <void* f2<2>()>
      1215: 66 2e 0f 1f 84 00 00  nopw %cs:0x0(%rax,%rax,1)
      121c: 00 00 00
      121f: 90                    nop

    0000000000001220 <void* f1<0>()>:
      1220: e9 3b 00 00 00        jmpq 1260 <void* f2<0>()>
      1225: 66 2e 0f 1f 84 00 00  nopw %cs:0x0(%rax,%rax,1)
      122c: 00 00 00
      122f: 90                    nop

    0000000000001230 <void* f2<2>()>:
      1230: e9 0b 00 00 00        jmpq 1240 <void* f3<2>()>
      1235: 66 2e 0f 1f 84 00 00  nopw %cs:0x0(%rax,%rax,1)
      123c: 00 00 00
      123f: 90                    nop

    0000000000001240 <void* f3<2>()>:
      1240: 55                    push %rbp
      1241: 48 89 e5              mov %rsp,%rbp
      1244: e8 27 ff ff ff        callq 1170 <dump_traceback()>
      1249: 48 8b 45 00           mov 0x0(%rbp),%rax
      124d: 48 8b 00              mov (%rax),%rax
      1250: 48 8b 40 08           mov 0x8(%rax),%rax
      1254: 5d                    pop %rbp
      1255: c3                    retq
      1256: 66 2e 0f 1f 84 00 00  nopw %cs:0x0(%rax,%rax,1)
      125d: 00 00 00

    0000000000001260 <void* f2<0>()>:
      1260: e9 0b 00 00 00        jmpq 1270 <void* f3<0>()>
      1265: 66 2e 0f 1f 84 00 00  nopw %cs:0x0(%rax,%rax,1)
      126c: 00 00 00
      126f: 90                    nop

    0000000000001270 <void* f3<0>()>:
      1270: 50                    push %rax
      1271: e8 fa fe ff ff        callq 1170 <dump_traceback()>
      1276: 48 8b 44 24 08        mov 0x8(%rsp),%rax
      127b: 59                    pop %rcx
      127c: c3                    retq
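With the appendix as a reference, the code above reads as follows:
0000000000001270 <void* f3<0>()>:
- On entry, 0x0(%rsp) holds the return address
- After push %rax, the return address is at 0x8(%rsp)
- mov 0x8(%rsp),%rax loads the return address
0000000000001240 <void* f3<2>()>:
- After push %rbp and mov %rsp,%rbp have executed:
  - %rsp is the current stack-top address
  - %rbp is the current frame base address
  - 0x0(%rbp) holds the previous frame base address (frame base + 0x0)
  - 0x8(%rbp) holds the return address (frame base + 0x8)
- mov 0x0(%rbp),%rax loads the previous frame base address
- mov (%rax),%rax loads the frame base address before that
- mov 0x8(%rax),%rax loads the final return address
None of the callers in this build maintains %rbp, so the value saved by push %rbp is most likely not a valid frame address, and dereferencing it is what makes the l2 run crash.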
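__builtin_return_address(0) does not depend on rbp and works normally.
Calling __builtin_return_address(n) has several obvious drawbacks:
- For levels greater than 0 it depends on the data in rbp, so it skips calls whose frame pointer was omitted or that were turned into tail calls, and cannot return an accurate return address
- While walking the stack, no registers other than the stack and frame pointers can be recovered
- The result cannot be related back to source code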
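Summary of the drawbacks of frame-pointer backtraces
- It depends heavily on the frame pointer: if any frame in the call stack did not save it, the walk fails
- It reserves a whole register for the frame pointer, which may cost performance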
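The linked-list view of the frame-pointer chain can be sketched as a small model. This is only an illustration under stated assumptions: FrameRecord and walk_frame_chain are hypothetical names, and the code walks a hand-built chain rather than reading the real %rbp register, so it stays safe regardless of compiler flags.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of what a frame-pointer unwinder reads from each frame
// on x86_64: 0x0(%rbp) holds the caller's saved rbp, 0x8(%rbp) the return address.
struct FrameRecord {
    const FrameRecord *caller_frame;  // saved %rbp of the caller (nullptr ends the chain)
    std::uintptr_t return_address;    // value stored at 0x8(%rbp)
};

// Follow the chain the way a frame-pointer walk does, collecting every
// return address until the chain ends or max_depth is reached.
std::vector<std::uintptr_t> walk_frame_chain(const FrameRecord *fp, int max_depth = 64) {
    std::vector<std::uintptr_t> out;
    for (int depth = 0; fp != nullptr && depth < max_depth; ++depth) {
        out.push_back(fp->return_address);
        fp = fp->caller_frame;
    }
    return out;
}
```

If one frame in the middle does not store a valid caller_frame (the omitted frame-pointer case), everything above it becomes unreachable or garbage, which is the failure mode seen in test1.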
ref: DWARF Debugging Standard
DWARF is a debugging information file format used by many compilers and debuggers to support source level debugging. It addresses the requirements of a number of procedural languages, such as C, C++, and Fortran, and is designed to be extensible to other languages. DWARF is architecture independent and applicable to any processor or operating system. It is widely used on Unix, Linux and other operating systems, as well as in stand-alone environments.
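DWARF uses separate .debug_* sections to remove the dependency on the frame pointer and describes call frames with CFI (Call Frame Information); the corresponding part of the Linux standard is derived from it (see below). Advantages of this approach:
- The backtrace does not depend on rbp, and other registers can be restored
- The data lives in extra sections of the ELF file and adds no run-time overhead
The .debug_frame section records the stack-frame information; the rest is split into sections such as .debug_info, .debug_abbrev, .debug_line, and .debug_str.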
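Exception Handling Frame
The modern Linux standard LSB (Linux Standard Base) specifies that languages supporting exceptions (such as C++) must provide additional information to the runtime environment describing the call frames that have to be unwound during exception handling. This information is contained in the special sections .eh_frame and .eh_frame_hdr.
.eh_frame is based on the DWARF v2 .debug_frame section and consists mainly of CIEs (Common Information Entry) and FDEs (Frame Description Entry). For each function definition the compiler embeds CFI directives in the assembly, and the assembler turns them into .eh_frame or .debug_frame. Compiler flags affect which section is generated: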
| Compiler flags | Generated section |
| --- | --- |
| -fasynchronous-unwind-tables -fexceptions | .eh_frame |
| -fno-asynchronous-unwind-tables -fexceptions | .eh_frame |
| -fasynchronous-unwind-tables -fno-exceptions | .eh_frame |
| -fno-asynchronous-unwind-tables -fno-exceptions [-g0 or no -g flag] | none |
| -fno-asynchronous-unwind-tables -fno-exceptions -g<level greater than 0> | .debug_frame |
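CFI directives
Basic overview:
- Directives are named .cfi_*
- .cfi_startproc and .cfi_endproc delimit the region covered by one FDE
- .cfi_def_cfa_offset changes the offset used to compute the CFA (Canonical Frame Address); on x86_64 the return address is stored at CFA-8
- .cfi_offset records where a register's previous value is saved, relative to the CFA
- .cfi_def_cfa_* define the rule for computing the CFA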
    > echo 'void test() {__builtin_unwind_init();}' > test3.cpp && clang test3.cpp -S && cat test3.s
        .text
        .file "test3.cpp"
        .globl _Z4testv
        .p2align 4, 0x90
        .type _Z4testv,@function
    _Z4testv:
        .cfi_startproc
        pushq %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset %rbp, -16
        movq %rsp, %rbp
        .cfi_def_cfa_register %rbp
        pushq %r15
        pushq %r14
        pushq %r13
        pushq %r12
        pushq %rbx
        .cfi_offset %rbx, -56
        .cfi_offset %r12, -48
        .cfi_offset %r13, -40
        .cfi_offset %r14, -32
        .cfi_offset %r15, -24
        popq %rbx
        popq %r12
        popq %r13
        popq %r14
        popq %r15
        popq %rbp
        .cfi_def_cfa %rsp, 8
        retq
    .Lfunc_end0:
        .size _Z4testv, .Lfunc_end0-_Z4testv
        .cfi_endproc
        .ident "Debian clang version 15.0.7"
        .section ".note.GNU-stack","",@progbits
        .addrsig
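The .eh_frame section
An example of .eh_frame contents on x86_64:
- Every FDE has an associated CIE
- Each FDE row records, for a given PC location, the CFA, where the callee-saved (nonvolatile) registers are stored, and where the return address (ra) is stored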
    Contents of the .eh_frame section:

    (FDE offset) (FDE length)  (CIE the FDE belongs to)  (start PC and end PC of the function)
    00000030 0000000000000024 00000034 FDE cie=00000000 pc=0000000000001020..0000000000001080

    (PC location)    (caller's     (where the callee-saved      (location of the
                      stack top)    registers are stored)        return address)
       LOC           CFA       rbx   r12   r14   r15   ra
    0000000000001170 rsp+8     u     u     u     u     c-8
    000000000000117e rsp+1648  c-40  c-32  c-24  c-16  c-8
    00000000000011ee rsp+8     c-40  c-32  c-24  c-16  c-8
Dump and decode the call-frame definitions:
    > clang test1.cpp -O3 -rdynamic && readelf -wF ./a.out
    Contents of the .eh_frame section:

    00000000 0000000000000014 00000000 CIE "zR" cf=1 df=-8 ra=16
       LOC           CFA      ra
    0000000000000000 rsp+8    u

    00000018 0000000000000014 0000001c FDE cie=00000000 pc=0000000000001080..00000000000010ab

    00000030 0000000000000014 00000000 CIE "zR" cf=1 df=-8 ra=16
       LOC           CFA      ra
    0000000000000000 rsp+8    c-8

    00000048 0000000000000024 0000001c FDE cie=00000030 pc=0000000000001020..0000000000001070
       LOC           CFA      ra
    0000000000001020 rsp+16   c-8
    0000000000001026 rsp+24   c-8
    0000000000001030 exp      c-8

    00000070 0000000000000014 00000044 FDE cie=00000030 pc=0000000000001070..0000000000001078

    00000088 0000000000000038 0000005c FDE cie=00000030 pc=0000000000001170..00000000000011ef
       LOC           CFA      rbx   r12   r14   r15   ra
    0000000000001170 rsp+8    u     u     u     u     c-8
    0000000000001172 rsp+16   u     u     u     u     c-8
    0000000000001174 rsp+24   u     u     u     u     c-8
    0000000000001176 rsp+32   u     u     u     u     c-8
    0000000000001177 rsp+40   u     u     u     u     c-8
    000000000000117e rsp+1648 c-40  c-32  c-24  c-16  c-8
    00000000000011e7 rsp+40   c-40  c-32  c-24  c-16  c-8
    00000000000011e8 rsp+32   c-40  c-32  c-24  c-16  c-8
    00000000000011ea rsp+24   c-40  c-32  c-24  c-16  c-8
    00000000000011ec rsp+16   c-40  c-32  c-24  c-16  c-8
    00000000000011ee rsp+8    c-40  c-32  c-24  c-16  c-8

    000000c4 000000000000001c 00000098 FDE cie=00000030 pc=00000000000011f0..0000000000001208
       LOC           CFA      ra
    00000000000011f0 rsp+8    c-8
    00000000000011f1 rsp+16   c-8
    00000000000011fe rsp+8    c-8
    00000000000011ff rsp+16   c-8
    0000000000001207 rsp+8    c-8

    000000e4 0000000000000010 000000b8 FDE cie=00000030 pc=0000000000001210..0000000000001215

    000000f8 0000000000000010 000000cc FDE cie=00000030 pc=0000000000001220..0000000000001225

    0000010c 0000000000000010 000000e0 FDE cie=00000030 pc=0000000000001230..0000000000001235

    00000120 000000000000001c 000000f4 FDE cie=00000030 pc=0000000000001240..0000000000001256
       LOC           CFA      rbp   ra
    0000000000001240 rsp+8    u     c-8
    0000000000001241 rsp+16   c-16  c-8
    0000000000001244 rbp+16   c-16  c-8
    0000000000001255 rsp+8    c-16  c-8

    00000140 0000000000000010 00000114 FDE cie=00000030 pc=0000000000001260..0000000000001265

    00000154 0000000000000018 00000128 FDE cie=00000030 pc=0000000000001270..000000000000127d
       LOC           CFA      ra
    0000000000001270 rsp+8    c-8
    0000000000001271 rsp+16   c-8
    000000000000127c rsp+8    c-8

    00000170 0000000000000044 00000144 FDE cie=00000030 pc=0000000000001280..00000000000012dd
       LOC           CFA      rbx   rbp   r12   r13   r14   r15   ra
    0000000000001280 rsp+8    u     u     u     u     u     u     c-8
    0000000000001282 rsp+16   u     u     u     u     u     c-16  c-8
    0000000000001287 rsp+24   u     u     u     u     c-24  c-16  c-8
    000000000000128c rsp+32   u     u     u     c-32  c-24  c-16  c-8
    0000000000001291 rsp+40   u     u     c-40  c-32  c-24  c-16  c-8
    0000000000001299 rsp+48   u     c-48  c-40  c-32  c-24  c-16  c-8
    00000000000012a1 rsp+56   c-56  c-48  c-40  c-32  c-24  c-16  c-8
    00000000000012a8 rsp+64   c-56  c-48  c-40  c-32  c-24  c-16  c-8
    00000000000012d2 rsp+56   c-56  c-48  c-40  c-32  c-24  c-16  c-8
    00000000000012d3 rsp+48   c-56  c-48  c-40  c-32  c-24  c-16  c-8
    00000000000012d4 rsp+40   c-56  c-48  c-40  c-32  c-24  c-16  c-8
    00000000000012d6 rsp+32   c-56  c-48  c-40  c-32  c-24  c-16  c-8
    00000000000012d8 rsp+24   c-56  c-48  c-40  c-32  c-24  c-16  c-8
    00000000000012da rsp+16   c-56  c-48  c-40  c-32  c-24  c-16  c-8
    00000000000012dc rsp+8    c-56  c-48  c-40  c-32  c-24  c-16  c-8

    000001b8 0000000000000010 0000018c FDE cie=00000030 pc=00000000000012e0..00000000000012e1

    000001cc ZERO terminator

    > readelf -wF /usr/lib64/libc.so.6

    000000ac 0000000000000018 000000b0 FDE cie=00000000 pc=000000000003fe30..000000000003fedc
       LOC           CFA      ra
    000000000003fe30 rsp+8    c-8
    000000000003fe31 rsp+16   c-8
    000000000003fe32 rsp+8    c-8
    000000000003fe39 rsp+160  c-8

    000000c8 0000000000000030 000000cc FDE cie=00000000 pc=000000000003fee0..0000000000040028
       LOC           CFA      rbx   rbp   r12   r13   r14   r15   ra
    000000000003fee0 rsp+8    u     u     u     u     u     u     c-8
    000000000003fee6 rsp+16   u     u     u     u     u     c-16  c-8
    000000000003feeb rsp+24   u     u     u     u     c-24  c-16  c-8
    000000000003feed rsp+32   u     u     u     c-32  c-24  c-16  c-8
    000000000003fef2 rsp+40   u     u     c-40  c-32  c-24  c-16  c-8
    000000000003fef6 rsp+48   u     c-48  c-40  c-32  c-24  c-16  c-8
    000000000003fef9 rsp+56   c-56  c-48  c-40  c-32  c-24  c-16  c-8
    000000000003fefd rsp+80   c-56  c-48  c-40  c-32  c-24  c-16  c-8

    000000fc 0000000000000010 00000100 FDE cie=00000000 pc=0000000000040030..000000000004004a
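From this, the unwinding performed by backtrace() proceeds as follows. At each step, the applicable row gives the CFA relative to the frame's current rsp, the return address sits at CFA-8, and the CFA becomes the rsp of the caller's frame:
- dump_traceback()+0x1e (118e): row 000000000000117e rsp+1648 c-40 c-32 c-24 c-16 c-8 applies, so 1640(%rsp) yields the return address 1276
- void* f3<0>()+0x6 (1276): row 0000000000001271 rsp+16 c-8 applies, so 8(%rsp) yields the return address 1204
- main()+0x14 (1204): row 00000000000011ff rsp+16 c-8 applies, so 8(%rsp) yields address 3feb0 inside /usr/lib64/libc.so.6
- /usr/lib64/libc.so.6(+0x3feb0): row 000000000003fe39 rsp+160 c-8 applies, so 152(%rsp) yields address 3ff60 inside /usr/lib64/libc.so.6
- /usr/lib64/libc.so.6(__libc_start_main+0x80): row 000000000003fefd rsp+80 c-56 c-48 c-40 c-32 c-24 c-16 c-8 applies, so 72(%rsp) yields the return address 10a5
- _start+0x25: row 0000000000000000 rsp+8 u applies; the return address is undefined (u), so unwinding stops here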
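The table-driven walk above can be replayed in a small simulation. This is a minimal sketch under stated assumptions: CfaRow, Machine, find_row and unwind are hypothetical names, the stack is a simulated map rather than real memory, and only the "CFA = rsp + offset, return address at CFA-8" rule from the walkthrough is modelled (no register restoration, no per-FDE lookup).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using Addr = std::uint64_t;

// One row of `readelf -wF` output, reduced to what this sketch needs:
// from `loc` onward, CFA = rsp + cfa_offset and the return address is at CFA - 8.
struct CfaRow {
    Addr loc;
    Addr cfa_offset;
};

// Hypothetical simulated machine state: sparse stack memory plus rsp.
struct Machine {
    std::map<Addr, Addr> stack;  // address -> 8-byte value stored there
    Addr rsp;
};

// Pick the row covering pc: the row with the largest loc that is <= pc.
// (A real unwinder first selects the FDE by pc range and looks up pc - 1
// for caller frames; both refinements are omitted here.)
const CfaRow *find_row(const std::vector<CfaRow> &rows, Addr pc) {
    const CfaRow *best = nullptr;
    for (const auto &r : rows) {
        if (r.loc <= pc && (best == nullptr || r.loc > best->loc)) best = &r;
    }
    return best;
}

// Walk frames until no row applies, the return-address slot is unknown,
// or max_frames program counters have been collected.
std::vector<Addr> unwind(Machine m, Addr pc, const std::vector<CfaRow> &rows, int max_frames) {
    std::vector<Addr> pcs{pc};
    while (static_cast<int>(pcs.size()) < max_frames) {
        const CfaRow *row = find_row(rows, pc);
        if (row == nullptr) break;
        const Addr cfa = m.rsp + row->cfa_offset;
        const auto slot = m.stack.find(cfa - 8);
        if (slot == m.stack.end()) break;
        pc = slot->second;  // return address = pc inside the caller
        m.rsp = cfa;        // caller's rsp once the callee has returned
        pcs.push_back(pc);
    }
    return pcs;
}
```

Feeding it the three rows used for dump_traceback(), f3<0>() and main, plus a simulated stack holding 1276 and 1204 in the matching slots, reproduces the first three program counters of the walkthrough.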
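libunwind
libunwind provides a portable and efficient API for determining a program's call chain. The API supports both local (same-process) and remote (cross-process) operation. It can manipulate the preserved (callee-saved) state of each call frame and resume execution at any point in the call chain. Typical uses of libunwind include exception handling, debuggers, call-chain monitoring, and setjmp() / longjmp().
The relevant libunwind interface functions are listed below:
- unw_init_local is mainly used to unwind the stack of the current process
- unw_init_remote is usually applied to other processes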
    > nm -CD libunwind.so | grep 'unw_' | less
    0000000000007d84 W unw_getcontext
    0000000000001d80 W unw_get_fpreg
    0000000000001f30 W unw_get_proc_info
    0000000000002030 W unw_get_proc_name
    0000000000001bc0 W unw_get_reg
    0000000000001ab0 W unw_init_local
    00000000000020d0 W unw_is_fpreg
    00000000000021d0 W unw_is_signal_frame
    0000000000002240 W unw_iterate_dwarf_unwind_cache
    000000000000e0c8 D unw_local_addr_space
    0000000000002150 W unw_regname
    0000000000001fc0 W unw_resume
    0000000000001e20 W unw_set_fpreg
    0000000000001c60 W unw_set_reg
    0000000000001ec0 W unw_step
test2
test2 walks the local call stack with libunwind.
    #include <assert.h>
    #include <cstdio>
    #include <cstdlib>
    #include <cxxabi.h>

    #define UNW_LOCAL_ONLY
    #include <libunwind.h>

    #ifndef NO_INLINE
    #define NO_INLINE __attribute__((__noinline__))
    #endif

    NO_INLINE void dump_backtrace() {
        char buff[1024];
        size_t demangle_buff_size = 0;
        char *demangle_buff = nullptr;  // reused (and possibly realloc'ed) by __cxa_demangle

        unw_cursor_t cursor;
        unw_context_t uc;
        unw_word_t offset{};

        unw_getcontext(&uc);
        unw_init_local(&cursor, &uc);
        while (unw_step(&cursor) > 0) {
            unw_word_t ip, sp;
            unw_get_reg(&cursor, UNW_REG_IP, &ip);
            unw_get_reg(&cursor, UNW_REG_SP, &sp);
            auto status = unw_get_proc_name(&cursor, buff, sizeof(buff), &offset);
            assert(!status);
            auto realname = buff;
            {
                // Keep the old buffer when demangling fails: __cxa_demangle
                // returns nullptr in that case and leaves the buffer untouched.
                int demangle_status = -1;
                char *demangled = abi::__cxa_demangle(buff, demangle_buff, &demangle_buff_size,
                                                      &demangle_status);
                if (demangle_status == 0) {
                    demangle_buff = demangled;
                    realname = demangle_buff;
                }
            }
            printf("0x%016lx <%s+0x%lx>\n", ip, realname, offset);
        }
        if (demangle_buff) {
            free(demangle_buff);
        }
    }

    NO_INLINE void *f3() {
        dump_backtrace();
        return nullptr;
    }

    NO_INLINE void *f2() { return f3(); }

    NO_INLINE void *f1() { return f2(); }

    int main(int argc, char **argv) {
        f1();
        return 0;
    }
    > clang test2.cpp -O3 -L/usr/lib64 -lunwind -lc++ -lc++abi -stdlib=libc++ -std=gnu++20 -rdynamic && ./a.out
    0x00005645e5d11306 <f3()+0x6>
    0x00005645e5d11316 <f2()+0x6>
    0x00005645e5d11326 <f1()+0x6>
    0x00005645e5d11336 <main+0x6>
    0x00007ff550c3feb0 <__libc_start_call_main+0x80>
    0x00007ff550c3ff60 <__libc_start_main+0x80>
    0x00005645e5d110f5 <_start+0x25>
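Here the functions f1(), f2(), f3() and dump_backtrace() have externally visible symbols by default, and in this build the compiler's optimize-sibling-calls optimization did not take effect, so every call keeps its own frame. If the functions are declared static or placed in an anonymous namespace, these calls can be optimized away.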
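C++ Exception Handling
C++ exception handling is a typical application of stack unwinding. There are several exception-handling ABIs; the most widely used is Itanium C++ ABI: Exception Handling, which splits the C++ exception-handling ABI into 3 levels.
Landing pad definition
- A landing pad is the user code that catches an exception or performs cleanup after an exception
- During exception handling, the personality routine selectively transfers control to a landing pad. After running its logic, the landing pad either ends exception handling and returns to normal user code, continues handling the exception, or throws an exception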
test4
test4 uses a simple code example to show how an exception travels from being thrown to being caught.
    volatile int v = 0x0;

    struct N {
        __attribute__((__noinline__)) N() { v = 0x1; }
        __attribute__((__noinline__)) ~N() { v = 0x2; }
    };

    void test(bool x) {
        N n;
        try {
            if (x)
                throw v;
            v = 0x4;
        } catch (int &e) {
            throw static_cast<double>(v);
        } catch (double &e) {
            v = 0x5;
        } catch (...) {
            v = 0x3;
        }
    }

    void test_noexcept(bool) noexcept {
        N n;
        throw v;
    }

    void test2(bool x) {
        N n;
        try {
            test(x);
        } catch (float &e) {
            v = 0x6;
        } catch (double &e) {
            v = 0x7;
        }
        v = 0x8;
    }
    > clang -c test4.cpp -O3 && objdump -C -r -d ./test4.o

    Disassembly of section .text:

    0000000000000000 <test(bool)>:
       0: 53                    push %rbx
       1: 48 83 ec 10           sub $0x10,%rsp
       5: 89 fb                 mov %edi,%ebx
       7: 48 8d 7c 24 08        lea 0x8(%rsp),%rdi
       c: e8 00 00 00 00        callq 11 <test(bool)+0x11>
            d: R_X86_64_PLT32 N::N()-0x4
      11: 85 db                 test %ebx,%ebx
      13: 75 1a                 jne 2f <test(bool)+0x2f>
      15: c7 05 00 00 00 00 04  movl $0x4,0x0(%rip)
      1c: 00 00 00
            17: R_X86_64_PC32 v-0x8
      1f: 48 8d 7c 24 08        lea 0x8(%rsp),%rdi
      24: e8 00 00 00 00        callq 29 <test(bool)+0x29>
            25: R_X86_64_PLT32 N::~N()-0x4
      29: 48 83 c4 10           add $0x10,%rsp
      2d: 5b                    pop %rbx
      2e: c3                    retq
      2f: bf 04 00 00 00        mov $0x4,%edi
      34: e8 00 00 00 00        callq 39 <test(bool)+0x39>
            35: R_X86_64_PLT32 __cxa_allocate_exception-0x4
      39: 8b 0d 00 00 00 00     mov 0x0(%rip),%ecx
            3b: R_X86_64_PC32 v-0x4
      3f: 89 08                 mov %ecx,(%rax)
      41: 48 8b 35 00 00 00 00  mov 0x0(%rip),%rsi
            44: R_X86_64_REX_GOTPCRELX typeinfo for int-0x4
      48: 48 89 c7              mov %rax,%rdi
      4b: 31 d2                 xor %edx,%edx
      4d: e8 00 00 00 00        callq 52 <test(bool)+0x52>
            4e: R_X86_64_PLT32 __cxa_throw-0x4
      52: eb 63                 jmp b7 <test(bool)+0xb7>
      54: 48 89 d3              mov %rdx,%rbx
      57: 48 89 c7              mov %rax,%rdi
      5a: 83 fb 03              cmp $0x3,%ebx
      5d: 74 2c                 je 8b <test(bool)+0x8b>
      5f: e8 00 00 00 00        callq 64 <test(bool)+0x64>
            60: R_X86_64_PLT32 __cxa_begin_catch-0x4
      64: 83 fb 02              cmp $0x2,%ebx
      67: 75 11                 jne 7a <test(bool)+0x7a>
      69: c7 05 00 00 00 00 05  movl $0x5,0x0(%rip)
      70: 00 00 00
            6b: R_X86_64_PC32 v-0x8
      73: e8 00 00 00 00        callq 78 <test(bool)+0x78>
            74: R_X86_64_PLT32 __cxa_end_catch-0x4
      78: eb a5                 jmp 1f <test(bool)+0x1f>
      7a: c7 05 00 00 00 00 03  movl $0x3,0x0(%rip)
      81: 00 00 00
            7c: R_X86_64_PC32 v-0x8
      84: e8 00 00 00 00        callq 89 <test(bool)+0x89>
            85: R_X86_64_PLT32 __cxa_end_catch-0x4
      89: eb 94                 jmp 1f <test(bool)+0x1f>
      8b: e8 00 00 00 00        callq 90 <test(bool)+0x90>
            8c: R_X86_64_PLT32 __cxa_begin_catch-0x4
      90: bf 08 00 00 00        mov $0x8,%edi
      95: e8 00 00 00 00        callq 9a <test(bool)+0x9a>
            96: R_X86_64_PLT32 __cxa_allocate_exception-0x4
      9a: f2 0f 2a 05 00 00 00  cvtsi2sdl 0x0(%rip),%xmm0
      a1: 00
            9e: R_X86_64_PC32 v-0x4
      a2: f2 0f 11 00           movsd %xmm0,(%rax)
      a6: 48 8b 35 00 00 00 00  mov 0x0(%rip),%rsi
            a9: R_X86_64_REX_GOTPCRELX typeinfo for double-0x4
      ad: 48 89 c7              mov %rax,%rdi
      b0: 31 d2                 xor %edx,%edx
      b2: e8 00 00 00 00        callq b7 <test(bool)+0xb7>
            b3: R_X86_64_PLT32 __cxa_throw-0x4
      b7: 48 89 c3              mov %rax,%rbx
      ba: eb 08                 jmp c4 <test(bool)+0xc4>
      bc: 48 89 c3              mov %rax,%rbx
      bf: e8 00 00 00 00        callq c4 <test(bool)+0xc4>
            c0: R_X86_64_PLT32 __cxa_end_catch-0x4
      c4: 48 8d 7c 24 08        lea 0x8(%rsp),%rdi
      c9: e8 00 00 00 00        callq ce <test(bool)+0xce>
            ca: R_X86_64_PLT32 N::~N()-0x4
      ce: 48 89 df              mov %rbx,%rdi
      d1: e8 00 00 00 00        callq d6 <test(bool)+0xd6>
            d2: R_X86_64_PLT32 _Unwind_Resume-0x4
      d6: 66 2e 0f 1f 84 00 00  nopw %cs:0x0(%rax,%rax,1)
      dd: 00 00 00

    00000000000000e0 <test_noexcept(bool)>:
      e0: 50                    push %rax
      e1: 48 89 e7              mov %rsp,%rdi
      e4: e8 00 00 00 00        callq e9 <test_noexcept(bool)+0x9>
            e5: R_X86_64_PLT32 N::N()-0x4
      e9: bf 04 00 00 00        mov $0x4,%edi
      ee: e8 00 00 00 00        callq f3 <test_noexcept(bool)+0x13>
            ef: R_X86_64_PLT32 __cxa_allocate_exception-0x4
      f3: 8b 0d 00 00 00 00     mov 0x0(%rip),%ecx
            f5: R_X86_64_PC32 v-0x4
      f9: 89 08                 mov %ecx,(%rax)
      fb: 48 8b 35 00 00 00 00  mov 0x0(%rip),%rsi
            fe: R_X86_64_REX_GOTPCRELX typeinfo for int-0x4
     102: 48 89 c7              mov %rax,%rdi
     105: 31 d2                 xor %edx,%edx
     107: e8 00 00 00 00        callq 10c <test_noexcept(bool)+0x2c>
            108: R_X86_64_PLT32 __cxa_throw-0x4
     10c: 48 89 c7              mov %rax,%rdi
     10f: e8 00 00 00 00        callq 114 <test_noexcept(bool)+0x34>
            110: R_X86_64_PLT32 __clang_call_terminate-0x4
     114: 66 66 66 2e 0f 1f 84  data16 data16 nopw %cs:0x0(%rax,%rax,1)
     11b: 00 00 00 00 00

    0000000000000120 <test2(bool)>:
     120: 55                    push %rbp
     121: 53                    push %rbx
     122: 50                    push %rax
     123: 89 fb                 mov %edi,%ebx
     125: 48 89 e7              mov %rsp,%rdi
     128: e8 00 00 00 00        callq 12d <test2(bool)+0xd>
            129: R_X86_64_PLT32 N::N()-0x4
     12d: 89 df                 mov %ebx,%edi
     12f: e8 00 00 00 00        callq 134 <test2(bool)+0x14>
            130: R_X86_64_PLT32 test(bool)-0x4
     134: c7 05 00 00 00 00 08  movl $0x8,0x0(%rip)
     13b: 00 00 00
            136: R_X86_64_PC32 v-0x8
     13e: 48 89 e7              mov %rsp,%rdi
     141: e8 00 00 00 00        callq 146 <test2(bool)+0x26>
            142: R_X86_64_PLT32 N::~N()-0x4
     146: 48 83 c4 08           add $0x8,%rsp
     14a: 5b                    pop %rbx
     14b: 5d                    pop %rbp
     14c: c3                    retq
     14d: 48 89 c3              mov %rax,%rbx
     150: bd 06 00 00 00        mov $0x6,%ebp
     155: 83 fa 02              cmp $0x2,%edx
     158: 74 0a                 je 164 <test2(bool)+0x44>
     15a: bd 07 00 00 00        mov $0x7,%ebp
     15f: 83 fa 01              cmp $0x1,%edx
     162: 75 15                 jne 179 <test2(bool)+0x59>
     164: 48 89 df              mov %rbx,%rdi
     167: e8 00 00 00 00        callq 16c <test2(bool)+0x4c>
            168: R_X86_64_PLT32 __cxa_begin_catch-0x4
     16c: 89 2d 00 00 00 00     mov %ebp,0x0(%rip)
            16e: R_X86_64_PC32 v-0x4
     172: e8 00 00 00 00        callq 177 <test2(bool)+0x57>
            173: R_X86_64_PLT32 __cxa_end_catch-0x4
     177: eb bb                 jmp 134 <test2(bool)+0x14>
     179: 48 89 e7              mov %rsp,%rdi
     17c: e8 00 00 00 00        callq 181 <test2(bool)+0x61>
            17d: R_X86_64_PLT32 N::~N()-0x4
     181: 48 89 df              mov %rbx,%rdi
     184: e8 00 00 00 00        callq 189 <test2(bool)+0x69>
            185: R_X86_64_PLT32 _Unwind_Resume-0x4

    Disassembly of section .text._ZN1NC2Ev:

    0000000000000000 <N::N()>:
       0: c7 05 00 00 00 00 01  movl $0x1,0x0(%rip)
       7: 00 00 00
            2: R_X86_64_PC32 v-0x8
       a: c3                    retq

    Disassembly of section .text._ZN1ND2Ev:

    0000000000000000 <N::~N()>:
       0: c7 05 00 00 00 00 02  movl $0x2,0x0(%rip)
       7: 00 00 00
            2: R_X86_64_PC32 v-0x8
       a: c3                    retq

    Disassembly of section .text.__clang_call_terminate:

    0000000000000000 <__clang_call_terminate>:
       0: 50                    push %rax
       1: e8 00 00 00 00        callq 6 <__clang_call_terminate+0x6>
            2: R_X86_64_PLT32 __cxa_begin_catch-0x4
       6: e8 00 00 00 00        callq b <__clang_call_terminate+0xb>
            7: R_X86_64_PLT32 std::terminate()-0x4
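test4 analysis
The main steps when an exception is thrown:
- Exception object construction: __cxa_allocate_exception allocates heap space for the exception object; the thrown value is then constructed in that space, and its RTTI (Run-Time Type Information) is passed to __cxa_throw
- Throwing: __cxa_throw sets the current exception and runs _Unwind_RaiseException, which has 2 phases, search and cleanup:
  - Search phase: using the personality routine (on ELF mainly driven by the .gcc_except_table section, parsed by __gxx_personality_v0 and __gcc_personality_v0), walk the stack frames level by level looking for a try{}catch{} block that matches the current exception type. If none is found the process is terminated; otherwise the cleanup phase begins
  - Cleanup phase: walk the stack frames level by level again through the personality routine
    - On reaching a frame with variables to clean up, restore the register state and jump to that frame's landing pad. The landing pad ends by calling _Unwind_Resume to return to the cleanup phase.
    - On reaching the frame that has to catch the exception, restore the register state and jump to that frame's landing pad. If the exception matches, the landing pad calls __cxa_begin_catch, runs the catch code, and finally calls __cxa_end_catch to finish exception handling and return to normal code; if nothing matches, it cleans up the remaining variables and returns to the cleanup phase via _Unwind_Resume
- __cxa_* are the exception-handling interfaces implemented by the C++ runtime; for clang's concrete behavior see Exception Handling in LLVM
  - __cxa_begin_catch returns a pointer to the exception object
  - __cxa_end_catch decrements the handler count of the caught exception and destroys the exception object once it is no longer in use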
    > nm -CD libstdc++.so
    00000000000917a0 T __cxa_allocate_exception
    00000000000919f0 T __cxa_begin_catch
    000000000009e0b0 T __cxa_demangle
    0000000000091a60 T __cxa_end_catch
    0000000000092b40 T __cxa_rethrow
    0000000000092af0 T __cxa_throw

    > objdump -C -r -d libstdc++.so.6
    0000000000092af0 <__cxa_throw@@CXXABI_1.3>:
    ...
       92b22: e8 b9 7a ff ff  callq 8a5e0 <_Unwind_RaiseException@plt>
    ...
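The observable effect of the two phases can be checked with a small program: destructors of intermediate frames (cleanup landing pads) run before the matching handler body does. This is a sketch with hypothetical names (Guard, inner, middle, run_demo, g_log); it shows only the ordering, not the runtime internals.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

static std::vector<std::string> g_log;  // records the order of events

struct Guard {
    std::string name;
    explicit Guard(std::string n) : name(std::move(n)) {}
    ~Guard() { g_log.push_back("~" + name); }  // reached through a cleanup landing pad
};

void inner() {
    Guard g("inner");
    throw 42;  // compiled into __cxa_allocate_exception + __cxa_throw
}

void middle() {
    Guard g("middle");
    inner();  // no handler in this frame, only cleanup for g
}

std::vector<std::string> run_demo() {
    g_log.clear();
    try {
        middle();
    } catch (int &e) {  // the handler located during the search phase
        g_log.push_back("caught " + std::to_string(e));
    }
    return g_log;
}
```

run_demo() returns the entries "~inner", "~middle", "caught 42" in that order: both frames are cleaned up during the cleanup phase before control reaches the catch block.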
Throwing an exception relies on the _Unwind_* interface of an unwind library. GCC ships its own default unwind library, libgcc_s.[so.*] and libgcc_eh.a; other implementations exist as well, typified by nongnu.org/libunwind and llvm-project/libunwind.
    > nm -CD libunwind.so | grep '_Unwind_' | less
    0000000000007760 T _Unwind_Backtrace
    0000000000007350 T _Unwind_DeleteException
    00000000000076a0 T _Unwind_FindEnclosingFunction
    00000000000078f0 T _Unwind_Find_FDE
    0000000000007170 T _Unwind_ForcedUnwind
    00000000000079d0 T _Unwind_GetCFA
    00000000000075c0 T _Unwind_GetDataRelBase
    00000000000073a0 T _Unwind_GetGR
    0000000000007470 T _Unwind_GetIP
    0000000000007a40 T _Unwind_GetIPInfo
    0000000000007220 T _Unwind_GetLanguageSpecificData
    00000000000072d0 T _Unwind_GetRegionStart
    0000000000007630 T _Unwind_GetTextRelBase
    0000000000006780 T _Unwind_RaiseException
    0000000000006de0 T _Unwind_Resume
    0000000000007530 T _Unwind_Resume_or_Rethrow
    0000000000007420 T _Unwind_SetGR
    00000000000074e0 T _Unwind_SetIP
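Two of the entry points listed above, _Unwind_Backtrace and _Unwind_GetIP, are enough for a basic backtrace without backtrace() or the unw_* API. The sketch below assumes a GCC or clang toolchain on Linux whose <unwind.h> declares them; collect_ip and capture_backtrace are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>
#include <unwind.h>

// Callback invoked once per frame by _Unwind_Backtrace.
static _Unwind_Reason_Code collect_ip(struct _Unwind_Context *ctx, void *arg) {
    auto *ips = static_cast<std::vector<std::uintptr_t> *>(arg);
    // Stop before the vector would have to reallocate inside the callback.
    if (ips->size() == ips->capacity()) return _URC_END_OF_STACK;
    ips->push_back(static_cast<std::uintptr_t>(_Unwind_GetIP(ctx)));
    return _URC_NO_REASON;  // continue with the next frame
}

// Collect up to 128 instruction pointers of the current call stack.
__attribute__((noinline)) std::vector<std::uintptr_t> capture_backtrace() {
    std::vector<std::uintptr_t> ips;
    ips.reserve(128);
    _Unwind_Backtrace(collect_ip, &ips);
    return ips;
}
```

The addresses returned by capture_backtrace() are raw instruction pointers; mapping them to symbol names would still need something like dladdr() or backtrace_symbols().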
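C++ compilers normally assume that a function may throw (to disable exceptions, add the compiler flag -fno-exceptions). A function marked with the noexcept keyword promises not to let exceptions escape; the effect shows up at 2 levels:
- Function definition: f(...) noexcept { ... f?(); ... } is roughly equivalent to f(...) { ... try { f?(); } catch (...) { std::terminate(); } }, that is, any exception escaping from a call made inside the function leads to the terminate path.
- Function declaration: when a function declared noexcept is called, the compiler does not need to set up a landing pad for that call path, which makes the caller easier to optimize. Common C or extern "C" base library functions usually carry a no-throw declaration such as throw() / noexcept(true) / noexcept.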
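What the compiler knows about a declaration can be queried with the noexcept operator. The sketch below uses hypothetical functions (may_throw, never_throws, call_twice) to show the compile-time check and a wrapper that forwards the guarantee of its callee.

```cpp
#include <cassert>

int may_throw(int x);              // no exception specification: may throw
int never_throws(int x) noexcept;  // promises not to let exceptions escape

// The noexcept operator is evaluated at compile time; nothing is called.
static_assert(!noexcept(may_throw(0)), "treated as potentially throwing");
static_assert(noexcept(never_throws(0)), "declared noexcept");

// A wrapper can inherit the guarantee of whatever it calls
// through a conditional noexcept specification.
template <typename F>
int call_twice(F &&f, int x) noexcept(noexcept(f(x))) {
    return f(f(x));
}

int may_throw(int x) { return x + 1; }
int never_throws(int x) noexcept { return x * 2; }
```

With a noexcept lambda as the callee, noexcept(call_twice(...)) evaluates to true, so callers of the wrapper need no landing pad for it either; with a plain lambda it evaluates to false.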
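Impact of C++ exceptions
The poor performance of throwing and handling C++ exceptions is widely acknowledged and is not repeated here. However, in scenarios where exceptions are almost never thrown, using exceptions in place of return-value checks can further optimize the shortest path.
In test5, f1() throws an exception when an error occurs while f2() reports its status through the return value; test1() and test2() implement similar functionality.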
    #include <cstddef>
    #include <cstdint>
    #include <optional>

    int64_t f1();  // reports errors by throwing

    using N = std::optional<int64_t>;
    N f2() noexcept;  // reports errors through an empty optional

    int64_t test1(size_t n) {
        int64_t res = 0;
        for (size_t i = 0; i < n; ++i) {
            res += f1();
        }
        return res;
    }

    N test2(size_t n) noexcept {
        int64_t res = 0;
        for (size_t i = 0; i < n; ++i) {
            auto &&x = f2();
            if (!x) {
                return x;
            }
            res += *x;
        }
        return res;
    }
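Judging from the disassembly, test1() is more compact and uses fewer registers. When errors are extremely rare, the whole call path is spared the status-check logic, which squeezes out a little more performance.
f2() returns a std::optional<int64_t>, whose value and engaged flag each occupy one 8-byte slot, so it can be passed back to the caller in the RAX and RDX registers; test2() tests DL (the low byte of RDX) and branches on it. If error statuses are rare, CPU branch prediction succeeds most of the time and the overhead of this check is negligible. If the status check is more complex, the specific use case needs to be evaluated and optimized.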
    > clang -c test5.cpp -O3 -std=gnu++17 && objdump -C -r -d test5.o

    test5.o: file format elf64-x86-64

    Disassembly of section .text:

    0000000000000000 <test1(unsigned long)>:
       0: 41 56                 push %r14
       2: 53                    push %rbx
       3: 50                    push %rax
       4: 48 85 ff              test %rdi,%rdi
       7: 74 16                 je 1f <test1(unsigned long)+0x1f>
       9: 48 89 fb              mov %rdi,%rbx
       c: 45 31 f6              xor %r14d,%r14d
       f: 90                    nop
      10: e8 00 00 00 00        callq 15 <test1(unsigned long)+0x15>
            11: R_X86_64_PLT32 f1()-0x4
      15: 49 01 c6              add %rax,%r14
      18: 48 ff cb              dec %rbx
      1b: 75 f3                 jne 10 <test1(unsigned long)+0x10>
      1d: eb 03                 jmp 22 <test1(unsigned long)+0x22>
      1f: 45 31 f6              xor %r14d,%r14d
      22: 4c 89 f0              mov %r14,%rax
      25: 48 83 c4 08           add $0x8,%rsp
      29: 5b                    pop %rbx
      2a: 41 5e                 pop %r14
      2c: c3                    retq
      2d: 0f 1f 00              nopl (%rax)

    0000000000000030 <test2(unsigned long)>:
      30: 55                    push %rbp
      31: 41 56                 push %r14
      33: 53                    push %rbx
      34: 41 b6 01              mov $0x1,%r14b
      37: 48 85 ff              test %rdi,%rdi
      3a: 74 27                 je 63 <test2(unsigned long)+0x33>
      3c: 48 89 fd              mov %rdi,%rbp
      3f: 31 db                 xor %ebx,%ebx
      41: 66 66 66 66 66 66 2e  data16 data16 data16 data16 data16 nopw %cs:0x0(%rax,%rax,1)
      48: 0f 1f 84 00 00 00 00
      4f: 00
      50: e8 00 00 00 00        callq 55 <test2(unsigned long)+0x25>
            51: R_X86_64_PLT32 f2()-0x4
      55: 84 d2                 test %dl,%dl
      57: 74 0e                 je 67 <test2(unsigned long)+0x37>
      59: 48 01 c3              add %rax,%rbx
      5c: 48 ff cd              dec %rbp
      5f: 75 ef                 jne 50 <test2(unsigned long)+0x20>
      61: eb 0a                 jmp 6d <test2(unsigned long)+0x3d>
      63: 31 db                 xor %ebx,%ebx
      65: eb 06                 jmp 6d <test2(unsigned long)+0x3d>
      67: 45 31 f6              xor %r14d,%r14d
      6a: 48 89 c3              mov %rax,%rbx
      6d: 48 89 d8              mov %rbx,%rax
      70: 44 89 f2              mov %r14d,%edx
      73: 5b                    pop %rbx
      74: 41 5e                 pop %r14
      76: 5d                    pop %rbp
      77: c3                    retq

    > ls -l test5.o
    -rw-r--r-- 1 root root 1480 Oct 10 14:14 test5.o

    > clang -c test5.cpp -O3 -std=gnu++17 -fno-exceptions
    > ls -l test5.o
    -rw-r--r-- 1 root root 1432 Oct 10 14:14 test5.o
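Exceptions affect code size:
- With exceptions enabled, the compiler generates landing-pad code for every function call that may throw and for every try{}catch{} handler. It stores each function's exception table in the .gcc_except_table section, which describes the actions to take when an exception is raised in a particular part of the function's code.
- Compared with disabling exceptions, enabling them increases binary size by roughly 10% to 20%. This is also a reason why the LLVM Coding Standards disable exceptions.
Exceptions affect compiler optimization:
- In theory, the more advanced the compiler, the closer the generated code (when nothing is thrown) gets to the performance of code built with exceptions disabled. Early compilers were weak in this respect; modern mainstream compilers have improved considerably.
- When the compiler analyzes and infers context, simpler and more explicit branches and behavior let it optimize better, and they also speed up compilation. For functions that definitely do not throw, it is therefore recommended to add noexcept at both the definition and the declaration.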
Exceptions in C
C has no native exception mechanism; a similar effect can be emulated with setjmp / longjmp.
ref: https://en.wikipedia.org/wiki/Setjmp.h
    #include <cstdio>
    #include <cstdlib>
    #include <cstring>
    #include <pthread.h>
    #include <setjmp.h>

    static void first();
    static void second();

    static jmp_buf exception_env;
    static int exception_type;

    struct TestFlag {
        TestFlag() { printf("construct %s \n", __func__); }
        ~TestFlag() {
            printf("destruct %s. SHOULD NOT HAPPEN\n", __func__);
            exit(-1);
        }
    };

    int main(void) {
        char *volatile mem_buffer = NULL;

        if (setjmp(exception_env)) {
            // Reached via longjmp: an "exception" was raised.
            printf("first failed, exception type: %d\n", exception_type);
        } else {
            puts("calling first");
            first();

            mem_buffer = (char *)(malloc(300));
            printf("%s\n", strcpy(mem_buffer, "first succeeded"));
        }

        free(mem_buffer);
        return 0;
    }

    static void first() {
        jmp_buf my_env;

        puts("entering first");
        TestFlag n;

        // Save the caller's jump target so it can be restored later.
        std::memcpy(my_env, exception_env, sizeof my_env);

        switch (setjmp(exception_env)) {
        case 3:
            puts("second failed, exception type: 3; remapping to type 1");
            exception_type = 1;
            // fall through
        default:
            std::memcpy(exception_env, my_env, sizeof exception_env);
            longjmp(exception_env, exception_type);
        case 0:
            puts("calling second");
            second();
            puts("second succeeded");
        }

        std::memcpy(exception_env, my_env, sizeof exception_env);
        puts("leaving first");
    }

    static void second() {
        puts("entering second");

        exception_type = 3;
        longjmp(exception_env, exception_type);

        puts("leaving second");
    }
    > clang test6.cpp -O3 -std=gnu++17 && ./a.out
    calling first
    entering first
    construct TestFlag
    calling second
    entering second
    second failed, exception type: 3; remapping to type 1
    first failed, exception type: 1
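This approach is flexible and controllable, and its performance on the exceptional path may beat C++ exceptions, but the drawbacks are just as obvious:
- Resource management has to be built by hand: after a longjmp, the stack frames are not unwound level by level and leftover data is not cleaned up; setjmp only saves register-related state, so objects on the stack are unprotected (the TestFlag destructor in the output above never runs)
- Code complexity rises sharply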
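Summary of C++ exceptions
- Broadly speaking, the exception mechanism trades performance on the exceptional path for a faster normal path
  - The price of using exceptions is a larger binary, slower compilation, and extremely poor performance on the catch-and-handle path (even worse in multithreaded environments). Never use exceptions for ordinary control flow; it is recommended to use them only for errors that occur with a probability below 0.1%.
  - If an error requires ending the whole call chain, throwing an exception is appropriate, and it is recommended that the exception object carries key information such as the function call chain. If the error itself is expected, exceptions are not recommended.
  - In short, an error whose call stack never needs to be analyzed most likely does not need an exception
- Compared with disabling exceptions, modern mainstream C++ compilers can already generate code of similar performance (when nothing is thrown), and can even improve the shortest path further.
  - It is worth evaluating whether the code should use exceptions instead of return statuses to squeeze out performance. This applies mostly to low-latency scenarios.
  - Alternatively, depending on the business scenario, a more flexible error-handling mechanism can be built semi-manually with setjmp / longjmp
  - If return statuses are used, the ideal return types are of the kind std::optional< (1 ~ 8 byte) > / std::pair<long, (1 ~ 8 byte) > / long, so that values travel quickly through registers and the caller can check the result quickly.
  - For functions that definitely do not throw (for example base functions such as memcmp / memcpy), it is recommended to add noexcept at both the definition and the declaration so that the compiler can optimize callers better.
- The exception mechanism is friendly to the OOP style and improves the expressiveness and compatibility of code. When dirty work has to be done in a large code base, exceptions are undoubtedly the fastest option.
  - A failing constructor can throw to the upper layer; otherwise two-phase construction has to be introduced
  - Standard interfaces such as overloaded operators, or legacy interfaces that are hard to change
  - Quickly opening a new error-handling path inside complicated logic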
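The constructor point can be made concrete with a small comparison. Config and ConfigTwoPhase are hypothetical classes used only for illustration: the first reports a failed construction by throwing, the second has to split construction into a default constructor plus an init() step with a status flag.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Throwing constructor: an object either exists fully initialized or not at all.
class Config {
public:
    explicit Config(const std::string &path) : path_(path) {
        if (path.empty()) throw std::invalid_argument("Config: empty path");
    }
    const std::string &path() const noexcept { return path_; }

private:
    std::string path_;
};

// Two-phase construction: callers must remember to call init() and check
// ready() before use, since a half-initialized object can exist.
class ConfigTwoPhase {
public:
    ConfigTwoPhase() = default;
    bool init(const std::string &path) {
        if (path.empty()) return false;
        path_ = path;
        ready_ = true;
        return true;
    }
    bool ready() const noexcept { return ready_; }
    const std::string &path() const noexcept { return path_; }

private:
    std::string path_;
    bool ready_ = false;
};
```

The throwing version keeps the class invariant in one place, while the two-phase version pushes a status check onto every user of the class.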
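Appendix
Assembly basics
- push %? is equivalent to sub $0x8,%rsp + mov %?,0x0(%rsp)
- pop %? is equivalent to mov 0x0(%rsp),%? + add $0x8,%rsp
- callq ? <?()> is equivalent to push %rip + jmpq ? <?()>
- retq is equivalent to pop %rip, that is, it pops the return address from the stack and jumps to it
- Before callq ? <?()> executes, the stack-top address in rsp must be 16-byte aligned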
x86_64 calling conventions
x86_64-abi:
An Application Binary Interface (ABI) is the interface between two binary program modules that work together. An ABI is a contract between pieces of binary code defining the mechanisms by which functions are invoked and how parameters are passed between the caller and callee.
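x86_64 register classes
- volatile (caller-saved) registers: RAX, RCX, RDX, RDI, RSI, R8, R9, R10, R11, XMM*, YMM*
- nonvolatile (callee-saved) registers: RBX, RBP, RSP, R12, R13, R14, R15
  - Their contents must be the same before and after a function call
  - If a function needs to modify one of them, the usual approach is to save the original value on the stack and restore it afterwards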
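System V Application Binary Interface AMD64 Architecture Processor Supplement: 3.2.3 Parameter Passing
Parameter passing rules:
- When a function has fewer than 7 non-floating-point arguments, they are placed left to right in the registers RDI, RSI, RDX, RCX, R8, R9; with 7 or more, the arguments from the 7th onward are pushed onto the stack from right to left (stack addresses going from high to low);
- XMM0 ~ XMM7 pass the first 8 floating-point arguments; the rest are passed in memory
- The rules for struct arguments are more complex; see the document for details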
    F(a, b, c, d, e, f, g, h, double x0 ... double x7, double x8)
    (stack offsets as seen at callee entry, where 0(%rsp) holds the return address)
    a: %rdi
    b: %rsi
    c: %rdx
    d: %rcx
    e: %r8
    f: %r9
    g: 8(%rsp)
    h: 16(%rsp)
    x0 ~ x7: xmm0 ~ xmm7
    x8: 24(%rsp)
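Return value rules:
- Up to 2 return values: RAX holds the first return value and RDX the second; XMM0 and XMM1 hold the first 2 floating-point return values;
- More than 2 return values (in general, anything the ABI classifies as MEMORY, such as aggregates larger than 16 bytes): the caller allocates the memory and passes its address as a hidden first argument in RDI. The callee stores all return values into that memory and then returns its start address in RAX.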
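The "two return values in RAX and RDX" rule is what makes small status-plus-value return types cheap. The sketch below uses hypothetical names (Result, checked_div, checked_div_opt); the static_asserts only check the size and triviality preconditions, and the register assignment mentioned in the comments is the behavior described above, not something the code itself verifies.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <type_traits>

// Two 8-byte integer members: under the System V AMD64 rules such a struct
// is expected to come back in RAX (value) and RDX (status).
struct Result {
    std::int64_t value;
    std::int64_t status;  // 0 on success, negative on error
};
static_assert(sizeof(Result) == 16, "fits in two 8-byte slots");
static_assert(std::is_trivially_copyable<Result>::value, "eligible for register return");

Result checked_div(std::int64_t a, std::int64_t b) noexcept {
    if (b == 0) return {0, -1};
    return {a / b, 0};
}

// The std::optional flavour used by f2() in test5: value plus engaged flag.
std::optional<std::int64_t> checked_div_opt(std::int64_t a, std::int64_t b) noexcept {
    if (b == 0) return std::nullopt;
    return a / b;
}
```

A caller checks status (or has_value()) right after the call, which corresponds to the test %dl,%dl branch seen in the test5 disassembly.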