Open Bug 1766963 Opened 2 years ago Updated 1 year ago

Exception handling setup generates a lot of code (also, write barriers are too large)

Categories

(Core :: JavaScript: WebAssembly, enhancement, P3)

People

(Reporter: lth, Unassigned)

References

(Blocks 1 open bug)

Details

I'm filing this under WebAssembly because there are tricks we're not using, but I suspect we'll eventually need to do something on the GC side too.

Consider:

wasmDis(new WebAssembly.Module(wasmTextToBinary(`
(module
  (global $g (mut i32) (i32.const 0))
  (func $f (param i32) (result i32)
    (global.set $g (i32.add (global.get $g) (i32.const 1)))
    try $L (result i32)
      (call $g (local.get 0))
    catch_all
      (global.set $g (i32.sub (global.get $g) (i32.const 1)))
      rethrow $L
    end)
  (func $g (param i32) (result i32) (local.get 0)))`)))

This is typical C++ RAII code that we might see in an application: a catch_all makes sure to run a destructor during unwinding. (Yeah, the destructor should be run after the call too but I'm simplifying.)

The MIR ends up looking like this:

  0:wasmparameter
  1:wasmparameter
  2:wasmloadglobalvar (1:wasmparameter)
  3:constant 0x1
  4:add (2:wasmloadglobalvar (1:wasmparameter)) (3:constant 0x1)
  5:wasmstoreglobalvar (4:add (2:wasmloadglobalvar) (3:constant 0x1)) (1:wasmparameter)
  6:wasmcallcatchable (0:wasmparameter) (1:wasmparameter)
  --
  7:wasmregisterresult
  8:goto
  --
  9:wasmcalllandingprepad
  10:goto
  --
  11:wasmloadtls (1:wasmparameter)
  12:wasmnullconstant
  13:wasmderivedpointer (1:wasmparameter)
  14:wasmstoreref (1:wasmparameter) (13:wasmderivedpointer (1:wasmparameter)) (12:wasmnullconstant)
  15:wasmcalluncatchable (13:wasmderivedpointer (1:wasmparameter)) (12:wasmnullconstant) (1:wasmparameter)
  16:wasmderivedpointer (1:wasmparameter)
  17:wasmstoreref (1:wasmparameter) (16:wasmderivedpointer (1:wasmparameter)) (12:wasmnullconstant)
  18:wasmcalluncatchable (16:wasmderivedpointer (1:wasmparameter)) (12:wasmnullconstant) (1:wasmparameter)
  19:goto
  --
  20:wasmloadglobalvar (1:wasmparameter)
  21:sub (20:wasmloadglobalvar (1:wasmparameter)) (3:constant 0x1)
  22:wasmstoreglobalvar (21:sub (20:wasmloadglobalvar) (3:constant 0x1)) (1:wasmparameter)
  23:wasmcalluncatchable (11:wasmloadtls (1:wasmparameter)) (1:wasmparameter)
  24:wasmtrap
  --
  25:wasmreturn (7:wasmregisterresult) (1:wasmparameter)

The suspicious part is instructions 11 through 18: there are multiple calls here, and the machine code is even worse:

;; Prologues elided

;; Increment global
00000024  41 8b 86 30 01 00 00      movl 0x130(%r14), %eax
0000002B  83 c0 01                  add $0x01, %eax
0000002E  41 89 86 30 01 00 00      movl %eax, 0x130(%r14)

;; Call $g
00000040  e8 4b 01 00 00            call 0x0000000000000190
00000045  8b c0                     mov %eax, %eax

;; Jump to return point
00000047  e9 2e 01 00 00            jmp 0x000000000000017A

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Exception landing pad begins

0000004C  49 8b 46 28               movq 0x28(%r14), %rax   ;; load TLS::pendingException_
00000050  48 89 44 24 18            movq %rax, 0x18(%rsp)   ;;   and spill it
00000055  48 33 c0                  xor %rax, %rax          ;; generate null
00000058  48 89 44 24 10            movq %rax, 0x10(%rsp)   ;;   and spill it (yay)

0000005D  49 8b c6                  mov %r14, %rax          ;; compute
00000060  48 83 c0 28               add $0x28, %rax         ;;   &TLS::pendingException_
00000064  48 8b d0                  mov %rax, %rdx          ;; into PreBarrierReg
00000067  48 8b d8                  mov %rax, %rbx          ;; and into temp
0000006A  48 8b 44 24 10            movq 0x10(%rsp), %rax   ;; load spilled null
0000006F  49 8b 4e 48               movq 0x48(%r14), %rcx   ;; gc state
00000073  f7 01 01 00 00 00         testl $0x01, (%rcx)     ;;   filter and
00000079  0f 84 12 00 00 00         jz 0x0000000000000091   ;;     skip if not active
0000007F  48 8b 0a                  movq (%rdx), %rcx       ;; old ptr
00000082  48 85 c9                  test %rcx, %rcx         ;;   filter and
00000085  0f 84 06 00 00 00         jz 0x0000000000000091   ;;     skip if null
0000008B  49 8b 4e 78               movq 0x78(%r14), %rcx   ;; prebarrier address
0000008F  ff d1                     call %rcx               ;;   and invoke it
00000091  48 89 02                  movq %rax, (%rdx)       ;; store pointer
00000094  48 8b f3                  mov %rbx, %rsi          ;; setup param #2
00000097  48 8b d0                  mov %rax, %rdx          ;; setup param #3
000000A5  4c 89 74 24 08            movq %r14, 0x08(%rsp)   ;; spill tls
000000AA  4c 89 34 24               movq %r14, (%rsp)       ;;   and again
000000AE  49 8b fe                  mov %r14, %rdi          ;;     and setup param #1
000000B1  48 b8 40 a5 bd e5 c0 2f 00 00 
                                    mov $0x2FC0E5BDA540, %rax ;; postbarrier
000000BB  ff d0                     call %rax               ;;   call
000000BD  4c 8b 74 24 08            movq 0x08(%rsp), %r14   ;; restore tls
000000C2  4d 8b 3e                  movq (%r14), %r15       ;; restore heapreg

000000C5  49 8b c6                  mov %r14, %rax          ;; compute
000000C8  48 83 c0 30               add $0x30, %rax         ;;   &TLS::pendingExceptionTag_
000000CC  48 8b d0                  mov %rax, %rdx          ;; etc
000000CF  48 8b d8                  mov %rax, %rbx
000000D2  48 8b 4c 24 10            movq 0x10(%rsp), %rcx
000000D7  49 8b 46 48               movq 0x48(%r14), %rax
000000DB  f7 00 01 00 00 00         testl $0x01, (%rax)
000000E1  0f 84 12 00 00 00         jz 0x00000000000000F9
000000E7  48 8b 02                  movq (%rdx), %rax
000000EA  48 85 c0                  test %rax, %rax
000000ED  0f 84 06 00 00 00         jz 0x00000000000000F9
000000F3  49 8b 46 78               movq 0x78(%r14), %rax
000000F7  ff d0                     call %rax
000000F9  48 89 0a                  movq %rcx, (%rdx)
000000FC  48 8b f3                  mov %rbx, %rsi
000000FF  48 8b d1                  mov %rcx, %rdx
0000010D  4c 89 74 24 08            movq %r14, 0x08(%rsp)
00000112  4c 89 34 24               movq %r14, (%rsp)
00000116  49 8b fe                  mov %r14, %rdi
00000119  48 b8 40 a5 bd e5 c0 2f 00 00 
                                    mov $0x2FC0E5BDA540, %rax
00000123  ff d0                     call %rax
00000125  4c 8b 74 24 08            movq 0x08(%rsp), %r14
0000012A  4d 8b 3e                  movq (%r14), %r15

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Body of exception handler begins

;; Decrement global
0000012D  41 8b 86 30 01 00 00      movl 0x130(%r14), %eax
00000134  83 e8 01                  sub $0x01, %eax
00000137  41 89 86 30 01 00 00      movl %eax, 0x130(%r14)

;; Rethrow
0000013E  48 8b 74 24 18            movq 0x18(%rsp), %rsi
0000014E  4c 89 74 24 08            movq %r14, 0x08(%rsp)
00000153  4c 89 34 24               movq %r14, (%rsp)
00000157  49 8b fe                  mov %r14, %rdi
0000015A  48 b8 80 a6 bd e5 c0 2f 00 00 
                                    mov $0x2FC0E5BDA680, %rax
00000164  ff d0                     call %rax
00000166  85 c0                     test %eax, %eax
00000168  0f 89 02 00 00 00         jns 0x0000000000000170
0000016E  0f 0b                     ud2
00000170  4c 8b 74 24 08            movq 0x08(%rsp), %r14
00000175  4d 8b 3e                  movq (%r14), %r15
00000178  0f 0b                     ud2

;; Return point
0000017A  48 83 c4 20               add $0x20, %rsp
0000017E  5d                        pop %rbp
0000017F  c3                        ret

What concerns me here is firstly the clearing of the exception state and secondly the size and cost of the barrier code.

The code to clear the exception state is fairly large, and I believe we will have code like this basically everywhere, given the (empirically measured) prevalence of checked calls in C++ code. The postbarriers impose a mandatory call anyway, so there's no obvious reason why setPendingExceptionState could not be a call to the runtime, for a probably significant code size saving. (I originally suggested here that, because the values being stored are null constants, the barrier filtering would kick in and the prebarrier could be removed. That was a misreading on my part: the prebarrier filters on the old pointer, not on the value being stored.)
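To make the shape of that suggestion concrete, here is a minimal C++ model, not SpiderMonkey code. `TlsModel`, `barrieredStore` and `clearPendingExceptionState` are hypothetical names invented for this sketch, and the counters exist only so the behaviour can be observed. It mirrors the per-store sequence annotated in the listing (filtered prebarrier on the old pointer, the store, an unconditional postbarrier call) and wraps both stores in one out-of-line helper, which is roughly what a single runtime call from wasm code would replace.

```cpp
#include <cassert>

// Hypothetical stand-in for the data addressed through %r14 in the listing.
// Field names follow the bug's annotations; the layout is not the real one.
struct TlsModel {
    void* pendingException = nullptr;
    void* pendingExceptionTag = nullptr;
    bool gcMarkingActive = false;  // models the "gc state" word tested inline
    int preBarrierCalls = 0;       // instrumentation for this sketch only
    int postBarrierCalls = 0;      // instrumentation for this sketch only
};

// Models one barriered reference store as annotated in the listing:
// prebarrier filtered on GC state and on the OLD pointer, then the store,
// then a postbarrier call that is always made.
inline void barrieredStore(TlsModel* tls, void** slot, void* newValue) {
    assert(tls != nullptr && slot != nullptr);
    if (tls->gcMarkingActive && *slot != nullptr) {
        tls->preBarrierCalls++;    // "prebarrier address ... and invoke it"
    }
    *slot = newValue;
    tls->postBarrierCalls++;       // unconditional call in the listing
}

// Hypothetical out-of-line helper: generated code would emit one call to
// this instead of two inline copies of the sequence above.
void clearPendingExceptionState(TlsModel* tls) {
    barrieredStore(tls, &tls->pendingException, nullptr);
    barrieredStore(tls, &tls->pendingExceptionTag, nullptr);
}
```

In this model, clearing two non-null fields while marking is active still triggers both prebarriers even though the stored value is null, because the filter looks at the old pointer.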

Secondly, the write barriers. Once reference types are ubiquitous in WebAssembly with the GC proposal, write barrier code size will be a problem, as will any mandatory postbarrier call. This is going to need to be addressed somehow.

Finally, there are obvious micro-optimizations here we should pursue: null values should probably be emitted at uses or rematerialized instead of spilled and restored; some moves could be avoided (if we keep code in-line); exception handling code should possibly be out-of-line rather than in-line; address computation could use LEA, certainly on a cold path like this; and we could call a postbarrier that preserves wasm state to reduce in-line code size. I also believe that the exception fields will never be null in this situation; if that's true, then the null filter in the prebarrier is redundant.
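The last point can be illustrated with a small C++ model, again using hypothetical names (`PreBarrierStats`, `preBarrierFiltered`, `preBarrierKnownNonNull`) and a counter in place of the real prebarrier stub. It is only a sketch under the assumption stated above, namely that the old value of the field is never null at this point; if that assumption fails, the second variant is not valid.

```cpp
#include <cassert>

// Counter standing in for calls to the real prebarrier stub (sketch only).
struct PreBarrierStats {
    int preBarrierCalls = 0;
};

// Filter as annotated in the listing: skip unless marking is active,
// then skip if the old pointer is null.
inline void preBarrierFiltered(PreBarrierStats& stats, bool markingActive,
                               void* const* slot) {
    assert(slot != nullptr);
    if (!markingActive) return;
    if (*slot == nullptr) return;
    stats.preBarrierCalls++;
}

// Variant without the null test. Only valid when the caller can guarantee
// that the old pointer is non-null, which is the assumption being modelled.
inline void preBarrierKnownNonNull(PreBarrierStats& stats, bool markingActive,
                                   void* const* slot) {
    assert(slot != nullptr && *slot != nullptr);
    if (!markingActive) return;
    stats.preBarrierCalls++;
}
```

For a non-null old pointer the two variants make the same decision in this model, so dropping the null test would save a load-free compare and branch per barrier without changing behaviour.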

Bug 1799999 is for improving the write barrier.
