Wasm baseline: Generate function prologue in-line

RESOLVED FIXED in Firefox 53

Status

()

Core
JavaScript Engine: JIT
P3
normal
RESOLVED FIXED
a year ago
a year ago

People

(Reporter: lth, Assigned: lth)

Tracking

(Blocks: 1 bug)

unspecified
mozilla53
Points:
---

Firefox Tracking Flags

(firefox53 fixed)

Details

Attachments

(2 attachments, 2 obsolete attachments)

(Assignee)

Description

a year ago
Right now the code emitted at the start of the function jumps to the prologue code which is emitted at the end of the function, which then jumps back to the body if the stack overflow check worked out OK.

This means more static code plus more jumping around in every function.  Depending on how you count, it is one or two jumps extra, but since we can probably jump OOL for the stack overflow and flow through the conditional jump if no overflow, then really it's closer to two than one.

It would be nice to generate the code in-line in the prologue.  However, for that to happen, we have to backpatch the stack overflow check because the size of the frame won't be known until all code has been generated for the function.  Initially, therefore, we'd have to emit space for a subtract-constant-from-register that allows for the largest possible frame, and patch it in endFunction().  (At that point we can choose to leave a SUB r, 0 in place or replace it with nops.)

See https://bug1316803.bmoattachments.org/attachment.cgi?id=8812902 for an example of what the current code looks like, though in that case the additional frame size adjustment is zero so the subtraction in the out-of-line prologue code is not emitted.  Imagine it present just after the RSP is being read.
(Assignee)

Comment 1

a year ago
Created attachment 8813164 [details] [diff] [review]
bug1319388-prologue.patch

Roughly like this, but the patching machinery needs more work (both to move it out of the compiler and into the assembler, and to make it work properly), see comments in the code.
(Assignee)

Comment 2

a year ago
Testing shows that many functions are small: bug 1320374 comment 8.  This optimization may be quite useful.
Priority: P5 → P3
(Assignee)

Updated

a year ago
Assignee: nobody → lhansen
Status: NEW → ASSIGNED
(Assignee)

Comment 3

a year ago
Created attachment 8817750 [details] [diff] [review]
bug1319388-inline-prologue.patch

This is general cleanup, really, but it seems worth doing because it's not hard.  It will probably yield a very small performance improvement across platforms and probably a code size improvement on x86 and x64.  It's the sibling of 1319358 (generate return code in-line when we can).
Attachment #8813164 - Attachment is obsolete: true
Attachment #8817750 - Flags: review?(luke)
Yury: with this patch, and assuming the stack depth gets aligned to WasmStackAlignment (which I'll check), the debug trap handler won't have to dynamically align the stack.
Comment on attachment 8817750 [details] [diff] [review]
bug1319388-inline-prologue.patch

Review of attachment 8817750 [details] [diff] [review]:
-----------------------------------------------------------------

Great idea!  Before landing, could you run the MacroAssembler addition by a JIT person (perhaps Jan) to make sure this is the idiomatic way to express a patchable add?  (E.g., should this be in MacroAssembler.h?)

::: js/src/wasm/WasmBaselineCompile.cpp
@@ +592,5 @@
>      ////////////////////////////////////////////////////////////
>      //
> +    // Assembler functionality (should move).
> +
> +    CodeOffset sub32FromStackPointerWithPatch(Register r) {

nit: should this be named add32ToStackPointerWithPatch?  E.g., this makes the AboveOrEqual branch condition after the add make sense.  With this change, a few other names/comments would need to be updated.

@@ +2218,4 @@
>  
>          MOZ_ASSERT(maxFramePushed_ >= localSize_);
>  
> +        patchSub32FromStackPointer(stackSubOffset_, int32_t(maxFramePushed_ - localSize_));

Could you round up the stack alignment so the stack is always WasmStackAlignment-aligned?  With that change, I believe you'll be able to remove FunctionCall::frameAlignAdjustment (since it'll always be zero).
Attachment #8817750 - Flags: review?(luke) → review+
(Assignee)

Comment 6

a year ago
Comment on attachment 8817750 [details] [diff] [review]
bug1319388-inline-prologue.patch

Jan, see comment 5: Are you OK with these masm changes, which are pretty minimal but not cross-platform?

The alternative is perhaps to abstract a pair of functions like add32ToPtrRegWithPatch() / patchAdd32ToPtrReg() and export those functions from the MacroAssembler cross-platform.  Which I can do, if you think it has utility.
Attachment #8817750 - Flags: review?(jdemooij)
Comment on attachment 8817750 [details] [diff] [review]
bug1319388-inline-prologue.patch

Review of attachment 8817750 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Lars T Hansen [:lth] (On PTO until 27 Dec) from comment #6)
> The alternative is perhaps to abstract a pair of functions like
> add32ToPtrRegWithPatch() / patchAdd32ToPtrReg() and export those functions
> from the MacroAssembler cross-platform.  Which I can do, if you think it has
> utility.

Yeah I think this would be nice for consistency. It might also make life a little easier for the MIPS people, as they (hopefully) won't have to touch WasmBaselineCompile.cpp for this when they get to it.
Attachment #8817750 - Flags: review?(jdemooij)
The more general add32ToPtrRegWithPatch() sounds good; in that case it could subsume the existing x86-shared addlWithWithPatch() in Assembler-x86-shared.h.
(Assignee)

Comment 9

a year ago
Created attachment 8822135 [details] [diff] [review]
bug1319388-patchable-add.patch

As discussed, abstracts add32ToPtrWithPatch(src,dest) and patchAdd32ToPtr(offset,value) at the MacroAssembler layer and implements on all platforms (stubs on arm64 and mips).
Attachment #8822135 - Flags: review?(jdemooij)
(Assignee)

Comment 10

a year ago
(In reply to Luke Wagner [:luke] from comment #5)
> Comment on attachment 8817750 [details] [diff] [review]
> bug1319388-inline-prologue.patch
> 
> Review of attachment 8817750 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> @@ +2218,4 @@
> >  
> >          MOZ_ASSERT(maxFramePushed_ >= localSize_);
> >  
> > +        patchSub32FromStackPointer(stackSubOffset_, int32_t(maxFramePushed_ - localSize_));
> 
> Could you round up the stack alignment so the stack is always
> WasmStackAlignment-aligned?  With that change, I believe you'll be able to
> remove FunctionCall::frameAlignAdjustment (since it'll always be zero).

I think that may be a misunderstanding brought on by the names I had chosen for the helper functions (sub32FromStackPointerWithPatch and patchSub32FromStackPointer), as those functions do not actually update the stack pointer but compute temp=SP+patchedValue.  The compiler still uses naked push/pop stack manipulation everywhere; the stack frame is still dynamically sized.

Or if you were thinking of something else: what were you thinking of?
Flags: needinfo?(luke)
(Assignee)

Comment 11

a year ago
Created attachment 8822136 [details] [diff] [review]
bug1319388-inline-prologue-v2.patch

Updated compiler patch, carrying r+.
Attachment #8817750 - Attachment is obsolete: true
Attachment #8822136 - Flags: review+
Ah yes, you're right, some wires got crossed in my head, so n/m.  In the future, though, we could have a constant stack depth and change all the pushing/popping to access sp[K] (so a push is K++).  The benefit would be no need to maintain a virtual (memory) fp register (or fine-grained stack-depth pc ranges) to support async interrupt + stack-walk.
Flags: needinfo?(luke)
Comment on attachment 8822135 [details] [diff] [review]
bug1319388-patchable-add.patch

Review of attachment 8822135 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!
Attachment #8822135 - Flags: review?(jdemooij) → review+
(Assignee)

Comment 14

a year ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=28dfc715a7e5
(Assignee)

Comment 15

a year ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/e0e05d0bc34b19e5c034de40f1435cde9a803226
Bug 1319388 - Add add32ToPtrWithPatch and patchAdd32ToPtr to MacroAssembler. r=jandem

https://hg.mozilla.org/integration/mozilla-inbound/rev/0de5f500516b0d576fd77f342ac2570b536e8115
Bug 1319388 - wasm baseline: Generate function prologue in-line by patching. r=luke

Comment 16

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/e0e05d0bc34b
https://hg.mozilla.org/mozilla-central/rev/0de5f500516b
Status: ASSIGNED → RESOLVED
Last Resolved: a year ago
status-firefox53: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
You need to log in before you can comment on or make changes to this bug.