Abort/delay GC while wasm frames with anyref are on the stack

RESOLVED FIXED in Firefox 61

Status

()

enhancement
P3
normal
RESOLVED FIXED
a year ago
a month ago

People

(Reporter: lth, Assigned: lth)

Tracking

unspecified
mozilla61
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox61 fixed)

Details

Attachments

(3 attachments, 5 obsolete attachments)

(Assignee)

Description

a year ago
For milestone1 we will not support tracing through anyref types in wasm frames, but gc work can still occur when such wasm frames are on the stack; we must at least prevent gc work from completing while wasm frames are on the stack.  This could be one of:

- don't trace with wasm frames on the stack (eg, block the start of tracing if
  a frame is on the stack, or abort the trace slice if a frame is seen)
- don't complete tracing while wasm frames are on the stack

Investigation needed.
(Assignee)

Comment 1

a year ago
Chatted with Jon about this; discussion not over yet.

It may be that it'll just be easier to support tracing properly even for milestone1, esp if we only support the baseline compiler, than to try to (partially) disable GC.  Possibly the stack maps for milestone1 could be very simple or slow, instead.

(Don't know yet about entirely disabling GC, but my experience with GC'd systems is that they rarely have a concept of not allowing GC.)

Jon raised the issue of cross-compartment wrappers and whether the refs visible to wasm could be cross-compartment, as that has additional complexity.  Obviously for direct access from wasm this is a problem so we'll need some kind of response to it in any case.
(Assignee)

Updated

a year ago
Assignee: nobody → lhansen
(Assignee)

Comment 2

a year ago
One possibility is to neuter the collection if a wasm frame is seen on the stack during MarkRoots, and to disable compaction if wasm+anyref is or may be present.  Compaction should be disabled (if possible) because it does another root scan and we don't need the complexity of dealing with that, even if, in principle, compaction could go ahead if there are no wasm frames on the stack.

The neutering may perhaps take the form of resetting the collection + delaying the start of the next collection (normally one is scheduled to start immediately after reset, so this requires some engineering), or of setting a flag and then marking everything in the compartment live when that flag is set (no canned functionality for that at present) and letting the collection continue normally.
(Assignee)

Comment 3

a year ago
Of course, nothing is simple: one cannot reset the collection (currently) during root scan, it needs to be delayed (though that's not hard), and the above comments about MarkRoots pertain to incremental collection only.  But the principle is still somewhat plausible.
(Assignee)

Comment 4

a year ago
This "works" but appears to be brittle in a way I don't understand yet.  I'll attach a simple test case too.

If I set a breakpoint in GCRuntime::checkIfGCAllowedInCurrentState() I usually get to the breakpoint OK but then when I try to step I segv.  But without the breakpoint things are apparently fine.  This suggests eg a timer firing somewhere else, trying to "do something".
(Assignee)

Comment 5

a year ago
Posted file suppress.js
The call to gc() should not cause a GC to happen; set a breakpoint in GCRuntime::checkIfAllowedInCurrentState() after the initial check, and you will see that you do not reach that point.
(Assignee)

Comment 6

a year ago
Posted file suppress.js
Here's a variant on the same TC that doesn't call gc but allocates a binary tree 22 levels deep.  It runs just fine, but gc is not triggered while wasm is on the stack.  Jon says that when the nursery is full we just start allocating in the tenured area instead.
Will this also delay/abort nursery GC? I'm a bit worried about suppressing GC for arbitrary code.
(Assignee)

Comment 8

a year ago
(In reply to Jan de Mooij [:jandem] from comment #7)

> Will this also delay/abort nursery GC? I'm a bit worried about suppressing
> GC for arbitrary code.

Oh, this is just for experimentation, we could never ship it.  We might be able to land it in Nightly but it would have to be behind a very ugly-sounding pref.  And if we're good we find additional mitigations, eg, we only do this if the compartment has any wasm code that actually uses anyref, say.

In general we don't want to perturb GC too much, so when wasm is not on the stack we want things to be as "normal" as possible.  The hack attached here now does that, I think, unless nursery overflow disables generational GC, though I don't know why it would do that.
(Assignee)

Comment 9

a year ago
(In reply to Lars T Hansen [:lth] from comment #4)
> Created attachment 8961777 [details] [diff] [review]
> bug1445277-suppress-gc-wip.patch
> 
> This "works" ...

One thing I know I don't know anything about is what happens if a GC has already started when we suppress.  My expectation is that the suppression flag prevents the GC from advancing further (except background sweeping), and since root marking is atomic and at the beginning of the GC while no user code is running we should not be interrupting that somehow.  But there could be some dragons here if we suppress for a long time of while a lot of memory is allocated, if only because that case is poorly tested.
(Assignee)

Comment 10

a year ago
This sits on top of the first two patches in bug 1445272.  If gcTypesEnabled is true in the ModuleEnvironment then we emit extra code around entry into Wasm to suppress GC while the Wasm call is active.  It has the exact effect of having AutoSuppressGC in a tight scope around the invocation of the Wasm function.

Passes all wasm tests (so far), so probably reasonably stable.  A little more testing will be good.
Attachment #8961777 - Attachment is obsolete: true
(Assignee)

Comment 11

a year ago
Rebased to current patch queue in bug 1445272.
Attachment #8964937 - Attachment is obsolete: true
(Assignee)

Comment 12

a year ago
Make wasm stubs that call into wasm have the effect of AutoSuppressGC.
Attachment #8966958 - Attachment is obsolete: true
Attachment #8966993 - Flags: review?(bbouvier)
Comment on attachment 8966993 [details] [diff] [review]
bug1445277-suppress-gc-during-wasm.patch

Review of attachment 8966993 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, just a few questions about the push/pop WasmTlsReg below.

::: js/src/wasm/WasmCode.cpp
@@ +675,4 @@
>      const FuncExportVector& funcExports = codeTier.metadata().funcExports;
>      uint8_t* moduleSegmentBase = codeTier.segment().base();
>  
> +    bool gcTypesEnabled = codeTier.code().metadata().temporaryHasGcTypes == HasGcTypes::True;

nit: for consistency with updated patches, can you directly use a HasGcTypes parameter in GenerateEntryStubs?

@@ +686,4 @@
>                            moduleRanges[fe.interpCodeRangeIndex()].funcNormalEntry();
>          Maybe<ImmPtr> callee;
>          callee.emplace(calleePtr, ImmPtr::NoCheckToken());
> +        if (!GenerateEntryStubs(masm, funcExportIndex, fe, callee, /* asmjs*/ false, gcTypesEnabled,

preexisting: missing space after asmjs

::: js/src/wasm/WasmStubs.cpp
@@ +400,5 @@
>      else
>          masm.loadPtr(Address(masm.getStackPointer(), argBase + arg.offsetFromArgBase()), WasmTlsReg);
>  
> +#ifdef ENABLE_WASM_GC
> +    WasmPush(masm, WasmTlsReg);

As said on irc, I thought WasmTlsReg was preserved across calls, thus not necessary to manually preserve it. (holding r+ on this)

Also, could the WasmPush be within the `if` body below?

@@ +461,5 @@
>  
> +#ifdef ENABLE_WASM_GC
> +    WasmPop(masm, WasmTlsReg);
> +    if (gcTypesEnabled)
> +        SuppressGC(masm, -1, WasmTlsReg);

ditto

@@ +583,5 @@
> +#ifdef ENABLE_WASM_GC
> +# ifdef JS_CODEGEN_ARM64
> +    unsigned savedTlsSize = 2 * sizeof(void*);
> +# else
> +    unsigned savedTlsSize = sizeof(void*);

Any chance we could use something like AlignBytes to automatically get the right padding, and not ifdef ARM64?
(Assignee)

Updated

a year ago
Depends on: 1453416
(Assignee)

Comment 14

a year ago
(In reply to Benjamin Bouvier [:bbouvier] from comment #13)
> Comment on attachment 8966993 [details] [diff] [review]
> bug1445277-suppress-gc-during-wasm.patch
> 
> Review of attachment 8966993 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/wasm/WasmStubs.cpp
> @@ +400,5 @@
> >      else
> >          masm.loadPtr(Address(masm.getStackPointer(), argBase + arg.offsetFromArgBase()), WasmTlsReg);
> >  
> > +#ifdef ENABLE_WASM_GC
> > +    WasmPush(masm, WasmTlsReg);
> 
> As said on irc, I thought WasmTlsReg was preserved across calls, thus not
> necessary to manually preserve it. (holding r+ on this)

Actually not.  This was explored in bug 1453416 but (embarassingly, since seeing this earlier would have saved us time and effort) there is a comment before each CallFuncExport that states explicitly that this is not so and why it is not so.

Luke mentions in bug 1453416 that the canonical way of getting the tls after the call is to reload it from the frame, but the stub does not set up a standard frame and as far as I can see the Tls is not easily loadable that way here.  (Indeed the FP is used as a result value and may have non-pointer payloads.)  Hence I think the save/reload methods I've introduced here are the simplest, at least for the time being.  Also remember this is temporary code.

> Also, could the WasmPush be within the `if` body below?

It cannot, because there are various static, global constants about push sizes that are unaware of whether the GC hack is enabled or not.  We could change that but I don't think we should; after all we will remove this code again.
> @@ +583,5 @@
> > +#ifdef ENABLE_WASM_GC
> > +# ifdef JS_CODEGEN_ARM64
> > +    unsigned savedTlsSize = 2 * sizeof(void*);
> > +# else
> > +    unsigned savedTlsSize = sizeof(void*);
> 
> Any chance we could use something like AlignBytes to automatically get the
> right padding, and not ifdef ARM64?

I agree that's pretty messy.  I didn't use AlignBytes, but simplified it in a different way, hopefully it is better now.
(Assignee)

Comment 15

a year ago
Same patch but with a fair amount of cleanup to address nits and clarify naming and semantics.
Attachment #8966993 - Attachment is obsolete: true
Attachment #8966993 - Flags: review?(bbouvier)
Attachment #8967283 - Flags: review?(bbouvier)
Comment on attachment 8967283 [details] [diff] [review]
bug1445277-suppress-gc-during-wasm-v2.patch

Review of attachment 8967283 [details] [diff] [review]:
-----------------------------------------------------------------

Nice refactoring, looks good to me, thanks!

(I just realized that if wasm gets interrupted and stopped, via the slow script dialog for instance, no GCs will ever happen anymore. Oh well, this is certainly a very edge case, so it shouldn't prevent landing here.)
Attachment #8967283 - Flags: review?(bbouvier) → review+
Comment on attachment 8967283 [details] [diff] [review]
bug1445277-suppress-gc-during-wasm-v2.patch

Review of attachment 8967283 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/wasm/WasmCode.cpp
@@ +675,4 @@
>      const FuncExportVector& funcExports = codeTier.metadata().funcExports;
>      uint8_t* moduleSegmentBase = codeTier.segment().base();
>  
> +    HasGcTypes gcTypesEnabled = codeTier.code().metadata().temporaryHasGcTypes;

Actually, just checked tests: this doesn't work, because code() hasn't been committed yet on the codeTier (second tier) we're operating. So codeTier.code() is nullptr. Probably we need to pass the hasGcTypes parameter from the callers instead.

(wasm/ion-lazy-tables failing because of this)
Posted patch updated.patchSplinter Review
Updated patch, ftr.
Attachment #8967283 - Attachment is obsolete: true
Attachment #8967727 - Flags: review+
Status: NEW → ASSIGNED

Comment 19

a year ago
Pushed by bbouvier@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3d46648331c1
Suppress GC when wasm is active and running with wasm-gc support.  r=bbouvier

Comment 20

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/3d46648331c1
Status: ASSIGNED → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Depends on: 1535482
(Assignee)

Updated

a month ago
No longer depends on: 1535482
You need to log in before you can comment on or make changes to this bug.