Generalize the write barrier machinery

RESOLVED FIXED in Firefox 63

Status

()

enhancement
P1
normal
RESOLVED FIXED
Last year
Last year

People

(Reporter: lth, Assigned: lth)

Tracking

unspecified
mozilla63
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox63 fixed)

Details

Attachments

(2 attachments)

A couple of things:

- Structure fields don't fit very nicely into the existing PostBarrierArg
  mechanism; even if we were to widen the argument to uintptr_t size we
  could not hope to pass eg object pointer + offset within the one word
  that is allowed us.

  Instead we'd probably prefer to compute the cell address and then pass the
  address directly to the barrier code.  Then it fits in one word again.

  (We can tag this address if we like, to distinguish it from the PostBarrierArg
  for globals; the two low bits should for sure be available.  But see below.)

- The common case for globals will be that the prebarrier does not fire but
  that the postbarrier does fire.  The prebarrier does not usually fire
  because incremental gc is not usually ongoing.  The postbarrier will
  usually fire because the pointer being stored usually will point into the
  nursery, and because the container containing the global cell will
  usually be non-nursery.  The latter point is either because it is a private
  global whose storage is always non-nursery, or because the heap-allocated
  cell for the global has become tenured, as we expect for cells.

  In this common case, Instance::postBarrier() will compute the location pointer
  for the global cell.  The jitted code has an elaborate song and dance to
  compute the location pointer only when necessary, but since it will usually
  be necessary to compute it anyway we should just compute it in the jitted
  code always.

  This complexity is my fault - I made Benjamin split the prebarrier code
  so that we only compute the pointer when it's needed, not realizing that it is
  needed much more often than I expected.

In consequence of these two observations - the code for field setting will always compute the cell address and the common path for global setting will also compute the cell address - I think we should simplify code so that the barrier always receives the cell address.

(This also paves the way for a possible later optimization where we just call directly into the GC's StoreBuffer::putCell code because we have all the information we need in our own TLS, but that's for another day.)

The one remaining sticking point is that we must use the PreBarrierReg for the cell address because that's where the preBarrier needs to find it; this may increase spilling in the baseline compiler if the preBarrier ends up *not* being called, since that register may be in use already.  But we should be able to select the register for the address conditionally based on the result of testNeedPreBarrier(), it shouldn't be awful.
(In reply to Lars T Hansen [:lth] from comment #0)

> ... But we should be able to select the register for the address conditionally
> based on the result of testNeedPreBarrier(), it shouldn't be awful.

That's an insane remark.  Please disregard.
Generalize the tracing of global cells to also trace them when they are Ref, not just AnyRef.
Attachment #8995223 - Flags: review?(bbouvier)
Generalize the barrier code as described in comment 0.

(I also ended up moving the code in WasmBaselineCompiler.cpp because it was in the middle of the section for branch optimization, so the diff looks more severe than it really is.)
Attachment #8995225 - Flags: review?(bbouvier)
Attachment #8995223 - Flags: review?(bbouvier) → review+
Comment on attachment 8995225 [details] [diff] [review]
bug1478616-generalize-write-barrier.patch

Review of attachment 8995225 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks.

::: js/src/wasm/WasmInstance.cpp
@@ +482,4 @@
>  
>  #ifdef ENABLE_WASM_GC
>  /* static */ void
> +Instance::postBarrier(Instance* instance, void* location)

We could also cheat and pretend location had a gc::Cell** type.
Attachment #8995225 - Flags: review?(bbouvier) → review+
The second patch is incorrect, because it calls needRef() to obtain a second scratch register inside one branch of a control flow diamond.  needRef() may sync(), which means the join point may have different expression stacks coming in from the two arms, which is bad.

This is basically why there is an unstated rule that register allocation does not happen on this level, but in the higher-level code (in this case emitSetGlobal); when that can't happen we introduce weirdo structures like the PopAtomicXchg64Regs RAII classes.

(Didn't find this in testing; realized it before falling asleep last night.)

I wish there were some way of guarding against this problem, but it would probably involve several RAII classes and some way of connecting them during compilation to model stack operations - complicated & expensive.  So we just try to do our best.
Pushed by lhansen@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/55e008024f24
Wasm, trace Ref as well as Anyref global and Val.  r=bbouvier
https://hg.mozilla.org/integration/mozilla-inbound/rev/e713a94f6e55
Generalize the wasm write barrier.  r=bbouvier
https://hg.mozilla.org/mozilla-central/rev/55e008024f24
https://hg.mozilla.org/mozilla-central/rev/e713a94f6e55
Status: ASSIGNED → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
You need to log in before you can comment on or make changes to this bug.