GC at deterministic points when recording or replaying.

RESOLVED FIXED in Firefox 63

Status

()

enhancement
RESOLVED FIXED
Last year
Last year

People

(Reporter: bhackett, Assigned: bhackett)

Tracking

(Blocks 1 bug)

Trunk
mozilla63
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox63 fixed)

Details

Attachments

(3 attachments)

Currently, we allow GCs to occur at different points when replaying than they did while recording.  This has turned out to be problematic, for several reasons:

1. There are a lot of GC related runnables that we have to make sure either aren't posted or use another mechanism to run to make sure they don't affect the recording.

2. Several parts of the GC which run under callbacks, including some Gecko root tracing code and code that runs after finalization finishes, are handled by passing through thread events while they run.  While this works superficially, it isn't quite kosher and could cause divergence during replay.

3. This prevents us (I'm pretty sure, I haven't looked at the related code closely in a while) from being able to enable incremental GC when recording/replaying, since IGC slices execute as part of Gecko runnables.

For all these reasons, I think it is better if we make sure that GCs happen at the same time when replaying as they did while recording.  They will still behave non-deterministically, in that the set of things marked and swept may differ between recording and replay.
This removes all the modifications made to Gecko that are in place due to GCs occurring at different points.
Assignee: nobody → bhackett1024
Attachment #8996046 - Flags: review?(continuation)
Nursery collections usually happen due to the nursery filling up, which can happen at different points between recording and replay.  Rather than doing a lot of reworking here it seems best to just disable the nursery while recording/replaying.
Attachment #8996047 - Flags: review?(jcoppeard)
This patch ensures that GCs happen at deterministic points, by refusing to GC for reasons that occur non-deterministically (e.g. based on allocation thresholds).  It also narrows the range of execution where recorded events are disallowed to the main marking and sweeping routines, rather than the entire GC.

An alternative strategy here would be to handle non-deterministic GC reasons via the record/replay trigger mechanism, which would allow these GCs to happen at deterministic points.  I think we'll want this eventually, but it hasn't proved itself necessary yet.
Attachment #8996050 - Flags: review?(jcoppeard)
Comment on attachment 8996047 [details] [diff] [review]
Part 2 - Disable nursery when recording/replaying.

Review of attachment 8996047 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/gc/Nursery.cpp
@@ +156,5 @@
>      if (!freeMallocedBuffersTask || !freeMallocedBuffersTask->init())
>          return false;
>  
> +    // The nursery is permanently disabled when recording or replaying. Nursery
> +    // collections may occur at non-deterministic points in execution.

I take it from this that we create a new runtime for recording or replaying?  The MDN pages says "A recording content process differs from a normal content process" so I guess this is so.
Attachment #8996047 - Flags: review?(jcoppeard) → review+
Comment on attachment 8996050 [details] [diff] [review]
Part 3 - Require GCs to happen at deterministic points.

Review of attachment 8996050 [details] [diff] [review]:
-----------------------------------------------------------------

> An alternative strategy here would be to handle non-deterministic GC reasons via the record/replay trigger mechanism, which would allow these GCs to happen at deterministic points.

It would make sense to me to do it that way.

::: js/src/gc/GC.cpp
@@ +3326,5 @@
> +      case JS::gcreason::LAST_DITCH:
> +      case JS::gcreason::TOO_MUCH_MALLOC:
> +      case JS::gcreason::ALLOC_TRIGGER:
> +      case JS::gcreason::DELAYED_ATOMS_GC:
> +      case JS::gcreason::TOO_MUCH_WASM_MEMORY:

Looking at the list of reasons, it seems there are others that are non-deterministic too.  For example FULL_GC_TIMER is based on a timer and MEM_PRESSURE comes from the OS (somehow).  But maybe you're recording and replaying these events so that they become deterministic in this context.
Attachment #8996050 - Flags: review?(jcoppeard) → review+
MEM_PRESSURE is definitely non-deterministic, but IIRC we only generate these events on Windows right now, which is likely why it didn't come up.
(In reply to Jon Coppeard (:jonco) from comment #4)
> Comment on attachment 8996047 [details] [diff] [review]
> Part 2 - Disable nursery when recording/replaying.
> 
> Review of attachment 8996047 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/gc/Nursery.cpp
> @@ +156,5 @@
> >      if (!freeMallocedBuffersTask || !freeMallocedBuffersTask->init())
> >          return false;
> >  
> > +    // The nursery is permanently disabled when recording or replaying. Nursery
> > +    // collections may occur at non-deterministic points in execution.
> 
> I take it from this that we create a new runtime for recording or replaying?
> The MDN pages says "A recording content process differs from a normal
> content process" so I guess this is so.

Well, whether a process is recording or replaying is set at process startup (near the beginning of main()), so there is never a time when we will create a normal content process and change it to a recording/replaying process later on.  A recording/replaying content process will create the same JSRuntimes as a normal content process, though (one for the main thread, one for each worker).
(In reply to Jon Coppeard (:jonco) from comment #5)
> Comment on attachment 8996050 [details] [diff] [review]
> Part 3 - Require GCs to happen at deterministic points.
>
> ::: js/src/gc/GC.cpp
> @@ +3326,5 @@
> > +      case JS::gcreason::LAST_DITCH:
> > +      case JS::gcreason::TOO_MUCH_MALLOC:
> > +      case JS::gcreason::ALLOC_TRIGGER:
> > +      case JS::gcreason::DELAYED_ATOMS_GC:
> > +      case JS::gcreason::TOO_MUCH_WASM_MEMORY:
> 
> Looking at the list of reasons, it seems there are others that are
> non-deterministic too.  For example FULL_GC_TIMER is based on a timer and
> MEM_PRESSURE comes from the OS (somehow).  But maybe you're recording and
> replaying these events so that they become deterministic in this context.

I'll update the comment to be clearer, but determinism/non-determinism here is with respect to differences in behavior between recording and replaying, which is a much narrower set than the behaviors that may vary between different runs of the browser.  The amount of data allocated by malloc or for GC things may vary between recording and replaying, but other behaviors like timers and messages from the OS (or over IPC, etc.) will replay exactly as they happened while recording.
Attachment #8996046 - Flags: review?(continuation) → review+
Pushed by bhackett@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/080d6b5d32e2
Part 1 - Remove instrumentation related to non-deterministic GCs, r=mccr8.
https://hg.mozilla.org/integration/mozilla-inbound/rev/0b9d45318e25
Part 2 - Disable nursery when recording/replaying, r=jonco.
https://hg.mozilla.org/integration/mozilla-inbound/rev/6b45f9604ee3
Part 3 - Require GCs to happen at deterministic points, r=jonco.
You need to log in before you can comment on or make changes to this bug.