Control child process position from ReplayDebugger

RESOLVED FIXED in Firefox 65

Status

()

RESOLVED FIXED
4 months ago
4 months ago

People

(Reporter: bhackett, Assigned: bhackett)

Tracking

(Blocks: 1 bug)

unspecified
mozilla65
Points:
---

Firefox Tracking Flags

(firefox65 fixed)

Details

Attachments

(4 attachments)

(Assignee)

Description

4 months ago
Posted patch patchSplinter Review
Currently, ReplayDebugger can call RecordReplayControl methods that direct the child process to resume or pause, but it does not have exclusive control over when the child process resumes: internal record/replay code has logic around breakpoints that can resume the process all on its own after, say, calling all the breakpoint handlers available when pausing.

This interacts poorly with the devtools server code.  For example, when the user explicitly pauses (at a breakpoint, say) the server's thread actor will push an event loop to process events from the debugger client.  In the non-replaying case the debuggee will remain paused throughout the lifetime of this event loop, since it still sits on the JS stack, but in the replaying case it is possible to resume the child before the event loop finishes, depending on what calls the server makes and the order in which runnables run.

More generally, the code which manages pausing and resuming is complex, poorly documented, and prone to gnarly reentrance problems.  It is broken, and it would be better to rewrite it instead of fix it, as this patch does.  This patch has the following goals:

- Move control of resuming and pausing the child process exclusively into ReplayDebugger.  We enforce that there is a single ReplayDebugger in the server (if ReplayDebuggers are created in different places they will all refer to the same object), which manages its own state related to how the child is resumed and paused.

- Clarify semantics over when the child can pause and resume.  These need to be at least as restrictive as in the non-replaying case.  The child can only resume after control in the server/middleman returns to the main event loop (or more precisely, an event loop that is not associated with a pause pushed by the thread actor).  The child can pause at any time (we have to support this so that the server can synchronously interrupt execution), but breakpoint handlers for the pause will only run when control returns to an event loop as above.

- Flatten out the control flow (mainly using runnables) to avoid reentrance problems, and document invariants.
(Assignee)

Comment 1

4 months ago
The ReplayDebugger will need a common entry point for when the child process has paused, so it doesn't make sense to keep track of breakpoints and their handlers separately and call them directly from C++.  This patch changes the representation of breakpoints in the child and middleman processes from a mapping of IDs -> positions/handlers, and into a set of breakpoint positions which the process should stop at when executing.  This is a nice simplification.
Attachment #9025229 - Flags: review?(continuation)
(Assignee)

Comment 2

4 months ago
Remove a lot of pausing/resuming logic in middleman C++ code.  We still need a little bit, mainly to handle the case when we are running without a debugger attached.  This also changes the pause notification mechanism from being based on the handlers associated with breakpoint IDs into a generic DebuggerOnPause method (called for both breakpoints and checkpoints), and adds a few JS bindings so ReplayDebugger can access some interfaces like WaitUntilPaused and MarkExplicitPause that previously were only used internally.
Attachment #9025230 - Flags: review?(continuation)
(Assignee)

Comment 3

4 months ago
New logic to control pausing and resuming from the ReplayDebugger.  Parts 1 and 2 provide context for this patch, but I'm hoping that it can be followed pretty well on its own.
Attachment #9025231 - Flags: review?(lsmyth)
Comment on attachment 9025229 [details] [diff] [review]
Part 1 - Change breakpoint state to be a set of positions.

Review of attachment 9025229 [details] [diff] [review]:
-----------------------------------------------------------------

Please include your description of the patch into the patch itself.

::: toolkit/recordreplay/ipc/ChildProcess.cpp
@@ +191,5 @@
>      const HitCheckpointMessage& nmsg = static_cast<const HitCheckpointMessage&>(aMsg);
>      mLastCheckpoint = nmsg.mCheckpointId;
>  
>      // All messages sent since the last checkpoint are now obsolete, except
> +    // AddBreakpoint messages.

nit: and ClearBreakpoints.
Attachment #9025229 - Flags: review?(continuation) → review+
Comment on attachment 9025231 [details] [diff] [review]
Part 3 - Control pausing and resuming in ReplayDebugger.

Review of attachment 9025231 [details] [diff] [review]:
-----------------------------------------------------------------

::: devtools/server/actors/replay/debugger.js
@@ +1001,4 @@
>    }
>  }
>  
> +function pointEquals(a, b) {

Is this meant as a utility? It looks like it's not used. I'd be hesitant to compare things this way because the ordering of keys is not guaranteed in JSON, so two objects may have the same keys in two different orders, and usually you'd still want them to be considered equal.
Attachment #9025231 - Flags: review?(lsmyth) → review+
(Assignee)

Comment 6

4 months ago
(In reply to Logan Smyth [:loganfsmyth] from comment #5)
> Comment on attachment 9025231 [details] [diff] [review]
> Part 3 - Control pausing and resuming in ReplayDebugger.
> 
> Review of attachment 9025231 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: devtools/server/actors/replay/debugger.js
> @@ +1001,4 @@
> >    }
> >  }
> >  
> > +function pointEquals(a, b) {
> 
> Is this meant as a utility? It looks like it's not used. I'd be hesitant to
> compare things this way because the ordering of keys is not guaranteed in
> JSON, so two objects may have the same keys in two different orders, and
> usually you'd still want them to be considered equal.

Oops, an earlier version of the patch used this but it is dead code now, and I'll remove it.
Comment on attachment 9025230 [details] [diff] [review]
Part 2 - Bindings and internal changes to allow ReplayDebugger to control child pausing/resuming.

Review of attachment 9025230 [details] [diff] [review]:
-----------------------------------------------------------------

This is certainly a lot simpler!

::: toolkit/recordreplay/ipc/JSControl.cpp
@@ +213,2 @@
>  {
> +  if (gReplayDebugger) {

Maybe use an early return here?

@@ +219,3 @@
>                               HandleValueArray::empty(), &rval))
>      {
> +      Print("Warning: ReplayDebugger hook threw an exception\n");

Why does this ignore the failure rather than returning false?

It would be good to include aMethod in the error message.

::: toolkit/recordreplay/ipc/ParentIPC.cpp
@@ +1179,3 @@
>      Resume(true);
> +  } else if (!js::DebuggerOnPause()) {
> +    gMainThreadMessageLoop->PostTask(NewRunnableFunction("Resume", Resume, true));

Maybe make this "RecvHitCheckpointResume" to be less ambiguous.

@@ +1193,5 @@
> +  // to pause twice at the same point.
> +  if (aMsg.mRecordingEndpoint) {
> +    Resume(true);
> +  } else if (!js::DebuggerOnPause()) {
> +    gMainThreadMessageLoop->PostTask(NewRunnableFunction("Resume", Resume, true));

As above, this would be better as "RecvHitBreakpointResume".
Attachment #9025230 - Flags: review?(continuation) → review+
(Assignee)

Comment 8

4 months ago
(In reply to Andrew McCreight [:mccr8] from comment #7)
> @@ +219,3 @@
> >                               HandleValueArray::empty(), &rval))
> >      {
> > +      Print("Warning: ReplayDebugger hook threw an exception\n");
> 
> Why does this ignore the failure rather than returning false?

Hmm, I forgot to comment in the declaration of DebuggerOnPause() (I'll fix that), but the return value of DebuggerOnPause is whether there was a ReplayDebugger that we notified about the pause.  If it then throws an exception that is a bug in and of itself, but if we return false and indicate there was no ReplayDebugger, the caller will do its default behavior of running forward, which is not what we want to happen here --- when there is a ReplayDebugger, it controls when the child resumes.

Comment 9

4 months ago
Pushed by bhackett@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/70a8eb10c67f
Part 1 - Change breakpoint state to be a set of positions, r=mccr8.
https://hg.mozilla.org/integration/mozilla-inbound/rev/1c7fc8389e01
Part 2 - Bindings and internal changes to allow ReplayDebugger to control child pausing/resuming, r=mccr8.
https://hg.mozilla.org/integration/mozilla-inbound/rev/3a3c453432c1
Part 3 - Control pausing and resuming in ReplayDebugger, r=lsmyth.

Comment 10

4 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/70a8eb10c67f
https://hg.mozilla.org/mozilla-central/rev/1c7fc8389e01
https://hg.mozilla.org/mozilla-central/rev/3a3c453432c1
Status: NEW → RESOLVED
Last Resolved: 4 months ago
status-firefox65: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
You need to log in before you can comment on or make changes to this bug.