Manage recording/replaying child processes from JS

RESOLVED FIXED in Firefox 66

Status

()

enhancement
RESOLVED FIXED
6 months ago
5 months ago

People

(Reporter: bhackett, Assigned: bhackett)

Tracking

unspecified
mozilla66
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox66 fixed)

Details

Attachments

(9 attachments)

Assignee

Description

6 months ago
There is a lot of logic to manage and coordinate the child processes that can be spawned by a middleman when using web replay.  This logic is all in C++ and supports an interface based on the idea that only a single process (the active one) should be presented to the caller (the devtools server code), regardless of what goes on under the hood.  This is inadequate for changes that will be coming soon for bug 1513118, which needs to allow the devtools server to interact with processes other than the active one.

Rather than muddy up the existing C++/JS interface with changes to allow other processes to be active, I think it is better to change things so that details of which child process is being operated on become an explicit part of the interface.  The interface still hides some details of child process IPC --- mainly, the details of synchronous operations that require some back and forth IPC with a child, like debugger requests --- but the calling JS becomes much more powerful and refined in its ability to control the children.

The main ramification of this change is that the existing C++ logic for coordinating the child processes can be moved into JS.  The attached patch makes this change: in all, roughly 1300 lines of C++ have been replaced by 1000 lines of JS and 300 lines of new C++ JS binding code.

I'm hoping that the new C++/JS interface will allow future changes to how child processes are managed (e.g. bug 1513118) to be implemented entirely in JS.
Assignee

Comment 1

6 months ago
Posted patch patchSplinter Review
Assignee

Comment 2

6 months ago
There isn't any distinction made in other parts of the system between whether we pause due to hitting a checkpoint or a breakpoint, so merging these into a single message simplifies things downstream.
Attachment #9033479 - Flags: review?(continuation)
Assignee

Updated

6 months ago
Attachment #9033479 - Attachment description: Part 1 - Part 1 - Merge HitCheckpoint and HitBreakpoint messages. → Part 1 - Merge HitCheckpoint and HitBreakpoint messages.
Assignee

Comment 3

6 months ago
Part 1 needs to print out the contents of an ExecutionPoint in record/replay spew, so this patch moves the ToString methods on BreakpointPosition and ExecutionPoint into the associated structures so that they can be called from anywhere.
Attachment #9033480 - Flags: review?(lsmyth)
Assignee

Comment 4

6 months ago
Memory management of channel messages is currently pretty ugly, with bare pointers being passed around and then freed in various places.  This patch changes these to use UniquePtr, to avoid the situation getting even worse in later parts.
Attachment #9033481 - Flags: review?(continuation)
Assignee

Comment 5

6 months ago
This patch improves how execution points which have no breakpoint position (they refer to a particular checkpoint) are converted to JS, to allow tidying up how these points are handled in the debugger and replay control JS code.  Instead of the result having a { type: "Invalid" } position, the result has no position property at all.
Attachment #9033482 - Flags: review?(lsmyth)
Assignee

Comment 6

6 months ago
Remove ~1300 lines of code for managing child processes from C++ in ParentIPC.cpp, ChildProcess.cpp, and associated files.  Unfortunately, there is some new/rearranged logic in here as well, for dispatching received messages in child processes, reworking the interface for when the middleman blocks until a child is paused to avoid reentrance problems, and to keep things working at startup before the control logic has been set up.  This was hard to separate out into separate patches, but is all pretty straightforward.
Attachment #9033483 - Flags: review?(continuation)
Assignee

Comment 7

6 months ago
Add IDL for the control JS module in part 7 so that C++ can call into it, and change the RecordReplayControl interface/bindings to support the functionality needed by that module.  Mainly, methods which operate on a child process now take an ID to specify that process.
Attachment #9033484 - Flags: review?(lsmyth)
Assignee

Comment 8

6 months ago
The control.js module is the main interface between debugger.js and the child processes: it keeps track of where each of the child processes is currently at, and presents a model which abstracts away the details of which process the debugger is currently interacting with.  This is the same interface which was previously provided by C++: this patch is mainly a port of the code removed in part 5, although some parts have been simplified, particularly the logic for moving one child to the exact same point as another child.
Attachment #9033485 - Flags: review?(lsmyth)
Assignee

Comment 9

6 months ago
Changes to the ReplayDebugger to work with control.js.  These are mostly minor: the debugger just calls into its _control object (provided by control.js) instead of RecordReplayControl.
Attachment #9033486 - Flags: review?(lsmyth)
Attachment #9033479 - Flags: review?(continuation) → review+
Comment on attachment 9033481 [details] [diff] [review]
Part 3 - Use UniquePtr more for web replay messages.

Review of attachment 9033481 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for fixing this! UniquePtrs are a lot better.

::: toolkit/recordreplay/ipc/Channel.h
@@ +160,3 @@
>    }
>  
>   public:

You should add #include "mozilla/UniquePtr.h" to Channel.h.

@@ +163,5 @@
> +  struct FreePolicy { void operator()(Message* msg) { free(msg); } };
> +  typedef UniquePtr<Message, FreePolicy> UniquePtr;
> +
> +  UniquePtr Clone() const {
> +    void* res = malloc(mSize);

Maybe do something like
  auto res = static_cast<Message*>(malloc(mSize));
and then get rid of the static cast below?

::: toolkit/recordreplay/ipc/ChildIPC.cpp
@@ +58,4 @@
>  
>  // Copy of the introduction message we got from the middleman. This is saved on
>  // receipt and then processed during InitRecordingOrReplayingProcess.
> +static Message::UniquePtr gIntroductionMessage;

You should preserve the more specific types for this and gCallResponseMessage. It'll take a little more work to set up by writing out the types and having to deal with the casts when storing to it, but having to do a cast every time they are accessed is bad (even though right now you aren't accessing them much).
Attachment #9033481 - Flags: review?(continuation) → review+
Comment on attachment 9033483 [details] [diff] [review]
Part 5 - Remove logic for coordinating child processes from C++.

Review of attachment 9033483 [details] [diff] [review]:
-----------------------------------------------------------------

I didn't really follow what all of the logic changed here, but at least at a low level this looks fine to me.

::: toolkit/recordreplay/ipc/ChildProcess.cpp
@@ +93,5 @@
>        break;
> +    case MessageType::MiddlemanCallRequest: {
> +      const MiddlemanCallRequestMessage& nmsg =
> +        static_cast<const MiddlemanCallRequestMessage&>(aMsg);
> +      MiddlemanCallResponseMessage* response = ProcessMiddlemanCallMessage(nmsg);

Can you use a unique pointer here? Or is that too much of a pain because of this whole setup where only generic Messages are made into unique pointers?

@@ +270,5 @@
> +  MOZ_RELEASE_ASSERT(NS_IsMainThread());
> +
> +  for (size_t i = 0; i < gPendingMessages.length(); i++) {
> +    PendingMessage& pending = gPendingMessages[i];
> +    if (!*aProcess || pending.mProcess == *aProcess) {

Combining these two cases together seems a little ugly, but I guess copying all of this boiler plate wouldn't be any better.

::: toolkit/recordreplay/ipc/ParentForwarding.cpp
@@ +78,4 @@
>      ipc::IProtocol::Result r =
>          contentChild->PContentChild::OnMessageReceived(aMessage);
>      MOZ_RELEASE_ASSERT(r == ipc::IProtocol::MsgProcessed);
> +    if (type == dom::PContent::Msg_RegisterChrome__ID) {

Can the SetXPCOMProcessAttributes message get dropped from the "white list" of messages earlier in this method?

@@ +85,1 @@
>        PreferencesLoaded();

Maybe this method should be renamed at some point? If it was called ChromeRegistered or something then you could also drop the comment.

::: toolkit/recordreplay/ipc/ParentIPC.cpp
@@ +53,5 @@
>  // The single recording child process, or null.
>  static ChildProcessInfo* gRecordingChild;
>  
> +// Any replaying child processes that have been spawned.
> +static StaticInfallibleVector<ChildProcessInfo*> gReplayingChildren;

Can this be StaticInfallibleVector<UniquePtr<ChildProcessInfo>>?
Attachment #9033483 - Flags: review?(continuation) → review+
Attachment #9033480 - Flags: review?(lsmyth) → review+
Attachment #9033482 - Flags: review?(lsmyth) → review+
Attachment #9033484 - Flags: review?(lsmyth) → review+
Attachment #9033486 - Flags: review?(lsmyth) → review+
Attachment #9033485 - Flags: review?(lsmyth) → review+

Comment 12

5 months ago
Pushed by bhackett@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/cf0578ce6aa3
Part 1 - Merge HitCheckpoint and HitBreakpoint messages, r=mccr8.
https://hg.mozilla.org/integration/mozilla-inbound/rev/0426a61d27a9
Part 2 - Add ToString methods for breakpoint positions and execution points, r=lsmyth.
https://hg.mozilla.org/integration/mozilla-inbound/rev/fb64ff37f634
Part 3 - Use UniquePtr more for web replay messages, r=mccr8.
https://hg.mozilla.org/integration/mozilla-inbound/rev/9ddc5bc1e961
Part 4 - Improve handling when encoding/decoding execution points with no position, r=lsmyth.
https://hg.mozilla.org/integration/mozilla-inbound/rev/1701613c165d
Part 5 - Remove logic for coordinating child processes from C++, r=mccr8.
https://hg.mozilla.org/integration/mozilla-inbound/rev/91d5c6ff3ee7
Part 6 - IDL and binding changes for coordinating child processes from JS, r=lsmyth.
https://hg.mozilla.org/integration/mozilla-inbound/rev/6689773a4e1c
Part 7 - Add JS module for controlling child processes, r=lsmyth.
https://hg.mozilla.org/integration/mozilla-inbound/rev/ab90c34ff486
Part 8 - ReplayDebugger changes for working with control code, r=lsmyth.

Backed out 12 changesets (bug 1516578, bug 1513118, bug 1516694) for failing at browser_toolbox_remoteness_change.js on a CLOSED TREE

Backout link: https://hg.mozilla.org/integration/mozilla-inbound/rev/9e3564442734c89fa1b735ff8662588576cf0115

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&resultStatus=testfailed%2Cbusted%2Cexception%2Crunnable&revision=4abb81088a9b49d700cfea840848a9dac6a0010d&selectedJob=221519592

Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=221519592&repo=mozilla-inbound&lineNumber=9675

Log snippet:

[task 2019-01-12T21:42:48.171Z] 21:42:48 INFO - GECKO(1484) | nsStringStats
[task 2019-01-12T21:42:48.171Z] 21:42:48 INFO - GECKO(1484) | => mAllocCount: 1939881
[task 2019-01-12T21:42:48.173Z] 21:42:48 INFO - GECKO(1484) | => mReallocCount: 1
[task 2019-01-12T21:42:48.173Z] 21:42:48 INFO - GECKO(1484) | => mFreeCount: 1939881
[task 2019-01-12T21:42:48.174Z] 21:42:48 INFO - GECKO(1484) | => mShareCount: 1889903
[task 2019-01-12T21:42:48.175Z] 21:42:48 INFO - GECKO(1484) | => mAdoptCount: 67026
[task 2019-01-12T21:42:48.176Z] 21:42:48 INFO - GECKO(1484) | => mAdoptFreeCount: 67986
[task 2019-01-12T21:42:48.177Z] 21:42:48 INFO - GECKO(1484) | => Process ID: 1484, Thread ID: 139783454422848
[task 2019-01-12T21:42:48.210Z] 21:42:48 INFO - TEST-INFO | Main app process: exit 0
[task 2019-01-12T21:42:48.211Z] 21:42:48 ERROR - TEST-UNEXPECTED-FAIL | devtools/client/framework/test/browser_toolbox_remoteness_change.js | leaked 4 window(s) until shutdown [url = about:blank]
[task 2019-01-12T21:42:48.212Z] 21:42:48 ERROR - TEST-UNEXPECTED-FAIL | devtools/client/framework/test/browser_toolbox_remoteness_change.js | leaked 1 window(s) until shutdown [url = chrome://devtools/content/webconsole/index.html]
[task 2019-01-12T21:42:48.222Z] 21:42:48 ERROR - TEST-UNEXPECTED-FAIL | devtools/client/framework/test/browser_toolbox_remoteness_change.js | leaked 1 window(s) until shutdown [url = about:devtools-toolbox]
[task 2019-01-12T21:42:48.223Z] 21:42:48 INFO - TEST-INFO | devtools/client/framework/test/browser_toolbox_remoteness_change.js | windows(s) leaked: [pid = 1484] [serial = 492], [pid = 1484] [serial = 490], [pid = 1484] [serial = 489], [pid = 1484] [serial = 491], [pid = 1484] [serial = 488], [pid = 1484] [serial = 487]
[task 2019-01-12T21:42:48.224Z] 21:42:48 ERROR - TEST-UNEXPECTED-FAIL | devtools/client/framework/test/browser_toolbox_remoteness_change.js | leaked 2 docShell(s) until shutdown
[task 2019-01-12T21:42:48.224Z] 21:42:48 INFO - TEST-INFO | devtools/client/framework/test/browser_toolbox_remoteness_change.js | docShell(s) leaked: [pid = 1484] [id = {2478efcb-416b-4465-ac17-c720bf9ea414}], [pid = 1484] [id = {4df99fc4-7efa-48f0-b186-723fe21e3008}]
[task 2019-01-12T21:42:48.226Z] 21:42:48 INFO - runtests.py | Application ran for: 0:15:58.112473
[task 2019-01-12T21:42:48.228Z] 21:42:48 INFO - zombiecheck | Reading PID log: /tmp/tmpCS3TLopidlog

Flags: needinfo?(bhackett1024)

Comment 14

5 months ago
Pushed by bhackett@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e2af5f75beaf
Part 1 - Merge HitCheckpoint and HitBreakpoint messages, r=mccr8.
https://hg.mozilla.org/integration/mozilla-inbound/rev/a42a0f3a4874
Part 2 - Add ToString methods for breakpoint positions and execution points, r=lsmyth.
https://hg.mozilla.org/integration/mozilla-inbound/rev/a2018dda2654
Part 3 - Use UniquePtr more for web replay messages, r=mccr8.
https://hg.mozilla.org/integration/mozilla-inbound/rev/5dcb44dc558d
Part 4 - Improve handling when encoding/decoding execution points with no position, r=lsmyth.
https://hg.mozilla.org/integration/mozilla-inbound/rev/1471f85be5dc
Part 5 - Remove logic for coordinating child processes from C++, r=mccr8.
https://hg.mozilla.org/integration/mozilla-inbound/rev/4df8f323196d
Part 6 - IDL and binding changes for coordinating child processes from JS, r=lsmyth.
https://hg.mozilla.org/integration/mozilla-inbound/rev/477c15c5f25a
Part 7 - Add JS module for controlling child processes, r=lsmyth.
https://hg.mozilla.org/integration/mozilla-inbound/rev/f7361bad8498
Part 8 - ReplayDebugger changes for working with control code, r=lsmyth.
Assignee

Comment 15

5 months ago

I don't think anything in these patches has anything to do with the browser_toolbox_remoteness_change.js failure: when I test locally, browser_toolbox_remoteness_change.js leaks windows both with and without these changes. The try push below contains two out of three of the bugs whose patches were in the earlier push, and doesn't have any browser_toolbox_remoteness_change.js failures.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=191ee7f02a12

Flags: needinfo?(bhackett1024)
You need to log in before you can comment on or make changes to this bug.