Closed Bug 1465287 Opened 6 years ago Closed 6 years ago

Web Replay: Child/middleman IPC

Categories

(Core :: IPC, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla63
Tracking Status
firefox63 --- fixed

People

(Reporter: bhackett1024, Assigned: bhackett1024)

References

Details

Attachments

(9 files, 2 obsolete files)

25.19 KB, patch
mccr8
: review+
jld
: review+
Details | Diff | Splinter Review
16.73 KB, patch
mccr8
: review+
jld
: review+
Details | Diff | Splinter Review
82.50 KB, patch
mccr8
: review+
jld
: review+
Details | Diff | Splinter Review
2.57 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
3.97 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
4.52 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
3.46 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
4.29 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
20.66 KB, patch
jld
: review+
mrbkap
: review+
Details | Diff | Splinter Review
This bug includes patches for managing IPC between recording/replaying processes, middleman processes, and the UI process, except for most graphics related stuff.
Recording/replaying processes use a special IPC channel, implemented as a socket pair, for communication with the middleman that is not included as part of the recording. This patch defines this channel, as well as all the messages that can be sent back and forth between these processes.
Assignee: nobody → bhackett1024
Attachment #8981701 - Flags: review?(continuation)
Child side of record/replay IPC, setting up the channel and handlers for sending and receiving messages from the middleman.
Attachment #8981702 - Flags: review?(continuation)
Middleman side of record/replay IPC. This has some additional complexities not seen on the child side: - IPDL messages need to be forwarded between the UI process and the recording process (if there is one), with certain messages handled in the middleman instead and certain others handled in both the middleman and recording process. - The middleman can manage multiple child processes simultaneously (up to one recording process and two replaying processes), requiring the middleman to keep track of where each child process is located in the replay, and to direct them around to other places.
Attachment #8981703 - Flags: review?(continuation)
On platforms where Web Replay is not enabled (everything except Mac nightlies), this dummy implementation avoids link errors.
Attachment #8981704 - Flags: review?(continuation)
Initializing recording/replaying/middleman processes is pretty straightforward, except for dealing with the shared memory block used for prefs.
Attachment #8981705 - Flags: review?(continuation)
Middleman processes use a special MessageChannel for forwarding messages to their child recording process. This patch has fixes so that this channel is used for PContent and its child protocols, and for a few places where protocol actors were incorrectly using the middleman's pid for communication between the recording and UI processes.
Attachment #8981706 - Flags: review?(continuation)
Add some IPDL and IDL glue for record/replay assertions and directives (which are all performed within the recording/replaying process and don't need any IPC) and fatal errors (which originate from the recording/replaying process and end up producing an alert within the UI process).
Attachment #8981707 - Flags: review?(continuation)
Allow content parents to record or replay their child processes, and to save recordings via IDL or the SaveAllRecordingsDirectory value.
Attachment #8981709 - Flags: review?(jld)
In order to handle IPDL messages in both the middleman and recording processes, the messages need to be copyable.
Attachment #8981710 - Flags: review?(continuation)
Comment on attachment 8981707 [details] [diff] [review] Part 7 - Add IPDL and IDL for record/replay assertions, directives, and fatal errors. Review of attachment 8981707 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/webidl/Window.webidl @@ +309,5 @@ > attribute EventHandler ondevicelight; > > void dump(DOMString str); > > + void recordReplayAssert(DOMString str); I assume these are supposed to both be [ChromeOnly]? We don't want to expose these methods to random webpages.
(In reply to Andrew McCreight [:mccr8] from comment #10) > Comment on attachment 8981707 [details] [diff] [review] > Part 7 - Add IPDL and IDL for record/replay assertions, directives, and > fatal errors. > > Review of attachment 8981707 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: dom/webidl/Window.webidl > @@ +309,5 @@ > > attribute EventHandler ondevicelight; > > > > void dump(DOMString str); > > > > + void recordReplayAssert(DOMString str); > > I assume these are supposed to both be [ChromeOnly]? We don't want to expose > these methods to random webpages. Sure, this would be fine, though recordReplayDirective() is used on content pages in tests (would that still work with [ChromeOnly]?), and recordReplayAssert() wouldn't be harmful if exposed to web content.
(In reply to Brian Hackett (:bhackett) from comment #11) > Sure, this would be fine, though recordReplayDirective() is used on content > pages in tests (would that still work with [ChromeOnly]?), and > recordReplayAssert() wouldn't be harmful if exposed to web content. The issue is that if we start adding a property onto Window that is visible, then it hurts cross-browser compatibility, and sites can end up depending on it, so may have to ship a property that has long outlived its usefulness. I think the patch as-is should have failed test_interfaces.js which checks to make sure we don't accidentally ship an API. Anyways, I think putting [ChromeOnly] on it, and then using SpecialPowers.unwrap() in a Mochitest, should be ok. What besides tests is using this function?
(In reply to Andrew McCreight [:mccr8] from comment #12) > (In reply to Brian Hackett (:bhackett) from comment #11) > > Sure, this would be fine, though recordReplayDirective() is used on content > > pages in tests (would that still work with [ChromeOnly]?), and > > recordReplayAssert() wouldn't be harmful if exposed to web content. > > The issue is that if we start adding a property onto Window that is visible, > then it hurts cross-browser compatibility, and sites can end up depending on > it, so may have to ship a property that has long outlived its usefulness. I > think the patch as-is should have failed test_interfaces.js which checks to > make sure we don't accidentally ship an API. Anyways, I think putting > [ChromeOnly] on it, and then using SpecialPowers.unwrap() in a Mochitest, > should be ok. > > What besides tests is using this function? recordReplayDirective() is only used by tests. recordReplayAssert() is not used anywhere, but is something that I found useful while debugging problems with web pages. I can remove it entirely if you want, I didn't realize the ramifications of adding interfaces to this file.
(In reply to Brian Hackett (:bhackett) from comment #13) > recordReplayDirective() is only used by tests. It looks like the method doesn't depend on the window at all, so it would make more sense to add it to nsIXPCComponents_Utils instead. Then you could call it via Components.Utils. There's also ChromeUtils, if you really want to use WebIDL. > recordReplayAssert() is not used anywhere, but is something that I found useful while debugging problems > with web pages. I can remove it entirely if you want, I didn't realize the > ramifications of adding interfaces to this file. Yeah, it should either be in something like Cu, as with the other one, or removed entirely. You wouldn't be able to call it on a webpage, though, if that's what you were doing with debugging. Separately, and maybe this is a naive question, but why do you need to use your own custom communication rather than defining a new IPDL protocol? It would be nice to avoid the scary dynamically sized object boilerplate you need here for the different messages. Would that end up getting tangled up in the stuff being recorded too much?
Comment on attachment 8981704 [details] [diff] [review] Part 4 - Add dummy IPC implementation for disabled platforms. Review of attachment 8981704 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/recordreplay/ipc/DisabledIPC.cpp @@ +116,5 @@ > + > +namespace parent { > + > +void > +InitializeUIProcess(int aArgc, char** aArgv) Why do these two methods not crash?
Attachment #8981704 - Flags: review?(continuation) → review+
Comment on attachment 8981707 [details] [diff] [review] Part 7 - Add IPDL and IDL for record/replay assertions, directives, and fatal errors. Review of attachment 8981707 [details] [diff] [review]: ----------------------------------------------------------------- As I said, this should be on Component.Utils or something. The patch looks fine to me otherwise.
Attachment #8981707 - Flags: review?(continuation) → review-
Comment on attachment 8981709 [details] [diff] [review] Part 8 - Allow spawning recording/replaying child processes and saving recordings. Review of attachment 8981709 [details] [diff] [review]: ----------------------------------------------------------------- Summary: this needs DOM review, and I have concerns about sandboxing, and a bunch of style comments. ::: dom/ipc/ContentParent.cpp @@ +605,5 @@ > RefPtr<ContentParent> process = > new ContentParent(/* aOpener = */ nullptr, > + NS_LITERAL_STRING(DEFAULT_REMOTE_TYPE), > + /* aRecordExecution = */ nsString(), > + /* aReplayExecution = */ nsString()); Nit: EmptyString() @@ +772,5 @@ > + nsAutoString recordExecution; > + nsAutoString replayExecution; > + if (aFrameElement) { > + aFrameElement->GetAttr(kNameSpaceID_None, nsGkAtoms::RecordExecution, recordExecution); > + aFrameElement->GetAttr(kNameSpaceID_None, nsGkAtoms::ReplayExecution, replayExecution); This needs a DOM peer, which I am not. Strictly speaking all of dom/ipc belongs to the DOM module, not IPC, but I've reviewed some patches that are simple or mostly IPC things and so far nobody's complained. But this patch touches DOM things enough that I don't feel comfortable being the only reviewer. @@ +851,5 @@ > if (!p->LaunchSubprocess(aPriority)) { > return nullptr; > } > > + if (!recordExecution.Length() && !replayExecution.Length()) { Suggestion: IsEmpty() is a clearer way to do this. (Also for the other boolean-coerced Length() calls.) @@ +1406,5 @@ > if (aMethod == SEND_SHUTDOWN_MESSAGE) { > + if (const char* directory = recordreplay::parent::SaveAllRecordingsDirectory()) { > + // Save a recording for the child process before it shuts down. > + char file[64]; > + strcpy(file, "RecordingXXXXXX"); `char file[] = "RecordingXXXXXX";` should do what you want without the strcpy. @@ +1408,5 @@ > + // Save a recording for the child process before it shuts down. > + char file[64]; > + strcpy(file, "RecordingXXXXXX"); > + char buf[1024]; > + SprintfLiteral(buf, "%s/%s", directory, mktemp(file)); nsPrintfCString can do this without the manual buffer sizing. (Alternately, nsAutoCString and the Append methods.) @@ +1409,5 @@ > + char file[64]; > + strcpy(file, "RecordingXXXXXX"); > + char buf[1024]; > + SprintfLiteral(buf, "%s/%s", directory, mktemp(file)); > + fprintf(stderr, "Saving Recording: %s\n", buf); Is this printf meant to be enabled by default? (I'd suggest MOZ_LOG as an alternative, but we don't seem to have a LogModule set up for ContentParent yet.) @@ +2090,5 @@ > > + // Specify whether the process is recording or replaying an execution. > + if (mReplayExecution.Length()) { > + char buf[20]; > + SprintfLiteral(buf, "%d", (int) recordreplay::ProcessKind::MiddlemanReplaying); nsPrintfCString is better for this; see also the -prefsHandle argument. (Also for the other char[20] below.) ::: dom/ipc/PContent.ipdl @@ +708,5 @@ > */ > async PClientOpenWindowOp(ClientOpenWindowArgs aArgs); > > + /* Save the execution up to the current point in a recorded process. */ > + async SaveRecording(nsCString filename); Can you open the file in the parent process and send a mozilla::ipc::FileDescriptor in this message instead? (See also the PCycleCollectWithLogs constructor, which does that.) This is also a more general comment about any other files that might be opened after RecvSetProcessSandbox. WebReplay will need to be sandbox-compatible if we want to ship it as a feature that's exposed to general users in devtools (which seems to be the long-term goal?), and if the comment in attachment 8953473 [details] [diff] [review] is accurate that the only conflict is file access, it should be relatively easy to fix that with file descriptor/handle passing.
Attachment #8981709 - Flags: review?(jld)
Comment on attachment 8981703 [details] [diff] [review] Part 3 - Middleman side of record/replay IPC. Review of attachment 8981703 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/recordreplay/ipc/ParentGraphics.cpp @@ +27,5 @@ > +InitializeGraphicsMemory() > +{ > + mach_vm_address_t address; > + kern_return_t kr = mach_vm_allocate(mach_task_self(), &address, > + GraphicsMemorySize, VM_FLAGS_ANYWHERE); If `GraphicsMemorySize` can be greater than 128MiB, see bug 1465667.
(In reply to Jed Davis [:jld] (⏰UTC-6) from comment #18) > If `GraphicsMemorySize` can be greater than 128MiB, see bug 1465667. It is defined as 4096 * 4096 * 4 in part 1.
(In reply to Andrew McCreight [:mccr8] from comment #15) > Comment on attachment 8981704 [details] [diff] [review] > Part 4 - Add dummy IPC implementation for disabled platforms. > > Review of attachment 8981704 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: toolkit/recordreplay/ipc/DisabledIPC.cpp > @@ +116,5 @@ > > + > > +namespace parent { > > + > > +void > > +InitializeUIProcess(int aArgc, char** aArgv) > > Why do these two methods not crash? These methods are called from the UI process, to check for record/replay command line args.
(In reply to Andrew McCreight [:mccr8] from comment #14) > Separately, and maybe this is a naive question, but why do you need to use > your own custom communication rather than defining a new IPDL protocol? It > would be nice to avoid the scary dynamically sized object boilerplate you > need here for the different messages. Would that end up getting tangled up > in the stuff being recorded too much? The custom communication is, yeah, mainly to avoid tangling up unrecorded communication with recorded communication. A recording process would need to have multiple live channels open with the same process, one of which wasn't recorded and the others which were, with some state being shared between all the channels (e.g. shared memory global state). A replaying process would have a live channel and other channels replaying old data, and additionally needs to make sure the live channel (a) is able to pause in a state where snapshots can be taken, and (b) isn't corrupted if we e.g. rewind to a place where only half of a message has been sent over it. The patches in bug 1207696 took this approach (parts 10d and 10e, and some others) but the instrumentation was distasteful and I felt that removing it and using a simple socket pair was a better option.
Attachment #8981707 - Attachment is obsolete: true
Attachment #8982550 - Flags: review?(continuation)
Attachment #8982550 - Flags: review?(continuation) → review+
(In reply to Jed Davis [:jld] (⏰UTC-6) from comment #17) > Can you open the file in the parent process and send a > mozilla::ipc::FileDescriptor in this message instead? (See also the > PCycleCollectWithLogs constructor, which does that.) > > This is also a more general comment about any other files that might be > opened after RecvSetProcessSandbox. WebReplay will need to be > sandbox-compatible if we want to ship it as a feature that's exposed to > general users in devtools (which seems to be the long-term goal?), and if > the comment in attachment 8953473 [details] [diff] [review] is accurate that > the only conflict is file access, it should be relatively easy to fix that > with file descriptor/handle passing. Thanks for the feedback, I'll experiment with enabling the sandbox on recording/replaying processes and post an updated patch. This has also been an issue in part 11 of bug 1465294, where I posted the sandbox-disabling bits for review.
(In reply to Brian Hackett (:bhackett) from comment #21) > The custom communication is, yeah, mainly to avoid tangling up unrecorded > communication with recorded communication. [...] That makes sense. Thanks for the explanation, and the reference to the other patches. I guess my main complaint here is really about the serialization/deserialization. The rest of it doesn't seem too complex. I'll try to think about how IPDL could be used to generate that code, but given that I expect the set of messages isn't going to change over time much I think the current stuff is ok for landing.
Comment on attachment 8981701 [details] [diff] [review] Part 1 - IPC channels. Jed, could you also review this, at least the Channel.cpp stuff? Channel.h is mostly serialization stuff but the cpp file has a lot of socket stuff I'm not familiar with. Thanks.
Attachment #8981701 - Flags: review?(jld)
Comment on attachment 8981701 [details] [diff] [review] Part 1 - IPC channels. Review of attachment 8981701 [details] [diff] [review]: ----------------------------------------------------------------- I wonder if people who do IPDL fuzzing should be told about this new IPC mechanism. On the other hand, if it is only used between two different content processes, that are going to be touching things from the same domain anyways, I guess it doesn't really add any new attack surface. ::: toolkit/recordreplay/ipc/Channel.cpp @@ +41,5 @@ > + , mConnectionFd(0) > + , mFd(0) > + , mMessageBytes(0) > +{ > + MOZ_RELEASE_ASSERT(NS_IsMainThread()); Maybe this should also assert that you are in the right kind of process? My main concern here is ensuring the chrome process doesn't suddenly start opening a socket doing who knows what. @@ +44,5 @@ > +{ > + MOZ_RELEASE_ASSERT(NS_IsMainThread()); > + > + if (IsRecordingOrReplaying()) { > + MOZ_ASSERT(IsRecordingOrReplaying()); This assert is redundant. @@ +152,5 @@ > +Message* > +Channel::WaitForMessage() > +{ > + if (!mMessageBuffer.length()) { > + mMessageBuffer.appendN(0, 1048576); This should be a named constant. Also, 1MB is quite large for an allocation. For Firefox IPDL messages, Bill implemented (in bug 1262671) segmented storage for messages, because we were seeing lots of OOM crashes in IPC on Beta. I think each chunk is something like 4kb. There's also a hard cap of something like 128MB (256MB?) for the messages. Presumably, the typical developer who is using this will have much more powerful systems than the typical Firefox Beta user, and maybe these messages are more consistent and smaller in volume than the things we send via IPDL, but that's something to keep in mind if you start seeing crashes. @@ +167,5 @@ > + } > + > + // Make sure the buffer is large enough for the entire incoming message. > + if (messageSize > mMessageBuffer.length()) { > + mMessageBuffer.appendN(0, messageSize - mMessageBuffer.length()); It is a little concerning that this buffer can always get bigger, but never smaller. I assume that a channel has a long lifetime. So if a channel sends a 100MB message one time, you end up effectively leaking memory for a long time. Maybe that's not an issue, depending on the typical work load. @@ +205,5 @@ > +static char* > +WideCharString(const char16_t* aBuffer, size_t aBufferSize) > +{ > + char* buf = new char[aBufferSize + 1]; > + for (size_t i = 0; i < aBufferSize; i++) { Can you not use some kind of memmove here? Or is that not okay because the source and destination have different types? @@ +275,5 @@ > + default: > + break; > + } > + const char* kind = IsMiddleman() ? "Middleman" : (IsRecording() ? "Recording" : "Replaying"); > + PrintSpew("%s%s:%d %s %s\n", kind, aPrefix, (int) mId, aMsg.TypeString(), data ? data : ""); I looked through a number of patches, and I can't find where this is defined. I think you want something along the lines of mozilla::ipc::LoggingEnabled() guarding this function, so we're not printing out debug messages all of the time, or if we aren't, it would be good to have to run all of this logging code when we're not logging. (I assume that this logging is intended for people debugging WebReplay, not people simple using it.) ::: toolkit/recordreplay/ipc/Channel.h @@ +4,5 @@ > + * License, v. 2.0. If a copy of the MPL was not distributed with this file, > + * You can obtain one at http://mozilla.org/MPL/2.0/. */ > + > +#ifndef mozilla_toolkit_recordreplay_ipc_Channel_h > +#define mozilla_toolkit_recordreplay_ipc_Channel_h It doesn't really matter, but the include guard is supposed to be based on the path used to include the file, not of the file itself. Maybe more like mozilla_recordreplay_Channel_h. @@ +47,5 @@ > +// while the recording process is appending new chunks to it (see File.cpp), > +// all replaying processes must be paused when the recording process is > +// flushing a new index to the file. > + > +#define ForEachMessageType(Macro) \ Maybe _Macro so that it is clearer that this is an argument to a macro? @@ +129,5 @@ > + > + Message(MessageType aType, uint32_t aSize) > + : mType(aType), mSize(aSize) > + { > + MOZ_RELEASE_ASSERT(mSize >= sizeof(*this)); Do you need similar asserts in the subclasses? @@ +438,5 @@ > + > +// Command line option used to specify the channel ID for a child process. > +static const char* gChannelIDOption = "-recordReplayChannelID"; > + > +static const int32_t GraphicsMessageId = 42; Maybe describe these two graphics constants?
Attachment #8981701 - Flags: review?(continuation) → review+
Comment on attachment 8981710 [details] [diff] [review] Part 9 - Allow copying IPDL messages. Review of attachment 8981710 [details] [diff] [review]: ----------------------------------------------------------------- I feel bad about putting this in Nathan's review queue, but he knows more about this than me. At least this patch is short.
Attachment #8981710 - Flags: review?(continuation) → review?(nfroyd)
Comment on attachment 8981705 [details] [diff] [review] Part 5 - Initialize record/replay and middleman state. Review of attachment 8981705 [details] [diff] [review]: ----------------------------------------------------------------- ::: browser/app/nsBrowserApp.cpp @@ +282,5 @@ > + // Middleman content processes should use the same executable for replaying > + // processes under the same circumstances that the chrome process used > + // to launch the original recording process, so that the recording and > + // replaying processes will initialize themselves in the same way. > +#ifdef MOZ_BROWSER_CAN_BE_CONTENTPROC It looks like this isn't defined on OSX, so is this actually needed? Also, isn't this being called already? https://searchfox.org/mozilla-central/rev/292d295d6b084b43b70de26a42e68513bb7b36a3/browser/app/nsBrowserApp.cpp#303 I admit I have no idea what this method is actually doing, but I'll try to read up more on it. ::: dom/ipc/ContentProcess.cpp @@ +211,5 @@ > } > > + if (recordreplay::IsRecordingOrReplaying()) { > + // Set up early prefs from shmem contents passed to us by the middleman. > + Preferences::DeserializePreferences(recordreplay::child::PrefsShmemContents(*prefsLen), I guess the middleman is never recording or replaying?
(In reply to Andrew McCreight [:mccr8] from comment #26) > @@ +275,5 @@ > > + default: > > + break; > > + } > > + const char* kind = IsMiddleman() ? "Middleman" : (IsRecording() ? "Recording" : "Replaying"); > > + PrintSpew("%s%s:%d %s %s\n", kind, aPrefix, (int) mId, aMsg.TypeString(), data ? data : ""); > > I looked through a number of patches, and I can't find where this is > defined. I think you want something along the lines of > mozilla::ipc::LoggingEnabled() guarding this function, so we're not printing > out debug messages all of the time, or if we aren't, it would be good to > have to run all of this logging code when we're not logging. (I assume that > this logging is intended for people debugging WebReplay, not people simple > using it.) PrintSpew is defined in part 1 of bug 1464903. When spew is not enabled it is an inline function that checks whether spew is enabled and then returns.
(In reply to Andrew McCreight [:mccr8] from comment #28) > Comment on attachment 8981705 [details] [diff] [review] > Part 5 - Initialize record/replay and middleman state. > > Review of attachment 8981705 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: browser/app/nsBrowserApp.cpp > @@ +282,5 @@ > > + // Middleman content processes should use the same executable for replaying > > + // processes under the same circumstances that the chrome process used > > + // to launch the original recording process, so that the recording and > > + // replaying processes will initialize themselves in the same way. > > +#ifdef MOZ_BROWSER_CAN_BE_CONTENTPROC > > It looks like this isn't defined on OSX, so is this actually needed? Also, > isn't this being called already? > https://searchfox.org/mozilla-central/rev/ > 292d295d6b084b43b70de26a42e68513bb7b36a3/browser/app/nsBrowserApp.cpp#303 > > I admit I have no idea what this method is actually doing, but I'll try to > read up more on it. Oops, yes, this code isn't compiled on Mac. I'll remove it (it's probably a holdover from when we had some Windows support). > ::: dom/ipc/ContentProcess.cpp > @@ +211,5 @@ > > } > > > > + if (recordreplay::IsRecordingOrReplaying()) { > > + // Set up early prefs from shmem contents passed to us by the middleman. > > + Preferences::DeserializePreferences(recordreplay::child::PrefsShmemContents(*prefsLen), > > I guess the middleman is never recording or replaying? Yes, middleman processes cannot also be recording or replaying processes.
Comment on attachment 8981710 [details] [diff] [review] Part 9 - Allow copying IPDL messages. Review of attachment 8981710 [details] [diff] [review]: ----------------------------------------------------------------- Thank you for not just adding a copy constructor; we had issues in the B2G days with needless copying of messages because the copy constructor would get invoked unnecessarily. WDYT about making the new function: // CopyFrom, Duplicate, whatever. static Message CopyFrom(const Message&); with any supporting functions that need to be created to support CopyFrom. That way we can make clear that copying the message creates a newly initialized message, rather than letting you invoke CopyFrom on any old Message that you have lying around. I guess we'd have to do the same sort of function for BufferList. Not sure if it comes out to be less code, so I'm not completely sure it's worth it. r=me either way.
Attachment #8981710 - Flags: review?(nfroyd) → review+
Comment on attachment 8981710 [details] [diff] [review] Part 9 - Allow copying IPDL messages. Review of attachment 8981710 [details] [diff] [review]: ----------------------------------------------------------------- Was going to say that some asserts inside BufferList::CopyFrom would be appropriate, and then noticed this problem. A test or two for BufferList would be splendid as well. ::: mfbt/BufferList.h @@ +144,5 @@ > + return false; > + > + size_t offset = 0; > + for (const Segment& segment : aOther.mSegments) { > + memcpy(Start() + offset, segment.mData, segment.mSize); Actually, wait, how is this correct for multiple segments? Start() always returns a pointer to the start of the first segment's data, so on the second iteration of the loop, we're going to be indexing off the end. We need to set up a proper iterator over `this`.
Attachment #8981710 - Flags: review+ → review-
(In reply to Brian Hackett (:bhackett) from comment #29) > PrintSpew is defined in part 1 of bug 1464903. When spew is not enabled it > is an inline function that checks whether spew is enabled and then returns. Alright, thanks. I think it makes sense for Channel::PrintMessage() to return early if SpewEnabled() is false, but maybe the overhead isn't important.
Comment on attachment 8981702 [details] [diff] [review] Part 2 - Child side of record/replay IPC. Review of attachment 8981702 [details] [diff] [review]: ----------------------------------------------------------------- Jed, could you please review the little chunk of code headed by the comment "The parent should have sent us a handle to the graphics shmem."? I don't know anything about this mach shmem stuff. Thanks. ::: toolkit/recordreplay/ipc/ChildIPC.cpp @@ +58,5 @@ > +static FileHandle gCheckpointWriteFd; > +static FileHandle gCheckpointReadFd; > + > +// Copy of the introduction message we got from the middleman. > +static IntroductionMessage* gIntroductionMessage; Maybe add a comment here indicating that this is only used in between getting the introduction message and when we call InitRecordingOrReplayingProcess? All of the variables are permanent state that are needed as long as the process is doing stuff, but this one is really just a closure for initialization, so I think that a comment saying that would be useful. @@ +70,5 @@ > + aMsg->mType == MessageType::Terminate); > + > + switch (aMsg->mType) { > + case MessageType::Introduction: { > + gIntroductionMessage = (IntroductionMessage*) aMsg->Clone(); Should this assert that gIntroductionMessage is null beforehand? Are you planning to have automated testing of this in debug builds? Because AFAICT this is going to "leak". You could use StaticAutoPtr to avoid that. @@ +82,5 @@ > + } > + case MessageType::Terminate: { > + PrintSpew("Terminate message received, exiting...\n"); > + MOZ_RELEASE_ASSERT(IsRecording()); > + _exit(0); Does this really need to just exit immediately, rather than going through the normal content process shutdown? This will cause leak checking to not work in debug builds, if that's a concern, and normally when going through ContentChild::ActorDestroy() we do a little bit of clean up, rather than abruptly exiting (also note there's some reason to not use _exit() on Windows). Maybe it doesn't matter. @@ +122,5 @@ > + } > + case MessageType::Resume: { > + const ResumeMessage& nmsg = (const ResumeMessage&) *aMsg; > + PauseMainThreadAndInvokeCallback([=]() { > + // The hooks will not have been set yet for the primordial resume. It feels to me like the resume code should be responsible for its initialization, rather than this random IPC code. From reading this code, I'm not really sure what this hook is, and somebody looking at wherever ResumeExecution() is set up won't see the initialization. @@ +142,5 @@ > + default: > + MOZ_CRASH(); > + } > + > + free(aMsg); It would be better if this code used UniquePtr() or something instead of all of this manual memory management, but I guess it is okay. @@ +259,5 @@ > + > + // Record/replay the introduction message itself so we get consistent args > + // and prefs between recording and replaying. > + size_t introductionSize = RecordReplayValue(gIntroductionMessage->mSize); > + IntroductionMessage* msg = (IntroductionMessage*) malloc(introductionSize); The malloc+memcpy is pretty low level and requires that the reader of the code know about the implementation of IntroductionMessage. It would be a lot nicer if you defined a RecordReplay() overload for IntroductionMessage in Channel.h that bundles up all of the logic here (from introductionSize down to RecordReplayBytes). Also, you are leaking |msg| here, so this should be a UniquePtr<> or something. @@ +263,5 @@ > + IntroductionMessage* msg = (IntroductionMessage*) malloc(introductionSize); > + if (IsRecording()) { > + memcpy(msg, gIntroductionMessage, introductionSize); > + } > + RecordReplayBytes(msg, introductionSize); gIntroductionMessage feels a little weird to me after this point. If we are replaying, then all of the other global state you are maintaining is about the process being replayed, but this is still about the "original" state before we started the replay. Maybe you could clear and free it here? Of course, then you'd lose some of the value of the null assert for gIntroductionMessage I suggested above, but that feels better to me than having this zombie information sitting around (in case some future person is working on this code and doesn't realize that it is defunct). @@ +354,5 @@ > +// Checkpoint Messages > +/////////////////////////////////////////////////////////////////////////////// > + > +// When recording, the time when the last HitCheckpoint message was sent. > +double gLastCheckpointTime; I think gLastCheckpointTime and gIdleTimeStart should be static, because they are not exposed outside of this file. @@ +427,5 @@ > + JS::replay::hooks.respondAfterRecoveringFromDivergence(); > + }); > +} > + > +static void It seems a little odd to define this method here instead of wherever these hooks are defined. ::: toolkit/recordreplay/ipc/ChildIPC.h @@ +9,5 @@ > + > +#include "base/process.h" > + > +#include "mozilla/gfx/2D.h" > +#include "Units.h" 2D.h and Units.h don't look like they are used in the header. @@ +13,5 @@ > +#include "Units.h" > + > +namespace mozilla { > + > +class VsyncObserver; Likewise, this forward declaration is unused in the header.
Attachment #8981702 - Flags: review?(jld)
Attachment #8981702 - Flags: review?(continuation)
Attachment #8981702 - Flags: review+
Attachment #8981705 - Flags: review?(continuation) → review+
Comment on attachment 8981706 [details] [diff] [review] Part 6 - Use correct channel and pids when communicating with a middleman process. Review of attachment 8981706 [details] [diff] [review]: ----------------------------------------------------------------- ::: ipc/glue/ProtocolUtils.cpp @@ +1070,5 @@ > > const MessageChannel* > IToplevelProtocol::ToplevelState::GetIPCChannel() const > { > + return ProtocolState::mChannel ? ProtocolState::mChannel : &mChannel; Ah, I guess these two places are fixing the pid. ::: ipc/glue/ProtocolUtils.h @@ +287,5 @@ > const MessageChannel* GetIPCChannel() const > { > return mState->GetIPCChannel(); > } > + void SetIPCChannel(MessageChannel* aChannel) This seems like a capability we don't want random channels using. Could you rename it to SetMiddlemanIPCChannel() and assert we're in a middleman process? @@ +864,5 @@ > // be used to send and receive messages. The endpoint becomes invalid. > bool Bind(PFooSide* aActor) > { > MOZ_RELEASE_ASSERT(mValid); > + if (mMyPid != base::GetCurrentProcId()) { Given that we're in the middle of IPC code here, and not in recordreplay code, this comment needs to set up the context better, either by having the release asserts before the comment or an additional comment along the lines of "these pids should match unless we're recording and replaying."
Attachment #8981706 - Flags: review?(continuation) → review+
(In reply to Nathan Froyd [:froydnj] from comment #32) > ::: mfbt/BufferList.h > @@ +144,5 @@ > > + return false; > > + > > + size_t offset = 0; > > + for (const Segment& segment : aOther.mSegments) { > > + memcpy(Start() + offset, segment.mData, segment.mSize); > > Actually, wait, how is this correct for multiple segments? Start() always > returns a pointer to the start of the first segment's data, so on the second > iteration of the loop, we're going to be indexing off the end. We need to > set up a proper iterator over `this`. The Init() call above this creates a single segment that is large enough to contain all the data from aOther's segments, i.e. this method is not creating an exact copy of the other BufferList, just one that is functionally equivalent.
(In reply to Andrew McCreight [:mccr8] from comment #34) > @@ +70,5 @@ > > + aMsg->mType == MessageType::Terminate); > > + > > + switch (aMsg->mType) { > > + case MessageType::Introduction: { > > + gIntroductionMessage = (IntroductionMessage*) aMsg->Clone(); > > Should this assert that gIntroductionMessage is null beforehand? > > Are you planning to have automated testing of this in debug builds? Because > AFAICT this is going to "leak". You could use StaticAutoPtr to avoid that. I'd like the same automated tests we use normally (I think right now we only run opt build tests on ash, not sure if that is generally the case on macOS), but any leak detection infrastructure is going to throw a fit with a recording/replaying process, as there is a lot of global state like this we allocate once and keep around forever. With a replaying process things are even more problematic, because to emulate certain library calls, like getenv, we have to allocate memory and then leak it to the caller. I think we'll want to just not worry about this and turn off any leak detection that starts complaining about recording/replaying processes. > @@ +82,5 @@ > > + } > > + case MessageType::Terminate: { > > + PrintSpew("Terminate message received, exiting...\n"); > > + MOZ_RELEASE_ASSERT(IsRecording()); > > + _exit(0); > > Does this really need to just exit immediately, rather than going through > the normal content process shutdown? This will cause leak checking to not > work in debug builds, if that's a concern, and normally when going through > ContentChild::ActorDestroy() we do a little bit of clean up, rather than > abruptly exiting (also note there's some reason to not use _exit() on > Windows). Maybe it doesn't matter. Hmm, I'll experiment to see if we can go through the normal shutdown procedure.
(In reply to Brian Hackett (:bhackett) from comment #37) > as there is a lot of global state like this we > allocate once and keep around forever. That isn't too hard to handle. You can use something like a StaticAutoPtr<> and pass it to ClearOnShutdown, and it will be freed for you. You'll just have to make sure recording/replaying can't happen once shutdown begins. Though with the way you are allocating a lot of these objects via malloc, they'll be invisible to the XPCOM leakchecker anyways, though LSan will detect them. (So you'd want to add MOZ_COUNT_CTOR/DTOR to really do it right.) > With a replaying process things are > even more problematic, because to emulate certain library calls, like > getenv, we have to allocate memory and then leak it to the caller. Those will probably be blocks of memory not tracked by the XPCOM leak checker, so it shouldn't be an issue for that. For LSan, we have MOZ_LSAN_INTENTIONALLY_LEAK_OBJECT (you could also whitelist the stack, but I think the macro is easier). (Though we don't run LSan on OSX, so this isn't really an immediate issue.) Anyways, leak checking doesn't need to block landing or anything, but any time I've gotten leak checking somewhere new, it has found leaks. If you don't want to get leak checking working, but you do want to run in debug builds, then you'll have to disable leak checking somehow (in a way that won't affect regular debug builds).
Updated patch per comments, and with changes to support sandboxing as described in bug 1466877.
Attachment #8981709 - Attachment is obsolete: true
Attachment #8983462 - Flags: review?(bugs)
Comment on attachment 8981703 [details] [diff] [review] Part 3 - Middleman side of record/replay IPC. Review of attachment 8981703 [details] [diff] [review]: ----------------------------------------------------------------- Could you please review InitializeGraphicsMemory() in ParentGraphics.cpp and the chunk of code starting with the comment "Send the child a handle to the graphics shmem via mach IPC." in ChildProcess.cpp, Jed? Thanks. ::: toolkit/recordreplay/ipc/ChildProcess.cpp @@ +74,5 @@ > + // We can determine the disposition of the child by looking at the first > + // resume message sent since the last time it reached a checkpoint. > + for (Message* msg : mMessages) { > + if (msg->mType == MessageType::Resume) { > + const ResumeMessage& nmsg = (const ResumeMessage&) *msg; The various C-style casts on Message in this file should be static_casts. @@ +116,5 @@ > + > + // Find the last time we sent a SetBreakpoint message to this process for > + // this breakpoint ID. > + SetBreakpointMessage* lastSet = nullptr; > + for (Message* msg : mMessages) { Is there a reverse iterator you could use? @@ +208,5 @@ > + if (msg->mType == MessageType::SetBreakpoint) { > + // Look for an older SetBreakpoint on the same ID to overwrite. > + bool found = false; > + for (Message*& older : newMessages) { > + if (older->mType == MessageType::SetBreakpoint && This check is redundant. Everything in newMessages at this point should be of type SetBreakpoint. @@ +398,5 @@ > + return; > + } > + msg = mMessages[mNumRecoveredMessages++]; > + SendMessageRaw(*msg); > + } while (msg->mType == MessageType::SetBreakpoint); Why does this break out of the loop when there's a non-SetBreakpoint message? Are other messages going to be newer, after we started recovering or something? ::: toolkit/recordreplay/ipc/ParentGraphics.cpp @@ +3,5 @@ > +/* This Source Code Form is subject to the terms of the Mozilla Public > + * License, v. 2.0. If a copy of the MPL was not distributed with this file, > + * You can obtain one at http://mozilla.org/MPL/2.0/. */ > + > +// This file has the logic which the middleman process uses to sent messages to nit: "sent" ::: toolkit/recordreplay/ipc/ParentIPC.cpp @@ +90,5 @@ > + > +static const double FlushSeconds = .5; > +static const double MajorCheckpointSeconds = 2; > + > +// This section describes the strategy used for managing child processes. When Thanks for writing this detailed explanation. @@ +93,5 @@ > + > +// This section describes the strategy used for managing child processes. When > +// recording, there is a single recording process and two replaying processes. > +// When replaying, there are two replaying processes. The main advantage of > +// using two replaying processes is to provide a smooth experience when This is an interesting implementation detail. You might want to throw in a sentence or two (like this paragraph) into the WebReplay MDN page. @@ +100,5 @@ > +// At any time there is one active child: the process which the user is > +// interacting with. This may be any of the two or three children in existence, > +// depending on the user's behavior. Below are some scenarios showing the state > +// we attempt to keep the children in, and ways in which the active process > +// switches from one to another. Maybe give a high-level description of the non-active processes. eg if the recording process is not active, it is inert, which means it is not doing anything. If a replaying process is not active, it is on standby, which means that it is working to keep up with the active process (whether it is moving forwards or backwards), while saving some intermediate execution states. @@ +109,5 @@ > +// with the process saving the checkpoint alternating back and forth so that > +// individual processes save checkpoints every MajorCheckpointSeconds*2. These > +// are the major checkpoints for each replaying process. > +// > +// Active Recording: ----------------------- It would be good to explain these diagrams a bit first. I was able to figure them out (I think!) after reading through the examples, but that would have been easier with an explanation up front. Something like, each segment indicates a part of an execution. All three processes are executing the same sequence of operations. If the segment is *, then that segment has been saved. If it is -, then it has not. And point out that in this first diagram that the recording process is ahead of the replaying processes. @@ +146,5 @@ > +// Standby Replaying #1: *---------***** > +// Active Replaying #2: -----*---------** > +// > +// Rewinding continues in this manner, alternating back and forth between the > +// replaying process as the user continues going back in time. nit: "processes" @@ +254,5 @@ > +static void RecvRecordingFlushed(); > +static void RecvAlwaysMarkMajorCheckpoints(); > + > +// The role taken by the active child. > +class ChildRoleActive : public ChildRole You could mark this and the other ChildRole subclasses final. @@ +382,5 @@ > +// rewind to. > +static bool ActiveChildIsPausedOrRewinding(); > + > +static void > +MaybeClearSavedNonMajorCheckpoint(ChildProcess* aChild, size_t aCheckpoint) Maybe "ClearIf" instead of "MaybeClear"? I think that conveys why we may do the clear better. @@ +417,5 @@ > + size_t lastMajorCheckpoint = LastMajorCheckpointPreceding(mProcess, targetCheckpoint); > + > + // If there is no major checkpoint prior to the active child's position, > + // just idle. > + if (!lastMajorCheckpoint) { I think this would be clearer if it was |lastMajorCheckpoint == InvalidCheckpointId|. @@ +426,5 @@ > + // child's current position, or the other replaying child's most recent > + // major checkpoint. > + size_t otherMajorCheckpoint = > + LastMajorCheckpointPreceding(OtherReplayingChild(mProcess), targetCheckpoint); > + if (otherMajorCheckpoint > lastMajorCheckpoint && otherMajorCheckpoint <= targetCheckpoint) { IIUC, |LastMajorCheckpointPreceding(a, x) <= x|, so the second clause is always true, so you could remove it. @@ +432,5 @@ > + } > + > + // If we haven't reached the last major checkpoint, we need to run forward > + // without saving intermediate checkpoints. > + if (mProcess->LastCheckpoint() < lastMajorCheckpoint) { Maybe move this before the otherMajorCheckpoint stuff because it does not depend on it? @@ +451,5 @@ > + return; > + } > + > + // Since we always save major checkpoints, we must have saved the > + // checkpoint prior to the missing one and can restore it. The inductive step of the logic here wasn't immediately obvious to me (maybe because the comment led me to only think about the base case), so maybe spell out the two cases: a) We always save major checkpoints, so missingCheckpoint can't be lastMajorCheckpoint. b) For every other checkpoint lastMajorCheckpoint < i <= missingCheckpoint, in the prior iteration of the loop we checked HasSavedCheckpoint(i - 1). @@ +461,5 @@ > + mProcess->SendMessage(RestoreCheckpointMessage(restoreTarget)); > + return; > + } > + > + // Otherwise, run forward to the next checkpoint and save it. Maybe throw a clause in there like "if we haven't already told the child to save it"? I guess I just found the name ShouldSaveCheckpoint() to be a little confusing. @@ +789,5 @@ > + return gActiveChild->RewindTargetCheckpoint() <= gLastExplicitPause; > +} > + > +/////////////////////////////////////////////////////////////////////////////// > +// IPDL Forwarding Is there some way you could split this out into a different file (like MiddlemanProtocol.cpp)? It feels conceptually different from the earlier code, and this file is pretty big already. @@ +855,5 @@ > + IPC::Message::msgid_t type = aMessage.type(); > + > + // Ignore messages sent from the child to dead UI process targets. > + if (aSide == ipc::ParentSide) { > + if (type == dom::PBrowser::Msg_PDocAccessibleConstructor__ID) { Why does PDocAccessible need special handling like this? If this is about the manager annotation, I see a bunch of other protocols that have it in some form or another: https://searchfox.org/mozilla-central/search?q=manager&case=false&regexp=false&path=.ipdl There should be a comment explaining why if it isn't obvious. @@ +873,5 @@ > + return false; > + } > + > + // Handle messages that should be sent to both the middleman and the > + // content process. What's special about this particular list? How would a person in the future decide if a message should be added or removed from this list? Please document that in the comment. @@ +945,5 @@ > + gMainThreadBlocked = false; > + } > +}; > + > +class MiddlemanProtocol : public ipc::IToplevelProtocol This is a little ugly (no other non-codegenerated code inherits directly from IToplevelProtocol), but I guess this is the only way you can forward arbitrary messages from PContent without hooking into the IPDL codegen. @@ +963,5 @@ > + virtual void RemoveManagee(int32_t, IProtocol*) override { > + MOZ_CRASH("MiddlemanProtocol::RemoveManagee"); > + } > + > + static void ForwardMessageAsync(MiddlemanProtocol* aProtocol, Message* aMessage) { It is unfortunate that the Message handling isn't all UniquePtr, but I took a look at the similar existing code, and apparently we aren't using UniquePtr down to the Send() level yet, so I guess this is okay. @@ +982,5 @@ > + // If we do not have a recording process then just see if the message can > + // be handled in the middleman. > + if (!mOppositeMessageLoop) { > + MOZ_RELEASE_ASSERT(mSide == ipc::ChildSide); > + HandleMessageInMiddleman(mSide, aMessage); Should this produce some spew if the message is just dropped on the floor or is that too noisy? HandleMessageInMiddleman is a little wooly with the giant list of messages, and I could imagine somebody adding an important message that causes breakage, but right now there's no indication of that. @@ +1286,5 @@ > + // to the replaying children. > + ChildProcess* targetChild = ReplayingChildResponsibleForSavingCheckpoint(targetCheckpoint); > + MOZ_RELEASE_ASSERT(targetChild != gActiveChild); > + > + // This process will be the new active child, make sure it has saved the nit: this should be ", so make sure" or ". Make sure" or something. @@ +1373,5 @@ > + > + // Call breakpoint handlers until one of them explicitly resumes forward or > + // backward travel. > + for (size_t i = 0; i < aNumBreakpoints && gResumeForwardOrBackward; i++) { > + // FIXME what happens if this throws? Is this something that needs to be fixed? ::: toolkit/recordreplay/ipc/ParentIPC.h @@ +20,5 @@ > +namespace parent { > + > +// The middleman process is a content process that manages communication with > +// one or more child recording or replaying processes. It performs IPC with the > +// chrome process in the normal fashion for a content process, using the normal Maybe be consistent in this file about UI process vs. chrome process? Or do something like "chrome (UI) process" here at the start of the file. @@ +43,5 @@ > +// Get the message channel used to communicate with the UI process. > +ipc::MessageChannel* ChannelToUIProcess(); > + > +// Initialize state in a middleman process. > +void Initialize(int aArgc, char* aArgv[], base::ProcessId aParentPid); Maybe call this "InitializeMiddleman" so that it is clearer? ::: toolkit/recordreplay/ipc/ParentInternal.h @@ +79,5 @@ > +protected: > + ChildProcess* mProcess; > + Type mType; > + > + ChildRole(Type aType) You need an |explicit| annotation on here, I think. @@ +94,5 @@ > + virtual ~ChildRole() {} > + > + // These are called on the main thread. > + virtual void Initialize() {} > + virtual void Poke() {} The other methods are clear from their name, but what is "Poke" supposed to do? @@ +99,5 @@ > + virtual void OnIncomingMessage(const Message& aMsg) = 0; > +}; > + > +// Information about a recording or replaying child process. > +class ChildProcess Maybe rename this to ChildProcessInfo? Maybe it is just me, but I found it a little confusing that ChildProcess isn't like PFooChild, because it is in the middleman process rather than the child process. @@ +116,5 @@ > + > + // The current recovery stage of this process. When recovering, the child > + // process might not be in the exact place reflected by the state below, but > + // we will shepherd it to that spot and not be able to send or receive > + // messages until it does so. What is "recovery"? Is this just when a standby process is catching up to the active process, or is it used to deal with a more serious failure? I didn't see any explanation of what it means in the headers in this patch. @@ +149,5 @@ > + // The number of times we have restarted this process. > + size_t mNumRestarts; > + > + // Current role of this process. > + ChildRole* mRole; It would be better if this was a UniquePointer.
Attachment #8981703 - Flags: review?(jld)
Attachment #8981703 - Flags: review?(continuation)
Attachment #8981703 - Flags: review+
Thanks for the detailed review! (In reply to Andrew McCreight [:mccr8] from comment #40) > @@ +398,5 @@ > > + return; > > + } > > + msg = mMessages[mNumRecoveredMessages++]; > > + SendMessageRaw(*msg); > > + } while (msg->mType == MessageType::SetBreakpoint); > > Why does this break out of the loop when there's a non-SetBreakpoint > message? Are other messages going to be newer, after we started recovering > or something? This method returns once it has sent a message to the child that will unpause it, and SetBreakpoint messages are the only messages sent during recovery that will not cause the child to unpause. I'll update the comments to make it clearer what is happening. > @@ +855,5 @@ > > + IPC::Message::msgid_t type = aMessage.type(); > > + > > + // Ignore messages sent from the child to dead UI process targets. > > + if (aSide == ipc::ParentSide) { > > + if (type == dom::PBrowser::Msg_PDocAccessibleConstructor__ID) { > > Why does PDocAccessible need special handling like this? If this is about > the manager annotation, I see a bunch of other protocols that have it in > some form or another: > https://searchfox.org/mozilla-central/ > search?q=manager&case=false&regexp=false&path=.ipdl There should be a > comment explaining why if it isn't obvious. I think this will need to be expanded to cover other protocols which PBrowser manages, but so far PDocAccessible has been the only one that has caused problems. > @@ +873,5 @@ > > + return false; > > + } > > + > > + // Handle messages that should be sent to both the middleman and the > > + // content process. > > What's special about this particular list? How would a person in the future > decide if a message should be added or removed from this list? Please > document that in the comment. I'll add comments for each of these that explains why they are included. All the middleman needs to do itself is be able to correctly start up, shut down, and send rendering data to the UI process. > @@ +1373,5 @@ > > + > > + // Call breakpoint handlers until one of them explicitly resumes forward or > > + // backward travel. > > + for (size_t i = 0; i < aNumBreakpoints && gResumeForwardOrBackward; i++) { > > + // FIXME what happens if this throws? > > Is this something that needs to be fixed? Hmm, I'll modify this so that exceptions just get logged and are then dropped. > @@ +94,5 @@ > > + virtual ~ChildRole() {} > > + > > + // These are called on the main thread. > > + virtual void Initialize() {} > > + virtual void Poke() {} > > The other methods are clear from their name, but what is "Poke" supposed to > do? After state has changed that might affect a child that is sitting idle, Poke() should be called on the child's role to see if it needs to start doing something else. I'll add a comment. > @@ +116,5 @@ > > + > > + // The current recovery stage of this process. When recovering, the child > > + // process might not be in the exact place reflected by the state below, but > > + // we will shepherd it to that spot and not be able to send or receive > > + // messages until it does so. > > What is "recovery"? Is this just when a standby process is catching up to > the active process, or is it used to deal with a more serious failure? I > didn't see any explanation of what it means in the headers in this patch. Recovery can be used in both situations. It can be used to bring one process to the exact state and point of execution as another process, or after a child process crashes it can be used on a newly spawned process to bring that one to the exact state as the process that just crashed. I'll add an explanatory comment, it looks like I forgot to put one in altogether.
Comment on attachment 8983462 [details] [diff] [review] Part 8 - Allow spawning recording/replaying child processes and saving recordings. >+static bool >+CreateTemporaryRecordingFile(nsAString& aResult) >+{ >+ nsCOMPtr<nsIFile> file; >+ char filename[] = "RecordingXXXXXX"; >+ mktemp(filename); >+ >+ return !NS_FAILED(NS_GetSpecialDirectory(NS_OS_TEMP_DIR, getter_AddRefs(file))) >+ && !NS_FAILED(file->AppendNative(nsAutoCString(filename))) >+ && !NS_FAILED(file->GetPath(aResult)); >+} I don't see us using mktemp elsewhere in our code. And "This POSIX function is deprecated." and "Never use this function; see BUGS." in man 3 mktemp >+ContentParent::RecvCreateReplayingProcess(const uint32_t& aChannelId) >+{ >+ while (aChannelId >= mReplayingChildren.length()) { >+ if (!mReplayingChildren.append(nullptr)) { >+ return IPC_FAIL_NO_REASON(this); >+ } >+ } >+ if (mReplayingChildren[aChannelId]) { >+ return IPC_FAIL_NO_REASON(this); >+ } >+ >+ std::vector<std::string> extraArgs; Could you use our own arrays? Or can you not use them inside recordreplay? >+ enum RecordReplayState { Nit, { should be on its own line. >+ RECORDING, >+ REPLAYING, >+ NOT_RECORDING_OR_REPLAYING >+ }; Enums should be named eRecording, eReplaying ... (I know, we aren't very consistent with that, but should be) >+ // Whether this process is recording or replaying its execution, and any >+ // associated recording file. >+ RecordReplayState mRecordReplayState; >+ nsAutoString mRecordingFile; nsString as a member variable, please. No need to use more memory than needed, especially for non-performance related case. >+ sync OpenRecordReplayChannel(uint32_t channelId) >+ returns (FileDescriptor connection); needs ipc peer review >+ async CreateReplayingProcess(uint32_t channelId); Wait, child sends CreateReplayingProcess to parent? That sounds a bit scary. Compromised child process could do something quite unexpected. >+++ b/ipc/ipdl/sync-messages.ini >@@ -850,8 +850,10 @@ description = > [PContent::CreateChildProcess] > description = > [PContent::BridgeToChildProcess] > description = >+[PContent::OpenRecordReplayChannel] >+description = This needs ipc peer review. r- because that process creation looks a bit scary, but jld or someone should review this from security point of view.
Attachment #8983462 - Flags: review?(jld)
Attachment #8983462 - Flags: review?(bugs)
Attachment #8983462 - Flags: review-
(In reply to Olli Pettay [:smaug] from comment #42) > >+ContentParent::RecvCreateReplayingProcess(const uint32_t& aChannelId) > >+{ > >+ while (aChannelId >= mReplayingChildren.length()) { > >+ if (!mReplayingChildren.append(nullptr)) { > >+ return IPC_FAIL_NO_REASON(this); > >+ } > >+ } > >+ if (mReplayingChildren[aChannelId]) { > >+ return IPC_FAIL_NO_REASON(this); > >+ } > >+ > >+ std::vector<std::string> extraArgs; > Could you use our own arrays? Or can you not use them inside recordreplay? record/replay code can use our own vectors, arrays, etc. but GeckoChildProcessHost::LaunchAndWaitForProcessHandle takes a std::vector<std::string> argument. > >+ sync OpenRecordReplayChannel(uint32_t channelId) > >+ returns (FileDescriptor connection); > needs ipc peer review > > >+ async CreateReplayingProcess(uint32_t channelId); > Wait, child sends CreateReplayingProcess to parent? > That sounds a bit scary. Compromised child process could do something quite > unexpected. Well, the middleman needs to be able to spawn new processes after the sandbox has been initialized, and PContent already has ways for child processes to spawn each other, e.g. CreateChildProcess. In this case all the existing child can do is specify the channel ID for the new process to use, so it shouldn't be possible for a compromised child to do anything but spam the parent with CreateReplayingProcess messages.
I'm on Core:General triage, and Core:IPC is my guess at the right component for this bug. Feel free to change that if I'm wrong.
Component: General → IPC
Comment on attachment 8981710 [details] [diff] [review] Part 9 - Allow copying IPDL messages. See comment 36 for an explanation about the BufferList copying issue.
Attachment #8981710 - Flags: review- → review?(nfroyd)
Hi, do you know when you'll be able to look at the patches here and in bug 1466877?
Flags: needinfo?(jld)
Sorry; I've been juggling a few other things, and the All Hands was in the middle of all this. I should be able to get to these patches early next week.
Flags: needinfo?(jld)
Comment on attachment 8981710 [details] [diff] [review] Part 9 - Allow copying IPDL messages. Review of attachment 8981710 [details] [diff] [review]: ----------------------------------------------------------------- Comment 36 addresses my previous complaint from comment 32. I still think some asserts would be nice, but either way.
Attachment #8981710 - Flags: review?(nfroyd) → review+
Comment on attachment 8981701 [details] [diff] [review] Part 1 - IPC channels. Review of attachment 8981701 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/recordreplay/ipc/Channel.cpp @@ +23,5 @@ > +static void > +GetSocketAddress(struct sockaddr_un* addr, base::ProcessId aMiddlemanPid, size_t aId) > +{ > + addr->sun_family = AF_UNIX; > + int n = snprintf(addr->sun_path, sizeof(addr->sun_path), "/tmp/WebReplay_%d_%d", aMiddlemanPid, (int) aId); Traditionally on Unix with this kind of thing, there's the question of what happens when another user connects to the socket and sends malicious input. (Or, instead of an actual other user, as a question of defense-in-depth against a compromised service running as a dedicated uid.) This might not be as much of a problem on OS X, but we'd definitely want to revisit it on Linux. @@ +53,5 @@ > + > + struct sockaddr_un addr; > + GetSocketAddress(&addr, child::MiddlemanProcessId(), mId); > + > + int rv = connect(mFd, (sockaddr*) &addr, SUN_LEN(&addr)); You'll want HANDLE_EINTR (the one in ipc/chromium) on every socket operation that can potentially block: connect, accept, send*, recv*. (I see that there are already some EINTR checks; those can stay as-is, or HANDLE_EINTR can replace them.) @@ +94,5 @@ > + MOZ_RELEASE_ASSERT(IsMiddleman()); > + > + struct sockaddr_un addr; > + socklen_t len = sizeof(addr); > + channel->mFd = accept(channel->mConnectionFd, (sockaddr*) &addr, &len); The address and length arguments can be nullptr, if you're not using them. @@ +100,5 @@ > + > + HelloMessage msg; > + msg.mMagic = MagicValue; > + > + int rv = send(channel->mFd, &msg, sizeof(msg), 0); Normally I'd suggest MSG_NOSIGNAL, but you're going to crash on error anyway for all of these operations, so it may not be useful. @@ +173,5 @@ > + > + ssize_t nbytes = recv(mFd, &mMessageBuffer[mMessageBytes], > + mMessageBuffer.length() - mMessageBytes, 0); > + if (nbytes < 0) { > + MOZ_RELEASE_ASSERT(errno == EAGAIN); Why would this fail with EAGAIN? This isn't using nonblocking I/O or receive timeouts that I can see. @@ +181,5 @@ > + if (IsMiddleman()) { > + return nullptr; > + } > + PrintSpew("Channel disconnected, exiting...\n"); > + DeleteSnapshotFiles(); This looks like a potential conflict with sandboxing — does the child successfully delete these files? — but maybe there's another patch in this cluster of bugs that deals with that. @@ +202,5 @@ > + return res; > +} > + > +static char* > +WideCharString(const char16_t* aBuffer, size_t aBufferSize) This function could use a better name — it's taking a UTF-16 string and lossily converting it to ASCII (by removing the high octet of each code unit; this also answers the earlier comment about memmove). For debug logging of likely-ASCII strings this is fine, but it's a bad idea for any other use. If it's possible to use XPCOM strings here I'd suggest just using NS_ConvertUTF16toUTF8 instead. ::: toolkit/recordreplay/ipc/Channel.h @@ +119,5 @@ > + ForEachMessageType(DefineEnum) > +#undef DefineEnum > +}; > + > +struct Message Not something that should block this bug, but a possible followup: I wonder if it's possible to use IPDL instead of all this message boilerplate. I understand that the normal IPC channels can't be used here, but the Pickle/ParamTraits layer should be relatively self-contained. (There's also the question of whether it would actually be an improvement, or enough of one to justify the work of changing everything to use it.) @@ +126,5 @@ > + > + // Total message size, including the header. > + uint32_t mSize; > + > + Message(MessageType aType, uint32_t aSize) Does this constructor need to be public? It's looks like it's used only by subclasses, and doesn't seem useful outside of that. @@ +149,5 @@ > + } > + > +protected: > + template <typename T, typename Elem> > + Elem* Data() { return (Elem*) (sizeof(T) + (char*) this); } These methods are a little scary, but it's a reasonably nice way to handle the inline strings/arrays given that this wheel needs to be reinvented.
Attachment #8981701 - Flags: review?(jld) → review+
(In reply to Jed Davis [:jld] (⏰UTC-6) from comment #49) > @@ +173,5 @@ > > + > > + ssize_t nbytes = recv(mFd, &mMessageBuffer[mMessageBytes], > > + mMessageBuffer.length() - mMessageBytes, 0); > > + if (nbytes < 0) { > > + MOZ_RELEASE_ASSERT(errno == EAGAIN); > > Why would this fail with EAGAIN? This isn't using nonblocking I/O or > receive timeouts that I can see. I'm not sure, I'll try removing this code (I think the system call used here was different in the past). > @@ +181,5 @@ > > + if (IsMiddleman()) { > > + return nullptr; > > + } > > + PrintSpew("Channel disconnected, exiting...\n"); > > + DeleteSnapshotFiles(); > > This looks like a potential conflict with sandboxing — does the child > successfully delete these files? — but maybe there's another patch in this > cluster of bugs that deals with that. Yeah, the patch in bug 1466877 part 2 removes references to the snapshot files, which no longer exist.
(Just as an update, I've started looking at the remaining patches, but I keep getting preempted by IPC things and I'm not done yet.)
Comment on attachment 8981702 [details] [diff] [review] Part 2 - Child side of record/replay IPC. Review of attachment 8981702 [details] [diff] [review]: ----------------------------------------------------------------- The part of this I was asked to review, I've already reviewed the amended version of in bug 1466877.
Attachment #8981702 - Flags: review?(jld) → review+
Comment on attachment 8981703 [details] [diff] [review] Part 3 - Middleman side of record/replay IPC. Review of attachment 8981703 [details] [diff] [review]: ----------------------------------------------------------------- Most of the part of this I was asked to review, I've already reviewed the amended version of in bug 1466877. (The rest is InitializeGraphicsMemory, which is fine as-is, but if GraphicsMemorySize is changed then see my earlier comment.)
Attachment #8981703 - Flags: review?(jld) → review+
Comment on attachment 8983462 [details] [diff] [review] Part 8 - Allow spawning recording/replaying child processes and saving recordings. Review of attachment 8983462 [details] [diff] [review]: ----------------------------------------------------------------- I was afraid that this patch was going to need me to chase the call graph through all the other Web Replay patches, but I think I have a good enough idea of what's going on without that. r=me with some changes. Also, I notice that this is still r-(smaug), and I would like to have a DOM peer sign off on the part of the patch that passes the frame Element into ContentParent. ::: dom/ipc/ContentParent.cpp @@ +767,5 @@ > +CreateTemporaryRecordingFile(nsAString& aResult) > +{ > + nsCOMPtr<nsIFile> file; > + char filename[] = "RecordingXXXXXX"; > + mktemp(filename); This won't do what you want. mktemp() expects the template to be a filesystem path and will stat() the transformed versions to find one that doesn't exist. This has a security problem: if you're using a public temporary directory like /tmp, another user can race you and cause you to follow a malicious symlink instead. But this also doesn't even ensure uniqueness without an attacker, because mktemp will be checking the current directory for existing files instead of NS_OS_TEMP_DIR. mkstemp() should work, given the actual path; mkdtemp() may also be of interest. (The comments in nsIFile.idl about losing information converting to/from a string should be ignorable here: my understanding is that they're about old OSes where distinct storage volumes could have the same name and had to be referenced by ID, not about POSIX.) @@ +1433,5 @@ > if (aMethod == SEND_SHUTDOWN_MESSAGE) { > + if (const char* directory = recordreplay::parent::SaveAllRecordingsDirectory()) { > + // Save a recording for the child process before it shuts down. > + char file[] = "RecordingXXXXXX"; > + nsPrintfCString buf("%s/%s", directory, mktemp(file)); See above about mktemp. @@ +2025,5 @@ > +} > + > +mozilla::ipc::IPCResult > +ContentParent::RecvCreateReplayingProcess(const uint32_t& aChannelId) > +{ Can this method require that it's used only from a middleman process (comment #43)? If I understand correctly, those processes don't render Web content themselves, so they're relatively trustworthy. @@ +2039,5 @@ > + std::vector<std::string> extraArgs; > + recordreplay::parent::GetArgumentsForChildProcess(Pid(), aChannelId, > + NS_ConvertUTF16toUTF8(mRecordingFile).get(), > + /* aRecording = */ false, > + extraArgs); Does this use any information from the recording file, or does it just pass the pathname and format the other arguments? When I saw this I was concerned that a compromised content process might be able to inject arguments. @@ +2051,5 @@ > +} > + > +mozilla::ipc::IPCResult > +ContentParent::RecvTerminateReplayingProcess(const uint32_t& aChannelId) > +{ See also the comment about CreateReplayingProcess and restricting this to middlemen. ::: ipc/ipdl/sync-messages.ini @@ +851,5 @@ > description = > [PContent::BridgeToChildProcess] > description = > +[PContent::OpenRecordReplayChannel] > +description = Can you describe why this needs to be sync and can't use an async promise-returning message? Alternately, if it could in theory be async but fixing that is complex enough to be a separate bug, I'd accept filing a followup and putting the bug number in the description. (Yes, we've been *very* lax about the description field, probably because the first version of this file left everything blank (I think it was autogenerated?), but we should try to do better.)
Attachment #8983462 - Flags: review?(jld) → review+
Comment on attachment 8983462 [details] [diff] [review] Part 8 - Allow spawning recording/replaying child processes and saving recordings. Hi Andrew, do you mind looking at the DOM parts of this patch (see the first part of comment 54). Thanks! (And also, thanks for all the other reviews you've been doing!)
Attachment #8983462 - Flags: review- → review?(continuation)
(In reply to Jed Davis [:jld] (⏰UTC-6) from comment #54) > Also, I notice that this is still r-(smaug), and I would like to have a DOM > peer sign off on the part of the patch that passes the frame Element into > ContentParent. FWIW, it seemed like Olli was okay with the patch, except for the stuff about spawning a new process. I guess he's on vacation now, so I'll take a look at the patch today or tomorrow.
(In reply to Jed Davis [:jld] (⏰UTC-6) from comment #54) > Comment on attachment 8983462 [details] [diff] [review] > > ::: dom/ipc/ContentParent.cpp > @@ +767,5 @@ > > +CreateTemporaryRecordingFile(nsAString& aResult) > > +{ > > + nsCOMPtr<nsIFile> file; > > + char filename[] = "RecordingXXXXXX"; > > + mktemp(filename); > > This won't do what you want. > > mktemp() expects the template to be a filesystem path and will stat() the > transformed versions to find one that doesn't exist. This has a security > problem: if you're using a public temporary directory like /tmp, another > user can race you and cause you to follow a malicious symlink instead. But > this also doesn't even ensure uniqueness without an attacker, because mktemp > will be checking the current directory for existing files instead of > NS_OS_TEMP_DIR. > > mkstemp() should work, given the actual path; mkdtemp() may also be of > interest. (The comments in nsIFile.idl about losing information converting > to/from a string should be ignorable here: my understanding is that they're > about old OSes where distinct storage volumes could have the same name and > had to be referenced by ID, not about POSIX.) I looked at some other places where we create temporary files and changed this to the code below, which uses NS_OS_TEMP_DIR and less randomization. static bool CreateTemporaryRecordingFile(nsAString& aResult) { unsigned long elapsed = (TimeStamp::Now() - TimeStamp::ProcessCreation()).ToMilliseconds(); nsCOMPtr<nsIFile> file; return !NS_FAILED(NS_GetSpecialDirectory(NS_OS_TEMP_DIR, getter_AddRefs(file))) && !NS_FAILED(file->AppendNative(nsPrintfCString("Recording%lu", elapsed))) && !NS_FAILED(file->GetPath(aResult)); } > @@ +2039,5 @@ > > + std::vector<std::string> extraArgs; > > + recordreplay::parent::GetArgumentsForChildProcess(Pid(), aChannelId, > > + NS_ConvertUTF16toUTF8(mRecordingFile).get(), > > + /* aRecording = */ false, > > + extraArgs); > > Does this use any information from the recording file, or does it just pass > the pathname and format the other arguments? When I saw this I was > concerned that a compromised content process might be able to inject > arguments. This method just formats the arguments that were passed into it. It is defined in part 3 of bug 1466877. > ::: ipc/ipdl/sync-messages.ini > @@ +851,5 @@ > > description = > > [PContent::BridgeToChildProcess] > > description = > > +[PContent::OpenRecordReplayChannel] > > +description = > > Can you describe why this needs to be sync and can't use an async > promise-returning message? Alternately, if it could in theory be async but > fixing that is complex enough to be a separate bug, I'd accept filing a > followup and putting the bug number in the description. > > (Yes, we've been *very* lax about the description field, probably because > the first version of this file left everything blank (I think it was > autogenerated?), but we should try to do better.) This could be async and use a promise, but the code which sends it expects everything to be done synchronously so will need refactoring. I'll file a followup bug.
Depends on: 1475898
Comment on attachment 8983462 [details] [diff] [review] Part 8 - Allow spawning recording/replaying child processes and saving recordings. Review of attachment 8983462 [details] [diff] [review]: ----------------------------------------------------------------- I'm afraid I don't really know anything about this code. Blake, could you take a look please? I think the only thing you really need to look at is the changes to GetNewOrUsedBrowserProcess, per jld's comment: "Also, I notice that this is still r-(smaug), and I would like to have a DOM peer sign off on the part of the patch that passes the frame Element into ContentParent." ::: dom/ipc/ContentParent.h @@ +739,5 @@ > bool aLoadUri); > > FORWARD_SHMEM_ALLOCATOR_TO(PContentParent) > > + enum RecordReplayState { nit: { on the next line @@ +1293,5 @@ > > + // Whether this process is recording or replaying its execution, and any > + // associated recording file. > + RecordReplayState mRecordReplayState; > + nsAutoString mRecordingFile; As Olli said, nsAutoString shouldn't be used for a member variable. I think we even have a static analysis for this.
Attachment #8983462 - Flags: review?(continuation) → review?(mrbkap)
Comment on attachment 8983462 [details] [diff] [review] Part 8 - Allow spawning recording/replaying child processes and saving recordings. Review of attachment 8983462 [details] [diff] [review]: ----------------------------------------------------------------- r=me with Andrew and Olli's comments addressed. ::: dom/ipc/ContentParent.cpp @@ +770,5 @@ > + char filename[] = "RecordingXXXXXX"; > + mktemp(filename); > + > + return !NS_FAILED(NS_GetSpecialDirectory(NS_OS_TEMP_DIR, getter_AddRefs(file))) > + && !NS_FAILED(file->AppendNative(nsAutoCString(filename))) Nit: we prefer && at the end of the previous line in Gecko. @@ +792,5 @@ > + recordReplayState = REPLAYING; > + } else { > + aFrameElement->GetAttr(kNameSpaceID_None, nsGkAtoms::RecordExecution, recordingFile); > + if (recordingFile.IsEmpty() && recordreplay::parent::SaveAllRecordingsDirectory()) { > + recordingFile = NS_LITERAL_STRING("*"); recordingFile.AssignLiteral("*");
Attachment #8983462 - Flags: review?(mrbkap) → review+
Pushed by bhackett@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/d910657e2fc2 Part 1 - IPC channels, r=mccr8,jld. https://hg.mozilla.org/integration/mozilla-inbound/rev/fd1477afca0d Part 2 - Child side of record/replay IPC, r=mccr8,jld. https://hg.mozilla.org/integration/mozilla-inbound/rev/dedac27e743a Part 3 - Middleman side of record/replay IPC, r=mccr8,jld. https://hg.mozilla.org/integration/mozilla-inbound/rev/264841180e7e Part 4 - Add dummy IPC implementation for disabled platforms, r=mccr8. https://hg.mozilla.org/integration/mozilla-inbound/rev/d7c9ef377180 Part 5 - Initialize record/replay and middleman state, r=mccr8. https://hg.mozilla.org/integration/mozilla-inbound/rev/bb07bb364de6 Part 6 - Use correct channel and pids when communicating with a middleman process, r=mccr8. https://hg.mozilla.org/integration/mozilla-inbound/rev/da197fbfe21f Part 7 - Add IPDL and IDL for record/replay directives and fatal errors, r=mccr8. https://hg.mozilla.org/integration/mozilla-inbound/rev/271b647d8627 Part 8 - Allow spawning recording/replaying child processes and saving recordings, r=jld,mrbkap. https://hg.mozilla.org/integration/mozilla-inbound/rev/c9baf2774253 Part 9 - Allow copying IPDL messages, r=froydnj.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: