Closed Bug 947632 Opened 8 years ago Closed 8 years ago

CPOW failure in session restore code during shutdown

Categories

(Core :: IPC, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla29

People

(Reporter: billm, Assigned: billm)

References

Details

Attachments

(3 files, 1 obsolete file)

Attached patch rollupSplinter Review
I'm attaching the patch that enables session restore in e10s. It should be based on 8648aa476eef. When I run with it, I get a failure when I quit Firefox:

[time:1386442427450587][11747][PJavaScriptParent] Sending Msg_Call([TODO])
[time:1386442427450748][11779][PJavaScriptChild] Received Msg_Call([TODO])
[time:1386442427451856][11779][PBrowserChild] Sending Msg_SyncMessage([TODO])

###!!! [Child][MessageChannel::SendAndWait] Error: Channel error: cannot send/recv

************************************************************
* Call to xpconnect wrapped JSObject produced this error:  *
[Exception... "'[JavaScript Error: "child process crashed or timedout" {file: "resource:///modules/sessionstore/TabState.jsm" line: 108}]' when calling method: [nsIObserver::observe]"  nsresult: "0x80570021 (NS_ERROR_XPC_JAVASCRIPT_ERROR_WITH_DETAILS)"  location: "JS frame :: chrome://global/content/globalOverlay.js :: canQuitApplication :: line 45"  data: yes]
************************************************************

The session restore code in the parent is sending an urgent message to the child. While handling the urgent message, the child tries to do sendSyncMessage. When sending the sync message, the IPC code deadlocks or something: the "cannot send/recv" error seems to be produced because of a timeout.
Attached patch cpow-assert (obsolete) — Splinter Review
Here's the assertion we discussed.
Assignee: nobody → wmccloskey
Status: NEW → ASSIGNED
Attachment #8346281 - Flags: review?(dvander)
Attached patch cpow-assert v2Splinter Review
Forgot the recursion count.
Attachment #8346281 - Attachment is obsolete: true
Attachment #8346281 - Flags: review?(dvander)
Attachment #8346286 - Flags: review?(dvander)
Attachment #8346286 - Flags: review?(dvander) → review+
This problem happens any time we call MessageQueue.flush. The sequence is as follows:

- Parent sends CPOW request down to call SyncHandler.flush(id).
- Child sends up sync message in response.

The underlying problem here is that we have a rule that code in the child process is not allowed to call sendSyncMessage while responding to a CPOW request. Instead it has to use sendRpcMessage, which is functionally identical to sendSyncMessage. I've changed the sendSyncMessage call to sendRpcMessage, which makes the code work.

To make it more obvious that this is a problem, the other patch in this bug adds an assertion to make sure we don't violate the rule.

Here's why we have the rule. Let's say that, when sending the CPOW, the parent has some messages from the child in its queue that haven't been processed yet. We don't want to process those messages right away when sending the CPOW request, since chrome code may not be prepared to handle messages at that time. So we just delay them until the CPOW is done. However, if the child sends a sync message while handling the CPOW, we definitely need to process that message to avoid deadlocking. But that means that the sync message has "jumped ahead" of the messages that were queued before sending the CPOW. Normally we try to process messages in order.

So we introduced sendRpcMessage to essentially say "it's okay to process this message out of order if it's sent while handling a CPOW." In all other ways, it works the same as sendSyncMessage.
Attachment #8346290 - Flags: review?(ttaubert)
Comment on attachment 8346290 [details] [diff] [review]
session-store-cpow-fix

Review of attachment 8346290 [details] [diff] [review]:
-----------------------------------------------------------------

Makes sense to me. Thanks for finding and fixing that!
Attachment #8346290 - Flags: review?(ttaubert) → review+
https://hg.mozilla.org/mozilla-central/rev/d6a8a4c5fbd5
https://hg.mozilla.org/mozilla-central/rev/24ba20d0a8e5
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
Depends on: 1012883
You need to log in before you can comment on or make changes to this bug.