"Route error: message sent to unknown actor ID" when swapping browsers too fast

RESOLVED FIXED in Firefox 42

Status

()

Core
IPC
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: Away for a while, Assigned: billm)

Tracking

unspecified
mozilla43
Points:
---

Firefox Tracking Flags

(e10s+, firefox42 fixed, firefox43 fixed)

Details

Attachments

(3 attachments)

(Reporter)

Description

2 years ago
The test in bug 1190106 hits this.  See the setTimeout(1000) in browser/base/content/test/general/browser_audioTabIcon.js that I am adding in that bug.  Without that, I get:

###!!! [Child][DispatchAsyncMessage] Error: (msgtype=0x20006F,name=???) Route error: message sent to unknown actor ID
[Child 49157] ###!!! ABORT: Content child abort due to IPC error: file /Users/ehsan/moz/src/dom/ipc/ContentChild.cpp, line 1964
#01: mozilla::dom::ContentChild::ProcessingError(mozilla::ipc::HasResultCodes::Result, char const*) (ContentChild.cpp:1965, in XUL)
#02: mozilla::dom::PContentChild::OnProcessingError(mozilla::ipc::HasResultCodes::Result, char const*) (.PContentChild.cpp:7313, in XUL)
#03: mozilla::ipc::MessageChannel::MaybeHandleError(mozilla::ipc::HasResultCodes::Result, IPC::Message const&, char const*) (MessageChannel.cpp:1745, in XUL)
#04: mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&) (MessageChannel.cpp:1385, in XUL)
#05: mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message const&) (MessageChannel.cpp:1302, in XUL)
#06: mozilla::ipc::MessageChannel::OnMaybeDequeueOne() (MessageChannel.cpp:1275, in XUL)
#07: void DispatchToMethod<mozilla::ipc::MessageChannel, bool (mozilla::ipc::MessageChannel::*)()>(mozilla::ipc::MessageChannel*, bool (mozilla::ipc::MessageChannel::*)(), Tuple0 const&) (tuple.h:388, in XUL)
#08: RunnableMethod<mozilla::ipc::MessageChannel, bool (mozilla::ipc::MessageChannel::*)(), Tuple0>::Run() (task.h:311, in XUL)
#09: mozilla::ipc::MessageChannel::RefCountedTask::Run() (MessageChannel.h:459, in XUL)
#10: mozilla::ipc::MessageChannel::DequeueTask::Run() (MessageChannel.h:476, in XUL)
#11: MessageLoop::RunTask(Task*) (message_loop.cc:365, in XUL)
#12: MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&) (message_loop.cc:375, in XUL)
#13: MessageLoop::DoWork() (message_loop.cc:459, in XUL)
#14: mozilla::ipc::DoWorkRunnable::Run() (MessagePump.cpp:221, in XUL)
#15: nsThread::ProcessNextEvent(bool, bool*) (nsThread.cpp:868, in XUL)
#16: NS_ProcessPendingEvents(nsIThread*, unsigned int) (nsThreadUtils.cpp:219, in XUL)
#17: nsBaseAppShell::NativeEventCallback() (nsBaseAppShell.cpp:100, in XUL)
#18: nsAppShell::ProcessGeckoEvents(void*) (nsAppShell.mm:388, in XUL)
#19: __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ (in CoreFoundation) + 17
#20: __CFRunLoopDoSources0 (in CoreFoundation) + 269
#21: __CFRunLoopRun (in CoreFoundation) + 927
#22: CFRunLoopRunSpecific (in CoreFoundation) + 296
#23: RunCurrentEventLoopInMode (in HIToolbox) + 235
#24: ReceiveNextEventCommon (in HIToolbox) + 431
#25: _BlockUntilNextEventMatchingListInModeWithFilter (in HIToolbox) + 71
#26: _DPSNextEvent (in AppKit) + 978
#27: -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] (in AppKit) + 346
#28: -[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] (nsAppShell.mm:122, in XUL)
#29: -[NSApplication run] (in AppKit) + 594
#30: nsAppShell::Run() (nsAppShell.mm:663, in XUL)
#31: XRE_RunAppShell (nsEmbedFunctions.cpp:785, in XUL)
#32: mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) (MessagePump.cpp:259, in XUL)
#33: MessageLoop::RunInternal() (message_loop.cc:235, in XUL)
#34: MessageLoop::RunHandler() (message_loop.cc:228, in XUL)
#35: MessageLoop::Run() (message_loop.cc:201, in XUL)
#36: XRE_InitChildProcess (nsEmbedFunctions.cpp:621, in XUL)
#37: content_process_main(int, char**) (plugin-container.cpp:237, in plugin-container)
#38: main (MozillaRuntimeMain.cpp:11, in plugin-container)
[Child 49157] ###!!! ABORT: Content child abort due to IPC error: file /Users/ehsan/moz/src/dom/ipc/ContentChild.cpp, line 1964
Hit MOZ_CRASH() at /Users/ehsan/moz/src/memory/mozalloc/mozalloc_abort.cpp:33
###!!! [Parent][MessageChannel] Error: (msgtype=0x200081,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv
++DOCSHELL 0x11d4f8000 == 9 [pid = 49156] [id = 10]
++DOMWINDOW == 17 (0x11dacec00) [pid = 49156] [serial = 23] [outer = 0x0]
++DOMWINDOW == 18 (0x12891c800) [pid = 49156] [serial = 24] [outer = 0x11dacec00]
###!!! [Parent][MessageChannel] Error: (msgtype=0x200081,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv
++DOCSHELL 0x12dc32000 == 10 [pid = 49156] [id = 11]
++DOMWINDOW == 19 (0x12d884800) [pid = 49156] [serial = 25] [outer = 0x0]
++DOMWINDOW == 20 (0x12e403000) [pid = 49156] [serial = 26] [outer = 0x12d884800]
###!!! [Parent][MessageChannel] Error: (msgtype=0x200081,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv
++DOCSHELL 0x12e5ad800 == 11 [pid = 49156] [id = 12]
++DOMWINDOW == 21 (0x12e6bf400) [pid = 49156] [serial = 27] [outer = 0x0]
++DOMWINDOW == 22 (0x12e738800) [pid = 49156] [serial = 28] [outer = 0x12e6bf400]
###!!! [Parent][MessageChannel] Error: (msgtype=0x200081,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv
++DOCSHELL 0x11d968800 == 12 [pid = 49156] [id = 13]
++DOMWINDOW == 23 (0x12e3ea000) [pid = 49156] [serial = 29] [outer = 0x0]
++DOMWINDOW == 24 (0x12e878000) [pid = 49156] [serial = 30] [outer = 0x12e3ea000]
80 INFO Console message: [JavaScript Error: "remote browser crashed while on about:blank
" {file: "chrome://mochikit/content/mochitest-e10s-utils.js" line: 8}]
++DOMWINDOW == 25 (0x12dc54c00) [pid = 49156] [serial = 31] [outer = 0x11dacec00]
81 INFO Console message: [JavaScript Error: "remote browser crashed while on about:blank
" {file: "chrome://mochikit/content/mochitest-e10s-utils.js" line: 8}]
++DOMWINDOW == 26 (0x12eb4b400) [pid = 49156] [serial = 32] [outer = 0x12d884800]
82 INFO Console message: [JavaScript Error: "remote browser crashed while on about:blank
" {file: "chrome://mochikit/content/mochitest-e10s-utils.js" line: 8}]
++DOMWINDOW == 27 (0x12f96d400) [pid = 49156] [serial = 33] [outer = 0x12e6bf400]
83 INFO Console message: [JavaScript Error: "remote browser crashed while on about:blank
" {file: "chrome://mochikit/content/mochitest-e10s-utils.js" line: 8}]
(Reporter)

Comment 1

2 years ago
To reproduce, reverse apply this patch and then run the test in e10s mode:

diff --git a/browser/base/content/test/general/browser_audioTabIcon.js b/browser/base/content/test/general/browser_audioTabIcon.js
index 87e3a3a..2eb989d 100644
--- a/browser/base/content/test/general/browser_audioTabIcon.js
+++ b/browser/base/content/test/general/browser_audioTabIcon.js
@@ -141,6 +141,9 @@ function* test_browser_swapping(tab, browser) {
   }, function*(newBrowser) {
     yield test_swapped_browser(tab, newBrowser, true)

+    // FIXME: this is needed to work around bug 1190903.
+    yield new Promise(resolve => setTimeout(resolve, 1000));
+
     // Now, test swapping with a muted but not playing tab.
     // Note that the tab remains muted, so we only need to pause playback.
     tab = gBrowser.getTabForBrowser(newBrowser);
(Reporter)

Comment 2

2 years ago
(This is the code in question in the test: https://hg.mozilla.org/integration/mozilla-inbound/rev/4a9d48b58545#l2.47)
tracking-e10s: ? → +
(Reporter)

Comment 3

2 years ago
Can this be prioritized please?  This is showing up as a terrible intermittent orange.
Blocks: 1192449
Flags: needinfo?(wmccloskey)
(Assignee)

Comment 4

2 years ago
As discussed on IRC, I'm not able to run this test at all. I spent some time debugging it and it seems that the tooltip is never shown because we get here:
http://mxr.mozilla.org/mozilla-central/source/browser/base/content/tabbrowser.xml#3902
In my case, tab.localName == "tabs". I'll see if I can figure out why.
Flags: needinfo?(wmccloskey)
(Assignee)

Comment 5

2 years ago
Ehsan, if you know where we're supposed to generate the right tooltipNode element, it's probably quicker than me trying to figure it out.
Flags: needinfo?(ehsan)
(Reporter)

Comment 6

2 years ago
I think it comes from nsXULPopupManager::GetLastTriggerNode, which is later returned from XULDocument::GetTooltipNode.

What happens if you comment out that preventDefault() line?
Flags: needinfo?(ehsan)
(Assignee)

Comment 7

2 years ago
OK, I traced it back a little further from GetLastTriggerNode, and I think it comes from here:
http://mxr.mozilla.org/mozilla-central/source/layout/xul/nsXULTooltipListener.cpp#198
I still don't quite understand how this will be the "tab" node and not the icon. But I'll debug some more when I have a chance.

> What happens if you comment out that preventDefault() line?

It's already too late because the code in that function is what sets the tooltip text. If we return early, then we don't set the text to "Mute audio" and the test fails.
(Reporter)

Comment 8

2 years ago
(In reply to Bill McCloskey (:billm) from comment #7)
> OK, I traced it back a little further from GetLastTriggerNode, and I think
> it comes from here:
> http://mxr.mozilla.org/mozilla-central/source/layout/xul/
> nsXULTooltipListener.cpp#198
> I still don't quite understand how this will be the "tab" node and not the
> icon. But I'll debug some more when I have a chance.

Could it be because we don't wait for the tab animation to finish?  Jared?

> > What happens if you comment out that preventDefault() line?
> 
> It's already too late because the code in that function is what sets the
> tooltip text. If we return early, then we don't set the text to "Mute audio"
> and the test fails.

Well, but you could comment out those checks, right?  The crash should have nothing to do with the tooltip, I'm pretty sure.
Flags: needinfo?(jaws)
(Reporter)

Comment 9

2 years ago
(In reply to (Ask others for review please; out most of the week) from comment #8)
> > > What happens if you comment out that preventDefault() line?
> > 
> > It's already too late because the code in that function is what sets the
> > tooltip text. If we return early, then we don't set the text to "Mute audio"
> > and the test fails.
> 
> Well, but you could comment out those checks, right?  The crash should have
> nothing to do with the tooltip, I'm pretty sure.

Even if you don't comment then out, you should get to the point that the test crashes, unless I'm misunderstanding something.
(Assignee)

Comment 10

2 years ago
(In reply to (Ask others for review please; out most of the week) from comment #9)
> Even if you don't comment then out, you should get to the point that the
> test crashes, unless I'm misunderstanding something.

Well, I did comment out the preventDefault line and I didn't crash. I just hang at some other point in the test. I didn't look into why.

I'm sorry I haven't had time to look at this again yet. I'll try to get to it later today or tomorrow.
Flags: needinfo?(wmccloskey)
(Reporter)

Comment 11

2 years ago
Thanks Bill, and please ping me if you need help in massaging the test.  :)
(Assignee)

Comment 12

2 years ago
Created attachment 8649035 [details] [diff] [review]
trigger crash

OK, finally got it to crash. I had to comment out more stuff. I'll look into this tomorrow.

Just to not, putting extra delays in did not fix the original problem I was having. Something is really messed up there.
Flags: needinfo?(wmccloskey)
(Assignee)

Updated

2 years ago
Flags: needinfo?(wmccloskey)
(Assignee)

Comment 13

2 years ago
Created attachment 8649383 [details] [diff] [review]
ime patch

This patch fixes the main problem. We are sending a StopIMEStateManagement message after the tab has been destroyed. The rule is that we're not allowed to send any more messages to the PBrowser after TabParent::Destroy. The Destroy message that gets sent there causes the child to delete its TabChild. If another messages comes in after that, the child won't have an actor to pass it to.

I think we still need the OnTabParentDestroying call in ActorDestroy in case the child process crashes. In that case, Destroy() is never called.
Assignee: nobody → wmccloskey
Status: NEW → ASSIGNED
Flags: needinfo?(wmccloskey)
Attachment #8649383 - Flags: review?(masayuki)
(Assignee)

Comment 14

2 years ago
Created attachment 8649389 [details] [diff] [review]
docshell active patch

I also noticed a second problem when running this test. After doing a tab swap, the tab shows up black. If I switch away and switch back, it draws normally.

We don't seem to be setting the docShellIsActive property correctly when swapping docshells. I'm kinda surprised about this, but I looked over the code, and I just don't see it covered. In e10s, failing to set docShellIsActive causes the layer tree for the tab not to be sent, so nothing can be drawn.

The tricky part is deciding where to put this code. I think we need the tabbrowser to be involved since it's the only piece that knows whether a browser should be active or not. (In e10s there's no way to query this property from the parent process.)

I think Tim has reviewed this stuff in the past, but since he's out, are you okay to look at this Gijs?
Attachment #8649389 - Flags: review?(gijskruitbosch+bugs)
Comment on attachment 8649389 [details] [diff] [review]
docshell active patch

(In reply to Bill McCloskey (:billm) from comment #14)
> Created attachment 8649389 [details] [diff] [review]
> docshell active patch
> 
> I also noticed a second problem when running this test. After doing a tab
> swap, the tab shows up black. If I switch away and switch back, it draws
> normally.
> 
> We don't seem to be setting the docShellIsActive property correctly when
> swapping docshells. I'm kinda surprised about this, but I looked over the
> code, and I just don't see it covered. In e10s, failing to set
> docShellIsActive causes the layer tree for the tab not to be sent, so
> nothing can be drawn.
> 
> The tricky part is deciding where to put this code. I think we need the
> tabbrowser to be involved since it's the only piece that knows whether a
> browser should be active or not. (In e10s there's no way to query this
> property from the parent process.)
> 
> I think Tim has reviewed this stuff in the past, but since he's out, are you
> okay to look at this Gijs?

I can, and it looks sane-ish to me, but I think mconley would be a better choice... but he seems to be out. I'll look in more depth tomorrow.
Flags: needinfo?(jaws)
(Assignee)

Comment 16

2 years ago
Comment on attachment 8649389 [details] [diff] [review]
docshell active patch

Review of attachment 8649389 [details] [diff] [review]:
-----------------------------------------------------------------

I just remembered that Dave has worked on this code pretty recently for e10s. Maybe he can review it?
Attachment #8649389 - Flags: review?(dtownsend)
Comment on attachment 8649383 [details] [diff] [review]
ime patch

> I think we still need the OnTabParentDestroying call in ActorDestroy in case
> the child process crashes. In that case, Destroy() is never called.

Could you write this reason in to ActorDestory()?

with that change, r=masayuki
Attachment #8649383 - Flags: review?(masayuki) → review+
Comment on attachment 8649389 [details] [diff] [review]
docshell active patch

I poked about a bit, this seems essentially the same as the use in https://dxr.mozilla.org/mozilla-central/source/browser/base/content/tabbrowser.xml#1547 , so rs=me

Do we not need to touch ourBrowser's docShellIsActive ?
Attachment #8649389 - Flags: review?(gijskruitbosch+bugs) → review+

Comment 19

2 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/b2a791e57881
https://hg.mozilla.org/integration/mozilla-inbound/rev/28b488cc08b9
(Assignee)

Comment 20

2 years ago
(In reply to :Gijs Kruitbosch from comment #18)
> Do we not need to touch ourBrowser's docShellIsActive ?

I believe the other docshell should always end up unused.
Comment on attachment 8649389 [details] [diff] [review]
docshell active patch

Review of attachment 8649389 [details] [diff] [review]:
-----------------------------------------------------------------

I'm fine with Gijs' review
Attachment #8649389 - Flags: review?(dtownsend)
https://hg.mozilla.org/mozilla-central/rev/b2a791e57881
https://hg.mozilla.org/mozilla-central/rev/28b488cc08b9
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
status-firefox43: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
(Assignee)

Comment 23

2 years ago
Comment on attachment 8649389 [details] [diff] [review]
docshell active patch

Approval Request Comment
[Feature/regressing bug #]: e10s
[User impact if declined]: tab dragging between windows doesn't work if tab started out not selected
[Describe test coverage new/current, TreeHerder]: on m-c for a few days
[Risks and why]: Pretty low risk.
[String/UUID change made/needed]: none
Attachment #8649389 - Flags: approval-mozilla-aurora?
(Assignee)

Comment 24

2 years ago
Comment on attachment 8649383 [details] [diff] [review]
ime patch

Approval Request Comment
[Feature/regressing bug #]: unknown
[User impact if declined]: possible crashes on tab close
[Describe test coverage new/current, TreeHerder]: on m-c for a few days
[Risks and why]: very low risk
[String/UUID change made/needed]: none
Attachment #8649383 - Flags: approval-mozilla-aurora?
Comment on attachment 8649383 [details] [diff] [review]
ime patch

Patch seems safe as it has been in Nightly for over a week. Let's uplift to Aurora42.
Attachment #8649383 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8649389 [details] [diff] [review]
docshell active patch

Aurora42+
Attachment #8649389 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Updated

2 years ago
status-firefox42: --- → affected
https://hg.mozilla.org/releases/mozilla-aurora/rev/ab92db58d5e8
https://hg.mozilla.org/releases/mozilla-aurora/rev/a51a59900ffc
status-firefox42: affected → fixed
You need to log in before you can comment on or make changes to this bug.