Closed
Bug 1190903
Opened 10 years ago
Closed 10 years ago
"Route error: message sent to unknown actor ID" when swapping browsers too fast
Categories
(Core :: IPC, defect)
Core
IPC
Tracking
()
RESOLVED
FIXED
mozilla43
People
(Reporter: ehsan.akhgari, Assigned: billm)
References
Details
Attachments
(3 files)
3.18 KB,
patch
|
Details | Diff | Splinter Review | |
769 bytes,
patch
|
masayuki
:
review+
ritu
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
1.29 KB,
patch
|
Gijs
:
review+
ritu
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
The test in bug 1190106 hits this. See the setTimeout(1000) in browser/base/content/test/general/browser_audioTabIcon.js that I am adding in that bug. Without that, I get:
###!!! [Child][DispatchAsyncMessage] Error: (msgtype=0x20006F,name=???) Route error: message sent to unknown actor ID
[Child 49157] ###!!! ABORT: Content child abort due to IPC error: file /Users/ehsan/moz/src/dom/ipc/ContentChild.cpp, line 1964
#01: mozilla::dom::ContentChild::ProcessingError(mozilla::ipc::HasResultCodes::Result, char const*) (ContentChild.cpp:1965, in XUL)
#02: mozilla::dom::PContentChild::OnProcessingError(mozilla::ipc::HasResultCodes::Result, char const*) (.PContentChild.cpp:7313, in XUL)
#03: mozilla::ipc::MessageChannel::MaybeHandleError(mozilla::ipc::HasResultCodes::Result, IPC::Message const&, char const*) (MessageChannel.cpp:1745, in XUL)
#04: mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&) (MessageChannel.cpp:1385, in XUL)
#05: mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message const&) (MessageChannel.cpp:1302, in XUL)
#06: mozilla::ipc::MessageChannel::OnMaybeDequeueOne() (MessageChannel.cpp:1275, in XUL)
#07: void DispatchToMethod<mozilla::ipc::MessageChannel, bool (mozilla::ipc::MessageChannel::*)()>(mozilla::ipc::MessageChannel*, bool (mozilla::ipc::MessageChannel::*)(), Tuple0 const&) (tuple.h:388, in XUL)
#08: RunnableMethod<mozilla::ipc::MessageChannel, bool (mozilla::ipc::MessageChannel::*)(), Tuple0>::Run() (task.h:311, in XUL)
#09: mozilla::ipc::MessageChannel::RefCountedTask::Run() (MessageChannel.h:459, in XUL)
#10: mozilla::ipc::MessageChannel::DequeueTask::Run() (MessageChannel.h:476, in XUL)
#11: MessageLoop::RunTask(Task*) (message_loop.cc:365, in XUL)
#12: MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&) (message_loop.cc:375, in XUL)
#13: MessageLoop::DoWork() (message_loop.cc:459, in XUL)
#14: mozilla::ipc::DoWorkRunnable::Run() (MessagePump.cpp:221, in XUL)
#15: nsThread::ProcessNextEvent(bool, bool*) (nsThread.cpp:868, in XUL)
#16: NS_ProcessPendingEvents(nsIThread*, unsigned int) (nsThreadUtils.cpp:219, in XUL)
#17: nsBaseAppShell::NativeEventCallback() (nsBaseAppShell.cpp:100, in XUL)
#18: nsAppShell::ProcessGeckoEvents(void*) (nsAppShell.mm:388, in XUL)
#19: __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ (in CoreFoundation) + 17
#20: __CFRunLoopDoSources0 (in CoreFoundation) + 269
#21: __CFRunLoopRun (in CoreFoundation) + 927
#22: CFRunLoopRunSpecific (in CoreFoundation) + 296
#23: RunCurrentEventLoopInMode (in HIToolbox) + 235
#24: ReceiveNextEventCommon (in HIToolbox) + 431
#25: _BlockUntilNextEventMatchingListInModeWithFilter (in HIToolbox) + 71
#26: _DPSNextEvent (in AppKit) + 978
#27: -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] (in AppKit) + 346
#28: -[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] (nsAppShell.mm:122, in XUL)
#29: -[NSApplication run] (in AppKit) + 594
#30: nsAppShell::Run() (nsAppShell.mm:663, in XUL)
#31: XRE_RunAppShell (nsEmbedFunctions.cpp:785, in XUL)
#32: mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) (MessagePump.cpp:259, in XUL)
#33: MessageLoop::RunInternal() (message_loop.cc:235, in XUL)
#34: MessageLoop::RunHandler() (message_loop.cc:228, in XUL)
#35: MessageLoop::Run() (message_loop.cc:201, in XUL)
#36: XRE_InitChildProcess (nsEmbedFunctions.cpp:621, in XUL)
#37: content_process_main(int, char**) (plugin-container.cpp:237, in plugin-container)
#38: main (MozillaRuntimeMain.cpp:11, in plugin-container)
[Child 49157] ###!!! ABORT: Content child abort due to IPC error: file /Users/ehsan/moz/src/dom/ipc/ContentChild.cpp, line 1964
Hit MOZ_CRASH() at /Users/ehsan/moz/src/memory/mozalloc/mozalloc_abort.cpp:33
###!!! [Parent][MessageChannel] Error: (msgtype=0x200081,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv
++DOCSHELL 0x11d4f8000 == 9 [pid = 49156] [id = 10]
++DOMWINDOW == 17 (0x11dacec00) [pid = 49156] [serial = 23] [outer = 0x0]
++DOMWINDOW == 18 (0x12891c800) [pid = 49156] [serial = 24] [outer = 0x11dacec00]
###!!! [Parent][MessageChannel] Error: (msgtype=0x200081,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv
++DOCSHELL 0x12dc32000 == 10 [pid = 49156] [id = 11]
++DOMWINDOW == 19 (0x12d884800) [pid = 49156] [serial = 25] [outer = 0x0]
++DOMWINDOW == 20 (0x12e403000) [pid = 49156] [serial = 26] [outer = 0x12d884800]
###!!! [Parent][MessageChannel] Error: (msgtype=0x200081,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv
++DOCSHELL 0x12e5ad800 == 11 [pid = 49156] [id = 12]
++DOMWINDOW == 21 (0x12e6bf400) [pid = 49156] [serial = 27] [outer = 0x0]
++DOMWINDOW == 22 (0x12e738800) [pid = 49156] [serial = 28] [outer = 0x12e6bf400]
###!!! [Parent][MessageChannel] Error: (msgtype=0x200081,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv
++DOCSHELL 0x11d968800 == 12 [pid = 49156] [id = 13]
++DOMWINDOW == 23 (0x12e3ea000) [pid = 49156] [serial = 29] [outer = 0x0]
++DOMWINDOW == 24 (0x12e878000) [pid = 49156] [serial = 30] [outer = 0x12e3ea000]
80 INFO Console message: [JavaScript Error: "remote browser crashed while on about:blank
" {file: "chrome://mochikit/content/mochitest-e10s-utils.js" line: 8}]
++DOMWINDOW == 25 (0x12dc54c00) [pid = 49156] [serial = 31] [outer = 0x11dacec00]
81 INFO Console message: [JavaScript Error: "remote browser crashed while on about:blank
" {file: "chrome://mochikit/content/mochitest-e10s-utils.js" line: 8}]
++DOMWINDOW == 26 (0x12eb4b400) [pid = 49156] [serial = 32] [outer = 0x12d884800]
82 INFO Console message: [JavaScript Error: "remote browser crashed while on about:blank
" {file: "chrome://mochikit/content/mochitest-e10s-utils.js" line: 8}]
++DOMWINDOW == 27 (0x12f96d400) [pid = 49156] [serial = 33] [outer = 0x12e6bf400]
83 INFO Console message: [JavaScript Error: "remote browser crashed while on about:blank
" {file: "chrome://mochikit/content/mochitest-e10s-utils.js" line: 8}]
Reporter | ||
Comment 1•10 years ago
|
||
To reproduce, reverse apply this patch and then run the test in e10s mode:
diff --git a/browser/base/content/test/general/browser_audioTabIcon.js b/browser/base/content/test/general/browser_audioTabIcon.js
index 87e3a3a..2eb989d 100644
--- a/browser/base/content/test/general/browser_audioTabIcon.js
+++ b/browser/base/content/test/general/browser_audioTabIcon.js
@@ -141,6 +141,9 @@ function* test_browser_swapping(tab, browser) {
}, function*(newBrowser) {
yield test_swapped_browser(tab, newBrowser, true)
+ // FIXME: this is needed to work around bug 1190903.
+ yield new Promise(resolve => setTimeout(resolve, 1000));
+
// Now, test swapping with a muted but not playing tab.
// Note that the tab remains muted, so we only need to pause playback.
tab = gBrowser.getTabForBrowser(newBrowser);
Reporter | ||
Comment 2•10 years ago
|
||
(This is the code in question in the test: https://hg.mozilla.org/integration/mozilla-inbound/rev/4a9d48b58545#l2.47)
Updated•10 years ago
|
Reporter | ||
Comment 3•10 years ago
|
||
Can this be prioritized please? This is showing up as a terrible intermittent orange.
Blocks: 1192449
Flags: needinfo?(wmccloskey)
Assignee | ||
Comment 4•10 years ago
|
||
As discussed on IRC, I'm not able to run this test at all. I spent some time debugging it and it seems that the tooltip is never shown because we get here:
http://mxr.mozilla.org/mozilla-central/source/browser/base/content/tabbrowser.xml#3902
In my case, tab.localName == "tabs". I'll see if I can figure out why.
Flags: needinfo?(wmccloskey)
Assignee | ||
Comment 5•10 years ago
|
||
Ehsan, if you know where we're supposed to generate the right tooltipNode element, it's probably quicker than me trying to figure it out.
Flags: needinfo?(ehsan)
Reporter | ||
Comment 6•10 years ago
|
||
I think it comes from nsXULPopupManager::GetLastTriggerNode, which is later returned from XULDocument::GetTooltipNode.
What happens if you comment out that preventDefault() line?
Flags: needinfo?(ehsan)
Assignee | ||
Comment 7•10 years ago
|
||
OK, I traced it back a little further from GetLastTriggerNode, and I think it comes from here:
http://mxr.mozilla.org/mozilla-central/source/layout/xul/nsXULTooltipListener.cpp#198
I still don't quite understand how this will be the "tab" node and not the icon. But I'll debug some more when I have a chance.
> What happens if you comment out that preventDefault() line?
It's already too late because the code in that function is what sets the tooltip text. If we return early, then we don't set the text to "Mute audio" and the test fails.
Reporter | ||
Comment 8•10 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #7)
> OK, I traced it back a little further from GetLastTriggerNode, and I think
> it comes from here:
> http://mxr.mozilla.org/mozilla-central/source/layout/xul/
> nsXULTooltipListener.cpp#198
> I still don't quite understand how this will be the "tab" node and not the
> icon. But I'll debug some more when I have a chance.
Could it be because we don't wait for the tab animation to finish? Jared?
> > What happens if you comment out that preventDefault() line?
>
> It's already too late because the code in that function is what sets the
> tooltip text. If we return early, then we don't set the text to "Mute audio"
> and the test fails.
Well, but you could comment out those checks, right? The crash should have nothing to do with the tooltip, I'm pretty sure.
Flags: needinfo?(jaws)
Reporter | ||
Comment 9•10 years ago
|
||
(In reply to (Ask others for review please; out most of the week) from comment #8)
> > > What happens if you comment out that preventDefault() line?
> >
> > It's already too late because the code in that function is what sets the
> > tooltip text. If we return early, then we don't set the text to "Mute audio"
> > and the test fails.
>
> Well, but you could comment out those checks, right? The crash should have
> nothing to do with the tooltip, I'm pretty sure.
Even if you don't comment then out, you should get to the point that the test crashes, unless I'm misunderstanding something.
Assignee | ||
Comment 10•10 years ago
|
||
(In reply to (Ask others for review please; out most of the week) from comment #9)
> Even if you don't comment then out, you should get to the point that the
> test crashes, unless I'm misunderstanding something.
Well, I did comment out the preventDefault line and I didn't crash. I just hang at some other point in the test. I didn't look into why.
I'm sorry I haven't had time to look at this again yet. I'll try to get to it later today or tomorrow.
Flags: needinfo?(wmccloskey)
Reporter | ||
Comment 11•10 years ago
|
||
Thanks Bill, and please ping me if you need help in massaging the test. :)
Assignee | ||
Comment 12•10 years ago
|
||
OK, finally got it to crash. I had to comment out more stuff. I'll look into this tomorrow.
Just to not, putting extra delays in did not fix the original problem I was having. Something is really messed up there.
Flags: needinfo?(wmccloskey)
Assignee | ||
Updated•10 years ago
|
Flags: needinfo?(wmccloskey)
Assignee | ||
Comment 13•10 years ago
|
||
This patch fixes the main problem. We are sending a StopIMEStateManagement message after the tab has been destroyed. The rule is that we're not allowed to send any more messages to the PBrowser after TabParent::Destroy. The Destroy message that gets sent there causes the child to delete its TabChild. If another messages comes in after that, the child won't have an actor to pass it to.
I think we still need the OnTabParentDestroying call in ActorDestroy in case the child process crashes. In that case, Destroy() is never called.
Assignee: nobody → wmccloskey
Status: NEW → ASSIGNED
Flags: needinfo?(wmccloskey)
Attachment #8649383 -
Flags: review?(masayuki)
Assignee | ||
Comment 14•10 years ago
|
||
I also noticed a second problem when running this test. After doing a tab swap, the tab shows up black. If I switch away and switch back, it draws normally.
We don't seem to be setting the docShellIsActive property correctly when swapping docshells. I'm kinda surprised about this, but I looked over the code, and I just don't see it covered. In e10s, failing to set docShellIsActive causes the layer tree for the tab not to be sent, so nothing can be drawn.
The tricky part is deciding where to put this code. I think we need the tabbrowser to be involved since it's the only piece that knows whether a browser should be active or not. (In e10s there's no way to query this property from the parent process.)
I think Tim has reviewed this stuff in the past, but since he's out, are you okay to look at this Gijs?
Attachment #8649389 -
Flags: review?(gijskruitbosch+bugs)
Comment 15•10 years ago
|
||
Comment on attachment 8649389 [details] [diff] [review]
docshell active patch
(In reply to Bill McCloskey (:billm) from comment #14)
> Created attachment 8649389 [details] [diff] [review]
> docshell active patch
>
> I also noticed a second problem when running this test. After doing a tab
> swap, the tab shows up black. If I switch away and switch back, it draws
> normally.
>
> We don't seem to be setting the docShellIsActive property correctly when
> swapping docshells. I'm kinda surprised about this, but I looked over the
> code, and I just don't see it covered. In e10s, failing to set
> docShellIsActive causes the layer tree for the tab not to be sent, so
> nothing can be drawn.
>
> The tricky part is deciding where to put this code. I think we need the
> tabbrowser to be involved since it's the only piece that knows whether a
> browser should be active or not. (In e10s there's no way to query this
> property from the parent process.)
>
> I think Tim has reviewed this stuff in the past, but since he's out, are you
> okay to look at this Gijs?
I can, and it looks sane-ish to me, but I think mconley would be a better choice... but he seems to be out. I'll look in more depth tomorrow.
Flags: needinfo?(jaws)
Assignee | ||
Comment 16•10 years ago
|
||
Comment on attachment 8649389 [details] [diff] [review]
docshell active patch
Review of attachment 8649389 [details] [diff] [review]:
-----------------------------------------------------------------
I just remembered that Dave has worked on this code pretty recently for e10s. Maybe he can review it?
Attachment #8649389 -
Flags: review?(dtownsend)
Comment 17•10 years ago
|
||
Comment on attachment 8649383 [details] [diff] [review]
ime patch
> I think we still need the OnTabParentDestroying call in ActorDestroy in case
> the child process crashes. In that case, Destroy() is never called.
Could you write this reason in to ActorDestory()?
with that change, r=masayuki
Attachment #8649383 -
Flags: review?(masayuki) → review+
Comment 18•10 years ago
|
||
Comment on attachment 8649389 [details] [diff] [review]
docshell active patch
I poked about a bit, this seems essentially the same as the use in https://dxr.mozilla.org/mozilla-central/source/browser/base/content/tabbrowser.xml#1547 , so rs=me
Do we not need to touch ourBrowser's docShellIsActive ?
Attachment #8649389 -
Flags: review?(gijskruitbosch+bugs) → review+
Comment 19•10 years ago
|
||
Assignee | ||
Comment 20•10 years ago
|
||
(In reply to :Gijs Kruitbosch from comment #18)
> Do we not need to touch ourBrowser's docShellIsActive ?
I believe the other docshell should always end up unused.
Comment 21•10 years ago
|
||
Comment on attachment 8649389 [details] [diff] [review]
docshell active patch
Review of attachment 8649389 [details] [diff] [review]:
-----------------------------------------------------------------
I'm fine with Gijs' review
Attachment #8649389 -
Flags: review?(dtownsend)
https://hg.mozilla.org/mozilla-central/rev/b2a791e57881
https://hg.mozilla.org/mozilla-central/rev/28b488cc08b9
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
status-firefox43:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
Assignee | ||
Comment 23•10 years ago
|
||
Comment on attachment 8649389 [details] [diff] [review]
docshell active patch
Approval Request Comment
[Feature/regressing bug #]: e10s
[User impact if declined]: tab dragging between windows doesn't work if tab started out not selected
[Describe test coverage new/current, TreeHerder]: on m-c for a few days
[Risks and why]: Pretty low risk.
[String/UUID change made/needed]: none
Attachment #8649389 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 24•10 years ago
|
||
Comment on attachment 8649383 [details] [diff] [review]
ime patch
Approval Request Comment
[Feature/regressing bug #]: unknown
[User impact if declined]: possible crashes on tab close
[Describe test coverage new/current, TreeHerder]: on m-c for a few days
[Risks and why]: very low risk
[String/UUID change made/needed]: none
Attachment #8649383 -
Flags: approval-mozilla-aurora?
Comment 25•10 years ago
|
||
Comment on attachment 8649383 [details] [diff] [review]
ime patch
Patch seems safe as it has been in Nightly for over a week. Let's uplift to Aurora42.
Attachment #8649383 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 26•10 years ago
|
||
Comment on attachment 8649389 [details] [diff] [review]
docshell active patch
Aurora42+
Attachment #8649389 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
status-firefox42:
--- → affected
Comment 27•10 years ago
|
||
You need to log in
before you can comment on or make changes to this bug.
Description
•