Assertion failure: BrowserHost::GetFrom(newRemoteTab.get()) == newTab->GetBrowserHost(), at /dom/ipc/ContentParent.cpp:5449
Categories
(Core :: DOM: Content Processes, defect, P3)
Tracking
()
People
(Reporter: jkratzer, Assigned: nika)
References
(Blocks 1 open bug)
Details
(Keywords: assertion, testcase, Whiteboard: [bugmon:bisected,confirmed])
Attachments
(3 files, 1 obsolete file)
Testcase found while fuzzing mozilla-central rev 29d6504debf5 (built with: --enable-debug --enable-fuzzing).
Testcase can be reproduced using the following commands:
$ pip install fuzzfetch grizzly-framework
$ python -m fuzzfetch --build 29d6504debf5 --debug --fuzzing -n firefox
$ python -m grizzly.replay ./firefox/firefox testcase.html
Assertion failure: BrowserHost::GetFrom(newRemoteTab.get()) == newTab->GetBrowserHost(), at /dom/ipc/ContentParent.cpp:5449
==1910216==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f49014dc53b bp 0x7ffe6cc45870 sp 0x7ffe6cc45740 T1910216)
==1910216==The signal is caused by a WRITE memory access.
==1910216==Hint: address points to the zero page.
#0 0x7f49014dc53b in mozilla::dom::ContentParent::RecvCreateWindow(mozilla::dom::PBrowserParent*, mozilla::dom::MaybeDiscarded<mozilla::dom::BrowsingContext> const&, mozilla::dom::PBrowserParent*, unsigned int const&, bool const&, bool const&, bool const&, bool const&, nsIURI*, nsTString<char> const&, float const&, IPC::Principal const&, nsIContentSecurityPolicy*, nsIReferrerInfo*, mozilla::OriginAttributes const&, std::function<void (mozilla::dom::CreatedWindowInfo const&)>&&) /dom/ipc/ContentParent.cpp:5448:3
#1 0x7f48fdceb84b in mozilla::dom::PContentParent::OnMessageReceived(IPC::Message const&) /builds/worker/workspace/obj-build/ipc/ipdl/PContentParent.cpp:10736:57
#2 0x7f48fdb17e1f in mozilla::ipc::MessageChannel::DispatchAsyncMessage(mozilla::ipc::ActorLifecycleProxy*, IPC::Message const&) /ipc/glue/MessageChannel.cpp:2039:25
#3 0x7f48fdb14701 in mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message&&) /ipc/glue/MessageChannel.cpp:1964:9
#4 0x7f48fdb15b85 in mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::MessageChannel::MessageTask&) /ipc/glue/MessageChannel.cpp:1823:3
#5 0x7f48fdb167cd in mozilla::ipc::MessageChannel::MessageTask::Run() /ipc/glue/MessageChannel.cpp:1851:14
#6 0x7f48fd0c9dfe in mozilla::RunnableTask::Run() /xpcom/threads/TaskController.cpp:478:16
#7 0x7f48fd0a50af in mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) /xpcom/threads/TaskController.cpp:781:26
#8 0x7f48fd0a3d18 in mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) /xpcom/threads/TaskController.cpp:617:15
#9 0x7f48fd0a3f93 in mozilla::TaskController::ProcessPendingMTTask(bool) /xpcom/threads/TaskController.cpp:401:36
#10 0x7f48fd0cd346 in operator() /xpcom/threads/TaskController.cpp:126:37
#11 0x7f48fd0cd346 in mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::$_0>::Run() /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:531:5
#12 0x7f48fd0b88af in nsThread::ProcessNextEvent(bool, bool*) /xpcom/threads/nsThread.cpp:1148:16
#13 0x7f48fd0bf5fa in NS_ProcessNextEvent(nsIThread*, bool) /xpcom/threads/nsThreadUtils.cpp:466:10
#14 0x7f48fdb1dc96 in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) /ipc/glue/MessagePump.cpp:85:21
#15 0x7f48fda3e3a7 in MessageLoop::RunInternal() /ipc/chromium/src/base/message_loop.cc:331:10
#16 0x7f48fda3e2b2 in RunHandler /ipc/chromium/src/base/message_loop.cc:324:3
#17 0x7f48fda3e2b2 in MessageLoop::Run() /ipc/chromium/src/base/message_loop.cc:306:3
#18 0x7f490199f148 in nsBaseAppShell::Run() /widget/nsBaseAppShell.cpp:137:27
#19 0x7f49036f7dc3 in nsAppStartup::Run() /toolkit/components/startup/nsAppStartup.cpp:289:30
#20 0x7f4903823ef4 in XREMain::XRE_mainRun() /toolkit/xre/nsAppRunner.cpp:5291:22
#21 0x7f49038256a1 in XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) /toolkit/xre/nsAppRunner.cpp:5476:8
#22 0x7f4903825f29 in XRE_main(int, char**, mozilla::BootstrapConfig const&) /toolkit/xre/nsAppRunner.cpp:5535:21
#23 0x560284436a70 in do_main /browser/app/nsBrowserApp.cpp:225:22
#24 0x560284436a70 in main /browser/app/nsBrowserApp.cpp:386:16
#25 0x7f491354a0b2 in __libc_start_main /build/glibc-eX1tMB/glibc-2.31/csu/../csu/libc-start.c:308:16
#26 0x56028441394c in _start (/home/jkratzer/builds/mc-debug/firefox-bin+0x1594c)
UndefinedBehaviorSanitizer can not provide additional info.
SUMMARY: UndefinedBehaviorSanitizer: SEGV /dom/ipc/ContentParent.cpp:5448:3 in mozilla::dom::ContentParent::RecvCreateWindow(mozilla::dom::PBrowserParent*, mozilla::dom::MaybeDiscarded<mozilla::dom::BrowsingContext> const&, mozilla::dom::PBrowserParent*, unsigned int const&, bool const&, bool const&, bool const&, bool const&, nsIURI*, nsTString<char> const&, float const&, IPC::Principal const&, nsIContentSecurityPolicy*, nsIReferrerInfo*, mozilla::OriginAttributes const&, std::function<void (mozilla::dom::CreatedWindowInfo const&)>&&)
==1910216==ABORTING
Reporter | ||
Comment 1•3 years ago
|
||
Comment 2•3 years ago
|
||
Bugmon Analysis
Verified bug as reproducible on mozilla-central 20210920102201-29d6504debf5.
Failed to bisect testcase (Unable to launch the start build!):
Start: a7cd0e635744eef0be11c050931842920a79dc5a (20200921141232)
End: 29d6504debf5b2028a9da65026ae86a06b5a655d (20210920102201)
BuildFlags: BuildFlags(asan=False, tsan=False, debug=True, fuzzing=True, coverage=False, valgrind=False, no_opt=False, fuzzilli=False)
Comment 3•3 years ago
|
||
@ Jason, can you grab a Pernosco recording of the fuzzer reproducing this assertion failure?
Also, Bugmon wasn't able to bisect this bug. Is that typically because it failed to launch the first build from one year ago? Can we ask Bugmon to try a smaller, more recent regression range than one year? A lot of Fission code has changed over the last year.
Nika says the assertion means the new popup window seems to have the wrong BrowserParent. The test case is window.close()'ing the original window at the same time we're trying to open the new popup window, so there's probably a race condition.
Updated•3 years ago
|
Reporter | ||
Comment 4•3 years ago
|
||
Chris, you can find a pernosco session for this issue here. Regarding the bisection range, Bugmon doesn't have the ability to specify a custom range. I'll run a manual bisection and comment here once it's done.
Reporter | ||
Comment 5•3 years ago
|
||
Chris, sorry for the delay. Due to the unreliability of the testcase, I needed to run the bisection multiple times.
[2021-09-23 14:18:50] Reduced build range to:
[2021-09-23 14:18:50] > Start: 561ed5ccdd2400934cb05151dbf8dbe105bfc963 (20210912213037)
[2021-09-23 14:18:50] > End: 4e61dfde0ad601e48df245bdaf93f3d19b91d845 (20210912221704)
[2021-09-23 14:18:50] > https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=561ed5ccdd2400934cb05151dbf8dbe105bfc963&tochange=4e61dfde0ad601e48df245bdaf93f3d19b91d845
Comment 6•3 years ago
|
||
needinfo'ing Nika because she said she would look at the Pernosco session: https://pernos.co/debug/_7J6MZj_eG2F21vGh3C1mg/index.html
Updated•3 years ago
|
Assignee | ||
Comment 7•3 years ago
|
||
After looking into this a bit more, the cause of this bug appears to be a nested event loop changing the state of our newly created pop-up browser while still performing the initial window.open
call to create that new tab. The first window.open creates a new browser window and waits for browser.xhtml
to load, and the second window.open gets run during that window.open's nested event loop, and decides to use that newly-opened window to load the new tab. The creation of a new tab in the browser window changes the primary remote tab in the window, so when the for the first window tries to fetch it, it gets the wrong content browser back (as the code assumes here that there is only one browser in the window, but a second one was added during the nested event loop in CreateChromeWindow
) https://searchfox.org/mozilla-central/rev/1df999af9999ccb436512cfece57a68d94d36e08/toolkit/components/windowwatcher/nsWindowWatcher.cpp#577.
I think the solution here will require us to specifically mark the initial browser in the created chrome window, and use that instead of the primary content window in the window opening code. This is fairly simple to do in the remote case (the call to GetPrimaryRemoteTab
just needs to change: https://searchfox.org/mozilla-central/rev/1df999af9999ccb436512cfece57a68d94d36e08/toolkit/components/windowwatcher/nsWindowWatcher.cpp#577), but might be more tricky in the in-process case as right now I think that the calls are actually being performed using do_GetInterface
. (Get the chrome AppWindow
here: https://searchfox.org/mozilla-central/rev/1df999af9999ccb436512cfece57a68d94d36e08/toolkit/components/startup/nsAppStartup.cpp#749-753, and call GetInterface
to nsIWebBrowserChrome
, which should end up fetching the nsContentTreeOwner
here: https://searchfox.org/mozilla-central/rev/1df999af9999ccb436512cfece57a68d94d36e08/xpfe/appshell/AppWindow.cpp#306-309. That then calls do_GetInterface
again to get the nsPIDOMWindowOuter
here: https://searchfox.org/mozilla-central/rev/1df999af9999ccb436512cfece57a68d94d36e08/toolkit/components/windowwatcher/nsWindowWatcher.cpp#954-957, which fetches the primary content shell here: https://searchfox.org/mozilla-central/rev/1df999af9999ccb436512cfece57a68d94d36e08/xpfe/appshell/nsContentTreeOwner.cpp#133-143, at least I think - haven't confirmed this is the relevant codepath).
Alternatively, at least for the remote tab case, we don't actually need the tab returned from GetPrimaryRemoteTab
and can get by without it. Perhaps we'll do that in the future.
Assignee | ||
Comment 8•3 years ago
|
||
Leaving a needinfo for myself so I get back to this & make a decision about whether we can do that.
Comment 9•3 years ago
|
||
Tentatively assigning to Nika because she plans to investigate and I'd like all Fission MVP bugs to have an assignee.
TBD: Does this fuzz bug need to block Fission MVP?
Assignee | ||
Comment 10•3 years ago
|
||
Assignee | ||
Comment 11•3 years ago
|
||
Depends on D126993
Comment 12•3 years ago
|
||
Setting status-firefox94=wontfix because we don't need to uplift this assertion failure to Beta 94.
This was actually a buggy assertion in a non-Fission code path.
Clearing needinfo for Nika since the bug is already assigned to her and she has a fix.
Updated•3 years ago
|
Updated•3 years ago
|
Comment 13•3 years ago
|
||
Comment 14•3 years ago
|
||
Backed out 2 changesets (bug 1731597) for causing multiple wpt failures.
Backout link: https://hg.mozilla.org/integration/autoland/rev/594a4b5bb9740526ecb82fc7d38c9f4c3f75d174
Comment 15•3 years ago
|
||
There are some r+ patches which didn't land and no activity in this bug for 2 weeks.
:nika, could you have a look please?
For more information, please visit auto_nag documentation.
Comment 16•3 years ago
|
||
(In reply to Chris Peterson [:cpeterson] from comment #12)
Setting status-firefox94=wontfix because we don't need to uplift this assertion failure to Beta 94.
This was actually a buggy assertion in a non-Fission code path.
Clearing needinfo for Nika since the bug is already assigned to her and she has a fix.
I received autonag release-mgmt email as this bug affects a released version without activity for the last 1 week.
However, looking at this comment "buggy assertion in a non-Fission code path", also that fission is enabled by default. I think it's okay we simply modify the " status-firefox95" as wontfix and lower the priority to P3/P5. Does that make sense?
Comment 17•3 years ago
|
||
(In reply to Hsin-Yi Tsai [:hsinyi] from comment #16)
I received autonag release-mgmt email as this bug affects a released version without activity for the last 1 week.
However, looking at this comment "buggy assertion in a non-Fission code path", also that fission is enabled by default. I think it's okay we simply modify the " status-firefox95" as wontfix and lower the priority to P3/P5. Does that make sense?
SGTM. I changed the status flags.
@ Nika, your patches got backed out two months ago in comment 14. Is this fix still relevant now that Fission is enabled by default on desktop? Do we need this fix for Android e10s?
Updated•2 years ago
|
Comment 18•2 years ago
|
||
:nika, since this bug contains a bisection range, could you fill (if possible) the regressed_by field?
For more information, please visit auto_nag documentation.
Comment 20•2 years ago
|
||
Bugmon Analysis
Unable to reproduce bug 1731597 using build mozilla-central 20210920102201-29d6504debf5. Without a baseline, bugmon is unable to analyze this bug.
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.
Updated•2 years ago
|
Comment 22•2 years ago
|
||
(In reply to Bugmon [:jkratzer for issues] from comment #20)
Bugmon Analysis
Unable to reproduce bug 1731597 using build mozilla-central 20210920102201-29d6504debf5. Without a baseline, bugmon is unable to analyze this bug.
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.
(as it seems it was somehow fixed elsewhere?)
Reporter | ||
Comment 23•2 years ago
|
||
It looks like the testcase has just become less reliable. I can still reproduce this issue on mozilla-central 20220929-a58c99b9307b. I'll upload a new testcase shortly.
Reporter | ||
Comment 24•2 years ago
|
||
The attached testcase appears more reliable than the original. I'll re-enable bugmon.
Assignee | ||
Comment 25•2 years ago
|
||
Yeah, these patches or something like them should probably land at some point, though I remember struggling to figure out the cause of the failures a while ago when I first tried to land it, which lead me to put them to the side, as I was busy with other things at the time. Probably worth taking another look at.
Comment 26•2 years ago
|
||
Testcase crashes using the initial build (mozilla-central 20220122095122-61861c0babc6) but not with tip (mozilla-central 20230120212103-f2fbf518572b.)
The bug appears to have been fixed in the following build range:
Start: 676b14e7a199b1bb5ae42b40deb1032bc439585e (20230118165610)
End: 3523d141513d03ec99f06916864ad0a7838a8b65 (20230118193647)
Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=676b14e7a199b1bb5ae42b40deb1032bc439585e&tochange=3523d141513d03ec99f06916864ad0a7838a8b65
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.
Updated•2 years ago
|
Comment 27•2 years ago
|
||
(In reply to Bugmon [:jkratzer for issues] from comment #26)
Testcase crashes using the initial build (mozilla-central 20220122095122-61861c0babc6) but not with tip (mozilla-central 20230120212103-f2fbf518572b.)
The bug appears to have been fixed in the following build range:
Start: 676b14e7a199b1bb5ae42b40deb1032bc439585e (20230118165610)
End: 3523d141513d03ec99f06916864ad0a7838a8b65 (20230118193647)
Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=676b14e7a199b1bb5ae42b40deb1032bc439585e&tochange=3523d141513d03ec99f06916864ad0a7838a8b65Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.
Let's see if this holds.
Comment 28•9 months ago
|
||
Hi :jkratzer, did I ask bugmon in a wrong way to verify this again?
Reporter | ||
Comment 29•9 months ago
•
|
||
(In reply to Jens Stutte [:jstutte] from comment #28)
Hi :jkratzer, did I ask bugmon in a wrong way to verify this again?
Using the confirm
whiteboard command is correct but you also need to re-add the bugmon
keyword.
Comment 30•9 months ago
|
||
Testcase crashes using the initial build (mozilla-central 20230118095131-60b4965aa0ca) but not with tip (mozilla-central 20240116163721-fd640ad054ea.)
The bug appears to have been fixed in the following build range:
Start: 676b14e7a199b1bb5ae42b40deb1032bc439585e (20230118165610)
End: 3523d141513d03ec99f06916864ad0a7838a8b65 (20230118193647)
Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=676b14e7a199b1bb5ae42b40deb1032bc439585e&tochange=3523d141513d03ec99f06916864ad0a7838a8b65
nika, can you confirm that the above bisection range is responsible for fixing this issue?
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.
Comment 32•7 months ago
|
||
Note that there is a modern pernosco reproduction from the dupe bug in https://bugzilla.mozilla.org/show_bug.cgi?id=1883786#c1 (https://pernos.co/debug/Wvb5GQRxXrdqTlnZEDRgNw/index.html)
Updated•6 months ago
|
Comment 33•6 months ago
|
||
So from bug 1883786 it seems there is a testcase where this still happens, despite bugmon being happy here in comment 30. :nika, do you think we should try to re-land these patches?
Assignee | ||
Comment 34•6 months ago
|
||
I don't imagine the patches can land in their current forms anymore, given how old they are. Fixing something along these lines would be good though.
I think in an ideal world, we'll probably end up fixing this as part of the proposed work in bug 1890984, as we could easily rip out the buggy code in that case (we could ask our frontend to open a new browser window, and expect it to give us the right BC reference back, rather than having to guess where it'll end up after loading).
Description
•