Assertion failure: BrowserHost::GetFrom(newRemoteTab.get()) == newTab->GetBrowserHost(), at /dom/ipc/ContentParent.cpp:5449
Categories
(Core :: DOM: Content Processes, defect, P3)
Tracking
()
People
(Reporter: jkratzer, Assigned: nika, NeedInfo)
References
(Blocks 1 open bug)
Details
(Keywords: assertion, testcase, Whiteboard: [bugmon:bisected,confirmed])
Attachments
(3 files, 1 obsolete file)
Testcase found while fuzzing mozilla-central rev 29d6504debf5 (built with: --enable-debug --enable-fuzzing).
Testcase can be reproduced using the following commands:
$ pip install fuzzfetch grizzly-framework
$ python -m fuzzfetch --build 29d6504debf5 --debug --fuzzing -n firefox
$ python -m grizzly.replay ./firefox/firefox testcase.html
Assertion failure: BrowserHost::GetFrom(newRemoteTab.get()) == newTab->GetBrowserHost(), at /dom/ipc/ContentParent.cpp:5449
==1910216==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f49014dc53b bp 0x7ffe6cc45870 sp 0x7ffe6cc45740 T1910216)
==1910216==The signal is caused by a WRITE memory access.
==1910216==Hint: address points to the zero page.
#0 0x7f49014dc53b in mozilla::dom::ContentParent::RecvCreateWindow(mozilla::dom::PBrowserParent*, mozilla::dom::MaybeDiscarded<mozilla::dom::BrowsingContext> const&, mozilla::dom::PBrowserParent*, unsigned int const&, bool const&, bool const&, bool const&, bool const&, nsIURI*, nsTString<char> const&, float const&, IPC::Principal const&, nsIContentSecurityPolicy*, nsIReferrerInfo*, mozilla::OriginAttributes const&, std::function<void (mozilla::dom::CreatedWindowInfo const&)>&&) /dom/ipc/ContentParent.cpp:5448:3
#1 0x7f48fdceb84b in mozilla::dom::PContentParent::OnMessageReceived(IPC::Message const&) /builds/worker/workspace/obj-build/ipc/ipdl/PContentParent.cpp:10736:57
#2 0x7f48fdb17e1f in mozilla::ipc::MessageChannel::DispatchAsyncMessage(mozilla::ipc::ActorLifecycleProxy*, IPC::Message const&) /ipc/glue/MessageChannel.cpp:2039:25
#3 0x7f48fdb14701 in mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message&&) /ipc/glue/MessageChannel.cpp:1964:9
#4 0x7f48fdb15b85 in mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::MessageChannel::MessageTask&) /ipc/glue/MessageChannel.cpp:1823:3
#5 0x7f48fdb167cd in mozilla::ipc::MessageChannel::MessageTask::Run() /ipc/glue/MessageChannel.cpp:1851:14
#6 0x7f48fd0c9dfe in mozilla::RunnableTask::Run() /xpcom/threads/TaskController.cpp:478:16
#7 0x7f48fd0a50af in mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) /xpcom/threads/TaskController.cpp:781:26
#8 0x7f48fd0a3d18 in mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) /xpcom/threads/TaskController.cpp:617:15
#9 0x7f48fd0a3f93 in mozilla::TaskController::ProcessPendingMTTask(bool) /xpcom/threads/TaskController.cpp:401:36
#10 0x7f48fd0cd346 in operator() /xpcom/threads/TaskController.cpp:126:37
#11 0x7f48fd0cd346 in mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::$_0>::Run() /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:531:5
#12 0x7f48fd0b88af in nsThread::ProcessNextEvent(bool, bool*) /xpcom/threads/nsThread.cpp:1148:16
#13 0x7f48fd0bf5fa in NS_ProcessNextEvent(nsIThread*, bool) /xpcom/threads/nsThreadUtils.cpp:466:10
#14 0x7f48fdb1dc96 in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) /ipc/glue/MessagePump.cpp:85:21
#15 0x7f48fda3e3a7 in MessageLoop::RunInternal() /ipc/chromium/src/base/message_loop.cc:331:10
#16 0x7f48fda3e2b2 in RunHandler /ipc/chromium/src/base/message_loop.cc:324:3
#17 0x7f48fda3e2b2 in MessageLoop::Run() /ipc/chromium/src/base/message_loop.cc:306:3
#18 0x7f490199f148 in nsBaseAppShell::Run() /widget/nsBaseAppShell.cpp:137:27
#19 0x7f49036f7dc3 in nsAppStartup::Run() /toolkit/components/startup/nsAppStartup.cpp:289:30
#20 0x7f4903823ef4 in XREMain::XRE_mainRun() /toolkit/xre/nsAppRunner.cpp:5291:22
#21 0x7f49038256a1 in XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) /toolkit/xre/nsAppRunner.cpp:5476:8
#22 0x7f4903825f29 in XRE_main(int, char**, mozilla::BootstrapConfig const&) /toolkit/xre/nsAppRunner.cpp:5535:21
#23 0x560284436a70 in do_main /browser/app/nsBrowserApp.cpp:225:22
#24 0x560284436a70 in main /browser/app/nsBrowserApp.cpp:386:16
#25 0x7f491354a0b2 in __libc_start_main /build/glibc-eX1tMB/glibc-2.31/csu/../csu/libc-start.c:308:16
#26 0x56028441394c in _start (/home/jkratzer/builds/mc-debug/firefox-bin+0x1594c)
UndefinedBehaviorSanitizer can not provide additional info.
SUMMARY: UndefinedBehaviorSanitizer: SEGV /dom/ipc/ContentParent.cpp:5448:3 in mozilla::dom::ContentParent::RecvCreateWindow(mozilla::dom::PBrowserParent*, mozilla::dom::MaybeDiscarded<mozilla::dom::BrowsingContext> const&, mozilla::dom::PBrowserParent*, unsigned int const&, bool const&, bool const&, bool const&, bool const&, nsIURI*, nsTString<char> const&, float const&, IPC::Principal const&, nsIContentSecurityPolicy*, nsIReferrerInfo*, mozilla::OriginAttributes const&, std::function<void (mozilla::dom::CreatedWindowInfo const&)>&&)
==1910216==ABORTING
Reporter | ||
Comment 1•2 years ago
|
||
Comment 2•2 years ago
|
||
Bugmon Analysis
Verified bug as reproducible on mozilla-central 20210920102201-29d6504debf5.
Failed to bisect testcase (Unable to launch the start build!):
Start: a7cd0e635744eef0be11c050931842920a79dc5a (20200921141232)
End: 29d6504debf5b2028a9da65026ae86a06b5a655d (20210920102201)
BuildFlags: BuildFlags(asan=False, tsan=False, debug=True, fuzzing=True, coverage=False, valgrind=False, no_opt=False, fuzzilli=False)
Comment 3•2 years ago
|
||
@ Jason, can you grab a Pernosco recording of the fuzzer reproducing this assertion failure?
Also, Bugmon wasn't able to bisect this bug. Is that typically because it failed to launch the first build from one year ago? Can we ask Bugmon to try a smaller, more recent regression range than one year? A lot of Fission code has changed over the last year.
Nika says the assertion means the new popup window seems to have the wrong BrowserParent. The test case is window.close()'ing the original window at the same time we're trying to open the new popup window, so there's probably a race condition.
Updated•2 years ago
|
Reporter | ||
Comment 4•2 years ago
|
||
Chris, you can find a pernosco session for this issue here. Regarding the bisection range, Bugmon doesn't have the ability to specify a custom range. I'll run a manual bisection and comment here once it's done.
Reporter | ||
Comment 5•2 years ago
|
||
Chris, sorry for the delay. Due to the unreliability of the testcase, I needed to run the bisection multiple times.
[2021-09-23 14:18:50] Reduced build range to:
[2021-09-23 14:18:50] > Start: 561ed5ccdd2400934cb05151dbf8dbe105bfc963 (20210912213037)
[2021-09-23 14:18:50] > End: 4e61dfde0ad601e48df245bdaf93f3d19b91d845 (20210912221704)
[2021-09-23 14:18:50] > https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=561ed5ccdd2400934cb05151dbf8dbe105bfc963&tochange=4e61dfde0ad601e48df245bdaf93f3d19b91d845
Comment 6•2 years ago
|
||
needinfo'ing Nika because she said she would look at the Pernosco session: https://pernos.co/debug/_7J6MZj_eG2F21vGh3C1mg/index.html
Updated•2 years ago
|
Assignee | ||
Comment 7•2 years ago
|
||
After looking into this a bit more, the cause of this bug appears to be a nested event loop changing the state of our newly created pop-up browser while still performing the initial window.open
call to create that new tab. The first window.open creates a new browser window and waits for browser.xhtml
to load, and the second window.open gets run during that window.open's nested event loop, and decides to use that newly-opened window to load the new tab. The creation of a new tab in the browser window changes the primary remote tab in the window, so when the for the first window tries to fetch it, it gets the wrong content browser back (as the code assumes here that there is only one browser in the window, but a second one was added during the nested event loop in CreateChromeWindow
) https://searchfox.org/mozilla-central/rev/1df999af9999ccb436512cfece57a68d94d36e08/toolkit/components/windowwatcher/nsWindowWatcher.cpp#577.
I think the solution here will require us to specifically mark the initial browser in the created chrome window, and use that instead of the primary content window in the window opening code. This is fairly simple to do in the remote case (the call to GetPrimaryRemoteTab
just needs to change: https://searchfox.org/mozilla-central/rev/1df999af9999ccb436512cfece57a68d94d36e08/toolkit/components/windowwatcher/nsWindowWatcher.cpp#577), but might be more tricky in the in-process case as right now I think that the calls are actually being performed using do_GetInterface
. (Get the chrome AppWindow
here: https://searchfox.org/mozilla-central/rev/1df999af9999ccb436512cfece57a68d94d36e08/toolkit/components/startup/nsAppStartup.cpp#749-753, and call GetInterface
to nsIWebBrowserChrome
, which should end up fetching the nsContentTreeOwner
here: https://searchfox.org/mozilla-central/rev/1df999af9999ccb436512cfece57a68d94d36e08/xpfe/appshell/AppWindow.cpp#306-309. That then calls do_GetInterface
again to get the nsPIDOMWindowOuter
here: https://searchfox.org/mozilla-central/rev/1df999af9999ccb436512cfece57a68d94d36e08/toolkit/components/windowwatcher/nsWindowWatcher.cpp#954-957, which fetches the primary content shell here: https://searchfox.org/mozilla-central/rev/1df999af9999ccb436512cfece57a68d94d36e08/xpfe/appshell/nsContentTreeOwner.cpp#133-143, at least I think - haven't confirmed this is the relevant codepath).
Alternatively, at least for the remote tab case, we don't actually need the tab returned from GetPrimaryRemoteTab
and can get by without it. Perhaps we'll do that in the future.
Assignee | ||
Comment 8•2 years ago
|
||
Leaving a needinfo for myself so I get back to this & make a decision about whether we can do that.
Comment 9•2 years ago
|
||
Tentatively assigning to Nika because she plans to investigate and I'd like all Fission MVP bugs to have an assignee.
TBD: Does this fuzz bug need to block Fission MVP?
Assignee | ||
Comment 10•2 years ago
|
||
Assignee | ||
Comment 11•2 years ago
|
||
Depends on D126993
Comment 12•2 years ago
|
||
Setting status-firefox94=wontfix because we don't need to uplift this assertion failure to Beta 94.
This was actually a buggy assertion in a non-Fission code path.
Clearing needinfo for Nika since the bug is already assigned to her and she has a fix.
Updated•2 years ago
|
Updated•2 years ago
|
Comment 13•1 year ago
|
||
Pushed by nlayzell@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/db878c07f32a Part 1: Generalize nsBrowsingContextReadyCallback, r=annyG https://hg.mozilla.org/integration/autoland/rev/48cc3daf8201 Part 2: Stop relying on GetPrimaryRemoteTab when opening popups, r=annyG
Comment 14•1 year ago
|
||
Backed out 2 changesets (bug 1731597) for causing multiple wpt failures.
Backout link: https://hg.mozilla.org/integration/autoland/rev/594a4b5bb9740526ecb82fc7d38c9f4c3f75d174
Comment 15•1 year ago
|
||
There are some r+ patches which didn't land and no activity in this bug for 2 weeks.
:nika, could you have a look please?
For more information, please visit auto_nag documentation.
Comment 16•1 year ago
|
||
(In reply to Chris Peterson [:cpeterson] from comment #12)
Setting status-firefox94=wontfix because we don't need to uplift this assertion failure to Beta 94.
This was actually a buggy assertion in a non-Fission code path.
Clearing needinfo for Nika since the bug is already assigned to her and she has a fix.
I received autonag release-mgmt email as this bug affects a released version without activity for the last 1 week.
However, looking at this comment "buggy assertion in a non-Fission code path", also that fission is enabled by default. I think it's okay we simply modify the " status-firefox95" as wontfix and lower the priority to P3/P5. Does that make sense?
Comment 17•1 year ago
|
||
(In reply to Hsin-Yi Tsai [:hsinyi] from comment #16)
I received autonag release-mgmt email as this bug affects a released version without activity for the last 1 week.
However, looking at this comment "buggy assertion in a non-Fission code path", also that fission is enabled by default. I think it's okay we simply modify the " status-firefox95" as wontfix and lower the priority to P3/P5. Does that make sense?
SGTM. I changed the status flags.
@ Nika, your patches got backed out two months ago in comment 14. Is this fix still relevant now that Fission is enabled by default on desktop? Do we need this fix for Android e10s?
Updated•11 months ago
|
Comment 18•10 months ago
|
||
:nika, since this bug contains a bisection range, could you fill (if possible) the regressed_by field?
For more information, please visit auto_nag documentation.
Comment 20•8 months ago
|
||
Bugmon Analysis
Unable to reproduce bug 1731597 using build mozilla-central 20210920102201-29d6504debf5. Without a baseline, bugmon is unable to analyze this bug.
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.
Updated•6 months ago
|
Comment 22•6 months ago
|
||
(In reply to Bugmon [:jkratzer for issues] from comment #20)
Bugmon Analysis
Unable to reproduce bug 1731597 using build mozilla-central 20210920102201-29d6504debf5. Without a baseline, bugmon is unable to analyze this bug.
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.
(as it seems it was somehow fixed elsewhere?)
Reporter | ||
Comment 23•6 months ago
|
||
It looks like the testcase has just become less reliable. I can still reproduce this issue on mozilla-central 20220929-a58c99b9307b. I'll upload a new testcase shortly.
Reporter | ||
Comment 24•6 months ago
|
||
The attached testcase appears more reliable than the original. I'll re-enable bugmon.
Assignee | ||
Comment 25•6 months ago
|
||
Yeah, these patches or something like them should probably land at some point, though I remember struggling to figure out the cause of the failures a while ago when I first tried to land it, which lead me to put them to the side, as I was busy with other things at the time. Probably worth taking another look at.
Comment 26•2 months ago
|
||
Testcase crashes using the initial build (mozilla-central 20220122095122-61861c0babc6) but not with tip (mozilla-central 20230120212103-f2fbf518572b.)
The bug appears to have been fixed in the following build range:
Start: 676b14e7a199b1bb5ae42b40deb1032bc439585e (20230118165610)
End: 3523d141513d03ec99f06916864ad0a7838a8b65 (20230118193647)
Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=676b14e7a199b1bb5ae42b40deb1032bc439585e&tochange=3523d141513d03ec99f06916864ad0a7838a8b65
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.
Description
•