Open Bug 1954236 Opened 4 months ago Updated 1 month ago

Crash in [@ mozilla::dom::CanonicalBrowsingContext::SetCrossGroupOpenerId] via XULFrameElement::LoadSrc()

Categories

(Core :: XUL, defect)

Unspecified
Windows 11
defect

Tracking

()

Tracking Status
firefox-esr115 --- unaffected
firefox-esr128 --- unaffected
firefox136 --- unaffected
firefox137 --- unaffected
firefox138 --- wontfix
firefox139 --- wontfix

People

(Reporter: mccr8, Unassigned, NeedInfo)

Details

(Keywords: crash, regression)

Crash Data

Crash report: https://crash-stats.mozilla.org/report/index/0c7b0283-30b1-44d5-a0d3-68ddd0250314

MOZ_CRASH Reason:

MOZ_DIAGNOSTIC_ASSERT(mCrossGroupOpenerId == 0) (Can only set CrossGroupOpenerId once)

Top 10 frames:

0  xul.dll  mozilla::dom::CanonicalBrowsingContext::SetCrossGroupOpenerId(unsigned long l...  docshell/base/CanonicalBrowsingContext.cpp:2853
0  xul.dll  nsFrameLoader::Create(mozilla::dom::Element*, bool, nsIOpenWindowInfo*)  dom/base/nsFrameLoader.cpp:441
1  xul.dll  mozilla::dom::XULFrameElement::LoadSrc()  dom/xul/XULFrameElement.cpp:104
2  xul.dll  mozilla::dom::XULFrameElement::BindToTree(mozilla::dom::BindContext&, nsINode&)  dom/xul/XULFrameElement.cpp:156
3  xul.dll  mozilla::dom::Element::BindToTree(mozilla::dom::BindContext&, nsINode&)  dom/base/Element.cpp:2225
3  xul.dll  nsStyledElement::BindToTree(mozilla::dom::BindContext&, nsINode&)  dom/base/nsStyledElement.cpp:197
4  xul.dll  nsXULElement::BindToTree(mozilla::dom::BindContext&, nsINode&)  dom/xul/nsXULElement.cpp:558
5  xul.dll  mozilla::dom::Element::BindToTree(mozilla::dom::BindContext&, nsINode&)  dom/base/Element.cpp:2225
5  xul.dll  nsStyledElement::BindToTree(mozilla::dom::BindContext&, nsINode&)  dom/base/nsStyledElement.cpp:197
6  xul.dll  nsXULElement::BindToTree(mozilla::dom::BindContext&, nsINode&)  dom/xul/nsXULElement.cpp:558

Looks like a regression in the 20250313102934 build. Here are the set of changes added in that build. These crashes are all via XULFrameElement::LoadSrc().

Almost all of the crashes have URLs like this: https://shopee.com.br/buyer/login?next=https%3A%2F%2Fshopee.com.br%2F

Emilio, any ideas what might have caused this? Thanks.

Flags: needinfo?(emilio)

This is a parent process crash. If I had to guess, it'd be a browser front-end side change... But I don't see anything too suspicious there, and I don't see how this can happen by code inspection. Front-end can set this, but in the stack from comment 0 we have just created the browsing context, so... Nika and Farre seem familiar with this flag, maybe they have ideas?

Flags: needinfo?(nika)
Flags: needinfo?(emilio)
Flags: needinfo?(afarre)

In this situation, we're creating a pop-up window from Gecko, which can be seen from the ContentParent::RecvCreateWindow call higher up on the stack, which in turn is calling nsWindowWatcher::OpenWindowWithRemoteTab. During this event loop, we receive a document load event (presumably for the pop-up window), and insert the initial browser into the DOM.

It's actually likely that the BrowsingContext was created earlier, because the content process generally creates the BrowsingContext in situations like these. The BrowsingContext is actually carried in the nsIOpenWindowInfo (via. the mNextRemoteBrowser member). The fact that we set the CrossGroupOpenerId multiple times vaguely suggests that frontend code is inserting two separate <browser> elements which both use the same nsIOpenWindowInfo. I expect that we might crash soon after this call in that case even if we didn't hit this assertion, as a BrowsingContext cannot be embedded twice.

The implementation of nsBrowserAccess was recently moved to a separate module in bug 1946400, which could potentially be related to a behaviour change on the frontend. I expect that inserting the same nsIOpenWindowInfo twice is likely something only possible in a rare edge-case caused by failure in some earlier part of window creation.

One possibility worth noting is that it is possible that the pop-up creation was originally going to be targeted to create a new tab, but fell back to opening a new window due to returning null with a success result (https://searchfox.org/mozilla-central/rev/a8a00d67c6c7118f0b95cffa26740202c3b9e6f3/dom/ipc/ContentParent.cpp#5334-5337). If this happens in a case where the browser was actually inserted, and then not returned, it could lead to this behaviour.

Leaving a ni? for :gijs in case there were some changes to the frontend window opening logic recently around bug 1946400.

Flags: needinfo?(nika) → needinfo?(gijskruitbosch+bugs)

(In reply to Nika Layzell [:nika] (ni? for response) from comment #2)

In this situation, we're creating a pop-up window from Gecko, which can be seen from the ContentParent::RecvCreateWindow call higher up on the stack, which in turn is calling nsWindowWatcher::OpenWindowWithRemoteTab. During this event loop, we receive a document load event (presumably for the pop-up window), and insert the initial browser into the DOM.

Nitpick but to be sure we're on the same page, it's DOMContentLoaded (via DispatchContentLoadedEvents and the sole call to DispatchTrustedEvent in there), right?

It's actually likely that the BrowsingContext was created earlier, because the content process generally creates the BrowsingContext in situations like these. The BrowsingContext is actually carried in the nsIOpenWindowInfo (via. the mNextRemoteBrowser member). The fact that we set the CrossGroupOpenerId multiple times vaguely suggests that frontend code is inserting two separate <browser> elements which both use the same nsIOpenWindowInfo. I expect that we might crash soon after this call in that case even if we didn't hit this assertion, as a BrowsingContext cannot be embedded twice.

How could we verify this, perhaps with a separate diagnostic assert? Can docshell/dom code detect the reuse of the nsIOpenWindowInfo?

I confess to not being very familiar with this code, so I'm not sure I have immediate ideas on how this reuse would happen. Off-hand, if we're talking about initialOpenWindowInfo, AFAICT that is only read here and then used further down... but I don't see how it'd be reused.

Are we sure this is reuse of nsIOpenWindowInfo - and in particular, could it be that somehow we have tried to set a crossGroupOpenerId on the initial tab for some reason (ie via some other path that may not even involve nsIOpenWindowInfo at all) before we hit the code I linked in the previous para?

The implementation of nsBrowserAccess was recently moved to a separate module in bug 1946400, which could potentially be related to a behaviour change on the frontend. I expect that inserting the same nsIOpenWindowInfo twice is likely something only possible in a rare edge-case caused by failure in some earlier part of window creation.

One possibility worth noting is that it is possible that the pop-up creation was originally going to be targeted to create a new tab, but fell back to opening a new window due to returning null with a success result (https://searchfox.org/mozilla-central/rev/a8a00d67c6c7118f0b95cffa26740202c3b9e6f3/dom/ipc/ContentParent.cpp#5334-5337). If this happens in a case where the browser was actually inserted, and then not returned, it could lead to this behaviour.

Leaving a ni? for :gijs in case there were some changes to the frontend window opening logic recently around bug 1946400.

There were not meant/supposed to be any functional/timing changes in that bug. Given the history on one of the other changes I can understand the suspicion but I don't see anything obvious when re-reviewing the change now...

The timing seems plausible-ish, though it is strange that there are 0 crashes on 137 nightly, but there are crashes on 138 nightly as well as 137 beta, which vaguely hints at something that landed in 138 and uplifted to 137 - 1946400 landed in 137 nightly, just over a week before merge day. There are multiple crashes each build ID (on Nightly) starting March 13th with the 10.29am build, afaict.

I also noticed that in correlations,

(85.19% in signature vs 08.96% overall) accessibility = 1

and looking at https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=FIREFOX_137_0b4_RELEASE&tochange=FIREFOX_137_0b6_RELEASE there are a number of tabgroups related uplifts, including things like:

Jeremy Swinarton — Bug 1951319: Deferred session restore re-opens tab groups a=pascalc

and quite a number of the crashes have panorama / tab grouping extensions, so I wonder if there is some correlation there (but can't check for correlations with prefs). (That bug is also in the pushlog from comment 0.)

Bouncing needinfo given I'm still a bit lost here and hopeful that Nika has more context / can help move the guessing game forward a little. :-)

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(nika)

(In reply to :Gijs (he/him) from comment #3)

Nitpick but to be sure we're on the same page, it's DOMContentLoaded (via DispatchContentLoadedEvents and the sole call to DispatchTrustedEvent in there), right?

Yes, it is DOMContentLoaded (fired here: https://hg.mozilla.org/mozilla-central/file/a1cc47ed2278fe3b16902c3b70ba4303030f17f6/dom/base/Document.cpp#l8369)

How could we verify this, perhaps with a separate diagnostic assert? Can docshell/dom code detect the reuse of the nsIOpenWindowInfo?

I don't think there's currently any mechanism for it with how the code was written. It's probably possible to add some extra assertions for that by adding some tracking state in the object, but that would need to be implemented.

I confess to not being very familiar with this code, so I'm not sure I have immediate ideas on how this reuse would happen. Off-hand, if we're talking about initialOpenWindowInfo, AFAICT that is only read here and then used further down... but I don't see how it'd be reused.

I believe the way this would be being re-used is if the nsIOpenWindowInfo is consumed during the call to createContentWindowInFrame(...) (https://searchfox.org/mozilla-central/rev/81b1c51a6b1cfe1dc2a5c8b39f3346bc1d167790/dom/ipc/ContentParent.cpp#5318), which is generally used to create a new tab. If that call to createContentWindowInFrame did not throw while returning null, we'd enter this case (https://searchfox.org/mozilla-central/rev/81b1c51a6b1cfe1dc2a5c8b39f3346bc1d167790/dom/ipc/ContentParent.cpp#5334-5337), which would cause us to also call OpenWindowWithRemoteTab, re-using the same nsIOpenWindowInfo (as it's assumed in the case where frontend returns null that the nsIOpenWindowInfo was not consumed).

Perhaps some of the changes you mention later in this bug changed how tabs are created & managed in frontend content, such that there's a new error codepath being taken, or such that the tab is inserted in a "lazy" state, meaning the browser appears to be "null", but then is initialized asynchronously with the provided nsIOpenWindowInfo? I'm unsure how this exactly could be happening.

Leaving the ni? on me for now, so I can get back to this soon, as I haven't had much of a chance to look into it yet unfortunately.

(In reply to Nika Layzell [:nika] (ni? for response) from comment #4)

Perhaps some of the changes you mention later in this bug changed how tabs are created & managed in frontend content, such that there's a new error codepath being taken, or such that the tab is inserted in a "lazy" state, meaning the browser appears to be "null", but then is initialized asynchronously with the provided nsIOpenWindowInfo? I'm unsure how this exactly could be happening.

Adding Dão here in case he has ideas on this front wrt bug 1951319 or other recent tab groups changes.

Flags: needinfo?(dao+bmo)

I don't know anything furthat than what's been discussed already.

Flags: needinfo?(afarre)

Sorry for the late reply. Generally, nothing in here seems rings a bell for me, so I'd expect this to be unrelated to tab groups.

(In reply to :Gijs (he/him) from comment #5)

(In reply to Nika Layzell [:nika] (ni? for response) from comment #4)

Perhaps some of the changes you mention later in this bug changed how tabs are created & managed in frontend content, such that there's a new error codepath being taken, or such that the tab is inserted in a "lazy" state, meaning the browser appears to be "null", but then is initialized asynchronously with the provided nsIOpenWindowInfo? I'm unsure how this exactly could be happening.

Adding Dão here in case he has ideas on this front wrt bug 1951319 or other recent tab groups changes.

Bug 1951319 has been uplifted and shipped in 137. Do we know if this started happening in 137?

Flags: needinfo?(dao+bmo) → needinfo?(gijskruitbosch+bugs)

(In reply to Dão Gottwald [:dao] from comment #7)

Adding Dão here in case he has ideas on this front wrt bug 1951319 or other recent tab groups changes.

Bug 1951319 has been uplifted and shipped in 137. Do we know if this started happening in 137?

Yes, comment #3 has analysis of this. Note that this is a diagnostic assert so it won't be crashing on release or late beta, only nightly and early beta.

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(dao+bmo)
You need to log in before you can comment on or make changes to this bug.