Closed Bug 1921972 Opened 2 months ago Closed 1 months ago

Large spike in crashes [@ mozilla::net::DocumentChannel::SetLoadFlags]

Categories

(Web Compatibility :: Site Reports, defect, P1)

Tracking

(firefox-esr115 wontfix, firefox-esr128 verified, firefox131 wontfix, firefox132+ verified, firefox133+ verified)

VERIFIED FIXED
Tracking Status
firefox-esr115 --- wontfix
firefox-esr128 --- verified
firefox131 --- wontfix
firefox132 + verified
firefox133 + verified

People

(Reporter: farre, Assigned: emilio)

References

(Regression)

Details

(Keywords: regression, Whiteboard: [qa-triaged])

Crash Data

Attachments

(2 files, 2 obsolete files)

No description provided.

I created this to have somewhere to back out bug 1911977.

Flags: needinfo?(emilio)

The reason for the crash is this patch. Backing it out makes the crash go away on emulator.

Assignee: nobody → afarre
Severity: -- → S2
Status: NEW → ASSIGNED
Priority: -- → P1
See Also: → 1694904

Commented on the patch. I think this diagnostic assert is unlikely to be a correctness issue in this particular case of an external protocol navigation (given the flag we're crashing on is VALIDATE_NEVER), for what is worth.

But in any case I don't think we should back out bug 1911977. I'll take a look. For context, STR are in bug 1694904 comment 57 (please let me know if there's simpler steps).

Assignee: afarre → emilio

Nope, that's as simple as we've been able to make them.

The VALIDATE_NEVER isn't actually set on the external protocol navigation, but an existing request in the load group. And it's not necessarily the external navigation that get's the wrong load flag set on itself, it is the fact that we cancel a load and canceling a load can in turn run onload handlers. Which could start new loads, which in turn would get the VALIDATE_NEVER flag set on them, even though they're fresh loads.

Looking at the crashstats for bug 1694904 for Fenix Nightly you can see that most of the crashes are actaully some site doing location.href = .... And that should never get VALIDATE_NEVER set, as far as I could tell.

Simpler steps to reproduce, even on desktop:

So, ok, with those STR, I can see that the flag just comes from merging the load group flags, in particular we inherit the bit from here:

The stack for that is:

#0  mozilla::net::DocumentChannel::SetLoadFlags (this=0x7e10cdf61280, aLoadFlags=2687492) at /home/emilio/src/moz/gecko-4/netwerk/ipc/DocumentChannel.cpp:299
#1  0x00007e10d93891bc in mozilla::net::nsLoadGroup::AddRequest (this=0x7e10d438b880, request=0x7e10cdf61280, ctxt=<optimized out>) at /home/emilio/src/moz/gecko-4/netwerk/base/nsLoadGroup.cpp:1104
#2  0x00007e10d9aa45cf in mozilla::net::DocumentChannelChild::AsyncOpen (this=0x7e10cdf61280, aListener=<optimized out>) at /home/emilio/src/moz/gecko-4/netwerk/ipc/DocumentChannelChild.cpp:86
#3  0x00007e10d9fb0211 in nsURILoader::OpenURI (this=0x7e10d27afe80, channel=0x7e10cdf61280, aFlags=<optimized out>, aWindowContext=<optimized out>) at /home/emilio/src/moz/gecko-4/uriloader/base/nsURILoader.cpp:754
#4  0x00007e10df441da6 in nsDocShell::OpenInitializedChannel (this=this@entry=0x7e10eb67ec00, aChannel=0x7e10cdf61280, aURILoader=0x7e10d27afe80, aOpenFlags=<optimized out>) at /home/emilio/src/moz/gecko-4/docshell/base/nsDocShell.cpp:10626
#5  0x00007e10df43c471 in nsDocShell::DoURILoad (this=this@entry=0x7e10eb67ec00, aLoadState=aLoadState@entry=0x7e10cd218500, aCacheKey=..., aRequest=0x7ffe9f66e8a0) at /home/emilio/src/moz/gecko-4/docshell/base/nsDocShell.cpp:10487
#6  0x00007e10df3dc9b8 in nsDocShell::InternalLoad (this=this@entry=0x7e10eb67ec00, aLoadState=aLoadState@entry=0x7e10cd218500, aCacheKey=...) at /home/emilio/src/moz/gecko-4/docshell/base/nsDocShell.cpp:9523
#7  0x00007e10df40dcd9 in nsDocShell::LoadURI (this=0x7e10eb67ec00, aLoadState=0x7e10cd218500, aSetNavigating=<optimized out>, aContinueHandlingSubframeHistory=<optimized out>) at /home/emilio/src/moz/gecko-4/docshell/base/nsDocShell.cpp:850
#8  0x00007e10df3da3a3 in mozilla::dom::BrowsingContext::LoadURI (this=0x7e10eb6a5f00, aLoadState=0x7e10cd218500, aSetNavigating=false) at /home/emilio/src/moz/gecko-4/docshell/base/BrowsingContext.cpp:2001
#9  0x00007e10db01c1ae in mozilla::dom::LocationBase::SetURI (this=this@entry=0x7e10cb9a9dd8, aURI=0x7e10c76c6cf0, aSubjectPrincipal=..., aRv=..., aReplace=true) at /home/emilio/src/moz/gecko-4/dom/base/LocationBase.cpp:162
#10 0x00007e10db0203d0 in mozilla::dom::LocationBase::SetHrefWithBase
    (this=this@entry=0x7e10cb9a9dd8, aHref=u"intent://play.google.com/store/apps/details?id=com.zynga.starwars.hunters&hl=en_US&pcampaignid=web_auto_redirect&web_logged_in=0&redirect_entry_point=dp#Intent;scheme=https;action=android.intent.actio"..., aBase=<optimized out>, aSubjectPrincipal=..., aReplace=<optimized out>, aRv=...) at /home/emilio/src/moz/gecko-4/dom/base/LocationBase.cpp:252
#11 0x00007e10db01ee83 in mozilla::dom::LocationBase::DoSetHref
    (this=0x7e10cb9a9dd8, aHref=u"intent://play.google.com/store/apps/details?id=com.zynga.starwars.hunters&hl=en_US&pcampaignid=web_auto_redirect&web_logged_in=0&redirect_entry_point=dp#Intent;scheme=https;action=android.intent.actio"..., aSubjectPrincipal=..., aReplace=false, aRv=...) at /home/emilio/src/moz/gecko-4/dom/base/LocationBase.cpp:202
#12 0x00007e10db4858d1 in mozilla::dom::Location_Binding::set_href

Nothing has yet read the flags, and the channel isn't done opening yet, so my understanding is that we still haven't propagated the flags to the parent, so adding the flag should be ok? Or does the parent compute the flags some other way.

In fact we already have an exemption for <object> here. Nika, Andreas, any reason why it would be unsafe for non-object loads (if the parent hasn't opened yet?).

Digging a bit through blame, I can see bug 1656925, which is basically the same issue.

I can think of two fixes here:

  • Same fix as bug 1656925, don't add external protocol to the load group. I think it should be safe, the situation here is basically the same as for downloads, though it seems similarly special-cased.

  • Allow SetLoadFlags to be called while the channel hasn't been opened yet (if it's safe).

Any thoughts on which way to go?

Flags: needinfo?(nika)
Flags: needinfo?(emilio)
Flags: needinfo?(afarre)
Attached file Minimal test-case

Much like we do with downloads (bug 1656925), these don't unload the
current document.

This prevents them from having mismatched flags, which asserts, see the
test-case.

Pushed by ealvarez@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/4888b2f651bc Don't add external protocol navigations to the loadgroup. r=smaug,necko-reviewers
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/48490 for changes under testing/web-platform/tests
Upstream PR was closed without merging

(In reply to Emilio Cobos Álvarez (:emilio) from comment #7)

Nothing has yet read the flags, and the channel isn't done opening yet, so my understanding is that we still haven't propagated the flags to the parent, so adding the flag should be ok? Or does the parent compute the flags some other way.

In fact we already have an exemption for <object> here. Nika, Andreas, any reason why it would be unsafe for non-object loads (if the parent hasn't opened yet?).

The reason why it's blocked for non-object loads is because the load flags for non-object loads are calculated in the parent process. The load flags are only passed up to the parent process in the TYPE_OBJECT case (https://searchfox.org/mozilla-central/rev/b546e74055186bb387ed7db9079374e1e7e0b88e/netwerk/ipc/DocumentChannelChild.cpp#132), and otherwise, the flags are independently calculated in the parent process (https://searchfox.org/mozilla-central/rev/b546e74055186bb387ed7db9079374e1e7e0b88e/netwerk/ipc/DocumentLoadListener.cpp#986-987).

This is also why bug 1656925 avoided adding the download to the load group - because setting the load flags after the load had already started would break things.

Do we know what flags are trying to be set which shouldn't be? Theoretically we could add a special case for those specific flags to allow them to be sent to the parent process before the channel is opened.

The entire situation around loadgroups and how they interact with DocumentChannel is super awkward though, so I'm not super surprised that we're running into weird issues here unfortunately :-/.

Flags: needinfo?(nika)

As per discussion. Test is in the other patch, assuming this is green
I'll incorporate it, but gotta go for the day.

[Tracking Requested - why for this release]: A fairly high volume crash in Beta 132 that may become worse on release.

If the crash here/in bug 1694904 looks better after this patch lands then it may be a good idea to request uplift.

(In reply to Emilio Cobos Álvarez (:emilio) from comment #15)

Created attachment 9429431 [details]
Bug 1921972 - Allow to propagate loadgroup flags to parent process. r=nika!

As per discussion. Test is in the other patch, assuming this is green
I'll incorporate it, but gotta go for the day.

This is exactly the fix for Bug 1694904

Flags: needinfo?(afarre)
Attachment #9429075 - Attachment is obsolete: true
Flags: needinfo?(emilio)
Pushed by ealvarez@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/611e44883688 Allow to propagate loadgroup flags to parent process. r=nika,necko-reviewers,valentin
Status: ASSIGNED → RESOLVED
Closed: 1 months ago
Resolution: --- → FIXED
Upstream PR merged by moz-wptsync-bot
Attachment #9428277 - Attachment is obsolete: true
Crash Signature: [@ mozilla::net::DocumentChannel::SetLoadFlags]

Please nominate this for Beta approval when you get a chance.

Flags: needinfo?(emilio)

Comment on attachment 9429431 [details]
Bug 1921972 - Allow to propagate loadgroup flags to parent process. r=nika!

Beta/Release Uplift Approval Request

  • User impact if declined/Reason for urgency: Fixes crash spike on long-standing bug.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: comment 8
  • List of other uplifts needed: none
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Pretty well tested / exercised code. Has reduced the crash rate to zero.
  • String changes made/needed: none
  • Is Android affected?: Yes
Flags: needinfo?(emilio)
Attachment #9429431 - Flags: approval-mozilla-beta?
Flags: qe-verify+

Comment on attachment 9429431 [details]
Bug 1921972 - Allow to propagate loadgroup flags to parent process. r=nika!

Looks like this needs a rebased patch for Beta.

Flags: needinfo?(emilio)
Attachment #9429431 - Flags: approval-mozilla-beta?

Conflicts from bug 1920960 it appears. Not sure if it'd be easier to rebase around that or just uplift that too (both graft cleanly together).

Yeah seems simpler to just uplift that, I'll re-request both.

Flags: needinfo?(emilio)
Attachment #9429431 - Flags: approval-mozilla-beta?

Comment on attachment 9429431 [details]
Bug 1921972 - Allow to propagate loadgroup flags to parent process. r=nika!

Approved for 132.0b7.

Attachment #9429431 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Whiteboard: [qa-triaged]

I was able to reproduce the crash on an affected Firefox 132.0b6 build, using Windows 11 and macOS 14.7, while following the steps from Comment 6 (but only when an Android device was selected from the devices list while being in RDM mode) and from Comment 8.
Verified as fixed using Firefox 132.0b7 and Firefox Nightly 133.0a1 (2024-10-14), using Windows 11, macOS 14.7 and Ubuntu 22.04. The tab is no longer crashing when following the steps from Comment 6 or Comment 8.

Status: RESOLVED → VERIFIED
Flags: qe-verify+

Comment on attachment 9429431 [details]
Bug 1921972 - Allow to propagate loadgroup flags to parent process. r=nika!

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: Needed if we want to uplift bug 1911977.
  • User impact if declined: see other bug
  • Fix Landed on Version:
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Relatively straight-forward change.
Attachment #9429431 - Flags: approval-mozilla-esr128?

Comment on attachment 9429431 [details]
Bug 1921972 - Allow to propagate loadgroup flags to parent process. r=nika!

Approved for 128.4esr.

Attachment #9429431 - Flags: approval-mozilla-esr128? → approval-mozilla-esr128+

I have verified this issue with 128.4.0 ESR on Windows 11 x64, macOS 14, and Ubuntu 24 using the test case from comment 8. The crash is no longer reproducible on this build. Previously, I was able to reproduce it using the steps from comment 6 on an affected Nightly build (2024-10-01).

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: