Large spike in crashes [@ mozilla::net::DocumentChannel::SetLoadFlags]
Categories
(Web Compatibility :: Site Reports, defect, P1)
Tracking
(firefox-esr115 wontfix, firefox-esr128 verified, firefox131 wontfix, firefox132+ verified, firefox133+ verified)
People
(Reporter: farre, Assigned: emilio)
References
(Regression)
Details
(Keywords: regression, Whiteboard: [qa-triaged])
Crash Data
Attachments
(2 files, 2 obsolete files)
106 bytes,
text/html
|
Details | |
48 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-esr128+
|
Details | Review |
Reporter | ||
Comment 1•2 months ago
|
||
I created this to have somewhere to back out bug 1911977.
Reporter | ||
Comment 2•2 months ago
|
||
The reason for the crash is this patch. Backing it out makes the crash go away on emulator.
Reporter | ||
Updated•2 months ago
|
Reporter | ||
Comment 3•2 months ago
|
||
Assignee | ||
Comment 4•2 months ago
|
||
Commented on the patch. I think this diagnostic assert is unlikely to be a correctness issue in this particular case of an external protocol navigation (given the flag we're crashing on is VALIDATE_NEVER
), for what is worth.
But in any case I don't think we should back out bug 1911977. I'll take a look. For context, STR are in bug 1694904 comment 57 (please let me know if there's simpler steps).
Reporter | ||
Comment 5•2 months ago
|
||
Nope, that's as simple as we've been able to make them.
The VALIDATE_NEVER
isn't actually set on the external protocol navigation, but an existing request in the load group. And it's not necessarily the external navigation that get's the wrong load flag set on itself, it is the fact that we cancel a load and canceling a load can in turn run onload handlers. Which could start new loads, which in turn would get the VALIDATE_NEVER
flag set on them, even though they're fresh loads.
Looking at the crashstats for bug 1694904 for Fenix Nightly you can see that most of the crashes are actaully some site doing location.href = ...
. And that should never get VALIDATE_NEVER
set, as far as I could tell.
Assignee | ||
Comment 6•2 months ago
|
||
Simpler steps to reproduce, even on desktop:
- Enable RDM.
- Go to https://play.google.com/store/apps/details?id=com.zynga.starwars.hunters&hl=en_US
- Shift-reload (Ctrl+R)
Assignee | ||
Comment 7•2 months ago
|
||
So, ok, with those STR, I can see that the flag just comes from merging the load group flags, in particular we inherit the bit from here:
The stack for that is:
#0 mozilla::net::DocumentChannel::SetLoadFlags (this=0x7e10cdf61280, aLoadFlags=2687492) at /home/emilio/src/moz/gecko-4/netwerk/ipc/DocumentChannel.cpp:299
#1 0x00007e10d93891bc in mozilla::net::nsLoadGroup::AddRequest (this=0x7e10d438b880, request=0x7e10cdf61280, ctxt=<optimized out>) at /home/emilio/src/moz/gecko-4/netwerk/base/nsLoadGroup.cpp:1104
#2 0x00007e10d9aa45cf in mozilla::net::DocumentChannelChild::AsyncOpen (this=0x7e10cdf61280, aListener=<optimized out>) at /home/emilio/src/moz/gecko-4/netwerk/ipc/DocumentChannelChild.cpp:86
#3 0x00007e10d9fb0211 in nsURILoader::OpenURI (this=0x7e10d27afe80, channel=0x7e10cdf61280, aFlags=<optimized out>, aWindowContext=<optimized out>) at /home/emilio/src/moz/gecko-4/uriloader/base/nsURILoader.cpp:754
#4 0x00007e10df441da6 in nsDocShell::OpenInitializedChannel (this=this@entry=0x7e10eb67ec00, aChannel=0x7e10cdf61280, aURILoader=0x7e10d27afe80, aOpenFlags=<optimized out>) at /home/emilio/src/moz/gecko-4/docshell/base/nsDocShell.cpp:10626
#5 0x00007e10df43c471 in nsDocShell::DoURILoad (this=this@entry=0x7e10eb67ec00, aLoadState=aLoadState@entry=0x7e10cd218500, aCacheKey=..., aRequest=0x7ffe9f66e8a0) at /home/emilio/src/moz/gecko-4/docshell/base/nsDocShell.cpp:10487
#6 0x00007e10df3dc9b8 in nsDocShell::InternalLoad (this=this@entry=0x7e10eb67ec00, aLoadState=aLoadState@entry=0x7e10cd218500, aCacheKey=...) at /home/emilio/src/moz/gecko-4/docshell/base/nsDocShell.cpp:9523
#7 0x00007e10df40dcd9 in nsDocShell::LoadURI (this=0x7e10eb67ec00, aLoadState=0x7e10cd218500, aSetNavigating=<optimized out>, aContinueHandlingSubframeHistory=<optimized out>) at /home/emilio/src/moz/gecko-4/docshell/base/nsDocShell.cpp:850
#8 0x00007e10df3da3a3 in mozilla::dom::BrowsingContext::LoadURI (this=0x7e10eb6a5f00, aLoadState=0x7e10cd218500, aSetNavigating=false) at /home/emilio/src/moz/gecko-4/docshell/base/BrowsingContext.cpp:2001
#9 0x00007e10db01c1ae in mozilla::dom::LocationBase::SetURI (this=this@entry=0x7e10cb9a9dd8, aURI=0x7e10c76c6cf0, aSubjectPrincipal=..., aRv=..., aReplace=true) at /home/emilio/src/moz/gecko-4/dom/base/LocationBase.cpp:162
#10 0x00007e10db0203d0 in mozilla::dom::LocationBase::SetHrefWithBase
(this=this@entry=0x7e10cb9a9dd8, aHref=u"intent://play.google.com/store/apps/details?id=com.zynga.starwars.hunters&hl=en_US&pcampaignid=web_auto_redirect&web_logged_in=0&redirect_entry_point=dp#Intent;scheme=https;action=android.intent.actio"..., aBase=<optimized out>, aSubjectPrincipal=..., aReplace=<optimized out>, aRv=...) at /home/emilio/src/moz/gecko-4/dom/base/LocationBase.cpp:252
#11 0x00007e10db01ee83 in mozilla::dom::LocationBase::DoSetHref
(this=0x7e10cb9a9dd8, aHref=u"intent://play.google.com/store/apps/details?id=com.zynga.starwars.hunters&hl=en_US&pcampaignid=web_auto_redirect&web_logged_in=0&redirect_entry_point=dp#Intent;scheme=https;action=android.intent.actio"..., aSubjectPrincipal=..., aReplace=false, aRv=...) at /home/emilio/src/moz/gecko-4/dom/base/LocationBase.cpp:202
#12 0x00007e10db4858d1 in mozilla::dom::Location_Binding::set_href
Nothing has yet read the flags, and the channel isn't done opening yet, so my understanding is that we still haven't propagated the flags to the parent, so adding the flag should be ok? Or does the parent compute the flags some other way.
In fact we already have an exemption for <object>
here. Nika, Andreas, any reason why it would be unsafe for non-object loads (if the parent hasn't opened yet?).
Digging a bit through blame, I can see bug 1656925, which is basically the same issue.
I can think of two fixes here:
-
Same fix as bug 1656925, don't add external protocol to the load group. I think it should be safe, the situation here is basically the same as for downloads, though it seems similarly special-cased.
-
Allow
SetLoadFlags
to be called while the channel hasn't been opened yet (if it's safe).
Any thoughts on which way to go?
Assignee | ||
Comment 8•2 months ago
|
||
Assignee | ||
Comment 9•2 months ago
|
||
Much like we do with downloads (bug 1656925), these don't unload the
current document.
This prevents them from having mismatched flags, which asserts, see the
test-case.
Comment 10•2 months ago
|
||
Comment 14•2 months ago
|
||
(In reply to Emilio Cobos Álvarez (:emilio) from comment #7)
Nothing has yet read the flags, and the channel isn't done opening yet, so my understanding is that we still haven't propagated the flags to the parent, so adding the flag should be ok? Or does the parent compute the flags some other way.
In fact we already have an exemption for
<object>
here. Nika, Andreas, any reason why it would be unsafe for non-object loads (if the parent hasn't opened yet?).
The reason why it's blocked for non-object loads is because the load flags for non-object loads are calculated in the parent process. The load flags are only passed up to the parent process in the TYPE_OBJECT case (https://searchfox.org/mozilla-central/rev/b546e74055186bb387ed7db9079374e1e7e0b88e/netwerk/ipc/DocumentChannelChild.cpp#132), and otherwise, the flags are independently calculated in the parent process (https://searchfox.org/mozilla-central/rev/b546e74055186bb387ed7db9079374e1e7e0b88e/netwerk/ipc/DocumentLoadListener.cpp#986-987).
This is also why bug 1656925 avoided adding the download to the load group - because setting the load flags after the load had already started would break things.
Do we know what flags are trying to be set which shouldn't be? Theoretically we could add a special case for those specific flags to allow them to be sent to the parent process before the channel is opened.
The entire situation around loadgroups and how they interact with DocumentChannel is super awkward though, so I'm not super surprised that we're running into weird issues here unfortunately :-/.
Assignee | ||
Comment 15•2 months ago
|
||
As per discussion. Test is in the other patch, assuming this is green
I'll incorporate it, but gotta go for the day.
Comment 16•2 months ago
|
||
[Tracking Requested - why for this release]: A fairly high volume crash in Beta 132 that may become worse on release.
If the crash here/in bug 1694904 looks better after this patch lands then it may be a good idea to request uplift.
Updated•2 months ago
|
Reporter | ||
Comment 17•2 months ago
|
||
(In reply to Emilio Cobos Álvarez (:emilio) from comment #15)
Created attachment 9429431 [details]
Bug 1921972 - Allow to propagate loadgroup flags to parent process. r=nika!As per discussion. Test is in the other patch, assuming this is green
I'll incorporate it, but gotta go for the day.
This is exactly the fix for Bug 1694904
Updated•2 months ago
|
Assignee | ||
Updated•2 months ago
|
Comment 18•2 months ago
|
||
Comment 19•1 months ago
|
||
bugherder |
Updated•1 months ago
|
Reporter | ||
Updated•1 month ago
|
Comment 21•1 month ago
|
||
Please nominate this for Beta approval when you get a chance.
Assignee | ||
Comment 22•1 month ago
|
||
Comment on attachment 9429431 [details]
Bug 1921972 - Allow to propagate loadgroup flags to parent process. r=nika!
Beta/Release Uplift Approval Request
- User impact if declined/Reason for urgency: Fixes crash spike on long-standing bug.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: Yes
- If yes, steps to reproduce: comment 8
- List of other uplifts needed: none
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Pretty well tested / exercised code. Has reduced the crash rate to zero.
- String changes made/needed: none
- Is Android affected?: Yes
Assignee | ||
Updated•1 month ago
|
Comment 23•1 month ago
|
||
Comment on attachment 9429431 [details]
Bug 1921972 - Allow to propagate loadgroup flags to parent process. r=nika!
Looks like this needs a rebased patch for Beta.
Comment 24•1 month ago
•
|
||
Conflicts from bug 1920960 it appears. Not sure if it'd be easier to rebase around that or just uplift that too (both graft cleanly together).
Assignee | ||
Comment 25•1 month ago
|
||
Yeah seems simpler to just uplift that, I'll re-request both.
Assignee | ||
Updated•1 month ago
|
Comment 26•1 month ago
|
||
Comment on attachment 9429431 [details]
Bug 1921972 - Allow to propagate loadgroup flags to parent process. r=nika!
Approved for 132.0b7.
Updated•1 month ago
|
Comment 27•1 month ago
|
||
uplift |
Updated•1 month ago
|
I was able to reproduce the crash on an affected Firefox 132.0b6 build, using Windows 11 and macOS 14.7, while following the steps from Comment 6 (but only when an Android device was selected from the devices list while being in RDM mode) and from Comment 8.
Verified as fixed using Firefox 132.0b7 and Firefox Nightly 133.0a1 (2024-10-14), using Windows 11, macOS 14.7 and Ubuntu 22.04. The tab is no longer crashing when following the steps from Comment 6 or Comment 8.
Assignee | ||
Comment 29•1 month ago
|
||
Comment on attachment 9429431 [details]
Bug 1921972 - Allow to propagate loadgroup flags to parent process. r=nika!
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: Needed if we want to uplift bug 1911977.
- User impact if declined: see other bug
- Fix Landed on Version:
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Relatively straight-forward change.
Comment 30•1 month ago
|
||
Comment on attachment 9429431 [details]
Bug 1921972 - Allow to propagate loadgroup flags to parent process. r=nika!
Approved for 128.4esr.
Updated•1 month ago
|
Comment 31•1 month ago
|
||
uplift |
Comment 32•1 month ago
|
||
Description
•