Open Bug 1810039 Opened 2 years ago Updated 3 months ago

Fenix packages about:neterror which relies on aboutNetError.css and friends, but doesn't package those CSS files

Categories

(Fenix :: Browser Engine, defect)

All
Android
defect

Tracking

(firefox111 affected)

Tracking Status
firefox111 --- affected

People

(Reporter: Gijs, Unassigned, NeedInfo)

References

Details

(Whiteboard: [geckoview:m112][geckoview:m113])

In bug 1724718 I am, after a ~16 month hiatus, once again trying to make it so we do not land references to chrome/resource files that do not exist (by making attempts to load such URLs crash, for files inside omni.ja and friends).

A number of wpt and other tests then start failing because aboutNetError.css is missing. I expect that if I fix that, error-pages.css is missing (cf. https://searchfox.org/mozilla-central/rev/b19830c6b22f73758862d34d2c64f57735890e90/toolkit/themes/shared/desktop-jar.inc.mn#17,32 these are both packaged in the desktop jar.inc.mn, but the markup is packaged in the shared-with-android directory).

I'm honestly a bit confused because in practice on android I see quite different error pages so I have always thought Android used their own error pages. But perhaps not for wpt tests or something?

Either way, if we ship the toolkit error page and use it in some cases, we should probably ship the CSS so it looks OK? (and if it doesn't look OK, fix the CSS so it does!)

Depends on: 1734217

(In reply to :Gijs (he/him) from comment #0)

I'm honestly a bit confused because in practice on android I see quite different error pages so I have always thought Android used their own error pages. But perhaps not for wpt tests or something?

As far as I can tell, while Firefox on Android does use its own error pages, those are provided on top of GeckoView, which itself uses aboutNetError.mjs. So the only way to experience them on mobile Firefox is to directly visit something like about:neterror?.

The CSS files and associated images were dropped from mobile in D157726, as they're now used only to control the styling rather than the visibility of various parts of the page. I'd still maintain that they're not actually required for the error pages to "look OK", esp. as they're not actually experienced in production.

(In reply to Eemeli Aro [:eemeli] from comment #1)

(In reply to :Gijs (he/him) from comment #0)

I'm honestly a bit confused because in practice on android I see quite different error pages so I have always thought Android used their own error pages. But perhaps not for wpt tests or something?

As far as I can tell, while Firefox on Android does use its own error pages, those are provided on top of GeckoView, which itself uses aboutNetError.mjs. So the only way to experience them on mobile Firefox is to directly visit something like about:neterror?.

Well, maybe. docshell has something called loadURIDelegate here: https://searchfox.org/mozilla-central/rev/5bcbe6ae54ee5b152f6cef29dc5627ec7c0e1f1e/docshell/base/nsDocShell.cpp#3843-3853 which gets invoked and asked if it wants to provide an error page.

The implementation of that interface is in mozilla-central: https://searchfox.org/mozilla-central/rev/5bcbe6ae54ee5b152f6cef29dc5627ec7c0e1f1e/mobile/android/modules/geckoview/LoadURIDelegate.jsm#55-93

(there's some parent/child indirection but that's not super important here)

Confusingly, if that call fails / throws an exception, we stop trying to show the error page at all (and in fact, I'm not 100% sure how well that codepath is tested - in the 1990s there was an option to show modal alert boxes instead of error pages but I don't think anyone's tested that codepath for at least a decade. I don't know what the web behaves like if a load error gets you... nothing. Do we leave the old page? Do we load about:blank? What state is the docshell in if we stop mid-load!?). However, if the response from the OnLoadError message is null, we will pass null back, and the code in docshell will proceed with the rest of DisplayLoadError, which will net us the toolkit error pages.

I honestly can't quite figure out what android does here - there are a ton of layers of indirection between gecko making this call and docshell getting a value back (including event loop spinning in the link above which is 😱). I think we end up in android-components in https://github.com/mozilla-mobile/android-components/blob/bb489ef682b33e8ebc770908bfe06befb13b0dd6/components/browser/engine-gecko/src/main/java/mozilla/components/browser/engine/gecko/GeckoEngineSession.kt#L652 and in fenix maybe in https://github.com/mozilla-mobile/fenix/blob/53c6341978af24abbb1e1d862ad17bec1820a67f/app/src/main/java/org/mozilla/fenix/AppRequestInterceptor.kt#L57 - but what I find hard to tell is whether there are any cases where we don't find an error page that's "right" in the gecko situation.

The reason I'm wondering about this is that the failures on infra that brought this to my attention were related to CSP / frame / navigation blocking tests, which is a pretty obscure error state to begin with. And the code on the fenix end sure looks like it has a list of errors it "knows about". So... what happens for errors not in the list?

Here are some errors I saw:

https://treeherder.mozilla.org/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel%2Crunning%2Cpending%2Crunnable&searchStr=wpt&revision=194aa5b57fa715232e885984fb8be101dd33a9ed&selectedTaskRun=fH6w2mksQRC1bRTL222q5Q.0

https://treeherder.mozilla.org/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel%2Crunning%2Cpending%2Crunnable&searchStr=wpt&revision=194aa5b57fa715232e885984fb8be101dd33a9ed&selectedTaskRun=Yta2cHKCT46YfMda8MAjJA.0

These are for the web platform tests:

/html/browsers/browsing-the-web/navigating-across-documents/about-srcdoc-navigation-blocked.window.html
/content-security-policy/reporting/report-frame-ancestors-with-x-frame-options.sub.html 

So basically I'm wondering if it's expected that in those cases we still show the toolkit error page on android.

The CSS files and associated images were dropped from mobile in D157726, as they're now used only to control the styling rather than the visibility of various parts of the page. I'd still maintain that they're not actually required for the error pages to "look OK", esp. as they're not actually experienced in production.

So I don't really have a strong opinion here on what the "right" thing is. But I think we should be explicit: if we don't think this page should be used on android maybe we shouldn't ship it at all, and fix whatever geckoview testrunner we're using to provide its own error page? Or we should #ifdef out the relevant CSS files if we think the "right" thing is showing an unstyled error page in these cases. Or we should package the CSS (and dependent images) if they are supposed to load. I think which of those is "right" is probably a call for the android team.

It's worth noting that in bug 1808987 I found numerous about: pages that are packaged on android and rely on info-pages.css, which is not packaged. I don't know if we'd want the same decision in all of these cases, but superficially the issues feel similar.

(In reply to :Gijs (he/him) from comment #2)

Confusingly, if that call fails / throws an exception, we stop trying to show the error page at all (and in fact, I'm not 100% sure how well that codepath is tested - in the 1990s there was an option to show modal alert boxes instead of error pages but I don't think anyone's tested that codepath for at least a decade. I don't know what the web behaves like if a load error gets you... nothing. Do we leave the old page? Do we load about:blank? What state is the docshell in if we stop mid-load!?). However, if the response from the OnLoadError message is null, we will pass null back, and the code in docshell will proceed with the rest of DisplayLoadError, which will net us the toolkit error pages.

The load will be cancelled, and you'll just stay on the page which you were navigating away from. It's similar in many ways to a download happening, so the state doesn't end up super weird. If we're in a case where FilterStatusForErrorPage says we'll display an error but we don't, as we may end up doing a process switch and showing a blank page in that situation, though.

Re: the alert boxes stuff, I vaguely remember we had to poke it during Fission, and it actually is used, just not really by us. Thunderbird turns off error pages for a number of internal windows, like the mail window, help viewer, addressbook, etc: https://searchfox.org/comm-central/search?q=useErrorPages&path=&case=true&regexp=false. I imagine it still works at least a little bit.

I honestly can't quite figure out what android does here - there are a ton of layers of indirection between gecko making this call and docshell getting a value back (including event loop spinning in the link above which is 😱).

Yeah, I don't love that. I had discussed a while back about potentially moving where we do this delegation stuff to happen later, e.g. as part of a custom channel on Android for about:neterror doing an async redirect, to avoid the nested event loops. That being said, docshell code has to deal with a ton of nested event loops when starting loads anyway (the beforeunload event is a nightmare), so it's not as bad as you might think.

I think we end up in android-components in https://github.com/mozilla-mobile/android-components/blob/bb489ef682b33e8ebc770908bfe06befb13b0dd6/components/browser/engine-gecko/src/main/java/mozilla/components/browser/engine/gecko/GeckoEngineSession.kt#L652 and in fenix maybe in https://github.com/mozilla-mobile/fenix/blob/53c6341978af24abbb1e1d862ad17bec1820a67f/app/src/main/java/org/mozilla/fenix/AppRequestInterceptor.kt#L57 - but what I find hard to tell is whether there are any cases where we don't find an error page that's "right" in the gecko situation.

The reason I'm wondering about this is that the failures on infra that brought this to my attention were related to CSP / frame / navigation blocking tests, which is a pretty obscure error state to begin with. And the code on the fenix end sure looks like it has a list of errors it "knows about". So... what happens for errors not in the list?

FWIW, the errors are converted within geckoview from nsresult to an internal enum type here: https://searchfox.org/mozilla-central/rev/5bcbe6ae54ee5b152f6cef29dc5627ec7c0e1f1e/mobile/android/geckoview/src/main/java/org/mozilla/geckoview/WebRequestError.java#285-379 - so if every case on that list is handled, it should theoretically mean that no error types slip through the cracks. That being said, the code definitely has a fall-through as you noticed, and I wouldn't be too surprised if that was a bit busted.

So basically I'm wondering if it's expected that in those cases we still show the toolkit error page on android.

It's probably worth pinging some android folks about that, but I can't imagine the toolkit error page looks great on android right now, so probably not? If android never expects to use these pages, it's probably worth unpackaging them there (we do care about binary size on Android after-all, even if this isn't significant), and figuring out what we want the fall-back to be if the embedder doesn't cooperate. (clearly data:text/html,<h1>💀</h1> as a fallback </s>)

(In reply to :Gijs (he/him) from comment #2)

Here are some errors I saw:

https://treeherder.mozilla.org/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel%2Crunning%2Cpending%2Crunnable&searchStr=wpt&revision=194aa5b57fa715232e885984fb8be101dd33a9ed&selectedTaskRun=fH6w2mksQRC1bRTL222q5Q.0

https://treeherder.mozilla.org/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel%2Crunning%2Cpending%2Crunnable&searchStr=wpt&revision=194aa5b57fa715232e885984fb8be101dd33a9ed&selectedTaskRun=Yta2cHKCT46YfMda8MAjJA.0

These are for the web platform tests:

/html/browsers/browsing-the-web/navigating-across-documents/about-srcdoc-navigation-blocked.window.html
/content-security-policy/reporting/report-frame-ancestors-with-x-frame-options.sub.html 

So basically I'm wondering if it's expected that in those cases we still show the toolkit error page on android.

(In reply to Nika Layzell [:nika] (ni? for response) from comment #3)

It's probably worth pinging some android folks about that, but I can't imagine the toolkit error page looks great on android right now, so probably not? If android never expects to use these pages, it's probably worth unpackaging them there (we do care about binary size on Android after-all, even if this isn't significant), and figuring out what we want the fall-back to be if the embedder doesn't cooperate. (clearly data:text/html,<h1>💀</h1> as a fallback </s>)

Makes sense to me.

Chris, can you point someone on the android team in the direction of this bug and the cases outlined in my quoted comment above? I'm happy to un-package the toolkit things on android if that's the decision, but I would probably need some advice about what to do about the wpt tests that currently seem to be hitting them.

Flags: needinfo?(cpeterson)

Chris, can you point someone on the android team in the direction of this bug and the cases outlined in my quoted comment above? I'm happy to un-package the toolkit things on android if that's the decision, but I would probably need some advice about what to do about the wpt tests that currently seem to be hitting them.

Adding this bug to the top of the Android team's backlog.

Severity: -- → S3
Rank: 200
Component: General → Browser Engine
Flags: needinfo?(cpeterson)
Priority: -- → P2
Whiteboard: [geckoview:m112?]

I stumbled across this comment while looking at an unrelated intermittent:

// Due to Bug 1692578 we currently cannot test bypassing of the error
// the URI loading process takes the desktop path for iframes
@Test fun loadHTTPSOnlyInSubframe() {

And indeed bug 1692578 says we don't use LoadURIDelegate for subframes. So gecko will show its toolkit error pages there, without styling. This makes sense of the failures in bug 1724718 because they were for x-frame-options / CSP failures for subframes.

Based on this, I think the correct fix here is likely to either ship these files on android, or ensure that LoadURIDelegate runs for subframes. (but I expect that might have performance implications...)

See Also: → 1692578
See Also: → 1815805

Irene will investigate.

Assignee: nobody → bugzeeeeee
Flags: needinfo?(brclark)
Priority: P2 → P1
Whiteboard: [geckoview:m112?] → [geckoview:m112]

Hi Gijs, thank you for looking into this! What is the size of the files in question? Our Android apps use their own error pages (except for the iframes), so if it's possible to avoid shipping the files, it would be better.

I wonder if it's worth disabling the problematic wpt tests for now (how many are failing btw?) and file a bug for enabling them, blocking the bug on LoadURIDelegate support (bug 1692578).

Flags: needinfo?(gijskruitbosch+bugs)

(In reply to [:owlish] 🦉 PST from comment #8)

Hi Gijs, thank you for looking into this! What is the size of the files in question?

At time of writing:

file size
error-pages.css 1843 bytes
info-pages.css 2927 bytes
aboutNetError.css 4639 bytes
aboutHttpsOnlyError.css 1552 bytes

That's about 10KiB - but info-pages.css pulls in common.css which off-hand it also looks like we don't currently package on android, and that is substantially bigger: common-shared.css is about 42KiB. Then there might be SVG icons we don't package at the moment.

Our Android apps use their own error pages (except for the iframes), so if it's possible to avoid shipping the files, it would be better.

Well, the other part is that some of these CSS files are also used on other about: pages that android ships - see bug 1808987.

If we fix the iframe support for LoadURIDelegate, then we could drop error-pages.css and the two about*Error.css pages, and the already-shipped aboutNetError.xhtml and toolkit/components/httpsonlyerror/content/errorpage.html and their JS files, and their JSWindowActor files and registration, and potentially their fluent files (though that may be more trouble than it's worth given how fluent files are packaged at the moment).

Obviously it's not my call how to deal with all this... but if it were, and given that even at 50KiB this isn't like adding multiple MiBs to the fenix package, I would probably recommend packaging the CSS files right now, and prioritizing bug 1692578, after which we can drop all the error pages and supporting files from android, but keep the support files for the other about: pages. That would probably end us up around the same package size anyway (plus this is all frontend code so hopefully it's compressed in the apk? It's definitely compressed on desktop in the installer, even though the omni.ja files themselves aren't compressed once installed).

Another alternative would be unshipping all those about: pages from android, but I imagine e.g. about:logging and about:telemetry and so on are deliberately present on android and we don't want to remove them. (though in a quick test I couldn't get about:telemetry to do much of anything useful)

I wonder if it's worth disabling the problematic wpt tests for now (how many are failing btw?) and file a bug for enabling them, blocking the bug on LoadURIDelegate support (bug 1692578).

So the reason the wpt tests failed was that I made our network code MOZ_CRASH if we tried to use local (chrome and resource) files that were not actually packaged. This should help prevent us accidentally forgetting to package things, or not noticing when we break URL references when moving files etc. They failed because although aboutNetError.xhtml is packaged on Android, the CSS files for it and several other about: pages are not packaged.

To make the tests not fail, I simply added Android-specific exceptions to the crashing networking code. So right now, 0 wpt tests are failing because of this. I'm not sure it'd be better to disable the tests and drop this code - the wpt tests will change and/or new tests will be added that also create errors in subframes, and they will have the same problem, but it'll be hard to trace that back to this issue for the people/bots recording test expectations for those tests. So I think the status quo is OK in terms of the wpt tests.

Hopefully that helps? Let me know if anything is still unclear. :-)

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(bugzeeeeee)
Flags: needinfo?(bugzeeeeee)
Whiteboard: [geckoview:m112] → [geckoview:m112][geckoview:m113]

The Android team has not been keeping our P1 bug list up to date, so we're resetting all our P1 bugs to P2 to avoid signalling that we're actively working on bugs that we're not. The BMO documentation https://wiki.mozilla.org/BMO/UserGuide/BugFields#priority says P1 means "fix in the current release cycle" and P2 means "fix in the next release cycle or the following (nightly + 1 or nightly + 2)".

If you are actively working on this bug and expect to ship it in Fx 122 or 123, then please restore the priority back to P1.

Priority: P1 → P2

Clearing Priority so we can reprioritize these bugs relative to our ux-fun-2024 bugs.

Priority: P2 → --
Assignee: bugzeeeeee → nobody
You need to log in before you can comment on or make changes to this bug.