Consider ensuring shopping's OHTTP keyconfig caching happens in the parent process
Categories
(Firefox :: Shopping, task, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox129 | --- | fixed |
People
(Reporter: Gijs, Assigned: jhirsch)
References
(Blocks 1 open bug)
Details
(Whiteboard: [fidefe-shopping])
Attachments
(1 file)
Right now the ohttp stuff will be requested in the privileged about process and that is also where the config is loaded.
Because the process is typically long-lived that's not the end of the world, but it'd be good if the cache enforced that it lived in the parent.
The right solution would probably be to run a JSProcessActor and use that to communicate the keyconfig.
Reporter | ||
Comment 1•1 year ago
|
||
Updated•1 year ago
|
Updated•1 year ago
|
Reporter | ||
Updated•1 year ago
|
Updated•1 year ago
|
Comment 3•1 year ago
|
||
Backed out for causing xpcshell failures in test_ohttp_config_manager.js.
- Backout link
- Push with failures
- Failure Log
- Failure line: TEST-UNEXPECTED-TIMEOUT | netwerk/test/unit/test_ohttp_config_manager.js | Test timed out
Reporter | ||
Comment 4•1 year ago
|
||
I'm not sure how to proceed here. The test fails on android. I'm not in a good position to debug why. The errors seem not directly related to the code under test:
07-31 13:09:35.171 25497 25537 D ServiceAllocator: org.mozilla.gecko.process.GeckoChildProcessServices$gpu updateBindings: FOREGROUND priority, 0 importance, 3 successful binds, 0 failed binds, 0 successful unbinds
07-31 13:09:35.173 4683 4704 W ServiceChildProcess: This process belongs to a different GeckoRuntime owner: 99550679-1aaf-4fce-92c5-1caf969c6290 process: fde8e882-6b2f-437b-a281-e74329c4719a
gets repeated a lot and suggests there's a process management issue or an issue with the pre-existing XPCShellContentUtils or how they are used by the test -- but things work fine on desktop. It's unlikely that the code under test is used on Android from something other than the main/parent process, and it's also not clear that the test failure actually means something is broken about the code under test even if it were used from a privilegedabout process on android.
Nika or Kris, is there a better option here besides landing this with the test disabled on android with a follow-up filed to figure out what is broken here?
Comment 5•1 year ago
|
||
:kaya, do you know what might be causing this error to be happening on Android, and whether it's the cause of the test failures here? It seems surprising to me that this test would be falling over specifically, given we run other xpcshell tests which start child processes.
Comment 6•1 year ago
|
||
There is an r+ patch which didn't land and no activity in this bug for 2 weeks.
:Gijs, could you have a look please?
If you still have some work to do, you can add an action "Plan Changes" in Phabricator.
For more information, please visit BugBot documentation.
Reporter | ||
Comment 7•1 year ago
|
||
I've just about progressed to being able to run the test on an emulator locally and see failure, but it's still wholly unclear to me why it's failing.
Comment 8•1 year ago
|
||
Mentioning here as well for visibility: Those spamming logs are not new.
By just looking at the code related to those spamming Gecko runtime logs, looks like it's introduced way earlier (see first version: https://phabricator.services.mozilla.com/D108037).
So, my presumption is that those warnings are not the root cause of the failure.
Looping back to what Nika suggested about the use of remoteSubframes: true
for force enabling the fission for that frame, it did not help as well and could still reproduce the test failure on my local. I am a bit unfamiliar with the scope of this change but will spend some time on it on Monday to get familiarized with it.
Comment 9•1 year ago
|
||
We had a brief conversation in Slack about this a couple of weeks ago, I'll transfer this here for visibility:
I see I Gecko : console.error: (new Error("Fetching OHTTP config from http://localhost:35808/.wellknown/fickle failed with error 500", "resource://gre/modules/OHTTPConfigManager.sys.mjs", 47))
in the logcat in the immediate vicinity of the test. Gijs confirmed the URL looks right. So I am not sure if that server or file accessible from the mobile device (the emulator), because localhost
on an emulator would refer to something running on the emulator. I'm not sure what we usually do in mochitests on Android, but I don't see any other errors in the log. If it's posible to get that config in some other way, I would try that.
Reporter | ||
Comment 10•1 year ago
|
||
(In reply to [:owlish] 🦉 PST from comment #9)
We had a brief conversation in Slack about this a couple of weeks ago, I'll transfer this here for visibility:
I see
I Gecko : console.error: (new Error("Fetching OHTTP config from http://localhost:35808/.wellknown/fickle failed with error 500", "resource://gre/modules/OHTTPConfigManager.sys.mjs", 47))
in the logcat in the immediate vicinity of the test. Gijs confirmed the URL looks right. So I am not sure if that server or file accessible from the mobile device (the emulator), becauselocalhost
on an emulator would refer to something running on the emulator. I'm not sure what we usually do in mochitests on Android, but I don't see any other errors in the log. If it's posible to get that config in some other way, I would try that.
It's intended to fail; the test deliberately returns a 500 error, and the code under test deliberately logs this error (because it's an error!). And for clarity, it's an xpcshell test, not a mochitest.
Assignee | ||
Updated•11 months ago
|
Comment 11•11 months ago
|
||
Redirect a needinfo that is pending on an inactive user to the triage owner.
:jhirsch, since the bug has high priority, could you have a look please?
For more information, please visit BugBot documentation.
Reporter | ||
Updated•11 months ago
|
Reporter | ||
Updated•11 months ago
|
Assignee | ||
Updated•8 months ago
|
Assignee | ||
Comment 12•3 months ago
|
||
Stealing, going to see if I can sort out how to fix the tests.
Assignee | ||
Comment 13•3 months ago
|
||
I played around with this test today in a local android emulator. It seems that all the tests pass except the last one, test_out_of_process_use
, and that test hangs at this line:
let page = await XPCShellContentUtils.loadContentPage("about:certificate", {
remote: true,
});
If I change the argument to remote: false
, there is an interesting error message in the test output logs:
Mozilla crash reason: Missing chrome or resource URLs: chrome://global/skin/in-content/common.css
I'm out of time for today, but I have a few questions for folks on the bug already:
(1) Could we just disable the test_out_of_process_use
test, keeping the 6 others enabled, and land this patch? It rebases cleanly onto current m-c.
(2) Is the "missing chrome or resource URLs" crash a known thing with an easy fix? maybe an android-specific thing?
(3) I don't see XPCShellContentUtils.loadContentPage
used elsewhere in /netwerk
. Is it possible there's some kind of IPC setup that needs to be adjusted for this helper to work in this location? Is there a simple alternative we could use instead?
(Also, just as a side note, I noticed that Ctrl-C can't break the hang once it occurs, but if I kick off a different passing test in another terminal, that second xpcshell runner will seize control of the emulator, and after the second test finishes successfully, the hanging test will dump its output a few seconds later. This is helpful when testing locally.)
Reporter | ||
Comment 14•3 months ago
|
||
(In reply to Jared Hirsch [:jhirsch] (he/him) (Needinfo please) from comment #13)
I played around with this test today in a local android emulator.
Thanks for digging into this!
It seems that all the tests pass except the last one,
test_out_of_process_use
, and that test hangs at this line:let page = await XPCShellContentUtils.loadContentPage("about:certificate", { remote: true, });
If I change the argument to
remote: false
, there is an interesting error message in the test output logs:Mozilla crash reason: Missing chrome or resource URLs: chrome://global/skin/in-content/common.css
I'm out of time for today, but I have a few questions for folks on the bug already:
(1) Could we just disable the
test_out_of_process_use
test, keeping the 6 others enabled, and land this patch? It rebases cleanly onto current m-c.
I think that'd be reasonable. Would you like to do the honours, or do you want me to take this back? :-)
(2) Is the "missing chrome or resource URLs" crash a known thing with an easy fix? maybe an android-specific thing?
We crash deliberately (though only in automation!) because it shouldn't be the case that we ship pages/features on a platform/app that require resources that don't exist. So this is an Android bug where we ship about:certificate
on Android, but don't ship everything it needs (the toolkit + common styling). It should probably be filed as a separate follow-up bug: either we shouldn't ship about:certificate on android (obvious question: is there any way to load that on Android today?) or we should make it not depend on things that are not available, or we should make whatever it needs available on Android.
It'd be nice for the test if we had another page that requires the privileged about process that did work flawlessly on Android, but it doesn't look like we do. We could create one in the test but at this point I don't think it's worth it.
(It would perhaps be nice if we had a cross-test-framework way to reliably fail whatever test we were running, from C++ and JS - but we do not have one today other than "crash". It would also be useful if content process crashes were more obvious on Android... but that may be difficult to do.)
(3) I don't see
XPCShellContentUtils.loadContentPage
used elsewhere in/netwerk
. Is it possible there's some kind of IPC setup that needs to be adjusted for this helper to work in this location? Is there a simple alternative we could use instead?
Not that I'm aware of.
(Also, just as a side note, I noticed that Ctrl-C can't break the hang once it occurs, but if I kick off a different passing test in another terminal, that second xpcshell runner will seize control of the emulator, and after the second test finishes successfully, the hanging test will dump its output a few seconds later. This is helpful when testing locally.)
That's pretty funky - it may be worth flagging this to the Fenix / Android Components folks but I don't know if in practice they hardly ever run xpcshell tests...
Assignee | ||
Comment 15•3 months ago
|
||
I think that'd be reasonable. Would you like to do the honours, or do you want me to take this back? :-)
Sure, I can update the patch. I'll move the test_out_of_process_use
test to its own file, so that I can use the browser.toml skip-if to conditionally disable it on Android but run it on other platforms, and I'll file a followup bug to get it working on Android.
So this is an Android bug where we ship about:certificate on Android, but don't ship everything it needs (the toolkit + common styling). It should probably be filed as a separate follow-up bug: either we shouldn't ship about:certificate on android (obvious question: is there any way to load that on Android today?) or we should make it not depend on things that are not available, or we should make whatever it needs available on Android.
I'm able to load about:certificate in the emulator, so I guess this must be a test setup bug.
I'll also file followup bugs for the missing common.css file in android xpcshell setup, and for android xpcshell tests ignoring Ctrl-C when a process crash occurs.
Reporter | ||
Comment 16•3 months ago
•
|
||
(In reply to Jared Hirsch [:jhirsch] (he/him) (Needinfo please) from comment #15)
I think that'd be reasonable. Would you like to do the honours, or do you want me to take this back? :-)
Sure, I can update the patch. I'll move the
test_out_of_process_use
test to its own file, so that I can use the browser.toml skip-if to conditionally disable it on Android but run it on other platforms, and I'll file a followup bug to get it working on Android.
Thank you!
So this is an Android bug where we ship about:certificate on Android, but don't ship everything it needs (the toolkit + common styling). It should probably be filed as a separate follow-up bug: either we shouldn't ship about:certificate on android (obvious question: is there any way to load that on Android today?) or we should make it not depend on things that are not available, or we should make whatever it needs available on Android.
I'm able to load about:certificate in the emulator, so I guess this must be a test setup bug.
Sure, outside of automation / test running, it won't crash. But I bet the CSS file still isn't loading... and so the styling might look wonky if you were to open about:certificate
with an actual cert. This is also what happens to some network error pages, cf. bug 1810039 and some of the bugs in the "see also".
Edit: to be explicit about this, https://searchfox.org/mozilla-central/rev/64ddb621a0d3905fc2e3df475517d4163d377b22/toolkit/themes/shared/desktop-jar.inc.mn#59 and https://searchfox.org/mozilla-central/search?q=common.css&path=jar®exp=false make it fairly straightforward to know the file just isn't packaged at all on Android (irrespective of fenix vs. xpcshell or w/e).
Updated•3 months ago
|
Comment 17•3 months ago
|
||
Comment 18•3 months ago
|
||
bugherder |
Description
•