Closed Bug 1845340 Opened 1 year ago Closed 3 months ago

Consider ensuring shopping's OHTTP keyconfig caching happens in the parent process

Categories

(Firefox :: Shopping, task, P1)

Desktop
All
task

Tracking

()

RESOLVED FIXED
129 Branch
Tracking Status
firefox129 --- fixed

People

(Reporter: Gijs, Assigned: jhirsch)

References

(Blocks 1 open bug)

Details

(Whiteboard: [fidefe-shopping])

Attachments

(1 file)

Right now the ohttp stuff will be requested in the privileged about process and that is also where the config is loaded.

Because the process is typically long-lived that's not the end of the world, but it'd be good if the cache enforced that it lived in the parent.

The right solution would probably be to run a JSProcessActor and use that to communicate the keyconfig.

Assignee: nobody → gijskruitbosch+bugs
Status: NEW → ASSIGNED
Severity: -- → N/A
Priority: -- → P1
Pushed by gijskruitbosch@gmail.com: https://hg.mozilla.org/integration/autoland/rev/82405e95ba59 force ohttp config fetches to live in the parent process instead of wherever OHTTPConfigManager is called, r=nika
Whiteboard: [fidefe-shopping]

Backed out for causing xpcshell failures in test_ohttp_config_manager.js.

Flags: needinfo?(gijskruitbosch+bugs)

I'm not sure how to proceed here. The test fails on android. I'm not in a good position to debug why. The errors seem not directly related to the code under test:

07-31 13:09:35.171 25497 25537 D ServiceAllocator: org.mozilla.gecko.process.GeckoChildProcessServices$gpu updateBindings: FOREGROUND priority, 0 importance, 3 successful binds, 0 failed binds, 0 successful unbinds
07-31 13:09:35.173  4683  4704 W ServiceChildProcess: This process belongs to a different GeckoRuntime owner: 99550679-1aaf-4fce-92c5-1caf969c6290 process: fde8e882-6b2f-437b-a281-e74329c4719a

gets repeated a lot and suggests there's a process management issue or an issue with the pre-existing XPCShellContentUtils or how they are used by the test -- but things work fine on desktop. It's unlikely that the code under test is used on Android from something other than the main/parent process, and it's also not clear that the test failure actually means something is broken about the code under test even if it were used from a privilegedabout process on android.

Nika or Kris, is there a better option here besides landing this with the test disabled on android with a follow-up filed to figure out what is broken here?

Flags: needinfo?(nika)
Flags: needinfo?(kmaglione+bmo)
Flags: needinfo?(gijskruitbosch+bugs)

:kaya, do you know what might be causing this error to be happening on Android, and whether it's the cause of the test failures here? It seems surprising to me that this test would be falling over specifically, given we run other xpcshell tests which start child processes.

Flags: needinfo?(nika) → needinfo?(kkaya)

There is an r+ patch which didn't land and no activity in this bug for 2 weeks.
:Gijs, could you have a look please?
If you still have some work to do, you can add an action "Plan Changes" in Phabricator.
For more information, please visit BugBot documentation.

Flags: needinfo?(nika)
Flags: needinfo?(gijskruitbosch+bugs)

I've just about progressed to being able to run the test on an emulator locally and see failure, but it's still wholly unclear to me why it's failing.

Flags: needinfo?(nika)
Flags: needinfo?(gijskruitbosch+bugs)

Mentioning here as well for visibility: Those spamming logs are not new.

By just looking at the code related to those spamming Gecko runtime logs, looks like it's introduced way earlier (see first version: https://phabricator.services.mozilla.com/D108037).
So, my presumption is that those warnings are not the root cause of the failure.
Looping back to what Nika suggested about the use of remoteSubframes: true for force enabling the fission for that frame, it did not help as well and could still reproduce the test failure on my local. I am a bit unfamiliar with the scope of this change but will spend some time on it on Monday to get familiarized with it.

Flags: needinfo?(kkaya)

We had a brief conversation in Slack about this a couple of weeks ago, I'll transfer this here for visibility:

I see I Gecko : console.error: (new Error("Fetching OHTTP config from http://localhost:35808/.wellknown/fickle failed with error 500", "resource://gre/modules/OHTTPConfigManager.sys.mjs", 47)) in the logcat in the immediate vicinity of the test. Gijs confirmed the URL looks right. So I am not sure if that server or file accessible from the mobile device (the emulator), because localhost on an emulator would refer to something running on the emulator. I'm not sure what we usually do in mochitests on Android, but I don't see any other errors in the log. If it's posible to get that config in some other way, I would try that.

(In reply to [:owlish] 🦉 PST from comment #9)

We had a brief conversation in Slack about this a couple of weeks ago, I'll transfer this here for visibility:

I see I Gecko : console.error: (new Error("Fetching OHTTP config from http://localhost:35808/.wellknown/fickle failed with error 500", "resource://gre/modules/OHTTPConfigManager.sys.mjs", 47)) in the logcat in the immediate vicinity of the test. Gijs confirmed the URL looks right. So I am not sure if that server or file accessible from the mobile device (the emulator), because localhost on an emulator would refer to something running on the emulator. I'm not sure what we usually do in mochitests on Android, but I don't see any other errors in the log. If it's posible to get that config in some other way, I would try that.

It's intended to fail; the test deliberately returns a 500 error, and the code under test deliberately logs this error (because it's an error!). And for clarity, it's an xpcshell test, not a mochitest.

Blocks: shopping2023
No longer blocks: 1840451

Redirect a needinfo that is pending on an inactive user to the triage owner.
:jhirsch, since the bug has high priority, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(kmaglione+bmo) → needinfo?(jhirsch)
Flags: needinfo?(jhirsch)
Flags: needinfo?(gijskruitbosch+bugs)
Severity: N/A → S3

Stealing, going to see if I can sort out how to fix the tests.

Assignee: gijskruitbosch+bugs → jhirsch
Flags: needinfo?(gijskruitbosch+bugs)

I played around with this test today in a local android emulator. It seems that all the tests pass except the last one, test_out_of_process_use, and that test hangs at this line:

let page = await XPCShellContentUtils.loadContentPage("about:certificate", {
  remote: true,
}); 

If I change the argument to remote: false, there is an interesting error message in the test output logs:

Mozilla crash reason: Missing chrome or resource URLs: chrome://global/skin/in-content/common.css

I'm out of time for today, but I have a few questions for folks on the bug already:

(1) Could we just disable the test_out_of_process_use test, keeping the 6 others enabled, and land this patch? It rebases cleanly onto current m-c.

(2) Is the "missing chrome or resource URLs" crash a known thing with an easy fix? maybe an android-specific thing?

(3) I don't see XPCShellContentUtils.loadContentPage used elsewhere in /netwerk. Is it possible there's some kind of IPC setup that needs to be adjusted for this helper to work in this location? Is there a simple alternative we could use instead?

(Also, just as a side note, I noticed that Ctrl-C can't break the hang once it occurs, but if I kick off a different passing test in another terminal, that second xpcshell runner will seize control of the emulator, and after the second test finishes successfully, the hanging test will dump its output a few seconds later. This is helpful when testing locally.)

Flags: needinfo?(gijskruitbosch+bugs)

(In reply to Jared Hirsch [:jhirsch] (he/him) (Needinfo please) from comment #13)

I played around with this test today in a local android emulator.

Thanks for digging into this!

It seems that all the tests pass except the last one, test_out_of_process_use, and that test hangs at this line:

let page = await XPCShellContentUtils.loadContentPage("about:certificate", {
  remote: true,
}); 

If I change the argument to remote: false, there is an interesting error message in the test output logs:

Mozilla crash reason: Missing chrome or resource URLs: chrome://global/skin/in-content/common.css

I'm out of time for today, but I have a few questions for folks on the bug already:

(1) Could we just disable the test_out_of_process_use test, keeping the 6 others enabled, and land this patch? It rebases cleanly onto current m-c.

I think that'd be reasonable. Would you like to do the honours, or do you want me to take this back? :-)

(2) Is the "missing chrome or resource URLs" crash a known thing with an easy fix? maybe an android-specific thing?

We crash deliberately (though only in automation!) because it shouldn't be the case that we ship pages/features on a platform/app that require resources that don't exist. So this is an Android bug where we ship about:certificate on Android, but don't ship everything it needs (the toolkit + common styling). It should probably be filed as a separate follow-up bug: either we shouldn't ship about:certificate on android (obvious question: is there any way to load that on Android today?) or we should make it not depend on things that are not available, or we should make whatever it needs available on Android.

It'd be nice for the test if we had another page that requires the privileged about process that did work flawlessly on Android, but it doesn't look like we do. We could create one in the test but at this point I don't think it's worth it.

(It would perhaps be nice if we had a cross-test-framework way to reliably fail whatever test we were running, from C++ and JS - but we do not have one today other than "crash". It would also be useful if content process crashes were more obvious on Android... but that may be difficult to do.)

(3) I don't see XPCShellContentUtils.loadContentPage used elsewhere in /netwerk. Is it possible there's some kind of IPC setup that needs to be adjusted for this helper to work in this location? Is there a simple alternative we could use instead?

Not that I'm aware of.

(Also, just as a side note, I noticed that Ctrl-C can't break the hang once it occurs, but if I kick off a different passing test in another terminal, that second xpcshell runner will seize control of the emulator, and after the second test finishes successfully, the hanging test will dump its output a few seconds later. This is helpful when testing locally.)

That's pretty funky - it may be worth flagging this to the Fenix / Android Components folks but I don't know if in practice they hardly ever run xpcshell tests...

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(jhirsch)

I think that'd be reasonable. Would you like to do the honours, or do you want me to take this back? :-)

Sure, I can update the patch. I'll move the test_out_of_process_use test to its own file, so that I can use the browser.toml skip-if to conditionally disable it on Android but run it on other platforms, and I'll file a followup bug to get it working on Android.

So this is an Android bug where we ship about:certificate on Android, but don't ship everything it needs (the toolkit + common styling). It should probably be filed as a separate follow-up bug: either we shouldn't ship about:certificate on android (obvious question: is there any way to load that on Android today?) or we should make it not depend on things that are not available, or we should make whatever it needs available on Android.

I'm able to load about:certificate in the emulator, so I guess this must be a test setup bug.

I'll also file followup bugs for the missing common.css file in android xpcshell setup, and for android xpcshell tests ignoring Ctrl-C when a process crash occurs.

Flags: needinfo?(jhirsch)

(In reply to Jared Hirsch [:jhirsch] (he/him) (Needinfo please) from comment #15)

I think that'd be reasonable. Would you like to do the honours, or do you want me to take this back? :-)

Sure, I can update the patch. I'll move the test_out_of_process_use test to its own file, so that I can use the browser.toml skip-if to conditionally disable it on Android but run it on other platforms, and I'll file a followup bug to get it working on Android.

Thank you!

So this is an Android bug where we ship about:certificate on Android, but don't ship everything it needs (the toolkit + common styling). It should probably be filed as a separate follow-up bug: either we shouldn't ship about:certificate on android (obvious question: is there any way to load that on Android today?) or we should make it not depend on things that are not available, or we should make whatever it needs available on Android.

I'm able to load about:certificate in the emulator, so I guess this must be a test setup bug.

Sure, outside of automation / test running, it won't crash. But I bet the CSS file still isn't loading... and so the styling might look wonky if you were to open about:certificate with an actual cert. This is also what happens to some network error pages, cf. bug 1810039 and some of the bugs in the "see also".

Edit: to be explicit about this, https://searchfox.org/mozilla-central/rev/64ddb621a0d3905fc2e3df475517d4163d377b22/toolkit/themes/shared/desktop-jar.inc.mn#59 and https://searchfox.org/mozilla-central/search?q=common.css&path=jar&regexp=false make it fairly straightforward to know the file just isn't packaged at all on Android (irrespective of fenix vs. xpcshell or w/e).

Attachment #9345801 - Attachment description: Bug 1845340 - force ohttp config fetches to live in the parent process instead of wherever OHTTPConfigManager is called, r?#necko-reviewers!,nika! → Bug 1845340 - force ohttp config fetches to live in the parent process instead of wherever HPKEConfigManager is called,r?#necko-reviewers!,nika!,gijs!
Pushed by jhirsch@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/5edc789fd939 force ohttp config fetches to live in the parent process instead of wherever HPKEConfigManager is called,r=nika,Gijs
Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → 129 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: