Reporter

Comment 1

8 months ago

The ones from the Normandy test suite in bug 1859188 aren't quite perma but are frequent and also look "interesting" in how they're associated with a specific round of tests.

Comment 2

8 months ago

One thing I noticed when taking a scan over your patches is that there is a behaviour change from the old code as to when the importESModule() call is being made for the lazy tasks from browser-idle-startup. Previously, these modules were declared as lazy module getters on the lazy object, and accessed from within the lazy startup tasks, meaning that they wouldn't be loaded until idle.

Under the new code path, the modules are now synchronously loaded during CatManListenerManager.getListeners() (https://searchfox.org/mozilla-central/rev/ae4d3bc2af37eef804113b621455883a92a29e9c/toolkit/modules/BrowserUtils.sys.mjs#38-40), which happens outside of the ChromeUtils.idleDispatch(...) call (https://searchfox.org/mozilla-central/rev/ae4d3bc2af37eef804113b621455883a92a29e9c/toolkit/modules/BrowserUtils.sys.mjs#519-527). Perhaps some of these modules did not expect to be loaded until idle, and so are acting in a strange way leading to leaks when being loaded early.
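A toy model of the timing change described above may make it easier to see. This is not the real BrowserUtils code: `importModule` is a hypothetical stand-in for ChromeUtils.importESModule that only logs when it is called, and the idle step is simulated with a marker instead of ChromeUtils.idleDispatch.

```javascript
// Toy model only. `importModule` is a hypothetical stand-in for
// ChromeUtils.importESModule that records when it is called.
const events = [];
const fakeModules = {
  "resource://example/Foo.sys.mjs": {
    Foo: { init() { events.push("Foo.init ran"); } },
  },
};
function importModule(uri) {
  events.push(`imported ${uri}`);
  return fakeModules[uri];
}

// Eager: the import happens while the listener list is being built.
function getListenersEager(entries) {
  return entries.map(([uri, value]) => {
    const [objName, method] = value.split(".");
    const obj = importModule(uri)[objName];
    return (...args) => obj[method](...args);
  });
}

// Lazy: the import is deferred until the task itself runs.
function getListenersLazy(entries) {
  return entries.map(([uri, value]) => {
    const [objName, method] = value.split(".");
    return (...args) => importModule(uri)[objName][method](...args);
  });
}

// Build the tasks "now", then run them "at idle" (simulated by a marker).
function runScenario(getListeners) {
  events.length = 0;
  const tasks = getListeners([["resource://example/Foo.sys.mjs", "Foo.init"]]);
  events.push("--- idle ---");
  tasks.forEach(task => task());
  return [...events];
}

console.log(runScenario(getListenersEager));
console.log(runScenario(getListenersLazy));
```

In the eager run the import is logged before the idle marker; in the lazy run it is logged after it, which matches the old lazy-getter behaviour.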

It should be relatively straightforward to test this theory. The easiest way is probably to make the loading of the module lazy again by moving the import so it only happens when the fn is called here: https://searchfox.org/mozilla-central/rev/ae4d3bc2af37eef804113b621455883a92a29e9c/toolkit/modules/BrowserUtils.sys.mjs#38-42. It could be as simple as this, though you could also get fancier with caching and stuff (off the top of my head, it doesn't look like these are called in tight loops right now, though?):

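```javascript
// `module` (the module URI) and `value` ("Object.method") come from the
// category entry being processed in getListeners().
let [objName, method] = value.split(".");
// Import only when the task actually runs, not while building the list.
let fn = (...args) => ChromeUtils.importESModule(module)[objName][method](...args);
fn._descriptiveName = value;
```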
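If caching were wanted, one possible shape is below. It is a sketch, not the actual BrowserUtils code: `makeLazyTask` is a hypothetical helper, and `importModule` is passed in so it runs outside Firefox (in the tree it would be ChromeUtils.importESModule). Since an ES module is only evaluated once, repeated importESModule calls should already hand back the same module, so caching here mainly saves the repeated property lookup.

```javascript
// Hypothetical caching variant of the lazy wrapper.
function makeLazyTask(moduleURI, value, importModule) {
  const [objName, method] = value.split(".");
  let target = null; // the exported object, looked up on the first call only
  const fn = (...args) => {
    if (target === null) {
      target = importModule(moduleURI)[objName];
    }
    return target[method](...args);
  };
  fn._descriptiveName = value;
  return fn;
}

// Usage with a fake importer that counts how often it is called.
let importCalls = 0;
const fakeImport = () => {
  importCalls++;
  return { Counter: { n: 0, bump(by) { return (this.n += by); } } };
};
const bump = makeLazyTask("resource://example/Counter.sys.mjs", "Counter.bump", fakeImport);
console.log(importCalls); // 0: nothing imported yet
bump(1);
bump(2);
console.log(importCalls); // 1
```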

Reporter

Comment 3

8 months ago

(In reply to Nika Layzell [:nika] (ni? for response) from comment #2)

> One thing I noticed when taking a scan over your patches is that there is a behaviour change from the old code as to when the importESModule() call is being made for the lazy tasks from browser-idle-startup. Previously, these modules were declared as lazy module getters on the lazy object, and accessed from within the lazy startup tasks, meaning that they wouldn't be loaded until idle.

I thought I fixed this in one of those patches. See in particular https://phabricator.services.mozilla.com/D220897#8238755 .

https://hg.mozilla.org/mozilla-central/file/75a21e462ddba79b675e2811bd85b4ba9d885d46/toolkit/modules/BrowserUtils.sys.mjs#l38 (searchfox hasn't updated at time of writing) shows that getListeners returns an array of functions. Those functions are the tasks that get run (in some cases, on idle), and although a little more convoluted than your example, they do basically boil down to:

```javascript
let [objName, method] = value.split(".");
let fn = (...args) => ChromeUtils.importESModule(module)[objName][method](...args);
fn._descriptiveName = value;
```

right?

Flags: needinfo?(nika)

Comment 4

8 months ago

Could you please attach a patch to this bug (on top of bug 1916424) that will reproduce the issue? I want to be able to reproduce the problem locally to investigate it.

Also, the summary talks about LSan, but leaks also show up in the XPCOM leak checker.

Flags: needinfo?(continuation) → needinfo?(gijskruitbosch+bugs)

Reporter

Comment 5

8 months ago (edited)

(In reply to Andrew McCreight [:mccr8] from comment #4)

> Could you please attach a patch to this bug (on top of bug 1916424) that will reproduce the issue? I want to be able to reproduce the problem locally to investigate it.

I've pushed to try: https://treeherder.mozilla.org/jobs?repo=try&revision=0089a421bc39366564bf9df91137a2006f3e9d4e. I'm not next to the relevant VM to verify that that patch does reproduce the issue, so feel free to wait until the jobs there reproduce the problem, but if memory serves, that commit is what I did yesterday. It matches the interdiff at https://phabricator.services.mozilla.com/D220898?vs=986514&id=987504#toc .

> Also, the summary talks about LSan, but leaks also show up in the XPCOM leak checker.

I'm sure I'm missing or misunderstanding something, but where does that show up in https://treeherder.mozilla.org/logviewer?job_id=495482744&repo=autoland&lineNumber=7760 ?

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(continuation)

Comment 6

8 months ago

(In reply to :Gijs (he/him) from comment #5)

> I'm sure I'm missing or misunderstanding something, but where does that show up in https://treeherder.mozilla.org/logviewer?job_id=495482744&repo=autoland&lineNumber=7760 ?

It is the Linux debug bc15 failure from that same push. ASan builds run the LSan leak checker, while debug builds run the XPCOM leak checker.

Flags: needinfo?(continuation)

Summary: LSAN/ASAN leaks when switching SERPTelemetry.sys.mjs uninit calls to use category manager mechanism when running browser_search_glean_serp_event_telemetry_categorization_enabled_by_pref.js → Leaks when switching SERPTelemetry.sys.mjs uninit calls to use category manager mechanism when running browser_search_glean_serp_event_telemetry_categorization_enabled_by_pref.js

Comment 7

8 months ago

I added some logging to both uninit calls and ran browser_search_glean_serp_event_telemetry_categorization_enabled_by_pref.js.

Without the patch from your try run (non-leaking) we have this:

- SERPCategorizer uninit() runs right after the test_enable_experiment_when_pref_is_not_enabled section starts.
- It runs again right after "Turn pref off" happens in that same section.
- After the test is done, in the "TEST-START | Shutdown" section, we run the TelemetryHandler uninit, followed by the SERPCategorizer uninit.

With the patch from your try run (leaking), it looks the same EXCEPT that the TelemetryHandler uninit doesn't run at shutdown.

Comment 8

8 months ago

I also tried converting only SearchSERPTelemetry.uninit or only SearchSERPCategorization.uninit. Neither of those leaks.

Reporter

Comment 9

8 months ago

(In reply to Andrew McCreight [:mccr8] from comment #8)

> I also tried converting only SearchSERPTelemetry.uninit or only SearchSERPCategorization.uninit. Neither of those leaks.

I feel dumb now. So maybe the simple issue is that the category manager treats each entry name (the first identifier) within a given category as unique, and because these two things are both in the same module, only one of the uninit calls gets registered...
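A toy model of that theory: a registry keyed by (category, entry name), where a second registration with the same key replaces the first. Whether the real category manager keeps the first or the last entry is an assumption here; the point is that only one survives. The category name and module URI are illustrative, not the ones in the tree.

```javascript
// Toy model only: entries are keyed by (category, entry name), and a later
// registration with the same key replaces the earlier one (assumed "last wins").
class ToyCategoryRegistry {
  constructor() {
    this.categories = new Map();
  }
  register(category, entry, value) {
    if (!this.categories.has(category)) {
      this.categories.set(category, new Map());
    }
    this.categories.get(category).set(entry, value);
  }
  entries(category) {
    return [...(this.categories.get(category) ?? [])];
  }
}

// Both uninit hooks live in the same module, so with the module URI as the
// entry name they collide on a single key.
const registry = new ToyCategoryRegistry();
const CATEGORY = "example-quit-granted"; // illustrative category name
const SERP_MODULE = "resource:///modules/SearchSERPTelemetry.sys.mjs"; // illustrative URI
registry.register(CATEGORY, SERP_MODULE, "SearchSERPTelemetry.uninit");
registry.register(CATEGORY, SERP_MODULE, "SearchSERPCategorization.uninit");
console.log(registry.entries(CATEGORY).length); // 1
```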

Reporter

Comment 10

8 months ago (edited)

OK, so my assumption, that either the catman wouldn't care about double-keying or that (notification, module URI) would be a unique enough key not to cause problems, is clearly not working out.

I don't really remember if I had originally (summer last year) realized the issue that we're running into here, but the initial design used the exported object as the key, which would have avoided this. It was changed in review to the (much more ergonomic / readable!) syntax that is there now.

I note that I haven't yet bothered to try to convert onWindowsRestored. The module initialization is split between idle and that, so the init there (https://searchfox.org/mozilla-central/rev/83e29f5ee2e301ac7224e2927bddda16634b1897/browser/components/BrowserGlue.sys.mjs#2398) doesn't conflict with the other class which is now having init called from idle via the catman path already, as the "topic" / key would be different between startup idle and onwindowsrestored. No, I don't off-hand know why they're initialized at different times, and uninitialized at the same time...

I see a few somewhat obvious paths forward:

1. Revisit the ordering/keying decision (there aren't that many consumers yet, and they'd be easy to update mechanically). We could make the object.method bit the key and the module the "value" part, which arguably isn't really any less readable? I'd have to remember to update the all_files_referenced test to cater for it as well, but otherwise it'd be a pretty mechanical change.
2. If we think SERP is a bit of a special snowflake and the only case where this would happen, we could redesign the search SERP module so it has a single uninit/"quit-granted" handling method that calls the two tasks individually. It doesn't seem particularly valuable to have them dealt with separately (though in principle the current setup is slightly more resilient: an error in one of the two uninit calls can't stop the other from running, and preserving that would need a bit of try...catch-ing inside the module).
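A minimal sketch of the second option, including the try...catch needed to keep one failure from skipping the other uninit. `SearchSERPShutdown` is a hypothetical name for the combined handler, and the two object bodies are stand-ins rather than the real SERP code.

```javascript
const calls = [];
const errors = [];

// Stand-ins for the two objects exported by the module.
const SearchSERPTelemetry = {
  uninit() { calls.push("SearchSERPTelemetry.uninit"); },
};
const SearchSERPCategorization = {
  uninit() { throw new Error("simulated failure"); },
};

// Hypothetical combined handler: the single category entry would point here.
const SearchSERPShutdown = {
  uninit() {
    for (const obj of [SearchSERPCategorization, SearchSERPTelemetry]) {
      try {
        obj.uninit();
      } catch (e) {
        // Record and keep going so one failing uninit doesn't skip the other.
        errors.push(e.message);
      }
    }
  },
};

SearchSERPShutdown.uninit();
console.log(calls, errors);
```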

Although I don't love another big change to how this works, the downside of the second option is that it leaves in place what feels like a footgun that we should address. We don't even have easy linting options for the component manifest files. So I'm leaning towards the first option. Mossop, does that seem OK, or am I missing a third, better option? :-)
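A sketch of what the first option's swapped layout could look like when building tasks: the entry name becomes "Object.method" and the value becomes the module URI, so two hooks from one module get distinct keys. `listenersFromEntries` is a hypothetical helper, `importModule` stands in for ChromeUtils.importESModule, and the URI is illustrative. A collision would then require two modules exporting the same Object.method name.

```javascript
// Option 1 layout: entry name = "Object.method", value = module URI.
const SERP_MODULE = "resource:///modules/SearchSERPTelemetry.sys.mjs"; // illustrative
const quitGranted = new Map([
  ["SearchSERPTelemetry.uninit", SERP_MODULE],
  ["SearchSERPCategorization.uninit", SERP_MODULE],
]);

// Build lazy tasks from [entryName, moduleURI] pairs.
function listenersFromEntries(entries, importModule) {
  return [...entries].map(([entry, moduleURI]) => {
    const [objName, method] = entry.split(".");
    const fn = (...args) => importModule(moduleURI)[objName][method](...args);
    fn._descriptiveName = entry;
    return fn;
  });
}

// Fake module standing in for the real one.
const ran = [];
const fakeModule = {
  SearchSERPTelemetry: { uninit() { ran.push("telemetry"); } },
  SearchSERPCategorization: { uninit() { ran.push("categorization"); } },
};
for (const task of listenersFromEntries(quitGranted, () => fakeModule)) {
  task();
}
console.log(ran); // both tasks run, in insertion order
```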

Flags: needinfo?(nika) → needinfo?(dtownsend)

Dave Townsend [:mossop]

Comment 11

8 months ago

(In reply to :Gijs (he/him) from comment #10)

> OK, so my thinking that either the catman didn't care about double-keying or that (notification, module uri) would be a unique enough key to not cause problems, is clearly not working out.
>
> I don't really remember if I had originally (summer last year) realized the issue that we're running into here, but the initial design used the exported object as the key, which would have avoided this. It was changed in review to the (much more ergonomic / readable!) syntax that is there now.
>
> I note that I haven't yet bothered to try to convert onWindowsRestored. The module initialization is split between idle and that, so the init there (https://searchfox.org/mozilla-central/rev/83e29f5ee2e301ac7224e2927bddda16634b1897/browser/components/BrowserGlue.sys.mjs#2398) doesn't conflict with the other class which is now having init called from idle via the catman path already, as the "topic" / key would be different between startup idle and onwindowsrestored. No, I don't off-hand know why they're initialized at different times, and uninitialized at the same time...
>
> I see a few somewhat obvious paths forward:
>
> revisit the ordering/keying decision (there's not that many consumers yet and they'd be easy to mechanically update). We could make the object.method bit the key and the module the "value" part, which arguably isn't really any less readable? I'd have to remember to update the all_files_referenced test to cater for it as well, but otherwise it'd be a pretty mechanical change.
>
> if we think SERP is a bit of a special snowflake and the only case where this would happen, we could redesign the search SERP module so it has 1 single uninit/"quit-granted" handling method that just individually calls the two tasks. It doesn't seem particularly valuable to have them dealt with separately (though I suppose in principle the current situation is slightly more resilient to errors in one of the two modules not causing us to neglect to uninit the other one, which would need a bit of try...catch-ing inside the module to deal with).
>
> Although I don't love another big change to how this works, the downside of the second option is that this feels like a footgun that we should address. We don't even have easy linting options for the component manifest files. So I'm leaning towards the first option. Mossop, does that seem OK or am I missing a third, better option? :-)

I vaguely recall that we did discuss this possibility and considered it rare enough that you would just do option 2 if you needed to do two things in the same module. It should be straightforward to detect this problem and warn about it, maybe even crash in debug builds, in the manifest parser code.
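The detection could look roughly like this. It is only a JavaScript model of the idea: the real manifest parsing happens in the component manager's C++ code, and `collectCategoryEntries` and `isDebugBuild` are hypothetical names.

```javascript
// Hypothetical duplicate check while collecting category entries.
function collectCategoryEntries(declarations, { isDebugBuild }) {
  const entries = new Map(); // JSON [category, entry] -> value
  const warnings = [];
  for (const { category, entry, value } of declarations) {
    const key = JSON.stringify([category, entry]);
    if (entries.has(key)) {
      const message =
        `Duplicate entry "${entry}" in category "${category}": ` +
        `"${entries.get(key)}" would be replaced by "${value}"`;
      if (isDebugBuild) {
        throw new Error(message); // fail loudly in debug builds
      }
      warnings.push(message); // only warn in release builds
    }
    entries.set(key, value);
  }
  return { entries, warnings };
}

// Illustrative declarations reproducing the SERP collision.
const decls = [
  { category: "example-quit-granted", entry: "resource:///modules/SearchSERPTelemetry.sys.mjs", value: "SearchSERPTelemetry.uninit" },
  { category: "example-quit-granted", entry: "resource:///modules/SearchSERPTelemetry.sys.mjs", value: "SearchSERPCategorization.uninit" },
];
const { warnings } = collectCategoryEntries(decls, { isDebugBuild: false });
console.log(warnings.length); // 1
```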

Flags: needinfo?(dtownsend)

Assignee

Comment 12

8 months ago

:Gijs, is another possibility to move one of the instances that needs to be un-inited (namely SERP Categorization) out of SearchSERPTelemetry and into a separate file? In some ways, it makes sense to have SERP Telemetry and SERP Categorization separate, since they currently have different populations, report data to two different destinations (the latter uses OHTTP and doesn't include client data), and it remains to be seen whether the latter will have long-term support.

Flags: needinfo?(gijskruitbosch+bugs)

Reporter

Comment 13

8 months ago

(In reply to James Teow [:jteow] from comment #12)

> :Gijs is another possibility to move one of the instances that needs to un-inited (namely SERP Categorization) out from SearchSERPTelemetry and into a separate file? In some ways, it makes sense to have SERP Telemetry and SERP Categorization separate since they currently have different populations, report data to two different sources (the latter uses OHTTP and doesn't include client data), and it remains to be seen whether the latter will have long term support.

Yes, that would also work! If this is work that (better) aligns with what you want to do anyway, that would make a lot of sense to do.

Could use this bug or a separate one - I would like to do what Mossop suggested in comment #11 though and also crash in debug builds if people register the same entry for the same category twice (as that smells like it'd be a bug either way).

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(jteow)

Assignee

Comment 14

8 months ago

Understood, thank you. I'll use this bug, since using the category manager is a strong reason for doing the code refactor. I've given this a P2 as I can work on it next sprint (and thus in the next release cycle), but LMK if it needs uplifting.

Assignee: nobody → jteow

Severity: -- → S3

Flags: needinfo?(jteow)

Priority: -- → P2

Whiteboard: [sng]

Jira Integration Bot

Updated

8 months ago

See Also: → https://mozilla-hub.atlassian.net/browse/SNG-2264

Assignee

Comment 15

8 months ago

Attached file Bug 1949294 - Part 1: Move SERP categorization into its own component file - r?scunnane!

This includes some renaming for consistency and to reduce redundancy.

Assignee

Comment 16

8 months ago

Attached file Bug 1949294 - Part 2: Un-init SERP categorization and telemetry with category manager - r?gijs!

Depends on D240039