ClientEnvironment.jsm and TelemetryEnvironment.jsm require browser's AttributionCode which breaks on non-browser apps (e.g. Fenix)
Categories
(Toolkit :: Telemetry, defect, P4)
Tracking
()
Tracking | Status | |
---|---|---|
firefox92 | --- | affected |
People
(Reporter: Gijs, Unassigned)
References
Details
In bug 1721627 I'm experimenting with making tests fail if we request internal resources that aren't present. The idea is to prevent patches from landing if they mis-spell (or forget to hg add
, or forget to include in jar.mn
or similar) subresources like CSS files and images, by failing if we hit chrome: or resource: URLs in tests that are not available (ie where the internal jar/file code cannot find the relevant file).
On android, resource:///modules/AttributionCode.jsm is getting flagged up as a URL that is not present. Some of these are guarded in try...catches, but some are not.
The relevant bits should probably be behind AppConstants.MOZ_BUILD_APP == "browser"
checks.
Though, to be fair, I also do not see any non-test uses of the ClientEnvironment static attribution
property, so perhaps removing that altogether would be the easier option?
Comment 1•3 years ago
|
||
I'm a little confused as to why the Environment JSMs are present in Fenix. Telemetry doesn't work on Fenix (well, bits of the core do, but they're libxul, not JSMs), so there shouldn't be any of it around, no?
Updated•3 years ago
|
Reporter | ||
Comment 2•3 years ago
|
||
(In reply to Chris H-C :chutten from comment #1)
I'm a little confused as to why the Environment JSMs are present in Fenix. Telemetry doesn't work on Fenix (well, bits of the core do, but they're libxul, not JSMs), so there shouldn't be any of it around, no?
I'm afraid I'm not an android expert, and I don't know to what degree these JSMs being present is expected or not.
At least the commit that added the AttributionCode.jsm bit to ClientEnvironment (bug 1578885) mentions remote settings and normandy, which might exist on android? I also don't see anything in https://searchfox.org/mozilla-central/source/toolkit/components/telemetry/moz.build that excludes these JSMs from omni.ja for android, but I'd expect a bunch of toolkit would break if Services.telemetry
was suddenly null?
In terms of why I know this is happening, I'm afraid all I have is a C++ stack which is not much use without the JS info: https://treeherder.mozilla.org/logviewer?job_id=346178124&repo=try&lineNumber=25450 . The patch logged:
07-23 17:27:11.038 4177 4192 I Gecko : Missing file: resource:///modules/AttributionCode.jsm
on logcat, on this try job and https://searchfox.org/mozilla-central/search?q=AttributionCode.jsm only shows ClientEnvironment.jsm and TelemetryEnvironment.jsm in non-browser/
, non-test code. I guess the former isn't part of telemetry though, despite what the name might suggest - so perhaps I filed this in the wrong place? As I noted in comment #0 I don't see how we're accessing that property at all, but if that's the culprit then we must be doing so, somehow! I'm hoping Rob can help here...
Comment 3•3 years ago
|
||
(In reply to :Gijs (he/him) from comment #2)
(In reply to Chris H-C :chutten from comment #1)
I'm a little confused as to why the Environment JSMs are present in Fenix. Telemetry doesn't work on Fenix (well, bits of the core do, but they're libxul, not JSMs), so there shouldn't be any of it around, no?
I'm afraid I'm not an android expert, and I don't know to what degree these JSMs being present is expected or not.
At least the commit that added the AttributionCode.jsm bit to ClientEnvironment (bug 1578885) mentions remote settings and normandy, which might exist on android? I also don't see anything in https://searchfox.org/mozilla-central/source/toolkit/components/telemetry/moz.build that excludes these JSMs from omni.ja for android, but I'd expect a bunch of toolkit would break if
Services.telemetry
was suddenly null?In terms of why I know this is happening, I'm afraid all I have is a C++ stack which is not much use without the JS info: https://treeherder.mozilla.org/logviewer?job_id=346178124&repo=try&lineNumber=25450 . The patch logged:
07-23 17:27:11.038 4177 4192 I Gecko : Missing file: resource:///modules/AttributionCode.jsm
on logcat, on this try job and https://searchfox.org/mozilla-central/search?q=AttributionCode.jsm only shows ClientEnvironment.jsm and TelemetryEnvironment.jsm in non-
browser/
, non-test code. I guess the former isn't part of telemetry though, despite what the name might suggest - so perhaps I filed this in the wrong place? As I noted in comment #0 I don't see how we're accessing that property at all, but if that's the culprit then we must be doing so, somehow! I'm hoping Rob can help here...
ClientEnvironment.jsm
(at least at the time) was used by Normandy as well as Activity Stream and Remote Settings, the goal with bug 1578885 was to have the attribution data available very early in startup so it could be used for experiments (showing a user a different first-run page if they had a certain attribution code, etc).
This is hard to debug without more info, like the JS call stack or something... offhand though I'd say it could be coming from one of those three components, is ASRouter or newtab about:welcome used on Android? https://searchfox.org/mozilla-central/search?q=AttributionCode.jsm&path=
Comment 4•3 years ago
|
||
I am struggling to figure out how ClientEnvironment.jsm
is implicated, but looking at https://searchfox.org/mozilla-central/search?q=ClientEnvironment.jsm&path= I see that Nimbus and messaging-system import it as well.
Reporter | ||
Comment 5•3 years ago
|
||
I've now spent half a day trying to get a local android build to work, to no avail (the emulator refuses to start). Alternatively I could keep debugging through try, but that's slow and tedious (though...).
Reporter | ||
Comment 6•3 years ago
|
||
(In reply to :Gijs (he/him) from comment #5)
I've now spent half a day trying to get a local android build to work, to no avail (the emulator refuses to start). Alternatively I could keep debugging through try, but that's slow and tedious (though...).
I have finally gotten this working, but of course it now turns out I cannot reproduce locally... 🙃
I also tried reproducing with the original rev, but I can't get the emulator to work locally with that, so that's not going to fly either. I also tried some other suspicions I had, but I didn't get anywhere - I guess this may need to remain a mystery... If it comes back we can reopen with more details. I'll make sure the patch in 1721627 logs JS stacks on debug builds on infra if it fails.
Reporter | ||
Comment 7•3 years ago
|
||
This went orange when I actually landed the patch: https://treeherder.mozilla.org/jobs?repo=autoland&selectedTaskRun=fBM6W5jET9G93oUKWSDduQ.0&revision=54e2a2ae58ddc48dcfab4bdcbcdc855d50aae7b9
Here's the JS stack:
08-02 17:25:33.565 3346 3361 D Gecko : 0 _updateAttribution() ["resource://gre/modules/TelemetryEnvironment.jsm":1670:6]
08-02 17:25:33.565 3346 3361 D Gecko : 1 _updateSettings() ["resource://gre/modules/TelemetryEnvironment.jsm":1591:9]
08-02 17:25:33.565 3346 3361 D Gecko : 2 EnvironmentCache() ["resource://gre/modules/TelemetryEnvironment.jsm":945:16]
08-02 17:25:33.565 3346 3361 D Gecko : 3 getGlobal() ["resource://gre/modules/TelemetryEnvironment.jsm":101:25]
08-02 17:25:33.565 3346 3361 D Gecko : 4 get currentEnvironment() ["resource://gre/modules/TelemetryEnvironment.jsm":108:11]
08-02 17:25:33.565 3346 3361 D Gecko : 5 assemblePing() ["resource://gre/modules/TelemetryControllerParent.jsm":425:0]
08-02 17:25:33.565 3346 3361 D Gecko : 6 _submitPingLogic() ["resource://gre/modules/TelemetryControllerParent.jsm":487:24]
08-02 17:25:33.565 3346 3361 D Gecko : 7 InterpretGeneratorResume() ["self-hosted":1482:33]
08-02 17:25:33.565 3346 3361 D Gecko : 8 AsyncFunctionNext() ["self-hosted":692:26]
Is the expectation thus that this doesn't run on android/fenix?
Comment 8•3 years ago
|
||
My expectation is that the code shouldn't even be present on Fenix, let alone be running. Firefox Telemetry isn't sent from Fenix, Glean data is.
To throw in a theory: perhaps Firefox Telemetry is present on Android builds because of Fennec (though it had its own written-in-Java Telemetry networking stack, IIRC), and Fenix inherited it?
ni?pocmo Do you have any idea of the history/mechanics are that result in Firefox Telemetry being present and active in Fenix (where it isn't used to send Telemetry)?
Updated•3 years ago
|
Comment 9•3 years ago
|
||
I have to forward this question to agi.
Reporter | ||
Comment 10•3 years ago
|
||
I think the patch from bug 1723628 (https://hg.mozilla.org/mozilla-central/rev/a349aee25be1) may have fixed the immediate issue here, but it seems worth keeping this bug open to figure out the question of how this code is even running on android, and if we have an opportunity to improve android perf and app size by unshipping some of the relevant telemetry bits on android...
Comment 11•3 years ago
|
||
We chatted a little bit with :chutten on matrix, this is likely due to the telemetry ping that we send. I'm not sure if anybody is using that (maybe gfx is?) Unfortunately we're temporarily really low on staff and I don't have time to look at this more in depth.
Comment 12•3 years ago
|
||
I've done some more research and the environment-esque datapoints GFX echoes to Geckoview don't come from TelemetryEnvironment.jsm.
Near as I can tell there's no data reason for Telemetry JSMs to be included in Geckoview builds.
Agi, is there a "quick" way we can try removing toolkit/components/telemetry/app/*
from gv builds to see if things go awry? I imagine they take up a lot of space and removing them could be a quick perf win. Is it a matter of editing some manifest somewhere that I can try?
Reporter | ||
Comment 13•3 years ago
|
||
(In reply to Chris H-C :chutten from comment #12)
I've done some more research and the environment-esque datapoints GFX echoes to Geckoview don't come from TelemetryEnvironment.jsm.
Near as I can tell there's no data reason for Telemetry JSMs to be included in Geckoview builds.
Agi, is there a "quick" way we can try removing
toolkit/components/telemetry/app/*
from gv builds to see if things go awry? I imagine they take up a lot of space and removing them could be a quick perf win. Is it a matter of editing some manifest somewhere that I can try?
You can try changing the relevant moz.build
files that include the jsm
files into omni.ja, so that they don't do that on android. No idea how much that's gonna break though! :-)
Comment 14•3 years ago
|
||
That ^ just add if CONFIG["MOZ_BUILD_APP"] != "mobile/android"
to the highest level moz.build
that includes those files.
Comment 15•3 years ago
|
||
Adding a naive if CONFIG["MOZ_BUILD_APP"] != "mobile/android" and not CONFIG["TESTS"]:
to toolkit/components/telemetry/moz.build
just above adding the JSMs was enough to get tests to fail entertainingly. I'm not even sure it's my code change that caused these failures because... well, the mochitests are in a retry loop and xpcshell can't find something called @/data/local/tmp/test_root/xpc/head.js:1739:69
whatever that means. (Maybe that's the line and column of an unresolved import?). They're definitely broken. And I definitely made a change. But I don't immediately see how the latter causes the former from the logs.
I guess I should set up a local Android build if I want to pursue this more. I wonder if choosing a non-desktop option in mach bootstrap
will ruin my Desktop build config...
Reporter | ||
Comment 16•3 years ago
|
||
(In reply to Chris H-C :chutten (PTO, back August 16) from comment #15)
I guess I should set up a local Android build if I want to pursue this more. I wonder if choosing a non-desktop option in
mach bootstrap
will ruin my Desktop build config...
Just running bootstrap will not - it will just download all the android stuff you need for the android build. You could probably get away with an artifact build if this is just about the JSMs and not about the compiled bits of telemetry. However, I would strongly encourage you to keep a separate objdir and mozconfig file for the android build, unless you really like clobbering and rebuilding things all the time. mozconfigwrapper
can help with that.
Comment 17•3 years ago
|
||
Thanks for the tips! I've been manually keeping separate objdirs using commented-out lines in my mozconfig for years. Next time I set up my build environment I'm definitely getting mozconfigwrapper to do it for me.
Hm, that's odd. about:memory
on geckoview-example has very few Telemetry jsms at all. Fenix has a few more in the parent process.... but no matter what there's no TelemetryEnvironment anyplace. What's up with that.
Maybe this isn't worth that much investigation after all? But then what fell afoul of the original test?
Reporter | ||
Comment 18•3 years ago
|
||
(In reply to Chris H-C :chutten from comment #17)
Thanks for the tips! I've been manually keeping separate objdirs using commented-out lines in my mozconfig for years. Next time I set up my build environment I'm definitely getting mozconfigwrapper to do it for me.
Hm, that's odd.
about:memory
on geckoview-example has very few Telemetry jsms at all. Fenix has a few more in the parent process.... but no matter what there's no TelemetryEnvironment anyplace. What's up with that.Maybe this isn't worth that much investigation after all? But then what fell afoul of the original test?
My suspicion is that some infrequent thing causes us to attempt to submit a telemetry ping. I imagine that rules out the main ping (unless it's only happening to test jobs that happen to run when the subsession split happens, ie across midnight or whatever? That would be fun...).
If we're not sure what's loading these files I have a few options to suggest:
- stop packaging them entirely on android (ie make some/all of https://searchfox.org/mozilla-central/rev/d3683dbb252506400c71256ef3994cdbdfb71ada/toolkit/components/telemetry/moz.build#94 conditional on us not being on android) and push to try, see what breaks
- less drastically, make importing the TelemetryEnvironment.jsm file break or log something unique (which I guess is not super different from the former point, now that bug 1721627 has landed - attempting to import the file that isn't packaged will cause a crash if it happens in automated tests)
- use profiler markers (they register imports) to figure out what is importing the file first
- try to reason about what would/could have caused the
_submitPingLogic
call stack in comment 7...
Comment 19•3 years ago
|
||
(In reply to :Gijs (he/him) from comment #18)
(In reply to Chris H-C :chutten from comment #17)
Thanks for the tips! I've been manually keeping separate objdirs using commented-out lines in my mozconfig for years. Next time I set up my build environment I'm definitely getting mozconfigwrapper to do it for me.
Hm, that's odd.
about:memory
on geckoview-example has very few Telemetry jsms at all. Fenix has a few more in the parent process.... but no matter what there's no TelemetryEnvironment anyplace. What's up with that.Maybe this isn't worth that much investigation after all? But then what fell afoul of the original test?
My suspicion is that some infrequent thing causes us to attempt to submit a telemetry ping. I imagine that rules out the main ping (unless it's only happening to test jobs that happen to run when the subsession split happens, ie across midnight or whatever? That would be fun...).
If we're not sure what's loading these files I have a few options to suggest:
- stop packaging them entirely on android (ie make some/all of https://searchfox.org/mozilla-central/rev/d3683dbb252506400c71256ef3994cdbdfb71ada/toolkit/components/telemetry/moz.build#94 conditional on us not being on android) and push to try, see what breaks
That's comment 12 . Failed entertainingly. Maybe I'll try a reduced set now that I have a working Android build.
- less drastically, make importing the TelemetryEnvironment.jsm file break or log something unique (which I guess is not super different from the former point, now that bug 1721627 has landed - attempting to import the file that isn't packaged will cause a crash if it happens in automated tests)
- use profiler markers (they register imports) to figure out what is importing the file first
Since bug 1721627 I'm more interested in just how much of Telemetry I can remove from GV without exploding things, so I'm disinclined to pursue these.
- try to reason about what would/could have caused the
_submitPingLogic
call stack in comment 7...
Sadly it could be anything calling submitExternalPing
: that stack is just the async
(I don't know what to call it... the Promise resumption?) of that sync call.
I'll try nixxing everything from my local Android build and see where I can go from here. Getting tests to run is a bit of a bear, so I may still be on a long-latency try
-bottlenecked dev cycle in the meantime.
Comment 20•3 years ago
|
||
Further investigation will take lower priority in favour of migrating Telemetry to Glean so we can "just" tear it all out from all shipping products.
Updated•3 years ago
|
Comment 21•2 years ago
|
||
For the next person who sees this: I pushed a try build to experiment with this that:
- removed the JSMs (on Android)
- also removed the component category registrations (on Android)
Everything fails! The xpcshell
tests show lots of places that import Telemetry, like the test harness itself and the AddonManager
, for example. The mochitests crash early, but I don't see quite where. So doing this properly requires a good deal more effort than "just don't package" the relevant JSMs.
Description
•