(In reply to Chris H-C :chutten from comment #7)
I wonder if part of the problem we're presently having with Glean SDK lifetimes is that they're not orderable. By which I mean: user-lifetime is always at least as long as application-lifetime... but with application-lifetime and ping-lifetime either can be longer than the other.
Yes, your understanding of the lifetimes is correct.
User is the longest possible one.
Application is as long as the app process lives.
Ping lasts until the next time that ping is sent.
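To make the three lifetimes concrete, here is a toy sketch in Python. It is not the Glean SDK's storage code: `MetricStore`, its methods, and the metric names are all made up for illustration, and a single in-memory object stands in for storage that persists across process restarts. It follows the reset rules as described in this comment: ping-lifetime data is cleared when the ping is sent, application-lifetime data is cleared on application start, and user-lifetime data is never cleared.

```python
from enum import Enum


class Lifetime(Enum):
    PING = "ping"
    APPLICATION = "application"
    USER = "user"


class MetricStore:
    """Hypothetical model of the three lifetimes (not a Glean SDK API)."""

    def __init__(self):
        # One bucket of values per lifetime.
        self._values = {lifetime: {} for lifetime in Lifetime}

    def set(self, lifetime, name, value):
        self._values[lifetime][name] = value

    def snapshot(self):
        # What a ping assembled right now would contain.
        merged = {}
        for values in self._values.values():
            merged.update(values)
        return merged

    def on_ping_sent(self):
        # Ping lifetime: cleared once the ping has been sent.
        self._values[Lifetime.PING].clear()

    def on_application_start(self):
        # Application lifetime: cleared when the app process starts.
        self._values[Lifetime.APPLICATION].clear()


store = MetricStore()
store.set(Lifetime.USER, "client_id", "abc")               # hypothetical metric
store.set(Lifetime.APPLICATION, "gfx_feature", "enabled")  # hypothetical metric
store.set(Lifetime.PING, "tabs_opened", 3)                 # hypothetical metric

# The process restarts before the ping is sent: the ping-lifetime value
# outlives the application-lifetime one.
store.on_application_start()
after_restart = store.snapshot()

# Later the ping is sent: only the user-lifetime value is left.
store.on_ping_sent()
after_ping = store.snapshot()
```

In this run the ping-lifetime value survives the restart while the application-lifetime value does not, which is the "either can be longer than the other" situation from the quote.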
This can be a feature. For custom pings where the ping lifetime is controlled by the component putting metrics in it, a ping-lifetime that's flexible can be really nice. (imagine short-lived pings for onboarding events or long-lived pings aggregating a week's worth of federated learning adjustments)
This is indeed a feature :) For pings that are sent at a different frequency than the one given by the application process lifetime (which is usually the case on mobile! Process lifetime can be short!), this is a requirement. Otherwise consumers would need to deal with persistence themselves.
However, that doesn't work so well for non-custom pings where the lifetime is under the Glean SDK's control.
I respectfully disagree with this :-)
The reason things mostly still work with our setup is that our metric types support composition/aggregation (A complete gfx.composite_time would be across the entire session a compositor is active, but since there's no way to tie that to a ping lifetime we ensure that timing distributions can be combined (stable bucket layouts ftw) and then put the burden on analysis to combine them as appropriate).
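A rough Python sketch of why a stable bucket layout makes distributions combinable at analysis time. This is only an illustration: the fixed-width layout, the helper names, and the sample values are assumptions made up here, not the bucketing the Glean SDK actually uses.

```python
from collections import Counter

BUCKET_WIDTH = 10  # assumed fixed layout, shared by every ping


def bucket_lower_bound(sample):
    """Map a sample to the lower bound of its bucket."""
    return (sample // BUCKET_WIDTH) * BUCKET_WIDTH


def accumulate(samples):
    """Build a bucket -> count mapping for one ping's worth of samples."""
    return Counter(bucket_lower_bound(s) for s in samples)


def combine(*distributions):
    """Sum counts bucket by bucket; valid because the layout never changes."""
    total = Counter()
    for distribution in distributions:
        total.update(distribution)
    return total


# Hypothetical samples from one session, split across two pings.
ping_1 = accumulate([3, 12, 14])
ping_2 = accumulate([11, 47])

combined = combine(ping_1, ping_2)
whole_session = accumulate([3, 12, 14, 11, 47])
```

Because both pings bucket their samples the same way, summing the per-bucket counts gives the same result as recording the whole session in one distribution.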
(( In Firefox Desktop we of course fix this by not having lifetimes, but also by ensuring built-in pings' and metrics' lifetimes never exceed the app session's length. (ie, you could think of Firefox having application-lifetime always being longer than ping-lifetime (because we persist only pings, not metrics). ))
On Desktop we have a related/similar problem: we do have the concept of "application lifetime" metrics there and we can get, as far as I understand, pings that are lacking certain sets of metrics. The difference is that the semantics are a bit unclear and hidden under the hood: think, for example, about all the deferred messages we listen to when filling in the environment. Depending on when they get hit, we might get a partial environment in, for example, the new-profile ping or shutdown main-pings (for short sessions).
I'm not saying that's common, just saying that's possible: we've had bugs/questions about fields being missing from pings for this exact reason before!
For the "gfx application-lifetime metrics aren't working as we hoped" problem case the ideal solution could be a custom ping (though how we get that through Project EXTRACT is anyone's guess D: ). For the "pings sent with the current application-lifetime metrics even if the ping-lifetime metrics are from a previous app session" problem case we'd still need some sort of application-lifetime metric persistence, if only for the special cases of things inside
Custom pings are possible, if needed ;-) But let's keep this discussion focused on the lifetimes: from the original design doc:
application: the metric contains a property that is related to the application, and is reset on application starts. It is not reset after sending it in a ping.
This seems to be a bug on our side: we implemented a behaviour that's different from the one we spec'd :(
The real changes would only revolve around startup, for pings that get assembled by Glean during its init. If Fenix inits GV in a deferred way/later, then I'm afraid this is really a problem with Fenix/A-C/GV that needs to be solved there, not in the Glean SDK. But I think that's easily solvable as well: Fenix could somehow trigger GV telemetry to always be set when it starts.
But if the MPS redesign gives us an orderable sort for lifetimes within Glean-SDK-owned pings, then we might have a chance. If the "metrics" ping is never longer than an application lifetime, then we end up in a pseudo-Firefox-Desktop situation where most of the corner cases are solvable. (the setUploadEnabled case gets a little weird, but we might be able to get around that with cleverness).
I'm afraid that won't be the case: the MPS won't really change the way lifetimes are defined. It addresses a different set of problems (force-close resilience among them!).