How should we handle multiple app_ids from the same binary?
Categories
(Toolkit :: Telemetry, task, P1)
Tracking
()
People
(Reporter: chutten, Assigned: chutten)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
Firefox Desktop is (at present) unique amongst Glean SDK embedders in that it has multiple app ids (firefox_desktop
and firefox_desktop_background_update
(and soon also firefox_desktop_background_tasks
)) all served from the same binary.
We make this work in the pipeline by slicing up metrics_index.py
into segments by app_id, and using that to inform the pipeline which app has which metrics and pings. This stops e.g. firefox_desktop_background_update
from having a "newtab" ping, or any of its event
metrics.
However, since the lists of metrics and pings available within the client are determined by glean_parser
at compile time, firefox_desktop_background_update
can indeed submit a "newtab" ping and the SDK won't stop it. More importantly: any data that is to be sent in another app_id's ping will be successfully stored even though that app_id doesn't intend to send it and the pipeline isn't set up to receive it if it did.
In Glean v51 and earlier this was self-limiting. The maximum size of non-event
metrics is fixed (thank you, Sensible Limits as an architectural choice), and event
metrics would trigger the pings to be submitted automatically by the SDK. (See bug 1787918 and its friends for how the pipeline felt about this)
With bug 1716725 (coming in Glean v52) we've addressed the problem of the SDK sending another app_id's pings automatically just because there are events stored in it. But this puts us in the position where now we may store unlimited numbers of event
records.
This is bad. What should we do about it?
Assignee | ||
Comment 1•2 years ago
|
||
An idea: Limit storage to pings that have been registered. We document that embedders should call register_pings
before calling initialize
. Compliance levels with this directive are unknown, but at least FOG follows it. So we should know at init the name of all the pings we might need to store data for. FOG could generate a list of pings per-app-id and then register only the pings for that app id.
We could make this behaviour optional, like delay_ping_lifetime_io
: limit_storage_to_registered_pings
.
Implementation could take a few paths, like:
- Refuse to ever store any data for unregistered pings. Would be a big change, but would keep our storage tidy.
- Trim our storage at init to only registered pings. Smaller change. Probably works better with JOG which may want to add/change pings at runtime. (Though we could opt to set
limit_storage_to_registered_pings
tofalse
when we're a dev build)
Considerations:
- We don't know if all of our platforms and projects are currently good at registering their pings. We only seem to rely on that for
submit_ping_by_name
(at least as of v52+) so they might not bother.- That's why I'm suggesting embedders could opt into it with Configuration
- JOG. It specifically happens at an unknown time "later" in startup and may create new pings. I'm pretty sure it doesn't bother to even register them. But we should support the use case.
- If we go with Path 1, then JOG could be taught to register its pings. If we go with Path 2, then we could get away with a note in the docs that JOG doesn't support persisting ping-lifetime data across loads.
- Performance: Clearing the metric db of specific stores involves iterating the whole thing.
- Clearing the event stores is somewhat cheaper, but still not great. Neither are really designed to support this operation performantly.
Jan-Erik, what do you think? With Nimbus events in the background-update
ping, this becomes an immediate problem for Glean v52+ in Firefox Desktop.
Comment 2•2 years ago
|
||
(In reply to Chris H-C :chutten from comment #1)
Compliance levels with this directive are unknown, but at least FOG follows it.
Firefox and Focus iOS don't call it. But their custom pings don't contain events, so they would be fine.
With a fully general solution the problem is libraries though. They usually don't call registerPings
. We have at least one in a-c that has custom pings, I don't know if there are events in them (of course we could go and fix it for a-c).
So we will need the limit_storage_to_registered_pings
solution to have it for FOG.
I would prefer that config option with solution 1.
Doing the minimal thing JUST for events is surprisingly easy, it needs to happen in just one place: https://github.com/mozilla/glean/commit/890b4e3875dd8b7fe23a0fba56871cc4a92ef05f
(plus fixing up tests and wiring in limit_storage_to_registered_pings
and ... and ..., yes, not that easy anymore)
Only doing that for events now might be doable though without too much work. We can extend that to non-event metrics afterwards.
That would allow us to land the Glean update rather soon without causing the mentioned issues.
We can then also file follow-up work to get other consumers to correctly call register_pings
and figure out what libraries should do so eventually it becomes standard.
Comment 3•2 years ago
|
||
Comment 4•2 years ago
|
||
With the PR up: I think option 2: trimming events storage on startup we're fine as well.
While in theory this behavior is visible from the outside, we're only setting the flag on FOG where we know we register all pings correctly for now.
Thus it's about the same as just not storing that data (and for now JOG doesn't do pings anyway).
In the future we can still swap that around to drop data if we don't know a ping.
Comment 5•2 years ago
|
||
Description
•