Closed Bug 1804915 Opened 2 years ago Closed 2 years ago

How should we handle multiple app_ids from the same binary?

Categories

(Toolkit :: Telemetry, task, P1)

task

Tracking

()

RESOLVED FIXED

People

(Reporter: chutten, Assigned: chutten)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Firefox Desktop is (at present) unique amongst Glean SDK embedders in that it has multiple app ids (firefox_desktop and firefox_desktop_background_update (and soon also firefox_desktop_background_tasks)) all served from the same binary.

We make this work in the pipeline by slicing up metrics_index.py into segments by app_id, and using that to inform the pipeline which app has which metrics and pings. This stops e.g. firefox_desktop_background_update from having a "newtab" ping, or any of its event metrics.

However, since the lists of metrics and pings available within the client are determined by glean_parser at compile time, firefox_desktop_background_update can indeed submit a "newtab" ping and the SDK won't stop it. More importantly: any data that is to be sent in another app_id's ping will be successfully stored even though that app_id doesn't intend to send it and the pipeline isn't set up to receive it if it did.

In Glean v51 and earlier this was self-limiting. The maximum size of non-event metrics is fixed (thank you, Sensible Limits as an architectural choice), and event metrics would trigger the pings to be submitted automatically by the SDK. (See bug 1787918 and its friends for how the pipeline felt about this)

With bug 1716725 (coming in Glean v52) we've addressed the problem of the SDK sending another app_id's pings automatically just because there are events stored in it. But this puts us in the position where now we may store unlimited numbers of event records.

This is bad. What should we do about it?

An idea: Limit storage to pings that have been registered. We document that embedders should call register_pings before calling initialize. Compliance levels with this directive are unknown, but at least FOG follows it. So we should know at init the name of all the pings we might need to store data for. FOG could generate a list of pings per-app-id and then register only the pings for that app id.

We could make this behaviour optional, like delay_ping_lifetime_io: limit_storage_to_registered_pings.

Implementation could take a few paths, like:

  1. Refuse to ever store any data for unregistered pings. Would be a big change, but would keep our storage tidy.
  2. Trim our storage at init to only registered pings. Smaller change. Probably works better with JOG which may want to add/change pings at runtime. (Though we could opt to set limit_storage_to_registered_pings to false when we're a dev build)

Considerations:

  • We don't know if all of our platforms and projects are currently good at registering their pings. We only seem to rely on that for submit_ping_by_name (at least as of v52+) so they might not bother.
    • That's why I'm suggesting embedders could opt into it with Configuration
  • JOG. It specifically happens at an unknown time "later" in startup and may create new pings. I'm pretty sure it doesn't bother to even register them. But we should support the use case.
    • If we go with Path 1, then JOG could be taught to register its pings. If we go with Path 2, then we could get away with a note in the docs that JOG doesn't support persisting ping-lifetime data across loads.
  • Performance: Clearing the metric db of specific stores involves iterating the whole thing.
    • Clearing the event stores is somewhat cheaper, but still not great. Neither are really designed to support this operation performantly.

Jan-Erik, what do you think? With Nimbus events in the background-update ping, this becomes an immediate problem for Glean v52+ in Firefox Desktop.

Flags: needinfo?(jrediger)

(In reply to Chris H-C :chutten from comment #1)

Compliance levels with this directive are unknown, but at least FOG follows it.

Firefox and Focus iOS don't call it. But their custom pings don't contain events, so they would be fine.
With a fully general solution the problem is libraries though. They usually don't call registerPings. We have at least one in a-c that has custom pings, I don't know if there are events in them (of course we could go and fix it for a-c).

So we will need the limit_storage_to_registered_pings solution to have it for FOG.

I would prefer that config option with solution 1.
Doing the minimal thing JUST for events is surprisingly easy, it needs to happen in just one place: https://github.com/mozilla/glean/commit/890b4e3875dd8b7fe23a0fba56871cc4a92ef05f
(plus fixing up tests and wiring in limit_storage_to_registered_pings and ... and ..., yes, not that easy anymore)

Only doing that for events now might be doable though without too much work. We can extend that to non-event metrics afterwards.
That would allow us to land the Glean update rather soon without causing the mentioned issues.

We can then also file follow-up work to get other consumers to correctly call register_pings and figure out what libraries should do so eventually it becomes standard.

Flags: needinfo?(jrediger)

With the PR up: I think option 2: trimming events storage on startup we're fine as well.
While in theory this behavior is visible from the outside, we're only setting the flag on FOG where we know we register all pings correctly for now.
Thus it's about the same as just not storing that data (and for now JOG doesn't do pings anyway).

In the future we can still swap that around to drop data if we don't know a ping.

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
See Also: → 1805427
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: