Open Bug 1899430 Opened 4 months ago Updated 3 months ago

Add telemetry counting time when multiple Firefox instances (profiles) are running concurrently

Categories

(Toolkit :: Application Update, enhancement, P3)

People

(Reporter: nalexander, Unassigned)

Details

(Whiteboard: [fidedi-ope])

The Update Service knows when it is not the only Firefox instance running: that's what the multi-instance lock determines, and it's checked periodically in https://searchfox.org/mozilla-central/rev/f60bb10a5fe6936f9e9f9e8a90d52c18a0ffd818/toolkit/mozapps/update/UpdateService.sys.mjs#450. This ticket tracks adding telemetry instrumentation to that check (and potentially expanding when we poll the multi-instance lock). That is, I'd like to start polling the multi-instance lock status more generally while Firefox is running to help us understand concurrent multi-profile Firefox usage.

Per discussion with Travis Long, it seems like blended instrumentation might be best. First, we record Glean events for rising and falling edges (i.e., when we witness a single Firefox instance changing to multiple instances and vice versa). Second, we record counts of the number of times we checked, and the number of times we did (or did not) witness multiple Firefox instances. That'll let us answer questions like:

  • was this session always the only Firefox profile running?
  • for what proportion of this session was this the only Firefox profile running?
  • (maybe) how many other Firefox profiles were running? At various intervals?
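
To make the blended approach concrete, here is a minimal sketch of the bookkeeping in Python. It is a model only, not Firefox code: the class name `MultiInstanceTally`, its fields, and the `"rising"`/`"falling"` labels are all hypothetical, and a plain list and two integers stand in for the Glean events and counters. It assumes each poll of the multi-instance lock yields a boolean, and it does not treat the first observation of a session as an edge.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class MultiInstanceTally:
    """Hypothetical in-memory stand-in for the Glean events and counters."""

    checks: int = 0                 # how many times we polled the lock
    checks_with_others: int = 0     # polls that saw another instance
    edges: List[Tuple[str, float]] = field(default_factory=list)
    last_seen: Optional[bool] = None  # result of the previous poll, if any

    def record_check(self, others_running: bool, timestamp: float) -> None:
        """Record one poll; emit an edge only when the state changes."""
        self.checks += 1
        if others_running:
            self.checks_with_others += 1
        if self.last_seen is not None and others_running != self.last_seen:
            kind = "rising" if others_running else "falling"
            self.edges.append((kind, timestamp))
        self.last_seen = others_running

    def always_alone(self) -> bool:
        """True if at least one poll happened and none saw another instance."""
        return self.checks > 0 and self.checks_with_others == 0

    def proportion_alone(self) -> Optional[float]:
        """Fraction of polls (not of wall-clock time) with no other instance."""
        if self.checks == 0:
            return None
        return (self.checks - self.checks_with_others) / self.checks
```

Note that `proportion_alone` is a proportion of checks, so it only approximates a proportion of session time if the polls are roughly evenly spaced. The third question (how many other profiles were running) cannot be answered from a boolean lock status alone.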

I did some digging into how we might achieve this technically. Essential context: we're planning to remove the multi-instance lock update service state machine affordances in favour of additional update state and net-new logic at startup to "bypass updates" to avoid updating when there are other running instances. That means we won't have a timer checking the multi-instance lock status hanging around; this work would either need another periodic timer or an alternate approach.

A periodic timer would work fine, but it would cause some I/O and Firefox wake-ups that we'd prefer to avoid. I expect it would be possible to run the timer only during periods of interactive usage, but that might make any data collected harder to interpret.

It would be better to have the OS tell us about attempts to take the multi-instance lock. Unfortunately, I see no way to witness lock operations from the OS other than listening to ETW (Event Tracing for Windows) events. That's a very heavy approach to this problem, one we'd definitely prefer to avoid. We might be able to make regular file-watching primitives work by writing a nonce (our parent process PID, for example) each time we take the multi-instance lock... but it's not ideal. We really do want a richer primitive than a file lock to understand this.

For Windows, it is possible to list processes that have a lock on a file: see, for example, https://stackoverflow.com/a/47510579. It looks less obvious how to do this on POSIX systems, although Linux has /proc support for querying things like this (e.g., https://utcc.utoronto.ca/~cks/space/blog/linux/LslocksNotes).
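
For the Linux case, a rough sketch of the /proc approach might look like the following Python. It assumes the usual `/proc/locks` line layout (ordinal, lock type, advisory/mandatory, read/write, PID, `major:minor:inode`, start, end), with blocked waiters marked by `->` after the ordinal. The function names are hypothetical, and matching on inode alone ignores the device numbers, so two filesystems with a colliding inode number would be confused. The reported PID can also be `-1` for some lock types, and PIDs may not be meaningful across PID namespaces.

```python
import os
from typing import List, Tuple


def parse_proc_locks(text: str) -> List[Tuple[int, int]]:
    """Return (pid, inode) for each held lock in /proc/locks-style text."""
    holders = []
    for line in text.splitlines():
        fields = line.split()
        # Skip malformed lines and blocked waiters ("N: -> ...").
        if len(fields) < 8 or fields[1] == "->":
            continue
        pid = int(fields[4])
        # fields[5] is "major:minor:inode"; the inode is the last part.
        inode = int(fields[5].rsplit(":", 1)[1])
        holders.append((pid, inode))
    return holders


def pids_holding_inode(locks_text: str, inode: int) -> List[int]:
    """Sorted, de-duplicated PIDs holding a lock on the given inode."""
    return sorted({pid for pid, ino in parse_proc_locks(locks_text) if ino == inode})


def pids_locking_file(path: str) -> List[int]:
    """Linux-only convenience wrapper: look up `path` in the live /proc/locks."""
    with open("/proc/locks") as f:
        return pids_holding_inode(f.read(), os.stat(path).st_ino)
```

Counting the entries returned for the multi-instance lock file would, under these assumptions, give a rough answer to "how many other Firefox profiles were running", but only on Linux and only by polling.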

Priority: -- → P3