We should consider using a real-time Telemetry probe for system add-ons in the Add-ons Manager, for more rapid feedback when an update is made available.
Can you explain what kind of monitoring you want to build and who's going to be responsible for it? We've talked about specific probes for updates in general so that we can e.g. roll out main app updates to 10k people and shut it off automatically. Perhaps we need to think a little more generally about that problem space.
(In reply to Benjamin Smedberg [:bsmedberg] from comment #1)
> Can you explain what kind of monitoring you want to build and who's going to
> be responsible for it?

Right now I am just trying to make it possible to differentiate add-ons in the telemetry data. We're discussing (mostly in the go-faster meetings) who will build monitors and be responsible for monitoring the updates.

> We've talked about specific probes for updates in
> general so that we can e.g. roll out main app updates to 10k people and shut
> it off automatically. Perhaps we need to think a little more generally about
> that problem space.

I had a similar objective in mind, e.g. roll out updates to 10k people, then check via telemetry that approximately that number have the update activated. The reason we want this particular probe to be real-time is that add-ons can do restartless updates, so we should be able to tell very quickly if users are experiencing problems. I think this might be a bit trickier in the general update case, since we need to wait for users to restart, but there should be ways to mitigate this (for instance, we should be able to tell that an update is pending for x%, applied for y%, and failed for z%, or something along those lines).

Thinking a bit more broadly, it would be nice to have an overall release status dashboard+monitoring that covers all Firefox updates (regardless of whether they are "go-faster" sort of components or not), plus more specific dashboards+monitors for features that can update independently. Ideally there should be one place for folks who want to get the overall status.
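To make the pending/applied/failed idea a bit more concrete, here is a rough sketch of the kind of check a monitor could run. Everything in it is hypothetical: the `rollout_summary` name, the three status strings, and the 10% tolerance are illustrative assumptions, not an existing Telemetry API or ping format.

```python
from collections import Counter

# Hypothetical status labels; real pings may encode update state differently.
STATUSES = ("pending", "applied", "failed")


def rollout_summary(statuses, target, tolerance=0.1):
    """Summarize per-client update statuses for one rollout.

    statuses:  iterable of status strings, one per reporting client
    target:    number of clients the update was offered to (e.g. 10000)
    tolerance: allowed relative gap between applied count and target (assumed)
    """
    counts = Counter(statuses)
    total = sum(counts[s] for s in STATUSES)
    # Percentage of reporting clients in each state (0.0 when nothing reported).
    pct = {s: (100.0 * counts[s] / total if total else 0.0) for s in STATUSES}
    # "Approximately that number activated": applied count within tolerance of target.
    on_track = abs(counts["applied"] - target) <= tolerance * target
    return pct, on_track


if __name__ == "__main__":
    sample = ["applied"] * 9500 + ["pending"] * 400 + ["failed"] * 100
    print(rollout_summary(sample, target=10000))
```

A monitor built along these lines could shut off the rollout when `on_track` is false or the failed percentage crosses some threshold; the thresholds themselves would still need to be agreed on.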
I've been able to get what we need from the telemetry environment data we already have. If we need to collect more data temporarily with a real-time ping, we can do one-offs like bug 1307568.