Closed Bug 1203245 Opened 9 years ago Closed 5 years ago

Include the last system add-on update check in telemetry

Component: Toolkit :: Add-ons Manager
Type: defect
Priority: Not set
Severity: normal
Status: RESOLVED INACTIVE
Tracking Status: firefox43 affected
Reporter: mossop
Assignee: Unassigned

So we can be sure that Android etc. are getting updates reasonably often, we'd like to track when system add-on updates happen. An easy way to do this would be to include the time of the last system add-on update check in the telemetry environment section, but there are other options.
I'm not sure of the best way to do this. It looks like the app update service submits the days since last check during a check, but if the system add-on update check doesn't happen as it should, we might not get any data. I could just submit the days since last check during startup.
Flags: needinfo?(gfritzsche)
(In reply to Dave Townsend [:mossop] from comment #1)
> I'm not sure of the best way to do this. Looks like the app update service
> submits the days since last check during a check but if the system add-on
> update check doesn't happen like it should then this might mean we don't get
> data. I could just submit the days since last check during startup.

I don't think this needs to be in the environment; adding a new Telemetry probe should be enough (Histograms.json).
Does only submitting it on startup answer the questions you want to ask?
Or should we update this whenever updates happened? That way you get the latest state with each submitted subsession.
This will be opt-out on release, I assume?
Flags: needinfo?(gfritzsche)
(In reply to Georg Fritzsche [:gfritzsche] from comment #2)
> I don't think this needs to be in the environment, just adding a new
> Telemetry probe should be enough (Histograms.json).
> Does only submitting it on startup answer the questions you want to ask?
> Or should we update this whenever updates happened? That way you get the
> latest state with each submitted subsession.

What updates are you referring to here? The question we want to answer is: "are clients checking for updated system add-ons reasonably frequently?" The question mainly comes from Fennec, where there are concerns that regular update checks don't happen often because Gecko isn't running for long enough, but it is useful for verifying the feature on desktop too.

> This will be opt-out on release i assume?

Yes
(In reply to Dave Townsend [:mossop] from comment #3)
> What updates are you referring to here?

Whenever a check happens, should we update that histogram?
My understanding of mobile app lifetimes is that they could be very long (a whole lifetime spanning multiple foreground/background switches), so multiple checks might happen during that session that we would not see until Fennec is started again.
Another thought, more for analysis I guess: if I didn't run Fennec for a year and then start it again, should the "time since last check" include that year?
(In reply to Georg Fritzsche [:gfritzsche] from comment #4)
> Whenever a check happens, should we update that histogram?

You mean a system add-on update check?

> My understanding of mobile app lifetimes is that they could be very long
> (whole life-time over multiple foreground/background switches) - multiple
> checks might happen during that session that we would not see until Fennec
> is started again.

rnewman tells me otherwise, but I'll let him weigh in on that.

> Another thought, more for analysis i guess: if i didn't run Fennec for a
> year, then start it again, should the "time since last check" include that
> year?

Maybe, but I'd expect it to be an outlier and not affect the aggregate stats much.
Flags: needinfo?(rnewman)
(In reply to Dave Townsend [:mossop] from comment #5)
> You mean a system add-on update check?

Yes, the checks whose delays this will measure.
(In reply to Dave Townsend [:mossop] from comment #5)
> > My understanding of mobile app lifetimes is that they could be very long
> > (whole life-time over multiple foreground/background switches) - multiple
> > checks might happen during that session that we would not see until Fennec
> > is started again.
> 
> rnewman tells me otherwise, but I'll let him weigh in on that.

Depends what you mean by "app lifetime".

The application itself can last for an indeterminate amount of time, and will be created and killed by the OS at arbitrary times to run background services.

The main browser *activity*, which is what users will think of as the app, lasts for a shorter amount of time.

Gecko's lifespan lasts for a subset of the application lifetime, triggered by launching the activity, and only killed when the application dies.

Foreground activity time is a subset of each activity lifetime.

A monospaced timeline:


---------[ Application                               ]----[ Application                     ]-
------------[ Activity          ]-[ Activity     ]--------------------------------------------
--------------[ Gecko                               ]-----------------------------------------
-------------( Foreground )--(FF)--(FFFF)-----(FF)--------------------------------------------
---------------( Background services )---------------------( Background )---------------------
Flags: needinfo?(rnewman)
Thanks Richard.
So Gecko's lifetime can be arbitrarily long, which means we should update this measurement rather than only setting it on startup.
E.g. when update checks happen, or when the app comes to the foreground, ...?
(In reply to Georg Fritzsche [:gfritzsche] from comment #8)
> Thanks Richard.
> So Gecko life-time can be arbitrarily long, so we should update this
> measurement and not only set it on startup.
> E.g. when update checks happen, or when the app comes to the foreground, ...?

I don't think I know enough about how telemetry works to figure out what to do here. We basically want to know that users are checking for updates reasonably frequently. That gets messy when users don't run the app often, so I don't know how we'd factor that into the results.
Currently the Unified Telemetry design for Fennec is a work in progress, so it's a little harder to find the best approach.

One approach that should work properly independent of the Telemetry subsession design:
* use a histogram to store "time since last check" (with the lowest precision possible: days, hours, minutes, ...?)
* early after/at startup, write the latest check time from the updater code
* whenever a new update check happens, update the histogram from the updater code (using .reset() & .add() on the histogram)
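
The three steps above can be sketched roughly as follows. This is only an illustration in Python, not Firefox code: `SingleValueHistogram` is a hypothetical stand-in for a telemetry histogram that offers the `.reset()` and `.add()` operations mentioned above, and `record_time_since_last_check` is an invented helper name. Days are assumed as the precision.

```python
import time

SECONDS_PER_DAY = 24 * 60 * 60


class SingleValueHistogram:
    """Hypothetical stand-in for a telemetry histogram (not a real Telemetry API)."""

    def __init__(self):
        self.samples = []

    def reset(self):
        # Drop all previously recorded samples.
        self.samples.clear()

    def add(self, value):
        self.samples.append(value)


def record_time_since_last_check(histogram, last_check_ts, now_ts=None):
    """Replace the histogram contents with whole days since the last check."""
    if now_ts is None:
        now_ts = time.time()
    # Clamp at zero so a clock that moved backwards never yields a negative sample.
    days = max(0, int((now_ts - last_check_ts) // SECONDS_PER_DAY))
    # reset() then add() keeps exactly one current value in the histogram.
    histogram.reset()
    histogram.add(days)
    return days
```

Under these assumptions the updater code would call the helper once early at startup with the persisted time of the last check, and again right after each completed check, at which point the recorded value drops back to 0.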

Vladan, does that make sense to you as well?
Flags: needinfo?(vladan.bugzilla)
> Vladan, does that make sense to you as well?

I'm not in favor of that approach because it uses wall clock time, so it doesn't take into account the fact that checks could appear delayed or broken when the browser simply hasn't been launched in a long time.

I think what Dave would want is a measurement of "Firefox active-usage-time since last check for updated system addons".

Keeping track of active usage time between system addon update checks would be onerous, so instead, this could be implemented by reporting the timestamp of each system addon update check. A server-side script could then look through a client's Telemetry session history and figure out if there's a problem with the system addon update checks.
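
A rough sketch of what such a server-side script might compute, assuming (hypothetically) that each client's history can be reduced to a list of session (start, end) timestamps plus a list of update-check timestamps. The function name and data layout are made up for illustration and do not reflect any real Telemetry schema.

```python
def active_time_between_checks(sessions, check_times):
    """Sum the session time that falls between each pair of consecutive checks.

    sessions: list of (start, end) timestamps in seconds, assumed non-overlapping.
    check_times: timestamps of system add-on update checks, in any order.
    Returns one active-usage duration per pair of consecutive checks.
    """
    checks = sorted(check_times)
    gaps = []
    for prev_check, next_check in zip(checks, checks[1:]):
        active = 0
        for start, end in sessions:
            # Count only the part of the session inside the window between checks.
            overlap = min(end, next_check) - max(start, prev_check)
            if overlap > 0:
                active += overlap
        gaps.append(active)
    return gaps
```

For example, with sessions `[(0, 100), (1000, 1100), (5000, 5200)]` and checks at `50` and `5100`, the wall-clock gap between checks is 5050 seconds but the active usage between them is only 250 seconds, which is the distinction drawn above.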

As an even simpler alternative, Fennec could keep a running count of Firefox sessions (i.e. Gecko sessions) since the last system addon update check. As a result, the reporting would be at the granularity of sessions, but I think that might be sufficient.

For example:

Fennec session #1: Fennec checked for an update during this session, therefore it records a value of 0 to Telemetry (0 sessions since last update check)
Fennec session #2: Fennec does not check for an update, and reports a value of 1 (1 Gecko session since the last check)
Fennec session #3: didn't check for updates, reports a value of 2
Fennec session #4: did a check, reports 0
Fennec session #5: no check, reports 1
Fennec session #6: check, reports 0

This measure of "number of sessions since last check" can be recorded in a histogram. The final server-side aggregated histogram (with data from all Fennec users) should be sufficient to answer whether updates are happening often enough, in general. It would also enable more in-depth server-side analyses for identifying which conditions lead to update checks being delayed.
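
The counting scheme from the example sessions can be sketched as below. This is a minimal Python illustration with invented function names, assuming the counter starts at zero before the first session and that each session reports a single value; the aggregation step only mimics what a server-side histogram would hold.

```python
from collections import Counter


def sessions_since_last_check(checked_per_session):
    """Return the value each session would report, given whether it checked."""
    reported = []
    counter = 0
    for checked in checked_per_session:
        # A session that performs a check reports 0; otherwise the count grows by one.
        counter = 0 if checked else counter + 1
        reported.append(counter)
    return reported


def aggregate(reported_values):
    """Bucket reported values, as an aggregated histogram would."""
    return dict(Counter(reported_values))
```

Feeding in the six sessions from the example, `sessions_since_last_check([True, False, False, True, False, True])` gives `[0, 1, 2, 0, 1, 0]`, and aggregating those values gives three sessions at 0, two at 1, and one at 2.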
Flags: needinfo?(vladan.bugzilla)
Moving to more suitable component to triage against what is currently needed.
Component: Telemetry → Add-ons Manager
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → INACTIVE