Closed Bug 727184 Opened 12 years ago Closed 12 years ago

Increase granularity of telemetry uptime measurement

Categories

(Mozilla Metrics :: Data/Backend Reports, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 778809
Backlogged - BZ

People

(Reporter: justin.lebar+bug, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: Telemetry -- needs PM project priority)

We track uptime in telemetry, and even report it in the telemetry front-end (as SIMPLE_MEASURES_UPTIME).

But "uptime" currently is a binary proposition: It's either "less than 1 day" or "more than 1 day".  This makes investigations such as bug 726375 difficult.

I'm not wed to the specific set of buckets, but if we did 24h / 2^7 (11.25m) and doubled from there, all but the first two buckets would be a whole number of minutes, and all after the first seven would be a whole number of days.  So 11.25m, 22.5m, 45m, etc.  I doubt anyone has uptime this high, but "greater than 32 days" might be a good final bucket.
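
For illustration, a minimal sketch of the doubling scheme proposed above: 24h / 2^7 works out to 11.25 minutes, and each later edge is twice the previous one. The function name `doubling_bucket_edges` and the `max_days` parameter are hypothetical and not part of any telemetry code; the 32-day cutoff follows the "greater than 32 days" suggestion.

```python
# Hypothetical helper: lower bounds (in minutes) of the proposed doubling buckets.
MINUTES_PER_DAY = 24 * 60


def doubling_bucket_edges(max_days=32):
    """Start at 24h / 2^7 and double until max_days is reached (inclusive)."""
    edge = MINUTES_PER_DAY / 2**7  # 11.25 minutes
    limit = max_days * MINUTES_PER_DAY
    edges = []
    while edge <= limit:
        edges.append(edge)
        edge *= 2
    return edges


edges = doubling_bucket_edges()
for e in edges:
    # Print each edge in minutes and in days for easy inspection.
    print(f"{e:>9.2f} min  ({e / MINUTES_PER_DAY:.4f} days)")
```

Under these assumptions the last edge is exactly 32 days, so everything at or above it would fall into the final "greater than 32 days" bucket.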
Uptime is in minutes. Not sure what you are proposing. Once bug 707320 is done we should have more long uptimes sent in. Feel free to help with that bug btw :)
Uptime is sent in minutes.  But it is bucketed into "less than 1 day" and "more than one day" in the telemetry database.  See the SIMPLE_MEASURES_UPTIME histogram.

I'm proposing that there be more buckets, and that we re-bucket existing data.
Definitely makes sense to have more buckets.  Can you guys think about what level it should be broken down to and post that in this bug and then we'll plan out the change?
Hit enter by mistake.  What I meant to say is whether the other Telemetry people agree with jlebar's proposal before we start implementing it.
(In reply to Daniel Einspanjer :dre [:deinspanjer] from comment #4)
> Hit enter by mistake.  What I meant to say is whether the other Telemetry
> people agree with jlebar's proposal before we start implementing it.

I don't think there are other telemetry people who care about uptime distribution. Let's just do it.
No longer blocks: 726375
Depends on: 726375
As per the figure at the bottom of http://people.mozilla.org/~sguha/cyccollector.uptime.html, we suggest [ 0, 5, 15, 30, 60, 90, … every 60 up to 1441 … , 2880 ] (2880 = 2 days) as buckets. This implies every hour would be a bucket, with the exception of the first 90 minutes, leaving around 30 buckets for display purposes.
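
As a rough sketch of how an uptime value could be mapped onto a list of bucket edges like the one above, the snippet below uses Python's `bisect` module. The name `bucket_index` is hypothetical, and `PARTIAL_EDGES` contains only the edges spelled out explicitly in the comment; the elided hourly edges are deliberately left out rather than guessed.

```python
from bisect import bisect_right

# Only the explicitly listed leading edges (minutes); the elided ones are omitted.
PARTIAL_EDGES = [0, 5, 15, 30, 60, 90]


def bucket_index(uptime_minutes, edges):
    """Return the index of the largest edge that is <= uptime_minutes.

    Values at or above the last edge land in the last bucket, which
    therefore acts as an open-ended overflow bucket.
    """
    if uptime_minutes < edges[0]:
        raise ValueError("uptime is below the first bucket edge")
    return bisect_right(edges, uptime_minutes) - 1


for uptime in (0, 7, 59, 60, 10000):
    i = bucket_index(uptime, PARTIAL_EDGES)
    print(uptime, "-> bucket starting at", PARTIAL_EDGES[i])
```

With a lookup like this, the last edge always defines an open-ended bucket, so no uptime value is left without a bucket.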
(In reply to Saptarshi Guha from comment #6)
> As per the figure at the bottom of
> http://people.mozilla.org/~sguha/cyccollector.uptime.html we suggest [ 0, 5,
> 15, 30, 60,90, … every 60 up to 1441 … , 2880 ] (2880 = 2 days) as buckets.
> This implies every hour would be a bucket
> with the exception of the first 90 minutes leaving around 30 buckets for
> display purposes.

I'm not wed to a specific set of buckets, but I'd like the max bucket to be bigger than 2 days.  Doing one every 60 minutes up to 1 day, then skipping all the way up to 2 days seems less than ideal, but whatever.
Oh yes, we need a bucket for everything above 2 days (otherwise we won't have a bucket for those cases).

If you look at the graph, there is only a handful of points above 2 days. Looking at the bucket widths (the intervals in the panels at the top of the page) for the last set of 3 horizontal graphs, [254 (~4hrs) - 1479 (~1day)] contains 5% of the data and [1479, a very large number] another 5%.
> If you look at the graph, there is only a handful of points above 2 days

Even for people on the release channel?

The problem we have now is that we have too few buckets.  I'd rather create some new ones we don't need than aggressively aggregate buckets together and come to regret it in the future.
Can't comment for release channel - didn't separate out for channels. However more buckets is always good, so +1 from here.
Please be aware that the number of buckets you choose will directly impact the number of aggregations we do. Try to fight the urge to create too many.
(In reply to Pedro Alves from comment #11)
> Please be aware that the ammount of buckets you choose will directly impact
> the ammount of aggregations we do.try to fight the urge of doing too many.

Could you please elaborate on this?  What is the tradeoff, exactly?
This will act as a new dimension, so we can filter on this on the front end. When we aggregate the docs from hbase to ES, we aggregate on those dimensions. Every different combination will result in more docs, and this is no exception. Being a type of count that will surely have a somewhat linear distribution, it will result in N more documents (N being the number of buckets).
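
To make the tradeoff concrete, here is a back-of-the-envelope sketch. It assumes, as an upper bound, that every combination of dimension values actually occurs, so the document count is the product of the dimension cardinalities. The dimension names and sizes are invented for illustration and do not reflect the real hbase/ES schema.

```python
from math import prod


def max_aggregated_docs(dimension_sizes):
    """Upper bound on aggregated documents: one per combination of values."""
    return prod(dimension_sizes.values())


# Hypothetical dimensions and cardinalities, for illustration only.
dims = {"channel": 4, "os": 3}
base = max_aggregated_docs(dims)

# Adding an uptime dimension with N buckets multiplies the bound by N.
n_buckets = 30
with_uptime = max_aggregated_docs({**dims, "uptime_bucket": n_buckets})

print(base, with_uptime, with_uptime // base)
```

In this simplified model, each existing document can split into up to N documents once the uptime bucket becomes a dimension.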


I'm currently on vacation and a bit slower to answer. Daniel should be able to help here too.
(In reply to Pedro Alves from comment #13)
> This will act as a new dimension, so we can filter on this on the front end.

I don't think it needs to.

The uptime number applies to the telemetry ping as a whole.  Filtering by "this ping was sent when Firefox had been running for between 60 and 90 minutes" is not particularly interesting to me, particularly because that doesn't mean that the data in the ping is from when Firefox had been running for 60-90m.

I'm OK keeping the current filtering as "less/more than one day" for the moment, since the ping is sent only once a day.  Anyway, the ping from "less than one day" can now contain data from uptime of greater than one day, since we landed bug 707320.
Correct me, doesn't 60<uptime<90 mean that the measurements were collected for a period of time between 60 to 90 minutes? If not, what does uptime mean?
(In reply to Saptarshi Guha from comment #15)
> Correct me, doesn't 60<uptime<90 mean that the measurements were collected
> for a period of time between 60 to 90 minutes? If not, what does uptime mean?

|uptime| tells us how long Firefox had been running when the telemetry ping was sent.

Before bug 707320, all reported histograms were collected before |uptime|.  After bug 707320, AIUI all bets are off.
Marking: in group of > 33 asks for Telemetry that need PM priority before triage/scheduling.
Status: NEW → ASSIGNED
Whiteboard: Telemetry -- needs PM project priority
Triaged.
Target Milestone: Unreviewed → Backlogged - BZ
I think having bug 778809 fixed will result in what is wanted here.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → DUPLICATE