727184 - Increase granularity of telemetry uptime measurement

Justin Lebar (not reading bugmail)

Reporter

Description

14 years ago

We track uptime in telemetry, and even report it in the telemetry front-end (as SIMPLE_MEASURES_UPTIME). But "uptime" is currently a binary proposition: it's either "less than 1 day" or "more than 1 day". This makes investigations such as bug 726375 difficult.

I'm not wed to the specific set of buckets, but if we started at 24h / 2^7 (~11m) and doubled from there, all but the first two buckets would be a whole number of minutes, and all after the first seven would be a whole number of days. So ~11m, 22.5m, 45m, etc. I doubt anyone has uptime this high, but "greater than 32 days" might be a good final bucket.
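The doubling scheme is easy to sanity-check. This is a minimal Python sketch, assuming bucket lower edges in minutes that start at 24h / 2^7 and double until 32 days; `doubling_buckets` is a hypothetical helper, not part of the Telemetry code.

```python
MINUTES_PER_DAY = 24 * 60  # 1440

def doubling_buckets(max_days: int = 32) -> list[float]:
    """Lower bucket edges in minutes: 24h / 2^7, doubled up to max_days."""
    edges = []
    edge = MINUTES_PER_DAY / 2**7  # 11.25 minutes
    while edge <= max_days * MINUTES_PER_DAY:
        edges.append(edge)
        edge *= 2
    return edges

bounds = doubling_buckets()
for b in bounds:
    # Show each edge both in minutes and in days.
    print(f"{b:>8g} min = {b / MINUTES_PER_DAY:g} days")
```

This yields 13 edges from 11.25 to 46080 minutes (32 days). Only the first two (11.25 and 22.5) are fractional minutes, and the eighth edge onward (1440, 2880, ...) is a whole number of days; anything at or above the last edge would go in the open-ended "greater than 32 days" bucket.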

(dormant account)

Comment 1

14 years ago

Uptime is in minutes. Not sure what you are proposing. Once bug 707320 is done we should have more long uptimes sent in. Feel free to help with that bug btw :)

Justin Lebar (not reading bugmail)

Reporter

Comment 2

14 years ago

Uptime is sent in minutes. But it is bucketed into "less than 1 day" and "more than one day" in the telemetry database. See the SIMPLE_MEASURES_UPTIME histogram. I'm proposing that there be more buckets, and that we re-bucket existing data.
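Re-bucketing existing data only needs the raw per-ping uptime in minutes and a sorted list of bucket edges. A minimal Python sketch, assuming those raw minute values are still available (the thread says uptime is sent in minutes; how it is stored is not stated here). `rebucket`, the sample uptimes, and the finer edge list are all hypothetical.

```python
from bisect import bisect_right
from collections import Counter

def rebucket(uptimes_min, bounds):
    """Count pings per bucket.

    bounds are ascending lower edges in minutes; bucket i covers
    [bounds[i], bounds[i+1]), and the last bucket is open-ended.
    """
    counts = Counter()
    for u in uptimes_min:
        i = bisect_right(bounds, u) - 1  # index of the last edge <= u
        if i < 0:
            raise ValueError(f"uptime {u} is below the first edge {bounds[0]}")
        counts[i] += 1
    return [counts[i] for i in range(len(bounds))]

pings = [3, 40, 200, 1000, 1500, 5000]  # hypothetical uptimes in minutes
print(rebucket(pings, [0, 1440]))  # current scheme: [4, 2]
print(rebucket(pings, [0, 15, 60, 240, 1440, 2880]))  # finer: [1, 1, 1, 1, 1, 1]
```

The same raw values can be re-counted under any edge list, which is why keeping minutes (rather than only the two-bucket result) makes re-bucketing possible.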

Daniel Einspanjer [:dre] [:deinspanjer]

Comment 3

14 years ago

Definitely makes sense to have more buckets. Can you guys think about what level it should be broken down to and post that in this bug and then we'll plan out the change?

Daniel Einspanjer [:dre] [:deinspanjer]

Comment 4

14 years ago

Hit enter by mistake. What I meant to ask is whether the other Telemetry people agree with jlebar's proposal before we start implementing it.

(dormant account)

Comment 5

14 years ago

(In reply to Daniel Einspanjer :dre [:deinspanjer] from comment #4)
> Hit enter by mistake. What I meant to ask is whether the other Telemetry
> people agree with jlebar's proposal before we start implementing it.

I don't think there are other telemetry people who care about uptime distribution. Let's just do it.

Justin Lebar (not reading bugmail)

Reporter

Updated

14 years ago

No longer blocks: 726375

Depends on: 726375

"Saptarshi Guha[:joy]"

Comment 6

13 years ago

As per the figure at the bottom of http://people.mozilla.org/~sguha/cyccollector.uptime.html we suggest [ 0, 5, 15, 30, 60,90, … every 60 up to 1441 … , 2880 ] (2880 = 2 days) as buckets. This implies every hour would be a bucket with the exception of the first 90 minutes leaving around 30 buckets for display purposes.
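One reading of the proposed edge list, as a Python sketch. The explicit edges up to 90 minutes come from the comment; reading "every 60 up to 1441" as 150, 210, ... while the edge stays at or below 1441, and then including 1441 and 2880 as edges, is an assumption on our part.

```python
def proposed_buckets() -> list[int]:
    # Explicit edges from the proposal (minutes).
    edges = [0, 5, 15, 30, 60, 90]
    # ASSUMPTION: "every 60 up to 1441" read as 150, 210, ..., 1410.
    edges += list(range(150, 1441, 60))
    # ASSUMPTION: 1441 is itself an edge, followed by 2880 (2 days).
    edges += [1441, 2880]
    return edges

edges = proposed_buckets()
print(len(edges))  # 30
```

Under this reading there are 30 edges, consistent with the "around 30 buckets" estimate.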

Justin Lebar (not reading bugmail)

Reporter

Comment 7

13 years ago

(In reply to Saptarshi Guha from comment #6)
> As per the figure at the bottom of
> http://people.mozilla.org/~sguha/cyccollector.uptime.html we suggest [ 0, 5,
> 15, 30, 60,90, … every 60 up to 1441 … , 2880 ] (2880 = 2 days) as buckets.
> This implies every hour would be a bucket
> with the exception of the first 90 minutes leaving around 30 buckets for
> display purposes.

I'm not wed to a specific set of buckets, but I'd like the max bucket to be bigger than 2 days. Doing one every 60 minutes up to 1 day, then skipping all the way up to 2 days, seems less than ideal, but whatever.

"Saptarshi Guha[:joy]"

Comment 8

13 years ago

Oh yes, we need a bucket for everything above 2 days (otherwise we won't have a bucket for those cases). If you look at the graph, there is only a handful of points above 2 days. Looking at the bucket widths (the intervals in the panels at the top of the page) for the last set of 3 horizontal graphs, [254 (~4 hrs), 1479 (~1 day)] contains 5% of the data and [1479, a very large number] contains another 5%.

Justin Lebar (not reading bugmail)

Reporter

Comment 9

13 years ago

> If you look at the graph, there is only a handful of points above 2 days

Even for people on the release channel?

The problem we have now is that we have too few buckets. I'd rather create some new ones we don't need than aggressively aggregate buckets together and come to regret it in the future.

"Saptarshi Guha[:joy]"

Comment 10

13 years ago

Can't comment on the release channel; I didn't separate the data out by channel. However, more buckets are always good, so +1 from here.

Pedro Alves

Comment 11

13 years ago

Please be aware that the number of buckets you choose will directly impact the number of aggregations we do. Try to fight the urge to add too many.

Justin Lebar (not reading bugmail)

Reporter

Comment 12

13 years ago

(In reply to Pedro Alves from comment #11)
> Please be aware that the number of buckets you choose will directly impact
> the number of aggregations we do. Try to fight the urge to add too many.

Could you please elaborate on this? What is the tradeoff, exactly?

Pedro Alves

Comment 13

13 years ago

This will act as a new dimension, so we can filter on this on the front end. When we aggregate the docs from HBase to ES, we aggregate on those dimensions. Every different combination results in more docs, and this is no exception. Since this is a count that will surely have a somewhat linear distribution, it will result in N more documents (N being the number of buckets).

I'm currently on vacation, and a bit slower to answer. Daniel should be able to help here too.
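To make the trade-off concrete: if aggregated documents are keyed by every combination of dimension values, the worst-case document count is the product of the dimensions' cardinalities, so an uptime dimension with N buckets multiplies it by N. A Python sketch with hypothetical dimension names and values; the real aggregation keys are not given in this bug.

```python
from math import prod

# HYPOTHETICAL dimensions and value sets, for illustration only.
dimensions = {
    "channel": ["release", "beta", "aurora", "nightly"],
    "os": ["WINNT", "Darwin", "Linux"],
    "uptime": ["<1 day", ">=1 day"],
}

def max_docs(dims):
    """Upper bound on aggregated docs: one per combination of dimension values."""
    return prod(len(values) for values in dims.values())

print(max_docs(dimensions))  # 4 * 3 * 2 = 24
finer = dict(dimensions, uptime=[f"bucket {i}" for i in range(30)])
print(max_docs(finer))  # 4 * 3 * 30 = 360
```

Only combinations that actually occur produce documents, so real counts are at most this bound; the multiplicative worst case is what makes each extra bucket costly.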

Justin Lebar (not reading bugmail)

Reporter

Comment 14

13 years ago

(In reply to Pedro Alves from comment #13)
> This will act as a new dimension, so we can filter on this on the front end.

I don't think it needs to. The uptime number applies to the telemetry ping as a whole. Filtering by "this ping was sent when Firefox had been running for between 60 and 90 minutes" is not particularly interesting to me, especially because that doesn't mean that the data in the ping is from when Firefox had been running for 60-90m.

I'm OK keeping the current filtering as "less/more than one day" for the moment, since the ping is sent only once a day. Anyway, since we landed bug 707320, the ping from "less than one day" can now contain data from uptimes of greater than one day.

"Saptarshi Guha[:joy]"

Comment 15

13 years ago

Correct me if I'm wrong, but doesn't 60 < uptime < 90 mean that the measurements were collected over a period of between 60 and 90 minutes? If not, what does uptime mean?

Justin Lebar (not reading bugmail)

Reporter

Comment 16

13 years ago

(In reply to Saptarshi Guha from comment #15)
> Correct me if I'm wrong, but doesn't 60 < uptime < 90 mean that the measurements
> were collected over a period of between 60 and 90 minutes? If not, what does uptime mean?

|uptime| tells us how long Firefox had been running when the telemetry ping was sent. Before bug 707320, all reported histograms were collected before |uptime|. After bug 707320, AIUI all bets are off.

Lukas Blakk [:lsblakk] use ?needinfo

Updated

13 years ago

Blocks: daily_beta_tracking

Annie Elliott

Comment 17

13 years ago

Marking: in group of > 33 asks for Telemetry that need PM priority before triage/scheduling.

Status: NEW → ASSIGNED

Whiteboard: Telemetry -- needs PM project priority

Annie Elliott

Comment 18

13 years ago

Triaged.

Target Milestone: Unreviewed → Backlogged - BZ

(dormant account)

Comment 19

13 years ago

I think having bug 778809 fixed will result in what is wanted here.

Status: ASSIGNED → RESOLVED

Closed: 13 years ago

Resolution: --- → DUPLICATE

Categories

(Mozilla Metrics :: Data/Backend Reports, defect)

Tracking

(Not tracked)

People

(Reporter: justin.lebar+bug, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: Telemetry -- needs PM project priority)

Crash Data

Security

(public)
