Closed Bug 725017 Opened 12 years ago Closed 12 years ago

[Telemetry Evolution] Display 95/75/50/25/5 percentiles instead of means for non-boolean histograms

Categories

(Mozilla Metrics :: Frontend Reports, defect)

Platform: x86_64 Linux
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED
Unreviewed

People

(Reporter: justin.lebar+bug, Assigned: paulo.pires)

Details

Telemetry evolution currently displays each day's mean value and represents the point's stddev as its size.

This is bad for a variety of reasons:

1) You can't use the size of the point to figure out what the one-sigma confidence interval is, making the stddev useful only for comparisons between points.  (Note that if the stddev is equal and the means are not, then the relative uncertainty between the points is not equal!  So stddev is misleading even for this.)

2) The mean is disproportionally affected by outliers.  In the CYCLE_COLLECTOR graph, the dots are all over the place before roughly 11/15/11.  This pattern exists in Aurora builds as well, so it's not a reflection of each nightly build's performance.  Rather, it must be a function of how many outliers we received on that day.

This data on outliers is important!  But it obfuscates the graph.
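To illustrate point 2, here is a minimal sketch of how a single outlier moves the mean while leaving the median alone. The sample values are made up for illustration and are not real CYCLE_COLLECTOR data.

```python
from statistics import mean, median

# Hypothetical pause times in ms for one day; the values are invented.
quiet_day = [10, 11, 12, 12, 13, 14, 15]
# The same day, with the largest sample replaced by one extreme outlier.
noisy_day = quiet_day[:-1] + [1500]

for label, samples in (("quiet", quiet_day), ("noisy", noisy_day)):
    # The mean jumps with the outlier; the median stays where it was.
    print(label, "mean=%.1f" % mean(samples), "median=%.1f" % median(samples))
```

The median is the same on both days, while the mean on the noisy day is more than ten times larger, which is the kind of scatter described above.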


I'd like to see the 95th/75th/50th/25th/5th percentiles, rather than means.  Previously, we'd agreed that 75/50/25 would be sufficient, but unfortunately that will hide these important outliers.

If the 95 point is way off the charts, we can truncate it, just as we do on the main telemetry histograms.

The point rollovers should contain the raw 95/75/50/25/5 data, as well as the mean (and stddev if you want).

One trick we should probably do is interpolate between the relevant buckets, otherwise we lose data and may "thrash" between two buckets.  That is, suppose we have

 Bucket  1-15:  1,000 samples
 Bucket 15-30:  3,000 samples
 Bucket 30-50: 10,000 samples

and suppose the 5th percentile is the 100th sample in the 15-30 bucket.  Then we should not report the 5th percentile as 15 or 30, but rather as something close to 15.  This interpolation will need to be robust to the 30-50 bucket having 0 samples (there are holes, e.g. in the cycle_collector histogram).
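The interpolation described above could be sketched roughly as follows. This is only an illustration, not the dashboard's actual code: the names `value_at_rank` and `percentile` are hypothetical, samples are assumed to be spread uniformly within each bucket, and the three buckets listed above are treated as the whole histogram (under that assumption, the 100th sample of the 15-30 bucket sits at rank 1,100).

```python
from typing import List, Tuple

Bucket = Tuple[float, float, int]  # (lower edge, upper edge, sample count)

def value_at_rank(buckets: List[Bucket], rank: float) -> float:
    """Estimate the value of the sample at `rank` (0..total), assuming
    samples are spread uniformly inside each bucket."""
    seen = 0
    last_upper = None
    for lower, upper, count in buckets:
        if count == 0:
            continue  # a hole contributes no samples, so skip it
        if seen + count >= rank:
            # Linear interpolation inside the bucket that holds the rank.
            fraction = (rank - seen) / count
            return lower + fraction * (upper - lower)
        seen += count
        last_upper = upper
    if last_upper is None:
        raise ValueError("histogram has no samples")
    return last_upper  # rank beyond the total: clamp to the top edge

def percentile(buckets: List[Bucket], p: float) -> float:
    """Interpolated p-th percentile (0 <= p <= 100) of a bucketed histogram."""
    total = sum(count for _, _, count in buckets)
    return value_at_rank(buckets, p / 100.0 * total)

example = [(1, 15, 1000), (15, 30, 3000), (30, 50, 10000)]
print(value_at_rank(example, 1100))  # close to 15, not 15 or 30

with_hole = [(1, 15, 10), (15, 30, 0), (30, 50, 10)]
print(percentile(with_hole, 75))  # the empty 15-30 bucket is skipped
```

Skipping empty buckets keeps the division by `count` safe and means a hole never "captures" a percentile.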

The other trick is displaying this data without making the chart very noisy.  A thin line drawn up and down from the median point, with different colors separating the 5-25/25-50 and 75-95/50-75 ranges may be sufficient.


Note that we have some boolean histograms, and this definitely shouldn't be done there.
For now, each dot displays a bar that runs from the 25th percentile to the 75th percentile.
Let me know what you think of it.
I also added the values of the 25th, 50th, 75th, and 90th percentiles to the rollovers.
Assignee: nobody → paulo.pires
I kind of like it.  I'm not totally sold, but it certainly gives more information than it used to!

It might be improved if we drew smoothed 25/50/75 lines tracking the 25/50/75 percentiles, rather than one vertical line for each point.  What do you think?
OK, we'll try that, while also trying not to overload the graphic with data to the point that it becomes unreadable.
(In reply to Paulo Pires from comment #4)
> OK, we'll try that, while also trying not to overload the graphic with data
> to the point that it becomes unreadable.

Absolutely.  If you try it and it doesn't look good, let's come up with something else!
Now showing small lines for the 25th, 50th, and 75th percentiles for each dot.
Is this a better solution for you?
I think the new one is much worse than the old one.

With the vertical 25/75 lines, it was easy to observe how the quartiles changed over time.  It's much harder to do so now.  The tick marks also add a lot more noise to the graph.
I also think it's very hard to read.

Really hated the vertical lines though....

And now that we have 2 codings for the same thing (count in size and color) I say we drop one of them. Personally I prefer the colors over sizes, but you tell me
> And now that we have 2 codings for the same thing (count in size and color) I say we drop 
> one of them. Personally I prefer the colors over sizes

Sounds good to me.

> Really hated the vertical lines though....

Do you have any other ideas?
>> Really hated the vertical lines though....
>Do you have any other ideas?

No. The only thing I would suggest is having the percentile lines off by default and have an option to turn them on. When I read the chart they don't really add much to me; not sure about others, though
What about

> It might be improved if we drew smoothed 25/50/75 lines tracking the 25/50/75 
> percentiles, rather than one vertical line for each point.

I can try to find a graph which uses this technique, if you're not sure what I mean.
I know what you mean, but I don't think it'll work; the samples are way too unrelated. It would be the equivalent of trying to get a line chart out of the current one.
(In reply to Pedro Alves from comment #12)
> I know what you mean, but I don't think it'll work; the samples are way too
> unrelated. It would be the equivalent of trying to get a line chart out of
> the current one.

Okay.  I'd be happy with an option to turn the quartile lines on and off.
How about the previous vertical line from 25 to 75, but with one color from 25 to 50 and another color from 50 to 75?
(In reply to Paulo Pires from comment #14)
> How about the previous vertical line from 25 to 75, but with one color from
> 25 to 50 and another color from 50 to 75?

That might work.
Now showing a vertical line for the percentiles, with the color varying across the 25-50, 50-75, and 75-90 ranges.
I'm still working on a show/hide button for them.
If you think it's better now and still readable, then it won't need a show/hide button for the percentiles. Let me know, and we won't hide them.
Are you looking for general feedback now, because you're happy with how it looks, or feedback on something specific about the design as it currently stands?
I'm just following the chain of ideas that came up, trying to find the best way to show the percentiles.
I think you can now see the progression of each percentile across the days, but if you think the graph is overloaded with data, we can add an option to show/hide them. Let me know whether this solution works for you, both in general and for the percentiles specifically.
I think the colors are garish, which distracts from the data.  Can we try to find a more pleasant color scheme?

I find myself following the blue line, the 50-75 percentile, rather than the blue/yellow junction (the median).  Maybe it would make more sense to display 5-25/25-75/75-95.  If we did this, we could use darker gray for 25-75 and lighter gray for the outer lines.  (As is, the 25-50 and 50-75 lines are both "important" so we can't de-emphasize one with a lighter color.)
This would of course hide the median, which isn't ideal.  But the median in the current graph is hard for me to follow anyway.
Updated to the colors you suggested. The garish colors were chosen to better distinguish the dots from the vertical lines, and each vertical line from the others.

I think these colors work, but they will make it harder to see some dots. Maybe we can return to the blue scheme for the dots?

We still only have the 25th, 50th, 75th, and 90th percentiles, but we'll get the 5th and 95th percentiles soon and update the chart ASAP.
> I think these colors work, but they will make it harder to see some dots. Maybe we can return to the blue
> scheme for the dots?

Sounds good.

I might make the 25-75 line lighter -- it's not the focus of the graph -- but we can play with this once we have the 5 and 95 percentiles in, if you want.
Dots updated to the blue scheme.
Once we have the 5th and 95th percentiles, we'll let you know.
The 5th and 95th percentiles have been added. Does it look OK?
How about the color of the 25-75 line?
I like it!

The only trouble is that the lines dividing the builds (Aurora N / Aurora N+1) are hidden by the other lines.  Could you open a separate bug for this?  Maybe you could use color or something to set apart the build lines.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED