Closed Bug 1315996 Opened 4 years ago Closed 4 years ago

Investigate why Beta's crash rates are better than Release according to arewestableyet.com and the other way around according to the new Telemetry dashboard

Categories

(Toolkit :: Telemetry, defect, P4)

Tracking


RESOLVED FIXED
Tracking Status
firefox52 --- affected

People

(Reporter: marco, Unassigned)

References

Details

"Crash rate (b+c)" on arewestableyet.com is lower on Beta than on Release; "M+C" and "M+C-S" on the new Telemetry dashboard show the opposite (lower on Release than on Beta).

We should figure out why, and this bug could serve as documentation.
arewestableyet.com uses crashes reported to Socorro per 100 ADI (Active Daily Installs). The build versions measured are determined by the full version number string (e.g. 50.0b7).

The Telemetry Crash dashboard (https://chutten.github.io/telemetry_crashes/) uses crashes reported via Telemetry per 1000 usage hours (kilo usage hours, or kuh). The build versions measured are determined by the partial version number string (e.g. 50.0).

So there are quite a lot of dimensions to this problem that could result in what we see:
* The ratio of ADI to kuh could be different on Beta and Release (a larger proportion of users may leave their browsers open longer on Release, artificially lowering the crash rate measured through Telemetry)
* The submission of crashes to Socorro might be higher on Release than Beta
* The versions included in the Telemetry crash dashboard may be crashier (or have either or both of the above two characteristics) than the versions measured on arewestableyet.com
* The act of normalizing (via ADI or kuh) may magnify statistical anomalies, making a comparison that is "too close to call" swing one way on one dashboard and the other way on the other.
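To make the first point concrete, here is a minimal sketch with invented numbers (not either dashboard's actual figures): if Release users leave their browsers open much longer per install than Beta users, the per-100-ADI rate and the per-kuh rate can rank the two channels in opposite orders.

```python
# Illustrative only: all counts below are invented to show how the two
# normalizations used by the dashboards can disagree.

def rate_per_100_adi(crashes, adi):
    """Socorro / arewestableyet.com style: crashes per 100 Active Daily Installs."""
    return crashes / adi * 100

def rate_per_kuh(crashes, usage_hours):
    """Telemetry dashboard style: crashes per 1000 usage hours (kuh)."""
    return crashes / usage_hours * 1000

# Suppose Release users keep the browser open far longer per install,
# so Release accumulates many more usage hours per ADI than Beta.
release = {"crashes": 12_000, "adi": 1_000_000, "usage_hours": 10_000_000}
beta    = {"crashes":     100, "adi":    10_000, "usage_hours":     50_000}

print(rate_per_100_adi(beta["crashes"], beta["adi"]))            # 1.0 (Beta lower)
print(rate_per_100_adi(release["crashes"], release["adi"]))      # 1.2

print(rate_per_kuh(release["crashes"], release["usage_hours"]))  # 1.2 (Release lower)
print(rate_per_kuh(beta["crashes"], beta["usage_hours"]))        # 2.0
```

With these numbers Beta looks safer per ADI while Release looks safer per kuh, matching the discrepancy described in comment 0.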

There are likely other variations caused by differences in collection, reporting, submission, normalization, and presentation.
Sounds like the only normalizer we could add is to not use partial version number strings in telemetry_crashes. :\
That would require updates to client_count. The dashboard's doing what it can with what it's given :S
Oh, and I forgot something else. There's a huge difference in reporting percentage between browser crashes, content crashes, and content shutdown crashes. That'll change the reported values of these combination metrics.
(In reply to Chris H-C :chutten from comment #1)
> So there are quite a lot of dimensions to this problem that could result in
> what we see:

Could we verify if one of these suppositions is correct (at least the ones that we can verify)?

> * The versions included in the Telemetry crash dashboard may be crashier (or
> have either or both of the above two characteristics) than the versions
> measured on arewestableyet.com

(In reply to David Durst [:ddurst] from comment #2)
> Sounds like the only normalizer we could add is to not use partial version
> number strings in telemetry_crashes. :\

I haven't really used arewestableyet.com much myself, but I suppose release managers would be more interested in the latest X betas than in all betas, so we should probably change this if at all possible (from your last comment it sounds like it would be pretty hard).
I'm CCing them so that they can chime in.
To add one more data point, on Socorro ~8% of Beta crashes in the past week were with Betas < b6 (arewestableyet.com is currently showing b7 - b99).
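The difference between the two version-matching schemes can be sketched as follows (hypothetical crash records; the grouping logic is an assumption about how the dashboards bucket versions, not their actual code):

```python
# Hypothetical crash records with Firefox-style beta version strings.
crashes = [
    {"version": "50.0b4"},
    {"version": "50.0b7"},
    {"version": "50.0b7"},
    {"version": "50.0b9"},
]

# Telemetry-dashboard style: group by the partial version string,
# which collapses every beta build into a single "50.0" bucket,
# including early (possibly crashier) betas like b4.
def partial_version(v):
    return v.split("b")[0]

partial_counts = {}
for c in crashes:
    key = partial_version(c["version"])
    partial_counts[key] = partial_counts.get(key, 0) + 1
print(partial_counts)  # {'50.0': 4}

# arewestableyet.com style: keep only builds in a full-version window
# (it was showing b7 - b99 at the time of this comment).
def beta_number(v):
    return int(v.split("b")[1])

windowed = [c for c in crashes if 7 <= beta_number(c["version"]) <= 99]
print(len(windowed))  # 3 -- the b4 crash is excluded
```

This is one way the two dashboards can end up measuring different populations of beta crashes, consistent with the ~8% figure above.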
Yeah, so Chris brings up important points. When comparing the new dashboard to the old one, we have to remember that a) the basic metric is not the same, and b) the old dashboard counts reported crashes, which is a smaller data set than Telemetry (which is itself a subset of what's out there).

I think it would be good to verify one of those suppositions, but we need to keep this in mind when comparing dashboards -- we *are* going to see differences. So the subsequent questions about what causes those differences or what we'd verify -- those are the kinds of additional detail/more granularity we should make available by default to accompany the new telemetry-based dashboard.
(In reply to David Durst [:ddurst] from comment #7)
> I think it would be good to verify one of those suppositions, but we need to
> keep this in mind when comparing dashboards -- we *are* going to see
> differences. So the subsequent questions about what causes those differences
> or what we'd verify -- those are the kinds of additional detail/more
> granularity we should make available by default to accompany the new
> telemetry-based dashboard.

Of course, I've filed the bugs exactly for this purpose: to document the differences and try to explain them if possible.
I think it is needed to make users of the old dashboard understand them and trust the new dashboard.
Priority: -- → P4
Using my partially-scientific study of submission rates[1], we see that release submission rates are higher than beta submission rates. This will inflate apparent crash rates for release on arewestableyet.com, since release (b+c) will be proportionally higher than beta (b+c).

[1]: https://gist.github.com/chutten/7a5bfc86f29a4292b9af45c48a542f41
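The inflation effect can be sketched with illustrative numbers (the actual submission rates are in the linked gist; these are invented to show the mechanism): if the two channels had identical true per-ADI crash rates but different submission rates, the Socorro-based rate would still rank release as crashier.

```python
# Illustrative only: Socorro sees submitted crashes, not all crashes,
# so a channel whose users submit more often looks proportionally crashier.

def socorro_rate(true_crashes, submission_rate, adi):
    """Apparent crashes per 100 ADI, computed from submitted crashes only."""
    submitted = true_crashes * submission_rate
    return submitted / adi * 100

# Both channels have the same true rate: 1.0 crash per 100 ADI.
release_rate = socorro_rate(true_crashes=10_000, submission_rate=0.8, adi=1_000_000)
beta_rate    = socorro_rate(true_crashes=100,    submission_rate=0.5, adi=10_000)

print(release_rate)  # 0.8 per 100 ADI
print(beta_rate)     # 0.5 per 100 ADI -- looks safer purely due to submission
```

The gap between 0.8 and 0.5 here is entirely an artifact of the submission-rate difference, not of any real stability difference.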
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED