Closed Bug 1525359 Opened 5 years ago Closed 5 years ago

Questions regarding Mac 67 Nightly Content Data

Categories

(Cloud Services :: Mission Control, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: marcia, Assigned: wlach)

Details

During the Channel meeting today we discussed the following items:

(1) The fact that we cannot see any Mac content information in https://missioncontrol.telemetry.mozilla.org/#/nightly/mac. Is it 0 because there isn't enough data to report on the platform?

(2) For 66 Nightly it is reporting 8.10 which sounds a bit high

(3) https://missioncontrol.telemetry.mozilla.org/#/?channel=nightly, Mac shows .18 which seems quite low, even compared to Linux.

Yeah, there's definitely a bug here. I think it's related to the low volume of crashes we're seeing on Mac Nightly. Still trying to figure out the root cause.

Will ping back when I know more / have a fix.

Assignee: nobody → wlachance

(In reply to Marcia Knous [:marcia - needinfo? me] from comment #0)

During the Channel meeting today we discussed the following items:

(1) The fact that we cannot see any Mac content information in https://missioncontrol.telemetry.mozilla.org/#/nightly/mac. Is it 0 because there isn't enough data to report on the platform?

Ok, yeah, the problem is this innocuous-looking piece of code:

https://github.com/mozilla/missioncontrol/blob/1fa6387f873d3a86df398d7502ca3463d925e998/missioncontrol/etl/measuresummary.py#L141

We take the 99th percentile of the 5-minute error counts when calculating the rate. This normally works fine and automatically rejects cases where one or two pings report a ridiculous error count (which would otherwise skew the aggregate count). However, on Mac, fewer than 1% of the 5-minute error aggregations had any content crashes at all, so the 99th percentile itself was 0 and we'd calculate a rate of 0.
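A minimal Python sketch of that failure mode. It assumes (this is not the actual `measuresummary.py` code) that any 5-minute window whose crash count exceeds the 99th-percentile window count is discarded as an outlier; `percentile`, `filtered_crash_count`, and the sample data are hypothetical.

```python
from statistics import quantiles


def percentile(values, p):
    """p-th percentile, linearly interpolated between the closest ranks.

    p must be a multiple of 0.1 strictly between 0 and 100 (e.g. 99 or 99.9).
    """
    cut_points = quantiles(values, n=1000, method="inclusive")
    return cut_points[round(p * 10) - 1]


def filtered_crash_count(window_counts, p=99):
    """Sum crash counts, dropping windows above the p-th percentile (assumed behaviour)."""
    threshold = percentile(window_counts, p)
    return sum(count for count in window_counts if count <= threshold)


# Hypothetical Mac-Nightly-like data: 1000 five-minute windows, only 5 (0.5%)
# of which saw a content crash at all.
sparse = [0] * 995 + [1] * 5

print(percentile(sparse, 99))            # 0.0: the 99th-percentile window has no crashes
print(filtered_crash_count(sparse, 99))  # 0: every window with a crash is "rejected"
```

Because fewer than 1% of the windows are non-zero, the threshold itself is 0, so the computed rate comes out as 0 no matter how many crashes actually happened.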

I don't really want to spend a huge amount of time on this. I think we already know at this point that the error aggregates table is dumb and we should replace it with something better (i.e. a dynamic aggregation of individual pings which can automatically reject outliers). So I'm going to do the minimal thing and bump the threshold to the 99.9th percentile. That makes the Mac results sensible again and shouldn't affect other rate calculations too much.
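To illustrate why bumping the threshold helps the sparse case without changing much elsewhere, here is a hedged comparison of the 99th and 99.9th percentiles on two made-up platforms. As above, the "drop windows above the percentile" rule and all names are assumptions for illustration, not the Mission Control implementation.

```python
from statistics import quantiles


def percentile(values, p):
    """p-th percentile (p a multiple of 0.1, 0 < p < 100), linearly interpolated."""
    return quantiles(values, n=1000, method="inclusive")[round(p * 10) - 1]


def filtered_crash_count(window_counts, p):
    """Sum crash counts after dropping windows above the p-th percentile (assumed)."""
    threshold = percentile(window_counts, p)
    return sum(c for c in window_counts if c <= threshold)


# Sparse platform: 5 of 1000 windows saw one crash each.
sparse = [0] * 995 + [1] * 5
# Busy platform: a steady 1..10 crashes per window plus one runaway window.
busy = [i % 10 + 1 for i in range(1000)] + [5000]

for p in (99, 99.9):
    print(p, filtered_crash_count(sparse, p), filtered_crash_count(busy, p))
```

With this toy data the 99.9th percentile recovers the sparse platform's 5 crashes (the 99th gives 0) while both thresholds still discard the runaway window on the busy platform (5500 either way). The trade-off is that a 99.9th-percentile cut only rejects roughly the most extreme 0.1% of windows, so it is less aggressive about outliers.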

(2) For 66 Nightly it is reporting 8.10 which sounds a bit high

This is the adjusted / all issue again. The default (adjusted) view compares releases only over the period that the most recent one has been around. If you click on "all" on the dashboard, you should get a number more in line with what you'd expect.
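A rough sketch of the difference between the two views, assuming "adjusted" truncates each release's history to the number of days the newest release has existed and "all" uses each release's full history. The per-day `(crashes, usage_hours)` tuples, the crashes-per-1,000-hours unit, and the function names are hypothetical; the dashboard's exact formula is not given in this bug.

```python
def crash_rate(daily, days=None):
    """Crashes per 1,000 usage hours over the first `days` days (all days if None).

    `daily` is a list of (crashes, usage_hours) tuples, one per day since release.
    """
    window = daily if days is None else daily[:days]
    crashes = sum(c for c, _ in window)
    hours = sum(h for _, h in window)
    return 1000 * crashes / hours


def compare(older, newer):
    """Return (older_rate, newer_rate) pairs for the 'adjusted' and 'all' views."""
    span = len(newer)  # days the most recent release has been around
    return {
        "adjusted": (crash_rate(older, span), crash_rate(newer, span)),
        "all": (crash_rate(older), crash_rate(newer)),
    }


# Hypothetical data: the older release crashed more early on, then settled down.
older = [(50, 1000), (30, 1000), (10, 1000), (5, 1000), (5, 1000), (5, 1000)]
newer = [(40, 1000), (25, 1000), (12, 1000)]

print(compare(older, newer))
```

Here the adjusted view compares 30.0 against about 25.67 using only the first three days, while the all view compares the older release's full-history rate of 17.5 against the same 25.67, so the two views can tell quite different stories.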

I'm going to make "all" the default since it seems to be the one that people expect to see.

(3) https://missioncontrol.telemetry.mozilla.org/#/?channel=nightly, Mac shows .18 which seems quite low, even compared to Linux.

This is related to (1).

Changed the rate calculation in Mission Control:

https://github.com/mozilla/missioncontrol/pull/353

You can see the new results here:

https://data-missioncontrol.dev.mozaws.net/#/?channel=nightly

As you can see, this revises the numbers upwards for channels with less data (primarily Mac and Linux on Beta/Nightly), but the main takeaway is that the number of content crashes is now closer to what we'd expect.

Assuming there are no major objections, I'll get whd to promote this to the main missioncontrol site (https://missioncontrol.telemetry.mozilla.org) tomorrow or so.

Fix has been promoted, going to call this done. Should probably blog about this at some point.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED