Open Bug 1696494 Opened 4 years ago Updated 4 years ago

investigate increase in macOS crash rate with Nightly in MissionControl v2, starting 2021-02-23

Categories

(Cloud Services :: Mission Control, task)

Tracking

(Not tracked)

People

(Reporter: aryx, Unassigned)

Details

MissionControl v2 shows a 40-50% increase in the main crash rate for Nightly on macOS, starting on 2021-02-23.

Usage hours are at a similar level before and after. The increase seems to be triggered by IPCError-browser | ShutDownKill | __psynch_cvwait | mozilla::TaskController::GetRunnableForMTTask | mozilla::ipc::MessagePump::Run, but the dates don't align exactly, and MCv2 shows a single jump with no big swings afterwards.

Will, do you have any insight into what causes the change?

missioncontrol v1 doesn't have a great view by build_id, but I tried creating a query using the error_aggregates dataset in redash, which just provides raw crash counts:

https://sql.telemetry.mozilla.org/queries/78498/source#195129

There doesn't seem to be a sustained increase in main crashes that would correspond to what we're seeing in v2. So I'm a little mystified about what might be happening...

From the source code of MC2, with c as the main crash rate:

        {
          "os": "Darwin",
          "date": "2021-02-22",
          "major": 87,
          "minor": 20210221,
          "cv": "20210221",
          "adoption": 0.2618,
          "modelname": "cmr",
          "orig": 0.8101,
          "c": 1.0907,
          "lo90": 0.8306,
          "hi90": 1.3762,
          "channel": "nightly",
          "ordering": 75,
          "Date": "2021-02-22"
        },
        {
          "os": "Darwin",
          "date": "2021-02-23",
          "major": 87,
          "minor": 20210222,
          "cv": "20210222",
          "adoption": 0.2821,
          "modelname": "cmr",
          "orig": 0.6311,
          "c": 1.6186,
          "lo90": 1.1893,
          "hi90": 2.2236,
          "channel": "nightly",
          "ordering": 76,
          "Date": "2021-02-23"
        },
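
To put numbers on the jump between these two records, here is a minimal Python sketch. It assumes that `orig` is the unmodeled rate and that `lo90`/`hi90` bound a 90% interval around `c`; neither is confirmed here. The helper `relative_change` is a hypothetical name introduced for illustration.

```python
# Fields copied from the two MC2 records quoted above (macOS, Nightly).
records = [
    {"date": "2021-02-22", "orig": 0.8101, "c": 1.0907, "lo90": 0.8306, "hi90": 1.3762},
    {"date": "2021-02-23", "orig": 0.6311, "c": 1.6186, "lo90": 1.1893, "hi90": 2.2236},
]


def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a fraction of `before`."""
    return (after - before) / before


prev, curr = records

# Day-over-day change of the modeled rate `c` and of `orig`.
c_change = relative_change(prev["c"], curr["c"])
orig_change = relative_change(prev["orig"], curr["orig"])

# Assumption: lo90/hi90 are interval bounds. Two intervals overlap when
# each one starts before the other one ends.
intervals_overlap = curr["lo90"] <= prev["hi90"] and prev["lo90"] <= curr["hi90"]

print(f"c:    {c_change:+.1%}")
print(f"orig: {orig_change:+.1%}")
print(f"lo90/hi90 ranges overlap: {intervals_overlap}")
```

With these two records, `c` rises by roughly 48% while `orig` falls by roughly 22%, and the two lo90/hi90 ranges still overlap.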

The code gets generated with this script.

Do we have any insight into what data gets used for the calculation?

Summary: investigate increase in macOS crash rate with Nightly, starting 2021-02-23 → investigate increase in macOS crash rate with Nightly in MissionControl v2, starting 2021-02-23

The stability data MC2 returns seems to be decoupled from reality:

  1. There were jumps in the main crash rate on the last 2 merge days, and they persisted during those cycles. Switch to the 'Nightly' tab in MC2 to see them.
  2. Frequent Nightly crashers don't move the crash rate; see bug 1700525, bug 1700614, bug 1701151, bug 1523500 comment 22.