Nightly Fennec MC rate doubled since last week
Categories
(Cloud Services :: Mission Control, defect)
Tracking
(firefox67 affected)
Tracking | Status
---|---
firefox67 | affected
People
(Reporter: marcia, Unassigned)
Details
(Whiteboard: [geckoview]?)
Last week at Tues/Thursday Channel meeting and Wednesday Cross Functional Meeting, MC was reporting a rate in the mid-20s. Today it is reporting almost double that - is there something that happened regarding usage hours? We merged 2019-01-28, so we are now a few weeks into the nightly 67 cycle.
https://wiki.mozilla.org/Firefox/Channels/Meetings/2019-02-07 - Unfortunately, data was missing that day since I was out.
https://wiki.mozilla.org/Firefox/Channels/Meetings/2019-02-12#Mobile - Nightly score was 25.39
https://wiki.mozilla.org/Firefox/Channels/Meetings/2019-02-14#Mobile - Nightly score was 24.89
https://public.etherpad-mozilla.org/p/channel-meeting - Today's Nightly rate is 45.73
Comment 1•5 years ago
It appears there was a large increase in crashes over the last couple of days. Usage hours have remained relatively constant.
Reporter
Comment 2•5 years ago
Interesting - I don't see an overall spike showing up either in Socorro or in our daily email notifications for nightly.
I wonder if the spike could be due to the ARM builds: https://crash-stats.mozilla.com/search/?product=FennecAndroid&version=67.0a1&date=%3E%3D2019-02-12T16%3A47%3A00.000Z&date=%3C2019-02-19T16%3A47%3A00.000Z&_facets=signature&_facets=cpu_arch&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-cpu_arch
Comment 3•5 years ago
Could that be related to the fact that we now offer x86_64 builds? (bug 1505538)
Comment 4•5 years ago
Did a quick Redash query of the telemetry ping counts -- looks like the counts are pretty evenly divided between aarch64 and arm:
https://sql.telemetry.mozilla.org/queries/61536#158420
Whatever the problem was, it seems to have been decreasing in volume.
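The per-architecture split above can be sketched like this (an illustrative stand-in for the Redash query, not the real telemetry schema; the field name "cpu_arch" and the sample records are made up):

```python
# Count telemetry pings per CPU architecture and report each arch's share
# of the total. The records below are fabricated for illustration.
from collections import Counter

pings = [
    {"client_id": "a", "cpu_arch": "aarch64"},
    {"client_id": "b", "cpu_arch": "arm"},
    {"client_id": "c", "cpu_arch": "aarch64"},
    {"client_id": "d", "cpu_arch": "arm"},
]

counts = Counter(p["cpu_arch"] for p in pings)
total = sum(counts.values())
shares = {arch: n / total for arch, n in counts.items()}
print(shares)  # {'aarch64': 0.5, 'arm': 0.5}
```

An even split like this is what "pretty evenly divided between aarch64 and arm" would look like in aggregate.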
Comment 5•5 years ago
Socorro reports only 1 crash report from x86_64 Fennec and only 21 from 32-bit x86 Fennec, which is in line with the x86 Fennec crash rates before we started publishing x86_64 Fennec.
x86_64 Fennec was in the Google Play Store on February 13, three days before the crash rate started increasing. Maybe the increase was related to weekend usage? The crash rate appears to be decreasing now, though perhaps that is just telemetry processing lag?
Comment 6•5 years ago
The current ARM64 Fennec Nightly builds only have the Baseline JS JIT. The incomplete IonMonkey JS JIT may have been accidentally enabled (by bug 1523015) on February 13, causing crashes like bug 1528621. That might explain the increase in ARM64 Fennec crashes, but not the 32-bit ARM Fennec crashes.
Reporter
Comment 7•5 years ago
https://crash-stats.mozilla.com/signature/?signature=js%3A%3Ajit%3A%3APatchJump shows 57 crashes across 21 installs for that signature, which is really the only newish volume crash currently on Nightly besides the existing bug 1521158.
As Will notes in Comment 4, the issue seems to have come and gone in a matter of a few days.
Comment 8•5 years ago
I broke things down on the 17th (the crashiest day) by build and client id:
https://sql.telemetry.mozilla.org/queries/61563/
It looks like a single client running build 20190216093716 is responsible for 15% (191 count) of the crashes, which would explain some of the distortion, although the remaining crashes seem reasonably well distributed at first glance. It would really be nice to know exactly what crashed and how: as we've mentioned before, we get a bunch of crashes in telemetry that never make it to Socorro.
As it is though, I'm not sure if I can justify the effort involved in extracting the pings and running an analysis, given that this is a transient problem. We should be symbolicating these pings and generating automatic reports later in 2019.
Adding :chutten here in case he has anything to add.
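The "single dominant client" check can be sketched as follows (illustrative only, not the real query; the total of 1273 crash pings is inferred from 191 being 15%):

```python
# Compute what fraction of crash pings comes from the most frequent
# client_id. One synthetic client contributes 191 pings, matching the
# figures quoted above; the rest are one ping each.
from collections import Counter

crash_clients = ["dominant-client"] * 191 + [f"other-{i}" for i in range(1082)]

counts = Counter(crash_clients)
top_client, top_count = counts.most_common(1)[0]
share = top_count / len(crash_clients)
print(top_client, top_count, round(share, 2))  # dominant-client 191 0.15
```

A single client at ~15% of the volume is enough to noticeably distort a per-1000-hours rate without any underlying regression.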
Comment 9•5 years ago
Taking a look at the MozCrashReason, the most common reason for the crash is NULL: https://sql.telemetry.mozilla.org/queries/61570/source
I don't know enough about the character of Fennec crashes to guess what causes a NULL reason.
Reporter
Comment 10•5 years ago
Today's Fennec rate also zoomed up to 199.30, with main crashes showing an increase of 438%. Is there another spike that is showing on the Telemetry side for this increase?
Comment 11•5 years ago
(In reply to Marcia Knous [:marcia - needinfo? me] from comment #10)
> Today's Fennec rate also zoomed up to 199.30, with main crashes showing an increase of 438%. Is there another spike that is showing on the Telemetry side for this increase?
This is another case where the way Mission Control calculates things can throw you off. In this case we stopped incorporating any data from 67.0 into the nightly rate, which meant we only had the 68 data (which has fewer usage hours associated with it). You can see there's nothing remarkable happening by zooming in on the data in the graph.
As you can see, the overall number of crashes has remained relatively constant over the last week. I would expect the nightly rate calculation to settle down soon.
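A toy illustration of why dropping a version from the window inflates the rate (the formula is the usual crashes-per-1000-usage-hours definition; all numbers below are made up):

```python
# When both versions are in the window, 67.0's large usage-hour base
# dominates the denominator. Once 67.0 ages out, the same absolute crash
# count for 68 is divided by a much smaller base, so the rate jumps even
# though nothing regressed.
def mc_rate(crashes, usage_hours):
    """Crashes per 1000 usage hours."""
    return 1000 * crashes / usage_hours

combined = mc_rate(crashes=500 + 400, usage_hours=30_000 + 4_000)
only_68 = mc_rate(crashes=400, usage_hours=4_000)
print(round(combined, 2), round(only_68, 2))  # 26.47 100.0
```

The jump from ~26 to 100 here is purely an artifact of the shrinking denominator, which mirrors the reported jump in the nightly rate.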
Comment 12•5 years ago
P.S. In the future, could you please file a new bug for each issue that you see, rather than piggy-backing on top of old issues like this? Having a bunch of unrelated problems attached to a single bug report makes it difficult to track/understand how many issues we're seeing over time. If you think the issue might be related, feel free to link to other bugs in a new report. It's easy to mark reports as duplicate after the fact.
Reporter
Comment 13•5 years ago
(In reply to William Lachance (:wlach) (use needinfo!) from comment #12)
> P.S. In the future, could you please file a new bug for each issue that you see, rather than piggy-backing on top of old issues like this? Having a bunch of unrelated problems attached to a single bug report makes it difficult to track/understand how many issues we're seeing over time. If you think the issue might be related, feel free to link to other bugs in a new report. It's easy to mark reports as duplicate after the fact.
Yes, sorry about that.
Comment 14•5 years ago
Marcia, is this bug about Fennec 67 Nightly's MC rate still relevant?
Reporter
Comment 15•5 years ago
Resolving as WFM, as Comment 8 explains what happened in this case.