Closed Bug 1393107 Opened 3 years ago Closed 9 months ago

Negative Crash Counts in error_aggregates

Categories

(Data Platform and Tools :: General, enhancement, P2)

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: frank, Unassigned)

Assignee: nobody → bmiroglio
This is not specific to error_aggregates. Crash_aggregates is showing this as well: https://sql.telemetry.mozilla.org/queries/23068/source#59866
Component: Mission Control → Datasets: Crash Aggregates
Product: Cloud Services → Data Platform and Tools
We are seeing more content_shutdown_crashes than content_crashes for clients in the webextensions-multiBucket4 cohort on Beta:

https://sql.telemetry.mozilla.org/queries/23094/source#

I'll reach out to add-ons team / cc'ing andym + chutten
chutten looked into this and it looks like an extension-process ShutDownKill is being counted as a content_shutdown_crash but not as a content_crash. Please see the sources he provided below. Not sure where to go from here--Andy, what do you think?



https://github.com/mozilla/telemetry-streaming/blob/master/src/main/scala/com/mozilla/telemetry/streaming/ErrorAggregator.scala#L277

http://searchfox.org/mozilla-central/source/dom/ipc/ContentParent.cpp#3019
Flags: needinfo?(amckay)
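
To illustrate how the mismatch described above could produce a negative count, here is a minimal Python sketch. It assumes, hypothetically, that a downstream query derives non-shutdown content crashes as content_crashes minus content_shutdown_crashes. The function names, the event layout, and the `is_extension_process` flag are invented for illustration and are not taken from ErrorAggregator.scala.

```python
from collections import Counter

def aggregate(events):
    """Count crash events under the assumed (not confirmed) counting rules."""
    counts = Counter()
    for event in events:
        if event["kind"] == "shutdown_kill":
            # Assumption: every ShutDownKill is counted as a content shutdown crash...
            counts["content_shutdown_crashes"] += 1
            # ...but an extension-process kill is not also counted as a content crash.
            if not event["is_extension_process"]:
                counts["content_crashes"] += 1
        elif event["kind"] == "content_crash":
            counts["content_crashes"] += 1
    return counts

def non_shutdown_content_crashes(counts):
    # Hypothetical derived metric: negative when shutdown crashes exceed content crashes.
    return counts["content_crashes"] - counts["content_shutdown_crashes"]

events = [
    {"kind": "shutdown_kill", "is_extension_process": True},
    {"kind": "shutdown_kill", "is_extension_process": True},
    {"kind": "content_crash", "is_extension_process": False},
]
counts = aggregate(events)
print(non_shutdown_content_crashes(counts))
```

With two extension-process shutdown kills and one ordinary content crash, the derived value drops below zero, which is the symptom in the bug title.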
Either that or we suddenly started having difficulty generating minidumps for shutdown kills (the callback isn't getting called and thus we're not incrementing SUBPROCESS_CRASHES_WITH_DUMP/content)
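
The minidump hypothesis above can be sketched in Python as follows. This is only a model under stated assumptions: that the kill itself is always recorded under SUBPROCESS_KILL_HARD/ShutDownKill, and that SUBPROCESS_CRASHES_WITH_DUMP/content is only incremented from the minidump callback. The helper `record_shutdown_kill` is hypothetical and does not mirror the actual Gecko code.

```python
from collections import Counter

def record_shutdown_kill(histograms, minidump_generated):
    # Assumption: the hard kill is always recorded.
    histograms["SUBPROCESS_KILL_HARD/ShutDownKill"] += 1
    # Assumption: the dump counter is only bumped from the minidump callback,
    # so a failed minidump leaves it untouched.
    if minidump_generated:
        histograms["SUBPROCESS_CRASHES_WITH_DUMP/content"] += 1

histograms = Counter()
for minidump_generated in (True, False, False):
    record_shutdown_kill(histograms, minidump_generated)

# Shutdown kills that never showed up as crashes with a dump.
gap = (histograms["SUBPROCESS_KILL_HARD/ShutDownKill"]
       - histograms["SUBPROCESS_CRASHES_WITH_DUMP/content"])
print(gap)
```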
Bug 1360308 landed on June 22nd so there's probably something else that caused that since the drop started around August 10th. Probably :cyu knows more about this than me though, perhaps there's something else going on.
Flags: needinfo?(amckay) → needinfo?(cyu)
(In reply to Andy McKay [:andym] from comment #5)
> Bug 1360308 landed on June 22nd so there's probably something else that
> caused that since the drop started around August 10th. Probably :cyu knows
> more about this than me though, perhaps there's something else going on.

If this started to show up quite recently, then bug 1360308 is unlikely to be the cause. Take ContentParent::KillHard() for example: it generates the minidumps asynchronously off the main thread and blocks shutdown until the paired minidumps are generated. We do see that some users had difficulty generating minidumps, as in bug 1387369. Could this be the case where we see a negative crash count?
Flags: needinfo?(cyu)
Assignee: bmiroglio → nobody
See Also: → 1413172
:chutten, does comment 6 seem a likely explanation? 

Is the implication here that we may be undercounting content crashes?
Flags: needinfo?(chutten)
Priority: -- → P2
Component: Datasets: Crash Aggregates → Datasets: Error Aggregates
(In reply to Mark Reid [:mreid] from comment #7)
> :chutten, does comment 6 seem a likely explanation? 
> 
> Is the implication here that we may be undercounting content crashes?

That is a possible interpretation, yes. The other possibility is that we're overcounting shutdown crashes.

Over in bug 1413172 I've looked at this in a little more detail (now that we have the tools to do so with bug 1410143). The tl;dr is: the number of shutdown crashes reported by main pings (via SUBPROCESS_KILL_HARD/ShutDownKill) is 2x-3x the number of shutdown crashes reported by crash pings (via ipc_channel_error being "ShutDownKill").

The only speculation I have at present is that e10s-multi might increase the number of SUBPROCESS_KILL_HARD, but maybe the crash manager only registers this as one crash?
Flags: needinfo?(chutten)
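
As a rough illustration of the e10s-multi speculation above, the following Python sketch models one way the two sources could diverge. Both counting rules are assumptions drawn from that speculation, not confirmed behaviour: each hard-killed content process bumps SUBPROCESS_KILL_HARD once, while the crash manager registers at most one crash per shutdown. The function `shutdown_counts` and the sample session data are made up.

```python
def shutdown_counts(content_processes_killed):
    # Assumption: each hard-killed content process increments the histogram once.
    kill_hard_count = content_processes_killed
    # Assumption: the crash manager registers at most one crash per shutdown.
    crash_ping_count = 1 if content_processes_killed > 0 else 0
    return kill_hard_count, crash_ping_count

# Hypothetical number of content processes hard-killed at shutdown, per session.
sessions = [3, 2, 0, 3]

main_ping_total = sum(shutdown_counts(n)[0] for n in sessions)
crash_ping_total = sum(shutdown_counts(n)[1] for n in sessions)
print(main_ping_total, crash_ping_total, main_ping_total / crash_ping_total)
```

Under these assumptions the main-ping total exceeds the crash-ping total whenever more than one content process is killed in a session, so a multiple-of-crash-pings ratio would not by itself indicate lost data.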

Moving to General.

Component: Datasets: Error Aggregates → General
Status: NEW → RESOLVED
Closed: 9 months ago
Resolution: --- → INCOMPLETE