Closed Bug 1453485 Opened 6 years ago Closed 6 years ago

Calculate content_crashes and content_shutdown_crashes *separately* from crash pings, not main pings

Categories

(Data Platform and Tools Graveyard :: Datasets: Error Aggregates, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wlach, Assigned: akomar)

Attachments

(3 files)

Based on a discussion on #missioncontrol I had with :chutten, I think we should change the way we handle content crashes. Currently we parse them based on counts in the main ping. The content_crashes number in the main ping includes crashes occurring on shutdown (content_shutdown_crashes). In theory you should be able to derive the number of "non shutdown" crashes by subtracting the latter from the former, but in practice this doesn't work right now due to bug 1413172.

We might be able to fix that, but I think this is unnecessarily brittle. We should be able to calculate the two separately by looking at crash pings and counting those pings with the "ipc_channel_error=ShutDownKill" annotation as content_shutdown_crashes and those without it as pure content_crashes. This has the added advantage of getting us the data faster.
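
A minimal sketch of the counting rule described above, not the actual error_aggregates job. It assumes each crash ping has already been flattened into a dict with hypothetical `process_type` and `ipc_channel_error` keys; the real ping layout and the real implementation may differ.

```python
SHUTDOWN_KILL = "ShutDownKill"


def classify_content_crash(ping):
    """Return the counter a crash ping contributes to, or None.

    `ping` is a hypothetical flattened crash ping (a plain dict), not the
    real telemetry schema.
    """
    if ping.get("process_type") != "content":
        return None  # not a content-process crash, so ignore it here
    if ping.get("ipc_channel_error") == SHUTDOWN_KILL:
        return "content_shutdown_crashes"
    return "content_crashes"


def count_content_crashes(pings):
    """Count the two metrics separately, one crash ping at a time."""
    counts = {"content_crashes": 0, "content_shutdown_crashes": 0}
    for ping in pings:
        name = classify_content_crash(ping)
        if name is not None:
            counts[name] += 1
    return counts


example_pings = [
    {"process_type": "content", "ipc_channel_error": "ShutDownKill"},
    {"process_type": "content"},
    {"process_type": "content", "ipc_channel_error": "SomeOtherError"},
    {"process_type": "main"},
]
print(count_content_crashes(example_pings))
```

Because each ping lands in exactly one counter, the two numbers no longer overlap and no subtraction is needed downstream.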

This changes the semantics of error_aggregates somewhat and would only work post-version-58 (when :chutten added the annotation) but I'm personally willing to live with that. The end result of this would hopefully be a much-easier-to-understand-and-interpret mission control dashboard (we would no longer need to derive and/or explain the confusing "content - content_shutdown" distinction).

Discussion on IRC:

https://mozilla.logbot.info/missioncontrol/20180411#c14593695

If this work is done, we should also update the documentation on docs.telemetry.mozilla.org: http://docs.telemetry.mozilla.org/datasets/streaming/error_aggregates/reference.html
Blocks: 1454642
Assignee: nobody → akomarzewski
Priority: -- → P1
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Will, I wonder if we should filter out versions pre-58 from this metric's calculation.
In the implementation I just merged, for pre-version-58, every crash ping contributes to `content_crashes` (there's no `ShutDownKill` annotation yet). If we filter out earlier versions from this, both `content_crashes` and `content_shutdown_crashes` will show `0`.

This metric doesn't make sense anyway for pre-58, but I wanted to be sure it's in line with convention for this view (as there might be other similar cases).
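
To make the two options concrete, here is a rough sketch under stated assumptions: pings are plain dicts with hypothetical `version` and `ipc_channel_error` keys, all of them are content crash pings, and `filter_pre_58` is an illustrative switch rather than a real setting in the job.

```python
MIN_ANNOTATED_VERSION = 58  # first version carrying the ShutDownKill annotation


def major_version(version):
    """Parse the major version from a string like '59.0.2'; None if unparseable."""
    try:
        return int(str(version).split(".", 1)[0])
    except ValueError:
        return None


def count_with_version_filter(pings, filter_pre_58):
    """Count content crash pings, optionally dropping pre-58 versions."""
    counts = {"content_crashes": 0, "content_shutdown_crashes": 0}
    for ping in pings:
        major = major_version(ping.get("version", ""))
        if filter_pre_58 and (major is None or major < MIN_ANNOTATED_VERSION):
            continue  # skipped entirely, so both counters stay at 0
        if ping.get("ipc_channel_error") == "ShutDownKill":
            counts["content_shutdown_crashes"] += 1
        else:
            # Pre-58 pings never carry the annotation, so they all land here.
            counts["content_crashes"] += 1
    return counts


pre_58_pings = [{"version": "57.0.4"}, {"version": "56.0"}]
print(count_with_version_filter(pre_58_pings, filter_pre_58=False))
print(count_with_version_filter(pre_58_pings, filter_pre_58=True))
```

Without the filter, every pre-58 crash ping is attributed to `content_crashes`; with it, both metrics read `0` for those versions.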
Flags: needinfo?(wlachance)
(In reply to akomarzewski from comment #2)
> Will, I wonder if we should filter out versions pre-58 from this metric's
> calculation.
> In the implementation I just merged, for pre-version-58, every crash ping
> contributes to `content_crashes` (there's no `ShutDownKill` annotation yet).
> If we filter out earlier versions from this, both `content_crashes` and
> `content_shutdown_crashes` will show `0`.
> 
> This metric doesn't make sense anyway for pre-58, but I wanted to be sure
> it's in line with convention for this view (as there might be other similar
> cases).

I'm fine with whatever we do here as long as we document it. For the most part, any version older than 58 is pretty uninteresting at this point from the point of view of error_aggregates.
Flags: needinfo?(wlachance)
Status: REOPENED → RESOLVED
Closed: 6 years ago → 6 years ago
Resolution: --- → FIXED
Product: Data Platform and Tools → Data Platform and Tools Graveyard