Measure subprocess crash rates in telemetry

RESOLVED FIXED in Firefox 35

Status

()

Toolkit
Telemetry
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: bsmedberg, Assigned: bsmedberg)

Tracking

unspecified
mozilla37
x86
Windows 7
Points:
---

Firefox Tracking Flags

(firefox35 fixed, firefox36 fixed, firefox37 fixed)

Details

Attachments

(1 attachment)

(Assignee)

Description

3 years ago
It's easy to measure subprocess crash rates in telemetry (plugin/content/gmplugin). This will make it easier to collect data from the Flash protected-mode experiment.
(Assignee)

Comment 1

3 years ago
Created attachment 8535626 [details] [diff] [review]
crash-telemetry

This patch is on top of bug 1108035 (minor overlapping of #include changes, not anything in the code).
Attachment #8535626 - Flags: review?(georg.fritzsche)
(Assignee)

Updated

3 years ago
Attachment #8535626 - Attachment is patch: true
Comment on attachment 8535626 [details] [diff] [review]
crash-telemetry

Review of attachment 8535626 [details] [diff] [review]:
-----------------------------------------------------------------

If we want crash rates, do we have a metric to correlate the crashes against? I.e. a process creation counters for content, plugin, gmplugin.

Keyed histograms seem like overkill here (for memory and payload footprint) - can't we just use an enumerated histogram here?

::: dom/ipc/CrashReporterParent.cpp
@@ +172,5 @@
>  
>      int32_t processType;
>      int32_t crashType = nsICrashService::CRASH_TYPE_CRASH;
>  
> +    const char* telemetryKey = nullptr;

Plain pointers are not great because ownership etc. is really unclear.
|nsDependentCString key; key.AssignLiteral(...);| instead?
Attachment #8535626 - Flags: review?(georg.fritzsche)
(Assignee)

Comment 3

3 years ago
> If we want crash rates, do we have a metric to correlate the crashes
> against? I.e. a process creation counters for content, plugin, gmplugin.

I'm not sure I understand the question. For the current experiment, it really doesn't matter, since we can just compare the test group and the control group.

In general, we should be able to chart these metrics against session length and against active ticks to see which is the most interesting denominator.

> Keyed histograms seem like overkill here (for memory and payload footprint)
> - can't we just use an enumerated histogram here?

I don't want to, because in a future patch I plan on including the plugin tag name into the key so that we can measure crash rates separately per plugin type.

> > +    const char* telemetryKey = nullptr;
> 
> Plain pointers are not great because ownership etc. is really unclear.
> |nsDependentCString key; key.AssignLiteral(...);| instead?

Since all of these are literals, why is the ownership unclear? I can change it, but if there's a way to just make it readable, the code should be safe in either case.
(Assignee)

Updated

3 years ago
Flags: needinfo?(georg.fritzsche)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #3)
> > If we want crash rates, do we have a metric to correlate the crashes
> > against? I.e. a process creation counters for content, plugin, gmplugin.
> 
> I'm not sure I understand the question. For the current experiment, it
> really doesn't matter, since we can just compare the test group and the
> control group.
> 
> In general, we should be able to chart these metrics against session length
> and against active ticks to see which is the most interesting denominator.

Ok. If we don't have a process instantiation count probe we should file a separate bug for this then - e.g. 50% vs. 100% plugin crashes would be significant, right?

> > Keyed histograms seem like overkill here (for memory and payload footprint)
> > - can't we just use an enumerated histogram here?
> 
> I don't want to, because in a future patch I plan on including the plugin
> tag name into the key so that we can measure crash rates separately per
> plugin type.

Ah, ok then. (Side-note: All potential plugin tag names? Doesn't this run into the same privacy concerns we had on an earlier bug on finding out what plugins are being used in the wild?)

> > > +    const char* telemetryKey = nullptr;
> > 
> > Plain pointers are not great because ownership etc. is really unclear.
> > |nsDependentCString key; key.AssignLiteral(...);| instead?
> 
> Since all of these are literals, why is the ownership unclear? I can change
> it, but if there's a way to just make it readable, the code should be safe
> in either case.

With this i had to read both the definition and the code that assigns to figure out the ownership/semantics.
With nsDependentCString it would be clear to me (and we use that type further down anyway).

I think we should avoid plain pointers where reasonable.
Flags: needinfo?(georg.fritzsche)
Attachment #8535626 - Flags: review+
(Assignee)

Comment 5

3 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/185c34ef0199
(Assignee)

Comment 6

3 years ago
Filed bug 1110996 for the launch metric.
(Assignee)

Comment 7

3 years ago
Comment on attachment 8535626 [details] [diff] [review]
crash-telemetry

Approval Request Comment
[Feature/regressing bug #]: New code for measuring plugin crash rates
[Describe test coverage new/current, TBPL]: Manual testing
[Risks and why]: Low risk; slight memory usage increase when telemetry is enabled.
[String/UUID change made/needed]: None. New items were added to Histograms.json changed, but I don't think that can harm addons.
Attachment #8535626 - Flags: approval-mozilla-beta?
Attachment #8535626 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/185c34ef0199
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla37
(Assignee)

Comment 9

3 years ago
Comment on attachment 8535626 [details] [diff] [review]
crash-telemetry

got a=lsblakk over email
Attachment #8535626 - Flags: approval-mozilla-beta?
Attachment #8535626 - Flags: approval-mozilla-beta+
Attachment #8535626 - Flags: approval-mozilla-aurora?
Attachment #8535626 - Flags: approval-mozilla-aurora+
https://hg.mozilla.org/releases/mozilla-aurora/rev/a1a66ef2872d

Needs rebasing for beta uplift.
status-firefox35: --- → affected
status-firefox36: --- → fixed
status-firefox37: --- → fixed
Flags: needinfo?(benjamin)
(Assignee)

Comment 11

3 years ago
https://hg.mozilla.org/releases/mozilla-beta/rev/3bec2715f236
status-firefox35: affected → fixed
Flags: needinfo?(benjamin)
Backed out of Aurora for build bustage: 

https://treeherder.mozilla.org/ui/logviewer.html#?job_id=450357&repo=mozilla-aurora

https://hg.mozilla.org/releases/mozilla-aurora/rev/05f1c9329015
Flags: needinfo?(benjamin)
status-firefox36: fixed → affected
Was missing an #include. Fixed.
https://hg.mozilla.org/releases/mozilla-aurora/rev/0128b632e941
status-firefox36: affected → fixed
Flags: needinfo?(benjamin)
You need to log in before you can comment on or make changes to this bug.