Closed Bug 1602918 Opened 4 years ago Closed 4 years ago

Collect telemetry on crashes, if one Fission docshell was open

Categories

(Core :: Performance, enhancement, P2)

Tracking

RESOLVED FIXED
mozilla76
Fission Milestone M6
Tracking Status
firefox76 --- fixed

People

(Reporter: ccd, Assigned: barret)

Attachments

(2 files)

Attached file fission_crash.txt

The proposal is to collect telemetry on cases where there is a crash and at least one Fission docshell was open.

Attachment #9114978 - Flags: data-review+

Tracking for Fission dogfooding (M5)

Fission Milestone: --- → M5

Sean said he'll work on adding this.

Assignee: nobody → sefeng
Status: NEW → ASSIGNED

Could someone clarify a bit what exactly should be measured here, just to ensure we'll get the right kind of patch and proper review for it?
Thanks.

Flags: needinfo?(cpeterson)
Flags: needinfo?(cdowhygelund)
Priority: -- → P2

I believe, but will defer to @cdowhygelund, that it is to infer crash rates when a Fission docshell is open. The crash could have occurred elsewhere, but it happened during a session when a Fission docshell was open and present.

Without it, crash rates cannot be separated by whether or not a user has used a Fission window.

Blocks: fission-perf

I think this probe is going to be used to answer how frequently a crash is related to Fission. Just looking at the Fission flag isn't sufficient.

As Saptarshi and Sean noted, this will best determine crashes related to Fission. As it is possible to have Fission enabled and not use Fission, this probe focuses on crashes where there is some Fission-related activity occurring. This probe will record cases where a crash occurred somewhere and at least one Fission docshell was open. Over time we can use this probe to measure aspects of Fission stability.

Flags: needinfo?(cdowhygelund)
Flags: needinfo?(cpeterson)
Blocks: fission-telemetry
No longer blocks: fission-perf

Olli, do you think the above explanations are valid? If so, who is the proper reviewer? Thanks!

Flags: needinfo?(bugs)

So is this about crash reports or telemetry? And if the latter, is this about child processes only?

Flags: needinfo?(bugs)

I think the idea is that it's going to be a categorical histogram probe with two categories, CRASH_WITH_FISSION_DOCSHELL and CRASH_WITHOUT_FISSION_DOCSHELL.

Once we catch a crash (not sure how this happens in our code), we check whether a Fission docshell was open when the crash occurred, and we increment the matching category.

So by using this probe, we can get a sense of how stable Fission is. For example, the initial data could show that 10% of crashes had a Fission docshell open, and after a couple of months, that number could drop to 5%.
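To make the two-category idea concrete, here is a minimal sketch in Python. It is not Gecko telemetry code: the `record_crash` and `fission_share` helpers are hypothetical, and only the two category names come from the comment above.

```python
from collections import Counter

# Category names as suggested in this comment; everything else is illustrative.
WITH_FISSION = "CRASH_WITH_FISSION_DOCSHELL"
WITHOUT_FISSION = "CRASH_WITHOUT_FISSION_DOCSHELL"


def record_crash(histogram: Counter, fission_docshell_open: bool) -> None:
    """Increment the category that matches the state at crash time."""
    label = WITH_FISSION if fission_docshell_open else WITHOUT_FISSION
    histogram[label] += 1


def fission_share(histogram: Counter) -> float:
    """Fraction of recorded crashes that had a Fission docshell open."""
    total = histogram[WITH_FISSION] + histogram[WITHOUT_FISSION]
    return histogram[WITH_FISSION] / total if total else 0.0


# Ten crashes, one of them with a Fission docshell open.
histogram = Counter()
for had_fission in [True] + [False] * 9:
    record_crash(histogram, had_fission)
print(fission_share(histogram))  # 0.1
```

Tracking the share over time, rather than the raw count, is what would show a drop such as the 10% to 5% example.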

Olli, what do you think?

Corey, is my understanding correct?

Flags: needinfo?(cdowhygelund)
Flags: needinfo?(bugs)

But telemetry probes usually run in the Firefox process. If that process crashes, what are we collecting and where?
Or is this only about child processes, with the parent keeping track of whether the child process has Fission docshells?

Flags: needinfo?(bugs)

wbeard: What are your thoughts on how this probe should be implemented? The desire is to have a Fission analog of the existing crash counts metrics to determine the stability of Fission, as Sean noted. Should this be a crash report? Or should it be telemetry, but only collecting crash counts for child processes as Olli has noted above?

For context, Fission-enabled Firefox can have both non-Fission and Fission windows. The idea of this probe is to measure crashes where at least one Fission docshell was open (i.e., Fission was being used).

Flags: needinfo?(cdowhygelund) → needinfo?(wbeard)

I'm not deeply familiar with the implementation details of the current crash probes, but for crash rates we would need telemetry rather than the crash reporter (the latter is opt-in and doesn't give us generalizable numbers).

Currently Mission Control splits this into content crashes and main/browser crashes. I assume we would only be interested in content crashes here? That is, the number of content crashes that come from a Fission window?

Flags: needinfo?(wbeard)

I believe that, in addition to a "Crash Report", we also send some extremely limited information about crashes in the "crash ping" (https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/data/crash-ping.html), which iirc is treated more like other telemetry pings. Given that :mccr8 has already added a DOMFissionEnabled entry to the crash report in bug 1560977, it may be acceptable to upgrade this annotation and also add it to the crash ping.

Would that be sufficient for this type of telemetry?

It would be desirable to have the probe measure whether at least one Fission docshell was opened, rather than whether Fission is enabled. This is due to the possibility of having Fission enabled but opening only non-Fission windows. However, for crashes, I believe this is an edge case, and adding DOMFissionEnabled to the crash ping will suffice.

The probe from bug 1560977 already checks whether a Fission docshell was opened, not whether Fission is globally enabled.
If every Fission window is closed, it does not clear the bit, as we didn't want to miss Fission-related crashes occurring during shutdown.
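The "sticky bit" behaviour described here can be modelled roughly as follows. This is a toy Python model, not the actual Gecko implementation: the `FissionDocShellTracker` class and its method names are hypothetical, and only the `DOMFissionEnabled` annotation name comes from this bug.

```python
class FissionDocShellTracker:
    """Toy model of a sticky 'a Fission docshell was opened' bit."""

    def __init__(self) -> None:
        self.open_fission_docshells = 0
        self.fission_seen = False  # sticky: set once, never cleared

    def on_docshell_opened(self, is_fission: bool) -> None:
        if is_fission:
            self.open_fission_docshells += 1
            self.fission_seen = True

    def on_docshell_closed(self, is_fission: bool) -> None:
        if is_fission and self.open_fission_docshells > 0:
            self.open_fission_docshells -= 1
        # fission_seen is deliberately left untouched, so a crash during
        # shutdown is still attributed to a session that used Fission.

    def crash_annotations(self) -> dict:
        return {"DOMFissionEnabled": self.fission_seen}


tracker = FissionDocShellTracker()
tracker.on_docshell_opened(is_fission=True)
tracker.on_docshell_closed(is_fission=True)
print(tracker.crash_annotations())  # {'DOMFissionEnabled': True}
```

The trade-off is that the annotation means "Fission was used at some point in this session" rather than "a Fission docshell was open at the moment of the crash".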

Probe bug 1560977 sounds like what we are looking for, and could be used for this telemetry.
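As a rough illustration of how such an annotation could be used on the analysis side, the sketch below splits crash records by the annotation. The flat record layout (`process_type`, `annotations`) is a simplification assumed for this example and is not the real crash ping schema; `split_crashes` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, Mapping, Optional


def split_crashes(pings: Iterable[Mapping],
                  process_type: Optional[str] = None) -> Counter:
    """Count crashes with and without the DOMFissionEnabled annotation.

    If process_type is given (for example "content"), crashes from other
    process types are skipped.
    """
    counts = Counter()
    for ping in pings:
        if process_type is not None and ping.get("process_type") != process_type:
            continue
        annotations = ping.get("annotations", {})
        key = "fission" if annotations.get("DOMFissionEnabled") else "non_fission"
        counts[key] += 1
    return counts


# Made-up records, for illustration only.
sample = [
    {"process_type": "content", "annotations": {"DOMFissionEnabled": True}},
    {"process_type": "content", "annotations": {}},
    {"process_type": "main", "annotations": {"DOMFissionEnabled": True}},
]
print(split_crashes(sample, process_type="content"))
```

Leaving `process_type` optional keeps the question of content-only versus all crashes open, as it was in the discussion above.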

Move Fission telemetry probe bugs from M5 dogfooding milestone to M6 Nightly.

Fission Milestone: M5 → M6
Assignee: sefeng → brennie
Pushed by brennie@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a8e451a69789
Report DOMFissionEnabled in crash pings r=nika
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla76
Comment on attachment 9114978 [details]
fission_crash.txt

This data collection review form never received a data-review from a data steward, so forwarding it to :chutten.
Attachment #9114978 - Flags: data-review+ → data-review?(chutten)
Comment on attachment 9114978 [details]
fission_crash.txt

DATA COLLECTION REVIEW RESPONSE:

    Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes. This collection is a CrashAnnotation which in general is documented [in-tree](https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/data/crash-ping.html) and specifically this collection will have its description in [CrashAnnotations.yaml](https://searchfox.org/mozilla-central/source/toolkit/crashreporter/CrashAnnotations.yaml).

    Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection is reported via Telemetry and Crash Reports which can both be controlled through Firefox's Preferences. Also, Crash Reports can be controlled in the Crash Reporter Client when a crash happens.

    If the request is for permanent data collection, is there someone who will monitor the data over time?

Yes, Corey Dow-Hygelund is responsible.

    Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 1, Technical. (The request says Cat2, but whether Fission is enabled in a crashed tab is a technical detail, not data about user interaction)

    Is the data collection request for default-on or default-off?

Default on for all channels.

    Does the instrumentation include the addition of any new identifiers?

No.

    Is the data collection covered by the existing Firefox privacy notice?

Yes.

    Does there need to be a check-in in the future to determine whether to renew the data?

No. This collection is permanent.

---
Result: datareview+
Attachment #9114978 - Flags: data-review?(chutten) → data-review+