Closed Bug 1839697 Opened 2 years ago Closed 2 years ago

Report Java exceptions in the Glean crash pings

Categories

(Fenix :: Crash Reporting, enhancement, P3)

All
Android
enhancement

Tracking

(firefox116 wontfix, firefox117 fixed)

RESOLVED FIXED
117 Branch
Tracking Status
firefox116 --- wontfix
firefox117 --- fixed

People

(Reporter: afranchuk, Assigned: afranchuk)

Details

Attachments

(3 files)

The crash pings added in bug 1810951 are minimal and only report native code crashes. We should also report java exceptions (which manifest as an application crash). These should be reported as a "main" process crash, or a new designator should be used, such as "application". The Java exception details could be included, though even omitting them for now will still improve the situation as we are currently under-reporting crashes.

This would affect lib-crash in the android-components project.

I want to distinguish java exception crashes from native code crashes in the ping metrics. I've come up with 3 options, but I'm wondering which is preferred (if there is precedence for this sort of thing):

  1. Use a different processType value. The java exception does technically come from a different process, but ultimately the user experiences something equivalent to a main crash. Since we generally use main crashes to determine the severity of crashes (users are always aware of main crashes after all), I'm hesitant to add another case that is the same severity.
  2. Use a new reason code for the crash ping. This would work, though I think the ping reason codes are intended to be used for the reason that the ping is being sent, so this might be considered a bit mismatched (but ultimately this could be okay).
  3. Make the distinction implicit based on the presence of other metrics in the ping. For example, we're likely going to add an exception metric which has the java exception details, which would be mutually exclusive with the stack_traces metric (which is designed specifically for native code stack traces so shouldn't be used to send e.g. exception stack traces).

chutten: is my assessment of the reason code correct? Opinions?
royang: any opinions or preferences on which would be best for ingestion and processing of ping data? I think they all should be approximately the same effort when it comes to filtering, with (3) being a bit less precise since it doesn't inherently enforce the mutual exclusivity. Though if we put some time into supporting cross-language stack traces the shape of these metrics will have to change anyway.

Note: There are also exceptions which aren't fatal. I'm not sure how we should handle these (are they even considered crashes?). But at the moment lib-crash doesn't call this method for the registered CrashTelemetryServices so it's not an immediate concern.

Flags: needinfo?(royang)
Flags: needinfo?(chutten)

We provide the tools, you decide how to use them : ) But I do have opinions.

I would definitely discourage implicit distinctions. Make it explicit regardless of how so that even if in the future we consolidate on stack_traces for language-independent stacks, we can still tell Java stacks from not-Java stacks.

As a data consumer myself, I'd not expect to find this information in processType or reason. I'd be looking for something like a string crash_language: java|native or a bool is_exception: true|false or some other standard metric. Are those acceptable options?

Flags: needinfo?(chutten)
Flags: needinfo?(royang)

(In reply to Chris H-C :chutten from comment #2)

We provide the tools, you decide how to use them : ) But I do have opinions.

I would definitely discourage implicit distinctions. Make it explicit regardless of how so that even if in the future we consolidate on stack_traces for language-independent stacks, we can still tell Java stacks from not-Java stacks.

As a data consumer myself, I'd not expect to find this information in processType or reason. I'd be looking for something like a string crash_language: java|native or a bool is_exception: true|false or some other standard metric. Are those acceptable options?

Okay, that's what I figured you'd say. I'm thinking something like cause: java_exception|native_error. My reasoning is that cause more accurately describes the domain of the value than crash_language. I.e., cause implies that the java_exception/os_fault relates to the initial frame/crash reason, whereas crash_language might be taken as relating to all of the information in the crash, but I think in the future we will try to have cross-language exception info. I don't like is_exception because it doesn't distinguish the language (though it could).

Attached file data-review-request.md
Attachment #9342151 - Flags: data-review?(royang)

Comment on attachment 9342151 [details]
data-review-request.md

DATA COLLECTION REVIEW RESPONSE:

  1. Is there or will there be documentation that describes the schema for the ultimate data set in a public, complete, and accurate way?

Yes, through the metrics.yaml file and the Glean Dictionary.

  1. Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection is Telemetry so can be controlled through Firefox's Preferences.

  1. If the request is for permanent data collection, is there someone who will monitor the data over time?

afranchuk@mozilla.com will be responsible for monitoring the data over time

  1. Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 1, Technical data

  1. Is the data collection request for default-on or default-off?

Default on for all channels.

  1. Does the instrumentation include the addition of any new identifiers?

No

  1. Is the data collection covered by the existing Firefox privacy notice?

Yes

  1. Does the data collection use a third-party collection tool?

No

Result: data-review+

Attachment #9342151 - Flags: data-review?(royang) → data-review+
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Flags: qe-verify+
Resolution: --- → FIXED
Target Milestone: --- → 117 Branch

I'll remove the qe-verify+ flag, since there is no need for QA manual verification.
Thank you!

Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: