Closed Bug 1666733 Opened 4 years ago Closed 4 years ago

Add error reporting to minidump generation

Categories

(Toolkit :: Crash Reporting, enhancement)

Unspecified
All
enhancement

Tracking

()

RESOLVED FIXED
87 Branch
Tracking Status
firefox87 --- fixed

People

(Reporter: gsvelto, Assigned: msirringhaus)

References

(Blocks 2 open bugs)

Details

Attachments

(2 files)

We still have a significant volume of crash reports with empty minidumps. The code we use for minidump generation does not report the cause of a failure, so we have no way to figure out why we're failing.

We should add proper error-reporting in this bug. This entails a few different things:

  • Overhaul the crash generators for Windows, macOS and Linux to report a meaningful error when failing to write a minidump (e.g. out-of-memory, disk full, missing data, etc.)
  • Add a crash annotation which will include the type of error
  • Modify crash submission and crash pings to include the annotations upon failure
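
A minimal sketch of how the first two points could fit together on the Rust side, not the code that landed in the patch. The names `DumpError`, `ANNOTATION_KEY` and `annotate` are hypothetical and used only for illustration, and the errno values (12 for ENOMEM, 28 for ENOSPC) assume Linux.

```rust
use std::fmt;
use std::io;

/// Hypothetical error type: the variants mirror the failure causes
/// listed above (out-of-memory, disk full, missing data).
#[derive(Debug)]
pub enum DumpError {
    OutOfMemory,
    DiskFull,
    MissingData(&'static str),
    Io(io::Error),
}

impl fmt::Display for DumpError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DumpError::OutOfMemory => write!(f, "out of memory"),
            DumpError::DiskFull => write!(f, "disk full"),
            DumpError::MissingData(what) => write!(f, "missing data: {}", what),
            DumpError::Io(e) => write!(f, "I/O error: {}", e),
        }
    }
}

impl std::error::Error for DumpError {}

impl From<io::Error> for DumpError {
    /// Classify an OS-level error. The errno numbers are Linux values
    /// (assumption): ENOMEM = 12, ENOSPC = 28.
    fn from(e: io::Error) -> Self {
        match e.raw_os_error() {
            Some(12) => DumpError::OutOfMemory,
            Some(28) => DumpError::DiskFull,
            _ => DumpError::Io(e),
        }
    }
}

/// Hypothetical annotation name; the real key is not given in this bug.
pub const ANNOTATION_KEY: &str = "MinidumpWriteError";

/// Turn a failed dump attempt into a (key, value) crash annotation.
/// Returns `None` when the minidump was written successfully.
pub fn annotate(result: &Result<(), DumpError>) -> Option<(&'static str, String)> {
    result
        .as_ref()
        .err()
        .map(|e| (ANNOTATION_KEY, e.to_string()))
}
```

With a type like this, the writer can use `?` on I/O calls and the crash reporter only needs to call `annotate` on the final result before submitting the report or ping.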
Blocks: 1655196
Blocks: 610551

Linux only, as part of the oxidization effort (Bug 1620993) of breakpad.

Depends on D103331

Assignee: nobody → msirringhaus
Status: NEW → ASSIGNED
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9ce8fc97afa5
Rebase to latest upstream changes (ARM specific, which is still deactivated) r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/ffd0b3afde31
Add error reporting to minidump generation (Linux) r=gsvelto
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 87 Branch
Blocks: 1746850

Comment on attachment 9200879 [details]
Bug 1666733 - Add error reporting to minidump generation (Linux) r=gsvelto

Retroactively requesting a data review for this annotation

  1. What questions will you answer with this data?

Why did we fail to write a minidump after a process crashed?

  1. Why does Mozilla need to answer these questions? Are there benefits for users? Do we need this information to address product or business requirements? Some example responses:

We're missing information from a number of crash reports, especially on Linux and Android. Until now we could only tell whether generating a minidump during a crash had failed, with no further information. Additional information in this area should allow us to increase crash coverage and improve Firefox's stability.

  3. What alternative methods did you consider to answer these questions? Why were they not sufficient?

We tried reproducing issues locally but never succeeded. We can't easily reproduce the variety of scenarios that can happen at crash time on our users' machines.

  1. Can current instrumentation answer these questions?

No, we have no data on this.

  1. List all proposed measurements and indicate the category of data collection for each measurement, using the Firefox data collection categories found on the Mozilla wiki.

Measurement Description                            | Data Collection Category    | Tracking Bug #
A description of the OS-level error we encountered | Category 1 “Technical data” | 1666733

  1. Please provide a link to the documentation for this data collection which describes the ultimate data set in a public, complete, and accurate way.

This annotation's description appears in our source code. We don't currently generate documentation from it but we could in the future.

  1. How long will this data be collected? Choose one of the following:

We plan to gather this data until we have figured out the issues we encountered. This will probably take around a year. We'll retire the annotation when we've addressed all the underlying issues.

  1. What populations will you measure?

All release channels in all countries for Linux/Android users.

  1. If this data collection is default on, what is the opt-out mechanism for users?

This is part of a crash report and as such is opt-in rather than opt-out.

  1. Please provide a general description of how you will analyze this data.

We will look at this data on crash-stats.mozilla.com to understand why we're failing to capture minidumps. We'll analyze the different errors based on the part of code they affect.

  1. Where do you intend to share the results of your analysis?

We'll file bugs on Bugzilla.

  1. Is there a third-party tool (i.e. not Telemetry) that you are proposing to use for this data collection? If so:

This will go to crash-stats.mozilla.com rather than Telemetry.

Attachment #9200879 - Flags: data-review?(tlong)

Comment on attachment 9200879 [details]
Bug 1666733 - Add error reporting to minidump generation (Linux) r=gsvelto

Data Review

  1. Is there or will there be documentation that describes the schema for the ultimate data set in a public, complete, and accurate way?

Yes, through the CrashAnnotations.yaml file

  1. Is there a control mechanism that allows the user to turn the data collection on and off?

The collection of crash data is opt-in, so turning the data collection off equates to the user dismissing the Send Crash Report dialog.

  1. If the request is for permanent data collection, is there someone who will monitor the data over time?

N/A, collection set to end or be renewed in 1 year (estimate January of 2023).

  1. Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 1, Technical data

  1. Is the data collection request for default-on or default-off?

default-off (The collection of crash statistics is opt-in by the user)

  1. Does the instrumentation include the addition of any new identifiers (whether anonymous or otherwise; e.g., username, random IDs, etc. See the appendix for more details)?

No

  1. Is the data collection covered by the existing Firefox privacy notice?

Yes

  1. Does the data collection use a third-party collection tool?

No

Result

data-review+

Attachment #9200879 - Flags: data-review?(tlong) → data-review+