Closed Bug 1726923 Opened 3 months ago Closed 3 months ago

Wayland protocol errors are not handled by the crash reporter

Categories

(Core :: Graphics, defect)

defect

Tracking

()

RESOLVED FIXED
93 Branch
Tracking Status
firefox93 --- fixed

People

(Reporter: rmader, Assigned: rmader)

References

(Blocks 2 open bugs, )

Details

Attachments

(3 files, 1 obsolete file)

This has been a know issue for some time: whenever we crash because of a protocol error (which is always a bug on our side), the crash will not be handled and reported. In order to be sure we get crash reports for such bugs, investigate how we can propagate such errors.

Severity: -- → S3
See Also: → 1709254

It looks like we can use wl_log_set_handler_client() and crash there. There are a few exceptions where wl_log is called without resulting in an abort, but apparently that's only wl_proxy_add_listener() and wl_proxy_add_dispatcher() - both should be easy to handle. This is also how what Xwayland does, see https://gitlab.freedesktop.org/wayland/wayland/-/issues/231#note_1044026

To ensure Wayland protocol errors trigger crash reports.
This is inspired by Xwayland, which handles things similarly.

Note: in theory there are a few cases we wouldn't need to crash on
wl_log - but in practice we always want to.

Assignee: nobody → robert.mader
Status: NEW → ASSIGNED

https://searchfox.org/mozilla-central/rev/cd31ddf343ea3a39e64fe9b74ab1b246f01a8bd2/mfbt/Assertions.h#239,267

MOZ_CRASH_UNSAFE(explanation-string) can be used if the explanation string
cannot be a string literal

MOZ_CRASH_UNSAFE_PRINTF(format, arg1 [, args]) can be used when more
information is desired than a string literal can supply.

Would it even be safe (privacy-wise) to include more precise protocol error information? All crash reasons are public on crash-stats.mozilla.org. I assume that's why they're called unsafe.

(In reply to Darkspirit from comment #4)

Would it even be safe (privacy-wise) to include more precise protocol error information? All crash reasons are public on crash-stats.mozilla.org. I assume that's why they're called unsafe.

Protocol errors, as far as I know, are usually static strings from libwayland or the the compositor. The only non-static data I've seen so far are surface ids and similar stuff. If there was ever privacy sensitive data in them, I'd consider that a compositor bug.
Given that it's quite common to ask for WAYLAND_DEBUG logs in public issue reports and I've never seen any private related info, I think we can do this.

Here's a try build that, with gfx.webrender.compositor.force-enabled turned on, crashes after a bit of scrolling and thus demonstrates that indeed the crash reporter shows up with the appropriate crash reason.

https://treeherder.mozilla.org/jobs?repo=try&revision=5174b081e9445877799921da2e9a414947cbf7a4

This reverts D123286 as we now have a generic way to trigger crash
reports on Wayland protocol errors.

Depends on D123894

Attachment #9238692 - Flags: data-review?(robert.mader)
Attachment #9238692 - Flags: data-review?(robert.mader) → data-review?(tlong)

Comment on attachment 9238692 [details]
Wayland protocol error message collection request

:rmader

Can you please provide answers for items 11 (where you intend to share the results) and 12 (are you using a third-party tool for analysis)?

Also, for permanent collections, an individual contributor must be responsible for monitoring the collection, you can still reference the team as a fallback.

Thanks!

Flags: needinfo?(robert.mader)
Comment on attachment 9238692 [details]
Wayland protocol error message collection request

>
># Request for data collection review form
>
>**All questions are mandatory. You must receive review from a data steward peer on your responses to these questions before shipping new data collection.**
>
>1) What questions will you answer with this data?
>
>The protocol error that trigged the Wayland compositor (display server / OS component) to abort Firefox.
>
>2) Why does Mozilla need to answer these questions?  Are there benefits for users? Do we need this information to address product or business requirements? Some example responses:
>
>It is the crash reason. Knowing it is essential for us to fix bugs in our code.
>
>3) What alternative methods did you consider to answer these questions? Why were they not sufficient?
>
>Not collecting the crash reason. Unfortunately we would then only know that there is a Wayland protocol error, but have no idea which or where.
>
>4) Can current instrumentation answer these questions?
>
>The closest thing is manual bug reports by users.
>
>5) List all proposed measurements and indicate the category of data collection for each measurement, using the [Firefox data collection categories](https://wiki.mozilla.org/Data_Collection) found on the Mozilla wiki.   
>
>**Note that the data steward reviewing your request will characterize your data collection based on the highest (and most sensitive) category.**
>
><table>
>  <tr>
>    <td>Measurement Description</td>
>    <td>Data Collection Category</td>
>    <td>Tracking Bug #</td>
>  </tr>
>  <tr>
>    <td>Wayland protocol / OS error description</td>
>    <td>Technical data</td>
>    <td>1726923</td>
>  </tr>
></table>
>
>6) Please provide a link to the documentation for this data collection which describes the ultimate data set in a public, complete, and accurate way.
>
>https://wayland.freedesktop.org/docs/html/apa.html#protocol-spec-wl_display - wl_display::error
>
>Quoting here:
>The error event is sent out when a fatal (non-recoverable) error has occurred. The object_id argument is the object where the error occurred, most often in response to a request to that object. The code identifies the error and is defined by the object interface. As such, each interface defines its own set of error codes. The message is a brief description of the error, for (debugging) convenience.
>
>Note: we currently can only pass on the message for technical reasons.
>
>7) How long will this data be collected?  Choose one of the following:
>
>* I want to permanently monitor this data. (Linux dev team)
>
>8) What populations will you measure?
>
>* Which release channels? All
>
>* Which countries? All
>
>* Which locales? All
>
>* Any other filters?  Please describe in detail below.
>
>Linux / Wayland. We already collect similar data for the older Linux / X11 code.
>
>9) If this data collection is default on, what is the opt-out mechanism for users?
>
>Not sending crash reports.
>
>10) Please provide a general description of how you will analyze this data.
>
>It will be contained in crash reports.
>
>11) Where do you intend to share the results of your analysis?
>
> In public mozilla crash reports / in bugzilla.
>
>12) Is there a third-party tool (i.e. not Telemetry) that you are proposing to use for this data collection? 
>
> No
Flags: needinfo?(robert.mader)

Updated version.

Attachment #9238692 - Attachment is obsolete: true
Attachment #9238692 - Flags: data-review?(tlong)
Attachment #9238706 - Flags: data-review?(tlong)

Comment on attachment 9238706 [details]
Wayland protocol error message collection request

Data Review

  1. Is there or will there be documentation that describes the schema for the ultimate data set in a public, complete, and accurate way?

Yes

  1. Is there a control mechanism that allows the user to turn the data collection on and off?

Yes, by not sending the crash report

  1. If the request is for permanent data collection, is there someone who will monitor the data over time?

Yes, Robert Mader (robert.mader@posteo.de) / Linux dev team will monitor this collection.

  1. Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 1, Technical data

  1. Is the data collection request for default-on or default-off?

default-off, user must send the crash report for collection to occur

  1. Does the instrumentation include the addition of any new identifiers (whether anonymous or otherwise; e.g., username, random IDs, etc. See the appendix for more details)?

No

  1. Is the data collection covered by the existing Firefox privacy notice?

Yes

  1. Does the data collection use a third-party collection tool?

No

Result

data-review+

Attachment #9238706 - Flags: data-review?(tlong) → data-review+

\o/ let it rain crash reports.

Pushed by robert.mader@posteo.de:
https://hg.mozilla.org/integration/autoland/rev/875255176657
Add Wayland crash handler, r=stransky,emilio
https://hg.mozilla.org/integration/autoland/rev/b2c5ed1aea75
Remove asserts on valid viewport values again, r=gfx-reviewers,lsalzman
Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → 93 Branch
Regressions: 1731450
No longer regressions: 1731450
You need to log in before you can comment on or make changes to this bug.