Closed Bug 1666383 Opened 1 year ago Closed 11 months ago

Missing MOZ_CRASH reason in crash reports from Android

Categories

(Toolkit :: Crash Reporting, defect, P2)

ARM
Android
defect

Tracking

()

RESOLVED FIXED
85 Branch
Tracking Status
firefox84 --- fixed
firefox85 --- fixed

People

(Reporter: mstange, Assigned: gsvelto)

References

Details

Attachments

(1 file)

This crash report crashes on a MOZ_CRASH, but it does not have a "MOZ_CRASH reason" field:
https://crash-stats.mozilla.org/report/index/3ef3fed4-caf6-4652-b164-d10710200921

Here are two more, from the same device, in MOZ_RELEASE_ASSERTs:
https://crash-stats.mozilla.org/report/index/5306d9b2-7969-47c0-9593-5f2570200920
https://crash-stats.mozilla.org/report/index/b962c616-b4ca-4a18-9265-5b39a0200908

This is on hwinnemoe's phone, he can reproduce it at will.

But he also has one or two crashes in his list that do have a MOZ_CRASH reason.

Is this a recent regression?

When investigating crashes, the MOZ_CRASH reason is often more important and more useful than the crash signature or stack.

In fact, I can reproduce the same using about:crashparent on my Moto G5:
https://crash-stats.mozilla.org/report/index/584debe8-e130-4d5a-8a0d-b076e0200921
It should have MOZ_CRASH(Crash via about:crashparent) in its MOZ_CRASH reason field, but it doesn't. Or at least it is not displayed for me.

Here's an existing crash report (not from me, found via the signature) that does have an intact MOZ_CRASH reason:
https://crash-stats.mozilla.org/report/index/dbcce77d-efb9-4a20-a1be-0256a0200918

Severity: -- → S2

I think I know what's going on and the fix should be straightforward: the crash reason is sent over a pipe along with all the other annotations at crash time. The code that sends the annotations from a child process to the main process uses a pipe and assumes that it's set in blocking mode:

https://searchfox.org/mozilla-central/rev/f081504642a115cb8236bea4d8250e5cb0f39b02/toolkit/crashreporter/nsExceptionHandler.cpp#588

It turns out that the pipe has been configured in non-blocking mode, so it's possible that when this happens the write() call fails with EAGAIN and doesn't finish writing out the crash reason. I'll write an xpcshell-test to confirm this and then fix it.

The pipe we use to communicate crash-time annotations to the main process is
non-blocking. It's created using NSPR's PR_CreatePipe() which sets it in
non-blocking mode by default but this code was originally writing to a file
which had been opened in blocking mode. Because of this the writes to it
assumed blocking behaviour and didn't check for EAGAIN. This could lead to
crash annotations not being fully sent if they were too long for a single
write() invocation. This affected both Linux and macOS. This patch trivially
fixes the problem by repeatedly calling write() until the annotations are
fully written out. This is not optimal but it's acceptable given that this code
runs in an exception handler and I don't know if it would be safe to call
poll() in such a context.

Assignee: nobody → gsvelto
Status: NEW → ASSIGNED
Priority: -- → P2
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/046fbc896c0b
Make sure that crash annotations are sent to the main process when a child process crashes r=KrisWright
Regressions: 1628261
Status: ASSIGNED → RESOLVED
Closed: 11 months ago
Resolution: --- → FIXED
Target Milestone: --- → 85 Branch

I made a mistake by landing first this patch then the one for bug 1624467. With this patch applied the race that I fixed in that bug widened which lead to the failures. I'll re-land this as-is because it's now working correctly, see this try run:

https://treeherder.mozilla.org/jobs?repo=try&revision=26aafe9595dfbbda9616ac9e6bcf1d76c06567d7&group_state=expanded

Flags: needinfo?(gsvelto)
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/36de02793851
Make sure that crash annotations are sent to the main process when a child process crashes r=KrisWright
Status: REOPENED → RESOLVED
Closed: 11 months ago11 months ago
Resolution: --- → FIXED
Target Milestone: --- → 85 Branch

Comment on attachment 9189979 [details]
Bug 1666383 - Make sure that crash annotations are sent to the main process when a child process crashes

Beta/Release Uplift Approval Request

  • User impact if declined: None, but crash reports might be missing important information
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The change is small and the affected code is covered by tests
  • String changes made/needed:
Attachment #9189979 - Flags: approval-mozilla-beta?

Comment on attachment 9189979 [details]
Bug 1666383 - Make sure that crash annotations are sent to the main process when a child process crashes

Approved for 84.0b8.

Attachment #9189979 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

I tried to verify this fix, unsuccessfully.

First off, the fix seems to be about content process crashes, not parent process crashes. Of the five crash reports in comment 0 and comment 1, only two reports were from a content process crash (the two "Here are two more" reports). The other three are from parent process crashes.

So I navigated to about:crashcontent in Firefox Nightly. Here's the report I got: https://crash-stats.mozilla.org/report/index/fc1c7397-20e6-42c8-a071-702e80201210
It does not have a "moz crash reason" field. So it looks like the fix didn't fix the bug on my phone. The build I used definitely includes the patch from this bug.

Then I tried to see if I at least got a moz crash reason from about:crashparent. However, about:crashparent doesn't even generate a crash report anymore. I filed that as bug 1681842.

I filed bug 1681846 on the fact that the original bug is still present for me.

I did all my testing on the emulator and I re-tested today without being able to repro. I'm wondering if the problem shows up only on phones or possibly only for ARM builds. I'll test on a phone today and see what gives.

Alright I'm reproducing on my phone.

I smell a compiler problem: if I put some printing in AnnotateMozCrashReason() everything works as it should; if I remove the printing then the gMozCrashReason variable isn't set anymore and the reason isn't populated in the crash.

You need to log in before you can comment on or make changes to this bug.