Missing MOZ_CRASH reason in crash reports from Android
Categories
(Toolkit :: Crash Reporting, defect, P2)
Tracking
()
People
(Reporter: mstange, Assigned: gsvelto)
References
Details
Attachments
(1 file)
|
47 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta+
|
Details | Review |
This crash report crashes on a MOZ_CRASH, but it does not have a "MOZ_CRASH reason" field:
https://crash-stats.mozilla.org/report/index/3ef3fed4-caf6-4652-b164-d10710200921
Here are two more, from the same device, in MOZ_RELEASE_ASSERTs:
https://crash-stats.mozilla.org/report/index/5306d9b2-7969-47c0-9593-5f2570200920
https://crash-stats.mozilla.org/report/index/b962c616-b4ca-4a18-9265-5b39a0200908
This is on hwinnemoe's phone, he can reproduce it at will.
But he also has one or two crashes in his list that do have a MOZ_CRASH reason.
Is this a recent regression?
When investigating crashes, the MOZ_CRASH reason is often more important and more useful than the crash signature or stack.
| Reporter | ||
Comment 1•5 years ago
|
||
In fact, I can reproduce the same using about:crashparent on my Moto G5:
https://crash-stats.mozilla.org/report/index/584debe8-e130-4d5a-8a0d-b076e0200921
It should have MOZ_CRASH(Crash via about:crashparent) in its MOZ_CRASH reason field, but it doesn't. Or at least it is not displayed for me.
Here's an existing crash report (not from me, found via the signature) that does have an intact MOZ_CRASH reason:
https://crash-stats.mozilla.org/report/index/dbcce77d-efb9-4a20-a1be-0256a0200918
| Assignee | ||
Updated•5 years ago
|
| Assignee | ||
Comment 2•5 years ago
|
||
I think I know what's going on and the fix should be straightforward: the crash reason is sent over a pipe along with all the other annotations at crash time. The code that sends the annotations from a child process to the main process uses a pipe and assumes that it's set in blocking mode:
It turns out that the pipe has been configured in non-blocking mode, so it's possible that when this happens the write() call fails with EAGAIN and doesn't finish writing out the crash reason. I'll write an xpcshell-test to confirm this and then fix it.
| Assignee | ||
Comment 3•5 years ago
|
||
The pipe we use to communicate crash-time annotations to the main process is
non-blocking. It's created using NSPR's PR_CreatePipe() which sets it in
non-blocking mode by default but this code was originally writing to a file
which had been opened in blocking mode. Because of this the writes to it
assumed blocking behaviour and didn't check for EAGAIN. This could lead to
crash annotations not being fully sent if they were too long for a single
write() invocation. This affected both Linux and macOS. This patch trivially
fixes the problem by repeatedly calling write() until the annotations are
fully written out. This is not optimal but it's acceptable given that this code
runs in an exception handler and I don't know if it would be safe to call
poll() in such a context.
Updated•5 years ago
|
| Assignee | ||
Updated•5 years ago
|
Comment 5•5 years ago
|
||
| bugherder | ||
Comment 6•5 years ago
|
||
Backed out for perma failures.
Log; https://treeherder.mozilla.org/logviewer?job_id=323188948&repo=autoland&lineNumber=3673
Backout: https://hg.mozilla.org/integration/autoland/rev/38f4b3abb54cee8a46f593891ac4b949239ac072
Comment 7•5 years ago
|
||
| Assignee | ||
Comment 8•5 years ago
|
||
I made a mistake by landing first this patch then the one for bug 1624467. With this patch applied the race that I fixed in that bug widened which lead to the failures. I'll re-land this as-is because it's now working correctly, see this try run:
Comment 10•5 years ago
|
||
| bugherder | ||
| Assignee | ||
Comment 11•5 years ago
|
||
Comment on attachment 9189979 [details]
Bug 1666383 - Make sure that crash annotations are sent to the main process when a child process crashes
Beta/Release Uplift Approval Request
- User impact if declined: None, but crash reports might be missing important information
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): The change is small and the affected code is covered by tests
- String changes made/needed:
Comment 12•5 years ago
|
||
Comment on attachment 9189979 [details]
Bug 1666383 - Make sure that crash annotations are sent to the main process when a child process crashes
Approved for 84.0b8.
Comment 13•5 years ago
|
||
| bugherder uplift | ||
| Reporter | ||
Comment 14•5 years ago
•
|
||
I tried to verify this fix, unsuccessfully.
First off, the fix seems to be about content process crashes, not parent process crashes. Of the five crash reports in comment 0 and comment 1, only two reports were from a content process crash (the two "Here are two more" reports). The other three are from parent process crashes.
So I navigated to about:crashcontent in Firefox Nightly. Here's the report I got: https://crash-stats.mozilla.org/report/index/fc1c7397-20e6-42c8-a071-702e80201210
It does not have a "moz crash reason" field. So it looks like the fix didn't fix the bug on my phone. The build I used definitely includes the patch from this bug.
Then I tried to see if I at least got a moz crash reason from about:crashparent. However, about:crashparent doesn't even generate a crash report anymore. I filed that as bug 1681842.
| Reporter | ||
Comment 15•5 years ago
|
||
I filed bug 1681846 on the fact that the original bug is still present for me.
| Assignee | ||
Comment 16•5 years ago
|
||
I did all my testing on the emulator and I re-tested today without being able to repro. I'm wondering if the problem shows up only on phones or possibly only for ARM builds. I'll test on a phone today and see what gives.
| Assignee | ||
Comment 17•5 years ago
|
||
Alright I'm reproducing on my phone.
| Assignee | ||
Comment 18•5 years ago
|
||
I smell a compiler problem: if I put some printing in AnnotateMozCrashReason() everything works as it should; if I remove the printing then the gMozCrashReason variable isn't set anymore and the reason isn't populated in the crash.
Description
•