Closed Bug 644707 Opened 13 years ago Closed 13 years ago

Avoid calling system abort(), generate useful stack traces through a manual segfault instead

Categories

(Core :: XPCOM, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla8

People

(Reporter: jdm, Assigned: azakai)

References

Details

Attachments

(1 file, 1 obsolete file)

abort can drag us into libc and screw up backtraces on tinderbox.  I've never seen JS_ASSERT fail, so let's use their method for mozalloc_abort.
raise(SIGABRT) is not caught by the crash reporter, which is why mozalloc_abort uses it only after actually trying to crash first.
Or maybe I'm wrong, ted knows I think.
Comment on attachment 521576 [details] [diff] [review]
Raise an abort signal instead of calling abort.

Oh, boo.
Attachment #521576 - Attachment is obsolete: true
I really don't remember, honestly. I'm pretty sure whatever we're doing for JS_ASSERT works on tinderbox. It's not hard to test, though, just stick one in somewhere and run with MOZ_CRASHREPORTER=1 in your environment, and see if you get the crash reporter.
I tested on Linux, and both |abort()| and |raise(SIGABRT)| trigger the crash reporter. So that part seems fine. The issue with SIGABRT not caught is apparently limited to OS X, as the source comment says in that file. The patch doesn't change anything on OS X, so should be fine, I am in favor of moving forward with it.

But I do have some other concerns:

1. There are other places that call abort(), not just mozalloc_abort. Should we change all of them? Also, some are in things like jemalloc, chromium ipc code, etc. - not our core codebase. What do we want to do about those?

2. I didn't actually check if crash report stack traces look better with this. One thing that concerns me is that we will still have a stack trace inside libc, won't we? Just it will be inside raise() instead of abort(). That is presumably less deep (if abort calls raise), but it doesn't seem guaranteed to work, unless I am missing something?
The other possibility here is just to use TouchBadMemory() like we do on OS X. That will give us a reliable crash with a stack entirely in libxul.
Attached patch patch v2Splinter Review
This patch overrides the system abort() with a redirect to mozalloc_abort, in order to catch all calls to abort() in the browser. It also tries to touch bad memory first, then raise(SIGABORT) (except on OS X where this is known not to work), then _exit, in that order.

This seems like the safest thing to do, or can we do better somehow?
Attachment #544339 - Flags: review?(ted.mielczarek)
I went with raise(SIGABORT) because every crash stack I've seen using JS_ASSERT has worked just fine.
The patch does try raise(SIGABORT), after trying to manually segfault. Manual segfaults seem most likely to work since the stack trace will be entirely in our code.

But in any case, I am fine with either way, both seem likely to be improvements over the current situation. ted, which do you prefer?
I'm in favor of whatever gets us the best stack traces. Have you tested this on Android to see what the stacks look like?
I don't have access to an Android device at the moment, I should be able to test this in a few days.
Assignee: nobody → azakai
Comment on attachment 544339 [details] [diff] [review]
patch v2

Review of attachment 544339 [details] [diff] [review]:
-----------------------------------------------------------------

Seems reasonable.

::: memory/mozalloc/mozalloc_abort.cpp
@@ +81,2 @@
>  
> +    // On *NIX platforms the preffered way to abort is by touching bad memory,

"prefered"
Attachment #544339 - Flags: review?(ted.mielczarek) → review+
Summary: Avoid calling abort() and use raise(SIGABRT) instead → Avoid calling system abort(), generate useful stack traces through a manual segfault instead
Pushed to m-i

http://hg.mozilla.org/integration/mozilla-inbound/rev/f986c9b02c1d

For posterity's sake, we talked on irc and ted ran some tests which showed that we get useful stack data using manual segfaults, but not with raise(SIGABORT) or abort(), confirming the reasoning before.
http://hg.mozilla.org/mozilla-central/rev/f986c9b02c1d
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla8
I wondered what the reason was for interposing the global abort symbol even for aborts from within system libraries, because we end up depending on stack scanning through those system libraries anyway, but I guess it may be useful as it means that all aborts will all have the same signature instead of different operating systems having slightly different offsets within linux-gate.so.
Note that we could also add symbols to our symbol store for linux-gate. Breakpad has some outdated ones in the source tree:
http://code.google.com/p/google-breakpad/source/browse/#svn%2Ftrunk%2Fsrc%2Fclient%2Flinux%2Fdata

I landed a patch recently that made linux-gate get a reasonable debug ID again, but I didn't take the time to generate symbol files for it.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: