Closed Bug 1080077 Opened 5 years ago Closed 5 years ago

Native stack traces for Linux sandbox failures on ASAN builds

Categories

(Core :: Security, defect)

x86_64
Linux
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla36

People

(Reporter: jld, Assigned: jld)

References

Details

Attachments

(1 file, 1 obsolete file)

ASAN builds disable the crash reporter, so an ASAN build's sandbox failure on TBPL shows only the syscall info (syscall number and argument registers), instead of also having a stack trace extracted from the minidump.  This is noticeably not ideal for determining the root cause of such a crash.

ASAN, unsurprisingly, handles segfaults by logging useful information like a stack trace.  If that unwinder can deal with signal handler frames, then it would suffice to kill the process with a MOZ_CRASH (or equivalent) rather than reraising the SIGSYS.
It gets worse.  ASAN's crash reporter does lots of stuff that violates the sandbox policy — starting with tcgetattr() on stderr and proceeding into trying to rummage around in /proc to symbolicate the stack trace.  So this has probably been breaking ASAN/LSAN coverage of media plugins for... a while.

This also means that segfaulting in response to sandbox violations won't work, unless it's possible to determine that we're not already in ASAN's crash handler, but NS_StackWalk or whatever we use for normal crashes might work.
See Also: → 1081242
No longer depends on: 1080803
Comment #1 is now its own bug.
Attachment #8504185 - Flags: review?(gdestuynder)
Blocks: 1081242
See Also: 1081242
Comment on attachment 8504185 [details] [diff] [review]
bug1080077-sandbox-stacktrace-hg0.diff

Review of attachment 8504185 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/sandbox/linux/glue/SandboxCrash.cpp
@@ +90,5 @@
> +{
> +  // Skip 3 frames: one for this module, one for the signal handler in
> +  // libmozsandbox, and one for the signal trampoline.
> +  //
> +  // FIXME, bug 968531: This should use the signal context so it can

what happens there on ARM?
Attachment #8504185 - Flags: review?(gdestuynder) → review+
(In reply to Guillaume Destuynder [:kang] from comment #4)
> Comment on attachment 8504185 [details] [diff] [review]
> ::: security/sandbox/linux/glue/SandboxCrash.cpp
> > +  // FIXME, bug 968531: This should use the signal context so it can
> 
> what happens there on ARM?

NS_StackWalk stops before it gets to the actual stack, so the callback is never called.

Also, it's not just ARM; there are also problems on x86.  NS_StackWalk uses frame pointers there, but the syscall instruction that raised SIGSYS is probably in libc, and libc is probably built with -fomit-frame-pointer, so it works only if the syscall instruction occurs before any reuse of %rbp by the libc code (and we still lose the address of whatever called into libc).

Even so, it's better than nothing, and it's likely to provide the information needed for bug 1080165, so I'd like to land the patch with some changes to the comment.
Adjusted comment; carrying over r+.
Attachment #8504185 - Attachment is obsolete: true
Attachment #8504425 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/809b3ec41a5d
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
You need to log in before you can comment on or make changes to this bug.