Closed Bug 1142263 Opened 6 years ago Closed 6 years ago

Bug 1137007 makes ASAN on Fedora 21 (clang 3.5.0) confused and angry

Categories

(Core :: Security: Process Sandboxing, defect)

Platform: x86 Linux
Type: defect
Priority: Not set
Severity: normal

Tracking

RESOLVED FIXED
mozilla39
Tracking Status
firefox39 --- fixed

People

(Reporter: bwc, Assigned: jld)

Attachments

(1 file)

This changeset is causing ASAN to report spurious stack-overflow and SIGSEGV errors which, strangely enough, don't seem to result in program termination. Occasionally I'm seeing unit-tests hang in addition to the spurious errors, but I am not 100% sure this is related (will investigate further). The initial output always seems to be the same:

ASAN:SIGSEGV
=================================================================
==17888==ERROR: AddressSanitizer: stack-overflow on address 0x000000000002 (pc 0x0037120fae91 bp 0x7fff0752a560 sp 0x000000000002 T0)
 

I've observed this in sdp_unittest, jsep_session_unittest, signaling_unittests, and running the browser.
Just observed the test-case hanging on this changeset (actually, it looks like it is trying to crash, given that I see an instance of abrt-hook-ccpp pointing at it, but it seems to just spin indefinitely).
Huh. Killing the abrt-hook-ccpp seems to unstick the unit-test. Bizarre.
I'm going to guess that ASAN doesn't like me calling clone(2) directly like that.  Does it work with MOZ_ASSUME_USER_NS=1 (or 0; it doesn't actually matter yet, but any value will make it skip the check) set in the environment?
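For readers following along, the environment override described above could look roughly like the sketch below. This is an illustration only, not the actual sandbox code: the helper name `userns_check_is_skipped` is hypothetical, and the only behavior taken from the comment is that any value of MOZ_ASSUME_USER_NS (including 0) causes the check to be skipped.

```c
#include <stdlib.h>

/* Hypothetical helper, for illustration only.
 * Returns 1 if MOZ_ASSUME_USER_NS is present in the environment with any
 * value (even "0" or an empty string), meaning the clone(2)-based user
 * namespace check would be skipped; returns 0 if the variable is unset. */
static int userns_check_is_skipped(void)
{
    return getenv("MOZ_ASSUME_USER_NS") != NULL;
}
```

Because only the presence of the variable is tested, `MOZ_ASSUME_USER_NS=0` and `MOZ_ASSUME_USER_NS=1` behave identically in this sketch, which matches the "it doesn't actually matter yet" remark.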
Flags: needinfo?(docfaraday)
That seems to prevent the problem, at least with the unit-tests. When I get into the office tomorrow, I'll try the same with mochitest.
Assignee: nobody → jld
(In reply to Jed Davis [:jld] from comment #3)
> I'm going to guess that ASAN doesn't like me calling clone(2) directly like
> that.

Worse than that: it looks like ASAN's stack-overflow error is correct.  I forgot to explicitly pass the clone(2) argument that sets the child stack pointer if non-null, and taking care of that seems to fix it.  (I'm running into other errors as well, but they seem to be unrelated.)
(In reply to Jed Davis [:jld] from comment #5)
> Worse than that: it looks like ASAN's stack-overflow error is correct.  I
> forgot to explicitly pass the clone(2) argument that sets the child stack
> pointer if non-null, and taking care of that seems to fix it.

…and this also explains the “doesn't seem to result in program termination” from comment #0 — it's in a cloned child process, and its parent just wants to know if it was created successfully, so whether it successfully evaluates _exit(0) or crashes on a bad stack pointer doesn't actually matter.  (The fun times would start if the garbage that winds up in rSP is a valid pointer to a word holding a pointer to executable memory; so, really, let's not do that.)
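A minimal sketch of the kind of probe being discussed, with the child stack argument passed explicitly. This is not the attached patch: the helper name `can_clone_with_flags` is hypothetical, the flags are left to the caller, and the sketch assumes an architecture where the raw clone syscall takes the flags first and the child stack pointer second (x86, x86-64, ARM). It also assumes CLONE_VM is not among the flags, since a null stack pointer is only safe when the child gets its own copy of the address space.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only; not the code from the patch.
 * Returns 1 if a child process could be created with the given clone flags,
 * 0 otherwise. Do not pass CLONE_VM here. */
static int can_clone_with_flags(unsigned long flags)
{
    /* Raw clone(2) with every argument spelled out. The second argument is
     * the child stack pointer: NULL means the child keeps the parent's stack
     * pointer value in its own copy of the stack, as with fork(). Omitting
     * the argument would leave whatever garbage is in that register as the
     * child's stack pointer. The trailing arguments are all NULL, so their
     * per-architecture ordering does not matter. */
    long pid = syscall(SYS_clone, flags | SIGCHLD, NULL, NULL, NULL, NULL);
    if (pid < 0)
        return 0;       /* creation failed, e.g. EPERM or EINVAL */
    if (pid == 0)
        _exit(0);       /* child: exit immediately */

    /* Parent: reap the child. Its exit status is irrelevant; only the
     * success of the clone call itself is of interest. */
    int status;
    while (waitpid((pid_t)pid, &status, 0) < 0) {
        if (errno != EINTR)
            return 0;
    }
    return 1;
}
```

A caller interested in user namespace support might call `can_clone_with_flags(CLONE_NEWUSER)` (with `<sched.h>` included); calling it with `0` behaves like a plain fork and should succeed on any ordinary Linux system.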
Does this patch fix the mystery crashes on your end?
Flags: needinfo?(docfaraday)
Attachment #8576368 - Flags: feedback?(docfaraday)
Comment on attachment 8576368
bug1142263-userns-detection-oops-hg0.diff

Review of attachment 8576368:
-----------------------------------------------------------------

This seems to have done the trick!
Attachment #8576368 - Flags: feedback?(docfaraday) → feedback+
Attachment #8576368 - Flags: review?(gdestuynder)
Attachment #8576368 - Flags: review?(gdestuynder) → review+
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e8b0913ccdf1 (build-only try run because automation's Linux is too old to reach this case, but I've tested locally, and see comment #8).
Keywords: checkin-needed
Duplicate of this bug: 1142862
https://hg.mozilla.org/mozilla-central/rev/46472d25b238
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39