Closed Bug 1287971 Opened 9 years ago Closed 9 years ago

LeakSanitizer does not run in the content process

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla50

Tracking Flags:

Tracking

Status

firefox50

---

fixed

People

(Reporter: mccr8, Assigned: jld)

References

(Blocks 1 open bug)

Details

(Whiteboard: [MemShrink][sblc2])

Attachments

(1 file)

bug1287971-lsan-unbox-hg0.diff 9 years ago Jed Davis [:jld] ⟨⏰\|UTC-7⟩ ⟦he/him⟧ 996 bytes, patch	tedd : review+	Details \| Diff \| Splinter Review

Andrew McCreight [:mccr8]

Reporter

Description

•

9 years ago

This came up while looking at bug 1287437. It looks like we're not actually running LeakSanitizer in ASan builds right now, in content processes. This is bad because we could have introduced new leaks in the mean time. You can see this by running some mochitests, like ./mach mochitest -f plain js/xpconnect/ and then looking through the log for instances of "Suppressions used:". There should be one for the parent process and one for the child process. If I run normally, there's only one, but if I run with "MOZ_DISABLE_CONTENT_SANDBOX=t" then there are two (also with bug 1287437 backed out).

Andrew McCreight [:mccr8]

Reporter

Updated

•

9 years ago

Whiteboard: [MemShrink]

Andrew McCreight [:mccr8]

Reporter

Comment 1

•

9 years ago

I believe the sandbox is Nightly-only, so if there are leaks introduced they will show up when we merge to Aurora.

Andrew McCreight [:mccr8]

Reporter

Updated

•

9 years ago

Comment 2

•

9 years ago

In local testing, ASan still seems to detect use-after-frees, so it seems like it is only LeakSanitizer that's not working.

Bill Gianopoulos [:WG9s]

Comment 3

•

9 years ago

(In reply to Andrew McCreight [:mccr8] from comment #1) > I believe the sandbox is Nightly-only, so if there are leaks introduced they > will show up when we merge to Aurora. I consider this a relatively stupid comment. The purpose of these things is to find things causing issues soon after check-in to aid in isolating which patch cased the issue. Trying to figure it out over an accumulation of 6 weeks of patches is not really helpful.

Andrew McCreight [:mccr8]

Reporter

Comment 4

•

9 years ago

(In reply to Bill Gianopoulos [:WG9s] from comment #3) > I consider this a relatively stupid comment. This comment is quite rude, and not appropriate for Bugzilla. > The purpose of these things is > to find things causing issues soon after check-in to aid in isolating which > patch cased the issue. Trying to figure it out over an accumulation of 6 > weeks of patches is not really helpful. I'm well aware of that. I was mostly pointing this out in case RyanVM or Philor does a push of Nightly-as-Aurora and sees weird failures.

Andrew McCreight [:mccr8]

Reporter

Comment 5

•

9 years ago

try push with sandboxing to see if anything has started failing in the mean time: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e78ba769fa7d

Bill Gianopoulos [:WG9s]

Comment 6

•

9 years ago

(In reply to Andrew McCreight [:mccr8] from comment #5) > try push with sandboxing to see if anything has started failing in the mean > time: > https://treeherder.mozilla.org/#/jobs?repo=try&revision=e78ba769fa7d Well then I guess I am the stupid one. I missed the point of your comment.

Bill Gianopoulos [:WG9s]

Comment 7

•

9 years ago

(In reply to Andrew McCreight [:mccr8] from comment #5) > try push with sandboxing to see if anything has started failing in the mean > time: > https://treeherder.mozilla.org/#/jobs?repo=try&revision=e78ba769fa7d If you run a terminal window you should see related Sandbox errors in the terminal window.

Ryan VanderMeulen [:RyanVM]

Comment 8

•

9 years ago

I've handed off the uplift simulations to the remaining sheriffs :)

Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧

Assignee

Comment 9

•

9 years ago

From bug 1287437 comment #10: > One weird thing here is that if the sanitizer runtime managed to block > SIGSYS, possibly by trying to block all signals (we have symbol > interposition for sigprocmask and pthread_sigmask to force SIGSYS to stay > unblocked for exactly this reason, but that wouldn't apply if it does the > sigprocmask syscall directly) then the kernel will unblock the signal *and* > reset its disposition before delivering it, which means the process will > immediately exit — no log messages, no crash reporting, nothing. So that > could result in weird breakage that wouldn't show up as a test failure or > even be obvious to a human reading the logs. > > But a look at the compiler-rt source doesn't show anything that might be > doing this besides TSan, which is known to be incompatible with sandboxing > for various reasons (and will disable it: bug 1182565). So I tried strace — after backing out bug 1286324 and manually symlinking OBJDIR/dist/bin/llvm-symbolizer to the matching llvm-symbolizer (although that might not be necessary for this bug?) — and this happened: [pid 79914] rt_sigprocmask(SIG_BLOCK, ~[ILL ABRT BUS FPE SEGV XCPU XFSZ], [], 8) = 0 [pid 79914] --- SIGSYS {si_signo=SIGSYS, si_code=0x1, si_errno=EPERM, si_pid=5088774, si_uid=0, si_value={int=56, ptr=0xc000003e00000038}} --- [pid 79914] +++ killed by SIGSYS +++ Syscall 56 is clone(); si_syscall aliases si_value, it looks like. So, where in the compiler-rt source might this be coming from? Maybe lib/sanitizer_common/sanitizer_stoptheworld_linux_libcdep.cc: static const int kUnblockedSignals[] = { SIGABRT, SIGILL, SIGFPE, SIGSEGV, SIGBUS, SIGXCPU, SIGXFSZ }; And, as predicted, it's using its own self-contained implementation of sigprocmask() which doesn't use libc at all. The thing I failed to spot in the git-grep output was this, in lib/lsan/lsan_common.cc: StopTheWorld(DoLeakCheckCallback, &param); Unsurprisingly in hindsight, LSan wants to stop all other threads (and signals) while rummaging through the heap, to avoid data races. To do this, the StopTheWorld code forks a process that shares the parent's address space and a few other things, but in its own new thread group (and with its own signal disposition, etc.): uptr tracer_pid = internal_clone( TracerThread, tracer_stack.Bottom(), CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_UNTRACED, &tracer_thread_argument, 0 /* parent_tidptr */, 0 /* newtls */, 0 /* child_tidptr */); That's the clone call where the problems start. But if that succeeded, it would then go on to do a bunch of other unusual stuff in that process, in particular ptrace()ing all of its parent's threads, whose pids it determines by opening /proc/%d/task/, etc. This… trying to make this somehow work with some vestige of sandboxing does not seem like a good idea. Also, from the compiler-rt revision history, this code seems to be as old as LSan, so I don't know how this ever worked for GeckoMediaPlugin sandboxing. Possibly the answer is that it didn't… except in gtests, because (pre-TaskCluster, anyway) those ran on old RedHat with no seccomp-bpf. <hr /> tl;dr: Once bug 742434 landed, content processes were silently dying when they tried to do LSan reports, which effectively disabled LSan in content processes. (This is when the process was already exiting, and the IPC code doesn't check the exit status, so nothing else broke.) But bug 1286324 replaced the thing that got the process killed with a simple failure, resulting in the "LeakSanitizer has encountered a fatal error" message. In either case, the only reasonable solution is disabling sandboxing if ASan (and therefore LSan) is used. The fix is simple: extend the bug 1182565 fix to MOZ_ASAN (and maybe MOZ_MSAN?) as well as MOZ_TSAN. It looks as though small changes like that are acceptable in old-configure.in (i.e., hopefully I won't be required to convert all sanitizer and sandboxing configury to Python just for this).

Assignee: nobody → jld

Carsten Book [:Tomcat]

Comment 10

•

9 years ago

(In reply to Andrew McCreight [:mccr8] from comment #4) > > I'm well aware of that. I was mostly pointing this out in case RyanVM or > Philor does a push of Nightly-as-Aurora and sees weird failures. i took this over Andrew and feel free to needinfo me in the future (just saw this bug here:) will start such a run tomorrow morning my time.

Jim Mathies [:jimm]

Updated

•

9 years ago

Whiteboard: [MemShrink] → [MemShrink][sblc2]

Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧

Assignee

Comment 11

•

9 years ago

Attached patch bug1287971-lsan-unbox-hg0.diff — Details — Splinter Review

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9818c6d55f97

Attachment #8773813 - Flags: review?(julian.r.hector)

Julian Hector [:tedd] [:jhector]

Comment 12

•

9 years ago

Comment on attachment 8773813 [details] [diff] [review] bug1287971-lsan-unbox-hg0.diff Review of attachment 8773813 [details] [diff] [review]: ----------------------------------------------------------------- lgtm

Attachment #8773813 - Flags: review?(julian.r.hector) → review+

Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧

Assignee

Comment 13

•

9 years ago

Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9818c6d55f97 — the intermittents seem to not be my fault.

Keywords: checkin-needed

OS: Unspecified → Linux

Pulsebot

Comment 14

•

9 years ago

Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/8d2a4af272e3 Disable sandboxing in Linux ASan builds. r=jhector

Keywords: checkin-needed

Carsten Book [:Tomcat]

Comment 15

•

9 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/8d2a4af272e3

Status: NEW → RESOLVED

Closed: 9 years ago

status-firefox50: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla50

Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧

Assignee

Updated

•

8 years ago