Closed Bug 1287437 Opened 3 years ago Closed 3 years ago

LeakSanitizer has encountered a fatal error

Categories

(Core :: Security: Process Sandboxing, defect)

RESOLVED DUPLICATE of bug 1287971

People

(Reporter: cbook, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: regression, Whiteboard: [MemShrink][sblc2])

We have been getting a lot of noise recently on ASan tests (nothing fatal, it just rides along in the error log whenever a job fails):

https://treeherder.mozilla.org/logviewer.html#?job_id=10612543&repo=fx-team#L3256

02:37:04     INFO -  ==2278==LeakSanitizer has encountered a fatal error.
02:37:08     INFO -  -----------------------------------------------------
02:37:08     INFO -  Suppressions used:
02:37:08     INFO -    count      bytes template
02:37:08     INFO -       40        986 libc.so
02:37:08     INFO -      836      26672 nsComponentManagerImpl
02:37:08     INFO -       52       7072 mozJSComponentLoader::LoadModule
02:37:08     INFO -        1        384 pixman_implementation_lookup_composite
02:37:08     INFO -      360      15936 libfontconfig.so
02:37:08     INFO -        1         32 libdl.so
02:37:08     INFO -       26       6492 libglib-2.0.so
02:37:08     INFO -        8        224 libresolv.so
02:37:08     INFO -  -----------------------------------------------------
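
For anyone comparing these tables across jobs, here is a minimal sketch of pulling the "Suppressions used" rows out of a log excerpt like the one above. The regular expression and the function name parse_suppressions are assumptions made for illustration, based only on the line layout shown in this comment; this is not how mozharness or Treeherder parse logs.

```python
import re

# Assumed row layout, taken from the excerpt in this bug:
#   "02:37:08     INFO -       40        986 libc.so"
ROW = re.compile(
    r"^\s*\d{2}:\d{2}:\d{2}\s+INFO\s+-\s+(\d+)\s+(\d+)\s+(\S.*?)\s*$"
)


def parse_suppressions(log_text):
    """Return (count, bytes, template) tuples for each suppression row."""
    rows = []
    for line in log_text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append(
                (int(match.group(1)), int(match.group(2)), match.group(3))
            )
    return rows


SAMPLE = """\
02:37:08     INFO -  Suppressions used:
02:37:08     INFO -    count      bytes template
02:37:08     INFO -       40        986 libc.so
02:37:08     INFO -      836      26672 nsComponentManagerImpl
"""

rows = parse_suppressions(SAMPLE)
total_bytes = sum(nbytes for _, nbytes, _ in rows)
print(rows, total_bytes)
```

Header and separator lines do not match the pattern, so only the data rows are returned.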
Andrew: do you know where to look in hg to find what caused this?
Flags: needinfo?(continuation)
(In reply to Carsten Book [:Tomcat] from comment #1)
> Andrew: do you know where to look in hg to find what caused this?

I'm not sure what you mean by that.

Anyways, this sounds kind of bad. I wonder if we're even actually checking for leaks right now. Did we bump the version of Clang we use for ASan builds recently? I'll try to look at logs to see when this started showing up.

I also wonder if the "WARNING - Can't figure out symbols_url from installer_url" is related.
Blocks: LSan
Product: Core → Testing
(In reply to Andrew McCreight [:mccr8] from comment #2)
> (In reply to Carsten Book [:Tomcat] from comment #1)
> > Andrew: do you know where to look in hg to find what caused this?
> 
> I'm not sure what you mean by that.
> 

Oh, I meant: could an in-tree change have caused this, and if so, is it something we can back out?
Fortunately, it looks like a recent regression.

I don't see the fatal error in this push on m-c:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=ef5f932101e5b833b2429407cb0873471b4d764e

But I do see it in the next one:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=711963e8daa312ae06409f8ab5c06612cb0b8f7b

This is the set of changes that landed in the second one:
https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=711963e8daa312ae06409f8ab5c06612cb0b8f7b
Flags: needinfo?(continuation)
I'll try to figure out what regressed this.

We should also make the tree turn orange when this happens.
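
As a rough illustration of what "turning the tree orange" could key on, the sketch below scans log lines for the fatal-error message quoted in comment 0. The marker string comes from the log above; the function names and the 'orange'/'green' mapping are hypothetical and are not the actual harness logic.

```python
# Marker text as it appears in the log quoted in this bug.
FATAL_MARKER = "LeakSanitizer has encountered a fatal error"


def find_lsan_fatal_errors(log_lines):
    """Return (line_number, text) pairs for lines containing the marker."""
    return [
        (number, line.rstrip())
        for number, line in enumerate(log_lines, start=1)
        if FATAL_MARKER in line
    ]


def job_status(log_lines):
    """Hypothetical mapping: any fatal LSan error makes the job 'orange'."""
    return "orange" if find_lsan_fatal_errors(log_lines) else "green"


LOG = [
    "02:37:03     INFO -  some earlier output",
    "02:37:04     INFO -  ==2278==LeakSanitizer has encountered a fatal error.",
    "02:37:08     INFO -  Suppressions used:",
]

print(job_status(LOG), find_lsan_fatal_errors(LOG))
```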
Assignee: nobody → continuation
I bisected down to this push:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=54e9af18d31426a6474d584add8f487d99848854

The only suspicious commit in that push is bug 1286324.

Jed, could you take a look at this? If you can't fix it soon, please consider backing out your patch, as people may be introducing new LSan leaks. I'm not sure if LSan is actually going to report anything or not. (It might just be dying after it would do the report or something.)

I'll file a separate bug for making this turn the tree orange and work on that.
Assignee: continuation → nobody
Blocks: 1286324
Flags: needinfo?(jld)
Keywords: regression
Whiteboard: [MemShrink]
Component: General → Security: Process Sandboxing
Product: Testing → Core
I've confirmed locally that backing out bug 1287877 makes the LeakSanitizer fatal error message go away.
Maybe you could take a look at this, Julian, if Jed isn't around? Thanks.
Flags: needinfo?(julian.r.hector)
Jed's looking at it. Anyways, the bigger problem seems to be that we're not running LeakSanitizer in the content process. Jed's patch just made it so that we got some alert about it rather than silently failing.
Flags: needinfo?(julian.r.hector)
That patch changes the sandbox policy so that fork() fails with EPERM instead of just crashing, so anything LSan was doing that could observe the difference would already have been broken before that patch.  It's possible that mozharness wouldn't notice that kind of crash, because ASan builds have no crash reporter, but there would have been log messages on stderr (starting with "Sandbox: ") and I don't see any in the logs for the “before” m-c build.
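
To make the "fails with EPERM instead of crashing" distinction concrete, here is a small sketch of what a caller sees when fork() is denied rather than fatal. It is written in Python for brevity and assumes a POSIX system; fork_or_none and denied_fork are hypothetical names, and denied_fork merely simulates the denial. None of this is the sandbox or LSan code.

```python
import errno
import os


def fork_or_none(fork=os.fork):
    """Call fork(); return the pid, or None if the call is denied with EPERM."""
    try:
        return fork()
    except OSError as exc:
        if exc.errno == errno.EPERM:
            # The call returned an error, so the caller gets a chance to
            # log or complain, which a crash would never allow.
            return None
        raise


def denied_fork():
    """Stand-in for a fork() that a sandbox policy rejects with EPERM."""
    raise OSError(errno.EPERM, os.strerror(errno.EPERM))


result = fork_or_none(denied_fork)
print("fork denied" if result is None else "forked pid %d" % result)
```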

One weird thing here: suppose the sanitizer runtime managed to block SIGSYS, possibly by trying to block all signals. (We have symbol interposition for sigprocmask and pthread_sigmask to force SIGSYS to stay unblocked for exactly this reason, but that wouldn't apply if the runtime makes the sigprocmask syscall directly.) In that case the kernel will unblock the signal *and* reset its disposition before delivering it, which means the process will immediately exit: no log messages, no crash reporting, nothing.  That could result in weird breakage that wouldn't show up as a test failure or even be obvious to a human reading the logs.
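
The idea behind that interposition can be sketched as a wrapper that filters SIGSYS out of any request to block signals. This is a Python illustration only, assuming a Unix system; the real interposers are native wrappers around the libc functions, and sigmask_keeping_sigsys and currently_blocked are names made up for this example.

```python
import signal


def sigmask_keeping_sigsys(how, mask):
    """Apply pthread_sigmask, but never allow SIGSYS to become blocked."""
    mask = set(mask)
    if how in (signal.SIG_BLOCK, signal.SIG_SETMASK):
        # Drop SIGSYS from requests that would block it.
        mask.discard(signal.SIGSYS)
    return signal.pthread_sigmask(how, mask)


def currently_blocked():
    """Return the calling thread's current set of blocked signals."""
    # Blocking an empty set changes nothing and returns the current mask.
    return signal.pthread_sigmask(signal.SIG_BLOCK, set())


# A caller tries to block SIGSYS along with another signal.
previous = sigmask_keeping_sigsys(
    signal.SIG_BLOCK, {signal.SIGSYS, signal.SIGUSR1}
)
blocked = currently_blocked()
print(signal.SIGSYS in blocked, signal.SIGUSR1 in blocked)

# Restore the mask that was in effect before the example ran.
signal.pthread_sigmask(signal.SIG_SETMASK, previous)
```

As the comment notes, a wrapper like this only helps callers that go through the wrapped functions; code that makes the underlying syscall directly bypasses it.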

But a look at the compiler-rt source doesn't show anything that might be doing this besides TSan, which is known to be incompatible with sandboxing for various reasons (and will disable it: bug 1182565).
Yeah, it looks like LSan is just not running at all on Nightly. I filed bug 1287971 for that.
See Also: → 1287971
This and bug 1287971 are going to have the same solution, it looks like: disable sandboxing if ASan (and therefore LSan) is used.

(Vague summary of bug 1287971 comment #9: it's the same syscall causing both of these bugs, and it's weirder than just a plain fork() but that doesn't really matter; bug 1287971 is that it immediately and silently killed the process (rather than crashing it very noisily as intended) because the last paragraph of comment #10 is wrong, and *this* bug is that it now fails in such a way that LSan is able to complain about it.)
Flags: needinfo?(jld)
Whiteboard: [MemShrink] → [MemShrink][sblc2]
Looking at some before/after failed test jobs on TH, this seems to have been fixed by https://hg.mozilla.org/mozilla-central/rev/8d2a4af272e3 as expected.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1287971