1287437 - LeakSanitizer has encountered a fatal error

Reporter

Description

9 years ago

We've been getting a lot of noise recently on ASan tests (nothing fatal, it just rides along in the error log when a job fails):

https://treeherder.mozilla.org/logviewer.html#?job_id=10612543&repo=fx-team#L3256

02:37:04 INFO - ==2278==LeakSanitizer has encountered a fatal error.
02:37:08 INFO - -----------------------------------------------------
02:37:08 INFO - Suppressions used:
02:37:08 INFO - count bytes template
02:37:08 INFO - 40 986 libc.so
02:37:08 INFO - 836 26672 nsComponentManagerImpl
02:37:08 INFO - 52 7072 mozJSComponentLoader::LoadModule
02:37:08 INFO - 1 384 pixman_implementation_lookup_composite
02:37:08 INFO - 360 15936 libfontconfig.so
02:37:08 INFO - 1 32 libdl.so
02:37:08 INFO - 26 6492 libglib-2.0.so
02:37:08 INFO - 8 224 libresolv.so
02:37:08 INFO - -----------------------------------------------------
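For anyone who wants to tally the "Suppressions used" rows from a log like this one, here is a minimal sketch. It is not part of mozharness or LSan; the function name `parse_suppressions` and the assumed "HH:MM:SS INFO - " prefix are illustrative, based only on the lines quoted in this report.

```python
import re

# Assumed log prefix, as seen in the quoted lines: "02:37:08 INFO - "
PREFIX = re.compile(r"^\d{2}:\d{2}:\d{2}\s+INFO\s+-\s+")
# A suppression row: count, bytes, then the suppression template
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\S.*?)\s*$")


def parse_suppressions(lines):
    """Return (count, bytes, template) tuples for each suppression row."""
    rows = []
    for line in lines:
        text = PREFIX.sub("", line)  # strip the timestamp/level prefix if present
        match = ROW.match(text)
        if match:  # header, separator, and error lines do not match
            rows.append((int(match.group(1)), int(match.group(2)), match.group(3)))
    return rows


if __name__ == "__main__":
    sample = [
        "02:37:08 INFO - count bytes template",
        "02:37:08 INFO - 40 986 libc.so",
        "02:37:08 INFO - 836 26672 nsComponentManagerImpl",
    ]
    for count, nbytes, template in parse_suppressions(sample):
        print(count, nbytes, template)
```

Summing the first two fields of the returned tuples gives the total suppressed leak count and byte count for a job.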

Carsten Book [:Tomcat]

Reporter

Comment 1

9 years ago

andrew: do you know where to look in hg what caused this ?

Flags: needinfo?(continuation)

Andrew McCreight [:mccr8]

Comment 2

9 years ago

(In reply to Carsten Book [:Tomcat] from comment #1)
> andrew: do you know where to look in hg what caused this ?

I'm not sure what you mean by that.

Anyways, this sounds kind of bad. I wonder if we're even actually checking for leaks right now. Did we bump the version of Clang we use for ASan builds recently? I'll try to look at logs to see when this started showing up.

I also wonder if the "WARNING - Can't figure out symbols_url from installer_url" is related.

Blocks: LSan

Andrew McCreight [:mccr8]

Updated

9 years ago

Product: Core → Testing

Carsten Book [:Tomcat]

Reporter

Comment 3

9 years ago

(In reply to Andrew McCreight [:mccr8] from comment #2)
> (In reply to Carsten Book [:Tomcat] from comment #1)
> > andrew: do you know where to look in hg what caused this ?
>
> I'm not sure what you mean by that.

Oh, I meant whether an in-tree change could have caused this, and so whether there is something we can back out.

Andrew McCreight [:mccr8]

Comment 4

9 years ago

Fortunately, it looks like a recent regression. I don't see the fatal error in this push on m-c:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=ef5f932101e5b833b2429407cb0873471b4d764e

But I do see it in the next one:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=711963e8daa312ae06409f8ab5c06612cb0b8f7b

This is the set of changes that landed in the second one:
https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=711963e8daa312ae06409f8ab5c06612cb0b8f7b

Flags: needinfo?(continuation)

Andrew McCreight [:mccr8]

Comment 5

9 years ago

I'll try to figure out what regressed this. We should also make the tree turn orange when this happens.
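As a rough illustration of what "turn the tree orange" could mean at the log level, the sketch below scans a job log for the fatal-error line and reports a failing status. This is not how mozharness or Treeherder actually implement it; `lsan_fatal_lines` and `job_status` are hypothetical names, and the only thing taken from this bug is the message text itself.

```python
# Message text as it appears in the log quoted in the description
FATAL_MARKER = "LeakSanitizer has encountered a fatal error"


def lsan_fatal_lines(log_lines):
    """Return the 1-based line numbers where the LSan fatal error appears."""
    return [
        number
        for number, line in enumerate(log_lines, start=1)
        if FATAL_MARKER in line
    ]


def job_status(log_lines):
    """Hypothetical status: 'failure' if LSan died, otherwise 'success'."""
    return "failure" if lsan_fatal_lines(log_lines) else "success"


if __name__ == "__main__":
    log = [
        "02:37:04 INFO - ==2278==LeakSanitizer has encountered a fatal error.",
        "02:37:08 INFO - Suppressions used:",
    ]
    print(job_status(log), lsan_fatal_lines(log))
```

Treating the message as a hard failure means a job cannot quietly pass while leak checking is broken, which is the concern raised in this comment.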

Assignee: nobody → continuation

Andrew McCreight [:mccr8]

Comment 6

9 years ago

I bisected down to this push:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=54e9af18d31426a6474d584add8f487d99848854

The only suspicious commit in that push is bug 1286324.

Jed, could you take a look at this? If you can't fix it soon, please consider backing out your patch, as people may be introducing new LSan leaks. I'm not sure if LSan is actually going to report anything or not. (It might just be dying after it would do the report or something.)

I'll file a separate bug for making this turn the tree orange and work on that.
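The bisection described here (one push known good, a later one known bad, narrow down to the first bad push) can be sketched as a binary search. This is a generic illustration, not the tooling that was actually used; `first_bad` and the `is_bad` callback are hypothetical, and in practice `is_bad` would mean "the job log for this push contains the LSan fatal error".

```python
def first_bad(pushes, is_bad):
    """Return the first bad push.

    Assumes `pushes` is ordered oldest to newest, that pushes[0] is known
    good and pushes[-1] is known bad, and that once a push is bad every
    later push is bad too.
    """
    good, bad = 0, len(pushes) - 1
    while bad - good > 1:
        middle = (good + bad) // 2
        if is_bad(pushes[middle]):
            bad = middle  # regression is at or before the middle push
        else:
            good = middle  # regression is after the middle push
    return pushes[bad]


if __name__ == "__main__":
    # Hypothetical push labels; the regression lands in "p6"
    pushes = [f"p{i}" for i in range(10)]
    print(first_bad(pushes, lambda push: int(push[1:]) >= 6))
```

Each step halves the candidate range, so only a handful of test jobs are needed even for a long push log.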

Assignee: continuation → nobody

Blocks: 1286324

Flags: needinfo?(jld)

Andrew McCreight [:mccr8]

Updated

9 years ago

Blocks: 1287877

Andrew McCreight [:mccr8]

Updated

9 years ago

Keywords: regression

Whiteboard: [MemShrink]

Andrew McCreight [:mccr8]

Updated

9 years ago

Component: General → Security: Process Sandboxing

Product: Testing → Core

Andrew McCreight [:mccr8]

Comment 7

9 years ago

I've confirmed locally that backing out bug 1286324 makes the LeakSanitizer fatal error message go away.

Andrew McCreight [:mccr8]

Comment 8

9 years ago

Maybe you could take a look at this, Julian, if Jed isn't around? Thanks.

Flags: needinfo?(julian.r.hector)

Andrew McCreight [:mccr8]

Comment 9

9 years ago

Jed's looking at it. Anyways, the bigger problem seems to be that we're not running LeakSanitizer in the content process. Jed's patch just made it so that we got some alert about it rather than silently failing.

Flags: needinfo?(julian.r.hector)

Jed Davis [:jld] (away until 10-15)

Comment 10

9 years ago

That patch changes the sandbox policy so that fork() fails with EPERM instead of just crashing, so anything reasonable that LSan was doing that could observe it would have already been broken before that patch. It's possible that mozharness wouldn't notice that kind of crash, because ASan builds have no crash reporter, but there would have been log messages on stderr (starting with "Sandbox: ") and I don't see any in the logs for the “before” m-c build.

One weird thing here is that if the sanitizer runtime managed to block SIGSYS, possibly by trying to block all signals (we have symbol interposition for sigprocmask and pthread_sigmask to force SIGSYS to stay unblocked for exactly this reason, but that wouldn't apply if it does the sigprocmask syscall directly) then the kernel will unblock the signal *and* reset its disposition before delivering it, which means the process will immediately exit: no log messages, no crash reporting, nothing. So that could result in weird breakage that wouldn't show up as a test failure or even be obvious to a human reading the logs.

But a look at the compiler-rt source doesn't show anything that might be doing this besides TSan, which is known to be incompatible with sandboxing for various reasons (and will disable it: bug 1182565).
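To make the masking idea concrete, here is a small Python sketch of a wrapper that never lets SIGSYS become blocked. It only illustrates the logic; the real interposition discussed above wraps the libc sigprocmask and pthread_sigmask symbols in native code, and the name `sigmask_keeping_sigsys` is hypothetical. As noted in the comment, a wrapper at this level cannot help if the caller issues the sigprocmask syscall directly. Unix only.

```python
import signal


def sigmask_keeping_sigsys(how, mask):
    """Call pthread_sigmask, but never add SIGSYS to the blocked set.

    For SIG_BLOCK and SIG_SETMASK, SIGSYS is dropped from the requested
    mask before the call. SIG_UNBLOCK requests pass through unchanged.
    Returns the previous signal mask, like signal.pthread_sigmask.
    """
    requested = set(mask)
    if how in (signal.SIG_BLOCK, signal.SIG_SETMASK):
        requested.discard(signal.SIGSYS)
    return signal.pthread_sigmask(how, requested)


if __name__ == "__main__":
    # An empty SIG_BLOCK call just reads the current mask
    previous = signal.pthread_sigmask(signal.SIG_BLOCK, [])
    try:
        sigmask_keeping_sigsys(signal.SIG_BLOCK, {signal.SIGUSR1, signal.SIGSYS})
        current = signal.pthread_sigmask(signal.SIG_BLOCK, [])
        print("SIGUSR1 blocked:", signal.SIGUSR1 in current)
        print("SIGSYS blocked:", signal.SIGSYS in current)
    finally:
        # Restore whatever mask the process started with
        signal.pthread_sigmask(signal.SIG_SETMASK, previous)
```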

Andrew McCreight [:mccr8]

Comment 11

9 years ago

Yeah, it looks like LSan is just not running at all on Nightly. I filed bug 1287971 for that.

Comment 12

9 years ago

This and bug 1287971 are going to have the same solution, it looks like: disable sandboxing if ASan (and therefore LSan) is used. (Vague summary of bug 1287971 comment #9: it's the same syscall causing both of these bugs, and it's weirder than just a plain fork() but that doesn't really matter; bug 1287971 is that it immediately and silently killed the process (rather than crashing it very noisily as intended) because the last paragraph of comment #10 is wrong, and *this* bug is that it now fails in such a way that LSan is able to complain about it.)

Flags: needinfo?(jld)

Jim Mathies [:jimm]

Updated

9 years ago

Whiteboard: [MemShrink] → [MemShrink][sblc2]

Jed Davis [:jld] (away until 10-15)

Comment 13

9 years ago

Looking at some before/after failed test jobs on TH, this seems to have been fixed by https://hg.mozilla.org/mozilla-central/rev/8d2a4af272e3 as expected.

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → DUPLICATE

Categories

(Core :: Security: Process Sandboxing, defect)

Tracking

()

People

(Reporter: cbook, Unassigned)

References

(Blocks 1 open bug,
URL
)

Details

(Keywords: regression, Whiteboard: [MemShrink][sblc2])

Crash Data

Security

(public)
