1403668 - AddressSanitizer builds do not dump logs for stack overflows

Reporter

Description

•

8 years ago

Attached file asan_verbose_log.txt — Details

Normally on error detection "ASAN:DEADLYSIGNAL" is printed to stderr and logs are then dumped. This does not seem to be working in the "stack overflow" case. We do see that the process exits with -11 (SEGV) but no AddressSanitizer information is dumped. This does work on other application (and I think it has in the past on Firefox but I'm not 100%) asan_verbose_log.txt shows the output running with ASAN_OPTIONS=verbosity=1 For an example bug 1403656 will reproduce the issue using m-c build: BuildID=20170925234146 SourceStamp=e6b3498a39b94616ba36798fe0b71a3090b1b14c

Andrew McCreight [:mccr8]

Comment 1

•

8 years ago

Like I said in IRC, my only theory here is that somehow the stack size changes in Stylo broke this. I don't know if Stylo is involved in whatever reflow stuff happens in that bug. Maybe you could bisect that test case to figure out if a change on m-c broke this?

Selena Deckelmann :selenamarie :selena

Comment 2

•

8 years ago

Tyson -- can you perform the bisect?

Flags: needinfo?(twsmith)

Selena Deckelmann :selenamarie :selena

Comment 3

•

8 years ago

Pinged Al about priority.

Christian Holler (:decoder)

Assignee

Comment 4

•

8 years ago

Alright, I spent a few hours debugging this (I think bisections wouldn't have helped given the outcome) and I think I solved it, but the findings are a bit worrying: First of all, we have several SIGSEGV handlers in the codebase, these include * toolkit/profile/nsProfileLock.cpp (installs all sorts of handlers for cleaning up profile locks) * js/src/ds/MemoryProtectionExceptionHandler.cpp (JS handler to annotate some interesting crashes) * js/src/wasm/WasmSignalHandlers.cpp (JS Handler for wasm/asm.js out-of-bounds checks) This is not the full list, but these are the ones that matter. First, I found out that with "allow_user_segv_handler=0" in ASAN_OPTIONS, the problem goes away. That is a clear indicator for one of our handlers being the problem because with this option, the program just can't install any SIGSEGV handlers at all. I then managed to narrow it down to the three handlers above: With all of them disabled, I would get an ASan trace as expected, enabling only one of them broke the ASan trace again. However, *none* of the signal handlers were actually called (confirmed with good old fprintf debugging). And after a while I realized why this is the case: None of these handlers use SA_ONSTACK [1], so they run on the same stack as the main program. That won't work because we happen to be out of stack space so already the attempt to call our signal handler crashes the program. ASan sets up the alternate signal stack and uses that to catch signals, which is why ASan is still working in this situation. The fix for ASan here is to add SA_ONSTACK to all three signal handlers (patch coming in a few). Now for the reason why this is worrying: This does not only affect ASan. Basically, I expect these three signal handlers to break *all* of our SIGSEGV handling at least on Linux, including the crash reporter, if we happen to be out of stack space. Worst case this means that we are missing crash reports on Linux (maybe Android too?) if we are out of stack space. In combination with stack clash issues (e.g. bug 909094) this might be happening more often also in general OOM situations. In the JS engine, I hit a lot of over-recursion crashes due to that particular bug. I'm not sure about other OSes as I don't know enough about their signal handling mechanisms. I know Mac uses a forwarding model as well but I don't know if by default, signal handlers are on the program's main stack. We should investigate this more closely. Since none of the other signal handlers will run, this could also lead to locks not being cleaned up (see nsProfileLock.cpp) after such a crash. For the final question: How did this ever work? I don't know. From what I can see, this should have never worked. Maybe previously, there was some stack space reserved somehow (by the kernel? by us?) which somehow magically made it work. In the Linux Kernel, things regarding the stack recently changed due to the Stack Clash vulnerability, maybe that regressed it. I'm putting a needinfo on Ted and Nathan here to give feedback on this issue and to help estimate the impact of this outside of ASan. I'll also upload a patch in a few that should fix the problem. [1] http://man7.org/linux/man-pages/man2/sigaltstack.2.html

Flags: needinfo?(twsmith)

Flags: needinfo?(ted)

Flags: needinfo?(nfroyd)

Comment hidden (mozreview-request)

Christian Holler (:decoder)

Assignee

Comment 7

•

8 years ago

Comment on attachment 8916274 [details] Bug 1403668 - Use SA_ONSTACK for several SIGSEGV handlers. Tyson, could you confirm that this patch fixes the ASan build behavior? It does fix it for me locally.

Attachment #8916274 - Flags: feedback?(twsmith)

Christian Holler (:decoder)

Assignee

Comment 8

•

8 years ago

One more thing I noticed while looking at this code is that the signal handlers in nsProfileLock.cpp do not use SA_NODEFER while the other SIGSEGV signal handlers we have do. Is this intended?

Tyson Smith [:tsmith]

Reporter

Comment 9

•

8 years ago

(In reply to Christian Holler (:decoder) from comment #7) > Tyson, could you confirm that this patch fixes the ASan build behavior? It > does fix it for me locally. Of course. I'm traveling at the moment and tomorrow is a holiday here but I will test it ASAP. Thanks for the patch!

Christian Holler (:decoder)

Assignee

Comment 10

•

8 years ago

Green on try (oranges are unrelated intermittents, OSX perma-orange is unrelated according to sheriffs): https://treeherder.mozilla.org/#/jobs?repo=try&revision=91573e68a293

Christian Holler (:decoder)

Assignee

Updated

•

8 years ago

Assignee: nobody → choller

(not currently active) Ted Mielczarek

Comment 11

•

8 years ago

Breakpad's signal handler does use sigaltstack / SA_ONSTACK: https://dxr.mozilla.org/mozilla-central/rev/5eba13f5b3a6ad80decdd8c7b30bff5fa477844f/toolkit/crashreporter/breakpad-client/linux/handler/exception_handler.cc#134 https://dxr.mozilla.org/mozilla-central/rev/5eba13f5b3a6ad80decdd8c7b30bff5fa477844f/toolkit/crashreporter/breakpad-client/linux/handler/exception_handler.cc#299 ...but it's entirely plausible that these other handlers not doing so breaks our ability to handle stack overflow. We have had bug 481781 on file for a *long* time.

Flags: needinfo?(ted)

Nathan Froyd [:froydnj]

Comment 12

•

8 years ago

I can believe that adding SA_ONSTACK to registering these signal handlers works; I am skeptical that we really want to do this everywhere. We should at least do it for ASan, because we know that ASan registers an alternate stack. Do we want to do it in general for non-ASan builds and silently depend on Breakpad's alternate signal stack? That doesn't seem like a good thing to me; it seems like we should be setting up our own alternate stack somewhere before we register breakpad (making sure breakpad won't overwrite ours), and then setting SA_ONSTACK. I don't know why this hasn't been a problem before...stack overflow is not *terribly* common, though, so perhaps the issue was just hidden.

Flags: needinfo?(nfroyd)

Christian Holler (:decoder)

Assignee

Comment 13

•

8 years ago

(In reply to Nathan Froyd [:froydnj] from comment #12) > I can believe that adding SA_ONSTACK to registering these signal handlers > works; I am skeptical that we really want to do this everywhere. We should > at least do it for ASan, because we know that ASan registers an alternate > stack. Do we want to do it in general for non-ASan builds and silently > depend on Breakpad's alternate signal stack? That doesn't seem like a good > thing to me; The worst thing that can happen (in builds without ASan *and* crash reporter), is that the signal handlers behave as they did before this patch. The man page for SA_ONSTACK says: > "If an alternate stack is not available, the default stack will be used." So omitting this flag in any configurations that don't have a sigaltstack setup has no effect at all. > it seems like we should be setting up our own alternate stack > somewhere before we register breakpad (making sure breakpad won't overwrite > ours), and then setting SA_ONSTACK. I agree that we should do this, but it sounds like a major refactoring that we should not try to push through in this bug. This is a priority bug for fuzzing because it keeps hitting us in automation. With this simple patch we have the chance to fix the problem both for ASan as well as for our release builds (fixing problems with the crash reporter that we had for 9 years now). We can then still make a refactoring to fix the bug in builds that don't use either ASan or the crash reporter. Therefore I propose to move forward with this patch and then file a follow-up bug for making a generic sigaltstack refactoring.

Emanuel Hoogeveen [:ehoogeveen]

Comment 14

•

8 years ago

(In reply to Nathan Froyd [:froydnj] from comment #12) > it seems like we should be setting up our own alternate stack > somewhere before we register breakpad (making sure breakpad won't overwrite > ours), and then setting SA_ONSTACK. FWIW, it doesn't seem to matter *when* you create the alternate stack, just so long as it exists when the exception handler is called. So we could probably check if an alternate stack exists after everything else (Breakpad, ASan, Elf Loader) has had the chance to set one up, and create it if not. The following did the trick in local testing (very quick and dirty of course): stack_t alt = {}; alt.ss_size = 64 * 1024; alt.ss_sp = mmap(nullptr, alt.ss_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, 0, 0); sigaltstack(&alt, nullptr); So it's just a matter of figuring out where/when to do it.

(not currently active) Ted Mielczarek

Comment 15

•

8 years ago

We forked the Breakpad client a while ago, so we could absolutely pull the altstack stuff out into somewhere common and just call it once on startup and make everything else in Gecko use it.

Christian Holler (:decoder)

Assignee

Updated

•

8 years ago

Blocks: 1407075

Christian Holler (:decoder)

Assignee

Comment 16

•

8 years ago

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #15) > We forked the Breakpad client a while ago, so we could absolutely pull the > altstack stuff out into somewhere common and just call it once on startup > and make everything else in Gecko use it. I filed bug 1407075 for this.

(not currently active) Ted Mielczarek

Comment 17

•

8 years ago

mozreview-review

Comment on attachment 8916274 [details] Bug 1403668 - Use SA_ONSTACK for several SIGSEGV handlers. https://reviewboard.mozilla.org/r/187488/#review193100 Good catch! My old "crash me now" extension no longer works, but it would be interesting to re-test crashing with a stack overflow on a nightly build with this patch.

Attachment #8916274 - Flags: review?(ted) → review+

Pulsebot

Comment 18

•

8 years ago

Pushed by choller@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0152a50835c9 Use SA_ONSTACK for several SIGSEGV handlers. r=ted

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 20

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/0152a50835c9

Status: NEW → RESOLVED

Closed: 8 years ago

status-firefox58: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla58

Tyson Smith [:tsmith]

Reporter

Comment 21

•

8 years ago

(In reply to Christian Holler (:decoder) from comment #7) > Tyson, could you confirm that this patch fixes the ASan build behavior? It > does fix it for me locally. Works for me as well. Thanks everyone!

Tyson Smith [:tsmith]

Reporter

Comment 22

•

8 years ago

mozreview-review

Comment on attachment 8916274 [details] Bug 1403668 - Use SA_ONSTACK for several SIGSEGV handlers. https://reviewboard.mozilla.org/r/187488/#review193604 It works. Thanks.

Attachment #8916274 - Flags: review+

Tyson Smith [:tsmith]

Reporter

Updated

•

8 years ago

Attachment #8916274 - Flags: feedback?(twsmith) → feedback+

Sylvestre Ledru [:Sylvestre]

Updated

•

6 years ago

Blocks: asan-maintenance

asan_verbose_log.txt 8 years ago Tyson Smith [:tsmith] 7.01 KB, text/plain		Details
Bug 1403668 - Use SA_ONSTACK for several SIGSEGV handlers. 8 years ago Christian Holler (:decoder) 59 bytes, text/x-review-board-request	ted : review+ tsmith : review+ tsmith : feedback+	Details