Closed Bug 979069 Opened 10 years ago Closed 10 years ago

ah_crap_handler, which does "Sleeping for 300 seconds." to attach a debugger (gdb) doesn't sleep reliably anymore

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

RESOLVED FIXED
mozilla30

People

(Reporter: dbaron, Assigned: jchen)

Details

(Keywords: regression)

Attachments

(2 files)

A longstanding feature of debug builds is that, when they crash, a signal handler sleeps for 300 seconds so you have a chance to attach a debugger.

Lately, it's frequently been failing to work -- you'll see the "Sleeping for 300 seconds." message, but then see "Done sleeping..." immediately afterwards.  I don't know that it's been failing all the time, but it's been failing at least around half the time for me.  I recall njn mentioned the same problem on IRC yesterday, and I think I recall someone else seeing it as well.
'man 3 sleep' says:

       sleep()  may be implemented using SIGALRM; mixing calls to alarm(2) and
       sleep() is a bad idea.

which might explain why things like:

#0  mozilla::ThreadStackHelper::SigAction (aSignal=<optimized out>, 
    aInfo=<optimized out>, aContext=<optimized out>)
    at /home/dbaron/builds/ssd/mozilla-central/mozilla/xpcom/threads/ThreadStackHelper.cpp:146
#1  <signal handler called>
#2  __sigaddset (__sig=17, __set=0x7fffffffaca0)
    at ../sysdeps/unix/sysv/linux/bits/sigset.h:119
#3  __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:79
#4  0x00007ffff391f704 in ah_crap_handler (signum=11)
    at /home/dbaron/builds/ssd/mozilla-central/mozilla/toolkit/xre/nsSigHandlers.cpp:90

lead to problems, perhaps (although I think that's a handler for SIGPROF rather than SIGALRM)?

It's not clear to me why this ThreadStackHelper profiling code is running at all, given that I didn't do anything to invoke it.
ThreadStackHelper is used by BackgroundHangMonitor during a hang, and a sleep(300) on a monitored thread looks like a hang to it. Depending on the sleep() implementation, I guess it's possible for SIGPROF to interrupt the sleep before the timeout is reached. Should be simple to fix though.
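
For illustration, here is a minimal standalone sketch of the interruption described above. It assumes a POSIX system, where sleep() returns the number of unslept seconds if a signal handler runs during the call. The names OnSigprof, SleepFully and TimedSleepMs are hypothetical and not part of the Mozilla tree, and the empty handler only stands in for ThreadStackHelper::SigAction. This is not the approach proposed later in this bug, which is to disable BackgroundHangMonitor for debug builds.

```cpp
#include <cassert>
#include <chrono>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Empty handler standing in for a profiler-style SIGPROF handler
// (an assumption for illustration; not the ThreadStackHelper code).
static void OnSigprof(int) {}

// Hypothetical helper: call sleep() again until no time is left.
// sleep() returns the unslept seconds when a handler interrupts it.
static void SleepFully(unsigned int aSeconds)
{
  unsigned int left = aSeconds;
  while (left != 0) {
    left = sleep(left);
  }
}

// Demo: a child process sends SIGPROF to the sleeping parent after 100 ms.
// Returns the elapsed milliseconds for a bare sleep() (aResume == false)
// or for SleepFully() (aResume == true).
static long TimedSleepMs(bool aResume, unsigned int aSeconds)
{
  struct sigaction sa = {};
  sa.sa_handler = OnSigprof;
  sigemptyset(&sa.sa_mask);
  int rv = sigaction(SIGPROF, &sa, nullptr);
  assert(rv == 0);
  (void)rv;

  pid_t child = fork();
  assert(child >= 0);
  if (child == 0) {
    usleep(100 * 1000);
    kill(getppid(), SIGPROF);
    _exit(0);
  }

  auto start = std::chrono::steady_clock::now();
  if (aResume) {
    SleepFully(aSeconds);
  } else {
    sleep(aSeconds);  // returns early once the handler has run
  }
  auto elapsed = std::chrono::steady_clock::now() - start;
  waitpid(child, nullptr, 0);
  return static_cast<long>(
      std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count());
}
```

With a 2-second request, the bare sleep() should return shortly after the signal arrives, while SleepFully() keeps going. Because sleep() reports the remaining time in whole seconds, the resumed total can differ from the request by up to about a second, depending on how the C library rounds.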
Assignee: nobody → nchen
Status: NEW → ASSIGNED
(Ted's 'crash me now' extension, https://code.google.com/p/crashme/ , may be useful in developing/testing a fix for this.)
Thanks Daniel! I think we should just disable BackgroundHangMonitor for debug builds.

We could temporarily disable it during ah_crap_handler, but we really shouldn't be doing that inside a signal handler because the disable mechanism is not async-signal-safe.
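
As a general illustration of that constraint (not code from this bug), a handler that stays async-signal-safe is limited to things like setting a volatile sig_atomic_t flag and calling functions such as write(). Anything that takes a lock or allocates memory is off limits. The names gLastSignal, MinimalHandler and InstallAndRaise below are hypothetical:

```cpp
#include <cassert>
#include <signal.h>
#include <unistd.h>

// Flag type that may be written safely from a signal handler.
static volatile sig_atomic_t gLastSignal = 0;

// Uses only async-signal-safe operations: a sig_atomic_t store and write().
// No printf, no malloc, no mutexes.
static void MinimalHandler(int aSignum)
{
  gLastSignal = aSignum;
  static const char kMsg[] = "signal received\n";
  ssize_t ignored = write(STDERR_FILENO, kMsg, sizeof(kMsg) - 1);
  (void)ignored;
}

// Installs the handler for aSignum, raises the signal, and returns the
// signal number recorded by the handler (or -1 if sigaction() fails).
static int InstallAndRaise(int aSignum)
{
  assert(aSignum > 0);
  struct sigaction sa = {};
  sa.sa_handler = MinimalHandler;
  sigemptyset(&sa.sa_mask);
  if (sigaction(aSignum, &sa, nullptr) != 0) {
    return -1;
  }
  raise(aSignum);  // in a single-threaded program the handler runs before raise() returns
  return gLastSignal;
}
```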

Besides, we don't report hang data for debug builds, and the data are probably not usable anyway, so disabling the whole thing for debug builds seems like a reasonable choice.
Attachment #8390120 - Flags: review?(nfroyd)
Attachment #8390120 - Flags: review?(nfroyd) → review+
https://hg.mozilla.org/mozilla-central/rev/1fe2943ce1e3
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30