Closed Bug 73087 Opened 23 years ago Closed 14 years ago

Implement mechanism to catch hangs and launch Talkback

Categories

(Core Graveyard :: Talkback Client, enhancement)

x86
Windows NT
enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: selmer, Assigned: jay)

References

Details

Much of the feedback from Netscape 6 indicates that there are hangs that many
people are hitting.  We currently have no mechanism for collecting data about
hangs. If we could trap these hangs into talkback, we
would be able to get
information that helps us fix them.  Even just getting people we can contact to
reproduce problems would be an enormous help.

I talked this over with Chris Saari and he believes it's straightforward to trap
at least one class of hangs by having a timer that works just like the busy
cursor does on the Mac. After some set number of seconds in which we have not
returned to the event loop, this timer fires and either drops into talkback (if
we're very sure it's always the right thing to do) or puts up a dialog asking
the user how they'd like to proceed (if we think that's safe or we're not always
sure.)

Chris doesn't know how to get into talkback, but can help with hooking up the
event stuff.  Someone would need to own the UI if we did a dialog.  It would
help to be able to tell talkback this report was due to a hang rather than
relying on the user to say that in their comments.

This mechanism will work, except that we won't get a stack trace. With our qfa
component we can trigger an artificial incident and bring up the customized
talkback dialog box (using the talkback server UI). By doing this, the data will
be maintained by the talkback servers. Note: the customized Talkback UI is not
available for Mac.

There's no way to force a stack trace? What if we forced a crash by explicitly
dereferencing a null pointer?  :-)

I meant a valid stack trace. We can always create an artificial crash. Would that
be useful? One major problem with that is that it may make the system unstable.

Status: NEW → ASSIGNED
See bug 62447 for some discussion about how to intentionally cause a crash.

Chris Saari, Chofmann claims that the method you proposed is not sufficient
because it's Mac-specific in some way (I forget the details). Could we get some
of that discussion here in the bug?

I think it would work XP (cross-platform), assuming your hung app was still going
through the event loop. If not, it might still work on Windows and/or Linux if
we're processing timers asynchronously (I'm not familiar with how timers are
implemented there), i.e. they're not processed as part of the event loop.

If you're hung at some interrupt level higher than timers, well, your life
sucks. I highly doubt that though.

This would be great if we could do it. How long is "too long to be
away from the UI event loop"? What part of the code monitors, or could
monitor, such lack of activity?

In the event processing loop, you store the current time (the machine's tick
count). In the timer callback, you read the tick count again. If more than, say,
10 seconds have passed between the last time you were in the event loop and the
current time during the timer callback, you may wish to consider doing something
about it.
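
A minimal sketch of that check, in Python purely for illustration; nothing here is Mozilla or Talkback code. It assumes a hypothetical `HangWatchdog` class: the event loop calls `heartbeat()` on every pass, and a background thread stands in for a timer that fires independently of the event loop (the asynchronous-timer case mentioned above). It measures with `time.monotonic()` rather than wall-clock time and reports each hang only once until the event loop checks in again; `on_hang` is just where a report or dialog would be triggered.

```python
import threading
import time
from typing import Callable, Optional


class HangWatchdog:
    """Hypothetical watchdog: flags a hang when the event loop stops checking in."""

    def __init__(
        self,
        threshold: float,
        on_hang: Callable[[float], None],
        poll_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = threshold          # seconds allowed away from the event loop
        self.on_hang = on_hang              # receives the elapsed seconds
        self.poll_interval = poll_interval  # how often the "timer" fires
        self.clock = clock
        self._lock = threading.Lock()
        self._last_beat = clock()
        self._reported = False
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def heartbeat(self) -> None:
        """Call once per event-loop iteration: record that we were just here."""
        with self._lock:
            self._last_beat = self.clock()
            self._reported = False

    def check(self) -> bool:
        """Timer callback: return True if a new hang was just reported."""
        with self._lock:
            elapsed = self.clock() - self._last_beat
            newly_hung = elapsed > self.threshold and not self._reported
            if newly_hung:
                self._reported = True
        if newly_hung:
            self.on_hang(elapsed)  # called outside the lock, so it may block
        return newly_hung

    def start(self) -> None:
        """Run check() every poll_interval seconds on a background thread."""
        def run() -> None:
            while not self._stop.wait(self.poll_interval):
                self.check()

        self._thread = threading.Thread(target=run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

In use, one would create something like `HangWatchdog(10.0, report_hang)` (`report_hang` being hypothetical), call `start()`, then call `heartbeat()` at the top of each event-loop iteration. Since the callback runs on the watchdog thread, it should probably only hand off to whatever produces the report rather than touch UI state directly.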

This only catches bugs that stop you from going through the event loop. It is
entirely possible to make the app appear hung, yet still be going through the
event loop. I cite my many 0.9.1 command dispatching/handling bugs.

Are you storing the system time?  There may be scenarios that
we aren't thinking of where this might cause problems:
  * What about sleep functions on laptops?
    The app will be suspended but "current time" will keep going.
  * What if I reset the time on my machine?
    Sudden time jumps might trigger a crash.  Daylight savings time?

Maybe we could use alecf's mozilla timer service instead of system time.
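
One way to address the laptop-sleep and clock-change concerns, sketched under assumptions: whatever time source is used, measure elapsed time with a monotonic clock (Python's `time.monotonic()` is not affected by system clock updates), and have the watchdog also watch the spacing of its own ticks. If its own ticks are much further apart than its polling interval, the whole process was paused (for example by suspend), so the gap says nothing about the event loop and the baseline is reset instead of reporting. The class name and the `slack` parameter are hypothetical; whether a given monotonic clock advances during suspend is platform-dependent, which is why the self-check is there.

```python
import time
from typing import Callable


class SuspendAwareHangCheck:
    """Hypothetical hang check that tolerates suspend and clock jumps."""

    def __init__(
        self,
        threshold: float,
        poll_interval: float,
        clock: Callable[[], float] = time.monotonic,
        slack: float = 3.0,
    ) -> None:
        self.threshold = threshold
        self.poll_interval = poll_interval
        self.clock = clock
        # A gap longer than slack * poll_interval between our own ticks means the
        # watchdog itself was not running (suspend), or the clock jumped.
        self.slack = slack
        now = clock()
        self.last_event_loop = now
        self.last_tick = now

    def heartbeat(self) -> None:
        """Called from the event loop."""
        self.last_event_loop = self.clock()

    def tick(self) -> bool:
        """Called every poll_interval seconds; True means 'looks like a hang'."""
        now = self.clock()
        gap = now - self.last_tick
        self.last_tick = now
        if gap < 0 or gap > self.slack * self.poll_interval:
            # Time went backwards, or leapt forward for the whole process rather
            # than just the event loop: restart the measurement, don't report.
            self.last_event_loop = now
            return False
        return now - self.last_event_loop > self.threshold
```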

If this goes in, we should make it tunable, and able to be turned off, with a pref.
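
A sketch of the pref handling, assuming a hypothetical pref name `hangdetector.timeout_seconds` (not an existing Mozilla preference) and a plain mapping standing in for the pref service: a missing or malformed value falls back to a 10-second default, and any value of zero or less turns detection off.

```python
from typing import Any, Mapping, Optional

# Hypothetical pref name; not an existing Mozilla preference.
HANG_TIMEOUT_PREF = "hangdetector.timeout_seconds"
DEFAULT_TIMEOUT_SECONDS = 10.0


def hang_timeout_from_prefs(prefs: Mapping[str, Any]) -> Optional[float]:
    """Return the hang threshold in seconds, or None when detection is off."""
    raw = prefs.get(HANG_TIMEOUT_PREF, DEFAULT_TIMEOUT_SECONDS)
    try:
        seconds = float(raw)
    except (TypeError, ValueError):
        # Malformed pref value: fall back to the default rather than failing.
        return DEFAULT_TIMEOUT_SECONDS
    if seconds <= 0:
        return None  # 0 (or negative) turns hang detection off
    return seconds
```

A caller would simply not start the hang timer when this returns `None`.
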
Blocks: 79151
The Microsoft Error Reporting Tool catches hangs.
I thought I saw somewhere that this feature is already present.
Is it fixed??
*** Bug 179855 has been marked as a duplicate of this bug. ***
re comment 11, this isn't fixed from my limited POV
Blocks: 238292

Currently, on the trunk nightlies, there have (in my experience) been more hangs than usual, so implementing this is more important than it has been for a while.

I would suggest, rather than trying to identify when the CPU is too busy for too long or the cursor is in some mode or another, just catching a KILL signal (as distinct from a normal TERM signal, or whatever they call it on Windows), because if the app is hung, eventually someone has to kill it. Win XP already does this and offers to send reports to MS.

Idea for Breakpad?
Not that it could "catch" a hang, but perhaps a tool that the user can initiate...
Assignee: namachi → jay
Status: ASSIGNED → NEW
Product: Core → Core Graveyard
Talkback isn't used anymore; resolving as INVALID now.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → INVALID
See bug 429592 for the same idea in Breakpad.