Much of the feedback from Netscape 6 indicates that many people are hitting hangs. We currently have no mechanism for collecting data about hangs. If we could trap these hangs into Talkback, we would be able to get information that helps us fix them. Even just finding people we can contact to reproduce problems would be an enormous help. I talked this over with Chris Saari, and he believes it's straightforward to trap at least one class of hangs with a timer that works just like the busy cursor does on the Mac. After some set number of seconds in which we have not returned to the event loop, this timer fires and either drops into Talkback (if we're very sure that's always the right thing to do) or puts up a dialog asking the user how they'd like to proceed (if we think that's safe, or if we're not always sure). Chris doesn't know how to get into Talkback, but he can help with hooking up the event handling. Someone would need to own the UI if we used a dialog. It would also help to be able to tell Talkback that a report was due to a hang, rather than relying on the user to say so in their comments.
This mechanism will work, except that we won't get a stack trace. With our QFA component, we can trigger an artificial incident and bring up the customized Talkback dialog box (using the Talkback server UI). That way, the data will be kept on the Talkback servers. Note: the customized Talkback UI is not available on the Mac.
There's no way to force a stack trace? What if we forced a crash by explicitly dereferencing a null pointer? :-)
I meant a valid stack trace. We can always create an artificial crash; would that be useful? One major problem with that approach is that it may make the system unstable.
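Forcing a crash to get a stack trace risks destabilizing the system. An alternative is for a watchdog to read the stuck thread's stack without crashing anything. The sketch below shows the idea in Python, standing in for the native code Talkback would actually need: `sys._current_frames()` (a documented, underscore-prefixed CPython function) maps thread IDs to each thread's current frame, and `traceback.format_stack` renders one as text. The function name `snapshot_thread_stack` is hypothetical, and nothing here describes how Talkback itself captures native stacks. For comparison, Python's standard library ships `faulthandler.dump_traceback_later`, which dumps every thread's traceback after a timeout, essentially this hang-report idea in library form.

```python
import sys
import threading
import traceback


def snapshot_thread_stack(thread: threading.Thread) -> str:
    """Return the current Python stack of another live thread as text.

    Nothing is crashed or stopped: sys._current_frames() only reads each
    thread's topmost frame, and the target thread keeps running afterwards.
    """
    frame = sys._current_frames().get(thread.ident)
    if frame is None:
        return ""  # thread not started yet, or already finished
    return "".join(traceback.format_stack(frame))
```

A hang watchdog could call this on the UI thread once its threshold is exceeded and attach the text to its report.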
Status: NEW → ASSIGNED
See bug 62447 for some discussion about how to intentionally cause a crash.
Chris Saari: Chofmann claims that the method you proposed is not sufficient because it's Mac-specific in some way (I forget the details). Could we get some of that discussion here in the bug?
I think it would work cross-platform (XP), assuming your hung app was still going through the event loop. If not, it might still work on Windows and/or Linux if timers are processed asynchronously, i.e., not as part of the event loop (I'm not familiar with how timers are implemented there). If you're hung at some interrupt level higher than timers, well, your life sucks. I highly doubt that, though.
This would be great if we could do it. How long is "too long to be away from the UI event loop"? What part of the code monitors, or could monitor, such a lack of activity?
In the event processing loop, you store the current tick count. In the timer callback, you read the machine's tick count and compare it with the stored value. If more than, say, 10 seconds have passed between the last time you were in the event loop and the time of the timer callback, you may wish to consider doing something about it. This only catches bugs that stop you from going through the event loop. It is entirely possible to make the app appear hung yet still be going through the event loop; I cite my many 0.9.1 command dispatching/handling bugs.
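A minimal sketch of the tick-count scheme above, with a Python stand-in for the browser's event loop: the loop calls `heartbeat()` each time it runs, and a watchdog compares the stored time with the current time. Two choices here are assumptions layered on top of the description. First, the check runs on its own background thread rather than as a timer dispatched by the event loop, since a timer delivered by a blocked loop would never fire. Second, it uses `time.monotonic()` rather than wall-clock time. `HangWatchdog` and `on_hang` are hypothetical names; what `on_hang` does (drop into Talkback, show a dialog) is left to the caller.

```python
import threading
import time
from typing import Callable, Optional


class HangWatchdog:
    """Reports when the (hypothetical) UI event loop stops checking in.

    The event loop calls heartbeat() each time it processes events. A
    background thread calls check() every poll_interval seconds and invokes
    on_hang(stalled_seconds) once per stall longer than threshold seconds.
    """

    def __init__(
        self,
        threshold: float,
        on_hang: Callable[[float], None],
        poll_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = threshold
        self.on_hang = on_hang
        self.poll_interval = poll_interval
        self.clock = clock
        self._last_beat = clock()
        self._reported = False
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def heartbeat(self) -> None:
        # Called from the event loop: we made it back, so reset the stall.
        with self._lock:
            self._last_beat = self.clock()
            self._reported = False

    def check(self) -> bool:
        # One watchdog tick; returns True if it reported a hang.
        with self._lock:
            stalled = self.clock() - self._last_beat
            if stalled < self.threshold or self._reported:
                return False
            self._reported = True  # report each stall only once
        self.on_hang(stalled)  # called outside the lock
        return True

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self.poll_interval):
            self.check()

    def start(self) -> None:
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

`check()` is kept separate from the thread loop so it can be driven by a fake clock in tests, and it reports at most once per stall so a long hang does not produce a flood of reports.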
Are you storing the system time? There may be scenarios we aren't thinking of where this could cause problems:
* What about sleep functions on laptops? The app will be suspended, but the "current time" will keep going.
* What if I reset the time on my machine? Sudden time jumps might trigger a crash. Daylight saving time?
Maybe we could use alecf's Mozilla timer service instead of the system time. If this goes in, we should make it tunable, and possible to turn off, with a pref.
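The time-jump concerns above can be addressed partly by the choice of clock. Python's `time.monotonic()` is documented as not being affected by system clock updates, so resetting the machine's clock does not move it; whether it keeps advancing while a laptop sleeps varies by platform. The sketch below therefore adds a hypothetical heuristic: if the watchdog's own ticks arrive several poll intervals apart, it assumes the machine was suspended and restarts the measurement instead of reporting. `HangPrefs`, `StallClock`, and the pref fields are illustrative names, not real Mozilla prefs, and a threshold of 0 stands in for the "turn-off-able" pref. The monotonic clock also stands in for alecf's timer service, which is not modeled here.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class HangPrefs:
    # Hypothetical prefs; a threshold of 0 turns hang detection off.
    threshold_s: float = 10.0
    poll_interval_s: float = 1.0
    # Watchdog ticks this many poll intervals apart are treated as a suspend.
    suspend_factor: float = 5.0


class StallClock:
    def __init__(self, prefs: HangPrefs,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.prefs = prefs
        self.clock = clock  # monotonic: not affected by system clock updates
        now = clock()
        self.last_beat = now  # last time the event loop checked in
        self.last_tick = now  # last time the watchdog itself ran

    def heartbeat(self) -> None:
        self.last_beat = self.clock()

    def tick(self) -> Optional[float]:
        """One watchdog check; returns the stall length if it looks like a hang."""
        now = self.clock()
        tick_gap = now - self.last_tick
        self.last_tick = now
        if self.prefs.threshold_s <= 0:
            return None  # detection disabled by pref
        if tick_gap > self.prefs.suspend_factor * self.prefs.poll_interval_s:
            # The watchdog was not scheduled for a long time (probably sleep or
            # suspend), so restart the measurement instead of reporting.
            self.last_beat = now
            return None
        stalled = now - self.last_beat
        return stalled if stalled >= self.prefs.threshold_s else None
```

The trade-off is that a stall which also starves the watchdog thread looks like a suspend and goes unreported.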
The Microsoft Error Reporting Tool catches hangs.
I thought I saw somewhere that this feature is already present. Is it fixed?
*** Bug 179855 has been marked as a duplicate of this bug. ***
Re comment 11: this isn't fixed, from my limited point of view.
Currently, on the trunk nightlies, there have been more hangs than usual (in my experience), so implementing this is more important than it has been for a while. Rather than trying to identify when the CPU has been too busy for too long or the cursor is in some mode or another, I would suggest just catching a KILL signal (as distinct from a normal TERM signal, or whatever the equivalent is called on Windows), because if the app is hung, someone eventually has to kill it. Windows XP already does this and offers to send reports to Microsoft.
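One caveat on the KILL-signal suggestion: on POSIX systems, SIGKILL cannot be caught, blocked, or ignored, so no handler ever sees it. A last-chance report would have to hook a catchable signal such as SIGTERM (or SIGQUIT) and rely on the user, or a tool, sending that first. The Python sketch below assumes a POSIX platform; `install_termination_reporter` and its `report` callback are hypothetical names. It passes the interrupted frame's stack to `report`, then raises `SystemExit` with status 128 plus the signal number, the status shells show for a process killed by a signal.

```python
import signal
import traceback
from types import FrameType
from typing import Callable, Optional


def install_termination_reporter(
    report: Callable[[str], None], signum: int = signal.SIGTERM
) -> None:
    """Send a stack report to report() when signum arrives, then exit.

    Must be called from the main thread. Passing signal.SIGKILL raises
    OSError, because POSIX never lets a process handle SIGKILL.
    """

    def handler(received: int, frame: Optional[FrameType]) -> None:
        # frame is whatever the main thread was executing when interrupted.
        stack = "".join(traceback.format_stack(frame)) if frame else ""
        report(f"terminated by signal {int(received)}\n{stack}")
        # Exit with 128 + N, the status shells show for a signal death.
        raise SystemExit(128 + int(received))

    signal.signal(signum, handler)
```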
Idea for Breakpad? It couldn't "catch" a hang, but perhaps a tool that the user can initiate...
Assignee: namachi → jay
Status: ASSIGNED → NEW
Component: Talkback Client → Talkback Client
Product: Core → Core Graveyard
Talkback isn't used anymore, so resolving as INVALID.
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → INVALID
See bug 429592 for the same idea in Breakpad.