429592 - (hang-detector) catch process hangs

Reporter

Description

17 years ago

Due to some recent change, Minefield has been hanging on me a few times daily. It doesn't seem to actually crash, but just pegs the CPU and is completely unresponsive.

It would be interesting to be able to catch and report this kind of problem, as we do with crashes. For example, have a watchdog timer fire if the application fails to service the event loop within 30 seconds.

A variation of this would be for general detection and reporting of when the application is responding sluggishly. For example, send a report if it took longer than 5 seconds to service the event loop (even if the browser recovered). Hard hangs (30+ seconds) might be nicely handled for the user by killing and restarting the application, like a crash, while sluggish response is more of a metrics-collection thing.

What kind of data to report? A single stack is better than nothing, a few sequential stacks collected X ms apart would be better, and a full-fledged sampling run would be amazing.
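To make the proposal concrete, here is a minimal Python sketch of the watchdog idea, not the design that was eventually implemented. It assumes the event loop calls a hypothetical `notify_activity()` on every turn and that a separate thread calls `check()` periodically. The 5 and 30 second thresholds come from the description above; the class, function names, and default sampling interval are illustrative assumptions.

```python
import sys
import threading
import time
import traceback

SLUGGISH_SECONDS = 5.0   # from the description: slow turn, metrics only
HANG_SECONDS = 30.0      # from the description: treat as a hard hang


def classify_stall(stalled_for):
    """Map seconds since the last event-loop turn to a report category."""
    if stalled_for >= HANG_SECONDS:
        return "hang"
    if stalled_for >= SLUGGISH_SECONDS:
        return "sluggish"
    return "ok"


def sample_stacks(thread_id, count=3, interval=0.05):
    """Collect up to `count` stacks of one thread, `interval` seconds apart."""
    samples = []
    for i in range(count):
        frame = sys._current_frames().get(thread_id)
        if frame is None:  # the thread has exited
            break
        samples.append(traceback.format_stack(frame))
        if i + 1 < count:
            time.sleep(interval)
    return samples


class HangWatchdog:
    """Hypothetical watchdog; the event loop calls notify_activity() each turn."""

    def __init__(self, loop_thread_id, clock=time.monotonic):
        self._loop_thread_id = loop_thread_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_turn = clock()

    def notify_activity(self):
        with self._lock:
            self._last_turn = self._clock()

    def check(self):
        """Return (category, stacks); a stack is sampled only when not 'ok'."""
        with self._lock:
            stalled_for = self._clock() - self._last_turn
        category = classify_stall(stalled_for)
        if category == "ok":
            return category, []
        return category, sample_stacks(self._loop_thread_id, count=1)
```

A caller would typically run `check()` from a dedicated thread every second or so, send a metrics report for `"sluggish"`, and hand `"hang"` to whatever crash-reporting path is available. Passing a larger `count` to `sample_stacks` gives the "few sequential stacks" variant mentioned above.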

timeless

Comment 1

17 years ago

personally i'd suggest borrowing the algorithm (or implementation) from windbg's !analyze -v -hang :)

(not currently active) Ted Mielczarek

Comment 2

17 years ago

Our integration of Breakpad currently expects to write a dump and exit the app, so if you're expecting it to recover, that will need to be changed. We're also not equipped to do anything more than write a single dump and send it at the moment. We could quite easily provide a method on nsICrashReporter to write a dump with a bit of extra info saying that it's a suspected hang (although we'd probably need to fix bug 397199 to make the info useful).

chris hofmann

Comment 5

15 years ago

sicking and damon were asking about the status of this at a recent crash kill. it would help right now in particular when we seem to have a few OOPP hangs.

Benjamin Smedberg

Assignee

Comment 6

15 years ago

This bug is about arbitrary hangs, which are very difficult to catch. We are already detecting and aborting for hangs which involve IPC calls to a plugin (although currently we don't get any crash reports from it, it usually un-hangs the browser process). Bug 544936 tracks getting minidumps from both processes involved in an IPC hang.

patch. 13 years ago, Andreas Gal :gal, 5.57 KB, patch. Flags: benjamin: review-
Make AnnotateCrashReport threadsafe in the chrome process, rev. 1. 13 years ago, Benjamin Smedberg, 5.04 KB, patch. Flags: ted: review+
Patch, unsets the hang monitor when we're blocked waiting for an event, rev. 2. 13 years ago, Benjamin Smedberg, 18.33 KB, patch. Flags: cjones: review+
Patch, unsets the hang monitor when we're blocked waiting for an event, rev. 3. 13 years ago, Benjamin Smedberg, 23.05 KB, patch. Flags: bent.mozilla: review+
Disable the hang monitor during test suites which also disable the DOM script timeout, rev. 1. 13 years ago, Benjamin Smedberg, 4.67 KB, patch. Flags: jmaher: review+
Add more correct hang monitoring for Cocoa widgets, rev. 1. 13 years ago, Benjamin Smedberg, 2.90 KB, patch. Flags: smichaud: review+
disable hang monitor during GLib main loop poll. 13 years ago, Karl Tomlinson (:karlt), 1.96 KB, patch. Flags: roc: review+, benjamin: feedback+
Disable hangmonitor for Talos in general, rev. 1. 13 years ago, Benjamin Smedberg, 4.03 KB, patch. Flags: jmaher: review+, benjamin: checkin+
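
Several of the attachment titles above describe unsetting the hang monitor while the main thread is blocked waiting for an event, so that an idle application is not mistaken for a hung one. The following Python sketch illustrates that idea only; it is not taken from the patches, and the class and method names are hypothetical.

```python
import threading
import time


class SuspendableHangMonitor:
    """Hypothetical monitor that ignores time spent idle waiting for events."""

    def __init__(self, timeout, clock=time.monotonic):
        self._timeout = timeout
        self._clock = clock
        self._lock = threading.Lock()
        self._busy_since = None  # None means idle, so nothing is monitored

    def notify_activity(self):
        """Call when the loop starts processing an event."""
        with self._lock:
            self._busy_since = self._clock()

    def suspend(self):
        """Call just before blocking in the native wait for the next event."""
        with self._lock:
            self._busy_since = None

    def is_hung(self):
        """True only if an event has been in progress for `timeout` seconds."""
        with self._lock:
            if self._busy_since is None:
                return False
            return self._clock() - self._busy_since >= self._timeout
```

The point of the design is that only time spent processing an event counts toward the timeout. A loop that sleeps for an hour in its platform wait call reports no hang, provided it calls `suspend()` before blocking and `notify_activity()` when it wakes.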