We want the profiler (and I believe the hang monitor) to watch threads that may spend a lot of their lifetime sleeping. We could introduce a facility that would report if the thread is sleeping and avoid sampling that thread. I'd imagine this would be manual instrumentation. Perhaps there is a native (pthread?) way to check it but unless it's supported on most platforms we might be better with user space instrumentation. Ideally the instrumentation overhead would be worth the time saving. Note we would still be adding at the very least a well predicted branch when nothing is watching the state. In the case of the profiler false negative (reporting a thread isn't sleeping when it isn't) is acceptable but not the opposite.
reporting a thread isn't sleeping when it is* - is acceptable.
The patch in bug 909974 has a manual mechanism -- BackgroundHangMonitor::NotifyWait enters wait state, and BackgroundHangMonitor::NotifyActivity exits wait state. Since we already hook into the thread loop to monitor hangs, the additional overhead, which does not use a lock, is not significant. The mechanism does not try to prevent false positive or false negative though -- for example it's possible that a thread is between waking up and clearing the wait flag, so it's running but we detect it as waiting. I'm not sure if this problem is really solvable without introducing significant overhead.
If you mean during the point where the thread has woken up but hasn't had time to report that it's woken up, having this being reported as still sleeping is fine for profiling.
(In reply to Benoit Girard (:BenWa) from comment #3) > If you mean during the point where the thread has woken up but hasn't had > time to report that it's woken up, having this being reported as still > sleeping is fine for profiling. Right that's what I meant. In that case I think the profiler can just use the hang monitor's wait flag. I think in the end the profiler and the hang monitor will become interdependent, because I'm also looking at using a hung thread's profiler labels/markers from the hang monitor.
With the profiler you can now safetly take a sync backtrace which will use the most detailed information available for that thread. It will unwind the stack if possible and merge in the labels. This is what the IO instrumentation code is using and probably want the hang monitor should use.
(In reply to Benoit Girard (:BenWa) from comment #5) > With the profiler you can now safetly take a sync backtrace which will use > the most detailed information available for that thread. It will unwind the > stack if possible and merge in the labels. This is what the IO > instrumentation code is using and probably want the hang monitor should use. Yeah I think that'll be very useful for long hangs (perma-hangs). The hang monitor is also able to measure transient hangs -- tasks that took longer than expected but was able to eventually finish. We catch these transient hangs after-the-fact, and by that point there's no stack left. In that case, I'm wondering if we can use old entries from the pseudo-stack. Although not perfect, these old entries will show bits of what the thread was trying to do during the transient hang.
You need to log in before you can comment on or make changes to this bug.