Open Bug 1616642 Opened 5 years ago Updated 2 years ago

The profiler label in ThreadEventQueue::GetEvent should not use the category IDLE if it's spinning on a different event target

Categories

(Core :: Gecko Profiler, defect, P2)

defect

Tracking

()

People

(Reporter: mstange, Unassigned)

References

(Blocks 1 open bug)

Details

Profile: https://perfht.ml/3bQ9gkh

In this profile there's a 2.3 second long hang where a local event target is being spun on:
https://searchfox.org/mozilla-central/rev/fca0be7e2cf2f922c9b927423ce28e8a04b3fd90/dom/localstorage/LSObject.cpp#1111-1117

During this nested event loop, no input events are processed and the browser is unresponsive. So it shouldn't really be considered "idle". But the category graph makes it look like the browser is idle.

This might not be a common problem but it's a case where the profiler is displaying very misleading information.

Good point Markus, thank you.

Interestingly, this idle label seems to only be present on Windows and Android (but not Linux nor Mac) :
https://searchfox.org/mozilla-central/search?q=nsAppShell%3A%3AProcessNextNativeEvent%3A

But anyway, there are other thread loops marked IDLE, are they all susceptible to be misleading?
https://searchfox.org/mozilla-central/search?q=symbol:E_%3CT_JS%3A%3AProfilingCategoryPair%3E_IDLE

Now these labels may be quite deeply-nested inside callers, so it doesn't seem trivial to convey this information about whether idle loops are really idle (waiting for external events) or not...

Not pondering too deeply about it (yet), I'm thinking that instead of marking the actual wait-blocking calls IDLE, we may want callers of event loops to set the proper category (IDLE or not), assuming that the actual event handling after the wait will always re-set that category to non-IDLE.
But then there'd be the risk that some samples around the event loop management would be marked IDLE as well -- but it doesn't seem as misleading to me, as the intention of whatever we're busily doing between events is effectively IDLE-ing.
Otherwise we'll need a way to force event loops callers to specify whether the inner wait should be marked IDLE or something else, but it seems heavy to me.

Ideas, suggestions?

Priority: -- → P2

Good points. I think most of those loops really are idle in all cases. The definition of "idle" is a bit problematic because it's mixing two concepts: CPU-idleness and responsiveness to events. This muddies the waters a little. All these event loops are CPU-idle. But event loops on the main thread should only be considered "responsive" if they actually process user input events.
Maybe we could add an argument to SpinEventLoopUntil and friends that propagates the intention downwards? Not sure. It's really just the main thread event loop that's special, I think.

That said, the jank marker gives this indication too, right?

That's true. But the category graph made me think that the jank marker was wrong, when it was in fact correct.

Here's another example where the Idle category is quite misleading. On the worker in this profile https://perfht.ml/2InlJ1l when looing at the Stack Height and call tree, we see that we are blocked on WorkerGlobalScope.importScripts or XMLHttpRequest.send but when looking at the category view it looks like nothing is happening.

Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.