Closed Bug 1607953 Opened 4 years ago Closed 4 years ago

Thread runnable metrics are fairly broken when there are nested event loops

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla74
Tracking Status
firefox74 --- fixed

People

(Reporter: bzbarsky, Assigned: bzbarsky)

References

Details

Attachments

(3 files)

I was trying to figure out how to connect the thread metrics stuff bug 1437438 added to TaskController, and have concluded that the metrics are just broken. I'm hoping we don't really use them for anything...

Some specific things that are broken:

  1. If a runnable with a performance counter runs more than one nested runnable without performance counters, the time for the second and later nested runnables is blamed on the parent runnable. This happens because running the first nested runnable sets mCurrentEventLoopDepth to a new value and leaves it there. As a result, the second and later nested runnables end up with recursiveEvent set to false and don't reset mCurrentEventStart when they finish running, so their time just gets accumulated into the parent.

  2. If that is fixed, then the weird management of mCurrentPerformanceCounter, which is set to the value of currentPerformanceCounter in the recursiveEvent case after the event has run means that any time between nested runnables is not properly accounted in the parent runnable.

  3. If that is fixed, then time in the recursiveEvent case is not accounted properly even if the nested runnable has a docgroup attached to it.

OK, I started poking at this, and now I am really confused. We use mCurrentEventStart for both these metrics and for the long task stuff. But nested event loops will reset mCurrentEventStart in this block of code:

      // End of execution, we can send the duration for the group
      if (recursiveEvent) {
        // If we're in a recursive call, reset the timer,
        // so the parent gets its remaining execution time right.
        mCurrentEventStart = mozilla::TimeStamp::Now();
        mCurrentPerformanceCounter = oldPerformanceCounter;

so if we have a runnable that does some long work, then spins a nested event loop, we will lose track of that long work, for purposes of long task accounting, no?

I'm pretty happy to rewrite this in a way that makes sense to me, but trying to understand what this code is trying to do and why it has the random mainthread and !recursiveEvent etc checks...

Flags: needinfo?(nfroyd)

And in particular, what makes sense to me in terms of how this should work is that mCurrentEventStart (to be renamed) should store the point at which we last were about to call Run(). Then we do our long task accounting both before we update it (like we already do for the performance counter bits: we flush out any built-up time) and after Run() finishes. Or something like that...

I guess it depends on what this long task stuff is really trying to measure and how that should interact with nested event loops.

Summary: Thread runnable metrics are fairly broken → Thread runnable metrics are fairly broken when there are nested event loops
Flags: needinfo?(rjesup)
Flags: needinfo?(rjesup)
Flags: needinfo?(nfroyd)

The idea is to use the slice-accounting setup from the performance counters for
idle tasks as well, and fix the various bugs we have when nested event loops
are involved.

The bit in nsThread::SizeOfEventQueues that counted memory for
mCurrentPerformanceCounter was wrong all along, I think: the docgroup that owns
the performance counter should account for it.

Assignee: nobody → bzbarsky
Status: NEW → ASSIGNED

Both before and after I see a similar pattern: it takes somewhere around 700-2000 runnables to reach 1ms overhead, but the largest "max overhead per runnable" numbers are in the 250-300 microsecond (so 0.25-0.3ms) range. Note that this is measuring both the performance counter overhead and the long task overhead.

Pushed by bzbarsky@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a83cd72bb8c6
Refactor our runnable metrics and long task code to make it more robust and correct.  r=jesup,tarek,froydnj
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla74
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: