Closed Bug 1477755 Opened 7 years ago Closed 7 years ago

Measure TimeoutManager execution with the DocGroup PerformanceCounter

Categories

(Toolkit :: Performance Monitoring, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
mozilla63
Tracking Status
firefox63 --- fixed

People

(Reporter: tarek, Assigned: tarek)

References

Details

Attachments

(1 file)

If we open two about:performance pages, the page that is not focused still have the timer that calls some code to refresh its internal counters, but it appears that in this case, DocGroup::Dispatch() is not called. DocGroup::Dispatch() is the place where we increment the perf counter. The result is that the dispatch count is not incremented even though the page is doing something. We need to find out how this code is executed and why it's not going through DocGroup::Dispatch() like when it's focused.
After some time debugging, this has nothing to do with the page being focused or not. The real issue is that window.setInterval() is not incrementing any counter because it is currently out of the DocGroup scope. This is my current understanding of what's happening: When window.setInterval() is called, the code looks for a target for the event using mWindow.EventTargetFor(TaskCategory::Timer) That call eventually calls DocGroup::EventTargetFor() which returns an nsIEventTarget implementation, and in our case the ThrottledEventQueue instance. nsIEventTarget has a Dispatch() function and when called with ThrottledEventQueue, it adds the event in that queue sink, and the runnable is eventually executed by nsThread::ProcessNextEvent If we want to have the DocGroup performance counter include the runnables triggered by setInterval(), we need to change DocGroup::EventTargetFor() so the nsIEventTarget implementation that is returned eventually calls our counter when a ::Dispatch occurs. (too bad DocGroup is not an nsIEventTarget) In order to avoid any backward compatibility issue, one idea I have would be to implement a ProxyEventTarget that wraps the EventTarget that is supposed to be returned by DocGroup::EventTargetFor() together with a reference to the counter, So when the dispatch occurs, we can happily increment the right counter and have the dispatch happen in the right spot like before. Adding Nathan & Andrea for some feedback on this
Flags: needinfo?(nfroyd)
Flags: needinfo?(amarchesini)
Summary: Missing some dispatches on unfocused windows → window.setInterval() is not tracked by the DocGroup PerformanceCounter
Some sort of proxy event target seems about right. I am a little unsure we are getting the abstractions right, though: TabGroup (via SchedulerGroup) already has per-category event targets, which we would be re-implementing for DocGroup. So we're doing a bunch of code duplication, plus we're consuming quite a bit of space for each DocGroup with all of these proxies, even if we lazily create the proxies. So are we doing the counter incrementing at the right level, should we be sharing the implementation somehow (which solves the code duplication problem, but not the space issues, I think), or something else?
Flags: needinfo?(nfroyd)
I might have an idea that could work, but requires a low level change, to say the least: 1. move GetPerformanceCounter() in the nsIRunnable interface itself. 2. in nsThread.ProcessNextEvent(), use a null-checked event->GetPerformanceCounter() to interact with the counter. TimeoutExecutor for example can link back to the counter that is located in DocGroup() via TimeoutManager.mWindow I guess. We would need to change every nsIRunnable in Firefox that we want to track and that can be linked back to a counter. (document/tab group or worker) I guess it’s ok for the existing counters to stay on DocGroup and WorkerPrivate with this design. The long term benefit though is that every runnable can potentially be added in a counter without having to change the logic, by implementing runnable.GetPerformanceCounter() We can even add a runnable name in the counter and get more details, and not enforce a "have to have a hostname or window id" rule for the counters. any thoughts ? Am I missing something obvious ? Is this too much of a change ? :)
Adding anything to nsIRunnable would be a pretty large change, as well as being undesirable from a space standpoint (pointers in vtables, extra space for functions, etc.). I see that nsThread::GetPerformanceCounter already checks for the existence of a separate interface, maybe you could put the function there instead?
ok, makes sens. Above is the (pragmatic) patch that roughly fixes the issue, by interacting directly with the counter in TimeoutManager::RunTimeout(). Maybe that's good enough for now ?
Summary: window.setInterval() is not tracked by the DocGroup PerformanceCounter → Measure TimeoutManager execution with the DocGroup PerformanceCounter
Comment on attachment 8995701 [details] Bug 1477755 - Measure TimeoutManager execution - https://reviewboard.mozilla.org/r/260084/#review267540 Seems like an OK thing to do. ::: dom/base/TimeoutManager.cpp:882 (Diff revision 2) > } > > // This timeout is good to run > + mozilla::PerformanceCounter* counter = nullptr; > + if (mozilla::StaticPrefs::dom_performance_enable_scheduler_timing()) { > + mozilla::TimeStamp start; This `start` variable dies at the end of the `enable_scheduler_timing()` block, which I don't think is what you want? ::: dom/base/TimeoutManager.cpp:897 (Diff revision 2) > + } > + } > + } > bool timeout_was_cleared = mWindow.RunTimeoutHandler(timeout, scx); > + if (counter) { > + mozilla::TimeDuration duration = TimeStamp::Now() - start; This expression appears to be getting a different `start` than the one you created in the `enable_scheduler_timing()` block above.
Attachment #8995701 - Flags: review?(nfroyd)
Comment on attachment 8995701 [details] Bug 1477755 - Measure TimeoutManager execution - https://reviewboard.mozilla.org/r/260084/#review267540 > This expression appears to be getting a different `start` than the one you created in the `enable_scheduler_timing()` block above. oh wow, good catch, thanks. I did not notice we already had one. I will rename it.
Comment on attachment 8995701 [details] Bug 1477755 - Measure TimeoutManager execution - https://reviewboard.mozilla.org/r/260084/#review267736 WFM, thank you!
Attachment #8995701 - Flags: review?(nfroyd) → review+
Comment on attachment 8995701 [details] Bug 1477755 - Measure TimeoutManager execution - https://reviewboard.mozilla.org/r/260084/#review268202 ::: dom/base/TimeoutManager.cpp:880 (Diff revision 6) > + mozilla::PerformanceCounter* counter = nullptr; > + mozilla::TimeStamp startHandler; > + > + if (mozilla::StaticPrefs::dom_performance_enable_scheduler_timing()) { > + nsIDocument* doc = mWindow.GetDocument(); > + if (doc) { > + dom::DocGroup* docGroup = doc->GetDocGroup(); > + if (docGroup) { > + counter = docGroup->GetPerformanceCounter(); > + if (counter) { > + counter->IncrementDispatchCounter(DispatchCategory(TaskCategory::Timer)); > + startHandler = TimeStamp::Now(); > + } > + } > + } > + } I'm not really that excited by adding another place to collect this data. I would be much more comfortable if we could solve this problem by fixing labelling of throttled event queues and/or timeout executor. If this isn't doable I suggest that we move at least the execution time measuing to TimeoutBudgetManager (since that will handle the recursive calls to RunTimeout due to nested event loops). With regard to counting dispatches I'm not even sure what that means in the context of running timeouts, since we can run several timeouts in the scope of one dispatch of the timeout executor. I think it it would be good if we could stay close to the interpretation of 1 dispatch actually meaning 1 dispatch of an event to the event loop.
Attachment #8995701 - Flags: review?(afarre) → review-
After a discussion with Andreas on IRC, I am going to update the patch to reflect several changes. I know that looks like a lot, but it's not so much in terms of impact. The patch is roughly making sure we don't lose the labeling on the way to the actual runnable execution and sets the dispatch increment right at execution time on a single location. changes: - revert the timeout performance counter - fix the labelling in the SchedulerGroup dispatch when we are not in the content process (we lose it right now) - add in SchedulerGroup::Runnable the task category it was dispatched with - in nsDocument.cpp, omit calls to mDocGroup->ValidateAccess() if we're not in the Content Process - add in both nsThread and WorkerThread a GetDispatchCategory() function that returns the category for the current event - remove all IncrementDispatch() from both the DocGroup and the Worker, and have a single one in nsThread::ProcessNextEvent() right after the event is finished (it can grab the category via GetDispatchCategory() With these changes, we should have a duration+dispatch count for every event dispatched in the scheduler that was labeled via SchedulerGroup, including those done outside the Content processes (like about: pages) The tests will ensure that: - we never have events that get counted as dispatched *and* that don't get an execution duration - we're actually counting setInterval & setTimeout calls - we're counting events on pages like about:memory
Flags: needinfo?(amarchesini)
welp. I am making progress. The patch works but not for the timers. Turns out the timer events are still not labeled because they are wrapped into a TimeoutExecutor runnable which is itself executed by a ThrottledEventQueue::Inner::Executor which is some kind of permanent runnable that dequeue timers event to run. I have tried to expose the interfaces so GetPerformanceCounter() would be able to crawl back to at least TimeoutExecutor (where I can actually get all the info), but ThrottledEventQueue::Inner is a private member so I can't add an IID accessor there for QueryObject to work. I think that ThrottledEventQueue should be doing the counting for its own event by tooling the ThrottledEventQueue::ExecuteRunnable() function. What worries me is that it's fully isolated event processor, so if an event dispatches other event, we won't be able to detect it, meaning that we might get false duration times. Maybe another option would be to have ThrottledEventQueue::Inner::Executor dispatch each event it dequeues as a top-level runnable that is passed to nsThread, which in that case would be able to count things properly. Trying stuff at this point... Note for the future: I feel like "meta runnables" add a lot of complexity in the code and some behaviors were developed under the nsIRunnable umbrella to basically bypass how the scheduler works. Maybe we could enforce a rule so a runnable is a single atomic execution block ?
Ah well no. Tooling the ThrottledEventQueue::ExecuteRunnable() : FAIL. There are too many things to change in order to make this work. I am basically changing all the classes in TimerXXX just to get back to the wrapped TimerExecutor runnable. At this point I can't think of another better solution than the original path to track timers, since it's executed in such an ad-hoc way in the scheduler. I propose that we keep the patch and rework it when the scheduler gets refactored. Independantly from this we should add the fix for the SchedulerGroup::LabeledDispatch() e.g.: nsresult SchedulerGroup::LabeledDispatch(TaskCategory aCategory, already_AddRefed<nsIRunnable>&& aRunnable, dom::DocGroup* aDocGroup) { nsCOMPtr<nsIRunnable> runnable(aRunnable); - if (XRE_IsContentProcess()) { - RefPtr<Runnable> internalRunnable = new Runnable(runnable.forget(), this, aDocGroup); + if (aDocGroup) { + RefPtr<Runnable> internalRunnable = new Runnable(runnable.forget(), this, aDocGroup, aCategory); return InternalUnlabeledDispatch(aCategory, internalRunnable.forget()); } return UnlabeledDispatch(aCategory, runnable.forget()); } and tweak nsDocument so we bypass the call to mDocGroup->ValidateAccess() when we're not in the Content Process
created Bug 1481389 for the SchedulerGroup::LabeledDispatch change. Andreas, after our IRC discussion, are you OK with pushing the initial patch here (I can add a note in TimeoutManager explaining we want to have something better once things gets refactored)
Flags: needinfo?(afarre)
Comment on attachment 8995701 [details] Bug 1477755 - Measure TimeoutManager execution - https://reviewboard.mozilla.org/r/260084/#review269210 This looks good, I only have one request. ::: dom/base/TimeoutManager.cpp:304 (Diff revision 7) > } > > return result; > } > > +PerformanceCounter* I'd really like a comment that states that this isn't ideal, and that in a perfect world we'd handle performance counters for timeouts in the same way we do in nsThread::ProcessNextEvent and if you could create and reference a new bug here that documents the findings about nsTimer, TimeoutExecutor and ThrottledEventQueue it would be perfect.
Attachment #8995701 - Flags: review?(afarre) → review+
Flags: needinfo?(afarre)
Added Bug 1482834 for the next steps
Pushed by tziade@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d754118be3db Measure TimeoutManager execution - r=farre,froydnj
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: