Closed
Bug 1477755
Opened 7 years ago
Closed 7 years ago
Measure TimeoutManager execution with the DocGroup PerformanceCounter
Categories
(Toolkit :: Performance Monitoring, enhancement, P1)
Toolkit
Performance Monitoring
Tracking
()
RESOLVED
FIXED
mozilla63
Tracking | Status | |
---|---|---|
firefox63 | --- | fixed |
People
(Reporter: tarek, Assigned: tarek)
References
Details
Attachments
(1 file)
If we open two about:performance pages, the page that is not focused still have the timer that calls some code to refresh its internal counters, but it appears that in this case, DocGroup::Dispatch() is not called.
DocGroup::Dispatch() is the place where we increment the perf counter.
The result is that the dispatch count is not incremented even though the page is doing something.
We need to find out how this code is executed and why it's not going through DocGroup::Dispatch() like when it's focused.
Assignee | ||
Comment 1•7 years ago
|
||
After some time debugging, this has nothing to do with the page being focused or not.
The real issue is that window.setInterval() is not incrementing any counter because it is
currently out of the DocGroup scope.
This is my current understanding of what's happening:
When window.setInterval() is called, the code looks for a target for the event using mWindow.EventTargetFor(TaskCategory::Timer)
That call eventually calls DocGroup::EventTargetFor() which returns an nsIEventTarget implementation,
and in our case the ThrottledEventQueue instance.
nsIEventTarget has a Dispatch() function and when called with ThrottledEventQueue, it adds the event in that queue sink,
and the runnable is eventually executed by nsThread::ProcessNextEvent
If we want to have the DocGroup performance counter include the runnables triggered by setInterval(),
we need to change DocGroup::EventTargetFor() so the nsIEventTarget implementation that is returned eventually
calls our counter when a ::Dispatch occurs. (too bad DocGroup is not an nsIEventTarget)
In order to avoid any backward compatibility issue, one idea I have would be to implement
a ProxyEventTarget that wraps the EventTarget that is supposed to be returned by DocGroup::EventTargetFor()
together with a reference to the counter,
So when the dispatch occurs, we can happily increment the right counter and have the dispatch happen in the right spot like
before.
Adding Nathan & Andrea for some feedback on this
Flags: needinfo?(nfroyd)
Flags: needinfo?(amarchesini)
Assignee | ||
Updated•7 years ago
|
Summary: Missing some dispatches on unfocused windows → window.setInterval() is not tracked by the DocGroup PerformanceCounter
![]() |
||
Comment 2•7 years ago
|
||
Some sort of proxy event target seems about right. I am a little unsure we are getting the abstractions right, though: TabGroup (via SchedulerGroup) already has per-category event targets, which we would be re-implementing for DocGroup. So we're doing a bunch of code duplication, plus we're consuming quite a bit of space for each DocGroup with all of these proxies, even if we lazily create the proxies.
So are we doing the counter incrementing at the right level, should we be sharing the implementation somehow (which solves the code duplication problem, but not the space issues, I think), or something else?
Flags: needinfo?(nfroyd)
Assignee | ||
Comment 3•7 years ago
|
||
I might have an idea that could work, but requires a low level change, to say the least:
1. move GetPerformanceCounter() in the nsIRunnable interface itself.
2. in nsThread.ProcessNextEvent(), use a null-checked event->GetPerformanceCounter() to interact with the counter.
TimeoutExecutor for example can link back to the counter that is located in DocGroup() via TimeoutManager.mWindow I guess.
We would need to change every nsIRunnable in Firefox that we want to track and that can be linked back to a counter. (document/tab group or worker)
I guess it’s ok for the existing counters to stay on DocGroup and WorkerPrivate with this design.
The long term benefit though is that every runnable can potentially be added in a counter without having to change the logic,
by implementing runnable.GetPerformanceCounter()
We can even add a runnable name in the counter and get more details, and not enforce a "have to have a hostname or window id" rule for the counters.
any thoughts ? Am I missing something obvious ? Is this too much of a change ? :)
![]() |
||
Comment 4•7 years ago
|
||
Adding anything to nsIRunnable would be a pretty large change, as well as being undesirable from a space standpoint (pointers in vtables, extra space for functions, etc.).
I see that nsThread::GetPerformanceCounter already checks for the existence of a separate interface, maybe you could put the function there instead?
Comment hidden (mozreview-request) |
Assignee | ||
Comment 6•7 years ago
|
||
ok, makes sens.
Above is the (pragmatic) patch that roughly fixes the issue, by interacting directly with the counter in TimeoutManager::RunTimeout().
Maybe that's good enough for now ?
Assignee | ||
Updated•7 years ago
|
Summary: window.setInterval() is not tracked by the DocGroup PerformanceCounter → Measure TimeoutManager execution with the DocGroup PerformanceCounter
Comment hidden (mozreview-request) |
![]() |
||
Comment 9•7 years ago
|
||
mozreview-review |
Comment on attachment 8995701 [details]
Bug 1477755 - Measure TimeoutManager execution -
https://reviewboard.mozilla.org/r/260084/#review267540
Seems like an OK thing to do.
::: dom/base/TimeoutManager.cpp:882
(Diff revision 2)
> }
>
> // This timeout is good to run
> + mozilla::PerformanceCounter* counter = nullptr;
> + if (mozilla::StaticPrefs::dom_performance_enable_scheduler_timing()) {
> + mozilla::TimeStamp start;
This `start` variable dies at the end of the `enable_scheduler_timing()` block, which I don't think is what you want?
::: dom/base/TimeoutManager.cpp:897
(Diff revision 2)
> + }
> + }
> + }
> bool timeout_was_cleared = mWindow.RunTimeoutHandler(timeout, scx);
> + if (counter) {
> + mozilla::TimeDuration duration = TimeStamp::Now() - start;
This expression appears to be getting a different `start` than the one you created in the `enable_scheduler_timing()` block above.
Attachment #8995701 -
Flags: review?(nfroyd)
Assignee | ||
Comment 10•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8995701 [details]
Bug 1477755 - Measure TimeoutManager execution -
https://reviewboard.mozilla.org/r/260084/#review267540
> This expression appears to be getting a different `start` than the one you created in the `enable_scheduler_timing()` block above.
oh wow, good catch, thanks. I did not notice we already had one. I will rename it.
Comment hidden (mozreview-request) |
![]() |
||
Comment 12•7 years ago
|
||
mozreview-review |
Comment on attachment 8995701 [details]
Bug 1477755 - Measure TimeoutManager execution -
https://reviewboard.mozilla.org/r/260084/#review267736
WFM, thank you!
Attachment #8995701 -
Flags: review?(nfroyd) → review+
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 16•7 years ago
|
||
mozreview-review |
Comment on attachment 8995701 [details]
Bug 1477755 - Measure TimeoutManager execution -
https://reviewboard.mozilla.org/r/260084/#review268202
::: dom/base/TimeoutManager.cpp:880
(Diff revision 6)
> + mozilla::PerformanceCounter* counter = nullptr;
> + mozilla::TimeStamp startHandler;
> +
> + if (mozilla::StaticPrefs::dom_performance_enable_scheduler_timing()) {
> + nsIDocument* doc = mWindow.GetDocument();
> + if (doc) {
> + dom::DocGroup* docGroup = doc->GetDocGroup();
> + if (docGroup) {
> + counter = docGroup->GetPerformanceCounter();
> + if (counter) {
> + counter->IncrementDispatchCounter(DispatchCategory(TaskCategory::Timer));
> + startHandler = TimeStamp::Now();
> + }
> + }
> + }
> + }
I'm not really that excited by adding another place to collect this data. I would be much more comfortable if we could solve this problem by fixing labelling of throttled event queues and/or timeout executor.
If this isn't doable I suggest that we move at least the execution time measuing to TimeoutBudgetManager (since that will handle the recursive calls to RunTimeout due to nested event loops).
With regard to counting dispatches I'm not even sure what that means in the context of running timeouts, since we can run several timeouts in the scope of one dispatch of the timeout executor. I think it it would be good if we could stay close to the interpretation of 1 dispatch actually meaning 1 dispatch of an event to the event loop.
Attachment #8995701 -
Flags: review?(afarre) → review-
Assignee | ||
Comment 17•7 years ago
|
||
After a discussion with Andreas on IRC, I am going to update the patch to reflect several changes.
I know that looks like a lot, but it's not so much in terms of impact. The patch is roughly
making sure we don't lose the labeling on the way to the actual runnable execution
and sets the dispatch increment right at execution time on a single location.
changes:
- revert the timeout performance counter
- fix the labelling in the SchedulerGroup dispatch when we are not in the content process (we lose it right now)
- add in SchedulerGroup::Runnable the task category it was dispatched with
- in nsDocument.cpp, omit calls to mDocGroup->ValidateAccess() if we're not in the Content Process
- add in both nsThread and WorkerThread a GetDispatchCategory() function that returns the category for the current event
- remove all IncrementDispatch() from both the DocGroup and the Worker, and have a single one in nsThread::ProcessNextEvent() right after the event is finished (it can grab the category via GetDispatchCategory()
With these changes, we should have a duration+dispatch count for every event dispatched in the scheduler that was labeled via SchedulerGroup, including those done outside the Content processes (like about: pages)
The tests will ensure that:
- we never have events that get counted as dispatched *and* that don't get an execution duration
- we're actually counting setInterval & setTimeout calls
- we're counting events on pages like about:memory
Flags: needinfo?(amarchesini)
Assignee | ||
Comment 18•7 years ago
|
||
welp. I am making progress. The patch works but not for the timers.
Turns out the timer events are still not labeled because they are wrapped into a TimeoutExecutor runnable
which is itself executed by a ThrottledEventQueue::Inner::Executor which is some kind of permanent runnable that dequeue timers event to run.
I have tried to expose the interfaces so GetPerformanceCounter() would be able to crawl back to at least TimeoutExecutor (where I can actually get all the info), but ThrottledEventQueue::Inner is a private member so I can't add an IID accessor there for QueryObject to work.
I think that ThrottledEventQueue should be doing the counting for its own event by tooling the ThrottledEventQueue::ExecuteRunnable() function. What worries me is that it's fully isolated event processor, so if
an event dispatches other event, we won't be able to detect it, meaning that we might get false duration times.
Maybe another option would be to have ThrottledEventQueue::Inner::Executor dispatch each event it dequeues as a top-level runnable that is passed to nsThread, which in that case would be able to count things properly.
Trying stuff at this point...
Note for the future: I feel like "meta runnables" add a lot of complexity in the code and some behaviors were developed
under the nsIRunnable umbrella to basically bypass how the scheduler works. Maybe we could enforce a rule so a runnable is a single atomic execution block ?
Assignee | ||
Comment 19•7 years ago
|
||
Ah well no.
Tooling the ThrottledEventQueue::ExecuteRunnable() : FAIL.
There are too many things to change in order to make this work.
I am basically changing all the classes in TimerXXX just to get back to the wrapped TimerExecutor runnable.
At this point I can't think of another better solution than the original path to track timers, since it's executed in
such an ad-hoc way in the scheduler.
I propose that we keep the patch and rework it when the scheduler gets refactored.
Independantly from this we should add the fix for the SchedulerGroup::LabeledDispatch() e.g.:
nsresult
SchedulerGroup::LabeledDispatch(TaskCategory aCategory,
already_AddRefed<nsIRunnable>&& aRunnable,
dom::DocGroup* aDocGroup)
{
nsCOMPtr<nsIRunnable> runnable(aRunnable);
- if (XRE_IsContentProcess()) {
- RefPtr<Runnable> internalRunnable = new Runnable(runnable.forget(), this, aDocGroup);
+ if (aDocGroup) {
+ RefPtr<Runnable> internalRunnable = new Runnable(runnable.forget(), this, aDocGroup, aCategory);
return InternalUnlabeledDispatch(aCategory, internalRunnable.forget());
}
return UnlabeledDispatch(aCategory, runnable.forget());
}
and tweak nsDocument so we bypass the call to mDocGroup->ValidateAccess() when we're not in the Content Process
Assignee | ||
Comment 20•7 years ago
|
||
created Bug 1481389 for the SchedulerGroup::LabeledDispatch change.
Andreas, after our IRC discussion, are you OK with pushing the initial patch here (I can add a note in TimeoutManager explaining we want to have something better once things gets refactored)
Flags: needinfo?(afarre)
Comment hidden (mozreview-request) |
Comment 22•7 years ago
|
||
mozreview-review |
Comment on attachment 8995701 [details]
Bug 1477755 - Measure TimeoutManager execution -
https://reviewboard.mozilla.org/r/260084/#review269210
This looks good, I only have one request.
::: dom/base/TimeoutManager.cpp:304
(Diff revision 7)
> }
>
> return result;
> }
>
> +PerformanceCounter*
I'd really like a comment that states that this isn't ideal, and that in a perfect world we'd handle performance counters for timeouts in the same way we do in nsThread::ProcessNextEvent and if you could create and reference a new bug here that documents the findings about nsTimer, TimeoutExecutor and ThrottledEventQueue it would be perfect.
Attachment #8995701 -
Flags: review?(afarre) → review+
Assignee | ||
Updated•7 years ago
|
Flags: needinfo?(afarre)
Assignee | ||
Comment 23•7 years ago
|
||
Added Bug 1482834 for the next steps
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 26•7 years ago
|
||
Pushed by tziade@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d754118be3db
Measure TimeoutManager execution - r=farre,froydnj
Comment 27•7 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox63:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
You need to log in
before you can comment on or make changes to this bug.
Description
•