Closed
      
        Bug 1363829
      
      
        Opened 8 years ago
          Closed 8 years ago
      
        
    
  
consider using a single nsITimer for setTimeout() in windows  
    Categories
(Core :: DOM: Core & HTML, enhancement)
        Core
          
        
        
      
        
    
        DOM: Core & HTML
          
        
        
      
        
    Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox55 | --- | fixed | 
People
(Reporter: bkelly, Assigned: bkelly)
References
(Depends on 1 open bug, Regressed 2 open bugs)
Details
Attachments
(17 files, 61 obsolete files)
Currently we register a separate nsITimer for every setTimeout()/setInterval() call in nsGlobalWindow.  This is unnecessary since we only really need to be woken up for the *next* timer.  We can then schedule another nsITimer for the next one and so on.
This would be beneficial because we still have some performance issues with large numbers of nsITimers.  In particular, trying to cancel many timers so we can reschedule them triggers a lot of mutex contention with the timer thread.  This can block the main thread.  We do this sort of thing when switching tabs and rescheduling throttled timers due to the window coming to the foreground.
If we make this change we would be relying on the ThrottledEventQueue less.  The queue would still be useful to yield between timers from many windows in the same TabGroup, but timer floods would not back up the queue any more.  So the back pressure mechanism would likely need to change to look at the TimeoutManager's list of timers instead.
I'm going to see if I can get this into FF55.
|   | ||
| Comment 1•8 years ago
           | ||
(In reply to Ben Kelly [not reviewing due to deadline][:bkelly] from comment #0)
> This would be beneficial because we still have some performance issues with
> large numbers of nsITimers.  In particular, trying to cancel many timers so
> we can reschedule them triggers a lot of mutex contention with the timer
> thread.  This can block the main thread.
Would it be equally feasible to add APIs for canceling multiple timers in one go instead?
Flags: needinfo?(bkelly)
| Assignee | ||
| Comment 2•8 years ago
           | ||
(In reply to Nathan Froyd [:froydnj] from comment #1)
> Would it be equally feasible to add APIs for canceling multiple timers in
> one go instead?
We could, but I think it would be strictly worse than what I am proposing.  Keeping the current many-timers approach with bulk cancellation would:
1) Use more memory for nsITimers objects
2) Lock the timer thread mutex more frequently for insertions
3) Generate more runnables for timeouts firing near the same time
4) Require building the list of timers to cancel, which would probably not scale well
Using a single timer per window avoids all these issues.  It just does less work for the same functional behavior.
Flags: needinfo?(bkelly)
| Assignee | ||
| Comment 3•8 years ago
           | ||
Ehsan asked me to put a profile showing the problem:
https://perfht.ml/2r2LQRy
This is from running the stress test at:
https://people-mozilla.org/~bkelly/timer-flood/index.html
Search for "CriticalSectionContended".
Cycle collection is also bad here, but its unclear to me if there is anything we can do to improve analyzing so many cycle collectable timer objects.
| Assignee | ||
| Comment 4•8 years ago
           | ||
| Assignee | ||
| Comment 5•8 years ago
           | ||
        Attachment #8867897 -
        Attachment is obsolete: true
| Assignee | ||
| Comment 6•8 years ago
           | ||
| Assignee | ||
| Comment 7•8 years ago
           | ||
| Assignee | ||
| Comment 8•8 years ago
           | ||
This is still a work-in-progress.  I have the executor written and some cleanup done, but I still have to make TimeoutManager use the executor instead of InitTimer().
| Assignee | ||
| Comment 9•8 years ago
           | ||
        Attachment #8867964 -
        Attachment is obsolete: true
|   | ||
| Updated•8 years ago
           | 
Whiteboard: [qf] → [qf:p3]
| Assignee | ||
| Comment 10•8 years ago
           | ||
        Attachment #8867963 -
        Attachment is obsolete: true
| Assignee | ||
| Comment 11•8 years ago
           | ||
        Attachment #8867968 -
        Attachment is obsolete: true
| Assignee | ||
| Comment 12•8 years ago
           | ||
        Attachment #8867965 -
        Attachment is obsolete: true
| Assignee | ||
| Comment 13•8 years ago
           | ||
        Attachment #8867966 -
        Attachment is obsolete: true
| Assignee | ||
| Comment 14•8 years ago
           | ||
| Assignee | ||
| Comment 15•8 years ago
           | ||
| Assignee | ||
| Comment 16•8 years ago
           | ||
These changes look pretty promising locally.  They greatly smooth out the servo tiles demo.  Lets get a build on some other platforms to see how it behaves there (and if it breaks tests):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=838a4951ae9352a0cfa590e702cdbe0a3933198b
I still need to figure out what to do about the back pressure mechanism.  I either need to rewrite it not to depend on ThrottledEventQueue or just remove it.
| Assignee | ||
| Comment 17•8 years ago
           | ||
| Assignee | ||
| Comment 18•8 years ago
           | ||
| Assignee | ||
| Comment 19•8 years ago
           | ||
Hopefully this clears up most of the orange:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b4f72aac2d49ef43c10c5fc5d8c7dc6e82d252a8
It looks like there are some tests, though, that race setTimeout(f, 0) against other behavior and expect setTimeout() to lose.  Since this patch speeds up setTimeout() these tests are failing.  I'll have to work through these things.
The reason setTimeout(f, 0) is much faster now is because we uses to schedule an nsITimer for it, but now we just dispatch a runnable to the current thread to execute the timer.  I believe this is correct behavior and its the tests that are broken in this case.
| Assignee | ||
| Comment 20•8 years ago
           | ||
| Assignee | ||
| Comment 21•8 years ago
           | ||
| Assignee | ||
| Comment 22•8 years ago
           | ||
Trying to figure out why reftests are failing:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=96e6ccbb66fdec536f65a691b79da630af39f214
| Updated•8 years ago
           | 
Blocks: Speedometer_V2
| Assignee | ||
| Comment 23•8 years ago
           | ||
Still some orange to figure out, but this is showing some improvements on Talos:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=bbf0763d07cb003559566318c5c7e0e00cf424aa&newProject=try&newRevision=53e3b97ff70e2dac6d888d9b1911cf0579bc0ea3&framework=1&showOnlyImportant=0&showOnlyConfident=1
|   | ||
| Updated•8 years ago
           | 
Blocks: TimeToFirstPaint_FB
|   | ||
| Updated•8 years ago
           | 
No longer blocks: TimeToFirstPaint_FB
| Assignee | ||
| Comment 24•8 years ago
           | ||
        Attachment #8869226 -
        Attachment is obsolete: true
| Assignee | ||
| Comment 25•8 years ago
           | ||
        Attachment #8869227 -
        Attachment is obsolete: true
| Assignee | ||
| Comment 26•8 years ago
           | ||
        Attachment #8869228 -
        Attachment is obsolete: true
| Assignee | ||
| Comment 27•8 years ago
           | ||
        Attachment #8869229 -
        Attachment is obsolete: true
| Assignee | ||
| Comment 28•8 years ago
           | ||
        Attachment #8869230 -
        Attachment is obsolete: true
| Assignee | ||
| Comment 29•8 years ago
           | ||
        Attachment #8869231 -
        Attachment is obsolete: true
| Assignee | ||
| Comment 30•8 years ago
           | ||
        Attachment #8869232 -
        Attachment is obsolete: true
| Assignee | ||
| Comment 31•8 years ago
           | ||
Still some orange to fix, but getting close I think.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6059dfb0ec0599eb65aea49f7e65209e9273903a
| Assignee | ||
| Comment 32•8 years ago
           | ||
        Attachment #8870120 -
        Attachment is obsolete: true
| Assignee | ||
| Comment 33•8 years ago
           | ||
Unfortunately the talos test improvements were due to a bug where we would often fire timers earlier than before.  I've fixed that and now its looking like this bug is talos neutral.
| Assignee | ||
| Comment 34•8 years ago
           | ||
The first few patches here will refactor TimeoutManager so that I can remove dependencies on the nsITimer and the specific Timeout in RunTimeout.
To start, this patch removes some dead code that used to involve passing nullptr to RunTimeout() for the Timeout arg.  This code was orphaned by my changes previously in bug 1303167.
        Attachment #8870116 -
        Attachment is obsolete: true
        Attachment #8870244 -
        Flags: review?(ehsan)
| Assignee | ||
| Comment 35•8 years ago
           | ||
Next, replace the Timeout argument to RunTimeout() with simply the deadline we wish to execute up to.  Most of the code in RunTimeout() only needs the deadline.
The code that checks to see if the we executed the target Timeout is removed.  This was really previously needed for cases where we did aggressive yielding.  Consider:
1. Timer t1 is scheduled
2. Timer t2 is scheduled
3. nsITimer for t2 fires, executes t1, and yields after only one timer
4. When t1 is executed its nsITimer is canceled.
5. No nsITimer ever fires again and t2 is never executed.
With the remaining patches in this bug this will no longer be a problem at all.  Also, after bug 1343912 we yield much less aggressively.  Its highly unlikely we will hit this condition even if someone were to run on just this patch and not the others in this bug.
        Attachment #8870117 -
        Attachment is obsolete: true
        Attachment #8870245 -
        Flags: review?(ehsan)
| Assignee | ||
| Comment 36•8 years ago
           | ||
Next, we need to change how the Timeout objects are held alive.  The nsITimer for each Timeout holds a ref and that will be going away.  Also, manual AddRef()/Release() is used for holding a ref while the timer is in the list which is error prone.
This patch simplifies everything by just using LinkedList<RefPtr<Timeout>> now that we support it.  This makes things much easier to reason about.
The ref held during the nsITimer callback is implemented as a self RefPtr<Timeout>.  This is also easier to understand, but it will be going away in a later patch anyway.
        Attachment #8870118 -
        Attachment is obsolete: true
        Attachment #8870248 -
        Flags: review?(ehsan)
| Assignee | ||
| Comment 37•8 years ago
           | ||
The last place the nsITimer is used is for extracting the actual scheduled delay when adjusting timers.  This patch just stores this value on the Timeout instead so we don't need to access the nsITimer.
        Attachment #8870119 -
        Attachment is obsolete: true
        Attachment #8870251 -
        Flags: review?(ehsan)
| Assignee | ||
| Comment 38•8 years ago
           | ||
This patch adds the bulk of the new execution code.  Its contained in a new TimeoutExecutor class:
* The TimeoutExecutor is can be scheduled in "Immediate" mode as a runnable or "Delayed" mode as an nsITimerCallback.
* There is the concept of "flex time" representing how far into the future we may fire a Timeout.  This matches the rounding performed by TimerThread today.
* If the next deadline has passed or less than "flex time" in the future then Immediate mode is triggered as a runnable dispatch.
* Otherwise the deadline is scheduled as a nsITimer in the future.
* The TimeoutExecutor provides a MaybeSchedule(deadline) method that can be called frequently.  It only reschedules the TimeoutExecutor if it sees an earlier deadline than its already scheduled for.
* The TimeoutExecutor must deal with duplicate execution calls.  The timer may still fire even if its canceled and we cannot cancel the nsIRunnable dispatch.
The next patch will actually hook this code up into TimeoutManager.
        Attachment #8870147 -
        Attachment is obsolete: true
        Attachment #8870254 -
        Flags: review?(ehsan)
| Assignee | ||
| Comment 39•8 years ago
           | ||
This integrates the TimeoutExecutor into the TimeoutManager.  This mainly involves calling MaybeSchedule() whenever we get a new Timeout or when we know the executor has been stopped.
The only places we need to explicitly stop the executor are:
1. If the next soonest Timeout is canceled with ClearTimeout().
2. If all the timeouts are canceled.
3. If the window is suspended.
        Attachment #8870121 -
        Attachment is obsolete: true
        Attachment #8870256 -
        Flags: review?(ehsan)
| Assignee | ||
| Comment 40•8 years ago
           | ||
This removes the now unused Timeout nsITimer code.
        Attachment #8870122 -
        Attachment is obsolete: true
        Attachment #8870257 -
        Flags: review?(ehsan)
| Assignee | ||
| Comment 41•8 years ago
           | ||
This removes the back pressure mechanism I added for FF52.  It is designed around every Timeout getting a runnable in the ThrottledEventQueue, but that is not longer the case.
I considered ways of implementing a new backpressure mechanism, but I decided not to for a few reasons:
1. We already provide some back pressure by limiting how many timers we will implement at once.  In particular, once you get many Timeout objects the cycle collector tends to limit things as well.
2. The old back pressure mechanism didn't work very well.  It prevented memory from growing rapidly, but it couldn't stop it for every work load.  It also triggered jank in this work loads due to ResetTimersForThrottleReduction().
3. The backpressue mechanism is complicated and hard to maintain.  The old one had to be tuned a few times as the browser changed.
Overall I think we are better off without a backpressure mechanism for now.  If I can think up a simple way to do it in the future we can always add it again.
My limited testing showed the CC tended to slow things down and memory growth slowed after 1+GB of Timeout objects.
        Attachment #8870123 -
        Attachment is obsolete: true
        Attachment #8870260 -
        Flags: review?(ehsan)
| Assignee | ||
| Comment 42•8 years ago
           | ||
Olli, I asked you about this test last week, I think.  Basically, it assumes that we can call setTimeout(f, 0) and that various async work will be done.  This is racy and fails with the new setTimeout() here.  Since we dispatch a runnable instead of using nsITimer we can execute slightly faster.
This patch converts to use requestAnimationFrame() to wait for next paint.  A double rAF is used since paint has not occurred after the first one.  (I tried with a single rAF and it did not fix the test.)
        Attachment #8870261 -
        Flags: review?(bugs)
| Assignee | ||
| Comment 43•8 years ago
           | ||
David, I think I asked you about this reftest a little in IRC.  It turned out not to be a problem with the reftest harness.
Instead, this reftest does a setTimeout(f, 0) on load and then tries to call focus() on an image map.  Unfortunately bug 1366642 means this won't work if first paint has not occurred yet.
This patch changes the reftest to use requestAnimationFrame() to wait for first paint before starting.
        Attachment #8870262 -
        Flags: review?(dbaron)
| Assignee | ||
| Comment 44•8 years ago
           | ||
This try is basically the same as the code I just asked for review on.  It just has the round.patch separated from the P5 here.  For review I folded it into P5.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=47262140282a6aa0e5273b69d3c25b93c116a1e8
| Assignee | ||
| Comment 45•8 years ago
           | ||
Talos results:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=94d7c5dfa2fbca52b417f0244dfd124b237b1acc&newProject=try&newRevision=47262140282a6aa0e5273b69d3c25b93c116a1e8&framework=1&showOnlyImportant=0&showOnlyConfident=1
Note, Joel suggested the bloom results are not significant.
| Assignee | ||
| Comment 46•8 years ago
           | ||
It seems this test is failing about 10% of the time with these patches:
https://dxr.mozilla.org/mozilla-central/source/browser/base/content/test/general/browser_bug462289.js
I'll try to figure out what is wrong there before anything lands.
| Assignee | ||
| Comment 47•8 years ago
           | ||
Comment on attachment 8870254 [details] [diff] [review]
P5 Add an executor class to represent the single timer executing timers for a window. r=ehsan
Sorry, I did not actually fold the rounding patch into here yet.
        Attachment #8870254 -
        Flags: review?(ehsan)
| Assignee | ||
| Comment 48•8 years ago
           | ||
Here is a profile with these patches of the timer-flood stress test:
https://perfht.ml/2qR48aC
Compare to comment 3.
There are still some long pauses, but they are all due to cycle collection now.  The only CriticalSectionContended sample is in a malloc call.
What was the failure mode for the image map reftests in P10 without the patch?
Flags: needinfo?(bkelly)
| Assignee | ||
| Comment 50•8 years ago
           | ||
(In reply to David Baron :dbaron: ⌚️UTC-4 from comment #49)
> What was the failure mode for the image map reftests in P10 without the
> patch?
It depended on the timing, but typically one of them would run their init() method before first paint.  This would cause the image page to miss the focus() and not trigger any rendering of the dotted outline.  This would then fail the image comparison.  Sometimes it was the -1.html output that was wrong and sometimes it was the -1-ref.html output that was wrong.
I put more description of the underlying problem in bug 1366642.
Flags: needinfo?(bkelly)
| Assignee | ||
| Comment 51•8 years ago
           | ||
Here is an example of one of the failures:
https://treeherder.mozilla.org/logviewer.html#?job_id=100314787&repo=try&lineNumber=2929
| Comment 52•8 years ago
           | ||
Comment on attachment 8870261 [details] [diff] [review]
P9 Make docshell chrome tests wait for next paint to avoid races. r=smaug
rs+
        Attachment #8870261 -
        Flags: review?(bugs) → review+
Comment on attachment 8870262 [details] [diff] [review]
P10 Make image map reftest wait for first paint before triggering focus/blur. r=dbaron
OK, r=dbaron, I guess.
I'm a little concerned that there might be tests that are no longer testing what they were originally designed to test, that you didn't detect through test failures, though.  Is there a way you could find such tests?
        Attachment #8870262 -
        Flags: review?(dbaron) → review+
| Assignee | ||
| Comment 54•8 years ago
           | ||
(In reply to David Baron :dbaron: ⌚️UTC-4 from comment #53)
> I'm a little concerned that there might be tests that are no longer testing
> what they were originally designed to test, that you didn't detect through
> test failures, though.  Is there a way you could find such tests?
Hmm, I'm not sure how if they don't fail in try.  To be clear, I believe this is a race in the test that has always been there due to the underlying image map issue.  Unfortunately I don't know a good way to find setTimeout() races in non-failing tests.
I will, however, do a new try without this patch to see if its still necessary with the final rounding changes I put in.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9a86663f19f4cd74b32e4a3993c3b278373acb47
| Assignee | ||
| Comment 55•8 years ago
           | ||
On my windows desktop with a opt non-pgo build the patches here make setTimeout(f, 0) about 5us faster.  A t-test shows this is statistically significant.
I need to adjust some of the delayed rounding code, though.  My tests show that the current patches fire about 500us later than before.  The old code did no rounding, so I am probably being too conservative here.
| Assignee | ||
| Comment 56•8 years ago
           | ||
Comment on attachment 8870256 [details] [diff] [review]
P6 Use the TimeoutExecutor in TimeoutManager. r=ehsan
Going to change this patch as well to fix some of the rounding issues.
        Attachment #8870256 -
        Flags: review?(ehsan)
| Assignee | ||
| Comment 57•8 years ago
           | ||
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #54)
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=9a86663f19f4cd74b32e4a3993c3b278373acb47
P9 and P10 are still necessary.
| Assignee | ||
| Comment 58•8 years ago
           | ||
Lets try with a different rounding strategy.  This gives us the most accurate firing time in my local tests.  For a 25ms timer:
current m-c: mean delay of 27.5ms with stdev of 2.4ms
this patch set: mean delay of 25.1ms with stdev of 0.7ms
I wonder if our tests can handle this improved accuracy:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7e7192e9c9fac05998e9ebf559d851ee73898eb9
| Assignee | ||
| Comment 59•8 years ago
           | ||
I need to investigate these two test failures:
devtools/client/debugger/test/mochitest/browser_dbg_worker-window.js
browser/base/content/test/general/browser_bug462289.js
| Assignee | ||
| Comment 60•8 years ago
           | ||
See comment 38 for the main description.
This update patch further adjusts how precise delays are converted to integral milliseconds for nsITimer.  The previous version of the patch was trying to do a more typical round operation while previous m-c truncated in many cases.
After doing some testing this code now favors rounding down pretty aggressively.  If we fire too early we then schedule another runnable or 0ms timer to get us to the right time.  This provides greatly improved accuracy at the cost of only a couple more runnables.  Even on a fast machine it does not fire large number of runnables.  Also, this cost won't be incurred if the browser is busy as we are unlikely to get the timer fired early in that case.
        Attachment #8870254 -
        Attachment is obsolete: true
        Attachment #8870575 -
        Flags: review?(ehsan)
| Assignee | ||
| Comment 61•8 years ago
           | ||
See comment 39 for the main description.
This updated patch just adjusted how RunTimeout() uses the now timestamp a bit.
        Attachment #8870256 -
        Attachment is obsolete: true
        Attachment #8870577 -
        Flags: review?(ehsan)
| Assignee | ||
| Comment 62•8 years ago
           | ||
I'm going to make any additional changes as follow-on patches, so the current set can be reviewed.  I won't change them any more.  Sorry for the churn.
| Assignee | ||
| Comment 63•8 years ago
           | ||
So, every time we fail browser_dbg_worker-window.js the WorkerPrivate is cycle collected after the test tab-leak-check.  Every time we succeed the WorkerPrivate is cycle collected first.
While I didn't trace the object graph completely, I believe this test is creating a cycle between the window and worker.  It depends on CC to fire in order to clean up properly.
In the past we happened to always get the CC to run before the end of the test, but it was not guaranteed.  With the new setTimeout() implementation sometimes the test finishes just before the CC and we fail.
Ryan, this patch expands the debugger test cleanup routine to perform a CC in addition to the existing forceGC().  I also made it wait for an exact GC to complete for good measure.
This seemed more robust for the overall test suite instead of trying to manually cut the cycle in this particular test.  We don't want to have to hunt these things down again in the future.  I don't think manually cutting cycles is the point of these tests.
Here's a try build with 20 retriggers that shows it seems to fix the problems:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f227fff8334685ffa5c73af25687286370dc242a
I believe its because of the race with the CC timer I described above, but I suppose it could also just be changing timing again.
Here's a full try build with the new patch set:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2cc9a071d92b9eaf9dced6f197125bd56e64def8
        Attachment #8870627 -
        Flags: review?(jryans)
| Assignee | ||
| Comment 64•8 years ago
           | ||
I think just browser/base/content/test/general/browser_bug462289.js to sort out now.
Comment on attachment 8870627 [details] [diff] [review]
P11 Make debugger mochitests perform a full CC in addition to the forced GC during test cleanup. r=jryans
Review of attachment 8870627 [details] [diff] [review]:
-----------------------------------------------------------------
We do similar things in other DevTools test directories, and you appear to have a good reason for it, so looks good to me.
        Attachment #8870627 -
        Flags: review?(jryans) → review+
| Assignee | ||
| Comment 66•8 years ago
           | ||
Ryan convinced me in another bug we should not be firing early as often as we are.  So I'm removing the "flex time" to fire into the future:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3435f35fee4504063c1b2dd2f22d37a8cf26f186
Using this tool to measure precision:
https://people-mozilla.org/~bkelly/timeout-precision?mode=delta&delay=10&count=100
| Assignee | ||
| Comment 67•8 years ago
           | ||
This patch removes the "flex time" concept introduced in P5.  Ryan convinced me its actually a compat problem to fire setTimeout() early.  Our old implementation does it, but my testing shows that most other browsers don't.
Let's fix this by removing the flex time and preventing timers from firing early.
I still round down our nsITimer delay by 100 microseconds in order to enable the high precision mode.  My next patch will avoid this in background tabs.
Again, for reference I am using this tool to measure our timeout precision:
https://people-mozilla.org/~bkelly/timeout-precision/?mode=delta&delay=10&count=100
On windows I get about 25% of the timeouts scheduled by this page firing early.  Chrome doesn't fire any early.  Edge fires maybe 2% early.
        Attachment #8870952 -
        Flags: review?(ehsan)
| Assignee | ||
| Comment 68•8 years ago
           | ||
This patch introduces the concept of low and high precision.  In low precision we favor rounding up our nsITimer delay.  In high precision we favor rounding down our nsITimer delay.
The significance is that when we round down the nsITimer delay we may end up dispatching some additional 0-ms nsITimer or immediate runnables while we wait for the last few microseconds to count down to the deadline.  This is not that expensive, but lets avoid doing it on background tabs.  Who cares if they are a few milliseconds late.
        Attachment #8870953 -
        Flags: review?(ehsan)
| Assignee | ||
| Comment 69•8 years ago
           | ||
Note, I profiled the "high precision mode" where we may fire extra runnables until the deadline expires.  It doesn't even show up in the profile at all:
https://perfht.ml/2qW4ooP
https://perfht.ml/2qWvMTP
| Updated•8 years ago
           | 
        Attachment #8870244 -
        Flags: review?(ehsan) → review+
| Comment 70•8 years ago
           | ||
Comment on attachment 8870245 [details] [diff] [review]
P2 Removing the Timeout argument to TimeoutManager::RunTimeout(). r=ehsan
Review of attachment 8870245 [details] [diff] [review]:
-----------------------------------------------------------------
Fantastic!
        Attachment #8870245 -
        Flags: review?(ehsan) → review+
| Assignee | ||
| Comment 71•8 years ago
           | ||
| Assignee | ||
| Comment 72•8 years ago
           | ||
        Attachment #8871012 -
        Attachment is obsolete: true
| Assignee | ||
| Comment 73•8 years ago
           | ||
Full try on updated tip:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6a55b82a5fdc7ca141690d5c08e89ac7a1167f91
Talos comparison build:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e0a2db5642d3573fc1bf528f641eaa1bec6995bd
| Assignee | ||
| Comment 74•8 years ago
           | ||
Comment on attachment 8871014 [details] [diff] [review]
P14 Make pushPrefEnv() use setTimeout(1) since setTimeout(0) fires slightly faster now. r=ehsan
So it seems specialpowers in general and pushPrefEnv in particular use setTimeout(f, 0) to determine "completion".  Now that setTimeout(f, 0) runs slightly faster this causes some tests fail.
For example, this test becomes intermittent:
https://dxr.mozilla.org/mozilla-central/source/layout/base/tests/test_bug394057.html
I can't see anything in this test other than the pushPrefEnv that could be async.
This patch bumps up the setTimeout calls to a delay of 1 which will ensure an nsITimer is used.  It fixes the test linked here and at least one other.
I feel this patch is kind of lame, but I don't have a better solution since it seems these operations in specialpowers don't have good completion events to really base this stuff on.
        Attachment #8871014 -
        Flags: review?(ehsan)
| Comment 75•8 years ago
           | ||
Comment on attachment 8870248 [details] [diff] [review]
P3 Improve Timeout ref counting to avoide bare AddRef/Release calls. r=ehsan
Review of attachment 8870248 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks, I think this is a great improvement over the current state of things!
::: dom/base/Timeout.h
@@ +100,5 @@
>  
>    // The language-specific information about the callback.
>    nsCOMPtr<nsITimeoutHandler> mScriptHandler;
>  
> +  RefPtr<Timeout> mClosureSelfRef;
Nit: This isn't really a closure, unless I'm missing something?  How about simply calling it mSelfRef?
        Attachment #8870248 -
        Flags: review?(ehsan) → review+
| Updated•8 years ago
           | 
        Attachment #8870251 -
        Flags: review?(ehsan) → review+
| Assignee | ||
| Comment 76•8 years ago
           | ||
(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #75)
> > +  RefPtr<Timeout> mClosureSelfRef;
> 
> Nit: This isn't really a closure, unless I'm missing something?  How about
> simply calling it mSelfRef?
This member is removed in P6 anyway, so I probably won't bother renaming this, if thats ok.
| Comment 77•8 years ago
           | ||
Comment on attachment 8870577 [details] [diff] [review]
P6 Use the TimeoutExecutor in TimeoutManager. r=ehsan
Review of attachment 8870577 [details] [diff] [review]:
-----------------------------------------------------------------
This was a joy to review, thanks so much for doing this!
::: dom/base/TimeoutManager.cpp
@@ +1033,5 @@
>      return true;
>    }
>  
>    // Reschedule the OS timer. Don't bother returning any error codes if
>    // this fails since the callers of this method don't care about them.
This comment isn't accurate any more.
::: dom/base/TimeoutManager.h
@@ +219,5 @@
>  
>    // Each nsGlobalWindow object has a TimeoutManager member.  This reference
>    // points to that holder object.
>    nsGlobalWindow&             mWindow;
> +  RefPtr<TimeoutExecutor>     mExecutor;
Hmm, why is this an indirect member and not just an inline |TimeoutExecutor mExecutor;|?  I haven't yet looked at all of the remaining parts but if something else doesn't need the indirection please remove it.
        Attachment #8870577 -
        Flags: review?(ehsan) → review+
| Comment 78•8 years ago
           | ||
Comment on attachment 8870575 [details] [diff] [review]
P5 Add an executor class to represent the single timer executing timers for a window. r=ehsan
Review of attachment 8870575 [details] [diff] [review]:
-----------------------------------------------------------------
::: dom/base/TimeoutExecutor.cpp
@@ +220,5 @@
> +
> +NS_IMETHODIMP
> +TimeoutExecutor::SetName(const char* aName)
> +{
> +  return NS_ERROR_NOT_IMPLEMENTED;
Can you please implement this method similar to existing nsINamed implementations as a debug-only thing?
::: dom/base/TimeoutExecutor.h
@@ +22,5 @@
> +  nsCOMPtr<nsITimer> mTimer;
> +  TimeStamp mDeadline;
> +  const TimeDuration mFlexTime;
> +
> +  enum class Mode
Can you please add some comment about what this means?  I didn't really understand this well before reading the next part of the patch series, it would be helpful to have some description of the states the object can live in here.
@@ +56,5 @@
> +  void
> +  Shutdown();
> +
> +  nsresult
> +  MaybeSchedule(const TimeStamp& aDeadline);
I found the names of these methods a bit hard to follow, especially in the .cpp file.  I'd consider avoiding using "Maybe" in the names of the methods.  But that requires a lot of code changes, so it's probably not worth it!  I don't really feel that strongly.
        Attachment #8870575 -
        Flags: review?(ehsan) → review+
| Comment 79•8 years ago
           | ||
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #76)
> (In reply to :Ehsan Akhgari (super long backlog, slow to respond) from
> comment #75)
> > > +  RefPtr<Timeout> mClosureSelfRef;
> > 
> > Nit: This isn't really a closure, unless I'm missing something?  How about
> > simply calling it mSelfRef?
> 
> This member is removed in P6 anyway, so I probably won't bother renaming
> this, if thats ok.
Of course!  (I'm reviewing these as I go, sorry about missing stuff like this.)
| Assignee | ||
| Comment 80•8 years ago
           | ||
(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #77)
> This was a joy to review, thanks so much for doing this!
Thanks!
> Hmm, why is this an indirect member and not just an inline |TimeoutExecutor
> mExecutor;|?  I haven't yet looked at all of the remaining parts but if
> something else doesn't need the indirection please remove it.
In theory the execution runnable/timer can be scheduled when the nsGlobalWindow/TimeoutManager is destroyed.  It seemed easiest to handle this by making TimeoutExecutor implement the nsIRunnable/nsITimerCallback interfaces and be its own ref-counted object.
I don't think its too bad because we are saving a ton of heap allocations per nsITimer with the patches in this bug.  We don't have to allocate a separate set of nsTimer/nsTimerImpl/runnable objects for each Timeout like we used to.  Instead we just have a single TimeoutExecutor that stays alive and gets re-used.
| Comment 81•8 years ago
           | ||
Comment on attachment 8870257 [details] [diff] [review]
P7 Remove the Timeout nsITimer code. r=ehsan
Review of attachment 8870257 [details] [diff] [review]:
-----------------------------------------------------------------
::: dom/base/Timeout.cpp
@@ -27,5 @@
>      mPopupState(openAllowed)
>  {
>  }
>  
>  Timeout::~Timeout()
And then this will go also.
::: dom/base/Timeout.h
@@ +30,4 @@
>    : public LinkedListElement<RefPtr<Timeout>>
>  {
>  public:
>    Timeout();
Nit: now you could also have |~Timeout() = default;|.
        Attachment #8870257 -
        Flags: review?(ehsan) → review+
| Assignee | ||
| Comment 82•8 years ago
           | ||
(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #78)
> > +NS_IMETHODIMP
> > +TimeoutExecutor::SetName(const char* aName)
> > +{
> > +  return NS_ERROR_NOT_IMPLEMENTED;
> 
> Can you please implement this method similar to existing nsINamed
> implementations as a debug-only thing?
I'm not sure what you mean, many (most?) are implemented with NS_ERROR_NOT_IMPLEMENTED:
https://dxr.mozilla.org/mozilla-central/source/dom/base/nsContentSink.cpp#1641
https://dxr.mozilla.org/mozilla-central/source/dom/base/nsGlobalWindow.cpp#665
https://dxr.mozilla.org/mozilla-central/source/dom/events/EventStateManager.cpp#247
etc...
| Comment 83•8 years ago
           | ||
Comment on attachment 8870260 [details] [diff] [review]
P8 Remove TimeoutManager back pressure mechanism. r=ehsan
Review of attachment 8870260 [details] [diff] [review]:
-----------------------------------------------------------------
This is fine with me, but I would like Olli to sign off on this since originally he reviewed your patches for it.  In general I agree that it's a complicated and hard to maintain feature and I trust your judgement on whether it serves us more than it hurts us.
        Attachment #8870260 -
        Flags: review?(ehsan) → review?(bugs)
| Comment 84•8 years ago
           | ||
Comment on attachment 8870952 [details] [diff] [review]
P12 Prevent setTimeout() from firing early by removing TimeoutExecutor flex time. r=ehsan
Review of attachment 8870952 [details] [diff] [review]:
-----------------------------------------------------------------
If your version control setup makes it easy, please fold this into part 5, so that we don't add this and take it away needlessly!
        Attachment #8870952 -
        Flags: review?(ehsan) → review+
| Comment 85•8 years ago
           | ||
Comment on attachment 8870953 [details] [diff] [review]
P13 Creat a low precision mode and use it for scheduling TimeoutExecutor in background tabs. r=ehsan
Review of attachment 8870953 [details] [diff] [review]:
-----------------------------------------------------------------
::: dom/base/TimeoutExecutor.h
@@ +16,5 @@
>  
> +enum class TimeoutPrecision : uint8_t {
> +  Low,
> +  High
> +};
The patch comment (which is great!) should probably go here.
        Attachment #8870953 -
        Flags: review?(ehsan) → review+
| Comment 86•8 years ago
           | ||
Comment on attachment 8871014 [details] [diff] [review]
P14 Make pushPrefEnv() use setTimeout(1) since setTimeout(0) fires slightly faster now. r=ehsan
Review of attachment 8871014 [details] [diff] [review]:
-----------------------------------------------------------------
::: testing/specialpowers/content/specialpowersAPI.js
@@ +707,5 @@
>      if (typeof window != 'undefined')
>        setTimeout(callback, 0);
>      // for mochitest-plain
>      else
> +      content.window.setTimeout(callback, 1);
Please add a scary comment here warning people that it's important for this to not go below 1!
@@ +721,5 @@
>           if (typeof window != 'undefined')
>             setTimeout(aCallback, 0);
>           // For mochitest-plain
>           else
> +           content.window.setTimeout(aCallback, 1);
Ditto.
        Attachment #8871014 -
        Flags: review?(ehsan) → review+
| Comment 87•8 years ago
           | ||
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #82)
> (In reply to :Ehsan Akhgari (super long backlog, slow to respond) from
> comment #78)
> > > +NS_IMETHODIMP
> > > +TimeoutExecutor::SetName(const char* aName)
> > > +{
> > > +  return NS_ERROR_NOT_IMPLEMENTED;
> > 
> > Can you please implement this method similar to existing nsINamed
> > implementations as a debug-only thing?
> 
> I'm not sure what you mean, many (most?) are implemented with
> NS_ERROR_NOT_IMPLEMENTED:
> 
> https://dxr.mozilla.org/mozilla-central/source/dom/base/nsContentSink.
> cpp#1641
> https://dxr.mozilla.org/mozilla-central/source/dom/base/nsGlobalWindow.
> cpp#665
> https://dxr.mozilla.org/mozilla-central/source/dom/events/EventStateManager.
> cpp#247
> etc...
Oh... you're right, my memory is failing me.  Runnable::SetName() is *the exception* not the rule.  Sorry, my bad.
| Comment 88•8 years ago
           | ||
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #80)
> (In reply to :Ehsan Akhgari (super long backlog, slow to respond) from
> comment #77)
> > This was a joy to review, thanks so much for doing this!
> 
> Thanks!
> 
> > Hmm, why is this an indirect member and not just an inline |TimeoutExecutor
> > mExecutor;|?  I haven't yet looked at all of the remaining parts but if
> > something else doesn't need the indirection please remove it.
> 
> In theory the execution runnable/timer can be scheduled when the
> nsGlobalWindow/TimeoutManager is destroyed.  It seemed easiest to handle
> this by making TimeoutExecutor implement the nsIRunnable/nsITimerCallback
> interfaces and be its own ref-counted object.
Oh, so it's lifetime can actually survive past the window/timeout manager... OK, that's true in theory.  Now that I think about it more, it may be better to actually not codify any lifetime assumptions in the code here and keep this a RefPtr.  But can you please add a comment explaining the lifetime of the member?
> I don't think its too bad because we are saving a ton of heap allocations
> per nsITimer with the patches in this bug.  We don't have to allocate a
> separate set of nsTimer/nsTimerImpl/runnable objects for each Timeout like
> we used to.  Instead we just have a single TimeoutExecutor that stays alive
> and gets re-used.
Yeah, the patches are improving things overall so I'm not very concerned about this.  :-)  And your argument about is actually a strong reason *not* to have this as an inline member.
| Assignee | ||
| Comment 89•8 years ago
           | ||
        Attachment #8870575 -
        Attachment is obsolete: true
        Attachment #8870952 -
        Attachment is obsolete: true
        Attachment #8871460 -
        Flags: review+
| Assignee | ||
| Comment 90•8 years ago
           | ||
        Attachment #8870577 -
        Attachment is obsolete: true
        Attachment #8871461 -
        Flags: review+
| Assignee | ||
| Comment 91•8 years ago
           | ||
        Attachment #8870257 -
        Attachment is obsolete: true
        Attachment #8871463 -
        Flags: review+
| Assignee | ||
| Comment 92•8 years ago
           | ||
        Attachment #8870953 -
        Attachment is obsolete: true
        Attachment #8871464 -
        Flags: review+
| Assignee | ||
| Comment 93•8 years ago
           | ||
I added a comment.  I also decided we should use 1ms throughout for the setTimeout() calls here.  I was confused and thought some of them were using nsITimer jsm thing, but they are all using the nsGlobalWindow version.
        Attachment #8871465 -
        Flags: review+
| Assignee | ||
| Comment 94•8 years ago
           | ||
        Attachment #8871464 -
        Attachment is obsolete: true
| Assignee | ||
| Comment 95•8 years ago
           | ||
| Assignee | ||
| Comment 96•8 years ago
           | ||
Maybe P13 changes to specialpowers setTimeout is causing more trouble than it fixed.  Lets see what it looks like without that:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ef58152d396e2d5c2d44a11f95ccc568cd4f3aa8
| Comment 97•8 years ago
           | ||
Comment on attachment 8870260 [details] [diff] [review]
P8 Remove TimeoutManager back pressure mechanism. r=ehsan
I'm ok removing this. TimeoutManager has gotten so complicated that some code removals are good.
        Attachment #8870260 -
        Flags: review?(bugs) → review+
| Assignee | ||
| Comment 98•8 years ago
           | ||
Comment on attachment 8871465 [details] [diff] [review]
P13 Make pushPrefEnv() use setTimeout(1) since setTimeout(0) fires slightly faster now. r=ehsan
In retrospect I don't think this patch is necessary.
        Attachment #8871465 -
        Attachment is obsolete: true
| Assignee | ||
| Comment 99•8 years ago
           | ||
Comment on attachment 8871468 [details] [diff] [review]
P12 Create a low precision mode and use it for scheduling TimeoutExecutor in background tabs. r=ehsan
I think I have a better solution than the "high precision mode" thing that does a short async runnable spin of the event loop.
        Attachment #8871468 -
        Attachment is obsolete: true
| Assignee | ||
| Comment 100•8 years ago
           | ||
The root cause of the precision problems in this bug are that we require an integer number of milliseconds be passed to nsITimer.  This is an artificial limitation, though.  The underlying implementation already supports a finer grained measurement.
This patch simply pipes TimeDuration into the nsITimer using a new PreciseInitWithCallback() method.  I only bothered with the nsITimerCallback variant since that is what I needed.  Others could be added if desired.
I also preserved TimeDuration throughout the TimerThread run loop, Monitor, and CondVar layers.  This allows the precise delay to be passed down to the OS level call.
You may be asking, why do we need this now and not before?
To answer, consider this situation:
1. setTimeout(f, 1);
2. do some work for 500us
3. setTimeout(g, 1);
This should fire f() and g() 500us apart.
In our existing code this works because each setTimeout() gets its on nsITimer.  The 1ms is preserved and pass through to nsITimer which converts it to a TimeStamp, etc.
With the patches in this bug there is only one nsITimer.  We fire f() and then try to schedule a new nsITimer for g().  Its only 500us in the future, though!  What do we do?  We either fire early and spin for a while or fire late.
The best solution I've found is to preserve the accuracy through the nsITimer implementation.  This avoids extra runnables and wakeups, while also improving our overall accuracy.
On my machine using this approach changed the output on this site:
https://people-mozilla.org/~bkelly/timeout-precision/?mode=delta&delay=10&count=100
From:
min:0.095 mean:0.640 median:0.525 max:6.860 stdev:0.727
To:
min:0.015 mean:0.842 median:0.052 max:5.055 stdev:1.589
As you can see, our median deviation from the target firing time is greatly improved.
        Attachment #8871910 -
        Flags: review?(nfroyd)
| Assignee | ||
| Comment 101•8 years ago
           | ||
Ehsan, this patch replaces the old patch to create a High/Low precision enumeration.  Instead of doing that we instead just use the new API to pass the precise delay to nsITimer.  This avoid the need to round at all.
        Attachment #8871912 -
        Flags: review?(ehsan)
| Assignee | ||
| Comment 102•8 years ago
           | ||
| Assignee | ||
| Updated•8 years ago
           | 
        Attachment #8871912 -
        Flags: review?(ehsan)
|   | ||
| Comment 103•8 years ago
           | ||
Comment on attachment 8871910 [details] [diff] [review]
P12 Expose nsITimer::PreciseInitWithCallback() that takes a TimeDuration delay. r=froydnj
Review of attachment 8871910 [details] [diff] [review]:
-----------------------------------------------------------------
I think this is OK; it would be nice if your explanation was in the commit message and/or found a nice comment in the source code.  I can't think of a good place to drop the comment in the source code, though.
r-'ing just for the wait() thing.  The naming thing is just complaining, but if you have a better idea for the name...InitWithCallbackAndTimeDuration?
::: xpcom/threads/CondVar.h
@@ -64,5 @@
>  #ifdef MOZILLA_INTERNAL_API
>      GeckoProfilerThreadSleepRAII sleep;
>  #endif //MOZILLA_INTERNAL_API
> -    if (aInterval == PR_INTERVAL_NO_TIMEOUT) {
> -      mImpl.wait(*mLock);
Can we please try and retain this behavior of calling the no-timeout wait?  The no-argument/PR_INTERVAL_NO_TIMEOUT case is overwhemingly the common case and involves significantly less work on all of our platforms.
::: xpcom/threads/nsITimer.idl
@@ +170,5 @@
> +   * @param aFunc      nsITimerCallback interface to call when timer expires
> +   * @param aDelay     The high precision interval
> +   * @param aType      Timer type per TYPE* consts defined above
> +   */
> +  [noscript] void preciseInitWithCallback(in nsITimerCallback aCallback,
I wish there was a different adjective to use to describe this.  We have "precise" in the TYPE* constants which has (had) a completely different meaning than the one here.  Unfortunately, I can't think of a different adjective to use here, and I think providing a function overload doesn't work unless we [binaryName] the existing initWithCallback...which just moves the naming issue somewhere else and makes the patch for this bug much bigger.
        Attachment #8871910 -
        Flags: review?(nfroyd) → review-
| Assignee | ||
| Comment 104•8 years ago
           | ||
(In reply to Nathan Froyd [:froydnj] from comment #103)
> ::: xpcom/threads/CondVar.h
> @@ -64,5 @@
> >  #ifdef MOZILLA_INTERNAL_API
> >      GeckoProfilerThreadSleepRAII sleep;
> >  #endif //MOZILLA_INTERNAL_API
> > -    if (aInterval == PR_INTERVAL_NO_TIMEOUT) {
> > -      mImpl.wait(*mLock);
> 
> Can we please try and retain this behavior of calling the no-timeout wait? 
> The no-argument/PR_INTERVAL_NO_TIMEOUT case is overwhemingly the common case
> and involves significantly less work on all of our platforms.
I can put this back, but the underlying primitive explicitly handles TimeDuration::Forever():
https://dxr.mozilla.org/mozilla-central/source/mozglue/misc/ConditionVariable_posix.cpp#126
https://dxr.mozilla.org/mozilla-central/source/mozglue/misc/ConditionVariable_windows.cpp#69
| Assignee | ||
| Comment 105•8 years ago
           | ||
(In reply to Nathan Froyd [:froydnj] from comment #103)
> I think this is OK; it would be nice if your explanation was in the commit
> message and/or found a nice comment in the source code.  I can't think of a
> good place to drop the comment in the source code, though.
I'll put the explanation in a comment in the next patch where TimeoutExecutor uses this API.
| Assignee | ||
| Comment 106•8 years ago
           | ||
        Attachment #8871910 -
        Attachment is obsolete: true
| Assignee | ||
| Comment 107•8 years ago
           | ||
Comment on attachment 8871938 [details] [diff] [review]
P12 Expose nsITimer::InitWithCallbackAndTimeDuration() method. r=froydnj
Review of attachment 8871938 [details] [diff] [review]:
-----------------------------------------------------------------
Updated per our IRC conversation.
        Attachment #8871938 -
        Flags: review?(nfroyd)
| Assignee | ||
| Comment 108•8 years ago
           | ||
Updated with new method name and added explanatory comment.
        Attachment #8871912 -
        Attachment is obsolete: true
| Assignee | ||
| Updated•8 years ago
           | 
        Attachment #8871938 -
        Attachment description: P12 Expose nsITimer::PreciseInitWithCallback() that takes a TimeDuration delay. r=froydnj → P12 Expose nsITimer::InitWithCallbackAndTimeDuration() method. r=froydnj
| Assignee | ||
| Updated•8 years ago
           | 
        Attachment #8871954 -
        Attachment description: Use the nsITimer::InitWithCallbackAndTimeDuration() instead of rounding. r=ehsan → P13 Use the nsITimer::InitWithCallbackAndTimeDuration() instead of rounding. r=ehsan
| Assignee | ||
| Comment 109•8 years ago
           | ||
With instrumentation I see we still get 1 or 2 spurious wakeups when nsITimer fires early.  Unfortunately I don't think we can do any better right now without allowing early setTimeout() callbacks.  I think that is very undesirable from a web compat perspective.
| Assignee | ||
| Updated•8 years ago
           | 
        Attachment #8871954 -
        Flags: review?(ehsan)
| Assignee | ||
| Updated•8 years ago
           | 
        Attachment #8871014 -
        Attachment is obsolete: true
| Assignee | ||
| Comment 110•8 years ago
           | ||
| Comment 111•8 years ago
           | ||
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #109)
> With instrumentation I see we still get 1 or 2 spurious wakeups when
> nsITimer fires early.  Unfortunately I don't think we can do any better
> right now without allowing early setTimeout() callbacks.  I think that is
> very undesirable from a web compat perspective.
Any idea why? Is this just the OS being wonky? To wit, Sleep(1) doesn't sleep for 1ms but rather the shortest supported period, which is usually 0.5ms (at least after calling timeBeginPeriod(1), as I believe we do). Sleep(0) almost does the same thing as SwitchToThread() (that is, yields the rest of its slice to other threads), except that it also allows I/O to execute. Sleep(n) with n >= 2 work as you would expect.
| Assignee | ||
| Comment 112•8 years ago
           | ||
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #111)
> Any idea why? Is this just the OS being wonky? To wit, Sleep(1) doesn't
> sleep for 1ms but rather the shortest supported period, which is usually
> 0.5ms (at least after calling timeBeginPeriod(1), as I believe we do).
> Sleep(0) almost does the same thing as SwitchToThread() (that is, yields the
> rest of its slice to other threads), except that it also allows I/O to
> execute. Sleep(n) with n >= 2 work as you would expect.
Uh, because I missed some bits in TImerThread that still use the old integer based interval logic.  If I fix those I stop getting early wake ups.
Also, FWIW, this code does not use sleep.  It waits on a condition variable.
| Assignee | ||
| Updated•8 years ago
           | 
        Attachment #8871938 -
        Flags: review?(nfroyd)
| Assignee | ||
| Comment 113•8 years ago
           | ||
        Attachment #8871938 -
        Attachment is obsolete: true
| Assignee | ||
| Comment 114•8 years ago
           | ||
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #112)
> (In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #111)
> > Any idea why? Is this just the OS being wonky? To wit, Sleep(1) doesn't
> > sleep for 1ms but rather the shortest supported period, which is usually
> > 0.5ms (at least after calling timeBeginPeriod(1), as I believe we do).
> > Sleep(0) almost does the same thing as SwitchToThread() (that is, yields the
> > rest of its slice to other threads), except that it also allows I/O to
> > execute. Sleep(n) with n >= 2 work as you would expect.
> 
> Uh, because I missed some bits in TImerThread that still use the old integer
> based interval logic.  If I fix those I stop getting early wake ups.
> 
> Also, FWIW, this code does not use sleep.  It waits on a condition variable.
I'm going to have to investigate this a bit more.  I'm not sure how safe it is to try to change this logic given that underlying windows code is based on integer milliseconds:
https://dxr.mozilla.org/mozilla-central/source/mozglue/misc/ConditionVariable_windows.cpp#72
I don't want to accidentally make the TimerThread spin a bunch.
I think review of this will have to wait for next Wednesday after the holiday.
| Assignee | ||
| Comment 115•8 years ago
           | ||
Dao, I found that browser_bug462289.js fails frequently with these patches.  I saw that you changed this file to use executeSoon() instead of setTimeout(f, 0) in bug  1102017.  Was there a reason for that?
I spoke with Blake and he suggested the significant change bug 1102017 was waiting for TabSwitchDone and that the executeSoon() changes may not be strictly necessary.
If I switch the executeSoon() calls back to setTimeout(f, 0) the test is no longer intermittent.
Given that the timing of setTimeout is slightly different its possible this test is racing with a setTimeout somewhere else in browser chrome code.  Using setTimeout here would provide some ordering guarantees that executeSoon does not.  That is my best theory about why it works here.
        Attachment #8871998 -
        Flags: review?(dao+bmo)
| Assignee | ||
| Comment 116•8 years ago
           | ||
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #112)
> (In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #111)
> > Any idea why? Is this just the OS being wonky? To wit, Sleep(1) doesn't
> > sleep for 1ms but rather the shortest supported period, which is usually
> > 0.5ms (at least after calling timeBeginPeriod(1), as I believe we do).
> > Sleep(0) almost does the same thing as SwitchToThread() (that is, yields the
> > rest of its slice to other threads), except that it also allows I/O to
> > execute. Sleep(n) with n >= 2 work as you would expect.
> 
> Uh, because I missed some bits in TImerThread that still use the old integer
> based interval logic.  If I fix those I stop getting early wake ups.
This was not why we were getting spurious wakeups.  The issue is that TimerThread will execute a timer up to have the resolution of a PRInterval early.  On windows PRInterval is 1ms, so this means TimerThread will fire a timer up to 500us early.
I'm not sure I feel comfortable changing that here.
| Assignee | ||
| Comment 117•8 years ago
           | ||
Or maybe it will fire 250us early.  In any case, its a fairly large number of microseconds that TimerThread explicitly decides to fire early on windows.
| Assignee | ||
| Comment 118•8 years ago
           | ||
        Attachment #8871996 -
        Attachment is obsolete: true
| Assignee | ||
| Comment 119•8 years ago
           | ||
        Attachment #8872001 -
        Attachment is obsolete: true
| Assignee | ||
| Comment 120•8 years ago
           | ||
Comment on attachment 8872003 [details] [diff] [review]
P12 Expose nsITimer::InitWithCallbackAndTimeDuration() method. r=froydnj
Ok, this implements the changes we discussed in IRC:
1. Separate CondVar::Wait() in non-debug.
2. Change the name to InitWithCallbackAndTimeDuration()
I also changed TimerThread::Run() to use TimeDuration throughout.
Finally, I was using TimeDuration::Resolution() in TimerThread::Run().  I don't think this is equivalent to the halfMicroSecondsIntervalResolution, though.  I switched it back to use the TimerThread's specific half resolution calculation instead.
As I mentioned above, we do still get early wakeups from nsITimer with these changes on windows.  They are, however, greatly reduced.
        Attachment #8872003 -
        Flags: review?(nfroyd)
| Comment 121•8 years ago
           | ||
Comment on attachment 8871954 [details] [diff] [review]
P13 Use the nsITimer::InitWithCallbackAndTimeDuration() instead of rounding. r=ehsan
Review of attachment 8871954 [details] [diff] [review]:
-----------------------------------------------------------------
::: dom/base/TimeoutExecutor.cpp
@@ +70,5 @@
> +  // 3. setTimeout(g, 1);
> +  //
> +  // This should fire f() and g() 500us apart.
> +  //
> +  // In the past worked because each setTimeout() got its own nsITimer.  The 1ms
Nit: the first sentence seems like it's missing a word somewhere in the middle.
        Attachment #8871954 -
        Flags: review?(ehsan) → review+
| Assignee | ||
| Comment 122•8 years ago
           | ||
Comment on attachment 8872003 [details] [diff] [review]
P12 Expose nsITimer::InitWithCallbackAndTimeDuration() method. r=froydnj
On further reflection I'm not sure this patch is a good idea.
Removing the PR_MicrosecondsToInterval() call here effectively removes some rounding:
https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/TimerThread.cpp#514
This patch results in use doing a floor to the nearest interval instead.  So its likely spinning the TimerThread.
I'll post a new patch that just uses TimeDuration to set the nsTimerImpl::mTimeout.  The logic for scheduling the wakeup will not change, though.
The end result is that the patch set in this bug will probably make us fire late a bit more frequently than we do today.
        Attachment #8872003 -
        Flags: review?(nfroyd)
| Assignee | ||
| Comment 123•8 years ago
           | ||
| Updated•8 years ago
           | 
        Attachment #8871998 -
        Flags: review?(dao+bmo) → review+
| Assignee | ||
| Comment 124•8 years ago
           | ||
In the interest of trying to maintain our old behavior, lets see if we can allow early firing again.  This will only be limited to the amount allowed by TimerThread.  If our TimerThread ever becomes more accurate then this would automatically adjust.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=69f9ff7e02187f35180197dca6fa9fd798905161
| Assignee | ||
| Comment 125•8 years ago
           | ||
| Assignee | ||
| Comment 126•8 years ago
           | ||
Ok, this latest try is looking good.  I don't see spurious wakeups locally.  The precision is about the same as we have had historically.
I'm going to clean these remaining patches up so maybe I can get this reviewed and landed on Wednesday.
| Assignee | ||
| Comment 127•8 years ago
           | ||
        Attachment #8872003 -
        Attachment is obsolete: true
| Assignee | ||
| Comment 128•8 years ago
           | ||
| Assignee | ||
| Comment 129•8 years ago
           | ||
| Assignee | ||
| Comment 130•8 years ago
           | ||
Try build based on m-c tip:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8c0d86c9bfed1463aea422365537fc91357a3395
Talos comparison push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3514b634dd5885723624fe119d0bea2115ea8ce2
| Assignee | ||
| Comment 131•8 years ago
           | ||
There is a new persistent failure I need to investigate:
devtools/client/netmonitor/test/browser_net_simple-request-data.js
Looks like this test was just changed recently for bug 1242505.
| Assignee | ||
| Comment 132•8 years ago
           | ||
Paolo, do you have any advice on how to approach this error?  It started showing up with my patchset here after you landed bug 1242505.
https://treeherder.mozilla.org/logviewer.html#?job_id=102884059&repo=try&lineNumber=5879
I don't actually do anything specific with this test, but my code does slightly change setTimeout() timings.  I don't see any specific setTimeout() calls in this test though.
Flags: needinfo?(paolo.mozmail)
| Comment 133•8 years ago
           | ||
It seems the failure occurs in devtools/client/netmonitor/test/head.js line 408. If this is intermittent, it may be one of the test shutdown issues with devtools, and in bug 1242505 we just just whitelisted them in these three places:
https://dxr.mozilla.org/mozilla-central/search?q=%22tabTarget+is+null%22&redirect=true
If this is deterministic after your patch lands, then you should either debug and address the issue if it's easy to fix, or use expectUncaughtRejection in the individual test file.
Flags: needinfo?(paolo.mozmail)
| Assignee | ||
| Comment 134•8 years ago
           | ||
(In reply to :Paolo Amadini from comment #133)
> If this is deterministic after your patch lands, then you should either
> debug and address the issue if it's easy to fix, or use
> expectUncaughtRejection in the individual test file.
I'll do some more pushes, but it does not look deterministic to me.  For example, in my try build it triggered in win7 VM opt, but not in win7 opt.
| Assignee | ||
| Comment 135•8 years ago
           | ||
| Assignee | ||
| Updated•8 years ago
           | 
        Attachment #8872661 -
        Attachment description: Bug 1363829 P17 Delay browser_net_simple-request-data.js teardown to allow all events to fire. r=Honza → P17 Delay browser_net_simple-request-data.js teardown to allow all events to fire. r=Honza
| Assignee | ||
| Comment 136•8 years ago
           | ||
Comment on attachment 8872422 [details] [diff] [review]
P12 Expose nsITimer::InitWithCallbackAndTimeDuration() method. r=froydnj
Review of attachment 8872422 [details] [diff] [review]:
-----------------------------------------------------------------
Nathan, this patch is similar to the last one I submitted except:
1. It only includes the bits letting us pass a TimeDuration into nsITimer.
2. It does not change the Monitor or CondVar APIs since underneath they do not support better than millisecond resolution on all platforms.
3. It does not modify the main TimerThread::Run() loop logic at all.
Basically it preserves the rounding TimerThread::Run() that we have always done.  The benefit of the patch is to avoid rounding at the nsITimer API layer as well.
        Attachment #8872422 -
        Flags: review?(nfroyd)
| Assignee | ||
| Comment 137•8 years ago
           | ||
Comment on attachment 8872423 [details] [diff] [review]
P15 Expose nsITimer::GetAllowedEarlyFiringMicroseconds(). r=froydnj
Review of attachment 8872423 [details] [diff] [review]:
-----------------------------------------------------------------
Nathan, this patch exposes an AllowedEarlyFiringTime() method on nsITimer.  The TimerThread internally decides to fire timers up to 250us early.  I'd like to make the setTimeout() logic aware of this value so it can avoid scheduling additional timers if its within this range of the deadline.
I did this as a method on nsITimer so that it can be adjusted if we ever support different timer implementations based on platform, etc.
        Attachment #8872423 -
        Flags: review?(nfroyd)
| Assignee | ||
| Comment 138•8 years ago
           | ||
Comment on attachment 8872424 [details] [diff] [review]
P16 Allow setTimeout() to fire early based on nsIThread::GetAllowedEarlyFiringMicroseconds(). r=ehsan
Review of attachment 8872424 [details] [diff] [review]:
-----------------------------------------------------------------
Ehsan, this patch restores the ability of setTimeout() to fire slightly early.  Previous patches called this "flex time".  In this patch, however, the value is based on the actual calculated value from our nsITimer implementation.  If the underlying implementation changes then setTimeout() will automatically adjust as well.
AFAICT, we currently use 250us for this value on our main platforms today.
        Attachment #8872424 -
        Flags: review?(ehsan)
| Assignee | ||
| Comment 139•8 years ago
           | ||
Jan, I was getting intermittent failures in browser_net_simple-request-data.js with my patches here.  They change the timing of things slightly which tends to reveal race conditions.
In this case browser_net_simple-request-data.js has a race condition in that waitForNetworkEvents() resolves its promise when it first receives the last expected event.  The STARTED_RECEIVING_RESPONSE event, however, has asynchronous steps that need to be completed before teardown.
This patch makes the test explicitly wait for all of the queued async test steps.
I also changed the cookies test assertions since they just don't seem accurate any more.  The .requestCookies and .responseCookies attributes don't have a .cookies attribute.  They are just arrays themselves.  I don't think they were getting picked up by the test previously because .once() doesn't catch exceptions.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=31ca9c4aaadb88c217b89818070be7040df50c80
        Attachment #8872725 -
        Flags: review?(odvarko)
| Assignee | ||
| Updated•8 years ago
           | 
        Attachment #8872661 -
        Attachment is obsolete: true
| Assignee | ||
| Comment 140•8 years ago
           | ||
| Assignee | ||
| Comment 141•8 years ago
           | ||
Ugh, the browser_net_simple-request-data.js test is picking up an extra cookie from browser_net_filter* tests.
| Assignee | ||
| Comment 142•8 years ago
           | ||
        Attachment #8872725 -
        Attachment is obsolete: true
        Attachment #8872725 -
        Flags: review?(odvarko)
| Assignee | ||
| Comment 143•8 years ago
           | ||
Comment on attachment 8872755 [details] [diff] [review]
P17 Make browser_net_simple-request-data.js wait for all async tests to complete before finishing. r=honza
Jan, please seem comment 139 for the initial description.
This patch further adds a Service.cookies.removeAll() call to the test directories cleanup handler.  The test was failing in automation because a cookie was leaking from another test to this one.
Looks green on try now.
        Attachment #8872755 -
        Flags: review?(odvarko)
| Assignee | ||
| Comment 144•8 years ago
           | ||
Latest try is looking pretty good:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f8eb8d48591c95155f932879b9fea8d1fd77a2dc
Assuming the last few reviews go ok I'm hoping to land this tomorrow.
| Assignee | ||
| Comment 145•8 years ago
           | ||
Comment on attachment 8872755 [details] [diff] [review]
P17 Make browser_net_simple-request-data.js wait for all async tests to complete before finishing. r=honza
Unfortunately this test still gives periodic errors like this.
https://treeherder.mozilla.org/logviewer.html#?job_id=103177945&repo=try&lineNumber=4469
        Attachment #8872755 -
        Flags: review?(odvarko)
| Assignee | ||
| Comment 146•8 years ago
           | ||
| Assignee | ||
| Updated•8 years ago
           | 
        Attachment #8872755 -
        Attachment is obsolete: true
| Assignee | ||
| Comment 147•8 years ago
           | ||
Comment on attachment 8872844 [details] [diff] [review]
P17 Make browser_net_simple-request-data.js wait for all async tests to complete before finishing. r=honza
Jan, sorry for the flag spam here.  See comment 139 and comment 143 for the previous descriptions.
I had to further update the test here to check for null requestItem in the various waitUntil() polling functions.  It seems its possible for the poll to sometimes run fast enough to get an undefined item in the list.
Again, I don't think the test reported these kinds of errors before because once() does not catch exceptions.  This patch now converts them to promise rejections.
        Attachment #8872844 -
        Flags: review?(odvarko)
| Comment 148•8 years ago
           | ||
Comment on attachment 8872844 [details] [diff] [review]
P17 Make browser_net_simple-request-data.js wait for all async tests to complete before finishing. r=honza
Review of attachment 8872844 [details] [diff] [review]:
-----------------------------------------------------------------
Looks good to me.
R+ assuming Try is green
Thanks for the patch!
Honza
        Attachment #8872844 -
        Flags: review?(odvarko) → review+
|   | ||
| Comment 149•8 years ago
           | ||
Comment on attachment 8872422 [details] [diff] [review]
P12 Expose nsITimer::InitWithCallbackAndTimeDuration() method. r=froydnj
Review of attachment 8872422 [details] [diff] [review]:
-----------------------------------------------------------------
r=me with the name change.
::: xpcom/threads/nsITimer.idl
@@ +172,5 @@
> +   * @param aType      Timer type per TYPE* consts defined above
> +   */
> +  [noscript] void InitWithCallbackAndTimeDuration(in nsITimerCallback aCallback,
> +                                                  [const] in TimeDuration aDelay,
> +                                                  in unsigned long aType);
Having had a chance to think about it a little more, WDYT about calling this InitHighResolutionWithCallback?  This name seems preferable to me, and is resilient in the face of TimeDuration renaming changes or similar.
        Attachment #8872422 -
        Flags: review?(nfroyd) → review+
|   | ||
| Comment 150•8 years ago
           | ||
Comment on attachment 8872424 [details] [diff] [review]
P16 Allow setTimeout() to fire early based on nsIThread::GetAllowedEarlyFiringMicroseconds(). r=ehsan
Review of attachment 8872424 [details] [diff] [review]:
-----------------------------------------------------------------
::: dom/base/TimeoutExecutor.h
@@ +22,5 @@
>    nsCOMPtr<nsITimer> mTimer;
>    TimeStamp mDeadline;
>  
> +  // Limits how far we allow timers to fire into the future from their
> +  // deadline.  Starts of at zero, but is then adjusted when we start
Nit: "Starts *off* at zero..."
|   | ||
| Comment 151•8 years ago
           | ||
Comment on attachment 8872423 [details] [diff] [review]
P15 Expose nsITimer::GetAllowedEarlyFiringMicroseconds(). r=froydnj
Review of attachment 8872423 [details] [diff] [review]:
-----------------------------------------------------------------
These comments-as-a-previous-review are great, but they'd be better as proper commit messages...
::: xpcom/threads/nsITimer.idl
@@ +252,5 @@
> +  /**
> +   * The number of microseconds this nsITimer implementation can possibly
> +   * fire early.
> +   */
> +  [noscript] readonly attribute unsigned long allowedEarlyFiringMicroseconds;
Make this [noscript,infallible] so the C++ side can just call GetAllowedEarlyFiringMicroseconds()...we don't ever expect that this could fail, right?
        Attachment #8872423 -
        Flags: review?(nfroyd) → review+
| Assignee | ||
| Comment 152•8 years ago
           | ||
(In reply to Nathan Froyd [:froydnj] from comment #151)
> Make this [noscript,infallible] so the C++ side can just call
> GetAllowedEarlyFiringMicroseconds()...we don't ever expect that this could
> fail, right?
I tried doing [infallible], but that requires the interface to be [builtinclass].  That seemed to be problematic with the way nsTimer/nsTimerImpl are currently written.  I was trying not to change too much here, so I just skipped the infallible for now.
| Assignee | ||
| Comment 153•8 years ago
           | ||
        Attachment #8870244 -
        Attachment is obsolete: true
        Attachment #8873196 -
        Flags: review+
| Assignee | ||
| Comment 154•8 years ago
           | ||
        Attachment #8870245 -
        Attachment is obsolete: true
        Attachment #8873197 -
        Flags: review+
| Assignee | ||
| Comment 155•8 years ago
           | ||
        Attachment #8870248 -
        Attachment is obsolete: true
        Attachment #8873198 -
        Flags: review+
| Assignee | ||
| Comment 156•8 years ago
           | ||
        Attachment #8870251 -
        Attachment is obsolete: true
        Attachment #8873200 -
        Flags: review+
| Assignee | ||
| Comment 157•8 years ago
           | ||
        Attachment #8871460 -
        Attachment is obsolete: true
        Attachment #8873202 -
        Flags: review+
| Assignee | ||
| Comment 158•8 years ago
           | ||
        Attachment #8871461 -
        Attachment is obsolete: true
        Attachment #8873203 -
        Flags: review+
| Assignee | ||
| Comment 159•8 years ago
           | ||
        Attachment #8871463 -
        Attachment is obsolete: true
        Attachment #8873204 -
        Flags: review+
| Assignee | ||
| Comment 160•8 years ago
           | ||
        Attachment #8870260 -
        Attachment is obsolete: true
        Attachment #8873206 -
        Flags: review+
| Assignee | ||
| Comment 161•8 years ago
           | ||
        Attachment #8870261 -
        Attachment is obsolete: true
        Attachment #8873207 -
        Flags: review+
| Assignee | ||
| Comment 162•8 years ago
           | ||
        Attachment #8870262 -
        Attachment is obsolete: true
        Attachment #8873213 -
        Flags: review+
| Updated•8 years ago
           | 
        Attachment #8872424 -
        Flags: review?(ehsan) → review+
| Assignee | ||
| Comment 163•8 years ago
           | ||
        Attachment #8870627 -
        Attachment is obsolete: true
        Attachment #8873240 -
        Flags: review+
| Assignee | ||
| Comment 164•8 years ago
           | ||
This includes the suggested rename.
        Attachment #8872422 -
        Attachment is obsolete: true
        Attachment #8873241 -
        Flags: review+
| Assignee | ||
| Comment 165•8 years ago
           | ||
Rename the method in the commit message as well.
        Attachment #8873241 -
        Attachment is obsolete: true
        Attachment #8873242 -
        Flags: review+
| Assignee | ||
| Comment 166•8 years ago
           | ||
        Attachment #8871954 -
        Attachment is obsolete: true
        Attachment #8873243 -
        Flags: review+
| Assignee | ||
| Comment 167•8 years ago
           | ||
        Attachment #8871998 -
        Attachment is obsolete: true
        Attachment #8873245 -
        Flags: review+
| Assignee | ||
| Comment 168•8 years ago
           | ||
        Attachment #8872423 -
        Attachment is obsolete: true
        Attachment #8873246 -
        Flags: review+
| Assignee | ||
| Comment 169•8 years ago
           | ||
        Attachment #8872424 -
        Attachment is obsolete: true
        Attachment #8873247 -
        Flags: review+
| Assignee | ||
| Comment 170•8 years ago
           | ||
        Attachment #8872844 -
        Attachment is obsolete: true
        Attachment #8873249 -
        Flags: review+
| Comment 171•8 years ago
           | ||
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2e778ec02f89
P1 Remove some deadcode in TimeoutManager related to processing pending timers. r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/7eec06a678cb
P2 Removing the Timeout argument to TimeoutManager::RunTimeout(). r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/60ce370ec87b
P3 Improve Timeout ref counting to avoide bare AddRef/Release calls. r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/aea41a0174ae
P4 Store the scheduled delay on Timeout instead of relying on the nsITimer to hold it. r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/9ac282f67370
P5 Add an executor class to represent the single timer executing timers for a window. r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/5214ce6bed25
P6 Use the TimeoutExecutor in TimeoutManager. r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/32c98a6c8750
P7 Remove the Timeout nsITimer code. r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/bd452eda2e83
P8 Remove TimeoutManager back pressure mechanism. r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/a70379a2937c
P9 Make docshell chrome tests wait for next paint to avoid races. r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/cd8ae8487d00
P10 Make image map reftest wait for first paint before triggering focus/blur. r=dbaron
https://hg.mozilla.org/integration/mozilla-inbound/rev/833f4fe14182
P11 Make debugger mochitests perform a full CC in addition to the forced GC during test cleanup. r=jryans
https://hg.mozilla.org/integration/mozilla-inbound/rev/79c1d7e0d539
P12 Expose nsITimer::InitHighResolutionWithCallback() method. r=froydnj
https://hg.mozilla.org/integration/mozilla-inbound/rev/1c99a1222200
P13 Use the nsITimer::InitHighResolutionWithCallback() instead of rounding. r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/9b5fb2901eac
P14 Make browser_bug462289.js test use setTimeout(f, 0) like it did before bug 1102017. r=dao
https://hg.mozilla.org/integration/mozilla-inbound/rev/d27d106fc45d
P15 Expose nsITimer::GetAllowedEarlyFiringMicroseconds(). r=froydnj
https://hg.mozilla.org/integration/mozilla-inbound/rev/95421442eec4
P16 Allow setTimeout() to fire early based on nsIThread::GetAllowedEarlyFiringMicroseconds(). r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/7c6116ac71e9
P17 Make browser_net_simple-request-data.js wait for all async tests to complete before finishing. r=honza
| Assignee | ||
| Comment 172•8 years ago
           | ||
Note, I did not measure any improvement on speedometer with this patch.  Dropping the dependency.
No longer blocks: Speedometer_V2
| Assignee | ||
| Comment 173•8 years ago
           | ||
For future reference, here is a profile of:
https://mozdevs.github.io/servo-experiments/experiments/tiles/
On my mac book pro using 55.0a1 (2017-05-28) (64-bit):
https://perfht.ml/2soSOQz
Once a nightly spins with the patches from this bug I will compare.
|   | ||
| Comment 174•8 years ago
           | ||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/2e778ec02f89
https://hg.mozilla.org/mozilla-central/rev/7eec06a678cb
https://hg.mozilla.org/mozilla-central/rev/60ce370ec87b
https://hg.mozilla.org/mozilla-central/rev/aea41a0174ae
https://hg.mozilla.org/mozilla-central/rev/9ac282f67370
https://hg.mozilla.org/mozilla-central/rev/5214ce6bed25
https://hg.mozilla.org/mozilla-central/rev/32c98a6c8750
https://hg.mozilla.org/mozilla-central/rev/bd452eda2e83
https://hg.mozilla.org/mozilla-central/rev/a70379a2937c
https://hg.mozilla.org/mozilla-central/rev/cd8ae8487d00
https://hg.mozilla.org/mozilla-central/rev/833f4fe14182
https://hg.mozilla.org/mozilla-central/rev/79c1d7e0d539
https://hg.mozilla.org/mozilla-central/rev/1c99a1222200
https://hg.mozilla.org/mozilla-central/rev/9b5fb2901eac
https://hg.mozilla.org/mozilla-central/rev/d27d106fc45d
https://hg.mozilla.org/mozilla-central/rev/95421442eec4
https://hg.mozilla.org/mozilla-central/rev/7c6116ac71e9
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
          status-firefox55:
          --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
| Assignee | ||
| Comment 175•8 years ago
           | ||
I took some profiles before and after this bug landed on:
Tiles:
  * https://mozdevs.github.io/servo-experiments/experiments/tiles/
    * before:
      * https://perfht.ml/2sx78Xn
      * https://perfht.ml/2sxlsPv
    * after:
      * https://perfht.ml/2swX0xI
      * https://perfht.ml/2rzrUbu
Here you can see in the "before" case that there are large 200ms to 500ms pauses.  During this period there is a lot of setTimeout() and nsTimer runnables executing.  About ~65ms to ~165ms are spent in setTimeout.
In the after case pauses are limited to ~80ms.  Of these pauses setTimeout takes ~30ms.
I don't completely understand this result, but there are a couple things that could be helping here:
1) Our setTimeout() doesn't always lock with TimerThread any more.  So we can execute more continuously on the main thread without forcing context switching.
2) We are dispatching many fewer timer events, so the possibility to intermix nsTimer runnables during these periods is reduced.
3) Processing timeouts faster allows us to drain the timeout list more quickly reducing the amount of timeouts we have to search in Insert().
Flood:
  * https://people-mozilla.org/~bkelly/timer-flood/index.html
    * before:
      * https://perfht.ml/2sxkPFT
      * https://perfht.ml/2sxdDtm
    * after:
      * https://perfht.ml/2rzqm1n
      * https://perfht.ml/2rz3cIk
The main take away from these profiles is that we end up contending with the TimerThread a lot less. If you search for "Acquire" you can see how often mutexes show up in the profile (this is windows).  There is a lot more contention in the "before" case.
Also, there is a huge amount of contention during back pressure reduction.  This is one reason I chose to remove the back pressure mechanism for now.
Depends on: 1369440
| Comment 176•8 years ago
           | ||
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #175)
> I took some profiles before and after this bug landed on:
> 
> Tiles:
>   * https://mozdevs.github.io/servo-experiments/experiments/tiles/
>     * before:
>       * https://perfht.ml/2sx78Xn
This profile isn't a good example.  All of the cost here is bug 1369494 (around 5% is nsITimer related).
>       * https://perfht.ml/2sxlsPv
>     * after:
>       * https://perfht.ml/2swX0xI
>       * https://perfht.ml/2rzrUbu
> 
> Here you can see in the "before" case that there are large 200ms to 500ms
> pauses.  During this period there is a lot of setTimeout() and nsTimer
> runnables executing.  About ~65ms to ~165ms are spent in setTimeout.
> 
> In the after case pauses are limited to ~80ms.  Of these pauses setTimeout
> takes ~30ms.
> 
> I don't completely understand this result, but there are a couple things
> that could be helping here:
> 
> 1) Our setTimeout() doesn't always lock with TimerThread any more.  So we
> can execute more continuously on the main thread without forcing context
> switching.
> 2) We are dispatching many fewer timer events, so the possibility to
> intermix nsTimer runnables during these periods is reduced.
> 3) Processing timeouts faster allows us to drain the timeout list more
> quickly reducing the amount of timeouts we have to search in Insert().
Well the fact remains that setTimeout() is still freaking expensive!  In the second before profile, around 11-12% of the cost of setTimeout() is about creating the nsITimer, the rest is Timeouts::Insert()!  That function is *really* inefficient.
> Flood:
>   * https://people-mozilla.org/~bkelly/timer-flood/index.html
>     * before:
>       * https://perfht.ml/2sxkPFT
>       * https://perfht.ml/2sxdDtm
>     * after:
>       * https://perfht.ml/2rzqm1n
>       * https://perfht.ml/2rz3cIk
> 
> The main take away from these profiles is that we end up contending with the
> TimerThread a lot less. If you search for "Acquire" you can see how often
> mutexes show up in the profile (this is windows).  There is a lot more
> contention in the "before" case.
Indeed, that's exactly working as planned!  \o/
The other interesting difference between these two cases is that Timeouts::Insert() doesn't show up that much in this profile.  And the reason is that this test case won't cause you to end up having a long queue of timeouts.  This difference bothered me a bit, so I decided to do the math.  I broke in the JS debugger in the servo demo to see how many tiles I have.  I had a 91*50 grid.  That means we'll iterate over 4549 timeouts *in the last iteration of the loop*, and I bet the real number is much lower than that.  And this loop only runs one time in the beginning of each round of a new effect rolling over a new picture (so not frequently at all).  This really shouldn't be that slow, and we should barely see it in the profile...
> Also, there is a huge amount of contention during back pressure reduction. 
> This is one reason I chose to remove the back pressure mechanism for now.
Seems like it indeed was a good idea!
| Assignee | ||
| Comment 177•8 years ago
           | ||
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #176)
> >       * https://perfht.ml/2sxlsPv
> >     * after:
> >       * https://perfht.ml/2swX0xI
> >       * https://perfht.ml/2rzrUbu
> > 
> > Here you can see in the "before" case that there are large 200ms to 500ms
> > pauses.  During this period there is a lot of setTimeout() and nsTimer
> > runnables executing.  About ~65ms to ~165ms are spent in setTimeout.
> > 
> > In the after case pauses are limited to ~80ms.  Of these pauses setTimeout
> > takes ~30ms.
> > 
> > I don't completely understand this result, but there are a couple things
> > that could be helping here:
> > 
> > 1) Our setTimeout() doesn't always lock with TimerThread any more.  So we
> > can execute more continuously on the main thread without forcing context
> > switching.
> > 2) We are dispatching many fewer timer events, so the possibility to
> > intermix nsTimer runnables during these periods is reduced.
> > 3) Processing timeouts faster allows us to drain the timeout list more
> > quickly reducing the amount of timeouts we have to search in Insert().
> 
> Well the fact remains that setTimeout() is still freaking expensive!  In the
> second before profile, around 11-12% of the cost of setTimeout() is about
> creating the nsITimer, the rest is Timeouts::Insert()!  That function is
> *really* inefficient.
I think the other factor in the painting pauses on the tiles before was that it might have been triggering back pressure.  We started at 5000 timers and I think tiles does more than that.  So that could partially be why there are those pauses.
> The other interesting difference between these two cases is that
> Timeouts::Insert() doesn't show up that much in this profile.  And the
> reason is that this test case won't cause you to end up having a long queue
> of timeouts.  This difference bothered me a bit, so I decided to do the
> math.  I broke in the JS debugger in the servo demo to see how many tiles I
> have.  I had a 91*50 grid.  That means we'll iterate over 4549 timeouts *in
> the last iteration of the loop*, and I bet the real number is much lower
> than that.  And this loop only runs one time in the beginning of each round
> of a new effect rolling over a new picture (so not frequently at all).  This
> really shouldn't be that slow, and we should barely see it in the profile...
It seems like windows high resolution TimeStamp might be expensive to compare.  The CheckQPC thing seems pretty complex and we do it a lot.
Making Insert scale better in general, though, would be good.  We can discuss in bug 1369494.
Depends on: 1369350
| Updated•6 years ago
           | 
Component: DOM → DOM: Core & HTML
| Updated•3 years ago
           | 
Performance Impact: --- → P3
Whiteboard: [qf:p3]
          You need to log in
          before you can comment on or make changes to this bug.
        
Description
•