Closed Bug 590422 Opened 14 years ago Closed 11 years ago

Consider getting rid of the delay line filter stuff in timer thread

Categories

(Core :: XPCOM, defect)

Platform: x86 macOS
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla22

People

(Reporter: bzbarsky, Assigned: avih)

References

(Depends on 2 open bugs)

Details

(Whiteboard: not-ready-for-cedar [snappy])

Attachments

(2 files, 1 obsolete file)

The current code causes issues like bug 586900 comment 70 describes.  If it's removed, the test in layout/base/tests/test_bug569520.html that the second listener fires after the first one should be reinstated (and maybe even strengthened).
Brendan says we may want this in 2.0... I'm not convinced, but I'll try to get a patch together that we can at least test and whatnot.
Priority: P2 → P1
Brendan, what do you think?
Attachment #469357 - Flags: review?(brendan)
Whiteboard: [need review[
Whiteboard: [need review[ → [need review]
Comment on attachment 469357 [details] [diff] [review]
This seems to work ok for my 10ms setInterval, at least

>@@ -420,18 +364,17 @@ PRInt32 TimerThread::AddTimerInternal(ns
>     // Don't break till we have skipped any overdue timers.
> 
>     // XXXbz why?  Given our definition of overdue in terms of
>     // mTimeoutAdjustment, aTimer might be overdue already!  Why not
>     // just fire timers in order?
> 
>     // XXX does this hold for TYPE_REPEATING_PRECISE?  /be
> 
>-    if (now < timer->mTimeout + mTimeoutAdjustment &&
>-        aTimer->mTimeout < timer->mTimeout) {
>+    if (now < timer->mTimeout && aTimer->mTimeout < timer->mTimeout) {

Your XXXbz and my signed comments can go now, I think.

/be
Attachment #469357 - Flags: review?(brendan) → review+
Indeed.  Removed locally.
Whiteboard: [need review] → [need approval]
Comment on attachment 469357 [details] [diff] [review]
This seems to work ok for my 10ms setInterval, at least

Requesting approval to remove this code, which seems to mostly amplify timer noise nowadays and lead to timers firing much earlier (or much later) than they should.
Attachment #469357 - Flags: approval2.0?
Attachment #469357 - Flags: approval2.0? → approval2.0+
Pushed http://hg.mozilla.org/mozilla-central/rev/d578993db396

Let's see how things go.  Green so far!
Status: NEW → RESOLVED
Closed: 14 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Whiteboard: [need approval]
Target Milestone: --- → mozilla2.0b5
It does look like this may be causing a small Tdhtml regression, presumably because we stop adjusting timer firing times so it takes a bit longer to actually fire the timers (like the amount of time they were actually scheduled for).
OK, I backed this out for now, on suspicion of a 40% regression in WinXP Tscroll (but nowhere else).

Also likely were a 1-4% Tdhtml regression across the board, a 1-4% Tp4 win across the board, and a 4% Tsvg regression on WinXP.
Backout changeset: http://hg.mozilla.org/mozilla-central/rev/c8e75a4433d9
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 585437
Someone with more time to look into this needs to try this...
Assignee: bzbarsky → nobody
Priority: P1 → --
Target Milestone: mozilla2.0b5 → ---
No longer depends on: 585437
Depends on: 625258
Comment on attachment 469357 [details] [diff] [review]
This seems to work ok for my 10ms setInterval, at least

Backed out, hence not approved
Attachment #469357 - Flags: approval2.0+ → approval2.0-
Whiteboard: not-ready-for-cedar
It seems a little risky to me to remove the timer adjustment entirely. When the event queue is slow, the adjustment is necessary to keep events firing at the right times. Without this we can get small slowdowns in benchmarks.

If the problems with the adjustment are all refresh driver related (which seems to be the case in this bug and bug 731974), there might be a simpler solution: we could just add a small amount of code to the refresh driver callback to reschedule the firing if it happens too quickly. That timer is indeed a little special, so special code there is not too odd, I don't think.
The problems are not just refresh driver related.  They're also noticeable in setTimeout, and indeed anywhere you actually care about specific (not average) timer values...
I see, ok. How about adding a new timer type for those cases then? Basically a timer that, if the adjustment makes it fire too early, reschedules itself? That would be less intrusive than removing the adjustment entirely I think. Less elegant though...
The problem is that rescheduling on Windows means complete fail, because you can only fire on 15ms boundaries.  So if you need to fire at any sort of reasonable resolution, rescheduling for 15ms from now just because you have 1ms to go to your target time will be completely broken....
Ok, I did not know that about Windows...
(In reply to Boris Zbarsky (:bz) from comment #15)
> The problem is that rescheduling on Windows means complete fail, because you
> can only fire on 15ms boundaries.

bug 731974 comment 38 seems to suggest Opera does magic here (or possibly not tested on Windows XP, where I guess the trouble is); is doing the moral equivalent of strace on Opera possible?
High-res timers can be enabled on XP, but doing so chews up battery because no one ever taught the XP kernel to use APICs correctly.
There are existing bugs on the Windows situation.  There are all sorts of options, none all that pleasant wrt battery.  If the Windows situation got better, then this bug would get way easier to solve!
I think (and this is covered in one of those bugs) that we should enable high-res timers on Windows if there is web content in play that uses requestAnimationFrame.  The assumption being that we'll be doing "work" anyway, so that the CPU won't be idle.  We can then go back to low resolution timers when that content is closed (when we haven't seen a RAF call in a while).

That doesn't solve the problem of people trying to do animation off regular setTimeout/setInterval, but that's what RAF is intended to solve anyway.
Since we started getting telemetry for tab animation smoothness about a week ago, I noticed that the most common frame-interval average (for the entire tab animation, except for the first frame) is 14ms instead of a minimum average of 16-17ms.

When I'm testing on a slow computer (AMD E350, windows), I can get results similar to this (not a typical result, but not rare either):

Tab close (Frame-interval / paint-processing):
50.3 / 13.6
23   / 10.4
19.2 / 6.7
31.9 / 6
21.6 / 6.4
17.3 / 6
29.2 / 6.6
8    / 6.9
8.1  / 7.2
7.9  / 7
9.1  / 8.3
8.8  / 8.1
10.4 / 8.4
9.9  / 8.9
8.8  / 7.8
14.4 / 13.5
10.7 / 9.5

If I disable the timer filter with a patch similar to this:

> @@ -200,17 +201,18 @@ void TimerThread::UpdateFilter(uint32_t
>      else if (filterLength < 4)
>        filterLength = 4;
> 
>      for (i = 1; i <= filterLength; i++)
>        smoothSlack += mDelayLine[(mDelayLineCounter-i) & DELAY_LINE_LENGTH_MASK];
>      smoothSlack /= filterLength;
> 
>      // XXXbe do we need amplification?  hacking a fudge factor, need testing...
> -    mTimeoutAdjustment = TimeDuration::FromMilliseconds(smoothSlack * 1.5);
> +    mTimeoutAdjustment = TimeDuration::FromMilliseconds(smoothSlack * 0.1 * Preferences::GetInt("timer.smoothSlackx10", 15));
> +    printf("SS: %0.1f ", mTimeoutAdjustment.ToMilliseconds());
>    }

and set the pref to 0, the tab animation log looks much more like:

Tab open (Frame-interval / paint-processing):
36.6 / 15.6
24   / 8.1
17.8 / 9
16.5 / 8.5
17.4 / 9.9
14.8 / 7.6
15.6 / 7.2
17.2 / 7.4
16.6 / 7.2
17.1 / 7.1
16.1 / 7.2
16.9 / 7.2
16.6 / 7.9
16.3 / 7.3
16.7 / 7.5

And this is also visibly smoother than the jittery intervals which the timer filter causes; it's especially noticeable on slow systems, where we'd like to use every bit of smoothness we can get our hands on.

The refresh driver already takes long event handling into account by always targeting the next theoretical "vsync". But the timer filter messes with the refresh driver timing greatly, and this visibly affects tab animation, and possibly also other cases where we can't always keep up with 60fps, and as a result hurts smoothness later when we can keep up with 60fps.

Not sure if I should also post this info at bug 731974 (rAF generates too long/short frames). For now I haven't.
Whiteboard: not-ready-for-cedar → not-ready-for-cedar [snappy]
Also, on a slow system, the filter may generate such a timing skew that the refresh driver scheduling returns >20ms too early, and hence re-targets the same timestamp. Bug 842967 should guard against that, but it doesn't solve the general jitter which the timer filter causes after a long event handler.
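To make the scheduling model concrete: "targeting the next theoretical vsync" means snapping the next tick to a fixed frame grid rather than scheduling "now + interval". A minimal sketch of the arithmetic, with hypothetical names (not the actual refresh driver code):

#include <cmath>

// Snap the next tick to the frame grid defined by aOriginMs and
// aIntervalMs (e.g. 1000.0 / 60.0 for 60fps). A long event handler then
// costs whole frames instead of drifting the schedule.
double NextTickTargetMs(double aNowMs, double aOriginMs, double aIntervalMs) {
  double framesElapsed = std::floor((aNowMs - aOriginMs) / aIntervalMs);
  return aOriginMs + (framesElapsed + 1.0) * aIntervalMs;
}

Note that if a skewed timer fires early enough, this computation returns the same target as the previous tick, which is exactly the re-targeting problem described above.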
No need to add this to bug 731974, the issue is here :)  Avi, I would love to see this checked in -- want to shepherd this in?  I think it's basically a matter of getting it pushed for a few try runs and looking at all the benchmark numbers to see what's changed.  The issues with the refresh driver should be resolved due to bug 731974.
I'm guessing that the timer filter might still be useful when the OS timer isn't accurate enough.

I think the problem here is that the filter takes the timer handler timestamp as an input for the compensation, and the handler can be late not only if the timer is inaccurate, but also in cases where the browser was just busy and the event loop was held up. This triggers an unnecessary compensation sequence until it settles down.

If we could instead use the timestamp of the timer fire (instead of the handler), we might be able to kill two birds with one stone (no animals were harmed in the writing of this post. That I know of).
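For readers unfamiliar with the code under discussion: the filter keeps a ring buffer (the "delay line") of recent slack samples and applies their smoothed, amplified average as an adjustment to future timeouts. A minimal sketch of the idea, simplified from TimerThread::UpdateFilter (the real code varies the filter length and uses TimeDuration):

#include <cstdint>

const uint32_t kDelayLineLength = 8;  // a power of two, so masking wraps
const uint32_t kDelayLineMask = kDelayLineLength - 1;

class DelayLineFilter {
  int32_t mDelayLine[kDelayLineLength] = {0};
  uint32_t mCounter = 0;
  double mAdjustmentMs = 0.0;

public:
  // aSlackMs: how late the timer callback ran past its target time. As
  // noted above, this conflates OS timer inaccuracy with event-loop
  // delay, so a temporarily busy browser poisons the estimate.
  void Sample(int32_t aSlackMs) {
    mDelayLine[mCounter++ & kDelayLineMask] = aSlackMs;
    int32_t sum = 0;
    for (uint32_t i = 0; i < kDelayLineLength; ++i) {
      sum += mDelayLine[i];
    }
    // Smoothed slack, amplified by the 1.5 "fudge factor" seen in the
    // UpdateFilter snippet quoted earlier.
    mAdjustmentMs = (double(sum) / kDelayLineLength) * 1.5;
  }

  // Subsequent timers are scheduled this much earlier than requested,
  // which is what produces the early firing and jitter described here.
  double AdjustmentMs() const { return mAdjustmentMs; }
};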
The OS timer will be accurate enough on all platforms that we care about, except for Windows in low precision mode... but now we bump it to high precision mode when we actually care.  Kill it off -- the code will be simpler as a result and there will be fewer unexpected issues.
What kind of backfire are we expecting from this?

Also, I took out the filter-related stuff, but I didn't study the entire timer/thread implementation, so there could be some leftovers to remove/modify.

https://tbpl.mozilla.org/?tree=Try&rev=29e2ba575146
> What kind of backfire are we expecting from this?

See comment 8.
Reference try build (clean rebased m-c) for talos: https://tbpl.mozilla.org/?tree=Try&rev=0081a9a1ae10

Patched try build for comparison: https://tbpl.mozilla.org/?tree=Try&rev=152efb54169e (ignore the preceding mq patch; it does nothing by default).
http://perf.snarkfest.net/compare-talos/index.html?oldRevs=0081a9a1ae10&newRev=152efb54169e&submit=true

Looks to me like a relatively balanced mix of some regressions and some improvements (possibly more improvements than regressions, but it's hard to summarize/weight the results), most of them < 4%.

Summary of non-trivial differences (+ = regression, - = improvement):

- a11y Row Major MozAfterPaint: - 1-4%
- Dromaeo (DOM): - 1-4%
- Paint (Mac 10.8): +5%
- (All) Robocop (Android native): - 1-4%
- SVG Row Major: XP: -25%, others: + 1-4%
- SVG, Opacity Row Major: XP: -56%, others: - 2-10%
- Tp5 No Network Row Major MozAfterPaint (Main Normal Network IO Bytes): Windows 7: + 19362% -> this seems spurious, but I'm not sure how to recheck it.
- Tp5 No Network Row Major MozAfterPaint (Non-Main Normal Network IO Bytes): windows 7: -9%
- Tp5 No Network Row Major MozAfterPaint (Non-Main Startup File IO Bytes): windows 7: -13%
- Tp5 No Network Row Major Responsiveness MozAfterPaint: XP: +23%, 7: +17%
- Tp5 No Network Row Major Shutdown MozAfterPaint: XP: +12%, linux: -6%, linux64: +7%
- TResize: win 7: +5%, Mac 10.6 -7%, 10.7: -3%, 10.8: +3%
- Ts (android native): +30%
- Ts Paint, MAX Dirty Profile: - 1-5%
- Ts Paint, MED Dirty Profile: - 1-5%
- tscroll Row Major MozAfterPaint: + 1-3%

So, what's next?
If that's what the numbers look like, I say we land this.  None of that is anywhere close to the 40% regression that led to the earlier backout.  ;)
Comment on attachment 716105 [details] [diff] [review]
Remove delay line filter from timer thread

Let's ;)
Attachment #716105 - Flags: review?(bzbarsky)
Also, one standout apparent regression is Ts (Android native): +30%. Maybe Android does need this? I don't have access to an Android device, but maybe someone could give this a second look?
Comment on attachment 716105 [details] [diff] [review]
Remove delay line filter from timer thread

>+    // XXXavih is the above comment still meaningful?

No, it's not.  Please remove it together with this new XXX comment.

Can you please add the nsGlobalWindow comment change from http://hg.mozilla.org/mozilla-central/rev/d578993db396 ?

r=me with that.
Attachment #716105 - Flags: review?(bzbarsky) → review+
Assignee: nobody → avihpit
Attachment #716105 - Attachment is obsolete: true
(In reply to Avi Halachmi (:avih) from comment #32)
> Also, one standout apparent regression is Ts (Android native): +30%.
> Maybe Android does need this? I don't have access to an Android device, but
> maybe someone could give this a second look?

So I took a closer look and repeated the ts tests on Android 2.2 and 4.0:

- Android 4.0: the two runs of each build are both around 3.5s. So consistent and no regression here.

- Android 2.2: 3 runs on each build:
  - reference: 4s, 3.4s, 5.4s, one restart.
  - patched:   5.2s, 3.4s, 5.4s, one restart.

I'd say this test on 2.2 is quite noisy, and to me there's no regression here.
You can give several base revisions to the compare talos tool so that it and you can better see how big the usual variation is in that test and determine if it's normal or not.
(In reply to Timothy Nikkel (:tn) from comment #36)
> You can give several base revisions to the compare talos tool so that it and
> you can better see how big the usual variation is in that test and determine
> if it's normal or not.

I have, it has the same kind of variance and range over several inbound builds which had the ts test. It's just noisy, and the reference/patch builds show exactly the same pattern as the rest.
I meant for all of the tests, not just ts, for the initial comparison. So that you aren't just comparing two random talos runs with no idea of what the usual variation is in those tests.
I just got this regression email (4% tscroll regression on XP). I can't find my changeset on the perf-o-matic graph, but I wouldn't rule out that this patch caused it, especially if the result can depend on refresh intervals. I'll look into the tscroll test and try to assess whether that's indeed the case.

Regression: Mozilla-Inbound-Non-PGO - tscroll Row Major MozAfterPaint - XP - 4.02% increase
-------------------------------------------------------------------------------------------
    Previous: avg 10172.223 stddev 27.898 of 30 runs up to revision 036f8a67636c
    New     : avg 10580.760 stddev 60.682 of 5 runs since revision 2b8d0b73600e
    Change  : +408.537 (4.02% / z=14.644)
    Graph   : http://mzl.la/XfbnlU

Changeset range: http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=036f8a67636c&tochange=2b8d0b73600e

Changesets:
  * http://hg.mozilla.org/integration/mozilla-inbound/rev/a1d01526d344
    : Ethan Hugg <ethanhugg@gmail.com> - Bug 844929: Accept numeric parameters > UINT_MAX in SDP o-lines
    : http://bugzilla.mozilla.org/show_bug.cgi?id=844929

  * http://hg.mozilla.org/integration/mozilla-inbound/rev/00ed3d264438
    : Avi Halachmi <avihpit@yahoo.com> - Bug 590422: Remove delay line filter from timer thread. r=bz
    : http://bugzilla.mozilla.org/show_bug.cgi?id=590422

  * http://hg.mozilla.org/integration/mozilla-inbound/rev/13ccc41033ec
    : Joshua Cranmer <Pidgeot18@gmail.com> - Bug 648980 - Add a hook for projects to have custom configure scripts. r=glandium
    : http://bugzilla.mozilla.org/show_bug.cgi?id=648980

  * http://hg.mozilla.org/integration/mozilla-inbound/rev/4c8b199ed597
    : Jim Blandy <jimb@mozilla.com> - Bug 839313: Replace JSVAL_VOID and similar constants with their JS::MumbleValue() equivalents. r=waldo
    : http://bugzilla.mozilla.org/show_bug.cgi?id=839313

  * http://hg.mozilla.org/integration/mozilla-inbound/rev/e150e804f902
    : Jim Blandy <jimb@mozilla.com> - Bug 839313: In js/src/shell/js.cpp, Use JS_SET_RVAL, instead of assigning to *vp. r=waldo
    : http://bugzilla.mozilla.org/show_bug.cgi?id=839313

  * http://hg.mozilla.org/integration/mozilla-inbound/rev/2b8d0b73600e
    : Jim Blandy <jimb@mozilla.com> - Bug 839313: Don't use a runtime value root to implement the 'customNative' property of 'it'. r=waldo
    : http://bugzilla.mozilla.org/show_bug.cgi?id=839313

Bugs:
  * http://bugzilla.mozilla.org/show_bug.cgi?id=844929 - Accept numeric parameters > UINT_MAX in SDP o-lines
  * http://bugzilla.mozilla.org/show_bug.cgi?id=648980 - Add build system hooks for allowing applications to provide their own configure options and autoconf variables
  * http://bugzilla.mozilla.org/show_bug.cgi?id=590422 - Consider getting rid of the delay line filter stuff in timer thread
  * http://bugzilla.mozilla.org/show_bug.cgi?id=839313 - [jsdbg2] Assertion failure: isEmpty(), at mozilla/LinkedList.h:269
And a similar one for Linux64:

Regression: Mozilla-Inbound-Non-PGO - Paint - Linux x64 - 5.53% increase
------------------------------------------------------------------------
    Previous: avg 223.067 stddev 4.487 of 30 runs up to revision a1d01526d344
    New     : avg 235.400 stddev 1.949 of 5 runs since revision 2b8d0b73600e
    Change  : +12.333 (5.53% / z=2.749)
    Graph   : http://mzl.la/X9p9YR

And another for Tp5:

Regression: Mozilla-Inbound-Non-PGO - Tp5 No Network Row Major Responsiveness MozAfterPaint - XP - 15.7% increase
-----------------------------------------------------------------------------------------------------------------
    Previous: avg 2407.600 stddev 116.995 of 30 runs up to revision 036f8a67636c
    New     : avg 2786.200 stddev 31.878 of 5 runs since revision 2b8d0b73600e
    Change  : +378.600 (15.7% / z=3.236)
    Graph   : http://mzl.la/X9pivr
I think the tscroll test results don't represent a well-defined parameter.

What all the scroll tests do is:
1. Load a scrollable page (different content for each test).
2. Scroll NN pixels with 10ms intervals, and when the bottom is reached, record the average step time, in microseconds (= overallDuration / numSteps).

There are few factors here:
1. If the scroll steps are relatively light, we could have a 200% regression without seeing any difference in test results, since actual flush/paint only happens when the refresh driver triggers it.
2. If the refresh driver increases frequency, more layout-flush and paints would happen, which might affect the result.

The fact that the chosen interval is 10ms isn't helping either, since it's not in sync with the refresh driver. This would make the results appear in steps, and these steps would depend on the system performance (and Gecko's performance). IMO, this system can't measure fine-grained regressions or improvements.

If we want to measure how fast Gecko can scroll some content, we should set the refresh driver as fast as possible (layout.frame_rate=1000?), such that no test system is expected to complete a scroll step in one frame, and use requestAnimationFrame, which would scroll (+paint every frame) as fast as possible. The results would still be in steps, but at 1ms resolution, rather than some step which depends on 16.67ms and 10ms (and is quite probably much bigger than 1ms).

Then the results would be more meaningful IMO.

As for the other regressions:

Paint - Linux x64 - 5.53%: I'll look at what the test measures.

Tp5 No Network Row Major Responsiveness MozAfterPaint - XP - 15.7% increase

and:

> - Tp5 No Network Row Major MozAfterPaint (Main Normal Network IO Bytes): Windows 7: + 19362% -> this seems spurious, but I'm not sure how to recheck it.

May be related. This might not be spurious. It appears on a few changesets, intermittently: http://graphs.mozilla.org/graph.html#tests=[[216,131,1]]&sel=1361350064863.9297,1361923948570&displayrange=7&datatype=running

I think it's unrelated to this patch though.
Unfortunately, this also caused a couple previously intermittent OSX browser-chrome failures (bug 766546 and bug 838069) to be nearly permanent. Backed out.
https://hg.mozilla.org/integration/mozilla-inbound/rev/4ee88e30412b
Depends on: 845943
Updates:

- Talos tscroll will hopefully update soon (bug 845943).

- Mochitest3 performance_now intermittent failures on XP (bug 749894): turns out that we get many failures when high-resolution timers are disabled. Detection/decision of high resolution timers changed at bug 822490, and that's the real cause of the regression. Bug 836869 is for improving this detection/decision algorithm.

- Open issue: while both vlad and jrmuizel say that we should have very few cases (if any) where high resolution timers are disabled, the fact is, we sometimes do. In those cases, it's yet to be decided whether the performance_now M3 test should pass or fail.


Leave the filter in for XP? Since it became clear that there can be valid cases of disabled high-resolution timers (in tryserver mochitests I only saw it on Windows XP), and in those cases the timer filter might actually be useful, should we consider leaving the filter in on Windows XP? Maybe enable the filter dynamically when high-res timers are disabled?
Flags: needinfo?(bzbarsky)
I think I personally would prefer to just rip the filter out...
Flags: needinfo?(bzbarsky)
I would too.  That might mean needing to fix things on XP though.. and leaving it in on XP only might be the easiest path forward, since it'll be gone on all forward-looking platforms.
(In reply to Ryan VanderMeulen [:RyanVM] from comment #43)
> Unfortunately, this also caused a couple previously intermittent OSX
> browser-chrome failures (bug 766546 and bug 838069) to be nearly permanent.
> Backed out.

Bug 766546 is failing horribly regardless of this. Bug 838069 appears to be near landing, again regardless of this bug. Bug 833769, which also appeared to increase in frequency as a result of this, was identified as probably WebRTC related, and is still failing very frequently, also regardless of this bug.


(In reply to Avi Halachmi (:avih) from comment #44)
> - Open issue: while both vlad and jrmuizel say that we should have very few
> cases (if any) where high resolution timers are disabled, the fact is, we
> sometimes do. In those cases, it's yet to be decided whether the
> performance_now M3 test should pass or fail.

Landed bug 749894 to pass when QPC is disabled (as it was originally intended to), filed bug 850846 to revisit failure criteria for performance_now test.


(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #46)
> I would too.  That might mean needing to fix things on XP though.. and
> leaving it in on XP only might be the easiest path forward, since it'll be
> gone on all forward-looking platforms.

Let's try to land this again then, including XP. We should ignore tscroll regressions for now, and watch scroll performance with bug 845943. Bug 749894 shouldn't regress now, and the others are most probably not a result of this bug, even if amplified by it.
fwiw I'm eager to see this land - I have a whole set of separate patches that basically ignore the delay timer only for timer events off the main thread. (Doing that can give me reasonable accuracy for my use case). I'd rather just have it ripped out though and throw away my patches :)
So far I've only gotten the expected tscroll regression report, on XP. Possibly we have a better stddev now. I think I'm also noticing some increase in the intermittent oranges mentioned in comment #47 on inbound, but it's hard to tell.

Regression: Mozilla-Inbound-Non-PGO - tscroll Row Major MozAfterPaint - XP - 4.19% increase
-------------------------------------------------------------------------------------------
    Previous: avg 10176.280 stddev 42.709 of 30 runs up to revision 04e088286219
    New     : avg 10603.100 stddev 10.398 of 5 runs since revision 7d2ea731f5a5
    Change  : +426.820 (4.19% / z=9.994)
    Graph   : http://mzl.la/15401mb
https://hg.mozilla.org/mozilla-central/rev/d4a5a1e1e8e9
Status: REOPENED → RESOLVED
Closed: 14 years ago11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla22
Depends on: 852672
This likely regressed tsvg on Android 4.0.4 - bug 854746.
(In reply to Geoff Brown [:gbrown] from comment #52)
> This likely regressed tsvg on Android 4.0.4 - bug 854746.

Thanks. I'll look into this.
The landing here caused a major <video> painting regression, in particular on the Surface Pro in power saver mode. We're now dropping a significant number of frames when playing video.

For example I have a test page at which measures this here:
http://people.mozilla.org/~cpearce/bbb480.html

That page tells me that we're now dropping 12% of all frames when playing a 480p video on the Surface Pro. This makes our playback noticeably choppy.

Presumably we're dropping so many frames because the video decoder relies on an nsITimer to wake itself up at the appropriate time to send the next frame to the layers backend, and frames must be being sent there without enough time to paint, and so are being dropped.

What are we going to do about this? Can we back out this patch on Aurora, if not from Nightly as well? Is there a higher precision timer that we should be using in the <video> decoder?
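For context on the mechanism being blamed, here is a hedged sketch of the one-shot nsITimer pattern (function and closure names are hypothetical; the actual decoder code differs):

#include "nsCOMPtr.h"
#include "nsComponentManagerUtils.h"
#include "nsITimer.h"

// Hypothetical callback: would hand the next decoded frame to the layers
// backend, then re-arm the timer for the frame after that.
static void DeliverNextFrame(nsITimer* aTimer, void* aClosure) {}

// Arm a one-shot wakeup for the next frame. In Windows low-precision
// mode this can fire up to ~15.6ms late, and with the delay-line
// compensation removed, nothing masks that inaccuracy anymore.
nsresult ScheduleNextFrame(nsCOMPtr<nsITimer>& aTimer, void* aDecoder,
                           uint32_t aMsUntilFrame) {
  nsresult rv;
  aTimer = do_CreateInstance("@mozilla.org/timer;1", &rv);
  if (NS_FAILED(rv)) {
    return rv;
  }
  // The caller holds aTimer alive until the callback runs.
  return aTimer->InitWithFuncCallback(DeliverNextFrame, aDecoder,
                                      aMsUntilFrame, nsITimer::TYPE_ONE_SHOT);
}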
(In reply to Chris Pearce (:cpearce) from comment #54)
> The landing here caused a major <video> painting regression, in particular
> on the Surface Pro in power saver mode. We're now dropping a significant
> number of frames when playing video.

Since this patch/bug supposedly makes timers better and more accurate (when high-precision timers are available), I can think of a few reasons for the video regression:

1. Initially we identified Windows XP as the major case where high-precision timers are NOT always available (QPC gets disabled). It's possible that this patch introduces some regression in this case, but so far we haven't identified any major regression on XP. It's also possible that the Surface Pro (or Win8 in general) in power-save mode also gets QPC disabled, and that video playback is such a major regression of this bug in that case.

QPC detection was modified in bug 822490, which possibly regressed it somewhat (it's disabled more than before), and is being improved in bug 836869.

Bug 850846 is for deciding if disabled QPC is a valid case which should pass Mochitest (performance.now test) or if we expect QPC to be enabled on the vast majority of cases, and therefore fail Mochitest if it's disabled.

vlad and jrmuizel appeared to support the latter when we discussed it (QPC should rarely get disabled). If we can indeed get there, and if the surface pro power-save video issue is of disabled QPC which could actually be enabled, then the video regression is from bug 822490.

If, OTOH, it turns out that A: disabled QPC is a valid case which happens in some common scenarios (e.g. power save on Win8), and B: this bug causes a major video regression when QPC is disabled, then maybe we should reconsider the suggestion at comment 44 and disable the filter only when QPC is enabled, instead of ripping it out completely.
 

2. Before this bug landed, timers could have some major skews and measurable irregularities due to this filter (see comment 21). It's possible that some "counter compensation" was deployed as part of the video playback mechanism, which now backfires once the timers don't exhibit erratic behavior anymore.


- Could you please run the Mochitest3 performance_now test a few times on the Surface Pro in power save mode, and report the results? This test currently (following bug 749894) passes if QPC is disabled on XP, but should fail when QPC is disabled on any other platform. I don't recall results on Win8, but it _should_ fail there too when QPC is disabled.

- Could you please visit this page which graphs requestAnimationFrame intervals: http://people.mozilla.org/~wmccloskey/example1.html , try it on the surface pro in power-save mode (and preferably also not in this mode as a reference), and report the results?
(In reply to Avi Halachmi (:avih) from comment #55)

> 
> 2. Before this bug landed, timers could have some major skews and measurable
> irregularities due to this filter (see comment 21).

This was certainly my experience. Basically, timers fired early, often extremely (100ms) early. Especially non-main-thread timers, but main thread ones too. IMO the problem was that the delay filter conflated two things: a] OS time to move the event between threads and b] queueing delay on the target thread. [b] came to dominate that calculation, but unlike [a], recent history of queue depth isn't a great predictor of future queue depth. (It also had no sense of target thread, which made things more unpredictable.)

Since this patch landed, non-main-thread timers have been considerably more accurate. There is now landed code in networking that relies on that.

> It's possible that some
> "counter compensation" was deployed as part of the video playback mechanism,
> which now backfires once the timers don't exhibit erratic behavior
> anymore.

Is it possible video can just arm the timer to fire earlier to more realistically model what was happening before?
(In reply to Chris Pearce (:cpearce) from comment #54)
> What are we going to do about this? Can we back out this patch on Aurora, if
> not from Nightly as well? Is there a higher precision timer that we should
> be using in the <video> decoder?

Ew!  We need to dig into this.  The video decoder really shouldn't be using nsITimer, at least not without also enabling high precision timing mode on Windows.  The only time we jump into that right now is when requestAnimationFrame is used -- definitely look at the rAF interval timing page, but also try playing back video while something that uses rAF is going on in another tab (or was recently, as it takes a bit for us to go back to normal-precision mode).

You can also build the win32 timer query tool in bug 731974 to see what the current timer precision is; can you run it when you see the janky playback to see what we're getting?
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #57)
> Ew!  We need to dig into this.  The video decoder really shouldn't be using
> nsITimer, at least not without also enabling high precision timing mode on
> Windows.

What should we use instead?

> The only time we jump into that right now is when
> requestAnimationFrame is used -- definitely look at the rAF interval timing
> page, but also try playing back video while something taht uses rAF is going
> on in another tab

If I have p.m.o/~wmccloskey/example1.html open in a background tab, I no longer get janky video. Huzzah!

I guess we should enable high precision timing on Windows when we're playing video, until we get time to switch to some other timing mechanism.

How do we enable high precision timers on Windows?

> You can also build the win32 timer query tool in bug 731974 to see what the
> current timer precision is; can you run it when you see the janky playback
> to see what we're getting?

Surface Pro:
Current resolution: 15.625700 ms, range 0.500000 - 15.625000 ms

Surface Pro, with p.m.o/~wmccloskey/example1.html open in a background tab:
Current resolution: 1.000600 ms, range 0.500000 - 15.625000 ms

For reference, on my Lenovo W530, Win8 Pro x64:
Current resolution: 1.000600 ms, range 0.500000 - 15.625000 ms

I don't see any variation in these numbers when I switch to power saver or high perf power plan. Indeed now that I test it, I see janky playback in all power modes on the Surface Pro.
(In reply to Chris Pearce (:cpearce) from comment #58)
> How do we enable high precision timers on Windows?

timeBeginPeriod(1), timeEndPeriod(1).

This fixes the regression. I'll set the time period to 1 while we're playing video. It would be good if someone could advise what we're expected to use instead of nsITimer.
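A minimal RAII sketch of that request (a hypothetical helper; timeBeginPeriod/timeEndPeriod calls must be balanced, so scoping keeps them paired even on early returns):

#include <windows.h>
#include <mmsystem.h>
#pragma comment(lib, "winmm.lib")

// While an instance is alive, the global Windows timer resolution is
// raised to 1ms (subject to hardware limits), e.g. for the duration of
// video playback.
class ScopedHighResTimers {
public:
  ScopedHighResTimers() : mOk(timeBeginPeriod(1) == TIMERR_NOERROR) {}
  ~ScopedHighResTimers() {
    if (mOk) {
      timeEndPeriod(1);
    }
  }

private:
  bool mOk;
};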
(In reply to Chris Pearce (:cpearce) from comment #59)
> timeBeginPeriod(1), timeEndPeriod(1).
> This fixes the regression. ...

So the regression might be attributed to the fact that when high-res timers are not available, we don't average/compensate the timeouts like we used to before this bug landed. This change just explicitly requests better resolution timers.

While video playback could be a case where high-res timers are appropriate, could there be (or are there) other cases where we've hurt average timeout accuracy? AFAIK, right now only rAF intentionally sets high-res timers on Windows (and following bug 866561, also video playback).

What about other cases (such as those from comment 48, comment 56)? Maybe we should add some API for requesting maximum timer accuracy? Maybe it should be automatic whenever a requested timeout is shorter than some threshold (100ms?)? High-res timers could be turned automatically on/off as appropriate.

I think the automatic approach is both feasible and doesn't introduce a meaningful power usage penalty even if the caller didn't actually require high-res (see bug 866561 comment 20). Though I'd argue that callers who request timeouts <30ms would definitely prefer not to get 15ms resolution.
(In reply to Avi Halachmi (:avih) from comment #60)
>
> What about other cases (such as those from comment 48, comment 56)? Maybe we
>
Yes, network added a corner case where it does timeBeginPeriod/timeEndPeriod too. It runs rarely.

> should add some API for requesting maximum timer accuracy? Maybe it should
> be automatic whenever a requested timeout is shorter than some threshold
> (100ms?)? High-res timers could be turned automatically on/off as
> appropriate.

I think that would be great! (I used a 50ms threshold - not based on any particularly strong reasoning.)
(In reply to Patrick McManus [:mcmanus] from comment #61)
> I think that would be great! (I used a 50ms threshold - not based on any
> particularly strong reasoning.)

I had the universe on my side with this. 6 x 16.667 = 100. 6 is almost a typological number, and also half a dozen. Seemed appropriate :p
(In reply to Patrick McManus [:mcmanus] from comment #61)
> > should add some API for requesting maximum timer accuracy? Maybe it should
> > be automatic whenever a requested timeout is shorter than some threshold
> > (100ms?)? High-res timers could be turned automatically on/off as
> > appropriate.
> 
> I think that would be great! (I used a 50ms threshold - not based on any
> particularly strong reasoning.)

This is a good idea, but it's potentially tricky for power consumption.  Many web pages will do setTimeout(0) and similar, which we'll translate into 10ms (or I think now 3ms or something).  But they don't really need the precision, they just need "will run soon later".  So for all of those we'd be putting the system into high precision mode, wasting power; that was the argument against doing it in the first place.  rAF was a natural place to request high precision because it was an explicit request for a high framerate.  Video is another.

Maybe we only automatically do it for internal timers, not content JS ones?
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #63)
> (In reply to Patrick McManus [:mcmanus] from comment #61)

> Maybe we only automatically do it for internal timers, not content JS ones?

I agree. Not really because the HRT timer itself is a power sink, but because a quick-firing timeout loop in content JS is a total footgun. In any event, non-Windows platforms already give decent granularity, so the setTimeout implementation has to implement a minimum itself if consistency is desirable.
Another perspective could be that we use rAF relatively frequently anyway. At least for now, rAF triggers HRT for 90s, so every tab open/close/scroll, page scroll probably too (at least it should), etc triggers HRT.

It _could_ be the case that we're at HRT most of the time already anyway, depending on usage pattern. It possibly only drops to low-res when the browser is really idle (e.g. minimized, or while watching video, and now even not that ;) ). Telemetry for % of time at HRT and number of switches/hour?

Another thing to consider is that once OMTC lands, we might use OS callbacks for rAF, which would remove our biggest HRT trigger, possibly leaving us at HRT for a very low percentage of the time. Do we want to have 15ms timer resolution in that case for web pages and our own timeouts?

I think it's not unlikely that a lot of code already implicitly assumes it has HRT, because indeed it mostly has, but only because rAF was used recently.

Considering that HRT itself uses less than 1% power and 0% CPU usage, and it's only the tighter timeOuts which we care about, maybe just turn it on permanently (or while not idle) and handle the tight timeouts case as we already do anyway?
(In reply to Avi Halachmi (:avih) from comment #65)
> Another perspective could be that we use rAF relatively frequently anyway.
> At least for now, rAF triggers HRT for 90s, so every tab open/close/scroll,
> page scroll probably too (at least it should), etc triggers HRT.

timePeriod(1) is set for 90s after opening a new tab.  Only.  Scrolling, closing etc. doesn't change anything.  Tested empirically.  Also, I only see an almost invisible effect on scrolling rendering: with HRT it's fluent, with 15.6ms it's almost unnoticeably less fluent.

> Considering that HRT itself uses less than 1% power and 0% CPU usage, and

For curiosity, where did you get these numbers?

> it's only the tighter timeOuts which we care about, maybe just turn it on
> permanently (or while not idle) and handle the tight timeouts case as we
> already do anyway?

I'm not a big fan of this.  I really believe the CPU will not be able to do deeper sleeps that often (several times within a single second!), which may raise power usage and lower lifespan too.

Chromium code sets timePeriod(1) only for a very short time, when a scheduled timer is shorter than some 50 or 100ms (when I last checked the code; not tested experimentally).  We can do the same.
(In reply to Honza Bambas (:mayhemer) from comment #66)
> timePeriod(1) is set for 90s after opening a new tab.  Only.  Scrolling,
> closing etc. doesn't change anything.  Tested empirically.  Also, I only see
> an almost invisible effect on scrolling rendering: with HRT it's fluent,
> with 15.6ms it's almost unnoticeably less fluent.

This is interesting. Tab close animation and (smooth) scroll should definitely use rAF (together with its HRT), and according to this, they currently don't. It's possible that rAF usage on tab open is only incidental, probably following bug 850163 which changed the trigger of tab open animation to rAF (as a replacement for setTimeout(0), but which wouldn't be late to handle the next frame).


> > Considering that HRT itself uses less than 1% power and 0% CPU usage, and
> For curiosity, where did you get these numbers?

I estimated them from this: https://bugzilla.mozilla.org/show_bug.cgi?id=866561#c20

> > it's only the tighter timeOuts which we care about, maybe just turn it on
> > permanently (or while not idle) and handle the tight timeouts case as we
> > already do anyway?
> 
> I'm not a big fan of this.  I really believe the CPU will not be able to do
> deeper sleeps that often (several times within a single second!), which may
> raise power usage and lower lifespan too.
> 
> Chromium code sets timePeriod(1) only for a very short time, when a
> scheduled timer is shorter than some 50 or 100ms (when I last checked the
> code; not tested experimentally).  We can do the same.

Yeah, this was also suggested earlier. I tried to come up with few approaches to get some feedback, exactly like yours. Thanks.
Let me add some numbers for the impact of 1ms timers. Reporting as percentages can be dangerous.

Measured on a Core i5 3317U @ 1.7GHz, 64-bit Win 8, running on DC power.

Power
about:blank, 15.6ms timer: 1.81 watts (package power)
about:blank, 1ms    timer: 1.90 watts

Avg C-state residency
about:blank, 15.6ms: 1.13% C0, 0.09% C1, 98.78% C7
about:blank, 1ms   : 1.44% C0, 0.21% C1, 98.35% C7

This is also on a *very* idle system. A 1ms timer means more processes get scheduled, context switched, etc., so an average user's system would likely see more impact. Not to mention other applications that may unexpectedly abuse a 1ms timer by using Sleep(1) or some similar construct.
Here's what I can summarize from recent comments:

1. Some code needs HRT on Windows (some network code, rAF, video playback, possibly more), and this code now either manually triggers HRT or implicitly counts on HRT being enabled.

2. HRT imposes some (relatively small) penalty in terms of power consumption and context switching. While this penalty is very acceptable when HRT is actually required, we'd prefer to disable HRT when we don't need it.

3. Some cases in rAF-like scenarios (tab animation, smooth scroll, possibly more) don't get HRT enabled, while they logically should.

4. (Did some more tests): HRT is never disabled as long as a background tab uses rAF.

5. Right now, rAF triggers HRT for 90s. With OMTC, rAF could use other systems (e.g. WaitForVBlank) which won't require HRT (not sure what the implication is).


Do we want to implement the following, which apparently solves all of the above?

TimeBegin[/End]Period is controlled by the timers only (and removed from rAF, video, etc) as follows:
- Non-chrome content can't trigger HRT explicitly.
- timeouts (what about repeating?) of 0 (1 too?) will never trigger HRT.
- Timers lower than some threshold (50ms or 100ms) from chrome will trigger HRT.
- HRT will be kept enabled for some grace period afterwards (100ms?).

Need decisions if the above is acceptable:
- Thresholds: min/max duration for enabling HRT (suggesting 2ms/100ms), grace (suggesting extra 100ms).
- What about repeating timers (intervals)?
- What about addons?
(In reply to Avi Halachmi (:avih) from comment #69)

> TimeBegin[/End]Period is controlled by the timers only (and removed from
> rAF, video, etc) as follows:

I have the same idea.

> - What about repeating timers (intervals)?

They should have HRT too IMO.

This is very close to what Chromium has had for years, and makes complete sense.
Here is a resource on the impact of high res timers:

http://software.intel.com/en-us/articles/cpu-power-utilization-on-intel-architectures

It's a few years old, but still valid.
(In reply to Avi Halachmi (:avih) from comment #69)
> ...
> TimeBegin[/End]Period is controlled by the timers only (and removed from
> rAF, video, etc) as follows:
> - Non-chrome content can't trigger HRT explicitly.
> - timeouts (what about repeating?) of 0 (1 too?) will never trigger HRT.
> - Timers lower than some threshold (50ms or 100ms) from chrome will trigger
> HRT.
> - HRT will be kept enabled for some grace period afterwards (100ms?).

FYI, wrote a proof of concept for the above (sans the chrome-only trigger) with min/max threshold of 2/50 and grace of 200.

Tried to put the trigger either at nsTimerImpl::SetDelayInternal or at TimerThread::AddTimerInternal (with similar results). It seems to work well, but apparently there are quite a few scenarios which trigger one or more delays within the range of 2-50 ("valid") and therefore trigger HRT. For instance:
- Cursor blink (at the location bar, search field, etc) triggers 3 valid delays every 500ms.
- Links [un]hover trigger quite a few (~15) valid delays.
- On startup there are many valid delays.

Changing the valid range to 2/30 doesn't change the above.

So I modified the threshold to require several valid delays within some interval to trigger HRT. The test interval I used is (numRequired - 1)*MaxValidInterval. So if numRequired is 3 and max is 50, it requires 3 valid delays within 100ms (at least 1 every 50ms).

It turned out that requiring 4 doesn't trigger HRT on cursor blink, but does trigger it on link hover. It also improves on the current implementation for tab animation and scroll (HRT is now triggered) and for background-tab rAF (HRT now gets disabled after a short while).

It has nice potential, but to me it doesn't seem like a magic bullet (and chrome-only triggering will add some further complexity). FYI.
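A hedged sketch of that proof-of-concept gating logic, with the thresholds from the comments above (valid delay range 2-50ms, 4 required valid delays within (numRequired - 1) * 50ms, 200ms grace); all names are hypothetical:

#include <cstddef>
#include <deque>

// Enable high-resolution timers only when several "valid" delays arrive
// close together, so isolated events like cursor blink don't trigger HRT.
class HighResGate {
  const double mMinValidMs = 2;
  const double mMaxValidMs = 50;
  const size_t mNumRequired = 4;
  const double mGraceMs = 200;
  std::deque<double> mValidTimes;   // timestamps of recent valid delays
  double mLastTriggerMs = -1;

public:
  // Called from AddTimerInternal-like code with the current timestamp and
  // the requested delay. Returns true while HRT (e.g. timeBeginPeriod(1))
  // should be held enabled.
  bool OnTimerAdded(double aNowMs, double aDelayMs) {
    if (aDelayMs >= mMinValidMs && aDelayMs <= mMaxValidMs) {
      mValidTimes.push_back(aNowMs);
      // 4 required with a 50ms max gives a 150ms window: at least one
      // valid delay every 50ms on average.
      double window = (mNumRequired - 1) * mMaxValidMs;
      while (!mValidTimes.empty() && mValidTimes.front() < aNowMs - window) {
        mValidTimes.pop_front();
      }
      if (mValidTimes.size() >= mNumRequired) {
        mLastTriggerMs = aNowMs;
      }
    }
    // Keep HRT on for the grace period past the last trigger; continued
    // valid delays keep re-triggering, so animations hold it enabled.
    return mLastTriggerMs >= 0 && aNowMs - mLastTriggerMs <= mGraceMs;
  }
};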
(In reply to Avi Halachmi (:avih) from comment #53)
> (In reply to Geoff Brown [:gbrown] from comment #52)
> > This likely regressed tsvg on Android 4.0.4 - bug 854746.
> 
> Thanks. I'll look into this.

I responded at bug 854746.

TL;DR: These tests report to talos a result which is the overall duration of the test run. Of that duration, much of the time is spent in setTimeout (on 4 of the 7 tests, as much as 95+% of the reported result is aggregated setTimeouts; 2 tests spend about 75% on timeouts, and one test about 45%).

Therefore I'd argue that these tests measure the consistency of setTimeout more than anything else, hence the reported "regressions" from the change this bug brings to timing (arguably for the better).
Depends on: 999161