Firefox-desktop on windows almost never scrolls 100% smooth

NEW
Unassigned

Core
Graphics: Layers
4 years ago
a year ago

People

(Reporter: avih, Unassigned)

Tracking

(Depends on: 2 bugs)

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments)

(Reporter)

Description

4 years ago
Created attachment 776049 [details]
testcase 1: Scroll test: 2 text cases and a bookmarklet (cross browser)

TL;DR
-----
While scrolling on Firefox-desktop on windows (typically using the mouse wheel or KB arrows, but also touch-scroll):

- On many pages, especially on slow systems, we can't iterate fast enough - window size[/width?] affects this a lot.

- Even when we can iterate fast enough on fast systems, we miss vsync much more than we should.

We should be able to scroll smoothly, preferably also on slow systems (IE and Chrome mostly manage this), but especially if the system is fast enough to iterate rendering at screen refresh rate.


Scroll results:
---------------
- testcase 1: Cross platform sparse/dense text test cases and bookmarklet. By default - records intervals of 5px scroll over 7 seconds using requestAnimationFrame, and ignores the first 120 values.
- Tested using official Firefox nightly 2013-07-15 with a clean profile. On OS X I used a few-days-old build which I compiled myself, configured very similarly to nightlies.
- Sparse/dense text tested using testcase 1.
- Wikipedia (https://en.wikipedia.org/wiki/Firefox) tested with the bookmarklet at testcase 1.
- ASAP means iterate as fast as possible by setting layout.frame_rate=10000. On OS X this works on recent nightlies following bug 888899.
- The ASAP results show average interval and stddev (_not_ min/max).
- I also have an addon version of testcase 1 which records intervals using the Start[/Stop]FrameTimeRecording API (bug 820167), but its results show a similar interval distribution to the bookmarklet version, so I didn't attach the addon.
- The smoothness results ("mostly smooth", etc.) are relative to perfect vsync smoothness at the default refresh rate, assessed visually only. I couldn't find a metric which correlated well with perceived smoothness.
- The window size is inner window - content only (testcase 1 shows the size live while resizing).
- IE and Chrome test results are also using the bookmarklet from testcase 1.
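
For reference, the two numbers reported throughout (average interval and stddev) can be derived from recorded rAF timestamps roughly as follows. This is a minimal Python sketch, not the bookmarklet's actual code: the function name, the use of a population standard deviation, and the sample timestamps are all assumptions.

```python
import math


def interval_stats(timestamps_ms, ignore_first=120):
    """Return (count, mean, stddev) of frame intervals in ms.

    The first `ignore_first` intervals are dropped as warm-up, mirroring
    the "ignores the first 120 values" default described above.
    Population stddev is an assumption; the testcase may compute it differently.
    """
    intervals = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    kept = intervals[ignore_first:]
    if not kept:
        # Nothing left after warm-up: report zeros instead of dividing by zero.
        return 0, 0.0, 0.0
    mean = sum(kept) / len(kept)
    variance = sum((x - mean) ** 2 for x in kept) / len(kept)
    return len(kept), mean, math.sqrt(variance)


# Hypothetical timestamps (ms), as a rAF callback might record them.
sample = [0.0, 16.0, 33.0, 50.0, 66.0, 83.0]
print(interval_stats(sample, ignore_first=1))
```

Note that if fewer intervals are recorded than are ignored, the sketch returns zeros rather than failing.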


Fast System:
------------
Win7 laptop, i7-3630qm+iGPU, cores: 4+HT, ram: 16G, win-accel: D3D10, d2d: yes, sunspider1.0: 120ms

- (1366x704):
Sparse/Dense text: smooth.             ASAP: 4ms (stddev 0.5)
Wikipedia:         Some missed frames. ASAP: 6ms (stddev 2)

- Maximized (1920x1015):
Sparse text: Smooth.             ASAP: 6ms   (stddev 1)
Dense text:  Few missed frames.  ASAP: 6ms   (stddev 1)
Wikipedia:   Mostly not smooth.  ASAP: 9.5ms (stddev 5)

Chrome (28): maximized Wikipedia: 100% smooth 16.67ms (stddev 0.5).
IE 10: maximized Wikipedia: 100% smooth 16.68ms (stddev 0.15).


Slow System 1:
--------------
Win7 laptop, AMD E350+iGPU, cores: 2, ram: 4G, win-accel: no, d2d: no, sunspider1.0: 450ms

- Minimal width (199x675):
Sparse text: Few missed frames.  ASAP: 7.5ms (stddev 0.8)
Dense text:  Few missed frames.  ASAP: 8ms   (stddev 0.9)
Wikipedia:   Some missed frames. ASAP: 9ms   (stddev 2)

- Maximized (1366x704):
Sparse text: Some missed frames. ASAP: 15ms   (stddev 2)
Dense text:  Some missed frames. ASAP: 16.7ms (stddev 2)
Wikipedia:   Never smooth.       ASAP: 22.5ms (stddev 4)

- Maximized (1366x704) With layers.acceleration.force-enabled=true:
Sparse text: Mostly smooth.     ASAP: 10ms   (stddev 2)
Dense text:  Mostly smooth.     ASAP: 11ms   (stddev 1)
Wikipedia:   Mostly not smooth. ASAP: 16.6ms (stddev 4)

Chrome (28) : Mostly smooth on all pages maximized, including wikipedia.
IE 10: maximized wikipedia: 15.5ms (stddev 1.5) (rAF/vsync bug?).


Slow system 2:
--------------
Win8 tablet, Atom z2760+iGPU, Cores: 2+HT, ram: 2G, win-accel: D3D9, d2d: no, sunspider1.0: 620ms:

- Minimal width (199x674):
Sparse text: Mostly smooth.    ASAP: 10ms (stddev 2)
Dense text:  Mostly smooth.    ASAP: 10ms (stddev 0.7)
Wikipedia:   Sometimes smooth. ASAP: 13ms (stddev 3.5)

- Maximized (1366x704):
Sparse text: Never smooth.      ASAP: 18ms (stddev 2)
Dense text:  Mostly not smooth. ASAP: 18ms (stddev 3.5)
Wikipedia:   Never smooth.      ASAP: 24ms (stddev 5)

Chrome (28): maximized wikipedia: rAF: 24ms (stddev 4) not smooth, but feels more consistent than Firefox. When rolling the mouse wheel: extremely responsive with APZC. Dragging the scrollbar: no-APZC and laggy.

IE 10: maximized wikipedia: 100% smooth with touch-scroll or mouse-wheel roll or scrollbar. rAF: 15.5ms (stddev 2) not-smooth but very consistent (rAF bug?).


OS X, medium speed:
------------------------
MBA 13" late 2010 2.13GHz, M.Lion 10.8.4, cores: 2, ram: 4G, sunspider1.0: 230ms:

- 1440x766 with OMTC:
Sparse text: 100% smooth. ASAP: 2.3ms (stddev 1.6)
Dense text:  100% smooth. ASAP: 2.6ms (stddev 1.6)
Wikipedia:   100% smooth. ASAP: 3.8ms (stddev 1.6)

- 1440x766 without OMTC:
Sparse text: 100% smooth.         ASAP: 5.1ms (stddev 1.7)
Dense text:  100% smooth.         ASAP: 5.7ms (stddev 2)
Wikipedia:   Rarely missed frame. ASAP: 7ms   (stddev 2.5)



Conclusions/observation:
------------------------

- Our vsync timing is at the refresh driver, and AFAIK it takes at least one event until the OS gets the new data. This could be the main reason for missed vsyncs on a fast enough system.

- Interval distributions, both at the refresh driver (recorded in requestAnimationFrame callbacks) and at LayerManager::PostPresent (recorded with the Start[/Stop]FrameTimeRecording API), are quite similar, and don't correlate well with visible smoothness. I.e., scrolling may look smooth with a stddev of 4-5, or may have a stddev of 1-2 and still not look smooth, both with an average of 16.67ms.

- Without win-gpu-accel (D3D9/10) - tearing is usually visible.

- Occasionally scrolling can degrade for a few consecutive iterations even with homogeneous content - probably GC/CC.

- Occasionally scrolling can degrade for a few consecutive iterations when new complex content scrolls into view.

- d2d acceleration degrades or doesn't help on my slow test systems.

- I'm hoping that APZC will be able to mitigate many of the congestions (GC/CC, rendering latency, etc).

- On OS X, OMTC gives optimal gain on these cases - it doubles the throughput.

- On OS X, we rarely miss a frame with OMTC, and to a lesser extent without OMTC - but even without OMTC it's still _much_ better than my fastest windows test system (which is twice as fast per core as the MBA). Note that even without OMTC, the MBA iterates faster even though it's half as fast per core.

- As OS X shows with or w/o OMTC, very tight intervals distribution is _not_ required for 100% smooth scroll - typically the intervals distribution is not better than on windows (though chrome and IE can do optimal intervals on some systems, especially IE typically does 16.67ms with stddev of 0.15(!) ms).

- vblank-blocking on present, especially with OMTC will probably be very useful.

- According to taras, most (many?) of the systems running Firefox are single core. This is unfortunate.

- Chrome and IE do APZC (Chrome: checkerboard, IE: blank), probably also OMT[composition/rendering]. Visible in chrome when scrolling fast on a slow system, extremely hard to notice with IE (even with my slowest Atom system).

- Chrome and IE fail to iterate at 16.67ms on some systems. rAF/vsync bug?


All that being said, we _can_ scroll 100% smooth even on low-end systems - if the window size is small, content is light, and gpu-win-accel is on. However, typical content on full screen on most windows systems will not scroll 100% smoothly. We need to be able to scroll smoothly in many more cases than this IMO.
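
To illustrate the observation that interval stddev alone doesn't predict smoothness, here is a minimal Python sketch of a toy model. All of it is assumption: a 60 Hz display, each vsync showing the newest frame that was ready in time, and frames whose readiness alternates around a fixed phase within the refresh period. It is not how Gecko schedules or presents frames.

```python
import bisect
import math

VSYNC_MS = 1000.0 / 60.0  # assumption: 60 Hz display


def make_ready_times(n, phase_ms, jitter_ms, period_ms=VSYNC_MS):
    """Frame k becomes ready at k*period + phase, alternately +jitter / -jitter."""
    return [k * period_ms + phase_ms + (jitter_ms if k % 2 == 0 else -jitter_ms)
            for k in range(n)]


def count_repeated_vsyncs(ready_times_ms, period_ms=VSYNC_MS):
    """Count vsyncs that re-show the previous frame because no newer one was ready."""
    ready = sorted(ready_times_ms)
    if not ready:
        return 0
    first = math.ceil(ready[0] / period_ms)   # first vsync with a frame available
    last = math.ceil(ready[-1] / period_ms)   # vsync that can show the final frame
    repeats = 0
    previous = None
    for m in range(first, last + 1):
        # Index of the newest frame ready at or before vsync m.
        shown = bisect.bisect_right(ready, m * period_ms) - 1
        if shown == previous:
            repeats += 1
        previous = shown
    return repeats


# Intervals alternate period-4 / period+4 (stddev 4), but frames land mid-period.
wide_but_centered = make_ready_times(60, phase_ms=8.0, jitter_ms=2.0)
# Intervals alternate period-1 / period+1 (stddev 1), but frames land near the vsync edge.
tight_but_late = make_ready_times(60, phase_ms=16.5, jitter_ms=0.5)

print(count_repeated_vsyncs(wide_but_centered))
print(count_repeated_vsyncs(tight_but_late))
```

In this toy model both sequences average one frame per refresh, yet only the second one repeats frames, which is consistent with where frames land relative to vsync mattering more than how tight the distribution is.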

Comment 1

4 years ago
(In reply to Avi Halachmi (:avih) from comment #0)
> 
> We should be able to scroll smoothly, preferably also on slow systems (IE
> and Chrome mostly manage this), but especially if the system is fast enough
> to iterate rendering at screen refresh rate.

Excellent bug report. I'll find someone to pick this up. Stay tuned.

Comment 2

4 years ago
It would be very interesting to repeat these tests with OMTC on Windows (with either the D3D11 backend or the D3D9 backend).
(Reporter)

Comment 3

4 years ago
> Fast System:
> ------------
> - Maximized (1920x1015) (no OMTC):
> Sparse text: Smooth.             ASAP: 6ms   (stddev 1)
> Dense text:  Few missed frames.  ASAP: 6ms   (stddev 1)
> Wikipedia:   Mostly not smooth.  ASAP: 9.5ms (stddev 5)

I tested this also with OMTC on windows with the same fast system:

- Maximized (1920x1015) (with OMTC):
Sparse text: Smooth.                        ASAP: 2.7ms   (stddev 1)
Dense text:  Few missed frames.             ASAP: 3.1ms   (stddev 1.1)
Wikipedia:   Even less smooth than before.  ASAP: 8.5ms   (stddev 8.5)

OMTC seems to help with average interval - doubles the throughput on simple cases but only slightly better with wikipedia. However, on wikipedia it generates a much wider variance (and visible congestions) than without OMTC.
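
As a quick check of the "doubles the throughput" statement, the ratio of ASAP intervals without and with OMTC can be computed directly from the numbers in this comment. A small Python sketch; the helper name is just for illustration:

```python
def throughput_gain(before_ms, after_ms):
    """Ratio of frame throughput after vs. before, given average intervals in ms."""
    return before_ms / after_ms


# (no OMTC, with OMTC) ASAP averages on the fast system, maximized window.
results = {
    "Sparse text": (6.0, 2.7),
    "Dense text": (6.0, 3.1),
    "Wikipedia": (9.5, 8.5),
}

for page, (no_omtc, omtc) in results.items():
    print(f"{page}: {throughput_gain(no_omtc, omtc):.2f}x")
# Sparse text: 2.22x
# Dense text: 1.94x
# Wikipedia: 1.12x
```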

Comment 4

4 years ago
Build: Firefox Nightly 25.0a1 2013-07-18 on Xubuntu Linux 13.04 64-bit
System: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
        Mesa DRI Intel(R) Ivybridge Mobile 3.0 Mesa 9.1.3

Sparse Text:
> Steps (after ignoring 120): 300 
> Step: 5 px 
> Duration: 7012 ms (target: 7000) 
> Window size: 1176 x 622 
> Average interval: 16.67 ms 
> STDDEV intervals: 1.08 ms

Dense Text:
> Steps (after ignoring 120): 299 
> Step: 5 px 
> Duration: 7013 ms (target: 7000) 
> Window size: 1176 x 622 
> Average interval: 16.72 ms 
> STDDEV intervals: 1.63 ms

I noticed http://priceonomics.com/the-san-francisco-rent-explosion/ is particularly janky but the bookmarklet doesn't seem to work on this page.

1. Load http://priceonomics.com/the-san-francisco-rent-explosion/
2. Click the bookmarklet from the Bookmarks menu
3. Accept the default values (7000, 5, 120) and click OK
Page doesn't scroll and spits out the following results after 7 seconds:

> Steps (after ignoring 120): 0 
> Step: 5 px 
> Duration: 7167 ms (target: 7000) 
> Window size: 1176 x 622 
> Average interval: 0.00 ms 
> STDDEV intervals: 0.00 ms

Comment 5

4 years ago
Steps (after ignoring 120): 223
Step: 5 px
Duration: 7014 ms (target: 7000)
Window size: 1920 x 982
Average interval: 17.26 ms
STDDEV intervals: 4.92 ms

Steps (after ignoring 120): 230
Step: 5 px
Duration: 7013 ms (target: 7000)
Window size: 1920 x 982
Average interval: 16.88 ms
STDDEV intervals: 3.61 ms

On macos
Changing values to 7000,5,5 made the test work:

> Page: http://priceonomics.com/the-san-francisco-rent-explosion/
> Steps (after ignoring 5): 410 
> Step: 5 px 
> Duration: 7011 ms (target: 7000) 
> Window size: 1176 x 622 
> Average interval: 16.87 ms 
> STDDEV intervals: 4.27 ms
(Reporter)

Updated

4 years ago
Blocks: 856427

Comment 7

4 years ago
Bas, BenWa, can you see if this is low hanging fruit, or if we fit it into the larger "scrolling performance" agenda?
Flags: needinfo?(bgirard)
Flags: needinfo?(bas)

Comment 8

4 years ago
(In reply to Milan Sreckovic [:milan] from comment #7)
> Bas, BenWa, can you see if this is low hanging fruit, or if we fit it into
> the larger "scrolling performance" agenda?

BenWa, Bas are out at a conference this week. Nick, since roc asked you to look at this in related bug 885913, can you dig into this bug while gfx guys are away? Perhaps there are some easy short-term wins to be had here to reach OSX-parity.

Comment 9

4 years ago
(In reply to Taras Glek (:taras) from comment #8)
> (In reply to Milan Sreckovic [:milan] from comment #7)
> > Bas, BenWa, can you see if this is low hanging fruit, or if we fit it into
> > the larger "scrolling performance" agenda?
> 
> BenWa, Bas are out at a conference this week. Nick, since roc asked you to
> look at this in related bug 885913, can you dig into this bug while gfx guys
> are away? Perhaps there are some easy short-term wins to be had here to reach
> OSX-parity.

Well, it's Friday in NZ, and I've got a day full of regressions and other high-profile bugs, so I'm not going to get to it before Bas and BenWa get back from SIGGRAPH. I can add it to my queue, but I am kind of snowed under right now; I haven't had a chance to look at bug 885913 yet, sorry.

Comment 10

4 years ago
We also have a related fluidity problem with animations, not just scrolling, so it MAY be useful to also throw in TestUFO as a test case for this specific problem, because of TestUFO's use of perfect VSYNC animations (in FF24+).

Some of you may be familiar with the new smooth-scrolling tests at TestUFO:
http://www.testufo.com
http://www.testufo.com/#test=framerates-marquee
http://www.testufo.com/#test=photo

However, these below tests can stutter even at framerate=Hz:
http://www.testufo.com/#test=mprt&thickness=2
http://www.testufo.com/#test=animation-time-graph&ppf=1

(Stuttering on these pages doesn't happen with IE10, PC/Mac Chrome, and Mac Safari)

Comment 11

4 years ago
I just noticed an interesting behavior. Scrolling is sometimes smoother (but with lots of pauses) during the first 10 seconds after launching Firefox and immediately loading the page from cache. Afterwards, scrolling starts to "miss a bunch of vsync".

Likewise for animations (on 120Hz monitor, maximized window):
http://www.testufo.com/#test=mprt&size=8
This plays smoothly at first (many vsync's caught), then starts to miss a lot of vsync's.

Comment 12

4 years ago
(In reply to Milan Sreckovic [:milan] from comment #7)
> Bas, BenWa, can you see if this is low hanging fruit, or if we fit it into
> the larger "scrolling performance" agenda?

This bug has a lot of interesting information but it's looking at a large problem (scrolling time, stddev) across several machines, runtime configurations and test pages. This makes it impossible to focus on any particular fixable issue. This information IS interesting and does point out that we need to improve scrolling performance.

I think what we need to do is take this information and spin it off into several different bugs that investigate each variable in isolation. Here's my suggestion for the first thing we should deep-dive into in its own bug:
* Why can't we scroll at 60FPS on a slow machine on sparse text?
I imagine if we improve this other things will follow as well.
Flags: needinfo?(bgirard)
(Reporter)

Comment 13

4 years ago
(In reply to Avi Halachmi (:avih) from comment #0)
>
> Slow System 1:
> --------------
> Win7 laptop, AMD E350+iGPU, cores: 2, ram: 4G, win-accel: no, d2d: no,
> sunspider1.0: 450ms
> 
> - Maximized (1366x704):
> Sparse text: Some missed frames. ASAP: 15ms   (stddev 2)
> Dense text:  Some missed frames. ASAP: 16.7ms (stddev 2)
> 
> - Maximized (1366x704) With layers.acceleration.force-enabled=true:
> Sparse text: Mostly smooth.     ASAP: 10ms   (stddev 2)
> Dense text:  Mostly smooth.     ASAP: 11ms   (stddev 1)
>
> 
> Slow system 2:
> --------------
> Win8 tablet, Atom z2760+iGPU, Cores: 2+HT, ram: 2G, win-accel: D3D9, d2d:
> no, sunspider1.0: 620ms:
> 
> - Maximized (1366x704):
> Sparse text: Never smooth.      ASAP: 18ms (stddev 2)
> Dense text:  Mostly not smooth. ASAP: 18ms (stddev 3.5)


(In reply to Benoit Girard (:BenWa) from comment #12)
>
> Here's
> my suggestion for the first thing we should deep-dive into in its own bug:
> * Why can't we scroll at 60FPS on a slow machine on sparse text?
> I imagine if we improve this other things will follow as well.

Your suggestion could still be interpreted as more than one case.

Slow System 1 (SS1) is barely fast enough without HW composition, and apparently fast enough for 60FPS with HW composition, yet isn't 100% smooth, while SS2 is just not fast enough even with HW composition (though D3D9).

Terminology wise, we should make a distinction IMO between "60 fps" and "100% smooth". The former means that we can iterate fast enough for 60fps, while the latter requires the former, but adds that we never miss vsync.

So "60 fps" should mean, IMO, that the throughput is high enough for 60fps, and that stddev is low enough considering the number of frames over which it's averaged (e.g. if some double/triple buffering is used, OMTC, etc, then it could mitigate occasional slower frames).
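
The distinction can be written down as a tiny classifier. This is a Python sketch of the terminology only: the function name and labels are made up for illustration, it ignores the stddev/buffering caveat above, and the missed-vsync counts in the usage lines are hypothetical since smoothness in this bug was assessed visually.

```python
VSYNC_MS = 1000.0 / 60.0  # assumption: 60 Hz display


def classify(asap_mean_ms, missed_vsyncs, period_ms=VSYNC_MS):
    """Apply the "60 fps" vs "100% smooth" terminology to one test run.

    asap_mean_ms:  average frame interval when iterating as fast as possible
    missed_vsyncs: number of missed vsyncs at the default refresh rate
    """
    if asap_mean_ms > period_ms:
        return "below 60 fps"             # throughput too low
    if missed_vsyncs > 0:
        return "60 fps, not 100% smooth"  # fast enough, but misses vsync
    return "100% smooth"


print(classify(10.0, 3))  # like SS1 with HW composition (missed count hypothetical)
print(classify(18.0, 0))  # like SS2 maximized: not fast enough regardless
```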

So your suggestion for the first spinoff is for 60fps throughput?

Comment 14

4 years ago
Yes. I'd like for our average case to be consistently below 15ms for sparse text on a slow machine. Ideally this is just a case of the OS throttling our paints which we can fix and improve our measurements but it likely isn't because 'Minimal width' would also get throttled. My vote is on investigating that first separately in a different bug.

Comment 15

4 years ago
(In reply to Benoit Girard (:BenWa) from comment #12)
> I think what we need to do is take this information and spin it off into
> several different bugs that investigate each variable in isolation. Here's
> my suggestion for the first thing we should deep-dive into in its own bug:
> * Why can't we scroll at 60FPS on a slow machine on sparse text?

Please start filing dependent bugs. I hear murmurs that people have an idea of many horrible things that cause this bug. Please get them in actionable form into bugs blocking this bug so we can start planning what to fix and what not to fix.

Comment 16

4 years ago
On my fast laptop, the test bookmarklet on Wikipedia gives me results like this:

Window size: 1473 x 778 (Actually close to 1920x1080, since I get a default zoom of 125%)
Average interval: 16.67 ms
STDDEV intervals: 0.70 ms
intervals histogram:
10.0 - 12.0 ms: 1 
14.0 - 15.9 ms: 9
15.9 - 17.3 ms: 282
17.3 - 18.0 ms: 7
22.0 - 32.0 ms: 1

It looks very smooth to me. Does this mean my laptop is just too fast? Or should I try to track down the causes of frames with delay > 17.3?
With ASAP enabled, average frame delay on my laptop is around 5.5ms, stdev 1.25 ms. Is it worth profiling that to try to get it down?
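
For reference, an interval histogram like the one quoted above can be produced with a few lines of Python. The bucket edges are taken from the output shown; that empty buckets are simply not printed, and that values outside the edges are dropped, are assumptions about the bookmarklet rather than something confirmed here.

```python
import bisect

# Bucket edges (ms) as they appear in the histogram output above.
EDGES_MS = [10.0, 12.0, 14.0, 15.9, 17.3, 18.0, 22.0, 32.0]


def histogram(intervals_ms, edges=EDGES_MS):
    """Return [(low, high, count)] for non-empty buckets [low, high)."""
    counts = [0] * (len(edges) - 1)
    for x in intervals_ms:
        i = bisect.bisect_right(edges, x) - 1
        if 0 <= i < len(counts):  # silently skip values outside the edges
            counts[i] += 1
    return [(edges[i], edges[i + 1], c) for i, c in enumerate(counts) if c]


# Hypothetical intervals, just to show the output shape.
for low, high, count in histogram([11.0, 16.7, 16.6, 17.5, 25.0]):
    print(f"{low} - {high} ms: {count}")
```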

I notice that the first run of the test is more jittery than running the test again on the same page, perhaps because of image decoding? Avi, did you do anything to put the browser into a consistent state between each run of your tests?
I think we might be able to write some code using DwmGetCompositionTimingInfo to explicitly detect missed frames and make that information available to tests. It seems like that would help here.
(Reporter)

Comment 19

4 years ago
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #16)
> ...
> It looks very smooth to me. Does this mean my laptop is just too fast? Or
> should I try to track down the causes of frames with delay > 17.3?

This is a very good distribution - which I could typically reproduce only on relatively light content on my fast system.

I'd imagine it would indeed look 100% smooth or close to it. I don't think > 17.3 should be tracked since it's normal to have some slightly longer frames, and as a result the next frame would be slightly quicker - typically still without missing vsync. The even distribution outside the central bucket also supports this.

> With ASAP enabled, average frame delay on my laptop is around 5.5ms, stdev
> 1.25 ms. Is it worth profiling that to try to get it down?

Again, very good result of both values. Seeing that such results are possible, maybe the investigation should be into why other fast systems can't get anywhere near those results?

> ... Avi, did you do anything to put the browser into a consistent state between
> each run of your tests?

Yes, typically I ran it on each system a few times in succession - to make sure the results are consistent, and if they were noisy, I made sure the one I registered is representative.

> I think we might be able to write some code ... to explicitly detect
> missed frames ...

Sounds useful indeed.
Depends on: 900785
(Reporter)

Comment 20

4 years ago
(In reply to Avi Halachmi (:avih) from comment #0)
>
> Fast System:
> ------------
> Win7 laptop, i7-3630qm+iGPU, cores: 4+HT, ram: 16G, win-accel: D3D10, d2d:
> yes, sunspider1.0: 120ms
>
> - Maximized (1920x1015):
> Sparse text: Smooth.             ASAP: 6ms   (stddev 1)
> Dense text:  Few missed frames.  ASAP: 6ms   (stddev 1)
> Wikipedia:   Mostly not smooth.  ASAP: 9.5ms (stddev 5)

This system also has an Nvidia gt650m (2G ram) gpu on board in an optimus configuration, though the numbers above were taken when firefox was running on the HD4000 intel gpu. The Nvidia control panel doesn't allow running Firefox on the nvidia gpu.

Maybe the differences to roc's system are due to optimus?

Comment 21

4 years ago
Created attachment 784822 [details]
FrameTimeTests.png

This is a screenshot of unsmooth fluidity observations, at various lengths of time spent inside requestAnimationFrame(). The yellow numbers are full framerates that do not look smooth, because of missed VSYNCs. You will observe that with Firefox, there are two zones of smoothness.

The benchmark used to record these numbers is:
http://www.testufo.com/#test=animation-time-graph&measure=rendering&busywait=#
Where # is number of milliseconds to wait in a busywait loop inside requestAnimationFrame().
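
To step through a range of busywait values, the test URLs can be generated rather than typed. A trivial Python sketch; the helper name and the 1..15 range are just for illustration, and the URL pattern is the one given above:

```python
BASE_URL = "http://www.testufo.com/#test=animation-time-graph&measure=rendering"


def busywait_url(ms):
    """Build the test URL for a given busy-wait duration in milliseconds."""
    if ms < 0:
        raise ValueError("busywait must be non-negative")
    return f"{BASE_URL}&busywait={ms}"


for ms in range(1, 16):
    print(busywait_url(ms))
```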

I suspect that the same problem is causing both the unsmooth scrolling, and also the unsmooth requestAnimationFrame animations.

Comment 22

4 years ago
Profile of janky scrolling on my Optimus laptop with an NVIDIA Quadro 1000M, with D3D10 layers acceleration enabled, D2D enabled. Note that DWM was disabled.

Relevant part: http://people.mozilla.com/~bgirard/cleopatra/#report=9f1a5fe5ff70cd1afab9e01e74bb392ed2b5031a

Full profile:  http://people.mozilla.com/~bgirard/cleopatra/#report=c47270087594e8fce6965e2ebb2189a7e2e43536

Comment 23

4 years ago
One interesting experiment is to try adding Thread.Sleep() into the scrolling routines, starting with 1 millisecond sleeps and slowly increasing by one millisecond until scrolling becomes fluid again (i.e. VSYNCs are now reliably always missed, making it look like perfect VSYNC). There's a strange "hole" of fluidity that occurs with Firefox.

In my screenshot, Opera 15 seems to stay perfectly fluid until frame delay 14ms (no missed VSYNC), Chrome 28 stays perfectly fluid until frame delay 11ms, while Firefox 24/25 can only stay fluid until about frame delay 6ms. (The 6ms number is remarkably similar to Avi's reports.)

Comment 24

4 years ago
Oops, change "One interesting experiment is to try adding Thread.Sleep() into the scrolling routines" 
Into: "One interesting experiment is to try adding Thread.Sleep() in one millisecond increments into the flip-buffer-to-display code, and test with the scrolling."

This is because there's a strange hole of perfect Firefox fluidity that re-occurs with frame delays of about 12ms or 13ms (but never with 7ms, 8ms, 9ms, 10ms, or 11ms).

Comment 25

4 years ago
Bug 900785 has patches to detect missed/dup frames with main-thread D3D10 compositing.

Avi tried them, and saw that we get into a pattern of repeated "missed frame, duplicate frame" pairs, and this correlated with unsmoothness. I also saw this once during my testing, but normally don't see it on my laptop.

To me that sounds like a problem with the vsync code.

Comment 26

4 years ago
That would coincide with my observations during animation as well, so it appears exactly this same problem is occurring with animations in rAF() -- see my file attachment. Beyond a certain point, every consecutive frame becomes a "missed frame", so it looks like perfect VSYNC again (the strange hole I was discussing).

(smooth) 1ms to 6ms late; frame on time for VSYNC 
(stutter) 7ms to 11ms late; random missed VSYNC, missed frame/duplicate frame behavior
(smooth) 12ms to 13ms late; always misses VSYNC, perfectly rounds to next VSYNC
(stutter) 14ms and up; stutters at increasingly lower framerates

This assumes small deviations (e.g. stddev of 1ms or less) since the strange hole is extremely small but appears easily explainable this way.  For experimentation with this, add an undocumented "&busywait=12" or "&busywait=13" to the end of http://www.testufo.com/#test=animation-time-graph&measure=rendering ... Change the value to 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15.  It adds a wait loop inside rAF(), allowing you to observe stutter behaviour during the horizontal scrolling of the graph.  Use a small browser window if your computer is slow.
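
The four zones listed above can be written as a simple lookup, which makes it easy to annotate a run of busywait values. This Python sketch only encodes the observations reported in this comment (Firefox 24/25 at 60 Hz); it is not a model of the underlying scheduling, and the names are made up.

```python
# (upper bound in ms, label), taken from the observed zones listed above.
ZONES = [
    (6, "smooth: frame on time for VSYNC"),
    (11, "stutter: random missed VSYNC"),
    (13, "smooth: always misses VSYNC, rounds to next VSYNC"),
]


def zone_for(delay_ms):
    """Return the observed behavior for a given time (ms) spent inside rAF()."""
    for upper, label in ZONES:
        if delay_ms <= upper:
            return label
    return "stutter: increasingly lower framerate"


for ms in range(1, 16):
    print(ms, zone_for(ms))
```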

I am not familiar enough with Mozilla code, but as a programmer with 20+ years of display fluidity experience (e.g. perfect smooth scrolling), I have a few questions:
-- Is this a bug at the compositing level? 
-- Is this a scheduling bug?  Timing code execution relative to the timing of VSYNC?   e.g. Scheduling the scroll processing too last-minute before VSYNC, scheduling the rAF() code execution too last-minute before VSYNC?  In general, rAF() should be assigned a full frame cycle of CPU processing time before VSYNC; it appears Opera 15 does animations the best of all the browsers, even better than Chrome (you can waste 14ms in rAF() during 60Hz (16.7ms) and animations still stay perfectly smooth).   Note that there's the "law of physics" issue of an input-lag versus fluidity tradeoff (I've noticed this in experiments in programming other computer software), so one may want to schedule the timing of scrolling separately from the scheduling the timing of processing rAF() script code.   
...Basically, the more last-minute you process things before VSYNC, the less delay you have between reading input (mouse) and actually seeing it on-screen. But you also have less time to process things before missing VSYNC. This is the "law-of-physics" tradeoff between perfect fluidity and low latency, and it is extremely challenging to balance on slow systems (allow more latency) versus fast systems. The easiest solution is to simply give everything a full frame of latency (16ms), which is not a problem for most people, but can become a problem when things add up (e.g. Windows Desktop compositing lag, computer monitor lag, etc.) and things start feeling sluggish.
-- I wonder if unsmoothness is a result of competing design decisions between input lag versus fluidity?
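
The tradeoff described above can be put into numbers with a very rough Python sketch. The assumptions are mine: work for each frame starts a fixed lead time before VSYNC, input is sampled at that moment, and a frame misses VSYNC whenever its processing cost exceeds the lead time. The frame costs are made up, and compositor or monitor lag is ignored.

```python
def evaluate_lead_time(frame_costs_ms, lead_ms):
    """Estimate input latency and missed-VSYNC fraction for a given lead time.

    lead_ms: how long before VSYNC the frame's processing starts.
    """
    missed = sum(1 for cost in frame_costs_ms if cost > lead_ms)
    return {
        "input_latency_ms": lead_ms,  # input is sampled when processing starts
        "missed_fraction": missed / len(frame_costs_ms),
    }


# Hypothetical per-frame processing costs in ms.
costs = [4, 5, 6, 9, 5, 12, 4, 7]

print(evaluate_lead_time(costs, lead_ms=6))   # low latency, but misses some frames
print(evaluate_lead_time(costs, lead_ms=16))  # full frame of latency, no misses
```

A short lead time keeps latency low but turns every slow frame into a missed VSYNC, while a full-frame lead time absorbs them at the cost of latency.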
(Reporter)

Comment 27

4 years ago
(In reply to Avi Halachmi (:avih) from comment #0)
>
> Fast System:
> ------------
> Win7 laptop, i7-3630qm+iGPU, cores: 4+HT, ram: 16G, win-accel: D3D10, d2d:
> yes, sunspider1.0: 120ms
>...
> Wikipedia:   Mostly not smooth.  ASAP: 9.5ms (stddev 5)


(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #16)
> On my fast laptop, the test bookmarklet on Wikipedia gives me results like
> this:
> 
> Window size: 1473 x 778 (Actually close to 1920x1080, since I get a default
> zoom of 125%)
> Average interval: 16.67 ms
> STDDEV intervals: 0.70 ms
> intervals histogram:
> 10.0 - 12.0 ms: 1 
> 14.0 - 15.9 ms: 9
> 15.9 - 17.3 ms: 282
> 17.3 - 18.0 ms: 7
> 22.0 - 32.0 ms: 1


Very interestingly, when I followed Bas' suggestion and tried the nvidia GPU directly (gt650m, external monitor via hdmi connected directly to the nvidia GPU, and I disabled the internal monitor such that only the external one is used), I got much better results on the same wikipedia page:

ASAP: average: 3.5 ms, stddev: 0.7ms .
60hz: 100% smooth

Intervals histogram:
14.0 - 15.9 ms: 16
15.9 - 17.3 ms: 260
17.3 - 18.0 ms: 25

And the scrolling, either in ASAP mode or in default refresh rate is _much_ less jerky than with the intel (HD4000) gpu.

I don't know if the culprit in my previous results is the nvidia optimus shim or the intel gpu/drivers. I did try to disable the nvidia gpu via the device manager, but got the same bad results.

Either way, it would probably be useful to understand where the major issue is, and possibly do something about it, since both optimus and intel-only systems are quite common AFAIK.

Comment 28

4 years ago
(In reply to Avi Halachmi (:avih) from comment #20)
> The Nvidia control panel doesn't allow running
> Firefox on the nvidia gpu.

I'm not sure when this changed, but I noticed it also.  You can get around it by copying firefox.exe to fx.exe and running that.  (They do detection by binary name.)

Comment 29

4 years ago
Sounds like we should open a bug on this; do we think it was on purpose?  Because Nvidia has the 3D video feature that only works with Firefox, so this doesn't sound right.
(Reporter)

Comment 30

4 years ago
Scroll test profiles with maximized window, using Nightly 2013-08-20.

- Windows (-> Firefox) DPI scaling set to 125%.
- The (average +- stddev) values were taken with the profiler disabled.
- Nightly has D3D10 and D2D enabled by default.

Intel: HD4000 of i7-3630qm, driver: 9.17.10.2932 (screen: 1920x1080)
Nvidia: GT650m (optimus), driver: 320.49 (2013-07-01) (screen: 1920x1200)


1. Intel, 60hz, wikipedia (16.8 +- 2):
http://people.mozilla.com/~bgirard/cleopatra/#report=0474e8bfe2a67798348fcfea54153dcd6166d118

2. Intel, ASAP, wikipedia (11 +- 5):
http://people.mozilla.com/~bgirard/cleopatra/#report=99e4837359a39ecd5443836f65300d6433a21263

3. Intel, 60hz, dense-text (16.67 +- 0.8):
http://people.mozilla.com/~bgirard/cleopatra/#report=2d8213e2bcdc51a2f597a25d20fa9e688335eb0f

4. Intel, ASAP, dense-text (10 +- 4.5):
http://people.mozilla.com/~bgirard/cleopatra/#report=bb9cfe260b18d8136ce7e19675df5937e05981c4


5. Nvidia, 60hz, wikipedia (16.68 +- 0.6):
http://people.mozilla.com/~bgirard/cleopatra/#report=4e12d5fab6706f8eae0529f86975e4fbd7821d57

6. Nvidia, ASAP, wikipedia (3.5 +- 0.8):
http://people.mozilla.com/~bgirard/cleopatra/#report=8392870e5ca71ef23e56f2ea3a5322412a0c2819

7. Nvidia, 60hz, dense-text (16.68 +- 0.6):
http://people.mozilla.com/~bgirard/cleopatra/#report=2af2fe74ecce01727b75db76a8311150b9cc1035

8. Nvidia, ASAP, dense-text (2.7 +- 0.7):
http://people.mozilla.com/~bgirard/cleopatra/#report=cea0e25314d04eb02d5a01e9a0271352301236eb
Tracking this as one of the "tiling on desktop" goals.
Flags: needinfo?(bas)
(Reporter)

Comment 32

3 years ago
Just a summary of the findings in this bug:
- We have a rendering throughput issue on some systems.
- We have an intervals consistency issue on some systems.
- We're not always able to hit vsync even if the throughput and variance seems good enough.
- The issues are much(!) worse with Intel GPUs than with nvidia.
- On OS X OMTC brings optimal gains (2x throughput), but not on windows (as of 2013-08).
- IE is able to perform perfectly and miles ahead of us even on very low end systems with Intel GPU.

Comment 33

3 years ago
Comparing the ASAP mode wikipedia profiles, there doesn't appear to be much difference percentage wise between the two cards.

Both profiles are completely dominated by surface allocation happening with CreateSamplingRestrictedDrawable. The intel card spends ~35% of painting time doing this, and nvidia spends ~50%. The intel chip spends a lot more time within Flush once we've finished creating our drawable, bringing the total time spent within CSRD to ~50% and 59% for intel and nvidia respectively.

The next biggest is drawing text, ~30% vs ~20%.

Nothing really stands out as being particularly awful in the intel case; it looks pretty awful for both and the intel card is just slower.

We have plans to avoid CSRD, and I think we will see some big scrolling wins once that happens.

The biggest chunk of CSRD time is within DrawBorderImage, so I'm assuming this is source clipping rather than tiling.
(Reporter)

Comment 34

3 years ago
Just adding another data point: In a quick test I performed, OMTC improved scrolling also on Intel GPUs. This aligns with comment 33 that the Intel GPUs are slower but otherwise we don't do anything particularly bad with Intel.

I tested with the bookmarklet from comment 0, on the Firefox wikipedia page, with Nightly 2014-06-03, on Windows 8.1 32-bit on an Asus T100 laptop (Bay Trail Atom z3740):

Without ASAP, both performed around 16.7 ms/frame on average.

With ASAP:
Without OMTC: ~13.5 ms/frame, stddev ~2
With    OMTC: ~10.5 ms/frame, stddev ~3
(Reporter)

Comment 35

3 years ago
Adding yet some more info. Also posted to mozilla.dev.platform mailing list at https://groups.google.com/d/msg/mozilla.dev.platform/7RyyPmvN6ds/5b99Eu1Cu78J

A Windows update from a few days ago included a new Intel driver for me (10.18.10.3621 - 2014-05-16), which improved our rendering performance considerably with Intel iGPUs, by roughly 2x-3x.

The improvement is very clearly felt with all recent versions of Firefox (noticed with Firefox 30 and nightly 33).

Examples of how this improvement is measured for us after only updating the driver and keeping everything else the same:


TART numbers (average frame intervals) improved 2.5x-3x both with and without OMTC (OMTC gained a bit more than without OMTC), both with nightly 2014-05-29 and 2014-06-16:

Old driver (few months old):
OMTC with ASAP:
iconFade-close-DPIcurrent.all   Average (5): 18.47 stddev: 0.46
iconFade-open-DPIcurrent.all    Average (5): 10.08 stddev: 0.46 

New Driver (10.18.10.3621):
OMTC with ASAP:
iconFade-close-DPIcurrent.all   Average (5): 4.80 stddev: 0.23
iconFade-open-DPIcurrent.all    Average (5): 3.32 stddev: 0.05


Scroll-test performance (this bug): I haven't retested on any of the systems tested here previously, but I tested on a similar system with Windows 8.1 and an i7-4500u (ultra low voltage Haswell i7 with HD4400 iGPU):

Wikipedia Firefox page, full screen 1920x1080:

OMTC:
Average interval: 5.08 ms
STDDEV intervals: 1.02 ms

OMTC:off
Average interval: 8.19 ms
STDDEV intervals: 1.22 ms

While these numbers are not at the level of the nvidia numbers from comment 30, they're about twice as good as the Intel numbers from the same comment, on a CPU that is about half as powerful and with a similar iGPU.

I think these are very good news.
These are very good news.  I think we owe (insert beverage of choice) to Intel driver engineers.
This is great news!

(In reply to Avi Halachmi (:avih) from comment #32)
> Just a summary of the findings in this bug:
> - We have a rendering throughput issue on some systems.
> - We have an intervals consistency issue on some systems.
> - We're not always able to hit vsync even if the throughput and variance
> seems good enough.

Are the three above still an issue?

> - The issues are much(!) worse with Intel GPUs than with nvidia.

So this should be fixed.

> - On OS X OMTC brings optimal gains (2x throughput), but not on windows (as
> of 2013-08).

Fixed?

> - IE is able to perform perfectly and miles ahead of us even on very low end
> systems with Intel GPU.

And this last one I presume is fixed.
(Reporter)

Comment 38

3 years ago
(In reply to Jared Wein [:jaws] (please needinfo? me) from comment #37)
> This is great news!

Aye! I think I also forgot to mention that customize animation is _considerably_ better with this driver than with my older driver. Didn't try to measure it, and the numbers won't show on our talos tests (because all our test systems have Ion nvidia GPUs AFAIK - I'm still trying to change that), but I definitely tried it locally, and the improvement is fantastic.

> > Just a summary of the findings in this bug:
> > - We have a rendering throughput issue on some systems.
> > - We have an intervals consistency issue on some systems.
> > - We're not always able to hit vsync even if the throughput and variance
> > seems good enough.
> 
> Are the three above still an issue?

I think the first two should be covered with the new Intel drivers. stddev of the intervals went down considerably as well, as did the intervals themselves. I think these are now comparable, even if still slower, to our performance with nvidia systems (which is quite good).

As for the vsync issue, it's a different matter. Obviously lower intervals and lower variation help (a lot), but the root cause is that right now we're not really blocking on vsync on windows, so instead we do a "soft sync" and try to match the vsync intervals with our timers, which can't be 100% dependable. OMTC would hopefully get to absolute vsync syncing (OSX already has such blocking). In due time.

> > - The issues are much(!) worse with Intel GPUs than with nvidia.
> 
> So this should be fixed.

Yes. Now it's just not as good ;) assuming most Intel iGPUs got the same benefit as I got with the new drivers. Vladan kicked off a small project to analyze our tab animation telemetry numbers by GPU, and we're hoping to get some good insights when the numbers start flowing in.

> > - On OS X OMTC brings optimal gains (2x throughput), but not on windows (as
> > of 2013-08).
> 
> Fixed?

It's better than before, but I don't think it's as evident as with OS X - as you can see at the numbers on comment 35, OMTC on windows performs about 30% better than without OMTC.

This is a considerable win in my book, and I'd consider the fantastic 2x gain on OS X bordering unbelievable. I won't mind if it improves even more on windows, but I'd consider it as task achieved for now. (tiling support for OMTC should be the next meaningful step).

> > - IE is able to perform perfectly and miles ahead of us even on very low end
> > systems with Intel GPU.
> 
> And this last one I presume is fixed.

No. IE still does this practically perfectly, which we can't say of Gecko.
(Reporter)

Comment 39

3 years ago
(In reply to Avi Halachmi (:avih) from comment #38)
> > > - On OS X OMTC brings optimal gains (2x throughput), but not on windows (as
> > > of 2013-08).
> > 
> > Fixed?
> 
> It's better than before, but I don't think it's as evident as with OS X - as
> you can see at the numbers on comment 35, OMTC on windows performs about 30%
> better than without OMTC.

Just to complete the picture - the above is for scrolling.

For tab animation, while OMTC numbers got down to literally good numbers, the non-OMTC tab animation numbers got down to downright unbelievable with this new driver (2-3ms/frame during tab animation).

So the margin shrank slightly, but OMTC still does about 20% worse on tab open, and about 150% worse with tab close animation - compared to without OMTC.

Comment 40

3 years ago
(In reply to Avi Halachmi (:avih) from comment #38)
> > > - The issues are much(!) worse with Intel GPUs than with nvidia.
> > 
> > So this should be fixed.
> 
> Yes. Now it's just not as good ;) assuming most Intel iGPUs got the same
> benefit as I got with the new drivers. Vladan kicked of a small project to
> analyze our tab animation telemetry numbers by GPU, and we're hoping to get
> some good insights when the numbers start flowing in.

I just wanted to note that whatever Intel did in those drivers, the first-gen Core-CPUs (2010-2011) with IntelHD have been EOL'd last year and haven't got a new driver since February '13, and thus won't get any of these improvements.
(Reporter)

Comment 41

3 years ago
(In reply to Avi Halachmi (:avih) from comment #39)
> ... and [OMTC is] about 150% worse with tab close animation - compared to without
> OMTC.

I think this specific issue of tab close animation is worth some closer examination. Matt?

(pardon the off-topic-ish-ness of this, but with the new Intel drivers mentioned here it's now become practical to approach this issue more clearly).
Flags: needinfo?(matt.woodrow)
(Reporter)

Comment 42

3 years ago
(In reply to Elbart from comment #40)
> ... note that whatever Intel did in those drivers, the
> first-gen Core-CPUs (2010-2011) with IntelHD have been EOL'd last year and
> haven't got a new driver since February '13, and thus won't get any of these
> improvements.

That's some good info for the telemetry GPU analysis project mentioned in comment 38. Thanks, I wasn't aware of this. Might be worth finding some release notes of this driver and see what changed that made such a meaningful impact.
(In reply to Avi Halachmi (:avih) from comment #41)
> (In reply to Avi Halachmi (:avih) from comment #39)
> > ... and [OMTC is] about 150% worse with tab close animation - compared to without
> > OMTC.
> 
> I think this specific issue of tab close animation is worth some closer
> examination. Matt?
> 
> (pardon the off-topic-ish-ness of this, but with the new Intel drivers
> mentioned here it's now become practical to approach this issue more
> clearly).

It's interesting that you got such a big change with new drivers. I have the 2011 drivers with my HD3000, and get numbers similar to your latest ones.

I've already profiled the tab animations on my system and have landed patches for bug 1024643 and bug 940845 which fix it.
Flags: needinfo?(matt.woodrow)
(Reporter)

Comment 44

3 years ago
(In reply to Matt Woodrow (:mattwoodrow) from comment #43)
> ...
> I've already profiled the tab animations on my system and have landed
> patches for bug 1024643 and bug 940845 which fix it.

Thanks.

I think one of those landed between the builds I've tested at comment 35, and yet the results between those builds were not meaningfully different IIRC.

Adding a note to self to repeat the test from comment 35 with a new nightly with both of these bugs landed and check how much they affected perf.
Flags: needinfo?(avihpit)
Depends on: 1031011
(In reply to Avi Halachmi (:avih) from comment #38)
> Assuming most Intel iGPUs got the same
> benefit as I got with the new drivers. Vladan kicked of a small project to
> analyze our tab animation telemetry numbers by GPU, and we're hoping to get
> some good insights when the numbers start flowing in.

It looks like data from Telemetry is confirming an improvement in the tab animation performance (FX_TAB_ANIM_ANY_FRAME_INTERVAL_MS).

I see ~23.5 ms for the 9.17.10.2932 drivers on a HD4000, while only about ~19.5ms for the new 10.18.10.3621 drivers. The difference is statistically significant and the effect size is big enough that we can say that there is a real tangible improvement.

Comment 46

3 years ago
http://www.duckware.com/test/chrome/jerky3.html very easily and graphically shows that FF32 is not synchronized to VSYNC.
(Reporter)

Comment 47

3 years ago
(In reply to jerryj from comment #46)
> http://www.duckware.com/test/chrome/jerky3.html very easily and graphically
> shows that FF32 is not synchronized to VSYNC.

If you use a new profile you'll see that it actually is vsync'ed most of the time on windows.

The exceptions are:
- Your system should be fast enough for Firefox to iterate at your monitor refresh rate.
- It's only vsync'ed to your main display (if you have more than one).
- Background "stuff" can affect it (like background gmail tab, internal cleanups, etc).
Flags: needinfo?(avihpit)

Comment 48

3 years ago
(In reply to Avi Halachmi (:avih) from comment #47)

- fast enough: yes.  setting layout.frame_rate to zero results in 160fps and a MUCH smoother display with NO jerks!
- #monitor: one
- background: nothing else open

with layout.frame_rate set to -1, I get the expected frame rate that matches the vsync refresh rate (so the frames are being generated; and the graph suggests generated 'on time'), but the display is horribly jerky and the test I have provided above (jerky3.html) proves that the problem/cause is that not all generated frames are being presented to the display.  Run the same test with IE, and I get a perfectly smooth display.

Just saying, FF has a bug somewhere...
(Reporter)

Comment 49

3 years ago
(In reply to jerryj from comment #48)
> ...
> Just saying, FF has a bug somewhere...

Yeah, current vsync on windows is not perfect. Also, it's affected by more than just background tabs, e.g. also garbage collection and other Firefox internal stuff which happens on the main thread.

Also, the current implementation uses "soft vsync" where it's not tied directly to the vsync signal but rather uses timeouts to the estimated next vsync, which is less than optimal.
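
To illustrate what "timeouts to the estimated next vsync" means, here is a minimal Python sketch. It is not Gecko's code: the function name and the idea of extrapolating from a single reference vsync timestamp are assumptions for illustration only.

```python
import math

def delay_to_next_vsync(now_ms, last_vsync_ms, interval_ms):
    """Estimate how long to wait until the next vsync.

    The next vsync is extrapolated from one known vsync timestamp and the
    nominal refresh interval. Nothing here observes the real vsync signal.
    """
    elapsed = now_ms - last_vsync_ms
    periods = math.floor(elapsed / interval_ms) + 1
    next_vsync_ms = last_vsync_ms + periods * interval_ms
    return next_vsync_ms - now_ms

# Hypothetical values: reference vsync at t=0 ms, 16 ms refresh interval, now t=20 ms.
print(delay_to_next_vsync(20.0, 0.0, 16.0))
```

Any error in the reference timestamp or the assumed interval, and any delay in the timer firing, shifts the frame relative to the real vsync, which is why this approach can't be 100% dependable.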

Hopefully bug 987532 will make the implementation tighter and more robust.

Comment 50

3 years ago
(In reply to jerryj from comment #48)
> avihpit(In reply to Avi Halachmi (:avih) from comment #47)
> 
> - fast enough: yes.  setting layout.frame_rate to zero results in 160fps and
> a MUCH smoother display with NO jerks!
> - #monitor: one
> - background: nothing else open
> 
> with layout.frame_rate set to -1, I get the expected frame rate that matches
> the vsync refresh rate (so the frames are being generated; and the graph
> suggests generated 'on time'), but the display is horribly jerky and the
> test I have provided above (jerky3.html) proves that the problem/cause is
> that not all generated frames are being presented to the display.  Run the
> same test with IE, and I get a perfectly smooth display.
> 
> Just saying, FF has a bug somewhere...

How can you tell that Firefox doesn't present every frame? On my system there's little difference between Chrome 38, IE11 and Firefox 32. IE11 is more stable, but even Firefox and Chrome seem good enough, and the graphs stay blue and on time.

Comment 51

3 years ago
On http://www.duckware.com/test/chrome/jerky3.html, let the test run for 10-20 seconds to stabilize. Then, do you see any red/cyan inside the "VSYNC synchronized indicator"? If so, that proves that not every (generated) frame is being presented within a unique VSYNC refresh interval.

The graphs measure the software generation side of things (which is great in most web browsers today; 60fps is not a problem given that, unbounded, they can handle hundreds of frames). The "VSYNC synchronized indicator" helps to measure how well (or not) those frames are synced with VSYNC (all web browsers have some issues with this).
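
As a rough illustration of the "one frame per unique refresh interval" criterion (this is not how jerky3.html works internally, which I haven't described here; the function name and the sample timestamps are made up), a Python sketch that assumes presentation timestamps are available:

```python
import math

def vsync_slot_report(present_times_ms, interval_ms):
    """Map each presented frame to a refresh slot and count collisions and gaps."""
    slots = [math.floor(t / interval_ms) for t in present_times_ms]
    pairs = list(zip(slots, slots[1:]))
    # Two consecutive frames in the same slot: one of them is never seen.
    shared = sum(1 for a, b in pairs if b == a)
    # A slot with no new frame: the previous frame is shown twice.
    skipped = sum(b - a - 1 for a, b in pairs if b > a + 1)
    return {"shared_slots": shared, "skipped_slots": skipped}

# Made-up timestamps with a 16 ms refresh interval.
print(vsync_slot_report([1, 17, 33, 47, 65], 16))
```

Software-side frame intervals can look fine on average while this kind of report still shows shared or skipped slots.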
(Reporter)

Comment 52

3 years ago
Guys, we _know_ the windows vsync implementation is not perfect. It landed at bug 856427 as a stop-gap till a better implementation emerges, and on that bug you'll also find extensive discussions, measurements and tests.

We're not going to improve the current implementation, but rather replace it with the one from project silk. The vsync from project silk is also better suited to work across platforms (with platform-specific vsync dispatcher), so that's where we're going.

Comment 54

3 years ago
(In reply to Avi Halachmi (:avih) from comment #52)
> Guys, we _know_ the windows vsync implementation is not perfect. It landed
> at bug 856427 as a stop-gap till a better implementation emerges, and on
> that bug you'll also find extensive discussions, measurements and tests.

Avi, what tests are you referring to that are able to test for proper vsync synchronization on the display side of things?  I see lots of tests that time the software rendering side of things, but that is not at issue.  The animation callback is being called on time, frames are being generated in no time -- so frames are being rendered great.  But the rendered frames are not all making it to the display (synced with vsync so each unique frame lands in the proper vsync refresh interval).  Where is a test for that?
Flags: needinfo?(avihpit)
(Reporter)

Comment 55

3 years ago
We didn't test "when it makes it to the display and whether or not it looks vsync-ed" automatically because it's impossible to measure this without a video camera recording to analyze (which is kinda what the Eideticker project does, but it only runs on mobile and AFAIK is not fully integrated into our testing frameworks).

So the tests indeed measure the timing as far as the software could see it.

Of course, ultimately what we want is the perceived/observed smoothness, but the "software timing" tests do provide meaningful and useful metrics which could help us with this ultimate goal.

On top of that, the animations were observed to assess the smoothness subjectively, as you could see at comment 0 (this bug) and few others.

So yes, software timing is useful (especially when it's broken), but ultimately we should observe and assess smoothness visually, either subjectively, or with a video camera recording analysis, or with tools which help in evaluating the smoothness - like the one you posted at comment 51.
Flags: needinfo?(avihpit)
(Reporter)

Comment 56

2 years ago
(In reply to Avi Halachmi (:avih) from comment #34)
> ...
> I tested with the bookmarklet from comment 0, on the Firefox wikipedia page,
> with Nightly 2014-06-03, on a Windows 8.1 32b with Asus T100 laptop (Bay
> Trail Atom z3740):
> 
> Without ASAP, both performed around 16.7 ms/frame on average.
> 
> With ASAP:
> Without OMTC: ~13.5 ms/frame, stddev ~2
> With    OMTC: ~10.5 ms/frame, stddev ~3


It's been a while since I've updated this thread, so now with e10s and silk enabled by default, it's time to get some more numbers.

Using:
- same Asus T100 system but with newer drivers (though they don't seem to improve performance much, if at all, beyond those mentioned at comment 35).
- same bookmarklet
- same wikipedia page (possibly modified since then).
- Nightly 2015-03-20.
- e10s enabled.
- Silk enabled (all 3 prefs, including gfx.vsync.refreshdriver which is still disabled by default).

With ASAP:     ~7   ms/frame, stddev ~1.7 ms
Without ASAP: ~17.1 ms/frame, stddev ~2   ms <- visibly drops 2-3 frames/s

intervals histogram:
10.0 - 12.0 ms: 2
12.0 - 14.0 ms: 15
14.0 - 15.9 ms: 31
15.9 - 17.3 ms: 176
17.3 - 18.0 ms: 15
18.0 - 22.0 ms: 37
22.0 - 32.0 ms: 15
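
For reference, a minimal Python sketch of how such a histogram can be built and summarized. This is not the bookmarklet's code; the function names are mine, and only the bucket edges and counts are taken from the listing above.

```python
from bisect import bisect_right

# Bucket edges in ms, copied from the histogram above.
EDGES = [10.0, 12.0, 14.0, 15.9, 17.3, 18.0, 22.0, 32.0]

def interval_histogram(intervals_ms, edges=EDGES):
    """Count intervals per [edges[i], edges[i+1]) bucket; out-of-range values are ignored."""
    counts = [0] * (len(edges) - 1)
    for value in intervals_ms:
        index = bisect_right(edges, value) - 1
        if 0 <= index < len(counts):
            counts[index] += 1
    return counts

def bucket_share(counts, bucket):
    """Fraction of all counted intervals that fall into one bucket."""
    total = sum(counts)
    return counts[bucket] / total if total else 0.0

# Bucket counts reported above for the run without ASAP.
reported = [2, 15, 31, 176, 15, 37, 15]
# Bucket 3 is 15.9 - 17.3 ms, i.e. roughly one 60 Hz refresh period.
print(f"{bucket_share(reported, 3):.1%} of intervals in the 15.9 - 17.3 ms bucket")
```

The share in the 15.9 - 17.3 ms bucket is 176 of 291 intervals for this run.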

However, without ASAP, when I scroll using the touchpad (two fingers), it looks super smooth.

And very interestingly, when I run the same bookmarklet right after scrolling with the touchpad, I get these stats:

Without ASAP: 16.68 ms/frame, stddev  0.12 ms <- looks 100% perfect.

intervals histogram:
14.0 - 15.9 ms: 2
15.9 - 17.3 ms: 295
17.3 - 18.0 ms: 2

It will run once or twice and still produce these perfect results, and some time later (~30s?) it will drop to less smooth. Use the touchpad to scroll again -> again super smooth and again the bookmarklet scrolls perfectly too.

I thought that maybe the touchpad scroll somehow increases the system timer resolution to 1ms (timeBeginPeriod(1)), so I checked the Windows timer resolution using clockres from Sysinternals. Surprisingly, it was always fixed at 1ms, both when it was scrolling perfectly and when it wasn't.

No conclusion yet, but it seems like some "perfect timing" mode hides someplace.

Interestingly, I connected this tablet to a 1920x1080 display (instead of its internal 1366x768), and was able to reproduce identical results - Firefox scrolls the wikipedia page perfectly (literally - not a single dropped frame) on full HD screen, even on battery and in power saving mode, on an Intel Atom SoC with 2W SDP (this term apparently replaces TDP). During the scroll the CPU+iGPU power usage didn't go over 1W (using Intel's power gadget).

However, like with the tablet's screen, this only happens when scrolling using the touchpad, or using the bookmarklet for some seconds after using the touchpad scroll.

Using mouse wheel scroll instead of touchpad scroll before testing with the bookmarklet, or using the KB arrows - doesn't help. It only gets smooth after using the touchpad.

I also tried to observe the other CPU parameters during the scroll, and noticed that the CPU frequency was lower when it was not smooth (~500 - 1200MHz while not smooth, ~1800MHz while smooth).

Still interestingly, on both cases the CPU+GPU package was using ~1W, despite the fact that it was on higher frequency when it was smooth.
There was some discussion of an issue that sounds similar starting here: http://forums.mozillazine.org/viewtopic.php?f=23&t=2845451&start=75#p14037665

Further down [1], he reported that disabling all C-states in the BIOS got it working with the expected smoothness.

I wasn't sure what to conclude except that somewhere in the chain, something is falling back to a lower resolution timer or timestamp (perhaps the C-states are causing some fluctuations in the TSC that are causing us to stop trusting it and fall back to something more coarse).

[1] http://forums.mozillazine.org/viewtopic.php?f=23&t=2845451&start=75#p14040851
(Reporter)

Comment 58

2 years ago
Adding some more data points to make comparison more relevant.

Note that when ASAP is enabled, Silk is not relevant, and timing wise it should be performing identically with or without Silk (bug 1128690 makes sure of it).

2014-06 (comment 34)
> With ASAP:
> With    OMTC:
>   ~10.5 ms/frame, stddev ~3

2015-03 (comment 56)
> OMTC + ASAP + e10s:
>   ~7 ms/frame, stddev ~2

Some more data:

OMTC + ASAP + e10s + after touchpad scroll:
  6 ms/frame, stddev ~1 ms

OMTC + ASAP + no-e10s - regardless if after touchpad scroll or not:
  ~10.5 ms/frame, stddev 2 ms


Note that the OMTC + ASAP + no-e10s results look similar to those from 2014, but adding e10s into the mix seems to improve them considerably. This is also apparent from Talos e10s scroll results (tp5o_scroll - see bug 1144120 for some numbers).
(Reporter)

Comment 59

2 years ago
(In reply to Avi Halachmi (:avih) from comment #56)
> ...
> I also tried to observe the other CPU parameters during the scroll, and
> noticed that the CPU frequency was lower when it was not smooth (~500 -
> 1200MHz while not smooth, ~1800MHz while smooth).
> 
> Still interestingly, on both cases the CPU+GPU package was using ~1W,
> despite the fact that it was on higher frequency when it was smooth.

(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #57)
> ...
> Further down [1], he reported that disabling all C-states in the BIOS got it
> working with the expected smoothness.


I tested this theory further on a different system: Windows 8.1 64, i7-4500u (HD4400). That's a 15W TDP CPU+iGPU which typically idles at ~3.5W.

I tested scrolling using Firefox 33.0.3 and also Nightly 2015-03-20, in "balanced" power mode (where the minimum CPU State is set to "5%") and also when I changed the minimum state to 100% (which indeed increased the CPU frequency immediately):

- In all cases scrolling used ~12W power.
- On balanced mode it idled at ~3.5W, while in "min 100%" it idled around 4+W.
- Scrolling was noticeably smoother in "min 100%" mode on both browsers.

I'm now thinking that artificially increasing the CPU power state while animating/scrolling/etc (for instance, turning it on if we render 3 consecutive different frames, and turning it off after 1000 ms of "non animation") could have a very positive effect on performance without any negative effect I could think of.
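
A minimal Python sketch of that on/off heuristic, just to make the idea concrete. The class and its names are hypothetical, it only decides when a boost would be wanted, and it does not call any power-management API.

```python
class AnimationBoostTracker:
    """Hypothetical helper: decides when a "high power" hint should be active."""

    def __init__(self, frames_to_enable=3, idle_ms_to_disable=1000):
        self.frames_to_enable = frames_to_enable
        self.idle_ms_to_disable = idle_ms_to_disable
        self.consecutive_changed = 0
        self.last_changed_ms = None
        self.boosted = False

    def on_frame(self, now_ms, frame_changed):
        """Record one frame (or timer tick) and return whether the boost should be on."""
        if frame_changed:
            self.consecutive_changed += 1
            self.last_changed_ms = now_ms
            if self.consecutive_changed >= self.frames_to_enable:
                self.boosted = True
        else:
            self.consecutive_changed = 0
        # Turn the boost off after a long enough stretch without a changed frame.
        if (self.boosted and self.last_changed_ms is not None
                and now_ms - self.last_changed_ms >= self.idle_ms_to_disable):
            self.boosted = False
        return self.boosted


tracker = AnimationBoostTracker()
for t in (0, 16, 32):
    state = tracker.on_frame(t, frame_changed=True)
print(state)  # boost requested after 3 consecutive changed frames
```

When nothing is being rendered, a periodic timer would have to call on_frame with frame_changed=False so that the 1000 ms timeout can expire.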

Lower power states are supposedly used for reducing power draw, but it doesn't seem that Firefox uses more power while animating in high power states. But it still performs visibly (and measurably) much smoother.
(Reporter)

Comment 60

2 years ago
(In reply to Avi Halachmi (:avih) from comment #59)
> I tested scrolling using Firefox 33.0.3 and also Nightly 2015-03-20 ...

36.0.3

Apologies for the spam.
(Reporter)

Comment 61

2 years ago
(In reply to Avi Halachmi (:avih) from comment #56)
> ...
> It will run once or twice and still produce these perfect results, and some
> time later (~30s?) it will drop to less smooth. Use the touchpad to scroll
> again -> again super smooth and again the bookmarklet scroll perfectly too.
> ...
> I also tried to observe the other CPU parameters during the scroll, and
> noticed that the CPU frequency was lower when it was not smooth (~500 -
> 1200MHz while not smooth, ~1800MHz while smooth).
> 
> Still interestingly, on both cases the CPU+GPU package was using ~1W,
> despite the fact that it was on higher frequency when it was smooth.


Just to have it for reference, I found exactly what the touchpad scroll does: it increases the "minimum processor state" for the current power scheme to 100%, and restores its original value 20 seconds later.

On this system (Asus T100 with an Intel Atom CPU), the min/max processor states are not available in the power config options (OEMs can hide them, but a reg key can restore the UI), so I wrote a small program which reads the current scheme's minimum using the APIs PowerReadACValueIndex/PowerReadDCValueIndex (one for AC, one for battery) with GUID_PROCESSOR_SETTINGS_SUBGROUP/GUID_PROCESSOR_THROTTLE_MINIMUM.

Both reading this value and modifying it (yet another small piece of code) were successful according to the API return codes and also when observing the CPU frequency and power draw. This confirmed what the touchpad scroll was modifying.

When the minimum is set to 100%, the power draw for idle increases but not by much: on the Atom from 0.1W to 0.2W, on i7-4500U from 3.5W to ~4W.

This also seems much less intrusive than using GUID_PROCESSOR_SETTINGS_SUBGROUP/GUID_PROCESSOR_IDLE_DISABLE. The latter is only 1 bit and seems to really unleash everything such that the CPU almost reaches its TDP even on idle and the frequency goes to "turbo" (above the default frequency) as long as the thermal envelope allows.

I don't think Firefox should modify the user's power scheme parameters, especially when they're hidden and the user can't see that it's modified (though "restore scheme defaults" does work). But that's what the touchpad scroll does anyway.

FYI.
(In reply to Avi Halachmi (:avih) from comment #61)
> I don't think Firefox should modify the user's power scheme parameters,
> especially when they're hidden and the user can't see that it's modified
> (though "restore scheme defaults" does work).

It does sound like a pretty invasive thing to do, but to play devil's advocate for a moment: is it really so different from calling timeBeginPeriod/timeEndPeriod (on Windows), which we already do*? These calls can also seriously impact battery life, but it's a tradeoff we're willing to make for smooth animation.

* and there have been bugs where an unmatched pair keeps the system at its highest granularity until Firefox is closed
(Reporter)

Comment 63

2 years ago
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #62)
> ... is it really so different from calling
> timeBeginPeriod/timeEndPeriod (on Windows), which we already do* ...
> 
> * and there have been bugs where an unmatched pair keeps the system at its
> highest granularity until Firefox is closed

Actually it is quite a bit worse.

I assumed the existence of "restore default" which I've used in the past, but apparently for the specific power scheme I was at ("Power Saver"), restore default was disabled, and the minimum processor state for this config stayed after reboot (when I modified it myself), such that it never gets back by itself (and also the touchpad scroll now will never restore it since it restores the original value it recorded when the scroll started).

Apparently this specific scheme was not part of windows' defaults (hence probably the disable restore defaults option), so "powercfg -restoredefaultschemes" actually removed it. Probably/hopefully reinstalling one of the drivers will bring it back again, possibly even the touchpad driver.

Anyway, it can be irreversible especially if it's a custom scheme and especially if the minimum processor state is hidden by default by the OEM on that system.

Sounds too much for my taste TBH.

Updated

a year ago
See Also: → bug 911584, bug 995728