Closed Bug 606672 Opened 14 years ago Closed 5 years ago

Fennec scrolling code is too expensive.

Tracking

(Not tracked)

Status:

RESOLVED WONTFIX

People

(Reporter: romaxa, Unassigned)

References

(Depends on 1 open bug)

Details

Attachments

(21 files, 2 obsolete files)

Flat oprofile 14 years ago Oleg Romashin (:romaxa) 200.49 KB, text/plain		Details
Callgraph profile 14 years ago Oleg Romashin (:romaxa) 1.74 MB, text/plain		Details
Scroll test 14 years ago Oleg Romashin (:romaxa) 2.15 KB, patch		Details \| Diff \| Splinter Review
trace log TMFLAGS=aborts (x86) 14 years ago Oleg Romashin (:romaxa) 1.18 KB, text/plain		Details
profile mjit.chrome = false 14 years ago Oleg Romashin (:romaxa) 208.99 KB, text/plain		Details
trace log TMFLAGS=aborts (arm) 14 years ago Oleg Romashin (:romaxa) 8.77 KB, text/plain		Details
trace log TMFLAGS=full (arm) 14 years ago Oleg Romashin (:romaxa) 466.88 KB, text/plain		Details
Backtrace to jsInvoke, script->filename/lineno 14 years ago Oleg Romashin (:romaxa) 7.02 KB, text/plain		Details
oprofile with mjit enabled (manual scrolling) 14 years ago Oleg Romashin (:romaxa) 215.73 KB, text/plain		Details
No mjit manual scrolling 14 years ago Oleg Romashin (:romaxa) 211.10 KB, text/plain		Details
Scrolling pipeline optimize part1 14 years ago Oleg Romashin (:romaxa) 1.50 KB, patch	mfinkle : review+	Details \| Diff \| Splinter Review
Part 2 14 years ago Oleg Romashin (:romaxa) 2.92 KB, patch	mfinkle : review+	Details \| Diff \| Splinter Review
Combined patch, with header 14 years ago Oleg Romashin (:romaxa) 4.25 KB, patch		Details \| Diff \| Splinter Review
Remove getBoundingClientRect from scrollBy 14 years ago Oleg Romashin (:romaxa) 1.31 KB, patch	mfinkle : review+	Details \| Diff \| Splinter Review
Another version of getBoundingClientRect remove from mousemove 14 years ago Oleg Romashin (:romaxa) 2.00 KB, patch		Details \| Diff \| Splinter Review
libxul.so fennec simple scrolling oprofile. consume ~25% of all CPU. 14 years ago Oleg Romashin (:romaxa) 71.04 KB, text/plain		Details
Scroll printing, BuildDisplayListForStackingContext 14 years ago Oleg Romashin (:romaxa) 4.49 KB, text/plain		Details
no rendering, no BuildDisplayList calls 14 years ago Oleg Romashin (:romaxa) 51.22 KB, text/plain		Details
libxul opreport graph 14 years ago Oleg Romashin (:romaxa) 191.22 KB, image/svg+xml		Details
callgraph oprofile libxul.so only 14 years ago Oleg Romashin (:romaxa) 774.49 KB, text/plain		Details
0.2% limit callgraph SVG 14 years ago Oleg Romashin (:romaxa) 528.20 KB, image/svg+xml		Details
Patch for _panControlsAwayOffset 14 years ago Ilkka Otsala 7.04 KB, patch		Details \| Diff \| Splinter Review
Scrolling callgraph for latest tip 13 years ago Oleg Romashin (:romaxa) 263.06 KB, image/svg+xml		Details

Oleg Romashin (:romaxa)

Reporter

Description

•

14 years ago

I've measured scrolling speed on latest fennec (mc:a2f52283e534, mb:1fc62351ecc8) with layers, enabled methodjit for chrome.

I was building GTK fennec.

and found that fennec scrolling still lot more expensive comparing to microb (default browser).

Fennec  ~ 35-39 FPS
MicroB  ~ 50-52 FPS

for test was used:
fennec -url file:///usr/share 

and special program-wrapper which is measuring XSHMPutImage frequency.

Oleg Romashin (:romaxa)

Reporter

Comment 1

•

14 years ago

With default browser we have lot of samples in CPU idle and maximum FPS rate, and for fennec we have lower FPS rate and almost no free CPU

samples %      image name              app name              symbol name
Default Browser MICROB
2711  30.1692  vmlinux-2.6.28-omap1    vmlinux-2.6.28-omap1  omap3_enter_idle
1791  19.9310  libtablet-browser-view  browser               scale_image_16(short*, int, int, int, sh
768    8.5466  libc-2.5.so             Xorg                  memcpy
164    1.8251  oprofiled               oprofiled             /usr/bin/oprofiled
77     0.8569  libpixman-1.so.0.17.3   browser               pixman_rasterize_edges
58     0.6454  vmlinux-2.6.28-omap1    oprofiled             __link_path_walk
55     0.6121  vmlinux-2.6.28-omap1    vmlinux-2.6.28-omap1  tick_nohz_stop_sched_tick
54     0.6009  vmlinux-2.6.28-omap1    vmlinux-2.6.28-omap1  tick_nohz_restart_sched_tick
52     0.5787  busybox                 busybox               /bin/busybox
51     0.5675  vmlinux-2.6.28-omap1    Xorg                  unix_poll
50     0.5564  oprofile                oprofile              /oprofile

Fennec Browser
666    6.9274  libc-2.5.so             Xorg                  memcpy
648    6.7402  libxul.so               fennec                pixman_composite_src_0565_0565_asm_neon
424    4.4102  libxul.so               fennec                pixman_composite_src_n_0565_asm_neon
183    1.9035  oprofiled               oprofiled             /usr/bin/oprofiled
98     1.0193  libxul.so               fennec                .plt
95     0.9881  vmlinux-2.6.28-omap1    vmlinux-2.6.28-omap1  omap3_enter_idle
91     0.9465  vmlinux-2.6.28-omap1    Xorg                  unix_poll
83     0.8633  vmlinux-2.6.28-omap1    Xorg                  do_select
73     0.7593  fennec                  fennec                malloc
64     0.6657  vmlinux-2.6.28-omap1    fennec                schedule
63     0.6553  oprofile                oprofile              /oprofile

Oleg Romashin (:romaxa)

Reporter

Comment 2

•

14 years ago

Attached file Flat oprofile — Details

checking fennec profile I  found some overhead from JS (mjit), .plt, and lot of other functions 0.5% each.. 

InlineGetProp(js::VMFrame&
js::mjit::stubs::UncachedCallHelper
js::Interpret
....

Oleg Romashin (:romaxa)

Reporter

Comment 3

•

14 years ago

Attached file Callgraph profile — Details

Oleg Romashin (:romaxa)

Reporter

Comment 4

•

14 years ago

question is would it be possible to have scrolling/zooming code without JS interpret/methodjit overhead and which bugs need to be fixed before we can get that?

Does it make sense to keep critical functionality in native code?

Oleg Romashin (:romaxa)

Reporter

Updated

•

14 years ago

Depends on: 606730

Oleg Romashin (:romaxa)

Reporter

Comment 5

•

14 years ago

Attached patch Scroll test — Details — Splinter Review

we are loosing ~6 % only in JS code, and ~ 4% in wasting BgColor rendering

This patch is emulate scrolling without bunch of JS code, just scroll by 1 pixel up/down (activating with long tap)

with this test we have:
56 fps and ~16 % CPU idle 
56 fps and ~24 % CPU idle  - with bg color rendering disabled

Oleg Romashin (:romaxa)

Reporter

Comment 6

•

14 years ago

Attached file trace log TMFLAGS=aborts (x86) — Details

Trace log

Oleg Romashin (:romaxa)

Reporter

Updated

•

14 years ago

Attachment #485557 - Attachment description: trace log TMFLAGS=aborts → trace log TMFLAGS=aborts (x86)

Oleg Romashin (:romaxa)

Reporter

Comment 7

•

14 years ago

Attached file profile mjit.chrome = false — Details

This is oprofile with methodjit.chrome = false (only tracejit = true for content and chrome).

by some reason I  don't see any trace function in this build..
here is my mozconfig:
***************************
ac_add_options --enable-application=mobile
ac_add_options --enable-optimize
ac_add_options --disable-crashreporter
ac_add_options --enable-update-channel=nightly
ac_add_options --enable-update-packaging
ac_add_options --disable-debug
ac_add_options --enable-tests
ac_add_options --enable-codesighs
ac_add_options --enable-chrome-format=flat
ac_add_options --enable-optimize=" -g -O2 -fno-omit-frame-pointer "
ac_add_options --enable-default-toolkit=cairo-gtk2
export CFLAGS="-gdwarf-2"
export CXXFLAGS="-gdwarf-2"
export MOZ_DEBUG_SYMBOLS=1
ac_add_options --enable-debug-symbols="-gdwarf-2"
export MOZILLA_OFFICIAL=1
mk_add_options PROFILE_GEN_SCRIPT=@TOPSRCDIR@/build/profile_pageloader.pl
#mk_add_options MOZ_MAKE_FLAGS="-j4"
mk_add_options MOZ_OBJDIR="obj-fn-gtk-arm5"
ac_add_options --with-maemo-version=5
ac_add_options --disable-pedantic --with-arm-kuser
***************************

looks like we have only JS-interpretation in current GTK n900 builds

Oleg Romashin (:romaxa)

Reporter

Comment 8

•

14 years ago

Attached file trace log TMFLAGS=aborts (arm) — Details

this trace log from arm.
trace log show only one new entry:

trace stopped: 12876: object used as index
Abort recording of tree resource://gre/modules/XPCOMUtils.jsm:127@19 at resource://gre/modules/XPCOMUtils.jsm:127@31: getelem.

which is pointing to:
http://mxr.mozilla.org/mozilla-central/source/js/src/xpconnect/loader/XPCOMUtils.jsm#127
generateQI: function XPCU_generateQI(interfaces) {
return makeQI([Ci[i].name for each (i in interfaces) if (Ci[i])]);

Oleg Romashin (:romaxa)

Reporter

Comment 9

•

14 years ago

Attached file trace log TMFLAGS=full (arm) — Details

full TMFLAGS log

Oleg Romashin (:romaxa)

Reporter

Comment 10

•

14 years ago

Attached file Backtrace to jsInvoke, script->filename/lineno — Details

I've checked backtrace with simple scrolltest patch applied (https://bugzilla.mozilla.org/attachment.cgi?id=485556&action=edit).
breakpoint on js::Interpret show that interpretation done for scrollTimeout: function().

Oleg Romashin (:romaxa)

Reporter

Comment 11

•

14 years ago

even if I don't do any scrolling, I  have js::Interpret breakpoint working for
nsUpdateTimerManager.js: notify: function TM_notify(timer)

also
nsPlacesDBFlush.js:notify: function DBFlush_timerCallback
nsPlacesDBFlush.js:XPCOMUtils.defineLazyGetter(this, "_db", function()

for normal (use scrolling) I have js::Interpret working at
chrome://browser/content/input.js", lineno = 118
MouseModule.prototype = {
  handleEvent: function handleEvent(aEvent) {


something really goes wrong here, and tracing does not work here, and I  don't understand why....

Mark Finkle (:mfinkle) (use needinfo?)

Comment 12

•

14 years ago

(In reply to comment #6)
> Created attachment 485557 [details]
> trace log TMFLAGS=aborts (x86)
> 
> Trace log

Looks like the code in the scrolling test should find a way to move the QI out of the loop:

frmldr =  getBrowser().QueryInterface(Ci.nsIFrameLoaderOwner).frameLoader;

Mark Finkle (:mfinkle) (use needinfo?)

Comment 13

•

14 years ago

(In reply to comment #11)
> even if I don't do any scrolling, I  have js::Interpret breakpoint working for
> nsUpdateTimerManager.js: notify: function TM_notify(timer)

Needed for update checking

> nsPlacesDBFlush.js:notify: function DBFlush_timerCallback
> nsPlacesDBFlush.js:XPCOMUtils.defineLazyGetter(this, "_db", function()

I believe this timer is fired to update the tempdb tables for Places

> for normal (use scrolling) I have js::Interpret working at
> chrome://browser/content/input.js", lineno = 118
> MouseModule.prototype = {
>   handleEvent: function handleEvent(aEvent) {
> 
> 
> something really goes wrong here, and tracing does not work here, and I  don't
> understand why....

Do you have an abort log for this one?

Aakash Desai [:aakashd]

Updated

•

14 years ago

tracking-fennec: --- → ?

Doug Turner (:dougt)

Comment 14

•

14 years ago

the DBFlush timer is going away:
  https://bugzilla.mozilla.org/show_bug.cgi?id=518804#c6

Oleg Romashin (:romaxa)

Reporter

Comment 15

•

14 years ago

it is not about removing flush timer... it is about why this code not jitted? and why trace abort does not show any warnings...

does anyone have profile stuff on android? to verify that it is not my local problem?

tracking-fennec: ? → ---

Oleg Romashin (:romaxa)

Reporter

Comment 16

•

14 years ago

ok, thanks... after landing bug 605415 and enabling methodjit for chrome jsInterpret has disappeared from profile...

Question is what else prevent us from enabling method jit by default?

Depends on: 605415

Oleg Romashin (:romaxa)

Reporter

Comment 17

•

14 years ago

Attached file oprofile with mjit enabled (manual scrolling) — Details

Oleg Romashin (:romaxa)

Reporter

Comment 18

•

14 years ago

Attached file No mjit manual scrolling — Details

this and previous profile taken from latest central build, with disabled color-layer rendering and applied QT-RENDER_DIRECT patch.

approximately with and without mjit I have ~48-49 FPS scrolling
without mjit we have js::Interpret = 4.1337%
with mjit we have:
0.8303  anon (tgid:8125 range:0x49f00000-0x49f49000)
0.4940 InlineGetProp
0.4940 js::mjit::stubs::CallProp
0.4414 anon (tgid:8125 range:0x48b8c000-0x48c00000
0.3573 js::Interpret
~2.6%


should it be like this?

Oleg Romashin (:romaxa)

Reporter

Comment 19

•

14 years ago

I have checked why js::Interpret with mjit enabled and gdb show this:

Breakpoint 3, js::Interpret (cx=0x43f77300, entryFrame=0x46900290, inlineCallCount=0, interpMode=JSINTERP_NORMAL) at /home/romaxa/mozdev/mozillahg/mozilla-central/js/src/jsinterp.cpp:2168
2168	Interpret(JSContext *cx, JSStackFrame *entryFrame, uintN inlineCallCount, JSInterpMode interpMode)
(gdb) bt
#0  js::Interpret (cx=0x43f77300, entryFrame=0x46900290, inlineCallCount=0, interpMode=JSINTERP_NORMAL) at /home/romaxa/mozdev/mozillahg/mozilla-central/js/src/jsinterp.cpp:2168
#1  0x413da7e8 in js::RunScript (cx=0x43f77300, script=<value optimized out>, fp=0x46900290) at /home/romaxa/mozdev/mozillahg/mozilla-central/js/src/jsinterp.cpp:637
#2  0x413db544 in js::Invoke (cx=0x43f77300, argsRef=<value optimized out>, flags=0) at /home/romaxa/mozdev/mozillahg/mozilla-central/js/src/jsinterp.cpp:740
#3  0x41526ce4 in js::mjit::stubs::SlowCall (f=@0xbeef8090, argc=1095920868) at /home/romaxa/mozdev/mozillahg/mozilla-central/js/src/methodjit/InvokeHelpers.cpp:197
#4  0x41520304 in SlowCallFromIC (f=@0x43f77300, ic=<value optimized out>) at /home/romaxa/mozdev/mozillahg/mozilla-central/js/src/methodjit/MonoIC.cpp:417
#5  0x4dafbf2c in ?? ()
Cannot access memory at address 0x1ffff8
(gdb) frame 2
#2  0x413db544 in js::Invoke (cx=0x43f77300, argsRef=<value optimized out>, flags=0) at /home/romaxa/mozdev/mozillahg/mozilla-central/js/src/jsinterp.cpp:740
740	        ok = RunScript(cx, script, fp);
(gdb) p *script
...
 filename = 0x48abf151 "chrome://browser/content/browser.js", lineno = 765, 

browser.js", lineno = 765, pointing to:
http://mxr.mozilla.org/mobile-browser/source/chrome/content/browser.js#794

Oleg Romashin (:romaxa)

Reporter

Comment 20

•

14 years ago

I check after some investigation I found that if just call 
frameLoader.scrollViewportBy(x, y);
somewhere in beginning of browser.js:"dragMove:"  function,  and with disabled this._waitingForPaint handler check

then I  have ~60FPS

full fennec scrolling give us only ~36 FPS

so we just loosing half of scrolling performance in JS mouse move/kinetic handler code.

For example this 2 lines:
x = Math.floor(Math.max(0, Math.min(docWidth - viewportWidth, frameLoader.viewportScrollX + x)) - frameLoader.viewportScrollX);
y = Math.floor(Math.max(0, Math.min(docHeight - viewportHeight, frameLoader.viewportScrollY + y)) - frameLoader.viewportScrollY);

drop FPS rate by 5 FPS 55->50...

Oleg Romashin (:romaxa)

Reporter

Comment 21

•

14 years ago

also I  found that we have very deep JS stack in scrolling code:
handleEvent -> _onMouseMove -> _doDragMove -> _dragBy -> dragMove -> _panScroller -> scrollBy(browser.js) -> scrollBy (browser.xml)

and each stack level remove ~1 FPS from our 60FPS range

so if I just move with hack frmldr.scrollViewportBy into handleEvent or _onMouseMove
I getting 60 FPS without any problem

Oleg Romashin (:romaxa)

Reporter

Comment 22

•

14 years ago

Also one mega super slow functions is 
_panControlsAwayOffset:

Oleg Romashin (:romaxa)

Reporter

Comment 23

•

14 years ago

panControlsAwayOffset - calculation and document.getElementById calls drops FPS from 59->55

Oleg Romashin (:romaxa)

Reporter

Comment 24

•

14 years ago

hmm very strange but I tried to minimize function calls amount and include whole Browser.contentScrollboxScroller scroll functionality in one function, and found that this piece of code cost us ~12 FPS... don't know why.
*************************
      var x = {};
      var y = {};
      var x1 = {};
      var y1 = {};
      Browser.contentScrollboxScroller.getPosition(x, y);
      ///  frameloader scrollViewportBy
      Browser.contentScrollboxScroller.getPosition(x1, y1);
      doffset.subtract(x1 - x0, y1 - y0);
*************************
also commenting out of "doffset.subtract" line give us lot of FPS...

Also I  found that doffset.subtract even more expensive for JS with methodjit comparing to JS it tracejit.

Matt Brubeck (:mbrubeck)

Updated

•

14 years ago

OS: Linux → Maemo

Hardware: x86 → ARM

Oleg Romashin (:romaxa)

Reporter

Comment 25

•

14 years ago

Ok I've separated subtract into next:
1)
var x2 = {};
var y2 = {};
2)
x2 = x1 - x0;
y2 = y1 - y0;
3)
doffset.x -= x2;
doffset.y -= y2;

and found that with 2) we have FPS drop
if 2) commented out and only 3) available, then perf is ok

Mark Finkle (:mfinkle) (use needinfo?)

Comment 26

•

14 years ago

(In reply to comment #25)
> Ok I've separated subtract into next:
> 1)
> var x2 = {};
> var y2 = {};

I assume this could be written as: var x2, y2;

We shouldn't need the {} in this change.

Oleg Romashin (:romaxa)

Reporter

Comment 27

•

14 years ago

Ok, I  replaced:
- Browser.contentScrollboxScroller.getPosition(x, y);
+ let scroll = getBrowser().getPosition();
- Browser.contentScrollboxScroller.getPosition(x1, y1);
+ let scroll1 = getBrowser().getPosition();

doffset.subtract(scroll1.x - scroll.x, scroll1.y - scroll.y);

and it works fast

Oleg Romashin (:romaxa)

Reporter

Comment 28

•

14 years ago

Attached patch Scrolling pipeline optimize part1 — Details — Splinter Review

Attachment #490568 - Flags: review?(mark.finkle)

Mark Finkle (:mfinkle) (use needinfo?)

Comment 29

•

14 years ago

Comment on attachment 490568 [details] [diff] [review]
Scrolling pipeline optimize part1


>   /** Pan scroller by the given amount. Updates doffset with leftovers. */
>   _panScroller: function _panScroller(scroller, doffset) {
>-    let { x: x0, y: y0 } = Browser.getScrollboxPosition(scroller);
>+    let scroll = Browser.getScrollboxPosition(scroller);

use "scroll0", just so it's easier to tell them apart

Attachment #490568 - Flags: review?(mark.finkle) → review+

Oleg Romashin (:romaxa)

Reporter

Comment 30

•

14 years ago

Attached patch Part 2 — Details — Splinter Review

Attachment #490581 - Flags: review?(mark.finkle)

Mark Finkle (:mfinkle) (use needinfo?)

Comment 31

•

14 years ago

Comment on attachment 490581 [details] [diff] [review]
Part 2

>diff -r 673c3b2feb98 chrome/content/browser-ui.js

>   ["controls",           "browser-controls"],
>   ["panelUI",            "panel-container"],
>   ["viewBuffer",         "view-buffer"],

Please remove this one. It's not used in the code anymore

>   ["toolbarContainer",   "toolbar-container"],
>-  ["browsers",           "browsers"]
>+  ["browsers",           "browsers"],
>+  ["contentViewPort",    "content-viewport"],

"contentViewport" please

Attachment #490581 - Flags: review?(mark.finkle) → review+

Benjamin Stover (:stechz)

Comment 32

•

14 years ago

Nice!

Oleg Romashin (:romaxa)

Reporter

Comment 33

•

14 years ago

sad, our mouse move event handling code is lot more expensive comparing to RenderLayer code.

without rendering and mouse handling we have      ~46% CPU free
with rendering and without mouse handling we have ~30% CPU free
without rendering and with mouse handling we have ~21% CPU free

so, our GL rendering code using ~15% and at the same time 21% just for mouse move event handling (Fennec mouse module just return immediately without any handling)

Oleg Romashin (:romaxa)

Reporter

Comment 34

•

14 years ago

I've replaced MouseMove event with Pinch gesture event, in order to drop some unexpected expensive mousemove listeners, and got +2% CPU idle.

Oleg Romashin (:romaxa)

Reporter

Comment 35

•

14 years ago

Attached patch Combined patch, with header — Details — Splinter Review

Oleg Romashin (:romaxa)

Reporter

Comment 36

•

14 years ago

ok, things looks really bad here, because we do next:
- call getBoundingClientRect (call FlushLayout)
 http://mxr.mozilla.org/mobile-browser/source/chrome/content/browser.js#1150
- scroll change parts of layout
 http://mxr.mozilla.org/mobile-browser/source/chrome/content/browser.js#1170
- call again getBoundingClientRect which flushLayout e.t.c
http://mxr.mozilla.org/mobile-browser/source/chrome/content/browser.js#841

This makes scrolling FPS drop 60->47 :(

Oleg Romashin (:romaxa)

Reporter

Comment 37

•

14 years ago

I  think we should rewrite scrolling code, and make URLbar/sidebars position calculation without calling into expensive getBoundingClientRect, and calculate ahead how and what will be changed after calling scrollTo...

Also we should check for URLBar state and if it is not visible and scrolling from top->bottom, then we should not calculate anything at all.

Oleg Romashin (:romaxa)

Reporter

Updated

•

14 years ago

Summary: GTK-Fennec scrolling too expensive, and slower comparing to N900 default browser → Fennec scrolling code is too expensive.

Oleg Romashin (:romaxa)

Reporter

Comment 38

•

14 years ago

Patches pushed into mb
http://hg.mozilla.org/mobile-browser/rev/20b73f653cb8

Oleg Romashin (:romaxa)

Reporter

Comment 39

•

14 years ago

Attached patch Remove getBoundingClientRect from scrollBy — Details — Splinter Review

Attachment #491515 - Flags: review?(mark.finkle)

Mark Finkle (:mfinkle) (use needinfo?)

Comment 40

•

14 years ago

Comment on attachment 491515 [details] [diff] [review]
Remove getBoundingClientRect from scrollBy

Did you test that the values coming from boxObject are the same as getBoundingClientRect?

Have you done a good amount of scroll testing on different web pages?

r+ if the answers are "yes"

Attachment #491515 - Flags: review?(mark.finkle) → review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 41

•

14 years ago

Argh, boxObject is evil and must die, and the fact it doesn't flush is really a bug.

Is there some other way to resolve this problem?

Benjamin Stover (:stechz)

Comment 42

•

14 years ago

Cache bcr. Browser element only uses width/height, so as long as browser doesn't resize it is valid.

If only there was an element resize event we could listen for in browser...

But perhaps Fennec could just add its own custom browser resize event that the browser element could listen for.

Oleg Romashin (:romaxa)

Reporter

Comment 43

•

14 years ago

> Is there some other way to resolve this problem?
we need to rewrite this code in such way that getBoundingClientRect never called in onmousemove event.

so we can even remember rect in dragStart function

Oleg Romashin (:romaxa)

Reporter

Comment 44

•

14 years ago

Attached patch Another version of getBoundingClientRect remove from mousemove — Details — Splinter Review

This is seems better version which just remember boundRect on dragStart and reset on dragStop

Previous patch with boxObject seems breaking kinetic scrolling

Attachment #491515 - Attachment is obsolete: true

Attachment #492047 - Flags: review?(mark.finkle)

Oleg Romashin (:romaxa)

Reporter

Comment 45

•

14 years ago

Also I found that we have 2 most expensive functions executed in layout which handling mouseMove:
1) nsIFrame::BuildDisplayListForChild  ~4% of libxul.so oprofile
is there are any existing perf patches of bugs related to BuildDisplayListForChild functionality?

2) nsBoxFrame::GetMouseThrough() const ~3% of libxul.so oprofile
roc: would it be possible to cache somehow GetMouseThrough value, and clear that cache when frame state changes?

Mark Finkle (:mfinkle) (use needinfo?)

Comment 46

•

14 years ago

Are the patches in r+'ed attachments still worth landing?

attachment 490568 [details] [diff] [review]
attachment 490581 [details] [diff] [review]

Both patches are low risk, so if they make a difference in scrolling performance, we should land them (with nits addressed) for b3.

I guess attachment 491351 [details] [diff] [review] is the combined patch with most nits addressed

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 47

•

14 years ago

We could try using a frame state bit instead of calling GetMouseThrough (which is virtual). That might help.

We don't have any optimizations for BuildDisplayListForChild, and it's probably hard to optimize well. Then again, 4% of the profile isn't a lot.

Oleg Romashin (:romaxa)

Reporter

Comment 48

•

14 years ago

Comment on attachment 492047 [details] [diff] [review]
Another version of getBoundingClientRect remove from mousemove

Does not make sense to do this small fix... we need to rework scrolling code

Attachment #492047 - Flags: review?(mark.finkle)

Oleg Romashin (:romaxa)

Reporter

Comment 49

•

14 years ago

Attached file libxul.so fennec simple scrolling oprofile. consume ~25% of all CPU. — Details

I removed all boundClientRects and made some optimized scrolling test page.. and we still consume ~25% CPU in libxul/layout display list builders... comparing to simple QT/Gtk/Qt-QML embedding browser (only ~5% CPU) it is too much.

Can we do anything about buildDisplayListForContext functionality, cache it or something like that?
Also I see lot of nsDisplayList::HitTest calls on each motion event which is always returning the same nsIFrame target 

roc: cjones: do you have any ideas about what could be optimized in this profile, and what we can drop for fennec UI/remoteViewport scrolling case?

Oleg Romashin (:romaxa)

Reporter

Comment 50

•

14 years ago

Attached file Scroll printing, BuildDisplayListForStackingContext — Details

While we do viewport scrolling with simple mouse move then we have:
BuildDisplayListForStackingContext for ~6 frames coming from
http://mxr.mozilla.org/mozilla-central/source/layout/base/nsPresShell.cpp#6551

+ repeat of BuildDisplayListForStackingContext coming from 
http://mxr.mozilla.org/mozilla-central/source/layout/base/nsLayoutUtils.cpp#1309

and then we repeating for each mouse move
nsPresShell.cpp#6551 with different move coordinates, and always getting same frame target (RemoteViewportFrame)
and nsLayoutUtils.cpp#1309 with exactly the same dirtyRect area.

Oleg Romashin (:romaxa)

Reporter

Comment 51

•

14 years ago

I think one option is Create some layout API and do targetFrame cache in
nsLayoutUtils::GetFrameForPoint (between mousePress/MouseRelease)

Also it make sense to cache main view displayList between mousePress/Release and don't waste CPU on rebuilding display list on each mouse move event (60 times per second.)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 52

•

14 years ago

I don't want to start caching display lists. That will add even more overhead and complexity, making other things slower.

Instead we should try to make display list construction faster. For event processing we only need to traverse a small number of frames, it should be fast.

Oleg Romashin (:romaxa)

Reporter

Comment 53

•

14 years ago

> Instead we should try to make display list construction faster. For event
> processing we only need to traverse a small number of frames, it should be
> fast.

I think number of frames is already small enough (only UI elements).
the problem is that we are processing these frames using full HTML/SVG/XUL build display path, which is not very cheap...
So "make display list construction faster." - is something not very clear for me here. could you clarify how we can make it faster for certain UI handling cases?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 54

•

14 years ago

Profile it carefully and figure out how to make the code paths shorter.

For event handling, I'm guessing we spend quite a lot of time scanning the child lists of XUL frames in nsBoxFrame::BuildDisplayListForChildren, trying to find the child or children that overlap the dirty rect. Maybe we can reorder stuff in BuildDisplayListForChild so that the "does not intersect" case is reached faster. But that guess needs to be verified with profiling.

Oleg Romashin (:romaxa)

Reporter

Comment 55

•

14 years ago

> find the child or children that overlap the dirty rect. Maybe we can reorder
> stuff in BuildDisplayListForChild so that the "does not intersect" case is

don't understand what are you suggesting here. is it something simple to try or require first full understanding how BuildDisplayListForChild works in all details? (don't know that code for now)

Oleg Romashin (:romaxa)

Reporter

Comment 56

•

14 years ago

Attached file no rendering, no BuildDisplayList calls — Details

I tried to disable some functions and checked CPU usage:

  function commented          CPU %     CPU idle%
- no function               - 23.6176   0
- GetEventTarget            - 20.6176   3
- list.PaintRoot            - 17.2733   18 (high jump because rendering was gone, no GPU/sgx drivers involved)
- nsLayoutUtils::PaintFrame - 14.575    18

the rest of libxul.so CPU consumption is in attached opreport

~3.7% spent in functions "js::" ~2% spent in GetProp related functionality, seems JS code need some variable cache.

Oleg Romashin (:romaxa)

Reporter

Comment 57

•

14 years ago

Attached image libxul opreport graph (obsolete) — Details

Ok with callgraph I found one problem with WidgetQt, which was calling setCursor repeatedly even if cursor all the way the same...
and I got +8% CPU free!

Oleg Romashin (:romaxa)

Reporter

Updated

•

14 years ago

Depends on: 621931

Oleg Romashin (:romaxa)

Reporter

Comment 58

•

14 years ago

Attached file callgraph oprofile libxul.so only — Details

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 59

•

14 years ago

(In reply to comment #56)
> Created attachment 500169 [details]
> no rendering, no BuildDisplayList calls
> 
> I tried to disable some functions and checked CPU usage:
> 
>   function commented          CPU %     CPU idle%
> - no function               - 23.6176   0
> - GetEventTarget            - 20.6176   3

This result is a bit uncertain because going from 0% idle to 3% idle might mean we're painting more frames.

The profile graph shows 17% of the profile in nsLayoutUtils::GetFrameForPoint, which seems like a lot. Unfortunately the call stacks for nsIFrame::BuildDisplayListForStackingContext for painting and event handling got merged, so it's not easy to see what's going on in the event handling path.

The thing to do here might be to write a microbenchmark using elementFromPoint that tests finding an element in a stack of XUL boxes, and profile/optimize that.

Oleg Romashin (:romaxa)

Reporter

Comment 60

•

14 years ago

Attached image 0.2% limit callgraph SVG — Details

Attachment #500201 - Attachment is obsolete: true

Oleg Romashin (:romaxa)

Reporter

Updated

•

14 years ago

Depends on: 621984

Oleg Romashin (:romaxa)

Reporter

Comment 61

•

14 years ago

> This result is a bit uncertain because going from 0% idle to 3% idle might mean
> we're painting more frames.
in all cases we are painting 60 FPS, that is limited with vsync

Oleg Romashin (:romaxa)

Reporter

Updated

•

14 years ago

Depends on: 622090

Oleg Romashin (:romaxa)

Reporter

Updated

•

14 years ago

Depends on: 623485

Oleg Romashin (:romaxa)

Reporter

Updated

•

14 years ago

Depends on: 623487

Mark Finkle (:mfinkle) (use needinfo?)

Comment 62

•

14 years ago

Comment on attachment 491515 [details] [diff] [review]
Remove getBoundingClientRect from scrollBy

I know roc doesn't like boxObject, but it works and we seem to use boxObject.width and boxObject.height all over mozilla-central. Let's move forward with this patch in the interim.

Attachment #491515 - Attachment is obsolete: false

Ilkka Otsala

Comment 63

•

14 years ago

Attached patch Patch for _panControlsAwayOffset (obsolete) — Details — Splinter Review

Oleg asked me to get rid of getBoundingClientRect calls in _panControlsAwayOffset. 

Here is the first version of patch for feedback. The implementation is not bullet proof yet, there is some cases where calculations goes a bit wrong. 

But anyhow, to be able to get rid of those getBoundingClientRect calls, the only possible approach I found is to maintain few variables for current visibility of side and top bars and according to those try calculate out how much certain side bar pans away.

I compared how much time is spent in both implementations, in the new one and in old one. According to my measurements new implementation for vertical panning is ~70% faster and for horizontal panning ~35% faster.

I really appreciate for feedback and suggestions.

Attachment #503138 - Flags: feedback?(mark.finkle)

Oleg Romashin (:romaxa)

Reporter

Comment 64

•

14 years ago

> Patch for _panControlsAwayOffset

err, we have separate bug for this problem already bug 622090

Ilkka Otsala

Comment 65

•

14 years ago

Comment on attachment 503138 [details] [diff] [review]
Patch for _panControlsAwayOffset

>err, we have separate bug for this problem already bug 622090
Moved the patch to correct bug, and set this as obsolete.

Attachment #503138 - Attachment is obsolete: true

Attachment #503138 - Flags: feedback?(mark.finkle)

Oleg Romashin (:romaxa)

Reporter

Comment 66

•

13 years ago

Attached image Scrolling callgraph for latest tip — Details

Timothy Nikkel (:tnikkel)

Comment 67

•

13 years ago

Comment on attachment 524580 [details]
Scrolling callgraph for latest tip

The graph has the call sequence PresShell::DispatchSynthMouseMove, nsViewMananger::DispatchEvent, nsViewManager::Refresh, but I don't think that is possible. The DispatchEvent call from DispatchSynthMouseMove can't call Refresh because it only dispatches a mouse move event, not paint events.

Oleg Romashin (:romaxa)

Reporter

Updated

•

13 years ago

Depends on: 657844

Oleg Romashin (:romaxa)

Reporter

Updated

•

13 years ago

Depends on: 657859

Oleg Romashin (:romaxa)

Reporter

Comment 68

•

13 years ago

Ok, one more measurement:
with fennec scrolling code disabled, only this._dragger._contentView.scrollBy(0, 1); in onMouseMove call we have:
MouseEventSendTime: CurreTime:1297843735.139312
ScrollByStart: CurreTime:1297843735.143066, prevDiff:0.003.754 - 3.7ms for dispatching mouse move event and call contentView.scrollBy
Invalidate: CurreTime:1297843735.143493, prevDiff:0.000427
ScrollByFinished: CurreTime:1297843735.143615, prevDiff:0.000122
PaintStart: CurreTime:1297843735.146362, prevDiff:0.002747
PaintFinished: CurreTime:1297843735.150909, prevDiff:0.004547 - 4.5 ms for painting
Full paint round ~12ms

With fennec JS scrolling code enabled:
MouseEventSendTime: CurreTime:1297850461.586303
ScrollByStart: CurreTime:1297850461.597076, prevDiff:0.010773 - ~11ms for processing mousemove event, practically ~7ms spent in expensive js code
Invalidate: CurreTime:1297850461.597412, prevDiff:0.000336
ScrollByFinished: CurreTime:1297850461.597503, prevDiff:0.000091
PaintStart: CurreTime:1297850461.603759, prevDiff:0.006256
PaintFinished: CurreTime:1297850461.607635, prevDiff:0.003876

Oleg Romashin (:romaxa)

Reporter

Comment 69

•

13 years ago

Found that we sending TapMove event for each mouseMove... is that really need to be un-conditional? can we init that event only when it is time to send that event?

Doug Turner (:dougt)

Comment 70

•

13 years ago

wes ^

Wesley Johnston (:wesj)

Comment 71

•

13 years ago

Not quite sure what you mean by "init that event only when it is time to send that event". We use TapMove events for two things I know of. Moving things in the UI (right now that's just selection markers, but we'll probably eventually allow things like drag and drop reordering of tabs, moving sliders, etc.) and for sending touchevents to the child process.

Getting around the first shouldn't be hard. We may be able to user our dragging code a bit more efficiently, or implement something similar but more efficient.

Currently, we check if we think the child process might have touchevent listeners on touchdown and use that to determine whether we need to wait and watch for preventDefault before scrolling pages. We could make that better (bug 654129):

http://mxr.mozilla.org/mozilla-central/source/mobile/chrome/content/browser.js#1956

We could backport that... somehow... to input.js in order to prevent even sending the TapMove events at all. We'd need to be careful about how we do that since we've been trying to keep input.js isolated from the browser. I'm not sure from those numbers how much win there is there.

Oleg Romashin (:romaxa)

Reporter

Comment 72

•

13 years ago

Yes, it would be nice to not send TapMove events when:
1) page does not have mouse/touch-move listeners.
2) first TapMove event was not handled during one press->move->release iteration. (we drag element  which does not handle mousemove events)

I'm checking MouseMove->Paint path, which is need to be <=16ms. and now we have that time equals ~20-25ms. Every such possible fix will not give use obvious significant win, but 10 such tweaks will make us close to target value.

BMO Automation

Comment 73

•

5 years ago

Closing all opened bug in a graveyard component

Status: NEW → RESOLVED

Closed: 5 years ago

Resolution: --- → WONTFIX

You need to log in before you can comment on or make changes to this bug.