Last Comment Bug 696248 - Flipping between "fixed" and "not-fixed" headers in GMail scrolling is slow
: Flipping between "fixed" and "not-fixed" headers in GMail scrolling is slow
Status: RESOLVED FIXED
: testcase
Product: Core
Classification: Components
Component: Layout (show other bugs)
: Trunk
: All All
: -- normal with 1 vote (vote)
: mozilla10
Assigned To: Robert O'Callahan (:roc) (email my personal email if necessary)
:
Mentors:
Depends on: 729456
Blocks:
  Show dependency treegraph
 
Reported: 2011-10-20 16:02 PDT by Robert O'Callahan (:roc) (email my personal email if necessary)
Modified: 2012-02-22 04:06 PST (History)
15 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Part 1: flush onscroll events before painting (5.21 KB, patch)
2011-10-20 20:25 PDT, Robert O'Callahan (:roc) (email my personal email if necessary)
mats: review+
Details | Diff | Splinter Review
testcase (276 bytes, text/html)
2011-10-20 20:26 PDT, Robert O'Callahan (:roc) (email my personal email if necessary)
no flags Details
Testcase #2 (444 bytes, text/html)
2011-10-21 16:37 PDT, Mats Palmgren (:mats)
no flags Details
Testcase #3 (only works in debug build with x-test plugin) (811 bytes, text/html)
2011-10-21 16:58 PDT, Mats Palmgren (:mats)
no flags Details

Description Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-20 16:02:48 PDT
In my GMail account I opened the dev.planning folder and then opened a large thread ("a five week release cycle"), nothing expanded.

With the patches in bug 681867 we're faster than Chrome for normal scrolling, on my machine, but we stutter horribly when the top toolbar and the right-hand "People" pane transition in and out of being position:fixed.

To quantify this, I used this bookmarklet:
var startTime=Date.now(); var iters = 0; function doScroll() { var w = document.getElementById("canvas_frame").contentWindow; w.scrollTo(0, w.scrollY==110 ? 130 : 110); if (iters++ < 100) setTimeout(doScroll, 0); else alert(w.innerWidth + "x" + w.innerHeight + ": " + (Date.now() - startTime) + "ms"); } doScroll();
with a content window size of 1543x816.

The scrollY values are carefully chosen. In both Firefox and Chrome, at offset 130 the top toolbar and "People" pane are position:fixed, and at 110 they're both not. I presume an onscroll event handler changes those styles.

Firefox trunk (with the patches for bug 681867): about 27s
Chrome: 4.6s
Comment 1 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-20 20:21:33 PDT
There are probably several things slowing us down here.

Part of the problem is that we fire onscroll asynchronously, so we're quite likely to scroll, paint, run the onscroll event, paint again, scroll again, etc... i.e. we could paint more than once per scroll operation.
Comment 2 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-20 20:25:42 PDT
Created attachment 568595 [details] [diff] [review]
Part 1: flush onscroll events before painting

This helps a bit. It reduces the time to about 22s in my test.

Probably more importantly, it greatly improves the appearance because we don't paint any 'bad' frames where we've scrolled down quite far but the onscroll event hasn't run yet.
Comment 3 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-20 20:26:49 PDT
Created attachment 568596 [details]
testcase

Scrolling in this testcase looks way better with the patch than without it.
Comment 4 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-20 21:25:18 PDT
It's unfortunate that as you scroll GMail the top toolbar becomes position:fixed first, then the right sidebar becomes position:fixed at a different offset, so we go through two bumpy transitions instead of just one.
Comment 5 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-20 22:10:04 PDT
I did some profiling with the above patch applied.

win32k.sys!NtGdiDdDDIDestroyAllocation and win32k.sys!NtGdiDdDDICreateAllocation amount to about 5% of the profile. So it looks like graphics memory churn is significant.

8% under ntkrnlmp.exe!KiChainedDispatch. Some sort of generic system-call overhead.

9% under nvwgf2um.dll not attributable elsewhere ... I think that's the NVidia graphics driver.

At least 13% in JS and various callees, mostly property access but some flushing of notifications.

Overall about 8% under nsDocument::FlushPendingNotifications, flushing style and reflows.

60% of the profile is under nsWindow::OnPaint. 56% is under nsDisplayList::PaintRoot (i.e. not dispay list construction and optimization). 55% is under LayerManagerD3D10::EndTransaction (i.e. not layer construction). 53% is under FrameLayerBuilder::DrawThebesLayer. We're just spending a lot of time painting display items here:
-- nsDisplayBorder::Paint is 33% of the profile
-- nsDisplayBackground::Paint is 16% of the profile
-- nsDisplayText::Paint is 4% of the profile.

nsDisplayBorder::Paint is taking the path that does a PushGroup. That's probably hurting us a lot here; the PushGroup alone is 8%, and there's associated memory churn and flushing and PopGroup compositing. Also, the gfxContext::Fill under DrawBorderSidesCompositeColors is taking a path that leads to _cairo_d2d_blend_temp_surface which doesn't sound good ... 9% under that function.

nsDisplayBackground::Paint is almost entirely _cairo_d2d_create_linear_gradient_brush --- 13% of the profile.

nsDisplayText::Paint has nothing jumping out at me, but it's already relatively small.
Comment 6 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-20 22:24:24 PDT
> nsDisplayBackground::Paint is almost entirely
> _cairo_d2d_create_linear_gradient_brush --- 13% of the profile.

Bas says there's no way to avoid calling CreateLinearGradientBrush (where all the time is) for every fill, without significant cairo changes. No problem in Azure though (where we can cache the GradientStops object). Probably should just Azure-ify.
Comment 7 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-21 03:51:30 PDT
The problematic borders seem to be the borders for the individual email messages in the conversation view (which are highly dense when all the subjects are collapsed away). The relevant style is

.Bk .G2 {
  padding-top:3px;
  background-color:#fff;
  border:2px solid #cfcfcf;
  -moz-border-right-colors:#EFEFEF #cfcfcf;
  -moz-border-bottom-colors:#e2e2e2 #cfcfcf;
  -moz-border-left-colors:#EFEFEF #cfcfcf;
  border-top:1px solid #cfcfcf;
  -moz-border-radius:2px;
  border-radius:2px
}

So we've got border-radius plus multiple colors :-(.

In Chrome, this element only has the #cfcfcf border color; an outer element carries the other color.
Comment 8 Mats Palmgren (:mats) 2011-10-21 16:37:07 PDT
Created attachment 568824 [details]
Testcase #2

Testing the patch with a m-c debug build on Linux64.

STR
1. load testcase #2
2. scroll  
=> painting stops completely until you dismiss the alert (use ESC)

STR
0. create a few blank tabs
1. load testcase #2
2. scroll
3. resize the window  
=> painting stops permanently even after pressing ESC for a while.
clicking on a tab does change the window title though. strange error
messages after a while:
* Call to xpconnect wrapped JSObject produced this error:  *
[Exception... "'[JavaScript Error: "too much recursion" {file: "chrome://browser/content/browser.js" line: 1389}]' when calling method: [mozIStorageStatementCallback::handleCompletion]"  nsresult: "0x80570021 (NS_ERROR_XPC_JAVASCRIPT_ERROR_WITH_DETAILS)"  location: "<unknown>"  data: yes]
Comment 9 Mats Palmgren (:mats) 2011-10-21 16:58:31 PDT
Created attachment 568828 [details]
Testcase #3 (only works in debug build with x-test plugin)

Results with m-c debug build on Linux64:
with patch: 28 or 29 paints (no flickering)
without patch: around 70 paints (major flickering)

Would it be possible to make a regression test based on something like this?

There's also a difference in how this test behaves when moving
the mouse around while the test runs:
with patch: same as above
without patch: painting more or less stops until the scrolling stops,
sometimes the number is below 10, not so much flickering but most of
the time you don't see the counter until the scrolling stops.

Is there an explanation for this difference in behavior with the
patch applied?
Comment 10 Mats Palmgren (:mats) 2011-10-21 17:12:03 PDT
Comment on attachment 568595 [details] [diff] [review]
Part 1: flush onscroll events before painting

r- because I don't think the nested event loop behavior is
acceptable (Testcase #2).  Other than that, the patch looks good. 
Please move the new members after mUpdatePluginGeometryForFrame
though so it isn't disconnected from its comment.
Comment 11 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-21 21:06:10 PDT
OK, so we need to block onscroll from firing during a nested event loop.
Comment 12 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-22 03:18:02 PDT
Hmm, maybe it's more complicated than that :-(. This is probably related to bug 626963 and bug 670666. It seems that spinning up an event loop under WM_PAINT is just a bad idea.

I think this is a good reason to do what I suggested in bug 598482 comment 62: paint synchronously in the refresh driver and do all flushing before we trigger the synchronous paint.

In the meantime, maybe we should just disable running a nested event loop during WM_PAINT to fix this and bug 626963?
Comment 13 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-22 03:19:16 PDT
(breaking alert() and async XHR during resize and onscroll events, but I think that's OK?)
Comment 14 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-24 19:18:02 PDT
Actually it seems to me that if we fix bug 598482 first, we can make the refresh driver flush scroll events and then we'll be mostly OK since scroll events will be flushed when we do the invalidation for the scroll from the refresh driver via the view manager.
Comment 15 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-24 19:20:06 PDT
Mats, given the existing issues with alert() in resize event handlers, what do you think about taking the current patch until we fix those one of the ways I discussed above? I presume this isn't going to affect any real Web sites.
Comment 16 Mats Palmgren (:mats) 2011-10-25 20:04:01 PDT
Comment on attachment 568595 [details] [diff] [review]
Part 1: flush onscroll events before painting

Ok.  I'm still a little bit worried about if it will open new ways to
crash us in unsafe ways... we've had some troubles in this area in
the past.  From that I can see though, it shouldn't make it worse.

Two nits:
It doesn't compile on OSX which doesn't allow >> in the template -
you need to separate them with a space. (in two places)

Please add in PresShell::WillPaint
  if (mIsDestroying)
    return;
after the rootPresContext->FlushWillPaintObservers(); line
otherwise you'll get 
###!!! ASSERTION: Must have view manager: '!isSafeToFlush || mViewManager', file layout/base/nsPresShell.cpp, line 3973
if you destroy the pres shell from the onscroll handler.
Comment 17 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-25 21:15:33 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/a2cb1c0671d5
Comment 18 Ed Morley [:emorley] 2011-10-26 17:07:01 PDT
https://hg.mozilla.org/mozilla-central/rev/a2cb1c0671d5

Note You need to log in before you can comment on or make changes to this bug.