Closed Bug 696248 Opened 13 years ago Closed 13 years ago

Flipping between "fixed" and "not-fixed" headers in GMail scrolling is slow

Categories

(Core :: Layout, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla10

People

(Reporter: roc, Assigned: roc)

References

Details

(Keywords: testcase)

Attachments

(4 files)

In my GMail account I opened the dev.planning folder and then opened a large thread ("a five week release cycle"), nothing expanded.

With the patches in bug 681867 we're faster than Chrome for normal scrolling, on my machine, but we stutter horribly when the top toolbar and the right-hand "People" pane transition in and out of being position:fixed.

To quantify this, I used this bookmarklet:
var startTime=Date.now(); var iters = 0; function doScroll() { var w = document.getElementById("canvas_frame").contentWindow; w.scrollTo(0, w.scrollY==110 ? 130 : 110); if (iters++ < 100) setTimeout(doScroll, 0); else alert(w.innerWidth + "x" + w.innerHeight + ": " + (Date.now() - startTime) + "ms"); } doScroll();
with a content window size of 1543x816.

The scrollY values are carefully chosen. In both Firefox and Chrome, at offset 130 the top toolbar and "People" pane are position:fixed, and at 110 they're both not. I presume an onscroll event handler changes those styles.

Firefox trunk (with the patches for bug 681867): about 27s
Chrome: 4.6s
There are probably several things slowing us down here.

Part of the problem is that we fire onscroll asynchronously, so we're quite likely to scroll, paint, run the onscroll event, paint again, scroll again, etc... i.e. we could paint more than once per scroll operation.
This helps a bit. It reduces the time to about 22s in my test.

Probably more importantly, it greatly improves the appearance because we don't paint any 'bad' frames where we've scrolled down quite far but the onscroll event hasn't run yet.
Attachment #568595 - Flags: review?(matspal)
Attached file testcase
Scrolling in this testcase looks way better with the patch than without it.
It's unfortunate that as you scroll GMail the top toolbar becomes position:fixed first, then the right sidebar becomes position:fixed at a different offset, so we go through two bumpy transitions instead of just one.
I did some profiling with the above patch applied.

win32k.sys!NtGdiDdDDIDestroyAllocation and win32k.sys!NtGdiDdDDICreateAllocation amount to about 5% of the profile. So it looks like graphics memory churn is significant.

8% under ntkrnlmp.exe!KiChainedDispatch. Some sort of generic system-call overhead.

9% under nvwgf2um.dll not attributable elsewhere ... I think that's the NVidia graphics driver.

At least 13% in JS and various callees, mostly property access but some flushing of notifications.

Overall about 8% under nsDocument::FlushPendingNotifications, flushing style and reflows.

60% of the profile is under nsWindow::OnPaint. 56% is under nsDisplayList::PaintRoot (i.e. not dispay list construction and optimization). 55% is under LayerManagerD3D10::EndTransaction (i.e. not layer construction). 53% is under FrameLayerBuilder::DrawThebesLayer. We're just spending a lot of time painting display items here:
-- nsDisplayBorder::Paint is 33% of the profile
-- nsDisplayBackground::Paint is 16% of the profile
-- nsDisplayText::Paint is 4% of the profile.

nsDisplayBorder::Paint is taking the path that does a PushGroup. That's probably hurting us a lot here; the PushGroup alone is 8%, and there's associated memory churn and flushing and PopGroup compositing. Also, the gfxContext::Fill under DrawBorderSidesCompositeColors is taking a path that leads to _cairo_d2d_blend_temp_surface which doesn't sound good ... 9% under that function.

nsDisplayBackground::Paint is almost entirely _cairo_d2d_create_linear_gradient_brush --- 13% of the profile.

nsDisplayText::Paint has nothing jumping out at me, but it's already relatively small.
> nsDisplayBackground::Paint is almost entirely
> _cairo_d2d_create_linear_gradient_brush --- 13% of the profile.

Bas says there's no way to avoid calling CreateLinearGradientBrush (where all the time is) for every fill, without significant cairo changes. No problem in Azure though (where we can cache the GradientStops object). Probably should just Azure-ify.
The problematic borders seem to be the borders for the individual email messages in the conversation view (which are highly dense when all the subjects are collapsed away). The relevant style is

.Bk .G2 {
  padding-top:3px;
  background-color:#fff;
  border:2px solid #cfcfcf;
  -moz-border-right-colors:#EFEFEF #cfcfcf;
  -moz-border-bottom-colors:#e2e2e2 #cfcfcf;
  -moz-border-left-colors:#EFEFEF #cfcfcf;
  border-top:1px solid #cfcfcf;
  -moz-border-radius:2px;
  border-radius:2px
}

So we've got border-radius plus multiple colors :-(.

In Chrome, this element only has the #cfcfcf border color; an outer element carries the other color.
Attached file Testcase #2
Testing the patch with a m-c debug build on Linux64.

STR
1. load testcase #2
2. scroll  
=> painting stops completely until you dismiss the alert (use ESC)

STR
0. create a few blank tabs
1. load testcase #2
2. scroll
3. resize the window  
=> painting stops permanently even after pressing ESC for a while.
clicking on a tab does change the window title though. strange error
messages after a while:
* Call to xpconnect wrapped JSObject produced this error:  *
[Exception... "'[JavaScript Error: "too much recursion" {file: "chrome://browser/content/browser.js" line: 1389}]' when calling method: [mozIStorageStatementCallback::handleCompletion]"  nsresult: "0x80570021 (NS_ERROR_XPC_JAVASCRIPT_ERROR_WITH_DETAILS)"  location: "<unknown>"  data: yes]
Results with m-c debug build on Linux64:
with patch: 28 or 29 paints (no flickering)
without patch: around 70 paints (major flickering)

Would it be possible to make a regression test based on something like this?

There's also a difference in how this test behaves when moving
the mouse around while the test runs:
with patch: same as above
without patch: painting more or less stops until the scrolling stops,
sometimes the number is below 10, not so much flickering but most of
the time you don't see the counter until the scrolling stops.

Is there an explanation for this difference in behavior with the
patch applied?
Keywords: testcase
OS: Windows 7 → All
Hardware: x86_64 → All
Comment on attachment 568595 [details] [diff] [review]
Part 1: flush onscroll events before painting

r- because I don't think the nested event loop behavior is
acceptable (Testcase #2).  Other than that, the patch looks good. 
Please move the new members after mUpdatePluginGeometryForFrame
though so it isn't disconnected from its comment.
Attachment #568595 - Flags: review?(matspal) → review-
OK, so we need to block onscroll from firing during a nested event loop.
Hmm, maybe it's more complicated than that :-(. This is probably related to bug 626963 and bug 670666. It seems that spinning up an event loop under WM_PAINT is just a bad idea.

I think this is a good reason to do what I suggested in bug 598482 comment 62: paint synchronously in the refresh driver and do all flushing before we trigger the synchronous paint.

In the meantime, maybe we should just disable running a nested event loop during WM_PAINT to fix this and bug 626963?
(breaking alert() and async XHR during resize and onscroll events, but I think that's OK?)
Actually it seems to me that if we fix bug 598482 first, we can make the refresh driver flush scroll events and then we'll be mostly OK since scroll events will be flushed when we do the invalidation for the scroll from the refresh driver via the view manager.
Mats, given the existing issues with alert() in resize event handlers, what do you think about taking the current patch until we fix those one of the ways I discussed above? I presume this isn't going to affect any real Web sites.
Comment on attachment 568595 [details] [diff] [review]
Part 1: flush onscroll events before painting

Ok.  I'm still a little bit worried about if it will open new ways to
crash us in unsafe ways... we've had some troubles in this area in
the past.  From that I can see though, it shouldn't make it worse.

Two nits:
It doesn't compile on OSX which doesn't allow >> in the template -
you need to separate them with a space. (in two places)

Please add in PresShell::WillPaint
  if (mIsDestroying)
    return;
after the rootPresContext->FlushWillPaintObservers(); line
otherwise you'll get 
###!!! ASSERTION: Must have view manager: '!isSafeToFlush || mViewManager', file layout/base/nsPresShell.cpp, line 3973
if you destroy the pres shell from the onscroll handler.
Attachment #568595 - Flags: review- → review+
https://hg.mozilla.org/mozilla-central/rev/a2cb1c0671d5
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla10
Whiteboard: [inbound]
Depends on: 729456
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: