Closed Bug 382392 Opened 13 years ago Closed 12 years ago

FF3 20070528: scrolling non-fixed-pos content when fixed-pos content present is extremely sluggish

Categories

(Core :: Layout, defect)

x86
All
defect
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: thomasverelst, Assigned: roc)

References

()

Details

(Keywords: perf, regression, testcase)

Attachments

(5 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9a5pre) Gecko/20070528 Minefield/3.0a5pre
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9a5pre) Gecko/20070528 Minefield/3.0a5pre

Starting with this build:
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9a5pre) Gecko/20070528 Minefield/3.0a5pre
(download folder: 2007-05-28-04-trunk)

...when scrolling up/down a page which has both fixed and non-fixed positioned content, I observe the following issues:

- scrolling is extremely sluggish
- temporary repainting problems of non-fixed positioned content in varying locations during scrolling, even scrolling slowly (chunks of text disappearing below each other like colliding tectonic plates; also, horizontal lines appear at the same height in those areas)
- scrollbar knob trails behind the cursor at quite a distance while dragging it

This may be due to fixing bug 343430

Reproducible: Always
Version: unspecified → Trunk
Component: General → Layout
Product: Firefox → Core
QA Contact: general → layout
The main problem seems to be caused by bug 368247, although bug 343430 made it look uglier. This could be a duplicate of one of the bug 368247 blocking bugs.
Blocks: 368247
Depends on: 379834
I can't reproduce this -- scrolling is quite fast for me on that page.  However, there is a long dotted border on that page, so this could be something like bug 379834 
A quick test removing the dotted border shows that the border is the cause of the performance issue; removing it also fixes the rest of the stuff I posted in the main post.

I did notice a minor performance degradation while scrolling for some time and, just like with bug 379834, traced it down to the change from 20070430 to 20070501.

The impact on performance became more apparent for me since build 20070528 but I guess that the severity level is irrelevant if removing the border fixes it.
Status: UNCONFIRMED → RESOLVED
Closed: 13 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 379834
No longer depends on: 379834
Hmm... well...

Using 3.0a5 build 20070531, the scrolling is back to normal but the "tectonic plate" effect is still there (very noticeable when scrolling down slowly using the scrollbar knob).

It's like there are two layers.  The layer in the back spans the entire viewport and the front layer covers the top half of the viewport.  When scrolling down, the front layer moves upwards correctly but the back layer comes in a little late.

Whatever... I don't know how to explain this any better...
I'm not seeing much improvement either. I'm getting a bit seasick when I scroll down. I think it also depends on the performance of the computer, but the regression is still there. 
Blocks: 343430
Status: RESOLVED → UNCONFIRMED
Resolution: DUPLICATE → ---
Status: UNCONFIRMED → NEW
Ever confirmed: true
So removing the 1px dotted border improves the scrolling quite a bit, so that part is basically bug 379834 revisited.
However, even after removing that, scrolling performance isn't quite as good as on branch.
Depends on: 379834
Flags: blocking1.9?
Attached file testcase
A fixed positioned object with a lot of text on the right seems to be another reason for reduced scrolling performance on current trunk builds.
Attached file testcase2
This testcases makes the performance regression even more clear.
With current trunk build I get 27235ms.
With a branch build, I get 13531ms.
Flags: blocking1.9? → blocking1.9-
Whiteboard: [wanted-1.9]
I guess that this bug is what causes scrolling at gmail to be extremly sluggish when a chat window is present.
Duplicate of this bug: 408054
Flags: wanted1.9+
Whiteboard: [wanted-1.9]
Duplicate of this bug: 414952
Assignee: nobody → roc
In testcase 2, the display list code and view manager are doing the right thing: they order a bitblit scroll and invalidate the non-moving DIV on the left and the scrolled-into-view strip at the top or bottom of the window. On Mac, we end up repainting the whole window because Appkit seems to take the union of the dirty rects and pass that back to us as the rect to paint --- no region information available :-(.

I'm looking into Windows next...
Windows has similar problems. We don't have a repaint region for it, just a bounding rectangle.
Attached patch possible fixSplinter Review
Separating the paint of the scrolled-into-view strips from the paint of any non-moving content, by inserting an extra Composite() call, fixes the problem on Mac.

On Windows things are more complicated because we have logic to restrict the area we scroll, so even with the extra Composite() call, the invalidation of the non-moving, non-scrolled content invalidates two strips whose union is most of the window. Removing that special Windows logic fixes that, but regresses bug 343430.

This fixes bug 424915.

Regressing bug 343430 isn't really acceptable, so I'm going to try to keep the XP_WIN block and make Windows nsWindow::OnPaint get and pass down the update region.
On GTK2 the problem is that there is no way to scroll just part of a window. So we either scroll the whole window and get jerky results for the fixed-pos stuff, which is bug 343430, or we scroll none of the window and repaint everything instead, which is slow. So I'm not going to try to fix this for GTK2 which is unfortunate since it means 424915 won't be fixed.
That extra-Composite trick regresses bug 343430 on Mac, so we don't really want to do that. On Mac we're in a similar situation to GTK2, we currently don't support scrolling of a partial window so we either get jerkiness or slowness.
Attached patch Windows fixSplinter Review
So in the end I can only fix this on Windows. But at least this fix makes total sense and is basically a good thing all round.
Attachment #312476 - Flags: review?(vladimir)
(In reply to comment #17)
> On Mac we're in a similar situation to GTK2, we currently don't
> support scrolling of a partial window so we either get jerkiness or slowness.

Let's expose a preference option for this! :)

Roc - do you have any idea on how it may influence the DHTML animation performance on windows?
If I understand the concept we'll start redrawing only a region of window on each frame. Should it result in better performance or smoothness or it's just for such edge cases like this one?
It would certainly improve some tests, but could hurt others, partly depending on what Windows does internally.

Currently when we paint on Windows we always paint everything inside the bounding-rectangle of what really needs to be painted. With this patch we can paint only the region that really needs to be painted. This makes a big difference when the area that needs to be painted is, say, a small rectangle at the top-left of the window and a small rectangle at the bottom-right of the window.

However, tracking complex regions can take more time and even space, so it's not always a win, especially if they're really complex.
thanks for this explanation. 
do you want any testing to be done around this? Unfortunately I don't have Windows build env, but I can spend a few hours running a build with this through my testcasebase :)
Sure, that'd be really useful. You can use the try-servers I guess.
Comment on attachment 312476 [details] [diff] [review]
Windows fix

Hah, I actually had this same patch a long while ago but I didn't see any perf difference, so I never went anywhere with it... I didn't have the right testcase for it, looks like.  (I love "GetRandomRgn", probably my favourite windows API name.  I also love how they changed the coorinate space between OS revisions...)
Attachment #312476 - Flags: review?(vladimir) → review+
the try-servers either down or they don't like me at all :(
Comment on attachment 312476 [details] [diff] [review]
Windows fix

Fixes a performance regression on real sites. Fairly low risk since the region painting path has been used by GTK2 for ages. GetRandomRgn is a weird API but it is documented as the right way to do what we need to do here.
Attachment #312476 - Flags: approval1.9?
Attachment #312476 - Flags: approval1.9? → approval1.9+
Checking in widget/src/windows/nsWindow.cpp;
/cvsroot/mozilla/widget/src/windows/nsWindow.cpp,v  <--  nsWindow.cpp
new revision: 3.735; previous revision: 3.734
done

Is this bug fixed then, even if only a Windows fix could be found?
Status: NEW → ASSIGNED
Keywords: checkin-needed
Dunno. It seems a shame to not mark anything fixed, but we should probably leave this open for Mac/Linux.
OS: Windows 2000 → All
I backed this out, as I believe it caused the crashes in bug 426369.
Depends on: 426369
Yes, stuart pointed out that the patch fails to free paintRgn and this causes us to crash in other places when we run out of GDI objects and fail to check errors properly. Oops. New patch coming up.

It would be nice if our leak tracking tracked GDI objects somehow...
Call DeleteObject.

I ran an APNG demo for a while watching the Task Manager's GDI object count. No leak.
Attachment #312971 - Flags: review?(vladimir)
Whiteboard: [needs review]
I filed bug 426412 when I saw the GDI leak (before comment 29 appeared), becuase I never had a crash, like bug 426369, just a large leak. But everything is ok now, with the 3th update from today (Gecko/2008040112) where this was backed out.
Comment on attachment 312971 [details] [diff] [review]
Windows fix that doesn't leak

Looks good; r=me
Attachment #312971 - Flags: review?(vladimir) → review+
Relanded
Status: ASSIGNED → RESOLVED
Closed: 13 years ago12 years ago
Resolution: --- → FIXED
Flags: in-litmus?
In reference to #16:

> On GTK2 the problem is that there is no way to scroll just part of a window.

What's wrong with?

http://library.gnome.org/devel/gdk/stable/gdk-Windows.html#gdk-window-move-region

(Since 2.8 means that it's been in stable releases of GTK+ since Aug. 2005)
Wow. Cool. How does that interact with child widgets?
Whiteboard: [needs review]
What gdk_window_move_region() does:

 - It shifts the bits in the region
 - It shifts existing invalid regions that were contained in the region
 - It creates new invalidations corresponding to parts of the source region 
   that were not within the bounds of the window
 - It puts an item in the translation queue, so if an expose was generated
   by the X server before the move_region() and received after, it gets
   shifted appropriately

What it does not do by design:

 - It does not move child windows
 - It does not adjust the allocations of child widgets, since they are
   at the GTK layer, not the GDK layer. 

   If you have "NO_WINDOW" widgets that overlap the moved region, then
   their bits will then have been moved, but their allocations will not
   have been updated, and any child windows (like the input-only window that
   takes clicks on a button) will not have been moved. The simple way
   to fix up is to loop over immediate children of the owner of the 
   window, and if they are !NO_WINDOW, compute their new allocations and
   call gtk_widget_size_allocate(), which will update the allocation,
   move the child windows, and redraw the new and old locations. Advice
   on complex, sneaky ways of doing things that don't generate the 
   extra redraws available on request.

What it does not do as implementation limitation:

 - It does not generate client side invalidations for areas that 
   "scroll" out from under child windows; these areas will get redrawn
   only when a GraphicsExpose event is received back from the server.
   (Generally not a big deal unless you have a lot of latency in your
   X connection, though some trailing may be visible in these areas.)

 - It does not generate client side invalidations for areas that 
   "scroll" out from overlaying windows, again these will wait on
   events received back from the X server.
Thanks for the info! It sounds like that might actually work if we make GTK do what we do Windows.

But here's a question that is not quite clear from your answer: if we have !NO_WINDOW children and we do the fixup loop that you suggest, do we get flicker or not? With gdk_window_scroll we don't get flicker; if gdk_window_move_region regresses that, that would be a big problem. Is that where your complex, sneaky advice?

A long time ago I read
http://www.gtk.org/~otaylor/whitepapers/guffaw-scrolling.txt
which seems to only apply to scrolling the whole window, which made me think that getting scrolling right with child widgets and scrolling only part of the window wasn't an option with X.

Unfortunately it's very late to do a big change here for Firefox 3, and post-FF3 I'm planning to basically eliminate use of child widgets altogether which will make this easy.
The question to me is whether you want the !NO_WINDOW children to scroll
or not. 

 - If you don't want them to scroll, no fixup is necessary

 - If you do want them to scroll, then (just like with OffsetRgn on windows)
   you'll have to scroll, then move the children, and you'll get a bit of
   jitter. But a complete redraw of the window contents doesn't help that
   particular jitter either - the only way to fix that jitter would be to
   coordinate with the compositing manager to prevent an intermediate 
   screen refresh (in the composited case), or to get rid of child windows.

   Jitter like this where parts of the window move out of sync is not
   wonderfully pretty, but it's much better than gdk_window_scroll() then 
   redrawing the fixed elements, since if you do that the fixed elements
   bounce around.

Yes - guffaw scrolling is in fact only applicable to scrolling the whole 
window.
Yeah, we want them to move. So I agree gdk_window_move_region would be an improvement. I've filed bug 427431 on that. I'm not sure if it's worth trying to ram that in at this point.
Depends on: 430065
You need to log in before you can comment on or make changes to this bug.