Closed Bug 635035 Opened 13 years ago Closed 13 years ago

ThebesLayerBuffer can compute the wrong quadrant to draw into with non-identity resolution, should support self-copies for image surfaces

Categories

(Core :: Layout, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
fennec 2.0+ ---

People

(Reporter: stechz, Assigned: cjones)

References

Details

Attachments

(11 files, 4 obsolete files)

2.02 KB, patch
Details | Diff | Splinter Review
5.59 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
2.22 KB, patch
roc
: review+
Details | Diff | Splinter Review
4.02 KB, patch
roc
: review+
Details | Diff | Splinter Review
1.85 KB, patch
Details | Diff | Splinter Review
1.35 KB, patch
roc
: review+
Details | Diff | Splinter Review
3.95 KB, patch
Details | Diff | Splinter Review
4.57 KB, patch
roc
: review+
Details | Diff | Splinter Review
2.55 KB, patch
Details | Diff | Splinter Review
4.53 KB, patch
Details | Diff | Splinter Review
161.77 KB, image/png
Details
When panning in Fennec, the Thebes buffer will change frequently because of small changes in the visible region that defines its dimensions. This is bad on Fennec, because it causes frequent reallocations that result in fragmentation.

The jumpy numbers are caused by setting the visible region in FrameLayerBuilder:
http://mxr.mozilla.org/mozilla-central/source/layout/base/FrameLayerBuilder.cpp#938

Which is a direct result of unioning all the display items together:
http://mxr.mozilla.org/mozilla-central/source/layout/base/FrameLayerBuilder.cpp#1057

We either need to stabilize the visible region, or not realloc buffers when the visible region slightly changes in ThebesLayerBuffer.
Keywords: mobile, perf
tracking-fennec: --- → 2.0+
The buffer can be any size you want as long as it contains the visible region. So we shouldn't mess with the visible region, but instead change the buffer allocation strategy.
very hard to reproduce on the desktop.  very easy to reproduce on android.  maybe android specific.
On any sites in particular?
i have been using news.google.com, zooming into a small area of text, and panning up and down.

I rebuilt desktop with MOZ_GFX_OPTIMIZE_MOBILE enabled and think I can see the problem -- it isn't as obvious as on android.

I think it is this change (but probably should verify tomorrow)


http://mxr.mozilla.org/mozilla-central/source/layout/base/nsPresShell.cpp#5322
That code isn't on the "normal" painting path, it's what we use for drawWindow(), so I wouldn't expect that to affect displayport stuff.
Assignee: nobody → crowderbt
FWIW, you can get similar output in a debug build with

NSPR_LOG_MODULES="Layers:5" fennec 2>&1 | grep 'buffer(x2)'
Loading google.com, I get

-254924480[1744a60]: BasicShadowableThebes(1ee81e0): creating 480 x 800 buffer(x2)
-254924480[1744a60]: BasicShadowableThebes(1e11f60): creating 481 x 800 buffer(x2)

and a similar effect in landscape

589527360[18f8a60]: BasicShadowableThebes(1f18eb0): creating 800 x 500 buffer(x2)
589527360[18f8a60]: BasicShadowableThebes(1f08b90): creating 800 x 501 buffer(x2)

These look like reallocs due to rounding buffer sizes out instead of to the nearest pixel.  See also bug 634759 comment 10.  If the frontend is already trying to set up resolution/displayport so as to get round device pixels, but missing just slightly from precision issues, I don't see a problem off-hand with rounding to the nearest pixel instead of rounding out.
Frontend is sending the exact same numbers on every displayport update, so rounding out or rounding in won't matter. f(x) is always f(x), whether f is round, floor, or ceil.

Please see comment #1.
The effect I see on google.com is, load page and alloc, then autofit and realloc.  The frontend can't be using the same combination of resolution/displayport (AFAIK), but indeed the resize could be due to a display list change instead of a rounding discrepancy.

But as I write this, I just remembered that we throw out buffers on resolution changes regardless of whether the buffer size would change, so this is a moot point.

I'm seeing more interesting results on the desktop version of news.google.com, looking.
Here's the general pattern I'm seeing panning up and down news.google.com, keeping resolution the same

-410846912[16dfa60]: BasicShadowableThebes(22e9900): creating 529 x 1601 buffer(x2)
-410846912[16dfa60]: ThebesLayerBuffer(22e9aa0): surface can't self-copy, throwing out buffer(x2)
-410846912[16dfa60]: BasicShadowableThebes(22e9900): creating 529 x 1601 buffer(x2)
-410846912[16dfa60]: BasicShadowableThebes(22e9900): creating 529 x 1603 buffer(x2)
-410846912[16dfa60]: ThebesLayerBuffer(22e9aa0): surface can't self-copy, throwing out buffer(x2)
-410846912[16dfa60]: BasicShadowableThebes(22e9900): creating 529 x 1601 buffer(x2)
-410846912[16dfa60]: ThebesLayerBuffer(22e9aa0): surface can't self-copy, throwing out buffer(x2)
-410846912[16dfa60]: BasicShadowableThebes(22e9900): creating 529 x 1601 buffer(x2)
-410846912[16dfa60]: BasicShadowableThebes(22e9900): creating 529 x 1603 buffer(x2)

I'm not sure why we're fluctuating between h=1601 and h=1603, might be display-list monkeyshine.  That's not a huge deal though in this case, because once we get a h=1603 buffer we should be keeping it.  The inability of gfxImageSurface to self-copy is causing us to discard the buffer.  Might need to fix, will poke a bit more.
(I realized over dinner that I haven't pulled from m-b in a while, like a month.  Might change things a bit.)
Brian, I have some ideas for this, going to steal unless you object ;).
Assignee: crowderbt → jones.chris.g
The vast majority of the buffer reallocs on news.google.com were coming from thebes buffers trying to self-copy, but failing with image surfaces.  Fixing the lack of self-copy in pixman looks like a lost cause, judging by the code; I don't believe self-copies can be supported in the SIMD fast-paths without either deoptimizing the current code (uh-uh) or adding a slower path most users seem not to need.  We can fix this outside of pixman though, I'll look into that if it's a win over this impl.

Anyway, this patch compromises by "self-copying" through a temporary surface and back to the original surface.  This is strictly cheaper and less memory-intensive than allocating a new shmem buffer pair (in theory).
The other source of reallocs when panning up and down news.google.com was 1-3 pixel jitter in buffer dimensions.  With the previous patch, we do OK because when we jitter larger a bit, we can keep the larger size permanently.  But this patch eliminates the jitter for simple panning.  It would also help (though not much) with pages that grow a bit or whatever.

It's probably worth fixing the jitter on its own.  There's also some more buffer churn I don't understand yet.
If anyone has a chance to try these out, I'd really appreciate it.  WIP because they need a lot of manual testing :/.
(In reply to comment #14)
> The vast majority of the buffer reallocs on news.google.com were coming from
> thebes buffers trying to self-copy, but failing with image surfaces.  Fixing
> the lack of self-copy in pixman looks like a lost cause, judging by the code; I
> don't believe self-copies can be supported in the SIMD fast-paths without
> either deoptimizing the current code (uh-uh) or adding a slower path most users
> seem not to need.

Last time I looked, it didn't seem that bad. Vertical scrolling, which is the case we mostly care about, just turns into memmove. Horizontal scrolling by more than 4 pixels should be fine with SIMD too.
I wasn't claiming that the self-copy impl is hard, just that it appears to me that integrating it into pixman would result in code that the pixman people wouldn't be interested in.  Worth another look though if/when we want self-copies.
What about writing a small NEON-optimized version for our specific case here?
That's of course possible.  The order of things I want to fix is
 (1) Stop churning shmem on large scrolls
 (2) [if the copy-through-tmp-surface shows up in profiles] Implement simple self-copying of gfxImageSurface in C
 (3) [if the pure-C self-copy shows up in profiles] Consider SIMD-ized variants

I doubt (2) will be worth doing until bug 634397, because I saw a lot of shifting displayport while panning that looked like bug 624451.  (Cool stuff, BTW!)  I think (3) is a bit down the line, and might not ever be worth doing since memcpy/memmove are supposed to use SIMD themselves.  Sometimes that's even true.
I vote for option 2. For vertical scrolling you can use a single memmove() to do the entire copy, and the C library should have an optimal implementation of memmove().
Not in general, but for the cases we care about we can do that.
All the patches for this are based on 635373.
Depends on: 635373
(This problem has been around for a while, but I figured I'd just fix it here.)

I was seeing problems with front->back readback trying to update a bogus gfxContext for the newly-painted quadrant.  The reason turned out to be that ExtendForScaling() on the draw region moved the draw bounds outside of the ThebesLayerBuffer's buffer rect.  I don't understand the ExtendForScaling() code very well, but we do need to a gfxContext covering the same pixels in PaintBuffer() as the one updated during readback.

Since ThebesLayerBuffer::BeginPaint() asks for a context covering the un-extended draw region, this patch shifts things around so we ask for a quadrant for readback that covers the same bounds.
Attachment #514748 - Attachment is obsolete: true
Attachment #514749 - Attachment is obsolete: true
Attachment #515244 - Flags: review?(matt.woodrow+bugzilla)
(This is another pre-existing bug that I found while debugging my patches here.)

The problem here is that both mBufferRect *and* neededRegion.GetBounds() can be sized properly for the current buffer if there's scaling.  If neededRegion.GetBounds() happens to be larger than mBufferRect, then we end using the wrong, smaller size and messing up other calculations.  This patch keeps the don't-shrink-mBufferRect logic intact and deals with the edge-case when there's scaling.
Attachment #515245 - Flags: review?(roc)
memmove vs. memcpy shouldn't make much of a difference if they're implemented well, but that's worth measuring at some point, especially with android's libc.
Attachment #515248 - Flags: review?(roc)
I'm not particularly happy with this patch.  Suggestions welcome.
Attachment #515249 - Flags: review?(roc)
I can't think of any reason to keep this around.
Attachment #515250 - Flags: superreview?(roc)
Should add that I decided against adding buffer-fluffing here.  I suspect the cause of the buffer-size jitter is related to the underlying cause of bug 630743.  Can investigate fluffing later if needed, it's trivial.
Comment on attachment 515244 [details] [diff] [review]
part 1: Separate the extended draw region and computed draw region so that shadow-layer readback gets the same quadrant as what was drawn

We're actually trying to get rid of ExtendForScaling usage in bug 633344, but I think an equivalent problem will still exist since we still need to extend the draw region out.

Looks good to me!
Attachment #515244 - Flags: review?(matt.woodrow+bugzilla) → review+
Attachment #515246 - Flags: superreview?(roc) → superreview+
+     * |aSourceRect| and the destination rectangle defined by
+     * |aDestTopLeft| must lie entirely within this surface.
+     * (XXX should this API be friendlier?)
+     *
+     * Return true iff the pixels were successfully moved.

Actually ... I think this API should clip the source and destination to the bounds, and should always succeed.
Comment on attachment 515248 [details] [diff] [review]
part 4: Implement MovePixels() for image surfaces

+        // XXX can memcpy for disjoint dest/source
+        memmove(dst, src, nBytes);

Lose the comment. memmove should internally optimize for disjointness if that helps.
Attachment #515248 - Flags: review?(roc) → review+
+  // We have to assume here we're dealing with rects that are supposed
+  // to round to device pixels, or else many other things besides
+  // pixel moves will be wrong.
+  src.Round();
+  dest.Round();

Maybe we should assert something instead of rounding, then? Maybe we should actually assert that MovePixels is only called for identity resolution?
Attachment #515250 - Flags: superreview?(roc) → superreview+
(In reply to comment #32)
> +     * |aSourceRect| and the destination rectangle defined by
> +     * |aDestTopLeft| must lie entirely within this surface.
> +     * (XXX should this API be friendlier?)
> +     *
> +     * Return true iff the pixels were successfully moved.
> 
> Actually ... I think this API should clip the source and destination to the
> bounds, and should always succeed.

Sure.

(In reply to comment #33)
> Comment on attachment 515248 [details] [diff] [review]
> part 4: Implement MovePixels() for image surfaces
> 
> +        // XXX can memcpy for disjoint dest/source
> +        memmove(dst, src, nBytes);
> 
> Lose the comment. memmove should internally optimize for disjointness if that
> helps.

Yes, good impls should.  But yeah, that comment isn't going to help anyone.

(In reply to comment #34)
> +  // We have to assume here we're dealing with rects that are supposed
> +  // to round to device pixels, or else many other things besides
> +  // pixel moves will be wrong.
> +  src.Round();
> +  dest.Round();
> 
> Maybe we should assert something instead of rounding, then? Maybe we should
> actually assert that MovePixels is only called for identity resolution?

Sure, we can assert that the values are within epsilon of whole pixels.  I'm pretty sure the cause of bug 630743 will trip these assertions, and indeed I saw some seaming this weekend when test-driving these patches.

Not self-copying for non-identity resolution is going to hurt on fennec.  There was a quite noticeable perf improvement with these patches and bug 634397.  With bug 630743 fixed, I don't think fennec would attempt fractional-device-pixel scrolls.  I'm not sure about transformed thebeslayers.
Whiteboard: [needs-review (roc)]
So are you saying that we know these Round() calls are wrong and should not be needed but we need them because the underlying bug is too hard to fix?
These patches need to be updated, they're not awaiting review.

(In reply to comment #36)
> So are you saying that we know these Round() calls are wrong and should not be
> needed but we need them because the underlying bug is too hard to fix?

No, I'm saying that the Round() calls should be correct for fennec after bug 630743 and bug 637852 are fixed.  I don't know about transformed thebeslayers.  We can selectively disable self-copy too for just transformed thebeslayers pretty easily.
Whiteboard: [needs-review (roc)]
Whiteboard: [needs-new-patch]
I like this API a lot better.  Thanks!
Attachment #515246 - Attachment is obsolete: true
Attachment #516994 - Flags: superreview?(roc)
The icky memmove stuff didn't change, but I'd appreciate a second look at the clip computations.
Attachment #516995 - Flags: review?(roc)
Whiteboard: [needs-new-patch] → [needs-review (roc)]
635035 part 1 now produces black screens for me on Android. Chris, any ideas?
No... works OK for me on desktop, but I haven't tried on device.  Are you running a build with other patches applied under this?
No other patches applied, and I was testing on Android. Do you mind seeing if it happens to you too?
I'm not really around right now, but I can look later.

Are seeing this on desktop too?  Do you have other patches applied above part 1?  Are there any pages in particular on which the black screens appear?
Comment on attachment 516994 [details] [diff] [review]
part 3: Add gfxASurface::MovePixels() interface and generic impl, v2

+    if (!bounds.Intersects(aSourceRect) || !bounds.Intersects(aSourceRect)) {

oops? Also I don't think you need this unless you think fully-clipped moves are common.

The API is OK though.
Attachment #516994 - Flags: superreview?(roc) → superreview+
+    long naturalStride = ComputeStride(mSize, mFormat);
+    if (mStride == naturalStride && dest.width == bounds.width) {

Why do we need to check the natural stride here? If there are unused bytes at the end of rows, why can't we just move them?

Dropping that check would also let this path work with negative stride, which could be useful.

Otherwise looks good.
Attachment #516996 - Flags: superreview?(roc) → superreview+
Comment on attachment 516996 [details] [diff] [review]
part 5: Add a helper to check whether a gfxRect is rounded within a given epsilon

Actually aEpsilon should be a double here.
(In reply to comment #46)
> I'm not really around right now, but I can look later.
> 
> Are seeing this on desktop too?  Do you have other patches applied above part
> 1?  Are there any pages in particular on which the black screens appear?

Like I said in c45, no other patches applied above part 1.

Desktop looks fine for me.
(In reply to comment #47)
> Comment on attachment 516994 [details] [diff] [review]
> part 3: Add gfxASurface::MovePixels() interface and generic impl, v2
> 
> +    if (!bounds.Intersects(aSourceRect) || !bounds.Intersects(aSourceRect)) {
> 
> oops? Also I don't think you need this unless you think fully-clipped moves are
> common.

Oops!  Shouldn't be, can remove.

(In reply to comment #48)
> +    long naturalStride = ComputeStride(mSize, mFormat);
> +    if (mStride == naturalStride && dest.width == bounds.width) {
> 
> Why do we need to check the natural stride here? If there are unused bytes at
> the end of rows, why can't we just move them?

No, the surface might be a subimage of another surface, e.g.

> Dropping that check would also let this path work with negative stride, which
> could be useful.

It would be possible to normalize the stride and first/last row before figuring out which path to use.  Do you think negative strides are worth optimizing for?
Attached image Corruption screenshot
With just these patches applied I see this corruption for Google page flip. Can someone confirm?
Re-nom'ing for 4.x based on bug 637852.  We could take this and bug 634397 for 4.0 if we can live with the seaming, but I don't think that's a good idea.
tracking-fennec: 2.0+ → ?
Actually, we need to land part 1 and part 2 for 4.0.
Part 1 and 2 fix bugs that can cause the wrong buffer quadrant to be computed, leading us to paint into the wrong part of the thebeslayerbuffer.
Keywords: mobile, perf
Summary: ThebesLayerBuffer's dimensions get changed way too often → ThebesLayerBuffer can compute the wrong quadrant to draw into with non-identity resolution
Whiteboard: [needs-review (roc)]
Summary: ThebesLayerBuffer can compute the wrong quadrant to draw into with non-identity resolution → ThebesLayerBuffer can compute the wrong quadrant to draw into with non-identity resolution, should support self-copies for image surfaces
re-plusing based on new plan forward
tracking-fennec: ? → 2.0+
(In reply to comment #36)
> So are you saying that we know these Round() calls are wrong and should not be
> needed but we need them because the underlying bug is too hard to fix?

The answer to this ended up being "partly yes, in the 4.0 timeframe", disregard comment 37.  I'm leaving a FIXME/bug 637852 breadcrumb.
Could we get comment #52 addressed? Can anyone else repro?
Looking in a bit.
No longer depends on: 637852
I can repro something like comment 52 on http://fastflip.googlelabs.com/ by pressing the ">" button on the first row of images.  The corruption goes away on its own after ~5 seconds.  This reminds me a lot of an existing bug I think I saw somewhere, but I don't remember which.
Unsurprisingly, part 6 is the culprit.  Might be a bug in MovePixels().
I'm getting "WARNING: Rects don't round to device pixels within precision; glitches likely to follow" when this happens :/.  I'll see if there's anything we can do.
Just a dumb typo I made when updating to the new API:

         if (mBufferRotation == nsIntPoint(0,0)) {
           nsIntRect srcRect(nsIntPoint(0, 0), mBufferRect.Size());
-          nsIntPoint dest = destBufferRect.TopLeft() - mBufferRect.TopLeft();
+          nsIntPoint dest = mBufferRect.TopLeft() - destBufferRect.TopLeft();
           MovePixels(mBuffer, srcRect, dest, curXRes, curYRes);

Ready to land after tryserver'ing.
Depends on: 679091
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: