Closed Bug 1009306 Opened 10 years ago Closed 1 year ago

Visible seams on the page (maybe at tile borders?)

Categories

(Core :: Graphics, defect, P5)

30 Branch
ARM
Android
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
fennec + ---

People

(Reporter: kbrosnan, Assigned: mstange)

References

Details

(Keywords: regression, reproducible, Whiteboard: [gfx-noted])

Attachments

(4 files, 1 obsolete file)

I am seeing one pixel drawing errors that look to happen in vertical columns. See attached screenshot.
Component: Graphics, Panning and Zooming → Graphics
Product: Firefox for Android → Core
Summary: One pixel displacement (of tiles?) → Visible seams on the page (maybe at tile borders?)
Is there a specific device/version/URL you're seeing this on?
Flags: needinfo?(kbrosnan)
I have been using a Nexus 5. I suspect that this is a rounding error at some specific zoom level as I have only seen it on sites that allow pinch zooming. Reddit in the screenshot http://reddit.com/r/android or http://news.ycombinator.comu
Flags: needinfo?(kbrosnan)
Assignee: nobody → snorp
tracking-fennec: ? → 32+
I spent some time looking at this and can see it on 30 and newer. I might have a more consistent test case.
Blocks: 1014333
Assignee: snorp → chrislord.net
Status: NEW → ASSIGNED
Negotiating for Chris' time.
Flags: needinfo?(milan)
Like in bug 1014333, which I think is a dupe of this, I'm not sure this is easily fixable, or indeed where I'd even start if I wanted to fix this.

I'll spend some more time with it today, but I'd suggest that someone more familiar with the rasterisation part of Gecko would be a better candidate to fix this.
Trying to be more methodical about this.

My hypothesis is that this is the result of a rounding error in Gecko during progressive painting somewhere. The alternative hypothesis would be that Cairo has rounding errors or accuracy issues, but this would not be fixable so I'm going with the former until I've exhausted the possibilities.

I don't think this error is introduced when copying buffer contents into tiles - these errors would be more visible I believe, but also when the zoom is stable, mResolution is 1.0 and so there is no place to introduce rounding errors (I've checked this).

Another place rounding error could be introduced is in ClientTiledLayerBuffer::PaintThebes, which calls Scale, then translates by the bounds origin of the paint region. This theory should be easily testable by always over-allocating this region and removing the translate. Attempting this now.
(In reply to Chris Lord [:cwiiis] from comment #9)
> Trying to be more methodical about this.
> 
> My hypothesis is that this is the result of a rounding error in Gecko during
> progressive painting somewhere. The alternative hypothesis would be that
> Cairo has rounding errors or accuracy issues, but this would not be fixable
> so I'm going with the former until I've exhausted the possibilities.
> 
> I don't think this error is introduced when copying buffer contents into
> tiles - these errors would be more visible I believe, but also when the zoom
> is stable, mResolution is 1.0 and so there is no place to introduce rounding
> errors (I've checked this).
> 
> Another place rounding error could be introduced is in
> ClientTiledLayerBuffer::PaintThebes, which calls Scale, then translates by
> the bounds origin of the paint region. This theory should be easily testable
> by always over-allocating this region and removing the translate. Attempting
> this now.

Actually, the comment about mResolution applies in this case too - and all the same, always over-allocating does not fix the issue, so the error doesn't appear to be here.

It just seems that combining old rendering with new rendering causes sub-pixel offsets that introduce these rendering errors.

It feels like this is a layout issue either in the ScrollBy functions, which should be scrolling to pixel-aligned value, or in the invalidation code that should be invalidating layers when their sub-pixel scroll offset changes - or both (I guess if the former worked, the latter wouldn't be necessary(?))

What makes this kind of hard to test is that without progressive rendering, these regions don't have irregular overlaps, so you would see fewer 'seams'. Next, I'll see if disabling render cancelling causes the incidence of these rendering issues to decrease - if so, it would point to this being a layout issue, otherwise it would point to this being an issue either still in the tiling code somewhere, or in cairo.
Always over-allocating the paint buffer and disabling cancelling, I can still easily reproduce this - I'm fairly certain this is a layout issue at this point. I'm going to see if I can reproduce with progressive painting off entirely now (I've figured out some better, more consistent STR).
I can easily reproduce this with both progressive paint and the low precision buffer turned off.

STR:

1- Go to a long Reddit comment thread, like http://www.reddit.com/r/askscience/comments/25mmxp/in_a_loaded_bike_wheel_which_spokes_are_under_the/ in the b2g browser, or in fennec (the bug reproduces in either - ftr, I'm testing in b2g)

2- Scroll down and zoom far into a comment box so that the border of the comment box is in the center of the screen and the scroll position is at the left edge of the page. You should zoom in far enough that the border takes a pixel or two on the screen (ftr, I zoom in so that about 6 lines of text fit vertically on the screen)

3- Scroll about two screens to the right.

4- Scroll down a little so that the border is still easily visible but the vertical scroll position has changed

5- Scroll back to the left again

At this point, you should see that the border thickness changes along the line. If you don't, just repeat steps 3-5 until it does (but this happens pretty much 100% of the time for me).

There are two errors happening here, as commented on in paragraph 3 of comment #10.

n?mattwoodrow to provide some layout/invalidation insight.
Flags: needinfo?(matt.woodrow)
mstange on IRC points to this #ifdef, which I mistakenly thought was removed: http://dxr.mozilla.org/mozilla-central/source/layout/base/FrameLayerBuilder.cpp#1565

Also, that ClampAndAlignWithPixels knows about layer resolution, so something may be going wrong there.

These are two good pointers that I think I can continue fruitfully investigating with - I'll post anything I find.
(In reply to Chris Lord [:cwiiis] from comment #13)
> mstange on IRC points to this #ifdef, which I mistakenly thought was
> removed:
> http://dxr.mozilla.org/mozilla-central/source/layout/base/FrameLayerBuilder.
> cpp#1565

If I remove this #ifndef and the related #ifndef's above it, I can no longer reproduce this error.

It seems whole-layer invalidations happen any time your scrolling takes you to the maximum scroll extent of the frame, which makes sense (if you want to be able to scroll to the absolute edge, you're probably going to lose your sub-pixel alignment - I think we may want to revisit this though, I think it would be ok to have a sub-pixel area you can't scroll to).

I think it would be reasonable to just remove these #ifndef's, but we may also want to change the above behaviour, at least for mobile. Layout comment on this would be valuable.
This fixes the issue, but leaves us with the issue that we have whole-layer invalidations when you hit scroll extents.
Attachment #8445931 - Flags: review?(roc)
Perhaps an alternative fix for the remaining issue, rather than changing the flexible scroll functions, is to make sure that our scroll-clamping-scroll-port-size takes this into account?
(In reply to Chris Lord [:cwiiis] from comment #16)
> Perhaps an alternative fix for the remaining issue, rather than changing the
> flexible scroll functions, is to make sure that our
> scroll-clamping-scroll-port-size takes this into account?

Looking at where this actually gets set, I'm not sure how it could take it into account - it already starts in ParentLayer coordinates, so I'm not sure where the subpixel difference ends up coming from... Rounding error? Oversizing the scroll clamping scroll port size by an arbitrary amount has odd effects, so that isn't an option it seems.
I've had no luck so far in trying to force Gecko to always align to layer pixels. Changing ClampAndAlignWithPixels to ignore the bounds and always return the aligned value still results in whole-layer invalidations when hitting the top or bottom of the scroll extent. I don't know why this is yet.
(In reply to Chris Lord [:cwiiis] from comment #18)
> I've had no luck so far in trying to force Gecko to always align to layer
> pixels. Changing ClampAndAlignWithPixels to ignore the bounds and always
> return the aligned value still results in whole-layer invalidations when
> hitting the top or bottom of the scroll extent. I don't know why this is yet.

Actually that does work :) For some reason the define isn't working in the file I'm using it in, so patch incoming once I figure that out...
I think this is a reasonable replacement for the previous behaviour of not invalidating in this situation.
Attachment #8446007 - Flags: review?(roc)
Flags: needinfo?(matt.woodrow)
Kevin, do these patches fix things for you? I can no longer reproduce, but perhaps I've changed some other things locally and it'd be good to get confirmation one way or the other.
Flags: needinfo?(kbrosnan)
(In reply to Chris Lord [:cwiiis] from comment #16)
> Perhaps an alternative fix for the remaining issue, rather than changing the
> flexible scroll functions, is to make sure that our
> scroll-clamping-scroll-port-size takes this into account?

See bug 1012752 for more discussion on this topic. I didn't realize that this bug only occurs when scrolling to the end, otherwise I would have pointed to bug 1012752 sooner.
(In reply to Chris Lord [:cwiiis] from comment #13)
> Also, that ClampAndAlignWithPixels knows about layer resolution, so
> something may be going wrong there.

The way that it gets the resolution is broken, see bug 1030370. That might or might not affect the case in this bug.
Comment on attachment 8446007 [details] [diff] [review]
Always align scroll position with pixels on mobile

Review of attachment 8446007 [details] [diff] [review]:
-----------------------------------------------------------------

The bug this patch is trying to fix is basically bug 1012752, and the issues there apply. I think we need to just figure out what we're going to do in bug 1012752.
Attachment #8446007 - Flags: review?(roc) → review-
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #24)
> Comment on attachment 8446007 [details] [diff] [review]
> Always align scroll position with pixels on mobile
> 
> Review of attachment 8446007 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> The bug this patch is trying to fix is basically bug 1012752, and the issues
> there apply. I think we need to just figure out what we're going to do in
> bug 1012752.

Fair enough, in that case, I've added a dependency. I'm not sure how good an idea it is to apply the first patch without fixing this, as it's quite easy to hit this path if there's a small amount of horizontal scroll, for example.
Depends on: 1012752
I've been talking with mstange on IRC, and this is how I understand things.

We have these situations:

1. Unpatched (how it is now):
On mobile, we live with 'seams' that commonly occur at validation edges (greatly exacerbated by progressive tiled rendering), where text position shifts and borders shift (e.g. bug 1014333).

2. Patched (attachment 8445931 [details] [diff] [review]):
These seams stop occurring, but we still get the behaviour where if you scroll to an extent, you get a whole-layer invalidation (bad for perf) and then all text and borders shift slightly (which looks bad, but not as bad as (1)).

3. Patched (attachment 8445931 [details] [diff] [review] + attachment 8446007 [details] [diff] [review] or attachment 8424948 [details] [diff] [review]):
No seams, no shift in text position or border thickness at scroll extents, but possible exposure of a pixel-wide area on the bottom and/or right edge of scrollable containers when scrolled to their extents.

4. Proper fix:
Scroll range takes into account layer pixel-grid snapping so that the extents of the scroll range don't cause the layer's pixel alignment to change (rendering attachment 8446007 [details] [diff] [review] obsolete).


(4) is obviously desirable, but hard. Personally, at least for mobile, I think (3) is a more desirable situation than (2), and certainly less bad than (1).

Given this, would you reconsider living with (3) until we get (4), or can we not settle for any less than (4) even on mobile?
Flags: needinfo?(roc)
I don't object to trying #3 on mobile.
Flags: needinfo?(roc)
Comment on attachment 8446007 [details] [diff] [review]
Always align scroll position with pixels on mobile

Re-requesting review based on comment #27 (this patch only applies to mobile)
Attachment #8446007 - Flags: review- → review?(roc)
On further thought, I think the second patch here won't result in those failures mentioned in bug 1012752 as as soon as you scroll something in b2g at least, it gets layerised - so snapping to layer pixels shouldn't need to take into account the origin (as it will effectively be zero and the snapping will match up).
(In reply to Chris Lord [:cwiiis] from comment #29)
> On further thought, I think the second patch here won't result in those
> failures mentioned in bug 1012752 as as soon as you scroll something in b2g
> at least, it gets layerised - so snapping to layer pixels shouldn't need to
> take into account the origin (as it will effectively be zero and the
> snapping will match up).

No. ContainerState::CreateOrRecycleThebesLayer computes a subpixel offset for the layer contents so that mere layerization does not change the subpixel offset of content and cause a visible rendering change.
This does the same thing as my patch, but better - mine has some issues that cause a couple of test failures, but I think this one will be ok. I've confirmed that it fixes the issue.

Try run: https://tbpl.mozilla.org/?tree=Try&rev=192816210dac
Attachment #8446007 - Attachment is obsolete: true
Attachment #8449322 - Flags: review?(roc)
Comment on attachment 8449322 [details] [diff] [review]
Always align scroll position with pixels on mobile (mstange's patch)

Cancelling review, this patch doesn't yet work when the layer is zoomed.
Attachment #8449322 - Flags: review?(roc)
Chris is it worth me building the most recent patch? If so I should have some time tomorrow to deal with this.
(In reply to Kevin Brosnan [:kbrosnan] from comment #35)
> Chris is it worth me building the most recent patch? If so I should have
> some time tomorrow to deal with this.

If you could, yes please - it would be good to confirm this fixes the issue (it does as far as I can tell, but I'd rather have someone else confirm it). The remaining issue is that we end up with more invalidation than is necessary, but that doesn't have significant visual impact.
This patch does not fix the issues in bug 1014333 I still see thicker than expected. I think the text alignment issues are resolved.
Flags: needinfo?(kbrosnan)
(In reply to Kevin Brosnan [:kbrosnan] from comment #37)
> This patch does not fix the issues in bug 1014333 I still see thicker than
> expected. I think the text alignment issues are resolved.

When you say you see thicker than expected, do you see the same line change thickness along the page? Could you post a screenshot, perhaps?
Flags: needinfo?(kbrosnan)
Looks the same as https://bug1014333.bugzilla.mozilla.org/attachment.cgi?id=8428571
Flags: needinfo?(kbrosnan)
Tracking (already was tracked in bug 1014333) + updating the tracking flags.
Chris - This bug looks like it's stalled. Assuming this is still reproducible, can you pick this back up again?
Flags: needinfo?(chrislord.net)
(In reply to Lawrence Mandel [:lmandel] from comment #42)
> Chris - This bug looks like it's stalled. Assuming this is still
> reproducible, can you pick this back up again?

I think we need someone else to take a look at it - if it isn't caused by what I provided patches for (i.e. if it isn't fixed by attachment #8445931 [details] [diff] [review]), I'm really at a loss.
Assignee: chrislord.net → nobody
Flags: needinfo?(chrislord.net)
Status: ASSIGNED → NEW
Milan, this is a regression in 32, and we've got 2 more weeks to take speculative fixes on beta. Need an assignee
Flags: needinfo?(milan)
I don't see us fixing this for 32.
Flags: needinfo?(milan)
Also, I still think I was on the right track and that strictly speaking, this is a layout bug.
NI Jet, in case it is layout issue, and he has somebody that can take a look.
Flags: needinfo?(bugs)
(In reply to Milan Sreckovic [:milan] (PTO 8/11 - 8/15) from comment #47)
> NI Jet, in case it is layout issue, and he has somebody that can take a look.

Has someone tried to flip tiling off? 
https://bugzilla.mozilla.org/attachment.cgi?id=8387597&action=diff

I get that we're not going to turn tiling off to fix this bug, but the "(maybe at tile borders?)" line in the bug title suggests that this needs a bit more triage.
Flags: needinfo?(bugs) → needinfo?(kbrosnan)
I think Kevin may be on PTO. ni aaronmt to help out.
Flags: needinfo?(aaron.train)
tracking-fennec: 32+ → +
I'm having trouble trying to reproduce this again on my N5.
Flags: needinfo?(aaron.train)
With no recent progress, I'm marking this as won't fix for 32.

Jet/Milan - We have already shipped 2 releases and are about to ship a third with this bug. As such, I'm not going to make the case that this needs to be fixed in 33. I will leave it as tracked until we get you to weigh in on when this can be prioritized for a fix. Can we figure out who should own this and get it on one of your prioritized lists?
Flags: needinfo?(milan)
Flags: needinfo?(bugs)
I think we need somebody with a heavy layout experience here; I'm also not clear if bug 1012752 getting fixed helps or completely resolves this situation.
Flags: needinfo?(milan)
Markus will have a look at this one next week, and Fix ETA is likely the following week.
Flags: needinfo?(bugs) → needinfo?(mstange)
Markus, Jet, any news on this bug? Thanks
Flags: needinfo?(bugs)
I noticed that this is not reproducible in aurora 34 but is on beta 33. I found a fixed range which points to Bug 1042423

Last good revision: 326c91338df0
First bad revision: 06b540d3667a
Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=326c91338df0&tochange=06b540d3667a
Status: NEW → RESOLVED
Closed: 10 years ago
Depends on: 1042423
No longer depends on: 1012752
Flags: needinfo?(mstange)
Flags: needinfo?(kbrosnan)
Flags: needinfo?(bugs)
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
I'm super dubious of this being fixed, someone should check that the fix isn't actually just the whole page being constantly redrawn, or that progressive tiled updates haven't broken again.

n?kats, but I'll take a look at this too on Monday.
Flags: needinfo?(bugmail.mozilla)
Given the size of the patches in bug 1042423, I don't think we are going to uplift these patches in beta. It is too late in the cycle for this.
Except if Chris is able to do something this week, the 33 release is going to have this bug.
Bug 1042423 has known regressions anyway, so it's not safe to uplift until those have been addressed.
Flags: needinfo?(bugmail.mozilla)
OK. Wontfix for 33 then
I can confirm that this issue is fixed on B2G. The #ifdefs that avoid invalidations on layer pixel alignment changes are "#ifndef MOZ_WIDGET_ANDROID"s, and MOZ_WIDGET_ANDROID is not set on B2G, only on Fennec. So B2G does invalidate on layer pixel alignment changes, which usually only happen once you hit the edge of the scrollable area (bug 1012752).

Removing the #ifdefs completely would fix the seams bug in Fennec, obviously. But, unlike B2G, this would cause Fennec to invalidate during regular panning, too, and regress bug 724786.

The cause of bug 724786 is the following:
Updating the main thread scroll position during panning happens with a window.scrollTo call from JS: http://hg.mozilla.org/mozilla-central/annotate/ae27ae77e32f/mobile/android/chrome/content/browser.js#l3661
The scrollTo DOM API currently only accepts integer values, so this scroll position is rounded to an integer CSS pixel value, and the resulting value in many cases can't be achieved without changing the layer pixel alignment.
Bug 724786 does not occur on B2G because B2G doesn't update the scroll position through a rounding JS API and thus can work with precise fractional values throughout the whole code path.

Fennec is going to switch to the C++ APZ code in the future, so I won't worry about the Fennec bug here too much. Once Fennec is switched, we can look into removing the #ifdefs that cause the seams, and hopefully bug 724786 won't regress at that point.
I strongly disagree with punting on this issue for Fennec. If the solution is to convert the Java APZ to C++ then we need a commitment to actually finish this work instead the 'future'. Firefox for Android has a similar number of users as OS X Firefox. I would not accept this behavior on release of desktop Firefox.
Alright, what if we stop using scrollTo and use something else which accepts a float? It's dumb to be restricted by the JS API here.
Flags: needinfo?(mstange)
Wait this bug is marked as FIXED. Is that not the case? :mstange's comments suggest that we are fundamentally hosed...
(In reply to Markus Stange [:mstange] from comment #60)

> Fennec is going to switch to the C++ APZ code in the future, so I won't
> worry about the Fennec bug here too much. Once Fennec is switched, we can
> look into removing the #ifdefs that cause the seams, and hopefully bug
> 724786 won't regress at that point.

You have to worry about the Fennec bug. As Kevin points out, Android is almost the second largest Firefox platform. These kinds of rendering issues make Firefox (and Mozilla) look really bad.

We can't wait for the switch to C++ APZ unless it's landing in Fx36, which it is not.
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #62)
> Alright, what if we stop using scrollTo and use something else which accepts
> a float? It's dumb to be restricted by the JS API here.

I'm currently playing with that (I added an nsIDOMWindowUtils setScrollXYFloat method) and will report back when I know how well it works.
Status: RESOLVED → REOPENED
Flags: needinfo?(mstange)
Resolution: FIXED → ---
Status: REOPENED → ASSIGNED
Markus, any update?
Flags: needinfo?(mstange)
Yes, this is working well locally, still need to work out a few more kinks before I can start assembling reviewable patches.
Flags: needinfo?(mstange)
No longer blocks: 1014333
No longer depends on: 1042423
Blocks: 1083551
Markus, let us know if Jamie can help.
Whiteboard: [gfx-noted]
Markus, what's the current status of this bug?
Flags: needinfo?(mstange)
I still see this from time to time on Firefox for Android. Kats' APZ Android work may change the behavior. bug 1206872
Assignee: nobody → mstange
Kevin, does this still reproduce for you?
Flags: needinfo?(kbrosnan)
Version: Trunk → 30 Branch
Yes.
Flags: needinfo?(kbrosnan)
Priority: -- → P5
Resetting status flags since this wasn't really fixed in 34/35.
Target Milestone: mozilla34 → ---
See Also: → 1656792
Severity: normal → S3

This is probably fixed now that we're using WebRender on Android.

Status: ASSIGNED → RESOLVED
Closed: 10 years ago1 year ago
Flags: needinfo?(mstange.moz)
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: