Text being corrupted when scrolling

VERIFIED FIXED in mozilla2.0b10

Status

()

defect
VERIFIED FIXED
9 years ago
9 years ago

People

(Reporter: Ehsan, Assigned: tnikkel)

Tracking

({regression, testcase})

unspecified
mozilla2.0b10
Points:
---

Firefox Tracking Flags

(blocking2.0 final+)

Details

(Whiteboard: [softblocker])

Attachments

(7 attachments, 1 obsolete attachment)

Posted image Screenshot
This happened to me when scrolling some text.  See the screenshot for the corruption.  It seems that some pixel lines are repeated and some are skipped.

I saw this on <http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1294767077.1294770656.11506.gz&fulltext=1> but I don't think that the exact log matters.

Forcing a repaint (by switching tabs for example) fixes this, but I can reproduce this problem very easily.

Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0b9pre) Gecko/20110108 Firefox/4.0b9pre
built from: http://hg.mozilla.org/mozilla-central/rev/8f1c39add00f
on OSX 10.6.5
blocking2.0: --- → ?
If you disable hardware acceleration does it still happen?
(In reply to comment #1)
> If you disable hardware acceleration does it still happen?

Yes.
Ehsan, is this still happening?
(In reply to comment #3)
> Ehsan, is this still happening?

Yes.  I can still reproduce this quite easily.
Please get a regression window and perhaps a reliable testcase. Then disable acceleration and start debugging :-). Looking at what gets repainted in FrameLayerBuilder::DrawThebesLayer would be a good start.
Assignee: nobody → ehsan
Regression range: <http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=286d3026ae20&tochange=d1da1005b6d6>.

I'll bisect this range to find the culprit changeset.
Attachment #504816 - Attachment mime type: text/plain → text/html
Posted file Test case
I can reproduce the bug reliably with this test case.
So, this test case apparently reproduces the problem only on my computer, which is strange, but good enough for now.  I also tested this by manually scrolling on other computers, which showed the problem.
Keywords: testcase
Timothy suggested that I should drop in some debugging code to check whether the regions passed in via aRegionToDraw are continuous or not.  They are, in fact, continuous, so that hunch is of no use here.  :(
Target Milestone: --- → mozilla2.0b10
Even when the bug happens?
You might want to check whether bug 625672 helps here.
(In reply to comment #11)
> Even when the bug happens?

Yes.
(In reply to comment #12)
> You might want to check whether bug 625672 helps here.

No, unfortunately, it still happens with the patches on that bug.  :-(

I'm curious though, how did you end up debugging that?  Maybe I could go through similar steps for this too...
I reduced to a minimal testcase and then dumped what we were painting in DrawThebesLayer.

In this case, do you have to be zoomed in or out to see the bug? It looks like a rounding issue maybe.
(In reply to comment #15)
> I reduced to a minimal testcase

This is a problem.  This bug is really easy to trigger manually, but a minimal test is really hard to come up with.  For example, the unit test that I attached here works on _some_ nightlies for me, but not on others.  And it has never worked in my own debug builds.  So each time I have to scroll manually and stop as soon as I see the bug.

> and then dumped what we were painting in
> DrawThebesLayer.

I guess I don't really know what I should be looking for in the log there.

> In this case, do you have to be zoomed in or out to see the bug? It looks like
> a rounding issue maybe.

It happens without any zoom in effect.  Does that help narrow things down?
Is acceleration on or off? Does it matter?
(In reply to comment #17)
> Is acceleration on or off? Does it matter?

This happens with both acceleration on or off.  Although, you suggested that I should turn acceleration off before debugging this in comment 5, so that's what I did.
Sorry, it's hard to keep track of everything :-(

If you put other content in the test page, is it affected? E.g. images? Or just text?
(In reply to comment #19)
> If you put other content in the test page, is it affected? E.g. images? Or just
> text?

It happens to images to.  Seems like it's not text specific.
Alright. A minimal testcase would have an overflow:hidden element containing a single image, or maybe an image and a tall CSS background color. If you can reproduce the bug with that, that'll make things easier.

I think you need to dig into what happens to the ThebesLayer each time we paint the image: what surface the image paint eventually ends up in, and which chunk of the image ends up where.
Just to eliminate all possibilities: it doesn't happen with just a background color?
(In reply to comment #22)
> Just to eliminate all possibilities: it doesn't happen with just a background
> color?

I did not see this problem on a page with only bgcolor=blue but no text.

I did see it on the same test page with bgcolor=blue, but only the text was corrupted, and the background color wasn't.
Posted file Minimal testcase
STR:

1. Scroll the div to the bottom, and then scroll back up just enough to make the firefox logo fall out of the visible region.
2. Extremely slowly scroll down by holding the two fingers on the touchpad, and gently moving them down, until the firefox logo appears on the screen.

You'll see a "seam" at the top part of the image 2 out of 3 times that you do this, which corrects itself as soon as you continue scrolling (or force a repaint by a tab switch or whatever).  This correction is done by the rest of the image "moving up".  It's as though the first pixel line on the image is being rendered on the correct position, and the rest just gets rendered one pixel below the correct place.  This is very similar to the text corruption that I've been seeing.

I'll attach a screenshot too.
I can reproduce on Linux with smooth scrolling turned on!
It happens every other time, consistently.
Can you reproduce the problem on the other test case too, with smooth scrolling turned on?  What about with the log in comment 0?  (You have to click the crash link to go to the bottom of the page...)
OS: Mac OS X → All
Hardware: x86 → All
Assignee: ehsan → tnikkel
It doesn't happen for me with the log testcase attached here (with smooth scrolling).
Even in builds where the bug doesn't appear, the image is drawn at a 1 pixel height difference alternate times when repeatedly doing "down arrow key, up arrow key".
I get http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=d45c87e58110&tochange=26c47ba8064f as my regression range (2010-10-09 to 2010-10-10). Ehsan, it wouldn't be horrible if you verified you get the same range for both testcases.

I'm guessing it is bug 600148. The ToNearestPixels call in FrameLayerBuilder.cpp in ContainerState::CreateOrRecycleThebesLayer that sets up the transform for a thebes layer is probably how it caused this bug.
Indeed, making that call site alone use the old method of computing ToNearestPixels fixes the bug for me. I'll look into exactly what we want to be doing here.
In converting an nscoord that is about 29 bits to a float we lose information, and so when we divide by app units per dev pixel we get a slightly incorrect result. If we just revert bug 600148 we get similar but different artifacts. So I think we will need to make these functions do the division at least on doubles.
(In reply to comment #32)
> Indeed, making that call site alone use the old method of computing
> ToNearestPixels fixes the bug for me. I'll look into exactly what we want to be
> doing here.

Timothy, would you mind attaching a quick && dirty patch which does this so that I can test it locally?  Just to make sure that we're barking up the right tree for both test cases...  :-)
(In reply to comment #34)
> Timothy, would you mind attaching a quick && dirty patch which does this so
> that I can test it locally?  Just to make sure that we're barking up the right
> tree for both test cases...  :-)

Actually I found out later a patch doing that fixed the one artifact but produced others (which makes sense). This patch uses doubles to avoid losing information and should be artifact free.
This patch seems to fix both problems for me.
inline float NSAppUnitsToFloatPixels(nscoord aAppUnits, float aAppUnitsPerPixel)
{
  return (float(aAppUnits) / aAppUnitsPerPixel);
}

inline PRInt32 NSAppUnitsToIntPixels(nscoord aAppUnits, float aAppUnitsPerPixel)
{
  return NSToIntRound(float(aAppUnits) / aAppUnitsPerPixel);
}

How about we just change those to be "double(aAppUnits)"? Might need to explicitly cast to float after the division to silence warnings on Windows. That should fix it.
That would work as long as we don't care about the edge cases where even the pixel value can't be represented exactly as a float.
Good point.

Then how about adding NSAppUnitsToDoublePixels, and an overload for NSToIntRoundUp that takes a double, and using them in nsPoint::ToNearestPixels and similar places?
Posted patch patchSplinter Review
Attachment #508671 - Attachment is obsolete: true
Attachment #510387 - Flags: review?(roc)
There are a lot of others places that convert from app units to float pixels that could be vulnerable to this. But I'm not going to convert them all here.
http://hg.mozilla.org/mozilla-central/rev/78778779f370
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Ehsan, I'd appreciate if you could check that the final patch fixes the issue for you.
(In reply to comment #43)
> Ehsan, I'd appreciate if you could check that the final patch fixes the issue
> for you.

I'll do that in tomorrow's nightly.
It seems to have been fixed.  Fantastic job, Timothy!  :-)
Status: RESOLVED → VERIFIED
Thanks for testing.
You need to log in before you can comment on or make changes to this bug.