Closed Bug 468496 Opened 16 years ago Closed 13 years ago

Stretched images can show black blobs at edges on Linux (nearest neighbor scaling is not a substitute for EXTEND_PAD)

Categories

(Core :: Graphics, defect)

x86
Linux
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla7
Tracking Status
blocking2.0 --- -
status2.0 --- wanted
blocking1.9.2 --- -
status1.9.2 --- wanted

People

(Reporter: zwol, Assigned: zwol)

References

(Blocks 1 open bug)

Details

(Keywords: polish, testcase)

Attachments

(8 files)

Attached file test case
The attached test case takes an 1x2 pixel image and stretches it horizontally to several hundred pixels wide, several times (with different stretch distances).  On Linux, some of the green lines will show black dots at the far right.  If you change the zoom level, the dots change size jump from one line to another, and at some zoom levels there are also black lines at the bottom of each image.    I have anecdotal reports that similar image artifacts are highly visible with Ubuntu 8.10's build of Firefox 3.0.x (not sure which point revision) when playing the browser-hosted game at www.kingdomofloathing.com, so this is not merely a problem for contrived test cases.

The artifacts arise because, on Linux, we are trying to avoid the use of the CAIRO_EXTEND_PAD source filter because it takes a slow fallback path.  Instead, we use nearest-neighbor image scaling, on the theory that that will never sample pixels outside the image.  As this test case demonstrates, that theory is incorrect.

I will shortly attach a patch which uses EXTEND_PAD universally and modifies the Cairo library to turn on accelerated handling of this mode when the X server supports it.  (Render protocol 0.10 - available in X.org server 1.5 - includes support for EXTEND_PAD and it appears to work when I make the library aware of it.)  I'd be happy to see a fix that don't involve a major performance hit on older X servers but I have been unable to think of one.

(I suspect this test case, or a slight modification, will also show artifacts at some zoom levels on Mac, because we are also not using EXTEND_PAD there - in that case not for performance but because someone had the erroneous impression that EXTEND_PAD is the default on Mac.  Code inspection suggests that Cairo does implement EXTEND_PAD correctly and speedily on Mac.)
Flags: wanted1.9.1?
Here's the patch as described above.  Reftest run in progress.
Attachment #351956 - Flags: superreview?(roc)
Attachment #351956 - Flags: review?(vladimir)
Comment on attachment 351956 [details] [diff] [review]
proposed patch (incl Cairo changes)

Looks fine, except for:

>+        pattern->SetExtend(gfxPattern::EXTEND_PAD);

This shouldn't be done on OSX; the quartz surface behaves slightly differently and PAD will cause us to take a slow path.

Also, make sure you do the cairo patch bit -- add the patch file itself to gfx/cairo and edit gfx/cairo/README to reference it, or it'll get blown away next cairo upgrade.  Jeff or I will upstream it when we can.
Attachment #351956 - Flags: review?(vladimir) → review+
I seem to recall problems with PAD being buggy in the X server, but I don't remember any details. I guess if reftests pass that's a pretty good sign, but how confident are we about the various server/driver combinations? Does Carl know?

I basically like the patch but I'd like to see a new version addressing Vlad's comments. Also, what does returning UNSUPPORTED actually cause cairo to do here?
(In reply to comment #2)
> >+        pattern->SetExtend(gfxPattern::EXTEND_PAD);
> 
> This shouldn't be done on OSX; the quartz surface behaves slightly differently
> and PAD will cause us to take a slow path.

It's necessary.  I haven't tried this exact testcase but I can reproduce visual artifacts on OSX without it.  Also, looking at cairo-quartz-surface.c, it doesn't seem to take a fallback for EXTEND_PAD, only EXTEND_REPEAT/REFLECT.

> Also, make sure you do the cairo patch bit -- add the patch file itself to
> gfx/cairo and edit gfx/cairo/README to reference it, or it'll get blown away
> next cairo upgrade.  Jeff or I will upstream it when we can.

Ok.

(In reply to comment #3)
> I seem to recall problems with PAD being buggy in the X server, but I don't
> remember any details. I guess if reftests pass that's a pretty good sign, but
> how confident are we about the various server/driver combinations? Does Carl
> know?

For the record, I see no regressions in the reftests with this change, but the reftest included in the patch fails -- still analyzing.  I think it's an accident of how I wrote the test.

> I basically like the patch but I'd like to see a new version addressing Vlad's
> comments. Also, what does returning UNSUPPORTED actually cause cairo to do
> here?

The failure propagates to _cairo_surface_composite in cairo-surface.c which then calls _cairo_surface_fallback_composite -- so the rendering is the same, but without X-server acceleration.
(In reply to comment #4)
> (In reply to comment #2)
> > >+        pattern->SetExtend(gfxPattern::EXTEND_PAD);
> > 
> > This shouldn't be done on OSX; the quartz surface behaves slightly differently
> > and PAD will cause us to take a slow path.
> 
> It's necessary.  I haven't tried this exact testcase but I can reproduce visual
> artifacts on OSX without it.  Also, looking at cairo-quartz-surface.c, it
> doesn't seem to take a fallback for EXTEND_PAD, only EXTEND_REPEAT/REFLECT.

I can't reproduce with this testcase; no black or anything but green shows up.  As you say, PAD is ignored anyway (incorrectly, I guess, but correctly for perf).  So there's a separate bug there, I think.
Yeah, if there's a Mac problem, let's handle that under a separate bug.
(In reply to comment #5)
> > I haven't tried this exact testcase but I can reproduce visual
> > artifacts on OSX without it.  Also, looking at cairo-quartz-surface.c, it
> > doesn't seem to take a fallback for EXTEND_PAD, only EXTEND_REPEAT/REFLECT.
> 
> I can't reproduce with this testcase; no black or anything but green shows up. 
> As you say, PAD is ignored anyway (incorrectly, I guess, but correctly for
> perf).  So there's a separate bug there, I think.

PAD is not ignored on OSX.  I have empirical evidence that it does *something*, and my impression from reading the code is that it is implemented with acceleration.  Compare the rendering of reftests/border-image/solid-image-2.html at all zoom levels with these two builds:

https://build.mozilla.org/tryserver-builds/2008-12-02_11:21-zweinberg@mozilla.com-borderimage/zweinberg@mozilla.com-borderimage-firefox-try-mac.dmg
 
https://build.mozilla.org/tryserver-builds/2008-12-05_17:37-zweinberg@mozilla.com-borderimage/zweinberg@mozilla.com-borderimage-firefox-try-mac.dmg

At some zoom levels (reportedly not the default, though) you should see gaps in the borders with the first, but not the second.  The only difference is that the second one sets EXTEND_PAD for quartz surfaces.  (Note that both builds have other patches in them -- you should *not* see gaps in the border with a trunk build.)

Given that the mitigating change is in a shared code path for all images, I am certain it is possible to construct a test case that doesn't use border-image but nonetheless fails on OSX, but I have to leave it to someone who actually has a Mac.

I don't see any value in splitting the bug.  It is the same bug on both Mac and Linux - we need to be using EXTEND_PAD and we're not.
(In reply to comment #4)
> For the record, I see no regressions in the reftests with this change, but the
> reftest included in the patch fails -- still analyzing.  I think it's an
> accident of how I wrote the test.

Bad news: the reftest failure in the patch seems to be caused by going back to whatever Cairo's default is for resampling on image upscale.  With the patch as posted, the test file's images are rendered with a very slight color gradient: at about one-third of the way across each image, the white stripe changes from #ffffff to #fbfffb, and at about two-thirds across, the green stripe changes from #00ff00 to #04ff04.  This does not happen if I leave in the call to pattern->SetFilter(0).

I'd be interested to know if this test fails in Windows and Mac builds.
Blocks: 455364
... I can reproduce the equivalent of the reftest failure with code that uses only the patched cairo library, and the failure goes away if I take the patches out of cairo.  + some web searches turned up other cases where RepeatPad in Render just plain doesn't work.

This makes me a lot less sanguine about the whole idea. :-(
Can you paste some links here which refer to specific problems we should watch out for the next time we think about enabling PAD on GTK?
http://lists.freedesktop.org/archives/xorg/2008-February/032973.html has a test program that fails catastrophically for PAD and REFLECT (or rather, the Render equivalents) on my desktop computer (ATI graphics) and for REFLECT, but not PAD, on my laptop (Intel graphics).  Which has the lovely implication that this is not just about the Render implementation, but about which specific video driver you've got...

I'll attach the test program I wrote tomorrow, when the desktop is on again.

Any thoughts on how to proceed here?
No suggestions here other than "mark some tests random on GTK and hope things get better someday"
wanted1.9.1 is dead
Flags: wanted1.9.1? → wanted1.9.2+
I promised to attach this test case, which demonstrates bugs in X Render (you need to use it with a Cairo library patched to try to use RepeatPad for CAIRO_EXTEND_PAD).
No longer blocks: 455364
I think there is a way to get consistent cross-platform rendering without sacrificing too much performance.  The common case is that images are not scaled, and post-bug 296818, we don't even keep them around in memory in uncompressed form.  So that says to me that we could treat images in display memory as a pure cache, and in particular, we could do all the scaling on the client side before copying to display memory.  That would mean that it's our copy of pixman doing all the work, so we know it's correct, but there wouldn't be any need to pull the image data off the X server, rescale it, and put it back, like what happens now if you enable EXTEND_PAD on Linux.

We *would* create offscreen bitmap surfaces for the scaled images so that as long as the nsThebesImage object sticks around, repaint doesn't have to repeat the scaling work.  I think this might even make scrolling of a scaled document faster, because right now we're scaling everything on every draw operation.

I'm reluctant to implement this before joedrew!'s rumored image layer cleanups land, because right now it's such a horrible mess in there, but I'm willing to make an attempt at it if those are not happening anytime soon.
We used to do things that way. We stopped. Better people than I can explain why.
I think that's what we want to get to, once we get to being able to do decode-on-draw.  It's not a great solution right now though, because we don't want to cache all scaled images since they can become quite large; you can toss in a limit, and then you're back to having to scale on demand, which means having to keep the original bits around.  It also means that if hardware does accelerate the scaling, that we'd lose any access to that; for example, if the native image representation is a GL texture.

But, I think that we should be able to at least partially do it the way you describe... I think the missing piece is decode-on-draw, so that we can better manage memory usage.
I don't understand ... how is decode-on-draw going to work with scaled images? How can you pre-scale the compressed data?
Just so that we have a way to request the image to get redecoded from within the code that does the final rendering, to avoid having to keep different decoded regions in memory at all times.
Attachment #351956 - Flags: superreview?(roc)
Attachment #351956 - Flags: review+
Comment on attachment 351956 [details] [diff] [review]
proposed patch (incl Cairo changes)

canceling r/sr on the patch -- something similar may eventually be the right thing but not now.
This has a few dupes and we currently list it as our only known Linux-specific issue (in our release notes). Seems like we should push a bit harder to fix it if possible.
Flags: blocking1.9.2?
Similar effects can be seen on Mac for a closely-related reason (see bug 472037 and bug 486761).  I've been meaning to ask bholley to look into the proposal outlined in comment 15 once he's done with basic decode-on-draw.
Definitely wanted, would take a safe patch, but can't block 3.6 on this.
Flags: blocking1.9.2? → blocking1.9.2-
This is no longer listed as a known issue in Firefox 3.6 but it's still not fixed. Please fix it soon.
The patch is outdated. nsThebesImage.cpp no longer exist in firefox 3.6.
FWIW, that code was moved to imgFrame.cpp.
blocking1.9.2: --- → ?
blocking2.0: --- → ?
tracking-fennec: --- → ?
status1.9.2: --- → ?
status2.0: --- → ?
I am not realistically going to fix this anytime soon.
Assignee: zweinberg → nobody
Status: ASSIGNED → NEW
blocking1.9.2: ? → -
Blocks: 534054
blocking2.0: ? → final+
status2.0: ? → ---
Does anyone know someone who is able to fix this problem? Zack maybe you can put some text together to help a developer getting started on this. 

Zack can you specify why you cannot fix this issue, if it is resources maybe we can ask Ubuntu to step in. Otherwise if nobody here now's someone able to fix it I will contact Ubuntu/Debian and Fedora to look for someone who can.

It is really not nice to leave such an annoying bug open for so long.
Comments 1-15 and the various test cases and mailing list threads linked therein reflect everything that I know about this bug.

Joe Drew said he had a plan, though.
Thanks for your fast reply Zack, I will wait for Joe to say something. In the mean time I will set up a machine to compile firefox on- to see if the patches still work.

Is there anyone at Mozilla reading who can say something on this issue??
Yes, we're listening. This blocks 1.9.3 final, which means it'll be fixed for Firefox 4 final.

A plan Jeff and I sketched out involves (IIRC):
 - Turning on EXTEND_PAD unconditionally, and
 - using and/or extending the Cairo X version checking code to sniff for the versions of X that EXTEND_PAD sucks on, and falling back to software (pixman) in those cases.

That might not be the final plan, but it's *a* plan.
OK thanks Joe. I is nice to hear that it is fixed for Firefox 4. I will keep monitoring this till than. If there is anything else I can do let me know.
Assignee: nobody → jmuizelaar
FWIW, attachment 351955 [details] shows the background instead of black when using operator-over instead of overriding with operator-source here:

http://hg.mozilla.org/mozilla-central/annotate/0df95f3a6de6/modules/libpr0n/src/imgFrame.cpp#l631

That may often be a little better though still not right, and there are other reasons to use extend-pad.

I'm curious whether the bug here is in calculating scale factors and clips or in the nearest neighbour implementation.  However, extend-pad is probably the appropriate solution anyway.
tracking-fennec: ? → ---
I believe EXTEND_PAD is enable for upscaled image in:
https://bugzilla.mozilla.org/show_bug.cgi?id=422179

However, I'm not sure about downscaled image.
Bug 422179 uses EXTEND_PAD if EXTEND_PAD is fast and good - but only for upscales. I suspect we should use EXTEND_PAD for downscales too, which will fix this bug once and for all. I have a patch which I think will do this.
This patch is built on my patch in bug 566283, but the merge is pretty trivial.

Zack, can you take a look and see if this works?
Attachment #497112 - Flags: feedback?(zackw)
blocking2.0: final+ → -
status2.0: --- → wanted
Comment on attachment 497112 [details] [diff] [review]
Use EXTEND_PAD on X

Tested <http://hg.mozilla.org/mozilla-central/rev/5d9e1a56d922> on my current Linux box: X server vendor release code 10707000, Nouveau driver 0.0.16, NV94 video chipset.

The test case in attachment 351955 [details] shows display artifacts:

* with an unmodified tree;
* with joe's patches for this bug and bug 566283;
* and even with PreparePatternForUntiledDrawing chainsawed down to just

{
    aPattern->SetExtend(gfxPattern::EXTEND_PAD);
    aPattern->SetFilter(aDefaultFilter);
}

The artifacts are not the same as what I get with 3.6 on the same hardware.  3.6 shows thin green lines, all the same color throughout, some of which have solid black dots at their right edges; at some zoom levels there are also solid black lines underneath the green line (with a white pixel or two in between).

mozilla-central never produces the solid black dots at the right edges; however, the green lines are not all the same color throughout, they are the proper color in the middle, and fade to neutral gray at both edges.  Some of the lines get black lines underneath them.  And they all appear to be slightly fatter than they were in 3.6.  This behavior is more or less the same regardless of whether any patches are applied (the patch seems to change subtleties of the artifacts).

It is plausible to me that this is a bug in nouveau, and honestly I think the correct course of action at this point is to nuke the entire mess, use EXTEND_PAD unconditionally, and then work with the Cairo and X driver folks to figure out what's still wrong and fix it properly.  Nonetheless I cannot say that this patch fixes the original problem.
Attachment #497112 - Flags: feedback?(zackw) → feedback-
Replacing the entire body of PreparePatternForUntiledDrawing with

{
    aPattern->SetExtend(gfxPattern::EXTEND_NONE);
    aPattern->SetFilter(gfxPattern::FILTER_FAST);
}

reproduces the 3.6 behavior.  Replacing it with

{
    aPattern->SetExtend(gfxPattern::EXTEND_PAD);
    aPattern->SetFilter(gfxPattern::FILTER_FAST);
}

produces behavior which is _nearly_ correct; the image itself is upscaled correctly (instead of being gradientified and fattened), and there aren't black lines under the image at any zoom level, but some of the lines still get black blobs at their right edges at some zoom levels, and also now sometimes at their left edges.  However, this happens much less often.

Based on this I feel relatively confident in diagnosing a video driver bug, and my recommendation is still to rip out the entire mess and use EXTEND_PAD and a good filter unconditionally, then to work with the X.org folks to get the video drivers fixed.
(In reply to comment #51)
> Tested <http://hg.mozilla.org/mozilla-central/rev/5d9e1a56d922> on my current
> Linux box: X server vendor release code 10707000, Nouveau driver 0.0.16, NV94
> video chipset.

What pixman version was being used by the server here?

(The patch produces the correct results here with xorg-server 1.9.2.901 and xf86-video-ati 6.13.2, or a tigervnc based on xorg-server 1.7.6 and pixman-0.20.0.)
Xorg.0.log says:

# Build Date: 02 December 2010  01:10:32AM
# xorg-server 2:1.7.7-10 
# Current version of pixman: 0.16.4

... which is pretty egregiously old, innit.  Stay tuned.
I upgraded the shared pixman and cairo libraries on this computer to 0.21.2 and 1.10.0 respectively.  Xorg.0.log indicates pixman 0.21 is in use.  Same bad behavior with (or without) joedrew's patches.

The nouveau drivers are the next obvious culprit.  They haven't been updated by this distro since 2010-08-25, but I really don't want to go to the trouble of building them myself (I'd have to build my own kernel as well).

Mike Hommey should have been cc:ed on this bug a long time ago.  My bad.  Any comments, Mike?
Using cairo 1.10.0 and pixmap 0.21.2, everything is displayed gracefully on 4.0b8 built against system cairo, on intel display. (iceweasel from http://mozilla.debian.net/packages/, and cairo and pixmap from current debian experimental, for those who care).

Nouveau drivers are likely to be the culprit.
Your system is nigh-identical to mine but for the video drivers, so yeah, that sounds pretty convincing.  I am testing some random trunk revision from last week + the patches, though.

Who would I have to blackmail to get more recent Nouveau drivers into experimental?
This is what I think should be done.  Yeah, it means we're at the mercy of the video drivers on X11, but that gives us more leverage to kick the X people and the distros, and I don't like working around problems when we can fix them at source.  

Incidentally, my tests were all with in-tree Cairo, but I've now tried --enable-system-cairo and it made no difference.
Attachment #499238 - Flags: feedback?(joe)
(In reply to comment #59)
> Who would I have to blackmail to get more recent Nouveau drivers into
> experimental?

You can try debian-x@lists.debian.org directly, or file a wishlist bug against xserver-xorg-video-nouveau.
Regret to report that I still have display artifacts with the much newer Nouveau driver that just hit Debian experimental:

X.Org X Server 1.9.2.902 (1.9.3 RC 2)
Release Date: 2010-12-03
Current Operating System: Linux moxana 2.6.37-rc7-amd64
NOUVEAU driver Date:   Tue Nov 30 15:27:36 2010 +1000
Comment on attachment 499238 [details] [diff] [review]
alternative patch: unconditional EXTEND_PAD

I can't see myself being OK with this any time soon - there are too many people with old-enough X that unconditional EXTEND_PAD will hurt.
Attachment #499238 - Flags: feedback?(joe) → feedback-
(In reply to comment #63)
> I can't see myself being OK with this any time soon - there are too many people
> with old-enough X that unconditional EXTEND_PAD will hurt.

In that case, and looking at the reports in comment 55 and comment 58, maybe we should go ahead with your patch.  We certainly shouldn't be generalizing from my computer, which has display artifacts *with or without* your patch, that have been conclusively pinned on the video drivers (see https://bugs.freedesktop.org/show_bug.cgi?id=32855 ).
Attachment #497112 - Flags: feedback- → feedback+
Comment on attachment 497112 [details] [diff] [review]
Use EXTEND_PAD on X

>-                if (fastExtendPad) {
>-                    aPattern->SetExtend(gfxPattern::EXTEND_PAD);
>-                    aPattern->SetFilter(aDefaultFilter);
>-                } else
>-#endif
>-                    aPattern->SetFilter(gfxPattern::FILTER_FAST);
>+
>+            if (!isDownscale && !fastExtendPad) {
>+                aPattern->SetFilter(gfxPattern::FILTER_FAST);
>             }
>+

Looks like an else block with SetFilter(aDefaultFilter) is needed.
Without this patch, that currently only happens when !isDownscale, but I expect it should also be done when !isDownscale, as on other platforms.

Otherwise, this approach of using EXTEND_PAD when supported by cairo seems sensible to me.
Attachment #497112 - Flags: feedback+
Blocks: 634362
What else needs to be done here?
Get joedrew's patch landed?
Comment on attachment 497112 [details] [diff] [review]
Use EXTEND_PAD on X

It needs review first, and maybe some rebasing.
Attachment #497112 - Flags: review?(roc)
Attachment #497112 - Flags: review?(jmuizelaar)
Comment on attachment 497112 [details] [diff] [review]
Use EXTEND_PAD on X

Is there a reason for ignoring aDefaultFilter?  (Comment 65)
Attachment #497112 - Flags: review-
Attachment #497112 - Flags: review?(jmuizelaar) → review+
This was based on Joe's attachment 497112 [details] [diff] [review], but restores the use of
aDefaultFilter when we use EXTEND_PAD and adds aDefaultFilter for downscale
with default extend.

There is also some code simplification.  We now require cairo 1.10 so there is
no need to check the cairo version.
Attachment #532740 - Flags: review?(joe)
Comment on attachment 532740 [details] [diff] [review]
use RepeatPad on newer X servers even when downscaling

Review of attachment 532740 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxDrawable.cpp
@@ +78,5 @@
> +        case gfxASurface::SurfaceTypeQuartz:
> +        case gfxASurface::SurfaceTypeQuartzImage:
> +            // Don't set EXTEND_PAD, Mac seems to be OK. Really?
> +            aPattern->SetFilter(aDefaultFilter);
> +            break;

This hunk needs to be removed, as we've removed it with the Cairo update.
Attachment #532740 - Flags: review?(joe) → review+
Blocks: 660740
No longer blocks: 660740
http://hg.mozilla.org/mozilla-central/rev/9be577cafd7e
http://hg.mozilla.org/mozilla-central/rev/57aeb0525615
Assignee: jmuizelaar → zackw
Status: NEW → RESOLVED
Closed: 13 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla7
Verified fixed with with Fennec 7.0a1 (20110620)
Status: RESOLVED → VERIFIED
Not fixed when using canvas..drawImage
Example usage: http://demos.hacks.mozilla.org/openweb/HWACCEL/
Note: demo appears correct but runs slowly because it falls back to software rendering. I've verified with debug tracing that REPEAT_NONE is still used for that test case.

According to https://bugzilla.mozilla.org/show_bug.cgi?id=620065#c24, this bug should take care of that, but obviously didn't. Let me know if I need to file it separately.
P.S. tested using Aurora (7.0a2)
This bug is pretty long in the tooth, and by all accounts did fix the problem with <img>, so please file a new bug and mark it dependent on this one.
Blocks: 679257
This bug is still alive. I opened a new bug https://bugzilla.mozilla.org/show_bug.cgi?id=975358
You need to log in before you can comment on or make changes to this bug.