Closed Bug 637828 Opened 13 years ago Closed 1 year ago

mozilla.org project transitions show graphics corruption

Tracking

()

Status:

RESOLVED WORKSFORME

People

(Reporter: stechz, Assigned: jrmuizel)

References

(
URL
)

Details

Attachments

(4 files, 3 obsolete files)

Screenshot 13 years ago Benjamin Stover (:stechz) 70.88 KB, image/png		Details
Remove unneeded optimization 13 years ago Jeff Muizelaar [:jrmuizel] 905 bytes, patch		Details \| Diff \| Splinter Review
Version that builds 13 years ago Jeff Muizelaar [:jrmuizel] 999 bytes, patch	joe : review+ pavlov : approval2.0-	Details \| Diff \| Splinter Review
Reset the transform on the destination 13 years ago Jeff Muizelaar [:jrmuizel] 3.52 KB, patch	joe : review-	Details \| Diff \| Splinter Review
Reset the transform on the destination v2 13 years ago Jeff Muizelaar [:jrmuizel] 1.04 KB, patch	joe : review+	Details \| Diff \| Splinter Review
Reset the transform on the destination v3 13 years ago Jeff Muizelaar [:jrmuizel] 5.11 KB, patch	jrmuizel : review+	Details \| Diff \| Splinter Review
Another use case: "ghost pixels" when panning regular text 13 years ago Thomas Arend [:tarend] 130.72 KB, image/png		Details

Benjamin Stover (:stechz)

Reporter

Description

•

13 years ago

Attached image Screenshot — Details

Seems to only reproduce on Android.

Steps:
1. Navigate to http://mozilla.org
2. Wait for transition from Firefox to Thunderbird

There are graphics corruption issues after the fade is finished. See attachment.

Benjamin Stover (:stechz)

Reporter

Updated

•

13 years ago

tracking-fennec: --- → ?

Benjamin Stover (:stechz)

Reporter

Comment 1

•

13 years ago

This only seems to happen for smaller scales? After a certain amount of zooming in, this problem doesn't occur. I'm assuming the magic resolution range for this corruption is <=1.0.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 2

•

13 years ago

I see this on desktop in portrait orientation.

Assignee: nobody → jones.chris.g

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 3

•

13 years ago

STR in test-ipcbrowser
 (1) Load http://mozilla.org
 (2) Set resolution to 0.5/0.5

Any resolution <1.0 seems to tickle the bug.  Curiously, the corruption is
 - ABSENT in a vanilla desktop-firefox build
 - ABSENT in a vanilla desktop-firefox build with MOZ_LAYERS_FORCE_SHMEM_SURFACES=1
 - ABSENT in a vanilla desktop-fennec build (guessing from comment 1)
 - present in desktop-fennec build with --enable-mobile-optimize

So it looks like the secret sauce needed to trigger this is --enable-mobile-optimize.  The only significant difference between that and MOZ_LAYERS_FORCE_SHMEM_SURFACES=1 in a vanilla build wrt shadow layers is the image resampling algorithms used.  (The only other difference is that slightly different code is used to composite layers to the Gtk2 window, but since this bug appears on android too I think we can rule that out.)

This smells like a pixman bug :/.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 4

•

13 years ago

This makes the bug go away

diff --git a/gfx/layers/ipc/ShadowLayers.cpp b/gfx/layers/ipc/ShadowLayers.cpp
--- a/gfx/layers/ipc/ShadowLayers.cpp
+++ b/gfx/layers/ipc/ShadowLayers.cpp
@@ -393,21 +393,17 @@ ShadowLayerForwarder::GetParentBackendTy
   return mParentBackend;
 }
 
 static gfxASurface::gfxImageFormat
 OptimalFormatFor(gfxASurface::gfxContentType aContent)
 {
   switch (aContent) {
   case gfxASurface::CONTENT_COLOR:
-#ifdef MOZ_GFX_OPTIMIZE_MOBILE
-    return gfxASurface::ImageFormatRGB16_565;
-#else
     return gfxASurface::ImageFormatRGB24;
-#endif
   case gfxASurface::CONTENT_ALPHA:

Looks like pixman is choking somewhere trying to composite to/from 565 surfaces.  I'd guess a problem with rgba32->565.  Digging.

OS: Android → All

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 5

•

13 years ago

This bug is gone when I --enable-system-cairo for a local cairo 1.10 build with my distro's pixman (0.18.4).

Halp!  Do we cherry-pick upstream patches in this situation?

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 6

•

13 years ago

Judging by the date of the git commit we've got in-tree, it looks we're on pixman ~0.18.2.  Is that right?

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 7

•

13 years ago

The pixman in tree should be from between 0.19.4 and 0.19.6. Does the patch on bug 604168 fix it for you?

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 8

•

13 years ago

Yeah, I got confused by an out-of-date README and then sidetracked for a bit.  pixman-version.h says 0.19.1, FWIW.

I will give that a shot.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 9

•

13 years ago

Bad news: I tried a local build of cairo 1.10 with a local build of pixman 0.21.2 (being careful to check LD_DEBUG), and the bug was gone.  That's sort of good.

But, with a local build of pixman 0.18.4 running with local cairo 1.10, the bug was also gone.  0.18.4 looks to be older than our in-tree pixman.  That means the remaining possible culprits are
 - cairo 1.9 vs 1.10 (sigh)
 - some patch we have applied to pixman in-tree
 - some patch we have applied to cairo in-tree

(The bug also doesn't appear with in-tree cairo+pixman when I force the Xlib backend to use rgb565 surfaces instead of rgb24, but that's not too surprising.)

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 10

•

13 years ago

In-tree cairo claims to be 1.9.5 (cairo-features.h.in), not sure how accurate that is.  I tried some cairo versions around there

1.9.2 - no tee backend
1.9.4 - crash
1.9.6 - crash
1.9.8 - OK

So, maybe something was fixed around 1.9.6-1.9.8.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 11

•

13 years ago

I spent some time looking over the cairo log to see if there was anything that looked related to this bug. Nothing /really/ stood out, though there were a few patches that looked somewhat relevant. I tried applying them and deps, but turns out that some quite substantial cairo changes, many of them to cairo-image-surface, landed the day after the rev we imported last. (Not coincidentally, I imagine! :) .) So attempting to cherry-pick looks to be a lost cause, at least with my cairo-fu.

One of the substantial changes was http://cgit.freedesktop.org/cairo/commit/?id=b9407af6a4bc792c1bcb52c90aa8a618627bb618, referencing the tantalizing goodies

image-rgba firefox-talos-gfx-0 342050.17 (342155.88 0.02%) -> 69412.44 (69702.90 0.21%): 4.93x speedup
███▉
[snip]
image-rgba firefox-planet-gnome-0 217512.61 (217636.80 0.06%) -> 123341.02 (123641.94 0.12%): 1.76x speedup
▊

It's of course unclear how those numbers would translate to m-c tip. This probably wasn't all that appealing in the past because we haven't used the image backend for anything "serious". We are now, on mobile. I don't think this is news to anyone here.

So, I would be remiss to not pursue this: What would it take to land cairo 1.10 for fennec 4, if *only* for fennec 4? We have a patch in bug 562746. We can land cairo 1.10 on a branch if need be. Some folks who can hack on regression fixes are clear of their FF4 blockers (like me).

The alternative, possibly leaving this bug and maybe others unfixed and leaving possibly significant perf wins on the table, because we end up shipping a year-out-of-date cairo, doesn't sound very good.

Let the flames begin!

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 12

•

13 years ago

Tentatively setting dep; a cairo expert could probably directly patch this bug or cherry-pick a fix, but that would take some time for me and I'd rather settle comment 11 first.

Depends on: 562746

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 13

•

13 years ago

Another option which I believe has been bounced around before is shipping a libcairo.so built from scratch and building --enable-system-cairo.  This shouldn't be a problem on android since we don't have to worry about conflicting symbols.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 14

•

13 years ago

That sounds like a reasonable plan

Siarhei Siamashka

Comment 15

•

13 years ago

(In reply to comment #11)
> So, I would be remiss to not pursue this: What would it take to land cairo 1.10
> for fennec 4, if *only* for fennec 4?  We have a patch in bug 562746.  We can
> land cairo 1.10 on a branch if need be.  Some folks who can hack on regression
> fixes are clear of their FF4 blockers (like me).

That's a great idea in general. I think that cairo 1.10.x should be now in a *much* better shape than 1.9 snapshots. Once the release schedule for cairo 1.10 had been announced, a lot of people joined testing efforts and found numerous bugs which are now fixed. Also cairo 1.10 now got into some linux distros where additional testing and bugfixing happened.

Regarding fennec, special care needs to be taken not to forget doing cairo_surface_flush/cairo_surface_mark_dirty calls in places where necessary:
http://cairographics.org/manual-1.0.2/cairo-cairo-surface-t.html#cairo-surface-mark-dirty
http://cairographics.org/manual-1.0.2/cairo-cairo-surface-t.html#cairo-surface-flush

New version of cairo is less forgiving in the cases when this is not done properly (see bug 616700 as an example). Also these flush/dirty calls are only relevant for image backend and are for example noops for xlib, so the bugs may show up in fennec only.

Also in the linux world, the in-tree pixman/cairo copies are irrelevant because everyone is using system libraries anyway.

Siarhei Siamashka

Comment 16

•

13 years ago

(In reply to comment #15)
> Regarding fennec, special care needs to be taken not to forget doing
> cairo_surface_flush/cairo_surface_mark_dirty calls in places where necessary:
> http://cairographics.org/manual-1.0.2/cairo-cairo-surface-t.html#cairo-surface-mark-dirty
> http://cairographics.org/manual-1.0.2/cairo-cairo-surface-t.html#cairo-surface-flush

And I think that in order to make this missing cairo_surface_flush/cairo_surface_mark_dirty calls hunt more efficient, the use of valgrind API could be used. We need to track any accesses to surface memory buffers performed by anything other than cairo/pixman and happening at a wrong time (not between flush and dirty calls). And this is exactly what valgrind can do easily.

Benjamin Stover (:stechz)

Reporter

Comment 17

•

13 years ago

I'm not confident about upgrading cairo. We would be adding a lot of unknowns for Fennec's release candidate. Updating cairo is the kind of thing we want to do at the beginning of the release cycle.

Are there any other solutions that can circumvent this bug?

Benjamin Stover (:stechz)

Reporter

Comment 18

•

13 years ago

Siarhei: I've found several places where we are not flushing and marking dirty. I could use some help figuring out how to set up valgrind to find the instances we care about.

Siarhei Siamashka

Comment 19

•

13 years ago

It should be possible to patch cairo to make use of VALGRIND_MAKE_MEM_DEFINED and VALGRIND_MAKE_MEM_NOACCESS or VALGRIND_MAKE_MEM_UNDEFINED client requests:
    http://valgrind.org/docs/manual/mc-manual.html

So that 'cairo_image_surface_create_for_data' and 'cairo_surface_mark_dirty' would use VALGRIND_MAKE_MEM_NOACCESS on the surface memory buffer and 'cairo_surface_flush' would use VALGRIND_MAKE_MEM_DEFINED. After this is done, valgrind would complain about any attempts to access this buffer when it is not marked as 'defined'. Additionally either every cairo function also should mark buffer as defined on entry and restore its original state on exit in order not to catch itself, or just the valgrind log needs to be filtered to ignore any memory accesses which have cairo functions showing up in a backtrace.

Benjamin Stover (:stechz)

Reporter

Comment 20

•

13 years ago

Is there any easy function to call that gives me the number of bytes for an image, given its format, width, height, and stride?

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 21

•

13 years ago

No, need to compute from GetSize() and BytePerPixelFromFormat(). But total size isn't typically useful information on its own for an arbitrary image surface, except for accounting; what are you trying to do?

Benjamin Stover (:stechz)

Reporter

Comment 22

•

13 years ago

VALGRIND_MAKE_MEM_DEFINED needs to know the size of the memory region.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 23

•

13 years ago

Yeah, you can't do that unless the stride is the width*bpp.  Need to mark images row-by-row otherwise.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 24

•

13 years ago

(Probably simplest just to use a single row-by-row impl, since this isn't terribly perf critical.)

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Updated

•

13 years ago

No longer depends on: 562746

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Updated

•

13 years ago

Blocks: 638594

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Updated

•

13 years ago

No longer blocks: 638594

Depends on: 638594

Doug Turner (:dougt)

Updated

•

13 years ago

tracking-fennec: ? → 2.0+

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 25

•

13 years ago

If someone can come up with a reduced test case, that might help here.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 26

•

13 years ago

I tried for a bit, but this page is animating using jquery junk and I couldn't reproduce the problem with tests that just animated the opacity of <img>'s and <div>'s and text.  In fact, it was after I tried to make a small test case that I asked you about cairo-trace last night ;).

I'm sure it's doable with some work.

Benjamin Stover (:stechz)

Reporter

Comment 27

•

13 years ago

I think the key features that tickle this bug are:
1) Something with opacity and gradients
2) Some sort of timed invalidation involving these gradients

Brad Lassey [:blassey] (use needinfo?)

Updated

•

13 years ago

Whiteboard: [needs-cairo-update]

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 28

•

13 years ago

Removing the use of slow operation 'SRC with a mask' helps this problem.

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 29

•

13 years ago

Attached patch Remove unneeded optimization (obsolete) — Details — Splinter Review

I don't see any reason to use SOURCE here. Pixman will already turn OVER with RGB24 source into COPY when it can. Plus if we use SOURCE with a MASK it will turn into a very slow path, including hitting pixman's general code.

Attachment #516979 - Flags: review?(roc)

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 30

•

13 years ago

Attached patch Version that builds — Details — Splinter Review

Attachment #516979 - Attachment is obsolete: true

Attachment #516979 - Flags: review?(roc)

Jeff Muizelaar [:jrmuizel]

Assignee

Updated

•

13 years ago

Attachment #516983 - Flags: review?(roc)

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 31

•

13 years ago

This patch sounds good on its own, but is the underlying bug something we're likely to trip on elsewhere?

Joe Drew (not getting mail)

Comment 32

•

13 years ago

Jeff tells me that the ADD composite operation seems to be wrong, but it might be the input to that operation in clip_and_composite_source.

Joe Drew (not getting mail)

Comment 33

•

13 years ago

Comment on attachment 516983 [details] [diff] [review]
Version that builds

This is removing an optimization that was added in bug 418494 - an optimization that Jeff actually questioned in bug 418494 comment 4. Further, Jeff tells me that pixman is already smart enough to optimize to source when it makes sense, like when we have no alpha channel. Finally, Jeff also tells me that using source requires us to draw through a mask, while over can be a lot faster.

So, as long as this doesn't regress us on Try, I'm happy with this patch!

Attachment #516983 - Flags: review?(roc) → review+

Brad Lassey [:blassey] (use needinfo?)

Updated

•

13 years ago

Whiteboard: [needs-cairo-update] → [can-land]

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 34

•

13 years ago

(In reply to comment #32)
> Jeff tells me that the ADD composite operation seems to be wrong, but it might
> be the input to that operation in clip_and_composite_source.

I've confirmed that inputs to the composite operation
at http://mxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-surface-fallback.c#365 are fine and the result is wrong.

This suggests that compositing a scaled RGB24 source with a A8 mask, the ADD operation and a RGB565 destination surface is broken.

Knowing this should make this much easier to fix. Unfortunately, I may not be able to debug this soon (before Monday) So I'd be happy if someone else took a look.

Benjamin Stover (:stechz)

Reporter

Comment 35

•

13 years ago

This is exciting news! I'm no expert, so if anyone else wants to look please do. In the mean time, I think I'll poke around...

Benjamin Stover (:stechz)

Reporter

Comment 36

•

13 years ago

BTW, there is still another corruption bug, even when all paths to cairo-surface-fallback.c:365 are gone. You can see it on the download button. If these are the same bug, perhaps there's an underlying draw path to blame.

Benjamin Stover (:stechz)

Reporter

Comment 37

•

13 years ago

No luck debugging this. We'll need someone else to take a look.

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 38

•

13 years ago

I tried reproducing the problem with a standalone test case. So either, there's something else special going on here or I missed something.

Joe Drew (not getting mail)

Comment 39

•

13 years ago

Note that I haven't seen any evidence of try Talos results, which is what I want before landing this change.

Whiteboard: [can-land] → [can-land][can land only if it passed try]

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 40

•

13 years ago

The problem is:
the destination pixman image has a transform.

Cairo does not expect this.

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 41

•

13 years ago

(In reply to comment #40)
> The problem is:
> the destination pixman image has a transform.
> 
> Cairo does not expect this.

It seems like this happens when we use the destination as a transformed source. The solution to this problem, is not obvious. Upstream has completely rewritten this code.

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 42

•

13 years ago

Attached patch Reset the transform on the destination (obsolete) — Details — Splinter Review

This fixes the problem for me. Hopefully, it fixes all the other problem too.

Assignee: jones.chris.g → jmuizelaar

Attachment #517529 - Flags: review?

Joe Drew (not getting mail)

Comment 43

•

13 years ago

Comment on attachment 517529 [details] [diff] [review]
Reset the transform on the destination

too much patch

Attachment #517529 - Flags: review? → review-

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 44

•

13 years ago

Attached patch Reset the transform on the destination v2 (obsolete) — Details — Splinter Review

This time without the debugging code.

Attachment #517529 - Attachment is obsolete: true

Attachment #517535 - Flags: review?

Joe Drew (not getting mail)

Comment 45

•

13 years ago

Comment on attachment 517535 [details] [diff] [review]
Reset the transform on the destination v2

just enough patch

Attachment #517535 - Flags: review? → review+

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 46

•

13 years ago

It seems like the code at fault here is really pixman. I think that pixman should probably not be using a transformed fetch on the destination image under any circumstances.

For example, in this case we're fetching destination pixels from a different part of the image than we're storing them to. I can't see any reason for wanting this behaviour.

Benjamin Stover (:stechz)

Reporter

Comment 47

•

13 years ago

This fixes everything that updating cairo fixed for me! Let's get both patches in!

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 48

•

13 years ago

(In reply to comment #47)
> This fixes everything that updating cairo fixed for me! Let's get both patches
> in!

I'm not in as much of a rush to take the first patch because of any potential risk. If someone can show a measurable difference, it would make it more attractive.

Stuart Parmenter

Updated

•

13 years ago

No longer depends on: 638594

Stuart Parmenter

Updated

•

13 years ago

Blocks: 616638

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 49

•

13 years ago

Attached patch Reset the transform on the destination v3 — Details — Splinter Review

Here's a final version ready for pushing.

Attachment #517535 - Attachment is obsolete: true

Attachment #517578 - Flags: review+

Jeff Muizelaar [:jrmuizel]

Assignee

Updated

•

13 years ago

Whiteboard: [can-land][can land only if it passed try] → [can-land]

Stuart Parmenter

Updated

•

13 years ago

Attachment #516983 - Flags: approval2.0-

Stuart Parmenter

Updated

•

13 years ago

Keywords: checkin-needed

Whiteboard: [can-land]

Benjamin Stover (:stechz)

Reporter

Comment 52

•

13 years ago

Pushed http://hg.mozilla.org/mozilla-central/rev/6dab3f1dfa0e

Leaving open for other patch.

tracking-fennec: 2.0+ → ---

Keywords: checkin-needed

Siarhei Siamashka

Comment 53

•

13 years ago

(In reply to comment #46)
> It seems like the code at fault here is really pixman. I think that pixman
> should probably not be using a transformed fetch on the destination image under
> any circumstances.

I wonder if this is actually fixed in pixman already: http://cgit.freedesktop.org/pixman/commit/?id=7f4eabbeec92e55fd8f812c0e5d8568eacbb633d

Thomas Arend [:tarend]

Comment 54

•

13 years ago

Attached image Another use case: "ghost pixels" when panning regular text — Details

Mike Beltzner [:beltzner, not reading bugmail]

Comment 55

•

13 years ago

Should we also take this on the relbranch?

blocking2.0: --- → ?

Whiteboard: [asking for .x]

Mike Beltzner [:beltzner, not reading bugmail]

Comment 56

•

13 years ago

Removing - jrmuizel tells me I'm seeing a different bug!

blocking2.0: ? → ---

Whiteboard: [asking for .x]

Matt Brubeck (:mbrubeck)

Comment 57

•

13 years ago

(In reply to comment #54)
> Another use case: "ghost pixels" when panning regular text

I've filed bug 641506 for that issue, which seems to be newer than this bug.

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

Gregory Pappas [:gregp]

Comment 58

•

1 year ago

Unable to reproduce in current versions.

URL: https://web.archive.org/web/201103310...

Status: NEW → RESOLVED

Closed: 1 year ago

Resolution: --- → WORKSFORME

You need to log in before you can comment on or make changes to this bug.