Corrupted layer drawn

RESOLVED FIXED in Firefox 16

Status

()

Core
Layout
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: BenWa, Assigned: roc)

Tracking

({regression})

unspecified
mozilla18
x86
All
regression
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(firefox15 unaffected, firefox16+ fixed, firefox17+ fixed, firefox18+ verified)

Details

Attachments

(4 attachments, 1 obsolete attachment)

(Reporter)

Description

5 years ago
Created attachment 657862 [details]
Screenshot

I got this corruption while working on cleopatra against nightly. I setup a custom deployment for this bug:
http://people.mozilla.com/~bgirard/cleopatra_bug/?report=e9e2e60361ced1e3c0368f655358158b2a04d88a

Change that caused this bug:
js/diagnosticBar.js
-    if (width < 1) return 0;
+    //if (width < 1) return 0;

This removes the minimal size for a diagnostic (red icons) which increases the complexity of the page and adds many small element.
(Reporter)

Comment 1

5 years ago
Problem does not reproduce with Mac OMTC.
(Reporter)

Comment 2

5 years ago
This regression is currently in Beta but hasn't shipped yet. We NEED to fix this before releasing.
status-firefox15: --- → unaffected
status-firefox16: --- → affected
status-firefox17: --- → affected
status-firefox18: --- → affected
tracking-firefox16: --- → ?
Keywords: regression, regressionwindow-wanted
(Reporter)

Comment 3

5 years ago
STR:
You must be running with OpenGL layers (check about:support for GPU accelerated windows > 0)
tracking-firefox16: ? → +
tracking-firefox17: --- → +
tracking-firefox18: --- → +
(Reporter)

Comment 4

5 years ago
If you get a chance Alice a regression window here would be very helpful.
(Reporter)

Comment 5

5 years ago
Regression window:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=50e4ff05741e&tochange=a79132ac2f05
Keywords: regressionwindow-wanted

Comment 6

5 years ago
Benoit - are you planning to investigate further since this is a gfx issue? Do you feel this will be a common issue post-release?
Assignee: nobody → bgirard
(Reporter)

Comment 7

5 years ago
I'm going to try and schedule time for this tomorrow.

My gut feeling is that this issue is very situational with a uncommon trigger but it does reproduce 100% of the time. Given that the severity of the bug is very high. So I don't estimate that it would be very common in general but very common in sites that trigger it.
(Reporter)

Comment 8

5 years ago
We don't have TBPL builds so I'm going to guess patch at random :(. First attempt is:
http://hg.mozilla.org/mozilla-central/rev/461c9816a3be

Because it deals with non pixel align gradients which this test case has.
(Reporter)

Comment 9

5 years ago
Yay, I got lucky, the problem is in fact Bug 779399.

Roc should I back out 461c9816a3be from central, aurora, beta or do you want to fix it?

(In reply to Alex Keybl [:akeybl] from comment #6)
> Benoit - are you planning to investigate further since this is a gfx issue?
> Do you feel this will be a common issue post-release?

With the regression identified I can give a better answer. This will reproduce 100% of the time if a page uses a non pixel aligned gradient.
Blocks: 779399
(In reply to Benoit Girard (:BenWa) from comment #9)
> Yay, I got lucky, the problem is in fact Bug 779399.
> 
> Roc should I back out 461c9816a3be from central, aurora, beta or do you want
> to fix it?
> 
> (In reply to Alex Keybl [:akeybl] from comment #6)
> > Benoit - are you planning to investigate further since this is a gfx issue?
> > Do you feel this will be a common issue post-release?
> 
> With the regression identified I can give a better answer. This will
> reproduce 100% of the time if a page uses a non pixel aligned gradient.

Given the user impact here, if bug 779399 feels like an isolated issue and a forward fix carries any risk, we should definitely back out for Beta.
(Reporter)

Comment 11

5 years ago
Backout on beta:
https://hg.mozilla.org/releases/mozilla-beta/rev/fc24961171a3
I don't understand how https://hg.mozilla.org/releases/mozilla-aurora/rev/5b1b65bf7f4d, which only makes some tweaks to gradient rendering and only touches public Thebes APIs, could cause texture corruption like that seen in the screenshot in comment #0.

Maybe it tickles an existing bug?

Updated

5 years ago
status-firefox16: affected → fixed

Updated

5 years ago
Depends on: 792903

Comment 13

5 years ago
(In reply to Benoit Girard (:BenWa) from comment #11)
> Backout on beta:
> https://hg.mozilla.org/releases/mozilla-beta/rev/fc24961171a3

This causes 16Beta4 crash! See Bug792903 comment#5
So did the backout actually fix the bug originally reported here?

This bug, with the oblique lines in the screenshot, definitely looks like a wrong stride or width value in a texture upload.
STR:

Use a Mac Nightly from September 3
Use current Gecko Profiler add-on (September 24)
Load the profile link given in comment 0
Unroll the tree until it starts scrolling down: corruption then starts appearing, initially concentrated in the scrollbar area on the right (I'm on OSX 10.8 if that matters) and in the top ~ 20 pixels of the treeview. Also a diagonal gray line (slope -1) bars the whole treeview across. Uploading screenshot.
Whiteboard: STR in comment 15
Created attachment 664201 [details]
another screenshot showing a diagonal line on top of non-corrupted layer

Also, taking this screenshot instantly resulted in the treeview area becoming completely garbled.
Sorry, didn't realize that this had only been backed out on beta -- so no need to use an old nightly to reproduce.
(In reply to Benoit Girard (:BenWa) from comment #9)
> With the regression identified I can give a better answer. This will
> reproduce 100% of the time if a page uses a non pixel aligned gradient.

One thing that's weird is that the backed-out patch (https://hg.mozilla.org/releases/mozilla-aurora/rev/5b1b65bf7f4d) makes the edges of the gradient *more* pixel-aligned, not less (at the cost of introducing a small scale at times).

I think we should reland the backed-out patch. This bug is the only regression linked to that patch (which was in the tree for a long time, and is still on Nightly and Aurora). And, simple Thebes calls like the patch uses should not be able to cause the kind of corruption seen in this bug, so the patch can't be the root cause.
I looked at the layer tree in the failing case. There's a big ThebesLayer containing almost the entire window's content, and some of the panes aren't corrupt, so this doesn't look like a layers bug. Now I'm thinking a cairo/quartz bug.
I think I've figured this out and it's my bug. Stand by...
Assignee: bgirard → roc
Whiteboard: STR in comment 15
The bug is that if the tile rect gets snapped to an empty rect, the patch in bug 779399 will scale us by 0 in the horizontal or vertical direction, which puts the cairo context into an error state, which means nothing else gets drawn. (IMHO it would be more logical for drawing operations to be no-ops while the CTM is singular, and return to normal after resetting the CTM, but whatever.) What looks like stride errors is simply the contents of the destination buffer with nothing drawn into it.
Created attachment 664442 [details] [diff] [review]
We crash with dump-paint-list enabled when there are nested calls to nsLayoutUtils::PaintFrame
Attachment #664442 - Flags: review?(matt.woodrow)
Created attachment 664443 [details] [diff] [review]
Avoid scaling by 0 when snapping gradient tiles

This is the actual fix, for trunk (on top of the patch 779399, although it sort of replaces that patch).
Attachment #664443 - Flags: review?(jmuizelaar)
https://tbpl.mozilla.org/?tree=Try&rev=9909520cb871
Attachment #664442 - Flags: review?(matt.woodrow) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/bc0b95817e55
Whiteboard: [leave open]
Attachment #664442 - Flags: checkin+
https://hg.mozilla.org/mozilla-central/rev/bc0b95817e55
Flags: in-testsuite?
Attachment #664443 - Flags: review?(jmuizelaar) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/ffcdd896a1fa
Component: Graphics: Layers → Layout
Flags: in-testsuite? → in-testsuite+
OS: Mac OS X → All
Whiteboard: [leave open]
Once this has merged to central we should try to get this on Aurora before the uplift.
Backed out for reftest failures:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=8a788d3eccd1

eg https://tbpl.mozilla.org/php/getParsedLog.php?id=15771883&tree=Mozilla-Inbound

https://hg.mozilla.org/integration/mozilla-inbound/rev/9b992222370b
The reftest failed due to numerical inaccuracy. We paint single-pixel-wide slices of a gradient and sometimes the slice width is slightly different from one pixel, and that causes us to paint the slice with a small amount of transparency, which creates the lighter-colored lines seen in the Windows reftest analyzer.

The Mac failures are less regular but probably related to the same issue.
I also realized that this test would normally pass even without the fix here, since the gradient line covers the gradient tile so we'd paint the whole gradient in one pass instead of slice by slice. That doesn't actually happen in this case due to numerical inaccuracy making us think the gradient line is slightly non-perpendicular to the tile edge, but we should fix that (in a separate bug).

So we need to do the test differently.
Created attachment 667782 [details] [diff] [review]
patch with updated test

This version of the test doesn't check that the gradient renders --- the gradient is just a transition from white to transparent white, over a white background, so it shouldn't be visible on any platform. Instead it checks that we're able to draw the text on top of the gradient, which means the graphics context should not have been put into an error state by drawing the gradient.
Attachment #664443 - Attachment is obsolete: true
https://tbpl.mozilla.org/?tree=Try&rev=a697a88ee4b3
Reftests are green. I'll reland this.
https://hg.mozilla.org/integration/mozilla-inbound/rev/a5c50066ecbb
Backed out for:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=a5c50066ecbb

https://hg.mozilla.org/integration/mozilla-inbound/rev/8af00f1609b9
I'm pretty sure the reftest failure was caused by bug 794709, not this bug, since the try push is green and the failing test is a reftest-print reftest and doesn't use gradients. So I'm repushing this.
https://hg.mozilla.org/integration/mozilla-inbound/rev/7ea61f044f72
Blocks: 795837, 795859
Comment on attachment 667782 [details] [diff] [review]
patch with updated test

Review of attachment 667782 [details] [diff] [review]:
-----------------------------------------------------------------

This is a pretty bad regression from bug 779399 that is showing up on real Web sites. We fixed it on beta by backing out bug 779399 (which left other visible bugs behind). This is the real fix, so we should take it for Aurora.
Attachment #667782 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/7ea61f044f72
Status: NEW → RESOLVED
Last Resolved: 5 years ago
status-firefox18: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla18

Updated

5 years ago
Attachment #667782 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
 Approved for aurora . Please land before monday oct 8th merge.
https://hg.mozilla.org/releases/mozilla-aurora/rev/10b51625cc3d
status-firefox17: affected → fixed

Updated

5 years ago
Duplicate of this bug: 795837
Duplicate of this bug: 795859
Reproduced the STR in comment 15 with HWA off on Nightly 2012-09-04.
Verified fixed on FF 18b4 Win 7x64, Mac OS X 10.7.5.
status-firefox18: fixed → verified
You need to log in before you can comment on or make changes to this bug.