Assertion failure: [GFX1]: RGBX corner pixel at (423,51) in 846x103 surface is not opaque

RESOLVED FIXED in Firefox 52

Status

()

defect
P3
critical
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: mats, Assigned: lsalzman)

Tracking

({assertion})

Trunk
mozilla53
x86_64
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox51 wontfix, firefox52 fixed, firefox53 fixed)

Details

(Whiteboard: [gfx-noted])

Attachments

(3 attachments)

I have never seen this assertion before, but after updating my mozilla-inbound
tree today I've got it three times within minutes.  No clear STR yet though.

Linux x86-64 debug build.

Crash Annotation GraphicsCriticalError: |[0][GFX1]: RGBX corner pixel at (423,51) in 846x103 surface is not opaque: 88,83,79,254 (t=720.011) [GFX1]: RGBX corner pixel at (423,51) in 846x103 surface is not opaque: 88,83,79,254
Assertion failure: [GFX1]: RGBX corner pixel at (423,51) in 846x103 surface is not opaque: 88,83,79,254, at gfx/2d/Logging.h:515
Hi Mat,
Can you describe more details about how to reproduce this problem? Also can you provide your about:support info?
Flags: needinfo?(mats)
I haven't seen this crash in a while, so it might have been a temporary issue
on m-i that got fixed.  I'll reopen the bug if it occurs again...
Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(mats)
Resolution: --- → WORKSFORME
I got this crash again just now in local Linux x86-64 debug build, 
Here's the Graphics data from about:support:

Graphics
Features
Compositing	Basic
Asynchronous Pan/Zoom	none
WebGL Renderer	X.Org -- Gallium 0.4 on AMD CAICOS (DRM 2.43.0, LLVM 3.8.0)
WebGL2 Renderer	X.Org -- Gallium 0.4 on AMD CAICOS (DRM 2.43.0, LLVM 3.8.0)
Hardware H264 Decoding	No
Audio Backend	pulse
GPU #1
Active	Yes
Description	X.Org -- Gallium 0.4 on AMD CAICOS (DRM 2.43.0, LLVM 3.8.0)
Vendor ID	X.Org
Device ID	Gallium 0.4 on AMD CAICOS (DRM 2.43.0, LLVM 3.8.0)
Driver Version	3.0 Mesa 11.2.0
Diagnostics
AzureCanvasAccelerated	0
AzureCanvasBackend	skia
AzureContentBackend	skia
AzureFallbackCanvasBackend	none
CairoUseXRender	0
Decision Log
HW_COMPOSITING	
blocked by default: Acceleration blocked by platform
OPENGL_COMPOSITING	
unavailable by default: Hardware compositing is disabled
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
It's fairly reproducible with these steps:
1. create a fresh profile
2. enable the Menu Bar (right click in the tab bar, click the checkbox in the context menu)
3. in Preferences, set "When Nightly starts" to: Show my windows and tabs from last time
4. in Preferences, disable e10s
5. quit
6. start from the command line like so:
firefox -profile /path-to-the-profile-directory/ layout/reftests/css-grid/grid-auto-min-sizing-definite-001.html layout/reftests/css-grid/grid-auto-min-sizing-definite-001-ref.html layout/reftests/css-grid/grid-fragmentation-dyn1-002.html layout/reftests/css-grid/grid-fragmentation-002-ref.html
7. grab the lower-right corner of the window and resize the window for a while

Maybe repeat 5+6+7 a few times if it doesn't occur directly.
Note that the HTML files listed on the command line is only available from
the top directory of a mozilla source tree.  You can grab those files from
https://dxr.mozilla.org/mozilla-central/source/layout/reftests/css-grid
and save them anywhere, the path shouldn't be important I think.
I'm not sure if all of them are necessary even, but it gives you a few
more tabs each time you restart in step 6.
Note that having a Menu Bar appears to be crucial, I cannot reproduce it
without it.  I'm less sure about e10s, perhaps it's just less reproducible
there and I wasn't persistent enough.  Anyway, non-e10s make it very
reproducible for me (within a few seconds in step 7).

Let me know if you need any data from a debugger.
(In reply to Mats Palmgren (:mats) from comment #4)
> It's fairly reproducible with these steps:
> 1. create a fresh profile
> 2. enable the Menu Bar (right click in the tab bar, click the checkbox in
> the context menu)
> 3. in Preferences, set "When Nightly starts" to: Show my windows and tabs
> from last time
> 4. in Preferences, disable e10s
> 5. quit
> 6. start from the command line like so:
> firefox -profile /path-to-the-profile-directory/
> layout/reftests/css-grid/grid-auto-min-sizing-definite-001.html
> layout/reftests/css-grid/grid-auto-min-sizing-definite-001-ref.html
> layout/reftests/css-grid/grid-fragmentation-dyn1-002.html
> layout/reftests/css-grid/grid-fragmentation-002-ref.html

Do you know how to run this command with my local Nightly build? Thanks.
Flags: needinfo?(mats)
First, you need a debug build to reproduce this crash.

You need to replace "firefox" with a path to the firefox in you local build.
If you downloaded it, I think it's in the top directory.  If you built it yourself
you could just use "./mach run" instead of "firefox" it will find it for you.
"/path-to-the-profile-directory/" needs to be replaced with the path
to the profile you created in step 1.  Other than that, it's a command line
that you issue from a terminal window as written.
Flags: needinfo?(mats)
Priority: -- → P3
Whiteboard: [gfx-noted]
Posted file some stacks
It looks to me that GetSkImageForSurface just takes the raw data as is
and expects all alpha values to be 0xFF ?  Is the assumption about
the data in the given surface here just wrong?
http://searchfox.org/mozilla-central/rev/c477aa8bd99278962998adba1c5e4b15a02c42c7/gfx/2d/DrawTargetSkia.cpp#214

It's easy to reproduce this on Linux and catch it in 'rr' for analysis.
I know next to nothing about our gfx code though, so I don't know what
I'm looking for...  Can someone from the gfx team help out please?
Flags: needinfo?(milan)
Posted file crash-log.log
After some trying, I can reproduce this issue on my Ubuntu. The attached file was my call stack I saw. I will look into this to figure it out.
Assignee: nobody → vliu
Lee probably has an insight into RGBX/RGBA Skia issues...
Flags: needinfo?(milan) → needinfo?(lsalzman)
(In reply to Milan Sreckovic [:milan] from comment #9)
> Lee probably has an insight into RGBX/RGBA Skia issues...

Based on the STR and the aspect of resizing, my guess would be that some part of the window is not being properly cleared and leaving garbage data instead most likely due to supplying a clear rectangle that is based on an outdated size, or perhaps expose events getting lost.

So the first order of business would be to check what is actually in the buffer when the assertion triggers? Did something actually write non-opaque data into the pixel that triggered the assertion? If so, then that would be a bug. Is it just uninitialized garbage data that is due to out of date size information or a lost expose? If so, that would be a bug.

Presumably Vincent can look into these issues and track this down since he has assigned himself to this bug.
Flags: needinfo?(lsalzman)
> Did something actually write non-opaque data into the pixel that triggered the assertion?

Yes, see the second stack in the first attachment.
(In reply to Mats Palmgren (:mats) from comment #11)
> > Did something actually write non-opaque data into the pixel that triggered the assertion?
> 
> Yes, see the second stack in the first attachment.

I need to know some values from this line when it triggers: https://dxr.mozilla.org/mozilla-central/source/gfx/skia/skia/src/core/SkBlitMask_D32.cpp?q=SkBlitMask_D32.cpp%3A167&redirect_type=direct#167

What are the values there of mask[i], m, src[i], and dst[i] (before dst[i] is modified... probably have to insert some code to store dst[i] into an SkPMColor tmp to get that)?
Flags: needinfo?(mats)
0x00007f1b8cc9f104 in A8_RowProc_Opaque (dst=0x7f1b5394d98c, maskIn=0x7f1b66740780, src=0x7f1b5730b000, count=159) at gfx/skia/skia/src/core/SkBlitMask_D32.cpp:167
167                 dst[i] = SkAlphaMulQ(src[i], m) + SkAlphaMulQ(dst[i], 256 - m);
(rr) info locals
m = 198
i = 148
mask = 0x7f1b66740780 '\377' <repeats 146 times>, "\355\331Ų\236\212vcO;(\024"
(rr) p mask[i]
$2 = 197 '\305'
(rr) p src[i]
$3 = 4282533707
(rr) p dst[i]
$4 = 4294046451
(rr) n

Thread 1 hit Hardware watchpoint 1: -location aData[offset + kARGBAlphaOffset]

Old value = 255 '\377'
New value = 254 '\376'
A8_RowProc_Opaque (dst=0x7f1b5394d98c, maskIn=0x7f1b66740780, src=0x7f1b5730b000, count=159) at gfx/skia/skia/src/core/SkBlitMask_D32.cpp:178
178         }
(rr) p dst[i]
$5 = 4268321905
Flags: needinfo?(mats)
(In reply to Mats Palmgren (:mats) from comment #13)
> 0x00007f1b8cc9f104 in A8_RowProc_Opaque (dst=0x7f1b5394d98c,
> maskIn=0x7f1b66740780, src=0x7f1b5730b000, count=159) at
> gfx/skia/skia/src/core/SkBlitMask_D32.cpp:167
> 167                 dst[i] = SkAlphaMulQ(src[i], m) + SkAlphaMulQ(dst[i],
> 256 - m);
> (rr) info locals
> m = 198
> i = 148
> mask = 0x7f1b66740780 '\377' <repeats 146 times>,
> "\355\331Ų\236\212vcO;(\024"
> (rr) p mask[i]
> $2 = 197 '\305'
> (rr) p src[i]
> $3 = 4282533707
> (rr) p dst[i]
> $4 = 4294046451
> (rr) n
> 
> Thread 1 hit Hardware watchpoint 1: -location aData[offset +
> kARGBAlphaOffset]
> 
> Old value = 255 '\377'
> New value = 254 '\376'
> A8_RowProc_Opaque (dst=0x7f1b5394d98c, maskIn=0x7f1b66740780,
> src=0x7f1b5730b000, count=159) at
> gfx/skia/skia/src/core/SkBlitMask_D32.cpp:178
> 178         }
> (rr) p dst[i]
> $5 = 4268321905

Looks like an instance of the old blending precision loss issue rising from the grave to haunt us... If you change the nearby #if 1 to #if 0, does the problem go away?
 4:48.01 gfx/skia/skia/src/core/SkBlitMask_D32.cpp:170:38: error: use of undeclared identifier 'rbmask'; did you mean 'mask'?
 4:48.01             uint32_t s0 = EXPAND0(v, rbmask, m);
 4:48.01                                      ^~~~~~
 4:48.01                                      mask
(In reply to Mats Palmgren (:mats) from comment #15)
>  4:48.01 gfx/skia/skia/src/core/SkBlitMask_D32.cpp:170:38: error: use of
> undeclared identifier 'rbmask'; did you mean 'mask'?
>  4:48.01             uint32_t s0 = EXPAND0(v, rbmask, m);
>  4:48.01                                      ^~~~~~
>  4:48.01                                      mask

throw in this before that line:
const uint32_t rbmask = 0x00FF00FF;
The same assertion still fails with the same stack before.
Tracing the value back to where it was last changed:

0x00007fa82596cbec in A8_RowProc_Opaque (dst=0x7fa7ef459eb4, maskIn=0x7fa7f1b2a690, src=0x7fa7f1b32000, count=140) at gfx/skia/skia/src/core/SkBlitMask_D32.cpp:176
176                 dst[i] = COMBINE(s0 + d0, s1 + d1, rbmask);
(rr) info locals
rbmask = 16711935
s1 = 4278205952
d0 = 4009816320
v = 4293914865
s0 = 956318464
d1 = 4278251520
m = 256
i = 44
mask = 0x7fa7f1b2a690 '\377' <repeats 127 times>, "\365\342κ\247\223\177lXE1\035\n", '\377' <repeats 60 times>...
(rr) p dst[i]
$2 = 4293914865
(rr) n

Thread 1 hit Hardware watchpoint 1: -location aData[offset + kARGBAlphaOffset]

Old value = 255 '\377'
New value = 254 '\376'
A8_RowProc_Opaque (dst=0x7fa7ef459eb4, maskIn=0x7fa7f1b2a690, src=0x7fa7f1b32000, count=140) at gfx/skia/skia/src/core/SkBlitMask_D32.cpp:179
179         }
(rr) p dst[i]
$3 = 4264046132
(In reply to Mats Palmgren (:mats) from comment #17)
> The same assertion still fails with the same stack before.
> Tracing the value back to where it was last changed:
> 
> 0x00007fa82596cbec in A8_RowProc_Opaque (dst=0x7fa7ef459eb4,
> maskIn=0x7fa7f1b2a690, src=0x7fa7f1b32000, count=140) at
> gfx/skia/skia/src/core/SkBlitMask_D32.cpp:176
> 176                 dst[i] = COMBINE(s0 + d0, s1 + d1, rbmask);
> (rr) info locals
> rbmask = 16711935
> s1 = 4278205952
> d0 = 4009816320
> v = 4293914865
> s0 = 956318464
> d1 = 4278251520
> m = 256
> i = 44
> mask = 0x7fa7f1b2a690 '\377' <repeats 127 times>,
> "\365\342κ\247\223\177lXE1\035\n", '\377' <repeats 60 times>...
> (rr) p dst[i]
> $2 = 4293914865
> (rr) n
> 
> Thread 1 hit Hardware watchpoint 1: -location aData[offset +
> kARGBAlphaOffset]
> 
> Old value = 255 '\377'
> New value = 254 '\376'
> A8_RowProc_Opaque (dst=0x7fa7ef459eb4, maskIn=0x7fa7f1b2a690,
> src=0x7fa7f1b32000, count=140) at
> gfx/skia/skia/src/core/SkBlitMask_D32.cpp:179
> 179         }
> (rr) p dst[i]
> $3 = 4264046132

Okay, so their fallback code is buggy even... Replace that entire big pile of nonsense with the following SkPMLerp version so it looks like this:

        if (m) {
            m += (m >> 7);
            dst[i] = SkPMLerp(src[i], dst[i], m);
        }

This one *should* work.
Yes, that appears to fix it.
Upstream Skia bug report: https://skia-review.googlesource.com/c/7140/
This is just a continuation of fixes made in bug 1200684.

The old blending math Skia previously used could lose precision, such that even if you blended with src=255 and dst=255, the result might be less than 255. This patch just fixes this one other case to use the new blend math implemented in bug 1200684. This preserves the precision so that the result is 255 like expected.
Assignee: vliu → lsalzman
Status: REOPENED → ASSIGNED
Attachment #8827615 - Flags: review?(vliu)
Depends on: 1200684
Comment on attachment 8827615 [details] [diff] [review]
fix A8_RowProc_Opaque to not use legacy broken lerp

I also verified that this patch also fix the problem on my side. The patch looks fine to me and thanks for looking into this.
Attachment #8827615 - Flags: review?(vliu) → review+
Pushed by lsalzman@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/34ffa05e2af0
fix A8_RowProc_Opaque to not use legacy broken lerp. r=vliu
https://hg.mozilla.org/mozilla-central/rev/34ffa05e2af0
Status: ASSIGNED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Comment on attachment 8827615 [details] [diff] [review]
fix A8_RowProc_Opaque to not use legacy broken lerp

Approval Request Comment
[Feature/Bug causing the regression]: Since Skia content enabling, so affects 51+.
[User impact if declined]: Possible crashes when rendering SVG masks in some cases.
[Is this code covered by automated tests?]: yes
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]:  no
[List of other uplifts needed for the feature/fix]: aurora (52)
[Is the change risky?]: no
[Why is the change risky/not risky?]: This is just a further extension of a fix made in bug 1200684 which has been quite well tested already.
[String changes made/needed]: None
Attachment #8827615 - Flags: approval-mozilla-aurora?
Comment on attachment 8827615 [details] [diff] [review]
fix A8_RowProc_Opaque to not use legacy broken lerp

skia rendering fix, aurora52+
Attachment #8827615 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.