Hit MOZ_CRASH(Unexpected error with MOZ_GL_DEBUG_ABORT_ON_ERROR. (Run with MOZ_GL_DEBUG_ABORT_ON_ERROR=0 to disable)) at GLContext.h:759

RESOLVED FIXED in Firefox 51

Status


defect
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: cbook, Assigned: cleu)

Tracking

(Blocks 1 bug, {assertion})

unspecified
mozilla53

Firefox Tracking Flags

(firefox49+ wontfix, firefox50+ wontfix, firefox51+ fixed, firefox52 fixed, firefox53 fixed)

Details

(Whiteboard: [gfx-noted], )

Attachments

(4 attachments, 1 obsolete attachment)

Reporter

Description

3 years ago
Posted file complete stack
Found via bughunter; this seems to affect beta -> nightly.

Reproduced with latest m-c debug tinderbox build on windows 

Steps to reproduce:
-> Load http://www.anj.fyi/behold/
-> Hit MOZ_CRASH(Unexpected error with MOZ_GL_DEBUG_ABORT_ON_ERROR. (Run with MOZ_GL_DEBUG_ABORT_ON_ERROR=0 to disable)) at c:\builds\moz2_slave\m-cen-w32-d-000000000000000000\build\src\gfx\gl\GLContext.h:759
Reporter

Comment 1

3 years ago
[Tracking Requested - why for this release]:
bughunter
Whiteboard: [gfx-noted]
about:support please.
Flags: needinfo?(cbook)
Reporter

Comment 4

3 years ago
(In reply to Jerry Shih[:jerry] (UTC+8) from comment #3)
> I can't reproduce with the build:
> http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-
> win64-debug/1470060589/
> at win10.

It seems you might need to reload the page several times via shift+reload; this works for me to trigger the crash on win 7.
Flags: needinfo?(cbook)
Reporter

Comment 5

3 years ago
about:support 
(it's a win7 vm on a fusion 8 mac 10.11 mbp).

Graphics
--------

Features
Compositing: Basic
Asynchronous Pan/Zoom: wheel input enabled; touch input enabled
WebGL Renderer: Google Inc. -- ANGLE (Software Adapter Direct3D11 vs_5_0 ps_5_0)
WebGL2 Renderer: WebGL creation failed: * Refused to create native OpenGL context because of blacklist entry: FEATURE_FAILURE_UNKNOWN_DEVICE_VENDOR * Exhausted GL driver options.
Hardware H264 Decoding: No; Hardware video decoding disabled or blacklisted
Audio Backend: wasapi
Direct2D: Blocked for your graphics card because of unresolved driver issues.
DirectWrite: false (6.2.9200.17568)
GPU #1
Active: Yes
Description: VMware SVGA 3D
Vendor ID: 0x15ad
Device ID: 0x0405
Driver Version: 8.15.1.33
Driver Date: 10-16-2015
Drivers: vm3dum vm3dum_10
Subsys ID: 040515ad
RAM: 384

Diagnostics
AzureCanvasAccelerated: 0
AzureCanvasBackend: skia
AzureContentBackend: cairo
AzureFallbackCanvasBackend: cairo
Decision Log
D3D11_COMPOSITING:
Blocklisted; failure code BLOCKLIST_FEATURE_FAILURE_UNKNOWN_DEVICE_VENDOR
D3D9_COMPOSITING:
Blocklisted; failure code BLOCKLIST_FEATURE_FAILURE_UNKNOWN_DEVICE_VENDOR
DIRECT2D:
unavailable by default: Direct2D requires Direct3D 11 compositing
D3D11_HW_ANGLE:
unavailable by default: D3D11 compositing is disabled
disabled by env: D3D11 compositing is disabled

---------------------
Jerry or Jeff, any luck reproducing this?
Flags: needinfo?(jgilbert)
Flags: needinfo?(hshih)
I have no bandwidth for this.
Flags: needinfo?(jgilbert)
David or Milan, can you help? Not sure how serious this is.  Hitting MOZ_CRASH is usually not good. If I don't need to worry about it for 49, let's defer it to 51.
Flags: needinfo?(milan)
Flags: needinfo?(dbolter)
I still can't reproduce this on win7.
I will try a win7 vm on fusion 8 later.
Flags: needinfo?(hshih)

Comment 10

3 years ago
Posted file aurora debug stack
I ran this url in bughunter on windows 7 again and found only one aurora debug crash @ gl::GLContext::AfterGLCall(char const *) [GLContext.h:264adddee81b : 758 + 0x22]

It may be intermittent. If you need, I can try to reproduce more aggressively.
First comment - we weren't allowing this configuration (software ANGLE for WebGL, basic compositor) until bug 1271770, which landed on nightly in 50, then got uplifted to 49 on July 27th.  I would imagine that to be the cause - exposing an existing bug because of the additional number of people that can now run WebGL, rather than a new bug.

Based on the debug stack - perhaps VMWare doesn't do mipmaps properly.
Flags: needinfo?(milan) → needinfo?(jgilbert)
Flags: needinfo?(dbolter)
Milan, that sounds like a potentially good thing (more people able to run WebGL). Deferring this to 50 at this point as we are heading into 49 beta 8 now.
Flags: needinfo?(jgilbert) → needinfo?(howareyou322)
Michael, please help to reproduce this crash in a win 7 VM. I think it might take a different code path after bug 1271770 or bug 1297965.
Flags: needinfo?(howareyou322) → needinfo?(cleu)
Assignee

Comment 14

3 years ago
OK, I will prepare a Windows 7 VM running under VMWare Fusion
Assignee

Comment 15

3 years ago
Hi Tomcat, 

I have set up a Windows 7 VM with the same VMWare SVGA driver version as in the about:support you posted.

But I cannot reproduce the crash. Can you pull the latest mozilla-central code and test it again?

In addition, what is the build number of your VMWare fusion?

I think maybe this issue is related to the GFX emulation inside VMWare Fusion.
Flags: needinfo?(cleu) → needinfo?(cbook)
Reporter

Comment 16

3 years ago
Posted file latest m-c stack
crashed after a long time 

mozilla-central tinderbox build (debug, windows 7) based on https://hg.mozilla.org/mozilla-central/rev/9baec74b3db1bf005c66ae2f50bafbdb02c3be38
Flags: needinfo?(cbook)
Reporter

Comment 17

3 years ago
(In reply to Michael Leu[:lenzak800](UTC+8)[PTO 10/6 ~ 10/13] from comment #15)
> Hi Tomcat, 
> 
> I have set up a windows 7 VM with same VMWare SVGA driver version as the
> about:support you posted.
> 
> But I cannot reproduce the crash, can you pull the latest mozilla-central
> code and test it again?
> 
> In addition, what is the build number of your VMWare fusion?
> 
> I think maybe this issue is related to the GFX emulation inside VMWare
> Fusion.

Hi Michael, it's Version 8.5.0 (4352717) Fusion Mac. I also attached an m-c stack from today.
Assignee

Comment 18

3 years ago
OK, I will open it and wait for several minutes.
BTW, here is my about:support.

Graphics
Features
Compositing	Basic
Asynchronous Pan/Zoom	wheel input enabled; touch input enabled
WebGL Renderer	Google Inc. -- ANGLE (Software Adapter Direct3D11 vs_4_1 ps_4_1)
WebGL2 Renderer	Google Inc. -- ANGLE (Software Adapter Direct3D11 vs_4_1 ps_4_1)
Hardware H264 Decoding	No; Hardware video decoding disabled or blacklisted
Audio Backend	unknown
Direct2D	Blocked for your graphics card because of unresolved driver issues.
DirectWrite	false (6.1.7601.17514)
GPU #1
Active	Yes
Description	VMware SVGA 3D
Vendor ID	0x15ad
Device ID	0x0405
Driver Version	8.15.1.33
Driver Date	10-16-2015
Drivers	vm3dum vm3dum_10
Subsys ID	040515ad
RAM	8
Diagnostics
AzureCanvasAccelerated	0
AzureCanvasBackend	skia
AzureContentBackend	cairo
AzureFallbackCanvasBackend	cairo
failures	[GFX1-]: Refresh driver waiting for the compositor for1.04809 seconds.
Decision Log
D3D11_COMPOSITING	
Blocklisted; failure code BLOCKLIST_FEATURE_FAILURE_UNKNOWN_DEVICE_VENDOR
D3D9_COMPOSITING	
Blocklisted; failure code BLOCKLIST_FEATURE_FAILURE_UNKNOWN_DEVICE_VENDOR
DIRECT2D	
unavailable by default: Direct2D requires Direct3D 11 compositing
D3D11_HW_ANGLE	
unavailable by default: D3D11 compositing is disabled
disabled by env: D3D11 compositing is disabled
Assignee

Comment 19

3 years ago
OK, I can reproduce it after moving the VM to the same VMWare Fusion build.

It seems that only 32-bit Windows 7 has this issue.
Track 51+ as it can be reproduced.
Assignee: nobody → cleu
Assignee

Comment 21

3 years ago
I found that this crash always happens after a buffer allocation failure.
This website has a texture with dimensions 8192*8192, which requires 256MB of memory.

This allocation, which uses calloc, sometimes fails and returns nullptr here:
https://dxr.mozilla.org/mozilla-central/source/dom/canvas/TexUnpackBlob.cpp?q=Unable+to+allocate+buffer+during+conversion&redirect_type=single#253

Both the VM's system memory and its GFX memory are sufficient for the allocation, so I am still trying to figure out why this allocation failure happens and why it is intermittent.
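
The 256MB figure can be checked with a short calculation. This is only a sketch: it assumes a tightly packed image at 4 bytes per pixel (for example RGBA8), which the bug does not state explicitly, and `ImageBytes` and `ToMiB` are hypothetical helper names, not Gecko functions.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for a tightly packed 2D image.
// ASSUMPTION: the caller passes 4 bytes per pixel (e.g. RGBA8); the bug
// does not say which format the page actually uploads.
std::uint64_t ImageBytes(std::uint64_t width, std::uint64_t height,
                         std::uint64_t bytesPerPixel)
{
    assert(bytesPerPixel > 0);
    return width * height * bytesPerPixel;
}

// Converts a byte count to whole mebibytes (1 MiB = 1024 * 1024 bytes).
std::uint64_t ToMiB(std::uint64_t bytes)
{
    return bytes / (1024 * 1024);
}
```

Under that assumption, `ImageBytes(8192, 8192, 4)` is 268,435,456 bytes, and `ToMiB` of that value is 256.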
Assignee

Comment 22

3 years ago
OK, I think I found the reason.

calloc returns nullptr when no contiguous block of the required size is available.

Since 256MB is a very large allocation, it is failure-prone.

It also explains why this only happens on 32-bit Windows: 64-bit has a much larger address space.

So it may be a problem with our error handling; we should have bailed out of the texture operations before hitting MOZ_CRASH.
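
A minimal sketch of the "bail out on allocation failure" idea, assuming a caller that checks a boolean result. None of these names come from Gecko: `AllocConversionBuffer`, `SystemAlloc`, and `AlwaysFailAlloc` are hypothetical, and the allocator is injected only so that the failure path can be exercised without actually exhausting the address space.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <utility>

// Frees calloc'd memory when the owning pointer goes out of scope.
struct FreeDeleter {
    void operator()(void* p) const { std::free(p); }
};
using Buffer = std::unique_ptr<std::uint8_t[], FreeDeleter>;

// Allocator signature matching calloc(count, size).
using AllocFn = void* (*)(std::size_t count, std::size_t size);

// Normal path: zero-initialized allocation that may return nullptr.
void* SystemAlloc(std::size_t count, std::size_t size)
{
    return std::calloc(count, size);
}

// Hypothetical stand-in for a fragmented 32-bit address space:
// no contiguous block is available, so the request fails.
void* AlwaysFailAlloc(std::size_t, std::size_t)
{
    return nullptr;
}

// Returns false, leaving *out untouched, when the allocation fails,
// so the caller can stop before doing any further texture work.
bool AllocConversionBuffer(std::size_t count, std::size_t size,
                           AllocFn alloc, Buffer* out)
{
    assert(alloc && out);
    Buffer buf(static_cast<std::uint8_t*>(alloc(count, size)));
    if (!buf)
        return false;
    *out = std::move(buf);
    return true;
}
```

The point of the boolean return is that the failure is visible to the caller, which can then skip every later step that assumes the buffer exists.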
(In reply to Michael Leu[:lenzak800](UTC+8)[PTO 10/6 ~ 10/13] from comment #22)
> OK, I think I found the reason.
> 
> calloc returns nullptr when no consecutive required memory space available.
> 
> Since 256MB is a very big value, it will be fail-prone.
> 
> It also explains why only 32bit windows happens because 64bit has a much
> larger address space.
> 
> So it may be a problem of our error handling, we should have bailed out from
> the texture operations before it hit MOZ_CRASH.

Good catch, Michael.
Assignee

Comment 24

3 years ago
So here is how the crash happens.

1. The website calls TexImage2D, requesting a large buffer.

2. In certain environments, we use CPU-side conversion in ConvertIfNeeded:
https://dxr.mozilla.org/mozilla-central/source/dom/canvas/TexUnpackBlob.cpp?q=ConvertIfNeeded&redirect_type=direct#171

3. We cannot allocate such a large buffer on the CPU side, so calloc returns null and an OOM exception is thrown:
https://dxr.mozilla.org/mozilla-central/source/dom/canvas/TexUnpackBlob.cpp?q=ConvertIfNeeded&redirect_type=direct#253

4. The website still calls GenerateMipmap with this faulty texture, so we get an INVALID_OPERATION exception, and then we crash because it is a debug build with MOZ_GL_DEBUG_ABORT_ON_ERROR set.

This whole flow seems normal: we have told the website we're out of memory but it still uses the texture, so we raise more errors, and it crashes because of its debug configuration.
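
Step 4 depends on a debug-only check that turns an unexpected GL error into an abort. The sketch below models that decision only. It is an assumption-laden illustration, not the GLContext.h code: `GLError`, `AbortOnErrorEnabled`, and `WouldAbortAfterCall` are hypothetical names, and the rule "enabled when the variable is set to anything other than 0" is inferred from the crash message rather than confirmed by the source.

```cpp
#include <cassert>
#include <cstring>

// Simplified error codes for illustration.
enum class GLError { NoError, InvalidOperation, OutOfMemory };

// ASSUMPTION: the check is active when MOZ_GL_DEBUG_ABORT_ON_ERROR is set
// to any value other than "0". The real parsing is not shown in this bug.
bool AbortOnErrorEnabled(const char* envValue)
{
    return envValue && std::strcmp(envValue, "0") != 0;
}

// Returns true when the debug check would abort after a GL call.
bool WouldAbortAfterCall(GLError err, const char* envValue)
{
    return err != GLError::NoError && AbortOnErrorEnabled(envValue);
}
```

In this model the INVALID_OPERATION from GenerateMipmap aborts only when the check is enabled, which matches the observation that the crash is tied to the debug configuration.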

Jeff, do you have any suggestion about this one?

Maybe it is just a normal reaction instead of a bug.
Flags: needinfo?(jgilbert)
Great run-down.

#4 should not happen. We should be validating that it's valid to send glGenerateMipmap to the driver before we call it. The fix is to repair our GenerateMipmap validation.
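
A rough sketch of what "validate before calling the driver" means here. The types are mock stand-ins, not the WebGLTexture implementation: `MockTexture`, `MockDriver`, and the `baseLevelDefined` flag are hypothetical and stand for whatever state the real validation inspects.

```cpp
#include <cassert>

// Error reported back to the page, simplified for illustration.
enum class WebGLError { None, InvalidOperation };

// Hypothetical per-texture state: whether the base level holds valid data.
struct MockTexture {
    bool baseLevelDefined = false;
};

// Hypothetical driver stand-in that only counts the calls it receives.
struct MockDriver {
    int generateMipmapCalls = 0;
    void GenerateMipmap() { ++generateMipmapCalls; }
};

// Rejects the request before the driver sees it, so the driver never
// gets a chance to raise an unexpected error of its own.
WebGLError GenerateMipmap(const MockTexture& tex, MockDriver* driver)
{
    assert(driver);
    if (!tex.baseLevelDefined)
        return WebGLError::InvalidOperation;
    driver->GenerateMipmap();
    return WebGLError::None;
}
```

With this shape, a texture whose upload failed still produces INVALID_OPERATION for the page, but the error comes from validation and the driver call count stays at zero.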
Flags: needinfo?(jgilbert)
Assignee

Comment 27

3 years ago
Comment on attachment 8804598 [details] [diff] [review]
Prevent BaseImageInfo being initialized when TexOrSubImage fails.

Review of attachment 8804598 [details] [diff] [review]:
-----------------------------------------------------------------

I found that we do have some checks when we call GenerateMipmap.
The problem is that if blob->TexOrSubImage fails and returns false before it reaches the real GL part (DoTexSubImage),
it isn't treated as a failure, because glError is 0 since we never reached the real GL call.
https://dxr.mozilla.org/mozilla-central/source/dom/canvas/WebGLTextureUpload.cpp#1423

So we continue executing and initialize the ImageInfo inside WebGLTexture, which makes later calls think it is a valid texture.

This patch adds checks for blob->TexOrSubImage, bailing out if it returns false, which prevents the ImageInfo from being falsely initialized.

Further texture calls will then bail out through their own checks.
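
The control flow described in this comment can be sketched as follows. This is a simplified model, not the attached patch: the real `TexOrSubImage` takes many more parameters, and `MockBlob`, `UploadLevel`, and the reduced `ImageInfo` are hypothetical.

```cpp
#include <cassert>

// Reduced stand-in for the per-level image state kept by the texture.
struct ImageInfo {
    bool defined = false;
    int width = 0;
    int height = 0;
};

// Hypothetical blob: failsBeforeGL simulates a CPU-side conversion that
// fails (for example on OOM) before any GL call is issued.
struct MockBlob {
    bool failsBeforeGL = false;

    bool TexOrSubImage(unsigned* outGLError) const
    {
        *outGLError = 0;  // no GL call was made, so no GL error is recorded
        return !failsBeforeGL;
    }
};

// Marks the level as defined only if the upload really happened.
bool UploadLevel(const MockBlob& blob, int width, int height, ImageInfo* info)
{
    assert(info);
    unsigned glError = 0;
    if (!blob.TexOrSubImage(&glError))
        return false;  // failed before reaching GL: leave info untouched
    if (glError != 0)
        return false;  // the driver itself reported an error
    info->defined = true;
    info->width = width;
    info->height = height;
    return true;
}
```

Checking only `glError` would miss the first failure case, because a failure before the GL call leaves `glError` at 0; checking the return value covers both.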
Attachment #8804598 - Flags: review?(jgilbert)
Comment on attachment 8804598 [details] [diff] [review]
Prevent BaseImageInfo being initialized when TexOrSubImage fails.

Review of attachment 8804598 [details] [diff] [review]:
-----------------------------------------------------------------

Feels like we should do similar error reporting as we do for out of memory scenarios, rather than just return?
Assignee

Comment 29

3 years ago
(In reply to Milan Sreckovic [:milan] from comment #28)
> Comment on attachment 8804598 [details] [diff] [review]
> Prevent BaseImageInfo being initialized when TexOrSubImage fails.
> 
> Review of attachment 8804598 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Feels like we should do similar error reporting as we do for out of memory
> scenarios, rather than just return?

It has already thrown an exception here, so the website should know it ran out of memory.
https://dxr.mozilla.org/mozilla-central/source/dom/canvas/TexUnpackBlob.cpp?q=ConvertIfNeeded&redirect_type=direct#253

The problem is that it continues to initialize ImageInfo after the exception is thrown,
so further WebGL calls don't know the texture is faulty.

Since it has already thrown the OOM exception, I think we can just return.
Too late for 50, we should plan to fix this in 51.
Comment on attachment 8804598 [details] [diff] [review]
Prevent BaseImageInfo being initialized when TexOrSubImage fails.

Review of attachment 8804598 [details] [diff] [review]:
-----------------------------------------------------------------

This is the right idea, but I didn't really make the guarantee tight when I wrote this before.
I uploaded a patch that expands on yours.

::: dom/canvas/WebGLTextureUpload.cpp
@@ +1336,5 @@
>  
>      GLenum glError;
> +    if (!blob->TexOrSubImage(isSubImage, needsRespec, funcName,
> +                             this, target, level, driverUnpackInfo,
> +                             xOffset, yOffset, zOffset, &glError)) {

{ goes on its own line for multi-line conditionals.
Attachment #8804598 - Flags: review?(jgilbert) → review-
Attachment #8804598 - Attachment is obsolete: true
Attachment #8811103 - Flags: review?(cleu)
Assignee

Comment 33

3 years ago
mozreview-review
Comment on attachment 8811103 [details]
Bug 1290831 - Clarify TexUnpackBlob::TexOrSubImage's fallibility and update callers. -

https://reviewboard.mozilla.org/r/93320/#review93480

Thanks for your suggestion. :)
Attachment #8811103 - Flags: review?(cleu) → review+

Comment 34

3 years ago
Pushed by jgilbert@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/971de933cd5d
Clarify TexUnpackBlob::TexOrSubImage's fallibility and update callers. - r=cleu
Reporter

Comment 35

3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/971de933cd5d
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Hi Michael,
could you please nominate this uplift to Beta51 and Aurora52 if this patch is not too risky?
Flags: needinfo?(cleu)
Assignee

Comment 37

3 years ago
Comment on attachment 8811103 [details]
Bug 1290831 - Clarify TexUnpackBlob::TexOrSubImage's fallibility and update callers. -

Approval Request Comment
[Feature/regressing bug #]: Bug 1290831
[User impact if declined]: Users will encounter unstable WebGL when we perform CPU-side texture processing.
[Describe test coverage new/current, TreeHerder]: Hand-tested; no crash seen in a debug build.
[Risks and why]: Low, it just slightly modifies the operation validation process.
[String/UUID change made/needed]: N/A
Flags: needinfo?(cleu)
Attachment #8811103 - Flags: approval-mozilla-aurora?
Hi Michael, 
Because WebGL 2 will be shipped in 51, isn't this worth uplifting to Beta51?
Flags: needinfo?(cleu)
Yes.
Flags: needinfo?(cleu)
Comment on attachment 8811103 [details]
Bug 1290831 - Clarify TexUnpackBlob::TexOrSubImage's fallibility and update callers. -

Approval Request Comment
[Feature/regressing bug #]: webgl2
[User impact if declined]:
[Describe test coverage new/current, TreeHerder]:
[Risks and why]: 
[String/UUID change made/needed]:
Attachment #8811103 - Flags: approval-mozilla-beta?
Comment on attachment 8811103 [details]
Bug 1290831 - Clarify TexUnpackBlob::TexOrSubImage's fallibility and update callers. -

Fix an issue related to WebGL 2. Beta51+ and Aurora52+. Should be in 51 beta 2.
Attachment #8811103 - Flags: approval-mozilla-beta?
Attachment #8811103 - Flags: approval-mozilla-beta+
Attachment #8811103 - Flags: approval-mozilla-aurora?
Attachment #8811103 - Flags: approval-mozilla-aurora+