Closed
Bug 1290831
Opened 9 years ago
Closed 9 years ago
Hit MOZ_CRASH(Unexpected error with MOZ_GL_DEBUG_ABORT_ON_ERROR. (Run with MOZ_GL_DEBUG_ABORT_ON_ERROR=0 to disable)) at GLContext.h:759
Categories
(Core :: Graphics: CanvasWebGL, defect)
People
(Reporter: cbook, Assigned: cleu)
Details
(Keywords: assertion, Whiteboard: [gfx-noted])
Attachments
(4 files, 1 obsolete file)
6.48 KB, text/plain
3.09 KB, text/plain
6.46 KB, text/plain
58 bytes, text/x-review-board-request (cleu: review+, gchang: approval-mozilla-aurora+, gchang: approval-mozilla-beta+)
Found via bughunter; it seems to affect beta through nightly.
Reproduced with the latest m-c debug tinderbox build on Windows.
Steps to reproduce:
-> Load http://www.anj.fyi/behold/
-> Hit MOZ_CRASH(Unexpected error with MOZ_GL_DEBUG_ABORT_ON_ERROR. (Run with MOZ_GL_DEBUG_ABORT_ON_ERROR=0 to disable)) at c:\builds\moz2_slave\m-cen-w32-d-000000000000000000\build\src\gfx\gl\GLContext.h:759
| Reporter | ||
Comment 1•9 years ago
|
||
[Tracking Requested - why for this release]:
bughunter
tracking-firefox49:
--- → ?
tracking-firefox50:
--- → ?
Updated•9 years ago
|
Whiteboard: [gfx-noted]
Comment 3•9 years ago
|
||
I can't reproduce with the build:
http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-win64-debug/1470060589/
at win10.
| Reporter | ||
Comment 4•9 years ago
|
||
(In reply to Jerry Shih[:jerry] (UTC+8) from comment #3)
> I can't reproduce with the build:
> http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-win64-debug/1470060589/
> at win10.
It seems you may need to reload the page several times via shift+reload; this works for me to trigger the crash on Win 7.
Flags: needinfo?(cbook)
| Reporter | ||
Comment 5•9 years ago
|
||
about:support
(it's a win7 vm on a fusion 8 mac 10.11 mbp).
Graphics
--------
Features
Compositing: Basic
Asynchronous Pan/Zoom: wheel input enabled; touch input enabled
WebGL Renderer: Google Inc. -- ANGLE (Software Adapter Direct3D11 vs_5_0 ps_5_0)
WebGL2 Renderer: WebGL creation failed: * Refused to create native OpenGL context because of blacklist entry: FEATURE_FAILURE_UNKNOWN_DEVICE_VENDOR * Exhausted GL driver options.
Hardware H264 Decoding: No; Hardware video decoding disabled or blacklisted
Audio Backend: wasapi
Direct2D: Blocked for your graphics card because of unresolved driver issues.
DirectWrite: false (6.2.9200.17568)
GPU #1
Active: Yes
Description: VMware SVGA 3D
Vendor ID: 0x15ad
Device ID: 0x0405
Driver Version: 8.15.1.33
Driver Date: 10-16-2015
Drivers: vm3dum vm3dum_10
Subsys ID: 040515ad
RAM: 384
Diagnostics
AzureCanvasAccelerated: 0
AzureCanvasBackend: skia
AzureContentBackend: cairo
AzureFallbackCanvasBackend: cairo
Decision Log
D3D11_COMPOSITING:
Blocklisted; failure code BLOCKLIST_FEATURE_FAILURE_UNKNOWN_DEVICE_VENDOR
D3D9_COMPOSITING:
Blocklisted; failure code BLOCKLIST_FEATURE_FAILURE_UNKNOWN_DEVICE_VENDOR
DIRECT2D:
unavailable by default: Direct2D requires Direct3D 11 compositing
D3D11_HW_ANGLE:
unavailable by default: D3D11 compositing is disabled
disabled by env: D3D11 compositing is disabled
---------------------
Comment 6•9 years ago
|
||
Jerry or Jeff, any luck reproducing this?
status-firefox49:
--- → affected
status-firefox50:
--- → affected
status-firefox51:
--- → ?
tracking-firefox51:
--- → ?
Flags: needinfo?(jgilbert)
Flags: needinfo?(hshih)
Comment 8•9 years ago
|
||
David or Milan, can you help? Not sure how serious this is. Hitting MOZ_CRASH is usually not good. If I don't need to worry about it for 49, let's defer it to 51.
Flags: needinfo?(milan)
Flags: needinfo?(dbolter)
Comment 9•9 years ago
|
||
I still can't reproduce this on win7.
I will try a win7 vm on fusion 8 later.
Flags: needinfo?(hshih)
Comment 10•9 years ago
|
||
I ran this url in bughunter on windows 7 again and found only one aurora debug crash @ gl::GLContext::AfterGLCall(char const *) [GLContext.h:264adddee81b : 758 + 0x22]
It may be intermittent. If you need, I can try to reproduce more aggressively.
First comment - we weren't allowing this configuration (software ANGLE for WebGL, basic compositor) until bug 1271770, which landed on nightly in 50, then got uplifted to 49 on July 27th. I would imagine that to be the cause - exposing an existing bug because of the additional number of people that can now run WebGL, rather than a new bug.
Based on the debug stack - perhaps VMWare doesn't do mipmaps properly.
Flags: needinfo?(milan) → needinfo?(jgilbert)
Updated•9 years ago
|
Flags: needinfo?(dbolter)
Comment 12•9 years ago
|
||
Milan, that sounds like a potentially good thing (more people able to run WebGL). Deferring this to 50 at this point as we are heading into 49 beta 8 now.
Updated•9 years ago
|
Flags: needinfo?(jgilbert) → needinfo?(howareyou322)
Comment 13•9 years ago
|
||
Michael, please help reproduce this crash in a Win 7 VM. I think it might take a different code path after bug 1271770 or bug 1297965.
Flags: needinfo?(howareyou322) → needinfo?(cleu)
| Assignee | ||
Comment 14•9 years ago
|
||
OK, I will prepare a Windows 7 VM running under VMWare Fusion
| Assignee | ||
Comment 15•9 years ago
|
||
Hi Tomcat,
I have set up a Windows 7 VM with the same VMWare SVGA driver version as in the about:support you posted.
But I cannot reproduce the crash. Can you pull the latest mozilla-central code and test it again?
In addition, what is the build number of your VMWare Fusion?
I think this issue may be related to the GFX emulation inside VMWare Fusion.
Flags: needinfo?(cleu) → needinfo?(cbook)
| Reporter | ||
Comment 16•9 years ago
|
||
crashed after a long time
mozilla-central tinderbox build (debug, windows 7) based on https://hg.mozilla.org/mozilla-central/rev/9baec74b3db1bf005c66ae2f50bafbdb02c3be38
Flags: needinfo?(cbook)
| Reporter | ||
Comment 17•9 years ago
|
||
(In reply to Michael Leu[:lenzak800](UTC+8)[PTO 10/6 ~ 10/13] from comment #15)
> Hi Tomcat,
>
> I have set up a windows 7 VM with same VMWare SVGA driver version as the
> about:support you posted.
>
> But I cannot reproduce the crash, can you pull the latest mozilla-central
> code and test it again?
>
> In addition, what is the build number of your VMWare fusion?
>
> I think maybe this issue is related to the GFX emulation inside VMWare
> Fusion.
Hi Michael, it's Version 8.5.0 (4352717) Fusion Mac. I also attached an m-c stack from today.
| Assignee | ||
Comment 18•9 years ago
|
||
OK, I will open it and wait for several minutes.
BTW, here is my about:support.
Graphics
Features
Compositing Basic
Asynchronous Pan/Zoom wheel input enabled; touch input enabled
WebGL Renderer Google Inc. -- ANGLE (Software Adapter Direct3D11 vs_4_1 ps_4_1)
WebGL2 Renderer Google Inc. -- ANGLE (Software Adapter Direct3D11 vs_4_1 ps_4_1)
Hardware H264 Decoding No; Hardware video decoding disabled or blacklisted
Audio Backend unknown
Direct2D Blocked for your graphics card because of unresolved driver issues.
DirectWrite false (6.1.7601.17514)
GPU #1
Active Yes
Description VMware SVGA 3D
Vendor ID 0x15ad
Device ID 0x0405
Driver Version 8.15.1.33
Driver Date 10-16-2015
Drivers vm3dum vm3dum_10
Subsys ID 040515ad
RAM 8
Diagnostics
AzureCanvasAccelerated 0
AzureCanvasBackend skia
AzureContentBackend cairo
AzureFallbackCanvasBackend cairo
failures [GFX1-]: Refresh driver waiting for the compositor for1.04809 seconds.
Decision Log
D3D11_COMPOSITING
Blocklisted; failure code BLOCKLIST_FEATURE_FAILURE_UNKNOWN_DEVICE_VENDOR
D3D9_COMPOSITING
Blocklisted; failure code BLOCKLIST_FEATURE_FAILURE_UNKNOWN_DEVICE_VENDOR
DIRECT2D
unavailable by default: Direct2D requires Direct3D 11 compositing
D3D11_HW_ANGLE
unavailable by default: D3D11 compositing is disabled
disabled by env: D3D11 compositing is disabled
| Assignee | ||
Comment 19•9 years ago
|
||
OK, I can reproduce it after moving the VM to the same VMWare Fusion build.
It seems that only 32-bit Windows 7 has this issue.
Comment 20•9 years ago
|
||
Track 51+ as it can be reproduced.
Updated•9 years ago
|
Assignee: nobody → cleu
| Assignee | ||
Comment 21•9 years ago
|
||
I found that this crash always happens after a buffer allocation failure.
This website has a texture with dimensions 8192*8192, which requires 256MB of memory.
The allocation, which uses calloc, sometimes fails and returns nullptr here:
https://dxr.mozilla.org/mozilla-central/source/dom/canvas/TexUnpackBlob.cpp?q=Unable+to+allocate+buffer+during+conversion&redirect_type=single#253
But both the VM's system memory and its GFX memory are sufficient for the allocation, so I am still trying to figure out why this allocation fails and why it fails only intermittently.
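The 256MB figure above can be checked with a little arithmetic. The following is only a sketch: it assumes a tightly packed image at 4 bytes per pixel (for example RGBA8), which the comment does not state, and `ImageBytes` and `kMiB` are hypothetical names, not Gecko code.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for a tightly packed 2D image (assumption: no row padding).
constexpr std::uint64_t ImageBytes(std::uint64_t width, std::uint64_t height,
                                   std::uint64_t bytesPerPixel) {
    return width * height * bytesPerPixel;
}

constexpr std::uint64_t kMiB = 1024 * 1024;

// 8192 x 8192 pixels at 4 bytes per pixel (assumed format) is exactly 256 MiB.
static_assert(ImageBytes(8192, 8192, 4) == 256 * kMiB,
              "8192x8192 at 4 bytes per pixel should be 256 MiB");
```

A single allocation of this size has to fit in one contiguous range of the process address space.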
| Assignee | ||
Comment 22•9 years ago
|
||
OK, I think I found the reason.
calloc returns nullptr when no contiguous block of the required size is available.
Since 256MB is a very large allocation, it is prone to failure.
This also explains why it only happens on 32-bit Windows: a 64-bit process has a much larger address space.
So it may be a problem with our error handling; we should have bailed out of the texture operations before hitting MOZ_CRASH.
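To illustrate the error handling discussed here, below is a minimal sketch of a conversion step that treats a null return from calloc as a recoverable failure. It is not the TexUnpackBlob code; `ConvertPixels`, `AllocFn`, `DefaultAlloc` and `ScratchBuffer` are hypothetical names, and the allocator hook exists only so the failure path can be triggered on demand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

// Hypothetical allocator hook with the same shape as calloc.
using AllocFn = void* (*)(std::size_t count, std::size_t elemSize);

void* DefaultAlloc(std::size_t count, std::size_t elemSize) {
    return std::calloc(count, elemSize);
}

// Releases calloc'd memory when the buffer goes out of scope.
struct FreeDeleter {
    void operator()(std::uint8_t* p) const { std::free(p); }
};
using ScratchBuffer = std::unique_ptr<std::uint8_t, FreeDeleter>;

// Stand-in for a CPU-side conversion. Returns false, without using the
// buffer, when the scratch allocation fails.
bool ConvertPixels(std::size_t pixelCount, std::size_t bytesPerPixel,
                   AllocFn alloc = DefaultAlloc) {
    assert(alloc != nullptr);
    ScratchBuffer scratch(
        static_cast<std::uint8_t*>(alloc(pixelCount, bytesPerPixel)));
    if (!scratch) {
        return false;  // The caller must treat the whole upload as failed.
    }
    // A real conversion would write into scratch.get() here.
    return true;
}
```

The important part is that the caller checks the return value; reporting the failure is not enough if later code still assumes the upload succeeded.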
Comment 23•9 years ago
|
||
(In reply to Michael Leu[:lenzak800](UTC+8)[PTO 10/6 ~ 10/13] from comment #22)
> OK, I think I found the reason.
>
> calloc returns nullptr when no consecutive required memory space available.
>
> Since 256MB is a very big value, it will be fail-prone.
>
> It also explains why only 32bit windows happens because 64bit has a much
> larger address space.
>
> So it may be a problem of our error handling, we should have bailed out from
> the texture operations before it hit MOZ_CRASH.
Good catch, Michael.
| Assignee | ||
Comment 24•9 years ago
|
||
So here is how the crash happens.
1. The website calls TexImage2D, requesting a large buffer.
2. In certain environments, we use CPU-side conversion in ConvertIfNeeded
https://dxr.mozilla.org/mozilla-central/source/dom/canvas/TexUnpackBlob.cpp?q=ConvertIfNeeded&redirect_type=direct#171
3. We cannot allocate such a large buffer on the CPU side, so calloc returns null and an OOM exception is thrown.
https://dxr.mozilla.org/mozilla-central/source/dom/canvas/TexUnpackBlob.cpp?q=ConvertIfNeeded&redirect_type=direct#253
4. The website still calls GenerateMipmap with this faulty texture, so we get an INVALID_OPERATION exception, and then we crash because it is a debug build with MOZ_GL_DEBUG_ABORT_ON_ERROR set.
This whole flow seems normal: we have told the website we're out of memory but it still uses the texture, so we raise more errors, and it crashes because of its debug configuration.
Jeff, do you have any suggestions about this one?
Maybe it is just normal behavior rather than a bug.
Flags: needinfo?(jgilbert)
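Step 4 of the flow above can be modelled with a toy driver. This is only a sketch of the idea behind an abort-on-error debug check; `FakeDriver` and `WouldAbortAfterCall` are hypothetical, and the real check in GLContext.h is not reproduced here.

```cpp
#include <cassert>

enum class GLError { NoError, InvalidOperation };

// Toy driver: generating mipmaps for a texture without data sets an error.
struct FakeDriver {
    bool textureHasData = false;
    GLError pending = GLError::NoError;

    void GenerateMipmap() {
        if (!textureHasData) {
            pending = GLError::InvalidOperation;
        }
    }

    // Like glGetError: returns the pending error and clears it.
    GLError GetError() {
        const GLError err = pending;
        pending = GLError::NoError;
        return err;
    }
};

// Returns true when a debug build with the abort flag set would crash
// after the preceding driver call.
bool WouldAbortAfterCall(FakeDriver& driver, bool abortOnError) {
    return driver.GetError() != GLError::NoError && abortOnError;
}
```

In this model the crash depends only on whether an invalid call reaches the driver while the flag is set, so the call has to be stopped before that point.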
Comment 25•9 years ago
|
||
Great run-down.
#4 should not happen. We should be validating that it's valid to send glGenerateMipmap to the driver before we call it. The fix is to repair our GenerateMipmap validation.
Flags: needinfo?(jgilbert)
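A rough sketch of the kind of validation meant here: reject the call at the WebGL layer when the base level image is not defined, so the driver never sees it. The types below are simplified stand-ins, not the real WebGLTexture or ImageInfo classes; only the value of GL_INVALID_OPERATION (0x0502) is taken from the GL headers.

```cpp
#include <cassert>

constexpr int kNoError = 0;
constexpr int kInvalidOperation = 0x0502;  // Value of GL_INVALID_OPERATION.

// Simplified stand-in for per-level image state.
struct ImageInfo {
    int width = 0;
    int height = 0;
    bool IsDefined() const { return width > 0 && height > 0; }
};

// Simplified stand-in for a texture object.
struct Texture {
    ImageInfo baseLevel;
    int driverCalls = 0;           // Counts calls that reached the driver.
    int lastWebGLError = kNoError; // Error reported to the page.
};

// Validates before touching the driver. Returns false if the call was rejected.
bool GenerateMipmap(Texture& tex) {
    if (!tex.baseLevel.IsDefined()) {
        tex.lastWebGLError = kInvalidOperation;
        return false;  // Rejected here, so the driver never sees the call.
    }
    ++tex.driverCalls;  // Placeholder for the real driver call.
    return true;
}
```

This check is only as good as the state it reads: it helps only if a failed upload leaves the base level undefined.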
| Assignee | ||
Comment 26•9 years ago
|
||
| Assignee | ||
Comment 27•9 years ago
|
||
Comment on attachment 8804598 [details] [diff] [review]
Prevent BaseImageInfo being initialized when TexOrSubImage fails.
Review of attachment 8804598 [details] [diff] [review]:
-----------------------------------------------------------------
I found that we do have some checks when we call GenerateMipmap.
The problem is that if blob->TexOrSubImage fails and returns false before it reaches the real GL part (DoTexSubImage),
it isn't treated as a failure, because glError is 0 since we never reached a real GL call.
https://dxr.mozilla.org/mozilla-central/source/dom/canvas/WebGLTextureUpload.cpp#1423
So we continue executing and initialize the ImageInfo inside WebGLTexture, which makes later calls think it is a valid texture.
This patch adds checks for blob->TexOrSubImage, bailing out if it returns false, which prevents the ImageInfo from being falsely initialized.
Further texture calls will then bail out through their own checks.
Attachment #8804598 -
Flags: review?(jgilbert)
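The idea of the patch can be sketched as follows. This is a simplified model, not the actual diff: `Blob`, `Texture` and `TexImage` are hypothetical stand-ins, and the real TexOrSubImage takes many more parameters than shown here.

```cpp
#include <cassert>

// Stand-in for an upload blob. It can fail before any driver call is made,
// in which case the reported GL error stays 0.
struct Blob {
    bool failsBeforeDriver;

    bool TexOrSubImage(int* out_glError) const {
        *out_glError = 0;  // No driver call happened, so no GL error.
        return !failsBeforeDriver;
    }
};

// Stand-in for the texture's bookkeeping.
struct Texture {
    bool imageInfoInitialized = false;
};

// Upload path: only record the image as initialized after a full success.
bool TexImage(Texture& tex, const Blob& blob) {
    int glError = 0;
    if (!blob.TexOrSubImage(&glError)) {
        return false;  // Failed early: glError is 0, but this is still a failure.
    }
    if (glError != 0) {
        return false;  // The driver reported an error.
    }
    tex.imageInfoInitialized = true;
    return true;
}
```

Checking glError alone would miss the early failure, which is how the faulty texture came to be treated as valid.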
Comment on attachment 8804598 [details] [diff] [review]
Prevent BaseImageInfo being initialized when TexOrSubImage fails.
Review of attachment 8804598 [details] [diff] [review]:
-----------------------------------------------------------------
Feels like we should do similar error reporting as we do for out of memory scenarios, rather than just return?
| Assignee | ||
Comment 29•9 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #28)
> Comment on attachment 8804598 [details] [diff] [review]
> Prevent BaseImageInfo being initialized when TexOrSubImage fails.
>
> Review of attachment 8804598 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> Feels like we should do similar error reporting as we do for out of memory
> scenarios, rather than just return?
It has already thrown an exception here, so the website should know it has run out of memory.
https://dxr.mozilla.org/mozilla-central/source/dom/canvas/TexUnpackBlob.cpp?q=ConvertIfNeeded&redirect_type=direct#253
The problem is that it continues to initialize ImageInfo after the exception is thrown,
so further WebGL calls don't know the texture is faulty.
Since it has thrown an OOM exception, I think we can just return.
Too late for 50, we should plan to fix this in 51.
| Comment hidden (mozreview-request) |
Comment 32•9 years ago
|
||
Comment on attachment 8804598 [details] [diff] [review]
Prevent BaseImageInfo being initialized when TexOrSubImage fails.
Review of attachment 8804598 [details] [diff] [review]:
-----------------------------------------------------------------
This is the right idea, but I didn't really make the guarantee tight when I wrote this before.
I uploaded a patch that expands on yours.
::: dom/canvas/WebGLTextureUpload.cpp
@@ +1336,5 @@
>
> GLenum glError;
> + if (!blob->TexOrSubImage(isSubImage, needsRespec, funcName,
> + this, target, level, driverUnpackInfo,
> + xOffset, yOffset, zOffset, &glError)) {
{ goes on its own line for multi-line conditionals.
Attachment #8804598 -
Flags: review?(jgilbert) → review-
Updated•9 years ago
|
Attachment #8804598 -
Attachment is obsolete: true
Updated•9 years ago
|
Attachment #8811103 -
Flags: review?(cleu)
| Assignee | ||
Comment 33•9 years ago
|
||
| mozreview-review | ||
Comment on attachment 8811103 [details]
Bug 1290831 - Clarify TexUnpackBlob::TexOrSubImage's fallibility and update callers. -
https://reviewboard.mozilla.org/r/93320/#review93480
Thanks for your suggestion. :)
Attachment #8811103 -
Flags: review?(cleu) → review+
Comment 34•9 years ago
|
||
Pushed by jgilbert@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/971de933cd5d
Clarify TexUnpackBlob::TexOrSubImage's fallibility and update callers. - r=cleu
| Reporter | ||
Comment 35•9 years ago
|
||
| bugherder | ||
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox53:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Comment 36•9 years ago
|
||
Hi Michael,
could you please nominate this for uplift to Beta51 and Aurora52 if this patch is not too risky?
Flags: needinfo?(cleu)
| Assignee | ||
Comment 37•9 years ago
|
||
Comment on attachment 8811103 [details]
Bug 1290831 - Clarify TexUnpackBlob::TexOrSubImage's fallibility and update callers. -
Approval Request Comment
[Feature/regressing bug #]: Bug 1290831
[User impact if declined]: Users will encounter unstable WebGL when we perform CPU-side texture processing.
[Describe test coverage new/current, TreeHerder]: Tested by hand; no crash seen in a debug build.
[Risks and why]: Low; it only slightly modifies the operation validation process.
[String/UUID change made/needed]: N/A
Flags: needinfo?(cleu)
Attachment #8811103 -
Flags: approval-mozilla-aurora?
Comment 38•9 years ago
|
||
Hi Michael,
Because WebGL 2 will be shipped in 51, isn't this worth uplifting to Beta51?
Flags: needinfo?(cleu)
Comment 40•9 years ago
|
||
Comment on attachment 8811103 [details]
Bug 1290831 - Clarify TexUnpackBlob::TexOrSubImage's fallibility and update callers. -
Approval Request Comment
[Feature/regressing bug #]: webgl2
[User impact if declined]:
[Describe test coverage new/current, TreeHerder]:
[Risks and why]:
[String/UUID change made/needed]:
Attachment #8811103 -
Flags: approval-mozilla-beta?
Comment 41•9 years ago
|
||
Comment on attachment 8811103 [details]
Bug 1290831 - Clarify TexUnpackBlob::TexOrSubImage's fallibility and update callers. -
Fixes an issue related to WebGL 2. Beta51+ and Aurora52+. Should be in 51 beta 2.
Attachment #8811103 -
Flags: approval-mozilla-beta?
Attachment #8811103 -
Flags: approval-mozilla-beta+
Attachment #8811103 -
Flags: approval-mozilla-aurora?
Attachment #8811103 -
Flags: approval-mozilla-aurora+
Comment 42•9 years ago
|
||
| bugherder uplift | ||
Comment 43•9 years ago
|
||
| bugherder uplift | ||
Updated•9 years ago
|