Open Bug 1309586 Opened 8 years ago Updated 2 years ago

Assertion failure: false at WebGLContext.cpp:2274

Categories: Core :: Graphics: CanvasWebGL, defect, P3

People: Reporter: cbook, Unassigned

Details: Keywords: assertion, stale-bug; Whiteboard: [gfx-noted]

Attachments: 3 files, 1 obsolete file

Attached file stack
Assertion failure: false, at c:/builds/moz2_slave/m-cen-w32-d-000000000000000000/build/src/dom/canvas/WebGLContext.cpp:2274

Found via Bughunter and reproduced on the current Windows debug Tinderbox build on Windows 7.

Steps to reproduce:
-> Load http://www.colorizephoto.com/converter

--> Assertion
Priority: -- → P1
Whiteboard: [gfx-noted]
Flags: needinfo?(cleu)
Hi Tomcat,

Is your test environment the same as in bug 1290831?

I can reproduce the crash, but it happened only once. Is it an intermittent crash?
Flags: needinfo?(cleu) → needinfo?(cbook)
OK, now I know how to reproduce this crash reliably.

When the crash stops reproducing, go to Settings -> Privacy, clear all history and stored data, and restart Nightly; the crash then happens again.

So it seems to be related to our offline cache system.
Yeah, it's the same environment :) Awesome work, and thanks for all your work, Michael!
Flags: needinfo?(cbook)
This assertion is caused by a GL_OUT_OF_MEMORY error; it seems that texSubImage2D fails to allocate enough memory for the clear.

It stops reproducing because the website does not call texSubImage2D after the first access.

Now I am trying to figure out which texture causes this error.
If GL_OUT_OF_MEMORY can happen here, the code should not assume that no error occurs, so I removed the MOZ_ASSERT and let it throw an out-of-memory exception instead.
Assignee: nobody → cleu
Why does ANGLE throw OOM? Occasionally it's not actually out of memory; ANGLE just doesn't know what else to do, and panics. We should make sure we're not causing it to panic.
Attached patch patch
Michael, can you help me confirm whether this patch fixes the problem?
Flags: needinfo?(cleu)
No, the crash still happens.
Flags: needinfo?(cleu)
I have narrowed down which GL call raises this GL_OUT_OF_MEMORY error.

It is a glClear in WebGLContext::ForceClearFramebufferWithDefaultValues:
https://dxr.mozilla.org/mozilla-central/source/dom/canvas/WebGLContext.cpp?q=WebGLContext.cpp&redirect_type=direct#1550

Since it is quite a basic GL call, maybe it indicates a malformed framebuffer or context?
Hi Michael, as Jeff mentioned in comment 6, it might not actually be an OOM. ANGLE reports OOM for all kinds of panics; for example, http://searchfox.org/mozilla-central/source/gfx/angle/src/libGLESv2/global_state.cpp#136

We can use the GL_DEBUG_OUTPUT mechanism (KHR_debug) to get the actual error message from inside ANGLE. Please see https://bugzilla.mozilla.org/attachment.cgi?id=8764527&action=diff for an example.

Or, more simply, since you can reproduce it on your local machine, use a debugger to step into ANGLE and see which error occurs.
Thanks for your suggestion, Morris :)

I have found where the error is raised inside ANGLE: it comes from a failed CreateTexture2D call.
The error code is 0x8007000E, which seems to be the Windows out-of-memory code (E_OUTOFMEMORY).

https://dxr.mozilla.org/mozilla-central/source/gfx/angle/src/libANGLE/renderer/d3d/d3d11/TextureStorage11.cpp#1115
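For reference, a minimal C++ sketch of how that HRESULT breaks down: E_OUTOFMEMORY (0x8007000E) is a failure HRESULT whose facility is FACILITY_WIN32 (7) and whose code is the Win32 error ERROR_OUTOFMEMORY (14). The helper names (DecodeHResult, IsWin32OutOfMemory) are hypothetical, and the SDK constants are redefined locally so the sketch builds without <windows.h>.

```cpp
#include <cstdint>

// Bit fields of a Windows HRESULT (MS-ERREF layout):
// bit 31 = severity (1 = failure), bits 16-26 = facility, bits 0-15 = code.
struct DecodedHResult {
  bool failed;
  std::uint32_t facility;
  std::uint32_t code;
};

constexpr DecodedHResult DecodeHResult(std::uint32_t hr) {
  return DecodedHResult{(hr >> 31) != 0, (hr >> 16) & 0x7FFu, hr & 0xFFFFu};
}

// Values from winerror.h, redefined here so the sketch is self-contained:
// FACILITY_WIN32 = 7, ERROR_OUTOFMEMORY = 14.
constexpr std::uint32_t kFacilityWin32 = 7;
constexpr std::uint32_t kErrorOutOfMemory = 14;

// True for HRESULT_FROM_WIN32(ERROR_OUTOFMEMORY), i.e. E_OUTOFMEMORY.
constexpr bool IsWin32OutOfMemory(std::uint32_t hr) {
  return DecodeHResult(hr).failed &&
         DecodeHResult(hr).facility == kFacilityWin32 &&
         DecodeHResult(hr).code == kErrorOutOfMemory;
}
```

Note that this only identifies what the driver reported; as discussed above, ANGLE may also map other failures to OOM.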

This crash also happens on my physical Windows 10 machine.

I suspect it is caused by a malformed or oversized buffer operation.
The failing CreateTexture2D call attempts to create an 8192x8192 texture, which is déjà vu from bug 1290831.
https://bugzilla.mozilla.org/show_bug.cgi?id=1290831#c21

I don't think this is a coincidence, because 8192x8192 is also the maximum texture size the VMware SVGA adapter supports.

I don't think this website needs a texture this large; maybe the texture is populated by Gecko itself.
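For scale, assuming 4 bytes per texel (e.g. RGBA8; the actual format is not stated in this bug), a single 8192x8192 level already needs 256 MiB:

```latex
8192 \times 8192 \times 4\,\text{B} = 2^{13} \cdot 2^{13} \cdot 2^{2}\,\text{B} = 2^{28}\,\text{B} = 268{,}435{,}456\,\text{B} = 256\,\text{MiB}
```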
Michael, you can use MOZ_GL_DEBUG_VERBOSE / MOZ_GL_DEBUG_ABORT_ON_ERROR to print GL calls and see whether there are other GL errors before the assertion.
Thanks for your suggestion, Ethan :)

I found two glGetIntegerv calls that return GL_INVALID_ENUM while Gecko is initializing the GL context.

They appear to query GL_MAX_3D_TEXTURE_SIZE(_EXT) and GL_MAX_ARRAY_TEXTURE_LAYERS(_EXT).

I'm not sure whether this is relevant to the crash.
I found that some games crash with a similar failure (CreateTexture2D returning E_OUTOFMEMORY); the usual fix is to ask users to update their drivers.

After I updated the driver, this crash no longer reproduces on my physical Windows 10 machine.

So maybe this is actually a driver issue.

P.S. The GL_INVALID_ENUM errors I mentioned before are expected and are not related to this crash.
Hi Jeff, I think this is a driver issue.
On some older NVIDIA drivers and all 32-bit VMware drivers, we get E_OUTOFMEMORY when calling CreateTexture2D.
http://searchfox.org/mozilla-central/rev/f5c9e9a249637c9abd88754c8963ecb3838475cb/gfx/angle/src/libANGLE/renderer/d3d/d3d11/TextureStorage11.cpp#1057

Should we block the affected driver versions, or just pass this error back to the website?
Flags: needinfo?(jgilbert)
ANGLE should definitely be generating OUT_OF_MEMORY here.
Flags: needinfo?(jgilbert)
Comment on attachment 8816072 [details] [diff] [review]
Handle OOM error when we encounter it in ClearFrameBuffer

Review of attachment 8816072 [details] [diff] [review]:
-----------------------------------------------------------------

If it is normal for ANGLE to raise OOM here, maybe we should not treat it as a fatal error and should just forward the error to the website.
Attachment #8816072 - Flags: feedback?(jgilbert)
Oh, but it's in glClear?
We really, really don't want to wrap glClear in error-checking wrappers.

We should probably allow for unexpected GL_OUT_OF_MEMORY errors, and handle them by force-losing the context.
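A minimal sketch of that policy, assuming the error surfaces wherever the context already polls glGetError rather than through a wrapper around each glClear. SketchContext, OnUnexpectedGlError and ForceLoseContext are hypothetical names, not Gecko's WebGLContext API; only the GL enum values are standard.

```cpp
#include <cstdint>

// Standard GL error enums (same values as in the GL/GLES headers),
// redefined so the sketch builds without GL headers.
constexpr std::uint32_t kGlNoError = 0x0000;      // GL_NO_ERROR
constexpr std::uint32_t kGlInvalidEnum = 0x0500;  // GL_INVALID_ENUM
constexpr std::uint32_t kGlOutOfMemory = 0x0505;  // GL_OUT_OF_MEMORY

// Hypothetical stand-in for a WebGL context, illustrating the policy only.
class SketchContext {
 public:
  bool IsLost() const { return lost_; }

  // Hypothetical hook, called where the context already checks glGetError,
  // so individual glClear calls are not wrapped. An unexpected
  // GL_OUT_OF_MEMORY force-loses the context instead of tripping an assertion.
  void OnUnexpectedGlError(std::uint32_t err) {
    if (err == kGlOutOfMemory) {
      ForceLoseContext();
    }
    // GL_NO_ERROR and other errors are left to existing handling here.
  }

 private:
  // A real implementation would also dispatch "webglcontextlost" to the page.
  void ForceLoseContext() { lost_ = true; }

  bool lost_ = false;
};
```

Losing the context gives the page a recoverable signal (it can listen for the lost/restored events) instead of a process abort.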
It seems to be a real OOM here.

After retesting with the latest Nightly, the crash is now caused by a calloc failure, which is different from before.
https://dxr.mozilla.org/mozilla-central/source/dom/canvas/WebGLTexture.cpp?q=ZeroTextureData&redirect_type=direct#729

This website requests a 512 MB buffer, which is too big for this calloc to satisfy; it returns nullptr, and we crash because of the assertion here:
https://dxr.mozilla.org/mozilla-central/source/dom/canvas/WebGLTextureUpload.cpp#735

I think we should handle this situation instead of just asserting that the allocation always succeeds.
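One way to handle it, sketched under assumptions: compute the buffer size with overflow checks, and return a status instead of asserting when calloc returns nullptr. AllocZeroedTexels, CheckedMul and ZeroAllocStatus are hypothetical names for illustration and do not correspond to the code at the links above.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <memory>

// Releases memory obtained from std::calloc.
struct FreeDeleter {
  void operator()(std::uint8_t* p) const { std::free(p); }
};
using ZeroBuffer = std::unique_ptr<std::uint8_t[], FreeDeleter>;

enum class ZeroAllocStatus { Ok, Overflow, OutOfMemory };

// Computes a * b, returning false instead of silently wrapping on overflow.
bool CheckedMul(std::size_t a, std::size_t b, std::size_t* out) {
  if (a != 0 && b > std::numeric_limits<std::size_t>::max() / a) {
    return false;
  }
  *out = a * b;
  return true;
}

// Hypothetical helper: allocates a zero-filled buffer for
// width x height x depth texels of bytesPerTexel bytes each, and reports
// failure to the caller instead of asserting that calloc succeeded.
ZeroAllocStatus AllocZeroedTexels(std::size_t width, std::size_t height,
                                  std::size_t depth, std::size_t bytesPerTexel,
                                  ZeroBuffer* out, std::size_t* outBytes) {
  std::size_t bytes = bytesPerTexel;
  if (!CheckedMul(bytes, width, &bytes) ||
      !CheckedMul(bytes, height, &bytes) ||
      !CheckedMul(bytes, depth, &bytes)) {
    return ZeroAllocStatus::Overflow;
  }
  // calloc may return nullptr for a large request (e.g. hundreds of MB in a
  // 32-bit process). calloc(0) may also return nullptr, so ask for >= 1 byte.
  void* raw = std::calloc(bytes != 0 ? bytes : 1, 1);
  if (raw == nullptr) {
    return ZeroAllocStatus::OutOfMemory;
  }
  out->reset(static_cast<std::uint8_t*>(raw));
  *outBytes = bytes;
  return ZeroAllocStatus::Ok;
}
```

A caller could then report Overflow or OutOfMemory to the page as GL_OUT_OF_MEMORY rather than crashing.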
Attachment #8816072 - Flags: feedback?(jgilbert)
Assignee: cleu → nobody
Moving to p3 because no activity for at least 24 weeks.
Priority: P1 → P3
Severity: normal → S3