Closed Bug 933336 Opened 12 years ago Closed 2 years ago

Null pointer in SurfaceFactory_Gralloc ctor because document has no container

Categories

(Core :: Graphics, defect)

ARM
Gonk (Firefox OS)
defect

Tracking

()

RESOLVED INVALID

People

(Reporter: jld, Unassigned)

Details

Attachments

(1 file, 1 obsolete file)

I'm trying to run mochitests on the emulator, and content/canvas/test/webgl/non-conf-tests/test_highp_fs.html fails because of a null pointer access in the SurfaceFactory_Gralloc constructor (or, if debugging is enabled, an assertion in that function), in the content process. I tried chasing the null backwards through the code, into WebGLContent::SetDimensions, and the problem seems to be that the document involved has no container, and therefore no docWidget, no layerManager, and no forwarder / surfaceAllocator. (I hope this makes some amount of sense, because I currently understand little about the DOM and less about the graphics subsystem.) I don't understand why this would fail for me but work on TBPL. I have local changes so that content process sandboxing works (which shouldn't affect anything else unless/until a process is actually being killed for sandbox violation), and I've applied the patch from bug 927633 so that the tests can get this far under sandboxing (which landed on b2g-inbound while I was investigating, and has green B2G mochitests from TBPL). Bug 881169 looks similar, so I've cc'ed the people involved with it.
Component: General → Graphics
Product: Firefox OS → Core
That's interesting. Could you post a stack? (Is this a crash?)
F/MOZ_Assert( 764): Assertion failure: allocator, at /home/jld/src/B2G.goldfish/gecko/gfx/gl/SharedSurfaceGralloc.cpp:44 (gdb) bt #0 0x41594322 in SurfaceFactory_Gralloc (this=0x44bf6180, prodGL=<value optimized out>, caps=..., allocator=0x0) at /home/jld/src/B2G.goldfish/gecko/gfx/gl/SharedSurfaceGralloc.cpp:44 #1 0x415a1652 in mozilla::gl::GLScreenBuffer::Create (gl=0x44b8d800, size=<value optimized out>, caps=...) at /home/jld/src/B2G.goldfish/gecko/gfx/gl/GLScreenBuffer.cpp:44 #2 0x41598ca2 in mozilla::gl::GLContext::CreateScreenBufferImpl (this=0x4217a78a, size=..., caps=<value optimized out>) at /home/jld/src/B2G.goldfish/gecko/gfx/gl/GLContext.cpp:3440 #3 0x415963e4 in mozilla::gl::GLContext::CreateScreenBuffer (size=..., caps=..., flags=<value optimized out>) at /home/jld/src/B2G.goldfish/gecko/gfx/gl/GLContext.h:3081 #4 mozilla::gl::GLContext::InitOffscreen (size=..., caps=..., flags=<value optimized out>) at /home/jld/src/B2G.goldfish/gecko/gfx/gl/GLContext.h:3046 #5 mozilla::gl::GLContextProviderEGL::CreateOffscreen (size=..., caps=..., flags=<value optimized out>) at /home/jld/src/B2G.goldfish/gecko/gfx/gl/GLContextProviderEGL.cpp:1326 #6 0x40a997d8 in mozilla::WebGLContext::SetDimensions (this=0x447dfe00, width=1, height=1) at /home/jld/src/B2G.goldfish/gecko/content/canvas/src/WebGLContext.cpp:551 #7 0x40afa260 in mozilla::dom::HTMLCanvasElement::UpdateContext (this=0x447bd280, aCx=0x402c35e0, aNewContextOptions=...) at /home/jld/src/B2G.goldfish/gecko/content/html/content/src/HTMLCanvasElement.cpp:801 #8 0x40afa436 in mozilla::dom::HTMLCanvasElement::GetContext (this=0x447bd280, aCx=<value optimized out>, aContextId=..., aContextOptions=..., rv=...) at /home/jld/src/B2G.goldfish/gecko/content/html/content/src/HTMLCanvasElement.cpp:713 #9 0x412fb188 in getContext (cx=0x402c35e0, obj=..., self=0x447bd280, args=...) at /home/jld/src/B2G.goldfish/objdir-gecko/dom/bindings/HTMLCanvasElementBinding.cpp:203 #10 0x412f3b0a in genericMethod (cx=0x402c35e0, argc=<value optimized out>, vp=<value optimized out>) at /home/jld/src/B2G.goldfish/objdir-gecko/dom/bindings/HTMLCanvasElementBinding.cpp:605 #11 0x41aefb00 in js::CallJSNative (cx=0x402c35e0, native=0x412f3a55 <genericMethod>, args=...) at /home/jld/src/B2G.goldfish/gecko/js/src/jscntxtinlines.h:220 #12 0x41b026a8 in js::Invoke (cx=0x402c35e0, args=..., construct=js::NO_CONSTRUCT) at /home/jld/src/B2G.goldfish/gecko/js/src/vm/Interpreter.cpp:462 #13 0x419fa426 in js_fun_apply (cx=0x402c35e0, argc=<value optimized out>, vp=0xbea4ce40) at /home/jld/src/B2G.goldfish/gecko/js/src/jsfun.cpp:1030 #14 0x41aefb00 in js::CallJSNative (cx=0x402c35e0, native=0x419fa229 <js_fun_apply(JSContext*, unsigned int, JS::Value*)>, args=...) at /home/jld/src/B2G.goldfish/gecko/js/src/jscntxtinlines.h:220 #15 0x41b026a8 in js::Invoke (cx=0x402c35e0, args=..., construct=js::NO_CONSTRUCT) at /home/jld/src/B2G.goldfish/gecko/js/src/vm/Interpreter.cpp:462 #16 0x419fa202 in js_fun_call (cx=0x402c35e0, argc=2, vp=0x440502f0) at /home/jld/src/B2G.goldfish/gecko/js/src/jsfun.cpp:876 #17 0x41aefb00 in js::CallJSNative (cx=0x402c35e0, native=0x419fa0bd <js_fun_call(JSContext*, unsigned int, JS::Value*)>, args=...) at /home/jld/src/B2G.goldfish/gecko/js/src/jscntxtinlines.h:220 #18 0x41b026a8 in js::Invoke (cx=0x402c35e0, args=..., construct=js::NO_CONSTRUCT) at /home/jld/src/B2G.goldfish/gecko/js/src/vm/Interpreter.cpp:462 #19 0x41af6402 in Interpret (cx=0x402c35e0, state=...) at /home/jld/src/B2G.goldfish/gecko/js/src/vm/Interpreter.cpp:2499 #20 0x41b01fac in js::RunScript (cx=0x402c35e0, state=...) at /home/jld/src/B2G.goldfish/gecko/js/src/vm/Interpreter.cpp:419 #21 0x41b025f4 in js::Invoke (cx=0x402c35e0, args=..., construct=js::NO_CONSTRUCT) at /home/jld/src/B2G.goldfish/gecko/js/src/vm/Interpreter.cpp:481 #22 0x41b02f2a in js::Invoke (cx=0x402c35e0, thisv=..., fval=..., argc=1, argv=0xbea4ded0, rval=...) at /home/jld/src/B2G.goldfish/gecko/js/src/vm/Interpreter.cpp:512 #23 0x41a73570 in call (this=<value optimized out>, cx=0x402c35e0, proxy=<value optimized out>, args=...) at /home/jld/src/B2G.goldfish/gecko/js/src/jsproxy.cpp:1012 #24 0x41a6ec2a in js::Proxy::call (cx=0x402c35e0, proxy=..., args=...) at /home/jld/src/B2G.goldfish/gecko/js/src/jsproxy.cpp:2643 #25 0x41a73eb8 in proxy_Call (cx=0x402c35e0, argc=<value optimized out>, vp=<value optimized out>) at /home/jld/src/B2G.goldfish/gecko/js/src/jsproxy.cpp:3037 #26 0x41aefb00 in js::CallJSNative (cx=0x402c35e0, native=0x41a73e39 <proxy_Call>, args=...) at /home/jld/src/B2G.goldfish/gecko/js/src/jscntxtinlines.h:220 #27 0x41b026a8 in js::Invoke (cx=0x402c35e0, args=..., construct=js::NO_CONSTRUCT) at /home/jld/src/B2G.goldfish/gecko/js/src/vm/Interpreter.cpp:462 #28 0x41b02f2a in js::Invoke (cx=0x402c35e0, thisv=..., fval=..., argc=1, argv=0x44050200, rval=...) at /home/jld/src/B2G.goldfish/gecko/js/src/vm/Interpreter.cpp:512 #29 0x41a77996 in js::DirectProxyHandler::call (this=<value optimized out>, cx=0x402c35e0, proxy=<value optimized out>, args=<value optimized out>) at /home/jld/src/B2G.goldfish/gecko/js/src/jsproxy.cpp:454 #30 0x41ab837c in js::CrossCompartmentWrapper::call (this=<value optimized out>, cx=0x402c35e0, wrapper=..., args=...) at /home/jld/src/B2G.goldfish/gecko/js/src/jswrapper.cpp:454 #31 0x41a6ec2a in js::Proxy::call (cx=0x402c35e0, proxy=..., args=...) at /home/jld/src/B2G.goldfish/gecko/js/src/jsproxy.cpp:2643 #32 0x41a73eb8 in proxy_Call (cx=0x402c35e0, argc=<value optimized out>, vp=<value optimized out>) at /home/jld/src/B2G.goldfish/gecko/js/src/jsproxy.cpp:3037 #33 0x41aefb00 in js::CallJSNative (cx=0x402c35e0, native=0x41a73e39 <proxy_Call>, args=...) at /home/jld/src/B2G.goldfish/gecko/js/src/jscntxtinlines.h:220 #34 0x41b026a8 in js::Invoke (cx=0x402c35e0, args=..., construct=js::NO_CONSTRUCT) at /home/jld/src/B2G.goldfish/gecko/js/src/vm/Interpreter.cpp:462 #35 0x41af6402 in Interpret (cx=0x402c35e0, state=...) at /home/jld/src/B2G.goldfish/gecko/js/src/vm/Interpreter.cpp:2499 #36 0x41b01fac in js::RunScript (cx=0x402c35e0, state=...) at /home/jld/src/B2G.goldfish/gecko/js/src/vm/Interpreter.cpp:419 #37 0x41b036ae in ExecuteKernel (cx=0x402c35e0, script=..., scopeChainArg=<value optimized out>, rval=<value optimized out>) at /home/jld/src/B2G.goldfish/gecko/js/src/vm/Interpreter.cpp:610 #38 js::Execute (cx=0x402c35e0, script=..., scopeChainArg=<value optimized out>, rval=<value optimized out>) at /home/jld/src/B2G.goldfish/gecko/js/src/vm/Interpreter.cpp:647 #39 0x419b1fe2 in JS::Evaluate (cx=0x402c35e0, obj=..., options=..., chars=0x44b12008 u"DriverInfo = (function() {\n // ", '-' <repeats 75 times>, "\n // Debug info handling\n\n function defaultInfoFunc(str) {\n console.log('Info: ' + str);"..., length=3900, rval=0x0) at /home/jld/src/B2G.goldfish/gecko/js/src/jsapi.cpp:4781 #40 0x40c8944a in nsJSUtils::EvaluateString (aCx=0x402c35e0, aScript=<value optimized out>, aScopeObject=<value optimized out>, aCompileOptions=..., aEvaluateOptions=..., aRetValue=0x0, aOffThreadToken=0x0) at /home/jld/src/B2G.goldfish/gecko/dom/base/nsJSUtils.cpp:278 #41 0x40c840c4 in nsJSContext::EvaluateString (this=0x43ed59c0, aScript=..., aScopeObject=..., aCompileOptions=..., aCoerceToString=false, aRetValue=0x0, aOffThreadToken=0x0) at /home/jld/src/B2G.goldfish/gecko/dom/base/nsJSEnvironment.cpp:962 #42 0x40a6aeb2 in nsScriptLoader::EvaluateScript (this=<value optimized out>, aRequest=0x446dec00, aScript=..., aOffThreadToken=0x0) at /home/jld/src/B2G.goldfish/gecko/content/base/src/nsScriptLoader.cpp:1009 #43 0x40a6b260 in nsScriptLoader::ProcessRequest (this=0x446dce20, aRequest=0x446dec00, aOffThreadToken=0x0) at /home/jld/src/B2G.goldfish/gecko/content/base/src/nsScriptLoader.cpp:874 #44 0x40a6d96c in nsScriptLoader::ProcessPendingRequests (this=0x446dce20) at /home/jld/src/B2G.goldfish/gecko/content/base/src/nsScriptLoader.cpp:1039 #45 0x406285e8 in nsRunnableMethodImpl<unsigned int (mozilla::net::BackgroundFileSaverStreamListener::*)(), void, true>::Run (this=<value optimized out>) at ../../../dist/include/nsThreadUtils.h:382 #46 0x414ddf08 in nsThread::ProcessNextEvent (this=0x40202320, mayWait=<value optimized out>, result=0xbea4ee0f) at /home/jld/src/B2G.goldfish/gecko/xpcom/threads/nsThread.cpp:622 #47 0x414a9c1e in NS_ProcessNextEvent (thread=0x40202320, mayWait=false) at /home/jld/src/B2G.goldfish/gecko/xpcom/glue/nsThreadUtils.cpp:251 #48 0x41171cf8 in mozilla::ipc::MessagePump::Run (this=0x40201b50, aDelegate=0xbea4f8cc) at /home/jld/src/B2G.goldfish/gecko/ipc/glue/MessagePump.cpp:85 #49 0x41171e78 in mozilla::ipc::MessagePumpForChildProcess::Run (this=0x40201b50, aDelegate=0xbea4f8cc) at /home/jld/src/B2G.goldfish/gecko/ipc/glue/MessagePump.cpp:250 #50 0x41506672 in MessageLoop::RunInternal (this=0xbea4f8cc) at /home/jld/src/B2G.goldfish/gecko/ipc/chromium/src/base/message_loop.cc:220 #51 0x4150668a in MessageLoop::RunHandler (this=0xbea4f8cc) at /home/jld/src/B2G.goldfish/gecko/ipc/chromium/src/base/message_loop.cc:213 #52 MessageLoop::Run (this=0xbea4f8cc) at /home/jld/src/B2G.goldfish/gecko/ipc/chromium/src/base/message_loop.cc:187 #53 0x410f6b8a in nsBaseAppShell::Run (this=0x43e908e0) at /home/jld/src/B2G.goldfish/gecko/widget/xpwidgets/nsBaseAppShell.cpp:161 #54 0x405c9932 in XRE_RunAppShell () at /home/jld/src/B2G.goldfish/gecko/toolkit/xre/nsEmbedFunctions.cpp:714 #55 0x41171de2 in mozilla::ipc::MessagePumpForChildProcess::Run (this=0x40201b50, aDelegate=0xbea4f8cc) at /home/jld/src/B2G.goldfish/gecko/ipc/glue/MessagePump.cpp:217 #56 0x41506672 in MessageLoop::RunInternal (this=0xbea4f8cc) at /home/jld/src/B2G.goldfish/gecko/ipc/chromium/src/base/message_loop.cc:220 #57 0x4150668a in MessageLoop::RunHandler (this=0xbea4f8cc) at /home/jld/src/B2G.goldfish/gecko/ipc/chromium/src/base/message_loop.cc:213 #58 MessageLoop::Run (this=0xbea4f8cc) at /home/jld/src/B2G.goldfish/gecko/ipc/chromium/src/base/message_loop.cc:187 #59 0x405ca1da in XRE_InitChildProcess (aArgc=2, aArgv=0xbea4f9d4, aProcess=3198483044) at /home/jld/src/B2G.goldfish/gecko/toolkit/xre/nsEmbedFunctions.cpp:551 #60 0x00008786 in main (argc=6, argv=0xbea4fa64) at /home/jld/src/B2G.goldfish/gecko/ipc/app/MozillaRuntimeMain.cpp:116
Also fails on current (as of this writing) m-c with --disable-content-sandbox. I continue to not understand why this would break for me but not for TBPL.
Attached patch jed-crash-null-surfaceallocator (obsolete) — Splinter Review
Jed, can you try this patch? Does it prevent the crash, and do you see its message in logcat?
Flags: needinfo?(jld)
(In reply to Benoit Jacob [:bjacob] from comment #4) > Jed, can you try this patch? Does it prevent the crash, and do you see its > message in logcat? The magic patch saved the day — the message appeared, the process didn't crash, and the test case seems to have passed.
Flags: needinfo?(jld)
Jeff & Sotaro, do you think that this is "the right patch" or just hiding the real issue? (See above, Jed confirms that it removes the crash).
Attachment #827987 - Flags: review?(sotaro.ikeda.g)
Attachment #827987 - Flags: review?(jgilbert)
Attachment #826741 - Attachment is obsolete: true
Comment on attachment 827987 [details] [diff] [review] Check for null allocator From GLScreenBuffer point of view, it seems correct. But I do not know from WebGL and SkiaGL canvas point of view.
Attachment #827987 - Flags: review?(sotaro.ikeda.g) → review+
Comment on attachment 827987 [details] [diff] [review] Check for null allocator Review of attachment 827987 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/gl/GLScreenBuffer.cpp @@ +43,5 @@ > { > + // bug 933336 - we got crashes when we tried to allocate surfaces > + // and we didn't have a surfaceAllocator. > + if (!caps.surfaceAllocator) { > + return nullptr; This will disable WebGL on B2G in these cases, instead of falling back to readback. From the bug, it sounds like the goal is to have WebGL run here, not disable it.
Attachment #827987 - Flags: review?(jgilbert) → review-
Do we want to fall back? I thought that allocator being null would only happen in weird cases where we'd be e.g. still doing canvas work late during the shutdown sequence. If we do want to fall back, please give me a hint as to how to do that.
(In reply to Benoit Jacob [:bjacob] from comment #9) > Do we want to fall back? I thought that allocator being null would only > happen in weird cases where we'd be e.g. still doing canvas work late during > the shutdown sequence. > > If we do want to fall back, please give me a hint as to how to do that. If we don't know when this should happen (fun!), we should NS_WARNING here. To let it fall-back, just don't return nullptr, and instead just *don't* run that b2g-specific code.
(In reply to Jeff Gilbert [:jgilbert] from comment #8) > This will disable WebGL on B2G in these cases, instead of falling back to > readback. From the bug, it sounds like the goal is to have WebGL run here, > not disable it. For what it's worth: Even with that patch, WebGL inside the B2G emulator seems to be… less disabled on my laptop than on TBPL, at least, because I ran into several mochitests which assumed that running on B2G implied a lack of support for WebGL and were failing because some WebGL calls had unexpectedly succeeded.
Comment on attachment 827987 [details] [diff] [review] Check for null allocator So I looked into writing a NS_WARNING here, but I don't actually think that that is justified. As things currently are, nothing prevents a script from creating a canvas context, which will call GLScreenBuffer::Create, while the layers are already being torn down. And that is not going to change. So if I put a NS_WARNING there, it is going to show up in logs, and we won't be able to do anything about it; and failing the surface creating is a fine thing to do since we are already tearing things down; a fallback would only be useless extra work.
Attachment #827987 - Flags: review- → review?(jgilbert)
Comment on attachment 827987 [details] [diff] [review] Check for null allocator Review of attachment 827987 [details] [diff] [review]: ----------------------------------------------------------------- I really don't want to troubleshoot something later down the road where we're failing to allocate frames because of this. The whole point of this stuff is to offer accelerated version, not fail if the accelerated version doesn't work. Please keep this fallback working.
Attachment #827987 - Flags: review?(jgilbert) → review-
The only situations where the SurfaceAllocator would be null, that I can think of, are situations where we wouldn't want to fall back. Example: during the shutdown sequence. Can you think of a situation where the SurfaceAllocator would be null, and we would want to fall back?
Flags: needinfo?(jgilbert)
No longer blocks: 932104
I still think we should *always* fall back, instead of trying to predict when we should or shouldn't work properly. If you really don't think it's good to fall back here, we I think we *must* have some useful indication that we're consciously letting this break, such as by Warning to the console.
Flags: needinfo?(jgilbert)
I am not sure that we are on the same page here. We are not talking about a "some resource is exhausted, let's fall back to another resource type" here. We're not, like, running out of graphics memory and having to fall back to plain old memory, or anything like that. If we were, I'd be all for a fallback. To continue the "memory allocation" metaphor, the present situation is rather like this: we were calling our memory allocator through a function pointer (i.e. a pointer to the malloc function) and that pointer happens to be null. It should never be null, but maybe we have a race condition during the shutdown sequence of our application, so we've already nulled our pointer to the malloc() function and later on we're still trying to allocate things. We're working in parallel on fixing the shutdown sequence by holding strong references to that ISurfaceAllocator, so that should soon no longer happen (bug 933082) but meanwhile this seemed like an easy crash fix.
Note that the other scenario that could well explain the present crash (especially the "because document has no container part") is that we have an offscreen canvas here that's never been inserted into the DOM below an actual window, so it doesn't have a ClientLayerManager associated with it, and currently the ClientLayerManager _is_ the ISurfaceAllocator (ooops). That, too, should be fixed by bug 933082.
Severity: normal → S3

Closing old B2G bugs

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: