Open Bug 1343102 Opened 3 years ago Updated 6 months ago

WebRender builds abort at startup when run under RR, with "Assertion failure: api, at gfx/layers/ipc/CompositorBridgeParent.cpp:1592"

Categories

(Core :: Graphics: WebRender, defect, P3)

Tracking Status
firefox54 --- affected
firefox56 --- unaffected
firefox57 --- unaffected

People

(Reporter: dholbert, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [gfx-noted])

STR:
 1. Build mozilla-central with these options in your mozconfig:
ac_add_options --enable-debug --disable-optimize
ac_add_options --enable-webrender

 2. Build rr, as described here (or you could probably download prebuilt rr binaries):
  https://github.com/mozilla/rr/wiki/Building-And-Installing

 3. Try to run your mozilla build under rr:
  cd mozilla-central
  /path/to/your/rr/objdir/bin/rr record ./mach run

ACTUAL RESULTS:
Startup abort (before any window appears), with the following output:
===========
Assertion failure: api, at $SRC/gfx/layers/ipc/CompositorBridgeParent.cpp:1592
#01: mozilla::layers::CompositorBridgeParent::AllocPWebRenderBridgeParent(WrPipelineId const&, mozilla::layers::TextureFactoryIdentifier*) ($SRC/gfx/layers/ipc/CompositorBridgeParent.cpp:1592 (discriminator 6))
#02: mozilla::layers::PCompositorBridgeParent::OnMessageReceived(IPC::Message const&, IPC::Message*&) ($OBJ/ipc/ipdl/PCompositorBridgeParent.cpp:1713)
#03: mozilla::ipc::MessageChannel::DispatchSyncMessage(IPC::Message const&, IPC::Message*&) ($SRC/ipc/glue/MessageChannel.cpp:1766)
#04: mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message&&) ($SRC/ipc/glue/MessageChannel.cpp:1726)
#05: mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::MessageChannel::MessageTask&) ($SRC/ipc/glue/MessageChannel.cpp:1604 (discriminator 2))
#06: mozilla::ipc::MessageChannel::MessageTask::Run() ($SRC/ipc/glue/MessageChannel.cpp:1637)
#07: MessageLoop::RunTask(already_AddRefed<mozilla::Runnable>) ($SRC/ipc/chromium/src/base/message_loop.cc:358 (discriminator 2))
#08: MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask&&) ($SRC/ipc/chromium/src/base/message_loop.cc:366 (discriminator 2))
#09: MessageLoop::DoWork() ($SRC/ipc/chromium/src/base/message_loop.cc:441)
#10: base::MessagePumpDefault::Run(base::MessagePump::Delegate*) ($SRC/ipc/chromium/src/base/message_pump_default.cc:36)
#11: MessageLoop::RunInternal() ($SRC/ipc/chromium/src/base/message_loop.cc:239)
#12: MessageLoop::RunHandler() ($SRC/ipc/chromium/src/base/message_loop.cc:232)
#13: MessageLoop::Run() ($SRC/ipc/chromium/src/base/message_loop.cc:211)
#14: base::Thread::ThreadMain() ($SRC/ipc/chromium/src/base/thread.cc:182)
#15: ThreadFunc(void*) ($SRC/ipc/chromium/src/base/platform_thread_posix.cc:38)
#16: start_thread (/build/glibc-t3gR2i/glibc-2.23/nptl/pthread_create.c:333)
#17: __clone (/build/glibc-t3gR2i/glibc-2.23/misc/../sysdeps/unix/sysv/linux/x86_6
===========

I have no issues if I run without rr.


The assert in question is here:
http://searchfox.org/mozilla-central/rev/4039fb4c5833706f6880763de216974e00ba096c/gfx/layers/ipc/CompositorBridgeParent.cpp#1592

And it was added in bug 1328602, here:
https://hg.mozilla.org/mozilla-central/rev/68f28c9cae2d1d46567519f57825c3afbba48529#l1.21
Flags: needinfo?(nical.bugzilla)

Is there any other output to the console before the assertion?

Yes, actually.  Here's the full output, up to the assertion -- I'm guessing that "WARNING: Failed to create GLXContext!" might be related.
=======
$ /scratch/work/tools/rr/obj/bin/rr record ./mach run
rr: Saving execution to trace directory `/home/dholbert/.local/share/rr/mach-4'.
 0:00.71 /scratch/work/builds/mozilla-central/mozilla-central-11-11-01.13-56/obj/dist/bin/firefox -no-remote -profile /scratch/work/builds/mozilla-central/mozilla-central-11-11-01.13-56/obj/tmp/scratch_user
Xinerama superpowers activated for 2 screens!
[11264] WARNING: attempt to modify an immutable nsStandardURL: file /scratch/work/builds/mozilla-central/mozilla-central-11-11-01.13-56/mozilla/netwerk/base/nsStandardURL.cpp, line 1641
[GLX] window 5200010 has VisualID 0x2b
[Parent 11264] WARNING: Failed to create GLXContext!: file /scratch/work/builds/mozilla-central/mozilla-central-11-11-01.13-56/mozilla/gfx/gl/GLContextProviderGLX.cpp, line 885
[Parent 11264] WARNING: Failed to create GLXContext!: file /scratch/work/builds/mozilla-central/mozilla-central-11-11-01.13-56/mozilla/gfx/gl/GLContextProviderGLX.cpp, line 885
Assertion failure: api, at /scratch/work/builds/mozilla-central/mozilla-central-11-11-01.13-56/mozilla/gfx/layers/ipc/CompositorBridgeParent.cpp:1592
=======

For comparison: when I run this WebRender build *without* rr, here's the terminal output up until the first Firefox window appears:
=======
$ ./mach run
Xinerama superpowers activated for 2 screens!
[11468] WARNING: attempt to modify an immutable nsStandardURL: file /scratch/work/builds/mozilla-central/mozilla-central-11-11-01.13-56/mozilla/netwerk/base/nsStandardURL.cpp, line 1641
[GLX] window 5200010 has VisualID 0x2b
[Child 11515] WARNING: '!compMgr', file /scratch/work/builds/mozilla-central/mozilla-central-11-11-01.13-56/mozilla/xpcom/components/nsComponentManagerUtils.cpp, line 63
WebRender - OpenGL version new 3.2.0 NVIDIA 367.57
[Parent 11468] WARNING: No inner window available!: file /scratch/work/builds/mozilla-central/mozilla-central-11-11-01.13-56/mozilla/dom/base/nsGlobalWindow.cpp, line 10183
[Parent 11468] WARNING: No inner window available!: file /scratch/work/builds/mozilla-central/mozilla-central-11-11-01.13-56/mozilla/dom/base/nsGlobalWindow.cpp, line 10183
[Child 11515] WARNING: '!compMgr', file /scratch/work/builds/mozilla-central/mozilla-central-11-11-01.13-56/mozilla/xpcom/components/nsComponentManagerUtils.cpp, line 63
=======

Notably, no GLXContext warnings.

NOTES on the GLXContext warnings:

From stepping through my rr trace, it looks like we're ending up with a null GLXContext because "mozilla::gl::GLXLibrary::xCreateContextAttribs" returns null, here:
>            context = glx.xCreateContextAttribs(
>                display,
>                cfg,
>                glxContext,
>                True,
>                attrib_list.Elements());
https://dxr.mozilla.org/mozilla-central/rev/e1135c6fdc9bcd80d38f7285b269e030716dcb72/gfx/gl/GLContextProviderGLX.cpp#852

Internally, that method calls "xCreateContextAttribsInternal()":
https://dxr.mozilla.org/mozilla-central/rev/e1135c6fdc9bcd80d38f7285b269e030716dcb72/gfx/gl/GLContextProviderGLX.cpp#770
...which (based on single-stepping) calls glXCreateContextAttribsARB from /usr/lib/nvidia-367/libGLX_nvidia.so.0.  And these all return null back up to mozilla::gl::GLContextGLX::CreateGLContext(), giving us a null "context" variable there, which is why we warn.

It seems like CreateGLContext() handles this gracefully with error checking.

NOTES on the "api" assertion (from stepping backwards in rr), which is indeed due to the GLXContext warnings:

* The variable we're asserting about ("api") is generated in WebRenderAPI::Create, which returns null because its "wrApi" variable is null:
https://dxr.mozilla.org/mozilla-central/source/gfx/webrender_bindings/WebRenderAPI.cpp#117,121-132
* "wrApi" gets created off-main-thread, and it looks like we synchronously wait for that to complete in this ^^ code.
* The off-main-thread setup code returns early when it gets here:
>   RefPtr<gl::GLContext> gl = gl::GLContextProvider::CreateForCompositorWidget(mCompositorWidget, true);
>   if (!gl || !gl->MakeCurrent()) {
>     return;
https://dxr.mozilla.org/mozilla-central/rev/e1135c6fdc9bcd80d38f7285b269e030716dcb72/gfx/webrender_bindings/WebRenderAPI.cpp#45-47

And in particular, we return because "gl" is null, because GLContextProvider::CreateForCompositorWidget returns null, because of the issue highlighted by the GLXContext warnings discussed in comment 4.

SO: it seems like we need to fix things such that GLContextGLX::CreateGLContext() can return something non-null under rr, OR we need to fix things such that we can usefully proceed past this "api" assertion, when "api" is null.
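To make the chain above concrete, here is a minimal standalone sketch of how the null propagates. All names (FakeGLContext, FakeWebRenderAPI, CreateContext, CreateApi, AllocBridge) are hypothetical stand-ins, not Gecko code; the real logic lives in GLContextProviderGLX.cpp, WebRenderAPI.cpp and CompositorBridgeParent.cpp as linked above.

```cpp
#include <cstdio>
#include <memory>

// Hypothetical stand-ins; these are NOT the real Gecko types.
struct FakeGLContext {};
struct FakeWebRenderAPI {};

// Models GLContextGLX::CreateGLContext(): warns and yields null when the
// driver refuses to create a context (as observed under rr).
std::unique_ptr<FakeGLContext> CreateContext(bool driverSucceeds) {
  if (!driverSucceeds) {
    std::fprintf(stderr, "WARNING: Failed to create GLXContext!\n");
    return nullptr;
  }
  return std::make_unique<FakeGLContext>();
}

// Models WebRenderAPI::Create(): the setup step returns early when there is
// no GL context, so the resulting api handle stays null.
std::unique_ptr<FakeWebRenderAPI> CreateApi(bool driverSucceeds) {
  std::unique_ptr<FakeGLContext> gl = CreateContext(driverSucceeds);
  if (!gl) {
    return nullptr;
  }
  return std::make_unique<FakeWebRenderAPI>();
}

// Models the caller in CompositorBridgeParent. It returns false at the point
// where the real code asserts on "api" instead.
bool AllocBridge(bool driverSucceeds) {
  std::unique_ptr<FakeWebRenderAPI> api = CreateApi(driverSucceeds);
  return api != nullptr;
}
```

In this model, AllocBridge(true) returns true and AllocBridge(false) returns false; the false case corresponds to the point where the real build hits the "api" assertion.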
(In reply to Daniel Holbert [:dholbert] from comment #5)
> SO: it seems like we need to fix things such that
> GLContextGLX::CreateGLContext() can return something non-null under rr, OR
> we need to fix things such that we can usefully proceed past this "api"
> assertion, when "api" is null.

It seems to me that either of these options will effectively result in disabling webrender, in which case you might as well not put --enable-webrender in your mozconfig to start off with. It's true that we need better runtime checking of whether the system will support WebRender, but right now we don't have that in place. (Is there any particular reason you're trying to run WR under RR, anyway?)

(That being said, we should still try to figure out why we can't create a GL context under rr when we can without rr, which seems to be the root of the problem here.)

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #6)
> It seems to me that either of these options will effectively result in
> disabling webrender

I was worried about that, yeah.  Seems like that's still fine, though, as long as we spam a terminal warning to let the user (developer) know that they're not getting the WebRender-enabled experience that they might be expecting.
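As a rough illustration of the "warn and carry on" behavior suggested here, a sketch under stated assumptions: FakeWebRenderAPI, Backend and ChooseBackend are hypothetical names, and the warning text is made up for the example. This is not how Gecko selects a compositor; it only shows the shape of replacing the assertion with a warning plus a fallback.

```cpp
#include <cstdio>
#include <memory>

// Hypothetical stand-in for a WebRender API handle; not a Gecko type.
struct FakeWebRenderAPI {};

// Hypothetical set of rendering backends for this sketch.
enum class Backend { WebRender, Fallback };

// Picks a backend from the result of (hypothetical) WebRender initialization.
// A null api does not abort here; it prints a loud warning and selects the
// fallback so the developer knows WebRender is not actually in use.
Backend ChooseBackend(const std::unique_ptr<FakeWebRenderAPI>& api) {
  if (!api) {
    std::fprintf(stderr,
                 "WARNING: WebRender initialization failed; "
                 "falling back to a non-WebRender backend\n");
    return Backend::Fallback;
  }
  return Backend::WebRender;
}
```

The design choice sketched here is the one under discussion in this thread: startup continues, at the cost of silently losing WebRender unless the warning is noticed.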

> (Is there any particular reason you're trying to run WR under RR, anyway?)

I was trying to capture an intermittent test-failure that (in linux at least) basically only happens in WebRender builds. (bug 1340441)

(In reply to Daniel Holbert [:dholbert] from comment #8)
> I was worried about that, yeah.  Seems like that's still fine, though, as
> long as we spam a terminal warning to let the user (developer) know that
> they're not getting the WebRender-enabled experience that they might be
> expecting.

I filed bug 1343345 for more graceful handling of this. We can leave this bug for tracking the root cause (rr causing the GL context creation failure).

> I was trying to capture an intermittent test-failure that (in linux at
> least) basically only happens in WebRender builds. (bug 1340441)

Ah, I see. If the bug is only manifesting in WebRender-enabled builds, then it's quite possible that it's a bug in the WebRender integration code. However, it looks like it's also happening on other (non-linux) platforms - if it's the same bug then you might be able to reproduce it with rr's chaos mode on a non-webrender linux build.

Yeah, by the time WebRender ships we need to fall back gracefully when WebRender initialization fails or when rendering breaks for whatever reason. I'm not in a hurry, though. Right now it's good that things blow up loud and clear when WebRender isn't being used even though we expect it to be.
Flags: needinfo?(nical.bugzilla)
Whiteboard: [gfx-noted]