Bug 1069523 (Open). Opened 10 years ago, updated 2 years ago.

Frequent hang in compositor thread with DRI3 drivers

Categories: Core :: Graphics: Layers (defect)
Platform: x86_64, Linux
Type: defect
Tracking status: e10s ---
People: Reporter: johns; Assignee: handyman
Whiteboard: [upstream DRI3 bug]

For the last two weeks or so I've been hitting this hang fairly frequently with hardware acceleration turned on in Linux:

Parent:

> #0  0x00007ffff7bc9b2f in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
> #1  0x00007ffff7e979ae in PR_WaitCondVar (cvar=0x7fffd4021280, timeout=<optimized out>) at nsprpub/pr/src/pthreads/ptsynch.c:385
> #2  0x00007ffff22b3a26 in Wait (this=<optimized out>, aInterval=<optimized out>) at objdir/ipc/glue/../../dist/include/mozilla/CondVar.h:79
> #3  operator-> (this=<optimized out>) at ../../dist/include/mozilla/Monitor.h:40
> #4  WaitForSyncNotify (this=<optimized out>, this=<optimized out>) at ipc/glue/MessageChannel.cpp:1431
> #5  mozilla::ipc::MessageChannel::SendAndWait (this=0x7fffd400b060, aMsg=<optimized out>, aReply=0x7fffffff5a08) at ipc/glue/MessageChannel.cpp:723
> #6  0x00007ffff22b3446 in mozilla::ipc::MessageChannel::Send (this=0x7fffd400b060, aMsg=0x7fffbe07bc40, aReply=0x7fffffff5a08) at ipc/glue/MessageChannel.cpp:630
> #7  0x00007ffff242bd17 in mozilla::layers::PLayerTransactionChild::SendUpdate (this=0x7fffd4387df0, cset=..., id=<optimized out>, targetConfig=..., isFirstPaint=<optimized out>, scheduleComposite=<optimized out>, paintSequenceNumber=<optimized out>, isRepeatTransaction=<optimized out>, transactionStart=..., reply=<optimized out>) at objdir/ipc/ipdl/./PLayerTransactionChild.cpp:244
> #8  0x00007ffff288c535 in mozilla::layers::ClientLayerManager::ForwardTransaction (this=0x7fffd4a3ef80, aScheduleComposite=<optimized out>) at gfx/layers/ipc/ShadowLayers.cpp:650
> #9  0x00007ffff288b41c in mozilla::layers::ClientLayerManager::EndTransaction (this=0x7fffd4a3ef80, aCallback=0x7ffff3c7e7c0 <mozilla::FrameLayerBuilder::DrawThebesLayer(mozilla::layers::ThebesLayer*, gfxContext*, nsIntRegion const&, mozilla::layers::DrawRegionClip, nsIntRegion const&, void*)>, aCallbackData=0x7fffffff7b60, aFlags=mozilla::layers::LayerManager::END_DEFAULT) at gfx/layers/client/ClientLayerManager.cpp:292
> #10 0x00007ffff3cd67ad in nsDisplayList::PaintForFrame (this=<optimized out>, aBuilder=0x7fffffff7b60, aCtx=<optimized out>, aForFrame=<optimized out>, aFlags=<optimized out>) at layout/base/nsDisplayList.cpp:1352
> #11 0x00007ffff3cf0f21 in nsLayoutUtils::PaintFrame (aRenderingContext=0x0, aFrame=0x7fffdc9bd4e0, aDirtyRegion=..., aBackstop=<optimized out>, aFlags=<optimized out>) at layout/base/nsDisplayList.cpp:1198
> #12 0x00007ffff3c57cd3 in PresShell::Paint (this=0x7fffdc735800, aViewToPaint=<optimized out>, aDirtyRegion=..., aFlags=1) at layout/base/nsPresShell.cpp:6230
> #13 0x00007ffff35caea7 in GetViewManager (aWidget=0x7fffde5078f0, this=<optimized out>) at view/nsViewManager.cpp:443
> #14 nsViewManager::ProcessPendingUpdatesForView (this=0x7fffdd816d00, aView=<optimized out>, aFlushDirtyRegion=<optimized out>) at view/nsViewManager.cpp:384
> #15 0x00007ffff3c69353 in nsRefreshDriver::Tick (this=0x7fffdc735000, aNowEpoch=<optimized out>, aNowTime=...) at layout/base/nsRefreshDriver.cpp:1341
> #16 0x00007ffff3c6af09 in TickDriver (this=0x7fffb36c6cc0, driver=<optimized out>, jsnow=<optimized out>, driver=<optimized out>, jsnow=<optimized out>, now=...) at layout/base/nsRefreshDriver.cpp:173
> #17 Tick (this=<optimized out>) at layout/base/nsRefreshDriver.cpp:164
> #18 mozilla::RefreshDriverTimer::TimerTick (aTimer=0x7fffd402128c, aClosure=<optimized out>) at layout/base/nsRefreshDriver.cpp:190
> #19 0x00007ffff1fe5087 in nsTimerEvent::Run (this=0x7fffc126c7f0) at xpcom/threads/nsTimerImpl.cpp:618
> #20 0x00007ffff1fe2811 in nsThread::ProcessNextEvent (this=0x7ffff6c4d140, aMayWait=<optimized out>, aResult=0x7fffffff9587) at xpcom/threads/nsThread.cpp:823
> #21 0x00007ffff22b6e97 in mozilla::ipc::MessagePump::Run (this=0x7fffe74674c0, aDelegate=0x7ffff6c92500) at xpcom/glue/nsThreadUtils.cpp:265
> #22 0x00007ffff35df1cc in nsBaseAppShell::Run (this=0x7fffe3cc7400) at ipc/chromium/src/base/message_loop.cc:234
> #23 0x00007ffff42dfc0e in nsAppStartup::Run (this=0x7fffe1a47060) at toolkit/components/startup/nsAppStartup.cpp:280
> #24 0x00007ffff4334abc in XREMain::XRE_main (this=0x7fffffff9a60, argc=<optimized out>, argv=<optimized out>, aAppData=<optimized out>) at toolkit/xre/nsAppRunner.cpp:4123
> #25 0x00007ffff4334f19 in XRE_main (argc=128, argv=0x17f, aAppData=0xffffffffffffffff, aFlags=<optimized out>) at toolkit/xre/nsAppRunner.cpp:4408
> #26 0x00000000004049e3 in do_main (argc=<optimized out>, argv=<optimized out>, xreDirectory=0x7ffff6c4c780) at browser/app/nsBrowserApp.cpp:282
> #27 main (argc=<optimized out>, argv=<optimized out>) at browser/app/nsBrowserApp.cpp:643

Child:

> #0  0x00007ffff1e1581d in poll () from /usr/lib/libc.so.6
> #1  0x00007ffff4a053d9 in PollWrapper (ufds=0x7fffdc715560, nfsd=4, timeout_=-1) at widget/gtk/nsAppShell.cpp:44
> #2  0x00007fffefaebf04 in ?? () from /usr/lib/libglib-2.0.so.0
> #3  0x00007fffefaec01c in g_main_context_iteration () from /usr/lib/libglib-2.0.so.0
> #4  0x00007ffff4a0537c in nsAppShell::ProcessNextNativeEvent (this=<optimized out>, mayWait=<optimized out>) at widget/gtk/nsAppShell.cpp:156
> #5  0x00007ffff49e6597 in nsBaseAppShell::OnProcessNextEvent (this=0x7fffe1e1ea90, thr=0x7fffe8357a80, mayWait=true, recursionDepth=<optimized out>) at widget/xpwidgets/nsBaseAppShell.cpp:140
> #6  0x00007ffff49e66dd in non-virtual thunk to nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, bool, unsigned int) () at Unified_cpp_widget_xpwidgets0.cpp:315
> #7  0x00007ffff33e96e0 in nsThread::ProcessNextEvent (this=0x7fffe8357a80, aMayWait=true, aResult=<optimized out>) at xpcom/threads/nsThread.cpp:794
> #8  0x00007ffff36bde97 in mozilla::ipc::MessagePump::Run (this=0x7fffe83ac240, aDelegate=0x7fffffffca40) at xpcom/glue/nsThreadUtils.cpp:265
> #9  0x00007ffff49e61cc in nsBaseAppShell::Run (this=0x7fffe1e1ea90) at ipc/chromium/src/base/message_loop.cc:234
> #10 0x00007ffff36be51e in mozilla::ipc::MessagePumpForChildProcess::Run (this=<optimized out>, aDelegate=<optimized out>) at toolkit/xre/nsEmbedFunctions.cpp:713
> #11 0x00007ffff573f399 in XRE_InitChildProcess (aArgc=<optimized out>, aArgv=<optimized out>) at ipc/chromium/src/base/message_loop.cc:234
> #12 0x00000000004034f8 in content_process_main (argc=<optimized out>, argv=<optimized out>) at ipc/app/../contentproc/plugin-container.cpp:158
> #13 main (argc=-596552352, argv=0x7fffffffde38) at ipc/app/MozillaRuntimeMain.cpp:11

billm suggested I grab the compositor stack as well:

> #0  0x00007ffff7bc9b2f in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
> #1  0x00007fffed721c49 in ?? () from /usr/lib/libxcb.so.1
> #2  0x00007fffed722ea9 in xcb_wait_for_special_event () from /usr/lib/libxcb.so.1
> #3  0x00007fffeb081254 in ?? () from /usr/lib/libGL.so.1
> #4  0x00007fffeb081855 in ?? () from /usr/lib/libGL.so.1
> #5  0x00007fffeb082245 in ?? () from /usr/lib/libGL.so.1
> #6  0x00007fffd32ac187 in ?? () from /usr/lib/xorg/modules/dri/i965_dri.so
> #7  0x00007fffd32ac4b5 in ?? () from /usr/lib/xorg/modules/dri/i965_dri.so
> #8  0x00007fffd32a0f9d in ?? () from /usr/lib/xorg/modules/dri/i965_dri.so
> #9  0x00007ffff28e26aa in raw_fClear (this=<optimized out>, mask=<optimized out>, this=<optimized out>, mask=<optimized out>) at /home/nephyrin/moz/ff-neph-custom-refox/gfx/layers/../../dist/include/GLContext.h:938
> #10 operator-> (this=<optimized out>, this=<optimized out>, mask=16640, this=<optimized out>) at ../../dist/include/GLContext.h:945
> #11 mozilla::layers::CompositorOGL::BeginFrame (this=0x7fffd5d98bc0, aInvalidRegion=..., aClipRectIn=<optimized out>, aRenderBounds=..., aClipRectOut=<optimized out>, aRenderBoundsOut=<optimized out>) at /home/nephyrin/moz/moz-git-build-refox/gfx/layers/opengl/CompositorOGL.cpp:776
> #12 0x00007ffff28b4568 in mozilla::layers::LayerManagerComposite::EndTransaction (this=0x7fffd68222d0, aCallback=<optimized out>, aCallbackData=<optimized out>, aFlags=<optimized out>) at /home/nephyrin/moz/moz-git-build-refox/gfx/layers/composite/LayerManagerComposite.cpp:650
> #13 0x00007ffff28cca77 in operator-> (this=<optimized out>, aFlags=mozilla::layers::LayerManager::END_DEFAULT, this=<optimized out>) at /home/nephyrin/moz/moz-git-build-refox/gfx/layers/composite/LayerManagerComposite.cpp:210
> #14 mozilla::layers::CompositorParent::CompositeToTarget (this=0x7fffd6aac800, aTarget=0x0, aRect=<optimized out>) at /home/nephyrin/moz/moz-git-build-refox/gfx/layers/ipc/CompositorParent.cpp:706
> #15 0x00007ffff22a4437 in MessageLoop::DeferOrRunPendingTask (this=0x7fffdb0fdd28, pending_task=...) at /home/nephyrin/moz/moz-git-build-refox/ipc/chromium/src/base/message_loop.cc:362
> #16 0x00007ffff22a4b67 in MessageLoop::DoDelayedWork (this=0x7fffdb0fdd28, next_delayed_work_time=<optimized out>) at /home/nephyrin/moz/moz-git-build-refox/ipc/chromium/src/base/message_loop.cc:475
> #17 0x00007ffff22a560c in base::MessagePumpDefault::Run (this=0x7fffda594ce0, delegate=0x7fffdb0fdd28) at /home/nephyrin/moz/moz-git-build-refox/ipc/chromium/src/base/message_pump_default.cc:39
> #18 0x00007ffff22a86a7 in base::Thread::ThreadMain (this=0x7fffdbb46b80) at /home/nephyrin/moz/moz-git-build-refox/ipc/chromium/src/base/message_loop.cc:234
> #19 0x00007ffff2298207 in ThreadFunc (closure=0x7fffd0f80e4c) at /home/nephyrin/moz/moz-git-build-refox/ipc/chromium/src/base/platform_thread_posix.cc:39
> #20 0x00007ffff7bc5124 in start_thread () from /usr/lib/libpthread.so.0
> #21 0x00007ffff6ecc4bd in clone () from /usr/lib/libc.so.6

The hanging DRI function is dri3_find_back, which has this comment:

> Find an idle back buffer. If there isn't one, then
> wait for a present idle notify event from the X server
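To make the failure mode concrete, here is a minimal Python sketch of the pattern that comment describes, assuming a small pool of back buffers guarded by a condition variable. `BackBufferPool`, `find_back` and `idle_notify` are hypothetical names for illustration only, not Mesa's API; the real code is C and, per the compositor stack above, blocks in xcb_wait_for_special_event with no timeout. The point is that if the "present idle" event never arrives, the wait never returns.

```python
import threading


class BackBufferPool:
    """Hypothetical toy model of a swap chain's back buffers (not Mesa code).

    Mirrors the dri3_find_back comment: claim an idle back buffer, or block
    until the X server reports that one has become idle.
    """

    def __init__(self, count: int) -> None:
        self._busy = [False] * count
        self._cond = threading.Condition()

    def _idle_index(self):
        # Index of the first idle buffer, or None if every buffer is busy.
        for i, busy in enumerate(self._busy):
            if not busy:
                return i
        return None

    def find_back(self, timeout=None):
        """Claim an idle buffer, waiting for an idle notification if needed.

        The timeout exists only so this sketch can demonstrate the lost-event
        case without hanging; it returns None when it gives up. The driver in
        the traces waits indefinitely instead.
        """
        with self._cond:
            if not self._cond.wait_for(lambda: self._idle_index() is not None, timeout):
                return None
            i = self._idle_index()
            self._busy[i] = True
            return i

    def idle_notify(self, i: int) -> None:
        """Simulate the server's present-idle event for buffer i."""
        with self._cond:
            self._busy[i] = False
            self._cond.notify_all()


if __name__ == "__main__":
    pool = BackBufferPool(2)
    first = pool.find_back()  # claims buffer 0
    pool.find_back()          # claims buffer 1; both are now busy
    # No idle notification is sent, so without a timeout this would hang.
    print(pool.find_back(timeout=0.2))  # prints None
    # Deliver the idle event from another thread; the waiter then wakes up.
    threading.Timer(0.05, pool.idle_notify, args=(first,)).start()
    print(pool.find_back(timeout=2.0))  # prints 0
```

In this model the compositor thread is the caller of `find_back`; anything that synchronously waits on the compositor (such as the parent's SendUpdate in the first stack) is then stuck as well.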
Assignee: nobody → davidp99
I seem to have the same issue, but I'm not using e10s at all: browser.tabs.remote.autostart is set to false.

This happens on Fedora 21 x86_64 with an Intel HD 3000 (mobile i5) nearly all the time, roughly every 5 minutes, which makes Firefox unusable. I have layers acceleration force-enabled, which may or may not be related.

Sometimes it hangs the whole GNOME desktop, and if I don't kill Firefox from a TTY, the entire system locks up. So this is probably a graphics driver issue rather than (or not only) a Firefox bug.

FWIW, here is the backtrace:

> (gdb) bt
> #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> #1  0x00007fffecc2a399 in _xcb_conn_wait (c=0x7ffff6ae6000, cond=<optimized out>, vector=0x0, count=0x0)
>     at xcb_conn.c:415
> #2  0x00007fffecc2b609 in xcb_wait_for_special_event (c=0x7fffcbef35ec, c@entry=0x7ffff6ae6000, se=0x80)
>     at xcb_in.c:715
> #3  0x00007fffea7a5e14 in dri3_find_back (c=c@entry=0x7ffff6ae6000, priv=priv@entry=0x7fffcbeee180)
>     at dri3_glx.c:1191
> #4  0x00007fffea7a64ac in dri3_get_buffer (format=format@entry=4107, 
>     buffer_type=buffer_type@entry=dri3_buffer_back, loaderPrivate=loaderPrivate@entry=0x7fffcbeee180, 
>     driDrawable=<optimized out>) at dri3_glx.c:1217
> #5  0x00007fffea7a6ff2 in dri3_get_buffers (driDrawable=<optimized out>, format=4107, stamp=0x7fffcba2c770, 
>     loaderPrivate=0x7fffcbeee180, buffer_mask=<optimized out>, buffers=0x7fffd40fd8b0) at dri3_glx.c:1394
> #6  0x00007fffcdbd0d77 in intel_update_image_buffers (drawable=<optimized out>, brw=<optimized out>)
>     at brw_context.c:1452
> #7  intel_update_renderbuffers (context=0x7fffcbef35ec, context@entry=0x7fffcbef5070, drawable=0x7fffcba2c740)
>     at brw_context.c:1144
> #8  0x00007fffcdbd10a5 in intel_prepare_render (brw=brw@entry=0x7fffcba02028) at brw_context.c:1165
> #9  0x00007fffcdbc5add in brw_clear (ctx=0x7fffcba02028, mask=18) at brw_clear.c:234
> #10 0x00007ffff1f2a4f9 in mozilla::layers::CompositorOGL::BeginFrame(nsIntRegion const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const*, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits>*, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits>*) ()
>    from /home/jonas/firefox/libxul.so
> #11 0x00007ffff1f12611 in mozilla::layers::LayerManagerComposite::Render() () from /home/jonas/firefox/libxul.so
> #12 0x00007ffff1f12835 in mozilla::layers::LayerManagerComposite::EndTransaction(void (*)(mozilla::layers::ThebesLayer*, gfxContext*, nsIntRegion const&, mozilla::layers::DrawRegionClip, nsIntRegion const&, void*), void*, mozilla::layers::LayerManager::EndTransactionFlags) () from /home/jonas/firefox/libxul.so
> #13 0x00007ffff1f128ed in mozilla::layers::LayerManagerComposite::EndEmptyTransaction(mozilla::layers::LayerManager::EndTransactionFlags) () from /home/jonas/firefox/libxul.so
> #14 0x00007ffff1f22af4 in mozilla::layers::CompositorParent::CompositeToTarget(mozilla::gfx::DrawTarget*, nsIntRect const*) () from /home/jonas/firefox/libxul.so
> #15 0x00007ffff1b82385 in MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&) ()
>    from /home/jonas/firefox/libxul.so
> #16 0x00007ffff175b933 in MessageLoop::DoDelayedWork(base::TimeTicks*) () from /home/jonas/firefox/libxul.so
> #17 0x00007ffff1b82536 in base::MessagePumpDefault::Run(base::MessagePump::Delegate*) ()
>    from /home/jonas/firefox/libxul.so
> #18 0x00007ffff1b826f7 in MessageLoop::Run() () from /home/jonas/firefox/libxul.so
> #19 0x00007ffff1b865af in base::Thread::ThreadMain() () from /home/jonas/firefox/libxul.so
> #20 0x00007ffff1b77a0a in ThreadFunc(void*) () from /home/jonas/firefox/libxul.so
> #21 0x00007ffff7bc657a in start_thread (arg=0x7fffd40fe700) at pthread_create.c:310
> #22 0x00007ffff6cc853d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> (gdb)

As a workaround, an Intel driver compiled with --disable-dri3 avoids this code path and the hang, and --disable-dri3 appears to be the default on at least Arch Linux. However, DRI3 is likely to become the default at some point.

This seems to be the corresponding upstream bug in the Intel graphics driver: https://bugs.freedesktop.org/show_bug.cgi?id=84252

(In reply to Jonas Thiem from comment #5)
> This seems to be the according upstream bug in the intel gfx driver:
> https://bugs.freedesktop.org/show_bug.cgi?id=84252

Yes, that's definitely the issue I was seeing.
Whiteboard: [upstream DRI3 bug]
I can also confirm comment 3: this is not e10s-specific and is likely just OMTC (off-main-thread compositing).
Summary: [e10s] Frequent hang in PLayerTransactionChild::SendUpdate → Frequent hang in PLayerTransactionChild::SendUpdate
Summary: Frequent hang in PLayerTransactionChild::SendUpdate → Frequent hang in compositor thread with DRI3 drivers
See Also: → 1111329
This also hangs on nouveau with DRI3 and OMTC enabled.
Severity: normal → S3