Frequent hang in compositor thread with DRI3 drivers

NEW

Core
Graphics: Layers
3 years ago
2 years ago

People

(Reporter: johns, Assigned: handyman)

Tracking

unspecified
x86_64
Linux

Firefox Tracking Flags

(e10s-)

Details

(Whiteboard: [upstream DRI3 bug])

(Reporter)

Description

3 years ago
For the last two weeks or so I've been hitting this hang fairly frequently on Linux with hardware acceleration turned on:

Parent:

> #0  0x00007ffff7bc9b2f in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
> #1  0x00007ffff7e979ae in PR_WaitCondVar (cvar=0x7fffd4021280, timeout=<optimized out>) at nsprpub/pr/src/pthreads/ptsynch.c:385
> #2  0x00007ffff22b3a26 in Wait (this=<optimized out>, aInterval=<optimized out>) at objdir/ipc/glue/../../dist/include/mozilla/CondVar.h:79
> #3  operator-> (this=<optimized out>) at ../../dist/include/mozilla/Monitor.h:40
> #4  WaitForSyncNotify (this=<optimized out>, this=<optimized out>) at ipc/glue/MessageChannel.cpp:1431
> #5  mozilla::ipc::MessageChannel::SendAndWait (this=0x7fffd400b060, aMsg=<optimized out>, aReply=0x7fffffff5a08) at ipc/glue/MessageChannel.cpp:723
> #6  0x00007ffff22b3446 in mozilla::ipc::MessageChannel::Send (this=0x7fffd400b060, aMsg=0x7fffbe07bc40, aReply=0x7fffffff5a08) at ipc/glue/MessageChannel.cpp:630
> #7  0x00007ffff242bd17 in mozilla::layers::PLayerTransactionChild::SendUpdate (this=0x7fffd4387df0, cset=..., id=<optimized out>, targetConfig=..., isFirstPaint=<optimized out>, scheduleComposite=<optimized out>, paintSequenceNumber=<optimized out>, isRepeatTransaction=<optimized out>, transactionStart=..., reply=<optimized out>) at objdir/ipc/ipdl/./PLayerTransactionChild.cpp:244
> #8  0x00007ffff288c535 in mozilla::layers::ClientLayerManager::ForwardTransaction (this=0x7fffd4a3ef80, aScheduleComposite=<optimized out>) at gfx/layers/ipc/ShadowLayers.cpp:650
> #9  0x00007ffff288b41c in mozilla::layers::ClientLayerManager::EndTransaction (this=0x7fffd4a3ef80, aCallback=0x7ffff3c7e7c0 <mozilla::FrameLayerBuilder::DrawThebesLayer(mozilla::layers::ThebesLayer*, gfxContext*, nsIntRegion const&, mozilla::layers::DrawRegionClip, nsIntRegion const&, void*)>, aCallbackData=0x7fffffff7b60, aFlags=mozilla::layers::LayerManager::END_DEFAULT) at gfx/layers/client/ClientLayerManager.cpp:292
> #10 0x00007ffff3cd67ad in nsDisplayList::PaintForFrame (this=<optimized out>, aBuilder=0x7fffffff7b60, aCtx=<optimized out>, aForFrame=<optimized out>, aFlags=<optimized out>) at layout/base/nsDisplayList.cpp:1352
> #11 0x00007ffff3cf0f21 in nsLayoutUtils::PaintFrame (aRenderingContext=0x0, aFrame=0x7fffdc9bd4e0, aDirtyRegion=..., aBackstop=<optimized out>, aFlags=<optimized out>) at layout/base/nsDisplayList.cpp:1198
> #12 0x00007ffff3c57cd3 in PresShell::Paint (this=0x7fffdc735800, aViewToPaint=<optimized out>, aDirtyRegion=..., aFlags=1) at layout/base/nsPresShell.cpp:6230
> #13 0x00007ffff35caea7 in GetViewManager (aWidget=0x7fffde5078f0, this=<optimized out>) at view/nsViewManager.cpp:443
> #14 nsViewManager::ProcessPendingUpdatesForView (this=0x7fffdd816d00, aView=<optimized out>, aFlushDirtyRegion=<optimized out>) at view/nsViewManager.cpp:384
> #15 0x00007ffff3c69353 in nsRefreshDriver::Tick (this=0x7fffdc735000, aNowEpoch=<optimized out>, aNowTime=...) at layout/base/nsRefreshDriver.cpp:1341
> #16 0x00007ffff3c6af09 in TickDriver (this=0x7fffb36c6cc0, driver=<optimized out>, jsnow=<optimized out>, driver=<optimized out>, jsnow=<optimized out>, now=...) at layout/base/nsRefreshDriver.cpp:173
> #17 Tick (this=<optimized out>) at layout/base/nsRefreshDriver.cpp:164
> #18 mozilla::RefreshDriverTimer::TimerTick (aTimer=0x7fffd402128c, aClosure=<optimized out>) at layout/base/nsRefreshDriver.cpp:190
> #19 0x00007ffff1fe5087 in nsTimerEvent::Run (this=0x7fffc126c7f0) at xpcom/threads/nsTimerImpl.cpp:618
> #20 0x00007ffff1fe2811 in nsThread::ProcessNextEvent (this=0x7ffff6c4d140, aMayWait=<optimized out>, aResult=0x7fffffff9587) at xpcom/threads/nsThread.cpp:823
> #21 0x00007ffff22b6e97 in mozilla::ipc::MessagePump::Run (this=0x7fffe74674c0, aDelegate=0x7ffff6c92500) at xpcom/glue/nsThreadUtils.cpp:265
> #22 0x00007ffff35df1cc in nsBaseAppShell::Run (this=0x7fffe3cc7400) at ipc/chromium/src/base/message_loop.cc:234
> #23 0x00007ffff42dfc0e in nsAppStartup::Run (this=0x7fffe1a47060) at toolkit/components/startup/nsAppStartup.cpp:280
> #24 0x00007ffff4334abc in XREMain::XRE_main (this=0x7fffffff9a60, argc=<optimized out>, argv=<optimized out>, aAppData=<optimized out>) at toolkit/xre/nsAppRunner.cpp:4123
> #25 0x00007ffff4334f19 in XRE_main (argc=128, argv=0x17f, aAppData=0xffffffffffffffff, aFlags=<optimized out>) at toolkit/xre/nsAppRunner.cpp:4408
> #26 0x00000000004049e3 in do_main (argc=<optimized out>, argv=<optimized out>, xreDirectory=0x7ffff6c4c780) at browser/app/nsBrowserApp.cpp:282
> #27 main (argc=<optimized out>, argv=<optimized out>) at browser/app/nsBrowserApp.cpp:643

Child:

> #0  0x00007ffff1e1581d in poll () from /usr/lib/libc.so.6
> #1  0x00007ffff4a053d9 in PollWrapper (ufds=0x7fffdc715560, nfsd=4, timeout_=-1) at widget/gtk/nsAppShell.cpp:44
> #2  0x00007fffefaebf04 in ?? () from /usr/lib/libglib-2.0.so.0
> #3  0x00007fffefaec01c in g_main_context_iteration () from /usr/lib/libglib-2.0.so.0
> #4  0x00007ffff4a0537c in nsAppShell::ProcessNextNativeEvent (this=<optimized out>, mayWait=<optimized out>) at widget/gtk/nsAppShell.cpp:156
> #5  0x00007ffff49e6597 in nsBaseAppShell::OnProcessNextEvent (this=0x7fffe1e1ea90, thr=0x7fffe8357a80, mayWait=true, recursionDepth=<optimized out>) at widget/xpwidgets/nsBaseAppShell.cpp:140
> #6  0x00007ffff49e66dd in non-virtual thunk to nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, bool, unsigned int) () at Unified_cpp_widget_xpwidgets0.cpp:315
> #7  0x00007ffff33e96e0 in nsThread::ProcessNextEvent (this=0x7fffe8357a80, aMayWait=true, aResult=<optimized out>) at xpcom/threads/nsThread.cpp:794
> #8  0x00007ffff36bde97 in mozilla::ipc::MessagePump::Run (this=0x7fffe83ac240, aDelegate=0x7fffffffca40) at xpcom/glue/nsThreadUtils.cpp:265
> #9  0x00007ffff49e61cc in nsBaseAppShell::Run (this=0x7fffe1e1ea90) at ipc/chromium/src/base/message_loop.cc:234
> #10 0x00007ffff36be51e in mozilla::ipc::MessagePumpForChildProcess::Run (this=<optimized out>, aDelegate=<optimized out>) at toolkit/xre/nsEmbedFunctions.cpp:713
> #11 0x00007ffff573f399 in XRE_InitChildProcess (aArgc=<optimized out>, aArgv=<optimized out>) at ipc/chromium/src/base/message_loop.cc:234
> #12 0x00000000004034f8 in content_process_main (argc=<optimized out>, argv=<optimized out>) at ipc/app/../contentproc/plugin-container.cpp:158
> #13 main (argc=-596552352, argv=0x7fffffffde38) at ipc/app/MozillaRuntimeMain.cpp:11
(Reporter)

Comment 1

3 years ago
billm suggested I grab the compositor stack as well:

> #0  0x00007ffff7bc9b2f in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
> #1  0x00007fffed721c49 in ?? () from /usr/lib/libxcb.so.1
> #2  0x00007fffed722ea9 in xcb_wait_for_special_event () from /usr/lib/libxcb.so.1
> #3  0x00007fffeb081254 in ?? () from /usr/lib/libGL.so.1
> #4  0x00007fffeb081855 in ?? () from /usr/lib/libGL.so.1
> #5  0x00007fffeb082245 in ?? () from /usr/lib/libGL.so.1
> #6  0x00007fffd32ac187 in ?? () from /usr/lib/xorg/modules/dri/i965_dri.so
> #7  0x00007fffd32ac4b5 in ?? () from /usr/lib/xorg/modules/dri/i965_dri.so
> #8  0x00007fffd32a0f9d in ?? () from /usr/lib/xorg/modules/dri/i965_dri.so
> #9  0x00007ffff28e26aa in raw_fClear (this=<optimized out>, mask=<optimized out>, this=<optimized out>, mask=<optimized out>) at /home/nephyrin/moz/ff-neph-custom-refox/gfx/layers/../../dist/include/GLContext.h:938
> #10 operator-> (this=<optimized out>, this=<optimized out>, mask=16640, this=<optimized out>) at ../../dist/include/GLContext.h:945
> #11 mozilla::layers::CompositorOGL::BeginFrame (this=0x7fffd5d98bc0, aInvalidRegion=..., aClipRectIn=<optimized out>, aRenderBounds=..., aClipRectOut=<optimized out>, aRenderBoundsOut=<optimized out>) at /home/nephyrin/moz/moz-git-build-refox/gfx/layers/opengl/CompositorOGL.cpp:776
> #12 0x00007ffff28b4568 in mozilla::layers::LayerManagerComposite::EndTransaction (this=0x7fffd68222d0, aCallback=<optimized out>, aCallbackData=<optimized out>, aFlags=<optimized out>) at /home/nephyrin/moz/moz-git-build-refox/gfx/layers/composite/LayerManagerComposite.cpp:650
> #13 0x00007ffff28cca77 in operator-> (this=<optimized out>, aFlags=mozilla::layers::LayerManager::END_DEFAULT, this=<optimized out>) at /home/nephyrin/moz/moz-git-build-refox/gfx/layers/composite/LayerManagerComposite.cpp:210
> #14 mozilla::layers::CompositorParent::CompositeToTarget (this=0x7fffd6aac800, aTarget=0x0, aRect=<optimized out>) at /home/nephyrin/moz/moz-git-build-refox/gfx/layers/ipc/CompositorParent.cpp:706
> #15 0x00007ffff22a4437 in MessageLoop::DeferOrRunPendingTask (this=0x7fffdb0fdd28, pending_task=...) at /home/nephyrin/moz/moz-git-build-refox/ipc/chromium/src/base/message_loop.cc:362
> #16 0x00007ffff22a4b67 in MessageLoop::DoDelayedWork (this=0x7fffdb0fdd28, next_delayed_work_time=<optimized out>) at /home/nephyrin/moz/moz-git-build-refox/ipc/chromium/src/base/message_loop.cc:475
> #17 0x00007ffff22a560c in base::MessagePumpDefault::Run (this=0x7fffda594ce0, delegate=0x7fffdb0fdd28) at /home/nephyrin/moz/moz-git-build-refox/ipc/chromium/src/base/message_pump_default.cc:39
> #18 0x00007ffff22a86a7 in base::Thread::ThreadMain (this=0x7fffdbb46b80) at /home/nephyrin/moz/moz-git-build-refox/ipc/chromium/src/base/message_loop.cc:234
> #19 0x00007ffff2298207 in ThreadFunc (closure=0x7fffd0f80e4c) at /home/nephyrin/moz/moz-git-build-refox/ipc/chromium/src/base/platform_thread_posix.cc:39
> #20 0x00007ffff7bc5124 in start_thread () from /usr/lib/libpthread.so.0
> #21 0x00007ffff6ecc4bd in clone () from /usr/lib/libc.so.6
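
For anyone else reproducing this: since the interesting stack turned out to be on a thread other than the main one, it may be easier to dump every thread at once. A minimal sketch, assuming gdb is installed and you can attach to the hung process; the helper names `gdb_all_threads_command` and `dump_all_stacks` are made up for illustration.

```python
import subprocess


def gdb_all_threads_command(pid):
    """Build a gdb invocation that prints a backtrace for every thread of `pid`.

    -p attaches to the running process, -batch exits after the commands
    have run, and -ex supplies the command to execute.
    """
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_all_stacks(pid):
    """Run gdb against a live process and return whatever it printed to stdout."""
    result = subprocess.run(
        gdb_all_threads_command(pid),
        capture_output=True,
        text=True,
        check=False,  # gdb may exit non-zero if attaching is not permitted
    )
    return result.stdout
```

Calling `dump_all_stacks(pid)` for both the parent and the content process should include the compositor thread alongside the main-thread stacks. Attaching generally requires permission to ptrace the target process.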
(Reporter)

Comment 2

3 years ago
The hanging DRI function is dri3_find_back, which carries this comment:

> Find an idle back buffer. If there isn't one, then
> wait for a present idle notify event from the X server
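
To illustrate why a wait like that can hang, here is a rough Python model of the pattern the comment describes. It is not Mesa's code: `BackBufferPool`, `idle_notify` and the `timeout` parameter are hypothetical names for this sketch. The point is only that a caller blocked on "some buffer became idle" never wakes up if the notification is lost or never sent.

```python
import threading


class BackBufferPool:
    """Toy model of a fixed set of back buffers (an assumption, not Mesa's structure)."""

    def __init__(self, count):
        self._cond = threading.Condition()
        self._busy = [False] * count

    def find_back(self, timeout=None):
        """Return the index of an idle buffer and mark it busy.

        With timeout=None this waits indefinitely, so a missing idle
        notification blocks the caller forever. With a timeout it
        returns None instead of hanging.
        """
        with self._cond:
            found = self._cond.wait_for(
                lambda: False in self._busy, timeout=timeout
            )
            if not found:
                return None
            index = self._busy.index(False)
            self._busy[index] = True
            return index

    def idle_notify(self, index):
        """Mark a buffer idle again and wake any waiter (stands in for the server's event)."""
        with self._cond:
            self._busy[index] = False
            self._cond.notify_all()
```

With two buffers, two calls to `find_back` succeed and a third only returns once `idle_notify` has been called for one of them. In this model, the compositor stack above would correspond to that third call made with no timeout and no notification arriving.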
(Assignee)

Updated

3 years ago
Assignee: nobody → davidp99
tracking-e10s: ? → +

Comment 3

3 years ago
I seem to have the same issue, but it's not e10s-related for me: browser.tabs.remote.autostart is set to false.

This happens on Fedora 21 x86_64 with an Intel HD 3000 (mobile i5) almost constantly, about every 5 minutes. Firefox is effectively unusable. I have layers acceleration force-enabled, which may or may not be related.

Sometimes it hangs the whole GNOME desktop, and if I don't kill Firefox from a TTY it locks up the entire system, so this is probably a graphics driver issue and not (or not only) a Firefox bug.

FWIW, here is the backtrace:

> (gdb) bt
> #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> #1  0x00007fffecc2a399 in _xcb_conn_wait (c=0x7ffff6ae6000, cond=<optimized out>, vector=0x0, count=0x0)
>     at xcb_conn.c:415
> #2  0x00007fffecc2b609 in xcb_wait_for_special_event (c=0x7fffcbef35ec, c@entry=0x7ffff6ae6000, se=0x80)
>     at xcb_in.c:715
> #3  0x00007fffea7a5e14 in dri3_find_back (c=c@entry=0x7ffff6ae6000, priv=priv@entry=0x7fffcbeee180)
>     at dri3_glx.c:1191
> 
> #4  0x00007fffea7a64ac in dri3_get_buffer (format=format@entry=4107, 
>     buffer_type=buffer_type@entry=dri3_buffer_back, loaderPrivate=loaderPrivate@entry=0x7fffcbeee180, 
>     driDrawable=<optimized out>) at dri3_glx.c:1217
> #5  0x00007fffea7a6ff2 in dri3_get_buffers (driDrawable=<optimized out>, format=4107, stamp=0x7fffcba2c770, 
>     loaderPrivate=0x7fffcbeee180, buffer_mask=<optimized out>, buffers=0x7fffd40fd8b0) at dri3_glx.c:1394
> #6  0x00007fffcdbd0d77 in intel_update_image_buffers (drawable=<optimized out>, brw=<optimized out>)
>     at brw_context.c:1452
> #7  intel_update_renderbuffers (context=0x7fffcbef35ec, context@entry=0x7fffcbef5070, drawable=0x7fffcba2c740)
>     at brw_context.c:1144
> #8  0x00007fffcdbd10a5 in intel_prepare_render (brw=brw@entry=0x7fffcba02028) at brw_context.c:1165
> #9  0x00007fffcdbc5add in brw_clear (ctx=0x7fffcba02028, mask=18) at brw_clear.c:234
> #10 0x00007ffff1f2a4f9 in mozilla::layers::CompositorOGL::BeginFrame(nsIntRegion const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const*, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits>*, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits>*) ()
>    from /home/jonas/firefox/libxul.so
> #11 0x00007ffff1f12611 in mozilla::layers::LayerManagerComposite::Render() () from /home/jonas/firefox/libxul.so
> #12 0x00007ffff1f12835 in mozilla::layers::LayerManagerComposite::EndTransaction(void (*)(mozilla::layers::ThebesLayer*, gfxContext*, nsIntRegion const&, mozilla::layers::DrawRegionClip, nsIntRegion const&, void*), void*, mozilla::layers::LayerManager::EndTransactionFlags) () from /home/jonas/firefox/libxul.so
> #13 0x00007ffff1f128ed in mozilla::layers::LayerManagerComposite::EndEmptyTransaction(mozilla::layers::LayerManager::EndTransactionFlags) () from /home/jonas/firefox/libxul.so
> #14 0x00007ffff1f22af4 in mozilla::layers::CompositorParent::CompositeToTarget(mozilla::gfx::DrawTarget*, nsIntRect const*) () from /home/jonas/firefox/libxul.so
> #15 0x00007ffff1b82385 in MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&) ()
>    from /home/jonas/firefox/libxul.so
> #16 0x00007ffff175b933 in MessageLoop::DoDelayedWork(base::TimeTicks*) () from /home/jonas/firefox/libxul.so
> #17 0x00007ffff1b82536 in base::MessagePumpDefault::Run(base::MessagePump::Delegate*) ()
>    from /home/jonas/firefox/libxul.so
> #18 0x00007ffff1b826f7 in MessageLoop::Run() () from /home/jonas/firefox/libxul.so
> #19 0x00007ffff1b865af in base::Thread::ThreadMain() () from /home/jonas/firefox/libxul.so
> #20 0x00007ffff1b77a0a in ThreadFunc(void*) () from /home/jonas/firefox/libxul.so
> #21 0x00007ffff7bc657a in start_thread (arg=0x7fffd40fe700) at pthread_create.c:310
> #22 0x00007ffff6cc853d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> (gdb)
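
For comparing stacks like this one against the earlier comments, a small sketch that reduces a pasted gdb backtrace to frame numbers and function names. `frame_functions` and `FRAME_RE` are hypothetical names, and the regular expression only assumes the usual `#N  [0xADDR in ]function (` frame layout seen above; for frames without debug info the "name" will include the demangled argument types.

```python
import re

# Frame header: "#N", optional "0xADDR in ", then the function name
# up to the first " (" that opens the argument list.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(.+?) \(")


def frame_functions(backtrace):
    """Return (frame_number, function_name) pairs from gdb backtrace text.

    Leading "> " quote markers are stripped, and continuation lines
    (such as "at file.c:123") are skipped because they do not start
    with a frame number.
    """
    frames = []
    for line in backtrace.splitlines():
        line = line.lstrip("> ").strip()
        match = FRAME_RE.match(line)
        if match:
            frames.append((int(match.group(1)), match.group(2)))
    return frames
```

Checking whether `dri3_find_back` appears among the returned names is a quick way to tell whether a given hang goes through the same DRI3 wait.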
(Reporter)

Comment 4

3 years ago
As a workaround, using an Intel driver compiled with --disable-dri3 avoids this code path and the hang; --disable-dri3 appears to be the default on at least Arch Linux. DRI3 is likely to become the default at some point, however.

Comment 5

3 years ago
This seems to be the corresponding upstream bug in the Intel graphics driver: https://bugs.freedesktop.org/show_bug.cgi?id=84252
(Reporter)

Comment 6

3 years ago
(In reply to Jonas Thiem from comment #5)
> This seems to be the corresponding upstream bug in the Intel graphics driver:
> https://bugs.freedesktop.org/show_bug.cgi?id=84252

Yes, that's definitely the issue I was seeing.
Whiteboard: [upstream DRI3 bug]
(Reporter)

Comment 7

3 years ago
I can also confirm comment 3: this is not e10s-specific, it likely just needs OMTC.
Summary: [e10s] Frequent hang in PLayerTransactionChild::SendUpdate → Frequent hang in PLayerTransactionChild::SendUpdate
(Reporter)

Updated

3 years ago
Summary: Frequent hang in PLayerTransactionChild::SendUpdate → Frequent hang in compositor thread with DRI3 drivers

Updated

3 years ago
See Also: → bug 1111329
Duplicate of this bug: 1115019

Comment 9

2 years ago
This also hangs on nouveau with DRI3 and OMTC enabled.
tracking-e10s: + → -