Closed Bug 1001853 Opened 11 years ago Closed 11 years ago

Hang in gfx::SurfaceStream_TripleBuffer_Async::WaitForCompositor()

Categories

(Core :: Graphics: Layers, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla32

People

(Reporter: gfritzsche, Unassigned)

Details

(Keywords: hang)

Attachments

(1 file)

Kairo found a fun bug that freezes the browser:
* on desktop often (confirmed on Linux & OS X)
* on geeksphones all the time

frame #0: 0x00007fff954d0716 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff90a4ac3b libsystem_pthread.dylib`_pthread_cond_wait + 727
    frame #2: 0x00000001012319ea libnss3.dylib`pt_TimedWait + 234
    frame #3: 0x00000001012317d2 libnss3.dylib`PR_WaitCondVar + 450
    frame #4: 0x0000000101584b2f XUL`mozilla::CondVar::Wait(this=0x000000011c0468b8, interval=100) + 127 at BlockingResourceBase.cpp:350
    frame #5: 0x00000001016aee52 XUL`mozilla::Monitor::Wait(this=0x000000011c0468a0, interval=100) + 34 at Monitor.h:47
    frame #6: 0x0000000102765104 XUL`mozilla::gfx::SurfaceStream_TripleBuffer_Async::WaitForCompositor(this=0x000000011c046800) + 100 at SurfaceStream.cpp:480
    frame #7: 0x0000000102764d95 XUL`mozilla::gfx::SurfaceStream_TripleBuffer::SwapProducer(this=0x000000011c046800, factory=0x0000000128c74140, size=0x0000000128ab8d8c) + 149 at SurfaceStream.cpp:437
    frame #8: 0x0000000102746b69 XUL`mozilla::gl::GLScreenBuffer::Swap(this=0x00000001271b54c0, size=0x0000000128ab8d8c) + 89 at GLScreenBuffer.cpp:425
    frame #9: 0x000000010273d73d XUL`mozilla::gl::GLScreenBuffer::PublishFrame(this=0x00000001271b54c0, size=0x0000000128ab8d8c) + 45 at GLScreenBuffer.cpp:446
    frame #10: 0x000000010273d6e9 XUL`mozilla::gl::GLContext::PublishFrame(this=0x000000011cd24000) + 153 at GLContext.cpp:1577
    frame #11: 0x0000000103e00622 XUL`mozilla::WebGLContext::PresentScreenBuffer(this=0x0000000129e2e400) + 226 at WebGLContext.cpp:1160
    frame #12: 0x0000000103e0fdcd XUL`mozilla::WebGLContextUserData::PreTransactionCallback(data=0x000000012a637e60) + 125 at WebGLContext.cpp:826
    frame #13: 0x00000001028769b9 XUL`mozilla::layers::CanvasLayer::FirePreTransactionCallback(this=0x0000000134685800) + 57 at Layers.h:1922
    frame #14: 0x000000010287497e XUL`mozilla::layers::ClientCanvasLayer::RenderLayer(this=0x0000000134685800) + 558 at ClientCanvasLayer.cpp:160
    frame #15: 0x0000000102874a8c XUL`non-virtual thunk to mozilla::layers::ClientCanvasLayer::RenderLayer(this=0x0000000134685a40) + 28 at Unified_cpp_gfx_layers1.cpp:168
    frame #16: 0x00000001028afa1a XUL`mozilla::layers::ClientContainerLayer::RenderLayer(this=0x000000011b964400) + 682 at ClientContainerLayer.h:79
    frame #17: 0x00000001028afb7c XUL`non-virtual thunk to mozilla::layers::ClientContainerLayer::RenderLayer(this=0x000000011b9646b0) + 28 at Unified_cpp_gfx_layers2.cpp:86
    frame #18: 0x00000001028afa1a XUL`mozilla::layers::ClientContainerLayer::RenderLayer(this=0x000000012719d400) + 682 at ClientContainerLayer.h:79
    frame #19: 0x00000001028afb7c XUL`non-virtual thunk to mozilla::layers::ClientContainerLayer::RenderLayer(this=0x000000012719d6b0) + 28 at Unified_cpp_gfx_layers2.cpp:86
    frame #20: 0x0000000102881c91 XUL`mozilla::layers::ClientLayerManager::EndTransactionInternal(this=0x000000011c90f470, aCallback=0x0000000104603df0, aCallbackData=0x00007fff5fbfc0e0, (null)=END_DEFAULT)(mozilla::layers::ThebesLayer*, gfxContext*, nsIntRegion const&, mozilla::layers::DrawRegionClip, nsIntRegion const&, void*), void*, mozilla::layers::LayerManager::EndTransactionFlags) + 369 at ClientLayerManager.cpp:190
    frame #21: 0x0000000102881e4b XUL`mozilla::layers::ClientLayerManager::EndTransaction(this=0x000000011c90f470, aCallback=0x0000000104603df0, aCallbackData=0x00007fff5fbfc0e0, aFlags=END_DEFAULT)(mozilla::layers::ThebesLayer*, gfxContext*, nsIntRegion const&, mozilla::layers::DrawRegionClip, nsIntRegion const&, void*), void*, mozilla::layers::LayerManager::EndTransactionFlags) + 91 at ClientLayerManager.cpp:216
    frame #22: 0x0000000104675c02 XUL`nsDisplayList::PaintForFrame(this=0x00007fff5fbfc038, aBuilder=0x00007fff5fbfc0e0, aCtx=0x0000000000000000, aForFrame=0x0000000119242458, aFlags=13) const + 2882 at nsDisplayList.cpp:1358
    frame #23: 0x00000001046750af XUL`nsDisplayList::PaintRoot(this=0x00007fff5fbfc038, aBuilder=0x00007fff5fbfc0e0, aCtx=0x0000000000000000, aFlags=13) const + 111 at nsDisplayList.cpp:1200
    frame #24: 0x0000000104693bec XUL`nsLayoutUtils::PaintFrame(aRenderingContext=0x0000000000000000, aFrame=0x0000000119242458, aDirtyRegion=0x00007fff5fbfc900, aBackstop=4294967295, aFlags=772) + 3612 at nsLayoutUtils.cpp:2794
    frame #25: 0x00000001045d0f7e XUL`PresShell::Paint(this=0x0000000114f8fc00, aViewToPaint=0x0000000119287080, aDirtyRegion=0x00007fff5fbfc900, aFlags=1) + 2078 at nsPresShell.cpp:5917
    frame #26: 0x0000000103bcd8ee XUL`nsViewManager::ProcessPendingUpdatesPaint(this=0x0000000119c234c0, aWidget=0x0000000114f8f800) + 526 at nsViewManager.cpp:451
    frame #27: 0x0000000103bcd590 XUL`nsViewManager::ProcessPendingUpdatesForView(this=0x0000000119c234c0, aView=0x0000000119287080, aFlushDirtyRegion=true) + 496 at nsViewManager.cpp:392
    frame #28: 0x0000000103bce524 XUL`nsViewManager::ProcessPendingUpdates(this=0x0000000119c234c0) + 132 at nsViewManager.cpp:1082
    frame #29: 0x00000001045ee8a3 XUL`nsRefreshDriver::Tick(this=0x0000000114f8f400, aNowEpoch=1398525358440316, aNowTime=TimeStamp at 0x00007fff5fbfcd90) + 3635 at nsRefreshDriver.cpp:1207
    frame #30: 0x00000001045f3c3c XUL`mozilla::RefreshDriverTimer::TickDriver(driver=0x0000000114f8f400, jsnow=1398525358440316, now=TimeStamp at 0x00007fff5fbfcdc8) + 92 at nsRefreshDriver.cpp:168
    frame #31: 0x00000001045f3ac6 XUL`mozilla::RefreshDriverTimer::Tick(this=0x0000000112802480) + 310 at nsRefreshDriver.cpp:160
    frame #32: 0x00000001045f3981 XUL`mozilla::RefreshDriverTimer::TimerTick(aTimer=0x0000000114e745c0, aClosure=0x0000000112802480) + 33 at nsRefreshDriver.cpp:185
    frame #33: 0x00000001016aca6d XUL`nsTimerImpl::Fire(this=0x0000000114e745c0) + 989 at nsTimerImpl.cpp:555
    frame #34: 0x00000001016ace81 XUL`nsTimerEvent::Run(this=0x0000000115007170) + 209 at nsTimerImpl.cpp:639
    frame #35: 0x00000001016a7b52 XUL`nsThread::ProcessNextEvent(this=0x0000000100328aa0, mayWait=false, result=0x00007fff5fbfd173) + 1570 at nsThread.cpp:715
    frame #36: 0x000000010159a62a XUL`NS_ProcessPendingEvents(thread=0x0000000100328aa0, timeout=20) + 154 at nsThreadUtils.cpp:210
    frame #37: 0x00000001033c8b19 XUL`nsBaseAppShell::NativeEventCallback(this=0x00000001113eba20) + 201 at nsBaseAppShell.cpp:98
    frame #38: 0x0000000103353f48 XUL`nsAppShell::ProcessGeckoEvents(aInfo=0x00000001113eba20) + 440 at nsAppShell.mm:388
    frame #39: 0x00007fff94fe5731 CoreFoundation`__CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 17
    frame #40: 0x00007fff94fd6ea2 CoreFoundation`__CFRunLoopDoSources0 + 242
    frame #41: 0x00007fff94fd662f CoreFoundation`__CFRunLoopRun + 831
    frame #42: 0x00007fff94fd60b5 CoreFoundation`CFRunLoopRunSpecific + 309
    frame #43: 0x00007fff95a14a0d HIToolbox`RunCurrentEventLoopInMode + 226
    frame #44: 0x00007fff95a14685 HIToolbox`ReceiveNextEventCommon + 173
    frame #45: 0x00007fff95a145bc HIToolbox`_BlockUntilNextEventMatchingListInModeWithFilter + 65
    frame #46: 0x00007fff88f2f3de AppKit`_DPSNextEvent + 1434
    frame #47: 0x00007fff88f2ea2b AppKit`-[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 122
    frame #48: 0x0000000103352e47 XUL`-[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:](self=0x0000000100322980, _cmd=0x00007fff899625c3, mask=18446744073709551615, expiration=0x422d63c37f00000d, mode=0x00007fff7a5fed00, flag='\x01') + 119 at nsAppShell.mm:165
    frame #49: 0x00007fff88f22b2c AppKit`-[NSApplication run] + 553
    frame #50: 0x0000000103354a72 XUL`nsAppShell::Run(this=0x00000001113eba20) + 162 at nsAppShell.mm:746
    frame #51: 0x0000000105065dbc XUL`nsAppStartup::Run(this=0x00000001128d8600) + 156 at nsAppStartup.cpp:278
    frame #52: 0x0000000104f2d22f XUL`XREMain::XRE_mainRun(this=0x00007fff5fbff100) + 5151 at nsAppRunner.cpp:4019
    frame #53: 0x0000000104f2da7e XUL`XREMain::XRE_main(this=0x00007fff5fbff100, argc=5, argv=0x00007fff5fbffa08, aAppData=0x00007fff5fbff398) + 734 at nsAppRunner.cpp:4088
    frame #54: 0x0000000104f2debd XUL`XRE_main(argc=5, argv=0x00007fff5fbffa08, aAppData=0x00007fff5fbff398, aFlags=0) + 77 at nsAppRunner.cpp:4300
    frame #55: 0x000000010000211f firefox`do_main(argc=5, argv=0x00007fff5fbffa08, xreDirectory=0x0000000100327340) + 1647 at nsBrowserApp.cpp:282
    frame #56: 0x0000000100001643 firefox`main(argc=5, argv=0x00007fff5fbffa08) + 323 at nsBrowserApp.cpp:643
Keywords: hang
BenWa, do you have an idea who could take a look at this?
This is a WebGL application triggering a complete freeze of the browser.
Flags: needinfo?(bgirard)
Maybe bug 947227 never got fixed properly?
Flags: needinfo?(bgirard) → needinfo?(matt.woodrow)
Summary: Possible deadlock in gfx::SurfaceStream_TripleBuffer_Async::WaitForCompositor() → Hang in gfx::SurfaceStream_TripleBuffer_Async::WaitForCompositor()
Version: unspecified → Trunk
Yeah, it appears the timeout period is being ignored entirely. Possibly an nspr bug?

I'm pushing on bug 854421 as an alternative fix to this.
Flags: needinfo?(matt.woodrow)
(In reply to Matt Woodrow (:mattwoodrow) from comment #3)
> I'm pushing on bug 854421 as an alternative fix to this.

Hmm, that sounds fairly complicated and not like something we'd uplift to older releases (including ESR or FxOS 1.4), or would we?


I find that if I delay my call of resizing the GL context to the screen size somewhat (including setting the resolution attribute on the shader, which I found is what triggers the hang) with a setTimeout, I can work around this. Do you guys still need my website for testing this? I'm eager to push that workaround for now, but I could put the current state that triggers the hang to a place where it's still accessible.
Took way too long to find this. NSPR doesn't report timeouts as an error (why?), so we just kept going around the while(mStaging) loop forever.

I don't see the point of the loop, once we've waited once it's either done the job, or it won't ever do it so we should give up.

This drops the return value of WaitForCompositor() and just check mStaging to determine if it was successful.
Attachment #8421611 - Flags: review?(jgilbert)
Attachment #8421611 - Flags: review?(jgilbert) → review+
Comment on attachment 8421611 [details] [diff] [review]
Don't loop waiting on the monitor

Review of attachment 8421611 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/gl/SurfaceStream.cpp
@@ +477,5 @@
>  SurfaceStream_TripleBuffer_Async::WaitForCompositor()
>  {
>      PROFILER_LABEL("SurfaceStream_TripleBuffer_Async", "WaitForCompositor");
>  
> +    mMonitor.Wait(PR_MillisecondsToInterval(100));

What is this? I don't see how the caller of WaitForCompositor would expect it to give up after 100ms? What special about 100ms? This at the very least needs a comment.

If the problem is we don't get a response then the threading code is wrong and shouldn't waiting on something that wont happen.
Attachment #8421611 - Flags: review-
(In reply to Benoit Girard (:BenWa) from comment #6)
> Comment on attachment 8421611 [details] [diff] [review]
> Don't loop waiting on the monitor
> 
> Review of attachment 8421611 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/gl/SurfaceStream.cpp
> @@ +477,5 @@
> >  SurfaceStream_TripleBuffer_Async::WaitForCompositor()
> >  {
> >      PROFILER_LABEL("SurfaceStream_TripleBuffer_Async", "WaitForCompositor");
> >  
> > +    mMonitor.Wait(PR_MillisecondsToInterval(100));
> 
> What is this? I don't see how the caller of WaitForCompositor would expect
> it to give up after 100ms? What special about 100ms? This at the very least
> needs a comment.

Yeah, I can add a comment. This is just a temporary measure to prevents hangs on aurora/beta.

> 
> If the problem is we don't get a response then the threading code is wrong
> and shouldn't waiting on something that wont happen.

The compositor can fail to composite for a bunch of reasons. Trying to preempt all of them from the client side is *really* hard, and if we miss one then we deadlock.

All of this code is going to go away once bug 854421 lands anyway.
https://hg.mozilla.org/mozilla-central/rev/370d74cb269d
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
Awesome to see a simple patch here! \o/
I hope we can get that uplifted to affected in-development branches when it sticks on trunk. :)
I can actually verify that in a build with this patch in, I cannot get my app to freeze the Firefox process any more! \o/


That said, in a moment I'll push the workaround to the live version(s) of my app, so that even affected builds will not reproduce with it any more, so removing the URL from this bug to not confuse someone into testing with it (if you need a version of the app that can reproduce, contact me). I just waited with this to be able to verify the patch first. :)
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: