924622 - Intermittent Android Shutdown "ABORT: mismatched CxxStackFrame ctor/dtors" [@ mozalloc_abort(char const*)]

Reporter

Description

11 years ago

Don't know if those profiling errors are related, but they look bad.

https://tbpl.mozilla.org/php/getParsedLog.php?id=28830191&tree=Mozilla-Inbound

Android 2.2 Armv6 Tegra mozilla-inbound opt test jsreftest-1 on 2013-10-08 08:07:31 PDT for push d1cd57876e48
slave: tegra-067

REFTEST FINISHED: Slowest test took 22989ms (http://10.250.49.158:30067/jsreftest/tests/jsreftest.html?test=js1_5/GC/regress-203278-2.js)
REFTEST INFO | Result summary:
REFTEST INFO | Successful: 49119 (49119 pass, 0 load only)
REFTEST INFO | Unexpected: 0 (0 unexpected fail, 0 unexpected pass, 0 unexpected asserts, 0 unexpected fixed asserts, 0 failed load, 0 exception)
REFTEST INFO | Known problems: 1104 (20 known fail, 0 known asserts, 952 random, 118 skipped, 14 slow)
REFTEST INFO | Total canvas count = 0
REFTEST TEST-START | Shutdown
INFO | automation.py | Application ran for: 0:14:27.382977
INFO | zombiecheck | Reading PID log: /tmp/tmp5MkjLRpidlog
mozcrash INFO | Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-inbound-android-armv6/1381238513/fennec-27.0a1.en-US.android-arm-armv6.crashreporter-symbols.zip
/data/anr/traces.txt not found
PROCESS-CRASH | Shutdown | application crashed [@ mozalloc_abort(char const*)]
Crash dump filename: /tmp/tmpokHVgJ/643deeb3-a3b4-be66-34160430-4a921b7e.dmp
Operating system: Android 0.0.0 Linux 2.6.32.9-00002-gd8084dc-dirty #1 SMP PREEMPT Wed Feb 2 11:32:06 PST 2011 armv7l nvidia/harmony/harmony/harmony:2.2/FRF91/20110202.102810:eng/test-keys
CPU: arm 2 CPUs
Crash reason: SIGSEGV
Crash address: 0x0

Thread 16 (crashed)
 0 libmozalloc.so!mozalloc_abort(char const*) [mozalloc_abort.cpp:d1cd57876e48 : 30 + 0x8]
   r4 = 0x00000000 r5 = 0xffffffff r6 = 0xafd424a8 r7 = 0xafd42550 r8 = 0x554332d9 r9 = 0x554337fc r10 = 0x00000000 fp = 0x5548f5bc sp = 0x5b3ff840 lr = 0x4498ea20 pc = 0x4498ea28
   Found by: given as instruction pointer in context
 1 libxul.so!NS_DebugBreak [nsDebugImpl.cpp:d1cd57876e48 : 430 + 0x6]
   r4 = 0x00000000 r5 = 0xffffffff r6 = 0xafd424a8 r7 = 0xafd42550 r8 = 0x554332d9 r9 = 0x554337fc r10 = 0x00000000 fp = 0x5548f5bc sp = 0x5b3ff848 pc = 0x54a19344
   Found by: call frame info
 2 libxul.so!mozilla::ipc::MessageChannel::DebugAbort(char const*, int, char const*, char const*, bool) const [MessageChannel.cpp:d1cd57876e48 : 1539 + 0x1e]
   r4 = 0x4ed02860 r5 = 0x554337fc r6 = 0x5543353f r7 = 0x55433544 r8 = 0x553f900a r9 = 0x55492451 r10 = 0x5b3ffdf0 fp = 0x55492451 sp = 0x5b3ffc78 pc = 0x5468d99c
   Found by: call frame info
 3 libxul.so!mozilla::ipc::MessageChannel::~MessageChannel() [MessageChannel.cpp:d1cd57876e48 : 79 + 0x2a]
   r4 = 0x57c7a730 r5 = 0x00000000 r6 = 0x55754db4 r7 = 0x00000010 r8 = 0x5af54080 r9 = 0x55492451 r10 = 0x5b3ffdf0 fp = 0x55492451 sp = 0x5b3ffcc8 pc = 0x5468dff8
   Found by: call frame info
 4 libxul.so!mozilla::layers::PImageBridgeParent::~PImageBridgeParent() [PImageBridgeParent.cpp:d1cd57876e48 : 82 + 0x6]
   r4 = 0x57c7a700 r5 = 0x00000000 r6 = 0x55754db4 r7 = 0x00000010 r8 = 0x5af54080 r9 = 0x55492451 r10 = 0x5b3ffdf0 fp = 0x55492451 sp = 0x5b3ffcd8 pc = 0x546ed2f8
   Found by: call frame info
 5 libxul.so!mozilla::layers::ImageBridgeParent::~ImageBridgeParent() [ImageBridgeParent.cpp:d1cd57876e48 : 57 + 0x2]
   r4 = 0x57c7a700 r5 = 0x00000000 r6 = 0x55754db4 r7 = 0x00000010 r8 = 0x5af54080 r9 = 0x55492451 r10 = 0x5b3ffdf0 fp = 0x55492451 sp = 0x5b3ffce0 pc = 0x54aa3bdc

10-08 08:28:46.291 E/Profiler( 1776): BPUnw: [6 total] thread_register_for_profiling(me=0x701db8, stacktop=0x5deffe4b)
10-08 08:28:46.291 E/Profiler( 1776): BPUnw: [5 total] thread_unregister_for_profiling(me=0x701db8)
10-08 08:28:46.551 E/Profiler( 1776): BPUnw: [4 total] thread_unregister_for_profiling(me=0x2f50f0)
10-08 08:28:46.561 I/Gecko ( 1776): ###!!! [MessageChannel][Parent][../../../ipc/glue/MessageChannel.cpp:79] Assertion (mCxxStackFrames.empty()) failed. mismatched CxxStackFrame ctor/dtors
10-08 08:28:46.561 I/Gecko ( 1776): MessageChannel 'backtrace':
10-08 08:28:46.561 I/Gecko ( 1776): [(0) in sync PImageBridge::Msg_Stop(actor=2147483647) ]
10-08 08:28:46.561 I/Gecko ( 1776): remote Interrupt stack guess: 0
10-08 08:28:46.561 I/Gecko ( 1776): deferred stack size: 0
10-08 08:28:46.561 I/Gecko ( 1776): out-of-turn Interrupt replies stack size: 0
10-08 08:28:46.561 I/Gecko ( 1776): Pending queue size: 0, front to back:
10-08 08:28:46.561 I/Gecko ( 1776): ###!!! ABORT: mismatched CxxStackFrame ctor/dtors: file ../../../ipc/glue/MessageChannel.cpp, line 1539
10-08 08:28:46.561 E/Gecko ( 1776): mozalloc_abort: ###!!! ABORT: mismatched CxxStackFrame ctor/dtors: file ../../../ipc/glue/MessageChannel.cpp, line 1539
10-08 08:28:47.541 I/DEBUG ( 937): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
10-08 08:28:47.541 I/DEBUG ( 937): Build fingerprint: 'nvidia/harmony/harmony/harmony:2.2/FRF91/20110202.102810:eng/test-keys'
10-08 08:28:47.541 I/DEBUG ( 937): pid: 1776, tid: 1834 >>> org.mozilla.fennec <<<
10-08 08:28:47.541 I/DEBUG ( 937): signal 11 (SIGSEGV), fault addr 00000000
10-08 08:28:46.291 E/Profiler( 1776): BPUnw: [6 total] thread_register_for_profiling(me=0x701db8, stacktop=0x5deffe4b)
10-08 08:28:46.291 E/Profiler( 1776): BPUnw: [5 total] thread_unregister_for_profiling(me=0x701db8)
10-08 08:28:46.551 E/Profiler( 1776): BPUnw: [4 total] thread_unregister_for_profiling(me=0x2f50f0)
10-08 08:28:46.561 E/Gecko ( 1776): mozalloc_abort: ###!!! ABORT: mismatched CxxStackFrame ctor/dtors: file ../../../ipc/glue/MessageChannel.cpp, line 1539
10-08 08:28:52.591 W/InputManagerService( 1020): Got RemoteException sending setActive(false) notification to pid 1776 uid 10033

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 1

11 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=28831782&tree=Mozilla-Inbound

Benjamin Smedberg

Comment 2

11 years ago

Something is very wrong here! The gecko main thread (thread 6) is at this stack:

 3 libnss3.so!PR_Wait [ptsynch.c:d1cd57876e48 : 582 + 0x6]
   r4 = 0x4ed397c0 r5 = 0x0021a740 r6 = 0x00000001 sp = 0x4f3198c8 pc = 0x51a1b8a8
   Found by: call frame info
 4 libxul.so!mozilla::ReentrantMonitor::Wait(unsigned int) [ReentrantMonitor.h:d1cd57876e48 : 89 + 0x6]
   r4 = 0x4f3198e4 r5 = 0x4f3198ec r6 = 0x4edb4250 sp = 0x4f3198d8 pc = 0x54156f68
   Found by: call frame info
 5 libxul.so!mozilla::layers::ImageBridgeChild::DestroyBridge() [ImageBridgeChild.cpp:d1cd57876e48 : 632 + 0xa]
   r4 = 0x4f3198e4 r5 = 0x4f3198ec r6 = 0x4edb4250 sp = 0x4f3198e0 pc = 0x54aa30d8
   Found by: call frame info
 6 libxul.so!mozilla::layers::ImageBridgeChild::ShutDown() [ImageBridgeChild.cpp:d1cd57876e48 : 583 + 0x2]
   r4 = 0x5599138c r5 = 0x4ed72550 r6 = 0x00000037 r7 = 0x000000dc r8 = 0x55754db4 sp = 0x4f319918 pc = 0x54aa3150
   Found by: call frame info
 7 libxul.so!gfxPlatform::Shutdown() [gfxPlatform.cpp:d1cd57876e48 : 485 + 0x2]
   r4 = 0x5599138c r5 = 0x4ed72550 r6 = 0x00000037 r7 = 0x000000dc r8 = 0x55754db4 sp = 0x4f319920 pc = 0x54a6febc
   Found by: call frame info
 8 libxul.so!nsComponentManagerImpl::KnownModule::~KnownModule() [nsComponentManager.h:d1cd57876e48 : 224 + 0x2]
   r4 = 0x4edab0e0 r5 = 0x4ed72550 r6 = 0x00000037 r7 = 0x000000dc r8 = 0x55754db4 sp = 0x4f319928 pc = 0x54a07800
   Found by: call frame info
 9 libxul.so!nsAutoPtr<nsComponentManagerImpl::KnownModule>::~nsAutoPtr() [nsAutoPtr.h:d1cd57876e48 : 77 + 0x6]
   r4 = 0x4edab0e0 r5 = 0x4ed72550 r6 = 0x00000037 r7 = 0x000000dc r8 = 0x55754db4 sp = 0x4f319930 pc = 0x54a0ac64
   Found by: call frame info
10 libxul.so!nsTArray_Impl<nsAutoPtr<nsComponentManagerImpl::KnownModule>, nsTArrayInfallibleAllocator>::Clear() [nsTArray.h:d1cd57876e48 : 534 + 0x6]
   r4 = 0x4ed722ac r5 = 0x4ed72550 r6 = 0x00000037 r7 = 0x000000dc r8 = 0x55754db4 sp = 0x4f319940 pc = 0x54a0aca0
   Found by: call frame info
11 libxul.so!nsComponentManagerImpl::Shutdown() [nsComponentManager.cpp:d1cd57876e48 : 808 + 0x6]
   r4 = 0x4ed72200 r5 = 0x549dd98c r6 = 0x55754db4 r7 = 0x4f319984 r8 = 0x4edab7c0 r10 = 0x5551de64 sp = 0x4f319960 pc = 0x54a0add4
   Found by: call frame info
13 libxul.so!ScopedXPCOMStartup::~ScopedXPCOMStartup() [nsAppRunner.cpp:d1cd57876e48 : 1130 + 0x6]
   r4 = 0x4ed3a14c r5 = 0x55754db4 r6 = 0x55980e5c r7 = 0x00000000 r8 = 0x55754db4 r9 = 0x4497ef84 r10 = 0x00000000 fp = 0x00225120 sp = 0x4f3199b0 pc = 0x53bc964c
   Found by: call frame info
14 libxul.so!XREMain::XRE_main(int, char**, nsXREAppData const*) [nsAppRunner.cpp:d1cd57876e48 : 3961 + 0x6]
   r4 = 0x4f3199f4 r5 = 0x00000000 r6 = 0x4ed3a14c r7 = 0x00000000 r8 = 0x55754db4 r9 = 0x4497ef84 r10 = 0x00000000 fp = 0x00225120 sp = 0x4f3199c8 pc = 0x53bcd3fc
   Found by: call frame info

So we are *really* late in the shutdown process, after all threads should have been stopped, and we're now calling the XPCOM module destructors. The main thread should not be accepting any new events at this point.

According to the comments in gfxPlatform::Shutdown() though, we're trying to shut down IPDL protocols at this stage by calling ImageBridgeChild::ShutDown and CompositorParent::ShutDown. This looks obviously unsafe; the code dates back to bug 763234. This needs to be happening much earlier in the shutdown process.

The "mismatched CxxStackFrame" assertion probably means that we're calling a method on a dead MessageChannel, which is a member of the ImageBridgeParent which is likely dead also. But given the refcounting of ImageBridgeParent, it's not clear to me why it would be dead.
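
For readers new to this message: "ctor/dtors" refers to constructor/destructor pairs of a per-dispatch stack-frame object. A minimal sketch, assuming CxxStackFrame behaves as an RAII guard that records a frame on the channel when constructed and removes it when destroyed; Channel, FrameGuard, PushFrame, PopFrame and Depth are hypothetical names, not Gecko API. Under that assumption the destructor check only fires if the channel is destroyed while a guard is still live, which fits the log's MessageChannel 'backtrace' still listing "[(0) in sync PImageBridge::Msg_Stop(actor=2147483647) ]" when ~MessageChannel() runs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical, simplified model of a channel that records which C++ frames
// are currently dispatching messages on it. This is not Gecko's MessageChannel.
class Channel {
 public:
  ~Channel() {
    // The real destructor asserts mCxxStackFrames.empty() and aborts; this
    // sketch only reports the imbalance so it stays safe to run.
    if (!frames_.empty()) {
      std::fprintf(stderr, "mismatched frame ctor/dtors (innermost: %s)\n",
                   frames_.back());
    }
  }
  void PushFrame(const char* msg) { frames_.push_back(msg); }
  void PopFrame() { frames_.pop_back(); }
  std::size_t Depth() const { return frames_.size(); }

 private:
  std::vector<const char*> frames_;
};

// RAII guard: the constructor records a frame and the destructor removes it,
// so balanced use leaves Depth() at zero whenever no message is in flight.
class FrameGuard {
 public:
  FrameGuard(Channel& chan, const char* msg) : chan_(chan) { chan_.PushFrame(msg); }
  ~FrameGuard() { chan_.PopFrame(); }
  FrameGuard(const FrameGuard&) = delete;
  FrameGuard& operator=(const FrameGuard&) = delete;

 private:
  Channel& chan_;
};
```

Destroying the channel while a FrameGuard is still in scope is the situation the check exists to catch; in this sketch it would also leave the guard holding a dangling reference.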

Phil Ringnalda (:philor)

Updated

11 years ago

OS: Android → All

Hardware: ARM → All

Phil Ringnalda (:philor)

Updated

11 years ago

Summary: Intermittent Android PROCESS-CRASH | Shutdown | application crashed [@ mozalloc_abort(char const*)] after "ABORT: mismatched CxxStackFrame ctor/dtors" → Intermittent PROCESS-CRASH | Shutdown | application crashed [@ mozalloc_abort(char const*)] after "ABORT: mismatched CxxStackFrame ctor/dtors"

Geoff Brown [:gbrown]

Updated

11 years ago

Comment 28

11 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=32952390&tree=B2g-Inbound

Ryan VanderMeulen [:RyanVM]

Reporter

Updated

11 years ago

Summary: Intermittent PROCESS-CRASH | Shutdown | application crashed [@ mozalloc_abort(char const*)] after "ABORT: mismatched CxxStackFrame ctor/dtors" → Intermittent Android PROCESS-CRASH | application crashed [@ mozalloc_abort(char const*)] after "ABORT: mismatched CxxStackFrame ctor/dtors"

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 33

11 years ago

As Cosmin filed in bug 959080, we have also seen this at least once in our Mozmill tests. A crash report can be found here: bp-ebeafe13-adce-4637-9c55-7742c2140113. This crash affects Firefox 29.0a1 down to 27.0 builds from 20131209204824. It looks like a regression since the 26.0 release. Cosmin, please check where the crash happened and whether it is somewhat reproducible.

Crash Signature: [@ mozalloc_abort(char const*) | Abort | NS_DebugBreak | mozilla::ipc::MessageChannel::DebugAbort(char const*, int, char const*, char const*, bool) const ]

status-firefox27: --- → affected

status-firefox28: --- → affected

status-firefox29: --- → affected

tracking-firefox27: --- → ?

tracking-firefox28: --- → ?

tracking-firefox29: --- → ?

Flags: needinfo?(cosmin.malutan)

Keywords: regression

Cosmin Malutan, [:cosmin-malutan]

Comment 34

11 years ago

This crashed during a mozilla-aurora_remote testrun, with Firefox 28.0a2 locale de, on mm-osx-107-3 node, build id 20140111004004. I will run testruns with those configurations, but I doubt it will fail.

Flags: needinfo?(cosmin.malutan)

Benjamin Smedberg

Comment 35

11 years ago

This assertion usually means that somebody is operating on a dead object. In this case I strongly suspect that we're dual-deleting an ImageBridgeParent object. See also bug 959080 where this is seen on mac as well, so it seems to affect every platform with OMTC enabled. Here is the full relevant stack; we're on the image-bridge child thread:

18:59:56 INFO - 0 libmozalloc.so!mozalloc_abort(char const*) [mozalloc_abort.cpp:747121b2bb50 : 30 + 0x4]
18:59:56 INFO - 1 libxul.so!NS_DebugBreak [nsDebugImpl.cpp:747121b2bb50 : 427 + 0x5]
18:59:56 INFO - 2 libxul.so!mozilla::ipc::MessageChannel::DebugAbort(char const*, int, char const*, char const*, bool) const [MessageChannel.cpp:747121b2bb50 : 1703 + 0x13]
18:59:56 INFO - 3 libxul.so!mozilla::ipc::MessageChannel::~MessageChannel() [MessageChannel.cpp:747121b2bb50 :
18:59:56 INFO - 4 libxul.so!mozilla::layers::PImageBridgeParent::~PImageBridgeParent() [PImageBridgeParent.cpp:747121b2bb50 : 89 + 0x7]
18:59:56 INFO - 5 libxul.so!mozilla::layers::ImageBridgeParent::~ImageBridgeParent() [ImageBridgeParent.cpp:747121b2bb50 : 59 + 0x3]
18:59:56 INFO - 6 libxul.so!mozilla::layers::ImageBridgeParent::~ImageBridgeParent() [ImageBridgeParent.cpp:747121b2bb50 : 59 + 0x3]
18:59:56 INFO - 7 libxul.so!mozilla::detail::RefCounted<mozilla::layers::ISurfaceAllocator, (mozilla::detail::RefCountAtomicity)0>::Release() const [RefPtr.h:747121b2bb50 : 82 + 0xb]
18:59:56 INFO - 8 libxul.so!mozilla::layers::DeleteImageBridgeSync [StaticPtr.h:747121b2bb50 : 158 + 0x7]
... event loop

The main thread is *very* late in shutdown, we're already destroying modules:

18:59:56 INFO - 5 libxul.so!mozilla::layers::ImageBridgeChild::DestroyBridge() [ImageBridgeChild.cpp:747121b2bb50 : 589 + 0x9]
18:59:56 INFO - 6 libxul.so!mozilla::layers::ImageBridgeChild::ShutDown() [ImageBridgeChild.cpp:747121b2bb50 : 540 + 0x3]
18:59:56 INFO - 7 libxul.so!gfxPlatform::Shutdown() [gfxPlatform.cpp:747121b2bb50 : 570 + 0x3]
18:59:56 INFO - 8 libxul.so!nsComponentManagerImpl::KnownModule::~KnownModule() [nsComponentManager.h:747121b2bb50 : 226 + 0x1]
18:59:56 INFO - 9 libxul.so!nsAutoPtr<nsComponentManagerImpl::KnownModule>::~nsAutoPtr() [nsAutoPtr.h:747121b2bb50 : 78 + 0x5]
18:59:56 INFO - 10 libxul.so!nsTArray_Impl<nsAutoPtr<nsComponentManagerImpl::KnownModule>, nsTArrayInfallibleAllocator>::Clear() [nsTArray.h:747121b2bb50 : 531 + 0x3]
18:59:56 INFO - 11 libxul.so!nsComponentManagerImpl::Shutdown() [nsComponentManager.cpp:747121b2bb50 : 790 + 0x7]
18:59:56 INFO - 12 libxul.so!mozilla::ShutdownXPCOM(nsIServiceManager*) [nsXPComInit.cpp:747121b2bb50 : 808 + 0x3]

Crash Signature: [@ mozalloc_abort(char const*) | Abort | NS_DebugBreak | mozilla::ipc::MessageChannel::DebugAbort(char const*, int, char const*, char const*, bool) const ]

status-firefox27: affected → ---

status-firefox28: affected → ---

status-firefox29: affected → ---

tracking-firefox27: ? → ---

tracking-firefox28: ? → ---

tracking-firefox29: ? → ---

Component: IPC → Graphics: Layers

Keywords: regression

Benjamin Smedberg

Comment 36

11 years ago

This signature is currently #8 on Firefox 27 beta on mac, but many of the crashes are an unrelated crash in plugin code, so it's hard to know what the real severity of this crash is. Shutdown crashes tend to be less important in general.

Crash Signature: [@ mozalloc_abort(char const*) | Abort | NS_DebugBreak | mozilla::ipc::MessageChannel::DebugAbort(char const*, int, char const*, char const*, bool) const ]

Benjamin Smedberg

Updated

11 years ago

Flags: needinfo?(nical.bugzilla)

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 37

11 years ago

To answer Benjamin's question from bug 959080, the crash with Mozmill happens without browser.remote.tabs enabled. It's a default opt build.

(In reply to Cosmin Malutan from comment #34)
> This crashed during a mozilla-aurora_remote testrun, with Firefox 28.0a2
> locale de, on mm-osx-107-3 node, build id 20140111004004.

This is not that helpful. What matters here is which test we are failing in.

Cosmin Malutan, [:cosmin-malutan]

Comment 38

11 years ago

I know it doesn't help a lot, but the job is gone, and it didn't fail again when I tried to reproduce it. If it crashes again I will come back with more info.

Nicolas Silva [:nical]

Comment 39

11 years ago

(In reply to Benjamin Smedberg [:bsmedberg] from comment #2)
> So we are *really* late in the shutdown process, after all threads should
> have been stopped, and we're now calling the XPCOM module destructors. The
> main thread should not be accepting any new events at this point.

I don't think this is supposed to post anything to the main thread's event loop. It does post tasks to the ImageBridgeChild thread, which is a chromium-style thread that is about to be destroyed after everything is shut down. Do you mean that the mutex or something else here is implicitly waking up XPCOM stuff that is already dead? Or is it that we can't touch a chromium thread at this point?

> According to the comments in gfxPlatform::Shutdown() though, we're trying to
> shut down IPDL protocols at this stage by calling ImageBridgeChild::ShutDown
> and CompositorParent::ShutDown. This looks obviously unsafe; the code dates
> back to bug 763234. This needs to be happening much earlier in the shutdown
> process.

I don't know much about the shutdown process, but the only requirement here is that ImageBridge's shutdown happens before CompositorParent's (because shutting down CompositorParent destroys the thread in which CompositorParent lives), and that it happens when we close the browser (we create and destroy ImageBridge along with the browser).

> The "mismatched CxxStackFrame" assertion probably means that we're calling a
> method on a dead MessageChannel, which is a member of the ImageBridgeParent
> which is likely dead also. But given the refcounting of ImageBridgeParent,
> it's not clear to me why it would be dead.

ImageBridgeParent's refcount is about to go to zero (this happens in ImageBridgeParent::DeferredDestroy). I wonder what is closing the channel.
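
Comment 2's main-thread stack shows ImageBridgeChild::DestroyBridge() blocked in ReentrantMonitor::Wait while work runs on the ImageBridge thread. The sketch below reproduces that post-and-wait shape with the C++ standard library; TaskThread and PostAndWait are hypothetical stand-ins, not Gecko's MessageLoop or monitor APIs. It illustrates why the pattern only works while the target thread still runs its event loop: if that thread were already gone, the waiting caller would never be woken.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Minimal task runner standing in for the ImageBridge thread (hypothetical).
class TaskThread {
 public:
  TaskThread() : worker_([this] { Run(); }) {}
  ~TaskThread() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stopping_ = true;
    }
    cv_.notify_one();
    worker_.join();  // drains remaining tasks, then the loop exits
  }
  void Post(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void Run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stopping and nothing left to run
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();
    }
  }
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  bool stopping_ = false;
  std::thread worker_;  // declared last so the members above exist before Run() starts
};

// Posts `task` to `t` and blocks the caller until it has run, the same shape
// as the main thread waiting inside DestroyBridge().
void PostAndWait(TaskThread& t, std::function<void()> task) {
  std::mutex m;
  std::condition_variable done_cv;
  bool done = false;
  t.Post([&] {
    task();
    std::lock_guard<std::mutex> lock(m);
    done = true;
    done_cv.notify_one();
  });
  std::unique_lock<std::mutex> lock(m);
  done_cv.wait(lock, [&] { return done; });
}
```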

Flags: needinfo?(nical.bugzilla)

Benjamin Smedberg

Comment 40

11 years ago

> Do you mean that the mutex or something else here is implicitly waking up
> XPCOM stuff that is already dead?

No.

> Or is it that we can't touch a chromium
> thread at this point?

This. No secondary threads are supposed to exist at this point; all threads are supposed to be joined or abandoned during the xpcom-shutdown-threads phase.
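
That rule can be modelled as a queue that stops accepting work once the shutdown-threads phase has run. This is a hypothetical single-queue sketch (ShutdownAwareQueue is not a Gecko class, and real threads are joined rather than flagged); it only illustrates that anything posted later, for example from a module destructor such as gfxPlatform::Shutdown(), has no thread left to run on.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical model of a secondary thread's event queue across shutdown.
// Real Gecko threads are joined during xpcom-shutdown-threads; here "joined"
// is just a flag after which the queue refuses new work.
class ShutdownAwareQueue {
 public:
  // Returns false once the owning thread is gone, instead of queueing a task
  // that nothing will ever run.
  bool Post(std::function<void()> task) {
    std::lock_guard<std::mutex> lock(mu_);
    if (joined_) return false;
    pending_.push_back(std::move(task));
    return true;
  }

  // Models the xpcom-shutdown-threads phase: mark the thread as joined, then
  // run whatever was already queued (outside the lock).
  void ShutdownThreads() {
    std::vector<std::function<void()>> work;
    {
      std::lock_guard<std::mutex> lock(mu_);
      joined_ = true;
      work.swap(pending_);
    }
    for (auto& task : work) task();
  }

 private:
  std::mutex mu_;
  std::vector<std::function<void()>> pending_;
  bool joined_ = false;
};
```

Returning false makes a too-late post visible to the caller, rather than leaving it to block on a reply that will never come.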

Ryan VanderMeulen [:RyanVM]

Reporter

Updated

11 years ago

Summary: Intermittent Android PROCESS-CRASH | application crashed [@ mozalloc_abort(char const*)] after "ABORT: mismatched CxxStackFrame ctor/dtors" → Intermittent PROCESS-CRASH | application crashed [@ mozalloc_abort(char const*)] after "ABORT: mismatched CxxStackFrame ctor/dtors"

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 53

11 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=33631991&tree=Mozilla-Aurora

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Updated

11 years ago

Blocks: 970640

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 67

11 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=34561608&tree=Mozilla-Central This continues to fail frequently. Do we care?

Flags: needinfo?(nical.bugzilla)

Flags: needinfo?(benjamin)

Benjamin Smedberg

Comment 69

11 years ago

Yes, it's not test-only; it's showing up in the wild as topcrash bug 970100. This is important.

Blocks: 970100

Flags: needinfo?(benjamin)

Nicolas Silva [:nical]

Comment 70

11 years ago

I think we do care but I am a bit swamped. From the discussion on this bug, my understanding is that gfxPlatform::ShutDown is already too late to tear down ImageBridge. It might just be a matter of finding another place to call ImageBridgeParent::ShutDown() that is during the browser's shutdown, prior to gfxPlatform::ShutDown() and at a point where we can still use a chromium thread's event loop. I don't know where that would be, though.

Nicolas Silva [:nical]

Updated

11 years ago

Flags: needinfo?(nical.bugzilla)

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 76

11 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=35066383&tree=Mozilla-Inbound

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 80

11 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=35360302&tree=Mozilla-Aurora

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 81

11 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=35433705&tree=Fx-Team

(In reply to Nicolas Silva [:nical] from comment #70)
> I think we do care but I am a bit swamped. From the discussion on this bug,
> my understanding is that gfxPlatform::ShutDown is already too late to tear
> down ImageBridge. It might just be a matter of finding another place to call
> ImageBridgeParent::ShutDown() that is during the browser's shutdown, prior
> to gfxPlatform::ShutDown() and at a point where we can still use a chromium
> thread's event loop. I don't know where that would be, though.

Any suggestions who might? :)

Flags: needinfo?(nical.bugzilla)

Geoff Brown [:gbrown]

Updated

11 years ago

Blocks: 936226

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 98

11 years ago

Milan, can we please find an owner for this?

Flags: needinfo?(milan)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 102

11 years ago

Benoit, let's design a shutdown sequence to take care of this (see comment 70) and fix it for 31.

Assignee: nobody → bjacob

Flags: needinfo?(nical.bugzilla)

Flags: needinfo?(milan)

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 103

11 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=36407544&tree=Fx-Team

12:33:03 INFO - ###!!! [MessageChannel][Parent][/builds/slave/fx-team-osx64-0000000000000000/build/ipc/glue/MessageChannel.cpp:229] Assertion (mCxxStackFrames.empty()) failed. mismatched CxxStackFrame ctor/dtors
12:33:03 INFO - MessageChannel 'backtrace':
12:33:03 INFO - [(0) in sync PImageBridge::Msg_Stop(actor=2147483647) ]
12:33:03 INFO - remote Interrupt stack guess: 0
12:33:03 INFO - deferred stack size: 0
12:33:03 INFO - out-of-turn Interrupt replies stack size: 0
12:33:03 INFO - Pending queue size: 0, front to back:
12:33:03 INFO - [919] ###!!! ABORT: mismatched CxxStackFrame ctor/dtors: file /builds/slave/fx-team-osx64-0000000000000000/build/ipc/glue/MessageChannel.cpp, line 1722
12:33:03 INFO - [919] ###!!! ABORT: mismatched CxxStackFrame ctor/dtors: file /builds/slave/fx-team-osx64-0000000000000000/build/ipc/glue/MessageChannel.cpp, line 1722

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 107

11 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=36597974&tree=Mozilla-Central

Phil Ringnalda (:philor)

Comment 108

11 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=36699274&tree=Mozilla-Inbound

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 162

11 years ago

We have a lot of orange in e10s caused by bug 989567, which shows up as follows:

https://tbpl.mozilla.org/php/getParsedLog.php?id=37399660&tree=Holly#error0

Maybe half of the debug test runs hit this problem. I'm told that we need to fix this bug in order to fix that. When will someone be able to work on this?

Seth Fowler [:seth] [:s2h]

Comment 173

11 years ago

I hit this just now on OS X 10.9. I was debugging another intermittent orange at the time. I had chaos mode enabled, and I maxed out my CPU cores using the following command:

> parallel yes {1} '>' /dev/null ::: A B C D E F G H I J K L M N O P

I'm posting this in the hopes that it will help someone reproduce this bug.

Seth Fowler [:seth] [:s2h]

Comment 174

11 years ago

Just hit it again after running the test I'm debugging several more times. It seems pretty easy to reproduce this way.

Benoit Jacob [:bjacob] (mostly away)

Assignee

Comment 179

11 years ago

Milan, was there a specific reason why this was assigned to me? This looks like a better fit for nical, who already commented above, and who wrote the patch on bug 763234 that is referred to in comment 2 above. Un-assigning myself to reflect current reality --- feel free to reassign to me.

Assignee: bjacob → nobody

Flags: needinfo?(milan)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 180

11 years ago

Will assign somebody by tomorrow.

Flags: needinfo?(milan)

Nicolas Silva [:nical]

Updated

11 years ago

Assignee: nobody → nical.bugzilla

Nicolas Silva [:nical]

Comment 182

11 years ago

Attached patch Shut down gfx IPDL protocols before the shutdown of XPCOM threads (obsolete)

I am new to XPCOM stuff, so I took a rather naive approach; don't hesitate to tell me if it's completely wrong, there might be dependencies that I am not seeing.

Looking at stack traces and at nsXPComInit.cpp, I see that gfxPlatform::Shutdown happens in (nsComponentManagerImpl::gComponentManager)->Shutdown(), which takes place after nsThreadManager::get()->Shutdown() in ShutdownXPCOM. That's pretty much what Benjamin points out in comment 2. So I added an observer on NS_XPCOM_WILL_SHUTDOWN_OBSERVER_ID, which is triggered before the nsThreadManager shutdown, and moved the gfx IPDL-related stuff there.

Attachment #8404598 - Flags: review?(benjamin)

Benjamin Smedberg

Comment 186

11 years ago

Comment on attachment 8404598
Shut down gfx IPDL protocols before the shutdown of XPCOM threads

I recommend using "xpcom-shutdown" instead of "xpcom-will-shutdown". You also don't need mGfxIpcShutdownObserver at all here: the observer service holds a strong ref, so holding another ref is unnecessary. Also note that bholley just changed the ordering of module destructors here in bug 913138; that's not sufficient for this bug, but you should be aware of it.
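
A standard-library sketch of the registration described in this review: the service keeps a strong reference to each observer, so the caller does not need its own member such as mGfxIpcShutdownObserver. ObserverService, Observer, GfxIpcShutdownObserver and SimulateShutdown are hypothetical stand-ins for nsIObserverService and related interfaces; the topic strings follow the xpcom-will-shutdown, xpcom-shutdown, xpcom-shutdown-threads order discussed in this bug.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for nsIObserver / nsIObserverService.
struct Observer {
  virtual ~Observer() = default;
  virtual void Observe(const std::string& topic) = 0;
};

class ObserverService {
 public:
  // Stores a strong reference, so the caller may drop its own pointer.
  void AddObserver(std::shared_ptr<Observer> obs, const std::string& topic) {
    observers_[topic].push_back(std::move(obs));
  }
  void Notify(const std::string& topic) {
    auto it = observers_.find(topic);
    if (it == observers_.end()) return;
    for (auto& obs : it->second) obs->Observe(topic);
  }

 private:
  std::map<std::string, std::vector<std::shared_ptr<Observer>>> observers_;
};

// Hypothetical observer that would tear down the gfx IPDL protocols.
class GfxIpcShutdownObserver : public Observer {
 public:
  explicit GfxIpcShutdownObserver(std::vector<std::string>* events) : events_(events) {}
  void Observe(const std::string& topic) override {
    events_->push_back("gfx ipc shutdown on " + topic);
  }

 private:
  std::vector<std::string>* events_;
};

// Fires the three shutdown topics in the order discussed in this bug.
void SimulateShutdown(ObserverService& svc) {
  svc.Notify("xpcom-will-shutdown");
  svc.Notify("xpcom-shutdown");
  svc.Notify("xpcom-shutdown-threads");
}
```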

Attachment #8404598 - Flags: review?(benjamin) → review+

Nicolas Silva [:nical]

Comment 187

11 years ago

try push with a bunch of m4 jobs on osx https://tbpl.mozilla.org/?tree=Try&rev=2bcdeb0c90f4

(no longer active)

Updated

11 years ago

Blocks: 986113

(no longer active)

Comment 192

11 years ago

This blocks a 1.4 blocker (bug 986113).

blocking-b2g: --- → 1.4?

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 193

11 years ago

I pushed this patch to holly and I think it still has some issues. I see crashes like this: https://tbpl.mozilla.org/php/getParsedLog.php?id=37591939&tree=Holly#error0 It looks like maybe an issue where code is running after the image bridge is shut down but it still expects the image bridge to be alive? There's also a stack where it looks like we're still trying to do IPC too late: https://tbpl.mozilla.org/php/getParsedLog.php?id=37590588&full=1&branch=holly#error0

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 194

11 years ago

Attached patch sleep

Oh, I should mention that holly is used for e10s testing. I can reproduce the second issue by applying this patch on top of the main one in this bug. Then start a browser, open a new e10s window from the file menu (assuming OMTC is enabled), and quit. I see a crash every time on shutdown.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 195

11 years ago

I also see problems on Windows. https://tbpl.mozilla.org/php/getParsedLog.php?id=37605735&tree=Holly

Nicolas Silva [:nical]

Comment 197

11 years ago

(In reply to Bill McCloskey (:billm) from comment #193)
> I pushed this patch to holly and I think it still has some issues. I see
> crashes like this:
>
> https://tbpl.mozilla.org/php/getParsedLog.php?id=37591939&tree=Holly#error0
>
> It looks like maybe an issue where code is running after the image bridge is
> shut down but it still expects the image bridge to be alive?
>
> There's also a stack where it looks like we're still trying to do IPC too
> late:
>
> https://tbpl.mozilla.org/php/getParsedLog.php?id=37590588&full=1&branch=holly#error0

Now it's media that is shutting down after ImageBridge, which is a problem.

Nicolas Silva [:nical]

Comment 198

11 years ago

Same thing for Widget. I have a patch that moves the shutdown of Media and widget to NS_XPCOM_WILL_SHUTDOWN_OBSERVER_ID. Now I am also hitting a problem with resources that need ImageBridge to get deallocated and that are cycle collected in nsCycleCollector_shutdown which happens way after the shutdown of XPCOM threads. I suppose it won't be possible to shutdown the cycle collector sooner, but perhaps we could trigger a cycle collection between NS_XPCOM_WILL_SHUTDOWN_OBSERVER_ID and NS_XPCOM_SHUTDOWN_OBSERVER_ID? Chris, is it ok to move the shutdown of Media to NS_XPCOM_WILL_SHUTDOWN_OBSERVER_ID?

Flags: needinfo?(cpearce)

Nicolas Silva [:nical]

Comment 199

11 years ago

Attached patch Shut down gfx IPDL protocols after Media and Widget, and before the shutdown of XPCOM threads (obsolete)

Attachment #8404598 - Attachment is obsolete: true

Nicolas Silva [:nical]

Comment 200

11 years ago

Attached patch Shut down gfx IPDL protocols after Media and Widget, and before the shutdown of XPCOM threads (obsolete)

This patch seems to fix it for me. It does the following things:
* Have gfx IPC shut down at NS_XPCOM_SHUTDOWN_OBSERVER_ID so that it happens before XPCOM's thread shutdown.
* Move Media's shutdown to NS_XPCOM_WILL_SHUTDOWN_OBSERVER_ID so that it gets destroyed before gfx's IPC.
* Move Widget's shutdown to NS_XPCOM_WILL_SHUTDOWN_OBSERVER_ID so that it gets destroyed before gfx's IPC.
* Make it possible for ImageContainer to be cycle collected after ImageBridgeChild's destruction without crashing, by adding some checks in ImageBridge. If this happens, the gfx IPDL actors have already cleaned up the memory, so nothing gets leaked.

I asked a lot of reviewers because this fiddles with several modules: Benjamin for the nsXPComInit stuff in general, Chris for moving Media's shutdown, Sotaro for the ImageBridgeChild stuff and Matt for Widget.
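
The last bullet can be sketched as a liveness check: a late-running destructor only talks to the bridge if it still exists, and otherwise relies on shutdown having already reclaimed the shared resources. Bridge, g_bridge, StartBridge, ShutDownBridge, IsBridgeCreated and Container are hypothetical names, not the patch's code, and this single-threaded model ignores which thread the check runs on, a question the patch's own comment about dispatching to the IBC thread raises.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the ImageBridgeChild singleton.
struct Bridge {
  int releases_forwarded = 0;
  void ForwardRelease() { ++releases_forwarded; }
};

static std::unique_ptr<Bridge> g_bridge;  // null before start and after shutdown

void StartBridge() { g_bridge.reset(new Bridge()); }
// In the sketch, shutdown simply destroys the bridge; the assumption (from the
// patch description) is that the IPDL actors already freed the shared memory.
void ShutDownBridge() { g_bridge.reset(); }
bool IsBridgeCreated() { return g_bridge != nullptr; }

// Hypothetical container whose destructor may run late, for example from the
// final cycle collection. It only forwards to the bridge if the bridge exists.
class Container {
 public:
  ~Container() {
    if (IsBridgeCreated()) {
      g_bridge->ForwardRelease();
    } else {
      ++released_after_shutdown;  // bridge gone: nothing left to forward to
    }
  }
  static int released_after_shutdown;
};
int Container::released_after_shutdown = 0;
```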

Attachment #8405225 - Attachment is obsolete: true

Attachment #8405252 - Flags: review?(sotaro.ikeda.g)

Attachment #8405252 - Flags: review?(matt.woodrow)

Attachment #8405252 - Flags: review?(cpearce)

Attachment #8405252 - Flags: review?(benjamin)

Nicolas Silva [:nical]

Comment 201

11 years ago

try push https://tbpl.mozilla.org/?tree=Try&rev=e8bc19cfcacc

Milan Sreckovic [:milan] (needinfo for best results)

Comment 205

11 years ago

(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #192)
> This blocks a 1.4 blocker (bug 986113).

I don't see how an intermittent failure in a debug test would stop us from shipping a phone. If there is an underlying problem, let's open a bug for it with an STR. Using 1.4+ to mean "this is important to work on now" is wrong.

(no longer active)

Comment 206

11 years ago

(In reply to Milan Sreckovic [:milan] from comment #205)
> (In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #192)
> > This blocks a 1.4 blocker (bug 986113).
>
> I don't see how an intermittent failure in a debug test would stop us from
> shipping a phone. If there is an underlying problem, let's open a bug for
> it with an STR. Using 1.4+ to mean "this is important to work on now" is
> wrong.

Please see bug 986113 comment 26. As far as I understand, while this bug is intermittently hit on our testing infrastructure, it's 100% reproducible on e10s and maybe on b2g too. Bug 986113 basically is us detecting several pending tasks during shutdown, and this is one of them while there are others to fix too (please see the dependency list of that bug.)

Preeti Raghunath(:Preeti)

Comment 207

11 years ago

Blocks a blocker

blocking-b2g: 1.4? → 1.4+

Benjamin Smedberg

Comment 208

11 years ago

Comment on attachment 8405252 [details] [diff] [review]
Shut down gfx IPDL protocols after Media and Widget, and before the shutdown of XPCOM threads

diff --git a/gfx/layers/ipc/ImageBridgeChild.h b/gfx/layers/ipc/ImageBridgeChild.h
+ * check whether it is in the IBC thread and dispatching itself in the IBC thread
+ * if is is not is dangerous if we don't also check that ImageBridge is created.

Something's wrong with this sentence.

I'm a little concerned with this in general. What exactly is the dependency of media and widget on the gfx shutdown sequence? Rather than expressing this dependency by moving things around in will-shutdown/shutdown/shutdown-threads, can we explicitly call from XPCOM shutdown into the gfx shutdown code at the correct time (after xpcom-shutdown, before xpcom-shutdown-threads)? I continue to resist the urge to do a full-blown dependency system for startup and shutdown; I'd prefer to make the shutdown sequence explicitly ordered if that won't cause other problems.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 209

11 years ago

(In reply to Preeti Raghunath(:Preeti) from comment #207) > Blocks a blocker Not on b2g, see below. (In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #206) > ... > > Please see bug 986113 comment 26. As far as I understand, while this bug is > intermittently hit on our testing infrastructure, it's 100% reproducible on > e10s and maybe on b2g too. Bug 986113 basically is us detecting several > pending tasks during shutdown, and this is one of them while there are > others to fix too (please see the dependency list of that bug.) "... and maybe on b2g too" and we are making it a blocker for b2g 1.4 release? What is the STR for reproducing this on a b2g device? 100% or otherwise?

Milan Sreckovic [:milan] (needinfo for best results)

Updated

11 years ago

blocking-b2g: 1.4+ → 1.4?

(no longer active)

Comment 210

11 years ago

(In reply to Milan Sreckovic [:milan] from comment #209) > (In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment > #206) > > ... > > > > Please see bug 986113 comment 26. As far as I understand, while this bug is > > intermittently hit on our testing infrastructure, it's 100% reproducible on > > e10s and maybe on b2g too. Bug 986113 basically is us detecting several > > pending tasks during shutdown, and this is one of them while there are > > others to fix too (please see the dependency list of that bug.) > > "... and maybe on b2g too" and we are making it a blocker for b2g 1.4 > release? > > What is the STR for reproducing this on a b2g device? 100% or otherwise? I don't have the answer to this question (and I'm just echoing what Bill said anyway, so needinfo-ing him). Note that due to the nature of bug 986113, we need to fix a number of bugs and gradually make our way towards making this assertion not fire. In an ideal world, we'd know exactly which bugs those are, but we don't have any way of knowing that.

Flags: needinfo?(wmccloskey)

Chris Peterson [:cpeterson]

Comment 219

11 years ago

Mass tracking-e10s flag change. Filter bugmail on "2be0fcce-e36a-4e2c-aa80-0e3d33eb5406".

tracking-e10s: --- → +

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 220

11 years ago

I don't know of any way to trigger this on builds that we ship to users. Bug 986113 is about an assertion, so it only triggers on debug builds. Gregor marked it as blocking, presumably because it's a priority to get debug builds tested on tinderbox. It's pretty easy to make the work_queue_.empty() assertion trip in debug builds. I think if you apply the patch I've posted here and then kill the parent during the sleep call, we'll hit the work_queue_.empty() assertion pretty consistently.

Flags: needinfo?(wmccloskey)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 221

11 years ago

Nical, I pushed the new patch to holly. https://tbpl.mozilla.org/?tree=Holly&rev=d41bfe60b03d I still see a lot of problems in debug runs during shutdown.

Chris Pearce [:cpearce (Not reading bugmail)]

Comment 225

11 years ago

Comment on attachment 8405252 [details] [diff] [review] Shut down gfx IPDL protocols after Media and Widget, and before the shutdown of XPCOM threads Review of attachment 8405252 [details] [diff] [review]: ----------------------------------------------------------------- I think you're better off having a single xpcom-shutdown observer that is responsible for explicitly shutting down these interdependent things in the correct order. I bet there's others we'll discover in future, maybe like the WebRTC media code. i.e. a single observer that calls MediaShutdownManager::Shutdown(), then CompositorParent::ShutDown(), and then, ImageBridgeChild::ShutDown().
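A minimal sketch of the single-observer approach suggested here, assuming stub classes: the three static methods below only append to a log, whereas the real MediaShutdownManager::Shutdown(), CompositorParent::ShutDown() and ImageBridgeChild::ShutDown() do actual teardown. The call order is the one proposed in this comment.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records which (stubbed) shutdown step ran, in order.
std::vector<std::string> gShutdownLog;

// Hypothetical stubs named after the real classes; they do no real work.
namespace stub {
struct MediaShutdownManager {
  static void Shutdown() { gShutdownLog.push_back("MediaShutdownManager"); }
};
struct CompositorParent {
  static void ShutDown() { gShutdownLog.push_back("CompositorParent"); }
};
struct ImageBridgeChild {
  static void ShutDown() { gShutdownLog.push_back("ImageBridgeChild"); }
};
}  // namespace stub

// One observer owns the whole sequence, so the dependency order is written
// down in a single place instead of being implied by topic choice.
void OnXPCOMShutdown() {
  stub::MediaShutdownManager::Shutdown();  // media holds gfx IPC resources
  stub::CompositorParent::ShutDown();
  stub::ImageBridgeChild::ShutDown();
}
```

The design trade-off is that any new module with a gfx IPC dependency must be added to this one function by hand, which is exactly what makes the order explicit and reviewable.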

Attachment #8405252 - Flags: review?(cpearce) → review-

Chris Pearce [:cpearce (Not reading bugmail)]

Comment 226

11 years ago

(In reply to Nicolas Silva [:nical] from comment #198) > Chris, is it ok to move the shutdown of Media to > NS_XPCOM_WILL_SHUTDOWN_OBSERVER_ID? I don't think this is a problem per se, but as I said above, I think explicitly codifying the shutdown dependency order in a single xpcom-shutdown observer is the most robust solution here.

Flags: needinfo?(cpearce)

Nicolas Silva [:nical]

Comment 228

11 years ago

(In reply to Chris Pearce (:cpearce) from comment #225) > Comment on attachment 8405252 [details] [diff] [review] > Shut down gfx IPDL protocols after Media and Widget, and before the shutdown > of XPCOM threads > > Review of attachment 8405252 [details] [diff] [review]: > ----------------------------------------------------------------- > > I think you're better off having a single xpcom-shutdown observer that is > responsible for explicitly shutting down these interdependent things in the > correct order. I bet there's others we'll discover in future, maybe like the > WebRTC media code. > > i.e. a single observer that calls MediaShutdownManager::Shutdown(), then > CompositorParent::ShutDown(), and then, ImageBridgeChild::ShutDown(). Sounds good, where would this observer be? in xpcom/?

Nicolas Silva [:nical]

Comment 229

11 years ago

(In reply to Benjamin Smedberg [:bsmedberg] from comment #208) > I'm a little concerned with this in general. What exactly is the dependency > of media and widget on the gfx shutdown sequence? Among other things, Widget owns the LayerManager and the GLContext that is used by the compositor. When the nsBaseWidget object is destroyed, there is a lot of gfx/ipc work going on to go through all of the layers' resources and clean up things that have a GL resource. This really needs to happen before we lose the ability to use ImageBridge/CompositorParent. Note that destroying a widget is not always part of Gecko's shutdown; it also happens any time you close a window. Media holds strong references to resources allocated by the gfx IPDL protocols (in order to be able to share the video frames with the compositor without making copies), and more generally talks directly to the ImageBridge (to send video frames to the compositor, allocate them, destroy them, etc.). > Rather than expressing > this dependency by moving things around in > will-shutdown/shutdown/shutdown-threads, can we explicitly call from XPCOM > shutdown into the gfx shutdown code at the correct time (after > xpcom-shutdown, before xpcom-shutdown-threads)? > > I continue to resist the urge to do a full-blown dependency system for > startup and shutdown; I'd prefer to make the shutdown sequence > explicitly ordered if that won't cause other problems. Sounds like what Chris asked; I am happy with this, although I am not sure where that code should live. At this point I suppose it would make sense to just call these shutdown functions directly from ShutdownXPCOM, but I can imagine people not liking the idea.

Nicolas Silva [:nical]

Comment 230

11 years ago

(In reply to Bill McCloskey (:billm) from comment #221) > Nical, I pushed the new patch to holly. > https://tbpl.mozilla.org/?tree=Holly&rev=d41bfe60b03d > I still see a lot of problems in debug runs during shutdown. Looking at them: it happens in some IPDL event-loop code being run in a thread while XPCOM is doing the thread shutdown on the main thread. I don't know which IPDL thread is running that late, but it should not be ImageBridge or CompositorParent, since the whole point of the patch is to make sure those two are cleaned up earlier, so I'd track it as another bug. There is also ASSERTION: unexpected event topic: 'strcmp(aTopic, NS_XPCOM_WILL_SHUTDOWN_OBSERVER_ID) == 0', which is a mistake in my patch (I was supposed to check NS_XPCOM_SHUTDOWN_OBSERVER_ID in gfxPlatform instead).

Nicolas Silva [:nical]

Comment 231

11 years ago

Attached patch Shut down gfx IPDL protocols after Media and Widget, and before the shutdown of XPCOM threads (obsolete)

v2: gfx ipc shutdown notifies NS_XPCOM_GFX_IPC_SHUTDOWN_OBSERVER_ID so that modules that depend on gfx's ipc get a chance to shutdown in time.

Attachment #8405252 - Attachment is obsolete: true

Attachment #8405252 - Flags: review?(sotaro.ikeda.g)

Attachment #8405252 - Flags: review?(matt.woodrow)

Attachment #8405252 - Flags: review?(benjamin)

Attachment #8406047 - Flags: review?(sotaro.ikeda.g)

Attachment #8406047 - Flags: review?(matt.woodrow)

Attachment #8406047 - Flags: review?(cpearce)

Attachment #8406047 - Flags: review?(benjamin)

Nicolas Silva [:nical]

Comment 232

11 years ago

try push: https://tbpl.mozilla.org/?tree=Try&rev=a8339515ee15

Benjamin Smedberg

Comment 233

11 years ago

Comment on attachment 8406047 [details] [diff] [review] Shut down gfx IPDL protocols after Media and Widget, and before the shutdown of XPCOM threads This is still pretty generic. Rather than having a new observer topic can we just explicitly call a static MediaShutdownManager::Shutdown() method from http://hg.mozilla.org/mozilla-central/annotate/215080b813a7/xpcom/build/nsXPComInit.cpp#l878 ?

Nicolas Silva [:nical]

Comment 234

11 years ago

(In reply to Benjamin Smedberg [:bsmedberg] from comment #233) > Comment on attachment 8406047 [details] [diff] [review] > Shut down gfx IPDL protocols after Media and Widget, and before the shutdown > of XPCOM threads > > This is still pretty generic. Rather than having a new observer topic can we > just explicitly call a static MediaShutdownManager::Shutdown() method from > http://hg.mozilla.org/mozilla-central/annotate/215080b813a7/xpcom/build/ > nsXPComInit.cpp#l878 ? For media it's easy since MediaShutdownManager is a singleton. For widget, it's more annoying because each widget has its own observer listening to the shutdown event. I don't think it's worth the effort of keeping track of every widget manually just for the sake of shutting them down manually. We can add a specific observer topic for widgets, that we'd fire explicitly in ShutdownXPCOM. But then we wouldn't gain much compared with the current patch.

Flags: needinfo?(cpearce)

Nicolas Silva [:nical]

Comment 235

11 years ago

Whoops, the needinfo was not intentional.

Flags: needinfo?(cpearce)

Nicolas Silva [:nical]

Comment 239

11 years ago

Attached patch v3:Shut down gfx IPDL protocols after Media and Widget, and before the shutdown of XPCOM threads (obsolete)

v3, simpler: invokes gfx's IPC shutdown directly in ShutdownXPCOM and doesn't modify widget or media.

Attachment #8406047 - Attachment is obsolete: true

Attachment #8406047 - Flags: review?(sotaro.ikeda.g)

Attachment #8406047 - Flags: review?(matt.woodrow)

Attachment #8406047 - Flags: review?(cpearce)

Attachment #8406047 - Flags: review?(benjamin)

Attachment #8406262 - Flags: review?(sotaro.ikeda.g)

Attachment #8406262 - Flags: review?(benjamin)

Benjamin Smedberg

Updated

11 years ago

Attachment #8406262 - Flags: review?(benjamin) → review+

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 245

11 years ago

It looks like the IsCreated() assertion is still failing: https://tbpl.mozilla.org/php/getParsedLog.php?id=37790722&tree=Holly

Sotaro Ikeda [:sotaro]

Comment 246

11 years ago

Comment on attachment 8406262 [details] [diff] [review] v3:Shut down gfx IPDL protocols after Media and Widget, and before the shutdown of XPCOM threads Review of attachment 8406262 [details] [diff] [review]: ----------------------------------------------------------------- Looks good. Criteria of adding IsCreated() checks to ImageBridgeChild seem not clear. Can you explain about it?

Attachment #8406262 - Flags: review?(sotaro.ikeda.g) → review+

Sotaro Ikeda [:sotaro]

Comment 248

11 years ago

(In reply to Sotaro Ikeda [:sotaro] from comment #246) > Comment on attachment 8406262 [details] [diff] [review] > v3:Shut down gfx IPDL protocols after Media and Widget, and before the > shutdown of XPCOM threads > > Review of attachment 8406262 [details] [diff] [review]: > ----------------------------------------------------------------- > > Looks good. Criteria of adding IsCreated() checks to ImageBridgeChild seem > not clear. Can you explain about it? It seems like IsCreated() checks were added to all ImageBridgeChild functions except the startup static functions.

Nicolas Silva [:nical]

Comment 254

11 years ago

(In reply to Sotaro Ikeda [:sotaro] from comment #248) > (In reply to Sotaro Ikeda [:sotaro] from comment #246) > > Comment on attachment 8406262 [details] [diff] [review] > > v3:Shut down gfx IPDL protocols after Media and Widget, and before the > > shutdown of XPCOM threads > > > > Review of attachment 8406262 [details] [diff] [review]: > > ----------------------------------------------------------------- > > > > Looks good. Criteria of adding IsCreated() checks to ImageBridgeChild seem > > not clear. Can you explain about it? > > It seems like adding IsCreated() to all ImageBridgeChild functions except > start up static functions. Something like that. I added IsCreated checks to static methods that we may (unintentionally) end up calling after ImageBridge is destroyed (for example, some media resources holding on to ImageClients and TextureClients that are cycle collected after the shutdown of ImageBridge). I tried to fiddle with cycle collection to make sure all of these resources are collected earlier than the gfx IPC shutdown, but I didn't manage to get rid of all of them. So the safest route is to make sure these ImageBridge methods don't crash Gecko when they are called too late. The alternative would have been to force the callers to check ImageBridgeChild::IsCreated() before doing anything with the ImageBridge, but it's a bit heavy and over time people will forget to do it.
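The guard pattern described here, checking inside the static method rather than in every caller, can be sketched as below. BridgeModel is a hypothetical, single-threaded stand-in for ImageBridgeChild; the real class also dispatches work to its own thread, which this sketch does not model.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a singleton bridge with static entry points.
class BridgeModel {
 public:
  static void Create() { sInstance.reset(new BridgeModel()); }
  static void Destroy() { sInstance.reset(); }
  static bool IsCreated() { return sInstance != nullptr; }

  // Static entry point that late callers (e.g. objects freed by cycle
  // collection after shutdown) may still reach. It returns false instead
  // of dereferencing a bridge that no longer exists.
  static bool ReleaseClient() {
    if (!IsCreated()) {
      return false;  // too late: the bridge has already been torn down
    }
    sInstance->mReleased += 1;
    return true;
  }

  // Number of releases handled by the live bridge, or -1 if it is gone.
  static int ReleasedCount() { return IsCreated() ? sInstance->mReleased : -1; }

 private:
  int mReleased = 0;
  static std::unique_ptr<BridgeModel> sInstance;
};

std::unique_ptr<BridgeModel> BridgeModel::sInstance;
```

A late caller gets a harmless false instead of touching a destroyed bridge, and callers need no bookkeeping of their own.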

Nicolas Silva [:nical]

Comment 255

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/f52300725e6f

Carsten Book [:Tomcat]

Comment 256

11 years ago

(In reply to Nicolas Silva [:nical] from comment #255) > https://hg.mozilla.org/integration/mozilla-inbound/rev/f52300725e6f Sorry, had to back out for failures like https://tbpl.mozilla.org/php/getParsedLog.php?id=37828798&tree=Mozilla-Inbound

Nicolas Silva [:nical]

Comment 258

11 years ago

I was a bit too optimistic when adding that assertion (the one that caused the backout); I removed it and relanded. https://hg.mozilla.org/integration/mozilla-inbound/rev/e17d61d6acf6 I am a bit worried about it failing that much, though; I'll look into it.

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 268

11 years ago

https://hg.mozilla.org/mozilla-central/rev/e17d61d6acf6 When you're reasonably comfortable with this patch's bake time, we definitely want this uplifted as far as Release Management will let it go.

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla31

Ryan VanderMeulen [:RyanVM]

Reporter

Updated

11 years ago

status-firefox29: --- → affected

status-firefox30: --- → affected

status-firefox31: --- → fixed

status-firefox-esr24: --- → unaffected

Preeti Raghunath(:Preeti)

Updated

11 years ago

blocking-b2g: 1.4? → backlog

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 272

11 years ago

(In reply to TBPL Robot from comment #271) That was on m-c tip :(

Status: RESOLVED → REOPENED

status-firefox31: fixed → affected

Resolution: FIXED → ---

Target Milestone: mozilla31 → ---

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 302

11 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=37888549&tree=Mozilla-Inbound https://tbpl.mozilla.org/php/getParsedLog.php?id=37880403&tree=Mozilla-Inbound

Phil Ringnalda (:philor)

Comment 304

11 years ago

Very reluctantly backed out in https://hg.mozilla.org/mozilla-central/rev/26209c172150 because we're pretty sure that made it considerably worse, rather than better.

Nicolas Silva [:nical]

Comment 317

11 years ago

(In reply to Phil Ringnalda (:philor) from comment #304) > Very reluctantly backed out in > https://hg.mozilla.org/mozilla-central/rev/26209c172150 because we're pretty > sure that made it considerably worse, rather than better. :( These crashes are definitely happening before the shutdown of XPCOM threads. Benjamin, are there other ways to hit this assertion you can think of?

Flags: needinfo?(benjamin)

Benjamin Smedberg

Comment 318

11 years ago

Reading through the log of https://tbpl.mozilla.org/php/getParsedLog.php?id=37889317&full=1&branch=mozilla-inbound

The gecko main thread (thread 6) is in:

4 libxul.so!mozilla::layers::ImageBridgeChild::DestroyBridge() [ImageBridgeChild.cpp:ebcacae1532c : 618 + 0xa]
5 libxul.so!mozilla::layers::ImageBridgeChild::ShutDown() [ImageBridgeChild.cpp:ebcacae1532c : 570 + 0x2]
6 libxul.so!mozilla::ShutdownXPCOM(nsIServiceManager*) [nsXPComInit.cpp:ebcacae1532c : 750 + 0x2]

The crashing thread (thread 25) is at: http://hg.mozilla.org/integration/mozilla-inbound/file/ebcacae1532c/ipc/glue/MessageChannel.cpp#l228

0 libmozalloc.so!mozalloc_abort(char const*) [mozalloc_abort.cpp:ebcacae1532c : 30 + 0x8]
1 libxul.so!NS_DebugBreak [nsDebugImpl.cpp:ebcacae1532c : 421 + 0x6]
2 libxul.so!mozilla::ipc::MessageChannel::DebugAbort(char const*, int, char const*, char const*, bool) const [MessageChannel.cpp:ebcacae1532c : 1722 + 0x1e]
3 libxul.so!mozilla::ipc::MessageChannel::~MessageChannel() [MessageChannel.cpp:ebcacae1532c : 229 + 0x2a]
4 libxul.so!mozilla::layers::PImageBridgeParent::~PImageBridgeParent() [PImageBridgeParent.cpp:ebcacae1532c : 93 + 0x6]
5 libxul.so!mozilla::layers::ImageBridgeParent::~ImageBridgeParent() [ImageBridgeParent.cpp:ebcacae1532c : 61 + 0x16]
6 libxul.so!mozilla::layers::ImageBridgeParent::~ImageBridgeParent() [ImageBridgeParent.cpp:ebcacae1532c : 61 + 0x2]
7 libxul.so!mozilla::AtomicRefCountedWithFinalize<mozilla::layers::ISurfaceAllocator>::Release() [AtomicRefCountedWithFinalize.h:ebcacae1532c : 46 + 0x16]
8 libxul.so!mozilla::layers::DeleteImageBridgeSync [StaticPtr.h:ebcacae1532c : 158 + 0x6]
9 libxul.so!RunnableFunction<void (*)(IPC::Channel*, int), Tuple2<IPC::Channel*, int> >::Run() + 0x1e
10 libxul.so!MessageLoop::RunTask(Task*) [message_loop.cc:ebcacae1532c : 344 + 0xa]
11 libxul.so!MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&) [message_loop.cc:ebcacae1532c : 352 + 0x6]
12 libxul.so!MessageLoop::DoWork() [message_loop.cc:ebcacae1532c : 430 + 0x2]

Use-after-free of the channel is almost certainly still what's happening here. Is this the compositor thread? The other gecko threads are:
* thread 10, in an IPC message loop, probably the IPC thread, blocked
* thread 13, the GC thread, blocked
* thread 14, the JS watchdog, blocked
* threads 15, 18, 19, 20, 23, 26, XPCOM threads, blocked
* thread 16, the hang monitor, blocked
* thread 17, the background hang monitor, blocked
* threads 21, 22, DOM workers, blocked
* thread 24, a chromium thread waiting for something. I don't know what this is, but it's unlikely to be a big deal

So I really think the thing to do here is to fprintf(stderr) in ImageBridgeParent::~ImageBridgeParent() and see whether we are actually deleting this object more than once. AIUI there should only ever be one ImageBridgeParent object per Android process, right? So if we see that printf more than once, we know we have a problem and can start taking stacks of that case.

AtomicRefCountedWithFinalize::Release is a little scary, but assuming that ISurfaceAllocator::ShrinkShmemSectionHeap can't ever refcount itself, I don't see any obvious refcounting errors in the code. Given that the patches make this more reproducible, has anyone tried just catching this in a debugger?
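A sketch of the suggested debugging aid, assuming a hypothetical ParentModel in place of ImageBridgeParent: the destructor prints to stderr and counts destructions, so a second destruction in a process that should only ever have one instance stands out immediately.

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for ImageBridgeParent with debug instrumentation.
struct ParentModel {
  // Process-wide count of destructor calls (atomic: may run on any thread).
  static std::atomic<int> sDestroyed;

  // With at most one instance expected per process, any count above one
  // means something is being destroyed more often than it should be.
  static bool DestroyedMoreThanOnce() { return sDestroyed.load() > 1; }

  ~ParentModel() {
    int n = ++sDestroyed;
    std::fprintf(stderr, "~ParentModel %p (destroyed so far: %d)\n",
                 static_cast<void*>(this), n);
  }
};

std::atomic<int> ParentModel::sDestroyed{0};
```

Printing the address as well as the count helps distinguish "the same object freed twice" from "a second, unexpected instance".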

Flags: needinfo?(benjamin)

Nicolas Silva [:nical]

Comment 322

11 years ago

Thread 6 (main thread) is making a synchronous proxied call to thread 25 (the ImageBridgeChild thread), so the main thread is expected to be waiting with this stack while the other thread is there. Maybe there are still protocols managed by PImageBridge alive, and as we destroy the ImageBridgeChild it forces the destruction of the managed protocols, which generates some messages, and we destroy PImageBridgeParent while these messages are still in flight. Could that make us hit this assertion? I haven't managed to reproduce it on Linux + OMTC yet; I'll try on Android.
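To make the failing invariant concrete, here is a loose, hypothetical model of it. Per the stacks above, the abort is raised from ~MessageChannel, and the message "mismatched CxxStackFrame ctor/dtors" suggests a check that every RAII frame pushed on the channel has been popped by the time the channel is destroyed. ChannelModel and Frame are illustrative names, not Gecko's classes. If the channel were destroyed while a Frame is still alive, the check would see a non-empty stack, and the Frame's own destructor would later touch freed memory, which fits the use-after-free theory in comment 318.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of a channel that tracks "C++ frames" while a message
// is being handled. Not Gecko's MessageChannel implementation.
class ChannelModel {
 public:
  // RAII guard: pushes itself on construction, pops on destruction.
  class Frame {
   public:
    explicit Frame(ChannelModel& c) : mChannel(c) { mChannel.mFrames.push_back(this); }
    ~Frame() { mChannel.mFrames.pop_back(); }
    Frame(const Frame&) = delete;
    Frame& operator=(const Frame&) = delete;

   private:
    ChannelModel& mChannel;
  };

  // The condition a destructor-time check would assert on: every frame
  // constructed on this channel has also been destroyed.
  bool FramesBalanced() const { return mFrames.empty(); }

 private:
  std::vector<Frame*> mFrames;
};
```

Only the balanced case can be exercised safely; the unbalanced case is precisely the undefined behavior the crash points to.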

Nicolas Silva [:nical]

Comment 332

11 years ago

Try push with a new version of the patch that makes the ImageBridge's shutdown a bit more bullet-proof by forcing all its managed protocols to shut down before we destroy it, plus a hack to assert that ImageBridgeParents are not deleted more than once. https://tbpl.mozilla.org/?tree=Try&rev=5377221d5161
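The "shut down managed protocols first" idea can be sketched with hypothetical ManagerActor and ManagedActor types (not IPDL-generated classes). The manager destroys its managed actors explicitly at the top of its destructor; relying on implicit member destruction instead would destroy them only after the destructor body has run.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical managed actor that records when it is destroyed.
struct ManagedActor {
  explicit ManagedActor(std::vector<std::string>* log) : mLog(log) {}
  ~ManagedActor() { mLog->push_back("managed"); }
  std::vector<std::string>* mLog;
};

// Hypothetical manager that owns its managed actors.
class ManagerActor {
 public:
  explicit ManagerActor(std::vector<std::string>* log) : mLog(log) {}

  void AddManaged() { mManaged.push_back(std::make_unique<ManagedActor>(mLog)); }

  // Destroy every managed actor now; safe to call more than once.
  void ShutDownManaged() { mManaged.clear(); }

  ~ManagerActor() {
    ShutDownManaged();  // managed actors go first, explicitly
    mLog->push_back("manager");
  }

 private:
  std::vector<std::string>* mLog;
  std::vector<std::unique_ptr<ManagedActor>> mManaged;
};
```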

Brad Lassey [:blassey] (use needinfo?)

Updated

11 years ago

Blocks: e10s-m1

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 352

11 years ago

Did you mean for that try push to run some tests?

Flags: needinfo?(nical.bugzilla)

Nicolas Silva [:nical]

Comment 354

11 years ago

(In reply to Bill McCloskey (:billm) from comment #352) > Did you mean for that try push to run some tests? heh yes :) my bad

Flags: needinfo?(nical.bugzilla)

Nicolas Silva [:nical]

Comment 375

11 years ago

here is the correct try push https://tbpl.mozilla.org/?tree=Try&rev=f1c288578435

Nicolas Silva [:nical]

Comment 397

11 years ago

Attached patch v4: Same patch + force all protocols managed by ImageBridge to shut down before ImageBridge (obsolete)

The only difference compared to the previous version is in StopImageBridgeSync

Attachment #8406262 - Attachment is obsolete: true

Attachment #8411681 - Flags: review?(sotaro.ikeda.g)

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 405

11 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=38411714&tree=Mozilla-Inbound

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 407

11 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=38403588&tree=Fx-Team These last two manual ones were on debug builds, but AFAICT it's the same underlying issue. Let me know if you want them in a new bug.

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 410

11 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=38423936&tree=Mozilla-Inbound

Sotaro Ikeda [:sotaro]

Updated

11 years ago

Attachment #8411681 - Flags: review?(sotaro.ikeda.g) → review+

Nicolas Silva [:nical]

Comment 423

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/fbb86a21aba0

Ed Morley [:emorley]

Comment 440

11 years ago

(In reply to Nicolas Silva [:nical] from comment #423) > https://hg.mozilla.org/integration/mozilla-inbound/rev/fbb86a21aba0 Backed out for frequent leaks on OS X 10.6: https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=leopard.*t-1&rev=fbb86a21aba0 eg: https://tbpl.mozilla.org/php/getParsedLog.php?id=38486633&tree=Mozilla-Inbound remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/5fec13d66698

Ed Morley [:emorley]

Comment 441

11 years ago

Sorry and 10.8 too, there was just a backlog: https://tbpl.mozilla.org/php/getParsedLog.php?id=38491207&tree=Mozilla-Inbound

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 449

11 years ago

(In reply to TBPL Robot from comment #446) FWIW, this instance occurred on a push that was in between the landing in comment 423 and the backout in comment 440.

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 464

11 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=38560020&tree=Fx-Team

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 467

11 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=38577205&tree=Mozilla-Inbound

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 474

11 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=38604387&tree=Fx-Team

Nicolas Silva [:nical]

Comment 482

11 years ago

Attached patch v5: same patch with missing explicit Release() added when an ImageContainer outlives the ImageBridge (obsolete)

The only difference with the previous patch is the explicit Release() calls in DispatchReleaseImageClient and DispatchReleaseTextureClient. I also filed bug 1002451 to avoid falling in this trap again.
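One plausible reading of this fix, sketched with hypothetical types: when the bridge is already gone, the dispatch helper must still drop the reference it was handed, otherwise the early return leaks the client. ClientModel and DispatchRelease are illustrative stand-ins for the ImageClient/TextureClient release helpers, not their actual code.

```cpp
#include <cassert>

// Hypothetical refcounted client; starts with the one reference being handed off.
struct ClientModel {
  int refCnt = 1;
  void Release() { --refCnt; }
};

// Returns true if the release was handed off to the bridge thread, false if
// it was performed immediately because the bridge is already gone.
// `bridgeAlive` plays the role of an IsCreated()-style check.
bool DispatchRelease(ClientModel& client, bool bridgeAlive) {
  if (!bridgeAlive) {
    // Without this explicit Release(), the early return would drop the
    // reference on the floor and the client would leak.
    client.Release();
    return false;
  }
  // Real code would post a task that calls Release() on the bridge thread;
  // this sketch leaves the reference for that task to drop.
  return true;
}
```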

Attachment #8411681 - Attachment is obsolete: true

Attachment #8413733 - Flags: review?(sotaro.ikeda.g)

Sotaro Ikeda [:sotaro]

Updated

11 years ago

Attachment #8413733 - Flags: review?(sotaro.ikeda.g) → review+

Nicolas Silva [:nical]

Comment 485

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/8008f2e4865e

Comment hidden (Legacy TBPL/Treeherder Robot)

edmorley
https://tbpl.mozilla.org/php/getParsedLog.php?id=38635420&tree=Mozilla-Inbound
Android 4.0 Panda mozilla-inbound opt test plain-reftest-5 on 2014-04-28 08:06:29
revision: 8008f2e4865e
slave: panda-0615

04/28/2014 08:10:01: DEBUG: 25356 ? S 0:00 python /builds/tools/buildfarm/mobile/../utils/retry.py --stderr-regexp ERROR 404: Not Found --fail-if-match wget -q -O/builds/panda-0611/buildbot.tac.new http://slavealloc.pvt.build.mozilla.org/gettac/panda-0611
04/28/2014 08:10:01: DEBUG: 25405 ? S 0:00 python /builds/tools/buildfarm/mobile/../utils/retry.py --stderr-regexp ERROR 404: Not Found --fail-if-match wget -q -O/builds/panda-0620/buildbot.tac.new http://slavealloc.pvt.build.mozilla.org/gettac/panda-0620
04/28/2014 08:10:01: DEBUG: 25407 ? S 0:00 python /builds/tools/buildfarm/mobile/../utils/retry.py --stderr-regexp ERROR 404: Not Found --fail-if-match wget -q -O/builds/panda-0616/buildbot.tac.new http://slavealloc.pvt.build.mozilla.org/gettac/panda-0616
04/28/2014 08:10:01: DEBUG: 25415 ? S 0:00 python /builds/tools/buildfarm/mobile/../utils/retry.py --stderr-regexp ERROR 404: Not Found --fail-if-match wget -q -O/builds/panda-0621/buildbot.tac.new http://slavealloc.pvt.build.mozilla.org/gettac/panda-0621
PROCESS-CRASH | Shutdown | application crashed [@ mozalloc_abort(char const*)]
04-28 08:39:23.382 I/Gecko ( 2085): [2085] ###!!! ABORT: mismatched CxxStackFrame ctor/dtors: file /builds/slave/m-in-and-000000000000000000000/build/ipc/glue/MessageChannel.cpp, line 1731
04-28 08:39:23.382 E/Gecko ( 2085): mozalloc_abort: [2085] ###!!! ABORT: mismatched CxxStackFrame ctor/dtors: file /builds/slave/m-in-and-000000000000000000000/build/ipc/glue/MessageChannel.cpp, line 1731
Return code: 1
04-28 08:39:23.382 I/Gecko ( 2085): [2085] ###!!! ABORT: mismatched CxxStackFrame ctor/dtors: file /builds/slave/m-in-and-000000000000000000000/build/ipc/glue/MessageChannel.cpp, line 1731
04-28 08:39:23.382 E/Gecko ( 2085): mozalloc_abort: [2085] ###!!! ABORT: mismatched CxxStackFrame ctor/dtors: file /builds/slave/m-in-and-000000000000000000000/build/ipc/glue/MessageChannel.cpp, line 1731

Nicolas Silva [:nical]

Comment 494

11 years ago

Looks like a change in the ordering of #includes caused the build to fail on some platforms. This should fix it: https://hg.mozilla.org/integration/mozilla-inbound/rev/8dda04e44e3e

Nicolas Silva [:nical]

Comment 501

11 years ago

Backed out: https://hg.mozilla.org/integration/mozilla-inbound/rev/3aa6bab70380 Looks like we are not seeing the end of this soon.

Benoit Jacob [:bjacob] (mostly away)

Assignee

Comment 504

11 years ago

There remained a diplodocus: https://hg.mozilla.org/integration/mozilla-inbound/rev/fe7f450a5435

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 509

11 years ago

https://hg.mozilla.org/mozilla-central/rev/8008f2e4865e https://hg.mozilla.org/mozilla-central/rev/8dda04e44e3e https://hg.mozilla.org/mozilla-central/rev/3aa6bab70380 https://hg.mozilla.org/mozilla-central/rev/fe7f450a5435

Status: REOPENED → RESOLVED

Closed: 11 years ago → 11 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla32

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 510

11 years ago

This got backed out.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 533

11 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=38758281&tree=Fx-Team

status-firefox29: affected → wontfix

status-firefox32: --- → affected

Target Milestone: mozilla32 → ---

Nicolas Silva [:nical]

Comment 542

11 years ago

Attached patch patch v6 (obsolete)

Sorry for the review spam, Sotaro. The differences from the previous version are in ImageBridgeShutdownStep1 and ImageBridgeShutdownStep2 (just renamed from StopImageBridgeSync and DeleteImageBridgeSync), with the addition of another sync Stop IPC message sent after the sync WillStop, plus a spin of the ImageBridge's event loop to make sure all messages have been processed before we delete ImageBridgeParent.
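The v6 ordering can be modeled single-threaded as below, assuming a hypothetical BridgeLoopModel in place of the ImageBridge thread's message loop: after the WillStop and Stop steps (synchronous IPC in the real patch, plain log entries here), the loop is spun until its queue is empty, including work posted by other work, and only then is the parent deleted.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical single-threaded stand-in for the bridge thread's event loop.
class BridgeLoopModel {
 public:
  void Post(std::function<void()> task) { mQueue.push_back(std::move(task)); }

  // Run tasks until none remain, including tasks posted by running tasks.
  void SpinUntilEmpty() {
    while (!mQueue.empty()) {
      std::function<void()> task = std::move(mQueue.front());
      mQueue.pop_front();
      task();
    }
  }

  bool Empty() const { return mQueue.empty(); }

 private:
  std::deque<std::function<void()>> mQueue;
};

// Returns the order of shutdown steps under the modeled v6 sequence.
std::vector<std::string> ShutdownSequence() {
  std::vector<std::string> steps;
  BridgeLoopModel loop;

  // A message still in flight when shutdown starts; handling it posts a
  // follow-up, as tearing down a managed actor might.
  loop.Post([&] {
    steps.push_back("in-flight message");
    loop.Post([&] { steps.push_back("follow-up message"); });
  });

  steps.push_back("WillStop");  // step 1
  steps.push_back("Stop");      // step 2
  loop.SpinUntilEmpty();        // drain before deleting the parent
  steps.push_back(loop.Empty() ? "delete parent" : "delete parent with pending work");
  return steps;
}
```

Draining to a fixed point, rather than running one pass over the queue, is what catches messages generated by the shutdown itself.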

Attachment #8413733 - Attachment is obsolete: true

Attachment #8415320 - Flags: review?(sotaro.ikeda.g)

Sotaro Ikeda [:sotaro]

Updated

11 years ago

Attachment #8415320 - Flags: review?(sotaro.ikeda.g) → review+

Nicolas Silva [:nical]

Comment 544

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/d26dfd37031a

Nicolas Silva [:nical]

Comment 545

•

11 years ago

I forgot to merge the #include ordering fix into the patch before landing https://hg.mozilla.org/integration/mozilla-inbound/rev/74ef5120ae2f

Comment hidden (Legacy TBPL/Treeherder Robot)

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 550

•

11 years ago

Backed out for frequent Android crashes and less-frequent OSX crashes. https://hg.mozilla.org/integration/mozilla-inbound/rev/a6864d25a859 https://tbpl.mozilla.org/php/getParsedLog.php?id=38814953&tree=Mozilla-Inbound https://tbpl.mozilla.org/php/getParsedLog.php?id=38814012&tree=Mozilla-Inbound https://tbpl.mozilla.org/php/getParsedLog.php?id=38814761&tree=Mozilla-Inbound etc...

Comment hidden (Legacy TBPL/Treeherder Robot)

Ed Morley [:emorley]

Comment 582

•

11 years ago

Nicolas, do you have an idea when you might be able to look at this again? :-) This intermittent failure is #2 on OrangeFactor, and as such worthy of escalation due to https://wiki.mozilla.org/Sheriffing/Test_Disabling_Policy

Flags: needinfo?(nical.bugzilla)

Comment hidden (Legacy TBPL/Treeherder Robot)

Nicolas Silva [:nical]

Comment 599

•

11 years ago

(In reply to Ed Morley [:edmorley UTC+0] from comment #582)
> Nicolas, do you have an idea when you might be able to look at this again?
> :-)
>
> This intermittent failure is #2 on OrangeFactor, and as such worthy of
> escalation due to https://wiki.mozilla.org/Sheriffing/Test_Disabling_Policy

I am still looking at this. The fact that all of my previous attempts at fixing this failed makes me think that the real cause of the crash is (at least partly) not what I thought it was. At the moment I am out of ideas, so I am stuffing the code with logs and kicking off some try pushes to see if any useful info comes out. It is hard to tell how long it will take me to better identify the problem and find a solution that makes it past inbound. I may stay silent in the bug for a few days, but I am still looking into it.

Flags: needinfo?(nical.bugzilla)

Comment hidden (Legacy TBPL/Treeherder Robot)

Nicolas Silva [:nical]

Comment 634

•

11 years ago

My latest attempt is looking good on try so far. One of the issues I was previously hitting was that I (wrongly) assumed that after a synchronous IPDL message returns, the other side (the receiver) is done handling the message. That is almost true, but not quite. So doing this fails (a lot on Android 4, not so much elsewhere):

    child->SendSomeSyncMessage();
    parent = nullptr; // crash: parent's dtor races with the end of the IPDL code that handles the reception of SomeSyncMessage

https://tbpl.mozilla.org/?tree=Try&rev=d3040dd60df0
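The race can be modelled without IPDL. In this hedged sketch, a "synchronous send" returns as soon as the receiver has produced a reply, but the receiver thread keeps running the tail of its handler. `Parent`, `SendSomeSyncMessage` and `gLiveParents` are hypothetical names, not the IPDL API. To stay safe, the sketch uses a keep-alive: the receiver holds its own strong reference until its handler has fully returned. The patch in this bug took a different route and released the last reference on the compositor thread.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <memory>
#include <thread>

// Counts live Parent objects so we can check when the last one is destroyed.
std::atomic<int> gLiveParents{0};

// Hypothetical stand-in for a parent-side actor.
struct Parent {
  Parent() { ++gLiveParents; }
  ~Parent() { --gLiveParents; }
  std::atomic<int> mHandledCount{0};
};

// Simulates a synchronous send. It blocks until the receiver has produced a
// reply, then returns the receiver thread, which may still be running the
// tail of its handler. The receiver owns its own strong reference to
// `aParent`, so the caller may drop its reference right after the reply
// without destroying the object under the handler's feet.
std::thread SendSomeSyncMessage(std::shared_ptr<Parent> aParent, int& aReplyOut) {
  auto reply = std::make_shared<std::promise<int>>();
  std::future<int> replyFuture = reply->get_future();
  std::thread receiver([aParent, reply] {
    reply->set_value(1);       // the caller may resume from this point on...
    aParent->mHandledCount++;  // ...while the handler still touches the actor
  });
  aReplyOut = replyFuture.get();
  return receiver;
}
```

Without the captured `aParent` copy, the `parent = nullptr;` line from the comment could run the destructor while the receiver is still executing `mHandledCount++`. That is undefined behaviour, and it would only show up intermittently.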

Nicolas Silva [:nical]

Comment 635

•

11 years ago

Attached patch: v7 (obsolete)

The most important change here is that the last reference to ImageBridgeParent is now released on the compositor thread, for the reason explained in comment 634.
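Releasing the last reference on a specific thread usually means moving that reference into a task that runs on the thread. Here is a minimal sketch under that assumption. `ImageBridgeParentStub`, `ReleaseOnOwningThread` and the `aPost` callback are hypothetical. In Gecko the task would be posted to the compositor thread's message loop instead of to a fresh `std::thread`.

```cpp
#include <cassert>
#include <memory>
#include <thread>
#include <utility>

// Records which thread ran the destructor.
std::thread::id gDestroyedOn;

// Hypothetical stand-in for ImageBridgeParent.
struct ImageBridgeParentStub {
  ~ImageBridgeParentStub() { gDestroyedOn = std::this_thread::get_id(); }
};

// Moves the caller's (last) reference into a task handed to `aPost`, which is
// expected to run it on the owning thread. The destructor therefore runs
// there, not on whichever thread happened to drop the reference.
template <typename T, typename PostFn>
void ReleaseOnOwningThread(std::shared_ptr<T>&& aRef, PostFn&& aPost) {
  aPost([ref = std::move(aRef)]() mutable { ref.reset(); });
}
```

The reference is moved rather than copied, so no second owner is left behind on the calling thread. If a copy were left there, that copy could end up being the last one released, which would defeat the purpose.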

Attachment #8415320 - Attachment is obsolete: true

Comment hidden (Legacy TBPL/Treeherder Robot)

Nicolas Silva [:nical]

Comment 662

•

11 years ago

Comment on attachment 8417283
v7

Darn, forgot to set the r? flag.

Attachment #8417283 - Flags: review?(sotaro.ikeda.g)

Comment hidden (Legacy TBPL/Treeherder Robot)

Sotaro Ikeda [:sotaro]

Updated

•

11 years ago

Attachment #8417283 - Flags: review?(sotaro.ikeda.g) → review+

Comment hidden (Legacy TBPL/Treeherder Robot)

Nicolas Silva [:nical]

Comment 689

•

11 years ago

The rebase was non-trivial so here is another try push https://tbpl.mozilla.org/?tree=Try&rev=7b6ecc72a79d

Comment hidden (Legacy TBPL/Treeherder Robot)

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 695

•

11 years ago

(In reply to Nicolas Silva [:nical] from comment #689)
> The rebase was non-trivial so here is another try push
> https://tbpl.mozilla.org/?tree=Try&rev=7b6ecc72a79d

So green. So wow!

Ed Morley [:emorley]

Comment 696

•

11 years ago

Thank you for persisting with this :-)

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 697

•

11 years ago

Nicolas, I don't see you on IRC, but hopefully you don't mind if I go ahead and push this :)

Keywords: checkin-needed

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 698

•

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/d7d7cc47bcc6

Flags: in-testsuite+

Keywords: checkin-needed

Comment hidden (Legacy TBPL/Treeherder Robot)

Ed Morley [:emorley]

Comment 700

•

11 years ago

There's breakage on B2G (non-unified builds): https://tbpl.mozilla.org/php/getParsedLog.php?id=39207487&tree=Mozilla-Inbound#error0

07:08:17 INFO - ../../../gecko/xpcom/build/nsXPComInit.cpp: In function 'nsresult mozilla::ShutdownXPCOM(nsIServiceManager*)':
07:08:17 ERROR - ../../../gecko/xpcom/build/nsXPComInit.cpp:797: error: 'mozilla::layers::SharedBufferManagerChild' has not been declared

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 701

•

11 years ago

Backed out for bustage :( https://hg.mozilla.org/integration/mozilla-inbound/rev/df2472ecd34f https://tbpl.mozilla.org/php/getParsedLog.php?id=39207487&tree=Mozilla-Inbound

Comment hidden (Legacy TBPL/Treeherder Robot)

Nicolas Silva [:nical]

Comment 708

•

11 years ago

Fixed the missing #include and landed https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=86f9003c1251

Nicolas Silva [:nical]

Comment 709

•

11 years ago

(after a good looking try push: https://tbpl.mozilla.org/?tree=Try&rev=dd1d8724cf39 in which the red stuff seems to be infrastructure problems)

Comment hidden (Legacy TBPL/Treeherder Robot)

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 716

•

11 years ago

I had to back this out for causing frequent mochitest-e10s-2 shutdown hangs as tracked in bug 1007284. https://hg.mozilla.org/integration/mozilla-inbound/rev/8a5a9a06f59a

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 717

•

11 years ago

But not a single Android/OSX crash on inbound while this was landed!

Comment hidden (Legacy TBPL/Treeherder Robot)

:nigelb
https://tbpl.mozilla.org/php/getParsedLog.php?id=39614356&tree=Fx-Team
Android 4.0 Panda fx-team opt test plain-reftest-3 on 2014-05-13 18:54:22
revision: 203a4f407bcb
slave: panda-0285

PROCESS-CRASH | Shutdown | application crashed [@ mozalloc_abort(char const*)]
05-13 19:29:33.000 I/Gecko ( 2091): [2091] ###!!! ABORT: mismatched CxxStackFrame ctor/dtors: file /builds/slave/fx-team-and-000000000000000000/build/ipc/glue/MessageChannel.cpp, line 1731
05-13 19:29:33.000 E/Gecko ( 2091): mozalloc_abort: [2091] ###!!! ABORT: mismatched CxxStackFrame ctor/dtors: file /builds/slave/fx-team-and-000000000000000000/build/ipc/glue/MessageChannel.cpp, line 1731
Return code: 1
ConnectionError: HTTPSConnectionPool(host='blobupload.elasticbeanstalk.com', port=443): Max retries exceeded with url: /blobs/sha512/5cd40b3324f799f591ffe48b22fad056bc5f00ebdc9f81c810abfa9bab39efc062fd0bdbe2ffa91819a52f416cc5b7aad5f428fd241a37d116537dd0672f25f0 (Caused by <class 'socket.error'>: [Errno 110] Connection timed out)
ConnectionError: HTTPSConnectionPool(host='blobupload.elasticbeanstalk.com', port=443): Max retries exceeded with url: /blobs/sha512/6963a1ba519d1f7808ffa5cf3845f05244ddd74c94ae18e7fba47adb5acf59bb777ee95fdce20bcf271b4dab4bf326878ca1c512e89aa04aadc62439f4ae4e68 (Caused by <class 'socket.error'>: [Errno 110] Connection timed out)
ConnectionError: HTTPSConnectionPool(host='blobupload.elasticbeanstalk.com', port=443): Max retries exceeded with url: /blobs/sha512/2dc94496d8c5ba6b13b1f6686d0edce5a461657f83be4c232da19d38b4d5315f41371dd8c002da6dbe408372407ae0b9fd2327161621aee335d1de23a9379d1b (Caused by <class 'socket.error'>: [Errno 110] Connection timed out)

Comment hidden (Legacy TBPL/Treeherder Robot)

Nicolas Silva [:nical]

Comment 875

•

11 years ago

Quick update: I have a patch that fixes the crash but regresses M-e10s 2. For this M-e10s failure, if I intentionally leak the compositor thread, we no longer crash. The catch is that the leak is detected and the debug tests turn into a sea of oranges. So it looks like something is still chatting with the compositor thread after we shut it down, but only in M-e10s 2 and only about 40% of the time. When the M-e10s 2 crash happens, the crashing thread has nothing in its stack trace. The only thread with a meaningful stack trace is the main thread. It is in ParentImpl::ShutdownBackgroundThread, which is triggered by something reacting to NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID.

Comment hidden (Legacy TBPL/Treeherder Robot)

Nicolas Silva [:nical]

Comment 879

•

11 years ago

I tweaked the shutdown order some more: this one looks promising: https://tbpl.mozilla.org/?tree=Try&rev=031b669d1e23 (e10s M2 retriggers will confirm, and then I'll do a push with all platforms)

Nicolas Silva [:nical]

Comment 880

•

11 years ago

Fingers crossed: https://tbpl.mozilla.org/?tree=Try&rev=fb1c8304710b

Comment hidden (Legacy TBPL/Treeherder Robot)

Nicolas Silva [:nical]

Comment 887

•

11 years ago

new try https://tbpl.mozilla.org/?tree=Try&rev=3c95ea504630

Comment hidden (Legacy TBPL/Treeherder Robot)

Nicolas Silva [:nical]

Comment 890

•

11 years ago

Attached patch: v10293801 same things with a few tweaks to the shutdown order to make it pass tests.

There are very few differences from the previous version. One is that destruction of the compositor thread is now triggered right after the NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID notification rather than just before it, because something is still trying to interact with the thread when reacting to this event. Also, the ImageBridgeParent actor is deleted before the thread is destroyed, but not right after the shutdown handshake. Otherwise bad timing can make us delete the actor while some IPDL code is still running.
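The ordering rules above can be written down as "A before B" pairs and checked mechanically. This is a hedged sketch: the step labels paraphrase the description and are not Gecko function names, and `RespectsOrdering` is a hypothetical helper. The description does not say whether the ImageBridgeParent deletion comes before or after the observer notification, so that pair is left unconstrained. The "not right after the handshake" rule is about timing rather than order, so it is not encoded either.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// (before, after): `before` must appear earlier in the shutdown sequence.
using OrderingConstraint = std::pair<std::string, std::string>;

// Hypothetical step labels paraphrasing the description above.
const std::vector<OrderingConstraint> kShutdownConstraints = {
    {"ImageBridge shutdown handshake", "delete ImageBridgeParent"},
    {"delete ImageBridgeParent", "destroy compositor thread"},
    {"notify NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID", "destroy compositor thread"},
};

// Returns true if every step named by a constraint is present in `aSequence`
// and the steps appear in the required order.
bool RespectsOrdering(const std::vector<std::string>& aSequence,
                      const std::vector<OrderingConstraint>& aConstraints) {
  auto positionOf = [&aSequence](const std::string& aStep) {
    return std::find(aSequence.begin(), aSequence.end(), aStep) - aSequence.begin();
  };
  const auto missing = static_cast<std::ptrdiff_t>(aSequence.size());
  for (const auto& [before, after] : aConstraints) {
    const auto b = positionOf(before);
    const auto a = positionOf(after);
    if (b == missing || a == missing || b >= a) {
      return false;
    }
  }
  return true;
}
```

A sequence that destroys the compositor thread just before the notification fails the third pair, which matches the change described in this comment. The review in comment 892 adds another pair of the same kind: AsyncTransactionTracker::Finalize() after CompositorParent::ShutDown().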

Attachment #8417283 - Attachment is obsolete: true

Attachment #8423135 - Flags: review?(sotaro.ikeda.g)

Comment hidden (Legacy TBPL/Treeherder Robot)

Sotaro Ikeda [:sotaro]

Comment 892

•

11 years ago

Comment on attachment 8423135
v10293801 same things with a few tweaks to the shutdown order to make it pass tests.

Review of attachment 8423135:
-----------------------------------------------------------------

review+ if the comments are addressed.

::: gfx/thebes/gfxPlatform.cpp
@@ +505,5 @@
>      gfxPrefs::DestroySingleton();
>      gfxFont::DestroySingletons();
>
>      delete gPlatform;
>      gPlatform = nullptr;

Is it safe to delete the above objects before the CompositorParent::ShutDown() call?

::: xpcom/build/nsXPComInit.cpp
@@ +796,5 @@
> +    layers::ImageBridgeChild::ShutDown();
> +#ifdef MOZ_WIDGET_GONK
> +    layers::SharedBufferManagerChild::ShutDown();
> +#endif
> +    layers::AsyncTransactionTracker::Finalize();

AsyncTransactionTracker::Finalize() needs to be called after CompositorParent::ShutDown().

Attachment #8423135 - Flags: review?(sotaro.ikeda.g) → review+

Comment hidden (Legacy TBPL/Treeherder Robot)

Nicolas Silva [:nical]

Comment 918

•

11 years ago

Addressed sotaro's comments and landed: https://hg.mozilla.org/integration/mozilla-inbound/rev/9d4726566626

Comment hidden (Legacy TBPL/Treeherder Robot)

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 923

•

11 years ago

This caused mochitest crashes that contributed to a large bustage pileup on inbound and led to having to revert to a last-good cset, inconveniencing other innocent developers as well. https://hg.mozilla.org/integration/mozilla-inbound/rev/eb2a6f7785a2

Comment hidden (Legacy TBPL/Treeherder Robot)

Nicolas Silva [:nical]

Comment 958

•

11 years ago

yet another try push: https://tbpl.mozilla.org/?tree=Try&rev=3ea963f05d43

Comment hidden (Legacy TBPL/Treeherder Robot)

KWierso
https://tbpl.mozilla.org/php/getParsedLog.php?id=39978409&tree=Fx-Team
Android 4.0 Panda fx-team opt test plain-reftest-7 on 2014-05-19 16:27:27
revision: f72f42617d0e
slave: panda-0291

PROCESS-CRASH | Shutdown | application crashed [@ mozalloc_abort(char const*)]
05-19 16:59:54.351 I/Gecko ( 2096): [2096] ###!!! ABORT: mismatched CxxStackFrame ctor/dtors: file /builds/slave/fx-team-and-000000000000000000/build/ipc/glue/MessageChannel.cpp, line 1731
05-19 16:59:54.351 E/Gecko ( 2096): mozalloc_abort: [2096] ###!!! ABORT: mismatched CxxStackFrame ctor/dtors: file /builds/slave/fx-team-and-000000000000000000/build/ipc/glue/MessageChannel.cpp, line 1731
Return code: 1
ConnectionError: HTTPSConnectionPool(host='blobupload.elasticbeanstalk.com', port=443): Max retries exceeded with url: /blobs/sha512/f3d985f14618c5fa53613c03801bec4cbbc0faa8ce109df1980f871803efe7afc3e551cf8c4b6b6aa53cb75e9fc70d3dd23c68fc62bfef58e475d724f4a42a27 (Caused by <class 'socket.error'>: [Errno 110] Connection timed out)
ConnectionError: HTTPSConnectionPool(host='blobupload.elasticbeanstalk.com', port=443): Max retries exceeded with url: /blobs/sha512/450b8af3c444f4e6ec738706120d40d2d53edfaf3308788221099f0bff2f9a2cdd61fe1646ec40828ebcf78777059defa6c59ab7c2e1b62a26299b409e52145c (Caused by <class 'socket.error'>: [Errno 110] Connection timed out)
ConnectionError: HTTPSConnectionPool(host='blobupload.elasticbeanstalk.com', port=443): Max retries exceeded with url: /blobs/sha512/193b31764e2bf54edafc18b35c2ca13bdbad439ef238c723212771b7fd8dae9b3242eee8fec859730b846bf08b34c6938cc4b120906809232871d668078065cd (Caused by <class 'socket.error'>: [Errno 110] Connection timed out)

Comment hidden (Legacy TBPL/Treeherder Robot)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Updated

•

11 years ago

Blocks: e10s-gfx

Comment hidden (Legacy TBPL/Treeherder Robot)

Nicolas Silva [:nical]

Comment 1010

•

11 years ago

Another try push: https://tbpl.mozilla.org/?tree=Try&rev=a83141e578d6

In my last try pushes I have seen "X bad drawable" fatal assertions show up on M-e10s, which I haven't figured out. They also happen when we don't use TextureClientX11, so at least they are not related to the lifetime of a TextureClient. I can also fairly easily reproduce the X11 crash without my patches, so I don't know whether I am making it more likely. I didn't see it a week ago and I didn't modify anything X11-related, so I'm inclined to think it's not my patches. Still, this got backed out so many times that I have become rather pessimistic about whether random breakage comes from this patch or not.

Comment hidden (Legacy TBPL/Treeherder Robot)

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 1015

•

11 years ago

Bug 1005056 is tracking the X11 issues, FWIW. They're pretty frequent in crashtest-ipc as well.

Comment hidden (Legacy TBPL/Treeherder Robot)

philor
https://tbpl.mozilla.org/php/getParsedLog.php?id=40136186&tree=B2g-Inbound
Android 4.0 Panda b2g-inbound opt test plain-reftest-5 on 2014-05-21 15:36:30
revision: 58aa8da5a45d
slave: panda-0736

05/21/2014 15:45:04: DEBUG: 27029 ? S 0:00 python /builds/tools/buildfarm/mobile/../utils/retry.py --stderr-regexp ERROR 404: Not Found --fail-if-match wget -q -O/builds/panda-0734/buildbot.tac.new http://slavealloc.pvt.build.mozilla.org/gettac/panda-0734
05/21/2014 15:45:04: DEBUG: 27104 ? S 0:00 python /builds/tools/buildfarm/mobile/../utils/retry.py --stderr-regexp ERROR 404: Not Found --fail-if-match wget -q -O/builds/panda-0743/buildbot.tac.new http://slavealloc.pvt.build.mozilla.org/gettac/panda-0743
05/21/2014 15:45:04: DEBUG: 27106 ? S 0:00 python /builds/tools/buildfarm/mobile/../utils/retry.py --stderr-regexp ERROR 404: Not Found --fail-if-match wget -q -O/builds/panda-0739/buildbot.tac.new http://slavealloc.pvt.build.mozilla.org/gettac/panda-0739
05/21/2014 15:45:04: DEBUG: 27110 ? S 0:00 python /builds/tools/buildfarm/mobile/../utils/retry.py --stderr-regexp ERROR 404: Not Found --fail-if-match wget -q -O/builds/panda-0744/buildbot.tac.new http://slavealloc.pvt.build.mozilla.org/gettac/panda-0744
05/21/2014 15:45:04: DEBUG: 27113 ? S 0:00 python /builds/tools/buildfarm/mobile/../utils/retry.py --stderr-regexp ERROR 404: Not Found --fail-if-match wget -q -O/builds/panda-0745/buildbot.tac.new http://slavealloc.pvt.build.mozilla.org/gettac/panda-0745
PROCESS-CRASH | Shutdown | application crashed [@ mozalloc_abort(char const*)]
05-21 16:10:25.132 I/Gecko ( 2113): [2113] ###!!! ABORT: mismatched CxxStackFrame ctor/dtors: file /builds/slave/b2g-in-and-0000000000000000000/build/ipc/glue/MessageChannel.cpp, line 1732
05-21 16:10:25.132 E/Gecko ( 2113): mozalloc_abort: [2113] ###!!! ABORT: mismatched CxxStackFrame ctor/dtors: file /builds/slave/b2g-in-and-0000000000000000000/build/ipc/glue/MessageChannel.cpp, line 1732
Return code: 1
requests.exceptions.SSLError: [Errno 1] _ssl.c:504: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
Return code: 1

Comment hidden (Legacy TBPL/Treeherder Robot)

Nicolas Silva [:nical]

Comment 1084

•

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/7a05dad0a4a2

Comment hidden (Legacy TBPL/Treeherder Robot)

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 1094

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/7a05dad0a4a2

Status: REOPENED → RESOLVED

Closed: 11 years ago → 11 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla32

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 1095

•

11 years ago

*cues Hallelujah chorus* If anybody deserves a badge for persistence, it's you, Nicolas :)

Comment hidden (Legacy TBPL/Treeherder Robot)

(no longer active)

Comment 1097

•

11 years ago

Indeed, fantastic job! :-)

Comment hidden (Legacy TBPL/Treeherder Robot)

Benoit Jacob [:bjacob] (mostly away)

Assignee

Comment 1100

•

11 years ago

\o/ Congrats! Nit: some fprintf's were accidentally added by this patch.

Comment hidden (Legacy TBPL/Treeherder Robot)

Ed Morley [:emorley]

Comment 1103

•

11 years ago

(In reply to Benoit Jacob [:bjacob] from comment #1100)
> \o/ Congrats!
>
> Nit: some fprintf's were accidentally added by this patch.

Flags: needinfo?(nical.bugzilla)

Nicolas Silva [:nical]

Comment 1104

•

11 years ago

(In reply to Benoit Jacob [:bjacob] from comment #1100)
> \o/ Congrats!
>
> Nit: some fprintf's were accidentally added by this patch.

Oh boy. --> Bug 1016321

Flags: needinfo?(nical.bugzilla)

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 1105

•

11 years ago

Nicolas, can you take a look at the feasibility of uplifting this to Aurora/Beta? These crashes are pretty frequent on 30/31 as well. Thanks!

status-b2g-v1.3: --- → wontfix

status-b2g-v1.3T: --- → wontfix

status-b2g-v1.4: --- → affected

status-b2g-v2.0: --- → fixed

status-firefox32: affected → fixed

Comment hidden (Legacy TBPL/Treeherder Robot)

Ryan VanderMeulen [:RyanVM]

Reporter

Updated

•

11 years ago

Blocks: 984320

Ed Morley [:emorley]

Comment 1110

•

10 years ago

Tweaking summary to avoid false positives on TBPL.

Summary: Intermittent PROCESS-CRASH | application crashed [@ mozalloc_abort(char const*)] after "ABORT: mismatched CxxStackFrame ctor/dtors" → Intermittent Android Shutdown "ABORT: mismatched CxxStackFrame ctor/dtors" [@ mozalloc_abort(char const*)]

Sotaro Ikeda [:sotaro]

Comment 1111

•

10 years ago

(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #1105)
> Nicolas, can you take a look at the feasibility of uplifting this to
> Aurora/Beta? These crashes are pretty frequent on 30/31 as well. Thanks!

Nical, is it possible to uplift this to 30? Uplifting Bug 1006957 to b2g v1.4 seems to require this bug's fix.

Flags: needinfo?(nical.bugzilla)

Sotaro Ikeda [:sotaro]

Updated

•

10 years ago

Blocks: 1006957

Comment hidden (Legacy TBPL/Treeherder Robot)

Sotaro Ikeda [:sotaro]

Comment 1113

•

10 years ago

(In reply to Sotaro Ikeda [:sotaro] from comment #1111)
> (In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #1105)
> > Nicolas, can you take a look at the feasibility of uplifting this to
> > Aurora/Beta? These crashes are pretty frequent on 30/31 as well. Thanks!
>
> Nical, is it possible to uplift to 30? To uplift Bug 1006957 to b2g v1.4,
> that bug seems to need this bug's fix.

I changed the b2g v1.4 patch in Bug 1006957; it no longer needs this patch.

Flags: needinfo?(nical.bugzilla)

Sotaro Ikeda [:sotaro]

Updated

•

10 years ago

No longer blocks: 1006957

Nicolas Silva [:nical]

Comment 1114

•

10 years ago

Attached patch: aurora uplift

Try push: https://tbpl.mozilla.org/?tree=Try&rev=bddc86e9e47a

I am supposed to be on vacation. If you want this landed before Wednesday, please check the try push and land it if it's good. Otherwise I'll land it when I come back.

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 1115

•

10 years ago

All over it :)

Flags: needinfo?(ryanvm)

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 1116

•

10 years ago

Comment on attachment 8430771
aurora uplift

The Try run is solid green (modulo some suites that aren't expected to pass on !trunk).

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Unknown
User impact if declined: Hangs/crashes at shutdown. Earlier comments in this bug suggest that the crash signature has been seen in the wild. At least 5 intermittent oranges have gone away since this landed on trunk.
Testing completed (on m-c, etc.): On m-c for nearly a week. Aurora patch green on Try.
Risk to taking this patch (and alternatives if risky): No known regressions since this landed on trunk. Per discussion with Sotaro, any problems with the patch would very likely have turned up on the Try run, as our test suite has proven to be very sensitive to shutdown ordering issues.
String or IDL/UUID changes made by this patch: None

Attachment #8430771 - Flags: approval-mozilla-aurora?

Flags: needinfo?(ryanvm)

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 1117

•

10 years ago

There's no way we're going to have a beta patch ready in time, so marking it wontfix for Firefox 30. Still hopeful about getting this onto b2g30, though. But no need to rush that either.

status-firefox30: affected → wontfix

Comment hidden (Legacy TBPL/Treeherder Robot)

Lukas Blakk [:lsblakk] use ?needinfo

Updated

•

10 years ago

Attachment #8430771 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 1121

•

10 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/f2a6966ea5f1

status-firefox31: affected → fixed

Comment hidden (Legacy TBPL/Treeherder Robot)

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 1123

•

10 years ago

Bug 997699 makes getting this onto b2g30 tricky.

Comment hidden (Legacy TBPL/Treeherder Robot)

Ed Morley [:emorley]

Updated

•

10 years ago

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Comment hidden (Legacy TBPL/Treeherder Robot)

Benoit Jacob [:bjacob] (mostly away)

Assignee

Comment 1138

•

10 years ago

Attached patch: Destroy ImageBridge actors on the main thread

One thing that's happening here (I don't know if it's the only thing) is that top-level protocol actors now must be created and destroyed on the main thread. There are more details in bug 1028383, where we're adding assertions to guard that and make such bugs easier to debug. Concretely, when we create or destroy top-level actors on multiple threads, they can race to access the mOpenActors linked list. That's apparently a fairly recent thing from Nuwa; see bug 976479 comment 94. So this patch fixes a couple of places where we were destroying ImageBridgeParent and ImageBridgeChild actors off the main thread, and adds more assertions to guard that.
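An unsynchronized list like mOpenActors is only safe if one thread owns every mutation. Asserting the calling thread on each add and remove turns a silent race into an immediate failure. Here is a minimal standard-C++ sketch of that guard. `TopLevelActorList` and its methods are hypothetical, so this is neither the IPDL implementation nor the assertions added in bug 1028383. The sketch also treats the constructing thread as "the main thread".

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <thread>

// Hypothetical model of a process-wide list of open top-level actors. The
// list has no lock of its own; it is only safe because every mutation happens
// on one thread, which the assertions enforce in debug builds.
class TopLevelActorList {
 public:
  TopLevelActorList() : mOwningThread(std::this_thread::get_id()) {}

  bool IsOnOwningThread() const {
    return std::this_thread::get_id() == mOwningThread;
  }

  void Add(const void* aActor) {
    assert(IsOnOwningThread() && "top-level actors must be created on the main thread");
    mOpenActors.push_back(aActor);
  }

  void Remove(const void* aActor) {
    assert(IsOnOwningThread() && "top-level actors must be destroyed on the main thread");
    mOpenActors.remove(aActor);
  }

  std::size_t Count() const { return mOpenActors.size(); }

 private:
  const std::thread::id mOwningThread;  // the thread that constructed the list
  std::list<const void*> mOpenActors;
};
```

In this model, destroying an ImageBridgeChild from the ImageBridge thread would trip the assertion in `Remove` right away. Without it, the same mistake would corrupt the list only now and then. Making that kind of bug easy to catch is the motivation given above for the new assertions.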

Attachment #8443715 - Flags: review?(nical.bugzilla)

Attachment #8443715 - Flags: review?(matt.woodrow)

Benoit Jacob [:bjacob] (mostly away)

Assignee

Comment 1139

•

10 years ago

Right, so only the ImageBridgeChild actor destruction really moves to the main thread. The parent-side fix is mostly unrelated. I couldn't see a reason why we would want mTransport to be destroyed off the main thread, so I removed that, similar to what we do in CompositorParent. What do you think?

Nicolas Silva [:nical]

Updated

•

10 years ago

Attachment #8443715 - Flags: review?(nical.bugzilla) → review+

Nicolas Silva [:nical]

Updated

•

10 years ago

Assignee: nical.bugzilla → bjacob

Matt Woodrow (:mattwoodrow)

Updated

•

10 years ago

Attachment #8443715 - Flags: review?(matt.woodrow) → review+

Comment hidden (Legacy TBPL/Treeherder Robot)

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 1160

•

10 years ago

(In reply to Benoit Jacob [:bjacob] from comment #1138)
> Created attachment 8443715
> Destroy ImageBridge actors on the main thread

Is this waiting on something to land?

status-b2g-v1.4: affected → wontfix

status-b2g-v2.1: --- → affected

status-firefox33: --- → affected

Flags: needinfo?(bjacob)

Target Milestone: mozilla32 → ---

Benoit Jacob [:bjacob] (mostly away)

Assignee

Comment 1161

•

10 years ago

Yes. I have a 10-deep patch queue fixing the whole shutdown sequence. I can't easily land it bit by bit, because each partial landing only sets a different intermittent on fire, so I have to fix it all at once. My last try push was not very far from the mark (https://tbpl.mozilla.org/?tree=Try&rev=c74e623e8b70), and this new one should get us closer still: https://tbpl.mozilla.org/?tree=Try&rev=9a085d9e9f06

Flags: needinfo?(bjacob)

Comment hidden (Legacy TBPL/Treeherder Robot)

Ryan VanderMeulen [:RyanVM]

Reporter

Updated

•

10 years ago

Blocks: 1008254

Comment hidden (Legacy TBPL/Treeherder Robot)

Ed Morley [:emorley]

Updated

•

10 years ago

Blocks: 986738

Comment hidden (Legacy TBPL/Treeherder Robot)

Benoit Jacob [:bjacob] (mostly away)

Assignee

Comment 1219

•

10 years ago

Should be fixed by bug 774388. The timing seems to match: bug 774388 landed on Sunday night on inbound and the last occurrences here are from Monday on other trees.

Depends on: 774388

Benoit Jacob [:bjacob] (mostly away)

Assignee

Updated

•

10 years ago

Status: REOPENED → RESOLVED

Closed: 11 years ago → 10 years ago

Resolution: --- → FIXED

Ryan VanderMeulen [:RyanVM]

Reporter

Updated

•

10 years ago

status-b2g-v2.1: affected → fixed

status-firefox33: affected → fixed

Target Milestone: --- → mozilla33

Nobody; OK to take it and work on it

Updated

•

10 years ago

blocking-b2g: backlog → ---

tracking-b2g: --- → backlog

Shut down gfx IPDL protocols before the shutdown of XPCOM threads (11 years ago, Nicolas Silva [:nical], 4.09 KB patch; benjamin: review+)
sleep (11 years ago, Bill McCloskey [inactive unless it's an emergency] (:billm), 945 bytes patch)
Shut down gfx IPDL protocols after Media and Widget, and before the shutdown of XPCOM threads (11 years ago, Nicolas Silva [:nical], 6.25 KB patch)
Shut down gfx IPDL protocols after Media and Widget, and before the shutdown of XPCOM threads (11 years ago, Nicolas Silva [:nical], 15.70 KB patch; cpearce: review-)
Shut down gfx IPDL protocols after Media and Widget, and before the shutdown of XPCOM threads (11 years ago, Nicolas Silva [:nical], 17.14 KB patch)
v3:Shut down gfx IPDL protocols after Media and Widget, and before the shutdown of XPCOM threads (11 years ago, Nicolas Silva [:nical], 6.63 KB patch; benjamin: review+, sotaro: review+)
v4: Same patch + force all protocols managed by ImageBridge to shut down before ImageBridge (11 years ago, Nicolas Silva [:nical], 17.69 KB patch; sotaro: review+)
v5: same patch with missing explicit Release() added when an ImageContainer outlives the ImageBridge (11 years ago, Nicolas Silva [:nical], 18.08 KB patch; sotaro: review+)
patch v6 (11 years ago, Nicolas Silva [:nical], 24.38 KB patch; sotaro: review+)
v7 (11 years ago, Nicolas Silva [:nical], 25.97 KB patch; sotaro: review+)
v10293801 same things with a few tweaks to the shutdown order to make it pass tests. (11 years ago, Nicolas Silva [:nical], 24.34 KB patch; sotaro: review+)
aurora uplift (10 years ago, Nicolas Silva [:nical], 25.15 KB patch; lsblakk: approval-mozilla-aurora+)
Destroy ImageBridge actors on the main thread (10 years ago, Benoit Jacob [:bjacob] (mostly away), 4.20 KB patch; nical: review+, mattwoodrow: review+)