Closed
Bug 774622
Opened 12 years ago
Closed 12 years ago
crash with abort message: "mismatched CxxStackFrame ctor/dtors: file /builds/slave/m-cen-andrd-ntly/build/ipc/glue/RPCChannel.cpp, line 656" on quitting Nightly
Categories
(Core :: Graphics: Layers, defect)
Tracking
()
VERIFIED
FIXED
mozilla18
People
(Reporter: AdrianT, Assigned: nical)
References
Details
(4 keywords, Whiteboard: [native-crash])
Crash Data
Attachments
(1 file, 2 obsolete files)
5.39 KB,
patch
|
cjones
:
review+
akeybl
:
approval-mozilla-aurora+
akeybl
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is report bp-2b3ea40a-be14-424e-b07d-6145d2120717 . Other crashes: https://crash-stats.mozilla.com/report/index/bp-6337ba57-4ad8-450b-9737-bd7772120717 https://crash-stats.mozilla.com/report/index/bp-ace5405c-2498-4f42-a283-ebe1c2120717 ============================================================= Nightly 16.0a1 2012-07-16 Google Nexus (Android 4.0.4) Steps to reproduce: 1) Load planet.mozilla.org. 2) Wait for the page to load. 3) Quit Firefox Actual result: A crash happens. I got 3 different crashes from 4 tries. Adding the crash signatures to the same bug as they may be related since the same steps to reproduce apply. Issue is not reproducible on Aurora ============================================================= 0 libmozalloc.so mozalloc_abort memory/mozalloc/mozalloc_abort.cpp:19 1 libc.so libc.so@0x11f6e 2 dalvik-heap (deleted) dalvik-heap @0x7b511f 3 dalvik-mark-stack (deleted) dalvik-mark-stack @0x30eaf40 4 org.mozilla.fennec-2.apk org.mozilla.fennec-2.apk@0x124556b 5 org.mozilla.fennec-2.apk org.mozilla.fennec-2.apk@0x33696a 6 dalvik-heap (deleted) dalvik-heap @0xfad5f63 7 org.mozilla.fennec-2.apk org.mozilla.fennec-2.apk@0x703633 8 libxul.so XPCJSRuntime::Get js/xpconnect/src/xpcprivate.h:645 9 libxul.so XPCCallContext::~XPCCallContext js/xpconnect/src/XPCCallContext.cpp:302 10 libxul.so nsXPCWrappedJSClass::DelegatedQueryInterface js/xpconnect/src/XPCWrappedJSClass.cpp:745 11 @0xffffffff 12 libxul.so XPCCallContext::~XPCCallContext js/xpconnect/src/XPCCallContext.cpp:334 13 libxul.so nsXPCWrappedJSClass::DelegatedQueryInterface js/xpconnect/src/XPCWrappedJSClass.cpp:745 14 @0x5f4c5d3e 15 libmozglue.so libmozglue.so@0x7041 16 libmozglue.so libmozglue.so@0x8fbd 17 libmozglue.so libmozglue.so@0x90fd 18 libmozglue.so libmozglue.so@0x9c7b 19 libmozglue.so libmozglue.so@0x20ffa 20 libmozglue.so libmozglue.so@0x199d7 21 libmozglue.so libmozglue.so@0x7041 22 libmozglue.so libmozglue.so@0x8fbd 23 libc.so libc.so@0x12132 24 libc.so libc.so@0x436ae 25 libmozglue.so libmozglue.so@0x19b29 26 libmozglue.so libmozglue.so@0x19b41 27 libxul.so std::__node_alloc::allocate _alloc.h:158 28 libxul.so std::priv::_STLP_alloc_proxy<unsigned int, IPC::Message, std::allocator<IPC::Mes _alloc.h:307 29 @0x3 30 libxul.so mozilla::ipc::RPCChannel::DebugAbort ipc/glue/RPCChannel.cpp:656 31 libxul.so mozilla::ipc::RPCChannel::~RPCChannel ipc/glue/RPCChannel.cpp:80 32 libxul.so mozilla::layers::PCompositorParent::~PCompositorParent obj-firefox/ipc/ipdl/PCompositorParent.cpp:88 33 libxul.so mozilla::layers::CompositorParent::~CompositorParent gfx/layers/ipc/CompositorParent.cpp:106 34 libxul.so mozilla::layers::CompositorParent::~CompositorParent gfx/layers/ipc/CompositorParent.cpp:106 35 libxul.so mozilla::layers::CompositorParent::Release CompositorParent.h:58 36 libxul.so RunnableFunction<void , Tuple2<nsRefPtr<mozilla::layers::CompositorParent>, nsRe nsAutoPtr.h:874 37 libxul.so RunnableFunction<void , Tuple2<nsRefPtr<mozilla::layers::CompositorParent>, nsRe ipc/chromium/src/base/task.h:411 38 libxul.so MessageLoop::RunTask ipc/chromium/src/base/message_loop.cc:327 39 libxul.so MessageLoop::DeferOrRunPendingTask ipc/chromium/src/base/message_loop.cc:334 40 libxul.so MessageLoop::DoWork ipc/chromium/src/base/message_loop.cc:434 41 libxul.so mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:86 42 libxul.so MessageLoop::RunInternal ipc/chromium/src/base/message_loop.cc:208 43 libxul.so MessageLoop::Run ipc/chromium/src/base/message_loop.cc:201 44 libxul.so nsBaseAppShell::Run widget/xpwidgets/nsBaseAppShell.cpp:163 45 libxul.so nsAppStartup::Run toolkit/components/startup/nsAppStartup.cpp:257 46 libxul.so XREMain::XRE_mainRun toolkit/xre/nsAppRunner.cpp:3787 47 libxul.so XREMain::XRE_main toolkit/xre/nsAppRunner.cpp:3864 48 libxul.so XRE_main toolkit/xre/nsAppRunner.cpp:3940 49 libxul.so GeckoStart toolkit/xre/nsAndroidStartup.cpp:73 50 libmozglue.so libmozglue.so@0x10aad 51 dalvik-LinearAlloc (deleted) dalvik-LinearAlloc @0x29918e 52 libdvm.so libdvm.so@0x1ec32 53 dalvik-heap (deleted) dalvik-heap @0xcb3bae 54 libdvm.so libdvm.so@0x58eed 55 data@app@org.mozilla.fennec-2.apk@classes.dex data@app@org.mozilla.fennec-2.apk@classes.dex@0x134e61 56 libmozglue.so libmozglue.so@0x10a5b 57 data@app@org.mozilla.fennec-2.apk@classes.dex data@app@org.mozilla.fennec-2.apk@classes.dex@0x11d4b9 58 libc.so libc.so@0x14a43 59 libdvm.so libdvm.so@0x98e3d 60 libc.so libc.so@0x158a7 61 libmozglue.so libmozglue.so@0x10a5b 62 data@app@org.mozilla.fennec-2.apk@classes.dex data@app@org.mozilla.fennec-2.apk@classes.dex@0x11d4b9 63 libc.so libc.so@0x158a7 64 libmozglue.so libmozglue.so@0x10a5b 65 data@app@org.mozilla.fennec-2.apk@classes.dex data@app@org.mozilla.fennec-2.apk@classes.dex@0x11d4b9 66 libc.so libc.so@0x15f09 67 libdvm.so libdvm.so@0x5ae8d 68 dalvik-LinearAlloc (deleted) dalvik-LinearAlloc @0x347e 69 dalvik-heap (deleted) dalvik-heap @0xddde1e 70 dalvik-heap (deleted) dalvik-heap @0xddde1e 71 libdvm.so libdvm.so@0xa2532 72 dalvik-heap (deleted) dalvik-heap @0xddde1e 73 dalvik-LinearAlloc (deleted) dalvik-LinearAlloc @0x2991a2 74 dalvik-LinearAlloc (deleted) dalvik-LinearAlloc @0x29918e 75 dalvik-LinearAlloc (deleted) dalvik-LinearAlloc @0x29918e 76 data@app@org.mozilla.fennec-2.apk@classes.dex data@app@org.mozilla.fennec-2.apk@classes.dex@0x148534 77 libdvm.so libdvm.so@0x8e8ab 78 libdvm.so libdvm.so@0x74897 79 dalvik-heap (deleted) dalvik-heap @0xcb3bae 80 dalvik-LinearAlloc (deleted) dalvik-LinearAlloc @0x29918e 81 dalvik-LinearAlloc (deleted) dalvik-LinearAlloc @0x29918e 82 libdvm.so libdvm.so@0x58d43 83 dalvik-heap (deleted) dalvik-heap @0xcb3bae 84 libdvm.so libdvm.so@0x5ac1d 85 dalvik-LinearAlloc (deleted) dalvik-LinearAlloc @0x29918e 86 data@app@org.mozilla.fennec-2.apk@classes.dex data@app@org.mozilla.fennec-2.apk@classes.dex@0x9ccb8 87 dalvik-heap (deleted) dalvik-heap @0xcb3bae 88 libdvm.so libdvm.so@0x1edbe 89 libdvm.so libdvm.so@0x30a8e 90 libdvm.so libdvm.so@0xb2f8e 91 dalvik-LinearAlloc (deleted) dalvik-LinearAlloc @0x29b4b6 92 libdvm.so libdvm.so@0x34272 93 libdvm.so libdvm.so@0x5f987 94 dalvik-LinearAlloc (deleted) dalvik-LinearAlloc @0x29b4b6 95 dalvik-heap (deleted) dalvik-heap @0xc9b946 96 data@app@org.mozilla.fennec-2.apk@classes.dex data@app@org.mozilla.fennec-2.apk@classes.dex@0x134a9c 97 libdvm.so libdvm.so@0x6c8ff 98 dalvik-LinearAlloc (deleted) dalvik-LinearAlloc @0x29b4b6 99 dalvik-heap (deleted) dalvik-heap @0xc9b946 100 libdvm.so libdvm.so@0xb7c56 101 libdvm.so libdvm.so@0x5f987 102 libdvm.so libdvm.so@0x6c923 103 libdvm.so libdvm.so@0xb7c56 104 libdvm.so libdvm.so@0x5f987 105 libdvm.so libdvm.so@0xb2f8e 106 libdvm.so libdvm.so@0x5fa37 107 libdvm.so libdvm.so@0x5f987 108 libc.so libc.so@0x12c4e 109 libc.so libc.so@0x127a2
Comment 1•12 years ago
|
||
(In reply to adrian tamas from comment #0) > https://crash-stats.mozilla.com/report/index/bp-6337ba57-4ad8-450b-9737-bd7772120717 This one looks like bug 763805/bug 763813.
Keywords: regression
Summary: crash in mozalloc_abort on quitting Nightly → crash with abort message: "mismatched CxxStackFrame ctor/dtors: file /builds/slave/m-cen-andrd-ntly/build/ipc/glue/RPCChannel.cpp, line 656" on quitting Nightly
Whiteboard: [native-crash]
Comment 2•12 years ago
|
||
Based on crash stats, it first appeared in 16.0a1/20120715. The regression range is: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=0602e44ac248&tochange=57abb5f70e01 It might be a regression from bug 745148.
Comment 4•12 years ago
|
||
Chris looks like a regression from your bug, can you take a look?
Assignee: nobody → jones.chris.g
tracking-fennec: ? → 16+
tracking-firefox16:
--- → ?
Keywords: regressionwindow-wanted
Updated•12 years ago
|
Crash Signature: [@ mozalloc_abort | XPCJSRuntime::Get]
[@ mozalloc_abort | nsAString_internal::Assign ]
[@ mozalloc_abort | nsSVGUtils::GetFontXHeight ] → [@ mozalloc_abort | XPCJSRuntime::Get]
[@ mozalloc_abort | nsAString_internal::Assign]
[@ mozalloc_abort | nsSVGUtils::GetFontXHeight]
[@ mozalloc_abort | NS_IsMainThread_P]
[@ mozalloc_abort | libwebcore.so@0x3acf63]
[@ mozalloc_abort | dalvik-bitm…
Updated•12 years ago
|
Crash Signature: dalvik-bitmap-2 (deleted)@0x14d11f]
[@ mozalloc_abort] → dalvik-bitmap-2 (deleted)@0x14d11f]
[@ mozalloc_abort]
[@ libmozalloc.so@0x1704 ]
Comment 6•12 years ago
|
||
The mozalloc_abort | XPCJSRuntime::Get signature is #7 top crasher in 16.0a2 and #5 in 17.0a1. This bug ranks probably higher with its other signatures that are shared with other bugs.
Keywords: topcrash
Reporter | ||
Comment 7•12 years ago
|
||
Good build: 2012-07-14 Bad build: 2012-07-15 Possible regression range: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=0602e44ac248&tochange=57abb5f70e01
Keywords: regressionwindow-wanted
Comment 8•12 years ago
|
||
That range has a lot of changesets in it, please use tinderbox-builds.
Keywords: regressionwindow-wanted
Comment 9•12 years ago
|
||
The presence of bug 767775 and bug 745148 in that list looks suspicious. Any thoughts, Chris?
Updated•12 years ago
|
Updated•12 years ago
|
Crash Signature: dalvik-bitmap-2 (deleted)@0x14d11f]
[@ mozalloc_abort]
[@ libmozalloc.so@0x1704 ] → dalvik-bitmap-2 (deleted)@0x14d11f]
[@ mozalloc_abort | nsThreadManager::GetIsMainThread]
[@ mozalloc_abort | arena_avail_tree_insert]
[@ mozalloc_abort | js::ObjectImpl::markChildren]
[@ mozalloc_abort]
[@ libmozalloc.so@0x1704 ]
Comment 10•12 years ago
|
||
Crashes on abort account for 4.4% of all crashes in 15.0b3 and 17.7% in 16.0a2. Guessing this bug is the only crash regression in 16.0, it accounts for around 13% of all crashes in 16.0a2, making it #1 top crasher in 16.0a2.
Comment 11•12 years ago
|
||
I've hit this crash a couple of times on the Galaxy Nexus, Android 4.1 while quitting while I was on http://buienradar.nl/ . Adrian, could you try and narrow the regression range further?
Reporter | ||
Comment 12•12 years ago
|
||
Testing tinderbox central builds: no crash: http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android/1342280061/ crash: http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android/1342285161/ I will continue to try to narrow the regression range further.
Comment 13•12 years ago
|
||
Here is the regression range of comment 12: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=011147826130&tochange=9046ecf4db8f Neither bug 767775, nor bug 745148 belong to it. I think it's a regression from bug 763234.
Reporter | ||
Comment 14•12 years ago
|
||
In inbound tinderbox builds: The good build: 2012-07-13 73e6fad01985 The bad build: 2012-07-13 f9723f26c8a9 Pushlog: http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=73e6fad01985&tochange=f9723f26c8a9
Updated•12 years ago
|
Keywords: regressionwindow-wanted
Assignee | ||
Comment 16•12 years ago
|
||
Bug 763234 expects that no one uses the CompositorParents or whatever is living on the compositor thread after gfxPlatform has been destroyed. Could it be related to this?
Updated•12 years ago
|
Crash Signature: dalvik-bitmap-2 (deleted)@0x14d11f]
[@ mozalloc_abort | nsThreadManager::GetIsMainThread]
[@ mozalloc_abort | arena_avail_tree_insert]
[@ mozalloc_abort | js::ObjectImpl::markChildren]
[@ mozalloc_abort]
[@ libmozalloc.so@0x1704 ] → dalvik-bitmap-2 (deleted)@0x14d11f]
[@ mozalloc_abort | dalvik-bitmap-2 (deleted)@0x12011f]
[@ mozalloc_abort | 0 (deleted)@0x11f911f]
[@ mozalloc_abort | nsThreadManager::GetIsMainThread]
[@ mozalloc_abort | arena_avail_tree_insert]
[@ mozalloc_abor…
Comment 20•12 years ago
|
||
In 16.0a2/20120819, it accounts for 46% of all crashes.
Comment 21•12 years ago
|
||
(In reply to Scoobidiver from comment #20) > In 16.0a2/20120819, it accounts for 46% of all crashes. I'll ping cjones now and see if anybody else may have more time to look at this.
Assignee | ||
Comment 22•12 years ago
|
||
I am looking at it. I had troubles debugging this until recently because I couldn't get symbols to show up in gdb (see bug 781668). If it is very important that we get a temporary hack to remove the crash ASAP, one (very bad) possibility is to leak CompositorParent. CompositorParent is created a startup and destroyed when exiting firefox. The crash happens in its destructor.
Has anyone been able to create reliable STR for this bug? (In reply to Nicolas Silva [:nical] from comment #22) > If it is very important that we get a temporary hack to remove the crash > ASAP, one (very bad) possibility is to leak CompositorParent. > CompositorParent is created a startup and destroyed when exiting firefox. > The crash happens in its destructor. Shutdown crashes are also extremely frustrating because that's exactly what we want to do at shutdown. Another workaround is determine after what point in shutdown this crash is occurring, and then _exit(0) instead of aborting if we hit this assertion at that point. STR definitely the way to start here.
Comment 24•12 years ago
|
||
(In reply to Chris Jones [:cjones] [:warhammer] from comment #23) > STR definitely the way to start here. Adding needURLs and qawanted.
Comment 25•12 years ago
|
||
Sorry, I cannot draw URLs for all abort crashes, and this looks like it's becoming just a collection of all abort crashes, which probably can have any number of different causes. I doubt there's any use in pulling URLs for all of those. I also don't know how to pull URLs specifically for one abort message, which may be what could be reasonable here.
Comment 26•12 years ago
|
||
Actually, I remembered that with our old method of getting URLs off some CSV files, we can query the app notes, and so I did that. I'm leaving out those with only one hit in the last 12 days. [rkaiser@fs1 ~]$ gunzip --stdout /data/security_group/crash_urls/201208{1,2}*-crashdata.csv.gz | awk -W compat -F\t '$7 ~ /^FennecAndroid$/ && $26 ~ /mismatched CxxStackFrame ctor\/dtors/ {print $2}' | sort | uniq -c | sort -nr 1449 \N 10 http://moztw.org/foxmosa/game/pairs/ 6 about: 4 http://lenta.ru/news/2012/08/17/srok/ 3 http://www.telegraph.co.uk/health/healthnews/9485807/Baldness-cure-could-be-on-shelves-in-two-years.html 3 http://www.bbc.co.uk/news/uk-england-sussex-19329653 3 https://touch.facebook.com/?_rdr 3 about:home 3 about:config 3 about:apps 2 http://www.northamptonsaints.co.uk/news/6331.php 2 http://www.hkepc.com/forum/viewthread.php?tid=1845400&extra=page%3D1 2 http://www.bbc.co.uk/sport/0/mobile/cricket/19309373 2 http://videojs.com/ 2 https://signin.ebay.com/ws/eBayISAPI.dll?SignIn&UsingSSL=1&pUserId=&co_partnerId=2&siteid=0&ru=http%3A%2F%2Fcgi5.ebay.com%2Fws%2FeBayISAPI.dll%3FNewListing%26%26np_id%3D0%26cpg%3D2%26aid%3D3%26tid%3D8453930017%26guest%3D1%26userid%3D&pageType=1144 2 https://air.mozilla.org/ 2 http://m.forums.theregister.co.uk/forum/1/2012/08/17/twitter_api_rules/ 2 http://m.bbc.co.uk/news/entertainment_and_arts 2 about:firefox 2 about:blank
Keywords: needURLs
Updated•12 years ago
|
Crash Signature: mozalloc_abort | js::ObjectImpl::markChildren]
[@ mozalloc_abort | js_NewStringCopyZ]
[@ mozalloc_abort | js_NewString]
[@ mozalloc_abort | js::TraceChildren]
[@ mozalloc_abort]
[@ libmozalloc.so@0x1704 ] → mozalloc_abort | js::ObjectImpl::markChildren]
[@ mozalloc_abort | js_NewStringCopyZ]
[@ mozalloc_abort | js_NewString]
[@ mozalloc_abort | js::TraceChildren]
[@ mozalloc_abort | __wrap_malloc]
[@ mozalloc_abort]
[@ libmozalloc.so@0x1704 ]
Reporter | ||
Comment 28•12 years ago
|
||
The steps from Comment 0 can still be used to reproduce the issue using Nightly 2012-08-27 on Galaxy Note running Android 4.0.4. Steps to reproduce: 1) Load planet.mozilla.org. 2) Wait for the page to load. 3) Quit Firefox Reproducibility rate is about 40-50% of the cases. From ~10 tries I got 4 crashes.
Keywords: qawanted
Assignee | ||
Comment 29•12 years ago
|
||
FWIW, I don't manage to reproduce it with debug builds while I can reproduce it on optimized builds (on a galaxy nexus and a nexus 7).
Comment 30•12 years ago
|
||
I can reproduce with the suggested steps in comment #28 pretty easily (08/28, GN:4.1.1) I/GeckoApp( 2680): pause I/GeckoApp( 2680): application paused I/Gecko ( 2680): ###!!! ABORT: mismatched CxxStackFrame ctor/dtors: file ../../../ipc/glue/RPCChannel.cpp, line 656
Updated•12 years ago
|
Keywords: reproducible
Assignee | ||
Comment 32•12 years ago
|
||
I see that we are sometimes deleting the compositor thread before we delete CompositorParent. When this happens the nsWindow are already deleted, so it is likely that something dispatches a NewRunnableFunction or NewRunnableMethod that takes the CompositorParent in parameter (this uses a ref ptr internally) which keeps it alive for too long. I'm almost there
Target Milestone: --- → Firefox 17
Updated•12 years ago
|
Crash Signature: mozalloc_abort | js::ObjectImpl::markChildren]
[@ mozalloc_abort | js_NewStringCopyZ]
[@ mozalloc_abort | js_NewString]
[@ mozalloc_abort | js::TraceChildren]
[@ mozalloc_abort | __wrap_malloc]
[@ mozalloc_abort]
[@ libmozalloc.so@0x1704 ] → mozalloc_abort | js::ObjectImpl::markChildren]
[@ mozalloc_abort | js_NewStringCopyZ]
[@ mozalloc_abort | js_NewString]
[@ mozalloc_abort | js::TraceChildren]
[@ mozalloc_abort | __wrap_malloc]
[@ mozalloc_abort]
[@ libmozalloc.so@0x1704 ]
[@ mozal…
status-firefox18:
--- → affected
Version: Firefox 16 → Firefox 18
Updated•12 years ago
|
Crash Signature: mozalloc_abort | ProcessGeneric]
[@ mozalloc_abort | mozilla::dom::CSS2PropertiesBinding::set_borderImageRepeat]
[@ mozalloc_abort | exn_toSource]
[@ mozalloc_abort | browser.db (deleted)@0x156451]
[@ mozalloc_abort | browser.db (deleted)@0x356e6c]
[… → mozalloc_abort | ProcessGeneric ]
[@ mozalloc_abort | mozilla::dom::CSS2PropertiesBinding::set_borderImageRepeat ]
[@ mozalloc_abort | exn_toSource ]
[@ mozalloc_abort | browser.db (deleted)@0x156451 ]
[@ mozalloc_abort | browser.db (deleted)@0x356e6c…
Comment 33•12 years ago
|
||
Christian, it's a regression from 16.0, not from 18.0.
Version: Firefox 18 → Firefox 16
Assignee | ||
Comment 34•12 years ago
|
||
It seems that the problem is coming from the CompositorParent being kept alive longer than expected because of a dispatched Task, as I said earlier. This patch fixes the crash on my nexus 7. I just pushed it to try. What it does is it manages a reference count for the thread, so that when CompositorParent::DestroyThread is called it only decrements the ref count, and only when the ref count reaches zero we delete the thread. It turned out that managing a static int counter by hand required less code than creating a reference counted subclass of base::Thread so I did it by hand.
Assignee: jones.chris.g → nsilva
Attachment #656574 -
Flags: review?(jones.chris.g)
Reporter | ||
Updated•12 years ago
|
Crash Signature: (deleted)@0x356e6c ]
[@ mozalloc_abort | JSCompartment::sweepBreakpoints ] → (deleted)@0x356e6c ]
[@ mozalloc_abort | JSCompartment::sweepBreakpoints ]
[@ mozalloc_abort | libfreebl3.so (deleted)@0x20341 | XPCJSRuntime::Get]
Assignee | ||
Comment 35•12 years ago
|
||
oh well, ran into the cxx stackframe abort again with the patch applied, so I probably just made it less likely to happen :( Chris, do you know if it is important that CompositorParent's destructor gets called before the compositor thread is destroyed? If so my patch still has some value...
Assignee | ||
Comment 36•12 years ago
|
||
I have (yet another) theory: The abort that causes the crash is a piece of ipdl code that checks that a stack mCxxStackFrame is empty. this stack is pusehd and poped only through RAII with CxxStackFrame (see RPCChannel.h) when sending and receiving messages, like this RPCChannel::Send(Message* msg) { Message copy = *msg; CxxStackFrame f(*this, OUT_MESSAGE, ©); // push return AsyncChannel::Send(msg); } // (RAII) pop with a sync message we expect to see the following sequence (not always true actually): Sender.RPCChannel.mCxxStackFrame.push(OUT_MESSAGE) Receiver.RPCChannel.mCxxStackFrame.push(IN_MESSAGE) // the message is handled on the receiver's thread Receiver.RPCChannel.mCxxStackFrame.pop(IN_MESSAGE) Sender.RPCChannel.mCxxStackFrame.pop(OUT_MESSAGE) However, if the stack is poped after sending back to the sender the signal that the message has beene handled, it is actually possible that the sender thread start running again before the receiver thread reaches the end of the scope and pops its stack. In our case the crash happens always in the destructor of CompositorParent very shortly after PCompositorParent::SendStop, because we dispatch a task on the main thread that holds a refPtr to the compositor parent, that calls SendStop and then returns. When the task returns CompositorParent is deleted because the task's RefPtr is destroyed. so ~CompositorParent happens very shortly after SendStop. if i printf_stderr something in ~CxxStackFrame just before poping the stack this makes it way easier to reproduce. Chris, does it make sense? You know ipdl way better than me.
Is the CompositorParent setting itself to be destroyed during ActorDestroy? What's the expected destruction sequence.
Comment on attachment 656574 [details] [diff] [review] Fixes fennec exit crash by reference counting the compositor thread. Clearing review request since doesn't seem the entire fix.
Attachment #656574 -
Flags: review?(jones.chris.g)
Comment 39•12 years ago
|
||
Still crashing on the latest Nightly: https://crash-stats.mozilla.com/report/index/bp-a2f48d7a-8ba0-4eb2-8257-b07ed2120831
Crash Signature: (deleted)@0x356e6c ]
[@ mozalloc_abort | JSCompartment::sweepBreakpoints ]
[@ mozalloc_abort | libfreebl3.so (deleted)@0x20341 | XPCJSRuntime::Get] → (deleted)@0x356e6c ]
[@ mozalloc_abort | JSCompartment::sweepBreakpoints ]
[@ mozalloc_abort | libfreebl3.so (deleted)@0x20341 | XPCJSRuntime::Get]
[@ mozalloc_abort | UnmarkGrayChildren ]
Assignee | ||
Comment 40•12 years ago
|
||
When the widget is destroyed (in its dtor), CompositorChild first sends a WillStop message, telling the parent side that it has to get rid of its resources now. After SendWillStop, a task is dispatched on the main thread to defer the second phase of the destruction (it's the same 2-steps destruction sequence principle as ImageBridge's). This task is a RunnableFunction that executes DeferredDestroyCompositor (nsBaseWdget.cpp) which calls compositorChild->Destroy() and immediately after, loses the last reference to the CompositorParent and the CompositorChild which deletes them. Destroy() calles SendStop() which synchronously finishes the destruction sequence. in short: nsBaseWidget::dtor | CompositorChild::SendWillStop [sync] | | CompositorParent::RecvWillStop | Dispatch DeferredDestroyCompositor ... (on fennec it appears that we run into gfxPlatform::ShutDown here, in which case:) gfxPlatform::ShutDown | delete compositorThread ... DeferredDestroyCompositor | CompositorChild::SendStop [sync] | | compsitorParent::RecvStop | CompositorChild::dtor | CompositorParent::dtor I think we have at least two bugs here, the first being that the compositor thread can be destroyed way too early, so my first patch fixes this. The second bug is that since CompositorParent::dtor (and therefore RPCChannel's dtor) is called immediately after SendStop, we have a race between the CompositorParent's destructor and the RPCChannel's poping its mCxxStackFrames on the other side.
Updated•12 years ago
|
Crash Signature: (deleted)@0x356e6c ]
[@ mozalloc_abort | JSCompartment::sweepBreakpoints ]
[@ mozalloc_abort | libfreebl3.so (deleted)@0x20341 | XPCJSRuntime::Get]
[@ mozalloc_abort | UnmarkGrayChildren ] → (deleted)@0x356e6c ]
[@ mozalloc_abort | JSCompartment::sweepBreakpoints ]
[@ mozalloc_abort | libfreebl3.so (deleted)@0x20341 | XPCJSRuntime::Get]
[@ mozalloc_abort | UnmarkGrayChildren ]
[@ mozalloc_abort | __swrite | libxul.so@0x10d3d37]
[@ mozall…
The parent and child worker threads each get their own set of stack frames. If we're sharing them somehow, that would be a fantastically bad bug! :/
Assignee | ||
Comment 42•12 years ago
|
||
They're not sharing their stack frames, but they both get deleted in the main thread at the end of DeferredDestroyCompositor. In our case we are having a problem with CompositorParent being destroyed on the main thread while its CxxStackFrames has not yet be poped on the compositor thread after a sync message from the main thread to the compositor thread.
Oh, so we're trying to delete the CompositorParent on the main thread while it's still active on the compositor thread? That's also very bad! :)
Comment 44•12 years ago
|
||
Linux is affected and the stack trace is not buggy. See https://crash-stats.mozilla.com/report/list?signature=mozalloc_abort+|+NS_DebugBreak_P+|+mozilla%3A%3Aipc%3A%3ARPCChannel%3A%3ADebugAbort
Component: General → Graphics: Layers
OS: Android → All
Product: Firefox for Android → Core
Hardware: ARM → All
Target Milestone: Firefox 17 → ---
Version: Firefox 16 → 16 Branch
Assignee | ||
Comment 45•12 years ago
|
||
I have a patch in whice we dispatch a task on the Compositor thread from within CompositorParent::RecvStop in order to make sure that a nsRefPtr of the compositor parent stays there and keeps the compositor parent alive until the end of the ipdl code around RecvStop. So this solves the problem of deleting the CompositorParent while it is still active on the compositor thread. With this patch we then have the compositor thread's ref count that reaches zero within its own thread, so we end up having to dispatch a task on the main thread so that later deletes the compositor thread. The destruction sequence is becoming quite messy: When a widget is destroyed it triggers most of the compositor parent's clean up, and dispatches a task on the main thread to terminate the cleanup. This is not extremely nice because we then have part of the compositor stuff being destroyed before the destruction of gfxPlatform, and the rest being destroyed after. When the second half of the destruction happens, we need to have the compositor thread dispatch a task to himself to avoid deleting the compositor parent too soon. When we reach this task on the compositor thread we then need to dispatch a task from the compositor thread to the main thread so that the main thread deletes the compositor thread. The way we end up dispatching tasks all over the place makes it hard for us too have guarantees that we destroy the modules in the right order. For instance, Chris, do we have a constraint on how soon we must have deleted the last object that uses ipdl?
Comment 46•12 years ago
|
||
(In reply to Nicolas Silva [:nical] from comment #45) > When we reach this task on the compositor thread > we then need to dispatch a task from the compositor thread to the main > thread so that the main thread deletes the compositor thread. I'm concerned with dispatching a task from the compositor thread to the main thread after we've destroyed the channel and at which points the main thread is tearing down the browser. We have no guarantee that the main thread will always be in a proper state to respond to this event and wont be far along it's shutdown sequence. I think the right thing to do here is to destroy the associated compositor parent before the widget. To do this we need to find a way to make sure all the pending messages are processed.
Assignee | ||
Comment 48•12 years ago
|
||
This patch seems to fix it for me on fennec debug and optimized. I agree with BenWa that fixing the problem at this level is probably not the best solution, yet we have to do something about this crash in the short term and fixing the entire destruction sequence may be very tricky.
Attachment #656574 -
Attachment is obsolete: true
Attachment #658588 -
Flags: review?(jones.chris.g)
Updated•12 years ago
|
Crash Signature: libxul.so@0x1148aa3 | NS_IsMainThread_P]
[@ mozalloc_abort | org.mozilla.fennec_aurora-2.apk@0xf24f40]
[@ mozalloc_abort | __swrite | libxul.so@0x1148aa3 | 0 (deleted)@0x11f911f] → libxul.so@0x1148aa3 | NS_IsMainThread_P]
[@ mozalloc_abort | org.mozilla.fennec_aurora-2.apk@0xf24f40]
[@ mozalloc_abort | __swrite | libxul.so@0x1148aa3 | 0 (deleted)@0x11f911f]
[@ mozalloc_abort | __swrite | libxul.so@0x1148b03]
[@ mozalloc_abort | …
Updated•12 years ago
|
Crash Signature: dalvik-bitmap-2 (deleted)@0x111f63]
[@ mozalloc_abort | org.mozilla.firefox_beta-1.apk@0x6d2038]
[@ mozalloc_abort | NS_DebugBreak_P | mozilla::ipc::RPCChannel::DebugAbort] → dalvik-bitmap-2 (deleted)@0x111f63]
[@ mozalloc_abort | org.mozilla.firefox_beta-1.apk@0x6d2038]
[@ mozalloc_abort | __swrite | libxul.so@0x10d3cf7]
[@ mozalloc_abort | org.mozilla.firefox_beta-2.apk@0x6d2038]
[@ mozalloc_abort | org.mozilla.firefox_b…
Comment on attachment 658588 [details] [diff] [review] Fixes fennec exit crash by reference counting the compositor thread and making sure a CompositorParent is not deleted while running code on the compositor thread. >diff --git a/gfx/layers/ipc/CompositorParent.cpp b/gfx/layers/ipc/CompositorParent.cpp >+static int sCompositorThreadRefCount = 0; Document the semantics of this. >+static void KeepCompositorParentAlive(CompositorParent* aNowReadyToDie) Call this DeferredDelete. >+{ >+ aNowReadyToDie->Release(); >+} >+ >+static void DeleteCompositorThread() >+{ >+ if (NS_IsMainThread()){ >+ if (sCompositorThread) { >+ delete sCompositorThread; No need to null check, |delete| does the right thing. > void CompositorParent::DestroyThread() > { >+ --sCompositorThreadRefCount; >+ if(sCompositorThreadRefCount == 0) { Nit: |if (0 == --sRefCount)| > CompositorParent::~CompositorParent() > { >+ --sCompositorThreadRefCount; >+ if (sCompositorThreadRefCount == 0) { >+ DeleteCompositorThread(); >+ } Don't duplicate this code. Refactor it into a helper function.
Attachment #658588 -
Flags: review?(jones.chris.g)
Assignee | ||
Comment 50•12 years ago
|
||
Attachment #658588 -
Attachment is obsolete: true
Attachment #660164 -
Flags: review?(jones.chris.g)
Comment on attachment 660164 [details] [diff] [review] Fixes fennec exit crash by reference counting the compositor thread and making sure a CompositorParent is not deleted while running code on the compositor thread. No need to re-request review for small changes like these.
Attachment #660164 -
Flags: review?(jones.chris.g) → review+
Assignee | ||
Comment 52•12 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/8361bd47b3fe
Comment 53•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/8361bd47b3fe
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla18
Comment 54•12 years ago
|
||
Please nominate for Aurora/Beta asap (when you're comfortable with the bake time on Nightly).
Updated•12 years ago
|
status-firefox18:
affected → ---
Comment 55•12 years ago
|
||
I cannot reproduce any @mozalloc_abort crashes on the latest Nightly build. (In reply to Alex Keybl [:akeybl] from comment #54) > Please nominate for Aurora/Beta asap (when you're comfortable with the bake > time on Nightly). This crashes are annoying because occur almost for each Firefox quit. Due to the high number of crashes both on m-a and m-b, I nominate this to Aurora and Beta. We cannot deliver an app which crashes much more frequently than it does any other released Betas so far.
status-firefox17:
--- → affected
status-firefox18:
--- → fixed
Updated•12 years ago
|
Updated•12 years ago
|
Attachment #660164 -
Flags: approval-mozilla-beta?
Attachment #660164 -
Flags: approval-mozilla-aurora?
Comment 56•12 years ago
|
||
(In reply to Cristian Nicolae (:xti) from comment #55) > Due to the high number of crashes both on m-a and m-b, I nominate this to Aurora > and Beta. That is not exactly what Alex asked. It's instead: does this patch fix the issue AND not add regressions that are worse than the benefit? You answered yes to the first part, what about the second part?
Status: RESOLVED → VERIFIED
Comment 57•12 years ago
|
||
(In reply to Scoobidiver from comment #56) what about the second part? So far the app is working fine. I tried different scenarios to reproduce this bug, but no crash occurred. Atm there isn't any noticeable regression, so I guess that this patch is great benefit.
Assignee | ||
Comment 58•12 years ago
|
||
I haven't had any problem since this patch so I feel comfortable having it landed on aurora and beta. What would worry me most is having another problem in the destruction sequence that we have not seen yet (the tear-down sequence of omtc is a mess), but it would have to be fixed on top of this anyway.
Comment 59•12 years ago
|
||
Comment on attachment 660164 [details] [diff] [review] Fixes fennec exit crash by reference counting the compositor thread and making sure a CompositorParent is not deleted while running code on the compositor thread. [Triage Comment] Fix for a new top crash in FF16, which we believe is low risk based upon current testing. Approving for branches.
Attachment #660164 -
Flags: approval-mozilla-beta?
Attachment #660164 -
Flags: approval-mozilla-beta+
Attachment #660164 -
Flags: approval-mozilla-aurora?
Attachment #660164 -
Flags: approval-mozilla-aurora+
Updated•12 years ago
|
Crash Signature: org.mozilla.firefox_beta-1.apk@0x3d2038]
[@ mozalloc_abort | NS_DebugBreak_P | mozilla::ipc::RPCChannel::DebugAbort] → org.mozilla.firefox_beta-1.apk@0x3d2038]
[@ mozalloc_abort | NS_DebugBreak_P | mozilla::ipc::RPCChannel::DebugAbort]
[@ mozalloc_abort | __swrite | libxul.so@0x10d56b7]
[@ mozalloc_abort | __swrite | libxul.so@0x10d56b7 | 0 (deleted)@0x11f911f]
[@ moz…
Comment 60•12 years ago
|
||
Comment on attachment 660164 [details] [diff] [review] Fixes fennec exit crash by reference counting the compositor thread and making sure a CompositorParent is not deleted while running code on the compositor thread. Review of attachment 660164 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/layers/ipc/CompositorParent.cpp @@ +84,5 @@ > + > +static void DeleteCompositorThread() > +{ > + if (NS_IsMainThread()){ > + delete sCompositorThread; Trailing whitespace @@ +231,5 @@ > + // after this function returns while some ipdl code still needs to run on > + // this thread. > + // We must keep the compositor parent alive untill the code handling message > + // reception is finished on this thread. > + this->AddRef(); // Corresponds to DeferredDeleteCompositorParent's Release NS_ADDREF_THIS() is more common. @@ +232,5 @@ > + // this thread. > + // We must keep the compositor parent alive untill the code handling message > + // reception is finished on this thread. > + this->AddRef(); // Corresponds to DeferredDeleteCompositorParent's Release > + CompositorLoop()->PostTask(FROM_HERE, More trailing whitespace @@ +233,5 @@ > + // We must keep the compositor parent alive untill the code handling message > + // reception is finished on this thread. > + this->AddRef(); // Corresponds to DeferredDeleteCompositorParent's Release > + CompositorLoop()->PostTask(FROM_HERE, > + NewRunnableFunction(&DeferredDeleteCompositorParent, Weird indentation
Assignee | ||
Comment 61•12 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/6334ab5a2ad3
Assignee | ||
Comment 62•12 years ago
|
||
https://hg.mozilla.org/releases/mozilla-beta/rev/2ab98b3f1670
Updated•12 years ago
|
Comment 63•12 years ago
|
||
This crash doesn't occur anymore on the latest Aurora and Beta builds. -- Device: Galaxy Note OS: Android 4.0.4
You need to log in
before you can comment on or make changes to this bug.
Description
•