STR (with Daily): 1) Compose a new HTML mail 2) Send it to someone What happens? Crash! What's expected? No crash! Here's my backtrace: #0 0xb7802424 in __kernel_vsyscall () #1 0xb75a6c16 in nanosleep () at ../sysdeps/unix/syscall-template.S:82 #2 0xb75a6a0f in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:138 #3 0xb443f2cc in ah_crap_handler (signum=11) at /media/Projects/mozilla/thunderbird/mozilla/toolkit/xre/nsSigHandlers.cpp:87 #4 0xb4445d3f in nsProfileLock::FatalSignalHandler (signo=11, info=0xbf8820bc, context=0xbf88213c) at /media/Projects/mozilla/objdir-thunderbird-patches/mozilla/toolkit/profile/nsProfileLock.cpp:190 #5 <signal handler called> #6 0xb4f73abd in nsChromeTreeOwner::SetTitle (this=0xa52d2a60, aTitle=0xbf8824dc) at /media/Projects/mozilla/thunderbird/mozilla/xpfe/appshell/src/nsChromeTreeOwner.cpp:471 #7 0xb4ef7f6e in nsDocShell::SetTitle (this=0x9ee05c00, aTitle=0xbf8824dc) at /media/Projects/mozilla/thunderbird/mozilla/docshell/base/nsDocShell.cpp:5126 #8 0xb4877b0e in nsDocument::DoNotifyPossibleTitleChange (this=0x9f12e800) at /media/Projects/mozilla/thunderbird/mozilla/content/base/src/nsDocument.cpp:5258 #9 0xb4442199 in nsRunnableMethodImpl<void (nsUpdateProcessor::*)(), true>::Run (this=0x9bd57540) at ../../dist/include/nsThreadUtils.h:349 #10 0xb56672c5 in nsThread::ProcessNextEvent (this=0xb7240f90, mayWait=false, result=0xbf88265f) at /media/Projects/mozilla/thunderbird/mozilla/xpcom/threads/nsThread.cpp:624 #11 0xb5623ba8 in NS_ProcessNextEvent_P (thread=<optimized out>, mayWait=false) at /media/Projects/mozilla/objdir-thunderbird-patches/mozilla/xpcom/build/nsThreadUtils.cpp:217 #12 0xb552d14e in mozilla::ipc::MessagePump::Run (this=0xb2619430, aDelegate=0xb7253900) at /media/Projects/mozilla/thunderbird/mozilla/ipc/glue/MessagePump.cpp:82 #13 0xb5694b62 in MessageLoop::RunInternal (this=0xb7253900) at /media/Projects/mozilla/thunderbird/mozilla/ipc/chromium/src/base/message_loop.cc:208 #14 0xb5694b87 in RunHandler (this=0xb7253900) at /media/Projects/mozilla/thunderbird/mozilla/ipc/chromium/src/base/message_loop.cc:201 #15 MessageLoop::Run (this=0xb7253900) at /media/Projects/mozilla/thunderbird/mozilla/ipc/chromium/src/base/message_loop.cc:175 #16 0xb5147c4b in nsBaseAppShell::Run (this=0xb125f1f0) at /media/Projects/mozilla/thunderbird/mozilla/widget/xpwidgets/nsBaseAppShell.cpp:163 #17 0xb4f91146 in nsAppStartup::Run (this=0xb12801c0) at /media/Projects/mozilla/thunderbird/mozilla/toolkit/components/startup/nsAppStartup.cpp:271 #18 0xb443a1c4 in XREMain::XRE_mainRun (this=0xbf882970) at /media/Projects/mozilla/thunderbird/mozilla/toolkit/xre/nsAppRunner.cpp:3787 #19 0xb443a4e6 in XREMain::XRE_main (this=0xbf882970, argc=1, argv=0xbf883c34, aAppData=0xb7216880) at /media/Projects/mozilla/thunderbird/mozilla/toolkit/xre/nsAppRunner.cpp:3864 #20 0xb443a729 in XRE_main (argc=1, argv=0xbf883c34, aAppData=0xb7216880, aFlags=0) at /media/Projects/mozilla/thunderbird/mozilla/toolkit/xre/nsAppRunner.cpp:3940 #21 0x0804a06c in do_main (argv=0xbf883c34, argc=1, exePath=0xbf882b6c "/media/Projects/mozilla/objdir-thunderbird-patches/mozilla/dist/bin/") at /media/Projects/mozilla/thunderbird/mail/app/nsMailApp.cpp:111 #22 main (argc=-1238270672, argv=0xb6317ddc) at /media/Projects/mozilla/thunderbird/mail/app/nsMailApp.cpp:200
Bisection has confirmed the failure to be a regression of <http://hg.mozilla.org/mozilla-central/rev/4e95ccf43b6e> from bug 775676.
CC'ing the people who worked on bug 775676.
Okay, tracking this down, here's the main problem. For arcane reasons I don't fully understand, we cache the compose window; this is done by holding a strong reference to the nsIDOMWindow. When the window is closed, the underlying nsWebShellWindow loses all of its refcounts and dies. Later, if we try to open the cached window (as happens in the mozmill tests) or try to clear out the cache (as happens in the mailbloat tests), when we try to manipulate the underlying window, everything blows up. The docshell claims it doesn't hold a strong reference to the nsWebShellWindow because it causes a cycle. I don't know if the doc shell still being around should cause the nsWebShellWindow/nsXULWindow to stay around or not, but it's definitely not staying around right now and that's what's causing the crash.
Maybe because of line 1.74 ? 1.74 - nsWebShellWindow *win = static_cast<nsWebShellWindow *>(aClosure); 1.75 - MutexAutoLock lock(win->mSPTimerLock); 1.76 - win->SavePersistentAttributes(); 1.77 + MutexAutoLock lock(mSPTimerLock); 1.78 + SavePersistentAttributes();
Summary from IRC discussion with roc: We have two options: 1. Stop caching compose windows 2. Save nsXULWindows in the cache, not nsIDOMWindows I tried option #1 on try and predictably found failure along that path. Option #2 is being attempted.
(In reply to Joshua Cranmer [:jcranmer] from comment #3) > Okay, tracking this down, here's the main problem. For arcane reasons I > don't fully understand, we cache the compose window The reason we're doing that is for speed. Building a new compose window always was relatively slow, probably because it's XBL-heavy, and caching it made that significantly faster.
Created attachment 645745 [details] [diff] [review] Try caching nsXULWindow instead of nsDOMWindow This appears to stop the crashes, but try seems to be recording some compose failures: https://tbpl.mozilla.org/?tree=Thunderbird-Try&rev=6ba1c6f511c9
Cc'ing some SeaMonkey folks, because this affects them too.
Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Firefox/17.0 SeaMonkey/2.14a1 ID:20120725003016 c-c: 3f7c3f228397 m-c:ef20925bc2a5 I've seen this once while (or immediately after) sending plaintext mail. The mail was sent (it was a mailinglist post, and I got it from the server on restart). Another email did not crash. See bug 777552 comment #0 for the stack dump. Joshua: I see you've ASSIGNED this bug to yourself. If the fix includes code not shared between Tb & Sm, then we'll have to port the fix to SeaMonkey (possibly by un-dupeing bug 777552).
Created attachment 646224 [details] [diff] [review] Remove recycling compose code The extra change to mozmill tests is necessary to get them to go green, since it may be that a mozmill change got hidden by busted compose.
Comment on attachment 646224 [details] [diff] [review] Remove recycling compose code Is it possible to remove nsMsgComposeService::IsCachedWindow() and all its dependencies in a follow-on bug, or does it have too many tentacles? If that's in a mailnews IDL, it might be better to get it done before TB 17...
(In reply to Irving Reid (:irving) from comment #13) > Comment on attachment 646224 [details] [diff] [review] > Remove recycling compose code > > Is it possible to remove nsMsgComposeService::IsCachedWindow() and all its > dependencies in a follow-on bug, or does it have too many tentacles? > > If that's in a mailnews IDL, it might be better to get it done before TB > 17... See bug 777732.
Created attachment 647023 [details] [diff] [review] Cache nsIXULWindow as well as nsIDOMWindow I think attachment 645745 [details] [diff] [review] was trying to cache the wrong object - the nsIBaseWindow is actually the nsChromeTreeOwner object whose mWindow member is stale because of bug 778355 so caching it doesn't help. Instead this patch caches the nsIXULWindow object, which gets recycling working again.
also related? @ nsDocument::DoNotifyPossibleTitleChange() bp-4b0d3795-6c66-4848-a2e6-6962f2120727
(In reply to firstname.lastname@example.org from comment #15) > Created attachment 647023 [details] [diff] [review] > Cache nsIXULWindow as well as nsIDOMWindow > > I think attachment 645745 [details] [diff] [review] was trying to cache the > wrong object - the nsIBaseWindow is actually the nsChromeTreeOwner object > whose mWindow member is stale because of bug 778355 so caching it doesn't > help. Instead this patch caches the nsIXULWindow object, which gets > recycling working again. I don't feel comfortable reviewing this change.
Usually this kind of crasher regressions lead to either backing out the problematic patch asap, or fixing the crash asap. Since the fix is still waiting for reviews, any reason to not back out bug 775676?
Yes. Fixing some B2G stuff caused jlebar to make an existing nsWebShellWindow memory leak manifest more often --- that's bug 775676. So he fixed bug 775676, which exposed utterly bogus code in Thunderbird that was relying on the memory leak. It doesn't seem like a good idea to slow down progress on B2G to make Justin fix Thunderbird bugs.
Comment on attachment 647023 [details] [diff] [review] Cache nsIXULWindow as well as nsIDOMWindow I can take a look at this. This is my preferred option for at least the short term, though I think we should follow up on getting performance measurements for cached versus non-cached (via telemetry), and use those to fuel our decision as to if we should continue caching or not.
Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Firefox/17.0 SeaMonkey/2.14a1 ID:20120730003010 c-c: 1d9804d82741 m-c: 36c30260e7fa I just had two crashes when opening a new email by "Reply to All", one with nsChromeTreeOwner::SetEnabled (listed in this bug's signatures), the other with 0x0 instead but otherwise an extremely similar stack from frame 1 downwards: bp-eb265a20-cb43-4dae-9a02-6771a2120730 (SetEnabled) bp-548102dd-053e-47f8-8418-bfff42120730 (0x0)
Comment on attachment 647023 [details] [diff] [review] Cache nsIXULWindow as well as nsIDOMWindow Review of attachment 647023 [details] [diff] [review]: ----------------------------------------------------------------- So this looks ok, and gets the tests to pass. I think we should go with it for now at least. We should also look in more detail about how much it caching the window actually saves us.
Comment on attachment 646224 [details] [diff] [review] Remove recycling compose code So, r- for everything but the test at the moment, because I think we want to keep caching, until we've got more evidence of how much time cached versus non-cached really saves us.
Created attachment 647478 [details] [diff] [review] Unit test fix Just for completeness, here's the unit test patch that I'm landing extracted from Joshua's patch.
(In reply to Mark Banner (:standard8) from comment #22) > Comment on attachment 647023 [details] [diff] [review] > Cache nsIXULWindow as well as nsIDOMWindow Checked in: https://hg.mozilla.org/comm-central/rev/08016b9719a3 (In reply to Mark Banner (:standard8) from comment #24) > Created attachment 647478 [details] [diff] [review] > Unit test fix Checked in: https://hg.mozilla.org/comm-central/rev/9fbaed739443
I filed bug 779074 about adding telemetry hooks for getting an assessment of cached versus non-cached times.
Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Firefox/17.0 SeaMonkey/2.14a1 ID:20120731025549 c-c: 9fbaed739443 m-c: 99c53832abfe The bug is also fixed in SeaMonkey: I just sent several messages, yesterday most of them would have crashed, today they didn't. Wouldn't this bug belong in "MailNews Core" rather than "Thunderbird"? This was with a profile where mail.compose.max_recycled_windows had been left at its default of 1.
(In reply to Joshua Cranmer [:jcranmer] from comment #5) > We have two options: > 1. Stop caching compose windows > 2. Save nsXULWindows in the cache, not nsIDOMWindows We experieced bug 814651 in Tb 17. To email@example.com(Assigned To: person), did you do (1.) at same in addition to (2.)?
Bug 818607 also started to occur after patch for this bug was landed. To firstname.lastname@example.org(Assigned To: person of this bug): It's merely both bug 814651 and bug 818607 contingently started to occur from build on which your patch was landed?
You're right, there's a logic error in my patch, and compose windows no longer get recycled. Once I've worked out a patch I'll attach it to bug 814651.