Last Comment Bug 777063 - Crash when sending mail, on window close @ nsChromeTreeOwner::SetTitle
: Crash when sending mail, on window close @ nsChromeTreeOwner::SetTitle
Status: VERIFIED FIXED
: crash, regression
Product: Thunderbird
Classification: Client Software
Component: Message Compose Window (show other bugs)
: Trunk
: x86 All
: -- critical with 1 vote (vote)
: Thunderbird 17.0
Assigned To: neil@parkwaycc.co.uk
:
Mentors:
: 777552 777801 778645 (view as bug list)
Depends on: 814651 866223
Blocks: 775676 777732 778645 818607
  Show dependency treegraph
 
Reported: 2012-07-24 14:00 PDT by Mike Conley (:mconley) - (Needinfo me!)
Modified: 2013-04-26 11:47 PDT (History)
25 users (show)
standard8: in‑testsuite+
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Try caching nsXULWindow instead of nsDOMWindow (16.00 KB, patch)
2012-07-25 06:26 PDT, Joshua Cranmer [:jcranmer]
no flags Details | Diff | Splinter Review
Remove recycling compose code (20.15 KB, patch)
2012-07-26 11:26 PDT, Joshua Cranmer [:jcranmer]
standard8: review-
Details | Diff | Splinter Review
Cache nsIXULWindow as well as nsIDOMWindow (8.24 KB, patch)
2012-07-29 15:58 PDT, neil@parkwaycc.co.uk
standard8: review+
Details | Diff | Splinter Review
Unit test fix (758 bytes, patch)
2012-07-31 02:53 PDT, Mark Banner (:standard8)
standard8: review+
Details | Diff | Splinter Review

Description Mike Conley (:mconley) - (Needinfo me!) 2012-07-24 14:00:17 PDT
STR (with Daily):

1) Compose a new HTML mail
2) Send it to someone

What happens?

Crash!

What's expected?

No crash!


Here's my backtrace:

#0  0xb7802424 in __kernel_vsyscall ()
#1  0xb75a6c16 in nanosleep () at ../sysdeps/unix/syscall-template.S:82
#2  0xb75a6a0f in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:138
#3  0xb443f2cc in ah_crap_handler (signum=11) at /media/Projects/mozilla/thunderbird/mozilla/toolkit/xre/nsSigHandlers.cpp:87
#4  0xb4445d3f in nsProfileLock::FatalSignalHandler (signo=11, info=0xbf8820bc, context=0xbf88213c) at /media/Projects/mozilla/objdir-thunderbird-patches/mozilla/toolkit/profile/nsProfileLock.cpp:190
#5  <signal handler called>
#6  0xb4f73abd in nsChromeTreeOwner::SetTitle (this=0xa52d2a60, aTitle=0xbf8824dc) at /media/Projects/mozilla/thunderbird/mozilla/xpfe/appshell/src/nsChromeTreeOwner.cpp:471
#7  0xb4ef7f6e in nsDocShell::SetTitle (this=0x9ee05c00, aTitle=0xbf8824dc) at /media/Projects/mozilla/thunderbird/mozilla/docshell/base/nsDocShell.cpp:5126
#8  0xb4877b0e in nsDocument::DoNotifyPossibleTitleChange (this=0x9f12e800) at /media/Projects/mozilla/thunderbird/mozilla/content/base/src/nsDocument.cpp:5258
#9  0xb4442199 in nsRunnableMethodImpl<void (nsUpdateProcessor::*)(), true>::Run (this=0x9bd57540) at ../../dist/include/nsThreadUtils.h:349
#10 0xb56672c5 in nsThread::ProcessNextEvent (this=0xb7240f90, mayWait=false, result=0xbf88265f) at /media/Projects/mozilla/thunderbird/mozilla/xpcom/threads/nsThread.cpp:624
#11 0xb5623ba8 in NS_ProcessNextEvent_P (thread=<optimized out>, mayWait=false) at /media/Projects/mozilla/objdir-thunderbird-patches/mozilla/xpcom/build/nsThreadUtils.cpp:217
#12 0xb552d14e in mozilla::ipc::MessagePump::Run (this=0xb2619430, aDelegate=0xb7253900) at /media/Projects/mozilla/thunderbird/mozilla/ipc/glue/MessagePump.cpp:82
#13 0xb5694b62 in MessageLoop::RunInternal (this=0xb7253900) at /media/Projects/mozilla/thunderbird/mozilla/ipc/chromium/src/base/message_loop.cc:208
#14 0xb5694b87 in RunHandler (this=0xb7253900) at /media/Projects/mozilla/thunderbird/mozilla/ipc/chromium/src/base/message_loop.cc:201
#15 MessageLoop::Run (this=0xb7253900) at /media/Projects/mozilla/thunderbird/mozilla/ipc/chromium/src/base/message_loop.cc:175
#16 0xb5147c4b in nsBaseAppShell::Run (this=0xb125f1f0) at /media/Projects/mozilla/thunderbird/mozilla/widget/xpwidgets/nsBaseAppShell.cpp:163
#17 0xb4f91146 in nsAppStartup::Run (this=0xb12801c0) at /media/Projects/mozilla/thunderbird/mozilla/toolkit/components/startup/nsAppStartup.cpp:271
#18 0xb443a1c4 in XREMain::XRE_mainRun (this=0xbf882970) at /media/Projects/mozilla/thunderbird/mozilla/toolkit/xre/nsAppRunner.cpp:3787
#19 0xb443a4e6 in XREMain::XRE_main (this=0xbf882970, argc=1, argv=0xbf883c34, aAppData=0xb7216880) at /media/Projects/mozilla/thunderbird/mozilla/toolkit/xre/nsAppRunner.cpp:3864
#20 0xb443a729 in XRE_main (argc=1, argv=0xbf883c34, aAppData=0xb7216880, aFlags=0) at /media/Projects/mozilla/thunderbird/mozilla/toolkit/xre/nsAppRunner.cpp:3940
#21 0x0804a06c in do_main (argv=0xbf883c34, argc=1, exePath=0xbf882b6c "/media/Projects/mozilla/objdir-thunderbird-patches/mozilla/dist/bin/") at /media/Projects/mozilla/thunderbird/mail/app/nsMailApp.cpp:111
#22 main (argc=-1238270672, argv=0xb6317ddc) at /media/Projects/mozilla/thunderbird/mail/app/nsMailApp.cpp:200
Comment 1 Joshua Cranmer [:jcranmer] 2012-07-24 16:25:26 PDT
Bisection has confirmed the failure to be a regression of <http://hg.mozilla.org/mozilla-central/rev/4e95ccf43b6e> from bug 775676.
Comment 2 Joshua Cranmer [:jcranmer] 2012-07-24 16:26:25 PDT
CC'ing the people who worked on bug 775676.
Comment 3 Joshua Cranmer [:jcranmer] 2012-07-24 17:17:17 PDT
Okay, tracking this down, here's the main problem. For arcane reasons I don't fully understand, we cache the compose window; this is done by holding a strong reference to the nsIDOMWindow. When the window is closed, the underlying nsWebShellWindow loses all of its refcounts and dies. Later, if we try to open the cached window (as happens in the mozmill tests) or try to clear out the cache (as happens in the mailbloat tests), when we try to manipulate the underlying window, everything blows up.

The docshell claims it doesn't hold a strong reference to the nsWebShellWindow because it causes a cycle. I don't know if the doc shell still being around should cause the nsWebShellWindow/nsXULWindow to stay around or not, but it's definitely not staying around right now and that's what's causing the crash.
Comment 4 Joe Sabash [:JoeS1] 2012-07-24 17:59:25 PDT
Maybe because of line 1.74 ?
    1.74 -  nsWebShellWindow *win = static_cast<nsWebShellWindow *>(aClosure);
    1.75 -  MutexAutoLock lock(win->mSPTimerLock);
    1.76 -  win->SavePersistentAttributes();
    1.77 +  MutexAutoLock lock(mSPTimerLock);
    1.78 +  SavePersistentAttributes();
Comment 5 Joshua Cranmer [:jcranmer] 2012-07-24 20:36:20 PDT
Summary from IRC discussion with roc:

We have two options:
1. Stop caching compose windows
2. Save nsXULWindows in the cache, not nsIDOMWindows

I tried option #1 on try and predictably found failure along that path. Option #2 is being attempted.
Comment 6 Robert Kaiser 2012-07-25 03:47:19 PDT
(In reply to Joshua Cranmer [:jcranmer] from comment #3)
> Okay, tracking this down, here's the main problem. For arcane reasons I
> don't fully understand, we cache the compose window

The reason we're doing that is for speed. Building a new compose window always was relatively slow, probably because it's XBL-heavy, and caching it made that significantly faster.
Comment 7 Joshua Cranmer [:jcranmer] 2012-07-25 06:26:49 PDT
Created attachment 645745 [details] [diff] [review]
Try caching nsXULWindow instead of nsDOMWindow

This appears to stop the crashes, but try seems to be recording some compose failures: https://tbpl.mozilla.org/?tree=Thunderbird-Try&rev=6ba1c6f511c9
Comment 8 Mike Conley (:mconley) - (Needinfo me!) 2012-07-25 13:10:22 PDT
Cc'ing some SeaMonkey folks, because this affects them too.
Comment 9 Wayne Mery (:wsmwk, NI for questions) 2012-07-25 17:30:01 PDT
*** Bug 777552 has been marked as a duplicate of this bug. ***
Comment 10 Tony Mechelynck [:tonymec] 2012-07-25 17:43:47 PDT
Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Firefox/17.0 SeaMonkey/2.14a1 ID:20120725003016 c-c: 3f7c3f228397 m-c:ef20925bc2a5

I've seen this once while (or immediately after) sending plaintext mail. The mail was sent (it was a mailinglist post, and I got it from the server on restart). Another email did not crash.

See bug 777552 comment #0 for the stack dump.

Joshua: I see you've ASSIGNED this bug to yourself. If the fix includes code not shared between Tb & Sm, then we'll have to port the fix to SeaMonkey (possibly by un-dupeing bug 777552).
Comment 11 Joshua Cranmer [:jcranmer] 2012-07-26 11:05:59 PDT
*** Bug 777801 has been marked as a duplicate of this bug. ***
Comment 12 Joshua Cranmer [:jcranmer] 2012-07-26 11:26:01 PDT
Created attachment 646224 [details] [diff] [review]
Remove recycling compose code

The extra change to mozmill tests is necessary to get them to go green, since it may be that a mozmill change got hidden by busted compose.
Comment 13 :Irving Reid (No longer working on Firefox) 2012-07-26 11:33:14 PDT
Comment on attachment 646224 [details] [diff] [review]
Remove recycling compose code

Is it possible to remove nsMsgComposeService::IsCachedWindow() and all its dependencies in a follow-on bug, or does it have too many tentacles?

If that's in a mailnews IDL, it might be better to get it done before TB 17...
Comment 14 Joshua Cranmer [:jcranmer] 2012-07-26 11:36:49 PDT
(In reply to Irving Reid (:irving) from comment #13)
> Comment on attachment 646224 [details] [diff] [review]
> Remove recycling compose code
> 
> Is it possible to remove nsMsgComposeService::IsCachedWindow() and all its
> dependencies in a follow-on bug, or does it have too many tentacles?
> 
> If that's in a mailnews IDL, it might be better to get it done before TB
> 17...

See bug 777732.
Comment 15 neil@parkwaycc.co.uk 2012-07-29 15:58:20 PDT
Created attachment 647023 [details] [diff] [review]
Cache nsIXULWindow as well as nsIDOMWindow

I think attachment 645745 [details] [diff] [review] was trying to cache the wrong object - the nsIBaseWindow is actually the nsChromeTreeOwner object whose mWindow member is stale because of bug 778355 so caching it doesn't help. Instead this patch caches the nsIXULWindow object, which gets recycling working again.
Comment 16 Wayne Mery (:wsmwk, NI for questions) 2012-07-29 17:00:05 PDT
also related?
@ nsDocument::DoNotifyPossibleTitleChange()
bp-4b0d3795-6c66-4848-a2e6-6962f2120727
Comment 17 Joshua Cranmer [:jcranmer] 2012-07-29 17:07:32 PDT
(In reply to neil@parkwaycc.co.uk from comment #15)
> Created attachment 647023 [details] [diff] [review]
> Cache nsIXULWindow as well as nsIDOMWindow
> 
> I think attachment 645745 [details] [diff] [review] was trying to cache the
> wrong object - the nsIBaseWindow is actually the nsChromeTreeOwner object
> whose mWindow member is stale because of bug 778355 so caching it doesn't
> help. Instead this patch caches the nsIXULWindow object, which gets
> recycling working again.

I don't feel comfortable reviewing this change.
Comment 18 Olli Pettay [:smaug] 2012-07-29 17:27:45 PDT
Usually this kind of crasher regressions lead to either backing out the problematic patch asap,
or fixing the crash asap. Since the fix is still waiting for reviews, any reason to not
back out bug 775676?
Comment 19 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-07-29 17:34:51 PDT
Yes. Fixing some B2G stuff caused jlebar to make an existing nsWebShellWindow memory leak manifest more often --- that's bug 775676. So he fixed bug 775676, which exposed utterly bogus code in Thunderbird that was relying on the memory leak. It doesn't seem like a good idea to slow down progress on B2G to make Justin fix Thunderbird bugs.
Comment 20 Mark Banner (:standard8) 2012-07-30 04:35:51 PDT
Comment on attachment 647023 [details] [diff] [review]
Cache nsIXULWindow as well as nsIDOMWindow

I can take a look at this. This is my preferred option for at least the short term, though I think we should follow up on getting performance measurements for cached versus non-cached (via telemetry), and use those to fuel our decision as to if we should continue caching or not.
Comment 21 Tony Mechelynck [:tonymec] 2012-07-30 12:57:24 PDT
Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Firefox/17.0 SeaMonkey/2.14a1 ID:20120730003010 c-c: 1d9804d82741 m-c: 36c30260e7fa

I just had two crashes when opening a new email by "Reply to All", one with nsChromeTreeOwner::SetEnabled (listed in this bug's signatures), the other with 0x0 instead but otherwise an extremely similar stack from frame 1 downwards:
bp-eb265a20-cb43-4dae-9a02-6771a2120730 (SetEnabled)
bp-548102dd-053e-47f8-8418-bfff42120730 (0x0)
Comment 22 Mark Banner (:standard8) 2012-07-31 02:38:13 PDT
Comment on attachment 647023 [details] [diff] [review]
Cache nsIXULWindow as well as nsIDOMWindow

Review of attachment 647023 [details] [diff] [review]:
-----------------------------------------------------------------

So this looks ok, and gets the tests to pass. I think we should go with it for now at least. We should also look in more detail about how much it caching the window actually saves us.
Comment 23 Mark Banner (:standard8) 2012-07-31 02:44:32 PDT
Comment on attachment 646224 [details] [diff] [review]
Remove recycling compose code

So, r- for everything but the test at the moment, because I think we want to keep caching, until we've got more evidence of how much time cached versus non-cached really saves us.
Comment 24 Mark Banner (:standard8) 2012-07-31 02:53:46 PDT
Created attachment 647478 [details] [diff] [review]
Unit test fix

Just for completeness, here's the unit test patch that I'm landing extracted from Joshua's patch.
Comment 25 Mark Banner (:standard8) 2012-07-31 02:56:49 PDT
(In reply to Mark Banner (:standard8) from comment #22)
> Comment on attachment 647023 [details] [diff] [review]
> Cache nsIXULWindow as well as nsIDOMWindow

Checked in: https://hg.mozilla.org/comm-central/rev/08016b9719a3

(In reply to Mark Banner (:standard8) from comment #24)
> Created attachment 647478 [details] [diff] [review]
> Unit test fix

Checked in: https://hg.mozilla.org/comm-central/rev/9fbaed739443
Comment 26 Mark Banner (:standard8) 2012-07-31 02:57:35 PDT
I filed bug 779074 about adding telemetry hooks for getting an assessment of cached versus non-cached times.
Comment 27 Tony Mechelynck [:tonymec] 2012-07-31 10:18:34 PDT
Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Firefox/17.0 SeaMonkey/2.14a1 ID:20120731025549 c-c: 9fbaed739443 m-c: 99c53832abfe

The bug is also fixed in SeaMonkey: I just sent several messages, yesterday most of them would have crashed, today they didn't. Wouldn't this bug belong in "MailNews Core" rather than "Thunderbird"?

This was with a profile where mail.compose.max_recycled_windows had been left at its default of 1.
Comment 28 Tony Mechelynck [:tonymec] 2012-08-03 08:46:40 PDT
*** Bug 778645 has been marked as a duplicate of this bug. ***
Comment 29 WADA 2012-12-06 03:32:47 PST
(In reply to Joshua Cranmer [:jcranmer] from comment #5)
> We have two options:
> 1. Stop caching compose windows
> 2. Save nsXULWindows in the cache, not nsIDOMWindows

We experieced bug 814651 in Tb 17.
To neil@parkwaycc.co.uk(Assigned To: person), did you do (1.) at same in addition to (2.)?
Comment 30 WADA 2013-04-25 00:15:49 PDT
Bug 818607 also started to occur after patch for this bug was landed.

To neil@parkwaycc.co.uk(Assigned To: person of this bug): 
It's merely both bug 814651 and bug 818607 contingently started to occur from build on which your patch was landed?
Comment 31 neil@parkwaycc.co.uk 2013-04-25 05:42:37 PDT
You're right, there's a logic error in my patch, and compose windows no longer get recycled. Once I've worked out a patch I'll attach it to bug 814651.

Note You need to log in before you can comment on or make changes to this bug.