Closed Bug 777063 Opened 12 years ago Closed 12 years ago

Crash when sending mail, on window close @ nsChromeTreeOwner::SetTitle

Categories

(Thunderbird :: Message Compose Window, defect)

x86
All
defect
Not set
critical

Tracking

(Not tracked)

VERIFIED FIXED
Thunderbird 17.0

People

(Reporter: mconley, Assigned: neil)

References

(Depends on 1 open bug)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(2 files, 2 obsolete files)

STR (with Daily):

1) Compose a new HTML mail
2) Send it to someone

What happens?

Crash!

What's expected?

No crash!


Here's my backtrace:

#0  0xb7802424 in __kernel_vsyscall ()
#1  0xb75a6c16 in nanosleep () at ../sysdeps/unix/syscall-template.S:82
#2  0xb75a6a0f in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:138
#3  0xb443f2cc in ah_crap_handler (signum=11) at /media/Projects/mozilla/thunderbird/mozilla/toolkit/xre/nsSigHandlers.cpp:87
#4  0xb4445d3f in nsProfileLock::FatalSignalHandler (signo=11, info=0xbf8820bc, context=0xbf88213c) at /media/Projects/mozilla/objdir-thunderbird-patches/mozilla/toolkit/profile/nsProfileLock.cpp:190
#5  <signal handler called>
#6  0xb4f73abd in nsChromeTreeOwner::SetTitle (this=0xa52d2a60, aTitle=0xbf8824dc) at /media/Projects/mozilla/thunderbird/mozilla/xpfe/appshell/src/nsChromeTreeOwner.cpp:471
#7  0xb4ef7f6e in nsDocShell::SetTitle (this=0x9ee05c00, aTitle=0xbf8824dc) at /media/Projects/mozilla/thunderbird/mozilla/docshell/base/nsDocShell.cpp:5126
#8  0xb4877b0e in nsDocument::DoNotifyPossibleTitleChange (this=0x9f12e800) at /media/Projects/mozilla/thunderbird/mozilla/content/base/src/nsDocument.cpp:5258
#9  0xb4442199 in nsRunnableMethodImpl<void (nsUpdateProcessor::*)(), true>::Run (this=0x9bd57540) at ../../dist/include/nsThreadUtils.h:349
#10 0xb56672c5 in nsThread::ProcessNextEvent (this=0xb7240f90, mayWait=false, result=0xbf88265f) at /media/Projects/mozilla/thunderbird/mozilla/xpcom/threads/nsThread.cpp:624
#11 0xb5623ba8 in NS_ProcessNextEvent_P (thread=<optimized out>, mayWait=false) at /media/Projects/mozilla/objdir-thunderbird-patches/mozilla/xpcom/build/nsThreadUtils.cpp:217
#12 0xb552d14e in mozilla::ipc::MessagePump::Run (this=0xb2619430, aDelegate=0xb7253900) at /media/Projects/mozilla/thunderbird/mozilla/ipc/glue/MessagePump.cpp:82
#13 0xb5694b62 in MessageLoop::RunInternal (this=0xb7253900) at /media/Projects/mozilla/thunderbird/mozilla/ipc/chromium/src/base/message_loop.cc:208
#14 0xb5694b87 in RunHandler (this=0xb7253900) at /media/Projects/mozilla/thunderbird/mozilla/ipc/chromium/src/base/message_loop.cc:201
#15 MessageLoop::Run (this=0xb7253900) at /media/Projects/mozilla/thunderbird/mozilla/ipc/chromium/src/base/message_loop.cc:175
#16 0xb5147c4b in nsBaseAppShell::Run (this=0xb125f1f0) at /media/Projects/mozilla/thunderbird/mozilla/widget/xpwidgets/nsBaseAppShell.cpp:163
#17 0xb4f91146 in nsAppStartup::Run (this=0xb12801c0) at /media/Projects/mozilla/thunderbird/mozilla/toolkit/components/startup/nsAppStartup.cpp:271
#18 0xb443a1c4 in XREMain::XRE_mainRun (this=0xbf882970) at /media/Projects/mozilla/thunderbird/mozilla/toolkit/xre/nsAppRunner.cpp:3787
#19 0xb443a4e6 in XREMain::XRE_main (this=0xbf882970, argc=1, argv=0xbf883c34, aAppData=0xb7216880) at /media/Projects/mozilla/thunderbird/mozilla/toolkit/xre/nsAppRunner.cpp:3864
#20 0xb443a729 in XRE_main (argc=1, argv=0xbf883c34, aAppData=0xb7216880, aFlags=0) at /media/Projects/mozilla/thunderbird/mozilla/toolkit/xre/nsAppRunner.cpp:3940
#21 0x0804a06c in do_main (argv=0xbf883c34, argc=1, exePath=0xbf882b6c "/media/Projects/mozilla/objdir-thunderbird-patches/mozilla/dist/bin/") at /media/Projects/mozilla/thunderbird/mail/app/nsMailApp.cpp:111
#22 main (argc=-1238270672, argv=0xb6317ddc) at /media/Projects/mozilla/thunderbird/mail/app/nsMailApp.cpp:200
Bisection has confirmed the failure to be a regression of <http://hg.mozilla.org/mozilla-central/rev/4e95ccf43b6e> from bug 775676.
Depends on: 775676
CC'ing the people who worked on bug 775676.
Okay, tracking this down, here's the main problem. For arcane reasons I don't fully understand, we cache the compose window; this is done by holding a strong reference to the nsIDOMWindow. When the window is closed, the underlying nsWebShellWindow loses all of its refcounts and dies. Later, if we try to open the cached window (as happens in the mozmill tests) or try to clear out the cache (as happens in the mailbloat tests), when we try to manipulate the underlying window, everything blows up.

The docshell claims it doesn't hold a strong reference to the nsWebShellWindow because it causes a cycle. I don't know if the doc shell still being around should cause the nsWebShellWindow/nsXULWindow to stay around or not, but it's definitely not staying around right now and that's what's causing the crash.
Maybe because of line 1.74 ?
    1.74 -  nsWebShellWindow *win = static_cast<nsWebShellWindow *>(aClosure);
    1.75 -  MutexAutoLock lock(win->mSPTimerLock);
    1.76 -  win->SavePersistentAttributes();
    1.77 +  MutexAutoLock lock(mSPTimerLock);
    1.78 +  SavePersistentAttributes();
Summary from IRC discussion with roc:

We have two options:
1. Stop caching compose windows
2. Save nsXULWindows in the cache, not nsIDOMWindows

I tried option #1 on try and predictably found failure along that path. Option #2 is being attempted.
(In reply to Joshua Cranmer [:jcranmer] from comment #3)
> Okay, tracking this down, here's the main problem. For arcane reasons I
> don't fully understand, we cache the compose window

The reason we're doing that is for speed. Building a new compose window always was relatively slow, probably because it's XBL-heavy, and caching it made that significantly faster.
This appears to stop the crashes, but try seems to be recording some compose failures: https://tbpl.mozilla.org/?tree=Thunderbird-Try&rev=6ba1c6f511c9
Assignee: nobody → Pidgeot18
Status: NEW → ASSIGNED
Cc'ing some SeaMonkey folks, because this affects them too.
Blocks: 775676
Crash Signature: [@ nsChromeTreeOwner::SetTitle] [@ nsChromeTreeOwner::SetTitle(wchar_t const*) ]
No longer depends on: 775676
Keywords: crash, regression
Summary: Crash when sending HTML mail, on window close → Crash when sending HTML mail, on window close @ nsChromeTreeOwner::SetTitle
Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Firefox/17.0 SeaMonkey/2.14a1 ID:20120725003016 c-c: 3f7c3f228397 m-c:ef20925bc2a5

I've seen this once while (or immediately after) sending plaintext mail. The mail was sent (it was a mailinglist post, and I got it from the server on restart). Another email did not crash.

See bug 777552 comment #0 for the stack dump.

Joshua: I see you've ASSIGNED this bug to yourself. If the fix includes code not shared between Tb & Sm, then we'll have to port the fix to SeaMonkey (possibly by un-dupeing bug 777552).
Summary: Crash when sending HTML mail, on window close @ nsChromeTreeOwner::SetTitle → Crash when sending mail, on window close @ nsChromeTreeOwner::SetTitle
Attached patch Remove recycling compose code (obsolete) — Splinter Review
The extra change to mozmill tests is necessary to get them to go green, since it may be that a mozmill change got hidden by busted compose.
Attachment #646224 - Flags: review?(neil)
Comment on attachment 646224 [details] [diff] [review]
Remove recycling compose code

Is it possible to remove nsMsgComposeService::IsCachedWindow() and all its dependencies in a follow-on bug, or does it have too many tentacles?

If that's in a mailnews IDL, it might be better to get it done before TB 17...
(In reply to Irving Reid (:irving) from comment #13)
> Comment on attachment 646224 [details] [diff] [review]
> Remove recycling compose code
> 
> Is it possible to remove nsMsgComposeService::IsCachedWindow() and all its
> dependencies in a follow-on bug, or does it have too many tentacles?
> 
> If that's in a mailnews IDL, it might be better to get it done before TB
> 17...

See bug 777732.
Blocks: 778355
No longer blocks: 778355
I think attachment 645745 [details] [diff] [review] was trying to cache the wrong object - the nsIBaseWindow is actually the nsChromeTreeOwner object whose mWindow member is stale because of bug 778355 so caching it doesn't help. Instead this patch caches the nsIXULWindow object, which gets recycling working again.
Attachment #647023 - Flags: review?(Pidgeot18)
also related?
@ nsDocument::DoNotifyPossibleTitleChange()
bp-4b0d3795-6c66-4848-a2e6-6962f2120727
Crash Signature: [@ nsChromeTreeOwner::SetTitle] [@ nsChromeTreeOwner::SetTitle(wchar_t const*) ] → [@ nsChromeTreeOwner::SetTitle] [@ nsChromeTreeOwner::SetEnabled(bool) ] [@ nsChromeTreeOwner::SetTitle(wchar_t const*) ] [@ nsMsgComposeService::OpenComposeWindowWithParams(char const* nsIMsgComposeParams*)] [@ @0x0 | nsMsgComposeService::OpenCompos…
(In reply to neil@parkwaycc.co.uk from comment #15)
> Created attachment 647023 [details] [diff] [review]
> Cache nsIXULWindow as well as nsIDOMWindow
> 
> I think attachment 645745 [details] [diff] [review] was trying to cache the
> wrong object - the nsIBaseWindow is actually the nsChromeTreeOwner object
> whose mWindow member is stale because of bug 778355 so caching it doesn't
> help. Instead this patch caches the nsIXULWindow object, which gets
> recycling working again.

I don't feel comfortable reviewing this change.
Usually this kind of crasher regressions lead to either backing out the problematic patch asap,
or fixing the crash asap. Since the fix is still waiting for reviews, any reason to not
back out bug 775676?
Yes. Fixing some B2G stuff caused jlebar to make an existing nsWebShellWindow memory leak manifest more often --- that's bug 775676. So he fixed bug 775676, which exposed utterly bogus code in Thunderbird that was relying on the memory leak. It doesn't seem like a good idea to slow down progress on B2G to make Justin fix Thunderbird bugs.
Comment on attachment 647023 [details] [diff] [review]
Cache nsIXULWindow as well as nsIDOMWindow

I can take a look at this. This is my preferred option for at least the short term, though I think we should follow up on getting performance measurements for cached versus non-cached (via telemetry), and use those to fuel our decision as to if we should continue caching or not.
Attachment #647023 - Flags: review?(Pidgeot18) → review?(mbanner)
Blocks: 778645
Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Firefox/17.0 SeaMonkey/2.14a1 ID:20120730003010 c-c: 1d9804d82741 m-c: 36c30260e7fa

I just had two crashes when opening a new email by "Reply to All", one with nsChromeTreeOwner::SetEnabled (listed in this bug's signatures), the other with 0x0 instead but otherwise an extremely similar stack from frame 1 downwards:
bp-eb265a20-cb43-4dae-9a02-6771a2120730 (SetEnabled)
bp-548102dd-053e-47f8-8418-bfff42120730 (0x0)
Comment on attachment 647023 [details] [diff] [review]
Cache nsIXULWindow as well as nsIDOMWindow

Review of attachment 647023 [details] [diff] [review]:
-----------------------------------------------------------------

So this looks ok, and gets the tests to pass. I think we should go with it for now at least. We should also look in more detail about how much it caching the window actually saves us.
Attachment #647023 - Flags: review?(mbanner) → review+
Comment on attachment 646224 [details] [diff] [review]
Remove recycling compose code

So, r- for everything but the test at the moment, because I think we want to keep caching, until we've got more evidence of how much time cached versus non-cached really saves us.
Attachment #646224 - Flags: review?(neil) → review-
Attachment #645745 - Attachment is obsolete: true
Attached patch Unit test fixSplinter Review
Just for completeness, here's the unit test patch that I'm landing extracted from Joshua's patch.
Attachment #646224 - Attachment is obsolete: true
Attachment #647478 - Flags: review+
(In reply to Mark Banner (:standard8) from comment #22)
> Comment on attachment 647023 [details] [diff] [review]
> Cache nsIXULWindow as well as nsIDOMWindow

Checked in: https://hg.mozilla.org/comm-central/rev/08016b9719a3

(In reply to Mark Banner (:standard8) from comment #24)
> Created attachment 647478 [details] [diff] [review]
> Unit test fix

Checked in: https://hg.mozilla.org/comm-central/rev/9fbaed739443
Assignee: Pidgeot18 → neil
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 17.0
I filed bug 779074 about adding telemetry hooks for getting an assessment of cached versus non-cached times.
Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Firefox/17.0 SeaMonkey/2.14a1 ID:20120731025549 c-c: 9fbaed739443 m-c: 99c53832abfe

The bug is also fixed in SeaMonkey: I just sent several messages, yesterday most of them would have crashed, today they didn't. Wouldn't this bug belong in "MailNews Core" rather than "Thunderbird"?

This was with a profile where mail.compose.max_recycled_windows had been left at its default of 1.
Status: RESOLVED → VERIFIED
(In reply to Joshua Cranmer [:jcranmer] from comment #5)
> We have two options:
> 1. Stop caching compose windows
> 2. Save nsXULWindows in the cache, not nsIDOMWindows

We experieced bug 814651 in Tb 17.
To neil@parkwaycc.co.uk(Assigned To: person), did you do (1.) at same in addition to (2.)?
Bug 818607 also started to occur after patch for this bug was landed.

To neil@parkwaycc.co.uk(Assigned To: person of this bug): 
It's merely both bug 814651 and bug 818607 contingently started to occur from build on which your patch was landed?
Flags: needinfo?(neil)
You're right, there's a logic error in my patch, and compose windows no longer get recycled. Once I've worked out a patch I'll attach it to bug 814651.
Flags: needinfo?(neil)
Depends on: 866223
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: