Last Comment Bug 241733 - crash if I attempt to move an email while another email is being sent [@ nsMenuFrame::Notify ]
: crash if I attempt to move an email while another email is being sent [@ nsMe...
Status: RESOLVED FIXED
[need testcase]
: crash, fixed1.8.0.7, fixed1.8.1
Product: Core
Classification: Components
Component: Layout (show other bugs)
: Trunk
: x86 Windows 2000
: -- critical with 1 vote (vote)
: ---
Assigned To: jpl24
:
Mentors:
: 287565 322023 339535 342658 345969 347543 (view as bug list)
Depends on:
Blocks: 322939 322162
  Show dependency treegraph
 
Reported: 2004-04-26 04:35 PDT by Sam Campbell
Modified: 2011-06-13 10:01 PDT (History)
14 users (show)
dveditz: blocking1.8.0.1-
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
This is the stack content from the crash on Mozilla 1.6 (93.44 KB, application/octet-stream)
2004-04-26 16:29 PDT, Sam Campbell
no flags Details
patch_v1 (7.71 KB, patch)
2006-01-10 22:09 PST, jpl24
bzbarsky: review+
Details | Diff | Review
patch_v2 (8.63 KB, patch)
2006-01-11 22:45 PST, jpl24
jlurz24: review+
bzbarsky: superreview+
dveditz: approval1.8.0.7+
mtschrep: approval1.8.1+
Details | Diff | Review
Patch merged to 1.8 branch (8.42 KB, patch)
2006-07-31 10:29 PDT, Boris Zbarsky [:bz] (Out June 25-July 6)
no flags Details | Diff | Review

Description Sam Campbell 2004-04-26 04:35:04 PDT
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6) Gecko/20040113
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6) Gecko/20040113

When I move a message from my inbox to a local subfolder on my system at the
same time as I am sending another message to an smtp relay, the browser may crash.

Reproducible: Sometimes
Steps to Reproduce:
This problem happens fairly regularly, however have trouble reproducing 'at
will' - that said:

1.  Prepare email, adding attachments makes the issue more likely to happen.
2.  Select 'send this mail now' icon.
3.  While the email is being sent ('sending message' box is visible)
4.  Go to inbox (IMAP for me) select an email, right click, seclect 'move
mail..' and move through the popup menu to get to a local subfolder for storage.
5.  If you are still moving the email at the same time as the 'sending message'
box is visible, then there will be a short (1 sec?) hand, and a crash will
happen, with full browser and mail client dying.

The issue seems to be related to the phase where the sending message box is, and
the drawing of the subfolder menu on the screen.

Adding an attachment creates a larger email to send, so the 'sending message'
box is on the screen longer, and movre likely to cause the crash.

Other info that may be relevent:
Local PC to mailbox on IMAP server.  Email is sent through an SMTP relay when
leaving my PC.  I keep subfolders on my local disk, as well as 'sent' folder. 
Is it possible that its the storing to the local 'sent folder' at the same time
as I do the 'move' of the email from the inbox... just a thought.

I have found it happens to me less often than it did, but thats because now I
have placed where the crash comes from I workaround the issue by habit.  I would
have seen this issue since first loading mozilla at work, around v1.3


Actual Results:  
Browser crash.  Browser restarts fine afterwards.

Expected Results:  
sent the email, and moved the other one to the desired location without crashing.
Comment 1 Sam Campbell 2004-04-26 04:53:48 PDT
I have just reproduced the bug, the netscape feedback agent ID is : TB32396233H
Comment 2 Frank Wein [:mcsmurf] 2004-04-26 08:36:47 PDT
Reporter: Can you try this with 1.7rc1? The Talkback version included in Mozilla
1.6 sends the data to Netscape (AOL) and not to mozilla.org
Comment 3 Sam Campbell 2004-04-26 16:29:29 PDT
Created attachment 147092 [details]
This is the stack content from the crash on Mozilla 1.6
Comment 4 Frank Wein [:mcsmurf] 2004-04-26 22:54:42 PDT
No, i really need a TB id from Mozilla 1.7b or Mozilla1.7rc1, the stack itself
from TB is not useful.
Comment 5 Sam Campbell 2004-04-27 01:16:52 PDT
ok - I have run 1.7rc-1 and caused a crash : TB31486M

Comments:
It was harder to cause the crash.  I had it down to a 1 in 3 tries (say) on
Moz1.6.  I tried about 30 times to get the crash on 1.7rc-1, to the point where
I thought it was fixed, until I eventually got lucky.

I'm guessing... but the speed thing seemed to affect it.  The client appears
more responsive (snappier?) to me, and if it is a timing thing this may have
affected my chances at relicating this issue.

... amongst all this, I did crash it twice in another way which is reproducible,
but that is a different thing, I'll do a search before logging the other one.

Sam
Comment 6 Scott MacGregor 2004-04-30 09:09:48 PDT
from talkback:

nsMenuFrame::Notify
[d:/BUILDS/tinderbox/Mozilla1.7/WINNT_5.0_Clobber/mozilla/layout/xul/base/src/nsMenuFrame.cpp,
line 1297]
nsTimerImpl::Fire
[d:/BUILDS/tinderbox/Mozilla1.7/WINNT_5.0_Clobber/mozilla/xpcom/threads/nsTimerImpl.cpp,
line 395]
handleTimerEvent
[d:/BUILDS/tinderbox/Mozilla1.7/WINNT_5.0_Clobber/mozilla/xpcom/threads/nsTimerImpl.cpp,
line 449]
PL_HandleEvent
[d:/BUILDS/tinderbox/Mozilla1.7/WINNT_5.0_Clobber/mozilla/xpcom/threads/plevent.c,
line 674]
PL_ProcessPendingEvents
[d:/BUILDS/tinderbox/Mozilla1.7/WINNT_5.0_Clobber/mozilla/xpcom/threads/plevent.c,
line 612]
nsEventQueueImpl::ProcessPendingEvents
[d:/BUILDS/tinderbox/Mozilla1.7/WINNT_5.0_Clobber/mozilla/xpcom/threads/nsEventQueue.cpp,
line 395]

I'm not sure what's going on.

see http://lxr.mozilla.org/mozilla/source/layout/xul/base/src/nsMenuFrame.cpp#1290

perhaps we need to (at least) safety check mContent or mMenuParent?
Comment 7 Adam Guthrie 2005-12-31 16:44:09 PST
*** Bug 287565 has been marked as a duplicate of this bug. ***
Comment 8 neil@parkwaycc.co.uk 2006-01-01 06:25:52 PST
A frame as an XPCOM timer callback? Sounds fishy to me...
Comment 9 Boris Zbarsky [:bz] (Out June 25-July 6) 2006-01-01 11:35:36 PST
Er....  It's _very_ fishy.  There should be a separate callback object here (one that's actually refcounted), and the frame should notify it when it gets destroyed.  As the code stands we're probably making a callback on an already-deleted object.

And if so, we need to fix this on 1.8 branch too.  :(

If no one else does this I guess I can try to, but I won't have time until a few weeks from now with other stuff that's piled up.
Comment 10 Daniel Veditz [:dveditz] 2006-01-04 09:03:19 PST
Without a reviewed/tested patch this one looks like it misses the .0.1 train.
Comment 11 neil@parkwaycc.co.uk 2006-01-10 09:16:51 PST
Well, I guess we need to cancel the timer on destruction (in which case the last line of Notify looks suspicious) plus probably switch to a function callback?
Comment 12 Boris Zbarsky [:bz] (Out June 25-July 6) 2006-01-10 09:23:26 PST
That could work.  Or we could switch to using a separate refcounted object here like nsImageFrame does for the imagelib callbacks.
Comment 13 jpl24 2006-01-10 22:09:11 PST
Created attachment 208174 [details] [diff] [review]
patch_v1

Hopefully nobody else was busy fixing this, I didn't have the bug privileges to assign it to myself. This implements I believe what bz was suggesting.

This compiles and runs, my build currently has a bunch of assertions that that I'm hoping aren't related. This should fix the crash, but since this crash isn't easily reproducable I couldn't test it.
Comment 14 jpl24 2006-01-10 22:19:07 PST
From a quick search through lxr I think nsMenuPopupFrame and nsAutoRepeatBoxFrame may have the same bug. nsScrollBarButtonFrame addrefs when it QIs to nsITimerCallback, not sure if that's right. nsSliderFrame does this correctly like nsImageFrame, but names the class nsSliderMediator which I like more than the name I came up with.
Comment 15 neil@parkwaycc.co.uk 2006-01-11 03:52:06 PST
Comment on attachment 208174 [details] [diff] [review]
patch_v1

>+  NS_REINTERPRET_CAST(nsMenuFrameListener*, mListener.get())->ClearFrame();
By making mListener an nsRefPtr<nsMenuFrameListener> you avoid this (incorrect - should have been static) cast.

>+  friend class nsMenuFrameListener;
ClearFrame is public, no?
Comment 16 neil@parkwaycc.co.uk 2006-01-11 03:57:27 PST
(In reply to comment #14)
>From a quick search through lxr I think nsMenuPopupFrame and
>nsAutoRepeatBoxFrame may have the same bug.
At least nsAutoRepeatBoxFrame stops the repeat service in Destroy.
Comment 17 jpl24 2006-01-11 05:58:02 PST
(In reply to comment #15)
> (From update of attachment 208174 [details] [diff] [review] [edit])
> >+  NS_REINTERPRET_CAST(nsMenuFrameListener*, mListener.get())->ClearFrame();
> By making mListener an nsRefPtr<nsMenuFrameListener> you avoid this (incorrect
> - should have been static) cast.
Good idea, I thought that looked odd(it's copied from nsImageFrame)

> >+  friend class nsMenuFrameListener;
> ClearFrame is public, no?
The macros obscure this a bit, I did this so Notify on the nsMenuFrame could be private to make it clear that the method should only be called from nsMenuFrameListener.

> At least nsAutoRepeatBoxFrame stops the repeat service in Destroy.
Ah so it stops the timer. It looks like there were a few methods used to stop this type of crash.
Comment 18 jpl24 2006-01-11 20:25:50 PST
I have a patch that fixes Neil's comments, but I'm going to wait for additional review comments before I upload a new patch.

I searched talkback and found 448 crashes in nsMenuFrame::Notify, looks like there are numerous ways to trigger this.
http://talkback-public.mozilla.org/talkback/fastfind.jsp?search=1&searchby=stacksig&match=contains&searchfor=nsMenuFrame%3A%3ANotify&vendor=MozillaOrg&product=All&platform=All&buildid=&sdate=&stime=&edate=&etime=&sortby=bbid
Comment 19 Boris Zbarsky [:bz] (Out June 25-July 6) 2006-01-11 21:08:02 PST
Comment on attachment 208174 [details] [diff] [review]
patch_v1

>Index: layout/xul/base/src/nsMenuFrame.cpp
>+  // Setup a listener which can be used for callbacks on this frame.

"Set up" (two words).

>+  mListener = new nsMenuFrameListener(this);

Maybe call it nsMenuTimerMediator or something?

>@@ -226,16 +230,20 @@ nsMenuFrame::Init(nsPresContext*  

>+  // Null out the pointer to this frame in the listener wrapper so that it 
>+  // doesn't try to interact with a deallocated frame.
>+  NS_REINTERPRET_CAST(nsMenuFrameListener*, mListener.get())->ClearFrame();

I'd do this in Destroy.  And if you make mListener a nsRefPtr<nsMenuFrameListener> there should be no need for a cast here.

To be safe, do we want to cancel the timer here too?

>+nsMenuFrameListener::nsMenuFrameListener(nsMenuFrame *aFrame) :
>+  mFrame(aFrame)
>+{

  NS_ASSERTION(mFrame, "Must have frame");

please.

>+void nsMenuFrameListener::ClearFrame()
>+{
>+  mFrame = 0;

nsnull, not 0.

>Index: layout/xul/base/src/nsMenuFrame.h

>+class nsMenuFrameListener : public nsITimerCallback
>+  virtual ~nsMenuFrameListener();

Can you make that destructor private and non-virtual?  If so, great.  If not, I guess this is ok.
>@@ -218,26 +240,32 @@ protected:
>+  NS_IMETHOD Notify(nsITimer* aTimer);

This should just be NS_HIDDEN_(nsresult), not an NS_IMETHOD.

With those nits picked, r=bzbarsky.  Thanks a ton for doing this!  And if we have other places doing this sort of thing we should fix them too...
Comment 20 jpl24 2006-01-11 22:45:45 PST
Created attachment 208247 [details] [diff] [review]
patch_v2

I think this addresses all the review comments. I tried making the destructor private, but that caused a warning about the destructor being inaccessable. I think this warning is correct, not that it would matter because the destructor is empty. I did make it non-virtual though. Carrying over r+.

Does this need sr?(the bugzilla component is definitely wrong)

I believe all the other frames which implement nsITimerCallback are safe from crashes, but they do block bug 322939 (AddRef on frames should assert).
Comment 21 Frank Wein [:mcsmurf] 2006-01-12 05:03:15 PST
Yes this patch needs sr, too.
Comment 22 Boris Zbarsky [:bz] (Out June 25-July 6) 2006-01-12 08:38:56 PST
Comment on attachment 208247 [details] [diff] [review]
patch_v2

sr=me too.  ;)
Comment 23 Boris Zbarsky [:bz] (Out June 25-July 6) 2006-01-12 09:41:47 PST
Checked in.  File new bugs on the remaining classes, if any?  I'd really like to fix bug 322939.  ;)
Comment 24 neil@parkwaycc.co.uk 2006-01-12 16:28:27 PST
(In reply to comment #19)
>To be safe, do we want to cancel the timer here too?
I'm still worried that Notify might clear the wrong timer.
Comment 25 Boris Zbarsky [:bz] (Out June 25-July 6) 2006-01-12 16:53:45 PST
What do you mean?
Comment 26 neil@parkwaycc.co.uk 2006-01-14 13:32:23 PST
Well, Notify always clears mOpenTimer, but only cancels it if it equals aTimer.
Comment 27 Boris Zbarsky [:bz] (Out June 25-July 6) 2006-01-15 13:59:20 PST
I think that code is bogus.  If we're getting Notify() called, the timer has to be mOpenTimer, just based on where we set mOpenTimer...
Comment 28 jpl24 2006-01-15 14:15:22 PST
Yeah I can't see how another timer could call Notify, especially with the mediator in place. Want me to change that to an assert and attach a patch?
Comment 29 Boris Zbarsky [:bz] (Out June 25-July 6) 2006-03-14 08:23:45 PST
> Want me to change that to an assert and attach a patch?

Either way.  Sorry I missed this question before...

But it looks like at least nsMenuPopupFrame is not safe from crashes.  See bug 246712, bug 268399, bug 302584.  :(
Comment 30 Adam Guthrie 2006-05-30 17:33:38 PDT
*** Bug 339535 has been marked as a duplicate of this bug. ***
Comment 31 Marc Bejarano 2006-07-24 19:57:25 PDT
*** Bug 322023 has been marked as a duplicate of this bug. ***
Comment 32 Marc Bejarano 2006-07-24 22:37:35 PDT
*** Bug 342658 has been marked as a duplicate of this bug. ***
Comment 33 Tony Mechelynck [:tonymec] 2006-07-26 05:54:24 PDT
*** Bug 345969 has been marked as a duplicate of this bug. ***
Comment 34 Tony Mechelynck [:tonymec] 2006-07-26 05:57:23 PDT
(In reply to comment #33)
> *** Bug 345969 has been marked as a duplicate of this bug. ***
> 

Bug 345969 = crash on Fx 1.5.0.5 nightly, happened a few minutes ago.
Tentative dupe. Feel free to un-dupe if stack isn't close enough.
Comment 35 Tony Mechelynck [:tonymec] 2006-07-28 10:50:35 PDT
I've been getting crashes now and then on Linux Fx 1.5.0.5 which have been duped to this bug. Will the fix be ported "in due time" or should I open a separate bug for Linux and/or 1.8.0.x Branch ?
Comment 36 Adam Guthrie 2006-07-28 23:57:05 PDT
Tony, no, you don't need to file any additional bugs. The patch that landed on trunk is waiting to get approved for the 1.8 and 1.8.0 branches. 
Comment 37 Mike Schroepfer 2006-07-29 18:38:45 PDT
bz is this safe for 1.8 branch?
Comment 38 Boris Zbarsky [:bz] (Out June 25-July 6) 2006-07-30 13:25:27 PDT
In my opinion, yes (otherwise I wouldn't have asked for approval ;)).  It's a technically sound change that should only have an effect in the corner case when the frame is destroyed by the time the timer fires, at which point we used to crash.  And there have been no regressions reported in the 6+ months since this was checked in.
Comment 39 Boris Zbarsky [:bz] (Out June 25-July 6) 2006-07-31 10:29:31 PDT
Created attachment 231436 [details] [diff] [review]
Patch merged to 1.8 branch
Comment 40 Boris Zbarsky [:bz] (Out June 25-July 6) 2006-07-31 10:30:32 PDT
Fixed on 1.8 branch.
Comment 41 Steve England [:stevee] 2006-08-05 10:59:41 PDT
*** Bug 347543 has been marked as a duplicate of this bug. ***
Comment 42 Daniel Veditz [:dveditz] 2006-08-15 15:40:08 PDT
Comment on attachment 208247 [details] [diff] [review]
patch_v2

approved for 1.8.0 branch, a=dveditz for drivers
Comment 43 Boris Zbarsky [:bz] (Out June 25-July 6) 2006-08-15 17:33:17 PDT
Checked in the fix for 1.8.0.7

Note You need to log in before you can comment on or make changes to this bug.