Closed Bug 241733 Opened 20 years ago Closed 18 years ago

crash if I attempt to move an email while another email is being sent [@ nsMenuFrame::Notify ]

Categories

(Core :: Layout, defect)

x86
Windows 2000
defect
Not set
critical

Tracking

()

RESOLVED FIXED

People

(Reporter: l1a04uv02, Assigned: jlurz24)

References

(Blocks 1 open bug)

Details

(Keywords: crash, fixed1.8.0.7, fixed1.8.1, Whiteboard: [need testcase])

Crash Data

Attachments

(3 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6) Gecko/20040113
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6) Gecko/20040113

When I move a message from my inbox to a local subfolder on my system at the
same time as I am sending another message to an smtp relay, the browser may crash.

Reproducible: Sometimes
Steps to Reproduce:
This problem happens fairly regularly, however have trouble reproducing 'at
will' - that said:

1.  Prepare email, adding attachments makes the issue more likely to happen.
2.  Select 'send this mail now' icon.
3.  While the email is being sent ('sending message' box is visible)
4.  Go to inbox (IMAP for me) select an email, right click, seclect 'move
mail..' and move through the popup menu to get to a local subfolder for storage.
5.  If you are still moving the email at the same time as the 'sending message'
box is visible, then there will be a short (1 sec?) hand, and a crash will
happen, with full browser and mail client dying.

The issue seems to be related to the phase where the sending message box is, and
the drawing of the subfolder menu on the screen.

Adding an attachment creates a larger email to send, so the 'sending message'
box is on the screen longer, and movre likely to cause the crash.

Other info that may be relevent:
Local PC to mailbox on IMAP server.  Email is sent through an SMTP relay when
leaving my PC.  I keep subfolders on my local disk, as well as 'sent' folder. 
Is it possible that its the storing to the local 'sent folder' at the same time
as I do the 'move' of the email from the inbox... just a thought.

I have found it happens to me less often than it did, but thats because now I
have placed where the crash comes from I workaround the issue by habit.  I would
have seen this issue since first loading mozilla at work, around v1.3


Actual Results:  
Browser crash.  Browser restarts fine afterwards.

Expected Results:  
sent the email, and moved the other one to the desired location without crashing.
I have just reproduced the bug, the netscape feedback agent ID is : TB32396233H
Reporter: Can you try this with 1.7rc1? The Talkback version included in Mozilla
1.6 sends the data to Netscape (AOL) and not to mozilla.org
Keywords: crash, stackwanted
No, i really need a TB id from Mozilla 1.7b or Mozilla1.7rc1, the stack itself
from TB is not useful.
ok - I have run 1.7rc-1 and caused a crash : TB31486M

Comments:
It was harder to cause the crash.  I had it down to a 1 in 3 tries (say) on
Moz1.6.  I tried about 30 times to get the crash on 1.7rc-1, to the point where
I thought it was fixed, until I eventually got lucky.

I'm guessing... but the speed thing seemed to affect it.  The client appears
more responsive (snappier?) to me, and if it is a timing thing this may have
affected my chances at relicating this issue.

... amongst all this, I did crash it twice in another way which is reproducible,
but that is a different thing, I'll do a search before logging the other one.

Sam
from talkback:

nsMenuFrame::Notify
[d:/BUILDS/tinderbox/Mozilla1.7/WINNT_5.0_Clobber/mozilla/layout/xul/base/src/nsMenuFrame.cpp,
line 1297]
nsTimerImpl::Fire
[d:/BUILDS/tinderbox/Mozilla1.7/WINNT_5.0_Clobber/mozilla/xpcom/threads/nsTimerImpl.cpp,
line 395]
handleTimerEvent
[d:/BUILDS/tinderbox/Mozilla1.7/WINNT_5.0_Clobber/mozilla/xpcom/threads/nsTimerImpl.cpp,
line 449]
PL_HandleEvent
[d:/BUILDS/tinderbox/Mozilla1.7/WINNT_5.0_Clobber/mozilla/xpcom/threads/plevent.c,
line 674]
PL_ProcessPendingEvents
[d:/BUILDS/tinderbox/Mozilla1.7/WINNT_5.0_Clobber/mozilla/xpcom/threads/plevent.c,
line 612]
nsEventQueueImpl::ProcessPendingEvents
[d:/BUILDS/tinderbox/Mozilla1.7/WINNT_5.0_Clobber/mozilla/xpcom/threads/nsEventQueue.cpp,
line 395]

I'm not sure what's going on.

see http://lxr.mozilla.org/mozilla/source/layout/xul/base/src/nsMenuFrame.cpp#1290

perhaps we need to (at least) safety check mContent or mMenuParent?
Assignee: sspitzer → mscott
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: crash if I attempt to move an email while another email is being sent → crash if I attempt to move an email while another email is being sent [@ nsMenuFrame::Notify ]
Product: MailNews → Core
*** Bug 287565 has been marked as a duplicate of this bug. ***
A frame as an XPCOM timer callback? Sounds fishy to me...
Er....  It's _very_ fishy.  There should be a separate callback object here (one that's actually refcounted), and the frame should notify it when it gets destroyed.  As the code stands we're probably making a callback on an already-deleted object.

And if so, we need to fix this on 1.8 branch too.  :(

If no one else does this I guess I can try to, but I won't have time until a few weeks from now with other stuff that's piled up.
Flags: blocking1.9a1?
Flags: blocking1.8.1?
Flags: blocking1.8.0.1?
Keywords: helpwanted
Without a reviewed/tested patch this one looks like it misses the .0.1 train.
Flags: blocking1.8.0.1? → blocking1.8.0.1-
Well, I guess we need to cancel the timer on destruction (in which case the last line of Notify looks suspicious) plus probably switch to a function callback?
That could work.  Or we could switch to using a separate refcounted object here like nsImageFrame does for the imagelib callbacks.
Attached patch patch_v1 (obsolete) — Splinter Review
Hopefully nobody else was busy fixing this, I didn't have the bug privileges to assign it to myself. This implements I believe what bz was suggesting.

This compiles and runs, my build currently has a bunch of assertions that that I'm hoping aren't related. This should fix the crash, but since this crash isn't easily reproducable I couldn't test it.
Attachment #208174 - Flags: review?(bzbarsky)
From a quick search through lxr I think nsMenuPopupFrame and nsAutoRepeatBoxFrame may have the same bug. nsScrollBarButtonFrame addrefs when it QIs to nsITimerCallback, not sure if that's right. nsSliderFrame does this correctly like nsImageFrame, but names the class nsSliderMediator which I like more than the name I came up with.
Comment on attachment 208174 [details] [diff] [review]
patch_v1

>+  NS_REINTERPRET_CAST(nsMenuFrameListener*, mListener.get())->ClearFrame();
By making mListener an nsRefPtr<nsMenuFrameListener> you avoid this (incorrect - should have been static) cast.

>+  friend class nsMenuFrameListener;
ClearFrame is public, no?
(In reply to comment #14)
>From a quick search through lxr I think nsMenuPopupFrame and
>nsAutoRepeatBoxFrame may have the same bug.
At least nsAutoRepeatBoxFrame stops the repeat service in Destroy.
(In reply to comment #15)
> (From update of attachment 208174 [details] [diff] [review] [edit])
> >+  NS_REINTERPRET_CAST(nsMenuFrameListener*, mListener.get())->ClearFrame();
> By making mListener an nsRefPtr<nsMenuFrameListener> you avoid this (incorrect
> - should have been static) cast.
Good idea, I thought that looked odd(it's copied from nsImageFrame)

> >+  friend class nsMenuFrameListener;
> ClearFrame is public, no?
The macros obscure this a bit, I did this so Notify on the nsMenuFrame could be private to make it clear that the method should only be called from nsMenuFrameListener.

> At least nsAutoRepeatBoxFrame stops the repeat service in Destroy.
Ah so it stops the timer. It looks like there were a few methods used to stop this type of crash.
I have a patch that fixes Neil's comments, but I'm going to wait for additional review comments before I upload a new patch.

I searched talkback and found 448 crashes in nsMenuFrame::Notify, looks like there are numerous ways to trigger this.
http://talkback-public.mozilla.org/talkback/fastfind.jsp?search=1&searchby=stacksig&match=contains&searchfor=nsMenuFrame%3A%3ANotify&vendor=MozillaOrg&product=All&platform=All&buildid=&sdate=&stime=&edate=&etime=&sortby=bbid
Comment on attachment 208174 [details] [diff] [review]
patch_v1

>Index: layout/xul/base/src/nsMenuFrame.cpp
>+  // Setup a listener which can be used for callbacks on this frame.

"Set up" (two words).

>+  mListener = new nsMenuFrameListener(this);

Maybe call it nsMenuTimerMediator or something?

>@@ -226,16 +230,20 @@ nsMenuFrame::Init(nsPresContext*  

>+  // Null out the pointer to this frame in the listener wrapper so that it 
>+  // doesn't try to interact with a deallocated frame.
>+  NS_REINTERPRET_CAST(nsMenuFrameListener*, mListener.get())->ClearFrame();

I'd do this in Destroy.  And if you make mListener a nsRefPtr<nsMenuFrameListener> there should be no need for a cast here.

To be safe, do we want to cancel the timer here too?

>+nsMenuFrameListener::nsMenuFrameListener(nsMenuFrame *aFrame) :
>+  mFrame(aFrame)
>+{

  NS_ASSERTION(mFrame, "Must have frame");

please.

>+void nsMenuFrameListener::ClearFrame()
>+{
>+  mFrame = 0;

nsnull, not 0.

>Index: layout/xul/base/src/nsMenuFrame.h

>+class nsMenuFrameListener : public nsITimerCallback
>+  virtual ~nsMenuFrameListener();

Can you make that destructor private and non-virtual?  If so, great.  If not, I guess this is ok.
>@@ -218,26 +240,32 @@ protected:
>+  NS_IMETHOD Notify(nsITimer* aTimer);

This should just be NS_HIDDEN_(nsresult), not an NS_IMETHOD.

With those nits picked, r=bzbarsky.  Thanks a ton for doing this!  And if we have other places doing this sort of thing we should fix them too...
Attachment #208174 - Flags: review?(bzbarsky) → review+
Attached patch patch_v2Splinter Review
I think this addresses all the review comments. I tried making the destructor private, but that caused a warning about the destructor being inaccessable. I think this warning is correct, not that it would matter because the destructor is empty. I did make it non-virtual though. Carrying over r+.

Does this need sr?(the bugzilla component is definitely wrong)

I believe all the other frames which implement nsITimerCallback are safe from crashes, but they do block bug 322939 (AddRef on frames should assert).
Attachment #208174 - Attachment is obsolete: true
Attachment #208247 - Flags: review+
Yes this patch needs sr, too.
Assignee: mscott → jlurz24
Component: Networking: SMTP → Layout
Keywords: stackwanted
QA Contact: layout
Attachment #208247 - Flags: superreview?(roc)
Comment on attachment 208247 [details] [diff] [review]
patch_v2

sr=me too.  ;)
Attachment #208247 - Flags: superreview?(roc) → superreview+
Checked in.  File new bugs on the remaining classes, if any?  I'd really like to fix bug 322939.  ;)
Blocks: 322939
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
(In reply to comment #19)
>To be safe, do we want to cancel the timer here too?
I'm still worried that Notify might clear the wrong timer.
What do you mean?
Well, Notify always clears mOpenTimer, but only cancels it if it equals aTimer.
I think that code is bogus.  If we're getting Notify() called, the timer has to be mOpenTimer, just based on where we set mOpenTimer...
Yeah I can't see how another timer could call Notify, especially with the mediator in place. Want me to change that to an assert and attach a patch?
Blocks: 322162
> Want me to change that to an assert and attach a patch?

Either way.  Sorry I missed this question before...

But it looks like at least nsMenuPopupFrame is not safe from crashes.  See bug 246712, bug 268399, bug 302584.  :(
*** Bug 339535 has been marked as a duplicate of this bug. ***
*** Bug 322023 has been marked as a duplicate of this bug. ***
*** Bug 342658 has been marked as a duplicate of this bug. ***
*** Bug 345969 has been marked as a duplicate of this bug. ***
(In reply to comment #33)
> *** Bug 345969 has been marked as a duplicate of this bug. ***
> 

Bug 345969 = crash on Fx 1.5.0.5 nightly, happened a few minutes ago.
Tentative dupe. Feel free to un-dupe if stack isn't close enough.
I've been getting crashes now and then on Linux Fx 1.5.0.5 which have been duped to this bug. Will the fix be ported "in due time" or should I open a separate bug for Linux and/or 1.8.0.x Branch ?
Attachment #208247 - Flags: approval1.8.1?
Attachment #208247 - Flags: approval1.8.0.6?
Tony, no, you don't need to file any additional bugs. The patch that landed on trunk is waiting to get approved for the 1.8 and 1.8.0 branches. 
bz is this safe for 1.8 branch?
In my opinion, yes (otherwise I wouldn't have asked for approval ;)).  It's a technically sound change that should only have an effect in the corner case when the frame is destroyed by the time the timer fires, at which point we used to crash.  And there have been no regressions reported in the 6+ months since this was checked in.
Attachment #208247 - Flags: approval1.8.1? → approval1.8.1+
Fixed on 1.8 branch.
Flags: blocking1.9a1?
Keywords: helpwantedfixed1.8.1
*** Bug 347543 has been marked as a duplicate of this bug. ***
Comment on attachment 208247 [details] [diff] [review]
patch_v2

approved for 1.8.0 branch, a=dveditz for drivers
Attachment #208247 - Flags: approval1.8.0.7? → approval1.8.0.7+
Checked in the fix for 1.8.0.7
Keywords: fixed1.8.0.7
Whiteboard: [need testcase]
Crash Signature: [@ nsMenuFrame::Notify ]
You need to log in before you can comment on or make changes to this bug.