Mail hangs on biff attempt after network connection lost

VERIFIED FIXED

Status

MailNews Core
Networking: IMAP
P3
critical
VERIFIED FIXED
18 years ago
9 years ago

People

(Reporter: Peter Trudelle, Assigned: Bienvenu)

Tracking

Trunk
x86
Windows 98

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: rtm- relnote-user)

Attachments

(2 attachments)

(Reporter)

Description

18 years ago
Using today's N6 branch opt verification build on Win98
Log in to an IMAP account (haven't checked POP).
Set the account to check mail every minute.
Break the current network connection, such as by disconnecting from SERA.
A minute or so later, the app will put up a blank alert and become unresponsive.
Have to give it the three-finger salute (Ctrl+Alt+Del).
Subject: Re: Mail hangs when checking mail after SERA is disconnected
Date: Fri, 03 Nov 2000 12:35:33 -0800
From: bienvenu@netscape.com (David Bienvenu)
To: David Bienvenu <bienvenu@netscape.com>
CC: Jean-Francois Ducarroz <ducarroz@netscape.com>, Jeff Tsai <jefft@netscape.com>




You could try the following change to see if it fixes the problem:

In nsImapProtocol::OnStopRequest, move the PR_CEnterMonitor and 
PR_CExitMonitor to after the Alert calls, so the routine would look 
something like this:


   PRBool killThread = PR_FALSE;

   if (NS_FAILED(aStatus)) {
       switch (aStatus) {
           case NS_ERROR_UNKNOWN_HOST:
               AlertUserEventUsingId(IMAP_UNKNOWN_HOST_ERROR);
               killThread = PR_TRUE;
               break;
           case NS_ERROR_CONNECTION_REFUSED:
               AlertUserEventUsingId(IMAP_CONNECTION_REFUSED_ERROR);
               killThread = PR_TRUE;
               break;
           case NS_ERROR_NET_TIMEOUT:
               AlertUserEventUsingId(IMAP_NET_TIMEOUT_ERROR);
               killThread = PR_TRUE;
               break;
           default:
               break;
       }
   }

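   // Enter the monitor only after any alert has been dismissed, so the UI
   // thread is not left waiting on this lock while the alert is up.
   PR_CEnterMonitor(this);

   if (killThread == PR_TRUE) {
       ClearFlag(IMAP_CONNECTION_IS_OPEN);
       TellThreadToDie(PR_FALSE);
   }

   m_channel = null_nsCOMPtr();
   m_outputStream = null_nsCOMPtr();
   m_inputStream = null_nsCOMPtr();
   PR_CExitMonitor(this);
   return NS_OK;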
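Why the ordering matters can be modeled in miniature with standard C++ threads. This is a hypothetical model, not Mozilla code: std::timed_mutex stands in for the monitor taken with PR_CEnterMonitor(this), a std::promise/std::future pair stands in for the blocking proxied alert (nsProxyObject::PostAndWait in the traces quoted below), and the "UI thread" uses try_lock_for with a timeout so the model reports the deadlock instead of actually hanging.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical model, not Mozilla code:
//  - protocolMonitor stands in for PR_CEnterMonitor(this) on nsImapProtocol;
//  - the promise/future pairs stand in for the blocking proxied alert call
//    that the IMAP thread makes to the UI thread.
// Returns true if the "UI thread" could not enter the monitor while the alert
// was pending, i.e. the real code would have deadlocked.
bool SimulateBiffDuringAlert(bool holdMonitorDuringAlert) {
  std::timed_mutex protocolMonitor;
  std::promise<void> alertPosted;
  std::promise<void> alertDismissed;
  std::future<void> postedFuture = alertPosted.get_future();
  std::future<void> dismissedFuture = alertDismissed.get_future();

  // "IMAP thread": OnStopRequest handling a failed status.
  std::thread imapThread([&] {
    std::unique_lock<std::timed_mutex> lock(protocolMonitor, std::defer_lock);
    if (holdMonitorDuringAlert) lock.lock();  // original order: lock, then alert
    alertPosted.set_value();                  // AlertUserEventUsingId(...)
    dismissedFuture.wait();                   // blocks until the alert is dismissed
    if (!lock.owns_lock()) lock.lock();       // patched order: lock after the alert
    // ... ClearFlag / TellThreadToDie / drop channel and streams ...
  });

  // "UI thread": the biff timer fires before the alert is dismissed (in the
  // real trace, from inside the alert's modal loop), and CanHandleUrl needs
  // the same monitor.
  postedFuture.wait();
  bool uiBlocked = false;
  {
    std::unique_lock<std::timed_mutex> lock(protocolMonitor, std::defer_lock);
    // A real PR_CEnterMonitor waits forever; the timeout lets the model finish.
    uiBlocked = !lock.try_lock_for(std::chrono::milliseconds(200));
  }
  alertDismissed.set_value();  // the user finally dismisses the alert
  imapThread.join();
  return uiBlocked;
}
```

With the original ordering, SimulateBiffDuringAlert(true) returns true because the IMAP thread holds the monitor while it waits on the UI thread. With the patched ordering, SimulateBiffDuringAlert(false) returns false because the monitor is free while the alert is up.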


David Bienvenu wrote:

>  From this, I'd say the UI thread and the IMAP thread are contending 
> over the auto lock on the nsImapProtocol object. In theory, we 
> shouldn't have to wait for the user to dismiss the alert (i.e., that 
> call could be non-blocking) but the XPCOM proxy stuff is all or 
> nothing on an interface basis, i.e., if any methods have to block, 
> they all have to block. We might have to use a non-blocking interface.
> 
> We could make it so that we don't hold a lock on the nsImapProtocol 
> object when putting up alerts, but I'm not sure what horrible race 
> condition bugs that would introduce.
> 
> - David
> 
> Jean-Francois Ducarroz wrote:
> 
>> Some other blocked threads:
>> 
>> NTDLL! 77f6829b()
>> NTDLL! 77f67546()
>> PR_EnterMonitor(PRMonitor * 0x00a21070) line 79 + 14 bytes
>> PR_CEnterMonitor(void * 0x12901028) line 308 + 9 bytes
>> nsImapProtocol::CanHandleUrl(nsImapProtocol * const 0x12901028, 
>> nsIImapUrl * 0x132c0830, int * 0x0012f0bc, int * 0x0012f0a0) line 
>> 1415 + 10 bytes
>> nsImapIncomingServer::CreateImapConnection(nsIEventQueue * 
>> 0x13083e30, nsIImapUrl * 0x132c0830, nsIImapProtocol * * 0x0012f124) 
>> line 644 + 44 bytes
>> nsImapIncomingServer::GetImapConnectionAndLoadUrl(nsImapIncomingServer 
>> * const 0x1247bdd8, nsIEventQueue * 0x13083e30, nsIImapUrl * 
>> 0x132c0830, nsISupports * 0x00000000) line 411 + 43 bytes
>> nsImapService::GetImapConnectionAndLoadUrl(nsIEventQueue * 
>> 0x13083e30, nsIImapUrl * 0x132c0830, nsISupports * 0x00000000, nsIURI 
>> * * 0x00000000) line 1920 + 38 bytes
>> nsImapService::SelectFolder(nsImapService * const 0x126f7950, 
>> nsIEventQueue * 0x13083e30, nsIMsgFolder * 0x1102a51c, nsIUrlListener 
>> * 0x11032ee4, nsIMsgWindow * 0x00000000, nsIURI * * 0x00000000) line 
>> 198 + 27 bytes
>> nsImapMailFolder::GetNewMessages(nsImapMailFolder * const 0x11032e6c, 
>> nsIMsgWindow * 0x00000000) line 1783 + 79 bytes
>> nsImapIncomingServer::PerformBiff(nsImapIncomingServer * const 
>> 0x1247bd80) line 869 + 28 bytes
>> nsMsgBiffManager::PerformBiff() line 321
>> OnBiffTimer(nsITimer * 0x132c54b0, void * 0x124793b0) line 40
>> nsTimer::Fire() line 194 + 17 bytes
>> nsTimerManager::FireNextReadyTimer(nsTimerManager * const 0x10f8d810, 
>> unsigned int 0) line 117
>> FireTimeout(HWND__ * 0x00000000, unsigned int 275, unsigned int 
>> 12565, unsigned long 9701078) line 89
>> USER32! 77e7185c()
>> nsXULWindow::ShowModal(nsXULWindow * const 0x13083a80) line 237
>> nsWebShellWindow::ShowModal(nsWebShellWindow * const 0x13083a80) line 
>> 1142
>> nsChromeTreeOwner::ShowModal(nsChromeTreeOwner * const 0x12ff7830) 
>> line 182
>> GlobalWindowImpl::OpenInternal(GlobalWindowImpl * const 0x00b1cc20, 
>> JSContext * 0x00b1b580, long * 0x128a6e38, unsigned int 4, int 1, 
>> nsIDOMWindowInternal * * 0x0012fa50) line 3122
>> GlobalWindowImpl::OpenDialog(GlobalWindowImpl * const 0x00b1cc24, 
>> JSContext * 0x00b1b580, long * 0x128a6e38, unsigned int 4, 
>> nsIDOMWindowInternal * * 0x0012fa50) line 2055
>> nsCommonDialogs::DoDialog(nsCommonDialogs * const 0x11c72240, 
>> nsIDOMWindowInternal * 0x00b1cc24, nsIDialogParamBlock * 0x130867e0, 
>> const char * 0x015d8738) line 453 + 49 bytes
>> nsCommonDialogs::Alert(nsCommonDialogs * const 0x11c72240, 
>> nsIDOMWindowInternal * 0x00b1cc24, const unsigned short * 0x13086840, 
>> const unsigned short * 0x13082100) line 70 + 27 bytes
>> nsDOMWindowPrompter::Alert(nsDOMWindowPrompter * const 0x130868c0, 
>> const unsigned short * 0x00000000, const unsigned short * 0x13082100) 
>> line 1877 + 55 bytes
>> nsSingleSignOnPrompt::Alert(nsSingleSignOnPrompt * const 0x13086880, 
>> const unsigned short * 0x00000000, const unsigned short * 0x13082100) 
>> line 431
>> nsNetSupportDialog::Alert(nsNetSupportDialog * const 0x13086bb0, 
>> const unsigned short * 0x00000000, const unsigned short * 0x13082100) 
>> line 73 + 31 bytes
>> nsImapIncomingServer::FEAlert(nsImapIncomingServer * const 
>> 0x1247bddc, const unsigned short * 0x13082100, nsIMsgWindow * 
>> 0x00000000) line 1568 + 29 bytes
>> XPTC_InvokeByIndex(nsISupports * 0x1247bddc, unsigned int 14, 
>> unsigned int 2, nsXPTCVariant * 0x130824f0) line 139
>> EventHandler(PLEvent * 0x13086120) line 513 + 41 bytes
>> PL_HandleEvent(PLEvent * 0x13086120) line 580 + 10 bytes
>> PL_ProcessPendingEvents(PLEventQueue * 0x00ac51d0) line 513 + 9 bytes
>> _md_EventReceiverProc(HWND__ * 0x004d024e, unsigned int 49322, 
>> unsigned int 0, long 11293136) line 1049 + 9 bytes
>> USER32! 77e71820()
>> 00ac51d0()
>> 
>> ------------------
>> 
>> I got several threads like this one:
>> 
>> NTDLL! 77f6829b()
>> KERNEL32! 77f04f41()
>> _PR_WaitCondVar(PRThread * 0x1104bda0, PRCondVar * 0x01f4e600, PRLock 
>> * 0x01f48710, unsigned int 4294967295) line 185 + 23 bytes
>> PR_Wait(PRMonitor * 0x01f4ef10, unsigned int 4294967295) line 155 + 
>> 29 bytes
>> nsAutoMonitor::Wait(unsigned int 4294967295) line 197 + 17 bytes
>> nsThreadPool::GetRequest(nsIThread * 0x1104bf10) line 458 + 10 bytes
>> nsThreadPoolRunnable::Run(nsThreadPoolRunnable * const 0x1104bf60) 
>> line 685 + 27 bytes
>> nsThread::Main(void * 0x1104bf10) line 84 + 26 bytes
>> _PR_NativeRunThread(void * 0x1104bda0) line 399 + 13 bytes
>> _threadstartex(void * 0x1104bbf0) line 212 + 13 bytes
>> KERNEL32! 77f04ee8()
>> 
>> -----------
>> 
>> NTDLL! 77f6829b()
>> KERNEL32! 77f04f41()
>> _PR_WaitCondVar(PRThread * 0x00a24970, PRCondVar * 0x00a24b30, PRLock 
>> * 0x00a24be0, unsigned int 5335080) line 185 + 23 bytes
>> PR_WaitCondVar(PRCondVar * 0x00a24b30, unsigned int 5335080) line 532 
>> + 23 bytes
>> MemoryFlusher::Run(MemoryFlusher * const 0x00a24c90) line 153 + 20 bytes
>> nsThread::Main(void * 0x00a24ae0) line 84 + 26 bytes
>> _PR_NativeRunThread(void * 0x00a24970) line 399 + 13 bytes
>> _threadstartex(void * 0x00a247c0) line 212 + 13 bytes
>> KERNEL32! 77f04ee8()
>> 
>> -----------
>> 
>> NTDLL! 77f6829b()
>> MSAFD! 77664a12()
>> WS2_32! 776b9f5e()
>> _PR_MD_PR_POLL(PRPollDesc * 0x01f4de60, int 1, unsigned int 3112130) 
>> line 224 + 35 bytes
>> PR_Poll(PRPollDesc * 0x01f4de60, int 1, unsigned int 3112130) line 
>> 115 + 17 bytes
>> nsSocketTransportService::Run(nsSocketTransportService * const 
>> 0x01f13734) line 385 + 24 bytes
>> nsThread::Main(void * 0x01f4e440) line 84 + 26 bytes
>> _PR_NativeRunThread(void * 0x01f4e9f0) line 399 + 13 bytes
>> _threadstartex(void * 0x01f49c40) line 212 + 13 bytes
>> KERNEL32! 77f04ee8()
>> 
>> 
>> Jean-Francois Ducarroz wrote:
>> 
>>> Right, it looks like we freeze if the IMAP thread is blocked by
>>> the alert and then biff fires. I have a stack trace of the UI thread; I'll
>>> post it ASAP...
>>> 
>>> JFD
>>> 
>>> Jeff Tsai wrote:
>>> 
>>>> A similar way to reproduce the problem: right after step (5), do a
>>>> GetMsg() again without closing the alert dialog box. (This simulates biff
>>>> coming up again in the background.) Is biff causing the problem?
>>>> 
>>>> -- Jeff
>>>> 
>>>> Jean-Francois Ducarroz wrote:
>>>> 
>>>>  > I can see all the threads and a lot of them are waiting too, but I cannot
>>>>  > figure out which one is the UI thread and I cannot see the corresponding
>>>>  > stack!
>>>>  >
>>>>  > Here is how I reproduce the problem; it takes about 5-10 minutes before
>>>>  > the app freezes:
>>>>  >
>>>>  > 1) My screen saver is set to start after 1 minute of inactivity.
>>>>  > 2) Start the mail tree pane. I have an IMAP and a POP account; IMAP is
>>>>  > supposed to check for mail every minute.
>>>>  > 3) Disconnect my ethernet cable.
>>>>  > 4) Press Get Message with the IMAP account selected.
>>>>  > 5) Wait, wait, wait, then the connection error alert appears.
>>>>  > 6) Then wait again until the screen saver starts.
>>>>  > 7) Wake up the PC.
>>>>  > 8) The app should be frozen; if not, wait a few more minutes...
>>>>  >
>>>>  > JFD
>>>>  >
>>>>  > Jeff Tsai wrote:
>>>>  >
>>>>  >> Sounds to me like there is a serious deadlock between the IMAP thread,
>>>>  >> the UI thread and the network thread. JF, if you are using Windows, going
>>>>  >> through the thread list you will find all threads are inside the
>>>>  >> PR_Wait() call.
>>>>  >>
>>>>  >> -- Jeff
>>>>  >>
>>>>  >> Jean-Francois Ducarroz wrote:
>>>>  >>
>>>>  >>> The app freezes because it is blocked on the PR_Wait in the function
>>>>  >>> PL_WaitForEvent. I can reproduce the problem on Windows and Linux but
>>>>  >>> not on Mac; you just have to disconnect your PC from the net (remove the
>>>>  >>> cable). Here is the stack:
>>>>  >>>
>>>>  >>> NTDLL! 77f6829b()
>>>>  >>> KERNEL32! 77f04f41()
>>>>  >>> _PR_WaitCondVar(PRThread * 0x118b5120, PRCondVar * 0x118d8730, PRLock * 0x118d85d0, unsigned int 4294967295) line 185 + 23 bytes
>>>>  >>> PR_Wait(PRMonitor * 0x118d9e70, unsigned int 4294967295) line 155 + 29 bytes
>>>>  >>> PL_WaitForEvent(PLEventQueue * 0x118d84d0) line 676 + 12 bytes
>>>>  >>> nsEventQueueImpl::WaitForEvent(nsEventQueueImpl * const 0x118d9eb0, PLEvent * * 0x1337fbe4) line 431 + 12 bytes
>>>>  >>> nsProxyObject::PostAndWait(nsProxyObjectCallInfo * 0x11cb2550) line 359 + 27 bytes
>>>>  >>> nsProxyObject::Post(unsigned int 13, nsXPTMethodInfo * 0x127a50f8, nsXPTCMiniVariant * 0x1337fca4, nsIInterfaceInfo * 0x00ab5e30) line 460 + 12 bytes
>>>>  >>> nsProxyEventObject::CallMethod(nsProxyEventObject * const 0x118d97d0, unsigned short 13, const nsXPTMethodInfo * 0x127a50f8, nsXPTCMiniVariant * 0x1337fca4) line 429 + 52 bytes
>>>>  >>> PrepareAndDispatch(nsXPTCStubBase * 0x118d97d0, unsigned int 13, unsigned int * 0x1337fd54, unsigned int * 0x1337fd44) line 100 + 31 bytes
>>>>  >>> SharedStub() line 124
>>>>  >>> nsImapProtocol::AlertUserEventUsingId(unsigned int 5053) line 3965
>>>>  >>> nsImapProtocol::OnStopRequest(nsImapProtocol * const 0x01377ef8, nsIChannel * 0x118dda94, nsISupports * 0x00000000, unsigned int 2152398861, const unsigned short * 0x100a56c8 gCommonEmptyBuffer) line 1221
>>>>  >>> nsOnStopRequestEvent::HandleEvent(nsOnStopRequestEvent * const 0x11cb2940) line 302
>>>>  >>> nsStreamListenerEvent::HandlePLEvent(PLEvent * 0x11cb2de0) line 97 + 12 bytes
>>>>  >>> PL_HandleEvent(PLEvent * 0x11cb2de0) line 580 + 10 bytes
>>>>  >>> PL_ProcessPendingEvents(PLEventQueue * 0x118d84d0) line 513 + 9 bytes
>>>>  >>> nsEventQueueImpl::ProcessPendingEvents(nsEventQueueImpl * const 0x118d9eb0) line 356 + 12 bytes
>>>>  >>> nsImapProtocol::CreateNewLineFromSocket() line 3702
>>>>  >>> nsImapProtocol::EstablishServerConnection() line 928 + 8 bytes
>>>>  >>> nsImapProtocol::ProcessCurrentURL() line 1033
>>>>  >>> nsImapProtocol::ImapThreadMainLoop() line 894 + 14 bytes
>>>>  >>> nsImapProtocol::Run(nsImapProtocol * const 0x01377efc) line 694
>>>>  >>> nsThread::Main(void * 0x118b6fd0) line 84 + 26 bytes
>>>>  >>> _PR_NativeRunThread(void * 0x118b5120) line 399 + 13 bytes
>>>>  >>> _threadstartex(void * 0x118b52d0) line 212 + 13 bytes
>>>>  >>> KERNEL32! 77f04ee8()
>>>>  >>>
>>>>  >>> I'll try to see if I can reproduce it with just the browser...
>>>>  >>>
>>>>  >>> Jean-Francois
>>>>  >>>
>>>>  >>> Lisa Chiang wrote:
>>>>  >>>
>>>>  >>>  > Trudelle filed this same bug recently:
>>>>  >>>  > http://bugzilla.mozilla.org/show_bug.cgi?id=58547
>>>>  >>>  >
>>>>  >>>  > This was an issue a while back that got marked fixed due to another
>>>>  >>>  > bug http://bugzilla.mozilla.org/show_bug.cgi?id=47666. Perhaps now it
>>>>  >>>  > is back?
>>>>  >>>  >
>>>>  >>>  > Rod Spears wrote:
>>>>  >>>  >
>>>>  >>>  >> 1) I am reading mail connected to SERA
>>>>  >>>  >> 2) I leave mail running
>>>>  >>>  >> 3) Shutdown SERA
>>>>  >>>  >> 4) Accidentally check mail without SERA being restarted
>>>>  >>>  >> 5) Entire app freezes
>>>>  >>>  >> 6) Must kill app via task manager
>>>>  >>>  >>
>>>>  >>>  >> Is there a bug on this?
>>>>  >>>  >>
>>>>  >>>  >> Rod
>>>>  >>>  >>
>>>>  >>>  >>
>>>>  >>>  >
>>>>  >>>  >
>>>>  >>>
>>>> 
>> 
>> 
> 
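The status argument in the IMAP thread's nsImapProtocol::OnStopRequest frame, unsigned int 2152398861, is the failure that triggers the alert. The sketch below decodes it using the nsresult bit layout from XPCOM's nsError.h (failure bit, module number offset by 0x45, 16-bit code); the constants are reproduced here as an assumption rather than included from the Mozilla tree, and the helper names are hypothetical. The value decodes to network-module code 13, which, assuming the nsNetError.h definitions of the time, is NS_ERROR_CONNECTION_REFUSED, one of the cases the patch handles.

```cpp
#include <cassert>
#include <cstdint>

// nsresult layout as in XPCOM's nsError.h (reproduced here, not included):
// bit 31 = failure, bits 16..30 = module + NS_ERROR_MODULE_BASE_OFFSET,
// bits 0..15 = error code within the module.
constexpr std::uint32_t kModuleBaseOffset = 0x45;  // NS_ERROR_MODULE_BASE_OFFSET
constexpr std::uint32_t kModuleNetwork = 6;        // NS_ERROR_MODULE_NETWORK

// Same arithmetic as NS_ERROR_GENERATE_FAILURE(module, code).
constexpr std::uint32_t GenerateFailure(std::uint32_t module, std::uint32_t code) {
  return (1u << 31) | ((module + kModuleBaseOffset) << 16) | code;
}

constexpr bool IsFailure(std::uint32_t rv) { return (rv >> 31) != 0; }

// Same arithmetic as NS_ERROR_GET_MODULE and NS_ERROR_GET_CODE.
constexpr std::uint32_t ModuleOf(std::uint32_t rv) {
  return ((rv >> 16) - kModuleBaseOffset) & 0x1fff;
}
constexpr std::uint32_t CodeOf(std::uint32_t rv) { return rv & 0xffff; }

// Status seen in the OnStopRequest frame (0x804B000D in hex).
constexpr std::uint32_t kObservedStatus = 2152398861u;
```

Decoding by hand gives the same result: 0x804B000D has the failure bit set, 0x4B - 0x45 = 6 (network), and a code of 0xD = 13.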

This seems to be a very serious problem; nominating for RTM.
Keywords: rtm
I haven't tried your patch but if I disable biff, I don't freeze anymore.

Comment 4

18 years ago
Same as bug 55073.  And yes, it's very serious -- it's one of the reasons I'm
not using mail for dogfood, since our mail server goes down so often and every
time that happens, the app hangs.
David, if I follow your proposal correctly, we should also call PR_CExitMonitor(this) only if the status is a failure!
(Assignee)

Comment 6

18 years ago
No, the code as I attached it always calls
PR_CEnterMonitor and PR_CExitMonitor - it just does so after putting up the alert, if
we do put up the alert.
Oh sorry, you are right; I didn't see that you moved the kill-thread code too. Let me test it again...

Comment 8

18 years ago
*** Bug 55073 has been marked as a duplicate of this bug. ***
The patch works well for me: no more freezes, and IMAP seems to work correctly.

If we cannot check this fix into RTM, we must at least warn users of the
problem in the release notes and give the workaround, which is either to close the
app when they close the connection or to never activate biff.
Keywords: relnoteRTM

Comment 10

18 years ago
I can reproduce this problem on the 11-03-09-MN6 build.
Changing QA contact to me so I can verify this bug after the developer fixes this
problem.

Keywords: relnoteRTM
QA Contact: esther → huang

Comment 11

18 years ago
But we only enter the monitor if NS_FAILED(aStatus) { ....
PR_CEnterMonitor(this); ...}. Does this work with the normal situation?

Comment 12

18 years ago
*** Bug 57823 has been marked as a duplicate of this bug. ***

Comment 13

18 years ago
Created attachment 18666 [details] [diff] [review]
Slightly modified David's patch

Comment 14

18 years ago
Wait. I was wrong. David did have a correct patch. Sorry.

Comment 15

18 years ago
It seems that if I don't turn on biff, I don't get the hang. I get an
alert, "Connection refused to the server", and after selecting OK I can close
the application.

I should log another bug saying that an alert should be displayed to warn users
about the disconnection from the network (but that will be a future bug; I will
log it later).

But it seems that this hang occurs after turning on biff... Ccing fenella.

The only side effect I have seen so far is that I get an alert every time biff
runs. Therefore, if I leave my computer idle for a long period, when I wake
it up I get plenty of alerts stacked on top of each other. Maybe we can
block re-entrancy to avoid showing a new alert until the user dismisses the
previous one!
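A minimal sketch of the re-entrancy block suggested above, assuming a per-server flag that is set when an alert goes up and cleared when the user dismisses it. The class and function names are hypothetical, not part of the Mozilla tree. Failures that arrive while the flag is set are dropped instead of stacking another modal dialog.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: at most one "can't reach the server" alert on screen
// at a time, so repeated biff failures while the user is away don't pile up.
class AlertReentrancyGuard {
 public:
  // True if the caller should put up the alert; false if one is already showing.
  bool TryBeginAlert() {
    bool expected = false;
    return showing_.compare_exchange_strong(expected, true);
  }

  // Call once the user has dismissed the alert.
  void EndAlert() { showing_.store(false); }

 private:
  std::atomic<bool> showing_{false};
};

// Hypothetical driver: biff fails repeatedly while the first alert is still
// up (the user is away), so only the first failure shows an alert.
int AlertsShownWhileAway(AlertReentrancyGuard& guard, int biffFailures) {
  int shown = 0;
  for (int i = 0; i < biffFailures; ++i) {
    if (guard.TryBeginAlert()) ++shown;
  }
  return shown;
}
```

compare_exchange_strong makes the check-and-set a single atomic step, so two threads racing to report a failure cannot both decide to show an alert.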
(Assignee)

Comment 17

18 years ago
I wouldn't call that a side-effect of the fix - it's what I would have expected
to happen if there was no bug.

However, biff shouldn't be causing any error messages to get put up in the first
place. So, a better fix would be to avoid putting up those alerts if it's a biff
that's running. We'd need to have a way to determine that, however, since we no
longer have a special biff context.
(Assignee)

Comment 18

18 years ago
reassign to me - Scott, can I get an r= so I can check this into the trunk so at
least internal MNTrunk builds will have this fix.
Assignee: mscott → bienvenu
(Assignee)

Comment 19

18 years ago
Created attachment 18722 [details] [diff] [review]
real patch for the trunk

Comment 20

18 years ago
rtm-, biff is off by default.  We should definitely release note that biff can
cause this problem.
Keywords: relnoteRTM
Whiteboard: rtm-
Whiteboard: rtm- → rtm- relnote-user

Comment 21

18 years ago
Lots of stacked modal dialogs is almost as bad as the original hang.  It does
mean that if I have some really important unsaved data, I can get back to it if
I care enough to spend the time to dismiss all those dialogs; but most of the
time, I'll just kill the app because it's not worth the time.

In the absence of a biff context, could we set a bit for "server is down and
there's a dialog showing", and don't biff if that bit is set?
(Assignee)

Comment 22

18 years ago
No, I think we just need to set up biff URLs so that IMAP won't put up
error alerts at all. This is the way 4.x worked.
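The approach in the comment above can be sketched as a flag on the URL that the error path consults: a fatal network error still tears down the connection, as in the patch's switch statement, but only non-biff URLs put up an alert. All types here, including the suppressErrorAlerts flag and NetStatus enum, are hypothetical stand-ins rather than the real nsIImapUrl or nsresult.

```cpp
#include <cassert>

// Hypothetical stand-ins; not the real nsIImapUrl / nsresult types.
enum class NetStatus { kOk, kUnknownHost, kConnectionRefused, kNetTimeout, kOther };

struct ImapUrlSketch {
  bool suppressErrorAlerts = false;  // would be set when biff creates the URL
};

struct StopRequestOutcome {
  bool showAlert = false;
  bool killThread = false;
};

// Mirrors the switch in the patch: the same fatal network errors always kill
// the connection thread, but the alert is skipped for biff-originated URLs.
StopRequestOutcome DecideOnStop(const ImapUrlSketch& url, NetStatus status) {
  StopRequestOutcome out;
  switch (status) {
    case NetStatus::kUnknownHost:
    case NetStatus::kConnectionRefused:
    case NetStatus::kNetTimeout:
      out.killThread = true;
      out.showAlert = !url.suppressErrorAlerts;
      break;
    default:
      break;
  }
  return out;
}
```

Keeping the teardown decision separate from the alert decision means a silent biff failure still leaves the connection closed, so the next user-initiated Get New Mail reports the error normally.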
(Assignee)

Comment 23

18 years ago
Scott, can I get an sr=? This fix is orthogonal to the question of whether biff
should put up errors or not, since you can get into this situation by clicking
Get New Mail as well.
Status: NEW → ASSIGNED

Comment 24

18 years ago
sr=mscott
(Assignee)

Comment 25

18 years ago
Fixed on trunk - I'll open a separate bug for biff putting up alerts.
Status: ASSIGNED → RESOLVED
Last Resolved: 18 years ago
Resolution: --- → FIXED

Comment 26

18 years ago
Verified on an IMAP account with biff set up, on the Win98 11-21-08-Mtrunk build:
no alert is displayed and no hang occurs after disconnecting from
SERA now.
As David mentioned, bug 59802 has already been logged to track the problem of
biff putting up alerts.
Marking as verified.
Status: RESOLVED → VERIFIED
Product: MailNews → Core
Product: Core → MailNews Core