Last Comment Bug 550455 - Crash when m_foldersToStat.Count() == 0 [@ nsImapIncomingServer::OnStopRunningUrl]
: Crash when m_foldersToStat.Count() == 0 [@ nsImapIncomingServer::OnStopRunnin...
Status: RESOLVED FIXED
: crash
Product: MailNews Core
Classification: Components
Component: Networking: IMAP (show other bugs)
: Trunk
: x86 Linux
: -- critical with 2 votes (vote)
: ---
Assigned To: David :Bienvenu
:
:
Mentors:
: 564206 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2010-03-05 04:40 PST by Jan Horak
Modified: 2011-06-09 14:58 PDT (History)
7 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
beta2-fixed
-
.5-fixed


Attachments
proposed fix (856 bytes, patch)
2010-03-09 20:43 PST, David :Bienvenu
neil: superreview-
Details | Diff | Splinter Review
less hacky but more complicated fix (3.55 KB, patch)
2010-03-26 11:18 PDT, David :Bienvenu
no flags Details | Diff | Splinter Review
fix addressing comments (3.61 KB, patch)
2010-03-31 08:37 PDT, David :Bienvenu
neil: review+
neil: superreview+
Details | Diff | Splinter Review
patch landed (3.59 KB, patch)
2010-04-01 12:01 PDT, David :Bienvenu
standard8: approval‑thunderbird3.0.5+
Details | Diff | Splinter Review

Description Jan Horak 2010-03-05 04:40:33 PST
Crash occurs in nsImapIncomingServer.cpp:
nsImapIncomingServer::OnStopRunningUrl(nsIURI *url, nsresult exitCode):
...
    case nsIImapUrl::nsImapFolderStatus:
    {
      PRInt32 folderCount = m_foldersToStat.Count();
      nsCOMPtr<nsIMsgFolder> msgFolder(
->        do_QueryInterface(m_foldersToStat[folderCount - 1]));

when m_foldersToStat array is empty. I'm unable to reproduce but we have couple of automatic reports at https://bugzilla.redhat.com/show_bug.cgi?id=570391 (please see for full backtrace). I can reproduce only by manually setting folderCount variable to 0.
Comment 2 Jan Horak 2010-03-08 00:26:46 PST
(In reply to comment #1)
> related to any of these?
> *
I don't think so. I can't find any crash in nsImapIncomingServer::OnStopRunningUrl on crashstats nor something similar.
Comment 3 Kyle K. 2010-03-09 18:52:29 PST
This is happening to me multiple times a day:  https://bugzilla.redhat.com/show_bug.cgi?id=570041

Is this ticket getting the visibility required?  Having thunderbird constantly segfault on me is surprising since it's always been fairly solid in the past.
Comment 4 David :Bienvenu 2010-03-09 20:43:08 PST
Created attachment 431555 [details] [diff] [review]
proposed fix

I don't know why this situation should happen, or how to test for it, but this should fix it.
Comment 5 neil@parkwaycc.co.uk 2010-03-10 04:08:17 PST
Could this happen when getting new messages for non inbox folders while there are still folders waiting to update statue?
Comment 6 Wayne Mery (:wsmwk, NI for questions) 2010-03-10 04:16:43 PST
FWIW, I took a spin through crashes with OnStopRunningUrl in the stack and did not find any matches with it in the top couple frames.
* nsXPTCStubBase::Release()  bp-2bab6ac8-4f77-4780-beeb-923892100308
* nsImapProtocol::SetupWithUrl(nsIURI*, nsISupports*) bp-fc01edc8-dbf5-42c8-846a-6faa92100306
Comment 7 David :Bienvenu 2010-03-10 07:05:41 PST
(In reply to comment #5)
> Could this happen when getting new messages for non inbox folders while there
> are still folders waiting to update statue?

Well, we're running a STATUS url when this happens, and the incoming server is the listener for the url. AFAIK, that combination only happens when doing a STATUS to check non-inbox folders for new mail. We run other STATUS urls, but the imap incoming server is not the listener for those urls.
Comment 8 neil@parkwaycc.co.uk 2010-03-12 01:23:59 PST
Comment on attachment 431555 [details] [diff] [review]
proposed fix

So, as discussed on IRC, this just wallpapers over the crash. The real cause is that each call to GetNewMessagesForNonInboxFolders triggers a chain of folder updates, but there's no protection against multiple chains simultaneously using the same array.
Comment 9 Jan Horak 2010-03-26 08:23:24 PDT
This crash is rather frequent for significant amount of Linux users (see https://bugzilla.redhat.com/show_bug.cgi?id=570655 for example). Could this be a blocker for next release (3.0.5)?
Comment 10 David :Bienvenu 2010-03-26 08:45:17 PDT
It's not showing up high in our crashstats for some reason. I am planning on fixing it, though I have to convince Neil of the rightness of the fix...the first thing is that I need to remove the assumption that the current folder is also the last folder in the stat array, which can be wrong when you get multiple calls to GetNewMessagesForNonInboxFolders.
Comment 11 Wayne Mery (:wsmwk, NI for questions) 2010-03-26 08:51:38 PDT
I'm looking at related crash sigs/bugs and may have more info later. But I want add that Matej in https://bugzilla.redhat.com/show_bug.cgi?id=570655#c3 mentions their stack is the same as bug 411147. I haven't verified that. Not also, I wouldn't bank on bug 411147 being windows-only because my comment "windows only per crash-stats" simply means I didn't see any linux crashes.
Comment 12 Mark Banner (:standard8) 2010-03-26 09:14:12 PDT
I've just dug deep into crash-stats. I've found only one instance of this reported in the last two weeks:

bp-ab9ff0c2-3e7f-429d-a482-ad2a32100316

Therefore not blocking on this as that isn't anywhere near a top-crasher. I would say its wanted though.

jhorak: Are you sure this isn't something more specific to the Fedora build? An additional patch (I don't know what you guys do) or something?.

Also I notice from the first comment in that bug, it looks like your users are running a 64 bit build there and that's something we don't support/test/run at all at the moment (we don't even have nightlies). So you may want to take a look around and see if there's a 64-bit bug hiding somewhere.
Comment 13 David :Bienvenu 2010-03-26 09:31:52 PDT
I believe this is a case of linux users being much more likely to be checking all folders for new messages, because of server side filters, and being likely to have multiple accounts. I've seen this crash myself, and I'm only checking a few folders for new messages.
Comment 14 Ludovic Hirlimann [:Usul] 2010-03-26 09:54:38 PDT
That's where it would be nice to receive crash stats from linux builds by distros.
Comment 15 David :Bienvenu 2010-03-26 09:59:55 PDT
(In reply to comment #14)
> That's where it would be nice to receive crash stats from linux builds by
> distros.

exactly.

My plan for fixing this is not to leave the currently statted folder in m_foldersToStat. This will make things a lot simpler as far as dealing with multiple actors on the array is concerned.
Comment 16 David :Bienvenu 2010-03-26 11:18:30 PDT
Created attachment 435225 [details] [diff] [review]
less hacky but more complicated fix

This change makes it so the current STATUS operation doesn't have its folder in m_foldersToStat, which means we have to check if there are any folders left before doing the next stat. It also means we won't be potentially removing the wrong folder when the STATUS operation finishes. And I put some code to prevent us from putting the same folder in the array multiple times, which should ameliorate the case of the user hitting get new mail twice.
Comment 17 neil@parkwaycc.co.uk 2010-03-28 16:18:49 PDT
Comment on attachment 435225 [details] [diff] [review]
less hacky but more complicated fix

Although this obviously avoids removing the wrong folder, it doesn't actually stop multiple attempts to check for status, and in fact would make it harder to retrofit such a check.

>+      // if we get an error running the url, it's better
>+      // not to chain the next url.
>+      if (NS_FAILED(exitCode))
>+        break;
What about the remaining folders?

>+        nsCOMPtr<nsIMsgImapMailFolder> folder = m_foldersToStat[folderCount - 1];
>+        m_foldersToStat.RemoveObjectAt(folderCount - 1);
>+        folder->UpdateStatus(this, nsnull);
Is UpdateStatus not always async?

>-      if (imapFolder && !isServer)
>+      if (imapFolder && !isServer &&
>+          m_foldersToStat.IndexOfObject(imapFolder) == -1)
Nit: IndexOfObject is really slow because it QIs the argument and each member to nsISupports first. If you don't need that just use IndexOf instead.
Comment 18 David :Bienvenu 2010-03-28 17:32:42 PDT
(In reply to comment #17)
> (From update of attachment 435225 [details] [diff] [review])
> Although this obviously avoids removing the wrong folder, it doesn't actually
> stop multiple attempts to check for status, and in fact would make it harder to
> retrofit such a check.

It does, by preventing duplicates from getting added to the array of folders to stat.


> 
> >+      // if we get an error running the url, it's better
> >+      // not to chain the next url.
> >+      if (NS_FAILED(exitCode))
> >+        break;
> What about the remaining folders?

An error at this point is generally something that's not recoverable, e.g., the server is down, or we're shutting down. The remaining folders will still be in the array, so if we do STAT again, we'll get them on the next biff round. 

> 
> >+        nsCOMPtr<nsIMsgImapMailFolder> folder = m_foldersToStat[folderCount - 1];
> >+        m_foldersToStat.RemoveObjectAt(folderCount - 1);
> >+        folder->UpdateStatus(this, nsnull);
> Is UpdateStatus not always async?

It is. I've changed m_foldersToStat to mean the folders still remaining to stat. Once we've started statting a folder, it is removed from the array.

> 
> >-      if (imapFolder && !isServer)
> >+      if (imapFolder && !isServer &&
> >+          m_foldersToStat.IndexOfObject(imapFolder) == -1)
> Nit: IndexOfObject is really slow because it QIs the argument and each member
> to nsISupports first. If you don't need that just use IndexOf instead.

OK, that's good to fix.
Comment 19 David :Bienvenu 2010-03-28 17:33:56 PDT
(In reply to comment #18)

> > Although this obviously avoids removing the wrong folder, it doesn't actually
> > stop multiple attempts to check for status, and in fact would make it harder to
> > retrofit such a check.
> 
> It does, by preventing duplicates from getting added to the array of folders to
> stat.
> 
I suppose it makes it so the currently stat'd folder could again be requested to be stat'd. That doesn't bother me. You might actually get a different answer the next time.
Comment 20 neil@parkwaycc.co.uk 2010-03-30 08:22:57 PDT
(In reply to comment #18)
> (In reply to comment #17)
> > (From update of attachment 435225 [details] [diff] [review])
> > Although this obviously avoids removing the wrong folder, it doesn't actually
> > stop multiple attempts to check for status, and in fact would make it harder to
> > retrofit such a check.
> It does, by preventing duplicates from getting added to the array of folders to
> stat.
Sorry for being unclear, I was referring to multiple checks on the same server, rather than the same folder. (If you clicked Get Messages too many times, could you max out the cached connections?)

> > >+      // if we get an error running the url, it's better
> > >+      // not to chain the next url.
> > >+      if (NS_FAILED(exitCode))
> > >+        break;
> > What about the remaining folders?
> An error at this point is generally something that's not recoverable, e.g., the
> server is down, or we're shutting down. The remaining folders will still be in
> the array, so if we do STAT again, we'll get them on the next biff round. 
So the idea is that because you won't add them again, you don't need to remove them?

> > >+        nsCOMPtr<nsIMsgImapMailFolder> folder = m_foldersToStat[folderCount - 1];
> > >+        m_foldersToStat.RemoveObjectAt(folderCount - 1);
> > >+        folder->UpdateStatus(this, nsnull);
> > Is UpdateStatus not always async?
> It is. I've changed m_foldersToStat to mean the folders still remaining to
> stat. Once we've started statting a folder, it is removed from the array.
I was just wondering whether you could write
m_foldersToStat[folderCount - 1]->UpdateStatus(this, nsnull);
or is that being too sneaky?
Comment 21 neil@parkwaycc.co.uk 2010-03-30 08:32:29 PDT
(In reply to comment #19)
> (In reply to comment #18)
> > > Although this obviously avoids removing the wrong folder, it doesn't actually
> > > stop multiple attempts to check for status, and in fact would make it harder to
> > > retrofit such a check.
> > It does, by preventing duplicates from getting added to the array of folders to
> > stat.
> I suppose it makes it so the currently stat'd folder could again be requested
> to be stat'd. That doesn't bother me.
On the other hand you could perhaps work around that with a hybrid approach whereby you get the folder from the url (rather than the existing assumption that it's the last folder) and removing that from the array (helpfully RemoveObject returns PR_TRUE only if the object was in fact in the array.)
Comment 22 David :Bienvenu 2010-03-30 08:35:22 PDT
(In reply to comment #20)

> Sorry for being unclear, I was referring to multiple checks on the same server,
> rather than the same folder. (If you clicked Get Messages too many times, could
> you max out the cached connections?)

Maybe, but that would just mean that the requests would get queued. It wouldn't break anything. Ideally, we would prevent duplicate attempts, but the cure might be worse than the disease, if we get it wrong and permanently think we're doing a stat when we're not.

> So the idea is that because you won't add them again, you don't need to remove
> them?

Well, the idea was more that the next time around we should pick up where we left off. But since it's going to be the same set of folders the next time around, you're right that I might as well clear them.

> I was just wondering whether you could write
> m_foldersToStat[folderCount - 1]->UpdateStatus(this, nsnull);
> or is that being too sneaky?

Too sneaky, and that would cause issues if UpdateStatus failed synchronously, which is a possibility we should allow for (I guess that's what you were asking).
Comment 23 David :Bienvenu 2010-03-30 08:37:08 PDT
(In reply to comment #21)

> On the other hand you could perhaps work around that with a hybrid approach
> whereby you get the folder from the url (rather than the existing assumption
> that it's the last folder) and removing that from the array (helpfully
> RemoveObject returns PR_TRUE only if the object was in fact in the array.)

I started off doing just that, and decided not to, for a very good reason which I'll try to remember :-)
Comment 24 neil@parkwaycc.co.uk 2010-03-31 06:32:59 PDT
(In reply to comment #23)
> (In reply to comment #21)
> > On the other hand you could perhaps work around that with a hybrid approach
> > whereby you get the folder from the url (rather than the existing assumption
> > that it's the last folder) and removing that from the array (helpfully
> > RemoveObject returns PR_TRUE only if the object was in fact in the array.)
> I started off doing just that, and decided not to, for a very good reason which
> I'll try to remember :-)
Please do, that's the only remaining issue blocking my review!
Comment 25 David :Bienvenu 2010-03-31 08:37:36 PDT
Created attachment 436199 [details] [diff] [review]
fix addressing comments

The idea of not having to find the currently STATUS'd folder and remove it when done was the main appeal, iirc. But it occurred to me that if I start with the first folder in the array instead of the last, finding the current folder should in general be quick, and the code could be simplified a bit...
Comment 26 neil@parkwaycc.co.uk 2010-04-01 08:01:10 PDT
Comment on attachment 436199 [details] [diff] [review]
fix addressing comments

>+      if (NS_FAILED(exitCode))
>+      {
>+        m_foldersToStat.Clear();
>+        break;
>+      }
>+      if (m_foldersToStat.Count() > 0)
>+        m_foldersToStat[0]->UpdateStatus(this, nsnull);
Nit: could use else or could just leave out the break
(the compiler should notice that the count will be zero anyway)
Comment 27 David :Bienvenu 2010-04-01 12:01:21 PDT
Created attachment 436533 [details] [diff] [review]
patch landed

I just moved the break to the end of the case, because a case w/o a break is a bug waiting to happen.
Comment 28 David :Bienvenu 2010-04-01 12:01:51 PDT
fixed on trunk.
Comment 29 Martin Stránský 2010-04-22 23:58:47 PDT
Can we have this fix in 3.0 line? It affects many of Fedora users.
Comment 30 Mark Banner (:standard8) 2010-04-23 00:08:33 PDT
(In reply to comment #29)
> Can we have this fix in 3.0 line? It affects many of Fedora users.

Probably - it is already flagged for approval, but as we haven't yet got a firm date for 3.0.5 I've not looked at the approvals yet.
Comment 31 Mike Hommey [:glandium] 2010-04-30 02:10:55 PDT
FWIW, it also affects seamonkey 2.0.x
Comment 32 Mark Banner (:standard8) 2010-04-30 02:55:32 PDT
(In reply to comment #31)
> FWIW, it also affects seamonkey 2.0.x

That's why it is in Mailnews Core and not Thunderbird product...
Comment 33 David :Bienvenu 2010-04-30 08:15:23 PDT
fixed for 3.0.5
Comment 34 timeless 2010-07-06 23:39:17 PDT
*** Bug 564206 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.