Last Comment Bug 837409 - crash [@ nsImapMailFolder::AddMoveResultPseudoKey(unsigned int)], [@ nsImapMailFolder::AddMoveResultPseudoKey] (Mac)
: crash [@ nsImapMailFolder::AddMoveResultPseudoKey(unsigned int)], [@ nsImapMa...
Status: RESOLVED FIXED
[regression:TB15.0]
: crash, regression
Product: MailNews Core
Classification: Components
Component: Networking: IMAP (show other bugs)
: Trunk
: All All
: -- critical (vote)
: Thunderbird 23.0
Assigned To: Magnus Melin
:
Mentors:
Depends on:
Blocks: 628647
  Show dependency treegraph
 
Reported: 2013-02-02 09:29 PST by Wayne Mery (:wsmwk, NI for questions)
Modified: 2015-10-07 18:38 PDT (History)
7 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
proposed fix (30.70 KB, patch)
2013-03-29 13:56 PDT, Magnus Melin
rkent: review-
Details | Diff | Splinter Review
proposed fix (no whitespace changes) (20.59 KB, patch)
2013-03-29 13:57 PDT, Magnus Melin
no flags Details | Diff | Splinter Review
proposed fix, v2 (29.78 KB, patch)
2013-03-30 05:29 PDT, Magnus Melin
rkent: review-
Details | Diff | Splinter Review
proposed fix, v2 (no whitespace changes) (19.64 KB, patch)
2013-03-30 05:30 PDT, Magnus Melin
no flags Details | Diff | Splinter Review
proposed fix, v3 (17.50 KB, patch)
2013-04-14 03:40 PDT, Magnus Melin
rkent: review+
Details | Diff | Splinter Review
proposed fix, v3 (no whitespace changes) (8.98 KB, patch)
2013-04-14 03:40 PDT, Magnus Melin
no flags Details | Diff | Splinter Review

Description Wayne Mery (:wsmwk, NI for questions) 2013-02-02 09:29:06 PST
+++ This bug was initially created as a clone of Bug #628647 +++

crash [@ nsImapMailFolder::AddMoveResultPseudoKey(unsigned int)], [@ nsImapMailFolder::AddMoveResultPseudoKey] (Mac)

#28 crash for TB17.0.2

something in TB15.0 caused a regression, based on greatly increased crash rate at that time period.  But crash existed prior to TB15, so impossible to determine regression range via crash-stats easily, if at all. Or whether this is a clear regression of bug 628647. Did not determine whether this was a topcrash in TB15 or TB16.

There is no clear theme in crash comments, however several (and multiple OS) mention just having come out of sleep/hibernate.  And also getting "Unresponsive Script" (examples below)

crash is in code changed by bug 628647
bienvenu@7044
3904 nsresult rv = mDatabase->GetMsgHdrForKey(aMsgKey, getter_AddRefs(pseudoHdr));
bienvenu@7044
3905 NS_ENSURE_SUCCESS(rv, rv);
bienvenu@6984
3906 nsCString messageId;
bienvenu@6984
3907 pseudoHdr->GetMessageId(getter_Copies(messageId)); 

TB19 bp-9f158740-d238-4a86-8f08-260402130202
0	xul.dll	nsImapMailFolder::AddMoveResultPseudoKey	mailnews/imap/src/nsImapMailFolder.cpp:3904
1	xul.dll	nsImapOfflineSync::ProcessNextOperation	mailnews/imap/src/nsImapOfflineSync.cpp:821
2	xul.dll	nsImapService::PlaybackAllOfflineOperations	mailnews/imap/src/nsImapService.cpp:3289
3	xul.dll	nsMsgOfflineManager::SynchronizeOfflineImapChanges	mailnews/base/src/nsMsgOfflineManager.cpp:173
4	xul.dll	nsMsgOfflineManager::AdvanceToNextState	mailnews/base/src/nsMsgOfflineManager.cpp:118
5	xul.dll	nsMsgOfflineManager::AdvanceToNextState	mailnews/base/src/nsMsgOfflineManager.cpp:151
6	xul.dll	nsMsgOfflineManager::GoOnline	mailnews/base/src/nsMsgOfflineManager.cpp:301
7	xul.dll	NS_InvokeByIndex_P	xpcom/reflect/xptcall/src/md/win32/xptcinvoke.cpp:70 

these mention Unresponsive Script and suspend, or both
bp-fa2e3616-b660-401a-963d-4329e2130106 TB17 (mothlight) 
bp-48dcd697-ad6e-4084-b238-b0d962130113 TB17
bp-5ba0241a-b6cb-47b5-9cb3-bfccc2130128 TB17
bp-dd2f5b3e-3438-4274-b71b-620fa2130119 TB17
such crashes have nsMsgOfflineManager::GoOnline on the stack

bp-ea4c9730-0e6e-40ee-bb2e-69b8e2120831 TB15
bp-6cc17a4b-eb33-41df-af96-e27a42120622 TB12
Comment 1 Scoobidiver (away) 2013-02-12 06:45:57 PST
It's #27 top crasher in TB 17.0.2 so not a top crasher according to https://wiki.mozilla.org/CrashKill/Topcrash
Comment 2 Kent James (:rkent) 2013-02-12 08:48:58 PST
This bug is yet another example of trying to use mDatabase without checking for a non-null value. It would probably be worth going through all of the folder code and look for instances of this pattern, as even if they are not crashing now, they can start to crash when other code is changed so that calling pattern changes slightly. If mDatabase is used more than once in a method, a local reference should be stored and used as there have been cases where mDatabase goes null after it has been initially checked as non-null.
Comment 3 Magnus Melin 2013-03-29 13:56:26 PDT
Created attachment 731283 [details] [diff] [review]
proposed fix

How about this? These are the places i found in this file where mDatabase wasn't checked before usage.
Comment 4 Magnus Melin 2013-03-29 13:57:20 PDT
Created attachment 731285 [details] [diff] [review]
proposed fix (no whitespace changes)

Without whitespace changes, for easier review.
Comment 5 Kent James (:rkent) 2013-03-29 15:59:26 PDT
Comment on attachment 731283 [details] [diff] [review]
proposed fix

I'm afraid that you have misunderstood my intention.

What you are doing in many cases is to replace GetDatabase calls with GetMsgDatabase(mDatabase) calls. That may or may not be a good idea, but it certainly has side effects beyond simply stopping the crash - and does not add anything AFAIK in preventing the crashes. What is the motivation for that? GetMsgDatabase would normally be used only when you want to get a separate pointer to the database, either in a js method or when you are trying to store a local pointer. It does not really make sense to call it to get mDatabase (other than for the possible side effect of setting the access timer, which might be a good idea but is a separate issue than than preventing crashes). GetDatabase is fine within a folder. 

So what I was suggesting in comment 2 is that we scan for cases where mDatabase is not checked at all, for example in the current case of nsImapMailFolder::AddMoveResultPseudoKey, or looking at your patch another example is nsImapMailFolder::NotifyMessageFlagsFromHdr, and add there either simple checks for null, or a get followed by a check.

Therefore many of the changes that you done are not really necessary, so this patch could be much smaller.
Comment 6 Magnus Melin 2013-03-30 05:29:31 PDT
Created attachment 731466 [details] [diff] [review]
proposed fix, v2

Hm, i had a misunderstanding mDatabase was not necessary != null after a successful GetDatabase() but that seems to be wrong. Successful rv does mean we mDatabase is now set.

This uses GetDatabase() instead. I don't think there should be much side effects except for an additional warning in the console for debug builds, for some cases. (And for GetBodysToDownload possibly a success in more cases.) It's true the patch could be smaller, but i do think early returns are generally good.
Comment 7 Magnus Melin 2013-03-30 05:30:05 PDT
Created attachment 731467 [details] [diff] [review]
proposed fix, v2 (no whitespace changes)
Comment 8 Kent James (:rkent) 2013-04-09 12:49:27 PDT
I've thought about this patch quite a bit in the last week, and I am just not comfortable at the moment with the change in philosophy that this represents, namely to routinely call GetDatabase() in any call that might use the database.

If you look at for example nsMsgDBFolder, missing database is handled in a variety of different ways. In some cases, yes there is GetDatabase(). But in others, it simply returns an error, or has if (mDatabase) ... in front of code.

Recently, in an effort to reduce memory use and leaks, there have been a few cases where methods explicitly null mDatabase if they are forced to open it. (See nsMsgDBFolder::SetFlag and nsMsgDBFolder::ClearNewMessages). So there is clearly a tendency to move away from too many folders with mDatabase references.
Yes it has been a common reaction in the past to null database crashes to insert GetDatabase(), but nobody has really set down and said that our philosophy is going to be to aggressively call GetDatabase() in every conceivable instance where it might be needed, as this patch attempts to do, instead of returning an error. There are cases where it is important that a folder not have an open mDatabase reference (like when the folder is being renamed), and I am not sure that opening the database is always the best response to a missing database.

So my conclusion is that I am not willing to adopt this new philosophy in the context of a bug that is just trying to prevent crashes. I think that we want to do such a change, it should be discussed first in a bug that specifically proposes and aggressive approach to GetDatabase() as its purpose.

What I would be comfortable with is a much smaller patch that looks for cases of unchecked mDatabase, and does "rv = GetDatabase(); NS_ENSURE_SUCCESS(rv, rv)" there. Looking through your patch, those are these methods:

nsImapMailFolder::MarkMessagesFlagged
nsImapMailFolder::ApplyFilterHit
nsImapMailFolder::AddMoveResultPseudoKey
nsImapMailFolder::HandleCustomFlags
nsImapMailFolder::NotifyMessageFlagsFromHdr
nsImapMailFolder::FindOpenRange

These changes are also OK, as they are adding a warning:

nsImapMailFolder::NotifySearchHit
nsImapMailFolder::CopyFileToOfflineStore
nsImapMailFolder::GetOfflineMsgFolder
nsImapMailFolder::GetOfflineFileStream

For other methods, please do not do those changes.
Comment 9 Kent James (:rkent) 2013-04-09 12:51:34 PDT
correction:

"I think that we want to do such a change, it should be discussed first in a bug that specifically proposes and aggressive approach to GetDatabase() as its purpose."

should be:

"I think that if we want to do such a change, it should be discussed first in a bug that specifically proposes an aggressive approach to GetDatabase() as its purpose."
Comment 10 Magnus Melin 2013-04-14 03:40:10 PDT
Created attachment 737226 [details] [diff] [review]
proposed fix, v3

Addressing review comments.
Comment 11 Magnus Melin 2013-04-14 03:40:46 PDT
Created attachment 737227 [details] [diff] [review]
proposed fix, v3 (no whitespace changes)
Comment 12 Kent James (:rkent) 2013-04-22 13:43:08 PDT
Just as a FYI I hit a crash that this bug will solve while doing testing on bug 864187. In that test, I call a periodic filter which tries to set a star on test messages in a subfolder. I got a crash in nsImapMailFolder::MarkMessagesFlagged with a null mDatabase.

The database object existed in nsMsgFilterAftertheFact but not in the folder. Yet it had to exist in the folder earlier, because nsMsgFilterAfterTheFact gets its database from the folder. So something is nulling that database in the folder unexpectedly.

The patches in the bug will fix that. Sorry for the slow review, I'll finish it Real Soon Now :)
Comment 13 Kent James (:rkent) 2013-04-22 14:03:23 PDT
nsAutoSyncState::ProcessExistingHeaders is nulling the database.
Comment 14 Kent James (:rkent) 2013-04-23 11:07:44 PDT
Comment on attachment 737227 [details] [diff] [review]
proposed fix, v3 (no whitespace changes)

Review of attachment 737227 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with the one nit fixed.

::: mailnews/imap/src/nsImapMailFolder.cpp
@@ +1984,5 @@
>      if (NS_FAILED(rv)) return rv;
>      rv = StoreImapFlags(kImapMsgFlaggedFlag, markFlagged,  keysToMarkFlagged.Elements(),
>                          keysToMarkFlagged.Length(), nullptr);
> +
> +    rv = GetDatabase();

This loses the rv error from the previous line, so you need NS_ENSURE_SUCCESS(rv, rv); before rv = GetDatabase();
Comment 15 Kent James (:rkent) 2013-04-23 11:08:28 PDT
Comment on attachment 737226 [details] [diff] [review]
proposed fix, v3

r=me with the one nit fixed.
Comment 16 Magnus Melin 2013-04-30 02:49:16 PDT
http://hg.mozilla.org/comm-central/rev/24722af6fb8b -> FIXED

Note You need to log in before you can comment on or make changes to this bug.