+++ This bug was initially created as a clone of Bug #628647 +++
crash [@ nsImapMailFolder::AddMoveResultPseudoKey(unsigned int)], [@ nsImapMailFolder::AddMoveResultPseudoKey] (Mac)
#28 crash for TB17.0.2
something in TB15.0 caused a regression, based on greatly increased crash rate at that time period. But crash existed prior to TB15, so impossible to determine regression range via crash-stats easily, if at all. Or whether this is a clear regression of bug 628647. Did not determine whether this was a topcrash in TB15 or TB16.
There is no clear theme in crash comments, however several (and multiple OS) mention just having come out of sleep/hibernate. And also getting "Unresponsive Script" (examples below)
crash is in code changed by bug 628647
3904 nsresult rv = mDatabase->GetMsgHdrForKey(aMsgKey, getter_AddRefs(pseudoHdr));
3905 NS_ENSURE_SUCCESS(rv, rv);
3906 nsCString messageId;
0 xul.dll nsImapMailFolder::AddMoveResultPseudoKey mailnews/imap/src/nsImapMailFolder.cpp:3904
1 xul.dll nsImapOfflineSync::ProcessNextOperation mailnews/imap/src/nsImapOfflineSync.cpp:821
2 xul.dll nsImapService::PlaybackAllOfflineOperations mailnews/imap/src/nsImapService.cpp:3289
3 xul.dll nsMsgOfflineManager::SynchronizeOfflineImapChanges mailnews/base/src/nsMsgOfflineManager.cpp:173
4 xul.dll nsMsgOfflineManager::AdvanceToNextState mailnews/base/src/nsMsgOfflineManager.cpp:118
5 xul.dll nsMsgOfflineManager::AdvanceToNextState mailnews/base/src/nsMsgOfflineManager.cpp:151
6 xul.dll nsMsgOfflineManager::GoOnline mailnews/base/src/nsMsgOfflineManager.cpp:301
7 xul.dll NS_InvokeByIndex_P xpcom/reflect/xptcall/src/md/win32/xptcinvoke.cpp:70
these mention Unresponsive Script and suspend, or both
bp-fa2e3616-b660-401a-963d-4329e2130106 TB17 (mothlight)
such crashes have nsMsgOfflineManager::GoOnline on the stack
It's #27 top crasher in TB 17.0.2 so not a top crasher according to https://wiki.mozilla.org/CrashKill/Topcrash
This bug is yet another example of trying to use mDatabase without checking for a non-null value. It would probably be worth going through all of the folder code and look for instances of this pattern, as even if they are not crashing now, they can start to crash when other code is changed so that calling pattern changes slightly. If mDatabase is used more than once in a method, a local reference should be stored and used as there have been cases where mDatabase goes null after it has been initially checked as non-null.
Created attachment 731283 [details] [diff] [review]
How about this? These are the places i found in this file where mDatabase wasn't checked before usage.
Created attachment 731285 [details] [diff] [review]
proposed fix (no whitespace changes)
Without whitespace changes, for easier review.
Comment on attachment 731283 [details] [diff] [review]
I'm afraid that you have misunderstood my intention.
What you are doing in many cases is to replace GetDatabase calls with GetMsgDatabase(mDatabase) calls. That may or may not be a good idea, but it certainly has side effects beyond simply stopping the crash - and does not add anything AFAIK in preventing the crashes. What is the motivation for that? GetMsgDatabase would normally be used only when you want to get a separate pointer to the database, either in a js method or when you are trying to store a local pointer. It does not really make sense to call it to get mDatabase (other than for the possible side effect of setting the access timer, which might be a good idea but is a separate issue than than preventing crashes). GetDatabase is fine within a folder.
So what I was suggesting in comment 2 is that we scan for cases where mDatabase is not checked at all, for example in the current case of nsImapMailFolder::AddMoveResultPseudoKey, or looking at your patch another example is nsImapMailFolder::NotifyMessageFlagsFromHdr, and add there either simple checks for null, or a get followed by a check.
Therefore many of the changes that you done are not really necessary, so this patch could be much smaller.
Created attachment 731466 [details] [diff] [review]
proposed fix, v2
Hm, i had a misunderstanding mDatabase was not necessary != null after a successful GetDatabase() but that seems to be wrong. Successful rv does mean we mDatabase is now set.
This uses GetDatabase() instead. I don't think there should be much side effects except for an additional warning in the console for debug builds, for some cases. (And for GetBodysToDownload possibly a success in more cases.) It's true the patch could be smaller, but i do think early returns are generally good.
Created attachment 731467 [details] [diff] [review]
proposed fix, v2 (no whitespace changes)
I've thought about this patch quite a bit in the last week, and I am just not comfortable at the moment with the change in philosophy that this represents, namely to routinely call GetDatabase() in any call that might use the database.
If you look at for example nsMsgDBFolder, missing database is handled in a variety of different ways. In some cases, yes there is GetDatabase(). But in others, it simply returns an error, or has if (mDatabase) ... in front of code.
Recently, in an effort to reduce memory use and leaks, there have been a few cases where methods explicitly null mDatabase if they are forced to open it. (See nsMsgDBFolder::SetFlag and nsMsgDBFolder::ClearNewMessages). So there is clearly a tendency to move away from too many folders with mDatabase references.
Yes it has been a common reaction in the past to null database crashes to insert GetDatabase(), but nobody has really set down and said that our philosophy is going to be to aggressively call GetDatabase() in every conceivable instance where it might be needed, as this patch attempts to do, instead of returning an error. There are cases where it is important that a folder not have an open mDatabase reference (like when the folder is being renamed), and I am not sure that opening the database is always the best response to a missing database.
So my conclusion is that I am not willing to adopt this new philosophy in the context of a bug that is just trying to prevent crashes. I think that we want to do such a change, it should be discussed first in a bug that specifically proposes and aggressive approach to GetDatabase() as its purpose.
What I would be comfortable with is a much smaller patch that looks for cases of unchecked mDatabase, and does "rv = GetDatabase(); NS_ENSURE_SUCCESS(rv, rv)" there. Looking through your patch, those are these methods:
These changes are also OK, as they are adding a warning:
For other methods, please do not do those changes.
"I think that we want to do such a change, it should be discussed first in a bug that specifically proposes and aggressive approach to GetDatabase() as its purpose."
"I think that if we want to do such a change, it should be discussed first in a bug that specifically proposes an aggressive approach to GetDatabase() as its purpose."
Created attachment 737226 [details] [diff] [review]
proposed fix, v3
Addressing review comments.
Created attachment 737227 [details] [diff] [review]
proposed fix, v3 (no whitespace changes)
Just as a FYI I hit a crash that this bug will solve while doing testing on bug 864187. In that test, I call a periodic filter which tries to set a star on test messages in a subfolder. I got a crash in nsImapMailFolder::MarkMessagesFlagged with a null mDatabase.
The database object existed in nsMsgFilterAftertheFact but not in the folder. Yet it had to exist in the folder earlier, because nsMsgFilterAfterTheFact gets its database from the folder. So something is nulling that database in the folder unexpectedly.
The patches in the bug will fix that. Sorry for the slow review, I'll finish it Real Soon Now :)
nsAutoSyncState::ProcessExistingHeaders is nulling the database.
Comment on attachment 737227 [details] [diff] [review]
proposed fix, v3 (no whitespace changes)
Review of attachment 737227 [details] [diff] [review]:
r=me with the one nit fixed.
@@ +1984,5 @@
> if (NS_FAILED(rv)) return rv;
> rv = StoreImapFlags(kImapMsgFlaggedFlag, markFlagged, keysToMarkFlagged.Elements(),
> keysToMarkFlagged.Length(), nullptr);
> + rv = GetDatabase();
This loses the rv error from the previous line, so you need NS_ENSURE_SUCCESS(rv, rv); before rv = GetDatabase();
Comment on attachment 737226 [details] [diff] [review]
proposed fix, v3
r=me with the one nit fixed.
http://hg.mozilla.org/comm-central/rev/24722af6fb8b -> FIXED