Closed Bug 480090 Opened 11 years ago Closed 11 years ago

Crash when sending email after being in offline mode [@ morkRowObject::CloseRowObject(morkEnv*)] - [@ nsMsgOfflineImapOperation::SetCopiesToDB - nsMsgOfflineImapOperation::Release]

Categories

(MailNews Core :: Backend, defect, critical)

1.9.1 Branch
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 3.0rc1

People

(Reporter: Usul, Assigned: Bienvenu)

References

()

Details

(Keywords: crash, qawanted, topcrash, Whiteboard: [no l10n impact])

Crash Data

Attachments

(3 files, 2 obsolete files)

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b3pre) Gecko/20090223 Thunderbird/3.0b2.

I switched from Offline mode to online mode. Said yes to send pending email and thunderbird crashed.

0  	thunderbird-bin  	morkRowObject::CloseRowObject  	db/mork/src/morkRowObject.cpp:606
1 	thunderbird-bin 	morkRowObject::CloseMorkNode 	db/mork/src/morkRowObject.cpp:85
2 	thunderbird-bin 	thunderbird-bin@0x9a99fc 	
3 	thunderbird-bin 	morkObject::Release 	db/mork/src/morkObject.cpp:67
4 	libxpcom_core.dylib 	libxpcom_core.dylib@0x41eb 	
5 	thunderbird-bin 	nsMsgOfflineImapOperation::SetCopiesToDB 	nsCOMPtr.h:469
6 	thunderbird-bin 	nsMsgOfflineImapOperation::Release 	nsMsgOfflineImapOperation.cpp:50
7 	libxpcom_core.dylib 	nsCOMArray_base::ReplaceObjectAt 	nsCOMArray.cpp:61
8 	thunderbird-bin 	thunderbird-bin@0x5082e4 	
9 	thunderbird-bin 	nsImapOfflineSync::Release 	nsImapOfflineSync.cpp:60
10 	libxpcom_core.dylib 	libxpcom_core.dylib@0x41eb 	
11 	thunderbird-bin 	nsMsgMailNewsUrl::SetUrlState 	nsCOMPtr.h:469
12 	libxpcom_core.dylib 	NS_InvokeByIndex_P 	xpcom/reflect/xptcall/src/md/unix/xptcinvoke_unixish_x86.cpp:179
13 	libxpcom_core.dylib 	nsProxyObjectCallInfo::Run 	xpcom/proxy/src/nsProxyEvent.cpp:181
14 	libxpcom_core.dylib 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:510
15 	libxpcom_core.dylib 	NS_ProcessPendingEvents_P 	nsThreadUtils.cpp:180
16 	thunderbird-bin 	nsBaseAppShell::NativeEventCallback 	widget/src/xpwidgets/nsBaseAppShell.cpp:121
17 	thunderbird-bin 	nsAppShell::ProcessGeckoEvents 	widget/src/cocoa/nsAppShell.mm:374
18 	CoreFoundation 	CoreFoundation@0x735f4 	
19 	CoreFoundation 	CoreFoundation@0x73cd7 	
20 	HIToolbox 	HIToolbox@0x302bf 	
21 	HIToolbox 	HIToolbox@0x300d8 	
22 	HIToolbox 	HIToolbox@0x2ff4c 	
23 	AppKit 	AppKit@0x40d7c 	
24 	AppKit 	AppKit@0x4062f 	
25 	AppKit 	AppKit@0x3966a 	
26 	thunderbird-bin 	nsAppShell::Run 	widget/src/cocoa/nsAppShell.mm:693
27 	thunderbird-bin 	nsAppStartup::Run 	toolkit/components/startup/src/nsAppStartup.cpp:192
28 	thunderbird-bin 	XRE_main 	toolkit/xre/nsAppRunner.cpp:3279
29 	thunderbird-bin 	main 	nsMailApp.cpp:103
30 	thunderbird-bin 	thunderbird-bin@0x2105 	
31 	thunderbird-bin 	thunderbird-bin@0x202c 	
32 		@0x2
Component: General → Backend
Product: Thunderbird → MailNews Core
QA Contact: general → backend
Version: Trunk → 1.9.1 Branch
Flags: wanted-thunderbird3?
this is now a topcrash for b2 + b3pre
Flags: blocking-thunderbird3?
Keywords: topcrash
OS: Mac OS X → All
Hardware: x86 → All
Marking as blocker, adding mark and david as they both may have ideas.
Flags: blocking-thunderbird3? → blocking-thunderbird3+
Flags: wanted-thunderbird3?
I'm going to look at this a little, I'm not quite ready to sign on.
this stack is a bit goofy - nsMsgOfflineImapOperation::Release doesn't call SetCopiesToDB
Well I'm going to quit looking at this, sorry. I can't reproduce it, and I'm just not familiar enough with the code that is affected.
Target Milestone: --- → Thunderbird 3.0rc1
I can create a very similar stack by deleting an imap message and immediately shutting down, so that the shutdown process coincides with the attempt to playback the delete as an offline operation. I wonder if the other instances of this crash that make it a top-crash have to do with that instead of Ludovic's particular scenario.
(In reply to comment #6)
> I can create a very similar stack by deleting an imap message and immediately
> shutting down, so that the shutdown process coincides with the attempt to
> playback the delete as an offline operation. I wonder if the other instances of
> this crash that make it a top-crash have to do with that instead of Ludovic's
> particular scenario.

Any idea how this could be protected from crashing david ?
a few possibilities:

Have shutdown check if we're in the middle of playing back an offline event and back-off in that case.

Have db's keep a cache of in use offline imap operations and clear them when the db is forced closed, like we do for msg hdrs.

I need to think a bit more about this, which I plan to do after b3.
- #5 crasher currently for 3.0b2

- fwiw no comments from 3.0b2. but 3.0b3pre has 2 with comments
** Usually occurs the first time I attempt to start, and then normal. But sometimes, like now, the 2nd also. I think this is the fault of th AddressBook Sync add on bp-e4b2eb93-9a09-42b8-8bbf-758d62090509
** Crashed when trying to rebuild a mailbox index (to fix the usual b0rked mailbox display problem) bp-2bb846da-e60b-40eb-a2f2-4ce532090510
#1 crasher for 3.0b3pre. 

3.0b2 crash comments:
* Running Zindus sync
* program was in background so no action I did seemed to cause this
* just switched from offline to online, and had some send later emails in the outbox.

3.0b3pre crash comments:
* moving msgs from an acct to a local folder for saving. bp-8b3561c7-43ef-4628-9ebd-f85ed2090528
* bc: [Bob Clary?] went offline, then back online bp-f094cf32-5c08-4ec2-a67d-c5d802090527

I've cc: BC - hopefully he can provide some insight.

Would be nice to drive this in for b3 if someone has bandwidth - since the b3 got pushed back, and will become top 3 crasher when b3 ships.
Assignee: nobody → bienvenu
Status: NEW → ASSIGNED
Target Milestone: Thunderbird 3.0rc1 → Thunderbird 3.0b4
To explain the stack a little, SetUrlState releases its listeners after sending an OnStopRunningUrl. The offline sync object then tends to get deleted, which causes   nsCOMArray<nsIMsgOfflineImapOperation> m_currentOpsToClear to release all the ops it has saved.  If any of the db's those ops came from have already been closed at this point, we'll crash.

This shouldn't be too hard to fix.
Attached patch possible fixSplinter Review
I think this will help - removing the cleared op from the m_currentOpsToClear array will in general keep the ops from sticking around longer than needed. And clearing the row before releasing the db should help in the case that we're holding the last reference to the db.

I'd like to see how much of a dent this makes in the crash reports w/ this stack. There's more that I can do but I'd like to start with this.
Attachment #381822 - Flags: superreview?(bugzilla)
Attachment #381822 - Flags: review?(bugzilla)
putting in b3 for now...
Whiteboard: [needs review standard8]
Target Milestone: Thunderbird 3.0b4 → Thunderbird 3.0b3
Attachment #381822 - Flags: superreview?(bugzilla)
Attachment #381822 - Flags: superreview+
Attachment #381822 - Flags: review?(bugzilla)
Attachment #381822 - Flags: review+
Comment on attachment 381822 [details] [diff] [review]
possible fix

I couldn't crash with or without this, but it seems to be the right thing to try. r/sr=Standard8
marking fixed - we should monitor crash-stats and see if this helps.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [needs review standard8]
ah, but the difference is that a quick look shows these are all addr db enumerators, i.e., enumerators that iterate over cards address books, which is a different bug. I'd suggest filing a new bug on those crashes. The fix is similar, I believe. Please cc me on the new bug.
bad things happen when I don't look at the stack. even moreso given the AB crashes might all be mine, working with AB sync extension :(

however, check build 20090625024553 crash bp-49d10b91-9f97-4c97-80ec-bacb92090627
Summary: Crash when sending email after being in offline mode [@ morkRowObject::CloseRowObject(morkEnv*)] → Crash when sending email after being in offline mode [@ morkRowObject::CloseRowObject(morkEnv*) - nsMsgOfflineImapOperation::SetCopiesToDB - nsMsgOfflineImapOperation::Release]
yeah, that one looks more like this.  But it would be good to see if these crashes have gone down...
moving off of b3, since I don't think the crash+stack this bug was originally filed on is still a top crash.
Target Milestone: Thunderbird 3.0b3 → Thunderbird 3.0b4
fixing summary to make it match for crash-stats.
top of stack sig MUST be specified as "[@ signature]"

I agree, probably not topcrash.
someone will need examine the remaining crashes more closely for b4
Keywords: topcrash
Summary: Crash when sending email after being in offline mode [@ morkRowObject::CloseRowObject(morkEnv*) - nsMsgOfflineImapOperation::SetCopiesToDB - nsMsgOfflineImapOperation::Release] → Crash when sending email after being in offline mode [@ morkRowObject::CloseRowObject(morkEnv*)] - [@ nsMsgOfflineImapOperation::SetCopiesToDB - nsMsgOfflineImapOperation::Release]
Ok got some other STR to get the same stack Trace:

0. Gmail imap account
1. Go offline
2. locate Trash, right click and select empty trash
3. Answer yes to the dialog about emptying the trash
4. Go online 
5. Boom
those steps don't crash for me. Ludo, do you have a breakpad incident for that crash?
(In reply to comment #23)
> those steps don't crash for me. Ludo, do you have a breakpad incident for that
> crash?

Here is a fresh one I've just created : http://crash-stats.mozilla.com/report/index/c9cab354-66b1-48b9-8050-16d1a2090807 .

So the STR are missing that you need to put something in the trash before going offline.
Attached patch proposed fix (obsolete) — Splinter Review
this fixes the crash Ludo reproduced...
Attachment #394177 - Flags: superreview?(neil)
Attachment #394177 - Flags: review?(neil)
I forgot to save my last set of changes before generating the diff...
Attachment #394177 - Attachment is obsolete: true
Attachment #394204 - Flags: superreview?(neil)
Attachment #394204 - Flags: review?(neil)
Attachment #394177 - Flags: superreview?(neil)
Attachment #394177 - Flags: review?(neil)
Comment on attachment 394204 [details] [diff] [review]
always clear currentOp, as before

>+          // kDeleteAllMsgs should have cleared this, but just in case...
>           currentOp = nsnull;
>+          m_currentDB = nsnull;
This looks wrong, since we're after the else here, so we always clear the DB?
(In reply to comment #28)
> (From update of attachment 394204 [details] [diff] [review])
> >+          // kDeleteAllMsgs should have cleared this, but just in case...
> >           currentOp = nsnull;
> >+          m_currentDB = nsnull;
> This looks wrong, since we're after the else here, so we always clear the DB?

I think you're right, thx - I was a bit sleep-deprived when I wrote that...new patch upcoming.
removed the clearing of the db, and removed the clearing of current op in other cases. I think that was a left-over of the pre-comptr days. currentOp is about to go out of scope, so there's no reason to explicitly null it out since the comptr will be destroyed right away.
Attachment #394284 - Flags: superreview?(neil)
Attachment #394284 - Flags: review?(neil)
Attachment #394284 - Flags: superreview?(neil)
Attachment #394284 - Flags: superreview+
Attachment #394284 - Flags: review?(neil)
Attachment #394284 - Flags: review+
Comment on attachment 394284 [details] [diff] [review]
fix addressing comment - checked in.

OK this looks good.
Attachment #394284 - Attachment description: fix addressing comment → fix addressing comment - checked in.
I believe there are still related crashes, so I think we need new STR's for the other crashes.
Attachment #394204 - Attachment is obsolete: true
Attachment #394204 - Flags: superreview?(neil)
Attachment #394204 - Flags: review?(neil)
Comment on attachment 394204 [details] [diff] [review]
always clear currentOp, as before

Cancelling reviews on obsolete patch.
Whiteboard: [no l10n impact]
http://crash-stats.mozilla.com/report/index/980bf2c4-7180-466e-a01a-f01722090824 says : 	I tried to rebuild the index on a folder.  I tried that offline, online, just before going online - but it didn't crash.
Keywords: qawanted
(In reply to comment #34)
> http://crash-stats.mozilla.com/report/index/980bf2c4-7180-466e-a01a-f01722090824
> says :     "I tried to rebuild the index on a folder."

If it helps at all, I think that was crash report was from me. And, yeah, I did happen to have just tried rebuilding an index on a folder at the time, but like you, I've rebuilt folder indexes before (and since) then without incident. 

For what it's worth, the reason that I did the folder-reindexing at the time was that at least one of my messages (IMAP) was out of sync -- I'd click on a subject line in the thread-pane and (for that one message) the wrong message-body was appearing in the preview pane. (That's what lead me to do the reindexing, which usually fixes that kind of thing.)

This next bit is just conjecture, but given that reindexing is usually pretty benign, maybe there was some underlying condition that confused not only the existing message indexes but also the re-indexing process?
Ludo says life is much better, so we can remove this off of blocker. If it becomes a topcrasher, we can bring it back.
Flags: wanted-thunderbird3+
Flags: blocking-thunderbird3-
Flags: blocking-thunderbird3+
crash-stats shows it's still #2 crasher for 3.0b4pre at a steady rate of 1 per day for the previous 7 days. (although judging topcrash from only a 1 week period is risky guesswork)
1 per day makes it the #2 top crasher? I guess we're not getting a ton of data from our 3.0b4pre builds. I'll mark this a blocker again, but put in rc1. I hope we get more data from 3.0b4...
Flags: blocking-thunderbird3- → blocking-thunderbird3+
Target Milestone: Thunderbird 3.0b4 → Thunderbird 3.0rc1
heh, it's going to be #1, since the current #1 and #3 are both fixed...
yup. but the nightly crash rates don't always translate to the rate for releases. we can reassess after 3.0b4 ships, if there isn't already  a patch by then. 

or do we even want a new bug?  ref comment 17...
stacks are similar, but not exact
old    bp-c9cab354-66b1-48b9-8050-16d1a2090807
recent bp-369f1632-3df0-4c2f-849f-aa4212090907
so far I've tried :
 - going nicely offline / rebuild index / go online
 - going brutally offline / rebuild index / go online

- both case with imap - No crash.
- did the same with one pop box - No crash.

In all the above I didn't need to rebuild the index. So I forced my mailbox to be in a state where rebuild indexes would be necessary - still no crash while re-building and changing offline state.

looking at the comments on http://crash-stats.mozilla.com/report/list?product=Thunderbird&query_search=signature&query_type=exact&query=morkRowObject%3A%3ACloseRowObject%28morkEnv*%29&date=&range_value=4&range_unit=weeks&do_query=1&signature=morkRowObject%3A%3ACloseRowObject%28morkEnv*%29 I tried :
- borking my indexes / rebuild / quit (while rebuild is in effect) on imap No crash.

All testing above done on mac os x 10.5 with one gmail account.

The crash both exist in Mac OS and Windows. It's our #1 topcrash.
Keywords: topcrash
I was able to reproduce this crash, by deleting about 8 imap messages one at a time, relatively quickly, and then shutting down immediately. From that, I saw that the db had been force closed out from under the offline imap operations. Making offline imap sync a db change listener allows me to free the operations before the db goes away...
Attachment #402442 - Flags: superreview?(neil)
Attachment #402442 - Flags: review?(neil)
Whiteboard: [no l10n impact] → [no l10n impact][has patch for review neil]
Comment on attachment 402442 [details] [diff] [review]
make offline sync a db change listener

>+class nsImapOfflineSync : public nsIUrlListener, public nsIMsgCopyServiceListener,
>+                          public nsIDBChangeListener {
Nit: one interface per line
Attachment #402442 - Flags: superreview?(neil)
Attachment #402442 - Flags: superreview+
Attachment #402442 - Flags: review?(neil)
Attachment #402442 - Flags: review+
fix checked in - we should make sure this stack goes away from crash stats in rc1 pre builds (or at least decreases significantly).
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
Whiteboard: [no l10n impact][has patch for review neil] → [no l10n impact]
crash rate on 3.0pre is a minimum of every other nightly build, so we should know in a week from 3.0pre crashes whether this is gone
bienvenu, in the last 4 weeks just one crash with newer build 2009101100 - stack is ~same.  bp-7aedd603-c450-4ca1-8b49-3eb972091014
Crash Signature: [@ morkRowObject::CloseRowObject(morkEnv*)] [@ nsMsgOfflineImapOperation::SetCopiesToDB - nsMsgOfflineImapOperation::Release]
You need to log in before you can comment on or make changes to this bug.