Closed Bug 518678 Opened 15 years ago Closed 15 years ago

archive crash [@ nsMsgLocalMailFolder::EndMessage(unsigned int)]

Categories

(MailNews Core :: Backend, defect)

1.9.1 Branch
x86
macOS
defect
Not set
critical

Tracking

(Not tracked)

VERIFIED FIXED
Thunderbird 3.0rc1

People

(Reporter: wsmwk, Assigned: Bienvenu)

References

(Blocks 1 open bug, )

Details

(Keywords: crash, fixed-seamonkey2.0.1, Whiteboard: [GS])

Crash Data

Attachments

(4 files)

archive crash [@ nsMsgLocalMailFolder::EndMessage(unsigned int)]

10 crashes but likely a single person. how do we know? because they ALL have comments (reporter must be "the good samaratin of crashes", aka TGSC)

bp-21fc1049-34f4-4849-a1b0-dbda72090921
0	thunderbird-bin	nsMsgLocalMailFolder::EndMessage	 nsAutoPtr.h:968
1	thunderbird-bin	nsCopyMessageStreamListener::EndMessage	nsCopyMessageStreamListener.cpp:117
2	thunderbird-bin	nsImapMailFolder::EndMessage	nsImapMailFolder.cpp:5161
3	libxpcom_core.dylib	NS_InvokeByIndex_P	xpcom/reflect/xptcall/src/md/unix/xptcinvoke_unixish_x86.cpp:179
4	libxpcom_core.dylib	nsProxyObjectCallInfo::Run	xpcom/proxy/src/nsProxyEvent.cpp:181
5	libxpcom_core.dylib	nsThread::ProcessNextEvent	xpcom/threads/nsThread.cpp:510
6	libxpcom_core.dylib	NS_ProcessPendingEvents_P	nsThreadUtils.cpp:180
7	thunderbird-bin	nsBaseAppShell::NativeEventCallback	widget/src/xpwidgets/nsBaseAppShell.cpp:121
8	thunderbird-bin	nsAppShell::ProcessGeckoEvents	widget/src/cocoa/nsAppShell.mm:405
9	CoreFoundation	CFRunLoopRunSpecific	
10	CoreFoundation	CFRunLoopRunInMode	
11	HIToolbox	RunCurrentEventLoopInMode	
12	HIToolbox	ReceiveNextEventCommon	
13	HIToolbox	BlockUntilNextEventMatchingListInMode	
14	AppKit	_DPSNextEvent	
15	AppKit	-[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]	
16	thunderbird-bin	nsAppShell::ProcessNextNativeEvent	widget/src/cocoa/nsAppShell.mm:649
17	thunderbird-bin	nsBaseAppShell::DoProcessNextNativeEvent	widget/src/xpwidgets/nsBaseAppShell.cpp:151
18	thunderbird-bin	nsBaseAppShell::OnProcessNextEvent	widget/src/xpwidgets/nsBaseAppShell.cpp:278
Component: General → Backend
Product: Thunderbird → MailNews Core
QA Contact: general → backend
Version: 3.0 → 1.9.1 Branch
Mac-only still, 3.0b2, 3.0b3 and 3.0b4

interesting comments are 

http://crash-stats.mozilla.com/report/index/361121d7-d56f-4eec-b0b2-d4aec2090924
selected all messages in a IMAP folder containing about 1400 and hit 'A' to archive all items. It seems to have worked for a while and some time later a modal message said that "another task is running on this folder, wait till it's done" or something like that. When I clicked OK, TB crashed.

http://crash-stats.mozilla.com/report/index/899fc5db-bdac-49bc-aa68-5f30f2090923
switch to a folder (and it is indexing, downloading from the imap server, etc) and I select all and archive right away before it is done doing all of its updates.
I'm not the original good Samaratin but I had this problem too. I'm also on a Mac (Leopard) using 3.0b4.
http://crash-stats.mozilla.com/report/index/fc22a0eb-d2d9-4618-94d1-0d0c42091010

1. Tried out the new Archive button on my imap account and noticed that it created a new Archive folder on the imap server.

2. Changed the setting in Thunderbird to put the archive in Local Folders.

3. Switch to a local folder containing lots of old mail going back to 2004.

4. Select all and press Archive.

5. Got an alert dialog saying "Operation Failed because another operation is using the folder. Please wait for that operation to finish and then try again".

6. After dismissing the dialog I got another one just like it. I didn't count, but I think I probably got 5 dialogs in all from the original Archive operation. No crashes though, and some of the messages were archived.

7. Tried this again a few times with lots of failures but no crashes. Eventually it crashed but I'm not sure what was different that time. Maybe because by then I had switched to archiving messages in my IMAP inbox (I started out archiving a local folder).

The dialogs only happened when I tried archiving a lot of messages at once. Everything was ok with just a screenful of messages. I think this probably happens when the selection contains messages from multiple years, since in that case not all the messages are being copied to the same archive folder.
I just tried reproducing the problem by selecting messages from my imap inbox from 2008 and 2009 and it worked just fine. No crashes, no dialogs. Apparently you need to archive a lot of messages at once.
nominating for Thunderbird 3
Flags: blocking-thunderbird3?
#28 crasher for 3.0b4, but my guess would be it will not be that low when we actually release 3.0
With the current numbers, leaving as wanted-.  bienvenu will try and look at anyway.  If we get worse numbers for RC1, renomination could be worthwhile.
Flags: wanted-thunderbird3+
Flags: blocking-thunderbird3?
Flags: blocking-thunderbird3-
I'm confident that bienvenu will be able to reproduce this and if he can't, I have a backup of the mail folder that caused the problem.
I'm seeing a crash that looks a lot like this. Select about 4-5k messages and hit 'a' Spinning Pizza for a bit, then the Apple crash report dialog. I don't seem to be generating Mozilla crash reports, though.

Totally reproducible, so how can I generate useful info here?
(In reply to comment #8)
> Totally reproducible, so how can I generate useful info here?

i think in your case we just wait for a patch to test
ah, archiving from an imap folder to a local folder - interesting. I did a large archive and I noticed that the batching code was behaving unexpectedly, in that it did multiple copies to my 2009 folder, instead of just one. That could partly account for the problem, though the copy service is really supposed to prevent contention over the dest folder.
There are two things going on here:

1. archive still does batches on a per-month basis even if the archive granularity is per-year (that's my fault). This means we'll generate up to 12 simultaneous copies to the same folder.

2. the copy service is supposed to serialize these requests but it isn't quite working in this case. It may be that there's something about imap to local copies, or it may be something more fundamental.

I'm going to fix #1, and then try to figure out #2. Marking blocking for now.
Assignee: nobody → bienvenu
Status: NEW → ASSIGNED
Flags: blocking-thunderbird3- → blocking-thunderbird3+
Target Milestone: --- → Thunderbird 3.0rc1
Just as a reminder, the unit tests that were written for some of these move cases also failed intermittently on the Mac (bug 502928). It is possible that the root cause of both these test failures, and your thing 2. from comment 11 are the same.
thx, could be related. I'm on Windows, but of course, there's really nothing platform-specific about the move/copy code.
this makes the batching match the archive granularity, which makes this crash much harder to reproduce. I figure out the archiveFolderUri and granularity in the batch coalescing code, and store it in the batch. And I changed the batch key to know whether the month should be used to determine batches.

There's still a little bit of duplication of code figuring out the identity, but we only run the dup code when the root archive folder doesn't exist.
Attachment #407156 - Flags: review?(david.ascher)
this fixes the lowest level of the crash.

What's left is figuring out how the copy service is getting confused...
Attachment #407158 - Flags: superreview?(bugzilla)
Attachment #407158 - Flags: review?(bugzilla)
Attachment #407156 - Flags: review?(david.ascher) → review+
I've checked in the first patch, which suffices to take this off the blocker list.
Attachment #407156 - Attachment description: fix archive batching → fix archive batching - checked in.
Flags: blocking-thunderbird3+
Blocks: TB2SM
Attachment #407158 - Flags: superreview?(bugzilla)
Attachment #407158 - Flags: superreview+
Attachment #407158 - Flags: review?(bugzilla)
Attachment #407158 - Flags: review+
Attachment #407158 - Flags: approval-thunderbird3+
Attachment #407158 - Attachment description: fix crash → fix crash - checked in.
The second patch was checked in as well.

If you disable the first patch, in order to make the copy service again chain urls that do a copy to the same target folder, the STR to reproduce this bug are as follows:

1. Setup an imap account to archive to a local folder, w/ the granularity left as the default (monthly)
2. Select 3 imap messages from different months but the same year.
3. Press archive.

Thx to asuth for listening to my addled description of the weird things I was seeing in the copy service and saying several true things that allowed me to make sense of what I was seeing. What's happening is that when the first move finishes, nsMsgCopyService::NotifyCompletion calls ClearRequest - this calls OnStopCopy on the listeners, which provokes the archive batching code into calling CopyMessages again, which adds a new request to the request queue. But nsMsgCopyService::NotifyCompletion is looping through the request queue looking for requests with a particular source and destination, and the batching code has unwittingly added a new one that matches, so it gets removed right away. But nsMsgLocalMailFolder::OnCopyCompleted doesn't get called for this second request, so it leaves the folder in an unhappy state, i.e., with a copy state object. The third request comes along, and the folder complains about the existing copy state...

So it turns out to be a variation on the classic issue of iterating over a list that has changed out from under us. I'm not sure why we think the second request is done before it has even started, and fixing that might fix the problem.
I'm going to try to come up with a test case that breaks w/o this patch
Attachment #407870 - Flags: review?(kent)
Comment on attachment 407825 [details] [diff] [review]
fix for copy chaining in copycompleted notification

Kent knows this code better than anyone else, so I'm asking him for a review.

The change to where we increment sourceIndex is obviously not really a change...I can back that part out if you like.
Attachment #407825 - Flags: superreview?(bugzilla)
Attachment #407825 - Flags: review?(kent)
Attachment #407825 - Flags: review?(kent) → review+
Comment on attachment 407825 [details] [diff] [review]
fix for copy chaining in copycompleted notification

"The change to where we increment sourceIndex is obviously not really a
change...I can back that part out if you like."

That is my only complaint, I don't see the reason to remove a standard loop idiom like you did, so please restore the original.

Overall this has become a very fragile piece of code over time, and you are just patching over the fragility. But this is not the time for a major refactoring.
Comment on attachment 407870 [details] [diff] [review]
unit test that asserts w/o this fix

>+load("../../mailnews/resources/asyncTestUtils.js");

IFAICT you are not using this, so it could be removed.

>+var gMessageGenerator;
>+var gScenarioFactory;

These are only used locally in run_test, so should not be globals.

r=me with these nits fixed.
Attachment #407870 - Flags: review?(kent) → review+
Attachment #407825 - Flags: superreview?(bugzilla) → superreview+
Is the theory still that this bug is triggered when archiving messages from multiple years? I just got several crashes when trying to archive messages only from the same year. 

I'm trying to archive my itunes receipts which are dated 2008 and 2009. In the search bar i chose 'from filter' and type "itunes store' which displays all my receipts in the message list pane. To archive a mail i select 1 or more messages and hit the archive button in the message pane. I'm not using a separate search window.

At first it was only crashing when i selected messages from 2008. Archiving mails from 2009 failed too, but i only got an error dialog with no crash. 
   "an operation failed because another operation is using the folder"

Now that I'm done archiving the 2008 messages, archiving 2009 messages causes crashes too.

Still using 3.0b4

4b33ac82-4e53-4861-8c01-ff7172091024	10/24/09	8:46 PM
02fb4e29-db4e-4975-adaa-0144a2091024	10/24/09	8:39 PM
bded6a07-eaeb-4236-8d7b-094742091024	10/24/09	7:47 PM
7734f1a2-fd13-4921-b999-c79822091024	10/24/09	7:43 PM
fe2bc0af-e4f7-4c42-b0be-28fd12091024	10/24/09	7:42 PM
No, the theory was always archiving messages from different months in the same year caused crashes, if you had an archive granularity of years.
Dawn, I highly recommend trying a 3.0 pre build which should fix these crashes - http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-comm-1.9.1/
Ok thanks, I misunderstood. I just installed the latest build and archiving works great. I love that feature!
Whiteboard: [final two patches need checkin on trunk]
the last two patches have been landed on the trunk. They are additional bullet-proofing, w/ a unit test. Marking fixed.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Whiteboard: [final two patches need checkin on trunk]
v.fixed per comment 26 and signature gone from crash stats *after* 3.0b4
Status: RESOLVED → VERIFIED
Crash Signature: [@ nsMsgLocalMailFolder::EndMessage(unsigned int)]
This bug is coming back for at least one user of TB13 :-(
See https://getsatisfaction.com/mozilla_messaging/tags/bug_518678
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: