Closed Bug 220012 Opened 21 years ago Closed 14 years ago

pop3 fcc doesn't complete, spins

Categories

(MailNews Core :: Database, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla1.6alpha

People

(Reporter: mscott, Assigned: Bienvenu)

Details

Attachments

(2 files, 1 obsolete file)

From an AIM conversation with Brendan:

"i'm running a recent trunk daily build, seeing a lot of Sent.msf corruption. 
once it happens, the send window can't dismiss, hangs.
and clicking on Sent shows watch-hand cursor  seems to happen more with recent
build sending big message, attachment, possibly i'm deleting the message i'm
replying to or fwd'ing"

FYI, Brenden is on POP.
accepting
Status: NEW → ASSIGNED
Keywords: hang
Target Milestone: --- → mozilla1.6alpha
I'm not sure why this is Sent.msf file corruption. It sounds to me more like a
folder lock never getting cleared. But I'd love to find a reproducible case of this.
Brendan, when this happens, is the dialog about sending/copying to the sent
folder up and never comes down? Or has it come up and gone away? Or have you
cancelled it?

Is it possible this happens when you have the sent folder open in the UI?
Happened again today, and I crashed Canceling the dialog that says my message
was sent but couldn't be copied to Sent, would I like to return to the compose
window?  That dialog came up minutes after I had tired of looking at the compose
window hanging with a "copying to Sent folder" dialog, canceled that, and told
compose not to save a draft.  No open view on Sent folder.  No racing compose
windows sending other messages, AFAIcouldTell.

Something was rattling my disk, and strace -p <mozilla-bin's pid> again showed
tons of UIDL synching, or maybe Junk mail filtering, or message filtering? 
Dunno, but lots of 511-byte reads with seeks intervening.  I didn't save that
strace output, but I sent you one like it a few months back -- the only cure
then was to move aside my Inbox and start fresh (although since then, I've
viewed the big Inbox.old [12730 messages] without any disk seek/read storms).

Here's the talkback:
	
MozillaTrunk     
Netscape LinuxIntel (2003082705)
Trigger Time: 09/26/2003 14:28:24 	
		
		Incident ID: 23916421
	
	All Incidents submitted by this user
Machine State
OVERVIEW   |   ENVIRONMENT   |   CODE AROUND THE PC   |   SHARED OBJECTS
Trigger Type:  Program Crash
Trigger Reason:  SIGSEGV: Segmentation Fault: (signal 11)
Call Stack:    (Signature = 0x00000029 67b6a94c)
    	0x00000029 	 
    	nsMsgCopyService::DoNextCopy() 	 
    	nsMsgCopyService::DoCopy() 	 
    	nsMsgCopyService::CopyMessages() 	 
    	nsMsgFilterAfterTheFact::ApplyFilter() 	 
    	nsMsgFilterAfterTheFact::OnSearchDone() 	 
    	nsMsgSearchSession::NotifyListenersDone() 	 
    	nsMsgSearchSession::TimerCallback() 	 
    	nsTimerImpl::Fire() 	 
    	handleTimerEvent() 	 
    	PL_HandleEvent() 	 
    	PL_ProcessPendingEvents() 	 
    	nsEventQueueImpl::ProcessPendingEvents() 	 
    	event_processor_callback() 	 
    	our_gdk_io_invoke() 	 
    	libglib-1.2.so.0 + 0x11076 (0x402b7076) 	 
    	libglib-1.2.so.0 + 0x1297e (0x402b897e) 	 
    	libglib-1.2.so.0 + 0x12e59 (0x402b8e59) 	 
    	libglib-1.2.so.0 + 0x130f4 (0x402b90f4) 	 
    	libgtk-1.2.so.0 + 0xa86df (0x401b76df) 	 
    	nsAppShell::Run() 	 
    	nsAppShellService::Run() 	 
    	main1() 	 
    	main() 	 
    	0x420158d4 	 
Registers:
EAX:	0be2fa40 	EBX:	41e36730 	ECX:	08c6ec18 	EDX:	0be3dac0
ESI:	0d5d9698 	EDI:	80004005 	ESP:	bfffef0c 	EBP:	bfffef28
EIP:	00000029 	cf PF af zf sf of IF df nt RF vm   IOPL: 0
CS:	0023	DS:	002b	SS:	002b	ES:	002b	FS:	0000	GS:	0007
Stack Dump:
Command Line:  /usr/local/mozilla/mozilla-bin -P brendan
System
OVERVIEW   |   RESOURCE LIMITS
Processor:
Type 	Speed
Pentium II
	
134 MHz
Current Working Directory:  /home/brendan/src/trunk-opt/mozilla/js/src/
System Information:
Linux
localhost.localdomain
2.4.18-14
#1 Wed Sep 4 13:35:50 EDT 2002
i686
Physical Memory:  249.4 MB
Page Size:  4096
User ID:  Not Available
Group ID:  Not Available
System uptime:
2:28pm up 1 day, 10:54, 1 user, load average: 1.50, 1.12, 0.81
User
Talkback User ID:  U49C24F50-3ACC-11D6-89A30010-8303E166
User IP Address:  10.169.106.79
User Email Address:  brendan@meer.net
User Comments:
Another corrupted Sent.msf case. Crash happened after I got the much-delayed
dialog asking whether I wanted to return to the compose window, even though the
message was sent, just because it couldn't be copied to the Sent folder. I
clicked Cancel and crashed.
MozillaTrunk
OVERVIEW   |   TRACES & ASSERTS   |   CUSTOM   |   COLLECTED FILES
Statistics:
Launches 	Normal Quits 	Crashes 	Trigger Events
-
	
-
	
-
	
3
Seconds Since Last Crash:  157977 seconds (1 Day, 19:52:57)
Total Runtime:  507556 seconds (5 days, 20:59:16)
© Copyright 1997–99 Full Circle Software, Inc. All Rights Reserved. 	Version
2.2.2254

/be
Older talkback for a similar case:

Trigger Type:   Program Crash
Trigger Reason:  SIGSEGV: Segmentation Fault: (signal 11)
Call Stack:    (Signature = _ZN15nsMsgSearchTerm18MatchRfc2047StringEPKcS1_iPi()
3de14aae)
    	_ZN15nsMsgSearchTerm18MatchRfc2047StringEPKcS1_iPi() 	 
    	_ZN15nsMsgSearchTerm17MatchRfc822StringEPKcS1_iPi() 	 
    
_ZN22nsMsgSearchOfflineMail17ProcessSearchTermEP11nsIMsgDBHdrP16nsIMsgSearchTermPKcP21nsIMsgSearchScopeTermP14nsIMsgDatabaseS5_jiPi()
	 
    
_ZN22nsMsgSearchOfflineMail23ConstructExpressionTreeEP11nsIMsgDBHdrP16nsISupportsArrayRjPKcP21nsIMsgSearchScopeTermP14nsIMsgDatabaseS6_jiPP25nsMsgSearchBoolExpressionPi()
	 
    
_ZN22nsMsgSearchOfflineMail10MatchTermsEP11nsIMsgDBHdrP16nsISupportsArrayPKcP21nsIMsgSearchScopeTermP14nsIMsgDatabaseS5_jiPi()
	 
    
_ZN22nsMsgSearchOfflineMail19MatchTermsForSearchEP11nsIMsgDBHdrP16nsISupportsArrayPKcP21nsIMsgSearchScopeTermP14nsIMsgDatabasePi()
	 
    	_ZN22nsMsgSearchOfflineMail6SearchEPi() 	 
    	_ZN20nsMsgSearchScopeTerm9TimeSliceEPi() 	 
    	_ZN18nsMsgSearchSession15TimeSliceSerialEPi() 	 
    	_ZN18nsMsgSearchSession9TimeSliceEPi() 	 
    	_ZN18nsMsgSearchSession13TimerCallbackEP8nsITimerPv() 	 
    	_ZN11nsTimerImpl4FireEv() 	 
    	_Z16handleTimerEventP14TimerEventType() 	 
    	PL_HandleEvent() 	 
    	PL_ProcessPendingEvents() 	 
    	_ZN16nsEventQueueImpl20ProcessPendingEventsEv() 	 
    	NS_ShutdownXPCOM() 	 
    	NS_ShutdownXPCOM() 	 
    	GRE_Shutdown() 	 
    	main() 	 
    	0x420158d4 	 
Registers:
EAX:	00000000 	EBX:	423ce730 	ECX:	080af6e4 	EDX:	00000000
ESI:	080897c0 	EDI:	bffff068 	ESP:	bffff020 	EBP:	bffff098
EIP:	423b0574 	cf PF af ZF sf of IF df nt RF vm   IOPL: 0
CS:	0023	DS:	002b	SS:	002b	ES:	002b	FS:	0000	GS:	0007

/be
I have a fix for the crash - I ran across the same crash today trying to
reproduce this problem (but I had to poke something in the debugger to get it to
fail). I'll attach the patch, but I still don't know what's causing the root
problems(s).
Attached patch fix for the crash (obsolete) — Splinter Review
make code match Cavin's comment - I wish Seth or I had caught this in the code
reviews :-(
Not really a hang -- I can select another folder than Sent and resume work. 
Only when trying to click on Sent to see my sent messages does the cursor go to
a watch hand and never change, and nothing loads in the threadpane.

So should the crash be a separate bug?  If so, please spin off.  If not, please
set crash keyword.  I'm ok either way, but we'll need to keep this bug open to
fix the corruption problem too.

Should the patch be a candidate for 1.4.2?

/be
Keywords: hang
Let's get this in for the trunk, at any rate.  David, can you set request
tracker review flags?

/be
Flags: blocking1.4.2?
Was this msg send happening as a result of shutdown? I ask that because there's
some goofy code (OK, msg send/compose is *ALL* goofy code) that pumps events if
a message send is the result of trying to shut down with an open compose window.
Also, that other stack trace looks like it's in shutdown. 

The regression that caused this crash was checked in July 14, so I don't think
it's in the 1.4 branch, though it's definitely in the 1.5 tree.
Comment on attachment 132223 [details] [diff] [review]
fix for the crash

requesting reviews - not sure if Seth is around - haven't seen him on Aim.
Since mscott isn't around either, requesting r= from Neil.
Attachment #132223 - Flags: superreview?(sspitzer)
Attachment #132223 - Flags: review?(neil.parkwaycc.co.uk)
I was not shutting down.  The dialog that came up well after I'd canceled and
not-saved the compose window while it was trying to copy to Sent, the one that
asked if I wanted to go back to a compose window on the message, came up while I
was trying to do other stuff.

/be
Flags: blocking1.4.2?
Is that FilterAfterTheFact stack trace valid? Were you doing a filter after the
fact? That would be either Tools | Run Filters on Folder or bringing up the
filter UI and running the selected filter on a folder chosen from the drop down.
Or had you done a filter after the fact earler, and we have a listener that's
not getting unregistered? The code looks like it unregisters correctly. 
I don't think I was running filters, but I might have been fetching mail from
the POP server.  I have to confess, I've gotten good at marking a message Junk,
then hitting alt-T/R for Tools / Run Filters on Folder.  I do hava a filter
named "junk" that moves Junk messages to my Junk folder.  I don't know whether
that is needed any longer; it once was.

I've sometimes been bugged by dialogs saying "couldn't run filter staff;
continue applying filters" when seeming doing nothing (or perhaps I hit ctrl-T
to fetch new messages and forgot; bad habit).  It seems to me filters do run in
the background, or nearly so; and they sometimes do foul themselves up (or
something fouls them up -- sorry I haven't reported *this* bug till now).  Is it
all related?

/be
Hmm, that came off scatterbrained.

I probably was either ctrl-T'ing or alt-T/R'ing when the "couldn't copy to Sent;
bring up compose window again" tardy dialog popped up, upon canceling of which,
Mozilla crashed.  If that's the only way run-filters-after-the-fact could have
run, then I was probably doing one or the other.

But is it possible for filters to run at other times, perhaps as an unintended
consequence?

/be
Sorry if this should be a different bug. I had sorted by read/unread, marked
something junk and alt-T/R'd to run my move-junk-message-to-Junk filter, then
saw a read message I wanted to mark unread, did so, noticed that the Unread
count in the folder view didn't update, heard lots of rattling.

Deactivated all Mozilla windows, found the pid, strace'd piping to tee /tmp/st,
then realized that strace writes to stderr, reran '2>&1 strace -p <pid> | tee
/tmp/st', ctrl-C'd, got the attached.  Disk rattled for a good while longer,
and at some point the unread count for Inbox bumped up one as expected.

Seems like filters can run for a while, or trigger some re-reading/summarizing
(X-UIDL synching?) that runs on.  It further seems like there are reentrancy
hazards, where something comes up from this background rattling and crashes, or
violates some invariant, or something.

/be
OK, so I'm guessing some of your filters are body filters? From the previous
investigation of the disk rattling, I found that body search does the painfully
slow 511 byte reads. How large is your INBOX, and how many filters are body
filters? I doubt that UIDL syncing is going on - I think it's just the body
searching. I'm wondering if this operation is just so slow that it seems like
filters are running in the background when it's just this one operation taking a
long time.

I'm not sure why this would affect the send mail operation - unless, hmm, we
were trying to mark a message as replied and failed because the body search had
the folder lock or something...I'll look more into it.
bienvenu: to quote Basil Fawlty, Brilliant!  I had a lame-ass body filter for a
certain spam pattern |VERIFIED, from long before spambayes days. I've disabled
it.  But this does perhaps explain why I find conflicts later running filters,
where I get a dialog telling me that mail was unable to run a particular filter,
and should it continue to run the rest?

Hope this leads to fixage.

Where is the body filter code?  I have some ideas for how to speed it up.  I do
have a big-ass Inbox (176MB!).

/be
there's a lot of low hanging fruit in the body search code, that's for sure.

the code is in mailnews/base/search/src/nsMsgBodyHandler.cpp

two of the things going on that are really inefficient:

1. closing and re-opening the mail folder for each msg evaluated
2. the aforemention nsIFileSpec::ReadLine code, which reads in 511 bytes at a time.

My guess is that fixing 1) would probably help the most. On NT, it's not really
noticeable, but I imagine it's bad on Linux.

To fix body search, I think I'd add a parameter to nsMsgBodyHandler constructor
that's an input stream to read from, and cache the stream in the search scope or
something...then we could change nsMsgBodyHandler to do its own line parsing
from the stream, instead of using readLine.

I suspect that this could be causing your other problems, if only from the shear
load we're putting on the CPU, coupled with the folder locking issues. I'm still
at a loss about the Sent folder issues, but I'll think about it some more.
Comment on attachment 132223 [details] [diff] [review]
fix for the crash

I don't see why you can't move the if (NS_FAILED(rv)) ClearRequest(copyRequest,
rv) to just after lines 291-294 rv = ... CopyFolder
Attachment #132223 - Attachment is obsolete: true
Comment on attachment 132309 [details] [diff] [review]
 fix addressing Ns comments.

re-requesting reviews.
Attachment #132309 - Flags: superreview?(sspitzer)
Attachment #132309 - Flags: review?(neil.parkwaycc.co.uk)
Comment on attachment 132309 [details] [diff] [review]
 fix addressing Ns comments.

Code and comments look good to me now.
Attachment #132309 - Flags: review?(neil.parkwaycc.co.uk) → review+
Attachment #132223 - Flags: superreview?(sspitzer)
Attachment #132223 - Flags: review?(neil.parkwaycc.co.uk)
Comment on attachment 132309 [details] [diff] [review]
 fix addressing Ns comments.

sr=sspitzer
Attachment #132309 - Flags: superreview?(sspitzer) → superreview+
taking, changing summary, patch has been checked in. Need to figure out hang
next. The fcc is a synchronous operation so it should never take more than a few
seconds.
Assignee: scott → bienvenu
Status: ASSIGNED → NEW
Summary: Sent.msf corruption in trunk builds? → hang doing pop3 fcc, crash cancelling
Keywords: crash, hang
Of course, it's not really a hang, and the crash is fixed. I think I'll spin up
a new bug for the crash, mark it fixed, and cleanup this bug. bug 221027 filed
for the crash.
Keywords: crash, hang
Summary: hang doing pop3 fcc, crash cancelling → pop3 fcc doesn't complete, spins
this patch has been checked into the 0.3 branch.
Product: MailNews → Core
QA Contact: esther → database
Product: Core → MailNews Core
(In reply to comment #25)
> taking, changing summary, patch has been checked in. Need to figure out hang
> next. The fcc is a synchronous operation so it should never take more than a few
> seconds.

bienvenu, not clear to me what's left here to do. close / spin a new bug?
yeah, spin a new bug for the separate issues with the same symptom.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.