Closed Bug 420673 Opened 18 years ago Closed 17 years ago

"Get Mail" intermittently does not get all mail from multiple POP accounts

Categories

(MailNews Core :: Networking: POP, defect)

x86
Windows XP
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: leamphil, Unassigned)

Details

Attachments

(8 files, 1 obsolete file)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12 Build Identifier: Thunderbird version 2.0.0.12 (20080213) I have a problem getting mail from multiple POP accounts (all going into the same inbox). The symptoms of the problem occur whether there is mail on the POP server or not. I've installed the "Tb Progress History" add-on (http://forum.addonsmirror.net/index.php?showtopic=2031&mode=threaded) which shows more detail of the Get Mail process. When it "sticks" on getting the mail for an id it has performed the following tasks: Connecting to mail.btinternet.com Connected to mail.btinternet.com It fails to progress to the next step when it would issue: Connect: Host contacted, sending logon information (so no blue bar) There are no new messages on the server (for example, if none) Most of the time it works fine for all my ids, then sometimes it "sticks", and the id it sticks on varies. When it sticks, there is no error message or indication that it has failed to contact my mail server (if it was a network problem then I would expect an error message from Thunderbird saying something had timed out - it does not get as far as the sending logon information so no blue progress bar). Having stuck on an id, the get mail messages indicate that it is able to carry on and get the mail for other ids (not sure if this process is sequential or in parallel - the successful get mail messages appear after the one that sticks). Having read other threads I have been compacting my folders regularly (using xpunge) but the problem has re-occurred since. All the ids share a single inbox so if it was a folder problem I would not expect the problem to be limited to a single id. Reproducible: Sometimes Steps to Reproduce: 1.Get Mail on multiple POP accounts sharing a single inbox 2.Sometimes it does not progress to the "Host Contacted" stage 3.Repeated "Get Mails" for the stuck account will eventually get "Host Contacted" and get any mail that is on the server (or give the "no messages on server" message)
Log from "Get Mail" showing lack of "Connected ... " on last account
Log from "Get All Mail" shows lack of "Connected to ..." on two different accounts, one of which was successfully connected on the previous "Get Mail"
I opened this bug report in response to the title change on bug 373451 as I do not reun PopTray or any other similar tools.
(In reply to comment #0) Get POP3 protocol log. See Bug 402793 Comment #1 for getting log. (Read imap:5 as pop3:5 when POP3) I recommend you to use next, because something wrong around login is suspected. SET NSPR_LOG_MODULES=nsHostResolver:5,nsSocketTransport:5,pop3:5 > multiple POP accounts (all going into the same inbox) Question to clarify your environment. What do you mean by "same inbox"? Global Inbox? - Single mail folder of "Inbox" of Global Inbox owner("Local Folders" usually) Is there any limitation of POP3 access? e.g. Next log-in within 5 minutes is not permitted.
I'm struggling to get my BAT file to work - windows prompt runs & then starts another windows prompt without apparently starting thunderbird - contents of Bat file are: SET NSPR_LOG_MODULES=nsHostResolver:5,nsSocketTransport:5,pop3:5 setenv NSPR_LOG_MODULES=nsHostResolver:5,nsSocketTransport:5,pop3:5 set NSPR_LOG_FILE=c:\zip\tbirdlog.txt start "C:\Program Files\Mozilla Thunderbird\thunderbird.exe" Please advise what to do to correct this. I think I'm using Global Inbox - please advise exactly where this is specified (all run under the same Windows logon, single Inbox folder under Local Folders. I do have a message filter to redirect some mail into different folders (under Local Folders) but this wouldn't cut in until mail arrives. No limitations on POP3 access.
This worked for me: set mydate=%date:~-4,4%%date:~-7,2%%date:~-10,2% set mytime=%time:~0,2%%time:~+3,2%%time:~+6,2% set NSPR_LOG_MODULES=pop3:5 set NSPR_LOG_FILE=c:\temp\tb\%mydate%_%mytime%.log c: cd \Program Files\Mozilla Thunderbird start thunderbird.exe
Get Mail for 3 accounts, first one "stuck" (only got to "Connected .." state), 2nd & 3rd worked fine (got to "No messages ..." state). I then got mail for the stuck id and it worked fine.
Please advise if log is OK or if I need to redo in some way.
Can you please remove the first attachment - I did not realise it would contain the contents of an e-mail received.
Following flow is seen for three accounts. (i.e. no problem is found in protocol level flow) > RECV: +OK hello from popgate 2.38.7 on pop807.mail.ird.yahoo.com > SEND: AUTH > RECV: -ERR popgate unknown command > SEND: CAPA > RECV: +OK CAPA list follows > SEND: USER <user_id> > SEND: STAT > (LIST,UIDL,RETR,DELE etc. follows, if required) > SEND: QUIT Above flows for following three <user_id>s exist in log. > SEND: USER bridgwa > SEND: USER kate.bridgwater > SEND: USER leamphil Fisrt step of "AUTH, then -ERR" is current implementation, although it's not so happy behaviour for some POP3 servers or other programs. Bug of "Tb Progress History", isn't it?
Attachment #307019 - Attachment is obsolete: true
Attachment #307019 - Attachment mime type: text/plain → application/octet-stream
Attachment #307019 - Attachment filename: tbirdlog.txt → tbirdlog.zip
The log should show the getting of mail for leamphil, bridgwa, kate.bridgwater in that order (the leamphil one failed), followed by leamphil at a slightly later time which succeeded. The error was the get mail for leamphil before that of bridgwa & as far as I can see it doesn't get as far as sending the leamphil logon (so no mention of the id is in the log). I don't think it is a "Tb Progress History" Bug as I only added that extension so I could see what was going on ... let me know if I must redo this trace without the extension.
(In reply to comment #11) > The log should show the getting of mail for leamphil, bridgwa, kate.bridgwater > in that order I don't know Tb's spec relates to "order of login" or "parallel login" when multiple user_id on same pop3 serever. I don't know any rules(such as RFC) relates to "order of login" or "parallel login" in such case. What is your base of "should be order of 'leamphil, bridgwa, kate.bridgwater'"? > (the leamphil one failed) I can't see any log corresponding to the fail of leamphil". Which log lines are evidence of "the leamphil one failed"?
The order (leamphil, bridgwa, kate.bridgwater) is what I remember seeing in the Tb Progress History window as it did the trace (and the order that it normally comes out in). The advantage of the Tb Progress History add-on is that it shows which account the progress messages are for (eg. "leamphil: connecting to ....") so one can tell a bit more about what is going on - without the add-on one just gets a stream of "Connecting to ..." etc appearing & disappearing in the status line at the bottom of Thunderbird without any way of going back & seeing the history). I don't know which log lines are evidence of the failed leamphil logon because, as far as I can see it doesn't get as far as trying to logon to the POP server - the progress lines I see for the failed Get Mail are just: Connecting to mail.btinternet.com Connected to mail.btinternet.com without the: Connect: Host contacted, sending logon information (so no blue bar) There are no new messages on the server (for example, if none) so the implication I draw is that something is failing before it sends logon information to the host (hence the absence of "leamphil" in the trace for the first failed logon).
Attached file Successful Get Mail
Successful Get Mail for single account (Bridgwa) without Tb Progress History add-on active.
Attached file Failed "Get Mail"
Unsuccessful "Get Mail" for single account (Bridgwa) without Tb Progress History add-on active.
(In reply to comment #15) > Failed "Get Mail" > Unsuccessful "Get Mail" for single account (Bridgwa) without Tb Progress History add-on active. Following logs are seen. >1-st timeout > 2504[1265710]: poll timeout: 180 > 2504[1265710]: timeout = 180000 milliseconds >2-nd timeout > 2504[1265710]: poll timeout: 120 > 2504[1265710]: timeout = 120000 milliseconds Similar logs were seen in top part of previous log for "Get All New Messages". I think these are the evidence of "the leamphil one failed" you said. With/without "Tb Progress History add-on" perhaps doesn't have relation to your problem of "connection failure", because the add-on merely displays history of messages in status filed by Tb. It seems to be timeout issue due to server side delay or busy. When first try of connection for first account/user of consecutive login's, delay/busy at server side occurs (due to such as staging, caching). And, Tb won't issue error message in many situations(it's already reported bug). What will happen when tcp timeout value is increased? Tools/Options/Advanced/Network & Disk Space : Connection / Connection timeout: NNN seconds (mailnews.tcptimeout value of Config Editor or prefs.js)
I had already increased the timeout from 60 secs to 120 secs (I had wondered about this before) but this does not seem to make a difference - failure occurs very quickly (almost immediately), the timeout does not seem to apply to whatever the failure is. Looking at the successful log, that also has the same two poll timeouts that you've picked out above. Is this the section of the log that contains the "failure" ? 2504[1265710]: calling PR_Poll [active=0 idle=1] 2504[1265710]: timeout = -1 milliseconds 2504[1265710]: ...returned after 0 milliseconds 2504[1265710]: nsSocketInputStream::Read [this=25f27e8 count=4096] 2504[1265710]: calling PR_Read [count=4096] 2504[1265710]: PR_Read returned [n=0] 2504[1265710]: nsSocketInputStream::CloseWithStatus [this=25f27e8 reason=0] 2504[1265710]: nsSocketInputStream::CloseWithStatus [this=25f27e8 reason=80470002] 2504[1265710]: nsSocketTransport::OnMsgInputClosed [this=25f2750 reason=80470002] 2504[1265710]: nsSocketInputStream::OnSocketReady [this=25f27e8 cond=80470002] 2504[1265710]: nsSocketTransportService::PostEvent [event=2368ba0] Compared with the successful "Get Mail" which I think has the following at the same stage in the process: 468[1265710]: calling PR_Poll [active=0 idle=1] 468[1265710]: timeout = -1 milliseconds 468[1265710]: ...returned after 0 milliseconds 468[1265710]: nsSocketInputStream::Read [this=25eb7e0 count=4096] 468[1265710]: calling PR_Read [count=4096] 468[1265710]: PR_Read returned [n=60] 468[1265710]: nsSocketTransport::SendStatus [this=25eb748 status=804b0006] 468[1265710]: nsSocketInputStream::Read [this=25eb7e0 count=4036] 468[1265710]: calling PR_Read [count=4036] 468[1265710]: PR_Read returned [n=-1] Is this the right area ? Do the reason codes give any clue ?
(In reply to comment #17) > failure occurs very quickly (almost immediately), the timeout does not seem to apply to whatever the failure is. Can you get NSPR log with DebugView? - Timestamp is added by DebugView. I'll be required to analyze timeout related issues. - No physical log write by Tb. It'll reduce bad affect on testing. - Getting of required part only is possible when DebugView is used. See Bug 86396 Comment #15 for getting DebugView. See Bug 86396 Comment #7 for rough procedure to get log. See my Bug 325649 for example of NSPR log with DebugView. 1. Start DebugView with admin privilege, and stop Capture 2. Set env variables, and start Tb > SET NSPR_LOG_MODULES=... > SET NSPR_LOG_FILE=WinDebug > CD <pgm_lib_of_Tb> > thunderbird.exe -P 2. Start Capture at DebugView 3. "Get Msgs" for the account 4. Save log data at DebugView 5. Stop Capture at DebugView
Failed "Get Mail" logged with DebugView as requested (I hope !).
> Failed "Get Mail" log from DebugView "poll timeout: 120 / timeout = 120000 milliseconds" was not log of "timeout occurred". It was log of "timeout value is set". Follwoing looks to be log of "timeout occurred". > 00000132 13:38:01.937 [4076] 0[2748a0]: nsSocketTransportService::PostEvent [event=25efe90] > 00000133 13:38:01.937 [4076] 2020[1265598]: ...returned after 8283 milliseconds > 00000136 13:38:16.953 [4076] 0[2748a0]: nsSocketTransportService::PostEvent [event=25efe90] > 00000137 13:38:16.953 [4076] 2020[1265598]: ...returned after 15011 milliseconds > 00000140 13:38:32.234 [4076] 0[2748a0]: nsSocketTransportService::PostEvent [event=25efb28] > 00000141 13:38:32.234 [4076] 2020[1265598]: ...returned after 15018 milliseconds > 00000144 13:38:47.234 [4076] 0[2748a0]: nsSocketTransportService::PostEvent [event=1e33d00] > 00000145 13:38:47.234 [4076] 2020[1265598]: ...returned after 14740 milliseconds Following is POP3 access flow using telnet. > telnet > open pop.ops.dti.ne.jp 110 > +OK melody-2.1-DTI server ready <0011025.1204703970@pop.ops.dti.ne.jp> > +--> (POP3 flow) > | user <My_User_Id> > | +OK Password required for <My_User_Id>. > | pass <My_Password> > | +OK There are no messages (0 octets) in your mailbox. > | stat > | +OK 0 0 > | quit > | +OK Closing session. > +--> > close > quit How long does your server usually take to return "+OK" for "open <server_name> <port#>" request? > telnet > open <server_name> 110 > +OK
Telnet response (08.45 local time) was immediate. When getting mail successfully, there is occassionally a pause of a second or two but not more.
Following log perhaps corresponds to "telnet mail.btinternet.com 110". > 00000001 13:37:53.578 [4076] 0[2748a0]: nsSocketTransport::Init [this=25f0280 host=mail.btinternet.com:110 proxy=:0] And following is probably a result of DNS lookup. > 00000050 13:37:53.640 [4076] 2020[1265598]: trying address: 217.146.188.192 Note: My DNS says 217.146.188.192 is for mail.btinternet.com too. And NSLOOKUP by http://www.dnsstuff.com/ says; > IP address: 217.146.188.192 > Reverse DNS: pop-smtp1-f.bt.mail.vip.ird.yahoo.com. And, as you say, "telnet mail.btinternet.com 110" returned next response very quickly in my environment(from Japan). > +OK hello from popgate 2.38.7 on pop803.mail.ird.yahoo.com Packet loss due to (inappropriate) packet filtering on some situations? Can problem be re-produced when firewall software on your PC is disabled?
When "telnet, open pop.xxx.xxx.ne.jp 110, quit", both POP3 session and TCP connection was closed(seems to be spec of telnet), then no TCP connection remained after quit of POP3. (while accessing via telnet, status by NETSTAT is ESTABLISHED) This was same(no remained TCP connection) when mail download by Seamonkey 1.1.8 and single account on a POP3 sever. But, when mail download by Seamonkey 1.1.8 and when multiple accounts on same POP3 server, TCP connection was kept for a while after POP3 server access, and new port number(Semonkey side) was used on each following "Get Msgs". Then multiple TCP connections were kept for a while. Sorry but I don't know expiration time of the TCP connection of TIME_WAIT. > NETSTAT after 3 "Get Msgs" for an account on pop.s3.dion.ne.jp > TCP HP-Note-PC:2212 pop.s3.dion.ne.jp:pop3 TIME_WAIT > TCP HP-Note-PC:2213 pop.s3.dion.ne.jp:pop3 TIME_WAIT > TCP HP-Note-PC:2215 pop.s3.dion.ne.jp:pop3 TIME_WAIT If next POP3 server access is done before expiration of above TCP connection of TIME_WAIT, number of the kept TCP connection of TIME_WAIT will increase. When your case, three TCP connections will be added on each "Get All New Messages", because you have three accounts on the server. If mail.btinternet.com has restriction on number of TCP connections, and if number of the kept TCP connections of TIME_WAIT exceeds the limit, connection error can occur on next "Get Msgs" and/or "Get All New Messages". What does "NETSTAT" before/after "Get Msgs" say? What does "NETSTAT" before/after "Get All New Messages" say? What is your "Check for new messages every NN minutes" setting?
FYI. Similar situation on many & consecutive FTP request is reported in Bug 354784 Comment #5. See following for TIME_WAIT status of Winsock or Unix Socket. > http://tangentsoft.net/wskfaq/articles/debugging-tcp.html > 4. Problem: Netstat shows lots of sockets in the TIME_WAIT state. What's wrong? > http://www.developerweb.net/forum/showthread.php?t=2941 See next page for MS Win's registry entry relates to TIME_WAIT. > http://www.vm6.ca/Support/Knowledge/View.aspx?kbid=KB01001 Note: When TIME_WAIT status, "NETSTAT /O" says PID(owner)==0.
Attached image shows Netstat /O before & after a "Get Mail". "Check for new messages every NN minutes" setting is off for all accounts.
> Image of Netstat /o before/after Get Mail You can get text data easily by following. NETSTAT /O > C:\Test\list-1.txt When "Get Msg" of an account is executed repeatedly & consecutively, how many TCP connections of TIME_WAIT to the POP3 server will remain? Increases infinitely? When many connections of TIME_WAIT remain, problem is re-produced? When no remained connections of TIME_WAIT, problem is re-produced?
I don't know if it is relevant but I had the "stuck" situation on a couple of the id's just after I had booted up this morning, but not (yet) had it since. In this case I guess that there would not be any sockets in TIME_WAIT state before opening the mail (I didn't look because I hadn't yet seen your posts at the time). I've just done "Get All Mail" repeatedly without any problems and have ended up with 51 sockets in TIME_WAIT state. Starting with no sockets in TIME_WAIT state, I haven't had any problems (at the moment) ... situation is intermittent and isn't occurring very frequently right now (though I did have a couple of stuck id's initially this morning. I'll try rebooting and seeing if I get problems again.
Reboot & "Get Mail" was fine (so just coincidence I guess). What does the trace show as failing in the stuck case (& why doesn't Thunderbird indicate that something has failed) ?
Got "stuck" condition on one of my accounts, closed down my firewall (ZoneAlarm Pro) and got "stuck" condition again with no firewall ... so I guess it's not the firewall.
Going back to the log, is this where the error occurs: 00000078 13:37:53.640 [4076] 2020[1265598]: calling PR_Poll [active=0 idle=1] 00000079 13:37:53.640 [4076] 2020[1265598]: timeout = -1 milliseconds 00000080 13:37:53.640 [4076] 2020[1265598]: ...returned after 0 milliseconds 00000081 13:37:53.640 [4076] 2020[1265598]: nsSocketInputStream::Read [this=25f0318 count=4096] 00000082 13:37:53.640 [4076] 2020[1265598]: calling PR_Read [count=4096] 00000083 13:37:53.640 [4076] 2020[1265598]: PR_Read returned [n=0] 00000084 13:37:53.640 [4076] 2020[1265598]: nsSocketInputStream::CloseWithStatus [this=25f0318 reason=0] 00000085 13:37:53.640 [4076] 2020[1265598]: nsSocketInputStream::CloseWithStatus [this=25f0318 reason=80470002] ... in particular the "PR_Read returned [n=0]" compared with a successful Get Mail which has "PR_Read returned [n=60]" ? I don't know what this all means but as a total guess is the PR_Read expecting to get some data back but isn't in the [n=0] case hence TB aborts the "get mail" without any error message ?
Any progress likely on this ? I'm still getting the problem significantly often.
Any more info I can get on this to assist debug ? ... I'm still getting the problem very frequently.
wada, what do you conclude from comment 28?
Component: General → Networking: POP
Product: Thunderbird → Core
QA Contact: general → networking.pop
(In reply to comment #34) > wada, what do you conclude from comment 28? Bug opener says following. Bug opener says "situation is intermittent" also. So it doesn't seem for me to be "problem due to too many not-deleted-yet socket/connection". > I've just done "Get All Mail" repeatedly without any problems > and have ended up with 51 sockets in TIME_WAIT state. Nothing is seen in POP3 level log nor Tb's connection/communication level log for bug opener's problem. From log, it can be said 'lack of "Connected ... "' only. So I can't guess other than "issue outside of Tb" (network level issue, environment issue). And, I can say nothing about problem analysis of network/environment issue.
Problem has currently "gone away" and I'm not getting any problems - I haven't changed anything so I would guess that it is likely to be a networker/server problem ... ... HOWEVER I think Tb should trap whatever the error was & say that it was unable to get the mail for the failing account rather than just get the error and carry on as if it had been successful. There should be an error path for where it starts getting the mail for an account, ie. once it has issued Connecting to mail.btinternet.com Connected to mail.btinternet.com but hasn't managed to complete with: Connect: Host contacted, sending logon information There are no new messages on the server
which leaves us with external causes and => invalid leamphil, thanks for working through this
Status: UNCONFIRMED → RESOLVED
Closed: 17 years ago
Resolution: --- → INVALID
Product: Core → MailNews Core
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: