Closed Bug 127461 Opened 18 years ago Closed 16 years ago

Get New Messages stops working for one account

Product/Component: MailNews Core :: Networking: POP
Type: defect
Severity: minor
Priority: Not set
Platform: x86
OS: Windows 98
Tracking: Not tracked
Status: RESOLVED FIXED
Target Milestone: mozilla1.3alpha
Reporter: davidebsmith
Assignee: Bienvenu
Keywords: fixed-aviary1.0
Attachments: 2 files, 3 obsolete files

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; rv:0.9.8) Gecko/20020204
BuildID:    2002020406

I have multiple POP accounts. From time to time, Get New Messages stops working on
one account. (Which of the accounts stops working varies from time to time.)
Clicking the button or selecting from the menu does nothing, and there is no
activity on the dial-up network connection.  Other accounts continue to work OK.
The only way to clear the problem is to exit Mozilla entirely (including Quick
Launch) and restart. It seems to happen after Quick Launch has been running for a
long time and Mail has been started & exited multiple times.

Reproducible: Sometimes
Steps to Reproduce:
1. Use Mail on a daily basis with Quick Launch running
Also, when this happens there's no "Connect: Host contacted, sending login
information" or any other messages at bottom of mail window.
This could have been fixed by bug 125503 because summary was getting invalid 
in some cases and we were not able to download any messages. Please use the latest
builds and report back. 
Still happening in RC 1 but less frequently? After it happens, exit Mozilla &
Quick Launch, restart Mozilla, get message about "Building summary file for
Inbox" on hung account. 
I'm not sure it's the same bug, but I have a VERY similar problem. I have one
POP account and every once in a while (it seems random) the get messages commands
will stop working, just as David described. I have to completely exit Mozilla
and restart to get it working again. 

I'm running 2002051309 on Linux.
similar: bug 147156
I have the same issue with 2002053012 on Windows 2000.  It works for a couple of
hours (and autodownloads) then refuses to contact the POP server.  I have to
exit all of Mozilla (including quicklaunch) for it to attempt to connect to the
POP server.  This happens every day.
Blocks: 165832
QA Contact: sheelar → stephend
Please go to bug 165832 and check if this here is a dupe or any of the bugs
listed there is a dupe of this one.

pi
taking.

today is "fix get msg button problems" day.
Assignee: naving → sspitzer
Status: UNCONFIRMED → NEW
Ever confirmed: true
Target Milestone: --- → mozilla1.3alpha
accepting
Status: NEW → ASSIGNED
Same thing happens to me on Linux. 
Multiple accounts, some POP, some IMAP. One particular (slow) account stops
responding as above. Looks like an error flag is set that will not reset until
complete exit.
 
I'm afraid this is a very critical bug for me as I can no longer use any other
e-mail client because only Mozilla can cope with my junk mail (tons of it). 
 
I use 1.5b; same problem with 1.4 and 1.3. I run Linux Mandrake 9.1.
I believe the OS should be changed to 'All' for this bug. I'm seeing it on
Windows XP with 1.5b. From other comments, this appears to be fairly universally
reproduced. Automatic periodic retrieval of mail just stops working and any
attempt to initiate a get mail manually for that account has no effect. Close
Mozilla and then re-open it and mail downloads normally again.
I agree that this should be OS all.  I am seeing this on Linux (Redhat 9 with
all patches applied) and Mozilla 1.4 (Mozilla/5.0 (X11; U; Linux i686; en-US;
rv:1.4) Gecko/20030701).

See also:

http://bugzilla.mozilla.org/show_bug.cgi?id=213581
http://bugzilla.mozilla.org/show_bug.cgi?id=204702

I have several nspr logs showing that Mozilla stops sending SEND: USER requests
to the server.

As of yesterday this was still happening.
While waiting for the problem to repeat, I:

1. upgraded to Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5b) Gecko/20030831
2. have a program scanning nspr logs that will detect when Mozilla stops issuing
   SEND: USER <account>.  
   This lets me detect the problem sooner.  
   I also have all of the problem session logs for analysis.

As far as I can tell,
- Mozilla periodically issues SEND: QUIT
- mostly it reestablishes the connection
- sometimes it seems to forget that it was retrieving mail 
  for one or more accounts (but not all accounts)

For what it's worth, the mail server I'm using reports itself as InterMail (in
the nspr log).  I don't think it's server related, but just in case, that's what
I'm using.  The traffic and Mozilla's POP states appear pretty normal, with no
obvious errors associated with the problem.

Another challenge is that it could easily take a couple of weeks for the problem
to repeat itself (or I could get swamped today).

I've heard nothing on this or the other related problems.  Any suggestions for
further diagnosis that I could try?

There seems to be some agreement that the problem is multiple OS (Linux, Win
98/NT/2000) and has been around from 0.9.8 to 1.5b: 
  
- Mozilla/5.0 (Windows; U; Win98; en-US; rv:0.9.8) Gecko/20020204
- Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4a) Gecko/20030401 
  WebWasher 3.3
- 2002053012 on Windows 2000
- 1.5b on Windows XP

- 2002051309 on Linux.
- Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030313 (RedHat9)
- Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030701 (RedHat9)
- 1.4, 1.3, 1.5b Linux mandrake 9.1

It appears to be very different from
http://bugzilla.mozilla.org/show_bug.cgi?id=165832 where the button itself
didn't work.  In this the button works but what's happening isn't expected.

It also appears that a change between 1.3 and 1.4 may have fixed a related problem
where the progress bar ran endlessly.  That problem masked this one. See
http://bugzilla.mozilla.org/show_bug.cgi?id=213581


But I haven't seen a comment from anyone at mozilla/netscape on this recently. 
What's the process here? Should *I* update the OS and version as the problem
recurs?  Or open a new bug and back reference?

David

The solution?

When the button "get messages" is not working, complete restart of Mozilla isn't
necessary. It also helps to press the "stop" button first, wait some seconds and
then press the "get message button" again. 
Clarification.  The problem I am encountering is with automatic mail retrieval
of multiple accounts and the Get All New Messages function associated with the
Get New Messages Button.  These are what I use most.   Get New Messages only
applies to the mail account you have currently selected.  And I use this rarely.
 When the problem surfaces Mozilla forgets to make requests for all but one
account.   

If I recall correctly, selecting one of the other accounts and running Get New
Messages doesn't work. (I did not try the stop button.) And hence I've had to
restart.  I will confirm these things upon next occurrence.

David
Problem repeated yesterday (the first time on 1.5b for me).  Checked and there
is no stop button.  Three accounts were being automatically retrieved, the one
that wasn't forgotten about was the one I had selected when the problem
occurred.  Selecting the other accounts from the Get Messages pull down did not
requery the server - a full restart of mozilla was needed.

At the time I detected the problem, netstat showed 2 established POP3
connections from my system.
Again, the nspr logs show nothing unusual.

David
>Checked and there is no stop button.

You have to make your stop button visible!
Edit -> Preferences "Stop"
Added the stop button.  I'll wait till next time (sigh).

BTW, if you have any ideas as to what else to look for/at, I'm game to try
them out.  I can run packet capture or system traces.  Since I'll need to run
them all the time, some insights/guidance/theories/hunches on what to look for
would help.

Also, I've reprocessed the raw logs.  Here's a small snippet to give you an idea.

--RECV: +OK InterMail POP3 server ready.
--POP3: Entering state: 31
--POP3: Entering state: 5
++SEND: USER second.userid
++Tracking/expecting 3 email accounts
++[1] third.userid
++[2] primary.userid
++[3] second.userid
+*Sequence error, expecting SEND: USER third.userid
+*Sequence error, expecting SEND: USER primary.userid

I'm summarizing RECV +OK/-ERR, POP3, SEND, Entering NET_ProcessPop3 messages and
changes in the alpha-numerical tag prefixing each line.  I can easily add some
pretty complex state tracking logic if that would be helpful.  And I still have
the raw logs from all my failed sessions to go back to.
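
The kind of state tracking described above can be sketched roughly as follows. This is only an illustration, not the actual scanning program: the function name FindStalledAccounts, the threshold heuristic, and the assumption that each login shows up as a line containing "SEND: USER <account>" are all hypothetical. An account is reported once a given number of logins for other accounts have gone by without one of its own.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the expected accounts that have gone quiet.
// An account counts as stalled once at least `threshold` SEND: USER lines
// for other accounts have appeared since its own last SEND: USER line.
std::vector<std::string> FindStalledAccounts(
    const std::string& log,
    const std::vector<std::string>& expected,
    int threshold) {
  const std::string marker = "SEND: USER ";
  std::map<std::string, int> lastSeen;  // account -> ordinal of its last login
  int logins = 0;                       // total SEND: USER lines seen so far

  std::istringstream in(log);
  std::string line;
  while (std::getline(in, line)) {
    std::size_t pos = line.find(marker);
    if (pos == std::string::npos) continue;  // not a login line
    std::string account = line.substr(pos + marker.size());
    // Trim trailing spaces and carriage returns.
    while (!account.empty() &&
           (account.back() == '\r' || account.back() == ' ')) {
      account.pop_back();
    }
    lastSeen[account] = ++logins;
  }

  std::vector<std::string> stalled;
  for (const std::string& account : expected) {
    auto it = lastSeen.find(account);
    int last = (it == lastSeen.end()) ? 0 : it->second;
    // Every login after `last` belongs to some other account.
    if (logins - last >= threshold) stalled.push_back(account);
  }
  return stalled;
}
```

A low threshold reports the problem sooner but is more likely to misfire when the accounts are polled at different intervals, so the value would need tuning per setup.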

Thanks, David
This is still happening with 1.5 (official rpm from Mozilla site).  There
doesn't seem to be enough info in the nspr logs to figure this out.   There is
no obvious pattern in the state information that suggests where things break (it
might help to know what the states mean).   Please see the referenced bug for more details.

IMHO this should be changed to OS ALL.

I'm willing to do some more investigation on this but some suggestions would be
helpful. 

David
Could the problem be due to an (earlier) timeout on the POP server?
See bug 199784, bug 199914.
These timeout bugs are interesting and may be related but it isn't clear how. 
In the first case, the description is a bit imprecise talking about trying to
connect to the POP server some number of times to make it timeout.  In the
second case, the users ADSL connection dropped for 4-6 hours.   In all my
experience, my system has been on continuously and there doesn't seem to be a
fixed interval (or number of SENDs).

As I have several accounts on the same pop server, I've increased the time
between checks for mail on each account.  With one each set to every 10, 30 & 60
minutes I expect that the server will timeout on one account before the others! 
It may take a while to gain the evidence.  Wish me luck.

David
I just noticed the server settings on my accounts are different (beyond the
timeouts I just set).

Account #1 checks for new messages at startup and automatically downloads
messages.  My other accounts do not have this checked.  As I usually explicitly
initiate get messages for these I am beginning to suspect the automatic download
(or lack of it) is the problem.  I have not yet lost account #1 but I have lost
all the others.   I have now set check at startup for all and a mix of checking
times and automatic download.  With some patience and some adjustment this may
pay off.

David
See also bug 225869.
*** Bug 130578 has been marked as a duplicate of this bug. ***
I am running 2003120808 on Windows 2000.  I have 4 pop accounts, but I only
automatically retrieve mail from 3 of them.  Moz mail stops being able to get
new messages for me once or twice a day sometimes.  My gut tells me that it
might have something to do with the flakey broadband (supposedly always-on) ISP
service I have here in China, as it always seems to happen during times when I
notice flakey internet connectivity in general.  My email host is in the USA for
2 of the three POP accounts.  DNS sometimes becomes unreachable, connections
will drop frequently, and sometimes it's just very slow.  The Great Firewall of
China really imposes a burden sometimes.  When this happens, I can still use the
browser, and sometimes I can still send email.  (Sometimes yes, and sometimes
no, but mail sending **** out at other times also independently of this main
problem.  I just have to try sending several times in a row.)

When I restart moz, I get the "rebuilding summary file" message while it
rebuilds.  Strangely, I just started getting that after this last release. 
Don't remember seeing that before.
I am experiencing this same problem w/ Mozilla 1.6 on WinNT 4.0, and have been
since 1.4 or 1.5.  Increasing the "Check for email" interval has reduced the
frequency of occurrence.  The "check for email" intervals on my five POP accounts
are 3, 5, 10, 20, and 20 minutes.  The busier accounts have the smaller time intervals. 
I've seen it happen the most on the 5 minute interval account, but also on the 3
minute interval account, although maybe I don't notice it when it happens on the
other accounts.  When the 3 minute interval account was at 1 minute, it happened
noticeably more frequently, probably every day or two.  Now I might go a week
before it happens.  All accounts are set to "check on startup" and
"automatically download".  The 5, 20, and 20 interval accounts also "leave
messages on server".

I'm willing to help diagnose this problem.  Just let me know what to do.
An update and questions.

- Switching user profiles corrects the problem.  This is only very slightly less
drastic than restarting Mozilla.  It's not very useful but it may provide a clue.

- I have about 40 nspr logs of problem sessions (a fraction of my occurrences).
 However, 10 have netstat information intertwined and 5 have both netstat and ps
information entwined.  I haven't looked at them in detail yet.  Hopefully, I
should have some time next week and they can provide a clue as to what's going on.

- re:  Mike Cowperthwaite's post about this being possibly due to a timeout of
the POP server.  It sounds similar and possible but I have reservations.  Mike,
how did you determine it was actually a timeout?  I find the time frame is
variable.  Any account can go awol. I also find that it's not the least
frequently checked account that stops getting queried.  I've also seen a session
where one and then later a second account stopped getting queries (but never
all).  Mostly it seems to be the account that gets the most mail that goes awol.

- I'm still waiting for some suggestions as to what to look for.  Any theories
or guidance would be welcome.

Lastly, including this bug there are at least 3 related bugs on multiple OSes
going here all open.  Wouldn't it be better to roll these up into one?

http://bugzilla.mozilla.org/show_bug.cgi?id=127461 assigned to Seth Spitzer
http://bugzilla.mozilla.org/show_bug.cgi?id=204702 assigned to Scott MacGregor
(gone)
http://bugzilla.mozilla.org/show_bug.cgi?id=165832 unassigned Meta bug

Thanks, David
I didn't determine anything in the referenced bugs; I just tried to point out 
other bugs that sound similar.  Bug 204702 may also be the same problem, as you 
note.  I have experienced bug 125328, but that was less severe than this bug, 
and hasn't been a problem since replacing my modem.
Maybe, bug http://bugzilla.mozilla.org/show_bug.cgi?id=214314 can be placed here?
*** Bug 201530 has been marked as a duplicate of this bug. ***
I believe I am seeing a related problem. I am running Moz 1.7 beta on GNU/Linux
(Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7b) Gecko/20040316 is the
signature on the About page), and I've observed that if I leave Moz open for a
long time, I often am unable to retrieve mail. Clicking on Get Messages, using
Ctrl-T, or clicking on the menu item all seem to do nothing. If I restart Moz
completely, then I'm fine.

I *think* I've only observed this after suspending and resuming my laptop, but
I'm not sure. I've only got one account set up on this machine, so it's possible
that if I had more than one, I would be working on the other... 

One other observation: while I am currently in this state, I have been able to
successfully send mail out, but I've not retrieved any mail, even though I've
sent a reply to this account.

If there is more diagnostic information that I can provide, please contact me
at my e-mail address and I'll see what I can do. I also observed this in 1.6.
Thanks.


As a follow-on to my comment #25 and the recent #31, I am now running 1.7RC1 and
these days only retrieve email from a single pop account.  (Although I still
have several configured, they are not set to retrieve or check email.)   I do
not use  QuickLaunch.  I still see this problem occasionally. (I no longer get
the "rebuilding index" message, but it seems like that, or something similar,
happens without the message anyways when re-starting up after the problem has
occurred.)

Another change in my usage is that I set it to check mail every 10 minutes, but
not download.  (Email where I live is filtered/censored and so when I
automatically download, some of it goes mysteriously missing if it contains
sensitive keywords. Oddly this does not happen if I dial up, so I usually dial
up to get email these days, and the connection is fairly stable and does not
seem to cause this problem.)  But this sometimes occurs just by having mozilla
check for new email every 10 minutes, even if it is not set to actually download
the email.  This seems to occur in late evenings when my line also becomes
extremely congested, and sometimes connections are dropped or cannot be made.  

FYI, the pop download process on my broadband connection is often interrupted
due to the filtering/censoring processing taking over the connection.  (I
presume, since whenever this happens, emails go missing.  The behavior of
mozilla is slightly different depending on whether the pop download completed
normally or was interrupted, so I can tell when this happens.  I have also
verified this behavior using a web based interface to email that is not filtered
and "seen" emails go missing.  So I am pretty sure of what is happening.)  If
anyone believes this may be related to errors or dropped connections during the
pop protocol processing and needs a testbed, I am willing to help.  The
filtering/censoring is easy to trigger.
This problem can occur if the POP3 server does not send a response, or that
response got lost/filtered in the network.  You can check if this is the case
for your situation by using 'netstat' to see if a POP3 connection remains up for
that POP3 server. [What a coincidence!  I just did a netstat and I see that I
have a POP3 connection staying up and the GET button doesn't work for that account.
This is with 1.7 RC2.]

While this is a problem with the POP3 server not responding, Mozilla could
recover from this condition if a timeout were implemented for expected responses.
I am now running 1.7RC2 (on Win2K) and was able to reproduce my Get Msg button
not working problem.  When I looked at netstat, there was indeed an open pop3
connection.

TCP  mylaptop:2236  mymailserver.com:pop3  ESTABLISHED
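
For anyone wanting to automate that netstat check, here is a minimal sketch of testing a single line of output. It assumes the Windows-style column order shown above (protocol, local address, remote address, state); Linux netstat prints extra queue columns and would need different parsing. The function name IsHungPop3Candidate is hypothetical. An ESTABLISHED POP3 connection is only a candidate, since the same line appears briefly during any normal mail check; it is the connection that never goes away that indicates the hang.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns true if one line of netstat output describes an ESTABLISHED TCP
// connection whose remote endpoint ends in ":pop3" or ":110".
// Assumed column order (Windows netstat): proto, local, remote, state.
bool IsHungPop3Candidate(const std::string& netstatLine) {
  std::istringstream in(netstatLine);
  std::string proto, local, remote, state;
  if (!(in >> proto >> local >> remote >> state)) return false;

  auto endsWith = [](const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
  };

  bool isPop3 = endsWith(remote, ":pop3") || endsWith(remote, ":110");
  return proto == "TCP" && isPop3 && state == "ESTABLISHED";
}
```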

On Larry's advice, I downloaded a utility that allowed me to kill the
connection, at which point Get Msgs started working again.  No restart of
Mozilla was required.  I can likely reproduce this again within a few days if
anyone needs further details or tests.
I have just confirmed that this problem still exists in the final version of
1.7.  The pop3 connection remains open, and when I kill it using a 3rd party
product, mozilla mail recovers.

As an added note, I believe that I have observed one, perhaps two additional
scenarios of being unable to get email, but where no pop3 connection remains
open.   (Although in one, a connection seems to remain in the TIME_WAIT state -
but I need to carefully observe it again before I swear to this.)  They are less
common and hence I'm having more trouble reproducing them.  But they are
definitely distinct from this one.
I just recently upgraded to 1.7 using the rpms.   The problem still exists.

After 22hours and 289MB of logs, mozilla failed to issue SEND: USER requests on
2 of 3 POP3 accounts.   

I should note that I was also presented with an authentication dialog box
exactly twice.  This is new behaviour, and I had hoped it meant that Mozilla had
reestablished the connection.  It appears now that it didn't.

I'll look for further clues in the log.
Oh BTW I'm on Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040619

I just had a look through the logs and have tried to boil them down a bit.  It
will take some time to get my head inside them again.  

One possible clue, although I was prompted for a password long about 2-3 hours
into the run,  I only see SEND: AUTH requests during the first minutes when all
accounts are polled.  Lots of requests received to send the PASS command back,
but this is normal as I recall.
I have been experiencing this problem for a while, but only for the past year or
so.  I agree that OS should be changed to all because I also had this problem on
Solaris through Mozilla release 1.7.  I'm now also seeing this on my
self-compiled Linux Mozilla: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.2)
Gecko/20040815.  My Mozilla Mail setup is fairly similar to the others described
- 2 POP accounts and 2 IMAP accounts; none of them set to "check for new
messages at startup."  I used to have 5 POP accounts and 2 IMAP accounts but
that didn't make any difference.  Running netstat also reveals an ESTABLISHED
connection to the POP server of the "problematic" account.  What I've
observed is that it's always my first account (a POP account) that fails.  The
interval between failures is seemingly random.  I occasionally get that password dialog
but I have a feeling that it is not related to this bug.

By the way, where can I download that utility that kills network connections?

Thanks.
> By the way, where can I download that utility that kills network connections?

For Windows, I've used both of these:

http://www.nwpsw.com/estopmain.html (Shareware)
http://freehost14.websamba.com/nirsoft/utils/index.html (Freeware)

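For Linux, I use this script; you just have to modify POP_SERVER and POP_PORT:

  #!/bin/sh
  # Close Mozilla's hung POP3 sockets by attaching gdb to the process and
  # calling close() on each matching file descriptor.

  # you need to modify these
  POP_SERVER="pop.isp.net"
  POP_PORT=110

  # List Mozilla's TCP connections to the POP server.
  LSOF=$(lsof -i "tcp@$POP_SERVER:$POP_PORT" | grep -E '[m]ozilla')

  # First matching process ID, and the numeric part of each FD column entry
  # (lsof prints the descriptor with a mode suffix, e.g. "32u").
  PID=$(echo "$LSOF" | awk '{print $2}' | sed -n 1p)
  FDS=$(echo "$LSOF" | awk '{print $4}' | tr -cd '[:digit:]\n' | sort -u)

  # Nothing to do if no matching connection was found.
  [ -n "$PID" ] || exit 0

  # Build a gdb command file: attach, close each descriptor, detach.
  GDBX=/tmp/gdb-close-mozilla-pop-fds-$$

  echo "attach $PID" >"$GDBX"
  for fd in $FDS ; do
    # The (int) cast lets gdb call close() without debug symbols.
    echo "call (int)close($fd)" >>"$GDBX"
  done
  echo detach >>"$GDBX"

  gdb -batch -x "$GDBX"
  rm -f "$GDBX"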
This proposed patch implements a response timer in nsPop3Protocol so that hung
connections are dropped.  A preference setting specifies the timeout.  The
default is 45 seconds.
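
The idea of such a response timer can be sketched in isolation as below. This is not the code from the attached patch: the class ResponseTimer and its methods are hypothetical, and times are passed in as plain seconds so the logic can be followed without a real clock or the XPCOM timer machinery. Only the 45 second default comes from the description above.

```cpp
#include <cassert>

// Hypothetical stand-in for a protocol response timer.
class ResponseTimer {
 public:
  // 45 seconds mirrors the default mentioned for the patch.
  explicit ResponseTimer(int timeoutSeconds = 45)
      : mTimeout(timeoutSeconds) {}

  // Arm the timer: a response is expected within mTimeout seconds.
  void Set(long nowSeconds) {
    mArmed = true;
    mDeadline = nowSeconds + mTimeout;
  }

  // Disarm the timer: data arrived, or no response is currently expected.
  void Cancel() { mArmed = false; }

  // True when a response was expected and the deadline has passed; this is
  // the point at which a hung connection would be dropped.
  bool Expired(long nowSeconds) const {
    return mArmed && nowSeconds >= mDeadline;
  }

 private:
  int mTimeout;
  bool mArmed = false;
  long mDeadline = 0;
};
```

A cancelled timer never expires, which is what lets a caller suspend the timeout while no server response is pending.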
Attachment #157755 - Flags: superreview?(bienvenu)
Attachment #157755 - Flags: review?(ch.ey)
I'd like to have such a timeout on a lower level, not in the protocol code. But
nobody has such a solution and this one here is real and does the job.

As far as I tested it, it works. Maybe the password prompt can become a problem
if the user needs more time than the timeout allows to enter (look up, search for
and type) the password. What do you think about an exception?

The user's possible desire to adjust the timeout on a per-server basis shouldn't
be needed if I'm right. Increasing the general timeout for a slow server
shouldn't harm the faster ones - only recognizing the timeout will be slower.

And it would be nice to use spaces instead of a tab in the line
+	("mail.pop3_response_timeout=%d", m_responseTimeout));
Yes, I agree that this should be done at a lower level but I don't think that's
going to happen any time soon, which is why I encouraged Larry to do this. The
password prompt is a good point - we should only have the timer in effect when
we're waiting for a response, so we should be setting the timer after sending
data, and clearing it when we receive data. I need to apply the diff in context
to see if that's what's going on...
Christian and David,

I just tested the password prompt.  It times out and drops the connection.  The
easiest change that would handle this is to cancel the timer in GetPassword()
right before the Password dialog is called.  The timer would then be set again
when the password is sent to the POP3 server, or it could be set again when the
dialog closes.  I say this is easiest because there are three places that
GetPassword() is called, so I think it's better to have it in GetPassword().  

The other issue is that while all data is sent via nsPop3Protocol::SendData(),
there is no such method for receiving.  ReadNextLine() is called in ten
different places.  Additionally, some states don't just read one chunk of data.
 And they might have multiple conditions for determining the response is
complete.  It just seemed like a nightmare to try to cover all the conditions. 
For that reason I just set the timer when the data is sent and reset it the next
time data is sent.  For my own use, with three different POP3 servers, this
handled all cases except for large email.  For that I reset the timer after each
chunk is received.

Let me know what you think is the best way to handle the password prompt and I
will make that change.  I will also replace the tab with spaces.  Thanks!
Larry, here's what I had in mind:

make nsPop3Protocol override nsMsgProtocol::OnDataAvailable - this is called
whenever data comes in and would be the right place to clear the timer. So it
would look like this:

NS_IMETHODIMP nsPop3Protocol::OnDataAvailable(nsIRequest *request,
                                              nsISupports *ctxt,
                                              nsIInputStream *inStr,
                                              PRUint32 sourceOffset,
                                              PRUint32 count)
{
  // Data has arrived, so stop waiting for a response.
  CancelResponseTimer();
  return nsMsgProtocol::OnDataAvailable(request, ctxt, inStr, sourceOffset, count);
}

you'd also need to add the function override decl to nsPop3Protocol.h

This seems to me to be a very general way to handle this, w/o worrying about
special casing the password prompt code, or any other prompting code...However,
I guess there could be a problem with my approach if we get an
OnDataAvailableEvent, and process all the data, and then wait for more data w/o
sending a command to the server, but we never get another dataavailable event. 
So it seems like you'd want to turn the timer back on after we've read all the
data so far, maybe after the while loop in nsPop3Protocol::ProcessProtocolState,
when we know we're pausing for read - then I guess you wouldn't need to do it in
SendData...

Does that make sense? What do you think, Christian?  
David,

I think your suggestion just might do it.  Although I don't have to override
OnDataAvailable because that just calls ProcessProtocolState.  

The initial timer can be set prior to LoadURL.  Then upon entry to
ProcessProtocolState, the timer can be canceled. ProcessProtocolState has three
return statements.  One for if the username is not set (error condition), one
for the POP3_FREE state when the socket is closed, and one at the end when the
while loop exits because we need to pause for more data.  So setting the timer
again at the end looks like it will do the trick.

I'll run with this for a while and then submit a revised patch.  Let me know if
 you think of anything else.
I don't see the need to override ODA too.

Instead of having SetResponseTimer() in SendData() and RetrResponse() I just put a
call to CancelResponseTimer(); above the while statement in
ProcessProtocolState() and SetResponseTimer() just before the final return in
the same function.

This did the job and also the pw prompt was no problem so this looks like the
right thing to do.

I guess the CancelResponseTimer(); in CloseSocket() wouldn't be necessary also,
right?
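
The placement discussed in the last few comments can be modelled roughly like this. It is a heavily simplified sketch, not the real nsPop3Protocol::ProcessProtocolState: the struct Pop3Model and its flags are hypothetical, and the state machine is reduced to the three exits described above (username not set, socket closed, pausing for more data). It shows why a password prompt cannot time out: the timer is cancelled on entry and only armed again at the final return.

```cpp
#include <cassert>

// Hypothetical model of the timer placement; all names are illustrative.
struct Pop3Model {
  bool timerArmed = false;
  bool usernameSet = true;
  bool socketClosed = false;
  bool needsPassword = false;
  bool armedDuringPrompt = false;  // records whether a prompt could time out

  void SetResponseTimer() { timerArmed = true; }
  void CancelResponseTimer() { timerArmed = false; }

  void ProcessProtocolState() {
    CancelResponseTimer();       // on entry: nothing is pending while we work
    if (!usernameSet) return;    // exit 1: error, no response expected
    if (socketClosed) return;    // exit 2: connection is gone
    if (needsPassword) {
      // A modal password dialog would run here; note the timer state.
      armedDuringPrompt = timerArmed;
      needsPassword = false;
    }
    SetResponseTimer();          // exit 3: pausing to wait for the server
  }
};
```

In this model the timer is armed only on the one exit where a server response is actually awaited, so no special casing of the prompt code is needed.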
Revision 2 of my patch, reworked per comments #45, #46, and #47.
Attachment #157755 - Attachment is obsolete: true
Attachment #157869 - Flags: superreview?(bienvenu)
Attachment #157869 - Flags: review?(ch.ey)
Comment on attachment 157869 [details] [diff] [review]
Proposed patch - Revision 2 based on Christian and David's comments

we don't need the pr logging in CancelResponseTimer. Can you remove that?

I'm not sure about starting the timer in LoadUrl - that will start the timer
before necko does the dns lookup, which has its own timeout code - it might be
the right thing to do, however, since in theory we could be connected without
getting a greeting from the server. So as long as it's OK with Christian,
sr=bienvenu, as long as you remove the logging from CancelResponseTimer.
Assignee: sspitzer → lcook
Status: ASSIGNED → NEW
Attachment #157755 - Flags: superreview?(bienvenu)
Attachment #157755 - Flags: review?(ch.ey)
Comment on attachment 157869 [details] [diff] [review]
Proposed patch - Revision 2 based on Christian and David's comments

> since in theory we could be connected without getting a greeting from
> the server

No greeting after connection? Not with an RFC compliant server. So timing out
while waiting for an initial greeting is ok.

I can't say if starting the timer in LoadUrl() is too.
The right point would be after setting up the TCP/IP connection and before
returning to the protocol handler. But having it in LoadUrl() looks ok too, the
necko timers shouldn't conflict here.
Attachment #157869 - Flags: review?(ch.ey) → review+
>No greeting after connection? Not with an RFC compliant server. So timing out
>while waiting for an initial greeting is ok.

I was thinking of a network problem occurring somewhere in between establishing
the connection and getting the greeting back. Probably highly unlikely...
Attached patch Proposed patch - Revision 3 (obsolete) — Splinter Review
I removed the pr logging from CancelResponseTimer.

Regarding setting the initial response timer, I'm willing to change it if you
have a suggestion for a better place?
Attachment #157869 - Attachment is obsolete: true
sorry, I forgot about this - one thing that occurred to me : if we leave the
call to the timer in LoadUrl, we might have a problem if that kicks off a
dial-up network connection which takes longer than the timeout to establish
(e.g., the dial-up connection requires a password prompt).

So I'm thinking we don't need to call in LoadUrl - the call at the end of
ProcessProtocolState should be sufficient.  Necko handles connection time-outs,
and once we have a connection, we'll be in ProcessProtocolState and be OK from
there. What do you think?
I did some testing using a simple socket server.  If the server sends back any
data, even just one byte, we get into ProcessProtocolState.  But if the server
doesn't send back any data, then we don't.  Therefore, to handle the case of the
server accepting the connection but not sending any data, I need to know when
the connection has been established so the POP3 response timer can be started. 
So how do I know that the connection has been established?  I see OnStartRequest
and OnTransportState in nsMsgProtocol.  Might overriding either of those provide
that ability?
Darin would know for sure. It looks to me like the connection has been
established when OnStartRequest is called so that might be a possibility. 
> Darin would know for sure. It looks to me like the connection has been
> established when OnStartRequest is called so that might be a possibility. 

OnStartRequest happens once at least one byte has been transferred.  To know
when a connection has been established, you should implement
nsITransportEventSink.  In your OnStatus, look for aStatus ==
NS_NET_STATUS_CONNECTED_TO.

I've often thought that we should send OnStartRequest when a connection has been
established, but the current nsInputStreamPump machinery (which generates
OnStartRequest from nsIInputStreamCallback::OnInputStreamReady) has no way of
knowing when that happens.  Perhaps it could be accomplished by having
nsSocketTransport call OnInputStreamReady once a connection is established... of
course, that is somewhat of a lie, since the input stream would not necessarily
have any data to read at that point.

I think it would be best for you to try to use nsITransportEventSink if at all
possible.
we already are - Larry, the call is nsMsgProtocol::OnTransportStatus - you could
just override this, detect NS_NET_STATUS_CONNECTED_TO, and then always call the
base class.

thx, darin!
Implemented suggestion in comment #57.

Removed SetResponseTimer call prior to LoadURL and instead overrode
OnTransportStatus to call SetResponseTimer when status is CONNECTED_TO.
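
In outline, that override looks something like the following. This is a self-contained sketch rather than the patch itself: MsgProtocolModel, Pop3ProtocolModel, and the TransportStatus enum are hypothetical stand-ins for nsMsgProtocol, nsPop3Protocol, and the NS_NET_STATUS_* codes, and the real method takes more arguments.

```cpp
#include <cassert>

// Hypothetical status codes; the real code compares against
// NS_NET_STATUS_CONNECTED_TO.
enum class TransportStatus { Resolving, ConnectingTo, ConnectedTo, ReceivingFrom };

// Hypothetical stand-in for the base protocol class.
class MsgProtocolModel {
 public:
  virtual ~MsgProtocolModel() = default;
  virtual void OnTransportStatus(TransportStatus status) {
    mLastStatus = status;  // placeholder for the base class bookkeeping
    ++mBaseCalls;
  }
  int mBaseCalls = 0;
  TransportStatus mLastStatus = TransportStatus::Resolving;
};

class Pop3ProtocolModel : public MsgProtocolModel {
 public:
  void OnTransportStatus(TransportStatus status) override {
    // Start waiting for the greeting only once the connection exists, so
    // DNS lookup and dial-up time do not count against the timeout.
    if (status == TransportStatus::ConnectedTo) SetResponseTimer();
    // Always forward to the base class.
    MsgProtocolModel::OnTransportStatus(status);
  }
  void SetResponseTimer() { mTimerArmed = true; }
  bool mTimerArmed = false;
};
```

Arming the timer here covers the case of a server that accepts the connection but never sends a greeting, which would otherwise never reach ProcessProtocolState.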
Attachment #158076 - Attachment is obsolete: true
Comment on attachment 158495 [details] [diff] [review]
Proposed patch - Revision 4

thx for doing this, looks good. I'm going to kill the unneeded braces in
nsPop3Protocol::OnTransportStatus() before I checkin, but it looks good.
Actually, the args to OnTransportStatus should all have an "a" prefix in their var
names, e.g., aTransport, aStatus, but I can do that too...I know it's not a convention
that's followed too much in that code, but that's because the code is so old.
Attachment #158495 - Flags: superreview+
Attachment #158495 - Flags: review?(ch.ey)
You're welcome.  I'm okay with your changes.  Thanks for your help and for
handling the checkin.
Comment on attachment 158495 [details] [diff] [review]
Proposed patch - Revision 4

That's a really nice solution.
Attachment #158495 - Flags: review?(ch.ey) → review+
fixed on trunk - I think we'll want this for the aviary branch once it opens for
.9 work.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
*** Bug 204702 has been marked as a duplicate of this bug. ***
*** Bug 213581 has been marked as a duplicate of this bug. ***
(In reply to comment #51)
> >No greeting after connection? Not with an RFC compliant server. So timing out
> >while waiting for an initial greeting is ok.
> 
> I was thinking of a network problem occurring somewhere in between establishing
> the connection and getting the greeting back. Probably highly unlikely...
Not too unlikely at my location :)

Anyway, is the issue of the connection startup solved, i.e. set the timer AFTER
the connection is established? It looks as if it is, according to comment 58.
This may solve bug 138632.

But there should also be a timeout that kills the connection ATTEMPT if the
server is unreachable (bug 197655). But that should be generic networking, not
Pop3 protocol. The protocol just needs to get an indication that the attempt was
dropped. This may currently not be the case.
I'm thinking we should try this for the aviary branch...
Flags: blocking-aviary1.0?
Attachment #158495 - Flags: approval1.7.x?
Sure.

/be
Flags: blocking-aviary1.0? → blocking-aviary1.0+
Comment on attachment 158495 [details] [diff] [review]
Proposed patch - Revision 4

a=mkaply for 1.7
Attachment #158495 - Flags: approval1.7.x? → approval1.7.x+
Attachment #157869 - Flags: superreview?(bienvenu)
Unfortunately, aceman is right - there are situations where we fail to connect
but still end up thinking the server is busy. I had this happen to me when I was
on a dial-up connection that I disconnected.  I'm not sure if we're not getting
the OnStopRequest notification or what...
I haven't been able to reproduce the problem with a failed connection not
calling OnStopRequest but I'll keep trying...the timer has been checked into the
aviary branch.
Keywords: fixed-aviary1.0
after talking to Darin, I think I got it exactly wrong by saying we shouldn't
start the timer until the connection has been established. Necko has no built-in
timeout on making a connection - it relies on the OS to notify it that the
connection attempt has failed. I'm thinking we perhaps should put the timeout
code before the connection establishment, and clear it if the connection is
established. We might want to use a longer timeout on the connection for the
dial-up case I mentioned earlier, in addition to the fact that the OS will
*usually* time out...
re-opening...
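
One possible shape for the idea in this comment, as a sketch only and not the patch that was later attached: arm a longer deadline before the connection attempt, and replace it with the shorter response deadline once the connection is reported. ConnectionWatchdog and its methods are hypothetical names, the caller supplies the clock in whole seconds so that no real timer API is assumed, and any timeout values passed in are arbitrary. Re-arming a response deadline on connect is an assumption carried over from the earlier patch description; the comment itself only says the connect timeout should be cleared.

```cpp
#include <cassert>

// Hypothetical watchdog; times are plain seconds supplied by the caller so the
// sketch does not depend on any real timer API.
class ConnectionWatchdog {
 public:
  enum class State { Idle, Connecting, AwaitingResponse, TimedOut };

  ConnectionWatchdog(int aConnectTimeoutSec, int aResponseTimeoutSec)
      : mConnectTimeoutSec(aConnectTimeoutSec),
        mResponseTimeoutSec(aResponseTimeoutSec) {
    assert(aConnectTimeoutSec > 0 && aResponseTimeoutSec > 0);
  }

  // Called just before the connection attempt starts.
  void StartConnecting(int aNowSec) {
    mState = State::Connecting;
    mDeadlineSec = aNowSec + mConnectTimeoutSec;
  }

  // Called when the transport reports that the connection is established:
  // the connect deadline is dropped and the response deadline takes over.
  void OnConnected(int aNowSec) {
    if (mState != State::Connecting) return;
    mState = State::AwaitingResponse;
    mDeadlineSec = aNowSec + mResponseTimeoutSec;
  }

  // Called periodically; returns true once, when the current deadline has passed.
  bool Tick(int aNowSec) {
    bool waiting =
        mState == State::Connecting || mState == State::AwaitingResponse;
    if (waiting && aNowSec >= mDeadlineSec) {
      mState = State::TimedOut;
      return true;
    }
    return false;
  }

  State GetState() const { return mState; }

 private:
  int mConnectTimeoutSec;
  int mResponseTimeoutSec;
  int mDeadlineSec = 0;
  State mState = State::Idle;
};
```

With this arrangement a connection attempt that never completes (the disconnected dial-up case) still ends in TimedOut, so the server would not be left looking busy.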
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Add a timeout to the connection case...
Assignee: lcook → bienvenu
Status: REOPENED → ASSIGNED
Comment on attachment 163392 [details] [diff] [review]
fix the connection case

Christian and/or Larry, does this seem OK?
Attachment #163392 - Flags: superreview?(mscott)
Attachment #163392 - Flags: review?(ch.ey)
Attachment #163392 - Flags: superreview?(mscott) → superreview+
Comment on attachment 163392 [details] [diff] [review]
fix the connection case

Looks good although I can't simulate and test this case.
Attachment #163392 - Flags: review?(ch.ey) → review+
Looks good to me.


marking fixed. We'll still be hosed if the necko code that handles establishing
a connection is blocked, but there's nothing we can do about that in mailnews,
as I understand it. If that code is blocked, I think no connections can get
established, browser or mailnews...
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
*** Bug 199784 has been marked as a duplicate of this bug. ***
Product: MailNews → Core
Based on reading the comments on this bug, I would assume that the developers
think it's fixed by version 1.7.5 of Mozilla, which is what I'm using.  However,
this bug is not fixed.  An interesting difference for me, however, is that lsof
does not report any open connection to the pop server, so that work-around isn't
available.  I am still required to completely close ALL mozilla processes and
then restart to fix the problem.  Note also that I have two accounts on one pop
server; one account can download email, while the other cannot.
Timothy,

(In reply to comment #79)
> However, this bug is not fixed.  An interesting difference for me, 
> however, is that lsof does not report any open connection to the 
> pop server, so that work-around isn't available.

I think you are seeing a different problem.  While the symptom might be the
same, it is not caused by a connection staying open waiting for a server
response that will never be received.  

> I am still required to completely close ALL mozilla processes and
> then restart to fix the problem.  Note also that I have two accounts 
> on one pop server; one account can download email, while the other 
> cannot.

I suggest looking at some of the other open bugs to see which one most closely
resembles what you are experiencing.  Some that I quickly found are bug 132538,
bug 165832, bug 173924, bug 199914, bug 214314, and bug 219657.  I'm sure there
are others.

I'd also suggest turning on logging to see if you can identify what is causing
the problem.  See http://www.mozilla.org/quality/mailnews/mail-troubleshoot.html
for POP3 logging instructions.
Product: Core → MailNews Core