Closed Bug 399061 Opened 17 years ago Closed 16 years ago

IMAP4 connections are lost/stalled/garbled?

Categories

(MailNews Core :: Networking: IMAP, defect)

x86
Windows XP
defect
Not set
normal

Tracking

(Not tracked)

VERIFIED INVALID

People

(Reporter: chris, Assigned: Bienvenu)

Details

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7
Build Identifier: version 2.0.0.6 (20070728)

I use an tb with two separate accounts hosted on the same IMAP server.

After running for a while (hours, not days), tb appears to lose its connections to the server. I click on a folder, and the status bar reports "Opening folder..." and the throbber will throb forever. The folder never appears to be opened. Messages I have seen before are there and their contents are visible. Depending on how tb is feeling at the time, I might even be able to read a message that appears as unread (that is, "New"), but it looks like that message text has been fetched from the server in the past.

The same is true for individual messages: it looks like tb is trying to fetch a message, but it stalls forever. Meanwhile, reading a message on the other account (same server, remember) works as expected (and quickly). The server is not stalled.

Using tcpview, I can forcibly close these connections and tb often recovers after a "get all new messages" request. Other times, tb cannot recover.

Other times, I get one of the following two messages (can't reproduce easily, so please forgive me if the message text isn't exact):

* Thunderbird cannot communicate with the remote server because it is not a valid IMAP4 server.

* Thunderbird cannot communicate with the remote server because it uses a security protocol which isn't supported.

Both of these error messages are erroneous, as my IMAP server is working properly, even during the times of the (tb) errors. Usually only one account gives these error messages while the other continues to work perfectly.

These errors are unrecoverable: I must restart tb in order to resume normal email operations.

Finally, a tb restart always fixes the problem. I have neither restarted nor reconfigured my IMAP server in the last few months when all this has been happening.

Reproducible: Sometimes

Steps to Reproduce:
I cannot reproduce this problem reliably, but leaving my computer to sit for a while usually does the trick. I believe my router closes dormant connections after a period of time, but I believe that time (10 minutes?) is less than the configured check-for-new-messages time (3 minutes) that I have configured for tb.
Actual Results:  
tb can no longer communicate properly with my IMAP server. Sometimes a forced close of TCP connections will "un-stick" tb, but if I get either of the two errors mentioned above ("invalid IMAP4 server" or "unsupported security protocol"), I'm completely out of luck: a tb restart is required.

Expected Results:  
Assuming that the connections are being damaged somehow, I would have expected tb to destroy them and attempt to create new connections.

I happen to be using "Portable Thunderbird" which is a re-packaging of tb that can be found at http://portableapps.com/apps/internet/thunderbird_portable. I don't believe this re-packaging is causing the problem because when I am outside my home network (with a suspicious router), these problems do not appear to occur.

I have not noticed any other problems with my network, so I do not believe my router is a direct cause (by actually garbling the connections itself, for instance), but I do believe that some/all connection timeouts and closures are being caused by the router.
I should add that this setup has been working since 2004 through various versions of tb starting from before 1.0. I suspect that unstable TCP connections are the problem, and hope that there is a way to get tb to recycle useless connections on its own, rather than me having to close them manually.
Assignee: nobody → bienvenu
Component: General → Networking: IMAP
Product: Thunderbird → Core
QA Contact: general → networking.imap
> Windows NT 5.1
> Build Identifier: version 2.0.0.6 (20070728)

> Using tcpview, I can forcibly close these connections and tb often recovers
> after a "get all new messages" request. Other times, tb cannot recover.

> Other times, I get one of the following two messages (can't reproduce easily,
> so please forgive me if the message text isn't exact):
> * Thunderbird cannot communicate with the remote server because it is not a
> valid IMAP4 server.
> * Thunderbird cannot communicate with the remote server because it uses a
> security protocol which isn't supported.

> this setup has been working since 2004 through various versions of tb starting from before 1.0. 
> I suspect that unstable TCP connections are the problem,

Sounds that mismatch of status between IMAP server and Tb is involved.
A) Tb tries from initial(e.g. send STARTLS) due to some errors,
   but IMAP server doesn't know error and waiting for response/request from TB.
   So response from IMAP is not OK/CAPABILIT, then "not a vaild IMAP4 server".
B) IMAP server is waiting for response/request from TB, but Tb's status is
   abnormal and wait for something.
   Force of connection close by IMAP resolve situation of "both are waiting".

Firewall software is cause of such funny situation or unexpected communication error in many cases.
What anti-virus software/firewall software do you use?
When the problem started to occur?
See Bug 398843 for damage on POP3 by McAfee's recent updates.

If you have enough disk space, enable NSPR logging while you are using, and get IMAP protocol log, and check real protocol level flow.
 - http://kb.mozillazine.org/Session_logging_for_mail/news
 - http://www.mozilla.org/quality/mailnews/mail-troubleshoot.html#imap
 - http://www.mozilla.org/projects/netlib/http/http-debugging.html
"SET NSPR_LOG_MODULES=imap:5,smtp:5" is required, I think.
Other people said "140MB log was generated by 8 hours run" in other bug.
So be careful about disk space full.

If cause of problem is fault of Tb, and log analysis by developers is required, attach the log file to this bug(text/plain; if size is accepted).
Please don't paste long log data to bug.
TB times out cached imap connections after 29 minutes of idle time. It also times out after 60 seconds when trying to use an imap connection that has silently been dropped (faster if the OS tells us the connection is dead).

An IMAP protocol log might be helpful.

You could also set the connection cache timeout to 10 minutes (or less) if your router times out connections. Tools | options | advanced, config editor, change mail.server.serverXX.timeout from 29 to 9 or so, where serverXX is the imap server having the problem.
(In reply to comment #2)
> Firewall software is cause of such funny situation or unexpected communication
> error in many cases.
> What anti-virus software/firewall software do you use?

I use Avast! anti-virus and have not really had any problems in the past. Two firewalls are in play: WinXP's built-in firewall plus the de-facto firewall provided by the router connected to my cable modem. Neither have any special configuration (other than certain forwarded ports on the router).

> When the problem started to occur?

It's hard to tell, but I'm pretty sure that the problems started when I got the new router. Since nothing else seems to be having problems, I'm guessing that idle connections are being damaged and that causes tb to choke. But then again, I don't know anything ;)

> See Bug 398843 for damage on POP3 by McAfee's recent updates.

I don't use McAfee, so this is unlikely to be the problem.

> If you have enough disk space, enable NSPR logging while you are using, and get
> IMAP protocol log, and check real protocol level flow.

Okay, I will start a log file and see if I can post a relevant section. In case I can't figure out what is relevant, is there a place I can post a huge log file? Or, should I just attach it to this bug? I'm not sure I'm qualified to sanitize it to remove sensitive information, so I'd prefer to have a small audience.

> If cause of problem is fault of Tb, and log analysis by developers is required,
> attach the log file to this bug(text/plain; if size is accepted).

Thanks for your time.
Okay, I've got a 16MB log file that saw the "unsupported security protocol" message. I'm rummaging around in it for an indication of failure.

Unfortunately, it doesn't look like there are any timestamps in here. :(

I grabbed the last (~350kB) portion of the log which appears not to contain anything too sensitive, and I think it should show a few messages that were loadable followed by Something Bad.

Interestingly enough, I only got the "unsupported security protocol" when attempting to access the "Tomcat-Users" folder (clearly visible in the logs). Switching to my INBOX seems to have worked properly (you can see the switch in the logs where it downloads a bunch of spam). Switching back to Tomcat-Users repeated the error message.
Zipped to reduce size (non-patch size limit was exceeded)
Comment on attachment 284331 [details]
IMAP protocol log that should include both good and bad comunications.

Fixed MIME type
Attachment #284331 - Attachment mime type: text/plain → application/zip
Since it's a Courier server, you might go into the advanced imap server settings (tools | account settings |  server settings, advanced button, and change the maximum number of connections to cache to 4. 

There might be something weird going on with your certs, since it seems to be the STARTTLS negotiation that's failing, though it would seem like that would always succeed or fail.

The reason clicking on the inbox is working is that we're caching the connection to the inbox. For the tomcat folder, we need to create a new connection for that folder.
(Off-Topic)
(In reply to comment #5)
> Unfortunately, it doesn't look like there are any timestamps in here. :(
When MS Win, NSPR logging+DebugView can produce log with timestamp.
"SET NSPR_LOG_FILE=WinDebug" passes data to DebugView, and DebugView writes data with timestamp. It is convenient for getting log while specific operations only.
See Bug 86396 Comment #7.
David,

My Courier settings include these:
MAXDAEMONS=5
MAXPERIP=5

...which should be setting the connection counts to 5, right?

I have tb configured (for both accounts) to cache at maximum 2 connections to the server for a total of 4 persistent connections.

Can you show me where any error might be indicated in the log file?
2788[4442de8]: 2ce22f0:mail.christopherschultz.net:NA:SendData: 2 STARTTLS

2788[4442de8]: ReadNextLine [stream=42156e0 nb=37 needmore=0]
2788[4442de8]: 2ce22f0:mail.christopherschultz.net:NA:CreateNewLineFromSocket: 2 OK Begin SSL/TLS negotiation now.

2788[4442de8]: 2ce22f0:mail.christopherschultz.net:NA:SendData: 3 capability

2788[4442de8]: ReadNextLine [stream=42156e0 nb=0 needmore=1]
2788[4442de8]: 2ce22f0:mail.christopherschultz.net:NA:CreateNewLineFromSocket: clearing IMAP_CONNECTION_IS_OPEN - rv = 80004005
2788[4442de8]: 2ce22f0:mail.christopherschultz.net:NA:TellThreadToDie: close socket connection
2788[4442de8]: 2ce22f0:mail.christopherschultz.net:NA:CreateNewLineFromSocket: (null)
27

80004005 is unknown error, iirc.  But it looks to me like the server dropped the connection after STARTTLS had been negotiated, and we discovered it when we issued the CAPABILITY command.

The MAXPERIP setting seems right. (I'm not a Courier expert by any stretch). You might try bumping the MAXDAEMONS setting higher - maybe doing TLS requires an extra daemon? I'm just guessing...
I just got the "unsupported security protocol" message again (without logging turned on, unfortunately), and this time, instead of immediately shutting down tb (to get a cleanly closed logfile), I decided to play with the connections.

Using tcpview, I forced closed all IMAP connections from tb to my server. Attempting a "get all messages" resulted in the same error message. Tcpview shows no currently open IMAP connections.

I went to the IMAP server and restarted Courier, just in case something was clogged on the server side. Clean restart. Again, I tried to do a "get all messages" in tb and got the same "unsupported security protocol".

tb appears to be dead in the water, now, so I'll have to restart it.

I hope this additional information is useful.
You can also toggle TB off and online, to force TB to close all the connections.

Do you have anything between TB and the server, e.g., a firewall or virus checker or...?
I'll try offline/online toggle next time to see what happens. Note that all IMAP connections were closed and stayed closed. In tcpview, I could see tb attempting to open up two new connections (one from each account, probably), and then immediately being closed after the error.

As mentioned previously, I'm running on WinXPsp2 with the firewall turned ON, but with no explicit configuration. I am running Avast! anti-virus which has never caused me any problems. I run through a router to get to my cable modem.

There is nothing on the other end AFAIK. It's a virtually-hosted server, but I have my own IP, etc., and don't share the server with anyone else. The server runs Courier IMAP (ssl) and Apache httpd.
Interesting - I suppose it could be an issue with our core security code, but I would think that would affect a lot of users, and I haven't heard reports of this other than yours.

Disabling Avast! temporarily, and rebooting your router would be worth trying as well.
(In reply to comment #10)
> My Courier settings include these:
> MAXDAEMONS=5
> MAXPERIP=5

Local IMAP server for your use only and single IMAP account only?
Will increasing of MAXDAEMONS alter phenomenon?
 Possible issue when MAXDAEMONS =< MAXPERIP :
  When client requests connections over MAXPERIP, if MAXDAEMONS>>MAXPERIP,
  additional connections can be established, and closing process of current
  connections exceed MAXPERIP is initiated, then immediate killing of current
  connections usually doesn't occur.
  But, if MAXDAEMONS<=MAXPERIP, immediate killing of current connection always
  occurs, because additional connection request causes exceeding of total
  connection limit by MAXDAEMONS. And, if SSL or STARTTLS, it is badly affected
  by this immediate killing of current connections.
WADA,

Yes, the IMAP server is for my use only, but it has two accounts. 2x2 connections from tb, so a total of 4 connections are expected at any given time.

In my case, MAXDAEMONS=MAXPERIP. Is there any reason to suspect that requesting a new connection would terminate an old connection? That would seem to be unwise (i.e. a pretty bad DOS scenario).

Can you provide a reference for that text?
(In reply to comment #17)
> In my case, MAXDAEMONS=MAXPERIP. Is there any reason to suspect that requesting
> a new connection would terminate an old connection?
> Can you provide a reference for that text?

That is my guess. And if "reference" is for my guess, no such reference.
(If "reference" is for Courier IMAP parameter,              )
(following page is an example of short description about it.)
( http://kb.mediatemple.net/article.php?id=259              )

However, design of software/parameter is usually (A) instead of (B), I believe.
(A) MAXDAEMONS defines total connection number in an Courier IMAP system.
    So number of daemons(=total connection count) can not exceed MAXDAEMONS
    anytime, anywhere.
(B) (MAXPERIP)*(access from N IP addresses) is always available.
If (B), there is no need of MAXDAEMONS, is there?
If (B), why above page says as follows?
  MAXPERIP=250(50 employees times 5 cached mailbox connections)
  MAXDAEMONS=300 or 400
  when access via same firewall(i.e. all access comes from single IP address)

> but it has two accounts. 2x2
> connections from tb, so a total of 4 connections are expected at any given time.

As far as I know, Tb's connection limit count=4 is per account.
(accountN->ServerN==Server_Name+User_Name==An account at an IMAP server)
(And mail.server.serverN.max_cached_connections defines max count      )
MAXPERIP=4*2 is required, if 2 accounts are used by a Thunderbird, isn't it? 
MAXDAEMONS=8(4 of MAXPERIP * 2) is required, if 2 Thunderbird's who uses an account, isn't it?
MAXPERIP=8 and MAXDAEMONS=16 is required, if 2 Tb with 2 accounts each, isn't it?
(Sorry but I don't know whether mail.imap.max_cached_connections limits total) (number of connection request by a Tb to an IMAP server or not.)
WADA,

I'm telling you that I've changed my "max connections per account" setting in tb to 2 (default is 5, I believe). For two accounts that makes 4 total connections. I can confirm that by looking at tcpview's status. tb never tries to create more than 4 total IMAP connections.

Given that I have 5 total daemons configured in Courier, this should not be a problem.

I was asking for a reference for the assertion that in the event of a new incoming connection "immediate killing of current connection always occurs". Perhaps I misunderstood that to mean "an existing connection is terminated" rather than the more reasonable "the new connection is refused".

Anyhow, I'm pretty sure this isn't a connection count issue. Just in case, I have increased both MAXDAEMONS and MAXPERIP to 10. We'll see what happens.
(In reply to comment #19)
> I've changed my "max connections per account" setting in tb to 2

I see. My question is to rule out server side definition related problem, in order to avoid needless guess of cause or needless test for problem analysis.
And my question is not for cause of problem itself(something wrong in Tb), is for workaround/bypass, because finding of a bypass will help finding of cause.

Another concern is phenomenon reported to Bug 396328.
  Larger MAXPERIP than we are expecting(based on max_cached_connections)
  may be required, in order to avoid unwanted problem due to connection error
  by MAXPERIP(and/or MAXDAEMONS) of Courier IMAP server
This is one of the reasons of my guess of Comment #16. Looseness/margin in MAXPERIP/MAXDAEMONS may be required to bypass problem. Your setting may be too tight for current Thunderbird.
Here is where my confusion rests: consider the order of events observed:

1. tb is working fine.
2. tb somehow runs across an error and reports an invalid security protocol
   (on a single connection, I would imagine).
3. I kill all tb IMAP connections using tcpview.
4. No new connections are successful (same error).
5. tb restart solves the problem.

After forcing those connections closed, I'm not sure why tb can't re-establish a new connection. Courier does not need to be restarted on the server. If a tb restart wasn't a sure-fire solution, I would guess that connections on the server side were staying open in some unfortunate state (like TIME_WAIT for a long time).

That does not appear to be the case.

> This is one of the reasons of my guess of Comment #16. Looseness/margin
> in MAXPERIP/MAXDAEMONS may be required to bypass problem. Your setting may
> be too tight for current Thunderbird.

If that's true, then tb is doing something that is unexpected (IMO). I suppose the setting I am using is "maximum number of server connections to /cache/", rather than just maximum number of server connections. Does this mean that tb will open as many connections as it wants, but only actually cache the number I specify? I would prefer to limit the total number of connections, not just those that are kept open indefinitely.

On the other hand, regardless of the number of connections that are cached, etc. tb should be able to recover from this type of error, somehow. After this error, I can observe tb opening a new connection to the server, displaying an error, and then closing the connection. Why is tb unable to re-establish contact? I could understand a single connection failing for whatever reason, but it appears that this error permanently disables tb. :(
(In reply to comment #13)
> You can also toggle TB off and online, to force TB to close all the
> connections.

tb died again with the unsupported security protocol error. I tried to go offline, then go back online, and the error still occurs. I confirmed with tcpview that no connections existed during the "offline" period, so tb definitely closed its own connections.

I checked on the server, just for the heck of it, and the only TCP activity regarding port 993 is in the LISTEN state, so there aren't any hung-up connections on the server side. tb still errors-out.

I also tried hibernating Windows XP and coming back for a 3rd attempt. Nothing. :(

So, to recap thus far:
1. (server) MAXDAEMONS = MAXPERIP = 10
2. tb gets an error.
3. tb is dead after the error.
4. offline/online does not help. hibernate does not help. server does not have stale connections.
5. tb restart yields immediate success.
Does TB have a feature like FF's about:config that lets you see your prefs
including prefs for which there is no UI?  
If so, filter for security.enable, and copy the results here
yes, it's tools | options | advanced | general, config editor - this sure sounds like something is going on at the security level though why it would work for a certain number of connections and then stop completely is a mystery.
Status: UNCONFIRMED → NEW
Ever confirmed: true
NSS supports SSL 2, SSL 3.0 and SSL 3.1.  By default, FF and TB support
SSL 3.0 and 3.1 only, not SSL 2. 

Because of the large number of broken SSL 3.x servers out there, that do 
not correctly handle requests from clients that are 3.1 capable, but do 
handle requests from SSL 3.0 clients OK, PSM  tries to figure out if a 
server is choking on SSL 3.1, and back off to SSL 3.0.  Trouble is, it 
does that even when SSL 3.0 has been disabled by the user.  It doesn't
check the pref for SSL 3.0 before deciding to give up on 3.1.  So, if 
it gives up on 3.1, and the user has disabled 3.0, then SSL has no 
choices left.  

I don't know if that explains what this user is seeing, or not.
That's why I asked for the results from security.enable.  
Those results will tell us what versions of SSL are enabled in his client.
If SSL 3.0 is enabled, then this is not the cause of his problems.
securiry.enable has no value. tb has the following security.en* preferences and values (which all have the default values)

security.enable_ssl2 = false
security.enable_java = true
security.enable_ssl3 = true
security.enable_slt = true

More info: I have this in my Courier log file around the time of this morning's incident:
Oct 12 10:27:48 christopherschultz imapd: couriertls: accept: error:1408A10B:SSL
 routines:SSL3_GET_CLIENT_HELLO:wrong version number
Oct 12 10:27:48 christopherschultz imapd: Disconnected, ip=[::ffff:68.50.0.179],
 time=0, starttls=1
Oct 12 10:27:48 christopherschultz imapd: couriertls: accept: error:1408A10B:SSL
 routines:SSL3_GET_CLIENT_HELLO:wrong version number
Oct 12 10:27:48 christopherschultz imapd: Disconnected, ip=[::ffff:68.50.0.179],
 time=0, starttls=1
Oct 12 10:30:42 christopherschultz imapd: Connection, ip=[::ffff:68.50.0.179]
Oct 12 10:30:42 christopherschultz imapd: Connection, ip=[::ffff:68.50.0.179]
Oct 12 10:30:42 christopherschultz imapd: couriertls: accept: error:1408A10B:SSL
 routines:SSL3_GET_CLIENT_HELLO:wrong version number
Oct 12 10:30:42 christopherschultz imapd: Disconnected, ip=[::ffff:68.50.0.179],
 time=0, starttls=1
Oct 12 10:30:42 christopherschultz imapd: couriertls: accept: error:1408A10B:SSL
 routines:SSL3_GET_CLIENT_HELLO:wrong version number
Oct 12 10:30:42 christopherschultz imapd: Disconnected, ip=[::ffff:68.50.0.179],
 time=0, starttls=1
Well, your server is disconnecting the connections because it doesn't 
like the version number from the client.  The client is sending either
version 3.1 or 3.0 (both are enabled in your client).  The error message
doesn't tell us what number it received, nor what it expected.  But at
this point, I would recommend that you look at your server and see what
it takes to configure it to accept both 3.0 and 3.1.  If your server is
sometimes failing to complete handshakes of version 3.1, but it only 
accepts version 3.1 (and not also 3.0), then that behavior will cause
this problem.  
Christopher,
If you can confirm the version numbers that this server accepts, and
the version numbers that it does not accept, then this bug may be a 
dupliate of bug 239381.
Nelson,

I'm in over my head when it comes to SSL and TLS, but I used openssl's s_client to connect to my IMAP server and show the certs, and this is what it reports:

New, TLSv1/SSLv3, Cipher is AES256-SHA
Server public key is 1024 bit
Compression: zlib compression
Expansion: zlib compression
SSL-Session:
    Protocol  : TLSv1
    Cipher    : AES256-SHA
    Session-ID: [removed]
    Session-ID-ctx: 
    Master-Key: [removed]
    Key-Arg   : None
   Compression: 1 (zlib compression)
    Start Time: 1192218843
    Timeout   : 7200 (sec)
    Verify return code: 18 (self signed certificate)
---
* OK [CAPABILITY IMAP4rev1 UIDPLUS CHILDREN NAMESPACE THREAD=ORDEREDSUBJECT THREAD=REFERENCES SORT QUOTA IDLE AUTH=PLAIN ACL ACL2=UNION] Courier-IMAP ready. Copyright 1998-2005 Double Precision, Inc.  See COPYING for distribution information.

================ end of s_client output =================

It sure sounds like this could be bug #23981:

> There's apparently no way to get PSM to flush its list of TLS-intolerant
> servers, short of a restart.
Regarding 3.0 versus 3.1, I'm not sure how to check.

My configuration includes the following directives. Sorry for the partial configuration-dump. I'm happy to go poking around, but I'll need a bit of direction.

# Set IMAP_TLS_REQUIRED to 1 if you REQUIRE STARTTLS for everyone.
# (this option advertises the LOGINDISABLED IMAP capability, until STARTTLS
# is issued).

IMAP_TLS_REQUIRED=0

# TLS_PROTOCOL sets the protocol version.  The possible versions are:
#
# SSL2 - SSLv2
# SSL3 - SSLv3
# TLS1 - TLS1

TLS_PROTOCOL=SSL3

# TLS_STARTTLS_PROTOCOL is used instead of TLS_PROTOCOL for the IMAP STARTTLS
# extension, as opposed to IMAP over SSL on port 993.
#

TLS_STARTTLS_PROTOCOL=TLS1

TLS_CIPHER_LIST=!SSLv2:HIGH

This looks like an OpenSSL cipher string, which yields these ciphers:
ADH-AES256-SHA:DHE-RSA-AES256-SHA:DHE-DSS-AES256-SHA:AES256-SHA:ADH-AES128-SHA:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA:AES128-SHA:ADH-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:EDH-DSS-DES-CBC3-SHA:DES-CBC3-SHA

SSLPORT=993
IMAPDSSLSTART=YES
IMAPDSTARTTLS=YES

Courier is listening on both ports 993 and 143.

tb is using 143 since it's using TLS. I had assumed it was using 993 (IMAPS).

If I run openssl's s_client against port 143, I get this:

$ openssl s_client -tls1 -connect [my server]:143 -showcerts -cipher TLSv1
CONNECTED(00000003)
4813:error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number:s3_pkt.c:284:

That looks like a problem. Is it?
Okay, I'm an idiot. I forgot about the STARTTLS. Here goes again:

$ openssl s_client -tls1 -connect [my server]:143 -showcerts -cipher TLSv1 -starttls imap

New, TLSv1/SSLv3, Cipher is AES256-SHA
Server public key is 1024 bit
Compression: zlib compression
Expansion: zlib compression
SSL-Session:
    Protocol  : TLSv1
    Cipher    : AES256-SHA
    Session-ID: [removed]
    Session-ID-ctx: 
    Master-Key: [removed]
    Key-Arg   : None
   Compression: 1 (zlib compression)
    Start Time: 1192222953
    Timeout   : 7200 (sec)
    Verify return code: 18 (self signed certificate)
---
. OK CAPABILITY completed
s_client results don't help us.  They don't tell us how the server
is responding to the versions being sent by TB.

Try ssldump or ssltap
After increasing the number of connections that Courier allows (but leaving tb alone), tb still eventually gives me the "unsupported security protocol" message. :(

I forgot to do a tcpdump or equivalent before restarting tb, so I'll have to wait for it to happen again.
Okay, tb has hosed again. I fired up Wireshark and watched port 143. I then clicked on a random folder, and here is the IMAP conversation:

[server -> tb]
* OK [CAPABILITY IMAP4rev1 UIDPLUS CHILDREN NAMESPACE THREAD=ORDEREDSUBJECT THREAD=REFERENCES SORT QUOTA IDLE ACL ACL2=UNION STARTTLS] Courier-IMAP ready. Copyright 1998-2005 Double Precision, Inc.  See COPYING for distribution information.

[tb -> server]
1 capability

[server -> tb]
* CAPABILITY IMAP4rev1 UIDPLUS CHILDREN NAMESPACE THREAD=ORDEREDSUBJECT THREAD=REFERENCES SORT QUOTA IDLE ACL ACL2=UNION STARTTLS
1 OK CAPABILITY completed

[tb -> server]
2 STARTTLS

[server -> tb]
2 OK Begin SSL/TLS negotiation now.

[tb-server]
0000   16 03 00 00 43 01 00 00 3f 03 00 00 00 32 81 6e  ....C...?....2.n
0010   b9 3e 35 04 7b 8a f5 61 93 16 fd bd 1d cc df 90  .>5.{..a........
0020   25                                               %
[and then:]
0000   97 fb d7 4e e1 b8 01 1a 00 00 18 00 39 00 38 00  ...N........9.8.
0010   35 00 33 00 32 00 04 00 05 00 2f 00 16 00 13 fe  5.3.2...../.....
0020   ff 00                                            ..

[server -> tb]
0000   15 03 00 00 02 02 28

The server then sends FIN/ACK to kill the connection.

Sorry, I don't have ssltap or ssldump installed anywhere. Can someone suggest the best setup? I already have a saved dump of the connection from Wireshark... any way to play that back through one of these tools, instead of re-wiring the connections and waiting for tb to fail, again?

Thanks again for your attention.
I didn't receive the email copies of the previous two comments, and so 
didn't notice them until now.  

I'm somewhat suspicious of the hexadecimal data shown above.  It seems 
to be incomplete.  I suspect that some bytes were lost in the capturing
process.  It's VERY odd that the data from the first record would be 
split into two packets or 33 and 34 bytes, rather than in one packet.
The packet header says the packet is 72 bytes long, but there are only 
67 bytes there.  5 bytes are missing.  I don't want to try to draw 
conclusions from possibly incomplete data.  
To Christopher Schultz(bug opener):

NSPR log with next parameter may help further analysis.
> SET NSPR_LOG_MODULES=imap:5,nsSocketTransport:5,nsHostResolver:5,negotiateauth:5
> http://kb.mozillazine.org/Session_logging_for_mail/news
> http://www.mozilla.org/quality/mailnews/mail-troubleshoot.html#imap
> http://www.mozilla.org/projects/netlib/http/http-debugging.html
> http://www.redhat.com/docs/manuals/enterprise/RHEL-5-manual/Deployment_Guide-en-US/sso-ov.html#sso-config-firefox-troubleshoot

If analysis with other traces such as ethereal is required, timestamp of NSPR log is useful, and when MS Win, combination of DebugView+NSPR_logging is available for this purpose.
  SET NSPR_LOG_FILE=WinDebug
  SET NSPR_LOG_MODULES=...
  (See Bug 86396 Comment #7 for rough procedure)
NSPR_LOG_FILE=WinDebug will pass log data to DebugView, and DebugView will add
timestamp to data and save timestam+data in a file.
This is also convenient to obtain NSPR log while specific operations only.
(In reply to comment #35)
> I'm somewhat suspicious of the hexadecimal data shown above.  It seems 
> to be incomplete.

That might be due to the way I copied the data from wireshark.

> The packet header says the packet is 72 bytes long, but there are only 
> 67 bytes there.  5 bytes are missing.  I don't want to try to draw 
> conclusions from possibly incomplete data.

Here's the entire packet IMAP payload (i.e. no IP or TCP data):

0000   16 03 00 00 43 01 00 00 3f 03 00 00 00 32 81 6e  ....C...?....2.n
0010   b9 3e 35 04 7b 8a f5 61 93 16 fd bd 1d cc df 90  .>5.{..a........
0020   25 0a 46 97 fb d7 4e e1 b8 01 1a 00 00 18 00 39  %.F...N........9
0030   00 38 00 35 00 33 00 32 00 04 00 05 00 2f 00 16  .8.5.3.2...../..
0040   00 13 fe ff 00 0a 01 00                          ........

This appears to contain the full 72 bytes.

Since wireshark "knows" about IMAP, I think it was trying to be helpful by showing the distinct messages and removing things like message separators an stuff -- probably where the 5 bytes got "lost". This time, I have included the entire payload of the TCP packet -- it /was/ one packet, and not split.
Oh, in case it wasn't obvious, I still have the saved capture if more information is needed. (That includes timestamps, WADA.)
The client hello packet shown in comment 37 seems complete and OK.
The server sends back an SSL 3.0 fatal alert with the generic error code
handshake_failure(40).  Maybe the server log can explain what the server
didn't like.  I see no error in the data coming from the SSL client.
=> invalid per comment 39  unless Chris has something to add.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → INVALID
Running 2.0.0.16 on win32/Vista, I am still experiencing lockups. I'm exclusively using IMAP4 and sometimes tb will run all day without incident, but sometimes it will lock up after only a few minutes. This is a laptop which I regularly hibernate and move between networks, and I rarely restart the OS. Restarting tb always fixes the issue.

It's possible that I am experiencing a variety of issues, since these days, I'm not getting the "[server] is not an IMAP server" message. It's possible that I could be experiencing bug #410747 or something similar.

The error could conceivably be in the server, too. I'm courier-imap-ssl, but it's a solid version that has never crashed or anything like that. My server runs on a virtualized OS (VPS), and I have heard some speculation that VPSs suck for mail servers -- I /do/ know that I will often get long wait times on disk I/O (like waiting for 10 seconds to get a directory listing for a directory with only a dozen files in it). As I said, though, a tb restart always solves the issue: no server-side intervention is necessary.
Product: Core → MailNews Core
To close the loop on this, I switched email hosting providers to one that does not use Virtuozzo, and my problems have evaporated. It appears that the server was stalling repeatedly and tb was being affected.

So, this is not really a bug in tb, but the error messages received (such as "not a valid IMAP4 server") are confusing.

Thanks for all the suggestions, everyone.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: