Closed Bug 206107 Opened 21 years ago Closed 16 years ago

network component of Mozilla becomes unresponsive for both browser and mail after certain IMAP operations are performed

Categories

(MailNews Core :: Networking: IMAP, defect)

x86
Linux
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: ebravick, Assigned: Bienvenu)

Details

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4b) Gecko/20030507
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4b) Gecko/20030507

This is perhaps a meta report of many other problems reported with networking
and IMAP.  In looking through the bug reports, there are many that involve
certain aspects of what I am experiencing, but none that I feel is comprehensive
enough to describe the problem.

There appears to be a general networking problem that, specifically, can be
reproduced 100% of the time with an IMAP setup.  When this "network spin"
occurs, all network communication stops, including all browser activity and
mail activity (no other sub-systems have been verified; I don't use them).

I have left the network spin for many hours without restarting, and Mozilla does
not recover.  It cannot make new connections out.  tcpdump confirms no traffic
is being generated.  Mozilla also does not report any of the network spin as
timeouts.  For example, normally when a web page cannot be reached in a
reasonable amount of time, an error is returned.  In these cases, no error is
ever returned.

During network spin, the browser and mail component remain responsive, as long
as no network activity is expected.

I have recently discovered that the "stop" button in mail appears to restore
some ability to connect in the mail component, although browser access still
fails until Mozilla is restarted.

The problem does get worse as you add more email accounts, those email accounts
get more folders, and as time goes on.  Now that I have all of my email accounts
loaded, mail always fails within 5 minutes of use, usually within 30 seconds.

Operations that typically trigger network spin are: moving a subfolder from one
folder to another, "running" newly created email filters, loading large-ish
files, and moving a message into a sub-folder (as the mail reader automatically
attempts to open the next message, it hangs).  Pushing stop allows you to select
a different message to open, but at this point most new network activity will spin.

Note that this problem is NOT simply slowness, although failing to wait for one
network operation to complete before others are started does seem to enhance the
chance of network spin.

Also note that when the email browser has "partial network ability" you can move
individual messages into sub-folders, and make new folders, and read new
messages (as long as you aren't auto advancing to the next message.)  You can
NOT: execute email filters, move subfolders amongst folders, access certain
messages that seem to get "turned off."

One interesting technique for "fixing" the email browser is as follows: you know
the network is spinning when your mail browser says "connecting to (IMAP
server)" but isn't doing anything, or sending any packets.  However, fully
collapsing and expanding the account will initiate a rescan of the subscribed
folders list.  I've discovered that if you collapse everything, kill all the
IMAP processes on the server, expand the accounts (you then get a "server has
disconnected you" error), then click on the inbox to get a re-authentication,
things will start working again until you tickle the problem.

Reproducible: Always

Steps to Reproduce:
1. setup many imap accounts, SSL, show all folders
2. create a main folder
3. move that main folder to a sub-folder

Actual Results:  
network spin

Expected Results:  
move the folder to the main folder

33 mail accounts, all IMAP/SSL, all with subscribe to all folders.  Linux on all
components.

Note that turning on "expunge on exit" and "clean up trash on exit" also causes
Mozilla to be unable to gracefully shut down if you have tickled network spin.

Also note that I did NOT have this problem with 15 accounts, it started
happening after I added 18 more accounts for a total of 33.

I am actively testing new combinations of this bug, as it is very critical to
me.  I will continue to update this until there is a resolution (either a bug
fix or I find a work around.)
Interestingly enough, I discovered that the appearance of this problem is
significantly reduced when the browser and IMAP servers are separated by a
saturated link.  Mail works a lot slower (of course) but it appears to work with
fewer errors.  I'm going to try more test cases today to attempt to narrow the
scope of this problem.
Can you generate a client-side protocol log of this happening?

http://www.mozilla.org/quality/mailnews/mail-troubleshoot.html#imap

so we can see what's going on from the client-side?

Also, do you have Mozilla set up to check all folders for new mail?

I don't know why the browser has trouble once IMAP gets horked. Is it dns
lookups that are horked? Or just any network activity? I know Darin has fixed
bugs like this not too long ago, but the fixes are all checked in.
http://www.mozilla.org/quality/mailnews/mail-troubleshoot.html#imap

I followed these directions, what a great debug feature!  This data will take me
a bit of time to pare down, since just opening my email generates a 14MB log
file!!!  I'll work on getting something relevant to the problem.

Yes, I do have mozilla check all inboxes.  Should I turn that off?

I was only checking web traffic.  I will reproduce, and check for DNS traffic.

I would try turning off check all mail folders and see what effect that has, if any.
After lockup, I see no DNS or HTTP activity.  I should note, however, that I am
using a proxy...  so what I am *really* seeing is the halting of all proxy activity.

Removed all check on open and periodic checks, will test again.
OK, removing all automatic email checks helped a lot.  I've been using mail now
 for 22 minutes without a network halt, and I'm writing this in the browser
while mail is processing in another window.  I'll hit it with as much hard stuff
as I can over the next few hours and see if it breaks again.

I'll also try to get some protocol logs shaved down, since there is definitely
still a problem when you start up large numbers of connections all at once.
This is where I am at now:  a day later, and I've only seen one failure
operating with all of the automatic checks off.  This bug seems to predominantly
show when one mail browser is required to make many (like hundreds in my case)
IMAP connections at the same time.

At this point, I have what I consider an acceptable work around.  I will offer 2
paths, based on what the developers want to do.

1) I will close this case and assume that at some point in the future this will
just be worked out by improvements in the network code.

2) I will continue to debug, and update this ticket, if there is an interest
from the development side of a real world user who uses Mozilla in this way.  I
understand that my needs exceed most users, so I don't want to waste time if
everyone agrees that there are more pressing matters to attend to.  If I do
continue, I'll send protocol log snippets each time I see the failure, and I'll
try to figure out how many connections it takes to break it.

Comments?

Thanks.
Eric, what kind of imap server are you using, and is it running on the same
machine as the client?

The way the check all folders for new mail feature is implemented is very
stressful to the imap connection code - if you have 100 folders, it just blasts
off 100 requests to open folders simultaneously and the imap connection code
spins up 5 connections to handle the first five and queues the other 95
requests, and runs them one by one as each previous request is finished. I
believe the check for new mail feature should chain the urls instead of blasting
them out all at once. However, as you say, the imap connection code needs to get
better to handle this. There are two problems in this code:

1. It doesn't quite prevent simultaneous connections to the same folder, which
is required for some UW IMAP server configurations. I have a partial fix for
this tree, but it's risky for 1.4

2. There's a race condition in the url chaining code such that the queue can get
stalled. This has turned out to be hard to fix, but it's definitely something I
want to fix.

Neither of these explains why all of networking would stop - I think there might
be a third thing going on.
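
As an illustration of the dispatch pattern described above, here is a toy
model in Python. It is not the Mozilla code: the function name `simulate` and
its return shape are hypothetical, and the figures (100 folders, 5 connections)
are simply the ones quoted in the comment. It compares issuing every folder
request at once against chaining them over a single connection.

```python
from collections import deque


def simulate(folders, max_connections):
    """Toy model: returns (peak_in_flight, initially_queued, completion_order)."""
    pending = deque(folders)
    in_flight = deque()
    # Issue as many requests as the connection cap allows; the rest stay queued.
    while pending and len(in_flight) < max_connections:
        in_flight.append(pending.popleft())
    initially_queued = len(pending)
    peak = len(in_flight)
    done = []
    # Each completed request frees its connection for the next queued request.
    while in_flight:
        done.append(in_flight.popleft())
        if pending:
            in_flight.append(pending.popleft())
        peak = max(peak, len(in_flight))
    return peak, initially_queued, done


folders = [f"folder{i}" for i in range(100)]
blast = simulate(folders, max_connections=5)  # "check all folders" today
chain = simulate(folders, max_connections=1)  # proposed chaining
print(blast[0], blast[1])  # 5 95
print(chain[0], chain[1])  # 1 99
```

Both variants finish all 100 requests in this model. The difference is only in
how many connections are held at once, and in how much depends on the hand-off
step inside the second loop, which is where a stalled queue would show up.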
My machine is only a workstation, all IMAP servers are on different machines. 
Stats are roughly:

1 client
3 servers in different data centers (in a 3,3,27 account split)
33 total accounts
All use IMAP/SSL
961 email folders across all accounts
Linux client
All servers are Linux and uw-imapd 2001a (Full version string is
"2001adebian-6", which is UW imap with SSL and maildir support.)

I do notice the large number of IMAP connections to the server with 27 accounts
on it...  but I've had more than that on this server before with other clients
and users, so it *should* be OK.  (although, there may be some interaction here.)

I can tell you that the browser definitely stops all network communications
after the network halting is caused by the email client.  I've never been able
to get the browser itself to do this, although I haven't really tried...  maybe
I should?  I could try a big tab set and see if I can cause it to die in a
similar fashion.
I should also note that I have around 150 filters configured across all those
accounts also...  not sure if that would also cause an increase in the number of
IMAP connections that get spun up -- I suspect it should.
necko allows at most 50 active sockets (it'll queue up requests for additional
sockets).  HTTP has a smaller limit on the number of persistent connections it'll
keep around so that this ceiling is never reached except in cases where
additional sockets are temporarily opened.  FTP keeps at most 8 idle control
connections.  Perhaps we are hitting (or attempting to exceed) this limit with
IMAP.  Perhaps IMAP needs to be careful not to try to keep an excessive number
of connections open at one time.  Anyway, this is a much bigger problem for the
whole project really because with each module individually keeping some number
less than 50 idle sockets, they could all easily add up to something more than
50.  I'm not sure how to best solve this problem...
The Mozilla client will/should only ever open 5 imap connections per server.
However, by server, I mean account :-) - since you have multiple accounts
pointing at the same server, we will end up with lots of open connections.
Unfortunately, our connection limiting code is per-account and we'd have to
build a whole nother layer of code to limit the number of total connections. It
could be done, but now that you describe your setup, it seems pretty far out of
the mainstream.

If the check all folders for new mail feature re-used connections and chained
requests, we'd use far fewer connections. We could also fix it to use the
STATUS command, which doesn't require a new connection.
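
A minimal sketch of the STATUS idea, using Python's standard `imaplib` rather
than Mozilla's IMAP code: one already-authenticated connection is asked for the
message counts of each folder in turn, without selecting any folder. The helper
names `parse_status` and `check_folders` are hypothetical, and the parsing
assumes the server answers with simple `NAME number` pairs.

```python
import imaplib
import re

# Matches the trailing "(MESSAGES 3 UNSEEN 1)" part of a STATUS response.
STATUS_RE = re.compile(rb"\(([^()]*)\)\s*$")


def parse_status(line):
    """Turn b'"INBOX" (MESSAGES 3 UNSEEN 1)' into {'MESSAGES': 3, 'UNSEEN': 1}."""
    match = STATUS_RE.search(line)
    if not match:
        return {}
    tokens = match.group(1).decode("ascii").split()
    return {name: int(value) for name, value in zip(tokens[::2], tokens[1::2])}


def check_folders(conn, folders):
    """Issue one STATUS per folder, sequentially, over the single connection `conn`.

    `conn` is expected to behave like an authenticated imaplib.IMAP4 object.
    Folder names containing spaces must already be quoted by the caller.
    """
    results = {}
    for folder in folders:
        typ, data = conn.status(folder, "(MESSAGES UNSEEN)")
        if typ == "OK" and data and data[0]:
            results[folder] = parse_status(data[0])
    return results


# Usage (host and credentials are placeholders, not from this bug):
#   conn = imaplib.IMAP4_SSL("imap.example.org")
#   conn.login("user", "password")
#   print(check_folders(conn, ["INBOX", "Sent"]))
```

Because the requests run one after another on the same connection, the number
of folders no longer affects the number of sockets in use.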
Theoretically, if I turn back on all the automatic email checks, I'm going to
easily push the browser into a state where it will *always* need more than 50
sockets open.  (at least, judging by the behavior that I am seeing when I watch
the email servers...)  It could be that I'm just grabbing all the available
sockets, and so any connections that I make after that are sitting in a
perpetual queue?  Since none of the old connections can be reaped, that 51st - n
queued connection would just sit forever...???
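
The starvation hypothesis above can be modelled in a few lines of Python. This
is only a sketch of the reasoning, not a description of how necko actually
behaves: the `SocketPool` class is hypothetical, the limits are the figures
quoted in this thread, and the worst case assumes every account opens all 5 of
its connections and never gives one back.

```python
GLOBAL_SOCKET_LIMIT = 50     # necko figure quoted in this thread
CONNECTIONS_PER_ACCOUNT = 5  # per-account IMAP cap quoted in this thread
ACCOUNTS = 33                # the reporter's setup


class SocketPool:
    """Hypothetical model of a global socket cap with a FIFO wait queue."""

    def __init__(self, limit):
        self.limit = limit
        self.active = 0
        self.waiting = []

    def request(self, owner):
        """Grant a socket if one is free; otherwise queue the request."""
        if self.active < self.limit:
            self.active += 1
            return True
        self.waiting.append(owner)
        return False

    def release(self):
        """Free one socket and hand it to the oldest waiter, if any."""
        self.active -= 1
        if self.waiting:
            self.waiting.pop(0)
            self.active += 1


pool = SocketPool(GLOBAL_SOCKET_LIMIT)
for account in range(ACCOUNTS):
    for _ in range(CONNECTIONS_PER_ACCOUNT):
        pool.request(f"imap-account-{account}")  # held open, never released

browser_granted = pool.request("http")
print(pool.active, len(pool.waiting), browser_granted)  # 50 116 False
```

In this model 33 x 5 = 165 IMAP requests compete for 50 sockets, so 115 of them
queue, and a later browser request lands behind all of them. Nothing is ever
released here, so the queue never moves, which matches the "sit forever"
behaviour being guessed at. Whether the real code fails this way is not
established in this bug.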
I have also discovered that I can reliably reproduce this error in the new
configurations (e.g. all the automatic connection making stuff turned off) by
running down my list of email accounts and checking the inbox on each one.  No
matter what order I check them in, it almost always locks on the 28th account
that I check...  as long as I check them "in reasonable succession."

In any event, since I am a corner case, should I close this ticket?

Thanks.
Please try this again with Mozilla 1.7 RC1 or later. Bug 240759 probably fixed this.
or, you can try a 1.8 trunk build, which chains the status commands, which will
result in a single connection being used to check all the folders...
Product: MailNews → Core
Blocks: 273112
One imap account is enough to get the problem.
This is an automated message, with ID "auto-resolve01".

This bug has had no comments for a long time. Statistically, we have found that
bug reports that have not been confirmed by a second user after three months are
highly unlikely to be the source of a fix to the code.

While your input is very important to us, our resources are limited and so we
are asking for your help in focussing our efforts. If you can still reproduce
this problem in the latest version of the product (see below for how to obtain a
copy) or, for feature requests, if it's not present in the latest version and
you still believe we should implement it, please visit the URL of this bug
(given at the top of this mail) and add a comment to that effect, giving more
reproduction information if you have it.

If it is not a problem any longer, you need take no action. If this bug is not
changed in any way in the next two weeks, it will be automatically resolved.
Thank you for your help in this matter.

The latest beta releases can be obtained from:
Firefox:     http://www.mozilla.org/projects/firefox/
Thunderbird: http://www.mozilla.org/products/thunderbird/releases/1.5beta1.html
Seamonkey:   http://www.mozilla.org/projects/seamonkey/
I have not been able to reproduce this so far under the latest Thunderbird
stable release, but I will convert the balance of my email accounts to
Thunderbird in the next month and report back if, using the same number of
accounts, I can reproduce.  If I cannot reproduce this, I will report.  Thanks.
WFM per reporter
Status: UNCONFIRMED → RESOLVED
Closed: 16 years ago
Resolution: --- → WORKSFORME
Product: Core → MailNews Core