TB hangs on network change

RESOLVED INVALID

Status

RESOLVED INVALID
4 years ago
2 years ago

People

(Reporter: jhaar, Unassigned)

Tracking

x86_64
Linux

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [closeme 2016-10-01])

(Reporter)

Description

4 years ago
User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:36.0) Gecko/20100101 Firefox/36.0
Build ID: 20150306140254

Steps to reproduce:

I run Ubuntu-14.04 with TB-31.5 (the standard package) on a laptop, with two imap accounts. At home and at work we have "split mode" DNS: i.e. inside each network, "imap.domain.name" resolves to a 192.168.X or 10.X address respectively, and on the Internet they resolve to different Internet addresses

So when at home, one imap account resolves to my home 192.168 imap server, while my work account resolves to its Internet address. Then I sleep my laptop, go to work, open the lid and connect to the work network, and then DNS resolves my home imap to an Internet address and my work imap to a 10 address



Actual results:

What happens is that TB consistently hangs for up to 5 minutes. "lsof -ni|grep thunderbird" shows TB has kept IMAP connections open to the wrong addresses - i.e. it hasn't realised there's been a network change and so is still trying to connect to my home 192.168 address from work - where it should be using the Internet address. Similarly, it is still trying to exchange packets with my work Internet address instead of its 10 address

Using nslookup I can see my local DNS resolution is correct - but TB has not actually re-done a DNS lookup. Either it's running from its own cache, or it is insisting on continuing to use these old, out-of-date ESTABLISHED connections

Just a FYI, but "lsof -ni" shows TB has ESTABLISHED connections from its old, non-existent 192.168 address even when it's on the 10 network - and doesn't have that address. That will be the "fault" of the Linux OS - not TB - but it is what it is


Expected results:

I think TB should keep an eye on hanging IMAP (POP too) sessions, and when it decides they aren't working, tear them down and re-initialize - including DNS lookups. Maybe it already does that, in which case all I have a problem with is that it takes 5+ minutes instead of (say) 30 seconds?
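The kind of check I have in mind could be sketched roughly like this in Python. This is only an illustration of the idea, not how TB works internally: the function names `resolve_ipv4` and `should_reconnect` are made up, and returning False when the lookup fails is an assumption (it leaves the decision to an ordinary timeout).

```python
import socket

def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses the OS resolver currently gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    return {info[4][0] for info in infos}

def should_reconnect(hostname, connected_ip, resolver=resolve_ipv4):
    """True if the address we are connected to is no longer one the name resolves to.

    `resolver` is injectable so the check can be exercised without real DNS.
    """
    try:
        current = resolver(hostname)
    except socket.gaierror:
        # Assumption: if the name cannot be resolved right now, do not force a
        # reconnect here; leave that to a normal timeout.
        return False
    return connected_ip not in current

# Example with a fake resolver: at work, the home imap name now resolves to an
# Internet address, so a connection to the old 192.168 address is stale.
def at_work(_hostname):
    return {"21.114.246.214"}

print(should_reconnect("imap.domain.name", "192.168.8.3", resolver=at_work))
```

Run after a resume or a network-change event, a check like this would flag the old connection straight away instead of waiting for TCP to give up.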

Here's what lsof shows 10+ minutes after TB was un-slept on my work network. You will see even now it still has "open" connections from its old 192.168.8.11 address (which isn't assigned to any network card). They are now in a CLOSE_WAIT state and TB is now working correctly - but they shouldn't be there...

thunderbi 10464           jhaar   43u  IPv4  3573819      0t0  TCP 192.168.8.11:54112->12.3.11.5:pop3s (CLOSE_WAIT)
thunderbi 10464           jhaar   44u  IPv4  3573820      0t0  TCP 192.168.8.11:54113->12.3.11.5:pop3s (CLOSE_WAIT)
thunderbi 10464           jhaar   62u  IPv4  3569181      0t0  TCP 192.168.8.11:54114->12.3.11.5:pop3s (CLOSE_WAIT)
thunderbi 10464           jhaar   64u  IPv4   104193      0t0  TCP 192.168.8.11:48894->192.168.8.3:imaps (CLOSE_WAIT)
thunderbi 10464           jhaar   68u  IPv4   121185      0t0  TCP 192.168.8.11:49022->192.168.8.3:imaps (CLOSE_WAIT)
thunderbi 10464           jhaar   69u  IPv4  3912428      0t0  TCP 192.168.8.11:48080->198.84.60.198:http (CLOSE_WAIT)
thunderbi 10464           jhaar   77u  IPv4   121353      0t0  TCP 192.168.8.11:49026->192.168.8.3:imaps (CLOSE_WAIT)
thunderbi 10464           jhaar   78u  IPv4  3613703      0t0  TCP 192.168.8.11:54404->12.3.11.5:pop3s (CLOSE_WAIT)
thunderbi 10464           jhaar   82u  IPv4  3911672      0t0  UDP 127.0.0.1:49500->127.0.1.1:domain 
thunderbi 10464           jhaar   84u  IPv4  2974525      0t0  TCP 192.168.8.11:41317->192.168.8.3:imaps (CLOSE_WAIT)
thunderbi 10464           jhaar   92u  IPv4 27850665      0t0  TCP 10.8.2.21:46182->10.8.254.3:pop3s (ESTABLISHED)
thunderbi 10464           jhaar  104u  IPv4  3913513      0t0  TCP 192.168.8.11:34083->65.54.226.151:http (CLOSE_WAIT)
thunderbi 10464           jhaar  117u  IPv4  3913517      0t0  TCP 192.168.8.11:37632->92.52.96.89:http (CLOSE_WAIT)
thunderbi 10464           jhaar  118u  IPv4  3914187      0t0  TCP 192.168.8.11:39166->107.6.106.82:http (CLOSE_WAIT)
thunderbi 10464           jhaar  130u  IPv4  3914219      0t0  UDP 127.0.0.1:45928->127.0.1.1:domain 
thunderbi 10464           jhaar  134u  IPv4 27850668      0t0  TCP 172.16.12.4:51474->21.114.246.214:imaps (ESTABLISHED)
thunderbi 10464           jhaar  148u  IPv4 27854245      0t0  TCP 10.8.2.21:46204->10.8.254.3:pop3s (ESTABLISHED)
thunderbi 10464           jhaar  162u  IPv4 27851252      0t0  TCP 10.8.2.21:46175->10.8.254.3:pop3s (ESTABLISHED)
thunderbi 10464           jhaar  171u  IPv4 27940241      0t0  TCP 172.16.12.4:52194->21.114.246.214:imaps (ESTABLISHED)
thunderbi 10464           jhaar  173u  IPv4 27856007      0t0  TCP 10.8.2.21:46205->10.8.254.3:pop3s (ESTABLISHED)
thunderbi 10464           jhaar  179u  IPv4 27848342      0t0  TCP 10.8.2.21:46152->10.8.254.3:pop3s (ESTABLISHED)
thunderbi 10464           jhaar  183u  IPv4 27939607      0t0  TCP 172.16.12.4:52195->21.114.246.214:imaps (ESTABLISHED)
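
The stale entries in a listing like this can be picked out mechanically. A minimal Python sketch, assuming `lsof -ni` lines in the format shown above (IPv4 only); the name `stale_connections` and the idea of passing in the set of currently assigned local addresses are illustrative choices, not something lsof or TB provides:

```python
import re

# Matches the "local:port->remote:port (STATE)" tail of a TCP line from `lsof -ni`.
CONN_RE = re.compile(r"TCP (\S+):(\d+)->(\S+):(\S+) \((\w+)\)")

def stale_connections(lsof_lines, assigned_addrs):
    """Return (local_ip, remote, state) for TCP connections whose local
    address is not in assigned_addrs (addresses currently on an interface)."""
    stale = []
    for line in lsof_lines:
        m = CONN_RE.search(line)
        if m is None:
            continue  # UDP lines and anything else without a state are skipped
        local_ip, _local_port, remote_ip, remote_port, state = m.groups()
        if local_ip not in assigned_addrs:
            stale.append((local_ip, f"{remote_ip}:{remote_port}", state))
    return stale

# Three lines taken from the listing above (whitespace collapsed).
sample = [
    "thunderbi 10464 jhaar 64u IPv4 104193 0t0 TCP 192.168.8.11:48894->192.168.8.3:imaps (CLOSE_WAIT)",
    "thunderbi 10464 jhaar 82u IPv4 3911672 0t0 UDP 127.0.0.1:49500->127.0.1.1:domain",
    "thunderbi 10464 jhaar 92u IPv4 27850665 0t0 TCP 10.8.2.21:46182->10.8.254.3:pop3s (ESTABLISHED)",
]

for entry in stale_connections(sample, {"10.8.2.21", "172.16.12.4"}):
    print(entry)
```

With the two addresses the laptop actually has at work passed in, only the connection from the old 192.168.8.11 address is reported.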

Updated

4 years ago
Component: Untriaged → Networking
Product: Thunderbird → MailNews Core
CLOSE_WAIT is a normal state here, because from this PC's point of view the server suddenly disappeared. The TCP timeout is approximately 10 minutes.

Tb has a problem around DNS caching, so if the server's IP address suddenly changes on your side, Tb can't follow quickly.
From the perspective of a client on a PC, a sudden server IP address change looks like the server suddenly going down. It takes a long time to go through error detection, error recovery, cleanup after a permanent error, retry from scratch, and so on.

Do you still see your problem with the following procedure?
1. Before the network change, go to "Work Offline" mode in Tb.
2. Change networks.
3. When the new network is usable, go to "Work Online" mode and do some network access, such as opening an imap folder.

A reason why it takes a long time to switch to the new network environment:
   If IDLE is used, the cached imap connection for Inbox goes into a receive state after IDLE.
   Tb has a problem with connection loss while idling: it does nothing when that happens.
   A network change while Tb is running forces exactly this "connection loss while idling" on the cached connection used for Inbox.
   Because the idle timeout is 29 minutes, the next DONE/IDLE cycle is initiated only after about 30 minutes.
"Work Offline" followed by "Work Online" forces Tb to log out and close its connections. Because the connection is closed normally, the next access to the server starts normally. That is the trick.
(Reporter)

Comment 2

4 years ago
Sorry I failed to mention it before but I already tested with offline/online and that definitely helps. My real concern is that kind of trick is OK for me and you - but our fathers couldn't do it :-)

I've been seeing this problem for 10 years with Firefox and Thunderbird - I work around it myself - but think it's enough of a problem for "normal" people that TB should do something extra to reduce the impact

Your comment about IMAP IDLE makes a lot of sense. However, I have mail.server.default.use_idle=true and yet within Advanced settings have "Use IDLE command if the server supports it" unchecked... Which one wins? It sounds like IDLE isn't used - in which case your comments about IDLE cannot be happening?

Also, about:config shows network.tcp.keepalive.idle_time=600 - which is the 10 minutes I'm seeing before it starts clearing up. If I reduced that to 300, should that speed up reconnections too?

Thanks!
(In reply to Jason Haar from comment #2)
> However, I have mail.server.default.use_idle=true
> and yet within Advanced settings have "Use IDLE command if the server supports it" unchecked...
> Which one wins?
The Advanced setting is saved as mail.server.server#.use_idle=false.
mail.server.default.use_idle is the default used when mail.server.server#.use_idle is not defined, and/or the default upon imap account creation.
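
The precedence described here can be sketched as a simple lookup with a fallback. This is a Python illustration only, not Tb's actual preference code: the function name `use_idle`, the plain dict standing in for about:config, the account key "server1", and the final fallback to False when neither pref exists are all assumptions.

```python
def use_idle(prefs, server_key):
    """Look up use_idle for one account, falling back to the default branch."""
    specific = f"mail.server.{server_key}.use_idle"
    if specific in prefs:
        # The per-account value (what the Advanced settings checkbox writes) wins.
        return prefs[specific]
    # Otherwise fall back to the default branch; False if that is missing too (assumed).
    return prefs.get("mail.server.default.use_idle", False)

prefs = {
    "mail.server.default.use_idle": True,
    "mail.server.server1.use_idle": False,  # hypothetical account key
}

print(use_idle(prefs, "server1"))  # per-account value is used
print(use_idle(prefs, "server2"))  # not defined, so the default applies
```

So with the checkbox unchecked for an account, IDLE is off for that account even though the default pref says true.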

TCP keepalive : http://en.wikipedia.org/wiki/Keepalive
I also think a small TCP keepalive is useful for quick network error detection, especially in a WiFi environment.
If the network and connection are reliable, a small TCP keepalive is merely an annoyance for the server. The default in TCP as a transmission protocol is 2 hours.
A small imap idle timeout is another way.
A short idle timeout (DONE/IDLE cycle on a cached imap connection) is better than TCP keepalive, because the DONE/IDLE cycle is done on only 5 connections in Tb, whereas TCP keepalive applies to every TCP session.
A short idle timeout (DONE/IDLE cycle on a cached imap connection) is also better for network error detection than a small new mail check interval across many folders.
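
For reference, this is roughly what a shorter keepalive looks like at the socket level. A Python sketch only, not how Tb applies network.tcp.keepalive.idle_time internally: the function name `enable_keepalive` is made up, the 300 seconds is the value suggested in comment #2, and the interval and probe count are arbitrary example values. The TCP_KEEP* options are platform-specific (they exist on Linux), so each one is guarded.

```python
import socket

def enable_keepalive(sock, idle_seconds=300, interval_seconds=30, probes=3):
    """Turn on TCP keepalive for sock and, where supported, shorten its timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idleness before the first keepalive probe is sent.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
    # Seconds between unanswered probes.
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_seconds)
    # Unanswered probes before the connection is reported as dead.
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock

sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
sock.close()
```

With these example values a dead peer would be noticed after about idle_seconds + interval_seconds * probes, at the cost of extra probe traffic on every connection that uses them.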

Another simple way:
   Before going home from the office, terminate Tb, then suspend the PC.
   At home, after resuming the PC, restart Tb.
   Before going to the office from home, terminate Tb, then suspend the PC.
   At the office, after resuming the PC, restart Tb.

The reason the problem occurs only with imap in Tb is that an imap connection is kept established at all times.
With pop3, smtp or nntp, the connection is closed after each access and established again upon the next access.
So forcing the imap connection closed somewhere is needed for quick, reliable recovery from a network error.
Forcing the imap connection closed is done by terminating Tb or by going to "Work Offline".

Comment 4

2 years ago
Jason, 
Much has changed in two years.
Do you still see this problem when using a current version (45)?

In a rather generic bug query https://mzl.la/2cIkgBW I don't see anything that definitely sounds like your issue (except yours of course)

I do not use offline/online and I do not have any problems moving across networks. But I am not using linux.
Flags: needinfo?(jhaar)
Whiteboard: [closeme 2016-10-01]
(Reporter)

Comment 5

2 years ago
Yeah, this was 18 months ago, so everything's changed. I moved off Ubuntu onto Fedora for starters, along with several OS updates and TB doesn't show this issue any more

Let's close this one :-)
Status: UNCONFIRMED → RESOLVED
Last Resolved: 2 years ago
Flags: needinfo?(jhaar)
Resolution: --- → INVALID