Closed Bug 569324 Opened 15 years ago Closed 14 years ago

Thunderbird complains there is no connection on resume from suspend, when there is a connection

Categories

(Core :: Networking, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: michael, Unassigned)

References

Details

(Whiteboard: [has protocol log][summary comment 18])

Attachments

(1 file, 1 obsolete file)

User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-GB; rv:1.9.2.3) Gecko/20100423 Ubuntu/10.04 (lucid) Firefox/3.6.3
Build Identifier: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.9) Gecko/20100423 Thunderbird/3.0.4

When I resume from suspend, network manager quickly re-establishes a WiFi internet connection. With an active connection, I try to check my mail with Thunderbird, but it complains that it can't connect to the IMAP server. This seems to happen after longer suspend periods because if I try to replicate the behaviour by quickly suspending and resuming (suspend of a few seconds), I get no warning. This could be a duplicate of the reported fixed bug https://bugzilla.mozilla.org/show_bug.cgi?id=473483 although that one was reported against Mac OSX.

Reproducible: Always

Steps to Reproduce:
1. Open Thunderbird, check mail
2. Suspend laptop, let it sleep for a while
3. Resume laptop, wait for connection, go to inbox
4. Thunderbird complains there is no connection
5. Mail can be read by going to some other folder, then back to Inbox

Actual Results: Thunderbird complains there is no connection
Expected Results: Thunderbird checks mail without complaining

Reported on Launchpad: https://bugs.launchpad.net/ubuntu/+source/thunderbird/+bug/584529?comments=all
Anything in Tools->Error Console? If you launch TB from a terminal, is there anything on that terminal's stdout?
Nothing in stdout. In Tools->Error Console I see this, several times:
Error: [Exception... "update.locale file doesn't exist in either the XCurProcD or GreD directories" nsresult: "0x80520012 (NS_ERROR_FILE_NOT_FOUND)" location: "JS frame :: file:///usr/lib/thunderbird-3.0.4/components/nsUpdateService.js :: getLocale :: line 549" data: no]
Source File: file:///usr/lib/thunderbird-3.0.4/components/nsUpdateService.js
Line: 549
Will keep an eye on this log.
I can confirm this bug. It happens on the first click on "receive messages" every time I come back from suspend after "a while", as Michael notes. It happens with POP accounts, not only IMAP. The second click on "receive messages" works fine. Perhaps this "while" is the "check for messages every X minutes" time? No relevant information on stdout or in the error console. I have Mozilla/5.0 (X11; U; Linux i686; es-ES; rv:1.9.1.10) Gecko/20100520 and Thunderbird 3.0.5
Sorry, seems a dup.
It seems no one's working on this one even when it is quite annoying, but at least status should be changed to "confirmed"
(In reply to comment #6)
> It seems no one's working on this one even when it is quite annoying, but at
> least status should be changed to "confirmed"
We can't reproduce it, so something external might be causing the issue.
So maybe some OS pattern? I use Ubuntu 10.04 amd64 with Thunderbird 3.0.6.
I use opensuse 11.3 TB 3.1.2, NetworkManager and nm-applet as frontend
Ubuntu 10.04 amd64 TB 3.0.6 here too.
https://bugzilla.redhat.com/show_bug.cgi?id=510005
I believe Fedora 14 also has this problem. On the other hand, Firefox in Fedora can detect the connection state and go to offline mode, so this seems possible.
I can confirm that this also happens with Fedora 14/TB 3.1.6 using NetworkManager/nm-applet. After resuming from suspend I always get the "Failed to connect to server" pop up for every account the first time I hit "Get Mail" for that account, unless I give it enough time to make an automatic background check (which, by the way, seems to take a *very* long time after resuming - a lot longer than the configured 10 minute check interval, which is why I always need to click "Get Mail" in the first place). As others have noted, it does not happen for short suspends. Maybe someone could at least provide some ideas on how this could be diagnosed further?
An nsSocketTransport:5 log might help.
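For readers who want to capture such a log, here is a minimal sketch of launching the application with NSPR logging switched on through the `NSPR_LOG_MODULES` and `NSPR_LOG_FILE` environment variables. The helper names, the `thunderbird` binary name and the log path are assumptions made for illustration; they do not come from this bug.

```python
import os
import subprocess


def nspr_log_env(modules, log_file, base_env=None):
    """Return a copy of the environment with NSPR logging enabled.

    `modules` is a list of (module_name, level) pairs,
    e.g. [("nsSocketTransport", 5)].
    """
    env = dict(os.environ if base_env is None else base_env)
    # "timestamp" asks NSPR to prefix each log line with a time stamp.
    parts = ["timestamp"] + [f"{name}:{level}" for name, level in modules]
    env["NSPR_LOG_MODULES"] = ",".join(parts)
    env["NSPR_LOG_FILE"] = log_file
    return env


def launch_with_logging(binary="thunderbird", log_file="/tmp/tb-net.log"):
    """Start the (assumed) binary with socket and resolver logging."""
    env = nspr_log_env(
        [("nsSocketTransport", 5), ("nsHostResolver", 5)], log_file
    )
    return subprocess.Popen([binary], env=env)
```

Calling `launch_with_logging()` starts the program; the log file can then be inspected after a suspend and resume cycle.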
Attached file nsSocketTransport:5 log (obsolete)
nsSocketTransport:5 log demonstrating the problem. The log was taken from a TB instance with a single IMAP account.
2010-12-25 21:58:42.339538 TB starts
2010-12-26 00:47:59.055689 Last log entry before suspend
2010-12-26 09:02:30.261338 First log entry after wakeup
2010-12-26 09:02:53.550120 First "Get Mail" click (fails) (?)
2010-12-26 09:02:57.739030 Second "Get Mail" click (successful) (?)
2010-12-26 09:03:08.135790 Third "Get Mail" click (successful) (?)
I don't have any experience of interpreting NSPR logs, so take this with a grain of salt... It seems that almost immediately after wakeup, TB tries to re-establish the connection but the hostname lookup fails (NS_ERROR_UNKNOWN_HOST). This probably makes sense because the network is not yet up at that time. After a while, I click "Get Mail". TB tries to open a new connection, but immediately fails again with NS_ERROR_UNKNOWN_HOST. It appears that this result is cached from the previous attempt. I will add resolver logging too to see if that gives any additional insight.
Another log adding nsHostResolver:5. The behaviour was a little bit different this time in that it actually updated the mailbox before I got a chance to hit "Get Mail" a second time. First time, though, I got the error message as usual. AFAICT it confirms that nsHostResolver is indeed caching the initial lookup failure (timestamp 2010-12-26 15:54:33.688102). It appears that nsHostResolver treats any lookup failure as NS_ERROR_UNKNOWN_HOST which is perhaps a bit too naive. Ideally, it would distinguish between a communication failure and an actual NXDOMAIN condition, and avoid caching the former, or at least use a much shorter cache timeout.
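The suggestion above (tell a communication failure apart from a real NXDOMAIN, and do not cache the former) can be sketched as a small host cache. This is an illustration only, not the nsHostResolver code: the class name, the TTL values, and the choice to treat only `EAI_NONAME` as a definitive answer are assumptions. Which error code a given system returns while the interface is down is exactly the open question in this bug.

```python
import socket
import time

POSITIVE_TTL = 60.0  # seconds; illustrative value
NXDOMAIN_TTL = 60.0  # seconds; illustrative value


class ErrorAwareHostCache:
    """Caches successes and definitive 'no such host' answers only."""

    def __init__(self, lookup=socket.getaddrinfo, clock=time.monotonic):
        self._lookup = lookup
        self._clock = clock
        self._entries = {}  # host -> (expires_at, result or None)

    def resolve(self, host):
        now = self._clock()
        entry = self._entries.get(host)
        if entry is not None and entry[0] > now:
            if entry[1] is None:
                # Negative entry: the host was definitively unknown.
                raise socket.gaierror(socket.EAI_NONAME, "cached: unknown host")
            return entry[1]
        try:
            result = self._lookup(host, None)
        except socket.gaierror as exc:
            if exc.errno == socket.EAI_NONAME:
                # Definitive answer: safe to remember for a while.
                self._entries[host] = (now + NXDOMAIN_TTL, None)
            # Any other code (e.g. EAI_AGAIN) is treated as transient
            # and is deliberately not cached.
            raise
        self._entries[host] = (now + POSITIVE_TTL, result)
        return result
```

With this shape, a lookup that fails because the network is still coming up is retried on the next request instead of being answered from the cache.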
Attachment #499753 - Attachment is obsolete: true
Component: General → Networking
Product: Thunderbird → Core
QA Contact: general → networking
Does the log show anything interesting?
Whiteboard: [has protocol log]
My theory, as Christer also mentioned in comment 16, is that we try to connect immediately after wake-up, but since the cached host name has expired we do a complete name resolution again. The system is not yet up that soon (a few milliseconds after wake-up) to return a result. We should then not cache the unknown-host state. I'm not a Linux developer (I don't see this behavior on a Windows machine, but that may just be related to record expiration times) and I'm also not directly a DNS code maintainer (however, I can do the fix myself). Adding Michal, since he may know better what type of error state we get from getaddrinfo without a net connection up on a Linux system.
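One way to answer the getaddrinfo question empirically is a small probe that reports the symbolic `EAI_*` code of a failed lookup. The sketch below goes through Python's `socket.getaddrinfo`, which wraps the C library call; the function names are hypothetical, and the result will depend on the system and resolver configuration.

```python
import socket

# Map numeric EAI_* codes to their names as exposed by the socket module.
_EAI_NAMES = {
    getattr(socket, name): name
    for name in dir(socket)
    if name.startswith("EAI_")
}


def eai_name(exc):
    """Return the symbolic name for a socket.gaierror's error code."""
    return _EAI_NAMES.get(exc.errno, f"unknown ({exc.errno})")


def probe(host, lookup=socket.getaddrinfo):
    """Try to resolve `host`; return 'resolved' or the EAI_* code name."""
    try:
        lookup(host, None)
    except socket.gaierror as exc:
        return eai_name(exc)
    return "resolved"
```

Running `probe("imap.example.org")` (a placeholder host name) right after resume, before the interface is up, would show which code the resolver actually sees at that point.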
Status: UNCONFIRMED → NEW
Ever confirmed: true
Whiteboard: [has protocol log] → [has protocol log][summary comment 18]
Actually, as I tried to explain, what's going on is this:
1. After wakeup TB immediately tries to reconnect
2. TB calls nsHostResolver to resolve the server name, which immediately gets an error because at this time the network interface is not yet up
3. nsHostResolver treats this error (in fact, any error) as an unknown host, and puts that "fact" in its cache
4. TB recognizes that the request is not initiated by the user and therefore suppresses any UI level error message
5. User comes around and clicks "Get Mail"
6. TB again calls nsHostResolver to resolve the server name, which returns "unknown host" from its cache
7. This time the request IS initiated by the user so an error message is shown
8. Not until the cache entry expires can the user successfully reconnect
Having said that, it seems that this problem is gone in TB 10.0!
/C
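The eight steps can be replayed with a toy model. This is a sketch under stated assumptions, not Thunderbird or nsHostResolver code: the class and function names, the 60 second negative TTL, the host name and the placeholder address are all invented for illustration.

```python
class UnknownHost(Exception):
    pass


class NaiveResolver:
    """Caches every lookup failure as 'unknown host' (steps 3 and 6)."""

    def __init__(self, network_is_up, negative_ttl=60.0):
        self.network_is_up = network_is_up  # callable returning bool
        self.negative_ttl = negative_ttl    # illustrative value
        self._failed_until = {}

    def resolve(self, host, now):
        if self._failed_until.get(host, 0.0) > now:
            raise UnknownHost(host)  # answered from the negative cache
        if not self.network_is_up():
            # Any failure is recorded as "unknown host".
            self._failed_until[host] = now + self.negative_ttl
            raise UnknownHost(host)
        return "192.0.2.1"  # placeholder address


def check_mail(resolver, host, now, user_initiated):
    """Return 'ok', an error string, or None if the error is suppressed."""
    try:
        resolver.resolve(host, now)
    except UnknownHost:
        # Steps 4 and 7: only user-initiated requests surface the error.
        return "Failed to connect to server" if user_initiated else None
    return "ok"


network = {"up": False}
resolver = NaiveResolver(lambda: network["up"])
background = check_mail(resolver, "imap.example.org", 0.0, False)   # steps 1-4
network["up"] = True                                                # network comes up
first_click = check_mail(resolver, "imap.example.org", 20.0, True)  # steps 5-7
later_click = check_mail(resolver, "imap.example.org", 61.0, True)  # step 8
```

In this model the first click fails even though the network is already up, because the answer comes from the cached failure; the later click succeeds once the entry has expired.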
(In reply to Christer Palm from comment #19)
> Having said that, it seems that this problem is gone in TB 10.0!
Thanks Christer. Based on this I close this as WFM.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME