Closed Bug 473483 Opened 15 years ago Closed 15 years ago

unable to re-connect to mail server after laptop awakes from sleep state

Categories

(Thunderbird :: General, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 3.0b3

People

(Reporter: endico, Assigned: Bienvenu)

Details

(Whiteboard: [no l10n impact])

Attachments

(2 files)

While I was away on vacation I put the laptop to sleep a lot, and when it woke up again Thunderbird was often unable to re-connect to the mail server and claimed the network connection was down. The only way to use mail again was to restart Thunderbird. Firefox had no such problem, so the network was not actually down.

I'm using Thunderbird 3.0b1 on a new MacBook (2GHz Intel Core 2 Duo) that runs Mac OS X 10.5.6. I put the machine to sleep by closing the lid.

Davida says he sees this a lot too. I'm not seeing it any more because now that I'm home I don't put the machine to sleep.
It would be interesting to know whether switching to offline and back to online in TB makes it work again.
(In reply to comment #1)
> would be interesting if switching to offline/online in tb makes it working
> again

It does, in my testing on OS X.
It'd be really good to get this in b2.  Assigning to me so that I find someone who wants to take it. If you want to take it, take it with my thanks!
Assignee: nobody → david.ascher
Flags: blocking-thunderbird3+
Target Milestone: --- → Thunderbird 3.0b2
(In reply to comment #0)
> While I was away on vacation I put the laptop to sleep a lot and when it woke
> up again, thunderbird was often unable to re-connect to the mail server and
> claimed the network connection was down. The only way to use mail again was to
> restart Thunderbird. Firefox had no such problem so the network was not
> actually down.

Question: How long did you wait and are you using imap?

David Bienvenu has mentioned to me previously that imap could have a 100 second timeout on connections.

Obviously we'd prefer to detect the sleep mode, but if we are waiting until we time out then it would be good to know.
I just did a quick look in mxr. There are two notifications available on the observer service - sleep_notification and wake_notification.

http://mxr.mozilla.org/comm-central/search?string=sleep_notification

Looks like it's supported on Windows & Mac. I wonder if we should be doing things like stopping the autosync service whilst we're sleeping? Anyone know if this happened with TB 2?
(In reply to comment #4)

> Question: How long did you wait and are you using imap?
> 
> David Bienvenu has mentioned to me previously that imap could have a 
> 100 second timeout on connections.

Yes, I'm using IMAP, but I don't remember how long after awakening these problems occurred. My guess is longer than a hundred seconds, but I'm not certain.
I don't think this is about sleep, but about network unavailability -- I've had the same problem when disabling wifi and plugging in a wire.  IIRC, we used to go into offline mode when the network was disabled, but that doesn't seem to happen anymore.
I suspect this is a dupe of 474345
Dawn, in my experience this is FIXED at this point.  Can you test w/ a nightly and see if you still see the problem?
Whiteboard: fixed?
I am still having this problem with the latest nightly. Since the onset of this new issue I also have an issue with TB going runaway right after waking from sleep, pegging both cores. I am going to add my sample dump from the runaway process. Quitting and restarting the app returns normal function.
Access to all the mail already pulled down locally (cached) is ok but there is no network functionality. Even checking for updates fails with "Update XML file malformed (200)". That seems somewhat odd, because having no network connection normally causes a totally different error.
In order to look into this further, after a wake cycle I deleted all of my Little Snitch rules for shredder to see if it's actually trying anything. Short answer: it's not. LS is showing no connection requests. Again, normal function returns when the app is restarted.
Whiteboard: fixed? → fixed? [no l10n impact]
Hi,

OK, so the build prior to the one for 20080203025550 worked correctly when waking from sleep. It both sent and received email. I'll keep following this issue on the nightlies for about a week or so.
I think David (Bienvenu) was going to take a look at this.

For investigating the notifications, there are a couple of places to hook into (in no particular order):

http://mxr.mozilla.org/comm-central/source/mozilla/netwerk/system/mac/nsNetworkLinkService.mm

nsNetworkLinkService::UpdateReachability() - this function is the one that gets called when something changes. The first if statement handles the case where we're not sure whether the Mac is online; the second part of the function handles the case where we know that we are (or are not), which is the one we should normally be hitting.

On link status change, the service above sends out a network:link-status-changed notification via the observer service with up or down as the options; the IOService picks this up and sends out a network:offline-status-changed notification.

We then set our UI indicators via that notification: network:offline-status-changed.
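
As a rough illustration of the chain described above, here is a minimal Python sketch. It is a toy model, not the Gecko code: the class names ToyObserverService, ToyIOService and ToyStatusIcon are hypothetical, and only the two topic strings and the up/down data values come from this comment.

```python
from collections import defaultdict


class ToyObserverService:
    """Hypothetical stand-in for the observer service: synchronous, in-order delivery."""

    def __init__(self):
        self._observers = defaultdict(list)

    def add_observer(self, topic, callback):
        self._observers[topic].append(callback)

    def notify_observers(self, topic, data):
        # Copy the list so observers may register others while we iterate.
        for callback in list(self._observers[topic]):
            callback(topic, data)


class ToyIOService:
    """Hypothetical stand-in for the IOService: turns link changes into offline changes."""

    def __init__(self, observer_service):
        self.offline = False
        self._obs = observer_service
        observer_service.add_observer("network:link-status-changed", self.observe)

    def observe(self, topic, data):
        offline = (data == "down")
        if offline != self.offline:
            self.offline = offline
            self._obs.notify_observers(
                "network:offline-status-changed",
                "offline" if offline else "online",
            )


class ToyStatusIcon:
    """Hypothetical stand-in for the UI indicator driven by the second notification."""

    def __init__(self, observer_service):
        self.state = "online"
        observer_service.add_observer("network:offline-status-changed", self.observe)

    def observe(self, topic, data):
        self.state = data


obs = ToyObserverService()
io_service = ToyIOService(obs)
icon = ToyStatusIcon(obs)

# The link service reports that the link went down.
obs.notify_observers("network:link-status-changed", "down")
```

In this model the icon only ever learns the state from the second notification, so if those notifications arrive in an unexpected order the icon and the service can disagree.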

Note that the toggle function for online/offline is handled a few lines above (http://mxr.mozilla.org/comm-central/source/mail/base/content/mail-offline.js#80); it uses the IOService to detect the online state. I'm sure Phil has said that he's seen the offline icon but, on clicking it, TB asked to go offline, which doesn't really make sense unless the notifications are out of sync with the IOService (which generated them).
Assignee: david.ascher → bienvenu
Whiteboard: fixed? [no l10n impact] → [no l10n impact]
Bad news,

So I woke from sleep again after the last nightly (was the previous working build a fluke?) and it's not working again. Doing an intentional sleep cycle to test, the offline indicator (bulb) in the corner indicates an offline state; however, I went to the menu and turned off 'work offline' and there was no change in connectivity. The offline icon still is, well, offline and there's no network function.
(In reply to comment #16)
> Doing an intentional sleep cycle to test,
> the offline indicator (bulb) in the corner indicates an offline state, however
> I to the menu and turned off 'work offline' and there is no change in
> connectivity. The offline icon still is, well, offline and there's no network
> function.

If that happens again, please could you try selecting to work offline again (or just click on the icon again), i.e. doing it twice - I believe that something may be getting out of sync so TB indicates it is in the other state to what it actually is.
Mark:

So after waking from sleep, the online status icon showed offline, and I had no activity. The menu setting did not match (it was not in 'work offline'); however, I selected it anyway and LS immediately popped up asking for access from shredder. I'll keep monitoring. But it looks like it's not reading the online state correctly, or not flipping back to online.
Here are the notifications we get in the case where things go wrong:


went to sleep:
offline status changed to offline
offline status changed to online

woke up:
Begin mail message delivery.
End mail message delivery.
offline status changed to online
offline status changed to offline

note that the notifier or its caller seems to be confused about what state we were in, or are in (more likely the latter) - we shouldn't get notified when the state hasn't changed...
dcamp might be interested in the above...I'll poke around in the code a bit more.
WARNING: NS_ENSURE_TRUE(identity) failed: file /Users/davidbienvenu/tbirdhg/mailnews/base/util/nsMsgIncomingServer.cpp, line 1821
io service set offline offline (was online)
io service set offline online (was offline)
offline status changed to online
offline status changed to offline
WARNING: NS_ENSURE_TRUE(thread) failed: file /Users/davidbienvenu/tbirdhg/mozilla/netwerk/base/src/nsSocketTransportService2.cpp, line 115

My guess is that the io service is sending the right notifications, in the right order, but we're receiving them in the wrong order - I don't know if the observer service is supposed to guarantee the order of notifications or not, but it wouldn't surprise me if it doesn't.
Does NSPR_LOG_MODULES=ObserverService:5 have things passing through there in the same backward order?
the observer service sure looks like it would do notifications in the right order, synchronously - we do have to go from c++ to js, but I don't see how that could be an issue, especially when necko also seems confused.

I'll try logging, but that's a bit of a pain on the mac...
observer service logging doesn't tell you what the data is, so it's not useful for this.  But from what I can tell with my hand-rolled printfs, the observer service is generating the actual notifications backwards, i.e., it's not that we're receiving the notifications in the wrong order; they're backwards by the time the observer service tries to generate them:


io service set offline offline (was online)
io service set offline online (was offline)
network:offline-status-changed to Online (nsObserverList::NotifyObservers)
offline status changed to online
network:offline-status-changed to Offline (nsObserverList::NotifyObservers)
offline status changed to offline
WARNING: NS_ENSURE_TRUE(mInitialized) failed: file /Users/davidbienvenu/tbirdhg/mozilla/netwerk/base/src/nsSocketTransportService2.cpp, line 467

I may have screwed up my diagnostics, of course ;-) if not, I'll try to figure out why we could be getting in this situation.
--DOMWINDOW == 10 (0x1d098280) [serial = 9] [outer = 0x1c6aacf0] [url = about:blank]
io service set offline offline (was online)
no notification sent here yet...

io service set offline online (was offline)
calling notify observers : online
network:offline-status-changed to Online
offline status changed to online

********here's where we went wrong: 
calling notify observers : offline
network:offline-status-changed to Offline
offline status changed to offline

is the io service getting called from multiple threads?
ah, no, it's getting called re-entrantly - fun stuff :-)

#0	0x1176b565 in nsIOService::SetOffline at nsIOService.cpp:614
#1	0x1176aff4 in nsIOService::TrackNetworkLinkStatusForOffline at nsIOService.cpp:957
#2	0x1176d7e9 in nsIOService::Observe at nsIOService.cpp:823
#3	0x0049f405 in nsObserverList::NotifyObservers at nsObserverList.cpp:136
#4	0x004a095c in nsObserverService::NotifyObservers at nsObserverService.cpp:181
#5	0x118528fc in nsNetworkLinkService::SendEvent at nsNetworkLinkService.mm:207
#6	0x11852946 in nsNetworkLinkService::ReachabilityChanged at nsNetworkLinkService.mm:220
#7	0x95b38cc6 in rlsPerform
#8	0x931825f5 in CFRunLoopRunSpecific
#9	0x93182cd8 in CFRunLoopRunInMode
#10	0x900d9d75 in -[NSRunLoop(NSRunLoop) runMode:beforeDate:]
#11	0x11c0e4fb in nsAppShell::ProcessNextNativeEvent at nsAppShell.mm:615
#12	0x11c5845f in nsBaseAppShell::DoProcessNextNativeEvent at nsBaseAppShell.cpp:151
#13	0x11c5899a in nsBaseAppShell::OnProcessNextEvent at nsBaseAppShell.cpp:278
#14	0x11c0dcac in nsAppShell::OnProcessNextEvent at nsAppShell.mm:766
#15	0x004ff7b2 in nsThread::ProcessNextEvent at nsThread.cpp:497
#16	0x00488bf8 in NS_ProcessPendingEvents_P at nsThreadUtils.cpp:180
#17	0x11c583cb in nsBaseAppShell::NativeEventCallback at nsBaseAppShell.cpp:121
#18	0x11c0f79c in nsAppShell::ProcessGeckoEvents at nsAppShell.mm:374
#19	0x931825f5 in CFRunLoopRunSpecific
#20	0x93182cd8 in CFRunLoopRunInMode
#21	0x968332c0 in RunCurrentEventLoopInMode
#22	0x968330d9 in ReceiveNextEventCommon
#23	0x96832f4d in BlockUntilNextEventMatchingListInMode
#24	0x95b95d7d in _DPSNextEvent
#25	0x95b95630 in -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]
#26	0x11c0e55f in nsAppShell::ProcessNextNativeEvent at nsAppShell.mm:626
#27	0x11c5845f in nsBaseAppShell::DoProcessNextNativeEvent at nsBaseAppShell.cpp:151
#28	0x11c5899a in nsBaseAppShell::OnProcessNextEvent at nsBaseAppShell.cpp:278
#29	0x11c0dcac in nsAppShell::OnProcessNextEvent at nsAppShell.mm:766
#30	0x004ff7b2 in nsThread::ProcessNextEvent at nsThread.cpp:497
#31	0x00488a96 in NS_ProcessNextEvent_P at nsThreadUtils.cpp:227
#32	0x004fff4d in nsThread::Shutdown at nsThread.cpp:465
#33	0x11796276 in nsSocketTransportService::Shutdown at nsSocketTransportService2.cpp:445
#34	0x1176b708 in nsIOService::SetOffline at nsIOService.cpp:637
#35	0x1176aff4 in nsIOService::TrackNetworkLinkStatusForOffline at nsIOService.cpp:957
#36	0x1176d7e9 in nsIOService::Observe at nsIOService.cpp:823
#37	0x0049f405 in nsObserverList::NotifyObservers at nsObserverList.cpp:136
#38	0x004a095c in nsObserverService::NotifyObservers at nsObserverService.cpp:181
#39	0x118528fc in nsNetworkLinkService::SendEvent at nsNetworkLinkService.mm:207
#40	0x11852946 in nsNetworkLinkService::ReachabilityChanged at nsNetworkLinkService.mm:220
#41	0x95b38cc6 in rlsPerform
#42	0x931825f5 in CFRunLoopRunSpecific
#43	0x93182cd8 in CFRunLoopRunInMode
#44	0x968332c0 in RunCurrentEventLoopInMode
#45	0x968330d9 in ReceiveNextEventCommon
#46	0x96832f4d in BlockUntilNextEventMatchingListInMode
#47	0x95b95d7d in _DPSNextEvent
#48	0x95b95630 in -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]
#49	0x95b8e66b in -[NSApplication run]
#50	0x11c0e05e in nsAppShell::Run at nsAppShell.mm:693
#51	0x1291142a in nsAppStartup::Run at nsAppStartup.cpp:192
#52	0x000e30d8 in XRE_main at nsAppRunner.cpp:3279
#53	0x0000285f in main at nsMailApp.cpp:103
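
The trace above shows SetOffline being entered a second time (frame #0) while the first call (frame #34) is still inside nsSocketTransportService::Shutdown, which spins the event loop. Below is a minimal Python sketch of how that re-entrancy could yield the backwards notification order logged earlier. It assumes the state is updated before the shutdown and observers are notified after it; ToyIOService and its members are hypothetical names, not the necko implementation.

```python
from collections import deque


class ToyIOService:
    """Toy model of a set_offline() that can be re-entered via a nested event loop."""

    def __init__(self):
        self.offline = False
        self.pending_link_events = deque()  # events waiting on the "event loop"
        self.notifications = []             # what observers saw, in delivery order

    def on_link_event(self, link_up):
        self.set_offline(not link_up)

    def _spin_event_loop(self):
        # Stand-in for the socket thread shutdown processing pending events.
        while self.pending_link_events:
            self.on_link_event(self.pending_link_events.popleft())

    def set_offline(self, offline):
        if offline == self.offline:
            return
        self.offline = offline  # state flips first
        if offline:
            # Going offline shuts down sockets, which spins the event loop.
            # A queued "link up" event re-enters set_offline() right here.
            self._spin_event_loop()
            self.notifications.append("offline")  # outer call notifies last
        else:
            self.notifications.append("online")


svc = ToyIOService()
svc.pending_link_events.append(True)  # link comes back while we are going offline
svc.on_link_event(False)              # link went down: outer set_offline(True)

print(svc.notifications, "service offline:", svc.offline)
```

In this model observers are last told "offline" although the service itself has ended up online, which would match the stuck offline icon reported earlier in this bug.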
I think this is going to end up being a dup of bug 470274, and will require a 1.9.1 necko change.
Whiteboard: [no l10n impact] → [no l10n impact][fix in process in 470274 - will require a 1.9.1 change]
Since we can't guarantee when/if we'll see a 1.9.1 patch land from bug 470274, I think we'd better flip off the pref in the meantime. If we ship a beta with it on, but relnote that turning it off makes the pain go away, then we don't have any way to pull those people back when it does work. But if we ship with it off, then when we turn it back on we not only bring back the people we gave a default off to, we also bring back anyone who has manually turned it off already.
Attachment #361025 - Flags: review?(bienvenu)
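
The default-versus-override reasoning above can be sketched with a toy model in Python. This is an illustration only: ToyPref is a hypothetical class, and it assumes that a user-set value equal to the current default is not kept as an override, which is the behaviour the argument relies on.

```python
class ToyPref:
    """Toy pref with a shipped default and an optional user override."""

    def __init__(self, default):
        self.default = default
        self.user_value = None  # None means no user override is stored

    def _drop_redundant_override(self):
        # Assumption: an override that equals the default is not kept.
        if self.user_value == self.default:
            self.user_value = None

    def set_user(self, value):
        self.user_value = value
        self._drop_redundant_override()

    def set_default(self, value):
        self.default = value
        self._drop_redundant_override()

    @property
    def value(self):
        return self.default if self.user_value is None else self.user_value


# Option 1: ship with the default on; a user follows the release note and turns it off.
shipped_on = ToyPref(default=True)
shipped_on.set_user(False)
shipped_on.set_default(True)   # the fix lands, the default is unchanged

# Option 2: a user had already turned it off by hand, then the default is flipped off.
shipped_off = ToyPref(default=True)
shipped_off.set_user(False)
shipped_off.set_default(False)  # override now matches the default and is dropped
shipped_off.set_default(True)   # the fix lands, the default goes back on

print("option 1:", shipped_on.value, "option 2:", shipped_off.value)
```

Under that assumption, the user in option 1 stays off after the fix, while the user in option 2 follows the default back on.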
Comment on attachment 361025 [details] [diff] [review]
Flip the pref for a bit [checked in+backed out]

For b2, I'd probably have gozer apply the patch in bug 470274 for our beta builds, if it doesn't land in 1.9.1 before we spin b2. But we can turn it off on trunk builds for now...
Attachment #361025 - Flags: review?(bienvenu) → review+
Comment on attachment 361025 [details] [diff] [review]
Flip the pref for a bit [checked in+backed out]

http://hg.mozilla.org/comm-central/rev/b56463d50c4f

Only slightly adds to the degree of difficulty for rolling our own Gecko for b2 :)
Attachment #361025 - Attachment description: Flip the pref for a bit → Flip the pref for a bit [checked in]
Target Milestone: Thunderbird 3.0b2 → Thunderbird 3.0b3
Bug 470274 has now landed on branch, can folks flip the pref to true again and give this another try?
Depends on: 470274
Whiteboard: [no l10n impact][fix in process in 470274 - will require a 1.9.1 change] → [no l10n impact][needs testing before toggling pref again
Whiteboard: [no l10n impact][needs testing before toggling pref again → [no l10n impact][needs testing before toggling pref again]
Phil and I have been running with offline.autoDetect set to true since bug 470274 landed and we've not seen any problems. Therefore I have just backed out Phil's patch so that offline.autoDetect is set to true again on Mac.

http://hg.mozilla.org/comm-central/rev/015b58fc4823

Note: bug 480324 is a known Mac issue about not detecting offline in some instances, and its fix hasn't landed for the 3.0x builds yet. However, it doesn't prevent us from re-enabling the pref for the users whom this bug originally affected.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Whiteboard: [no l10n impact][needs testing before toggling pref again] → [no l10n impact]
Attachment #361025 - Attachment description: Flip the pref for a bit [checked in] → Flip the pref for a bit [checked in+backed out]
I just want to chime in here on what I would call normal function. However, there seems to be a slight bit of lag. I 'feel' like even after things like Adium fire up, and I try to hit email, I'll get a connect failure if I check too soon. Hardly a deal breaker.
Today Apple released AirPort Client Update 2009-001 Version 1.0.

"This update is recommended for all Intel-based Macintosh computers running Mac OS X 10.5.6.  It addresses issues with roaming and network selection in dual-band environments."

Maybe the update is related to this bug and explains why I had trouble with it at my parents' house but can't reproduce it at home.
Dawn, do your parents have a router? If yes, which model? See bug 475603 too.
They have a Linksys that does b/g. I think it's a WRT54G.
Would a possible side benefit of the bug be that TBird no longer 'hangs' (I can't bring it to the front) when I wake the Mac from sleep until it's done with checking for new email? This was happening forever until the past week or so.
(In reply to comment #40)
> Would a possible side benefit of the bug be that TBird no longer 'hangs' (I
> can't bring it to the front) when I wake the Mac from sleep until it's done
> with checking for new email? This was happening forever until the past week or
> so.

That was probably bug 476960 being fixed if you have imap.