Closed Bug 307527 Opened 19 years ago Closed 16 years ago

Connection timeout with IMAP on dual core systems

Categories

Product :: Component: Thunderbird :: General
Type: defect
Priority: Not set
Severity: major
Platform: x86, Windows XP

Tracking

Status: RESOLVED FIXED
Target Milestone: Thunderbird2.0
Tracking flags: Not tracked

People

(Reporter: fasano, Assigned: Bienvenu)

Details

(Keywords: fixed1.8.0.2, fixed1.8.1, verified1.8.1.3)

Attachments

(9 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b4) Gecko/20050908 Firefox/1.4
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20050907 Thunderbird/1.6a1 ID:2005090706

When 2 or more accounts exist on 1 IMAP server
and I do "Get All New Messages",
the dialog "Connection to <server> time out." appears many times.

I see this problem with trunk and the 1.8 branch,
but not with 1.0.6.

Reproducible: Always

Steps to Reproduce:
1. Create 2 or more accounts on 1 IMAP server
2. Add the accounts to Thunderbird
3. Do "Get All New Messages"

Actual Results:  
The dialog "Connection to <server> time out." appears many times.

Expected Results:  
No dialog.

I use the "Classic Hamster (build 2.0.7)" as imap server on local host.

It happens with a dual core (Athlon 64 X2 3800+),
and it doesn't happen with a single core (PenIII 800MHz).
If "Check for new messages every <n> minutes" is enabled,
it happens sometimes,
but not at every check.
Actually, it doesn't matter whether it's IMAP or not, or whether it's different servers or one. I have
3 POP accounts on 3 different servers and I get random timeouts on random occasions.
> I show time problem with trunk and 1.8 branch.
> but do not show it with 1.0.6.

Since 1.0.x doesn't have a "tcp timeout" capability, the "timeout dialog" is usually
displayed only with new builds.
mailnews.tcptimeout(default=60sec.) was introduced by Bug 189363.  

> It happen with dual core (Athlon 64 X2 3800+),
> and it don't happen with single core (PenIII 800MHz).

If the timeouts on "dual core" or multiple-CPU systems are a result of concurrent server
access, a longer timeout will be needed in many cases.

ASANO san, does increasing mailnews.tcptimeout help?

Another concern:
Does "Classic Hamster" have the capability to limit connections from an IP address?
If so, check whether a mismatch between the following settings is related:
 (a) the "cached connection count" setting of Thunderbird (default=5)
 (b) the connection "count limit from a single IP address" setting of the IMAP server
     (default=4 for Courier-IMAP)
See Bug 92072 and Bug 71792. 
See Bug 206408 Comment #23 and Bug 196732 comment #63 also.
Since 1.0.x doesn't have a "tcp timeout" capability, the "timeout dialog" can only be
displayed on new builds, so it is natural that this phenomenon (a timeout dialog on new
builds only) is observed on new builds only.

(In reply to comment #2)
> I got 3 POP accounts on 3 different servers and i get random timeouts on
random occasions.

A POP3 problem is different from an IMAP problem in many cases.
Your case sounds like a problem such as Bug 272145, Bug 270014 or Bug 273369, if your
problem is not a "dual core"-only problem.
Does your problem occur only on "dual core" or multiple-CPU systems?

If you use Tb 1.0.x, check mail.pop3_response_timeout (default=45sec.), which was
introduced by Bug 127461.
(45 seconds is not enough for some servers. Outlook's default is 60 seconds for
POP3 servers.)
If you use a nightly/latest-trunk build, check mailnews.tcptimeout (default=60sec.), which
was introduced by Bug 189363.
(60 seconds is still not enough for some servers.)
Does increasing this value help?
(In reply to comment #3)

> If timeout when "dual core" or multiple CPU was a result of concurrent server
> access, increasing timeout detection is needed in many cases. 
> 
> ASANO san, is increasing mailnews.tcptimeout a relief?

I set mailnews.tcptimeout to 600, but the problem still occurs.
The dialog appears before 600 seconds have passed.

> Another concern.
> Does "Classic Hamster" have capability to limit connection from an IP address?
> If yes, check whether mismatch between nexts has relation or not.
>  (a) "cached connection count" setting of Thunderbird (default=5)
>  (b) connection "count limit from a single IP address" setting of IMAP server
>      (default=4 if Courier-IMAP)" 

I set the server's maximum connections to no limit and "cached connection" to 3.
The dialog still appears.


I made a new virtual machine with VMware, ran Thunderbird on it,
and connected to the IMAP server (on the host OS).
If the number of processors is 2, the problem happens.
But with 1 processor, Thunderbird works normally.
> I make the new machine with vmware, run thunderbird on it 
> and connect to the IMAP server (host os).
> If the number of processer is 2, then the problem is happen.
> But if 1 processer, thunderbird do normally.

I have the same problem.  Running Thunderbird 1.5beta1 on Windows XP x64, AMD
Athlon X2 3800+.  If I run Thunderbird normally, I get connection timeouts all
over the place.  Run it and set it to use only one processor, and it works just
fine.  I'm also connecting to an IMAP server.
(In reply to comment #5)
> I have the same problem.  Running Thunderbird 1.5beta1 on Windows XP x64, AMD
> Athlon X2 3800+.  If I run Thunderbird normally, I get connection timeouts all
> over the place.  Run it and set it to use only one processor, and it works just
> fine.  I'm also connecting to an IMAP server.

I changed my PC's BIOS setup:
I set "Cool'N'Quiet" to "disabled".
Then the problem went away,
but it sometimes appears again.

Please try this.

(In response to comment #5 and comment #6)
mstevens-san and ASANO-san, can you get a trace at the IMAP server when Tb accesses it
on "dual core"?
(Q1) Do all requests from Thunderbird for all IMAP accounts arrive?
(Q2) If all requests arrive, does Thunderbird respond to all requests from the IMAP
server?
A1: Yes
A2: No

I've attached the IMAP server's log.
4 connections are created for 4 accounts,
but 3 of the connections are lost.
Can you generate a client-side protocol log by following these instructions?

http://www.mozilla.org/quality/mailnews/mail-troubleshoot.html#imap

That'll give me a better idea whether it's the client or the server that's dropping
the connection.
Attached file Thunderbird logfile
I've attached Thunderbird's log file.

The dialog appeared 3 times.
Just for information:
- Bug 233765 may be related to this bug (found by searching for "athlon" in the summary)
- There is at least Bug 152171 (an XPCOM bug) with multiple processors.
  It may be related to this bug's problem.
  (found by searching for ["dual OR multi" AND "core OR cpu OR process"] in the summary)
Attached file IMAP log
I am experiencing exactly the same problem as described here. I have managed to
replicate this on the latest build - 20051002 - using just 1 IMAP account. When
trying to check my mail, the connection to the server times out immediately -
the timeout config option has no effect. This only happens in the new builds
and not in the current stable 1.0.7. Also, this build is working perfectly on
my Intel Centrino laptop. It is on my desktop AMD Athlon64 X2 4400 on Windows
XP x64 edition that the problem occurs, i.e., I can confirm that this appears to
be a problem with dual core platforms.

I have taken a log of the connection as suggested in comment #9 and have
attached it.
Looking at the summary/title of this bug, perhaps a more appropriate one would be:

"Connection timeout with IMAP on dual core systems"

since it now appears this can be replicated with any number of accounts.
Darin, any clues on this? From the log, we're getting a disconnected error rv
from necko.
Status: UNCONFIRMED → NEW
Ever confirmed: true
On my PC, this occurs even if only 1 account exists.

I've changed the summary as suggested in comment #13.
Summary: connection timeout when 2 or more account on 1 imap server → Connection timeout with IMAP on dual core systems
I'm seeing the same problem. If I set affinity to just one CPU, everything is fine...
I can confirm that if you set the affinity of the thunderbird.exe process (use Ctrl+Alt+Del to get the task manager, right click the process and choose the Set Affinity option) so that it is only running on one CPU then no errors occur and everything appears to work as it should.
Temporary workaround:

Get imagecfg.exe (from the Windows NT 4.0 CD, or the Windows 2000 Server Resource Kit Supplement One, or - easiest way - www.google.com) and run the following on the command line:

C:\Program Files\Mozilla Thunderbird > imagecfg -a 0x2 thunderbird.exe
thunderbird.exe contains no configuration information
thunderbird.exe contains a Subsystem Version of 4.0
thunderbird.exe updated with the following configuration information:
    Process Affinity Mask: 00000002

The -a 0xN parameter sets the CPU you want Thunderbird to reside on:
CPU              Mask
0                0x1
1                0x2
2                0x4
3                0x8
4                0x10
5                0x20

It probably doesn't matter which CPU you use; I used CPU 1 in my example (for no special reason).

After doing this: No more manual affinity adjustments in taskmanager necessary ! Thunderbird.exe will be auto-magically on the CPU you specified and no more timeouts will occur :-)
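The mask column in the table above follows one rule: bit n of the affinity mask selects CPU n, so a single CPU's mask is 1 << n, and several CPUs are combined with bitwise OR. A minimal C sketch of that arithmetic; the helper names `cpu_mask` and `cpus_mask` are hypothetical and are not part of imagecfg or any Windows API:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Mask with only bit `cpu` set: CPU 0 -> 0x1, CPU 1 -> 0x2, CPU 5 -> 0x20. */
static unsigned long cpu_mask(unsigned cpu)
{
    assert(cpu < sizeof(unsigned long) * CHAR_BIT); /* avoid an undefined shift */
    return 1UL << cpu;
}

/* Combine several CPUs into one mask by OR-ing their bits. */
static unsigned long cpus_mask(const unsigned *cpus, size_t n)
{
    unsigned long mask = 0;
    for (size_t i = 0; i < n; i++)
        mask |= cpu_mask(cpus[i]);
    return mask;
}
```

For example, CPUs 0 and 1 together give 0x3, and CPUs 0 and 3 together give 0x9.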

---

In my experience, using SSL worsens the situation, but it is not the cause. In my opinion Bug 314476 is a duplicate. Could someone confirm this behavior?
(In reply to comment #18)
I agree that bug 314476 is a duplicate of this bug. And I can also confirm the workaround works.
*** Bug 314476 has been marked as a duplicate of this bug. ***
Brian Pietsch, you say in Bug 314476 Comment #10 :
> I have a dual HT Xeon (4 virtual processors).
> If I disable certain combinations of 2 (or more) of them, it seems to work,
> but other combinations of disabling 2 (or less) of them does not work.
> It might be a HyperThreading issue more than a multi-cpu issue.

Brian, does the problem occur when CPU affinity is set to 0 and 1 (or 3 and 4), but not when it is set to 0 and 2 (or 1 and 3)?

To all other problem reporters: is your "dual core" or "multi-processor" system actually "HyperThreading"?
>Brian, problem occurs when when CPU affinity is set to 0 and 1(or 3 and 4), but
>no problem when 0 and 2(or 1 and 3)?

Actually, all 4 of these combinations fail.  The ones that succeed are:

0, 1, 2, or 3 individually (only 1 selected)
0 and 3
1 and 2

Seems odd.  In any case, I spent the morning today getting sources and setting up my build environment.  I have a debug build set and ready to go.  Attaching the console output from my first debug run... of notable interest was the following assertion:

###!!! ASSERTION: nsWeakReference not thread-safe: '_mOwningThread.GetThread() == PR_GetCurrentThread()', file d:/source/mozilla/tbird-debug/xpcom/build/nsWeakReference.cpp, line 110

I'm ready and waiting for any other debugging that you might find helpful on my end.
Attached file debug console log
(In reply to comment #22)
> Actually, all 4 of these combinations fail.  The ones that succeed are:
> 0 and 3
> 1 and 2

It sounds like it works only with the combination of "logical processor 0 on one physical processor" and "logical processor 1 on the other physical processor", or vice versa.
Why doesn't it work when the logical processor number is the same, even on different physical processors?

Is an OS issue in HyperThreading support related to the problem?

http://www.intel.com/cd/ids/developer/asmo-na/eng/20354.htm?page=3 says:
> Program Design
> Developers should observe carefully the following guidelines in terms of
> their affects on hyper-threading:
> 1. Threads should not compete for the same resources.
> This causes logical processors to stall while waiting for the same event.
> Avoid this problem by having threads do very different things.
> If threads must perform the same work, make them operate on different data
> and different events and/or assign the task to a different physical processor,
> if possible. 

http://www.intel.com/cd/ids/developer/asmo-na/eng/20354.htm?page=4 says:
> The Linux 2.4 kernel will not be developed to take an active advantage of HT
> Technology and patches may be required to fix some possible problems.
> Rusty Russell (a Linux developer) mentions: "The hyperthreading issue... is
> likely to throw a new set of complications into the mix.
> A processor which does hyperthreading looks like two independent CPUs,
> but it [processes] should not be scheduled [by the scheduler] as such - it is
> better to divide process across real (hardware) processors first."

Can someone reproduce the problem on a Linux kernel newer than 2.4, or on Mac OS X on Intel?
Attached patch possible fix
From Brian's debugging, my guess is that we're getting a negative interval for the polling time - I don't know why that would be, but my guess would be that hyper threading is involved. Darin, wtc, thoughts?
Assignee: mscott → bienvenu
Status: NEW → ASSIGNED
Comment on attachment 202277
possible fix

>-    *interval = PR_IntervalToSeconds(PR_IntervalNow() - ts);
>+    PRIntervalTime tf = PR_IntervalNow();
>+    NS_ASSERTION(tf >= ts, "time went backwards!");
>+    *interval = (tf > ts) ? PR_IntervalToSeconds(tf - ts) : 0;

This patch is wrong.  PRIntervalTime is a 32-bit unsigned
integer timestamp that wraps around.  So it is legal for
tf to be less than ts.
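To illustrate why the clamp in the patch is wrong, and why a clock that steps backwards is so harmful, here is a standalone C model. It does not use NSPR: `interval_t` stands in for the 32-bit `PRIntervalTime`, and `TICKS_PER_SEC` is an assumed tick rate chosen only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t interval_t;   /* models a 32-bit wrapping tick count */
#define TICKS_PER_SEC 1000u    /* assumed rate for this sketch only */

/* Unsigned subtraction is modulo 2^32, so this is correct even when the
   counter wrapped between the two readings (tf < ts). */
static interval_t elapsed_ticks(interval_t ts, interval_t tf)
{
    return (interval_t)(tf - ts);
}

static uint32_t elapsed_seconds(interval_t ts, interval_t tf)
{
    return elapsed_ticks(ts, tf) / TICKS_PER_SEC;
}

/* The clamp from the rejected patch: any tf <= ts yields 0, which hides a
   backwards step but also discards every interval spanning a wraparound. */
static uint32_t clamped_seconds(interval_t ts, interval_t tf)
{
    return (tf > ts) ? (uint32_t)(tf - ts) / TICKS_PER_SEC : 0u;
}
```

With these definitions, a 5-second interval that straddles the wraparound point is measured correctly by `elapsed_seconds` but reported as 0 by `clamped_seconds`. Conversely, if the second reading is just 5 ticks *behind* the first (for example because it came from a counter not synchronized with the one that produced the first reading), the unsigned difference becomes about 4.29 million seconds at this assumed rate. That is one plausible reading of why raising mailnews.tcptimeout, even to 600, did not stop the dialog; it is an illustration, not a confirmed diagnosis.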

People have long reported (before there were dual-core CPUs)
that the Win32 QueryPerformanceCounter() function returns the
wrong result on some hardware.  See bug 115865 and bug 176881.
You can try editing mozilla/nsprpub/pr/src/md/windows/ntinrval.c
and deleting the body of the _PR_MD_INTERVAL_INIT() function to
make it look like this:

void
_PR_MD_INTERVAL_INIT()
{
}

This will make PR_IntervalNow use the Win32 GetTickCount()
function instead of QueryPerformanceCounter().

Let me know if that solves the problem.
Attachment #202277 - Flags: review-
>So it is legal for tf to be less than ts.

Thx, wtc. Is it really possible for tf to be less than ts every time on startup? Or is it just that tf will eventually be less than ts?
The reporter of Bug 315420, which covers a similar problem with POP3, has reported an interesting test result in Bug 315420 comment #16: "/usepmtimer in boot.ini is a workaround on MS Win XP on an Athlon 64 X2 4600+".

Question to all problem reporters:
 - Can "/usepmtimer in boot.ini" be a workaround when MS Win XP on multi-core or
   HyperThreading? 
  
Looks like /usepmtimer does work around the problem on hyperthreaded systems... it went away on my dual-HT Xeon.
I set "/usepmtimer" in boot.ini and the timeouts no longer appear.

Athlon 64 X2 3800+
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051109 Thunderbird/1.5 ID:2005110908
(In reply to comment #26)
> People have long reported (before there were dual-core CPUs)
> that the Win32 QueryPerformanceCounter() function returns the
> wrong result on some hardware.

Wan-Teh Chang and David, can the "Correct TSC synchronization" section of http://support.microsoft.com/kb/Q896256 explain our problem?

Note:
MS KB Q896256 is referenced by the "/USEPMTIMER" description at http://smallvoid.com/tweak/winnt/install.html.
MS KB Q821893 and MS KB Q835730 are also referenced by that description.
Comment on attachment 202277
possible fix

If you are willing to test the NSPR fix I described in
comment 26, please send me email so I can email you the
patched nspr4.dll for testing.  I will send the DLL in
a signed email.  In your email please tell me the version
of Mozilla Thunderbird that you are using.

To test the fix, you need to know how to replace the
nspr4.dll in your Mozilla Thunderbird installation with
the patched nspr4.dll.
(In reply to comment #31)

This is specific to Windows XP; however, I'm using XP x64 Edition, so I don't think it applies, particularly since I can't locate the boot.ini file on my system and the msconfig interface doesn't allow me to add the /usepmtimer flag.
You will have boot.ini, but it tends to be a hidden, system, read-only file, so it won't show up unless you ask Explorer very nicely.

Windows XP 64bit uses the Windows 2003 kernel (and SP1 for both is equivalent to XP 32bit's SP2), but it doesn't look like it works around the processor timing weirdness any better than the 32bit version.
Ahh indeed. I added the flag from comment 28 and TB is now working properly without timeouts on Win x64 edition on Athlon X2 4400+ with build version 1.6a1 (20051111). Removing the flag reverts to the timeout errors.
(In reply to comment #32)
> (From update of attachment 202277)
> If you are willing to test the NSPR fix I described in
> comment 26, please send me email so I can email you the
> patched nspr4.dll for testing.

Wan-Teh Chang sent me 2 versions of the file, the original unpatched one and the patched one, to prove that the patch did have an effect. I rebooted after removing the /usepmtimer flag and renamed the existing nspr4.dll to nspr4.dll.old. However, using either the original or patched versions of nspr4.dll provided by Wan-Teh caused the following error when trying to launch TB:

thunderbird.exe - Entry Point Not Found
The procedure entry point PR_GetPhysicalMemorySize could not be located in the dynamic link library nspr4.dll

Restoring the version of the file that comes with TB allowed TB to launch, but with the timeout errors remaining.
David,

The NSPR DLLs I sent to you will only work with
Thunderbird 1.0.7.  So you need to downgrade to
Thunderbird 1.0.7 to test those DLLs.
Right. But I don't get any timeouts with TB 1.0.7, it is only with the newest 1.5 builds (e.g. the one I just upgraded to - version 1.6a1 (20051111)).
wtc, only 1.5 and trunk builds have necko timeouts.
(In reply to comment #36)
Have now received .dlls for any version of TB. Can confirm that with the ".orig" dll file, the timeouts occur as expected. With the ".patched" version however, the timeouts do not occur - the bug is fixed. (This is confirming the fix described in comment 26).
David (Mytton): thanks for testing my fix.

David (Bienvenu) and Darin, there are two Win32 functions
we can use: GetTickCount() and timeGetTime().  In bug
124695, it is reported that GetTickCount() is cheaper
than timeGetTime(), but GetTickCount() only has 10ms
precision whereas timeGetTime() has 1ms precision.

timeGetTime() requires linking with winmm.dll.  NSPR
used timeGetTime() before, but removed it during Netscape
6.x's big push to minimize startup time.

Based on the info in bug 124695, I think timeGetTime is
the better choice.  What do you think?  Do you know how
to measure Mozilla startup time?
(Thank god for this bug.. I just set up a new AMD-X2 system and was sure I was doing something idiotically wrong.)

WTC, switching from QueryPerformanceCounter (QPC) to timeGetTime (TGT) may not solve this problem - I think it may just shift it to a different set of users.  We've run into this problem in the music world for years, and while dual-CPUs don't work well with QPC because they don't stay in sync, there's a whole other class of motherboards that have bugs or other problems in timeGetTime.

A popular music program, Nuendo/Cubase, ended up just offering a checkbox to users and saying "Here, you pick which timer works for you."  Other apps have done some checking to see if the QPC clock is based on a reasonable counter, and if so, they trust it, else they fall back to TGT.  There's also a lot of discussion of calibration loops and sanity-checking.  

Now, most of these apps need great precision, so for a 60-second timeout, I don't know if you'll run into these problems - but you very well might!  I've seen TGT deltas be negative on some boards, but again, I was mostly looking for milliseconds, not seconds.

Some discussions I had researched a few years ago (or, depending which clock you believe, several seconds ago):

http://www.tcl.tk/cgi-bin/tct/tip/7.html
http://www.gamedev.net/community/forums/topic.asp?topic_id=195892
http://www.libsdl.org/pipermail/sdl/1999-September/023531.html
By the way, the imagecfg workaround doesn't work for me on an Athlon 4600+.  The tool confirms that it's set the uniprocessor mode and 00000001 affinity - it even reads it back in the next time I check it - but TB 1.5RC1 still gives instant timeouts.  Setting the affinity manually from task manager each time does seem to work.  I presume that means I'm doing something wrong with imagecfg, but I can't imagine what.  For whatever that's worth.
(In reply to comment #42)
> I just set up a new AMD-X2 system
> We've run into this problem in the music world for years, and while
> dual-CPUs don't work well with QPC because they don't stay in sync, there's a
> whole other class of motherboards that have bugs or other problems in
> timeGetTime.

Jay Levitt, have you read MS KB Q821893, MS KB Q835730 and MS KB Q896256, which I mentioned in comment #31?
It sounds like MS KB Q835730 applies to your issue.
WADA - thanks for the tips... it's actually more complicated in the music world, since you also have to worry about what clock the hardware and drivers are based on, but we're getting OT, so never mind.  I was just mentioning it as confirmation that clock problems are known issues in the PC world.

Also, correction to my own report - the IMAGECFG workaround works fine, as long as you aren't running thunderbird.exe while trying to modify the image.  Doh!  The Task Manager "Set Affinity..." option *will* reflect any changes you make with IMAGECFG so that's a good way to confirm that you've done it correctly.

Interestingly, KB #896256 implies that there's a hotfix for Windows XP SP2 that will keep the clocks in sync across processors.  I've asked them to send it to me.  Still, if TB can work around the problem, it seems better than expecting end users to contact Microsoft to get a hotfix.
Jay: does this mean you now think it is okay to
use timeGetTime?  Or should we use GetTickCount(),
which only has a 15-16 ms resolution on my Windows
XP SP2 computer?
Honestly, I'm not sure - I've only written code to test for the problems, not to solve them.  From reading those threads, it sounds like there's no One Great Solution, unless you want to sanity-check one timer against the other when both are present, via a software PLL.
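One very simple form of the cross-check described above, far short of a software PLL: compare the elapsed time reported by a fine-grained counter with the time reported by a coarser one over the same period, and trust the fine counter only when the two agree. The function name, the millisecond inputs and the two-tick tolerance are all assumptions of this sketch; it is not what NSPR does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper.
   fine_ms:       elapsed time from a high-resolution counter (the role
                  QueryPerformanceCounter plays in this bug), in ms.
   coarse_ms:     elapsed time over the same period from a coarser counter
                  (the role of timeGetTime or GetTickCount), in ms.
   coarse_res_ms: resolution of the coarse counter, in ms. */
static int64_t checked_elapsed_ms(int64_t fine_ms, int64_t coarse_ms,
                                  int64_t coarse_res_ms)
{
    assert(coarse_res_ms > 0);
    if (coarse_ms < 0)      /* never report negative elapsed time */
        coarse_ms = 0;
    if (fine_ms < 0)        /* time "went backwards": distrust the fine counter */
        return coarse_ms;

    int64_t diff = fine_ms > coarse_ms ? fine_ms - coarse_ms
                                       : coarse_ms - fine_ms;
    /* Heuristic tolerance: the coarse value can itself be off by about one
       tick, so allow two ticks of disagreement before rejecting. */
    if (diff <= 2 * coarse_res_ms)
        return fine_ms;
    return coarse_ms;
}
```

The design choice is to fall back to the coarse counter rather than to clamp to 0, so a misbehaving fine counter costs precision but does not silently stop a timeout from advancing.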

For what it's worth, I can now confirm that the Windows Hotfix from #896256 solves the problem on my Athlon X2 4600+; I reset Thunderbird's affinity to allow both CPUs, and I can successfully launch, download IMAP folders, etc. without a "server timeout" error.  There seems to be no way to remove the affinity mask, so I just set it to 0x03.

I asked a device-driver guy I know who's dealt with audio clock issues to stop in here and offer his thoughts.
I have a dual core Athlon 64 4600 running Win XP and I have the exact same problem.  If I set the affinity of Thunderbird to just CPU0, the problem goes away.
The problem exists in spades with 1.5rc1 (20051025) with an IBM R31 laptop, which is far too old to have multiple cores or HT. (Win XP SP2)
Is this problem going to be resolved for the 1.5 release? I have been using the patched nspr4.dll file for some time now without any problems.
I had the same problem with the 1.5 release. Had to bind it to one of the CPU cores only, then the problem goes away.
I'm shocked that TB1.5 launched with this bug still alive.  It makes TB completely non-functional on dual core systems.

I did discover an easier work-around in the mozillazine forums:
Right click on thunderbird.exe or a Thunderbird shortcut and choose "Properties".
Choose the "Compatibility" tab and run Thunderbird in compatibility mode for "Windows 98 / ME".  Note: this is not a fix for the problem, only a temporary work-around.
(In reply to comment #52)
> I did discover any easier work-around in the mozillazine forums:
> Right click on thunderbird.exe or a Thunderbird shortcut and choose
> "Properties".
> Choose the "Compatability" tab and run thunderbird in compatability mode for
> "Windows 98 / ME".  Note: this is not a fix for the problem, only a temporary
> work-around.

I should mention that this work-around does not allow links to be properly sent to external applications (such as Firefox).  In this compatibility mode, Windows converts forward slashes ( / ) to back slashes ( \ ), making the URL invalid. 

Attached file Patched DLL
Seeing as this bug is not deemed to be important enough to go into the TB1.5 release, and I have had a patch for it since the beginning of November 05 (comment #36), I thought I would attach the patched file provided by Wan-Teh Chang. Just download the .dll and overwrite the current one in C:\Program Files (x86)\Mozilla Thunderbird.
Attachment #208496 - Attachment description: Patched DDL → Patched DLL
*** Bug 323435 has been marked as a duplicate of this bug. ***
The original code uses QueryPerformanceCounter if it
works, and falls back on GetTickCount.

This patch essentially has two changes.

1. Do not use QueryPerformanceCounter.  This is as
if _nt_ticksPerSec is a constant with value -1 and
only the code paths for _nt_ticksPerSec == -1 remain.

2. Use timeGetTime instead of GetTickCount because
timeGetTime has a finer resolution on most systems.
Unfortunately this requires linking with the system
library winmm.lib, which hurts the application startup
time.  But this is a safer change because I'm worried
that some code may need the higher resolution.
Attachment #208675 - Flags: superreview?(bienvenu)
Attachment #208675 - Flags: review?(darin)
wtc: the cheap way to test mozilla startup performance is by loading a single web page that does a "window.close()" call.  Then time the execution of "firefox file:///path/to/special_page.html."  The only detail is that you have to flip a pref in firefox to enable window.close() calls from unprivileged script ;-)

I think you'd want to toggle the dom.allow_scripts_to_close_windows pref.
Attachment #208675 - Flags: superreview?(bienvenu) → superreview+
I wanted to add some findings from a mozillazine thread I didn't see linked: http://forums.mozillazine.org/viewtopic.php?t=330981
Note page two, where the users are downloading driver updates.
Note also the setting of the timeout to 65535.

I think AMD X2 processor driver is doing the same thing as the microsoft patch mentioned above Q896256.  And thus may be the thing that resolves this for AMD users.

Also isn't bug 315420 the same issue and should be marked duplicate and bug title changed to reflect that it occurs for both POP and IMAP?
(In reply to comment #59)
> I think AMD X2 processor driver is doing the same thing as the microsoft patch
> mentioned above Q896256.
> And thus may be the thing that resolves this for AMD users.

It looks to be the same for IBM ThinkPad G41 (Intel HyperThreading model) users.

Here are the sites I reported in bug 315420 comment #15:
> (1) http://support.microsoft.com/?id=835730
> (2) http://forums.amd.com/lofiversion/index.php/t46434.html
> (3) http://www-307.ibm.com/pc/support/site.wss/document.do?lndocid=MIGR-57463
>     (Thinkpad G41 is Intel HT model)
The third one says:
 -  On MS Win XP SP1, "/usepmtimer" is the solution
    (though it also says an additional fix is required).
 -  On MS Win XP SP2, a driver update is the solution.
This driver update for SP2 seems to be a similar solution to MS's hotfix for SP2.
I don't know about other models.
WTC, the msdn docs for timeGetTime say that it has a default precision of 5 msec on WinNT, and a default precision of 1 msec on Win9x.  It says that this can be adjusted.  Has Microsoft said anything about the bugs with QueryPerformanceCounter?

I also checked, and it's true that Firefox does not load winmm.dll currently; however, it is loaded by other processes on my system (by stuff that is fairly common like winlogon.exe and svchost.exe).  So, perhaps the cost of loading it into Firefox will be very small.  Has anyone tested this patch to see if it impacts startup time?
I've got the timeout with two POP3 accounts. With the new nspr4.dll it works fine, no more problems.
Confirmed - works for me (Athlon 4200+ dual-core)!
This is probably one for the 2.0 radar, and possibly the 1.5.0.1/2 train, though it would have to land on the trunk soon for that...
Flags: blocking-thunderbird2+
Target Milestone: --- → Thunderbird2.0
The patched DLL that is supposed to take care of timeout problems on systems with dual-core CPUs does nothing to correct the false-timeout problem on our AMD 4400-based test platform. 

Version 1.0.7 did run correctly. The false timeouts were first encountered when testing version 1.5 release candidates on the dual-core systems.

Windows XP in this case is specifically x64 with SP2. 
(In reply to comment #66)
> Windows XP in this case is specifically x64 with SP2. 

For your information:
This bug's problem on MS Win XP x64 with AMD dual core was reported in Bug 323228.
Bug 323228 is not DUP'ed to this bug, because Bug 323228 is kept open for a crash problem on MS Win XP x64 that occurs after this bug's problem.
Flags: blocking1.8.0.2?
OS: WinXP Professional 5.1 Service Pack 2 (Build #2600) - CPU: Dual Core AMD Opteron 175, 2.21 GHz - Memory Usage: 475/2048MB

I just changed from an AMD 3500+ to this processor last night. I was getting timeouts all over the place in Thunderbird 1.5 every time I changed directories or hit Get Mail. I have 2 IMAP accounts on the same server, and the server is running courier-imap and qmail. 

Here are the fairly useless logs from the server. Note that both IMAPS and SMTP were timing out.

[Jan 31 10:27:09 echo imapd-ssl: Disconnected, ip=[68.144.xxx.xxx], time=0, starttls=1
Jan 31 10:27:09 echo imapd-ssl: Connection, ip=[68.144.xxx.xxx]
Jan 31 10:27:10 echo imapd-ssl: Disconnected, ip=[68.144.xxx.xxx], time=1, starttls=1
Jan 31 10:27:10 echo imapd-ssl: Connection, ip=[68.144.xxx.xxx]
Jan 31 10:27:10 echo imapd-ssl: Disconnected, ip=[68.144.xxx.xxx], time=0, starttls=1
Jan 31 10:27:10 echo imapd-ssl: Connection, ip=[68.144.xxx.xxx]
Jan 31 10:27:10 echo imapd-ssl: Disconnected, ip=[68.144.xxx.xxx], time=0, starttls=1
Jan 31 10:27:11 echo imapd-ssl: DISCONNECTED, user=burn@gentoo.ca, ip=[68.144.xxx.xxx], headers=0, body=0, time=27, starttls=1

@4000000043df9d003706921c tcpserver: status: 0/40
@4000000043df9d09069fcef4 tcpserver: status: 1/40
@4000000043df9d0906a35164 tcpserver: pid 5011 from 68.144.xxx.xxx
@4000000043df9d0906af76c4 tcpserver: ok 5011 echo.bluedevil.ca:66.51.100.98:25 somemodem.cg.shawcable.net:68.144.xxx.xxx::1344
@4000000043df9d090a91873c tcpserver: end 5011 status 256
@4000000043df9d090a92043c tcpserver: status: 0/40
@4000000043df9d1d204354ac tcpserver: status: 1/40
@4000000043df9d1d2046ee8c tcpserver: pid 25501 from 68.144.xxx.xxx
@4000000043df9d1d2052b244 tcpserver: ok 25501 echo.bluedevil.ca:66.51.100.98:25 somemodem.cg.shawcable.net:68.144.xxx.xxx::1345
@4000000043df9d1d2a87524c tcpserver: end 25501 status 256
@4000000043df9d1d2a87d334 tcpserver: status: 0/40
@4000000043df9d3a145d0534 tcpserver: status: 1/40
@4000000043df9d3a146087a4 tcpserver: pid 11746 from 68.144.xxx.xxx
@4000000043df9d3a146cff0c tcpserver: ok 11746 echo.bluedevil.ca:66.51.100.98:25 somemodem.cg.shawcable.net:68.144.xxx.xxx::1352
@4000000043df9d3a1e083d74 tcpserver: end 11746 status 256


After downloading the patched dll (2006-01-14 14:16 PST), it seems to be working much better now. Haven't had a single timeout yet. Upping the timeout in TB has zero effect for me btw. Thanks for the patch!
We'd love to consider this patch for Thunderbird 1.5.0.2 as a lot of users have been getting bit by this on dual processor systems, but Darin raises some good points about the effect this patch may have on Firefox startup time. Anyone on the bug list who could help alleviate those concerns? 
Could we check it in and look at the performance numbers on Tinderbox to make sure it doesn't cause a hit? I can't imagine it should make anything other than a negligible difference. Or are we still driving blind there? 
Comment on attachment 208675
Proposed patch: use timeGetTime (whitespace ignored for code review)

Given that Windows NT defaults to a 5ms interval for timeGetTime (some reports indicate 10ms), perhaps it would be good to call timeBeginPeriod to set the interval to 1ms instead?
(In reply to comment #70)
> Or are we still driving blind there? 

David, has the concern in comment #42 by Jay Levitt already been addressed?
> switching from QueryPerformanceCounter (QPC) to timeGetTime (TGT) may not
> solve this problem - I think it may just shift it to a different set of
> users.

I think switching from QPC to TGT (or to GetTickCount) would be better made optional and user-selectable. 
scott/bienvenu: the tinderbox tests are back up, so we can give this a try and see what the results look like.  i'm more concerned about the interval resolution issue.  it'd be nice to understand the implications of reducing the timer resolution.
At least one MozillaZine community tester downloaded new CPU drivers for his AMD dual-core system, and that fixed the IMAP timeouts he was seeing. 

http://forums.mozillazine.org/viewtopic.php?p=2059198#2059198

just an FYI. 
I'm not that worried about the startup time added by
loading winmm.dll.  What bothers me is that timeGetTime
and GetTickCount each have their own problems.

For timeGetTime, it's not clear what happens if the
timeBeginPeriod and timeEndPeriod calls are nested,
for example,

timeBeginPeriod(1);
timeBeginPeriod(5);
...
timeEndPeriod(5);
timeEndPeriod(1);

or

timeBeginPeriod(1);
timeBeginPeriod(5);
...
timeEndPeriod(1);
timeEndPeriod(5);

It's also not clear what happens if the application
does not call timeEndPeriod, for example,

timeBeginPeriod(1);
...
// no timeEndPeriod call

or

timeBeginPeriod(1);
timeBeginPeriod(5);
...
timeEndPeriod(5);
// no timeEndPeriod(1) call
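The nesting questions above can be explored with a small bookkeeping model. This is a sketch under stated assumptions, not how Windows implements these calls. It assumes the effective resolution is the smallest period among outstanding requests. It also treats a timeEndPeriod that names no outstanding period as an error, because the Win32 documentation asks callers to pair each timeBeginPeriod with a timeEndPeriod using the same value. The names model_begin, model_end, model_effective and MAX_REQUESTS are hypothetical.

```c
#include <assert.h>

/* Hypothetical model of timeBeginPeriod/timeEndPeriod bookkeeping.
 * Assumption: the effective period is the minimum of all outstanding
 * requests; 0 means "no request outstanding, system default applies". */
#define MAX_REQUESTS 16

static unsigned requests[MAX_REQUESTS];
static int n_requests = 0;

/* Record a request for a timer period in ms; returns 0 on success. */
static int model_begin(unsigned period_ms)
{
    assert(period_ms > 0);
    if (n_requests == MAX_REQUESTS)
        return -1;
    requests[n_requests++] = period_ms;
    return 0;
}

/* Drop one outstanding request with the same period; returns -1 if there
 * is none (an unpaired timeEndPeriod in this model). */
static int model_end(unsigned period_ms)
{
    for (int i = 0; i < n_requests; i++) {
        if (requests[i] == period_ms) {
            requests[i] = requests[--n_requests]; /* order is irrelevant */
            return 0;
        }
    }
    return -1;
}

/* Effective period: the smallest outstanding request, or 0 for default. */
static unsigned model_effective(void)
{
    unsigned best = 0;
    for (int i = 0; i < n_requests; i++)
        if (best == 0 || requests[i] < best)
            best = requests[i];
    return best;
}
```

In this model both orderings in the first two examples end with no request outstanding, since only the pairing matters and not the order. The examples with a missing timeEndPeriod(1) leave the 1 ms request in effect indefinitely.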

For GetTickCount, its resolution of 16ms on my
Windows XP SP2 computer may be too coarse for some
applications that depend on the finer resolution of
QueryPerformanceCounter.
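A figure like the 16 ms above can be estimated by sampling a tick counter until it changes and recording the step size. The sketch below takes the counter as a function pointer so it can be driven by a fake clock. On Windows you could pass a wrapper around GetTickCount or timeGetTime, but that wrapper is assumed here and not shown. tick_fn, estimate_granularity_ms and fake_tick16 are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* A millisecond tick source, e.g. a wrapper around GetTickCount (assumed). */
typedef uint32_t (*tick_fn)(void);

/* Return the smallest non-zero step seen over `samples` counter changes.
 * Busy-waits, so it is only suitable for a quick diagnostic. */
static uint32_t estimate_granularity_ms(tick_fn now, int samples)
{
    assert(samples > 0);
    uint32_t best = UINT32_MAX;
    uint32_t prev = now();
    for (int i = 0; i < samples; i++) {
        uint32_t cur;
        do {
            cur = now();
        } while (cur == prev);          /* spin until the counter ticks */
        uint32_t step = cur - prev;     /* unsigned subtraction survives wrap */
        if (step < best)
            best = step;
        prev = cur;
    }
    return best;
}

/* Fake counter that advances 16 ms on every 4th call, like a coarse clock. */
static uint32_t fake_calls = 0;
static uint32_t fake_tick16(void)
{
    return (fake_calls++ / 4) * 16;
}
```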

So I don't know how to solve this problem without
introducing new problems for some other applications.
In comment 73, darin wrote:
> it'd be nice to understand the implications of
> reducing the timer resolution.

This knowledge base article is one of the best
I found:

Battery Life May Be Shortened After You Use a
Multimedia Program
Article ID: 306828
http://support.microsoft.com/default.aspx?scid=kb;en-us;306828

These two articles talk about QueryPerformanceCounter
and multicore CPUs:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/directx9_c/Game_Timing_and_Multicore_Processors.asp
http://support.microsoft.com/default.aspx?scid=kb;en-us;909944
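The linked articles describe QueryPerformanceCounter readings that can disagree between processors. If that is the case, an idle-timeout check like mailnews.tcptimeout (60 seconds by default) could fire after only a few real seconds when its two timestamps come from different cores. It could also fail to fire at all when the timestamps come from the cores in the opposite order. The sketch below simulates this. The per-core offsets and read_counter_ms are invented for illustration; they are not measured values.

```c
#include <assert.h>
#include <stdint.h>

#define TIMEOUT_MS 60000  /* mailnews.tcptimeout default of 60 s */

/* Hypothetical per-core counter offsets in ms: core 1 runs 90 s ahead. */
static const int64_t core_offset_ms[2] = { 0, 90000 };

/* Simulated counter read: true time plus the offset of the core we ran on. */
static int64_t read_counter_ms(int64_t true_ms, int core)
{
    assert(core == 0 || core == 1);
    return true_ms + core_offset_ms[core];
}

/* Naive idle check comparing two raw readings, whichever core took them. */
static int timed_out(int64_t last_activity, int64_t now)
{
    return now - last_activity >= TIMEOUT_MS;
}
```

Pinning the process to one CPU keeps both readings on the same core in this model. That would be consistent with the affinity workaround helping some reporters in this bug, though not all of them.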
Having just stumbled upon this bug on an x64 version of Windows, I thought I'd add my two cents. 

Microsoft explains why QueryPerformanceCounter on x64 isn't performing well:
http://support.microsoft.com/?scid=kb;en-us;895980

Raymond Chen discusses QueryPerformanceCounter in his blog:
http://blogs.msdn.com/oldnewthing/archive/2005/09/02/459952.aspx
WTC, we could arrange to have timeBeginPeriod and timeEndPeriod called at application startup and shutdown if we wanted to.  We don't have to put that logic in NSPR.
Darin, we should try hard to not call timeBeginPeriod and
timeEndPeriod in the application because this violates the
implementation abstraction of NSPR.  If we determine that
we should call timeBeginPeriod and timeEndPeriod, we can
call them in NSPR's DllMain function during process attach
and process detach.
Good suggestion.  Any thoughts as to how to go about determining
if we need to or should call timeBeginPeriod?
Comment on attachment 208675 [details] [diff] [review]
Proposed patch: use timeGetTime (whitespace ignored for code review)

OK, let's just give this a try then.  r=darin
Attachment #208675 - Flags: review?(darin) → review+
You should not call timeBeginPeriod in DllMain because it depends on a dll other than kernel32.dll.  See the documentation for DllMain:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/dllmain.asp

"The entry-point function should perform only simple initialization or termination tasks. It must not call the LoadLibrary or LoadLibraryEx function (or a function that calls these functions), because this may create dependency loops in the DLL load order. This can result in a DLL being used before the system has executed its initialization code. Similarly, the entry-point function must not call the FreeLibrary function (or a function that calls FreeLibrary) during process termination, because this can result in a DLL being used after the system has executed its termination code.

Because Kernel32.dll is guaranteed to be loaded in the process address space when the entry-point function is called, calling functions in Kernel32.dll does not result in the DLL being used before its initialization code has been executed. Therefore, the entry-point function can call functions in Kernel32.dll that do not load other DLLs. For example, DllMain can create synchronization objects such as critical sections and mutexes, and use TLS.

Windows 2000:  Do not create a named synchronization object in DllMain because the system will then load an additional DLL. This restriction does not apply to subsequent versions of Windows.
Calling functions that require DLLs other than Kernel32.dll may result in problems that are difficult to diagnose. For example, calling User, Shell, and COM functions can cause access violation errors, because some functions load other system components. Conversely, calling functions such as these during termination can cause access violation errors because the corresponding component may already have been unloaded or uninitialized."
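Given the DllMain restriction quoted above, one alternative is to request the finer resolution lazily, the first time the interval timer is needed. The resolution would then be released from an explicit shutdown call rather than from process detach. This is a single-threaded sketch. begin_period_hook and end_period_hook are hypothetical stand-ins for timeBeginPeriod and timeEndPeriod, and a real version would need thread-safe one-time initialization. The checked-in patch does none of this: it adds no timeBeginPeriod or timeEndPeriod calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hooks standing in for timeBeginPeriod/timeEndPeriod. */
typedef void (*period_hook)(unsigned period_ms);

static period_hook begin_period_hook = NULL;
static period_hook end_period_hook = NULL;
static int period_acquired = 0;

#define FINE_PERIOD_MS 1u

/* Called on first use of the interval timer, never from DllMain. */
static void ensure_fine_period(void)
{
    assert(begin_period_hook != NULL);
    if (!period_acquired) {
        begin_period_hook(FINE_PERIOD_MS);
        period_acquired = 1;
    }
}

/* Called from an explicit library shutdown function, also not from DllMain. */
static void release_fine_period(void)
{
    if (period_acquired) {
        assert(end_period_hook != NULL);
        end_period_hook(FINE_PERIOD_MS);   /* same value as the begin call */
        period_acquired = 0;
    }
}

/* Counting fakes used to exercise the sketch. */
static int begins = 0, ends = 0;
static void fake_begin(unsigned p) { (void)p; begins++; }
static void fake_end(unsigned p) { (void)p; ends++; }
```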
It'd be great if we could get this patch onto the trunk and the 1.8.1 branch so we can get some testing feedback on it in case we think about this for the next thunderbird stability update. I can help land it.
Comment on attachment 208675 [details] [diff] [review]
Proposed patch: use timeGetTime (whitespace ignored for code review)

I just checked in this patch on the NSPR trunk (NSPR 4.7),
NSPRPUB_PRE_4_2_CLIENT_BRANCH (Mozilla trunk,
mozilla1.9alpha), and MOZILLA_1_8_BRANCH (mozilla1.8.1).

I didn't add any timeBeginPeriod and timeEndPeriod calls.
Attachment #208675 - Flags: branch-1.8.1+
thanks wtc
Keywords: fixed1.8.1
hmm, I think we broke the Firefox windows build with this change:

http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla1.8/1139279400.26513.gz#err0

It's not finding the symbols from the new library we are linking in.
Blocks: 124695
Builds with this patch can be found here:

ftp://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-mozilla1.8

It would be great if all of you with dual core systems could help test these binaries. It would greatly improve the chance of this patch getting into the next stability update of Thunderbird 1.5. Thanks!
Please wait until tomorrow (2006-02-08) to download builds
with this patch from the URL that Scott gave in comment 87.

Although I checked in the patch yesterday afternoon, I had
to back it out last night because of a Tinderbox issue.  This
morning I figured out a way to check in the patch without
breaking Tinderboxes, so I have checked in the patch on the
MOZILLA_1_8_BRANCH, and will check in the patch on the Mozilla
trunk this afternoon.

Scott, sorry about the confusion.
I have just checked in the patch on the NSPRPUB_PRE_4_2_CLIENT_BRANCH
(Mozilla trunk).  For the person who will check in the patch on the
MOZILLA_1_8_0_BRANCH, here are the instructions to avoid breaking the
Tinderboxes.

1. Check in the patch.  Note that the patch changes a Makefile.in
   file in mozilla/nsprpub.

2. Add or remove a blank line at the end of mozilla/nsprpub/configure.
   This dummy checkin will cause the Mozilla Tinderboxes to rerun
   mozilla/nsprpub/configure and regenerate all the Makefiles in
   mozilla/nsprpub.
No longer blocks: 124695
Depends on: 124695
*** Bug 315420 has been marked as a duplicate of this bug. ***
Flags: blocking1.8.0.2? → blocking1.8.0.2+
Comment on attachment 208675 [details] [diff] [review]
Proposed patch: use timeGetTime (whitespace ignored for code review)

approved for landing on the 1.8.0 branch, a=dveditz for drivers
Attachment #208675 - Flags: approval1.8.0.2+
AMD X2 4200+
Running Affinity (and checking in Task Mgr to see if it worked), installing the AMD update, the MS Hotfix, and a new nspr4.dll still does not do the trick for me.  Neither does upping the timeout.  I've tried everything on this page with no success, so far.
Add /usepmtimer to your boot.ini so that the line looks like this:
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP Professional" /noexecute=optin /fastdetect /usepmtimer
Sorry for the redundant post. Disregard my last reply.
(In reply to comment #87)
> Builds with this patch can be found here:
> 
> ftp://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-mozilla1.8
> 
> It would be great if all of you with dual core systems could help test these
> binaries. It would greatly improve the chance of this patch getting into the
> next stability update of Thunderbird 1.5. Thanks!

I downloaded the nightly build earlier today (20060212) and installed it on my dual-core AMD system that I have Windows installed on.  I was running into this bug with the 1.5 release.  So far the nightly build appears to be working fine for me.  I haven't noticed any performance problems with program startup and haven't run into the connection timeout problem.  The bottom line is that everything is looking great to me at this point.
(In reply to comment #87)
> Builds with this patch can be found here:
> ftp://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-mozilla1.8
> It would be great if all of you with dual core systems could help test these
> binaries. It would greatly improve the chance of this patch getting into the
> next stability update of Thunderbird 1.5. Thanks!

I just installed the build above and have not had any more timeout issues. I am not a power user of Thunderbird, just basic email stuff. Here are the details of my hardware.
DFI LANPARTY UT nF4 SLI DR Expert
Bios 12/7/2005 Phoenix 6.00 PG
AMD Athlon64 X2 4400+ Toledo
OCZ Gold Edition (2 x 1GB) DDR 500 (PC 4000)
2x 160GB Seagate Barracuda SATA hdd 
2x eVGA 256-P2-N516 Geforce 7800GT 256MB
        in SLI config
BlueGears/HDA Digital XMYSTIQUE7.1 8 (7.1)  Sound Card
Windows XP Pro SP 2
I'm also pleased to say that version 1.5 (20060212) is working fine for me without any of the timeouts on AMD Athlon X2 4400 and Windows XP x64 Edition.
The nightly from 20060212 still has the same timeout issues for me (see Comment 92).  
I put this on the 1.8.0 branch. Thanks for the tip about changing the configure file!
Keywords: fixed1.8.0.2
Bug 124695 comment 18 reported that the MinGW build crashed
inside the time() function.  This patch has been verified to
fix that crash.  I checked in this patch on the NSPR trunk
(NSPR 4.7), NSPRPUB_PRE_4_2_CLIENT_BRANCH (mozilla
1.9 alpha), MOZILLA_1_8_BRANCH (mozilla 1.8.1), and
MOZILLA_1_8_0_BRANCH (mozilla 1.8.0.2).
Questions for Wan-Teh Chang, to clarify the release strategy for the "switch from QPC to TGT" patch.

According to these Microsoft Knowledge Base articles,
 - http://support.microsoft.com/kb/Q821893 
 - http://support.microsoft.com/kb/Q835730 
 - http://support.microsoft.com/kb/Q896256
the problem appears to occur only when all of the following conditions are met
(considering Windows XP only):
 (1) Multiple CPUs.
 (2) ACPI or SpeedStep is enabled.
     Note: The problem may also occur without ACPI or SpeedStep,
           because a different hardware timer such as the TSC is used with
           multiple CPUs, although the chance of "time travel into the past"
           is expected to be very low, nearly or perhaps exactly zero.
 (3) Multiprocessor edition of Windows XP (SMP HAL or ACPI HAL).
     Note: The Time Stamp Counter (TSC) is used as the system timer with the
           multiprocessor edition of Windows XP, so QPC uses the TSC.
           The Power Management Timer (PMT) is used as the system timer with
           the single-CPU edition of Windows XP, so QPC uses the PMT.
 (4) QueryPerformanceCounter (QPC) is used in a multiple-CPU environment.
 (5) Before Windows XP SP3 (= SP2 + hotfixes + more). This is always true currently.
     Note: Since the hardware vendors' driver update for Windows XP SP2 is an
           ACPI HAL update, it seems to be a change similar to Microsoft's
           hotfix for SP2.
 (6) One of the following:
     (6-1) The hardware doesn't have a Power Management Timer
           (no way to work around it with /usepmtimer).
     (6-2) Windows XP SP1 or earlier (the /usepmtimer workaround is unavailable).
     (6-3) Windows XP SP2, but unable (or unwilling) to use the
           /usepmtimer workaround.
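Read as a whole, the list above is a conjunction of (1) through (5) plus a disjunction in (6). The sketch below encodes that reading as a predicate so individual cases are easy to check. The struct and field names are hypothetical. Condition (2) is treated as required, even though its note says the problem might occur without it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding of the conditions listed above (Windows XP only). */
struct qpc_env {
    bool multi_cpu;      /* (1) */
    bool power_mgmt_on;  /* (2) ACPI or SpeedStep enabled */
    bool multi_cpu_hal;  /* (3) SMP or ACPI multiprocessor HAL */
    bool uses_qpc;       /* (4) application times with QPC */
    bool before_sp3;     /* (5) */
    bool has_pm_timer;   /* hardware has a Power Management Timer */
    int  service_pack;   /* 0, 1, 2, ... */
    bool usepmtimer;     /* /usepmtimer present in boot.ini */
};

/* True when (1)-(5) all hold and at least one branch of (6) holds. */
static bool qpc_problem_expected(const struct qpc_env *e)
{
    assert(e != NULL);
    bool cond6 = !e->has_pm_timer                              /* (6-1) */
              || e->service_pack <= 1                          /* (6-2) */
              || (e->service_pack == 2 && !e->usepmtimer);     /* (6-3) */
    return e->multi_cpu && e->power_mgmt_on && e->multi_cpu_hal &&
           e->uses_qpc && e->before_sp3 && cond6;
}
```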

(Q1) With a single CPU, or with /usepmtimer enabled (SP2 or later only), there is
     no need to switch from QueryPerformanceCounter (QPC) to GetTickCount (GTC).
     In that case your concerns, startup performance and lower resolution than
     QPC, could become a problem in some environments.
     Is it safe to force the patch on all Windows XP users?
(Q2) Once Windows XP SP3 is available, there should be no need to switch from
     QueryPerformanceCounter (QPC) to GetTickCount (GTC).
     Won't the lower resolution compared with QPC become a problem in the future?
Wada:

Thank you for the information and questions.

We switched to timeGetTime(), not GetTickCount().

I am not worried about the degradation to startup performance
any more.  The degradation is negligible.

As for time resolution, the requirement is that it must be
<= 10 ms, which is the traditional Unix time resolution.  If
we aren't sure that the default resolution of timeGetTime()
is fine enough, we can change the Mozilla clients to call
timeBeginPeriod and timeEndPeriod to set the resolution to
a value <= 10 ms.  It is a shame that that'll violate the
implementation abstraction of NSPR (unfortunately we can't
call timeBeginPeriod and timeEndPeriod in NSPR's DllMain
function), but it's necessary to fix this bug.
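timeGetTime returns a 32-bit millisecond count, so it wraps around to zero every 2^32 ms (about 49.7 days). Interval arithmetic done with unsigned 32-bit subtraction stays correct across one wrap, as the sketch below shows. elapsed_ms and idle_exceeded are hypothetical helpers, not NSPR's implementation. They assume every measured interval is shorter than one wrap period.

```c
#include <assert.h>
#include <stdint.h>

/* Milliseconds between two readings of a 32-bit counter such as
 * timeGetTime. Unsigned arithmetic is modulo 2^32, so the result is
 * correct even if the counter wrapped once between the readings. */
static uint32_t elapsed_ms(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Idle check built on it; assumes intervals shorter than ~49.7 days. */
static int idle_exceeded(uint32_t start, uint32_t now, uint32_t limit_ms)
{
    assert(limit_ms > 0);
    return elapsed_ms(start, now) >= limit_ms;
}
```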
(In reply to comment #102)
> We switched to timeGetTime(), not GetTickCount().
Oh, sorry. I confused TGT, GTC, QPC, TSC, PIT, etc. Too many T's and C's.

> The degradation is negligible.
> As for time resolution, the requirement is that it must be
> <= 10 ms, which is the traditional Unix time resolution.
Thanks. Your answer lets me sleep well :-)

Last question.
According to your comment #89, special care seems to be needed when building.
Won't that cause problems in the future (for patch creators or nightly testers)?
Wada,

My comment 89 is a way to work around a bug in
NSPR's makefiles, which don't rerun NSPR's configure
if a Makefile.in in NSPR changed.  I guess we could
file a bug report about this problem.
(In reply to comment #104)
> My comment 89 is a way to work around a bug in NSPR's makefiles
Oh I see.
Then I have no worries about the release of your "switch QPC to TGT (timeGetTime)" patch.
Wtc, thanks for answering my anxious questions.
Installed nightly from 20060214.  
Re-applied all other fixes on this page.
Still times out.  Still works on my single-core machine.  

(BTW, is it proper to make daily reports here if I'm installing nightly builds?)
AMD x2 4200+
WinXP x64, all updates including today's
4 POP accounts

Testing sequence:
Removed user_pref("mailnews.tcptimeout", 65535);
 from prefs.js
Started TBird 1.5
No timeouts after a few minutes and a few tests
Restarted 1.5
Connection to server xx.xx.xx timed out
Uninstalled TB1.5

Installed 20060215 nightly
Started 20060215
Received messages from one account, no timeouts
Sent and rcvd a few test messages, no timeouts
Uninstalled 20060215

Reinstalled TB1.5
Started TB1.5
Connection to server xx.xx.xx timed out
Quit TB1.5
Reinserted user_pref("mailnews.tcptimeout", 65535); into prefs.js
Started TB1.5
No timeouts

Nightly 20060215 worked well for me
I tried build 20060215, but I still have the timeout problem with an IMAP account (with and without SSL). I have an Intel Celeron 2.66 CPU. 
Still doesn't work with thunderbird-2.0a1 nightly.
Continuing to try nightlies.  
1.  Uninstall TBird.
2.  Remove TBird directories.
3.  Install nightly.
4.  Run imagecfg.exe.
5.  Run TBird.
Connection times out.

Outlook Express and Opera Mail both work OK.
Have you tried without using imagecfg.exe? When I downloaded (and am still using) version 1.5 (20060212) it worked right away without any tweaking or changes to which core it is running on (etc).
I have tried 1.5 and all the nightlies without imagecfg, too.  I *really* don't like the other email clients and am doing everything I can think of to make TBirdy work.
(In reply to comment #112)
> I have tried 1.5 and all the nightlies without imagecfg, too.  I *really* don't
> like the other email clients and am doing everything I can think of to make
> TBirdy work.
I confirm this. 
One imap connection, one pop connection and near of 10 rss feeds.
I got the timeout problem yesterday after my switch hung.
I restarted my AlliedTelesyn AT8350GB and restarted my computer (A64 2800+), and the timeout bug still exists, while other network services work. The problem doesn't appear on another computer next to mine, but its TB configuration is different.
I deleted my TB profile and created a new one. The timeout error disappeared, but
it appeared again when I started configuring my profile.
I downloaded a nightly build, but the bug is still there.
I noticed that RSS still works, but the IMAP and POP accounts didn't even try to connect to their servers.
TB - 1.5 and thunderbird-3.0a1.en-US.win32.installer.exe.
pop3 server - popa3d linux debian 3.0 and latest updates
imap server - courier-imap and courier-imap-ssl linux debian 3.0 and latest updates
windows XP SP2 and latest updates.
I've found the solution in my case.
At some point Thunderbird stopped connecting to servers (IMAP, POP) that are not in the HTTP proxy override list. If I add these servers to the HTTP proxy override list, Thunderbird connects to the IMAP and POP servers normally.

I can file this as a separate bug.
It seems like 1.5.0.2 fixed this bug for me.
It turns out that on the install on the new computer, Thunderbird was set up to use a proxy (default? my fault?). I changed it to a direct connection and TBirdy works again!
Most, if not all, contributions to this bug refer to Thunderbird on Win XP with dual-core processors. 
But I have the same problem on a SuSE 9.3 installation with an Athlon X2 4800+. 
The bug does not appear when using version 1.0.8. It appears with all 1.5.x RPMs I found on the net. 
Regards
Ralph
Regarding my last comment about timeout errors on SuSE 9.3 Linux systems with AMD Athlon X2 4800+ processors, I should add that among other 1.5.x versions I have tried MozillaThunderbird-1.5.0.4-0.1.x86_64.rpm
and that I have a cyrus imap server as part of an Open Exchange OX5 server. 
The timeout error appears when synchronizing imap folders. Typically I get the warning popup when new login information is provided to the server (e.g. when the synchronization process changes from one folder to another).   
I have the "immediate imap timeout" problem on Suse 9.3 (64-bit mode) on an AMD X2 4400+ running Thunderbird 1.5.0.5 (source rpm MozillaThunderbird-1.5.0.5-0.1.src.rpm).  Running it on one processor (via "taskset -c 0 thunderbird") makes the timeouts go away.  I did get one timeout after an hour or so of running this way, but without the taskset command I would get a timeout error on almost every action (eg folder select, message select, mark read/unread/junk).  The only thing that changed was upgrading the rpm; prior to that it worked fine.
(In reply to comment #118)
> Suse 9.3 linux-systems with AMD Athlon X2 4800+ processors
(In reply to comment #119)
> Suse 9.3 (64-bit mode) on an AMD X2 4400+

Are you using the newest processor driver?
> AMD Athlon™ 64 Processors Driver Version 1.60.01 for Linux 2.6 (May 2006)
>   Provides support for AMD PowerNow!™ technology and, where appropriate,
>   AMD's Cool-n-Quiet™ technology for Linux systems.
See AMD's "AMD Athlon™ 64 X2 Dual Core Processor Utilities & Updates" page: http://www.amd.com/us-en/Processors/TechnicalResources/0,,30_182_871_13118,00.html
(In addition to comment #120)
> Suse 9.3 linux-systems with AMD Athlon X2 4800+ processors
> Suse 9.3 (64-bit mode) on an AMD X2 4400+

Would disabling Cool'n'Quiet (in BIOS setup) be a workaround?
See comment #6 by the bug reporter.
Just FYI.
Microsoft's KB article on the "performance counter leap" phenomenon is Q274323.
  "Performance counter value may unexpectedly leap forward"
  ( http://support.microsoft.com/default.aspx?scid=KB;EN-US;Q274323& )
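One workaround sometimes suggested for a counter that can leap forward is to cross-check each fine-grained interval against a coarser clock such as GetTickCount. When the two disagree by more than a tolerance, the coarse value is used instead. The sketch assumes both intervals have already been converted to milliseconds. checked_interval_ms and its tolerance parameter are hypothetical, and this is not code from NSPR or from the KB article.

```c
#include <assert.h>
#include <stdint.h>

/* Return the fine interval unless it disagrees with the coarse one by more
 * than tolerance_ms (a leap forward or a step backward); in that case
 * trust the coarse clock instead. */
static int64_t checked_interval_ms(int64_t fine_ms, int64_t coarse_ms,
                                   int64_t tolerance_ms)
{
    assert(tolerance_ms >= 0);
    int64_t diff = fine_ms - coarse_ms;
    if (diff < 0)
        diff = -diff;
    return diff > tolerance_ms ? coarse_ms : fine_ms;
}
```

The tolerance has to be larger than the coarse clock's own granularity, which is around 16 ms on the XP SP2 machine mentioned earlier in this bug. Otherwise legitimate fine readings would be replaced by the coarse value.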
(In addition to comment #120)
Following is part of documentation change on linux kernel 2.6.16 rc1.
(I got it from http://www.tglx.de/projects/hrtimers/2.6.16-rc1/patch-2.6.16-rc1-hrt1.patch )
> Index: linux-2.6.16-rc1/Documentation/kernel-parameters.txt
> ===================================================================
> --- linux-2.6.16-rc1.orig/Documentation/kernel-parameters.txt
> +++ linux-2.6.16-rc1/Documentation/kernel-parameters.txt
>(snip)
> - 	clock=		[BUGS=IA-32,HW] gettimeofday timesource override.
> -			Forces specified timesource (if avaliable) to be used
> -			when calculating gettimeofday(). If specicified
> -			timesource is not avalible, it defaults to PIT.
> +	clock=		[BUGS=IA-32, HW] gettimeofday clocksource override.
> +			[Deprecated]
> +			Forces specified clocksource (if avaliable) to be used
> +			when calculating gettimeofday(). If specified
> +			clocksource is not avalible, it defaults to PIT.
> 			Format: { pit | tsc | cyclone | pmtmr }
> 
> 	hpet=		[IA-32,HPET] option to disable HPET and use PIT.
> @@ -1551,6 +1553,10 @@ running once the system is up.
>(snip)
> +	clocksource=	[GENERIC_TIME] Override the default clocksource
> +			Override the default clocksource and use the clocksource
> +			with the name specified.

According to this documentation, choosing between "tsc" and "pmtmr" appears to be possible on Linux too.
I think your Linux currently uses "tsc" as the gettimeofday timesource.
Try "pmtmr" (similar to /usepmtimer in boot.ini on Windows XP), for example via the clock= (or clocksource=) boot parameter quoted above.  
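On kernels newer than the 2.6.16 release quoted above, the active clocksource can also be inspected at runtime through sysfs, at /sys/devices/system/clocksource/clocksource0/current_clocksource. The ACPI power-management timer appears there as acpi_pm rather than pmtmr. The sketch below reads the first line of that file. read_clocksource is a hypothetical helper. The path may be missing on older kernels or inside containers, in which case the helper returns -1.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CLOCKSOURCE_PATH \
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"

/* Read the first line of `path` into buf, without the trailing newline.
 * Returns 0 on success, -1 if the file is missing or empty. */
static int read_clocksource(const char *path, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = fgets(buf, (int)len, f) != NULL;
    fclose(f);
    if (!ok)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';   /* strip trailing newline */
    return 0;
}
```

Called with CLOCKSOURCE_PATH, a result of tsc would fit the hypothesis above.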
(In reply to comment #121)
> (In addition to comment #120)
> > Suse 9.3 linux-systems with AMD Athlon X2 4800+ processors
> > Suse 9.3 (64-bit mode) on an AMD X2 4400+
> 
> Will disabling "Cool'N'Quiet"(BIOS setup) be a workaround?
> See Comment #6 by bug opener.
> 
Hi, 
Cool n' quiet is disabled on my machine with the AMD Athlon X2 4800+ and Suse 9.3. 
Meanwhile I run Thunderbird on a SUSE 10.1 installation with kernel 2.6.16.21-0.13-smp on the same machine (Athlon X2 4800+). 
I have no problem with Thunderbird version 1.5.0.4 and version 1.5.0.5 combined with SuSE 10.1. So the problem is in my case specific for SuSE 9.3 and its kernel. 
Were there any significant changes coming with kernel 2.6.16 regarding the Athlon X2 processor? 
(In reply to comment #121)
> (In addition to comment #120)
> > Suse 9.3 linux-systems with AMD Athlon X2 4800+ processors
> > Suse 9.3 (64-bit mode) on an AMD X2 4400+
> 
> Will disabling "Cool'N'Quiet"(BIOS setup) be a workaround?
> See Comment #6 by bug opener.
> 

Finally got around to rebooting my box; my BIOS doesn't have any such option as far as I can see (I can't even disable ACPI).
(In reply to comment #120)
> (In reply to comment #118)
> > Suse 9.3 linux-systems with AMD Athlon X2 4800+ processors
> (In reply to comment #119)
> > Suse 9.3 (64-bit mode) on an AMD X2 4400+
> 
> Do you use newest processor driver?
> > AMD Athlon™ 64 Processors Driver Version 1.60.01 for Linux 2.6 (May 2006)
> >   Provides support for AMD PowerNow!™ technology and, where appropriate,
> >   AMD's Cool-n-Quiet™ technology for Linux systems.
> See AMD's "AMD Athlon™ 64 X2 Dual Core Processor Utilities & Updates"
> page.
> http://www.amd.com/us-en/Processors/TechnicalResources/0,,30_182_871_13118,00.html
> 

I haven't been using powernow for many months (it had some dual-core issues on my box), so I guess the answer is no.  Attempting to compile powernow-k8 version 1.60.1 failed miserably on Suse stock kernel-smp-2.6.11.4-21.13 even though the readme states that it works on 2.6.10 and later (errors about missing linux/mutex.h and DEFINE_MUTEX, whether doing an in-kernel or out-of-kernel build).
I am using Thunderbird 1.5.0.9 (20061207) on Win XP on an IBM ThinkPad with an Intel Centrino Duo. I have tried the tricks for restricting Thunderbird's affinity to a single CPU, and I still consistently get this problem.

Is this still unresolved for others?
Regarding Comment #127 of A. Liles: 
I tried Thunderbird 1.5.0.9 (20061207) on a freshly installed Win XP on a DELL Precision M90 notebook with an Intel Core 2 Duo CPU (T7400). I just installed Thunderbird without any modifications, connected to my Openexchange server with Cyrus IMAP and started a synchronization of more than 100 email folders. I did not get any errors or timeouts. 
I tested with a LAN connection (Gigabit-Lan) and a much slower wireless connection.  
I have no idea why it works on my system and not that of A.Liles. Could there be a dependency on the specific type of dual core notebook processor?
    
(In reply to comment #127 and #128)
I should add that I tested again with TLS activated on the XP client on the DELL with the Intel Core 2 Duo processor. (I did this because I had problems with TLS before on a Linux system with Kmail as a client, so I thought it might be interesting to check an encrypted connection with Thunderbird on Win XP.)   
I did get an initial timeout there, but this was due to some misconfiguration of my personal firewall on the client which did not allow for a proper testing of all the imap server abilities. Thunderbird performs such a test after it is started with TLS activated. 
However, after correcting the firewall problem and accepting the certificate of the imap server the data exchange and folder synchronization with the Cyrus imap server worked again without any errors. 
I have had this problem before, but it was always rectified by setting the affinity to only one core.  It is my belief that if you are experiencing this problem even when the affinity is set to only one core, that you are experiencing some other problem/bug.  Have you tried running Tbird in safe mode?  Extensions can often be the cause of many problems.
Re comment #130: I can confirm that restricting the affinity to just one CPU does not prevent the problem for me.  Safe mode makes no difference.
re comments #128 and #129: yes, it must be some difference

I think I have now narrowed it down to a dependency on the IMAP server, as I have tried the faulty client machine against a different IMAP server and have yet to see the problem.

Other machines I use with the IMAP server don't exhibit this issue, so there still might be a problem with TB, but I shall turn my attention to the IMAP server.

I had thought it might be related to a connection pool limit on my normal IMAP server, but I tried reducing the "maximum number of server connections to cache" and this makes no difference.

For the record, the "faulty" IMAP server is a Courier IMAP v3.0.8 implementation with this HELLO string: * OK [CAPABILITY IMAP4rev1 UIDPLUS CHILDREN NAMESPACE THREAD=ORDEREDSUBJECT THREAD=REFERENCES SORT QUOTA IDLE ACL ACL2=UNION STARTTLS] Courier-IMAP ready. Copyright 1998-2004 Double Precision, Inc.  See COPYING for distribution information.

And the symptom seems to depend not on WHAT the client is doing with the server (e.g. long- or short-running operations) but on when a new operation is required: for example, when viewing the body of an email, the client program hangs for about 6 seconds.
(In reply to comment #131)

Ok, so what is the error message that TB reports when you have a problem?  Have you found any other similar bugs posts on bugzilla not related to the affinity issue? Have you tried the TB 2.0 Beta? It could have been fixed already in the next version. The beta is pretty stable; I have been using it for a while now without any show stoppers.
I have not seen this on a WinXP dual Xeon hyperthreaded box with Thunderbird 2.0 builds in some time, so I am going to verify this bug as fixed in Thunderbird 2.0.0.0.

If anyone can reproduce this bug please download a build from <http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/2.0.0.0-candidates/rc2/>, install it, make sure it is permitted by your firewall and proxy to access the net, and report back here. Thanks.
Keywords: verified1.8.1.3
(In reply to comment #102)
> we can change the Mozilla clients to call timeBeginPeriod and timeEndPeriod to set the resolution to a value <= 10 ms.

Wan-Teh Chang, Bug 363258 has been opened.
FYI. Timer resolution seems to have been improved by Windows Vista.
I experienced this 6 months ago and raised comment 128 and comment 131; however the problem has gone away on the hardware and OS stated in comment 128.  Sorry, I cannot define which version fixed it but 1.5.x is fine.  I suspect all along it was not TB but connectivity to my IMAP server - perhaps by having too few connections permitted at the server end.

I recommend closing this issue.
For some time now I have been experiencing a situation where, after several days of running along just fine, attempting to "Get Mail" from 3 POP accounts fails but succeeds from a lone IMAP account. The message returned is "! Connection to server pop.xxxxxx.xxx timed out." Exiting and restarting TB does not solve the problem. Powering down and restarting the system twice (yes, twice) is required. One power cycle results in the same error message; a second power cycle restores proper operation. It only seems to occur after a week or so of 24/7 operation.

Am I in the right thread?

H-P Pentium D with oodles of RAM and HD
Windows XP MCE 2005 SP2
Thunderbird 2.0.0.9 (20071031)
Norton Internet Security up-to-date
Jim, if you have to restart your computer, it's pretty safe to say that the problem is not in Thunderbird.
From comment #133 and #135, resolving.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
(In reply to WADA:World Anti-bad-Duping Agency from comment #31)
> (In reply to comment #26)
> > People have long reported (before there were dual-core CPUs)
> > that the Win32 QueryPerformanceCounter() function returns the
> > wrong result on some hardware.
> 
> Wan-Teh Chang and David, can "Correct TSC synchronization" section of
> http://support.microsoft.com/kb/Q896256 explain our problem?
> 
> Note:
> The MS KB Q896256 is referenced by the "/USEPMTIMER" description.
> MS KB Q821893 and MS KB Q835730 are also referenced by that description.


I will send the DLL in a signed email.  In your email please tell me the version of Mozilla Thunderbird that you are using.