Closed Bug 570293 Opened 15 years ago Closed 13 years ago

timeout error when sending the attachment bigger than 5MB

Categories

(MailNews Core :: Networking: SMTP, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: brushington2, Unassigned)

Details

Attachments

(1 file)

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.19) Gecko/2010040119 Ubuntu/8.04 (hardy) Firefox/3.0.19 Build Identifier: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.9) Gecko/20100317 Thunderbird/3.0.4 And Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2.4) Gecko/20100526 Thunderbird/3.1 I report that problem at http://forums.mozillazine.org/viewtopic.php?f=39&t=1909277&p=9437219#p9437219 And tanstaafl agree with me to believe this is a bug of Thunderbird using the socket layer. The detail is like this: I use Thunderbird 3.0.2 on MacOS 10.5.6.(Later I tried Thunderbird 3.1RC1, and the same problem occur.) I can not complete the delivering process when sending a mail with 5MB.(Sometimes 7% delivered, sometimes 30% delivered like that.) The message shows: ================================ Send Message Error Sending of message failed. The message could not be sent because the connection to SMTP server **.**.**.** timed out. Try again or contact your network administrator. ================================= The SMTP server supports big attachments, I can complete atttachment like 27MB by Apple mail on the same computer. And by foxmail in Windows. I closed all the firewall and virus-scan when testing that. Here is the logs of SMTP sending: -1604430048[130ab40]: SMTP Connecting to: smtp.localserver.com -1604430048[130ab40]: SMTP entering state: 0 -1604430048[130ab40]: SMTP Response: 220 smtp.localserver.com ESMTP Exim 4.63 Mon, 31 May 2010 12:07:39 +0800 -1604430048[130ab40]: SMTP entering state: 14 -1604430048[130ab40]: SMTP Send: EHLO MacX61.local -1604430048[130ab40]: SMTP entering state: 0 -1604430048[130ab40]: SMTP Response: 250-smtp.localserver.com Hello MacX61.local [172.16.65.147] -1604430048[130ab40]: SMTP entering state: 0 -1604430048[130ab40]: SMTP Response: 250-SIZE 52428800 -1604430048[130ab40]: SMTP entering state: 0 -1604430048[130ab40]: SMTP Response: 250-PIPELINING -1604430048[130ab40]: SMTP entering state: 0 -1604430048[130ab40]: SMTP Response: 250-AUTH PLAIN LOGIN -1604430048[130ab40]: SMTP entering state: 0 -1604430048[130ab40]: SMTP Response: 250 HELP -1604430048[130ab40]: SMTP entering state: 4 -1604430048[130ab40]: SMTP entering state: 21 -1604430048[130ab40]: SMTP entering state: 16 -1604430048[130ab40]: Logging suppressed for this command (it probably contained authentication information) -1604430048[130ab40]: SMTP entering state: 0 -1604430048[130ab40]: SMTP Response: 235 Authentication succeeded -1604430048[130ab40]: SMTP entering state: 18 -1604430048[130ab40]: SMTP entering state: 3 -1604430048[130ab40]: SMTP Send: MAIL FROM:<accountsend@smtp.localserver.com> SIZE=6599804 -1604430048[130ab40]: SMTP entering state: 0 -1604430048[130ab40]: SMTP Response: 250 OK -1604430048[130ab40]: SMTP entering state: 5 -1604430048[130ab40]: SMTP Send: RCPT TO:<accountreceive@smtp.localserver.com> -1604430048[130ab40]: SMTP entering state: 0 -1604430048[130ab40]: SMTP Response: 250 Accepted -1604430048[130ab40]: SMTP entering state: 6 -1604430048[130ab40]: SMTP Send: DATA -1604430048[130ab40]: SMTP entering state: 0 -1604430048[130ab40]: SMTP Response: 354 Enter message, ending with "." on a line by itself -1604430048[130ab40]: SMTP entering state: 7 -1604430048[130ab40]: SMTP entering state: 8 The variable mailnews.tcptimeout and mail.server.serverN.timeout is useless. I modified that to 360. The problem occurs less than 10 seconds. Reproducible: Always Steps to Reproduce: 1.Attach a attachment bigger than 5MB 2.using a SMTP server that supports big attachments 3.Send the mail Actual Results: ================================ Send Message Error Sending of message failed. The message could not be sent because the connection to SMTP server **.**.**.** timed out. Try again or contact your network administrator. ================================= Expected Results: mail to be sent The variable mailnews.tcptimeout and mail.server.serverN.timeout is useless. I modified that to 360. The problem occurs less than 10 seconds.
Same problem as Bug 566548? Did you see problem with Tb 2.0, with same SMTP server?
Had the same problem. Pop and smtp servers of two different providers. Traced with an ip packet analyzer, randomly broke in the middle of message transfer. Resolved by a 'mailnews.tcptimeout' value of 100 000 ( microseconds ??? ). No effect with 10 000. Build identifier: Mozilla/5.0 (OS/2; U; Warp 4.5; en-US; rv:1.9.1.11) Gecko/20100707 SeaMonkey/2.0.6
(In reply to comment #3) > Had the same problem. Pop and smtp servers of two different providers. SMTP only problem? (Tb is sender) Or problem on POP3 too? (Tb is receiver) > Resolved by a 'mailnews.tcptimeout' value of 100 000 ( microseconds ??? ). > No effect with 10 000. IIRC, it was in seconds. How long did sending a mail or downloading a mail take with mailnews.tcptimeout=100000? If 10000 seconds is not effective but 100000 seconds works, sending should take at least 10000/3600=3 hours SeaMonkey forgets to multiply mailnews.tcptimeout by 1000 when requests timeout in milliseconds to TCP? > Traced with an ip packet analyzer, randomly broke in the middle of message transfer. Router issue like bugs already pointed in this bug? In those bugs, reported phenomenon was "connection loss" or timeout. If heavy packet loss occurs but re-transmission is successful, transmission ends sooner or later if sender waits response for sufficiently long time. Can smaller network.tcp.sendbuffer be a workaround in your case?
(In reply to comment #4) > SMTP only problem? (Tb is sender) > Or problem on POP3 too? (Tb is receiver) Both. > > Resolved by a 'mailnews.tcptimeout' value of 100 000 ( microseconds ??? ). > > No effect with 10 000. > > IIRC, it was in seconds. Thats' what I had expected, too. ( Conforming to tcp timeout of the socket layer. ) But with a value of 1000, there still was no remarkable gap between transmission interruption and error message. So I incresed the value more and more. With 10 000, I still get random error messages ( and can't plug off the network wire just some 0.1 seconds ). With 100 000 it works. Odd ... > How long did sending a mail or downloading a mail take with > mailnews.tcptimeout=100000? If 10000 seconds is not effective but 100000 > seconds works, sending should take at least 10000/3600=3 hours > SeaMonkey forgets to multiply mailnews.tcptimeout by 1000 when requests timeout > in milliseconds to TCP? Tested with a 7 MB attachment, about 80 seconds to upload and 15 seconds to download. From time to time "timeout" errors appeared within less than a second ( inbox check or small mails without attachment ). > Router issue like bugs already pointed in this bug? In those bugs, reported > phenomenon was "connection loss" or timeout. If heavy packet loss occurs but > re-transmission is successful, transmission ends sooner or later if sender > waits response for sufficiently long time. > Can smaller network.tcp.sendbuffer be a workaround in your case? Will try to test sendbuffer later. However, never seen a "connection loss" message. Also didn't notice any packet loss or delays longer than 170 ms in ip trace. Anyway, seamonkey should accept short connection interrupts and it doesn't.
(In reply to comment #4) > Can smaller network.tcp.sendbuffer be a workaround in your case? Tried. No, it can't.
It sounds mailnews.tcptimeout=100 == 100 milliseconds. When I reported bug 325649, I thought "mailnews.tcptimeout is in second". But it may be wrong. The value is directly passed to nsISocketTransport. > http://mxr.mozilla.org/seamonkey/source/mailnews/base/util/nsMsgProtocol.cpp#150 Log by SocketTransport is one like "poll timeout: 600". If "600 seconds", I think it's too long for polling timeout in networking. I now guess it's "600 milliseconds". If so, mailnews.tcptimeout is in millisecond, and it explains your test results. And, if it's right, default of 60 or 100 makes no sense in many environments... Can you get SMTP log and Socket log with timestamp and check log around timeout? > https://wiki.mozilla.org/MailNews:Logging > https://developer.mozilla.org/en/HTTP_Logging > SET NSPR_LOG_MODULES=timestamp,smtp:5,nsHostResolver:5,nsSocketTransport:5 See NSPR logs pointed in bug 325649 for example, please.
Attached file debug log file
... with options set to : NSPR_LOG_MODULES=timestamp,smtp:5,nsHostResolver:5,nsSocketTransport:5
(In reply to comment #8) > debug log file > nsSocketTransport::Init [this=238d5ca0 host=mail.sciod.eu:465 proxy=:0] Port=465. Does you SMTP use SSL? No problem like Bug 538283(CPU hog with SSL) in your evironment? > 2010-12-09 20:02:42.084000 UTC - 2[200e36c0]: poll timeout: 10000 > 2010-12-09 20:02:42.084000 UTC - 2[200e36c0]: timeout = 10000000 milliseconds > 2010-12-09 20:02:42.403000 UTC - 2[200e36c0]: ...returned after 361 milliseconds mailnews.tcptimeout looks "in second" as we expected. > 2010-12-09 20:02:42.405000 UTC - 2[200e36c0]: poll timeout: 10000 > 2010-12-09 20:02:42.405000 UTC - 2[200e36c0]: timeout = 10000000 milliseconds > 2010-12-09 20:02:42.405000 UTC - 2[200e36c0]: ...returned after 0 milliseconds Problem may be next discovered by bug 538283 comment #107. > That is, the PR_Poll function is only returning a false-positive when wait=true.
(In reply to comment #9) > (In reply to comment #8) > > debug log file > > > nsSocketTransport::Init [this=238d5ca0 host=mail.sciod.eu:465 proxy=:0] > > Port=465. Does you SMTP use SSL? No problem like Bug 538283(CPU hog with SSL) > in your evironment? Don't think so. Already have tested without encryption - nothing changed. Also don't get the 100% CPU symptom. > > > 2010-12-09 20:02:42.084000 UTC - 2[200e36c0]: poll timeout: 10000 > > 2010-12-09 20:02:42.084000 UTC - 2[200e36c0]: timeout = 10000000 milliseconds > > 2010-12-09 20:02:42.403000 UTC - 2[200e36c0]: ...returned after 361 milliseconds > > mailnews.tcptimeout looks "in second" as we expected. > > > 2010-12-09 20:02:42.405000 UTC - 2[200e36c0]: poll timeout: 10000 > > 2010-12-09 20:02:42.405000 UTC - 2[200e36c0]: timeout = 10000000 milliseconds > > 2010-12-09 20:02:42.405000 UTC - 2[200e36c0]: ...returned after 0 milliseconds > > Problem may be next discovered by bug 538283 comment #107. > > That is, the PR_Poll function is only returning a false-positive when > wait=true. Hmmm. What do you think about this lines : 2010-12-09 20:02:48.498000 UTC - 2[200e36c0]: STS poll iter [0] 2010-12-09 20:02:48.498000 UTC - 2[200e36c0]: active [0] { handler=238d5ca0 condition=0 pollflags=7 } 2010-12-09 20:02:48.498000 UTC - 2[200e36c0]: calling PR_Poll [active=1 idle=0] 2010-12-09 20:02:48.498000 UTC - 2[200e36c0]: poll timeout: 10000 2010-12-09 20:02:48.498000 UTC - 2[200e36c0]: timeout = 0 milliseconds 2010-12-09 20:02:48.498000 UTC - 2[200e36c0]: ...returned after 57594905 milliseconds 2010-12-09 20:02:48.498000 UTC - 2[200e36c0]: nsSocketTransport::OnSocketReady [this=238d5ca0 outFlags=-1] 2010-12-09 20:02:48.498000 UTC - 2[200e36c0]: socket timeout expired Should there ever appear such a big value ? It's simply calculated via before = PR_IntervalNow( ) ; PR_Poll( ... ) ; elapsed = PR_IntervalNow( ) - before ; ( http://mxr.mozilla.org/seamonkey/source/netwerk/base/src/nsSocketTransportService2.cpp#345 ) and would be between 10*1000 and 100*1000 ...
(In reply to comment #10) > ...returned after 57594905 milliseconds > socket timeout expired > Should there ever appear such a big value ? I believe you really waited for 16 hours. Log never lies :-) > It's simply calculated via > before = PR_IntervalNow( ) ; PR_Poll( ... ) ; > elapsed = PR_IntervalNow( ) - before ; > ( > http://mxr.mozilla.org/seamonkey/source/netwerk/base/src/nsSocketTransportService2.cpp#345 > ) and would be between 10*1000 and 100*1000 ... Wrap of QueryPerformanceCounter of MS Win(TSC is used. RDTSC instruction is used)? By bug 363258, Mozilla's timer routine uses QueryPerformanceCounter, i.e. uses TSC, uses RDTSC instruction, because problem of MS Win for multi-CPU around TSC skew was resolved by HAL update provided by MS. See Bug 563082 for it. However, AFAIN, and IIRC, issue around wrap of QueryPerformanceCounter(i.e. wrap o TSC) is not perfectly resolved. Is /usepmtimer a workaround in your MS Windows system? Please do Google search for setting /usepmtimer on your MS Win.
(In reply to comment #11) > I believe you really waited for 16 hours. Log never lies :-) :-) > Is /usepmtimer a workaround in your MS Windows system? It's Os/2 in a virtual machine hosted by win 7 on Core2 Duo T9600. A pretty particular environment. Perhaps I should try whether I can reproduce the symptoms on other platforms / environments. If the issue isn't platform-independent it could be not worth our time. What do you think ?
I don't think it could be not worth my time, because I'm intereted in VM on PC, and I'm also an OS/2 user, already-killed-died Warp 4.0 for my favarite EPM and Rexx only nowadays though :-) If VM on PC related issue, it needs to be analyzed. If CPU affiity of Tb under guest OS/2 can be set, try it. If CPU affiity of guest OS/2 can be set by host OS, try it. If only TSC skew relevant issue, CPU afinity should resolve problem. Windows 7 looks to support "Virtual TSC" and looks enabled by default. > http://communities.vmware.com/message/157217 As "Virtual TSC" is probably slower than real TSC, issue like wrap may easily happen. Next may alter phenomenon, if Virtual TSC relevant issue. Disable of Virtual TSC, with CPU affinity of OS/2 by host OS, with CPU affinity of Tb under OS/2 If power saving feature of CPU(acpi, power save by lowering CPU clock rate) can be killed by Win's power mamanagement stiing, disabling it + "disable Virtual TSC" may alter phenomenon. If Win 7(host OS)'s timer is relevant, /usepmtimer of Win 7(host OS) may alter phenomenon, as QueryPerformanceCounter of Win uses PMT instead of TSC by it. If OS/2 has /usepmtimer like feature(option to specifye hardware timer used as system timer), please try it.
I quickly read next document, but I couldn't find which timer OS/2 uses. > http://www.edm2.com/index.php/High_Resolution_Timing_under_OS/2 I don't think OS/2 uses TSC for system timer/counter(I hope routine around timer was not written by MS), I'm not sure tough. Instead, other issue such as one around timer0.sys is found in the document. As "...returned after 57594905", I suspected TSC skew relevant issue(counter is decreased==wrap==leap to future). But it may be issue under VM(may be Win 7 host only), because I don't know such issue on real machine except problem due to TSC skew on Win, and because probabilty of TSC use by OS/2 is low for me, and because TSC skew issue was already worked around by HAL of Win.
(In reply to comment #13) > I don't think it could be not worth my time, because I'm intereted in VM on PC, > and I'm also an OS/2 user, already-killed-died Warp 4.0 for my favarite EPM and > Rexx only nowadays though :-) Alright. > If VM on PC related issue, it needs to be analyzed. I wasn't precise, it's actually not a Virtual Machine, it's Virtual PC 2007. > If CPU affiity of Tb under guest OS/2 can be set, try it. > If CPU affiity of guest OS/2 can be set by host OS, try it. > If only TSC skew relevant issue, CPU afinity should resolve problem. Checked whether enabling hardware virtualisation ( via vanderpool ) makes a difference - no, doesn't. By the way, also tried same version of SeaMonkey ( 2.0.6 ) for XP in the VPC - works flawlessly. > If power saving feature of CPU(acpi, power save by lowering CPU clock rate) can > be killed by Win's power mamanagement stiing, disabling it + "disable Virtual > TSC" may alter phenomenon. > If Win 7(host OS)'s timer is relevant, /usepmtimer of Win 7(host OS) may alter > phenomenon, as QueryPerformanceCounter of Win uses PMT instead of TSC by it. > If OS/2 has /usepmtimer like feature(option to specifye hardware timer used as > system timer), please try it. OK, will look for timer-relevant settings in bios, host and guest ...
(In reply to comment #15) > it's Virtual PC 2007. Windows 7 is not reffered by next official page for Virtual PC 2007. > http://www.microsoft.com/windows/virtual-pc/ > Support / Looking for Virtual PC 2007? > => http://www.microsoft.com/windows/virtual-pc/support/virtual-pc-2007.aspx Officially supported configuration?
(In reply to comment #16) > Officially supported configuration? No, how you mentioned : Windows 7 isn't officially supported as host for VPC 2007. On the other side, Os/2 also isn't officially supported as guest for Windows VPC. By the way, I tested this setup toroughly and didn't recognize any drawbacks - until switching seamonkey from 1.x to 2.x. ( My intention to use VPC 2007 : it's not only possible to backup the system within less than 5 minutes ( by copying the virtual disk ), I also can use it at once on a XP workstation - with internet, mail, news, web development ( php, mySql, apache ), VisualAge C++, Rexx, image processing, and office applications, configuration, and current data. )
(In reply to comment #17) "Os/2 isn't officially supported as guest for Windows VPC" is not "OS/2 can never run". It's usually, conceptualy OS/2 can run as written for x86 series, but not tested, will not be tested, never changes logic for OS/2, no answer to OS/2 relevant questions, .... However, "Windows 7 isn't officially supported as host for VPC 2007" is "there is no garantee that VPC 2007 runs as expected" and "no change will be made even if problem occurs with this combination". > By the way, I tested this setup toroughly and didn't recognize any drawbacks - > until switching seamonkey from 1.x to 2.x. May be affected by change of Bug 363258.
"affected" : bug of VPC 2007 may be exposed by the change. problem with "Win 7 + VPC 2007" may be exposed by the change. the change on OS/2 is not torelant with VPC 2007's implementation.
(In reply to comment #18) > However, "Windows 7 isn't officially supported as host for VPC 2007" is "there > is no garantee that VPC 2007 runs as expected" and "no change will be made even > if problem occurs with this combination". OK, I see. Tested system on the XP host ( supported setup, XP prof. 32bit, AMD athlon 64bit ), same symptoms.
(In reply to comment #10) > http://mxr.mozilla.org/seamonkey/source/netwerk/base/src/nsSocketTransportService2.cpp#345 > before = PR_IntervalNow( ) ; PR_Poll( ... ) ; > elapsed = PR_IntervalNow( ) - before ; > http://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/misc/prinrval.c#74 > return _PR_MD_GET_INTERVAL(); Win. > http://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/md/windows/ntinrval.c#65 > return timeGetTime(); OS/2. > http://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/md/os2/os2inrval.c#79 > 81 if (useHighResTimer) { > 84 APIRET rc = DosTmrQueryTime(&timestamp); > 88 /* Sadly, nspr requires the interval to range from 1000 ticks per > 89 * second to only 100000 ticks per second. DosTmrQueryTime is too > 90 * high resolution... > 91 */ > 92 top = timestamp.ulHi & _os2_highMask; > 93 top = top << (32 - _os2_bitShift); > 94 timestamp.ulLo = timestamp.ulLo >> _os2_bitShift; > 95 timestamp.ulLo = timestamp.ulLo + top; > 96 return (PRUint32)timestamp.ulLo; > 97 } else { > 99 DosQuerySysInfo(QSV_MS_COUNT, QSV_MS_COUNT, &msCount, sizeof(msCount)); > 100 return msCount; > 101 } According to next document, OS/2 seems to use PIT by DosTmrQueryTime(). > http://androidall.5d6d.com/archiver/tid-22139.html > On OS/2, DosTmrQueryTime() returns the time stamp counter from intel 8254 > timer (programmable interrupt timer). It may be problem of "Virtual PIT of VPC 2007" or problem of NSPR on OS/2 when PIT(Virtual PIT) wrapped. Do you know how to disable useHighResTimer option of OS/2?
(In reply to comment #21) > > On OS/2, DosTmrQueryTime() returns the time stamp counter from intel 8254 > > timer (programmable interrupt timer). > It may be problem of "Virtual PIT of VPC 2007" or problem of NSPR on OS/2 when > PIT(Virtual PIT) wrapped. > > Do you know how to disable useHighResTimer option of OS/2? Following the path you described set NSPR_OS2_NO_HIRES_TIMER=1 ( http://mxr.mozilla.org/comm-1.9.1/source/mozilla/nsprpub/pr/src/md/os2/os2inrval.c#57 ) does the job. Works on XP and win7 host - no more timeouts. Don't have a working mozilla build environment but can look how DosTmrQueryTime and DosTmrQueryFreq behaves on those systems.
FYI. Virtual PC 2007 himself offically supports Guest OS/2 although selected versions only. > http://en.wikipedia.org/wiki/Windows_Virtual_PC VMware's statement on Virtual PIT. > http://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf > Virtual PIT > VMware products fully emulate the timing functions of all three timers in the PIT device. I couldn't find such well document of "Virtual PC" via my Google search. I prefer guest OS/2 under Z/VM on Intel PC to guest OS/2 under Power/VM on PowerPC ThinkPad :-) Intel 8254. > http://en.wikipedia.org/wiki/Intel_8253
Stefan Schmohl's case(OS/2 under Virtual PC 2007 on Windows 7) from Comment #3 thru Comment #23 has been analyzed and solution of "SET NSPR_OS2_NO_HIRES_TIMER=1" has been found. It's very surprising because easy workaround was found by only two peoples without help of developers of Mozilla, from log of "...returned after 57594905" only, even though very special and hard-to-analyze case of "endangered species:OS/2 under ordinal Virtual PC 2007 on not-officially-supported Windows 7". Who wants to support such case :-) Thanks a lot to Bugzilla and bugzilla.mozilla.org and MXR. Retuning to original Comment #0. brushington2@sohu.com(bug opener): (1) Try smaller network.tcp.sendbuffer (restart Tb after change). Does problem still occur? (2) Get socket log too. > Win example : > SET NSPR_LOG_MODULES=timestamp,smtp:5,nsHostResolver:5,nsSocketTransport:5 Do you see funny timeout like Stefan Schmohl's case?
(Get thee out of General - though someone is welcome to move this to NSPR or whatever)
Component: General → Networking: SMTP
Product: Thunderbird → MailNews Core
QA Contact: general → networking.smtp
need more information per comment 24
Flags: needinfo?(brushington2)
Whiteboard: [closeme 2012-12-01]
Resolved per whiteboard
Status: UNCONFIRMED → RESOLVED
Closed: 13 years ago
Flags: needinfo?(brushington2)
Resolution: --- → INCOMPLETE
Whiteboard: [closeme 2012-12-01]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: