Closed Bug 437494 Opened 16 years ago Closed 13 years ago

Reports lost connection to outgoing mail server; tcp dump shows client sending invalid data (last [LF] or [CRLF] or .[CRLF] in final [CRLF].[CRLF] sequence is not passed to SMTP server)

Categories

(MailNews Core :: Networking: SMTP, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 8.0

People

(Reporter: karger, Assigned: jik)

References

Details

Attachments

(4 files, 3 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
Build Identifier: 2.0.0.14 (20080421)

When I composed a particular email message (with a pdf attachment) and sent it, the progress bar hung at 96% progress and, after a timeout, thunderbird reported that it had lost the connection to the smtp server.  We tried sending the same message several times (and also composing a duplicate copy of the message) with the same results.  A tcpdump of the connection to the SMTP server showed thunderbird sending a line with a CR but no LF, violating the SMTP protocol.
Here's the tail end of the wireshark decode showing the terminating
".\r\n" missing its "\n" bit.

0002F05A  59 58 4a 30 65 48 4a 6c  5a 67 6f 78 4d 7a 4d 34 YXJ0eHJl ZgoxMzM4
0002F06A  0d 0a 4d 54 55 4b 4a 53  56 46 54 30 59 4b 0d 0a ..MTUKJS VFT0YK..
0002F07A  2d 2d 2d 2d 2d 2d 2d 2d  2d 2d 2d 2d 2d 2d 30 39 -------- ------09
0002F08A  30 35 30 31 30 39 30 35  30 35 30 33 30 34 30 34 05010905 05030404
0002F09A  30 39 30 34 30 34 2d 2d  0d 0a 2e 0d             090404-- ....
                                                                              
000001A9  34 32 31 20 4c 6f 73 74  20 69 6e 63 6f 6d 69 6e 421 Lost  incomin
                                                                              
000001B9  67 20 63 6f 6e 6e 65 63  74 69 6f 6e 0d 0a       g connec tion..

We then changed the recipient address, and the mail went out successfully.  Afterwards, I was able to send a copy of the email to the intended recipient.  Since then the bug has not recurred.


Reproducible: Couldn't Reproduce

Steps to Reproduce:
1.
2.
3.
Actual Results:  
Client timed out and reported lost connection to smtp server.

Expected Results:  
Client should have delivered mail.

Here's the tail end of the wireshark decode showing the terminating
".\r\n" missing its "\n" bit.

0002F05A  59 58 4a 30 65 48 4a 6c  5a 67 6f 78 4d 7a 4d 34 YXJ0eHJl ZgoxMzM4
0002F06A  0d 0a 4d 54 55 4b 4a 53  56 46 54 30 59 4b 0d 0a ..MTUKJS VFT0YK..
0002F07A  2d 2d 2d 2d 2d 2d 2d 2d  2d 2d 2d 2d 2d 2d 30 39 -------- ------09
0002F08A  30 35 30 31 30 39 30 35  30 35 30 33 30 34 30 34 05010905 05030404
0002F09A  30 39 30 34 30 34 2d 2d  0d 0a 2e 0d             090404-- ....
                                                                              
000001A9  34 32 31 20 4c 6f 73 74  20 69 6e 63 6f 6d 69 6e 421 Lost  incomin
                                                                              
000001B9  67 20 63 6f 6e 6e 65 63  74 69 6f 6e 0d 0a       g connec tion..
(In reply to comment #0)
> A tcpdump of the connection to the SMTP server showed
> thunderbird sending a line with a CR but no LF,
> violating the SMTP protocol.

TCP/IP level, data of "...[CR][LF]..." can be sent in "...[CR]" and "[LF]...".
This can not be a SMTP protocol violation.
I suspect this case, because you say following.
> We then changed the recipient address, and the mail went out successfully.
If this case, following can be explained too, because Message-ID: header length can vary upon next composition of mail to same recipient.
> Afterwards, I was able to send a copy of the email to the intended recipient.
> Since then the bug has not recurred.

Another suspect.
AFAIK, Tb encodes [CR] & [LF] & [HTAB] etc. in attached data as expected even when Quoted-Prontable is requested by user. However, some mailer doesn't encode special data when Quoted-Printable. If stand alone [CR] or [LF] exists in data, and if it is not encoded by the mailer, and if the mail is attached or forwarded by Tb, stand alone [CR] or [LF] may be sent to SMTP server by Tb(When APPEND of IMAP, it's true). Because you say "problem did not occur when composed again", this wouldn't produce "SMTP violation" detection by SMTP server, even if this problem existed. However, if similar situation like above("...[CR]..." is sent in "...[CR]" and "...") occurred at same time, it may produce "SMTP violation" detection by SMTP server.

(Q1) What packet was sent from PC to SMTP server after following [LF] missing packet?
> 0002F09A  30 39 30 34 30 34 2d 2d  0d 0a 2e 0d             090404-- ....

FYI.
Bug 301010 is stand alone [LF] related issue when IMAP.
  A mailer violates RFC, and SMTP passed thru the violation, and IMAP of MS
  Exchange passed thru the violation, and Thunderbird passed thru the violation,
  then earnest IMAP server complaints.
(In reply to comment #1)
> (Q1) What packet was sent from PC to SMTP server after following [LF] missing
> packet?
> > 0002F09A  30 39 30 34 30 34 2d 2d  0d 0a 2e 0d             090404-- ....

I can make a good guess at that, nothing.

David, having looked at some of the Thunderbird code relating to that area yesterday, you are indeed correct that it should be sending <CRLF>.<CRLF>

Is there any chance you could save a .eml copy of the message and post it here (or email it to me direct?) for analysis. Obviously you may not want to do this if it contains sensitive information, in which case, could you provide some more information about the message, such as:

- The overall size of the message
- The length of the Message To: line before and after the sender change
- Was it just a pdf attachment, if so, how long was it?
- What text was there in addition to the pdf attachment? Was is plain-text, HTML or both?

I certainly have the tools to debug this, but having the message would help a lot.
(In reply to comment #3)
> I can make a good guess at that, nothing.

Tb forgets to send the last [LF] upon final send of "...<last_mail_data_line>[CRLF].[CRLF]", if TCP data block is splited to "...<last_mail_data>[CRLF].[CR]" and "[LF]"?

If so, and if this is the reason of many reports of following phenomenon, why the mail is sent to destination by SMTP server?
> SMTP mail send fails(then copy of sent mails is not saved in Sent folder),
> but the mail is sent by SMTP server to destination.
Many such reports with SMTP log look to be "no OK response from SMTP server to final '.[CRLF]' by Tb" case.
(In reply to comment #3)
> I can make a good guess at that, nothing.

Mark Banner, can same issue occur on IMAP APPEND?
(If yes, issue of "Copying to Sent Folder..." when IMAP is possibly explained.)
(In reply to comment #5)
> (In reply to comment #3)
> > I can make a good guess at that, nothing.
> 
> Mark Banner, can same issue occur on IMAP APPEND?
> (If yes, issue of "Copying to Sent Folder..." when IMAP is possibly explained.)
> 
I don't know, I doubt it IMAP is a different protocol.

Please let's not make unnecessary hypothesis, if someone can pass me a message that reproduces this error then yes, we'll debug it and maybe we can use it for debugging other things as well. Without something to start from, I think it could be quite hard to debug.
I have attached the tcpdump we were running when we observed the bug.
Hello,

I think I have the same problem. Mails are not sent, the connection timeouts. tcpdump shows that the final \r\n is missing in the connection:

	0x0020:  5018 f91a 1d14 0000 2f68 746d 6c3e 0d0a  P......./html>..
	0x0030:  0d0a 0d0a 2d2d 474e 522d 424f 554e 4441  ....--GNR-BOUNDA
	0x0040:  5259 2d34 3344 4653 6964 6a78 616b 396c  RY-43DFSidjxak9l
	0x0050:  532d 2d0d 0a0d 0a0d 0a0d 0a2d 2d2d 2d2d  S--........-----
	0x0060:  2d2d 2d2d 2d2d 2d2d 2d30 3430 3730 3730  ---------0407070
	0x0070:  3130 3130 3930 3030 3230 3230 3330 3430  1010900020203040
	0x0080:  302d 2d0d 0a2e                           0--...

There should be a 2d0d after the 2e (.) at the end of the packet.

After this packet is acknowledged by the server nothing happens until thunderbird closes the connection to the server after the timeout (~30 seconds later).

I can't attach the whole dump because the mail where the problem occurs contains sensitive information.
I meant there should be a "0d0a" after the 2e not a "2d0d".
Component: General → Networking: SMTP
Product: Thunderbird → MailNews Core
QA Contact: general → networking.smtp
I haven't seen the bug for quite some time. Likely, this is because I
split my inbox in two---thunderbird simply wasn't able to handle a 30K
message inbox.  I can't at this point replicate the bug.
wada, leaving next step of this one to you :)
Target Milestone: --- → Thunderbird 3.1b1
Target Milestone: Thunderbird 3.1b1 → ---
(In reply to comment #11)

Two kinds of TCP dump data(evidence), one="[CRLF].[CR]" and other="[CRLF]." instead of correct "[CRLF].[CRLF]", are already available. And, bug opener already attached TCP dump file to this bug. i.e. Answers to questions in comment #3 is already available in this bug. And, short comments by developer are already posted to this bug. I think we QA people can do nothing on this bug. Ball is in developer's hands.
Summary: Reports lost connection to outgoing mail server; tcp dump shows client sending invalid data → Reports lost connection to outgoing mail server; tcp dump shows client sending invalid data (last [LF] or [CRLF] in final [CRLF].[CRLF] sequence is not passed to SMTP server)
Poster on GS has same problem, has tcpdump downloadable via HTTP.  tcpdump shows final LF missing (like problem reporter case).  Occurred on both Windows and Ubuntu; OS/TB versions, link to tcpdump, etc. available -- see http://getsatisfaction.com/mozilla_messaging/topics/sending_message_with_attachment_hangs_at_99

"Platform" should probably be changed to "All", and status to "NEW"....
Severity: normal → major
OS: Windows XP → All
Hardware: x86 → All
(In reply to comment #7)
> TCPDUMP of buggy interaction

Mail size.
> MAIL FROM:<abcdef@ghi.jkl> SIZE=192510
tcpdump for GetSatisfaction poster shows:
> MAIL FROM:<xxxxxx@shadowweb.org> SIZE=929790

159 bytes of greeting, auth, and envelope data were sent by TB.  929790 + 159 + 3 (.CRLF) = 929952, but only 929951 bytes transmitted (FIN has relative seq# of 929952).  No period-stuffing (RFC2821, 4.5.2) appears to have occurred, or to have been necessary.
FYI. Delta of SIZE in 4KB = (929790-192510)/4096 = 180
(In reply to comment #17)
> FYI. Delta of SIZE in 4KB = (929790-192510)/4096 = 180

Do you mean to imply that if SIZE (MOD 4096) = 4094, the problem occurs ?
(In reply to comment #18)
> Do you mean to imply that if SIZE (MOD 4096) = 4094, the problem occurs ?

Near, but slightly different.
I'm suspecting problem when SIZE=(4096*A)+B, where A is integer and B is same or very near.
  929790 = 4096*226 + 4094
  192510 = 4096*46  + 4094
When last data is 4094 bytes, "." and [CR] is put in the last 4KB buffer, but final [LF] should be put in next 4KB buffer. If Tb's fault, Tb may forget to send orphaned [LF] in next buffer or may fail to send correctly(misinterpret length=1 in next buffer as no data to send).
Loss of final [CRLF] case may be SIZE=(4096*A)+4095 case.
Thomas Stangner(reporter of the getsatisfaction.com article) has reported next.
  (SIZE=929792)=929790+2=(4096*226)+4096=(4096*227)+0 : .[CRLF] was not sent.
Confirig per attached data, although Tb's bug or Socket/TCP bug is unclear yet.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Reports lost connection to outgoing mail server; tcp dump shows client sending invalid data (last [LF] or [CRLF] in final [CRLF].[CRLF] sequence is not passed to SMTP server) → Reports lost connection to outgoing mail server; tcp dump shows client sending invalid data (last [LF] or [CRLF] or .[CRLF] in final [CRLF].[CRLF] sequence is not passed to SMTP server)
Problem was reproduced by Tb 3.1.1, with mail of next size.
> MAIL FROM:<soarex@ops.dti.ne.jp> SIZE=8190
Checked with smtp.gmail.com(SSL) and smtp.ops.dti.ne.jp(no SSL).

[Steps to reproduce]
1. Save attached file to mail directory, Restart Tb,
   "Repair Folder" of folder properties if required.
2. View mail, Edit As New, Send => Send doesn't complete
Change last line length if unwanted SIZE is produced by Tb, or for other SIZE case.
Attached TCP dumps look to cover all cases. 
(1) MAIL FROM:<a@p.q.r.s> SIZE=8189(4096*1+4093): No problem
(2) MAIL FROM:<a@p.q.r.s> SIZE=8190(4096*1+4094): Problem occurred(LF is lost)
(3) MAIL FROM:<a@p.q.r.s> SIZE=8191(4096*1+4095): No problem
(4) MAIL FROM:<a@p.q.r.s> SIZE=8192(4096*1+4096): Problem ocurred(.CRLF is lost)
(5) MAIL FROM:<a@p.q.r.s> SIZE=8193(4096*1+4097): No problem

Note: When you test with attached mail, change To: to your mail addr, please.
Summary: Reports lost connection to outgoing mail server; tcp dump shows client sending invalid data (last [LF] or [CRLF] or .[CRLF] in final [CRLF].[CRLF] sequence is not passed to SMTP server) → Reports lost connection to outgoing mail server; tcp dump shows client sending invalid data (last [LF] or .[CRLF] in final [CRLF].[CRLF] sequence is not passed to SMTP server)
(In reply to comment #8)
>     0x0020:  5018 f91a 1d14 0000 2f68 746d 6c3e 0d0a  P......./html>..
>     0x0030:  0d0a 0d0a 2d2d 474e 522d 424f 554e 4441  ....--GNR-BOUNDA
>     0x0040:  5259 2d34 3344 4653 6964 6a78 616b 396c  RY-43DFSidjxak9l
>     0x0050:  532d 2d0d 0a0d 0a0d 0a0d 0a2d 2d2d 2d2d  S--........-----
>     0x0060:  2d2d 2d2d 2d2d 2d2d 2d30 3430 3730 3730  ---------0407070
>     0x0070:  3130 3130 3930 3030 3230 3230 3330 3430  1010900020203040
>     0x0080:  302d 2d0d 0a2e                           0--...
> 
> There should be a 2d0d after the 2e (.) at the end of the packet.

I couldn't reproduce this case(loss of final [CRLF]) with my test mail which has two [CRLF]s at end of mail(Tb adds last [CRLF] by mail copy).
Same tcp dump data is reported to Bug 583192.
I guess 4046*A+4095(orphaned [CRLF]) case.
To see this case, next was required.
(1) Create SIZE=8193 mail from attached test mail => SIZE=8193, No problem.
   (single mail in a mail folder) 
> 1234567890123456789012[CRLF]
> [CRLF]
(3) Remove last [CRLF] => SIZE=8191
> 1234567890123456789012[CRLF]
(4) Restart Tb, View mail, Edit as new, Send => Send doesn't complete.
Summary: Reports lost connection to outgoing mail server; tcp dump shows client sending invalid data (last [LF] or .[CRLF] in final [CRLF].[CRLF] sequence is not passed to SMTP server) → Reports lost connection to outgoing mail server; tcp dump shows client sending invalid data (last [LF] or [CRLF] or .[CRLF] in final [CRLF].[CRLF] sequence is not passed to SMTP server)
Checked socket level log too. 
> https://wiki.mozilla.org/MailNews:Logging
> https://developer.mozilla.org/en/HTTP_Logging
> SET NSPR_LOG_MODULES=timestamp,smtp:5,nsSocketTransport:5

Tb's SMTP and socket layer code requests send of mail data using 4KB buffer
after receive of '354 Enter mail, end with "." on a line by itself'.
Loss of a part of .CRLF can be observed by data length passed by PR_Write.
(1) Size=8190=4096+4094: 2*4096 bytes is sent. last LF in .CRLF is lost
    calling PR_Write [count=4096], PR_Write returned [n=4096]
    calling PR_Write [count=4096], PR_Write returned [n=4096]
(2) Size=8191=4096+4095: 2*4096 bytes is sent. last CRLF in .CRLF is lost
    calling PR_Write [count=4096], PR_Write returned [n=4096]
    calling PR_Write [count=4096], PR_Write returned [n=4096]
(3) Size=8192=4096+4096: 2*4096 bytes is sent. last .CRLF is lost
    calling PR_Write [count=4096], PR_Write returned [n=4096]
    calling PR_Write [count=4096], PR_Write returned [n=4096]
(4) Size=8193=4096+4097: 2*4096 bytes + 4 bytes is sent. final .CRLF is sent
    calling PR_Write [count=4096], PR_Write returned [n=4096]
    calling PR_Write [count=4096], PR_Write returned [n=4096]
    calling PR_Write [count=4],    PR_Write returned [n=4]
Total bytes of PR_Write should be Size + 3 bytes(for added .CRLF).
I ran into this bug last night at around 9:15pm. I've now spent the last 6.25 hours tracking it down, and it's 3:33am. My wife is going to kill me in the morning. :-)

I am, however, delighted to say that I think I've figured out the bug, and I've even got a fix that seems to work. Whether it's the *right* fix is another question entirely.

WADA's guess about the cause of the problem was spot on. When Thunderbird is done transmitting the message data to the server, nsMsgAsyncWriteProtocol::SendData in nsMsgProtocol.cpp gets called, from nsMsgAsyncWwriteProtocol::PostDataFinished, with the byte sequence [. CR LF] to terminate the message data.

Unfortunately, if there's enough room in the output pipe for only part of the byte sequence, then the part that there's room for gets queued up, but the rest gets discarded.

The fix I came up with is to notice when this happens and queue up the data that hasn't been sent yet so that it is sent when room becomes available in the output pipe. As I said, I don't know if this is the best or correct fix, but it appears to work.

Some other notes and caveats about this:

* I don't think this just happens when the [. CR LF] is split and only some of it can be sent. I think it could also be the case, in very rare circumstances, that the output buffer is exactly full, in which case the entire [. CR LF] sequence will be lost.

* It's not just the terminating [. CR LF] sequence that can be lost. It's any command sent to the server with SendData when the output pipe is full. That's just theoretical, though, since the only time when the output pipe is likely to be full is at the end of the DATA phase,because the SMTP protocol is a short exchange-response protocol.

Have mercy on my code. Like I said, I've been at it for >6 hours and it's now almost 4am. Ugh!
Attachment #545052 - Flags: review?(mbanner)
This is a better version of my patch. It fixes two additional issues:

* There was another edge condition that my prior patch didn't take into account -- what happens if the output pipe is all the way full, not just mostly full, when SendData is called?

* The code had another bug which this new patch addresses -- a race condition which could cause the final . CR LF to be sent in the middle of the message rather than at the end, if SendData is called with . CR LF before the input file has been completely drained into the output pipe.

I fix both of these issues as well as the previously reported ones in this new patch by funneling *all* output transmitted through SendData through OnOutputStreamReady, the same callback function through which the message data is transmitted. This guarantees sequential handling of the output and eliminates all the full buffer issues since OnOutputStreamReady knows how to handle them.
Attachment #545052 - Attachment is obsolete: true
Attachment #545086 - Flags: review?(mbanner)
Attachment #545052 - Flags: review?(mbanner)
Comment on attachment 545086 [details] [diff] [review]
new patch which fixes the issue reported in this bug as well as several others

I'm glad you've found the cause. I'm travelling over the next few days, so I'm going to pass this over to David in the hopes that he can review it sooner.
Attachment #545086 - Flags: review?(mbanner) → review?(dbienvenu)
Assignee: nobody → jik
Status: NEW → ASSIGNED
Comment on attachment 545086 [details] [diff] [review]
new patch which fixes the issue reported in this bug as well as several others

the patch looks reasonable, and SMTP seems to work fine in the product, but unfortunately, test_smtpPasswordFailure3.js fails reliably with the patch applied. I'm not sure if it's an issue with the test, or if this patch somehow broke our smtp password failure handling (which seems unlikely). I'll verify that the test passes without the patch, in my tree, though I'm fairly sure it will. And I'll try to figure out why the test is failing.
Attachment #545086 - Flags: review?(dbienvenu) → review-
I wasn't clearing the new buffer I added when closing the socket, so there was text leftover in the buffer which was being sent the next time the SMTP connection was opened. This new patch fixes that issue.
Attachment #545086 - Attachment is obsolete: true
Attachment #545223 - Flags: review?(dbienvenu)
Comment on attachment 545223 [details] [diff] [review]
patch which fixes the unit test failure

this misses the .h changes from the prev patch - I can reconstruct for testing purposes, but we'll need the cumulative fix before checkin...
Dang, sorry about that. Still getting the hang of mercurial :-)
Attachment #545223 - Attachment is obsolete: true
Attachment #545248 - Flags: review?
Attachment #545223 - Flags: review?(dbienvenu)
Attachment #545248 - Flags: review? → review?(dbienvenu)
Comment on attachment 545248 [details] [diff] [review]
patch that includes chunk from .h file

thx, yeah, this works better.
Attachment #545248 - Flags: review?(dbienvenu) → review+
Keywords: checkin-needed
Checked in: http://hg.mozilla.org/comm-central/rev/f66eb5d6026d
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 8.0
See Also: → 1222046
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: