360118 - Thunderbird SMTP client does not honor SMTP negative responses codes and state machine.

Reporter

Description

•

19 years ago

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7 Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7 This TBIRD mail client problem begins with is hardcoded usage of domain literals for issuing the EHLO and HELO commands. See bug 244030. Since the IP addresses do not match, a SMTP server can potentially reject this transaction as shown to the case below. As you can see, TBIRD ignored the 501 error and continued with the fallback to the HELO command (should only be done on 500 response). It received another 501 response code, which TBIRD ignored and continues with the MAIL FROM command. This behavior can put users at risk with erroneous red flags marking them as SMTP spam abusers who don't follow the SMTP protocol state machine correctly. Real example session showing problem: Wildcat! SMTP Server v6.1.451.9 SMTP log started at Mon, 06 Nov 2006 19:14:50 Connection Time: 20061106 19:14:50 cid: 00007126 SSL Enabled: NO Client IP: 72.144.114.198 (unknown) 19:14:50 S: 220 winserver.com Wildcat! ESMTP Server v6.1.451.9 ready 19:14:50 C: EHLO [192.168.1.103] 19:14:50 S: 501 Invalid EHLO client address. 19:14:50 C: HELO [192.168.1.103] 19:14:50 S: 501 Invalid HELO client address. 19:14:50 C: MAIL FROM:<xxxxxxxx@xxxxxxxx.com> 19:14:50 S: 503 Send HELO first. 19:14:52 ** Completed Reproducible: Always Steps to Reproduce: 1. Install TBIRD on a workstation behind a NAT where the IP is 192.168.1.x. 2. Try to send mail to a SMTP server who is checking valid EHLO/HELP Domain. The Santronics Wildcat! SMTP server is such a server. Try sending mail to mail.winserver.com to see the behavior. 3. Capture the log session is possible with TBIRD to see this transaction Actual Results: Simply send to a SMTP server that is performing EHLO/HELO domain literal violations. Expected Results: Mail rejection. Stop the transaction at the first 501. However, related to bug 244030, TBIRD should offer a way to not use the DOMAIN LITERAL which is the initial reason for the rejection. Other MAIL CLIENTS offer a way to issue the local computer name for example.

Hector Santos

Reporter

Comment 1

•

19 years ago

This was my first bugzilla bug report so I wasn't sure how the final description would appear. It seems ok to me and hopefully knowledgeable SMTP people will easily follow the issue at hand. The solution to the problem comes in two forms: 1) TBIRD needs an option to not use the domain literal. This is discussed in bug 244030. 2) It needs to basically honor ALL response codes properly. - 500 for the EHLO should be used as the fallback to HELO. - 50x should not allow TBIRD to continue. It should issue QUIT and drop the connection.

David :Bienvenu

Updated

•

19 years ago

Status: UNCONFIRMED → NEW

Ever confirmed: true

Christian Eyrich

Assignee

Comment 2

•

19 years ago

You're right on that the address visible for the client often isn't the one the outside world sees. I can't think of an automatic solution and so manual configuration seems ok. Adding such an option has been discussed in bug 68877 some time ago, please also have a look there. Anyways, that's another bug. > As you can see, TBIRD ignored the 501 error and continued with the fallback to > the HELO command (should only be done on 500 response). I can see that the response code from our "HELO" isn't checked. I agree that's wrong. And after reading the relevant RFC chapters several times I think a 501 can be interpreted that we shouldn't even fall back. Tomorrow I'll try to write a patch that honors all EHLO/HELO response codes and issues meaningful messages.

Hector Santos

Reporter

Comment 3

•

19 years ago

Hi Chris, No doubt, following the proper SMTP response codes are paramount. Not only because it is a SMTP requirement and compliance is become more important these days, more importantly, we don't want TBIRD users to be erroneously flagged for improper SMTP client behavior. So resolving this now will eliminate all wrong client behavior flagging potential. Regarding the domain literal. Overall, I think a new UI option at the Outbound SMTP Server dialog will resolve this. If we want an UI option to govern this, it could be: [x] Perform PTR lookup for EHLO/HELO (New Default) But can we do this automatically without a UI change or a default UI condition? I believe the code in nsSmtpProtocol::GetUserDomainName() could be changed to offer an automated solution. I don't know the source code completely, but something like this outline: rv = socketTransport->GetSelfAddr(&iaddr); if (NS_SUCCEEDED(rv)) { // If IP is 127.0.0.1, use "LocalHost" // Otherwise perform PTR lookup VIA DNS (don't use GetHostByName(NULL)) // If NULL response, use the Local Computer Name } I think the above will suffice. Just don't use GetHostByName(NULL) because under this will return the local computer name or localhost or what's in the HOSTS file first. You need to do a real DNS client PTR lookup to bypass the HOSTS file. PS: You can use the smtp server MAIL.WINSERVER.COM to test this if you wish. -- HLS

Christian Eyrich

Assignee

Comment 4

•

19 years ago

Attached patch proposed patch — Details — Splinter Review

This one changes the fallback condition to only 500 and 502 responses. And it also adds a check for succeeded HELO. All errors that don't trigger the fallback will produce an alert that includes the server response for troubleshooting. Anything that I missed?

Assignee: mscott → ch.ey

Status: NEW → ASSIGNED

Hector Santos

Reporter

Comment 5

•

19 years ago

(In reply to comment #4) > Created an attachment (id=245220) [edit] > proposed patch > > This one changes the fallback condition to only 500 and 502 responses. The fallback for 500 and 502 are correct. As a side note, looking in our SMTP server code, I see we use to have a 502 which was deprecated to 500. See 4.2.4. We decided it was best this way since it was possible an client will not look for a 502 because either EHLO is supported or it is not. In any case, from TBIRD's standpoint, this would be correct to look for both 500 and 502. Any other negative response code should trigger an session abort. > And it also adds a check for succeeded HELO. All errors that don't > trigger the fallback will produce an alert that includes the > server response for troubleshooting. I am not sure what this means. I can't make heads and tails with the diff you provided. If you want to send me the actual changed *.CPP file, I will be glad to look at it. Overall, If anything else but a 250 is recieved, the client must not move to the next state. It can't get pass EHLO/HELO, then it must issue a QUIT, technically wait for a response and then close the connection. That is the graceful client end session transaction and the SMTP server world will be graceful for it. :-) > Anything that I missed? Is the Domain Literal Issue (BUG 244030) going to be addresed as well in this change? That issue is what propagated this errant state machine discovery in this first place. No doubt, this change can run on its own, but we might as well kill two birds with one stone. :-) I appreciate your time and effort in improving TBIRD. Thanks Hector Santos, CTO http://www.santronics.com

Christian Eyrich

Assignee

Comment 6

•

19 years ago

> Overall, If anything else but a 250 is recieved, the client must not move to > the next state. That's exactly what that patch does. Up to now no check for HELO response is present at all. > It can't get pass EHLO/HELO, then it must issue a QUIT Well, not issuing QUIT in error cases is a long standing bug in Mozilla code. Have to see if there already is a bug on that issue. > Is the Domain Literal Issue (BUG 244030) going to be addresed as well in this > change? No, different issue -> different patch in different bug.

Hector Santos

Reporter

Comment 7

•

19 years ago

>> It can't get pass EHLO/HELO, then it must issue a QUIT > > Well, not issuing QUIT in error cases is a long standing bug in Mozilla code. > Have to see if there already is a bug on that issue. argh. Why can't a QUIT be issued doing the class instance destructor. QUIT MUST be issued in all cases, good or bad. So a straight forward QUIT command in the destructor or some cleanup member function is sufficient. The reason is straight forward. While SMTP servers are required (a MUST) to have a IDLE timeout (a minimum of 5 minutes) at each STATE, the concept of having a DROP detector is an IMPLEMENTATION design issue, not a PROTOCOL requirement. Hence, when clients DROP a connection, it is quite possible for servers to have hanging sessions that will not time out until 5 minutes later. This can have an effect on scalability and load balancing issues at servers. I think it is a good idea to create another bug report to resolve this one. Hector Santos, CTO http://www.santronics.com

Christian Eyrich

Assignee

Comment 8

•

19 years ago

> Why can't a QUIT be issued doing the class instance destructor. Maybe (maybe! - I didn't have a look) the connection is already closed then. > I think it is a good idea to create another bug report to resolve this one. Just did a search and bug 62836 looks like the place to chat about that issue.

Christian Eyrich

Assignee

•

19 years ago

(In reply to comment #12) > Sorry Hector, I don't understand what you're talking about. You reported a > problem, I agreed the existance, provided a patch - so the (I think) proper > corrections are provided. Chris, I am just not familar with this Open Source method where there seems to be little managerial control from a "systems" point of view. Someone has to be the cog who puts this all together. TB has three problems: - This bug where it doesn't handle negative response correctly, - It uses a FIXED domain literal concept, and as we learned, - it doesn't issue QUIT correctly on failed states, leaving potential servers hanging in an IDLE state for 5 minutes. We are not SENDMAIL or other popular mail server vendors, but we been in the business for over 20 years and our server is well branded in the Windows world. I am actively involved in the IETF process and our server is very modern. I'm here, which is a short period, to help TB. I'm not here to go against any process you may have. So if there is a systems engineer that can solve these problems, that would be wonderful not only for TB, but for all SMTP vendors just as ourselves who have to deal with the support process with TB users fail to work. Better yet, TB will instantly get new marketing from all system operators who had problems with it and can not promote it from their users. Now they would be able to. Consider the potential with this thorn on the side removed. > Currently the patch needs review and super-review, If you can apply the patch to produce the full nsSMTPProtocol.cpp file, post it, I will gladly take a look at it. Thanks Hector Santos, CTO http://www.santronics.com

David :Bienvenu

Comment 14

•

19 years ago

Thx again, Christian. I missed your comment because I forgot to cc myself on the bug. I'll run with the patch and note my review.

David :Bienvenu

Updated

•

19 years ago

Attachment #245220 - Flags: review?(bienvenu) → review+

Christian Eyrich

Assignee

Comment 15

•

19 years ago

Marking fixed since checked in trunk and 1.8 branch.

Status: ASSIGNED → RESOLVED

Closed: 19 years ago

Resolution: --- → FIXED

Magnus Melin [:mkmelin]

Comment 16

•

19 years ago

Should be fixed1.8.1.1 then, no?

Keywords: fixed1.8.1.1

Christian Eyrich

Assignee

Comment 17

•

19 years ago

Thanks, something like that. Or fixed1.8.1 that David used in the other bugs from the same checkin? Sorry, I'm confused about all that sub-sub-versions.

David :Bienvenu

Comment 18

•

19 years ago

my bad - Scott told me yesterday we should use fixed1.8.1.1 now instead of fixed1.8.1 (though I don't know how much it matters since we'll have to query for both)