Open Bug 941861 Opened 7 years ago Updated 3 years ago

spaces in the local part of an email address are mangled (Tb shouldn't remove "space in quoted local part of mail address" in SMTP command, because such space is already absolutely proper in RFC5321 in addition to RFC5322/RFC2822)

Categories: MailNews Core :: Networking: SMTP, defect (Not set, normal)
Tracking: Not tracked
People: Reporter: indigo, Unassigned
References: Blocks 2 open bugs
Attachments: 1 file

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20131030 Firefox/17.0 Iceweasel/17.0.10 (Nightly/Aurora)
Build ID: 20131030041028

Steps to reproduce:

compose a message to:

    "i have spaces"@example.com



Actual results:

Message is delivered to

    "ihavespaces"@example.com

In detail, tcpdump captures the conversation between thunderbird (well, actually icedove in my case) and the mail server as:

    220 mail.macprofessionals.com Kerio Connect 8.0.2 ESMTP ready
    EHLO [XXXXXXXXXXXXX]
    250-mail.macprofessionals.com
    250-AUTH CRAM-MD5 DIGEST-MD5
    250-SIZE 52428800
    250-STARTTLS
    250-ENHANCEDSTATUSCODES
    250-8BITMIME
    250-PIPELINING
    250-ETRN
    250-DSN
    250 HELP
    AUTH CRAM-MD5
    334 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    235 2.0.0 Authentication successful
    MAIL FROM:<phil@macprofessionals.com> SIZE=387
    250 2.1.0 Sender <phil@macprofessionals.com> ok
    RCPT TO:<"ihavespaces"@example.com>
    250 2.1.5 Recipient <"ihavespaces"@example.com> ok (remote)
    DATA
    354 Enter mail, end with CRLF.CRLF
    Message-ID: <528E74AB.9050300@macprofessionals.com>
    Date: Thu, 21 Nov 2013 16:01:31 -0500
    From: Phil Frost <phil@macprofessionals.com>
    User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130922 Icedove/17.0.9
    MIME-Version: 1.0
    To: "i have spaces"@example.com
    Subject: test
    Content-Type: text/plain; charset=ISO-8859-1; format=flowed
    Content-Transfer-Encoding: 7bit


    .
    250 2.0.0 528e74fd-0005a751 Message accepted for delivery
    QUIT
    221 2.0.0 SMTP closing connection


Note that the envelope recipient does not match the To: header.


Expected results:

The envelope address should match the To: header and the address that I entered in the compose window. According to RFC 822, this is a perfectly valid address (see section 6.1). Also see RFC 2821 section 2.3.10:

"Consequently, and due to a long history of problems when intermediate hosts have attempted to optimize transport by modifying them, the local-part MUST be interpreted and assigned semantics only by the host specified in the domain part of the address."

Though RFC 2821 applies to SMTP hosts, I'd think the same reasoning should apply to programs composing messages.
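
The expected behaviour can be sketched in a few lines of Python. This is only an illustration, not Thunderbird code: `rcpt_command` and `mangled` are hypothetical helpers, and the only assumption is that the address typed in the compose window is passed through to the envelope unchanged.

```python
def rcpt_command(addr_spec: str) -> str:
    """Build an SMTP RCPT command, leaving the local part untouched.

    Hypothetical helper for illustration; not Thunderbird code.
    """
    if "\r" in addr_spec or "\n" in addr_spec:
        # A bare CR or LF would break the command line.
        raise ValueError("address must not contain CR or LF")
    return f"RCPT TO:<{addr_spec}>"


def mangled(addr_spec: str) -> str:
    """Reproduce the reported behaviour: every space is dropped."""
    return addr_spec.replace(" ", "")


typed = '"i have spaces"@example.com'
print(rcpt_command(typed))           # the command this report expects
print(rcpt_command(mangled(typed)))  # the command seen in the capture
```

The first line printed keeps the quoted local part as entered; the second matches the `RCPT TO` line in the tcpdump capture above.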
(In reply to Phil Frost from comment #0)
> Though RFC 2821 applies to SMTP hosts, I'd think the same reasoning should
> apply to programs composing messages.

Attached are parts of RFC 2821, RFC 5322, and RFC 5234.
As for the "local part" of a mail address or mailbox name, "space within a quoted string" is never permitted in SMTP (RFC 2821), and is permitted only as "folding white space, which is for folding" in RFC 5322 (successor of RFC 2822, which is the successor of RFC 822).
In RFC 822, which is obsolete, the treatment of "folding space(s) in a quoted string that have meaning as part of the quoted string" is pretty unclear, because any number of folding spaces can be inserted by RFC 822 folding.

RFC 5322 (RFC 2822) defines the "string that appears in the mail data stream after folding is executed".
As for a "valid quoted local part in a mail address or mailbox name before any folding", I believe a "space in the quoted local part of a mail address or mailbox name" can be called a "fault of the server management people" if it is actually used in a real mail address, because SMTP doesn't permit it.

Tb shouldn't silently remove a "space in quoted local part" on SMTP send. Instead, Tb should ask the user for a valid mail address, and shouldn't send via SMTP if the address contains spaces.
I disagree. From RFC 5322 (WSP comes from RFC 5234), which you cited:


   local-part      =   dot-atom / quoted-string / obs-local-part
   quoted-string   =   [CFWS]
                       DQUOTE *([FWS] qcontent) [FWS] DQUOTE
                       [CFWS]
   qcontent        =   qtext / quoted-pair
   quoted-pair     =   ("\" (VCHAR / WSP)) / obs-qp
    WSP            =  SP / HTAB

Does that not mean that spaces are allowed, even in the non-obsolete syntax, if they're part of a quoted-pair? FWS is also valid anywhere inside a quoted-string:

   FWS             =   ([*WSP CRLF] 1*WSP) /  obs-FWS

The semantics of folding white space (FWS), from RFC 5322 (section 3.2.2):

"""
   Throughout this specification, where FWS (the folding white space
   token) appears, it indicates a place where folding, as discussed in
   section 2.2.3, may take place.  Wherever folding appears in a message
   (that is, a header field body containing a CRLF followed by any WSP),
   unfolding (removal of the CRLF) is performed before any further
   semantic analysis is performed on that header field according to this
   specification.  That is to say, any CRLF that appears in FWS is
   semantically "invisible".
"""

And section 2.2.3:

"""
   Unfolding is accomplished by simply removing any CRLF
   that is immediately followed by WSP.  Each header field should be
   treated in its unfolded form for further syntactic and semantic
   evaluation.
"""

As I read it, only the CRLF is semantically "invisible": the other whitespace is not. I read that the CRLF is optional in FWS: I take this to mean that anywhere folding whitespace is allowed, so is ordinary whitespace (without a CRLF) and that ordinary whitespace remains semantically visible. Just the CRLF is invisible after "unfolding".

I maintain that Thunderbird should accept "i have spaces"@example.com, and pass it, spaces intact, as the envelope recipient address, and that there's nothing in any of the RFCs cited to suggest this syntax is even obsolete, much less that it should be rejected or mangled.
(In reply to Phil Frost from comment #2)
> I maintain that Thunderbird should accept "i have spaces"@example.com, and
> pass it, spaces intact, as the envelope recipient address, and that there's
> nothing in any of the RFCs cited to suggest this syntax is even obsolete,
> much less that it should be rejected or mangled.

RFC 5322 permits "white space in quoted local part of mail address placed in mail data stream", but protocol named SMTP never permits "white space in quoted local part of mail address in RCPT: command".
How can mailer send such "white space in quoted local part of mail address in From: message header in mail data stream" to SMTP server?

If RFC5322(and RFC2822) header folding is used upon mail sending, "white space in quoted local part of mail address placed in mail data stream" can't happen, as far as mailer respects RFC 5322 and as far as original quoted local part doesn't have space.
If RFC822 header folding was used for pretty old mail and was applied at mid of quoted local part, "any number of inserted white spaces in quoted local part of mail address placed in mail data stream" might happened even when original quoted local part doesn't have space.
I think the "permission of white space in quoted local part by RFC5322 (and RFC2822)" exists for tolerance of this old RFC822 folding.

If a mail service company permits its customers to send mail via SMTP, the company must not provide such "white space in quoted local part of mail address" to its customers.
If a mail service company actually wants to use "white space in quoted local part of mail address", it should limit the use of such addresses to its Web mail system. Within the Web mail system only, the company can do anything with mail.
(In reply to WADA from comment #3)
> RFC 5322 permits "white space in quoted local part of mail address placed in
> mail data stream", but protocol named SMTP never permits "white space in
> quoted local part of mail address in RCPT: command".
> How can mailer send such "white space in quoted local part of mail address
> in From: message header in mail data stream" to SMTP server?

Where are you reading this? You are using quotes like you are quoting something, but I can find no occurrence of "RCPT" or "mail data stream" in RFC 5322. I'm not sure specifically what you are citing by "protocol named SMTP". I'm guessing you mean RFC 2821:

""" (section 4.1.1.3)
Syntax:
  "RCPT TO:" ("<Postmaster@" domain ">" / "<Postmaster>" / Forward-Path)
                   [SP Rcpt-parameters] CRLF
"""

""" (excerpts from section 4.1.2)
Forward-path = Path
Path = "<" [ A-d-l ":" ] Mailbox ">"
Mailbox = Local-part "@" Domain
Local-part = Dot-string / Quoted-string
Quoted-string = DQUOTE *qcontent DQUOTE
"""

""" (at the top)
A companion document [32] discusses message headers, message bodies
  and formats and structures for them, and their relationship.
"""

"""
  [32] Resnick, P., Ed., "Internet Message Format", RFC 2822, April
       2001.
"""

And we've already been over RFC 2822. It allows spaces. You say that the mail data stream can contain addresses which are not valid in SMTP, which would seem difficult, given that RFC 2821 gets its syntax from the same RFC that defines the mail data stream.

RFC 2821 also says, right at the top:

"""
  This document is a self-contained specification of the basic protocol
  for the Internet electronic mail transport.  It consolidates, updates
  and clarifies, but doesn't add new or change existing functionality
  of the following:

  -  the original SMTP (Simple Mail Transfer Protocol) specification of
     RFC 821 [30],
"""

If we go to RFC 821, we find:

"""
RCPT <SP> TO:<forward-path> <CRLF>
<forward-path> ::= <path>
<path> ::= "<" [ <a-d-l> ":" ] <mailbox> ">"
<mailbox> ::= <local-part> "@" <domain>
<local-part> ::= <dot-string> | <quoted-string>
<quoted-string> ::=  """ <qtext> """
<qtext> ::=  "\" <x> | "\" <x> <qtext> | <q> | <q> <qtext>
<q> ::= any one of the 128 ASCII characters except <CR>,
                     <LF>, quote ("), or backslash (\)
<x> ::= any one of the 128 ASCII characters (no exceptions)
"""

That seems pretty plain to me: spaces (in fact *ANY* ASCII characters) are allowed, though a CR, LF, ", or \ must be escaped with a \.


> If RFC5322(and RFC2822) header folding is used upon mail sending, "white
> space in quoted local part of mail address placed in mail data stream" can't
> happen, as far as mailer respects RFC 5322 and as far as original quoted
> local part doesn't have space.

> If RFC822 header folding was used for pretty old mail and was applied at mid of quoted local part, "any number of inserted white spaces in quoted local part of mail address placed in mail data stream" might happened even when original quoted local part doesn't have space.

Huh? You seem to believe that headers can be folded anywhere, and unfolding removes all whitespace, as if I could do this:

Subject: the qui
 ck brown fox

to make a message with the subject "the quick brown fox". This is not true. As I cited in my previous comment, RFC 5322 (section 2.2.3) says the CRLF is removed, *not* the other whitespace. Here it is again:

"""
   Unfolding is accomplished by simply removing any CRLF
   that is immediately followed by WSP.  Each header field should be
   treated in its unfolded form for further syntactic and semantic
   evaluation.
"""

The subject of the message above is "the qui ck brown fox". Likewise:

Subject: the quick
   brown fox

has a subject of "the quick   brown fox". Note, the extra spaces are preserved.
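
The two examples above can be checked with a small Python sketch of the unfolding rule from RFC 5322 section 2.2.3. The function name `unfold` is my own, and the sketch handles only the CRLF-followed-by-WSP case that the quoted text describes.

```python
import re

# A CRLF that is immediately followed by SP or HTAB (RFC 5322 section 2.2.3).
# The lookahead leaves the white space itself in place.
_FOLD = re.compile(r"\r\n(?=[ \t])")


def unfold(header: str) -> str:
    """Remove each CRLF that precedes WSP; keep the WSP itself."""
    return _FOLD.sub("", header)


print(repr(unfold("Subject: the qui\r\n ck brown fox")))
print(repr(unfold("Subject: the quick\r\n   brown fox")))
```

Only the CRLF disappears: the first result still contains the space inside "qui ck", and the second keeps all three spaces before "brown".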

I can find no standard, current or obsolete, that allows folding to be done at an arbitrary location in a header. Every standard I can find allows folding only at places where there is already whitespace or where whitespace is optional, *and* the BNF syntax specification specifically allows folding whitespace. In *some* of these cases, the whitespace has no semantic value, for example, after the commas delimiting multiple addresses in a To: header. However, in a quoted string, it certainly does have semantic value, and RFC 5322 is quite clear that *only the CRLF* is removed, the other whitespace is preserved, and the semantic value of the whitespace is preserved through the folding-unfolding process.
IIUC rfc 2821, a space (0x20, i.e. %x20 or %d32 in RFC notation) can appear only as a quoted-pair (it must be escaped by \).
Do you think "space in quoted local part in Mailbox in RCPT: command" is permitted by the RFC if it's escaped by "\"?

  The Local-part definition in rfc 2821 is what I called "a definition of protocol named SMTP".
   http://tools.ietf.org/html/rfc2821#section-4.1.2

     Mailbox = Local-part "@" Domain

      Local-part = Dot-string / Quoted-string
            ; MAY be case-sensitive

 (as we are discussing Quoted-string in Local-part, ignore Dot-string)

      Quoted-string = DQUOTE *qcontent DQUOTE
       (see rfc 5322 for qcontent)

   (in rfc 5322)

   qcontent        =   qtext / quoted-pair

   qtext           =   %d33 /             ; Printable US-ASCII
                       %d35-91 /          ;  characters not including
                       %d93-126 /         ;  "\" or the quote character
                       obs-qtext

   quoted-pair     =   ("\" (VCHAR / WSP)) / obs-qp

   (in rfc 5234)

         VCHAR          =  %x21-7E
                                ; visible (printing) characters

         WSP            =  SP / HTAB
                                ; white space

  (obsolete forms, which a receiver should tolerate but a sender shouldn't generate)

   obs-qtext       =   obs-NO-WS-CTL

   obs-NO-WS-CTL   =   %d1-8 /            ; US-ASCII control
                       %d11 /             ;  characters that do not
                       %d12 /             ;  include the carriage
                       %d14-31 /          ;  return, line feed, and
                       %d127              ;  white space characters

   obs-qp          =   "\" (%d0 / obs-NO-WS-CTL / LF / CR)
For folding/unfolding.
  The definition is different between RFC5322/RFC2822 and RFC822.
  Problems due to RFC822 were corrected by RFC2822,
  and the definition of RFC2822 is inherited by RFC5322.
It doesn't make sense to read 2821, which references 2822 for some elements, but then get those elements from 5322. Yes, 5322 obsoletes 2822, but 5321 obsoletes 2821. So, you can take the obsoleted 2821 & 2822, or the current 5321 & 5322.

If you take the 282x standards, then you are right: SMTP doesn't allow spaces in the same way as the message data. It does still allow spaces, but only if escaped with a backslash. This is because RFC 2822 defines:

quoted-string   =       [CFWS]
                        DQUOTE *([FWS] qcontent) [FWS] DQUOTE
                        [CFWS]

It's through this rule that whitespace is allowed in message addresses. But 2821 has a different definition of quoted-string, which doesn't allow FWS:

Quoted-string = DQUOTE *qcontent DQUOTE

RFC 2822 defines qcontent as:

qcontent        =       qtext / quoted-pair
quoted-pair     =       ("\" text) / obs-qp
text            =       %d1-9 /         ; Characters excluding CR and LF
                        %d11 /
                        %d12 /
                        %d14-127 /
                        obs-text

...and there's your mechanism to put a space in the local-part in SMTP. You just need to add a backslash.

This is quite silly when you think about it, because 2821 explicitly says it doesn't change RFC 821 (which pretty plainly does allow whitespace, control characters, and in fact anything but CR, LF, \ and " without any escaping inside a quoted-string), and thus is somewhat self-contradictory.

This seems to be remedied in RFC 5321, which obsoletes 2821. Here, we get:

   Quoted-string  = DQUOTE *QcontentSMTP DQUOTE

   QcontentSMTP   = qtextSMTP / quoted-pairSMTP

   quoted-pairSMTP  = %d92 %d32-126
                    ; i.e., backslash followed by any ASCII
                    ; graphic (including itself) or SPace

   qtextSMTP      = %d32-33 / %d35-91 / %d93-126
                  ; i.e., within a quoted string, any
                  ; ASCII graphic or space is permitted
                  ; without blackslash-quoting except
                  ; double-quote and the backslash itself.
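
As a rough check of how this grammar reads, here is a Python regex that transcribes only the three rules quoted above. It is a sketch under that assumption: the names `QUOTED_STRING_5321` and `is_quoted_string_5321` are mine, and nothing beyond the quoted ABNF (for example length limits) is enforced.

```python
import re

# qtextSMTP       = %d32-33 / %d35-91 / %d93-126
# quoted-pairSMTP = %d92 %d32-126
QUOTED_STRING_5321 = re.compile(
    r'"(?:[\x20-\x21\x23-\x5b\x5d-\x7e]|\\[\x20-\x7e])*"'
)


def is_quoted_string_5321(s: str) -> bool:
    """True if s matches Quoted-string as quoted from RFC 5321."""
    return QUOTED_STRING_5321.fullmatch(s) is not None


for candidate in ('"i have spaces"', '"i\\ have\\ spaces"', '"tab\there"'):
    print(repr(candidate), is_quoted_string_5321(candidate))
```

The first two candidates (bare spaces, and backslash-escaped spaces) match; the third contains a horizontal tab, which is outside both ranges, and does not.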

So, though the three SMTP RFCs differ somewhat in what they allow (cases allowing a space marked with *):

821:
  * without escaping: anything but CR LF " and \
  * with escaping: anything

2821 (by way of qcontent from 2822):
  without escaping: non-whitespace controls, printing characters
  * with escaping: anything but CR or LF

5321:
  * without escaping: any ASCII graphic or space except " and \
  * with escaping: same, plus " and \

ALL of the standards provide some mechanism to deliver to an address with a space in the local part. I think an argument could be made for escaping spaces with a \, or not. Of the three SMTP RFCs, two of them don't require spaces to be escaped, including the most current. And, 5321 says "the sending system SHOULD transmit the form that uses the minimum quoting possible." However, escaping the space is the only form allowed by all three standards, and 5321 (in the same sentence) says "all quoted forms MUST be treated as equivalent", so it shouldn't make a difference.
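
The comparison above can be written down as a small Python sketch. The helper names `unescaped_ok` and `quote_local_part` are hypothetical, and the character ranges are the ones quoted in this thread, plus NO-WS-CTL from RFC 2822 for the 2821 case.

```python
def unescaped_ok(code: int, rfc: str) -> bool:
    """May this ASCII code appear in a quoted-string without a backslash?"""
    if code in (34, 92):  # '"' and '\' need escaping in all three
        return False
    if rfc == "821":      # <q>: any ASCII character except CR, LF, '"', '\'
        return 0 <= code <= 127 and code not in (10, 13)
    if rfc == "2821":     # qtext as defined by RFC 2822
        no_ws_ctl = (1 <= code <= 8 or code in (11, 12)
                     or 14 <= code <= 31 or code == 127)
        return no_ws_ctl or 33 <= code <= 126
    if rfc == "5321":     # qtextSMTP
        return 32 <= code <= 126
    raise ValueError(f"unknown RFC: {rfc}")


def quote_local_part(text: str, escape_spaces: bool) -> str:
    """Quote a local part, always escaping '"' and '\', optionally spaces."""
    out = []
    for ch in text:
        if ch in '"\\' or (escape_spaces and ch == " "):
            out.append("\\")
        out.append(ch)
    return '"' + "".join(out) + '"'


for rfc in ("821", "2821", "5321"):
    print(rfc, "bare space allowed:", unescaped_ok(32, rfc))
print(quote_local_part("i have spaces", escape_spaces=False))
print(quote_local_part("i have spaces", escape_spaces=True))
```

With `escape_spaces=False` the result is the minimally quoted form that 5321 prefers; with `escape_spaces=True` each space gets a backslash, which is the form all three grammars accept.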

However, in NONE of the standards does it say MUAs can strip addresses of spaces, and given that it's perfectly possible to retain them (potentially quoting them with a \), I see no reason TB should reject a message containing an address with spaces.
Sorry, I wasn't aware that rfc2821 had already been replaced by rfc5321, as rfc2822 was by rfc5322.
The references to other RFCs for definitions in rfc2821, and the mismatches between rfc2821 and rfc2822, look to have been corrected by rfc5321/rfc5322.

I guess "removal of space in quoted local part in the SMTP RCPT: command" was based on rfc821 and/or rfc2821. I guess that a space was added in the middle of the quoted local part of a mail address by rfc822 folding (which was perhaps a wrong action if rfc822 is correctly respected) by a bad mailer or bad mail system in the past, and such a space in the quoted local part was then removed by Tb on SMTP send.

The following is an example of quirks around "fold and space in the middle of the quoted string of a name value in Content-Type:".
> Content-Type: text/plain; name="ABC[CRLF]
> [SP]DEF.TXT"[CRLF]
What is the correct actual name= parameter value?
- RFC5322(RFC2822) folding/unfolding : ABC DEF.TXT
- RFC822 folding/unfolding           : ABCDEF.TXT
It's impossible to know which folding was applied, so Tb naturally applies the current and widely used RFC5322 (RFC2822) unfolding.
However, casual mail applications, bad mailers, or bad mail systems generated name="ABC[CRLF][SP]DEF.TXT"[CRLF] for an original name="ABCDEF.TXT"[CRLF], even after rfc822 had been obsoleted and RFC2822 had been widely used for a long time.
So, IIRC, Tb had quirks to "remove space after [CRLF] in quoted name parameter value, because "RFC5322(RFC2822) folding at a space in mid of quoted name parameter value" is very very rare, and because almost all name="ABC[CRLF][SP]DEF.TXT"[CRLF] case was originally name="ABCDEF.TXT"[CRLF] case or user's want on saved file name is ABCDEF.TXT instead of CORRECT and original ABC DEF.TXT.
I believe "silent removal of space in quoted local part of mail address for SMTP command" is a quirk similar to the one on the name parameter, because I believe the number of "wrong space in quoted local part of mail address in mail data stream" cases was far greater than the number of "actual and mandatory space in quoted local part" cases.

Because "space in quoted local part of mail address" is now absolutely proper in both rfc5321 and rfc5322, there is no need to remove "space in quoted local part". Removing it is rather an invalid action and an apparent RFC violation by Tb.
And as long as rfc5322 is respected by the mailer or mail system and rfc5322 folding is used correctly, "excess space in quoted local part by folding" can't happen in the mail data stream.

Question;
  Does all SMTP server accept "space in quoted local part in RCPT: command"?
If majority of SMTP servers don't accept it based on obsolete RFCs, we are better to make "keeping space in quoted local part in SMTP command" optional.
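
One way to gather data on this question is to probe a server directly. The following is a minimal Python sketch, not a tested tool: `accepts_spaced_local_part` and `RecordingStub` are hypothetical names, and the function assumes a connection object with an smtplib-style `docmd(cmd, args)` method that has already completed EHLO (and AUTH if the server requires it).

```python
def accepts_spaced_local_part(conn, sender: str, recipient: str) -> bool:
    """Return True if the server answers RCPT with a 2xx reply code."""
    code, _ = conn.docmd("MAIL", f"FROM:<{sender}>")
    if not 200 <= code < 300:
        raise RuntimeError(f"MAIL FROM rejected with {code}")
    try:
        code, _ = conn.docmd("RCPT", f"TO:<{recipient}>")
        return 200 <= code < 300
    finally:
        conn.docmd("RSET")  # abandon the transaction; no message is sent


class RecordingStub:
    """Stand-in for a live connection so the sketch runs offline."""

    def __init__(self):
        self.sent = []

    def docmd(self, cmd, args=""):
        self.sent.append((cmd, args))
        return 250, b"ok"


stub = RecordingStub()
print(accepts_spaced_local_part(
    stub, "sender@example.com", '"i have spaces"@example.com'))
print(stub.sent)
```

Against a real server, an `smtplib.SMTP` instance could be passed in place of the stub after calling its `ehlo()` method. A 2xx reply to RCPT shows only that the address was accepted syntactically, not that the mailbox exists.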
Status: UNCONFIRMED → NEW
Component: Untriaged → Networking: SMTP
Ever confirmed: true
OS: Linux → All
Product: Thunderbird → MailNews Core
Hardware: x86_64 → All
Summary: spaces in the local part of an email address are mangled → spaces in the local part of an email address are mangled (Tb shouldn't remove "space in quoted local part of mail address" in SMTP command, because such space is already absolutely proper in RFC5321 in addition to RFC5322/RFC2822)
(In reply to WADA from comment #8)
> So, IIRC, Tb had quirks to "remove space after [CRLF] in quoted name
> parameter value, because "RFC5322(RFC2822) folding at a space in mid of
> quoted name parameter value" is very very rare, and because almost all
> name="ABC[CRLF][SP]DEF.TXT"[CRLF] case was originally
> name="ABCDEF.TXT"[CRLF] case or user's want on saved file name is ABCDEF.TXT
> instead of CORRECT and original ABC DEF.TXT.

That's interesting. I didn't know this was a quirk that existed. I wonder if this is why RFC 2821 doesn't allow spaces unless escaped with a \. However, in this case the input is coming from the user, which should be free of bad mailer quirks. Also, if TB can avoid folding in the middle of a quoted string, preferring to fold at higher-level syntactic elements where whitespace is irrelevant (such as between multiple addresses in To: or From:), then the issue is avoided, at least when TB is generating the mail. I know I read in one of these RFCs that this is what "SHOULD" be done. I think the line length limit is some 900 characters, so that would be a very long address to require folding in the middle of it.

> Question;
>   Does all SMTP server accept "space in quoted local part in RCPT: command"?
> If majority of SMTP servers don't accept it based on obsolete RFCs, we are
> better to make "keeping space in quoted local part in SMTP command" optional.

I know that the two SMTP servers I operate (Kerio and exim) allow it. The use case I have for this is a mail to ticket-tracking-system gateway, where I can email QUEUENAME@bugs.example.com to create a ticket. Some of the queues have spaces in their names ("Setup Services"). If I submit the mail myself with telnet, it's delivered fine. I can't, however, get TB to submit the mail with spaces preserved in the RCPT command, hence this report.
(In reply to Phil Frost from comment #9)
> > Question;
> >   Does all SMTP server accept "space in quoted local part in RCPT: command"?
> > If majority of SMTP servers don't accept it based on obsolete RFCs, we are
> > better to make "keeping space in quoted local part in SMTP command" optional.
> I know that the two SMTP servers I operate (Kerio and exim) allow it. The
> use case I have for this is a mail to ticket-tracking-system gateway, where
> I can email QUEUENAME@bugs.example.com to create a ticket. Some of the queues
> have spaces in their names ("Setup Services"). If I submit the mail myself
> with telnet, it's delivered fine. I can't however get TB to submit the mail
> with spaces preserved in the RCPT command, hence this report.

Thanks, that's evidence that a default of mail.option_for_quirks.fully_respect_rfc5321_upon_sending_SMTP_command = true or mail.option_for_quirks.ignore_rfc5321_on_space_in_quoted_localpart_upon_sending_SMTP_command = false is acceptable.

Even after you provided actual data that "the majority of actual/current SMTP servers surely respect RFC5321", I can't neglect the existence of SMTP servers that don't know about or don't respect "replacement of rfc2821 by rfc5321 and replacement of rfc821 by rfc2821".
Please note that problems like the following may occur after a code change.
  1. Tb has a problem like bug 317597 comment #22, which is produced with UW-IMAP.
     This is due to a UW-IMAP server setup that is unpleasant for Tb.
     Even though such a setup is never an RFC violation,
     many of the major IMAP servers don't have such a setup, and they behave as Tb expects.
     This can be called an "actual Tb bug" because no RFC violation by the server is involved.
  2. To resolve such problems, code was changed by bug 799821, and was shipped as Tb 20.
  3. Then the problems of bug 859269 and bug 858062 started to occur.
  4. In the change by bug 799821, no RFC violation by Tb is involved.
     The cause of "bug 859269 after change by bug 799821" is apparently an incorrect setup of
     the IMAP server.
     However, these bugs are a "regression by Tb 20" for users who experienced these two new
     bugs, and the impact of bug 859269 and bug 858062 was far more severe than that of a problem like
     bug 317597.
     So, the change by bug 799821 was backed out.
If the "change by bug 799821", which is never an RFC violation by Tb, had been optional, problems like "bug 859269 after change by bug 799821" could have been avoided pretty easily by simply setting a-prefs-option=false.
A second example is a problem like bug 933555 or the bugs listed in the dependency tree for meta bug 699681. If the Reply-To-Self feature and "X-Account-Key based identity selection" had been optional, bug 933555 and many odd phenomena (which lead many users to open unwanted bugs) could have been avoided pretty easily.
A third example is bug 939462. Because a performance issue may occur, a new feature should have been optional.
Blocks: 208018
See Also: → 286760
Blocks: 631206