Closed Bug 486682 Opened 12 years ago Closed 12 years ago

Thunderbird creates invalid RFC2047 encoded Words in filename Headers with specific file length

Categories

(MailNews Core :: MIME, defect)

1.8 Branch
x86
Windows XP
defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 3.0b3

People

(Reporter: harakiri_23, Assigned: emk)

References

Details

Attachments

(1 file, 6 obsolete files)

User-Agent:       Opera/9.64 (Windows NT 5.1; U; en) Presto/2.1.1
Build Identifier: version 2.0.0.16 (20080708)

***Note this is a serious issue because standard conform filters and products
will not be able to recognize attachments by their filename extension, the
bug is also present in engimail but is imho the fault of thunderbird im
crossposting***


When a user attaches a file with a specific minimum length and at least one
non-usascii character, thunderbird encodes the filename/name field incorrect.

Example:

The file

ä1234567890123456789012345678901234567890123456789.txt

is correctly after encryption ISO encoded to

=?ISO-8859-1?Q?=E41234567890123456789012345678901234567890123456789=2Etxt?=

however the filename with just one more character

ä12345678901234567890123456789012345678901234567890.txt

is incorrectly word encoded to 

=?ISO-8859-1?Q?=E412345678901234567890123456789012345678901234567890=2Etx?==?ISO-8859-1?Q?t?=

please notice that the last character in this example is encoded again (t) you
can see the string ISO-8859-1?Q? followed by t and ?=

First off, this encoding is not needed for us-ascii characters, second you only
need one encoding at the beginning of the file name.

Lastly - a client which according to RFC 2047
http://www.faqs.org/rfcs/rfc2047.html decodes the text again will not receive
the expected result.

According to when  RFC 2047 you use an encoding multiple times in one header it
MUST be separated with a space this is needed that applications know where the
a new encoding starts and where it is just interpreted as text.

So in our example - the wrongly encoded filename

=?ISO-8859-1?Q?=E412345678901234567890123456789012345678901234567890=2Etx?==?ISO-8859-1?Q?t?=

will be decoded to 

ä12345678901234567890123456789012345678901234567890.tx=?ISO-8859-1?Q?t?=

as you can see the last part will not be decoded by RFC 2047 because it is
interpreted as text and not encoding. If you have to use multiple encodings in
one header (which is again, not needed) you MUST specify a space - the
following would be the correct header:

=?ISO-8859-1?Q?=E412345678901234567890123456789012345678901234567890=2Etx?= =?ISO-8859-1?Q?t?=

notice the space after =


Quoting the RFC

" white space MUST appear between an encoded-word' and surrounding text"

also

"  Even multiple SPACEs between 'encoded-word's are ignored for the purpose of
display."

receiving client will just ignore this space when decoding the filename - so it
will be valid again.

As i said in the beginning, this is a serious issue - standard confirm filters
which look at the attachment header and filename will not interpret
tx=?ISO-8859-1?Q?t?= as "txt" file extension or for enigmail "pgp" extension

This problem seems to occur only for filenames which are at least 55 characters
(including file extension) and have a non us-ascii character - because only
then will =?ENCODING be used at all.

The problem occurs because through encoding the header size is above 75 - what thunderbird does is create another =?ENCODING in the header - please note RFC:

" While there is no limit to the length of a multiple-line header
   field, each line of a header field that contains one or more
   'encoded-word's is limited to 76 characters."

So it seems to be correct to create a new "encoded word" but what is incorrect, is that the very important space between the encoded words are forgotten!

Severity Major because filters cannot correctly decode the filenames anymore!

Reproducible: Always

Steps to Reproduce:
1. Attach a file with non usascii char and at least 55 characters like ä12345678901234567890123456789012345678901234567890.txt
2. Send your message
3. Observe the incorrectly encoded filename headers (no space between multiple word encodings)
Reference to Enigmail bug

https://www.mozdev.org/bugs/show_bug.cgi?id=20770

sorry, the "encryption" in this report is wrong because i copy&paste it from enigmail - it should be 

"is correctly ISO encoded to" instead of "is correctly after encryption ISO encoded to"
I don't see any relation to Enigmail; I can reproduce the behavior with Enigmail NOT installed (not claiming that the behavior is correct or not).
Is the issue also present in recent Thunderbird 3.0 beta builds ?
Component: General → MIME
Product: Thunderbird → MailNews Core
QA Contact: general → mime
Version: unspecified → 1.8 Branch
(In reply to comment #0)
> ä12345678901234567890123456789012345678901234567890.txt

Following header was created by Tb trunk 2009/3/31 build on MS Win.
Problem occurs on Tb trunk too. Confirming.

> Severity Major because filters cannot correctly decode the filenames anymore!

name in Content-Type: is not official parameter.
Encoding of filename in Content-Disposition: with RFC 2047 is incorrect behaviour.
And, many recent mail clients use filename in Content-Disposition: header.
RFC compliant header is header produced by mail.strictly_mime.parm_folding=2, isn't it?

(A. mail.strictly_mime.parm_folding=3, phenomenon was observed on name only)
> Content-Type: text/plain;
> name="=?ISO-8859-1?Q?=E412345678901234567890123456789012345678901234567890=2Etx?==?ISO-8859-1?Q?t?="
> Content-Disposition: attachment;
>  filename*0*=ISO-8859-1''%E4%31%32%33%34%35%36%37%38%39%30%31%32%33%34%35;
>  filename*1*=%36%37%38%39%30%31%32%33%34%35%36%37%38%39%30%31%32%33%34%35;
>  filename*2*=%36%37%38%39%30%31%32%33%34%35%36%37%38%39%30%2E%74%78%74

(B. mail.strictly_mime.parm_folding=2)
No name parameter on Content-Type: header.
Same Content-Disposition: header as mail.strictly_mime.parm_folding=3 case.

(C. mail.strictly_mime.parm_folding=1 : folded as expected)
> Content-Type: text/plain;
> name="=?ISO-8859-1?Q?=E412345678901234567890123456789012345678901234567890=2Etx?=
>  =?ISO-8859-1?Q?t?="
> Content-Disposition: attachment;
>  filename="=?ISO-8859-1?Q?=E412345678901234567890123456789012345678901234567890=2Etx?=
>  =?ISO-8859-1?Q?t?="
Status: UNCONFIRMED → NEW
Ever confirmed: true
> name in Content-Type: is not official parameter.
>  filename*0*
>  filename*1*
>  filename*2*

i dont agree - RFC 2231 is newer and not implemented on the sending side except for thunderbird http://www.faqs.org/rfcs/rfc2231.html

neither outlook nor lotus notes implement this syntax and fallback to the "name" param in the content-type - this is what most filters do - programming apis i.e. javamail do not support this syntax yet so the majority will face this thunderbird bug - therefor Severity Major is justified - Filters will not be able to recognize the extension
additionally - when TB uses 2231 rfc - why does it only encode the filename param according to 2231? it still uses the old word encoding for the "name" param - which shouldnt be done - if you use rfc 2231 at all - you should use it for ALL headers and not just for some - i doesnt make sense to make a mixed form here

i believe that the name param was left the way it is intentionally - because everyone knows that rfc 2231 is not widley used and all MUA fallback to the name param when they cant interpret the filename*0* param - thus the severity major is right - because the fallback "name" param  is wrongly encoded!
Summary: hunderbird creates invalid RFC encoded Words in filename Headers with specific file length → Thunderbird creates invalid RFC encoded Words in filename Headers with specific file length
I imagined "message filter at server side"(Gmail's filtering like ones or filtering service by modern ISP) when I posted previous comment. And I did know only "misinterpreted filename by MUA(then corrupted filename)" case, which is usually bypassed by parm_folding=1 of user's Tb. I didn't know "fall back to name by MTA/MUA when filename is encoded by RFC2231" case. So I asked "really major?".

Steve, you are right. Interoperability is reason why strictly parm_folding=1/3 mode are implemented in addition to parm_folding=2 mode. Search for bugs refer to RFC 2231, RFC 2047, name, filename (open, fixed, dup, wontfix) and see them, please.
If Tb puts name parameter in Content-Type: header, it should be same as one for parm_folding=1. Please note that I never say that name parameter generated by Tb when parm_folding=3 is not incorrect. (I'm the man who confirmed this bug :-))
Yes - my inquire is not that TB has a "quirk" mode to use one rfc for the name param and another for the filename param. 

Im talking about that if TB uses the older rfc that the word encoding is wrong because there needs to be a space everytime you start a new encoding prefix =?ISO in the same param

How do we proceed from here? I believe this is very easy to fix and could be incooperated in the next TB minor release
(In reply to comment #8)

> How do we proceed from here? I believe this is very easy to fix and could be
> incooperated in the next TB minor release

A patch needs to be written, attached to the bug and reviewed. The next minor release should only contain security fixes. This is more likely to get into 3.0, if there is a patch written.
Bug 193439 introduced parm_folding=2 mode(filename with RFC2231, no name in Content-Type:). And Bug 309566 introduced parm_folding=3/4 mode.
No care for name of Content-Type: when (parm_folding==3||parm_folding==4) in patch for Bug 309566? RFC2027ParmFolding is called for name at somewhere without parmFolding option if (parm_folding==3||parm_folding==4)?
(D. mail.strictly_mime.parm_folding=0 : problem is seen in both name & filename)
> Content-Type: text/plain;
>  name="=?ISO-8859-1?Q?=E412345678901234567890123456789012345678901234567890=2Etx?==?ISO-8859-1?Q?t?="
> Content-Disposition: attachment;
>  filename="=?ISO-8859-1?Q?=E412345678901234567890123456789012345678901234567890=2Etx?==?ISO-8859-1?Q?t?="

Incorrect name parameter data when parm_folding=3 is generated by following code in RFC2047ParmFolding? (added by Bug 309566)
> http://mxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgCompUtils.cpp#1353
> 1350   // Now put backslashes before special characters per RFC 822
> 1351   char *qtextName =
> 1352     msg_make_filename_qtext(encodedRealName,
> 1353       aParmFolding == 0 || aParmFolding == 3 ? PR_TRUE : PR_FALSE);
2nd parm of msg_make_filename_qtext was stripCRLFs. Bug in msg_make_filename_qtext?
> http://mxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgCompUtils.cpp#1579
> 1579 msg_make_filename_qtext(const char *srcText, PRBool stripCRLFs)
msg_make_filename_qtext is for rfc 822, i dont think the bug is there because i see no code that will create a new encoded word when the string size is larger then 76 chars (per rfc) i also do not see the prefix of the encoding added to each string - the issue here is that the final header is

=?ISO.....<missing_space>=?ISO.....

so i believe the caller already does the job and concats 2 strings together incorrectly

"  While there is no limit to the length of a multiple-line header
   field, each line of a header field that contains one or more
   'encoded-word's is limited to 76 characters."
it looks like this method is responsible:

 /*static */ char *
 1335 RFC2047ParmFolding(const nsCString& aCharset,
 1336                    const nsCString& aFileName, PRInt32 aParmFolding)

http://mxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgCompUtils.cpp#1335

when you follow the call hierachy you will get after 3-4 calls to


 600 static
 601 char * apply_rfc2047_encoding(const char *_src, PRBool structured, const char *charset, PRInt32 cursor, PRInt32 foldlen)

http://mxr.mozilla.org/comm-central/source/mailnews/mime/src/comi18n.cpp#601

this is the actual method for the encoding, this method is a bit messy, i didnt see any code that split the string to 2 encoded words if a specific length is achieved - maybe there a is a precheck in the caller methods starting from the initial RFC2047ParmFolding method i linked above
After further investigation it seems that the encoding routine itself is correctly creating a <newline><space> - it can easily be verified by just sending a message with the filename as subject it will be created as:

Subject: =?ISO-8859-1?Q?=E41234567890123456789012345678901234567890123456?=
 =?ISO-8859-1?Q?7890=2Etxt?=

as you can see, after the first encoded word ends, a newline and space is placed

Encoding of the subject is done by directly calling this method from here:
nsMsgI18NEncodeMimePartIIStr

http://mxr.mozilla.org/comm-central/source/mailnews/extensions/mdn/src/nsMsgMdnGenerator.cpp#594

It seems the filename encoding routing is calling RFC2047ParmFolding which itself is also calling nsMsgI18NEncodeMimePartIIStr

So indeed, there seems to be happening in/after RFC2047ParmFolding.

RFC2047ParmFolding is called by  mime_generate_attachment_headers
http://mxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgCompUtils.cpp#778

i do not see that newlines or spaces are removed

 925 #ifdef EMIT_NAME_IN_CONTENT_TYPE
  926   if (encodedRealName && *encodedRealName) {
  927     // Note that we don't need to output the name field if the name encoding is
  928     // RFC 2231. If the MUA knows the RFC 2231, it should know the RFC 2183 too.
  929     if (parmFolding != 2) {
  930       char *nameValue = nsnull;
  931       if (parmFolding == 3 || parmFolding == 4)
  932         nameValue = RFC2047ParmFolding(charset, nsDependentCString(real_name),
  933                                        parmFolding);
  934       if (!nameValue || !*nameValue) {
  935         PR_FREEIF(nameValue);
  936         nameValue = encodedRealName;
  937       }
  938       buf.Append(";\r\n name=\"");
  939       buf.Append(nameValue);
  940       buf.Append("\"");
  941       if (nameValue != encodedRealName)
  942         PR_FREEIF(nameValue);
  943     }
  944   }


so it must be in the method RFC2047ParmFolding and it is =)

see the header


line 1579 -- msg_make_filename_qtext(const char *srcText, PRBool stripCRLFs)
http://mxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgCompUtils.cpp#1579

PRBool stripCRLFs will also remove space for some reason, it is set to true in RFC2047ParmFolding when  aParmFolding == 0 || aParmFolding == 3

The error is in msg_make_filename_qtext


 1604     if (*s == '\r')
 1605     {
 1606       if (stripCRLFs && *(s+1) == '\n' && (s+2) < end && IS_SPACE(*(s+2)))
 1607         s += 2;     // skip CRLFLWSP
 1608     }

why does it remove SP=Space when the boolean says removeCRLF? The comment 
"skip CRLFWSP" means skips Whitespace - this is the error - it either should not remove newlines at all or at least let the space remain!
Wada you were right - it has something to do with https://bugzilla.mozilla.org/show_bug.cgi?id=309566 - they were discussion about this issue whether or not to remove the space - they made a mistake however - the space should not be interpreted when DECODING i.e. when i have

=?ISO....<space>=?ISO... the resulting decoded string is without the space - however clients need to know where a new encoded word starts - and they can only identify it when there is a space before the next token with =?

- you cannot just remove it for encoding - it is needed

I believe this bug depends on 309566 and the people should be informed - maybe make a depency?
(E. mail.strictly_mime.parm_folding=4 : problem of this bug is bypassed)
> Content-Type: text/plain;
>  name="=?ISO-8859-1?Q?=E412345678901234567890123456789012345678901234567890=2Etx?=
>  =?ISO-8859-1?Q?t?="
> Content-Disposition: attachment;
>  filename*0*=ISO-8859-1''%E4%31%32%33%34%35%36%37%38%39%30%31%32%33%34%35;
>  filename*1*=%36%37%38%39%30%31%32%33%34%35%36%37%38%39%30%31%32%33%34%35;
>  filename*2*=%36%37%38%39%30%31%32%33%34%35%36%37%38%39%30%2E%74%78%74

(F. mail.strictly_mime.parm_folding=9 : no name/no filename. following header only)
> Content-Type: text/plain
> Content-Transfer-Encoding: base64

Another possible solution of this bug:
Change default to mail.strictly_mime.parm_folding=4 until "remove of [CRLF] + [Space]" issue will be resolved.

It may be a kind of setback, but it's acceptable because current default of parm_folding=3 instead of parm_folding=2 is for interoperability.
And, I think default of parm_folding=4 is better than 3, because parm_folding=3 can produce problem of long header for name parameter if very long file name, even when "remove of [CRLF] + [Space]" issue will be resolved.
Following change may be a clever solution, because no one talks about detailed behaviour on name parameter when parm_folding=3 & 4 in Bug 309566, and because "default is 3, change parm_folding=3 to 1" is already written in many web pages.
> 1352     msg_make_filename_qtext(encodedRealName,
> 1353       aParmFolding == 0 || aParmFolding == 3 ? PR_TRUE : PR_FALSE);
>     =>     aParmFolding == 0 || aParmFolding == 4 ? PR_TRUE : PR_FALSE);
Following comment also should be cared for. 
> http://mxr.mozilla.org/comm-central/source/mailnews/mailnews.js#195
> 194 // 0/1 (RFC 2047), 2(RFC 2231), 3/4(RFC 2231, but name param is encoded by RFC 2047)
> 195 // 0/3 the name param is never separated to multiple lines.
> 196 pref("mail.strictly_mime.parm_folding",     3);
>the name param is never separated to multiple lines.

why? every other header can be separated - where is the rfc that excludes this one? as you can see it applies to subject, as well as received headers etc

the space should at least remain, but the right thing is probably to leave <crlf><space>
Summary: Thunderbird creates invalid RFC encoded Words in filename Headers with specific file length → Thunderbird creates invalid RFC2047 encoded Words in filename Headers with specific file length
the change from #18 seems resonable and easy - anybody else has an oppinion?

the better way would however be, to treat the attachment header like every other header when using RFC2047 (like subject i.e.)
Blocks: 309566
In the first place, you can't use encoded-words in parameter of a MIME Content-Type field.
See RFC 2047 "5. Use of encoded-words in message headers".
Any RFC compliant recipients must not decode the encoded-words like string in parameter of a MIME Content-Type field.
RFC 2231 is the only way to encode long attachment names.
parm_folding=3/4 will add 'name' paramter not to be compliant with RFC but to rescue the broken (i.e. non-RFC compliant) recipients (e.g. Outlook).
(In reply to comment #22)

Kimura-san, we are not citing "RFC compliant or not". We are citing "Inter operablity" related issue after Bug 309566. Please note that both of parm_folding=3/4 is apparently not RFC compliant as you say, but default of parm_folding=3 & logic by Bug 309566 produced issue of this bug, due to problem of "removing [CRLF][Space] instead of removing [CRLF] only(initial design/spec) by msg_make_filename_qtext". Sorry but I don't know why/who introduced such change in msg_make_filename_qtext.

Why people implemented patch for Bug 309566 chose parm_folding=3 as default, even though they had been aware of issue of "removing [CRLF][Space] by msg_make_filename_qtext"?
(In reply to comment #23)
> we are not citing "RFC compliant or not".
Reporter says again and again in comment #0.
Also, the bug title says "invalid RFC2047 encoded Words".

> We are citing "Interoperablity" related issue after Bug 309566.
Interoperablity with what?
We can't keep interoperability with all legacy clients because there is no standard how to decode "encoded-words like string" in MIME parameter.
parm_folding=3 is interoperable with Outlook, and it's the purpose of bug 309566.

> Why people implemented patch for Bug 309566 chose parm_folding=3 as default,
> even though they had been aware of issue of "removing [CRLF][Space] by
> msg_make_filename_qtext"?
It is not our choice. parm_folding=0 (which removes [CRLF][Space]) was already chosen before bug 193439. Neither bug 193439 nor bug 309566 changed the behavior of msg_make_filename_qtext.
>>We can't keep interoperability with all legacy clients because there is no
standard how to decode "encoded-words like string" in MIME parameter.

This is wrong - the RFC2047 is exactly for that - it also applies - what did clients do between the time the newer RFC 2231 was formed? They used RFC2047 - this is used for Subject Header too, and other params.

RFC2047 states

" or any MIME body part field for which the field body is defined as '*text'"

This applies to ContentType too
(In reply to comment #25)
> >>We can't keep interoperability with all legacy clients because there is no
> standard how to decode "encoded-words like string" in MIME parameter.
> 
> This is wrong - the RFC2047 is exactly for that - it also applies
No. Dis you really read RFC 2047? It clearly state:
>   + An 'encoded-word' MUST NOT be used in parameter of a MIME
>     Content-Type or Content-Disposition field, or in any structured
>     field body except within a 'comment' or 'phrase'.

> - what did
> clients do between the time the newer RFC 2231 was formed? They used RFC2047 -
Those clients are wrong. There was no way to encode long Content-Type parameter until RFC 2231 was formed. It is the reason RFC 2231 was formed. If RFC 2047 can be used for parameter, why RFC 2231 was required in the first place?

> this is used for Subject Header too, and other params.
Whilst encoded-word can be used for Subject, Content-Type can not.

> RFC2047 states
> 
> " or any MIME body part field for which the field body is defined as '*text'"
> 
> This applies to ContentType too
It does not apply because Content-Type is not defined as '*text'.
(In reply to comment #24)
> It is not our choice. parm_folding=0 (which removes [CRLF][Space]) was already chosen before bug 193439.
> Neither bug 193439 nor bug 309566 changed the behavior of msg_make_filename_qtext.

I misunderstood that parm_folding=1 was old default, because no report existed  about issue when parm_folding=0 (See comment #11).
Change of msg_make_filename_qtext(remove of [CRLF][Space]) was done after bug 193439 & bug 309566?

Nakano-san, default change to parm_folding=4 will produce new problems?
Can you accept default change to parm_folding=4?
(In reply to comment #27)
> I misunderstood that parm_folding=1 was old default, because no report existed about issue when parm_folding=0 (See comment #11).
This "issue" existed from the start. Note that reporter doesn't talk about any specific client.

> Change of msg_make_filename_qtext(remove of [CRLF][Space]) was done after bug
> 193439 & bug 309566?
Change have never done. [CRLF][Space] was removed before bug 193439. Hence this bug is not a regression, at least.

> Nakano-san, default change to parm_folding=4 will produce new problems?
> Can you accept default change to parm_folding=4?
I'm not a Nakano-san, but I will not object changing default if it is proven that parm_folding=4 is compatibility with Outlook and it improves the interoperability with some real client.
(In reply to comment #28)
Kimura-san, sorry. (Nakano-san is owner of bug 309566)

> > Change of msg_make_filename_qtext(remove of [CRLF][Space]) was done after bug 193439 & bug 309566?
> Change have never done. [CRLF][Space] was removed before bug 193439.

Thanks for your explanations. I've understood rough history around "removal of [CRLF][Space]". I'll open separate bug for issue of "removal of [CRLF][Space] by msg_make_filename_qtext when stripCRLFs==true, even though encoding".
To Steve:

What site/server acts like next? What MTA of what version is used at the server? 
(A) Knows RFC1806 or RFC2183(Content-Disposition:) & RFC2047.
    Falls back to name in Content-Type: and applies RFC2047 to name, if filename
    is encoded by RFC2231 instead of RFC2047 in Content-Disposition:.
(B) Doesn't know RFC1806 nor RFC2183(Content-Disposition:), knows RFC2047.
    Uses name in Content-Type: and applies RFC2047 to name parameter.

Note: As I wrote before, if issue on MUA of mail recipient(parm_folding=3 still produces problem if long file name), workaround (parm_folding=1) already exists.
>>What site/server acts like next? What MTA of what version is used at the
server? 

The JavaMail API for example and all the implementations that use this standard API, for example Apache James or any of these: http://java.sun.com/products/javamail/Third_Party.html


> RFC2047 states
> 
> " or any MIME body part field for which the field body is defined as '*text'"
> 
> This applies to ContentType too
>It does not apply because Content-Type is not defined as '*text'.

Im still not convinced, i think you interpret to much into this - why should they leave headers with params out? Why do mail clients interpret it correctly if i change the content-type header params like RFC2047? Because they also interpret RFC2047 that every header, even params should be encoded like this.
(In reply to comment #31)
> The JavaMail API for example and all the implementations that use this standard
Huh? Sun states "JavaMail does not encoded and decode filenames by default because doing so would violate the MIME spec."
http://java.sun.com/products/javamail/FAQ.html#encodefilename
Moreover, JavaMail 1.4.2 now supports RFC2231.
http://java.sun.com/products/javamail/NOTES.txt

> why should they leave headers with params out?
Not only are params left out. RFC2047 states:
>   These are the ONLY locations where an 'encoded-word' may appear.  In
>   particular:
>
>   + An 'encoded-word' MUST NOT appear in any portion of an 'addr-spec'.
>
>   + An 'encoded-word' MUST NOT appear within a 'quoted-string'.
>
>   + An 'encoded-word' MUST NOT be used in a Received header field.
>
>   + An 'encoded-word' MUST NOT be used in parameter of a MIME
>     Content-Type or Content-Disposition field, or in any structured
>     field body except within a 'comment' or 'phrase'.

> Why do mail clients interpret it correctly
It interpret params *incorrectly*. RFC2047 like params must be parsed literally. Thunderbird deliberately violates RFC2047 only for compatibility with old, legacy, broken clients.
(In reply to comment #31)
> >What site/server acts like next? What MTA of what version is used at the
> server? 
> The JavaMail API for example and all the implementations that use this standard  API, (snip)

> http://java.sun.com/products/javamail/NOTES.txt
> JavaMail(TM) API 1.4.2 release
>(snip)
> A list of the known limitations, bugs, issues:
>(snip)
> 2. Internationalization.  Parameter encoding in MIME headers, as specified by RFC2231, *has* been implemented.
>(snip)
> To enable RFC2231 support in parameter lists, set the System properties
> mail.mime.encodeparameters and mail.mime.decodeparameters to "true".  
And, as of today, JavaMail 1.4.2 looks to be the newest release just.
> http://java.sun.com/products/javamail/downloads/index.html
> JavaMail 1.4.2 (March 2, 2009)

Sigh... Why mail.mime.decodeparameters=true is not default? Still experimental for Sun?
I hope all JavaMail sites will upgrade to 1.4.2 ASAP and will use at least mail.mime.decodeparameters=true, but, ...
Actually RFC2231 support has been implemented since JavaMail 1.4.
http://java.sun.com/products/javamail/javamail-1_4.html
> JavaMail 1.4 Release
> (April 25, 2006) 
http://java.sun.com/products/javamail/NOTES14.txt
Yes since this is not default it doesnt help - its the same for Thunderbird - the setting which would help is not default =)
(In reply to comment #35)
> Yes since this is not default it doesnt help - its the same for Thunderbird -
> the setting which would help is not default =)

So how about we change the default setting (patch welcome) - what would be the implications for people using Thunderbird with that new default setting ?
OK, I made a patch which change the default value of parm_folding.
I changed a function name RFC2047ParmFolding to LegacyFolding because it doesn't really comply with RFC2047. The current name is highly confusing.
This patch also incorporates parm_folding=3/4 into parm_folding=0/1.
Thunderbird will always encode filename parameter in Content-Disposition using RFC2231.
At least Windows Mail (Vista) handled folded name parameter "correctly".
Assignee: nobody → VYV03354
Status: NEW → ASSIGNED
Attachment #374455 - Flags: superreview?(bienvenu)
Attachment #374455 - Flags: review?(bienvenu)
Flags: blocking-thunderbird3?
This patch seems like it really wants a unit test... Standard8, do you know if it would be easy to hook up unit xpcshell tests to compose output?  I'd suspect if the result can be saved to a folder then something could be done...
(In reply to comment #38)
> This patch seems like it really wants a unit test... Standard8, do you know if
> it would be easy to hook up unit xpcshell tests to compose output?  I'd suspect
> if the result can be saved to a folder then something could be done...

I haven't looked much at these interfaces so I'm not sure. I'm unlikely to be able to look at it much before Wednesday.
(In reply to comment #37)
> Make folding name parameter default
>   This patch also incorporates parm_folding=3/4 into parm_folding=0/1.
>   Thunderbird will always encode filename parameter in Content-Disposition using RFC2231.
> At least Windows Mail (Vista) handled folded name parameter "correctly".

"correctly" based on which RFC? :-)

MS Outlook Express 6 on MS Win-XP displayed following string at save dialog in both cases(parm_folding=3 & 4).
> あ1234567890123456789012345678901234567890123456789.txt
(Mail header)
>(parm_folding=3)
> Content-Type: text/plain;
>  name="=?Shift_JIS?B?gqAxMjM0NTY3ODkwMTIzNDU2Nzg5MDEyMzQ1Njc4OTAxMjM0NTY3ODkwMTIz?==?Shift_JIS?B?NDU2Nzg5LnR4dA==?="
>(parm_folding=4)
> Content-Type: text/plain;
>  name="=?Shift_JIS?B?gqAxMjM0NTY3ODkwMTIzNDU2Nzg5MDEyMzQ1Njc4OTAxMjM0NTY3ODkwMTIz?=
>  =?Shift_JIS?B?NDU2Nzg5LnR4dA==?="
I guess this is the reason why we were not aware of this bug's issue during test for bug 309566.
 
As written in Comment #0, if MUA/MTA is sensitive than MS's mailer on "space between encoded-word and encoded-word", issue at mail recipient can occur when parm_folding=3(parm_folding=0 after your patch).
And I can't think currently used old MS's mailer mishandles name parameter generated by current parm_folding=4. 
So I think new default of parm_folding=1(current parm_folding=4) is better, because next problem really exists and msg_make_filename_qtext is still called with stripCRLFs=true when parm_folding=0(current parm_folding=3). 
  msg_make_filename_qtext removes [CRLF][Space] instead of [CRLF],
  when stripCRLFs==true is specified.

Kimura-san, what do you think?
Do you know mailer who produces problem when name parameter generated by new parm_folding=1(current parm_folding=4)?
Attached patch With unit test (obsolete) — Splinter Review
Attachment #374455 - Attachment is obsolete: true
Attachment #374587 - Flags: superreview?(bienvenu)
Attachment #374587 - Flags: review?(bienvenu)
Attachment #374455 - Flags: superreview?(bienvenu)
Attachment #374455 - Flags: review?(bienvenu)
the unit test fails on make check for me on windows with a debug build - does it work for you?

I get a couple errors about the activity manager:

WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80040111: file C:/mozilla-build/msys/tbirdcheck/mozilla/js/src/xpconnect/loader
/mozJSComponentLoader.cpp, line 1543
JS Component Loader: ERROR (null):0
                     uncaught exception: [Exception... "Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsIXPCCom
ponents_Utils.import]"  nsresult: "0x80040111 (NS_ERROR_NOT_AVAILABLE)"  location: "JS frame :: file:///C:/mozilla-build/msys/tbirdchec
k/objdir-tb/mozilla/dist/bin/components/nsActivity.js :: <TOP_LEVEL> :: line 40"  data: no]
WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80040111: file C:/mozilla-build/msys/tbirdcheck/mozilla/js/src/xpconnect/loader
/mozJSComponentLoader.cpp, line 1543

and then an assertion:

nsMsgComposeSendListener: Success on the message copy operation!
###!!! ASSERTION: RemoveCurrentDraftMessage can't get draft id: '(NS_SUCCEEDED(rv) && !curDraftIdURL.IsEmpty())', file C:/mozilla-build
/msys/tbirdcheck/mailnews/compose/src/nsMsgCompose.cpp, line 3593
Attached patch Revised patch (obsolete) — Splinter Review
Changes from the previous patch:
* Removed bogus assertion which caused a test failure on debug build. curDraftIdURL can be empty as the following comment explains. The test creates a new draft message, so curDraftIdURL will be empty.
* Fixed msg_make_filename_qtext. msg_make_filename_qtext changed CRLF to bare LF. It is apparently wrong. This bug was hidden on Tb2 for some unknown reason.
* I understand why msg_make_filename_qtext removed not only CRLF but also LWSP. This function is also used to quote non-RFC2047 (ascii only) parameter. I added an arg to control whether remove LWSP or not, and I added a corresponding testcase.
Attachment #374587 - Attachment is obsolete: true
Attachment #374602 - Flags: superreview?(bienvenu)
Attachment #374602 - Flags: review?(bienvenu)
Attachment #374587 - Flags: superreview?(bienvenu)
Attachment #374587 - Flags: review?(bienvenu)
(In reply to comment #40)
> "correctly" based on which RFC? :-)
Ask Microsoft guys. :-) I'm anxious to know which standards they obey.

> Do you know mailer who produces problem when name parameter generated by new
> parm_folding=1(current parm_folding=4)?
It is unlikely that parm_folding=1 causes a problem because Windows Mail generates a similar (folded) name parameter.
(In reply to comment #43)
> * I understand why msg_make_filename_qtext removed not only CRLF but also LWSP.
> This function is also used to quote non-RFC2047 (ascii only) parameter.
> I added an arg to control whether remove LWSP or not, (snip)

Thanks for resolving it. (I thought it should be requested in separate bug, because I couldn't guess why).
After that, I think default=0 is better to be kept(it respects recommendation on folding by RFC2822). Kimura-san, what do you think? Default=1(same as Windows Mail etc.) is better for interoperability?
(In reply to comment #45)
> After that, I think default=0 is better to be kept(it respects recommendation
> on folding by RFC2822). Kimura-san, what do you think? Default=1(same as
> Windows Mail etc.) is better for interoperability?
RFC2047 seems to recommend (or even mandate) folding a header which contains 'encoded-word's.
>   An 'encoded-word' may not be more than 75 characters long, including
>   'charset', 'encoding', 'encoded-text', and delimiters.  If it is
>   desirable to encode more text than will fit in an 'encoded-word' of
>   75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may
>   be used.
>
>   While there is no limit to the length of a multiple-line header
>   field, each line of a header field that contains one or more
>   'encoded-word's is limited to 76 characters.
Attached patch v4 (obsolete) — Splinter Review
Changes:
* Backing out stripLWSPs stuff. parm_folding=0 header will violate RFC2047 anyway (over 76 char line). Some clients may expect a current parm_folding=0/3 format.
* Added assertion again. While curDraftIdURL may be empty, GetDraftId() should not fail.
Attachment #374602 - Attachment is obsolete: true
Attachment #374644 - Flags: superreview?(bienvenu)
Attachment #374644 - Flags: review?(bienvenu)
Attachment #374602 - Flags: superreview?(bienvenu)
Attachment #374602 - Flags: review?(bienvenu)
Comment on attachment 374644 [details] [diff] [review]
v4

thx for the patch; we should try this out. Asking standard8 for sr, and to have him look at the test code...
Attachment #374644 - Flags: superreview?(bugzilla)
Attachment #374644 - Flags: superreview?(bienvenu)
Attachment #374644 - Flags: review?(bienvenu)
Attachment #374644 - Flags: review+
adding to blocking to keep on radar, given where the patch is at.
Flags: blocking-thunderbird3? → blocking-thunderbird3+
Whiteboard: [needs sr standard8]
Comment on attachment 374644 [details] [diff] [review]
v4

>+function checkAttachmentName(folding, input, expectedCD, expectedCT) {
>+  const prefs = Cc["@mozilla.org/preferences-service;1"]
>+                  .getService(Ci.nsIPrefBranch);
>+  prefs.setIntPref("mail.strictly_mime.parm_folding", folding);
>+
>+  var msgCompose = Components.classes["@mozilla.org/messengercompose/compose;1"]
>+                             .createInstance(Components.interfaces.nsIMsgCompose);
>+  var fields = Components.classes["@mozilla.org/messengercompose/composefields;1"]
>+                         .createInstance(Components.interfaces.nsIMsgCompFields);
>+  var params = Components.classes["@mozilla.org/messengercompose/composeparams;1"]
>+                         .createInstance(Components.interfaces.nsIMsgComposeParams);

Please use Cc and Ci consistently throughout the file (prefer Cc and Ci for unit tests).

>+  params.composeFields = fields;
>+  msgCompose.Initialize(null, params);
>+  var smtpServer = getBasicSmtpServer();
>+  var identity = getSmtpIdentity(null, smtpServer);

Is there a special reason you're creating an smtp server once per call? Why not just create it once and pass it in?

>+  var rootFolder = gLocalIncomingServer.rootMsgFolder;
>+  var draftFolder;
>+  try {
>+    draftFolder = rootFolder.getChildNamed("Drafts");
>+    rootFolder.propagateDelete(draftFolder, true, null);
>+  } catch (e) {
>+  }

I assume this will fail if draftFolder doesn't exist? Please put a comment in the catch as to what it is catching and why it is ok to ignore.

>+  try {
>+    msgCompose.SendMsg(Components.interfaces.nsIMsgSend.nsMsgSaveAsDraft,
>+                       identity, "", null, null);
>+  } catch (e) {
>+    do_throw(e);
>+  } finally {
>+    var thread = gThreadManager.currentThread;
>+    while (thread.hasPendingEvents())
>+      thread.processNextEvent(true);
>+  }

Is saving a message as draft a synchronous or asynchronous event? If you need to wait until it has finished, then you should hook into the listener argument and do the next stage of testing when that returns that it has finished saving. Otherwise we could get issues with the performance of different machines affecting the test result.

>+function run_test() {
>+  server = setupServerDaemon();

You shouldn't need an smtp server if you're not actually sending a message.

The patch is looking reasonable, however I'd like to see the updates to the unit test code so sr-.
Attachment #374644 - Flags: superreview?(bugzilla) → superreview-
Whiteboard: [needs sr standard8] → [needs updated patch]
Target Milestone: --- → Thunderbird 3.0b3
Attached patch updated per comments (obsolete) — Splinter Review
Attachment #374644 - Attachment is obsolete: true
Attachment #376808 - Flags: superreview?(bugzilla)
Attached patch updated per comments (obsolete) — Splinter Review
Sorry, Components.interfaces has still present.
Attachment #376808 - Attachment is obsolete: true
Attachment #376809 - Flags: superreview?(bugzilla)
Attachment #376808 - Flags: superreview?(bugzilla)
I realised I didn't quite explain myself well enough when talking about the async aspects of the test - therefore I decided to redo the test for you (in the style we do with some of the other unit tests) as it would be quicker than another couple of review iterations.

I've not changed any of the code or the functional part of the tests - just the mechanics of how they operate.

sr=Standard8
Attachment #376809 - Attachment is obsolete: true
Attachment #376885 - Flags: superreview+
Attachment #376809 - Flags: superreview?(bugzilla)
Thank you very much and sorry for bothering you.
Keywords: checkin-needed
Whiteboard: [needs updated patch]
(In reply to comment #54)
> Thank you very much and sorry for bothering you.

It really is not a problem (no need to worry about bothering me), I'd rather get a the patch in to the code base and out to nightly testers at this stage than spend another few days sorting out the unit test.

Thanks for the patch, checked in:

http://hg.mozilla.org/comm-central/rev/c4c4a7727bfc
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Flags: in-testsuite+
Keywords: checkin-needed
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.