Closed Bug 227290 Opened 21 years ago Closed 19 years ago

be generous to overlong (invalid) B-encoded words in 2047 encoded header?

Categories

(MailNews Core :: MIME, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ilya.konstantinov+future, Assigned: jshin1987)

References

Details

(Keywords: fixed1.8.1, intl)

Attachments

(3 files)

Multiple encoded words (= MIME header values which include charset
specification, as per RFC 2047) are not parsed. Seems like the only encoded word
to get parsed is the encoded word on the first line of the header.

For example:

Subject:
 =?koi8-r?B?7s/Xyc7LySDLwdTBzM/HwSAi5sXMzMnOySDmxcTF0snLzyAtIEZlZA===?=
 =?koi8-r?B?ZXJpY28gRmVsbGluaSAoMTkyMCAtIDE5OTMpIg==?=

does not get parsed at all, and the mangled header is displayed as-is in the GUI
(both Mozilla and Thunderbird).
Hmm. that's strange. What version did you try?
OS: Linux → All
Hardware: PC → All
It seems like it's the second encoded word that is missing. I sent an email to
myself with the following header and the first and the third encoded words are
decoded and shown, but the second is not. 

Subject: =?UTF-8?B?6rCA64KY64usIO2VnOq4gCDqsITri6Trnbwg7ZWc6riA44WHIOOEtCDqsA==?=
 =?UTF-8?B?gOuCmOuLpOudvCDtlZzquIDqsIDrgpjri6Trnbwg7ZWc6riAIOqwgOuCmA==?=
 =?UTF-8?B?64usIOqwgOuCmOuLpOudvCDqsIDrgpjri6TrnoQg6rCA64KY64us6528IA==?=

I'll take a look.
To reporter :

> Subject:
> =?koi8-r?B?7s/Xyc7LySDLwdTBzM/HwSAi5sXMzMnOySDmxcTF0snLzyAtIEZlZA===?=
> =?koi8-r?B?ZXJpY28gRmVsbGluaSAoMTkyMCAtIDE5OTMpIg==?=

Second part was displayed as 
> erico Fellini (1920 - 1993)"
This part is properly encoded.

But first part was displayed as is, ie. =?koi8-r?B?7s....
This part is not encoded properly.

Reporter, try the following test.
(1) Create two draft mails in the Drafts folder
(2) "Compact this folder" for the Drafts folder
(3) Shut down Mozilla
(4) Edit the file for the Drafts folder (the file named "Drafts", not "Drafts.msf")
    - Paste the first part into the Subject: header of the first mail
    - Paste the second part into the Subject: header of the second mail
(5) Delete the file named "Drafts.msf".
(6) Restart Mozilla and look at the Drafts folder.

This is a bug on the mail sender's side,
probably in splitting a long encoded string across multiple Subject: header lines.
What is the mailer? Mozilla?

To Comment #2 From Jungshik Shin :  

In the above test with your UTF-8 Subject: on the Thunderbird 2003-12-23 build,
the first and third parts are displayed properly in Hangul characters (probably; I
cannot read Hangul), but the second part was not.

However, WORKSFORME with Mozilla 2003122809-trunk/Win-Me, for a long Subject: in
both ISO-2022-JP and UTF-8 encodings of Japanese characters.
Splitting into multiple lines is done with no problem.

Is there any special condition around the split point?
What led you to believe that the first line of the header in comment #0 is
invalid? Just by inspection, I don't see anything wrong with it. Besides, Pine
(with iconv patch) has no problem   rendering both lines correctly:

Subject: Новинки каталога "Феллини Федерико -
    Federico Fellini (1920 - 1993)"

However, there's something. There may be an embedded new line (in the first
encoded word) that causes trouble for Mozilla.

As for my case, there's nothing special. I just typed a long enough string to
get Pine to generate multiple encoded words. There's a very low chance that Pine
has a bug in its RFC 2047 implementation. It's the most standard-compliant MUA.
There's no new line embedded in either of two encoded words. The first encoded
word is, when decoded, 'Новинки каталога "Феллини Федерико - Fed' and the second
one is 'erico Fellini (1920 - 1993)"'.   Mozilla doesn't decode either of them
as reported. I have to debug it. 

To Comment #4 From Jungshik Shin :

>What led you to believe that the first line of the header in comment #0 is
>invalid? By just inspection, I don't see anything wrong with. Besides, Pine
>(with iconv patch) has no problem   rendering both lines correctly:

My test results led me to that conclusion:
  Both Mozilla 2003122809-trunk/Win-Me and the Thunderbird 2003-12-23 build
  displayed the following header as an ASCII string.
>Subject: =?koi8-r?B?7s/Xyc7LySDLwdTBzM/HwSAi5sXMzMnOySDmxcTF0snLzyAtIEZlZA===?=

But I believe the encoding itself is correct, as you say, since this header is
displayed properly in your environment.

I guess this problem is OS dependent.
You use Pine (with iconv patch) but I use Japanese MS Windows Me.
The MS Windows implementation of Unicode differs slightly from the Unicode
Standard, for example for the large tilde.
In addition, the MS Win-9x family's Unicode support is partial, although the MS
Win-NT family's support is nearly full.

The following is the mail source when I pasted your decoded text into Subject: and body.
>Subject: =?KOI8-R?Q?=EE=CF=D7=C9=CE=CB=C9_=CB=C1=D4=C1=CC=CF=C7=C1_=22?=
> =?KOI8-R?Q?=E6=C5=CC=CC=C9=CE=C9_=E6=C5=C4=C5=D2=C9=CB=CF_-_Federi?=
> =?KOI8-R?Q?co_Fellini_=281920_-_1993=29=22?=
>Content-Type: text/plain; charset=KOI8-R; format=flowed
>Content-Transfer-Encoding: 8bit
>
>Новинки каталога "Феллини Федерико - Federico Fellini >(1920 - 1993)"

Your second UTF-8 portion was displayed as a single strange character, a "?"
surrounded by a diamond shape, by Mozilla under MS Win-Me.
Fonts specified for Korean: Proportional=Arial Unicode MS, Monospace=GulimChe
Did you use Mozilla to test whether the encoded word Mozilla has trouble with is
valid or not per RFC 2047?  Obviously, that doesn't work. How can it work? I
just used Pine as a quick test tool and then independently decoded encoded words
 with other tools.

This bug (as reported)  has NO platform dependency. It's 100% XP code and I know
where to look. Actually, I'm almost sure Mozilla doesn't  have a problem with
'encoded words' themselves, but it has a problem with header fields made of
multiple lines/encoded words in some cases. It has the code to deal with them, but
somehow it seems to fail in some cases (as given here).
Assignee: sspitzer → jshin
To Comment #7 From Jungshik Shin :

When I changed  
>Subject: =?koi8-r?B?7s/Xyc7LySDLwdTBzM/HwSAi5sXMzMnOySDmxcTF0snLzyAtIEZlZA===?=
to
>Subject: =?koi8-r?B?7s/Xyc7LySDLwdTBzM/HwSAi5sXMzMnOySDmxcTF0snLzyAtIEZlZA==?=
 (Removed single "=" before last "?")
Mozilla displayed it as
> Новинки каталога "Феллини Федерико - Fed

Unescape sequence is corrupted.

Jungshik Shin, please do not confuse reporter's case and your case.
I'm completely at loss what you're talking about.
Sorry. I just got what you meant. The last '=' is the 57th (if I didn't
miscount) character and it's kinda redundant (57 being 1 modulo 4). So, Mozilla
has trouble with that. That has to be 'fixed', I guess.
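The count above can be checked mechanically. The following is a minimal Python sketch, not Mozilla's code (which is C++); `leftover_chars` is a hypothetical helper introduced only for illustration, and the payload is the base64 text of the first encoded word from comment #0.

```python
# Base64 text of the first encoded word in comment #0, i.e. everything
# between "?B?" and the closing "?=".
payload = "7s/Xyc7LySDLwdTBzM/HwSAi5sXMzMnOySDmxcTF0snLzyAtIEZlZA==="


def leftover_chars(b64_text: str) -> int:
    """Hypothetical helper: characters left after the last full 4-character group."""
    return len(b64_text) % 4


print(len(payload))             # 57
print(leftover_chars(payload))  # 1
```

A valid B-encoded payload always leaves 0 here, so a remainder of 1 caused by a trailing '=' is exactly the "one '=' too many" situation discussed in this bug.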
Status: NEW → ASSIGNED
Sorry for the improper word "unescaping".
It is not a valid term for encoding based on RFC 2047/RFC 2045 and is meaningless.
It should have been "excess padding character ('=')".

RFC 2045 for Base 64 says :
> Since all base64
>   input is an integral number of octets, only the following cases can
>   arise: (1) the final quantum of encoding input is an integral
>   multiple of 24 bits; here, the final unit of encoded output will be
>   an integral multiple of 4 characters with no "=" padding, (2) the
>   final quantum of encoding input is exactly 8 bits; here, the final
>   unit of encoded output will be two characters followed by two "="
>   padding characters, or (3) the final quantum of encoding input is
>   exactly 16 bits; here, the final unit of encoded output will be three
>   characters followed by one "=" padding character.

I cannot find a rule for excess "="(s) after the proper padding of zero, one, or two
"="s.
Mozilla probably considers the whole encoded data "invalid" when an excess padding
character exists (it expects "?=", but finds something else).
I believe this behavior is not a violation of the RFC.
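To make the three RFC 2045 cases quoted above concrete, here is a small Python sketch. It assumes Python's standard `base64` module as a stand-in for any conforming encoder; the sample inputs and the helper name `padding_count` are arbitrary choices made for illustration.

```python
import base64


def padding_count(raw: bytes) -> int:
    """Number of '=' characters a conforming base64 encoder appends for this input."""
    return base64.b64encode(raw).count(b"=")


# Arbitrary sample inputs whose final quantum is 24, 8, and 16 bits long,
# matching cases (1), (2), and (3) of the RFC text.
for raw in (b"abc", b"a", b"ab"):
    print(raw, base64.b64encode(raw), padding_count(raw))
```

A conforming encoder never emits more than two '=' characters, and the output length is always a multiple of four, which is why a third '=' has no defined meaning.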

However, in this bug's case, all data from the first byte to just before the excess
padding is valid encoded data.
So I feel that "ignoring" the excess data, or printing it as ASCII characters, is
a kind action for users, since some mailers actually produced the reporter's data.

If Mozilla processes an encoded word from start to end and expects "?=" just after
the proper end of the base64 encoded data, changing the parsing order (external first,
important first) may allow an easy solution, for example:
  parse by "=?" and "?=" first, parse by "?"s second to determine the charset and
  encoding method, then process only the encoded data portion.
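The outside-in parsing order proposed above could look roughly like the following Python sketch. This is only an illustration under stated assumptions: the regular expression and the name `split_encoded_word` are hypothetical, it handles a single encoded word, and it is not how Mozilla's C++ decoder is structured.

```python
import base64
import re

# Outer delimiters "=?" and "?=" first, then the "?" separators inside.
# The payload group deliberately accepts any run of non-"?" characters, so
# excess "=" padding does not make the outer syntax check fail.
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")


def split_encoded_word(word: str):
    """Return (charset, encoding, payload), or None if the outer syntax is wrong."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        return None
    charset, encoding, payload = match.groups()
    return charset.lower(), encoding.upper(), payload


first = "=?koi8-r?B?7s/Xyc7LySDLwdTBzM/HwSAi5sXMzMnOySDmxcTF0snLzyAtIEZlZA===?="
second = "=?koi8-r?B?ZXJpY28gRmVsbGluaSAoMTkyMCAtIDE5OTMpIg==?="

# The overlong word still splits cleanly; only its payload needs special care.
print(split_encoded_word(first))

# The well-formed word decodes with a strict base64 decoder.
charset, _, payload = split_encoded_word(second)
print(base64.b64decode(payload, validate=True).decode(charset))
```

With this ordering, the decision about what to do with excess padding is confined to the payload step and does not affect recognition of the encoded word itself.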

Jungshik Shin, what do you think?
You're right that it's invalid (you were right at the beginning and I was misled
by Pine and other Mime tools I have that  turned out to be more generous than
Mozilla.)  It's easy to make Mozilla more generous (just a one-line fix would
suffice), but I'm not sure if I have to. There may be a 'security' issue?? 

reporter, what's the mail program that generated the header cited in your report?
Attached patch: patch
this patch will "fix" the problem, but as I wrote, we have to think about this
a little.
WADA, you're in favor of the patch, right? David and Seth, what do you think?
Simon, do you see any security implication in accepting overlong base64 encoded
words in the message header? Base64-encoded words (B-encoded words) always have
a number of characters that is a multiple of four and end with one of three
sequences: a) a sequence made entirely of base64 'alphabet' characters, b) two characters
(of the base64 alphabet) followed by '==', c) three characters of the base64 alphabet
followed by '='
Summary: Multiple encoded words (=?charset?...?=) not parsed → be generous to overlong (invalid) B-encoded words in 2047 encoded header?
Being more tolerant makes sense, but I think I would be happier with a more
focused fix to ignore 3 consecutive "=" characters at the end of a B-encoded
word, rather than blindly reducing the length to a multiple of 4.
I would be happier with a fix that ignores "more than 2" consecutive "="
characters at the end of a B-encoded word.
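The difference between the two approaches in the comments above (blindly reducing the length to a multiple of 4 versus ignoring only excess "=" characters) can be sketched in Python. Both function names are hypothetical, and the second function is one possible reading of the "ignore excess padding" idea, not the patch attached to this bug.

```python
import base64


def truncate_to_multiple_of_4(payload: str) -> str:
    """Blind approach: drop whatever trails the last complete 4-character group."""
    return payload[: len(payload) - len(payload) % 4]


def strip_excess_padding(payload: str):
    """Focused approach: only drop '=' characters beyond the required padding.

    Returns None when the payload cannot be repaired this way.
    """
    core = payload.rstrip("=")
    if len(core) % 4 == 1:
        return None                  # no valid base64 text has this length
    needed = -len(core) % 4          # 0, 1, or 2 '=' characters
    given = len(payload) - len(core)
    if given < needed:
        return None                  # missing padding is a different error
    return core + "=" * needed


# Both approaches repair one '=' too many ...
print(truncate_to_multiple_of_4("ZA==="), strip_excess_padding("ZA==="))
# ... but only the blind one silently accepts a trailing non-'=' character.
print(truncate_to_multiple_of_4("ZA==x"), strip_excess_padding("ZA==x"))
```

The second call illustrates the concern about truncation: it also swallows trailing characters that are not padding at all, which is the situation raised in (Q2) below.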

I have questions.

(Q1) I cannot say whether excess "="(s) should be displayed as ASCII "=" in
order to let the mail receiver know about the invalid header, or whether excess
"="(s) should simply be ignored.
Which should Mozilla do?

(Q2) How about characters other than "=" after the valid end of an encoded word?

(Q3) In replying or forwarding, I cannot say whether excess "="(s) or characters
should be removed or should be kept.
Which should Mozilla do?
WADA, I don't want to do anything fancier than this or attachment 138246.
> I don't want to do anything fancier than this or attachment 138246

(1) If an enhancement for invalidly encoded headers is developed in Mozilla, I
think it should not be merely a limited relief for a bug in one or a few
not-well-designed mailers.
It should be a universal enhancement.
At least, the issues I described in Comment #16 should be discussed and cleared.
(2) I guess the invalidly encoded header of this bug was produced by only one or a few
versions of one or a few mailers.
(3) I believe the bug in the mailer(s) should be fixed first.

So I, as a user, recommend that you, a developer, close this bug as INVALID, or
close it as FUTURE or WONTFIX, changing severity to Enhancement.
By the way, Jungshik Shin, how did you generate the header in your Comment #2?
It seems to be a new problem in the folding of mail headers encoded with UTF-8.
Product: MailNews → Core
*** Bug 274156 has been marked as a duplicate of this bug. ***
*** Bug 274384 has been marked as a duplicate of this bug. ***
*** Bug 282439 has been marked as a duplicate of this bug. ***
*** Bug 244002 has been marked as a duplicate of this bug. ***
Jungshik, what are our goals with this one?
Just verifying: bug still exists on Thunderbird 1.5rc2.
As to "what mailer generated this mail", this is an automated mailing generated by a major Russian online store. Yes, custom mailing apps tend to be written with disregard for standards, but if we can afford ourselves a little "be generous in what you accept", why not? (Especially since Pine, Evolution and probably OE too afford it.)
Attached file Testcase
Comment on attachment 207942
Testcase

ok. let's 'fix' this.
Attachment #207942 - Flags: superreview?(bienvenu)
Attachment #207942 - Flags: review?(smontagu)
Comment on attachment 138283
patch that handles one extra '=' only

I assume the review request was supposed to be on this attachment, not the testcase :)
Attachment #138283 - Flags: review+
Comment on attachment 207942
Testcase

Thanks for r and catching my stupid mistake. :-)
Attachment #207942 - Flags: superreview?(bienvenu)
Attachment #207942 - Flags: review?(smontagu)
Attachment #138283 - Flags: superreview?(bienvenu)
Attachment #138283 - Flags: superreview?(bienvenu) → superreview+
Fix checked into the trunk.
David, I think this patch is safe enough for TB 1.5 release. For what branch (1.8.0.1, 1.8.0.2) should I ask approval? 
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
1.5 is getting released tomorrow - to make a 1.5.0.1 release, I'm not sure what branch you'd want. But definitely do 1.8.1 so it will make 2.0.
Comment on attachment 138283
patch that handles one extra '=' only

This is a trivial fix to make our RFC 2047 decoder a bit generous to a common mistake of other mail programs. We need to make it in 2.0.
I also want this in TB 1.5.1(?), but not sure which branch I have to ask an approval for (1.8.0.1 or 1.8.0.2?). Whichever it may be, it'd be nice to get approval for that, too.
Attachment #138283 - Flags: approval1.8.1?
Comment on attachment 138283
patch that handles one extra '=' only

This is a trivial fix to make our RFC 2047 decoder a bit generous to a common
mistake of other mail programs and server-side programs (well, at the moment, we don't interpret C-D filename parameter in 'browser').

Anyway, we'd better fix this in the next point release of Thunderbird 1.5.1(?)
Attachment #138283 - Flags: approval1.8.0.2?
Comment on attachment 138283
patch that handles one extra '=' only

do you need a review from Darin (the module owner of netwerk) for this?
Attachment #138283 - Flags: approval1.8.1? → approval1.8.1+
(In reply to comment #36)
> (From update of attachment 138283)
> do you need a review from Darin (the module owner of netwerk) for this?

In principle, I guess, the answer is yes. However, I hope :-) Darin will excuse me for getting away with this, especially considering that this part is currently only used by TB (due to bug 299372). Just in case, I'm adding him to cc.

fix landed on the branch for TB 2.0

Keywords: fixed1.8.1
Can some of the folks concerned about this bug on the cc list help test the 1.8 branch builds so we can see how this fix is looking before we consider it for 1.8.0.x? 

Thanks.

ftp://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-mozilla1.8
Comment on attachment 138283
patch that handles one extra '=' only

lack of testing (no reply to comment 38), too late for 1.8.0.2
Attachment #138283 - Flags: approval1.8.0.2? → approval1.8.0.2-
(In reply to comment #38)
Works for me. Thunderbird 1.5 (Windows/20060222)

Thanks.
*** Bug 302816 has been marked as a duplicate of this bug. ***
Bug 302816 is about the same problem as this bug; the header is:
=?koi8-r?B?KioqIPXXxcTPzczFzsnFIM8g08/T1M/RzsnJIOzJw8XXz8fPINPexdTB=?=

That's 56 bytes of base-64 plus one (superfluous) '='.  Doesn't that make this 
a case of "one extra '=' only"?  But it isn't being handled correctly in recent 2a1/3a1 builds (altho the original data in comment 0 *is* handled correctly).
(In reply to comment #42)
> Bug 302816 is about the same problem as this bug; the header is:
> =?koi8-r?B?KioqIPXXxcTPzczFzsnFIM8g08/T1M/RzsnJIOzJw8XXz8fPINPexdTB=?=
> 
> That's 56 bytes of base-64 plus one (superfluous) '='.  Doesn't that make this 
> a case of "one extra '=' only"?  But it isn't being handled correctly in recent
> 2a1/3a1 builds (altho the original data in comment 0 *is* handled correctly).

Actually, with my patch TB only tolerates case 2 (of RFC 2045 : comment 11) + one superfluous '='. It doesn't accept case 1 or case 3 + one superfluous '=' (bug 302816 being case 1 + '='). That was because we limited our fix to 'then-known' malformed cases (see comment #15).

Do we have to be more generous now that a new strain of malformed header has been discovered? I'm not sure, but it seems a bit arbitrary that case 2 + extra '=' is accepted while case 1 + '=' or case 3 + '=' is rejected as invalid.
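To see which "strain" a given header belongs to, the payload can be classified by RFC 2045 case and by the number of superfluous '=' characters. The Python sketch below is hypothetical: `classify` and `decode_leniently` are illustrative names, the case numbering follows the RFC 2045 passage quoted in comment 11, and none of this is taken from the checked-in patch.

```python
import base64

# len(payload without '=') % 4  ->  RFC 2045 case number (a remainder of 1 is never valid)
CASE_BY_REMAINDER = {0: 1, 2: 2, 3: 3}


def classify(payload: str):
    """Return (rfc2045_case, superfluous_equals), or None if not base64-shaped."""
    core = payload.rstrip("=")
    case = CASE_BY_REMAINDER.get(len(core) % 4)
    if case is None:
        return None
    needed = -len(core) % 4
    return case, (len(payload) - len(core)) - needed


def decode_leniently(payload: str, charset: str) -> str:
    """Accept any of the three cases followed by superfluous '=' characters."""
    result = classify(payload)
    if result is None or result[1] < 0:
        raise ValueError("payload cannot be repaired by dropping '=' characters")
    core = payload.rstrip("=")
    repaired = core + "=" * (-len(core) % 4)
    return base64.b64decode(repaired, validate=True).decode(charset)


comment0 = "7s/Xyc7LySDLwdTBzM/HwSAi5sXMzMnOySDmxcTF0snLzyAtIEZlZA==="
bug302816 = "KioqIPXXxcTPzczFzsnFIM8g08/T1M/RzsnJIOzJw8XXz8fPINPexdTB="

print(classify(comment0))   # (2, 1): case 2 plus one superfluous '='
print(classify(bug302816))  # (1, 1): case 1 plus one superfluous '='
print(decode_leniently(bug302816, "koi8-r"))
```

Treating all three cases the same way keeps the rule easy to state: drop trailing '=' characters beyond what the length of the remaining text requires, and reject everything else.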
*** Bug 351203 has been marked as a duplicate of this bug. ***
Simon, what do you think of these new 'strains' of malformed encoded words? 

(In reply to comment #43)
> I'm not sure, but it seems a bit arbitrary that case 2 + extra
> '=' is accepted while case 1 + '=' or case 3 + '=' is rejected as invalid.

Yes, I think it would be more consistent to accept case 1 and case 3 as well.
Severity: normal → enhancement
Product: Core → MailNews Core