Closed Bug 227290 Opened 22 years ago Closed 20 years ago

be generous to overlong (invalid) B-encoded words in 2047 encoded header?

Categories

(MailNews Core :: MIME, enhancement)

Product: MailNews Core

Component: MIME

Type: enhancement

Priority: Not set

Severity: normal

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: ilya.konstantinov+future, Assigned: jshin1987)

References

Details

(Keywords: fixed1.8.1, intl)

Attachments

(3 files)

patch, 22 years ago, Jungshik Shin, 1.18 KB, patch
patch that handles one extra '=' only, 22 years ago, Jungshik Shin, 1.25 KB, patch; smontagu: review+, Bienvenu: superreview+, dveditz: approval1.8.0.2-, mscott: approval1.8.1+
Testcase, 20 years ago, Ilya Konstantinov, 346 bytes, message/rfc822

Ilya Konstantinov

Reporter

Description

22 years ago

Multiple encoded words (i.e. MIME header values that include a charset specification, as per RFC 2047) are not parsed. It seems that the only encoded word that gets parsed is the one on the first line of the header. For example, the header

    Subject: =?koi8-r?B?7s/Xyc7LySDLwdTBzM/HwSAi5sXMzMnOySDmxcTF0snLzyAtIEZlZA===?=
     =?koi8-r?B?ZXJpY28gRmVsbGluaSAoMTkyMCAtIDE5OTMpIg==?=

does not get parsed at all, and the mangled header is displayed as-is in the GUI (in both Mozilla and Thunderbird).
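For readers who want to reproduce the report, here is a minimal Python sketch (not Mozilla's parser) that extracts the encoded words from the header above with a simplified RFC 2047 pattern and prints each payload's length and that length modulo 4. The regular expression and the `find_encoded_words` helper are illustrative assumptions; the pattern ignores RFC 2047's length limits and other details.

```python
import re

# Simplified RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
# (illustration only: no length limits, no RFC 2231 language suffix)
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")

def find_encoded_words(header_value: str):
    """Return (charset, encoding, encoded_text) for each encoded word found."""
    return [(m.group(1), m.group(2).upper(), m.group(3))
            for m in ENCODED_WORD.finditer(header_value)]

# The Subject: value from the report, folded onto two lines.
subject = (
    "=?koi8-r?B?7s/Xyc7LySDLwdTBzM/HwSAi5sXMzMnOySDmxcTF0snLzyAtIEZlZA===?=\r\n"
    " =?koi8-r?B?ZXJpY28gRmVsbGluaSAoMTkyMCAtIDE5OTMpIg==?="
)

for charset, enc, text in find_encoded_words(subject):
    # A well-formed B payload always has a length that is a multiple of 4.
    print(charset, enc, len(text), len(text) % 4)
```

The first payload is 57 characters long (remainder 1), the second 40 (remainder 0), which is the asymmetry the rest of this thread turns on.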

Assignee

Comment 1

22 years ago

Hmm, that's strange. Which version did you try?

OS: Linux → All

Hardware: PC → All

Assignee

Comment 2

22 years ago

It seems like it's the second encoded word that is missing. I sent an email to myself with the following header, and the first and third encoded words are decoded and shown, but the second is not.

    Subject: =?UTF-8?B?6rCA64KY64usIO2VnOq4gCDqsITri6Trnbwg7ZWc6riA44WHIOOEtCDqsA==?=
     =?UTF-8?B?gOuCmOuLpOudvCDtlZzquIDqsIDrgpjri6Trnbwg7ZWc6riAIOqwgOuCmA==?=
     =?UTF-8?B?64usIOqwgOuCmOuLpOudvCDqsIDrgpjri6TrnoQg6rCA64KY64us6528IA==?=

I'll take a look.

WADA:World Anti-bad-Duping Agency

Comment 3

22 years ago

To reporter:

> Subject:
> =?koi8-r?B?7s/Xyc7LySDLwdTBzM/HwSAi5sXMzMnOySDmxcTF0snLzyAtIEZlZA===?=
> =?koi8-r?B?ZXJpY28gRmVsbGluaSAoMTkyMCAtIDE5OTMpIg==?=

The second part was displayed as

> erico Fellini (1920 - 1993)"

This part is properly encoded. But the first part was displayed as-is, i.e. =?koi8-r?B?7s.... This part is not encoded properly.

Reporter, try the following test:
(1) Create two draft mails in the Drafts folder.
(2) Run "Compact this folder" on the Drafts folder.
(3) Shut down Mozilla.
(4) Edit the file for the Drafts folder (the file named "Drafts", not "Drafts.msf"):
 - Paste the first part into the Subject: header of the first mail.
 - Paste the second part into the Subject: header of the second mail.
(5) Delete the file named "Drafts.msf".
(6) Restart Mozilla and look at the Drafts folder.

This is a bug on the mail sender's side, probably in splitting a long encoded string across multiple Subject: header lines. What is the mailer? Mozilla?

To Comment #2 From Jungshik Shin: In the above test with your UTF-8 Subject: on a Thunderbird 2003-12-23 build, the first and third parts are displayed properly in Hangul characters (probably; I cannot read Hangul), but the second part is not. However, it WORKSFORME with Mozilla 2003122809-trunk/Win-Me for long Subject: headers in both ISO-2022-JP and UTF-8 encoding of Japanese characters. Splitting into multiple lines works with no problem. Are there any special conditions around the split point?

Assignee

Comment 4

22 years ago

What led you to believe that the first line of the header in comment #0 is invalid? By inspection alone, I don't see anything wrong with it. Besides, Pine (with the iconv patch) has no problem rendering both lines correctly:

    Subject: Новинки каталога "Феллини Федерико - Federico Fellini (1920 - 1993)"

However, there is one possibility: an embedded newline (in the first encoded word) may lead Mozilla into trouble. As for my case, there's nothing special. I just typed a long enough string to get Pine to generate multiple encoded words. There's a very low chance that Pine has a bug in its RFC 2047 implementation; it's the most standards-compliant MUA.

Assignee

Comment 5

22 years ago

There's no newline embedded in either of the two encoded words. The first encoded word, when decoded, is 'Новинки каталога "Феллини Федерико - Fed' and the second one is 'erico Fellini (1920 - 1993)"'. As reported, Mozilla decodes neither of them. I have to debug it.

WADA:World Anti-bad-Duping Agency

Comment 6

22 years ago

To Comment #4 From Jungshik Shin:

>What led you to believe that the first line of the header in comment #0 is
>invalid? By just inspection, I don't see anything wrong with. Besides, Pine
>(with iconv patch) has no problem rendering both lines correctly:

My test results led me to that: both Mozilla 2003122809-trunk/Win-Me and a Thunderbird 2003-12-23 build displayed the following header as a raw ASCII string:

>Subject: =?koi8-r?B?7s/Xyc7LySDLwdTBzM/HwSAi5sXMzMnOySDmxcTF0snLzyAtIEZlZA===?=

But I believe the encoding itself is correct, as you say, since this header is displayed properly in your environment. I guess this problem is OS-dependent. You use Pine (with the iconv patch), but I use Japanese MS Windows Me. MS Windows' implementation of Unicode differs slightly from the Unicode Standard, for example for the large tilde. In addition, Unicode support in the MS Win-9x family is partial, although support in the MS Win-NT family is nearly complete. The following is the mail source after I pasted your decoded text into the Subject: and the body:

>Subject: =?KOI8-R?Q?=EE=CF=D7=C9=CE=CB=C9_=CB=C1=D4=C1=CC=CF=C7=C1_=22?=
> =?KOI8-R?Q?=E6=C5=CC=CC=C9=CE=C9_=E6=C5=C4=C5=D2=C9=CB=CF_-_Federi?=
> =?KOI8-R?Q?co_Fellini_=281920_-_1993=29=22?=
>Content-Type: text/plain; charset=KOI8-R; format=flowed
>Content-Transfer-Encoding: 8bit
>
>Новинки каталога "Феллини Федерико - Federico Fellini
>(1920 - 1993)"

Your second UTF-8 portion was displayed by Mozilla under MS Win-Me as a single strange character: a "?" surrounded by a diamond shape. Fonts specified for Korean: Proportional=Arial Unicode MS, Monospace=GulimChe
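WADA's mailer folded the same subject into three Q-encoded words instead of B-encoded ones. A small Python sketch with the standard `quopri` module decodes each word (`header=True` turns '_' into a space, as Q encoding requires) and joins the results. RFC 2047 says whitespace between adjacent encoded words is ignored, so plain concatenation is the intended reading; the variable names are illustrative.

```python
import quopri

# Encoded texts of the three Q-encoded words in the mail source above.
q_words = [
    "=EE=CF=D7=C9=CE=CB=C9_=CB=C1=D4=C1=CC=CF=C7=C1_=22",
    "=E6=C5=CC=CC=C9=CE=C9_=E6=C5=C4=C5=D2=C9=CB=CF_-_Federi",
    "co_Fellini_=281920_-_1993=29=22",
]

# Each word holds whole KOI8-R characters, so it can be decoded on its own;
# the folding whitespace between the words is simply dropped.
decoded = "".join(
    quopri.decodestring(w.encode("ascii"), header=True).decode("koi8_r")
    for w in q_words
)
print(decoded)
```

The result is the same subject that Pine rendered for the B-encoded version in comment #0.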

Assignee

Comment 7

22 years ago

Did you use Mozilla to test whether the encoded word Mozilla has trouble with is valid per RFC 2047? Obviously, that doesn't work. How could it? I just used Pine as a quick test tool and then independently decoded the encoded words with other tools. This bug (as reported) has NO platform dependency. It's 100% XP (cross-platform) code and I know where to look. Actually, I'm almost sure Mozilla doesn't have a problem with 'encoded words' themselves, but it does have a problem with header fields made of multiple lines/encoded words in some cases. It has code to deal with them, but somehow it seems to fail in some cases (as given here).

Assignee: sspitzer → jshin

WADA:World Anti-bad-Duping Agency

Comment 8

22 years ago

To Comment #7 From Jungshik Shin: When I changed

>Subject: =?koi8-r?B?7s/Xyc7LySDLwdTBzM/HwSAi5sXMzMnOySDmxcTF0snLzyAtIEZlZA===?=

to

>Subject: =?koi8-r?B?7s/Xyc7LySDLwdTBzM/HwSAi5sXMzMnOySDmxcTF0snLzyAtIEZlZA==?=

(removing a single "=" before the last "?"), Mozilla displayed it as

> Новинки каталога "Феллини Федерико - Fed

The unescape sequence is corrupted. Jungshik Shin, please do not confuse the reporter's case with your case.

Assignee

Comment 9

22 years ago

I'm completely at a loss as to what you're talking about.

Assignee

Comment 10

22 years ago

Sorry, I just got what you meant. The last '=' is the 57th character (if I didn't miscount), and it's kind of redundant (57 being 1 modulo 4). Mozilla has trouble with that. That has to be 'fixed', I guess.
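The count is easy to check mechanically. A minimal Python sketch using the standard `base64` module: it measures the first encoded word from comment #0, then drops the single trailing '=' and decodes the rest as KOI8-R, reproducing what WADA saw in Comment 8. The variable names are illustrative.

```python
import base64

# Encoded text of the first encoded word from comment #0.
first_word = "7s/Xyc7LySDLwdTBzM/HwSAi5sXMzMnOySDmxcTF0snLzyAtIEZlZA==="

print(len(first_word), len(first_word) % 4)   # 57 1

# Drop the one superfluous '=' so the length is a multiple of 4, then decode
# strictly (validate=True rejects characters outside the base64 alphabet).
trimmed = first_word[:-1]
raw = base64.b64decode(trimmed, validate=True)
print(raw.decode("koi8_r"))   # Новинки каталога "Феллини Федерико - Fed
```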

Status: NEW → ASSIGNED

WADA:World Anti-bad-Duping Agency

Comment 11

22 years ago

Sorry for the improper word "unescaping". That word does not apply to encoding based on RFC 2047/RFC 2045 and is meaningless here. It should have been "excess padding character ('=')". RFC 2045 says, for Base64:

> Since all base64
> input is an integral number of octets, only the following cases can
> arise: (1) the final quantum of encoding input is an integral
> multiple of 24 bits; here, the final unit of encoded output will be
> an integral multiple of 4 characters with no "=" padding, (2) the
> final quantum of encoding input is exactly 8 bits; here, the final
> unit of encoded output will be two characters followed by two "="
> padding characters, or (3) the final quantum of encoding input is
> exactly 16 bits; here, the final unit of encoded output will be three
> characters followed by one "=" padding character.

I cannot find any rule for excess "="(s) after the proper padding of zero, one, or two "="s. Mozilla probably treats the whole encoded data as invalid when an excess padding character exists (it expects "?=" but does not find it). I believe this behaviour does not violate the RFC. However, in this bug's case, all data from the first byte to just before the excess padding is validly encoded. So I feel that ignoring the excess data, or printing it as ASCII characters, is a kind action toward users, since some mailers actually produced the reporter's data. If Mozilla processes an encoded word from start to end and expects "?=" just after the proper end of the base64-encoded data, changing the parsing order (outermost first, most important first) may allow an easy solution, for example: parse by "=?" and "?=" first, parse by "?"s second to determine the charset and encoding method, then process only the encoded data portion. Jungshik Shin, what do you think?
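The three RFC 2045 cases quoted above follow from the input length modulo 3. A short Python sketch (the helper names `expected_padding` and `expected_length` are ours, for illustration) computes the expected padding and encoded length and checks both against the standard library's `base64.b64encode`:

```python
import base64

def expected_padding(n_octets: int) -> int:
    """Number of '=' RFC 2045 base64 appends for n input octets.

    Final quantum of 24 bits -> 0 '=', of 8 bits -> 2 '=', of 16 bits -> 1 '='.
    """
    return {0: 0, 1: 2, 2: 1}[n_octets % 3]

def expected_length(n_octets: int) -> int:
    """Encoded length: every (padded) 3-octet group becomes 4 characters."""
    return 4 * ((n_octets + 2) // 3)

# Cross-check the formulas against the standard library encoder.
for n in range(7):
    enc = base64.b64encode(bytes(n))   # n zero octets encode without any '=' in the data
    assert len(enc) == expected_length(n)
    assert enc.count(b"=") == expected_padding(n)
    print(n, enc.decode(), expected_padding(n))
```

Applied to the reporter's first word, the decoded text is 40 octets, so the final quantum is 8 bits and case (2) applies: exactly two '=' are expected in a 56-character payload, and the third '=' is the excess.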

Assignee

Comment 12

22 years ago

You're right that it's invalid (you were right from the beginning; I was misled by Pine and other MIME tools I have, which turned out to be more generous than Mozilla). It's easy to make Mozilla more generous (a one-line fix would suffice), but I'm not sure whether I should. There may be a 'security' issue?? Reporter, what mail program generated the header cited in your report?

Assignee

Comment 13

22 years ago

Attached patch: patch

this patch will "fix" the problem, but as I wrote, we have to think about this a little.

Assignee

Comment 14

22 years ago

WADA, you're in favor of the patch, right? David and Seth, what do you think? Simon, do you see any security implications in accepting overlong base64-encoded words in the message header?

Base64-encoded words (B-encoded words) always have to have a number of characters that is a multiple of four, and must end with one of three sequences:
a) a sequence made entirely of base64 'alphabet' characters,
b) two base64 alphabet characters followed by '==',
c) three base64 alphabet characters followed by '='.
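The three allowed endings translate directly into a regular expression. A minimal Python sketch, assuming that only the standard base64 alphabet (A-Z, a-z, 0-9, '+', '/') may appear; `is_valid_b_text` is a hypothetical helper, not Mozilla's API:

```python
import re

# Groups of 4 alphabet characters, optionally followed by one final unit that
# is either "xx==" (ending b) or "xxx=" (ending c); ending a) is no final unit.
# The length is therefore always a multiple of 4.
_B64_STRICT = re.compile(
    r"(?:[A-Za-z0-9+/]{4})*"
    r"(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"
)

def is_valid_b_text(text: str) -> bool:
    """True if `text` is strictly valid base64 with correct padding."""
    return _B64_STRICT.fullmatch(text) is not None

reporter_first = "7s/Xyc7LySDLwdTBzM/HwSAi5sXMzMnOySDmxcTF0snLzyAtIEZlZA==="
print(is_valid_b_text(reporter_first))        # False: 57 characters
print(is_valid_b_text(reporter_first[:-1]))   # True: ends with "ZA==" (case b)
```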

Summary: Multiple encoded words (=?charset?...?=) not parsed → be generous to overlong (invalid) B-encoded words in 2047 encoded header?

Simon Montagu :smontagu

Comment 15

22 years ago

Being more tolerant makes sense, but I think I would be happier with a more focused fix to ignore 3 consecutive "=" characters at the end of a B-encoded word, rather than blindly reducing the length to a multiple of 4.
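Simon's concern can be illustrated by comparing the two kinds of repair. The Python sketch below is hypothetical (neither function is taken from attachment 138246 or from Mozilla): one repair blindly cuts the encoded text down to a multiple of four characters, the other only removes '=' beyond the second. On an overlong input both agree; on a differently malformed input the blind version silently discards data.

```python
import base64

def truncate_to_multiple_of_4(text: str) -> str:
    """Blind repair: cut the encoded text down to a multiple of 4 characters."""
    return text[: len(text) - len(text) % 4]

def drop_excess_padding(text: str) -> str:
    """Focused repair: keep at most two trailing '=' and change nothing else."""
    body = text.rstrip("=")
    return body + "=" * min(len(text) - len(body), 2)

overlong = "ZXJpY28gRmVsbGluaSAoMTkyMCAtIDE5OTMpIg==="   # one '=' too many
unpadded = "QUJDRA"   # base64 of "ABCD" with its "==" padding missing

for fix in (truncate_to_multiple_of_4, drop_excess_padding):
    print(fix.__name__, fix(overlong), fix(unpadded))

# The blind repair "fixes" the unpadded input by throwing away real data:
print(base64.b64decode(truncate_to_multiple_of_4(unpadded)))   # b'ABC'
```

The focused repair leaves `unpadded` untouched, so a strict decoder still rejects it instead of silently returning truncated text.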

WADA:World Anti-bad-Duping Agency

Comment 16

22 years ago

I would be happier with a fix that ignores "more than 2" consecutive "=" characters at the end of a B-encoded word. I have some questions:
(Q1) I cannot say whether excess "="(s) should be displayed as ASCII "=" to let the mail recipient know that the header is invalid, or simply ignored. Which should Mozilla do?
(Q2) What about characters other than "=" after the valid end of an encoded word?
(Q3) When replying or forwarding, I cannot say whether excess "="(s) or characters should be removed or kept. Which should Mozilla do?

Assignee

Comment 17

22 years ago

Attached patch: patch that handles one extra '=' only

WADA, I don't want to do anything fancier than this or attachment 138246.

WADA:World Anti-bad-Duping Agency

Comment 18

22 years ago

> I don't want to do anything fancier than this or attachment 138246

(1) If handling of invalidly encoded headers is to be developed in Mozilla, I think it should not be a limited relief for a bug in just one or a few poorly designed mailers. It should be a universal enhancement. At least the issues I described in Comment #16 should be discussed and resolved.
(2) I guess the invalidly encoded header in this bug was produced by only one or a few versions of one or a few mailers.
(3) I believe the bug in the mailer(s) should be fixed first.

So I, as a user, recommend that you, as a developer, close this bug as INVALID, or close it as FUTURE or WONTFIX after changing the severity to Enhancement.

WADA:World Anti-bad-Duping Agency

Comment 19

22 years ago

By the way, Jungshik Shin, how did you generate the header in your Comment #2? It seems to be a new problem with folding mail headers encoded in UTF-8.
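One way to probe the folding question is to decode each encoded word from Comment #2 on its own. RFC 2047 requires every encoded word to represent a whole number of characters, so a multi-octet UTF-8 character must not be split across adjacent words. The Python sketch below checks that property with the standard `base64` module; `decodes_alone` is a hypothetical helper, and the check does not tell us which program did the folding.

```python
import base64

# Encoded texts of the three encoded words in the Subject: from Comment #2.
words = [
    "6rCA64KY64usIO2VnOq4gCDqsITri6Trnbwg7ZWc6riA44WHIOOEtCDqsA==",
    "gOuCmOuLpOudvCDtlZzquIDqsIDrgpjri6Trnbwg7ZWc6riAIOqwgOuCmA==",
    "64usIOqwgOuCmOuLpOudvCDqsIDrgpjri6TrnoQg6rCA64KY64us6528IA==",
]

def decodes_alone(octets: bytes) -> bool:
    """True if the octets form complete, well-formed UTF-8 by themselves."""
    try:
        octets.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

chunks = [base64.b64decode(w, validate=True) for w in words]
for i, chunk in enumerate(chunks, 1):
    # Show whether each word stands alone, plus its first and last two octets.
    print(i, decodes_alone(chunk), chunk[:2].hex(), chunk[-2:].hex())

# Joined back together, the octets form valid UTF-8 again.
print(decodes_alone(b"".join(chunks)))
```

The first word ends with the octets EA B0, the start of a three-octet character, and the second word begins with the continuation octet 0x80 that completes it. Neither word decodes on its own, while the concatenation does.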

Mike Cowperthwaite

Comment 20

21 years ago

See bug 258320.

Myk Melez [:myk] [@mykmelez]

Updated

21 years ago

Product: MailNews → Core

Mike Cowperthwaite

Comment 21

21 years ago

*** Bug 274156 has been marked as a duplicate of this bug. ***

Mike Cowperthwaite

Comment 22

21 years ago

*** Bug 274384 has been marked as a duplicate of this bug. ***

Assignee

Comment 23

20 years ago

*** Bug 282439 has been marked as a duplicate of this bug. ***

Comment 24

20 years ago

*** Bug 244002 has been marked as a duplicate of this bug. ***

Ilya Konstantinov

Reporter

Comment 25

20 years ago

Jungshik, what are our goals with this one?

Ilya Konstantinov

Reporter

Comment 26

20 years ago

Just verifying: bug still exists on Thunderbird 1.5rc2.

Ilya Konstantinov

Reporter

Comment 27

20 years ago

As to "what mailer generated this mail": this is an automated mailing generated by a major Russian online store. Yes, custom mailing apps tend to be written with disregard for standards, but if we can afford a little "be generous in what you accept", why not? (Especially since Pine, Evolution, and probably OE too afford it.)

Ilya Konstantinov

Reporter

Comment 28

20 years ago

Attached file: Testcase

Assignee

Comment 29

20 years ago

Comment on attachment 207942 (Testcase): OK, let's 'fix' this.

Attachment #207942 - Flags: superreview?(bienvenu)

Attachment #207942 - Flags: review?(smontagu)

Simon Montagu :smontagu

Comment 30

20 years ago

Comment on attachment 138283 (patch that handles one extra '=' only): I assume the review request was supposed to be on this attachment, not the testcase :)

Attachment #138283 - Flags: review+

Assignee

Comment 31

20 years ago

Comment on attachment 207942 (Testcase): Thanks for the r and for catching my stupid mistake. :-)

Attachment #207942 - Flags: superreview?(bienvenu)

Attachment #207942 - Flags: review?(smontagu)

Assignee

Updated

20 years ago

Attachment #138283 - Flags: superreview?(bienvenu)

David :Bienvenu

Updated

20 years ago

Attachment #138283 - Flags: superreview?(bienvenu) → superreview+

Assignee

Comment 32

20 years ago

Fix checked into the trunk. David, I think this patch is safe enough for the TB 1.5 release. For which branch (1.8.0.1 or 1.8.0.2) should I ask for approval?

Status: ASSIGNED → RESOLVED

Closed: 20 years ago

Resolution: --- → FIXED

David :Bienvenu

Comment 33

20 years ago

1.5 is getting released tomorrow. For a 1.5.0.1 release, I'm not sure which branch you'd want, but definitely do 1.8.1 so it will make 2.0.

Assignee

Comment 34

20 years ago

Comment on attachment 138283 (patch that handles one extra '=' only): This is a trivial fix to make our RFC 2047 decoder a bit more generous toward a common mistake made by other mail programs. We need to get it into 2.0. I also want it in TB 1.5.1(?), but I'm not sure which branch I have to ask approval for (1.8.0.1 or 1.8.0.2?). Whichever it is, it'd be nice to get approval for that, too.

Attachment #138283 - Flags: approval1.8.1?

Assignee

Comment 35

20 years ago

Comment on attachment 138283 (patch that handles one extra '=' only): This is a trivial fix to make our RFC 2047 decoder a bit more generous toward a common mistake made by other mail programs and server-side programs (well, at the moment, we don't interpret the Content-Disposition (C-D) filename parameter in the 'browser'). Anyway, we'd better fix this in the next point release of Thunderbird, 1.5.1(?).

Attachment #138283 - Flags: approval1.8.0.2?

Scott MacGregor

Comment 36

20 years ago

Comment on attachment 138283 (patch that handles one extra '=' only): Do you need a review from Darin (the module owner of netwerk) for this?

Attachment #138283 - Flags: approval1.8.1? → approval1.8.1+

Assignee

Comment 37

20 years ago

(In reply to comment #36)
> (From update of attachment 138283)
> do you need a review from Darin (the module owner of netwerk) for this?

In principle, I guess the answer is yes. However, I hope :-) Darin will excuse me for getting away with this, especially considering that this part is currently used only by TB (due to bug 299372). Just in case, I'm adding him to cc.

Fix landed on the branch for TB 2.0.

Keywords: fixed1.8.1

Scott MacGregor

Comment 38

20 years ago

Can some of the folks concerned about this bug on the cc list help test the 1.8 branch builds so we can see how this fix is looking before we consider it for 1.8.0.x? Thanks. ftp://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-mozilla1.8

Daniel Veditz [:dveditz]

Comment 39

19 years ago

Comment on attachment 138283 (patch that handles one extra '=' only): Lack of testing (no reply to comment 38); too late for 1.8.0.2.

Attachment #138283 - Flags: approval1.8.0.2? → approval1.8.0.2-

Comment 40

19 years ago

(In reply to comment #38) Works for me. Thunderbird 1.5 (Windows/20060222) Thanks.

Mike Cowperthwaite

Comment 41

19 years ago

*** Bug 302816 has been marked as a duplicate of this bug. ***

Mike Cowperthwaite

Comment 42

19 years ago

Bug 302816 is about the same problem as this bug; the header is:

    =?koi8-r?B?KioqIPXXxcTPzczFzsnFIM8g08/T1M/RzsnJIOzJw8XXz8fPINPexdTB=?=

That's 56 bytes of base64 plus one (superfluous) '='. Doesn't that make this a case of "one extra '=' only"? But it isn't handled correctly in recent 2a1/3a1 builds (although the original data in comment 0 *is* handled correctly).

Assignee

Comment 43

19 years ago

(In reply to comment #42)
> Bug 302816 is about the same problem as this bug; the header is:
> =?koi8-r?B?KioqIPXXxcTPzczFzsnFIM8g08/T1M/RzsnJIOzJw8XXz8fPINPexdTB=?=
>
> That's 56 bytes of base-64 plus one (superfluous) '='. Doesn't that make this
> a case of "one extra '=' only"? But it isn't being handled correctly in recent
> 2a1/3a1 builds (altho the original data in comment 0 *is* handled correctly).

Actually, with my patch TB only tolerates case 2 (of RFC 2045; see comment 11) plus one superfluous '='. It doesn't accept case 1 or case 3 plus one superfluous '=' (bug 302816 being case 1 + '='). That was because we limited our fix to the then-known malformed cases (see comment #15). Do we have to be more generous now that a new strain of malformed header has been discovered? I'm not sure, but it seems a bit arbitrary that case 2 + an extra '=' is accepted while case 1 + '=' or case 3 + '=' is rejected as invalid.

Assignee

Comment 44

19 years ago

*** Bug 351203 has been marked as a duplicate of this bug. ***

Assignee

Comment 45

19 years ago

Simon, what do you think of these new 'strains' of malformed encoded words?

Simon Montagu :smontagu

Comment 46

19 years ago

(In reply to comment #43)
> I'm not sure, but it seems a bit arbitrary that case 2 + extra
> '=' is accepted while case 1 + '=' or case 3 + '=' is rejected as invalid.

Yes, I think it would be more consistent to accept case 1 and case 3 as well.
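The difference between Comment 43's description of the patch and the more consistent rule agreed on here can be made concrete with a small acceptance model. The Python sketch below is hypothetical, not the patch itself: `accept_narrow` tolerates one superfluous '=' only after case 2, while `accept_general` tolerates one after any of the three RFC 2045 cases. The "case 3 + '='" sample is a constructed example (base64 of "ABCDE" with one '=' too many), not data from the thread.

```python
import re

# Strict RFC 2045 base64: 4-character groups plus an optional "xx==" or "xxx=".
_STRICT = re.compile(
    r"(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"
)

def is_strict(text: str) -> bool:
    return _STRICT.fullmatch(text) is not None

def accept_narrow(text: str) -> bool:
    """Model of Comment 43: strict, or case 2 ('==') plus exactly one '='."""
    return is_strict(text) or (text.endswith("===") and is_strict(text[:-1]))

def accept_general(text: str) -> bool:
    """Model of the rule favoured here: one extra '=' after any of the cases."""
    return is_strict(text) or (text.endswith("=") and is_strict(text[:-1]))

samples = {
    "comment 0 (case 2 + '=')": "7s/Xyc7LySDLwdTBzM/HwSAi5sXMzMnOySDmxcTF0snLzyAtIEZlZA===",
    "bug 302816 (case 1 + '=')": "KioqIPXXxcTPzczFzsnFIM8g08/T1M/RzsnJIOzJw8XXz8fPINPexdTB=",
    "case 3 + '=' (constructed)": "QUJDREU==",
}
for label, text in samples.items():
    print(f"{label}: narrow={accept_narrow(text)} general={accept_general(text)}")
```

Both models still reject a payload that is missing padding, such as "QUJDRA=", because removing one '=' never makes it valid.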

WADA:World Anti-bad-Duping Agency

Updated

18 years ago

Severity: normal → enhancement

Nobody; OK to take it and work on it

Updated

17 years ago

Product: Core → MailNews Core
