Closed
Bug 493544
Opened 15 years ago
Closed 12 years ago
invalid decoding of multiline utf-8 subject header (better to be tolerant with split of byte code at wrong position in RFC 2047 encoding)
Categories
(MailNews Core :: MIME, enhancement)
MailNews Core
MIME
Tracking
(Not tracked)
RESOLVED
FIXED
Thunderbird 20.0
People
(Reporter: sebcsaba, Assigned: ZaneUJi)
References
(Blocks 2 open bugs)
Details
(Keywords: testcase)
Attachments
(9 files, 3 obsolete files)
1.15 KB,
text/plain
|
Details | |
1.44 KB,
message/rfc822
|
Details | |
284.71 KB,
image/jpeg
|
Details | |
1.56 KB,
application/octet-stream
|
Details | |
78.41 KB,
image/jpeg
|
Details | |
23.11 KB,
message/rfc822
|
Details | |
12.42 KB,
patch
|
smontagu
:
review+
|
Details | Diff | Splinter Review |
8.16 KB,
patch
|
Details | Diff | Splinter Review | |
4.61 KB,
patch
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; hu; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.8 Build Identifier: verzió 2.0.0.21 (20090302) If a multiline base64 encoded subject header is wrapped in the middle of a multibyte character, then it's not displayed properly. Reproducible: Always Steps to Reproduce: Create/send a mail with the next header, and open it in Thunderbird: Subject: =?utf-8?B?S0FSIC0gw4lydGVzw610w6lzIHbDoXLDs2xpc3TDoXJhIGhlbHllesOpc3LF?= =?utf-8?B?kWw=?= Actual Results: Shows this subject: "KAR - Értesítés várólistára helyezésr�" Expected Results: Shows this subject: "KAR - Értesítés várólistára helyezésről" The letter 'ő' (u0151) is a special hungarian character, it may be displayed as a question mark on your system. But anyway, the last character is 'l' (u006c) must be displayed properly.
Comment 1•15 years ago
|
||
c3LF in base64 == 0x7372C5 (last part of first encoded word) kWw= in Base64 == 0x916C (second encoded word) 'ő'(U+0151) == 0xC5 0x91 in UTF-8. An encoded word should complete within the encoded word. Mail sender have to split to 0x...7372 and 0xC5916C(split based on unicode character) upon base64 encoding. If Tb generated the header, Tb's bug. If Tb didin't generate the header, INVALID.
Reporter | ||
Comment 2•15 years ago
|
||
Can you reference an rfc about this interpretation? I cannot find any. Other email-clients displays this header correctly (Gmail, RoundCube Webmail)
Comment 3•15 years ago
|
||
RFC 2047.
My description of "should complete within the encoded word" was inaccurate. It was based on "3. Character sets" in RFC 2047, but description in "3. Character sets" was for "escape sequence" of some character sets such as ISO-2022-JP. Sorry for my confusion.
I think your case corresponds to next section.
> http://www.faqs.org/rfcs/rfc2047.html
> 5. Use of encoded-words in message headers
>(snip)
> (1) An 'encoded-word' may replace a 'text' token (as defined by RFC 822)
> in any Subject or Comments header field, (snip)
>(snip)
> Ordinary ASCII text and 'encoded-word's may appear together in the
> same header field. However, an 'encoded-word' that appears in a
> header field defined as '*text' MUST be separated from any adjacent
> 'encoded-word' or 'text' by 'linear-white-space'.
AFAIR, same phenomenon was reported to some other bugs and closed as INVALID. I don't know whether quirks for or torelance with such RFC violation is requested ot not.
Comment 4•15 years ago
|
||
In evaluating whether this deserves some sort of quirk, it would be very helpful to know exactly which clients generate messages with this sort of header...
Reporter | ||
Comment 5•15 years ago
|
||
phpmailer did it. I think the tolerant interpretation of the RFC should be preferred in this case.
Comment 6•15 years ago
|
||
This case is relieved by code like next; If (consecutive encoded words have same charset) { join decoded "consecutive encoded words"; } Interprets the "joined decoded consecutive encoded words" by specified charset; But this kind of coding breaks next case. - First encoded word contains single garbage byte. - Second encoded word contains correct byte codes. In this case, only "single garbage byte" is displayed in U+FFFD currently. But if abve change is made, all characters in second encoded word can be displayed in U+FFFD, if the "single garbage byte" changes interpreted unicode characters for byte codes in second encoded word. AFAIK, most often case is split of 3 byte code(this bug's case), and garbage is in last part of last encoded word even if garbage is involved. So I think code like above is practical. (Ignore a part of "5. Use of encoded-words in message headers" of RFC upon decoding/interpreting.)
Comment 7•15 years ago
|
||
3 byte code => 2/3 byte code
Comment 8•15 years ago
|
||
(Modification of comment #6) > If (consecutive encoded words have same charset) > { join decoded "consecutive encoded words"; } Report of incorrect splitting is mainly for next case. (1) UTF-8 (AFAIK, no bug is reported for 2 bytes code such as Shift_JIS) (nor EUC_JP which has 3 bytes code in addition to 2 bytes code) (2) split of next (AFAIK, no bug is reported for 2 bytes code only case) (2-a) split of 3 bytes code (2-b) split of 2 bytes code, if single or 3 byte code is mixed And RFC 2047 "3. Character sets" shouldn't be ignored when charset which has escape sequence. So, even if quirks will be implemented, it should be limited to UTF-8 only. (Modified version) > If ( utf-8 > && (consecutive encoded words have same charset) > && (consecutive encoded words uses same encoding of Q or B) > ) > { join decoded binary of consecutive encoded words; }
Updated•15 years ago
|
Severity: normal → enhancement
OS: Windows XP → All
Summary: invalid decoding of multiline utf-8 subject header → invalid decoding of multiline utf-8 subject header (better to be tolerant with split of byte code at wrong position in RFC 2047 encoding)
Version: unspecified → Trunk
Comment 11•15 years ago
|
||
attached testcase from bug #476741
Comment 12•15 years ago
|
||
Confirmed here on Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.2pre) Gecko/20090725 Lightning/1.0pre Shredder/3.0b4pre ID:20090725112141 Otherwise, confirmed per duplicate in comment #09 and comment #10
Comment 13•15 years ago
|
||
(In reply to comment #12) > confirmed per duplicate in comment #09 and comment #10 Since this bug is Severity=Enhancement, changing from UNCONFIRMED to NEW usually means "it's proved that request is reasonable and very useful for many users and required for many users" or "developers or tiage team accepted the enhancement request". Aureliano Buendía, you changed this enhancement request bug to NEW. You've completed feasibility study of enhancement request of this bug? Or you are ready to implement enhancement of this bug?
Comment 14•15 years ago
|
||
For me this is a bug not enhancement. I forgot of read "Importance:" tag in this bug: in bug #476741 it is "normal". If you think that this is "enhancement" set as "UNCONFIRMED".
Updated•15 years ago
|
Status: NEW → UNCONFIRMED
Ever confirmed: false
Comment 15•15 years ago
|
||
(Off-Topic) (In reply to comment #14) > For me this is a bug not enhancement. > If you think that this is "enhancement" set as "UNCONFIRMED". Right box of Importance field is Severity: of "bug" at B.M.O, which is clearly defined. It's irrelevant to your thoughts about "bug or enhancement". It's irrelevant to my thoughts about "bug or enhancement" too. Aureliano Buendía, please read at least next page, which is linked by "Importance" in this bug. > https://bugzilla.mozilla.org/page.cgi?id=fields.html#importance Other than enhancement(trivial to blocker) : Flaw in code of Tb really exists(Tb doesn't work as designed) enhancement : Tb works as designed, but improvement/change is required.
Severity: enhancement → blocker
Updated•15 years ago
|
Severity: blocker → enhancement
Updated•15 years ago
|
Severity: enhancement → normal
Status: UNCONFIRMED → NEW
Ever confirmed: true
Updated•15 years ago
|
Component: Message Reader UI → MIME
Product: Thunderbird → MailNews Core
QA Contact: message-reader → mime
Hardware: x86 → All
Comment 17•15 years ago
|
||
Changing back to Severity=Enhancement, Status=UNCONFIRMED, to avoid mis-leading, confusion.
Severity: normal → enhancement
Status: NEW → UNCONFIRMED
Ever confirmed: false
Comment 18•15 years ago
|
||
Erm, if I set a status, I mean it... While I don't think that confirmed ENH bugs should be in UNCO state, it'd clearly be an useful enhancement.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 19•15 years ago
|
||
is bug 506927 a dupe of this one?
Comment 20•15 years ago
|
||
what about a subject like this one: (all should go into 1 line) > Subject: =?UTF-8?B?W2lwYXguYXQgIzE2ODg2XSBJUEFYIC0gV2VicmFkaW8gQmVzdGVsbGJlc3TD?= =?UTF-8?B?pHRpZ3VuZyAtIEJFN0I1N0Ux?= this should read decoded: > Subject: [ipax.at #16886] IPAX - Webradio Bestellbestätigung - BE7B57E1 but instead reads: > Subject: [ipax.at #16886] IPAX - Webradio Bestellbest�
Comment 21•15 years ago
|
||
Comment 23•14 years ago
|
||
Well, it is to convenient to say "this bug caused by RFC rule violation " so thats not TB's problem. But yes it is TB's problem, so what if the program that generated this mail faulty encoded the subject violating this RFC rule, other clients get past this violation and display this subject correctly, the same is expected from TB. I love TB, it's great E-mail client, but you can't just say that "we sticking directly to the rules with no exception" because that will leave people no choice rather then moving to other client that does the job correctly.
Comment 24•14 years ago
|
||
(In reply to comment #23) > other clients get past this violation and display this subject correctly There is no "correct" subject there, there's just a broken encoding. Actually, I'd say the best section of RfC 2047 to look at is 6.3(2): | 6.3. Mail reader handling of incorrectly formed 'encoded-word's | It is possible that an 'encoded-word' that is legal according to the | syntax defined in section 2, is incorrectly formed according to the | rules for the encoding being used. ... | (2) Any 'encoded-word' which encodes a non-integral number of ^^^^^^^^^^^^^^^^^^^^^^ | characters or octets is incorrectly formed. ^^^^^^^^^^ | A mail reader need not attempt to display the text associated with an | 'encoded-word' that is incorrectly formed. You can only make *assumptions* about what the subject was meant to be. This bug is about implementing such heuristics. > moving to other client that does the job correctly. Since there does not exist a "correct" here, there is no such client, hence we're lucky. ;-)
Comment 25•14 years ago
|
||
(In reply to comment #24) > (In reply to comment #23) > > other clients get past this violation and display this subject correctly > > There is no "correct" subject there, there's just a broken encoding. > Actually, I'd say the best section of RfC 2047 to look at is 6.3(2): > > | 6.3. Mail reader handling of incorrectly formed 'encoded-word's > | It is possible that an 'encoded-word' that is legal according to the > | syntax defined in section 2, is incorrectly formed according to the > | rules for the encoding being used. > ... > | (2) Any 'encoded-word' which encodes a non-integral number of > ^^^^^^^^^^^^^^^^^^^^^^ > | characters or octets is incorrectly formed. > ^^^^^^^^^^ > | A mail reader need not attempt to display the text associated with an > | 'encoded-word' that is incorrectly formed. > > You can only make *assumptions* about what the subject was meant to be. > This bug is about implementing such heuristics. > > > moving to other client that does the job correctly. > > Since there does not exist a "correct" here, there is no such client, hence > we're lucky. ;-) I really not interested in getting into argue with you nor any one else, but the fact is that Outlook Express managed to display the subject correctly god knows how, it is frustrating that TB's can't do so. I'm not a big fan of Microsoft's software but somehow they do the trick right, and that is what I expect from TB, even if the massage encoded not by the standard RFC rules. It mostly can be equivalent saying that some website is not written in the right web standard thats why FireFox can't diplay it right but I.E. can, the fact the there is a program that can display it correctly and it's Microsoft's is frustrating...and FireFox not going to do anything about it, just kick back and relax until the whole world will code their website in a proper way (thats a huge bummer).
Comment 26•14 years ago
|
||
Add a screenshot of the current problem.
Comment 27•14 years ago
|
||
letter (eml), referring to the above attached screenshot
Comment 28•14 years ago
|
||
the same letter, but in the web interface below the source code in the web interface: Received: from mxfront71.mail.yandex.net ([127.0.0.1]) by mxfront71.mail.yandex.net with LMTP id cKYGAw2V for <mc.sim@k-max.name>; Thu, 7 Oct 2010 22:38:20 +0400 Received: from srv51-h-st.jino.ru (srv51-h-st.jino.ru [81.177.139.53]) by mxfront71.mail.yandex.net (nwsmtp/Yandex) with ESMTP id cJwO9dPD; Thu, 7 Oct 2010 22:38:19 +0400 X-Yandex-Front: mxfront71.mail.yandex.net X-Yandex-TimeMark: 1286476699 X-Yandex-Spam: 1 Received: by srv51-h-st.jino.ru (Postfix, from userid 2154) id 5E365602274; Thu, 7 Oct 2010 22:38:19 +0400 (MSD) To: mc.sim@k-max.name Subject: =?utf-8?B?UmU60LfQsNGP0LLQutCwX9C90LBf0J7QntCeX9CQLdCb0LDQsQ==?= Date: Thu, 7 Oct 2010 18:38:19 +0000 From: "mc.sim@k-max.name" <mc.sim@k-max.name> Reply-to: "mc-sim85@a-lab.name, client@a-lab.name, d_freylik@a-lab.name, d_bondar@a-lab.name," <mc-sim85@a-lab.name> Message-ID: <2229bc106c3737c52a0b3607872b977d@www.a-lab.name> X-Priority: 1 X-Mailer: cformsII (deliciousdays.com) [version 11.6.1] MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset="utf-8" X-Yandex-Forward: 9ad8c647fe6d8aa2cecd22a8baeb6ab9 Здравствуйте, Иван! Спасибо за Ваше письмо! Мы ответим Вам в ближайшее время.
Comment 29•13 years ago
|
||
The subject is not full. It ends with symbol �
Comment 32•13 years ago
|
||
I think I got the point. It seems to be the fault of the software/system the mail sender use by mal-splitting the chars embedded. I think it would be nice if there's an extension or a tool to fix the encoding. Here's a simple tool I made to fix the mal-splitted e-mails. Feel free to use it: http://db.tt/gxzlS41
Comment 33•13 years ago
|
||
(In reply to Alex from comment #25) > I really not interested in getting into argue with you nor any one else, but > the fact is that Outlook Express managed to display the subject correctly > god knows > how, it is frustrating that TB's can't do so. > I'm not a big fan of Microsoft's software but somehow they do the trick > right, > and that is what I expect from TB, even if the massage encoded not by the > standard RFC rules. > It mostly can be equivalent saying that some website is not written in the > right web standard thats why FireFox can't diplay it right but I.E. can, > the fact the there is a program that can display it correctly and it's > Microsoft's is frustrating...and FireFox not going to do anything about it, > just kick back and relax until the whole world will code their website in a > proper way (thats a huge bummer). I totally agree! I love TB by this BUG (yes, it's a bug) is driving me insane (and not only me) and should be fixed. Why you FORCE your users to have this problem?
Comment 35•12 years ago
|
||
Bug was described in 2009. 3 years passed and it is seems that this problem will never be fixed because it is "RFC violation". But there are plenty e-mail clients that think about user experience and not about strict RFC compliance. It is very frustrating that TB developers think about RFC and not about people who use this software :(
Assignee | ||
Comment 36•12 years ago
|
||
Attachment #682871 -
Flags: review?(jduell.mcbugs)
Comment 37•12 years ago
|
||
Comment on attachment 682871 [details] [diff] [review] Unit test and patch bz: if you can't take this yourself, can you suggest a reviewer with Unicode chops that we can delegate it to?
Attachment #682871 -
Flags: review?(jduell.mcbugs) → review?(bzbarsky)
Comment 38•12 years ago
|
||
Comment on attachment 682871 [details] [diff] [review] Unit test and patch Julian, are you willing to review this? If not, please let me know and I'll find someone else...
Attachment #682871 -
Flags: review?(bzbarsky) → review?(julian.reschke)
Comment 39•12 years ago
|
||
(In reply to Boris Zbarsky (:bz) from comment #38) > Comment on attachment 682871 [details] [diff] [review] > Unit test and patch > > Julian, are you willing to review this? If not, please let me know and I'll > find someone else... I can't do a proper *code* review, but I have provide some feedback. 1) Changing the code to be more forgiving should only be done when there's evidence that it is indeed needed. How frequent are these malformed header fields? What MUA is producing them? Just because other MUAs accept them isn't a sufficient reason (to me). 2) More importantly, we are currently trying to *remove* RFC2047 support from Firefox (see bug 601933). Also, there's work on replacing the header field parsing in the mail client by a new JS library (see bug 746052). If the latter is going forward, the code change proposed here will become irrelevant anyway.
Updated•12 years ago
|
Attachment #682871 -
Flags: review?(julian.reschke) → review?(bzbarsky)
Comment 40•12 years ago
|
||
(In reply to Julian Reschke from comment #39) > 2) More importantly, we are currently trying to *remove* RFC2047 support > from Firefox (see bug 601933). Also, there's work on replacing the header > field parsing in the mail client by a new JS library (see bug 746052). If > the latter is going forward, the code change proposed here will become > irrelevant anyway. That code is moving very slowly at the present time, so we'll still need to use nsMIMEHeaderParam for several release cycles.
Updated•12 years ago
|
Attachment #682871 -
Flags: review?(bzbarsky) → review?(smontagu)
Comment 41•12 years ago
|
||
Comment on attachment 682871 [details] [diff] [review] Unit test and patch Review of attachment 682871 [details] [diff] [review]: ----------------------------------------------------------------- ::: mozilla/netwerk/mime/nsMIMEHeaderParamImpl.cpp @@ +1075,5 @@ > + if(aQOrBase64 == 'Q' || aQOrBase64 == 'q') > + decodedText = DecodeQ(aEncoded, aLen); > + else if (aQOrBase64 == 'B' || aQOrBase64 == 'b') { > + decodedText = PL_Base64Decode(aEncoded, aLen, nullptr); > + } else { The input to aQOrBase64 has gone through ToUpper, so no need to test against lowercase 'q' and 'b'. Perhaps this whole test can become an assertion. @@ +1084,5 @@ > + return NS_ERROR_INVALID_ARG; > + } > + > + nsresult rv; > + nsCOMPtr<nsIUTF8ConverterService> Nit: whitespace at end of line. @@ +1173,1 @@ > goto badsyntax; You should be testing curEncoding here, no? @@ +1263,5 @@ > + if (!encodedText.IsEmpty()) { > + rv = DecodeQOrBase64Str(encodedText.get(), encodedText.Length(), > + prevEncoding, prevCharset.get(), aResult); > + if (NS_FAILED(rv)) { > + aResult.Append(encodedText); Nit: overindenting
Assignee | ||
Comment 42•12 years ago
|
||
(In reply to Simon Montagu from comment #41) > @@ +1173,1 @@ > > goto badsyntax; > > You should be testing curEncoding here, no? > Good catch. The document at: https://developer.mozilla.org/en-US/docs/Developer_Guide/Coding_Style says that str.Truncate() is preferred to str.SetLength(0). I was wondering if I should change those str.SetLength(0) too.
Comment 43•12 years ago
|
||
Yes, please do.
Assignee | ||
Comment 44•12 years ago
|
||
Attachment #682871 -
Attachment is obsolete: true
Attachment #682871 -
Flags: review?(smontagu)
Attachment #686540 -
Flags: review?(smontagu)
Updated•12 years ago
|
Attachment #686540 -
Flags: review?(smontagu) → review+
Comment 45•12 years ago
|
||
Any reason not to land this?
Comment 46•12 years ago
|
||
Comment 47•12 years ago
|
||
Comment 48•12 years ago
|
||
Try result: https://tbpl.mozilla.org/?tree=Try&rev=7b41d13e4c5d
Comment 49•12 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/45fd2425adc4 The testcase (mailnews part) will be landed after m-c merge because c-c depends on m-c.
Assignee: nobody → ZaneUJi
Status: NEW → ASSIGNED
Flags: in-testsuite-
Comment 50•12 years ago
|
||
For the next time, - Please separate the core patch and the mailnews patch because they are in the different repository. - Please set your user name, mail address, and checkin comment in the patch header. (see http://blog.bonardo.net/2010/06/22/so-youre-about-to-use-checkin-needed for details.) - Please push patches to tryserver (or ask for it if you have no Level 1 commit access). - Your patch didn't build with compilers other than msvc because a goto statement can not cross initialization of a variable (msvc allows this violation). - Finally, please set "checkin-needed" keyword if you have no Level 3 commit access. This time, I have done it instead.
Updated•12 years ago
|
Whiteboard: [leave open]
Assignee | ||
Comment 51•12 years ago
|
||
(In reply to Masatoshi Kimura [:emk] from comment #50) > For the next time, > - Please separate the core patch and the mailnews patch because they are in > the different repository. > - Please set your user name, mail address, and checkin comment in the patch > header. (see > http://blog.bonardo.net/2010/06/22/so-youre-about-to-use-checkin-needed for > details.) > - Please push patches to tryserver (or ask for it if you have no Level 1 > commit access). > - Your patch didn't build with compilers other than msvc because a goto > statement can not cross initialization of a variable (msvc allows this > violation). > - Finally, please set "checkin-needed" keyword if you have no Level 3 commit > access. > > This time, I have done it instead. Thank you.
Comment 53•12 years ago
|
||
https://hg.mozilla.org/comm-central/rev/c438208883ab
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Flags: in-testsuite- → in-testsuite+
Resolution: --- → FIXED
Whiteboard: [leave open]
Updated•12 years ago
|
Target Milestone: --- → Thunderbird 20.0
Assignee | ||
Comment 54•11 years ago
|
||
Attachment #704826 -
Flags: review?(jduell.mcbugs)
Assignee | ||
Comment 55•11 years ago
|
||
Attachment #704827 -
Flags: review?(jduell.mcbugs)
Comment 56•11 years ago
|
||
(In reply to Zane U. Ji from comment #55) > Created attachment 704827 [details] [diff] [review] > Patch for bug 833028 The Patch works fine.
Comment 57•11 years ago
|
||
I'm confused--why are we getting new patch for a bug that's already landed? Is there something that the patch landed in comment 53 didn't fix? If so please open a new bug--we try to keep bugs to one patch per bug to make backing out and release management easier.
Comment 58•11 years ago
|
||
The bug is already open - bug 833028. I don't know why Zane U. Ji attached patches not there.
Assignee | ||
Comment 59•11 years ago
|
||
Nothing special. It's just that I wasn't so familiar with the system.
Comment 60•11 years ago
|
||
Comment on attachment 704826 [details] [diff] [review] Test case for bug 833028 Clearing obsolete requests and patches, as these have moved to bug 833028.
Attachment #704826 -
Attachment is obsolete: true
Attachment #704826 -
Flags: review?(jduell.mcbugs)
Updated•11 years ago
|
Attachment #704827 -
Attachment is obsolete: true
Attachment #704827 -
Flags: review?(jduell.mcbugs)
You need to log in
before you can comment on or make changes to this bug.
Description
•