The default bug view has changed. See this FAQ.

invalid decoding of multiline utf-8 subject header (better to be tolerant with split of byte code at wrong position in RFC 2047 encoding)

RESOLVED FIXED in Thunderbird 20.0

Status

MailNews Core
MIME
--
enhancement
RESOLVED FIXED
8 years ago
4 years ago

People

(Reporter: Csaba Sebestyén, Assigned: Zane U. Ji)

Tracking

(Blocks: 2 bugs, {testcase})

Trunk
Thunderbird 20.0
testcase
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(9 attachments, 3 obsolete attachments)

(Reporter)

Description

8 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; hu; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.8
Build Identifier: verzió 2.0.0.21 (20090302)

If a multiline base64 encoded subject header is wrapped in the middle of a multibyte character, then it's not displayed properly.


Reproducible: Always

Steps to Reproduce:
Create/send a mail with the next header, and open it in Thunderbird:

Subject: =?utf-8?B?S0FSIC0gw4lydGVzw610w6lzIHbDoXLDs2xpc3TDoXJhIGhlbHllesOpc3LF?=
 =?utf-8?B?kWw=?=

Actual Results:  
Shows this subject: "KAR - Értesítés várólistára helyezésr�"

Expected Results:  
Shows this subject: "KAR - Értesítés várólistára helyezésről"

The letter 'ő' (u0151) is a special hungarian character, it may be displayed as a question mark on your system. But anyway, the last character is 'l' (u006c) must be displayed properly.
c3LF in base64 == 0x7372C5 (last part of first encoded word) 
kWw= in Base64 == 0x916C   (second encoded word)
'ő'(U+0151) == 0xC5 0x91 in UTF-8.

An encoded word should complete within the encoded word. Mail sender have to split to 0x...7372 and 0xC5916C(split based on unicode character) upon base64 encoding.
If Tb generated the header, Tb's bug. If Tb didin't generate the header, INVALID.
(Reporter)

Comment 2

8 years ago
Can you reference an rfc about this interpretation? I cannot find any.

Other email-clients displays this header correctly (Gmail, RoundCube Webmail)
RFC 2047.

My description of "should complete within the encoded word" was inaccurate. It was based on "3. Character sets" in RFC 2047, but description in "3. Character sets" was for "escape sequence" of some character sets such as ISO-2022-JP. Sorry for my confusion.

I think your case corresponds to next section.

> http://www.faqs.org/rfcs/rfc2047.html
> 5. Use of encoded-words in message headers
>(snip)
> (1) An 'encoded-word' may replace a 'text' token (as defined by RFC 822)
>    in any Subject or Comments header field, (snip)
>(snip)
>   Ordinary ASCII text and 'encoded-word's may appear together in the
>   same header field.  However, an 'encoded-word' that appears in a
>   header field defined as '*text' MUST be separated from any adjacent
>   'encoded-word' or 'text' by 'linear-white-space'.

AFAIR, same phenomenon was reported to some other bugs and closed as INVALID. I don't know whether quirks for or torelance with such RFC violation is requested ot not.
In evaluating whether this deserves some sort of quirk, it would be very helpful to know exactly which clients generate messages with this sort of header...
(Reporter)

Comment 5

8 years ago
phpmailer did it.

I think the tolerant interpretation of the RFC should be preferred in this case.
This case is relieved by code like next;
  If (consecutive encoded words have same charset)
    { join decoded "consecutive encoded words"; }  
  Interprets the "joined decoded consecutive encoded words" by specified charset;
But this kind of coding breaks next case.
   - First encoded word contains single garbage byte.
   - Second encoded word contains correct byte codes.
   In this case, only "single garbage byte" is displayed in U+FFFD currently.
   But if abve change is made, all characters in second encoded word
   can be displayed in U+FFFD, if the "single garbage byte" changes interpreted
   unicode characters for byte codes in second encoded word.

AFAIK, most often case is split of 3 byte code(this bug's case), and garbage is in last part of last encoded word even if garbage is involved.
So I think code like above is practical. (Ignore a part of "5. Use of encoded-words in message headers" of RFC upon decoding/interpreting.)
3 byte code => 2/3 byte code
Blocks: 476741
(Modification of comment #6)
>   If (consecutive encoded words have same charset)
>     { join decoded "consecutive encoded words"; }  

Report of incorrect splitting is mainly for next case.
  (1) UTF-8 (AFAIK, no bug is reported for 2 bytes code such as Shift_JIS)
            (nor EUC_JP which has 3 bytes code in addition to 2 bytes code)
  (2) split of next (AFAIK, no bug is reported for 2 bytes code only case)
  (2-a) split of 3 bytes code
  (2-b) split of 2 bytes code, if single or 3 byte code is mixed
And RFC 2047 "3. Character sets" shouldn't be ignored when charset which has escape sequence. So, even if quirks will be implemented, it should be limited to UTF-8 only. 
(Modified version) 
>   If ( utf-8
>        && (consecutive encoded words have same charset)
>        && (consecutive encoded words uses same encoding of Q or B)
>      )
>      { join decoded binary of consecutive encoded words; }
Duplicate of this bug: 505068
Severity: normal → enhancement
OS: Windows XP → All
Summary: invalid decoding of multiline utf-8 subject header → invalid decoding of multiline utf-8 subject header (better to be tolerant with split of byte code at wrong position in RFC 2047 encoding)
Version: unspecified → Trunk
No longer blocks: 476741
Duplicate of this bug: 476741
Created attachment 390778 [details]
testcase from bug #476741

attached testcase from bug #476741
Confirmed here on 

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.2pre) Gecko/20090725 Lightning/1.0pre Shredder/3.0b4pre ID:20090725112141

Otherwise, confirmed per duplicate in comment #09 and comment #10
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: testcase
(In reply to comment #12)
> confirmed per duplicate in comment #09 and comment #10

Since this bug is Severity=Enhancement, changing from UNCONFIRMED to NEW usually means "it's proved that request is reasonable and very useful for many users and required for many users" or "developers or tiage team accepted the enhancement request".
Aureliano Buendía, you changed this enhancement request bug to NEW. You've completed feasibility study of enhancement request of this bug? Or you are ready to implement enhancement of this bug?
For me this is a bug not enhancement.

I forgot of read "Importance:" tag in this bug: in bug #476741 it is "normal".

If you think that this is "enhancement" set as "UNCONFIRMED".
Status: NEW → UNCONFIRMED
Ever confirmed: false
(Off-Topic)

(In reply to comment #14)
> For me this is a bug not enhancement.
> If you think that this is "enhancement" set as "UNCONFIRMED".

Right box of Importance field is Severity: of "bug" at B.M.O, which is clearly defined. It's irrelevant to your thoughts about "bug or enhancement". It's irrelevant to my thoughts about "bug or enhancement" too.
Aureliano Buendía, please read at least next page, which is linked by "Importance" in this bug.
> https://bugzilla.mozilla.org/page.cgi?id=fields.html#importance
  Other than enhancement(trivial to blocker) :
    Flaw in code of Tb really exists(Tb doesn't work as designed)
  enhancement :
    Tb works as designed, but improvement/change is required.
Severity: enhancement → blocker
Severity: blocker → enhancement
Severity: enhancement → normal
Status: UNCONFIRMED → NEW
Ever confirmed: true

Updated

8 years ago
Duplicate of this bug: 519191

Updated

8 years ago
Component: Message Reader UI → MIME
Product: Thunderbird → MailNews Core
QA Contact: message-reader → mime
Hardware: x86 → All
Changing back to Severity=Enhancement, Status=UNCONFIRMED, to avoid mis-leading, confusion.
Severity: normal → enhancement
Status: NEW → UNCONFIRMED
Ever confirmed: false

Comment 18

8 years ago
Erm, if I set a status, I mean it...
While I don't think that confirmed ENH bugs should be in UNCO state, it'd clearly be an useful enhancement.
Status: UNCONFIRMED → NEW
Ever confirmed: true

Comment 19

7 years ago
is bug 506927 a dupe of this one?

Comment 20

7 years ago
what about a subject like this one: (all should go into 1 line)

> Subject: =?UTF-8?B?W2lwYXguYXQgIzE2ODg2XSBJUEFYIC0gV2VicmFkaW8gQmVzdGVsbGJlc3TD?= =?UTF-8?B?pHRpZ3VuZyAtIEJFN0I1N0Ux?=

this should read decoded:
> Subject: [ipax.at #16886] IPAX - Webradio Bestellbestätigung - BE7B57E1

but instead reads:
> Subject: [ipax.at #16886] IPAX - Webradio Bestellbest�

Comment 21

7 years ago
Created attachment 414842 [details]
my testcase
Duplicate of this bug: 596896

Updated

7 years ago
Blocks: 586981

Comment 23

7 years ago
Well, it is to convenient to say "this bug caused by RFC rule violation " 
so thats not TB's problem.
But yes it is TB's problem, so what if the program that generated this mail
faulty encoded the subject violating this RFC rule, other clients get past
this violation and display this subject correctly, the same is expected from TB.
I love TB, it's great E-mail client, but you can't just say that "we sticking directly to the rules with no exception" because that will leave people no choice
rather then moving to other client that does the job correctly.

Comment 24

7 years ago
(In reply to comment #23)
> other clients get past this violation and display this subject correctly

There is no "correct" subject there, there's just a broken encoding.
Actually, I'd say the best section of RfC 2047 to look at is 6.3(2):

| 6.3. Mail reader handling of incorrectly formed 'encoded-word's
|   It is possible that an 'encoded-word' that is legal according to the
|   syntax defined in section 2, is incorrectly formed according to the
|   rules for the encoding being used.
...
|   (2) Any 'encoded-word' which encodes a non-integral number of
                                           ^^^^^^^^^^^^^^^^^^^^^^
|       characters or octets is incorrectly formed.
        ^^^^^^^^^^
|   A mail reader need not attempt to display the text associated with an
|   'encoded-word' that is incorrectly formed.

You can only make *assumptions* about what the subject was meant to be.
This bug is about implementing such heuristics.

> moving to other client that does the job correctly.

Since there does not exist a "correct" here, there is no such client, hence we're lucky. ;-)

Comment 25

7 years ago
(In reply to comment #24)
> (In reply to comment #23)
> > other clients get past this violation and display this subject correctly
> 
> There is no "correct" subject there, there's just a broken encoding.
> Actually, I'd say the best section of RfC 2047 to look at is 6.3(2):
> 
> | 6.3. Mail reader handling of incorrectly formed 'encoded-word's
> |   It is possible that an 'encoded-word' that is legal according to the
> |   syntax defined in section 2, is incorrectly formed according to the
> |   rules for the encoding being used.
> ...
> |   (2) Any 'encoded-word' which encodes a non-integral number of
>                                            ^^^^^^^^^^^^^^^^^^^^^^
> |       characters or octets is incorrectly formed.
>         ^^^^^^^^^^
> |   A mail reader need not attempt to display the text associated with an
> |   'encoded-word' that is incorrectly formed.
> 
> You can only make *assumptions* about what the subject was meant to be.
> This bug is about implementing such heuristics.
> 
> > moving to other client that does the job correctly.
> 
> Since there does not exist a "correct" here, there is no such client, hence
> we're lucky. ;-)
I really not interested in getting into argue with you nor any one else, but the fact is that Outlook Express managed to display the subject correctly god knows
how, it is frustrating that TB's can't do so.
I'm not a big fan of Microsoft's software but somehow they do the trick right,
and that is what I expect from TB, even if the massage encoded not by the
standard RFC rules.
It mostly can be equivalent saying that some website is not written in the
right web standard thats why FireFox can't diplay it right but I.E. can,
the fact the there is a program that can display it correctly and it's
Microsoft's is frustrating...and FireFox not going to do anything about it,
just kick back and relax until the whole world will code their website in a
proper way (thats a huge bummer).

Comment 26

7 years ago
Created attachment 481785 [details]
screenshot of the current problem.

Add a screenshot of the current problem.

Comment 27

7 years ago
Created attachment 481786 [details]
letter (eml)

letter (eml), referring to the above attached screenshot

Comment 28

7 years ago
Created attachment 481787 [details]
screenshot web interface

the same letter, but in the web interface below the source code in the web
interface:
Received: from mxfront71.mail.yandex.net ([127.0.0.1])
    by mxfront71.mail.yandex.net with LMTP id cKYGAw2V
    for <mc.sim@k-max.name>; Thu, 7 Oct 2010 22:38:20 +0400
Received: from srv51-h-st.jino.ru (srv51-h-st.jino.ru [81.177.139.53])
    by mxfront71.mail.yandex.net (nwsmtp/Yandex) with ESMTP id cJwO9dPD;
    Thu,  7 Oct 2010 22:38:19 +0400
X-Yandex-Front: mxfront71.mail.yandex.net
X-Yandex-TimeMark: 1286476699
X-Yandex-Spam: 1
Received: by srv51-h-st.jino.ru (Postfix, from userid 2154)
    id 5E365602274; Thu,  7 Oct 2010 22:38:19 +0400 (MSD)
To: mc.sim@k-max.name
Subject: =?utf-8?B?UmU60LfQsNGP0LLQutCwX9C90LBf0J7QntCeX9CQLdCb0LDQsQ==?=
Date: Thu, 7 Oct 2010 18:38:19 +0000

From: "mc.sim@k-max.name" <mc.sim@k-max.name>

Reply-to: "mc-sim85@a-lab.name, client@a-lab.name, d_freylik@a-lab.name,
d_bondar@a-lab.name," <mc-sim85@a-lab.name>

Message-ID: <2229bc106c3737c52a0b3607872b977d@www.a-lab.name>

X-Priority: 1

X-Mailer: cformsII (deliciousdays.com) [version 11.6.1]

MIME-Version: 1.0

Content-Transfer-Encoding: 8bit

Content-Type: text/plain; charset="utf-8"
X-Yandex-Forward: 9ad8c647fe6d8aa2cecd22a8baeb6ab9

Здравствуйте, Иван!

Спасибо за Ваше письмо!

Мы ответим Вам в ближайшее время.

Comment 29

6 years ago
Created attachment 512775 [details]
UTF8 multiline subject problem

The subject is not full. It ends with symbol �
Blocks: 673092
Duplicate of this bug: 528791

Updated

6 years ago
Duplicate of this bug: 676095

Comment 32

6 years ago
I think I got the point.

It seems to be the fault of the software/system the mail sender use by mal-splitting the chars embedded.

I think it would be nice if there's an extension or a tool to fix the encoding.

Here's a simple tool I made to fix the mal-splitted e-mails. Feel free to use it: http://db.tt/gxzlS41

Comment 33

6 years ago
(In reply to Alex from comment #25)

> I really not interested in getting into argue with you nor any one else, but
> the fact is that Outlook Express managed to display the subject correctly
> god knows
> how, it is frustrating that TB's can't do so.
> I'm not a big fan of Microsoft's software but somehow they do the trick
> right,
> and that is what I expect from TB, even if the massage encoded not by the
> standard RFC rules.
> It mostly can be equivalent saying that some website is not written in the
> right web standard thats why FireFox can't diplay it right but I.E. can,
> the fact the there is a program that can display it correctly and it's
> Microsoft's is frustrating...and FireFox not going to do anything about it,
> just kick back and relax until the whole world will code their website in a
> proper way (thats a huge bummer).

I totally agree! I love TB by this BUG (yes, it's a bug) is driving me insane (and not only me) and should be fixed. Why you FORCE your users to have this problem?

Updated

5 years ago
Duplicate of this bug: 807903

Comment 35

4 years ago
Bug was described in 2009. 3 years passed and it is seems that this problem will never be fixed because it is "RFC violation". But there are plenty e-mail clients that think about user experience and not about strict RFC compliance. It is very frustrating that TB developers think about RFC and not about people who use this software :(
(Assignee)

Comment 36

4 years ago
Created attachment 682871 [details] [diff] [review]
Unit test and patch
Attachment #682871 - Flags: review?(jduell.mcbugs)
Comment on attachment 682871 [details] [diff] [review]
Unit test and patch

bz: if you can't take this yourself, can you suggest a reviewer with Unicode chops that we can delegate it to?
Attachment #682871 - Flags: review?(jduell.mcbugs) → review?(bzbarsky)
Comment on attachment 682871 [details] [diff] [review]
Unit test and patch

Julian, are you willing to review this?  If not, please let me know and I'll find someone else...
Attachment #682871 - Flags: review?(bzbarsky) → review?(julian.reschke)

Comment 39

4 years ago
(In reply to Boris Zbarsky (:bz) from comment #38)
> Comment on attachment 682871 [details] [diff] [review]
> Unit test and patch
> 
> Julian, are you willing to review this?  If not, please let me know and I'll
> find someone else...

I can't do a proper *code* review, but I have provide some feedback.

1) Changing the code to be more forgiving should only be done when there's evidence that it is indeed needed. How frequent are these malformed header fields? What MUA is producing them? Just because other MUAs accept them isn't a sufficient reason (to me).

2) More importantly, we are currently trying to *remove* RFC2047 support from Firefox (see bug 601933). Also, there's work on replacing the header field parsing in the mail client by a new JS library (see bug 746052). If the latter is going forward, the code change proposed here will become irrelevant anyway.

Updated

4 years ago
Attachment #682871 - Flags: review?(julian.reschke) → review?(bzbarsky)
(In reply to Julian Reschke from comment #39)
> 2) More importantly, we are currently trying to *remove* RFC2047 support
> from Firefox (see bug 601933). Also, there's work on replacing the header
> field parsing in the mail client by a new JS library (see bug 746052). If
> the latter is going forward, the code change proposed here will become
> irrelevant anyway.

That code is moving very slowly at the present time, so we'll still need to use nsMIMEHeaderParam for several release cycles.
Attachment #682871 - Flags: review?(bzbarsky) → review?(smontagu)
Comment on attachment 682871 [details] [diff] [review]
Unit test and patch

Review of attachment 682871 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozilla/netwerk/mime/nsMIMEHeaderParamImpl.cpp
@@ +1075,5 @@
> +  if(aQOrBase64 == 'Q' || aQOrBase64 == 'q')
> +    decodedText = DecodeQ(aEncoded, aLen);
> +  else if (aQOrBase64 == 'B' || aQOrBase64 == 'b') {
> +    decodedText = PL_Base64Decode(aEncoded, aLen, nullptr);
> +  } else {

The input to aQOrBase64 has gone through ToUpper, so no need to test against lowercase 'q' and 'b'. 

Perhaps this whole test can become an assertion.

@@ +1084,5 @@
> +    return NS_ERROR_INVALID_ARG;
> +  }
> +
> +  nsresult rv;
> +  nsCOMPtr<nsIUTF8ConverterService> 

Nit: whitespace at end of line.

@@ +1173,1 @@
>        goto badsyntax;

You should be testing curEncoding here, no?

@@ +1263,5 @@
> +  if (!encodedText.IsEmpty()) {
> +    rv = DecodeQOrBase64Str(encodedText.get(), encodedText.Length(),
> +                            prevEncoding, prevCharset.get(), aResult);
> +    if (NS_FAILED(rv)) {
> +        aResult.Append(encodedText);

Nit: overindenting
(Assignee)

Comment 42

4 years ago
(In reply to Simon Montagu from comment #41)
> @@ +1173,1 @@
> >        goto badsyntax;
> 
> You should be testing curEncoding here, no?
>
Good catch.
 
The document at:
https://developer.mozilla.org/en-US/docs/Developer_Guide/Coding_Style
says that str.Truncate() is preferred to str.SetLength(0). I was wondering if I should change those str.SetLength(0) too.
Yes, please do.
(Assignee)

Comment 44

4 years ago
Created attachment 686540 [details] [diff] [review]
Updated unit test and patch
Attachment #682871 - Attachment is obsolete: true
Attachment #682871 - Flags: review?(smontagu)
Attachment #686540 - Flags: review?(smontagu)
Attachment #686540 - Flags: review?(smontagu) → review+

Comment 45

4 years ago
Any reason not to land this?
Created attachment 695139 [details] [diff] [review]
patch for check in. r=smontagu (core part)
Created attachment 695141 [details] [diff] [review]
patch for check in. r=smontagu (mailnews part)
Try result:
https://tbpl.mozilla.org/?tree=Try&rev=7b41d13e4c5d
https://hg.mozilla.org/integration/mozilla-inbound/rev/45fd2425adc4
The testcase (mailnews part) will be landed after m-c merge because c-c depends on m-c.
Assignee: nobody → ZaneUJi
Status: NEW → ASSIGNED
Flags: in-testsuite-
For the next time,
- Please separate the core patch and the mailnews patch because they are in the different repository.
- Please set your user name, mail address, and checkin comment in the patch header. (see http://blog.bonardo.net/2010/06/22/so-youre-about-to-use-checkin-needed for details.)
- Please push patches to tryserver (or ask for it if you have no Level 1 commit access).
- Your patch didn't build with compilers other than msvc because a goto statement can not cross initialization of a variable (msvc allows this violation).
- Finally, please set "checkin-needed" keyword if you have no Level 3 commit access.

This time, I have done it instead.
Whiteboard: [leave open]
(Assignee)

Comment 51

4 years ago
(In reply to Masatoshi Kimura [:emk] from comment #50)
> For the next time,
> - Please separate the core patch and the mailnews patch because they are in
> the different repository.
> - Please set your user name, mail address, and checkin comment in the patch
> header. (see
> http://blog.bonardo.net/2010/06/22/so-youre-about-to-use-checkin-needed for
> details.)
> - Please push patches to tryserver (or ask for it if you have no Level 1
> commit access).
> - Your patch didn't build with compilers other than msvc because a goto
> statement can not cross initialization of a variable (msvc allows this
> violation).
> - Finally, please set "checkin-needed" keyword if you have no Level 3 commit
> access.
> 
> This time, I have done it instead.

Thank you.
https://hg.mozilla.org/mozilla-central/rev/45fd2425adc4
https://hg.mozilla.org/comm-central/rev/c438208883ab
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
Flags: in-testsuite- → in-testsuite+
Resolution: --- → FIXED
Whiteboard: [leave open]
Target Milestone: --- → Thunderbird 20.0

Updated

4 years ago
Blocks: 833028
(Assignee)

Comment 54

4 years ago
Created attachment 704826 [details] [diff] [review]
Test case for bug 833028
Attachment #704826 - Flags: review?(jduell.mcbugs)
(Assignee)

Comment 55

4 years ago
Created attachment 704827 [details] [diff] [review]
Patch for bug 833028
Attachment #704827 - Flags: review?(jduell.mcbugs)

Comment 56

4 years ago
(In reply to Zane U. Ji from comment #55)
> Created attachment 704827 [details] [diff] [review]
> Patch for bug 833028

The Patch works fine.
I'm confused--why are we getting new patch for a bug that's already landed?  Is there something that the patch landed in comment 53 didn't fix?  If so please open a new bug--we try to keep bugs to one patch per bug to make backing out and release management easier.
The bug is already open - bug 833028. I don't know why Zane U. Ji attached patches not there.
(Assignee)

Comment 59

4 years ago
Nothing special. It's just that I wasn't so familiar with the system.
Comment on attachment 704826 [details] [diff] [review]
Test case for bug 833028

Clearing obsolete requests and patches, as these have moved to bug 833028.
Attachment #704826 - Attachment is obsolete: true
Attachment #704826 - Flags: review?(jduell.mcbugs)
Attachment #704827 - Attachment is obsolete: true
Attachment #704827 - Flags: review?(jduell.mcbugs)
Duplicate of this bug: 286551
Duplicate of this bug: 672109
Duplicate of this bug: 673670
You need to log in before you can comment on or make changes to this bug.