Closed Bug 317263 Opened 19 years ago Closed 6 years ago

Headers with multiple charsets are displayed improperly (Mail&News converts rfc2047 encoded words to charset of first rfc2047 encoded word in Subject:)

Categories

(MailNews Core :: Backend, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 52.0

People

(Reporter: rimas, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051107 Firefox/1.5
Build Identifier: Thunderbird version 1.5 Beta 2 (20051006)

When a header of an e-mail message is made up of parts in multiple charsets, it is displayed incorrectly in at least these places:
* Message list
* "Subject" field in Reply dialog

In my case, the header that doesn't work is the following:
Subject: Re: [AKL]  =?iso-8859-2?Q?Bevilti=B9ka?= MS
	=?iso-8859-4?Q?k=FEryba=2E=2E=2E?=

But this might also be affecting other headers, like "To:", "From:" etc.

Reproducible: Always

Steps to Reproduce:
1. Send yourself a message with a header made up of two or more parts in different charsets, encoded in QP (like in my example)
2. See how Thunderbird transcodes that header in different places.



This RFC-compliant practice of creating headers like that might be sort of inefficient, but Ximian Evolution encodes them this way now.
At the top of this image, you'll see how the same Subject looks in the message list and in the preview pane;
bottom-left part shows how the Reply dialog looks;
bottom-right are screenshots of the message source. I'm including them just to demonstrate how a message composed using Ximian Evolution, and sent to a list might look at the end: a real charset soup. ;)
Reproduced with TB 1.6a1-1117, Seamonkey 1.1a-1030 -- moving to Core

Note that the same results are seen if the header is not folded, but appears all on one line; so, the fold is not part of the problem.

The symptom of the u-with-overscore transliterated to "u-" is similar to the symptom described at bug 271508 comment 13, but the context is quite different.

Bug 276199 has a similar symptom of differing display in the Thread pane and envelope panel.
Status: UNCONFIRMED → NEW
Component: Mail Window Front End → MailNews: MIME
Ever confirmed: true
OS: Windows XP → All
Product: Thunderbird → Core
Hardware: PC → All
Version: unspecified → Trunk
Assignee: mscott → nobody
QA Contact: mime
Product: Core → MailNews Core
> Subject: Re: [AKL]  =?iso-8859-2?Q?Bevilti=B9ka?= MS
>     =?iso-8859-4?Q?k=FEryba=2E=2E=2E?=
Also checked with the following case (kūryba... in UTF-8, base64-encoded):
> Subject: Re: [AKL-3]  =?iso-8859-2?Q?Bevilti=B9ka?= MS
>    =?UTF-8?B?a8WrcnliYS4uLg==?=

(A) "ku-ryba" in the thread pane.

In the .msf file, Subject: is stored as follows in both cases:
> <(8F=10000011)(90=21)(91=<T-000003-M-000001@f.f.f>)(92
>     =M-000001-T-000003@t.t.t)(9B
>     =[AKL]  =?ISO-8859-2?Q?Bevilti=B9ka_MS____ku-ryba=2E=2E?=$0D$0A =?ISO-8859\
> -2?Q?=2E?=)(94=MAIL.000001)(95=account3)(96=tag-odd)(98=15)(99=ffffffff)>
> {0:^80 {(k^98:c)(s=9)1:m } [0(^88^8F)(^8A=0)(^8B=21)(^82^91)(^85^92)(^81^9B)
>     (^83^94)(^C2^95)(^86=0)(^C3=0)(^89=1)(^C4^96)(^87^9C)(^8C=15)(^9B^99)
>     (^8F=0)(^C5=0)]}

The two encoded words appear to have been merged into one encoded word, and the merge failed:
  It tried to convert =FE (quoted-printable, iso-8859-4) to iso-8859-2.
  That conversion failed, and the character was transliterated to "u-".
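A minimal Python sketch of the merge inferred above, assuming (as the .msf dump suggests) that the decoded words are re-encoded into the first word's charset. `merge_into_first_charset` is a hypothetical helper, not Mozilla code; Python's codec substitutes "?" where Thunderbird's converter produced the transliteration "u-", but the loss of U+016B is the same.

```python
# Hypothetical model of the merge inferred from the .msf dump; not Mozilla code.
# Each pair is (raw bytes after Q-decoding, the encoded word's declared charset).
WORDS = [
    (b"Bevilti\xb9ka", "iso-8859-2"),  # =?iso-8859-2?Q?Bevilti=B9ka?=
    (b"k\xferyba...", "iso-8859-4"),   # =?iso-8859-4?Q?k=FEryba=2E=2E=2E?=
]


def merge_into_first_charset(words: list[tuple[bytes, str]]) -> bytes:
    """Decode every word, then re-encode the joined text in the FIRST word's charset."""
    target = words[0][1]
    text = " ".join(raw.decode(charset) for raw, charset in words)
    # Characters the target charset cannot represent are replaced with "?",
    # so information from the later words is silently lost.
    return text.encode(target, errors="replace")


merged = merge_into_first_charset(WORDS)
print(merged.decode("iso-8859-2"))  # → Beviltiška k?ryba...
```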

If the order iso-8859-2 => iso-8859-4 is reversed, the problem is not observed:
> Subject: Re: [AKL]  =?iso-8859-4?Q?k=FEryba=2E=2E=2E?= MS
>     =?iso-8859-2?Q?Bevilti=B9ka?=
If the first encoded word is changed to UTF-8, the problem is not observed:
> Subject: Re: [AKL] =?UTF-8?B?a8WrcnliYS4uLg==?= MS
>     =?iso-8859-4?Q?k=FEryba=2E=2E=2E?=
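If the words are merged into the first word's charset (an assumption based on the dump above), the outcome depends only on whether that charset contains every decoded character. This small check, using a hypothetical helper `charset_can_hold`, is consistent with both observations: iso-8859-2 lacks ū, while iso-8859-4 and UTF-8 can hold both š and ū, so putting either of them first hides the bug.

```python
def charset_can_hold(text: str, charset: str) -> bool:
    """Return True if every character of `text` can be encoded in `charset`."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


# Decoded text of both encoded words; U+0161 (š) and U+016B (ū) are the only non-ASCII characters.
DECODED = "Beviltiška kūryba..."

for target in ("iso-8859-2", "iso-8859-4", "utf-8"):
    print(f"{target}: {charset_can_hold(DECODED, target)}")
```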

So the problem appears to affect only a few characters (like =FE) in a few limited character sets.

(B) "ku-ryba" in the message header box.

WORKSFORME with Tb 3.0.5 on MS Win-XP. Correctly shown.
0xFE of iso-8859-4 = U+016B cannot be converted to iso-8859-2.
0xB9 of iso-8859-4 = U+0161 = 0xB9 of iso-8859-2.

Why does Tb convert and merge into the first word's charset even though that is not always possible? If a merge is needed, I think UTF-8 should be used.
As written in bug 271508 comments 13 and 14, converting ū of iso-8859-4 to u- would be correct if a conversion to iso-8859-2 had been requested.
The problem is:
  No one requested such a conversion; Mail&News performed it on its own.
  If a conversion/merge is needed, UTF-8 should be selected as the target charset.
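A minimal Python sketch of the behavior asked for here: decode each encoded word with its own charset straight to Unicode (a Python `str`, which can always be serialized as UTF-8), never via the first word's charset, and drop whitespace between adjacent encoded words as RFC 2047 section 6.2 requires. The function names are hypothetical; this is not Thunderbird's or JS Mime's implementation, and it skips RFC 2231 language tags and error handling for malformed words.

```python
import base64
import binascii
import re

# One RFC 2047 encoded word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([QqBb])\?([^?]*)\?=")


def decode_word(charset: str, encoding: str, text: str) -> str:
    """Decode a single encoded word using its OWN charset, returning Unicode."""
    if encoding.upper() == "B":
        raw = base64.b64decode(text)
    else:
        # Q encoding: "_" means space (RFC 2047, section 4.2); header=True handles that.
        raw = binascii.a2b_qp(text, header=True)
    return raw.decode(charset)


def decode_rfc2047_header(value: str) -> str:
    """Decode every encoded word independently and join the results as Unicode."""
    # Unfold: drop a line break that is followed by whitespace, keeping the whitespace.
    value = re.sub(r"\r?\n(?=[ \t])", "", value)
    pieces = []
    pos = 0
    after_encoded_word = False
    for match in ENCODED_WORD.finditer(value):
        gap = value[pos:match.start()]
        # Whitespace between two adjacent encoded words is ignored (RFC 2047, section 6.2).
        if not (after_encoded_word and gap.strip() == ""):
            pieces.append(gap)
        pieces.append(decode_word(*match.groups()))
        pos = match.end()
        after_encoded_word = True
    pieces.append(value[pos:])
    return "".join(pieces)


subject = "Re: [AKL]  =?iso-8859-2?Q?Bevilti=B9ka?= MS\r\n\t=?iso-8859-4?Q?k=FEryba=2E=2E=2E?="
print(decode_rfc2047_header(subject))
```

Because each word is decoded on its own, the iso-8859-2 "š" and the iso-8859-4 "ū" both survive, regardless of which word comes first.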
Component: MIME → Backend
QA Contact: mime → backend
Summary: Headers with multiple charsets are displayed improperly → Headers with multiple charsets are displayed improperly (Mail&News converts rfc2047 encoded words to charset of first rfc2047 encoded word in Subject:)
Blocks: RFC2047
Can the priority on getting this bug fixed be bumped up?

Here's a sample from rfc2047:

Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
    =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=

This bug isn't asking for Thunderbird to be liberal in what it accepts, it's asking for Thunderbird to actually follow the specification.
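For comparison, Python's standard `email.header.decode_header` (used here only as an illustration; it is not what Thunderbird uses) keeps each encoded word paired with its own charset instead of converting everything into the first one, so the RFC example decodes as intended once each part is decoded separately.

```python
from email.header import decode_header

# The RFC 2047 example quoted above, folded onto two lines.
SAMPLE = (
    "=?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=\r\n"
    "    =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?="
)

# decode_header returns (bytes, charset) pairs; runs with different charsets stay separate.
parts = decode_header(SAMPLE)
for raw, charset in parts:
    print(charset, raw)

# Decode each part with its own charset (None means plain ASCII), then join as Unicode.
text = "".join(raw.decode(charset or "ascii") for raw, charset in parts)
print(text)
```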
> Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
>     =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=
This appears to work; I get: If you can read this you understand the example.

This works too:
Subject: Re: [AKL]  =?iso-8859-2?Q?Bevilti=B9ka?= MS
	=?iso-8859-4?Q?k=FEryba=2E=2E=2E?=

Most likely fixed by the introduction of JS Mime around 2015.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 52.0
Awesome :)

I have a fair amount of faith that Joshua Cranmer's JS Mime implementation fixed this issue if his code made it into Thunderbird, so this is good to hear!