Headers with multiple charsets are displayed improperly (Mail&News converts rfc2047 encoded words to charset of first rfc2047 encoded word in Subject:)

Status: NEW
Product: MailNews Core
Component: Backend
Opened: 12 years ago
Last updated: 3 years ago
Reporter: Rimas Kudelis
Assignee: Unassigned
Blocks: 1 bug
Firefox Tracking Flags: Not tracked
Attachments: 1

(Reporter)

Description

12 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051107 Firefox/1.5
Build Identifier: Thunderbird version 1.5 Beta 2 (20051006)

When a header of an e-mail message is made up of parts in multiple charsets, it is displayed incorrectly in at least these places:
* Message list
* "Subject" field in Reply dialog

In my case, the header that doesn't work is the following:
Subject: Re: [AKL]  =?iso-8859-2?Q?Bevilti=B9ka?= MS
	=?iso-8859-4?Q?k=FEryba=2E=2E=2E?=

But this might also be affecting other headers, like "To:", "From:" etc.
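For comparison, here is a minimal Python sketch of what RFC 2047 asks a reader to do: decode each encoded word with its own charset into Unicode, and only then join the pieces. It is not Thunderbird's code; `decode_word` and `decode_header_value` are hypothetical helpers, and the sketch ignores details such as RFC 2231 language tags and malformed input.

```python
import base64
import binascii
import re

# One RFC 2047 encoded word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([QqBb])\?([^?]*)\?=")


def decode_word(charset: str, encoding: str, text: str) -> str:
    """Decode a single encoded word using its *own* charset."""
    if encoding.upper() == "Q":
        # header=True turns "_" into a space, as the Q encoding requires.
        raw = binascii.a2b_qp(text.encode("ascii"), header=True)
    else:
        raw = base64.b64decode(text)
    return raw.decode(charset)


def decode_header_value(value: str) -> str:
    """Decode every encoded word separately and join the results as Unicode."""
    # Unfold: a line break followed by whitespace is just whitespace.
    value = re.sub(r"\r?\n(?=[ \t])", "", value)
    out = []
    pos = 0
    prev_was_word = False
    for m in ENCODED_WORD.finditer(value):
        gap = value[pos:m.start()]
        # Whitespace between two adjacent encoded words is ignored
        # (RFC 2047, section 6.2); any other text between them is kept.
        if not (prev_was_word and gap.strip() == ""):
            out.append(gap)
        out.append(decode_word(*m.groups()))
        pos = m.end()
        prev_was_word = True
    out.append(value[pos:])
    return "".join(out)
```

Applied to the Subject above, each accented letter comes from its own charset: š from iso-8859-2 and ū from iso-8859-4, with no intermediate conversion between the two.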

Reproducible: Always

Steps to Reproduce:
1. Send yourself a message with a header made up of two or more parts in different charsets, encoded in QP (quoted-printable), like in my example.
2. See how Thunderbird transcodes that header in different places.



Creating headers like this is RFC-compliant, if somewhat inefficient, and Ximian Evolution now encodes them this way.
(Reporter)

Comment 1

12 years ago
Created attachment 203771
A combination of screenshots where this problem is seen

At the top of this image, you'll see how the same Subject looks in the message list and in the preview pane;
the bottom-left part shows the Reply dialog;
the bottom-right part shows screenshots of the message source. I'm including them just to demonstrate how a message composed with Ximian Evolution and sent to a list can end up: a real charset soup. ;)

Comment 2

12 years ago
Reproduced with TB 1.6a1-1117, Seamonkey 1.1a-1030 -- moving to Core

Note that the same results are seen if the header is not folded, but appears all on one line; so, the fold is not part of the problem.

The symptom of the u-with-overscore transliterated to "u-" is similar to the symptom described at bug 271508 comment 13, but the context is quite different.

Bug 276199 has a similar symptom of differing display in the Thread pane and envelope panel.
Status: UNCONFIRMED → NEW
Component: Mail Window Front End → MailNews: MIME
Ever confirmed: true
OS: Windows XP → All
Product: Thunderbird → Core
Hardware: PC → All
Version: unspecified → Trunk

Updated

10 years ago
Assignee: mscott → nobody
QA Contact: mime
(Assignee)

Updated

10 years ago
Product: Core → MailNews Core
> Subject: Re: [AKL]  =?iso-8859-2?Q?Bevilti=B9ka?= MS
>     =?iso-8859-4?Q?k=FEryba=2E=2E=2E?=
Also checked with the following case (kūryba... in UTF-8, Base64-encoded):
> Subject: Re: [AKL-3]  =?iso-8859-2?Q?Bevilti=B9ka?= MS
>    =?UTF-8?B?a8WrcnliYS4uLg==?=
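As a quick check that this UTF-8 variant carries the same text as the iso-8859-4 word, the Base64 payload can be decoded directly. This is a Python sketch using only the standard library:

```python
import base64

payload = "a8WrcnliYS4uLg=="        # encoded text of the =?UTF-8?B?...?= word
raw = base64.b64decode(payload)      # UTF-8 bytes
text = raw.decode("utf-8")
print(text)                          # kūryba...

# The same string as the iso-8859-4 Q-encoded word k=FEryba=2E=2E=2E
assert text.encode("iso-8859-4") == b"k\xferyba..."
```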

(A) "ku-ryba" in the thread pane.

In the .msf file, Subject: is stored as follows in both cases:
> <(8F=10000011)(90=21)(91=<T-000003-M-000001@f.f.f>)(92
>     =M-000001-T-000003@t.t.t)(9B
>     =[AKL]  =?ISO-8859-2?Q?Bevilti=B9ka_MS____ku-ryba=2E=2E?=$0D$0A =?ISO-8859\
> -2?Q?=2E?=)(94=MAIL.000001)(95=account3)(96=tag-odd)(98=15)(99=ffffffff)>
> {0:^80 {(k^98:c)(s=9)1:m } [0(^88^8F)(^8A=0)(^8B=21)(^82^91)(^85^92)(^81^9B)
>     (^83^94)(^C2^95)(^86=0)(^C3=0)(^89=1)(^C4^96)(^87^9C)(^8C=15)(^9B^99)
>     (^8F=0)(^C5=0)]}

The two encoded words appear to have been merged into a single encoded word, and the merge failed:
  Mail&News tried to convert =FE (quoted-printable, iso-8859-4) to iso-8859-2.
  That conversion failed, so the character was transliterated to "u-".
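The failure can be reproduced outside Thunderbird. This Python sketch (an illustration, not Mail&News code) decodes both words correctly and then forces the combined text into the first word's charset, iso-8859-2. Python's codec substitutes `?` where Thunderbird produced the transliteration "u-", but the underlying problem is the same: ū has no iso-8859-2 code point.

```python
# Each word decoded with its own charset (this part works).
first = b"Bevilti\xb9ka".decode("iso-8859-2")   # 'Beviltiška'
second = b"k\xferyba...".decode("iso-8859-4")   # 'kūryba...'
merged = f"{first} MS {second}"

# Forcing the merged text into the FIRST word's charset (the behavior
# this bug describes) cannot succeed: U+016B has no iso-8859-2 byte.
try:
    merged.encode("iso-8859-2")
    unencodable = ""
except UnicodeEncodeError as exc:
    unencodable = merged[exc.start:exc.end]

# A lossy fallback silently replaces the character instead.
lossy = merged.encode("iso-8859-2", errors="replace")
print(repr(unencodable))  # 'ū'
print(lossy)              # b'Bevilti\xb9ka MS k?ryba...'
```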

If the order iso-8859-2 => iso-8859-4 is reversed, the problem is not observed:
> Subject: Re: [AKL]  =?iso-8859-4?Q?k=FEryba=2E=2E=2E?= MS
>     =?iso-8859-2?Q?Bevilti=B9ka?=
If the first encoded word is changed to UTF-8, the problem is not observed:
> Subject: Re: [AKL] =?UTF-8?B?a8WrcnliYS4uLg==?= MS
>     =?iso-8859-4?Q?k=FEryba=2E=2E=2E?=

It looks like a problem that affects only certain characters (like =FE) in very limited character sets.

(B) "ku-ryba" in the message header box.

WORKSFORME with Tb 3.0.5 on MS Win-XP. Correctly shown.
0xFE of iso-8859-4 = U+016B can not be converted to iso-8859-2.
0xB9 of iso-8859-4 = U+0161 = 0xB9 of iso-8859-2.
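These byte-to-code-point mappings can be checked with Python's built-in codecs. In this sketch, `can_encode` is a hypothetical helper:

```python
def can_encode(ch: str, charset: str) -> bool:
    """Return True if the character has a code point in the given charset."""
    try:
        ch.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


# (byte value, charset) pairs from the comment above.
for byte, charset in [(0xFE, "iso-8859-4"), (0xB9, "iso-8859-4"), (0xB9, "iso-8859-2")]:
    ch = bytes([byte]).decode(charset)
    print(f"0x{byte:02X} of {charset} = U+{ord(ch):04X} ({ch})")

print("U+016B in iso-8859-2:", can_encode("\u016b", "iso-8859-2"))  # False
print("U+0161 in iso-8859-2:", can_encode("\u0161", "iso-8859-2"))  # True
```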

Why does Tb try to convert and merge everything into the first word's charset, even though that is not always possible? If a merge is needed, I think UTF-8 should be used.
As written in bug 271508 comment 13 and 14, converting ū from iso-8859-4 to "u-" would be correct if a conversion to iso-8859-2 had been requested.
The problem is:
  No one requested such a conversion; Mail&News did it on its own.
  If conversion/merging is needed, UTF-8 should be selected as the target charset.
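A sketch of the suggested direction, assuming the merge happens only after every word has been decoded to Unicode: re-encode the merged text as a single UTF-8 encoded word, which can hold characters from any of the source charsets. `to_utf8_encoded_word` is hypothetical and is not how Thunderbird stores headers in the .msf file.

```python
import base64


def to_utf8_encoded_word(text: str) -> str:
    """Encode Unicode text as one RFC 2047 'B' encoded word in UTF-8.

    Simplification: a real encoder must split the result so that no
    encoded word exceeds 75 characters (RFC 2047, section 2).
    """
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return f"=?UTF-8?B?{payload}?="


# Text already decoded from the iso-8859-2 and iso-8859-4 words.
merged = "Beviltiška MS kūryba..."
word = to_utf8_encoded_word(merged)
print(word)
```

Decoding the payload back yields the original text, with both š and ū intact, which no single 8-bit charset from the report can guarantee.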
Component: MIME → Backend
QA Contact: mime → backend
Summary: Headers with multiple charsets are displayed improperly → Headers with multiple charsets are displayed improperly (Mail&News converts rfc2047 encoded words to charset of first rfc2047 encoded word in Subject:)
Blocks: 673092

Comment 6

3 years ago
Can the priority on getting this bug fixed be bumped up?

Here's a sample from RFC 2047:

Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
    =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=

This bug isn't asking for Thunderbird to be liberal in what it accepts, it's asking for Thunderbird to actually follow the specification.
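Python's standard library handles this sample the way the specification describes: `email.header.decode_header` drops the whitespace between the two adjacent encoded words and keeps each part paired with its own charset, so the text can be decoded piece by piece. This shows Python's behavior, not Thunderbird's.

```python
from email.header import decode_header

sample = ("=?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?= "
          "=?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=")

# A list of (bytes, charset) pairs; the charsets are NOT merged.
parts = decode_header(sample)

# Decode each part with its own charset, then join as Unicode.
text = "".join(raw.decode(charset) for raw, charset in parts)
print(text)  # If you can read this you understand the example.
```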