Closed Bug 229399 Opened 17 years ago Closed 16 years ago

RFC2047 subject and realname headers [=?charset?...] miscopied if charset differs from compose body charset

Categories

(MailNews Core :: Composition, defect)

x86
All
defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: thomas.lussnig, Assigned: jshin1987)

References

Details

(Keywords: intl)

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6a) Gecko/20031029
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6a) Gecko/20031029

When replying to a message with this header, Mozilla uses the wrong email addresses.
None of them is the correct one; all are the decoded descriptive name.
This could even be used to trick users into sending mail to unintended recipients.
Especially in the case of signed mail, the answer can go to the wrong people if the
realname is nicely chosen.

To: lussnig@smcc.net
Cc: yoshfuji@linux-ipv6.org
Subject: Re: IPv6 Patch
From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=
<yoshfuji@linux-ipv6.org>

Reproducible: Always

Steps to Reproduce:
1. Header with
To: lussnig@smcc.net
Cc: yoshfuji@linux-ipv6.org
From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=
<yoshfuji@linux-ipv6.org>

2. Receive the mail and try to reply
3. You see the wrong names
Actual Results:  
To: YOSHIFUJI
To: Hideaki
To: /
To: <control characters from Unicode>

Expected Results:  
yoshfuji@linux-ipv6.org

"Copy Email Address" returns the right result!
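
For reference, the address in the header above can be separated from the display name, and the encoded word decoded, with a few lines of Python. This is only a sketch using the standard `email` package, not how Mozilla parses headers. The helper name `decode_words` is made up for illustration, and the folded header is written on one line here.

```python
from email.header import decode_header
from email.utils import parseaddr

# The From: header from this report, unfolded onto a single line.
FROM = ("YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= "
        "<yoshfuji@linux-ipv6.org>")


def decode_words(value):
    """Decode RFC 2047 encoded words, each with the charset it names."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    return "".join(parts)


# Split the mailbox first, then decode only the display name.
raw_name, address = parseaddr(FROM)
name = decode_words(raw_name)

print(address)
print(name)
```

The point of the ordering is that the address is taken from the angle-addr part before any charset decoding happens, so a badly decoded display name cannot leak into the recipient address.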
Confirming for the moment (my tree got 'screwed up' so that I couldn't check it
with the trunk, but I was able to reproduce it with 1.5. I have to check it again)

From: address is correctly displayed in the mail list pane and the message
display area. 
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: intl
> From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=
> <yoshfuji@linux-ipv6.org>

RFC 2822 defines : ( See http://www.faqs.org/rfcs/rfc2822.html )

> from = "From:" mailbox-list CRLF
>   mailbox-list = (mailbox *("," mailbox)) / obs-mbox-list
>     mailbox = name-addr / addr-spec
>       name-addr = [display-name] angle-addr
>       addr-spec = local-part "@" domain

[display-name] should be quoted by "(double-quote) if it contains control
character such as space.

>  display-name = phrase
>    phrase = 1*word / obs-phrase
>      obs-phrase = word *(word / "." / CFWS)
>      word = atom / quoted-string
>        atom = [CFWS] 1*atext [CFWS]

Therefore From: should be :
> From: "YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?="
> <yoshfuji@linux-ipv6.org>

What mailer created the mail?
Mailer is
X-Mailer: Mew version 4.0.62 on Emacs 21.3.50 / Mule 5.0 (SAKAKI)

1. Even if the mail is not correctly escaped, why does Mozilla use the first 3 tokens
and not the last one, which is correct?
2. Why then does "Copy Email Address" work with the right mouse button?
3. If Mozilla takes it as a name list, why does it skip the last token, which contains
the correct recipient?
4. Is there no check for valid email addresses, which should not contain the UTF-8 subset
above 127?
I suspected that, but haven't bothered to check RFC (2)822. (well,
I should have known better than that). Changing the severity to
'enhancement' because it's not a bug per se. Various Emacs mail programs
are often broken when it comes to MIME and I18N.

As for point #4, with IDN(international domain name), it's now possible
to have non-ascii characters in the email address. RFC 2822 is not likely
to have been updated yet. And, when it's updated, perhaps punycode would
be used there for 'machines'. So, your point still stands.  Converting
between punycode and UTF-8 (or other forms of Unicode) would be mail
clients' job.
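
As a rough illustration of the punycode conversion mentioned above, Python's built-in `idna` codec (which implements the older IDNA 2003 rules) can convert the domain part of an address in both directions. The address `user@bücher.example` and the helper names `to_ascii` and `to_unicode` are invented for this sketch; the local part is left untouched.

```python
def to_ascii(address):
    """Convert the domain of an address to its ASCII (punycode) form."""
    local, _, domain = address.rpartition("@")
    return local + "@" + domain.encode("idna").decode("ascii")


def to_unicode(address):
    """Convert a punycode domain back to Unicode for display."""
    local, _, domain = address.rpartition("@")
    return local + "@" + domain.encode("ascii").decode("idna")


wire = to_ascii("user@bücher.example")
print(wire)              # user@xn--bcher-kva.example
print(to_unicode(wire))  # user@bücher.example
```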

Anyway, it seems like there may be something we can do against this
kind of standard violation.
Severity: major → enhancement
OS: Linux → All
> Therefore From: should be :
>> From: "YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?="
>> <yoshfuji@linux-ipv6.org>

Ooops. Something must have gotten into my head. RFC 2047-encoded word is an atom
(cannot be within a quoted-string). Therefore, the above is not correct. 

atom            =       [CFWS] 1*atext [CFWS]
word            =       atom / quoted-string
phrase          =       1*word / obs-phrase
display-name    =       phrase

None of characters in the header here at issue is forbidden in 'atom' (they are
all valid 'atext' including 'slash'). 

atext           =       ALPHA / DIGIT / ; Any character except controls,
                        "!" / "#" /     ;  SP, and specials.
                        "$" / "%" /     ;  Used for atoms
                        "&" / "'" /
                        "*" / "+" /
                        "-" / "/" /
                        "=" / "?" /
                        "^" / "_" /
                        "`" / "{" /
                        "|" / "}" /
                        "~"


In conclusion, the address is valid. 

From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=
  <yoshfuji@linux-ipv6.org>
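
The argument above can be checked mechanically. The sketch below treats CFWS as plain whitespace (it ignores comments and quoted strings, which is a simplification) and tests whether every whitespace-separated token of a display name consists only of atext characters. The function name `is_atom_sequence` is made up for this example.

```python
import string

# atext as listed in RFC 2822: letters, digits and these 19 symbols.
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_atom_sequence(display_name):
    """True if the phrase is one or more atoms separated by whitespace."""
    tokens = display_name.split()
    return bool(tokens) and all(set(tok) <= ATEXT for tok in tokens)


name = "YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?="
print(is_atom_sequence(name))          # True: every token is an atom
print(is_atom_sequence("Hsu, Jenny"))  # False: "," is a special
```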

Severity: enhancement → normal
*** Bug 104064 has been marked as a duplicate of this bug. ***
>> Therefore From: should be :
>>> From: "YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?="
>>> <yoshfuji@linux-ipv6.org>

> RFC 2047-encoded word is an atom (cannot be within a quoted-string).
> Therefore, the above is not correct. 

Oh yeah, you are right. I forgot RFC 2047.

> In conclusion, the address is valid. 
> From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=
>  <yoshfuji@linux-ipv6.org>

No, you are not correct either.

In BNF notation, "/" is OR. See http://www.faqs.org/rfcs/rfc2234.html
> 3.2  Alternatives                               Rule1 / Rule2
>   Elements separated by forward slash ("/") are alternatives.
>   Therefore,
>        foo / bar
>   will accept <foo> or <bar>.
Since atext can not include space, display-name can not include space if it is
not quoted by "(double-quote).
Therefore, your conclusion is also incorrect. 

To satisfy both RFC(2)822 and RFC 2047, whole display-name characters should be
encoded at once.
ie. valid format in your case is as follows.
> From: =?iso-2022-jp?B?(Encoded-English&Spaces&Japanese)=?=
> <uid@domain.name>

Mozilla 2003122809-trunk/Win-Me generated a recipient address in this format for
a display-name that includes Japanese characters.
No, you got it wrong. For sure, atext doesn't include space, but look how atom
is defined. atom is defined as a sequence of atext _enclosed_ by CFWS.

If you're right, tens of millions of emails  produced per RFC 822 (as shown
below) would be invalid by RFC 2822. Authors of RFC 2822 do care about the
backward compatibility. 

From: Jungshik Shin <jshin@example.com> 

> whole display-name characters should be encoded at once.

Well, you have to be careful NOT to exceed the encoded-word length limit (75
characters) in RFC 2047. You have to split somewhere if it gets too long. 
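
To make the splitting concrete: RFC 2047 limits a single encoded-word to 75 characters, including the charset, encoding and delimiters. The following is a minimal sketch of one way to split a long display name into several base64 encoded words on character boundaries. The function name `encode_words` is hypothetical, and this is not what Mozilla does internally.

```python
import base64


def encode_words(text, charset="utf-8", limit=75):
    """Split text into base64 encoded words of at most `limit` characters."""
    overhead = len(f"=?{charset}?B??=")
    chunks, chunk = [], ""
    for ch in text:
        candidate = chunk + ch
        # Base64 expands every 3 bytes (rounded up) to 4 characters.
        b64_len = 4 * ((len(candidate.encode(charset)) + 2) // 3)
        if chunk and overhead + b64_len > limit:
            chunks.append(chunk)   # close the current word before it overflows
            chunk = ch
        else:
            chunk = candidate
    if chunk:
        chunks.append(chunk)
    return [
        "=?%s?B?%s?=" % (charset,
                         base64.b64encode(c.encode(charset)).decode("ascii"))
        for c in chunks
    ]


name = "日本語のとても長い表示名" * 5
words = encode_words(name)
for word in words:
    print(len(word), word)
```

Splitting per character rather than per byte matters for multibyte charsets, because each encoded word has to be decodable on its own.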
The following From: headers exhibit the same (or similar) problems:

(bug 231732)
From: Example Name =?iso-2022-jp?B?GyRCPzkyPEJZOSgbKEI=?= <test@example.com>
  reply init'd as:
To: Example Name {junk}
Nearly identical to this bug -- ISO-2022-JP, and the reply email address is 
trashed.  I'm duping this one over.

(bug 252240)
From: "=?big5?B?IkhzdSwgSmVubnkgW659pk6sTF0i?=" <yyy@xxx.tw>
  reply init'd as:
To: Jenny [{junk}]"" <yyy@xxx.tw>
Similar to this bug, but the email address is preserved; note that this address 
is incorrectly quoted, and exhibits the problems from bug 156588 and bug 254519.

(bug 258155)
From: =?Windows-1251?B?wOHw4Ozq6O3gINLg8vz/7eA=?= <mail@from.host.com>
  reply init'd as:
To: {bunch of junk} <mail@from.host.com>
Again, similar, but the email address is preserved.

Jungshik Shin, do you think the latter two examples are the same problem?
Summary: wrong reply adress on =?utf8?x?xxxx?= realnames → wrong reply address to some RFC2047 realnames [=?charset?...]
*** Bug 231732 has been marked as a duplicate of this bug. ***
In version 1.8a3 it looks like this has been fixed.
From: =?GB2312?B?sbG+qdeovNK3rdLrzfhCZWlqaW5n?= Chinese Translation 
<bjhyw35@eyou.com>
Subject: =?GB2312?B?t63S63RyYW5zbGF0aW9u?=

This works correctly now, so I would set the status to FIXED.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
No bug / patch specified as the fix.

->WORKSFORME
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → RESOLVED
Closed: 17 years ago
Resolution: --- → WORKSFORME
Mike, do you happen to have 'Always use this default character encodings in
replies' checked? 
There's a bug in my patch to add that feature. I've had a patch for quite a
while, but forgot to file a bug and fix it. 
The results I posted in comment 10 were based on tests with various 1.8a builds 
and TB 0.7.  I just retested with 1.8a3-0824, Win2K.  (Is there a related patch 
checked in more recently than this?)

With "always use my default charset in replies," the results depend on which 
charset is specified.  If I specify Big5 as the default, then the header posted 
in comment 12 is handled correctly, as is the one I posted from bug 252240, but 
the others look wrong.  If I uncheck that setting, all the results look like 
junk, as I noted before; the header from comment 12 is entered as
  {junk}Beijing Chinese Translation <bjhyw35@eyou.com>

Reopening this bug.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
I've gotten a slightly better hold on this problem.  I believe the RFC2047 From: 
header is handled correctly -- that is, the same text appears in the To: field 
-- if the charset specified in the header is the same as the initial charset 
used to compose the message.  Once the To: header has been initialized, the 
message encoding can be changed without affecting the header.

This can be seen using either method of selecting the charset for reply -- if 
"Always use my character set" is selected, then replying to a message where the 
From: is encoded with another set reliably exhibits the problem.  If "Use the 
original sender's character set" is selected, then replying to a message that is 
displayed in some charset other than that in the From: header will exhibit the problem.
This last point is true whether the character set for the message display has 
been chosen from the Content-Type header, from the folder's properties, or from 
an explicit View|Encoding.

Bug 258856's fix has done nothing to address this issue (I'm not sure if it was 
supposed to).
*** Bug 258155 has been marked as a duplicate of this bug. ***
*** Bug 265423 has been marked as a duplicate of this bug. ***
Product: MailNews → Core
*** Bug 252592 has been marked as a duplicate of this bug. ***
*** Bug 273381 has been marked as a duplicate of this bug. ***
I really suspect it has something to do with the body charset. Quoting bug
252592 (turn on UTF-8 in case you see borked text):

QUOTE:

If I try to reply to an email, of which the "From:" header was encoded in a
different charset than its body, the "To:" header gets transcoded improperly.

an example:

I received an email with the following headers:

From: "=?iso-8859-4?Q?Rytis Umbrasas =AEol=ECdis?=" <user@provider.lt>
Subject:
=?iso-8859-4?B?UHJhuXltYXMgcGFk7HQgabluYWdyaW7sdCBTY3JpYmUgbGF1a3VzILHo6uznufn+vg==?=
MIME-Version: 1.0
Content-Type: text/plain; charset="windows-1257"
Content-Transfer-Encoding: quoted-printable

In message list, the sender looks like this: 
Rytis Umbrasas Žolėdis

However, when replying, the newly formed "To:" field looks like this:
Rytis Umbrasas ®olģdis

IMHO, that's because Thunderbird uses the body charset to transcode the "From:"
header, instead of using the header charset.

/QUOTE

Furthermore, recently I quite often receive e-mails from a few Evolution users,
for which Reply names and subjects get borked, for example:

From: =?iso-8859-4?Q?K=EAstutis_Bili=FEnas?= <user@domain.lt>
To: "komp_lt@konferencijos.lt" <list@another_domain.lt>
Mime-Version: 1.0
X-Mailer: Evolution 2.0.3 
Subject: Re: .po =?iso-8859-4?q?fail=F9?= =?iso-8859-2?q?_ra=B9ybos?=
	tikrinimas
Sender: list-bounces@another_domain.lt
Errors-To: list-bounces@another_domain.lt


--===============1238572946==
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature";
	boundary="=-lb66fM8zWmjR10clnq6M"


--=-lb66fM8zWmjR10clnq6M
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On Pn, 2005-01-21 at 22:52 +0200, Marius Gedminas wrote:
> Radau nuorod=C4=85, gal bus naudinga: skriptukas, leid=C5=BEiantis aspell=
=C4=85 ant

When replying to this email, i get following junk in To: and Subject: fields:
To: &#65533; <kebil@kaunas.init.lt>
Subject: Re: .po &#65533;
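
Note that the Subject above mixes two charsets (iso-8859-4 and iso-8859-2) in one header, so no single charset applied to the whole line can be right. A small sketch with Python's standard `email.header` module shows decoding each encoded word with the charset it names. The helper name `decode_words` is made up, and the folded header is written on one line.

```python
from email.header import decode_header

SUBJECT = ("Re: .po =?iso-8859-4?q?fail=F9?= "
           "=?iso-8859-2?q?_ra=B9ybos?= tikrinimas")


def decode_words(value):
    """Decode each RFC 2047 word with its own charset, never a global one."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    return "".join(parts)


# Show which charset each piece claims, then the decoded result.
for chunk, charset in decode_header(SUBJECT):
    print(charset, chunk)
subject = decode_words(SUBJECT)
print(subject)
```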

(In reply to comment #21)
This one is in UTF-8.
> From: "=?iso-8859-4?Q?Rytis Umbrasas =AEol=ECdis?=" <user@provider.lt>
> Subject:
>
=?iso-8859-4?B?UHJhuXltYXMgcGFk7HQgabluYWdyaW7sdCBTY3JpYmUgbGF1a3VzILHo6uznufn+vg==?=
> MIME-Version: 1.0
> Content-Type: text/plain; charset="windows-1257"
> Content-Transfer-Encoding: quoted-printable
> 
> In message list, the sender looks like this: 
> Rytis Umbrasas Žolėdis
> 
> However, when replying, the newly formed "To:" field looks like this:
> Rytis Umbrasas ®olģdis
> 
> IMHO, that's because Thunderbird uses the body charset to transcode the "From:"
> header, instead of using the header charset.

I think my assumption is correct:
rq@bliss:~$ echo "Rytis Umbrasas Žolėdis" | iconv -f utf8 -t iso8859-4 | iconv -f cp1257 -t utf8
Rytis Umbrasas ®olģdis
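
The same round trip can be reproduced in Python, assuming only the standard codecs: encode the name in the header charset, then decode the bytes with the body charset.

```python
NAME = "Rytis Umbrasas Žolėdis"

# Bytes as they appear in the header (ISO-8859-4).
header_bytes = NAME.encode("iso-8859-4")

# Decoding with the body charset instead of the header charset.
misread = header_bytes.decode("cp1257")
# Decoding with the charset the header actually names.
correct = header_bytes.decode("iso-8859-4")

print(misread)  # Rytis Umbrasas ®olģdis
print(correct)  # Rytis Umbrasas Žolėdis
```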

This example shows that if you take "Rytis Umbrasas Žolėdis" in ISO-8859-4 (the
header charset) and interpret it as Windows-1257 (the body charset), you see
"Rytis Umbrasas ®olģdis" instead.
Thanks for detailed analysis.

Taking for better tracking. I may or may not have made a patch for this in the
past. Even if I did, I may have lost it. I'll take another look.

Assignee: sspitzer → jshin1987
Status: REOPENED → NEW
It's kind of weird tho, that those headers are transcoded properly in the
message list, and in the message preview window/pane. Why would a reply window
use different transcoding functions at all?
(In reply to comment #24)
> It's kind of weird tho, that those headers are transcoded properly in the
> message list, and in the message preview window/pane. Why would a reply window
> use different transcoding functions at all?

They do use the same  function but with a different option for 'override'.  It
used to use a different function, which I changed to use the same function but
overlooked the override option. 
Anyway, I have a fix at hand, but I need to test more extensively. 
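
To illustrate what an 'override' option of this kind does, here is a hypothetical Python sketch. It is not the Mozilla code: the function name `decode_mime_header` and its parameters are invented for illustration. With the override on, the charset named inside each encoded word is ignored in favour of the caller's default, which is the failure mode reported in this bug.

```python
from email.header import decode_header


def decode_mime_header(value, default_charset, override_charset=False):
    """Hypothetical helper: decode RFC 2047 words in `value`.

    With override_charset=True the charset named in each encoded word is
    ignored and default_charset is used for all of them.
    """
    out = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            if override_charset or charset is None:
                charset = default_charset
            chunk = chunk.decode(charset, errors="replace")
        out.append(chunk)
    return "".join(out)


# Display name from the Evolution example in this bug; the body is UTF-8.
FROM_NAME = "=?iso-8859-4?Q?K=EAstutis_Bili=FEnas?="

good = decode_mime_header(FROM_NAME, "utf-8", override_charset=False)
bad = decode_mime_header(FROM_NAME, "utf-8", override_charset=True)
print(good)
print(bad)
```

Forcing the UTF-8 body charset onto ISO-8859-4 bytes yields U+FFFD replacement characters, which resembles the junk reported for the reply window.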
Status: NEW → ASSIGNED
(In reply to comment #25)
> Anyway, I have a fix at hand, but I need to test more extensively. 
> 

Any news?
Fix is very simple (one-liner), but I haven't yet managed to test it comprehensively.
I don't think it requires lots of testing actually. Don't forget the fact that
you're already using exactly the same function in message list and preview pane.
What's so urgent? I do know my code. 
I never said you don't know your code. If it looks to you like I said so, I'm sorry.

OK, it's not that very urgent, however, I wish I could expect it to be fixed in
the next release...
*** Bug 285053 has been marked as a duplicate of this bug. ***
Tweaking summary for searchability.
Summary: wrong reply address to some RFC2047 realnames [=?charset?...] → RFC2047 subject and realname headers [=?charset?...] miscopied if charset differs from compose body charset
Hello jshin,

have you tested your one-line fix? If you have not, would you please do that or
commit it anyways? I don't think anyone wants this small yet important bug to be
forgotten for the next 15 months again. Please.

Rimas
Attached patch: patch
Sorry for the delay. It took me quite an extensive test (not to break what I
fixed in the past while fixing this).
Attachment #179230 - Flags: superreview?(mscott)
Attachment #179230 - Flags: review?(bienvenu)
Attachment #179230 - Flags: superreview?(mscott) → superreview+
Attachment #179230 - Flags: review?(bienvenu) → review+
(In reply to comment #34)
> Created an attachment (id=179230)
> patch
> 
> Sorry for the delay. It took me quite an extensive test (not to break what I
> fixed in the past while fixing this). 

Thank you very much! :) You made my morning good! ;)
thanks for r/sr. Fixed on the trunk
Status: ASSIGNED → RESOLVED
Closed: 17 years ago → 16 years ago
Resolution: --- → FIXED
Verified fixed with TB 1.0+0406, Win2K.  Thank you, Jungshik!
Status: RESOLVED → VERIFIED
jshin, don't listen to them when they hard press you to check in ;-)

This patch caused the regression in bug 291320.
*** Bug 298669 has been marked as a duplicate of this bug. ***
*** Bug 274053 has been marked as a duplicate of this bug. ***
Product: Core → MailNews Core