Closed Bug 254519 Opened 20 years ago Closed 16 years ago

Incorrect RFC 2047 encoding and decoding: encoded-words not treated as atoms within headers

Categories: MailNews Core :: MIME, defect (Not set, normal)
Tracking: Not tracked, VERIFIED FIXED, Thunderbird 3.0b2
People: Reporter: knightr, Assigned: martin.wilck
Details: Keywords: intl
Attachments: 6 files, 4 obsolete files

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8a1) Gecko/20040520
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8a1) Gecko/20040520

When an email contains a "From:" line encoded with RFC 2047 which contains
characters which should, per RFC (2)822, be quoted (because it contains commas
(",") or similar characters), Mozilla doesn't decode it properly and thinks
these are two addresses.

The reverse is also true.  It quotes an email address which contains a comma and
then applies RFC 2047 (which it shouldn't do).

This causes problems with MUAs which correctly encode or decode "From:" lines.



Reproducible: Always
Steps to Reproduce:
1. Send an email from Outlook Express (6.x?) with a name which contains accents
and a comma.
2. Receive it using Mozilla.
3. Try to reply.

Actual Results:  
Mozilla thinks the "From:" line contained two email addresses.

Expected Results:  
It should have considered it to be one email address.

The syntax used by the other MUA (and with which Mozilla has problems) was
confirmed RFC 2047 compliant by its author, Keith Moore.

The syntax used by Mozilla when sending an email which contains accents and
commas was confirmed *not* RFC 2047 compliant by its author, Keith Moore.

This bug is related to Bugzilla Bug #249626.
Oops, sorry, I'm a newbie... I should add a few details...

The problem is only apparent because of the issues with commas (",") but it is
believed that it might be caused by a slight interpretation problem as to how to
deal with email addresses which contain things which under RFC (2)822 should be
quoted but, because they contain non-ASCII characters, must be encoded using RFC 2047.

In that case, it is believed that quoting should not be added before encoding
the email address using RFC 2047 (which unfortunately Mozilla does).

It also expects to see that quoting when it receives emails, and if it is missing
and the "From:" line contains a comma (","), it thinks that the "From:" line
contains two email addresses (which causes problems when replying and when
displaying the list of messages (only one email address is shown)).

This is an example of an email address Mozilla has trouble with:

From: =?iso-8859-1?Q?Martin=2C_Andr=E9?= <andre.martin@example.com>

which gets converted into this

From: Martin, André <andre.martin@example.com>

Due to the presence of an unquoted "," (comma) when we try to reply it thinks
we're trying to reply to two addresses...

This is what Mozilla generates when sending an email:

From: =?ISO-8859-1?Q?=22Martin=2C_Andr=E9=22?= <andre.martin@example.com> 
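
To make the two headers above concrete, here is a minimal sketch using Python's standard email.header module. It is not Mozilla's code, and the names decode_word, received and generated are only illustrative. It decodes the encoded-word from each example: the first yields a name with a bare comma, the second yields a name that carries literal quotation marks.

```python
from email.header import decode_header, make_header


def decode_word(encoded_word: str) -> str:
    """Decode a single RFC 2047 encoded-word into a Unicode string."""
    return str(make_header(decode_header(encoded_word)))


# Encoded-word as sent by the other MUA (no quotes inside the encoded text).
received = "=?iso-8859-1?Q?Martin=2C_Andr=E9?="
# Encoded-word as generated by Mozilla (=22 is an encoded quotation mark).
generated = "=?ISO-8859-1?Q?=22Martin=2C_Andr=E9=22?="

received_name = decode_word(received)    # Martin, André
generated_name = decode_word(generated)  # "Martin, André"

print(received_name)
print(generated_name)
```

If the decoded text of the first form is handed to an address parser as-is, the bare comma looks like an address separator, which matches the behaviour described in this report.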
Appears to be a dupe of bug 231732 / bug 252240.
I found it somewhat related to Bugzilla Bug #249626 (as I had mentioned
above) though it wasn't logged under the same product (in that case it was
Thunderbird) and the problem concerned the appearance of the "To:" line.

It is also similar to Bugzilla Bug 122972.

It does indeed have similarities with Bugzilla bug 231732 / bug 252240 though I
cannot say if it is for the same reason (in my case it is because of an encoded
comma (",")).

There might be, indeed, a part of this bug report that could be considered a
dupe if the reasons are the same, but the fact that the "From:" line, as
generated by Mozilla, is not properly encoded is not.

Keith Moore, RFC 2047 author's comment on this was:

<<Adding quotes around the name before encoding doesn't solve the problem, and
actually causes other problems.>>
I don't know what Keith Moore could mean by that.  RFC2047 (section 5, item 3) 
states:
  + An 'encoded-word' MUST NOT appear within a 'quoted-string'.

Quotes are required by RFC822, since without quoting, the comma is a delimiter 
between addresses -- as you see.  Since we cannot include the encoded-word 
*within* the quoted-string, the quotes must be part of the encoded-word.

This is indeed the same issue as bug 122972 and bug 249626 (Thunderbird and the 
suite share this code).  Those bugs are also invalid.

Bug 231732 is unrelated.
Status: UNCONFIRMED → RESOLVED
Closed: 20 years ago
Resolution: --- → INVALID
These are quotes from the emails we exchanged (used with permission):

Keith Moore says:

<<_Any_ MUA that tries to "convert" something encoded in RFC 2047 to a form
without encoding is broken.    There's no way to convert every 2047-encoded
message to an unencoded message with valid syntax.  2047 is only intended to be
used for _display_ purposes, not to convert messages into a form which can be
parsed by a normal mail parser.

Adding quotes around the name before encoding doesn't solve the problem, and
actually causes other problems. >>

In the next part, the questions are by me, the answer is given by him:

<<
>So in the case of the "From:" line, what should MUA authors do?
>> 
>> - Convert RFC 2047 encoded messages to an unencoded message, quoting it as 
>> necessary to make it look like RFC (2)822 compliant (except for the presence of 
>> non-ASCII characters). 


no.  (or maybe, this is a last-resort option for patching legacy code where
it's infeasible to do it right)


>> 
>> [otherwise it causes problem with the MUA email address parsers]
>> 
>> - Convert RFC 2047 encoded messages to an unencoded message without bothering 
>> about syntax for display but quoting it, as necessary, when using it to reply.
>> That quoting would actually be temporary as it would have to be removed before 
>> encoding it using RFC 2047 (once again, this is to please these MUA address 
>> parsers).


no.


>> 
>> - Use the RFC 2047 encoded message for display but remove the RFC 2047 encoded 
>> part when replying (keeping only what's between the "<" ">").


definitely not.
>>


This part is interesting:

<<
here's what should be done in MUAs:

always maintain the message in original format.  the 2047-decoding routines 
should be used only by the code that displays messages.  

replies should be based on the unencoded message headers.  that way, the 
exact same 2047 encoding is used on a reply as was used on the message 
being replied to.

the 2047-encoding routines should only be called from the code that handles
message composition.

message stores that do searching (as in, those that support IMAP) probably
need to do it differently.  IMHO, they need to store messages in original
format but build indices based on text extracted from decoding headers
(and for that matter, body parts).
>>
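
The advice quoted above can be sketched roughly as follows. This is only an illustration in Python under stated assumptions: StoredMessage, display_from and reply_to_header are hypothetical names, not anything in Mozilla, and the regular expression is a simplified encoded-word matcher that ignores the RFC 2047 rule about whitespace between adjacent encoded-words.

```python
import re
from dataclasses import dataclass
from email.header import decode_header, make_header

# Simplified pattern for one RFC 2047 encoded-word, e.g. =?iso-8859-1?Q?Andr=E9?=
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")


def decode_for_display(raw_value: str) -> str:
    """Decode each encoded-word in place; intended for display code only."""
    return ENCODED_WORD.sub(
        lambda m: str(make_header(decode_header(m.group(0)))), raw_value
    )


@dataclass(frozen=True)
class StoredMessage:
    raw_from: str  # header value kept exactly as received (undecoded)

    def display_from(self) -> str:
        # Decoding happens only on the way to the screen.
        return decode_for_display(self.raw_from)

    def reply_to_header(self) -> str:
        # A reply reuses the undecoded header, so the encoding is unchanged.
        return self.raw_from


msg = StoredMessage("=?iso-8859-1?Q?Martin=2C_Andr=E9?= <andre.martin@example.com>")
print(msg.display_from())
print(msg.reply_to_header())
```

The design choice here is simply that nothing downstream of storage ever sees the decoded text except the display path.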

His ok for quoting him:

<<
>> Do I have your permission to quote your reply when contacting the companies in 
>> question? 


yes.
>>

There was a small typo in what he said above so I asked for clarification:

<<
>Your reply (or at least my interpretation of it) suggests that you might 
>> mean "encoded message headers" (that would be the only way to be 
>> *entirely* sure that we use the exact same 2047 encoding. 


yes, I meant to say "undecoded message headers".
>>

He was a very nice person to deal with:

<<
>> Sorry for taking so much of your time...

it's really no problem.  I'm very familiar with this topic and I don't
have to think much about it ... and I type reasonably fast  :) 

Keith
>>

There seems to be a small interpretation problem with that RFC. If what I copied
above doesn't provide you with the answers you seek, I can forward your question
to him...

Let me know...




Status: RESOLVED → UNCONFIRMED
Resolution: INVALID → ---
Thanks for your insistence.  I have read RFC2047 in more depth; section 6.2, in 
particular the NOTE there, supports your argument.

For now, I'll leave this bug as it stands, but it's possible the encoding 
portion (Mozilla should not place quotes within the encoded string) is a 
separate bug from the decoding portion (Mozilla should treat encoded-words as 
atoms).
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Incorrect RFC 2047 encoding and decoding of the "From:" line → Incorrect RFC 2047 encoding and decoding: encoded-words not treated as atoms within headers
*** Bug 122972 has been marked as a duplicate of this bug. ***
*** Bug 249626 has been marked as a duplicate of this bug. ***
I agree that adding quotes to a name before encoding is a separate issue than
how an MUA handles decoding and presentation of 2047 encoded-words.

Adding quotes to a name before encoding that name using 2047 is not strictly 
wrong, but it can produce annoying cruft on a recipient's display.  (there are
lots of annoying things that MUAs do to people's names - this is just another 
one)

On the other hand, if a sender's MUA adds double quotes before encoding it might
help to work around bugs in some recipients' MUAs that try to decode 2047
encoded-words and then parse the resulting text. 

-- Keith Moore <moore@cs.utk.edu>
I will begin by saying that I should indeed have opened two different bug
reports as, even though the issues are related, this surely affects two very
different parts of the code (I do believe that they should be corrected at the
same time however).

Mike Cowperthwaite said:

> Thanks for your insistence

No problem... I use Mozilla, I like it and I wanted it to get it right...

Keith Moore said (Hi!):

> Adding quotes to a name before encoding that name using 2047 is not strictly 
> wrong, but it can produce annoying cruft on a recipient's display.  (there are
> lots of annoying things that MUAs do to people's names - this is just another 
> one)
> 
> On the other hand, if a sender's MUA adds double quotes before encoding it 
> might help to work around bugs in some recipients' MUAs that try to decode
> 2047 encoded-words and then parse the resulting text. 

I hope you won't mind if I have a different opinion on this (especially after
harassing you so much... (-; or )-; ? ).

If the only reason for these quotes is to work around bugs in other MUA then I
think that they *should not* be added. If we use workarounds instead of
correcting the problem then they *will never* correct it and eventually assume
that the ones who got it right were wrong.

What I believe is (unfortunately) the most popular MUA already got it right,
Mozilla will (eventually) get it right and other MUA are already getting it
right when they *send* their messages.

If one wants to have quote to workaround another MUA's bug then they could type
them in Mozilla's "Your name:" field.

What do you think?

Thanks!

Nick

PS: Please forgive my English as it is not my mother tongue. Thanks!
That should have been MUAs and quotes...

I either need to augment my caffeine intake or take a few days off... (-;

OS: Windows 2000 → All
Hardware: PC → All
*** Bug 256958 has been marked as a duplicate of this bug. ***
*** Bug 191102 has been marked as a duplicate of this bug. ***
*** Bug 263976 has been marked as a duplicate of this bug. ***
*** Bug 264151 has been marked as a duplicate of this bug. ***
*** Bug 264665 has been marked as a duplicate of this bug. ***
Product: MailNews → Core
*** Bug 272093 has been marked as a duplicate of this bug. ***
*** Bug 274603 has been marked as a duplicate of this bug. ***
*** Bug 274561 has been marked as a duplicate of this bug. ***
From Bug 274561.
I tested the case from Bug 274561 (the same problem, but in the To: header):
> To: =?iso-8859-1?Q?Fa=2E_St=FCrtz=2C_Engel?= <c.engel@stuertz.de>
and the problem was reproduced.
Then I tested the case where the "," is escaped:
> To: =?iso-8859-1?Q?Fa=2E_St=FCrtz=5C=2C_Engel?= <c.engel@stuertz.de>
  - "\" (=5C, the escape character) is inserted before "," (=2C)
Another problem occurred in this test:
  - The display becomes Fa. Stürtz\\, Engel
    The escape character itself is displayed.
    This is an already reported problem.
However, the problem of this bug was resolved:
Mozilla handled "," correctly when "," was escaped before encoding.

Sorry, but I'm not familiar with Quoted-Printable, so I don't know whether
inserting only "=5C" produces valid QP or not.
I also cannot say what RFC 2822 requires: whether a "," in an atom should be
escaped before encoding, or whether no escaping is needed.
*** Bug 277606 has been marked as a duplicate of this bug. ***
*** Bug 279088 has been marked as a duplicate of this bug. ***
*** Bug 280624 has been marked as a duplicate of this bug. ***
As I said in comment #20, if the "," in an encoded-word is escaped, Mozilla/Thunderbird
doesn't split the email address.
(My understanding of the RFC is now: "," in an encoded-word needn't be escaped.)
This indicates that parsing is done after decoding of the encoded-word.
I think this is the main cause of the RFC violation: the encoded-word is not treated as
an atom.
The order of parsing and decoding of encoded-words should be reversed.
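
The difference between the two orders can be shown with a small Python sketch. It is only an illustration: split_mailboxes and decode_words are hypothetical helpers, the splitter is a simplified stand-in for a real RFC 2822 address parser (it only tracks quotes and angle brackets), and the bob@example.com address is made up for the example.

```python
import re
from email.header import decode_header, make_header

# Simplified pattern for one RFC 2047 encoded-word.
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")


def decode_words(text: str) -> str:
    """Decode every encoded-word found in text."""
    return ENCODED_WORD.sub(
        lambda m: str(make_header(decode_header(m.group(0)))), text
    )


def split_mailboxes(header_value: str) -> list[str]:
    """Split on commas that are outside quoted strings and angle brackets."""
    parts: list[str] = []
    current: list[str] = []
    in_quotes = in_angle = escaped = False
    for ch in header_value:
        if escaped:                      # character after a backslash
            current.append(ch)
            escaped = False
            continue
        if in_quotes and ch == "\\":
            current.append(ch)
            escaped = True
            continue
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes and ch == "<":
            in_angle = True
        elif not in_quotes and ch == ">":
            in_angle = False
        if ch == "," and not in_quotes and not in_angle:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]


raw = "=?iso-8859-1?Q?Martin=2C_Andr=E9?= <andre.martin@example.com>, bob@example.com"

# Decode first, then parse: the decoded comma splits the name.
decode_then_split = split_mailboxes(decode_words(raw))
# Parse first, then decode: the encoded-word stays one atom.
split_then_decode = [decode_words(p) for p in split_mailboxes(raw)]

print(decode_then_split)
print(split_then_decode)
```

With the first order the list has three entries; with the second it has the expected two.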
*** Bug 206491 has been marked as a duplicate of this bug. ***
More examples: Thunderbird Version 1.0 (20041206)

Working:

=?ISO-8859-1?Q?Andreas_Niederl=E4nder?= <a.niederlaender@xxx.de>

Not working:

"Biehl, Alexander" <Biehl@xxx.de>

=?iso-8859-1?Q?Altpeter-R=FCb_Dr=2E_Herbert=2C_ITS-IT?=
<Herbert.Altpeter-Rueb@xxx.biz>
(In reply to comment #26)
> More examples: Thunderbird Version 1.0 (20041206)
> 
> Working:
> 
> =?ISO-8859-1?Q?Andreas_Niederl=E4nder?= <a.niederlaender@xxx.de>

The problem is not with embedded spaces in the encoded string, but with commas.


> Not working:
> 
> "Biehl, Alexander" <Biehl@xxx.de>

Testing this, it is "working" for me -- that is, replying to a from header 
containing that string, the To: in the reply matches that string.


> =?iso-8859-1?Q?Altpeter-R=FCb_Dr=2E_Herbert=2C_ITS-IT?=
> <Herbert.Altpeter-Rueb@xxx.biz>

This last example is basically the same as that described in the original 
report.
*** Bug 292603 has been marked as a duplicate of this bug. ***
*** Bug 292826 has been marked as a duplicate of this bug. ***
*** Bug 295379 has been marked as a duplicate of this bug. ***
*** Bug 292008 has been marked as a duplicate of this bug. ***
*** Bug 299640 has been marked as a duplicate of this bug. ***
*** Bug 301136 has been marked as a duplicate of this bug. ***
is there any activity to this bug? I still have the same problem with the new
Thunderbird 1.5beta version...
*** Bug 312977 has been marked as a duplicate of this bug. ***
I first believed this was a bug in Outlook 2003, and managed to report it to them (not as easy as here...).  Their answer regarding the decoding part is that tokenization should take place before decoding.  This is according to RFC 2047 section 6.2.

My own screen dumps and Microsoft's internal bug report on this can be found on Mozillazine: http://forums.mozillazine.org/viewtopic.php?t=318281

Is there anything I can do to help fix this problem? (Testing, QA, ...)
*** Bug 322449 has been marked as a duplicate of this bug. ***
*** Bug 310547 has been marked as a duplicate of this bug. ***
*** Bug 319422 has been marked as a duplicate of this bug. ***
Is anyone working on the bug? I find this bug very annoying. Unfortunately, I'm not familiar with Mozilla internals so I can't fix it myself.
I would also like to ask the developers to proceed with this bug and fix it. The comments in the thread are correct - the order of tokenization and decoding of encoded-words needs to be reversed. An encoded-word always forms one word (or phrase) only; it cannot be split into several addresses after having been decoded. Section 6.2 of RFC 2047 is very clear about that.

I agree with Vaclav Rehak that this issue is very bothersome. Currently, e.g. all emails from Vodafone (yes, the big mobile operator, at least in the Czech Republic) are delivered in the form mentioned in the thread, which Thunderbird recognizes incorrectly. When replying to these emails, one always has to type in the email address manually. Very annoying.
Also, apart from my previous comment, I want to raise an additional one that is closely related to it. When I set my "Your name:" in "Account settings" to

Borovička, Jaroslav

i.e. I'm using iso-8859-2 charset, the header actually looks the following way:

From: =?ISO-8859-2?Q?=22Borovi=E8ka=2C_Jaroslav=22?= <jaroslav.borovicka@gmail.com>

Note the pair of =22 encoded characters. These are quotation marks added by the Thunderbird client into the header. Such a header should be properly decoded and !displayed! as

"Borovička, Jaroslav"

i.e. the recipient should see two extra quotation marks, which I have not specified as my full name ("Your name:"). In fact, many clients behave in this way, and they correctly display the quotation marks after these have been incorrectly added by Thunderbird. It is the same misunderstanding of RFC 2047 as before.

The header should correctly look like this:

From: =?ISO-8859-2?Q?Borovi=E8ka=2C_Jaroslav?= <jaroslav.borovicka@gmail.com>
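
As a rough illustration of encoding the name exactly as typed, here is a minimal Python sketch using the standard email.header module. It is not Thunderbird's encoder, encode_display_name is a hypothetical helper, and Python writes the charset and encoding letter in lower case, which RFC 2047 treats as equivalent.

```python
from email.header import Header, decode_header, make_header


def encode_display_name(name: str, charset: str) -> str:
    """RFC 2047-encode a display name as typed, without adding any quotes."""
    return Header(name, charset).encode()


name = "Borovička, Jaroslav"
encoded = encode_display_name(name, "iso-8859-2")
header_line = f"From: {encoded} <jaroslav.borovicka@gmail.com>"

# Decoding the encoded-word gives back exactly what the user typed.
round_trip = str(make_header(decode_header(encoded)))

print(header_line)
print(round_trip)
```

The comma is carried inside the encoded-word as =2C, so no quotation marks are needed to protect it, and none show up on the recipient's display.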

I guess that this whole error was originally not a bug, but a misunderstanding of RFC 2047. I post this comment here because I believe that these issues should be solved hand-in-hand. Again, I would like to urge the responsible person to deal with this, since more and more people are using non-ASCII characters in their full names.
To implement what follows "here's what should be done in MUAs" in comment #5 (which I'm aware is what Pine does), we may have to make a rather major change. It also leads to trouble when you send to a "not-so-smart" MUA (e.g. most web mail services) which cannot deal with RFC2047-encoded words in a charset other than that of the message body. [1] So, we might have to find other "ad-hoc" ways.
 

[1] Consider the following scenario
1. You receive a French email in UTF-8 with its sender's name and subject in RFC2047 with 'UTF-8' as charset
2. You reply to it while copying it to another person at hotmail.com (whose default setting is Windows-1252.) Because you know that hotmail cannot deal with UTF-8, you set 'Character Encoding' to Windows-1252
3. If we just copy RFC2047-encoded person's name and subject (with 'Re:' prepended) with charset=UTF-8 instead of decoding and reencoding it in Windows-1252, your recipient at hotmail wouldn't be able to read them. 
4. Needless to say, this is hotmail's fault, but it's what we have to think about.
Keywords: intl
(In reply to comment #43)
> 4. Needless to say, this is hotmail's fault, but it's what we have to think
> about.

In my view any e-mail client should (1) be a proper and bug-free e-mail client for POP and IMAP and (2) try to work around Hotmail peculiarities.
(In reply to comment #44)
> (In reply to comment #43)
> > 4. Needless to say, this is hotmail's fault, but it's what we have to think
> > about.
> 
> In my view any e-mail client should (1) be a proper and bug-free e-mail client
> for POP and IMAP and (2) try to work around Hotmail peculiarities.
> 

Hi!

These same hotmail peculiarities could change over time and DO NOT conform to the standard...

I think the best thing would be to use the right encodings but provide a way (possibly by manually editing the prefs.js or something similar) of using a more hotmail/webmail compatible encoding.

There is also the possibility of warning the user when (s)he types in something which MIGHT not be properly decoded by the receiving MUA and letting h(im/er) decide whether it is important to h(er/im) or not...

Have a nice day!

Nick

PS: Anybody wants to submit a bug report to hotmail? (-; or )-; ?
Guys, it's not just hotmail but most of web mail services (and MUAs) out in the wild. Anyway, I'll try to fix this bug so that you don't have to add any more "advocacy" comment here.
(In reply to comment #46)
> Guys, it's not just hotmail but most of web mail services (and MUAs) out in the
> wild. Anyway, I'll try to fix this bug so that you don't have to add any more
> "advocacy" comment here.
> 

Hi,

Advocacy aside, how do you propose to deal with UTF-8 characters which are not re-encodable in Windows-1252 (which is, after all, a "glorified" ISO-8859-1) in the example you posted above (comment #43)?

UTF-8 can represent a LOT more characters than Windows-1252/ISO-8859-1...

This will simply create new problems and will not solve the problems some of the people who are "subscribed" to this bug have...

If you do that kind of workaround you have to make it configurable...

Nick

PS: If you are aware of MUAs that can't deal with properly encoded headers/body, please submit them a bug report and POSSIBLY implement a workaround in the meantime...
(In reply to comment #47)

> Advocacy aside, how do you propose to deal with UTF-8 characters which are not
> re-encodable in Windows-1252 (which is, after all, a "glorified" ISO-8859-1) in
> the example you posted above (comment #43)?

Just as we do now. Replace them with question marks after giving a very conspicuous warning to users that there are characters that cannot be represented in the selected character encoding (they can choose : 'go ahead nonetheless', 'go back to the composition window and change the encoding to UTF-8').

> PS: If you are aware of MUAs that can't deal with properly encoded
> headers/body, please submit them a bug report
 
I did for a few of them I do occasionally use, but as for others, it's users of those MUAs that need to do that.
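
The "warn, then replace with question marks" behaviour described in this comment can be sketched in Python as follows. This is only an illustration of the idea, not Thunderbird's implementation; unencodable_chars and reencode_with_fallback are hypothetical names, and the print call stands in for the warning dialog.

```python
def unencodable_chars(text: str, charset: str) -> list[str]:
    """Return the distinct characters of text that charset cannot represent."""
    missing: list[str] = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


def reencode_with_fallback(text: str, charset: str) -> bytes:
    """Encode text, replacing unrepresentable characters with '?'."""
    return text.encode(charset, errors="replace")


name = "Borovička, Jaroslav"
problem_chars = unencodable_chars(name, "windows-1252")
if problem_chars:
    # Stand-in for the conspicuous warning; the user could also switch to UTF-8.
    print(f"Warning: cannot represent {problem_chars} in windows-1252")
fallback = reencode_with_fallback(name, "windows-1252")
print(fallback)
```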
 

Three months have passed since the last post for this bug, and I wanted to ask whether this issue is being followed, or is in the process of being solved. I refer to my previous posts #41 and #42, and to the whole thread.

I strongly suggest making Thunderbird RFC2047-compliant ASAP, long enough before version 2.0 to be able to catch all weird behavior before 2.0-final. This concerns both receiving incoming emails correctly and correctly encoding emails being sent. I understand that there is also a wide discussion about a reasonable translation of text between different charsets, but this cannot be reliably solved without compliance with RFC2047.

Please, is anybody currently dealing with this problem? If so, could you report to us on the latest developments? Thanks.
*** Bug 336714 has been marked as a duplicate of this bug. ***
*** Bug 339419 has been marked as a duplicate of this bug. ***
*** Bug 339970 has been marked as a duplicate of this bug. ***
*** Bug 342417 has been marked as a duplicate of this bug. ***
*** Bug 347848 has been marked as a duplicate of this bug. ***
OK, three more months have passed, and it is again time to bring up this bug. From the recent contributions, it seems that the bug is indeed bothersome, since it has been submitted five times in the last three months as a new bug (and marked as a duplicate). Please, is there any progress on this? Thanks.
This bug is the only thing that stops me from switching to Thunderbird. My co-workers use Microsoft Outlook with the first and last name separated by a comma in the From: header.

Please help me and many other users by resolving this bug!
*** Bug 358054 has been marked as a duplicate of this bug. ***
*** Bug 362849 has been marked as a duplicate of this bug. ***
Please give this issue some priority. Or at least provide some kind of feedback here!!!
Fixing shouldn't be difficult - IMHO the solution is already mentioned in comment 24:
"I think this is the main cause of RFC violation, encoded-word is not treated as
atom. Order of parsing and decoding of encoded-word should be reversed."
Comments about Hotmail causing this issue are nonsense: for me it is a daily nuisance and it occurs with all mail clients.
I don't think decoding is a real problem, but parsing on compose is.
We can't simply keep encoded words, copy them around and only decode them for display. When answering we get at the same point as when composing from scratch: The user wants to add, remove, modify names and addresses. And after that step everything has to be parsed and encoded.
For this we've to deal with for example the following input:
Björnsen, Jörn <jb@example.com>
Jörn Björnsen <jb@example.com>
"Björnsen, Jörn" <jb@example.com>
"Jörn Björnsen" <jb@example.com>
Jörn "King" Björnsen <jb@example.com>

It doesn't matter if this comes from a mail that is being replied to or from user input for a newly composed one.

RFC2047-encoding each of the first four display names is valid with those quotes or with them removed. Quotes in the last one should be kept and encoded in any case, and without quoting them by prepending a \.

Remark: The compose window offers several lines, but it is not restricted to one address per line; moreover, the whole bunch of addresses is carried comma-separated in one single char * string through a dozen functions.

So parsing is hard and currently we fail in the first example to get that as one address. To ease that part, I'd like to put quotes around names that contain commas if we know that's just one name. Extracting
=?ISO-8859-1?Q?Bj=F6rnsen=2C_J=F6rn?= ...
from a mail is such a situation.
Quotes wrapped around the name, whether automatically or by the user, should be removed again before 2047-encoding. A problem I see is user-entered addresses like
"Jörn "God" Björnsen" <jb@example.com>
which should 2822-correctly be
"Jörn \"God\" Björnsen"
and 2047-encoded
=?ISO-8859-1?Q?J=F6rn_=22God=22_Bj=F6rnsen?=

I'd like to hear some thoughts on that.
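
One way to picture the proposal above is the following Python sketch. It is an illustration under stated assumptions, not the Mozilla code path: unquote_phrase and encode_display_name are hypothetical helpers, the default charset is just the one used in the examples, and Python emits the charset and encoding letter in lower case.

```python
import re
from email.header import Header, decode_header, make_header


def unquote_phrase(text: str) -> str:
    """Strip one pair of surrounding quotes and undo backslash escapes."""
    text = text.strip()
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        text = re.sub(r"\\(.)", r"\1", text[1:-1])
    return text


def encode_display_name(edit_field_name: str, charset: str = "iso-8859-1") -> str:
    """Encode a name from the edit field; quotes are removed before encoding."""
    name = unquote_phrase(edit_field_name)
    if name.isascii():
        # Pure ASCII needs no RFC 2047 encoding; keep the text as typed.
        return edit_field_name
    return Header(name, charset).encode()


def decode_word(encoded_word: str) -> str:
    return str(make_header(decode_header(encoded_word)))


quoted = encode_display_name('"Björnsen, Jörn"')
escaped = encode_display_name('"Jörn \\"God\\" Björnsen"')
unescaped = encode_display_name('"Jörn "God" Björnsen"')

print(quoted, escaped, unescaped, sep="\n")
```

Both the escaped and the unescaped user input end up as the same encoded-word, with the inner quotes carried as =22 and no backslashes.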
(In reply to comment #60)
> I don't think decoding is a real problem

This isn't entirely the case.  When a From: address has a comma in the MIME-encoded atom, you end up with the part of the address before the comma displayed in the Recipient column, and the second part displayed in the envelope panel.  The fact that this problem carries over on Reply is the source of most of the dupes in this bug.


> So parsing is hard and currently we fail in the first example to get that as
> one address. To ease that part, I'd like to put quotes around names that
> contain commas if we know that's just one name. Extracting
> =?ISO-8859-1?Q?Bj=F6rnsen=2C_J=F6rn?= ...
> from a mail is such a situation.

You mean, change the internal representation of the string from MIME-encoded to quoted (and also adding backslash-escaping of quotes and perhaps other special characters within the quoted string), then undoing the same before encoding on send?

I don't see a reason why that wouldn't work, and it would probably be easier to implement.  I do wonder if rearchitecting so that you *do* maintain the 2047 atoms internally until display, and until initializing the edit fields in the compose window, might not be more maintainable in the long run.
> This isn't entirely the case.

It currently is a problem. I wrote "not a real problem" because fixing that should be fairly simple, sorry.

> So parsing is hard and currently we fail in the first example to get that as
> one address. To ease that part, I'd like to put quotes around names that
> contain commas if we know that's just one name. Extracting
> =?ISO-8859-1?Q?Bj=F6rnsen=2C_J=F6rn?= ...
> from a mail is such a situation.

> I do wonder if rearchitecting so that you *do* maintain the 2047 atoms
> internally until display, and until initializing the edit fields in the
> compose window, might not be more maintainable in the long run.

What I meant is, that it has to be represented as quoted in the edit fields. At which stage before it gets quoted is another question. The point is, that quoted representation can easily be displayed, read and edited by the user but a quoted-printable or even base64 can't.

I'd be happy to get rid of the whole encoding/quoting until immediately before sending. Do you think it's safe if we'd just handle everything until the angled address as the display name? That would also solve the current problem with the first example—but I fear there will be a pitfall somewhere.
(In reply to comment #62)
> I'd be happy to get rid of the whole encoding/quoting until immediately
> before sending. Do you think it's safe if we'd just handle everything until
> the angled address as the display name? That would also solve the current
> problem with the first example—but I fear there will be a pitfall somewhere.

Well, it would probably take me two weeks to figure out the path these strings are taking thru the code, and another week or more to figure out how that actually works, so I'm not the one to comment on whether it's safe.  Certainly within the compose window's edit fields, it should be fine to deal with each address as a Unicode string using whatever method of special-char quoting works (e.g., you might backslash-escape the comma, rather than quoting the name, if that does the trick).

The issue of not decoding atoms until you're in a situation where you *need* the Unicode representation is a different question.  That may not be feasible in the current architecture, but see e.g. bug 314351.  I was only suggesting that it might work out to be more maintainable if the 2047 atoms were kept, where possible -- it's quite possible the opposite would be true.

(Incidentally, if you're working on this area, you might take bug 318705 into consideration.  Bug 180025 may also be affected.)
Just an intermediate report on how I see it.
Even the displaying part looks harder to me now. Decoding of every header entry is already done when reading the header lines and splitting them into keyword/value pairs (see MimeHeaders_write_all_headers() in mimehdrs.cpp). Parsing the mail addresses is done at a very different, later step. It's not just a matter of swapping two function calls to get decoding of the addresses done after parsing them.
And I'm afraid those headers (addresses) are also used in other places, so just moving the decoding behind the parsing would break other parts.
Flags: blocking1.9a1?
Flags: blocking1.9?
I also filed a new entry: https://bugzilla.mozilla.org/show_bug.cgi?id=377370 and was then pointed to this bug here.

Because 2.0.0.0rc1 acts differently than 2.0beta2 (and now the reply mail address handling is completely broken), this bug should be marked as important and as blocking for 2.0 final, imho.
(In reply to comment #65)
> Decoding every header entry is already done when reading the header lines
> and spliting them into keyword/value pairs
> (see MimeHeaders_write_all_headers() in mimehdrs.cpp).
> Parsing the mail addresses is done at a very different later step.
> It's not just like swapping two function calls to get decoding of the addreses
> done after parsing them.
> And I'm afraid those headers (addresses) are also used in other places, 
> so just moving the decoding behind the parsing would break other parts.

If so, as I wrote in Comment #20, "escaping of non-escaped ',' in an atom just after decoding" is a sufficient solution for this bug.
But it produces the same situation as the case of a correctly escaped ',' in an atom: the escape character '\' just before the ',' will be displayed incorrectly.
This is an already existing and independent problem, and it has to be resolved at the same time.

RFC 2822 defines "specials" as follows.
> specials        =       "(" / ")" /     ; Special characters used in
>                         "<" / ">" /     ;  other parts of the syntax
>                         "[" / "]" /
>                         ":" / ";" /
>                         "@" / "\" /
>                         "," / "." /
>                         DQUOTE
I think other specials than ',' should be escaped also after decoding of an atom.
Christian Eyrich, what do you think?   
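
A small Python sketch of protecting a decoded name that contains any of the specials listed above. Note the assumption: instead of backslash-escaping each special, it wraps the name in an RFC 2822 quoted-string and escapes only backslash and DQUOTE inside it. quote_phrase and SPECIALS are hypothetical names used for illustration.

```python
# The RFC 2822 "specials" quoted in the comment above.
SPECIALS = set('()<>[]:;@\\,."')


def quote_phrase(decoded_name: str) -> str:
    """Return decoded_name as a quoted-string if it contains any special."""
    if not any(ch in SPECIALS for ch in decoded_name):
        return decoded_name
    # Inside a quoted-string only backslash and DQUOTE need escaping.
    escaped = decoded_name.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


print(quote_phrase("Martin, André"))
print(quote_phrase('Jörn "God" Björnsen'))
print(quote_phrase("André Martin"))
```

A name without specials is returned unchanged, so nothing extra would appear for ordinary names.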
I can't believe that this problem has been known for more than 2.5 years but is still not properly fixed :-(
Flags: blocking-thunderbird2?
Behavior of 2.0.0.0 final is the same as 2.0.0.0rc1

Just a copy from https://bugzilla.mozilla.org/show_bug.cgi?id=377370, the behavior of reply address handling is completely broken since 2.0.0.0rc1:

Test this and wonder why this bug still does not have a high priority and is still unconfirmed...

1. Create test mail:

$ cat > tmp/mail-umlaut.txt <<END
From: =?iso-8859-1?Q?Sure=F6name=2C_Forename__Dr=2E?= <pb@bieringer.de>
To: "Peter Bieringer" <pb@bieringer.de>
Subject: Umlauttest

Testmail containing Umlaut and commata in From:
END

2. Send mail
$ cat tmp/mail-umlaut.txt | /usr/sbin/sendmail pb@bieringer.de


Results:

a) Main mail panel / sender column:
Is:     Sureöname
Should: Sureöname, Forename Dr.

-> same as in 2.0b2 and 1.5.0.10


b) Mail window / From:
Is:     Sureöname, Forename Dr. <pb@bieringer.de>
        _________  ______________________________
(break in underlining)

Should: Sureöname, Forename Dr. <pb@bieringer.de>
        _________________________________________
(no break in underlining)

Note that a "From" can normally never contain 2 entries

1.5.0.10:
Is:     Forename Dr. <pb@bieringer.de>
Should: Sureöname, Forename Dr. <pb@bieringer.de>

So since 2.0.0.0rc1, it's partially fixed, but "," is treated as address separator.

c) after hitting the reply button / recipients:
Is:     To: Peter Bieringer <pbieringer.de>
Should: To: Sureöname, Forename Dr. <pb@bieringer.de>

This is now the recipient, not the sender: completely different behavior from 1.5.0.10.
There is still no reaction from any developer to the completely buggy behavior introduced in 2.0.0.0rc1. Strange. Can someone check whether the "Assigned To" is still valid:

Assigned To:  	 (not reading, please use sspitzer@moz...

There is also currently no QA contact specified; can someone adjust this?
->defaults (seth isn't doing mail bugs these days)
Assignee: sspitzer → nobody
QA Contact: mime
(In reply to comment #71)
> There is still no reaction from any developer for the complete buggy behavior
> introduced in 2.0.0.0rc1

Do you mean (b) or (c) (or both) from comment 70?  (b) looks like old behavior to me, but I could be misremembering; I don't recall (c) being a problem.  
I don't see either issue in 2.0 final.
(In reply to comment #74)

> I don't see either issue in 2.0 final.

I have 2.0.0.0 (20070326) Win-EN version, and I can confirm both a) and b) from comment #70 when viewing e-mails that have been stored on the local database (using POP3) with previous versions of Thunderbird.

However, when I receive a new email now with the new version of Thunderbird, everything seems to work fine.

I suspect that old emails have been already incorrectly stored in the local database. When I look at the header of an old email that causes these problems, I see:

From: =?iso-8859-2?Q?Kub=EDkov=E1=2C_VF-CZ?= <kk@vf.com>

However, the headers of freshly received emails contain

From: "=?ISO-8859-2?Q?Borovi=E8ka,_Jaroslav?=" <bb@gmail.com>

The headers are now being stored with quoted ISO strings, and Thunderbird's behavior is correct (even when replying). However, notice that in the first case the comma is encoded as =2C. Unfortunately, I was not able to reproduce this, so I could not check whether Thunderbird behaves correctly when commas are encoded that way in freshly received emails. I think this should be checked before marking the problem as solved, just in case.

Regarding c) from comment #70, I cannot confirm the described behavior. Using the new version of Thunderbird to reply to old emails stored with previous versions of Thunderbird still produces the same buggy behavior as described in many previous comments (not the one from comment #70). Replying to newly received emails is correct.
In reply to https://bugzilla.mozilla.org/show_bug.cgi?id=254519#c74: just note that all e-mails are stored on IMAP (courier-imapd [maildir] or dovecot [mbox]), and I expect that all e-mails are untouched.

BTW: server and client OS are Linux.

On a second system using 2.0.0.0 I still can reproduce issue b) instead of c). 
On a third system (also Fedora Core 6) I can reproduce issue c)

Double-checking this issue, I found that after removing the complete Thunderbird configuration on the third system and re-entering all the settings, I get behavior b). Note that on the second system I started with a clean 2.0.0.0 configuration, while on the third system 2.0b2 had been tested before.
Adding to comment #75, when I receive an email (thanks Peter) where the comma is encoded as =2C

From: =?iso-8859-1?Q?Sure=F6name=2C_Forename__Dr=2E?= <pb@bieringer.de>

the behavior is still buggy, exactly as described in comment #70 a) and b). However, when I reply, I do not get the behavior described in c), but the old incorrect behavior, where the From field is split into two addresses:

To: Sureöname
To: Forename Dr. <pb@bieringer.de>

Thus, at least for me, it works exactly in the same buggy way as in Thunderbird 1.5.
Is there any progress on this issue from developers? Most of the entries here amount to "bug is seen and can be confirmed" and "duplicates"...
Is it not strange that no one cares about the bug and that it is not assigned to anyone:

   Assigned To:  	 Nobody; OK to take it and work on it


Would this mean that it is not planned to fix this bug?
It's bad, but not that strange, given that the code is deep down and quite ugly, and you can break quite a few things by changing it.

But it doesn't get better if you ask every few weeks. I guess this one would be a candidate for a bounty.
I submitted a "ping" because, from a user's point of view, the only thing happening on this bug is "duplicate of this bug" (more than 30 times by now), with no development-related comment from January until https://bugzilla.mozilla.org/show_bug.cgi?id=254519#c81

I think this issue is a major one because it can break the use of Thunderbird in companies (on Linux and Windows desktops): users (and managers...) have problems accepting this buggy behaviour.

If there is a developer resource problem, perhaps one of the CCs has a valid subscription for "Red Hat Enterprise Linux Desktop 5" and can open a support call on the Red Hat site. I would hope that they then put some resources into fixing this problem.
Per discussion in m.d.a.seamonkey minusing this for 1.9 - if it is a core bug and seamonkey or tbird need this in the 1.9 timeframe please do re-nominate.
Flags: blocking1.9? → blocking1.9-
Flags: blocking-thunderbird2?
Blocks: 188980
Using trunk seamonkey, I see the exact same issues as comment #77 describes.
I got this message today (fixed a bit to protect the innocent)

From: =?Windows-1252?Q?Vey=2C_A=2ED=2EJ=2E_=28Di=EBk=29?= <Diek.Vey@Obfuscate>
To: <me>

the name should be printed as 
Vey, A.D.J (Diëk)

instead it is shown as two addresses :

'Vey' with email address 'Vey'
'A.D.J (Diëk)' with the email address 'Diek.Vey@Obfuscate'

When I reply I get two To: addresses.

So it seems Thunderbird must also support fancy (non standard) code pages too.
(In reply to comment #87)
> I got this message today (fixed a bit to protect the innocent)
> 
> From: =?Windows-1252?Q?Vey=2C_A=2ED=2EJ=2E_=28Di=EBk=29?= <Diek.Vey@Obfuscate>
> To: <me>
> 
> the name should be printed as 
> Vey, A.D.J (Diëk)
> 
> instead it is shown as two addresses :
> 
> 'Vey' with email address 'Vey'
> 'A.D.J (Diëk)' with the email address 'Diek.Vey@Obfuscate'
> 
> When I reply I get two To: addresses.
> 
> So it seems Thunderbird must also support fancy (non standard) code pages too.
> 

There's nothing fancy in here, except the use of an encoded comma character in the "From" header. Tb has decoded and then interpreted it, but it shouldn't have.
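A quick way to see that the charset is not the culprit is to decode the header by hand. The Python sketch below is only an illustration using the standard `email` package; it is not what Thunderbird does internally, and `decode_words` is a made-up helper. Decoding the whole header first and parsing the address list afterwards yields two entries, matching the split reported in comment #87.

```python
from email.header import decode_header
from email.utils import getaddresses

RAW_FROM = ("=?Windows-1252?Q?Vey=2C_A=2ED=2EJ=2E_=28Di=EBk=29?= "
            "<Diek.Vey@Obfuscate>")


def decode_words(value):
    """Decode RFC 2047 encoded-words in value to a plain str."""
    return "".join(
        chunk.decode(charset or "ascii") if isinstance(chunk, bytes) else chunk
        for chunk, charset in decode_header(value)
    )


# Wrong order: decode first, then split the address list.
# The decoded comma is now indistinguishable from a list separator.
decoded = decode_words(RAW_FROM)
wrong = getaddresses([decoded])

print(decoded)
for name, addr in wrong:
    print(repr(name), repr(addr))
```

The Windows-1252 bytes decode without any trouble; the split only appears once the decoded text is handed to the address parser.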
Status: NEW → ASSIGNED
Status: ASSIGNED → NEW
(In reply to comment #87)
> 'Vey' with email address 'Vey'
> 'A.D.J (Diëk)' with the email address 'Diek.Vey@Obfuscate'
> 
> When I reply I get two To: addresses.
> 
> So it seems Thunderbird must also support fancy (non standard) code pages too.

Same problem with Big5 messages from Chinese business correspondence in Bug 428005.

So now I'm going to have to manually remove 6 or 7 erroneous "email addresses" from my To: box every time I hit Reply All?  Great.  And I suppose no one's willing to increase the severity of this bug from "normal", and it's going to sit around for several more years before anyone even starts to work on it, right?

Does anyone know of a workaround (default encoding?) or extension or anything that could fix this?
Jon B,

Maybe a regex that either strips or alters the problematic addresses...

At the same time I submitted this bug for Mozilla, I submitted a similar bug report to IBM for Lotus Notes...

I was expecting the bug in Mozilla to be fixed before the one in Lotus Notes, but apparently the bug in Lotus Notes was corrected years ago...

If anybody knows how to change the priority of this bug, please let us know...

Have a nice day!

Nick
BTW, said regex would be applied at the MTA level (with Postfix, for example, you can alter headers) or at the MDA level (procmail?)...

Nick
(In reply to comment #96)
> BTW, said regex would be applied on the MTA (with Postfix for example you can
> alter  headers) or MDA (procmail?) level...

Is there no way to do it within Thunderbird itself, like a TB version of Greasemonkey?
Attached patch an attempt to fix this problem (obsolete) — Splinter Review
This patch fixes handling of addresses of the form 'Spät, Karl <ks@nomail.xx>' for me. Please give it a try. The patch is against HEAD (May 07, 2008).

Rationale: 

In many parts of the MailNews sources, MIME headers containing RFC 2822 address lists are treated by decoding MIME first, then expanding address lists. This is wrong if the MIME-encoded data contains a comma (MIME encoded data should be treated as an RFC 2822 atom).

This patch changes the behavior such that MIME decoding is done last, immediately before displaying the headers, and in particular, after splitting up the address list.

Individual changes:

 - nsMsgCompFields::SplitRecipients(): run postponed late MIME conversions
 - nsMsgDBView::FetchAuthor(): fixes message list display
 - nsMsgCompose::CreateMessage(): postpone MIME conversion for some fields (for "Reply")
 - QuotingOutputStreamListener(): postpone more MIME conversions (for "Reply")
 - QuotingOutputStreamListener::OnStopRequest(): ditto
 - CreateCompositionFields(): ditto, for "Edit as New"
 - MimeHeaders_write_all_headers(): ditto, for message display window
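
The reordering described in the rationale can be sketched outside of the MailNews code. The following Python illustration is a stand-in under stated assumptions, not the patch itself: `email.utils.getaddresses` plays the role of the address-list parser, `split_then_decode` and `decode_words` are hypothetical names, and the sample header is constructed from the 'Spät, Karl' address above. The list is split while the encoded-words are still opaque atoms, and each display name is decoded only afterwards.

```python
from email.header import decode_header
from email.utils import getaddresses


def decode_words(value):
    """Decode RFC 2047 encoded-words in value to a plain str."""
    return "".join(
        chunk.decode(charset or "ascii") if isinstance(chunk, bytes) else chunk
        for chunk, charset in decode_header(value)
    )


def split_then_decode(header_value):
    """Split an address-list header first; decode each display name last."""
    return [(decode_words(name), addr)
            for name, addr in getaddresses([header_value])]


header = ('=?iso-8859-1?Q?Sp=E4t=2C_Karl?= <ks@nomail.xx>, '
          '"Peter Bieringer" <pb@bieringer.de>')
for name, addr in split_then_decode(header):
    print(f"{name!r} <{addr}>")
```

Because the comma inside the encoded-word is never seen by the parser, only the real separator between the two mailboxes splits the list.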
Attachment #319890 - Flags: review?(mkmelin+mozilla)
Comment on attachment 320002 [details] [diff] [review]
similar patch for MOZILLA_1_8_BRANCH

Marking obsolete as this isn't something that ever would be approved for the branch.
Attachment #320002 - Attachment is obsolete: true
Comment on attachment 319890 [details] [diff] [review]
an attempt to fix this problem

I'm not a mailnews/ reviewer, try dmose@mozilla.org perhaps?
Attachment #319890 - Flags: review?(mkmelin+mozilla)
(In reply to comment #101)
> (From update of attachment 320002 [details] [diff] [review])
> Marking obsolete as this isn't something that ever would be approved for the
> branch.

Why not? Is Mozilla only concerned about security bugs on branch, and ignores all the others?
Gee how I hate this attitude... :( But I can see at least some reasoning behind it, so I'll just hope that at least Tb3 is planned to be released this year then. :)
(In reply to comment #105)
> Gee how I hate this attitude... :( But I can see at least some reasoning behind
> it, so I'll just hope that at least Tb3 is planned to be released this year
> then. :)

This isn't going to be fixed until Thunderbird 3??
No, don't expect it.

Thunderbird is close to 3.0 alpha 1 - not meaning 3.0 will be out any time soon.

It's too dangerous to include stuff like this in case it messes anything up. It might be included on trunk soon; if you really need it, try the nightlies then.
Attachment #319890 - Flags: review?(dmose)
(In reply to comment #107)
> No, don't expect it.
> 
> Thunderbird is close to 3.0 alpha 1 - not meaning 3.0 will be out any time
> soon.

When should we expect a fix for this serious bug?  Do any of the patches work?  Can they be implemented as extensions?
(In reply to comment #108)

> When should we expect a fix for this serious bug?  Do any of the patches work? 

Both of these patches "work" for me, in the sense that they fix the "comma" problem. I have been using TB 2.0 with attachment 320002 [details] [diff] [review] in a production environment today, and encountered no problems. The question is if the patches cause any regressions for other people, and if they are considered clean enough by the reviewers.

If you want to help, please build Thunderbird with the appropriate patch applied, and report whether it works for you or whether you encounter any problems. It would be good if these patches got some testing from different people, especially people working with other charsets than me (e.g. Asian), and on other platforms (I did all my testing on Linux).

> Can they be implemented as extensions?

I seriously doubt it. You need to change the format in which data is passed between different Thunderbird components. That will hardly be possible without touching core code.

Regards, Martin
OMG, this is an annoying thing. Please fix it; I am getting dozens of e-mails every day that are treated like this :-(
(In reply to comment #110)
> OMG this is annoying thing. Please fix it, I am getting dozens of e-mails every
> day that is treatened like this :-(
> 
As you might have noticed, the prioritisation of bugs has a rather Anglo-American bias in this matter. As you know, there is no need for punctuation and the like.

In other words: 
Would you be so kind as to include this patch in the next major release?
(In reply to comment #107)
> It
> might be included on trunk soon and you really need it, then try the nightlies
> then.
> 

I perfectly understand that this comes too late for TB2.

You say it might be included on trunk. What is necessary to make that happen as soon as possible, so we see it in TB 3?


I understand that there is currently no plan to accept this on the 1.8 branch. I just wanted to attach this improved patch for those who'd like to build Thunderbird 2.0 with it. The old 1.8 patch had a problem with quoting (the author name in "xyz wrote:" was left undecoded). This patch (a replacement for the previous one) fixes that behavior.

The trunk patch (attachment 319890 [details] [diff] [review]) is unaffected by this problem.
I vote for including this patch to trunk as soon as possible, thanks.
My patch has been waiting for review for almost 2 months now. I don't want to press, but perhaps someone can recommend another reviewer?
Martin, sorry for not getting to the review sooner.  Unfortunately, since you submitted the trunk patch, the trunk code has changed enough that it doesn't apply, even with various patch flag massaging.  I've just spent about an hour wrapping my head around a bunch (though not yet all) of the various comments here.  Unfortunately, I'm not going to finish this evening, but I'll do some more tomorrow.

If you or someone else could un-bitrot the patch, that would be fantastic.  If not, I'll try and find the bandwidth to do it.  In any case, I'll poke some more at this tomorrow.
Things are moving. Great to see.

Dan, when do you expect this to land in a future stable version? 3.0, maybe?
(In reply to comment #116)

> If you or someone else could un-bitrot the patch, that would be fantastic.  If
> not, I'll try and find the bandwidth to do it.  In any case, I'll poke some
> more at this tomorrow.

Sorry, I should have done that before pinging you. I'll see what I can do.

Martin
Attached patch patch against CVS 20080716 (obsolete) — Splinter Review
Applies, compiles, and passes basic testing with today's code in HEAD. 

As before, the rationale of this patch is to postpone RFC2047 decoding of "address list" type fields until after they are split into a real list. This prevents the misinterpretation of RFC2047-encoded special characters (most commonly, ","). With the patch, Thunderbird uses the *encoded* fields internally and does RFC2047 decoding basically just before displaying the values to the user.

This means that calls to MIME_DecodeMimeHeader() are now done at different parts of the code than before.

A small problem: MIME_DecodeMimeHeader() takes the "charset" and "charsetoverride" parameters. I didn't find it obvious how to set these correctly, given that I am now calling this function at a different stage than before. As I am not sure how charset overriding is supposed to work, I leave it up to others to decide.

The Fixme in the code for mimehdrs.cpp means that I am not 100% sure if additional headers should also be treated this way. Currently the patch postpones RFC2047 decoding for the "to", "from", "cc", "bcc", "reply-to", and "sender" headers.
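
The header selection boils down to a check on the header name. As a rough Python sketch of that idea only (the function names are invented here, and the set simply mirrors the six headers listed above), address-list headers are passed through still encoded, while every other header is decoded right away:

```python
from email.header import decode_header

# Headers whose RFC 2047 decoding is postponed until after address parsing.
ADDRESS_HEADERS = frozenset({"to", "from", "cc", "bcc", "reply-to", "sender"})


def decode_words(value):
    """Decode RFC 2047 encoded-words in value to a plain str."""
    return "".join(
        chunk.decode(charset or "ascii") if isinstance(chunk, bytes) else chunk
        for chunk, charset in decode_header(value)
    )


def convert_header_value(name, value):
    """Decode non-address headers now; leave address headers encoded."""
    if name.lower() in ADDRESS_HEADERS:
        return value  # decoded later, one address at a time
    return decode_words(value)


print(convert_header_value("Subject", "=?iso-8859-1?Q?Gr=FC=DFe?="))
print(convert_header_value("From", "=?iso-8859-1?Q?Sp=E4t=2C_Karl?= <ks@nomail.xx>"))
```

Whether further headers belong in the set is exactly the open question the Fixme refers to.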
Attachment #319890 - Attachment is obsolete: true
Attachment #329874 - Flags: review?(dmose)
Attachment #319890 - Flags: review?(dmose)
lzap: Barring anything unforeseen, I'd expect 3.0 to be the first stable version that contains this fix, yes.

Martin: thanks for the updated patch.  Unfortunately, today didn't quite work out the way I expected, so I hope to spend some more time on this tomorrow, and I'll try and address the concerns you've mentioned in the most recent comment.
Product: Core → MailNews Core
Comment on attachment 329874 [details] [diff] [review]
patch against CVS 20080716

Setting review flag for bienvenu@nventure.com, on suggestion from dmose.

I have not checked if the patch needs work to apply to current CVS.
Attachment #329874 - Flags: review?(dmose) → review?(bienvenu)
I'm trying to apply this patch (it doesn't apply right now) and I'll check it out. Thx for the patch! I'm sure I'm going to have some questions...
I've tweaked the patch a little to get it to compile - ExtractHeaderAddressName no longer takes a charset. And I've gotten rid of the code that handles failing to get the mime converter - if that fails, we've got bigger problems so I just exit early. I'll attach a new patch once I've verified that things basically work.

I can probably construct some test cases from the various comments in the bug, but if anyone has a test case mailbox file already made that they'd like to put in the bug, that would be helpful.
We should add test cases to mailnews/compose/test/unit/test_nsMsgCompose1.js, probably right at the end, if I understand that test correctly.
Attached patch hg patch against trunk (obsolete) — Splinter Review
this applies against the hg tip, and works with some simple tests...I'll try to add this case to the comp fields test case.
Attachment #329874 - Attachment is obsolete: true
Attachment #343484 - Flags: superreview?(neil)
Attachment #329874 - Flags: review?(bienvenu)
thx very much for the help fixing the test!
Attachment #343547 - Flags: review?(bugzilla)
in‑testsuite to be plussed when checked in
Flags: in-testsuite?
Attachment #343547 - Flags: review?(bugzilla) → review+
I can't seem to get this to apply, has it bitrotted?
wouldn't surprise me if it has bitrotted again. I'll see if I have an unbit-rotted version in a tree somewhere...
this should apply
Attachment #343484 - Attachment is obsolete: true
Attachment #346538 - Flags: superreview?(neil)
Attachment #343484 - Flags: superreview?(neil)
Comment on attachment 346538 [details] [diff] [review]
un-bittrotted patch

>+            rv = parser->MakeFullAddressString(decodedName.get(), pAddresses, 
>                                                getter_Copies(fullAddress));
(2×) The "old" code used to have fallback
e.g. decodedName.length() ? decodedName.get() : pNames

>+          recipient = NS_ConvertUTF8toUTF16(outCString);
(6×) These should be converted to the form
CopyUTF8toUTF16(outCString, recipient);

>   if (from) {
>-    val = MIME_DecodeMimeHeader(from, charset, PR_FALSE, PR_TRUE);
>-    cFields->SetFrom(NS_ConvertUTF8toUTF16(val ? val : from));
>-    PR_FREEIF(val);
>+    cFields->SetFrom(NS_ConvertUTF8toUTF16(from));
>   }
> 
>   if (subject) {
>     val = MIME_DecodeMimeHeader(subject, charset, PR_FALSE, PR_TRUE);
>     cFields->SetSubject(NS_ConvertUTF8toUTF16(val ? val : subject));
>     PR_FREEIF(val);
>   }
It's confusing that some fields are decoded here and some are decoded elsewhere... is it possible to decode them all in the same place?

>+    if (!(name.LowerCaseEqualsLiteral("to") || name.LowerCaseEqualsLiteral("from") ||
>+          name.LowerCaseEqualsLiteral("cc") || name.LowerCaseEqualsLiteral("bcc") ||
>+          name.LowerCaseEqualsLiteral("reply-to") || name.LowerCaseEqualsLiteral("sender")))
>+          MimeHeaders_convert_header_value(opt, hdr_value);
What does MimeHeaders_convert_header_value do that we don't want it to?
While I was waiting for the answers to my questions I thought I'd try the patch out. I didn't notice anything obviously wrong, but I did discover that if I create a card with a display name of =?utf-8?Q?=C2=A1Hola_se=C3=B1or!?= then it gets decoded when I compose ;-) (oddly it gets ignored without the patch...)
Martin, if you get a chance to answer those questions, that would be very helpful...
(In reply to comment #132)

> (2×) The "old" code used to have fallback
> e.g. decodedName.length() ? decodedName.get() : pNames
OK, should be easy to fix.

> (6×) These should be converted to the form
> CopyUTF8toUTF16(outCString, recipient);
OK, should be easy to fix.


> It's confusing that some fields are decoded here and some are decoded
> elsewhere... is it possible to decode them all in the same place?

I agree, but I wanted to change as little as possible to fix the problem encountered. IMO anything more would have strongly increased the probability of regressions. Thought through strictly, all MIME encode/decode operations should be done in one single place. As noted in comment #99, I needed to change different code for the message list, message display, "reply", and "edit as new" operations. IMO, MIME-decoded header data should never be used internally, only for displaying the data to the user (immediately before putting the text in some GUI field). But that would go far beyond the bug fix that I attempted here, and had better be done by someone with a deeper understanding than mine of the data flow between different Mozilla components. For example, some Mozilla components may rely on parsing some header data in clear text and would fail with MIME-encoded data (I don't know if that's the case; I just wanted to minimize the risk of encountering that situation).

> What does MimeHeaders_convert_header_value do that we don't want it to?

See comment #99. MimeHeaders_convert_header_value() converts MIME encoded data to clear text. If the MIME encoded data contains a comma, later parsing will falsely split the header into fields. This is exactly what this patch is all about.
(In reply to comment #133)
Do you consider that a regression? Have you been able to send that email (if you assign  a valid email address to the card)?

If I understand right, you have entered `=?utf-8?Q?=C2=A1Hola_se=C3=B1or!?=' directly into the "Display name" field? That's a bit unusual - I think users are normally expected to enter such fields in cleartext with some encoding matching their locale. Do you disagree?
Comment on attachment 346538 [details] [diff] [review]
un-bittrotted patch

OK, sr=me with those minor code changes fixed.

I wonder how many extensions expect the data to be MIME-decoded...
Attachment #346538 - Flags: superreview?(neil) → superreview+
Martin, if you submit a new patch addressing the minor code change comments soon, I can drive it in the tree before the code freeze for beta1 (which is next week)
The bugzilla interface is horrible. I cannot find how to uncheck me from the CC list...
(In reply to comment #138)
Uff, that's close... I'll see what I can do.
Updated patch which addresses Neil's points.

> (2×) The "old" code used to have fallback
> e.g. decodedName.length() ? decodedName.get() : pNames
Done, using !decodedName.IsEmpty() ? ...

> (6×) These should be converted to the form
> CopyUTF8toUTF16(outCString, recipient);
I did this for nsMsgCompose.cpp, but not for mimedrft.cpp, where NS_ConvertUTF8toUTF16() is used all over the place. I think that would be a separate cleanup patch.

The Message list display part (nsMsgDBView.cpp) got lost in attachment #343484 [details] [diff] [review], I re-added it. Without this, author names in the message list will still be broken.

The patch applies to today's CVS, compiles, and works (for me, at least).
Hope it's in time for the beta.
This should be ready for checkin then. 

lzap: click on my votes, and uncheck yourself.
Assignee: nobody → martin.wilck
Status: NEW → ASSIGNED
Keywords: checkin-needed
Target Milestone: --- → Thunderbird 3.0b1
Martin: this patch shouldn't be against cvs. We're no longer developing against cvs, we're developing trunk in Mercurial. The patch doesn't apply against the current Mercurial code base.

Please see https://developer.mozilla.org/en/Comm-central_source_code_(Mercurial) for how to get the latest source code.
Keywords: checkin-needed
Can't you even export the changes to CVS? I see no chance to get this bug fix into the beta if I need to switch to a new VC system first.
I don't think the code is available in CVS anywhere. You may be able to get a tarball of the current tree, make your changes, and diff that against the original versions of the files...

Otherwise, someone will need to fix the patch to make it apply against the trunk, which has become increasingly difficult as the code around it has changed.

thx again for working on this.
Note that pulling and building a new tree from mercurial is actually fairly straightforward: see <https://developer.mozilla.org/en/Comm-central_source_code_(Mercurial)> for details...
Martin, please add your patches to the new VCS.

I am very happy that this bug is resolved.

At the moment, Firebird is not usable in corporations outside of the United States.
When you reply to a valid email generated by Outlook, Firebird breaks the addresses from the header if the name contains a non-ASCII character (for example, Spanish names; Spanish is the third most spoken language in the world).

I know that you haven't been helped by the Firebird developers, as they seem to ignore anything related to Outlook or to non-English languages.

Please give it one last push so that it gets integrated.
I meant to say Thunderbird developers, not Firefox developers, in my previous e-mail.

What steps need to be taken to increase the importance of this old bug from Normal to Major?

I think this old bug must be a blocker for any future release of Thunderbird. What do I need to do for this?

At this moment, a basic functionality such as Reply is broken. If you reply to an RFC-compliant e-mail composed by Outlook that contains names in From/To/CC with non-ASCII characters, it breaks the names and the e-mail fails.

For example, my full name is very common in Spain: "Rodríguez García, José Luis". It has three accents, which aren't ASCII characters. In my company, more than 50%
of the times I reply to an e-mail, somebody has an accent in their name.

Not giving importance to this bug only shows a childish hatred of Outlook, or a belief that the whole world lives in the United States and must use the 7-bit ASCII character set.

At the moment, Thunderbird doesn't implement one of the four basic functionalities of an e-mail client: compose, get messages, reply, and forward messages.
Requesting blocking-thunderbird3 because this bug is really annoying for users in countries where the "Lastname, Firstname" form is a popular tradition.
Flags: blocking-thunderbird3?
This bug is in principle solved. All that needs to be done is that someone takes the patch from attachment #348191 [details] [diff] [review] and ports it to the main branch in mercurial. I'd like to do that myself but I am not sure when I'll find a time slot for it.
Then setting it to blocking-thunderbird3 shouldn't be a big deal?
Just to make sure the solution does make it into the release?
Please?
That bug is a continuous source of support calls for me.
Whiteboard: [patchlove]
blocking‑thunderbird3+; we should definitely get the patch unbitrotted and checked in before tb3. (It's only two hunks that fail atm.)
Flags: blocking-thunderbird3? → blocking-thunderbird3+
Target Milestone: Thunderbird 3.0b1 → Thunderbird 3.0b2
This brings attachment 348191 [details] [diff] [review] to Hg trunk, plus a minor typo correction.
Attachment #351464 - Flags: superreview?(neil)
Attachment #351464 - Flags: review?(bienvenu)
Comment on attachment 351464 [details] [diff] [review]
attachment 348191 [details] [diff] [review] ported to Hg trunk

>+          CopyUTF8toUTF16(outCString, bcc);
>+          if (bcc.Length() > 0)
>+            compFields->SetBcc(bcc);
Could be
if (!outCString.IsEmpty())
  compFields->SetBcc(NS_ConvertUTF8toUTF16(outCString));
[Length() is wrong either way]
Attachment #351464 - Flags: superreview?(neil) → superreview+
Hello, I'm new here. 

I'm using Thunderbird for Windows and I wonder how I can apply the solutions explained here.

Thank you very much for your help.
Attachment #351464 - Flags: review?(bienvenu) → review+
Comment on attachment 351464 [details] [diff] [review]
attachment 348191 [details] [diff] [review] ported to Hg trunk

thx very much for de-bitrotting the patch, Karsten.

I've followed Neil's suggestion about the bcc header, and removed the consequently unused local var. And thx again, Martin, for the patch.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago → 16 years ago
Resolution: --- → FIXED
Whiteboard: [patchlove]
Attachment #343547 - Attachment description: add test case → add test case - checked in.
No longer blocks: 188980
Depends on: 468351
(In reply to comment #0)
> When an email contains a "From:" line encoded with RFC 2047 which contains
> characters which should, per RFC (2)822, be quoted (because it contains commas
> (",") or similar characters), Mozilla doesn't decode it properly and thinks
> these are two addresses.

Verifying this has been fixed.  I am seeing correct behavior in 
 Gecko/20081207 Shredder/3.0b2pre
(Find it here:
  ftp://ftp.mozilla.org/pub/thunderbird/nightly/latest-comm-central/
)

The name is correctly shown in the Recipient column, and replying to the message constructs the name as expected in the addressing widget of the compose window -- except, the name is quoted, as in...


> The reverse is also true.  It quotes an email address which contains a comma
> and then apply RFC 2047 (which it shouldn't do).

The quotes aren't necessary for the MIME encoding, but I think they may be necessary for the addressing widget to parse correctly.  If you simply type a comma-separated name into the field without the quotes, the widget automatically splits it into two separate addresses.  If the addition of quotes is still a problem, that should go into a new bug -- if not every dupe to this bug is about the first problem, I bet 98% of them are.

Thanks very much, Martin and David.
Status: RESOLVED → VERIFIED
Tested with Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b3pre) Gecko/20081207 Shredder/3.0b2pre and reply works. 
But if I click on the sender in mail view and choose "compose mail to" in the dropdown menu, the address is still split into two addresses.
the "clicking on the sender" issue should be a new, spin-off bug. Do you mind filing that as a thunderbird front end bug?
(In reply to comment #162)

In my opinion as an end user this is the same bug. 
But I don't know the code of thunderbird and therefore can't estimate in which part of thunderbird the fault is.
This FIXED bug is flagged with in‑testsuite?   It would be great if assignee or someone else can clear the flag if a test is not appropriate.  And if appropriate, create a test and plus the flag to finish off the bug.
(In reply to comment #164)
> This FIXED bug is flagged with in‑testsuite?   It would be great if assignee or
> someone else can clear the flag if a test is not appropriate.  And if
> appropriate, create a test and plus the flag to finish off the bug.

ping ?
Setting blocking‑seamonkey2.0b1?

IIRC, bug fixes from TB were supposed to be migrated to SeaMonkey before beta, but as of 2.0a3, this has not happened yet.  This bug is

 (a) a blocker for enterprise use in non-English-language countries
     (cf. the number of votes, number of dupes and the bug spam
     by desperate victims), also blocking‑thunderbird3+

 (b) committed to TB fairly recently (2008-12-04 to Hg-trunk), so that
     it can serve as an indicator whether the bug fix migration
     has actually happened
Flags: blocking-seamonkey2.0b1?
There's no "migration" - tb and sm use the same code.
(In reply to comment #168)
> Setting blocking‑seamonkey2.0b1?
> 
> IIRC, bug fixes from TB were supposed to be migrated to SeaMonkey before beta,
> but as of 2.0a3, this has not happened yet.

All the code here is shared between TB + SM. This bug also seems to have a unit test which we know hasn't regressed. Therefore not blocking. If you think you have a specific problem in SM I suggest you file a new bug for it and it can be covered there.
Flags: blocking-seamonkey2.0b1? → blocking-seamonkey2.0b1-
(In reply to comment #161)
> Tested with Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US;
> rv:1.9.1b3pre) Gecko/20081207 Shredder/3.0b2pre and reply works. 
> But if i click on the sender in mail view and choose "compose mail to" in the
> dropdown menu, the address is still split into two addresses.

Just for the reference: I cannot confirm this with
Mozilla/5.0 (Windows; U; Windows NT 6.0; lt; rv:1.9.1b4pre) Gecko/20090414 Lightning/1.0pre Shredder/3.0b3pre
Flags: blocking-seamonkey2.0b1- → blocking-seamonkey2.0b1?
(In reply to comment #171)
> (In reply to comment #161)
> > Tested with Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US;
> > rv:1.9.1b3pre) Gecko/20081207 Shredder/3.0b2pre and reply works. 
> > But if i click on the sender in mail view and choose "compose mail to" in the
> > dropdown menu, the address is still split into two addresses.
> 
> Just for the reference: I cannot confirm this with
> Mozilla/5.0 (Windows; U; Windows NT 6.0; lt; rv:1.9.1b4pre) Gecko/20090414
> Lightning/1.0pre Shredder/3.0b3pre

Please clarify: Are you saying this bug is fixed or not?
(In reply to comment #172)
> (In reply to comment #171)
> > (In reply to comment #161)
> > > Tested with Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US;
> > > rv:1.9.1b3pre) Gecko/20081207 Shredder/3.0b2pre and reply works. 
> > > But if i click on the sender in mail view and choose "compose mail to" in the
> > > dropdown menu, the address is still split into two addresses.
> > 
> > Just for the reference: I cannot confirm this with
> > Mozilla/5.0 (Windows; U; Windows NT 6.0; lt; rv:1.9.1b4pre) Gecko/20090414
> > Lightning/1.0pre Shredder/3.0b3pre
> 
> Please clarify: Are you saying this bug is fixed or not?
Replying to an RFC 2047-encoded sender is fixed.

Right-clicking the sender address and choosing "Compose Mail To" is still broken: the new mail has two "To:" addresses in it.

Tested on MacOSX:
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b5pre) Gecko/20090506 Shredder/3.0b3pre
(In reply to comment #173)
> Right-click on sender address and choosing "Compose Mail To" is still broken,
> the new mail has two "to:"-addresses in it.

Ok, so please raise that in a separate bug as previously requested.
Flags: blocking-seamonkey2.0b1? → blocking-seamonkey2.0b1-
(In reply to comment #172)
Yes, it works fine for me. Including the scenario described in comment #173.
Could that be OS-specific? Or maybe it has regressed between those dates?
(In reply to comment #174)
> (In reply to comment #173)
> > Right-click on sender address and choosing "Compose Mail To" is still broken,
> > the new mail has two "to:"-addresses in it.
> 
> Ok, so please raise that in a separate bug as previously requested.
Ok, I raised a new bug: Bug 491832
marking in testsuite +
Flags: in-testsuite? → in-testsuite+
Please backport this patch to Thunderbird 2.0.0.x. Thunderbird 3 is still beta, and there already has been a testing period for this patch.

Conformance to an RFC should be enough of a reason to introduce this change into the stable branch.

Thanks a lot for the work on this issue; it is crucial for acceptance in corporate environments.
(In reply to comment #179)
> Please backport this patch to Thunderbird 2.0.0.x. Thunderbird 3 is still beta,
> and there already has been a testing period for this patch.
> 

3.0RC will come out next week. We just don't have the resources to backport and test for the next release of the 2.X branch.
Is the current status correct?
"VERIFIED FIXED"?
This is still failing under 3.0.1.
(In reply to comment #183)
> The actual status is correct?
> "VERIFIED FIXED"?
> Still failing under 3.0.1

Could you attach an example message please?
Hi!

Address in the header:
=?iso-8859-1?Q?L=E1zaro=2C_Nuria?= <nuria.lazaro@xxx.com>

Thunderbird decodes this as:

Lázaro, Nuria <nuria.lazaro@xxx.com> 

Parsing this string then fails: it identifies two addresses, "Lázaro" and "Nuria
<nuria.lazaro@xxx.com>"
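
The split above is what happens when a header is RFC 2047-decoded first and only then parsed for addresses. A minimal sketch of both orders using Python's standard `email` package (this is not Thunderbird's code; the helper names `decode_words` and `parse_then_decode` are hypothetical):

```python
from email.header import decode_header, make_header
from email.utils import getaddresses

raw = '=?iso-8859-1?Q?L=E1zaro=2C_Nuria?= <nuria.lazaro@xxx.com>'


def decode_words(value):
    """Decode any RFC 2047 encoded-words in value to a plain string."""
    return str(make_header(decode_header(value)))


def parse_then_decode(header_value):
    """Split into addresses first, then decode each display name."""
    return [(decode_words(name), addr)
            for name, addr in getaddresses([header_value])]


# Problematic order: decoding first exposes the comma to the address parser.
decode_then_parse = getaddresses([decode_words(raw)])

# Parsing first keeps the encoded-word atomic; the comma appears only afterwards.
correct = parse_then_decode(raw)

print(decode_then_parse)
print(correct)
```

Parsing before decoding treats the encoded-word as a single atom, which is the behaviour the bug title asks for.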
So sorry.

The computer I used to test my software had an old version of Thunderbird. After updating it to 3.0.1, like the other computers, it works OK. Please ignore comments 183 and 185.
Hi,

I'm using the (german) version 3.1.6 of Thunderbird on Windows XP.
The bug seems to be fixed when replying to a message by using the "Reply" button, the menu "Message/Reply" or the shortcut Ctrl+R.

BUUUT ....
Writing a new email with following procedure still fails:
1. Right-click the sender's address in a displayed email
2. Select the entry "Write email to" ("Nachricht verfassen an" in the German version)

I hope my comment helps to improve Thunderbird.
Just to confirm Ralf's previous comment.

I have the same problem with Thunderbird 3.1.10 on Ubuntu 10.10 (64-bit).

This bug should be reopened.
Problem still exists in Thunderbird 3.1.14 on openSUSE 11.4 64 bit
Mozilla/5.0 (X11; U; Linux x86_64; de; rv:1.9.2.22) Gecko/20110907 SUSE/3.1.14 Thunderbird/3.1.14

Right-clicking the "From" field of a displayed mail with the "From" header:

From: =?ISO-8859-1?Q?Dr=2E_J=F6rg_Schr=F6per=2C_LANline?= <llsec@lanline.de>

results in a message composition window with two recipients:
"Dr. Jörg Schröper" and "LANline <llsec@lanline.de>".

(That's the sender address of a public newsletter so I think posting the unsanitized address is ok.)
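
For the sending direction, the display name can be carried as a single encoded-word with the comma itself encoded (`=2C`), as in the header above. A small round-trip sketch with Python's standard library, shown only as an illustration and not as Thunderbird's encoder:

```python
from email.header import decode_header, make_header
from email.utils import formataddr, getaddresses

name = 'Dr. Jörg Schröper, LANline'
addr = 'llsec@lanline.de'

# formataddr() RFC 2047-encodes a non-ASCII display name as a whole,
# so no literal comma is left in the phrase.
encoded = formataddr((name, addr), charset='iso-8859-1')

# Round trip: parse the address list first, then decode the display name.
parsed = getaddresses([encoded])
decoded_name = str(make_header(decode_header(parsed[0][0])))

print(encoded)
print(decoded_name)
```

Because the comma never appears literally in the encoded header, a parser that splits on commas before decoding still sees one recipient.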
The problem still exists in 9.0.1 (Linux and Windows). Addresses with German umlauts like ä, ö, ü are displayed incorrectly in the main screen. In the message view 

I attached a screenshot to show the issue in detail.
To complete my comment:
In the message view the address is displayed correctly.
Attached image Screenshot
(In reply to Christian Reischl from comment #193)
> The problem still exists in 9.0.1 (Linux and Windows). Adresses with german
> umlauts like ä,ö,ü are displayed incorrectly in the main screen. In the
> message view 
> 
> I attached a screenshot to show the issue in detail.

I think that's likely to have been fixed by bug 669925 (although a slightly different issue, the fix would probably cover this as well).

If you can reproduce this in Thunderbird 10 beta, please file a new bug for it. Thunderbird 10 Beta is available here:

http://www.mozilla.org/en-US/thunderbird/channel/

Also note that it is generally not a good idea to comment on fixed/resolved bugs; such comments will likely get lost.