Closed Bug 1088975 Opened 10 years ago Closed 9 years ago

Answering mail with sendername containing encoded special chars and comma creates two "To"-entries

Categories

(Thunderbird :: Message Reader UI, defect)

38 Branch
defect
Not set
major

Tracking

(thunderbird38+ fixed, thunderbird39 fixed, thunderbird40 fixed, thunderbird41 fixed, thunderbird_esr31 unaffected)

RESOLVED FIXED
Thunderbird 41.0
Tracking Status
thunderbird38 + fixed
thunderbird39 --- fixed
thunderbird40 --- fixed
thunderbird41 --- fixed
thunderbird_esr31 --- unaffected

People

(Reporter: mail, Assigned: rkent)

References

Details

Attachments

(4 files, 1 obsolete file)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0
Build ID: 20141011015303

Steps to reproduce:

Recived mail from Swiftmailer with following encoded "From"-names:

#1
senderName = Test, Test2
From: "Test, Test2" <test345@example.org>

#2
senderName = Test, Tesät2
From: =?utf-8?Q?Test=2C_Tes=C3=A4t2?= <test345@example.org>

#3
senderName = Test3 ä,ä test4
From: Test3 =?utf-8?Q?=C3=A4=2C=C3=A4?= test4 <test345@example.org>


Answering to these emails
A: by clicking on the "Answer"-button
B: by right-clicking on the senders name/email and selecting from the context-menu "Compose message to"


Actual results:

Composer window opens

Cases #1 / A & B: One correct "To"-recipient prefilled
Cases #1 & 2 / A: One correct "To"-recipient prefilled

Cases #1 & 2 / B: Two "To"-recipient prefilled, e.g.:
Test
Tesät2 <test345@example.org>




Expected results:

Cases #1 / A & B: are totally fine.
Cases #1 & 2 / A: are totally fine.

Cases #1 & 2 / B: One correct "To"-recipient prefilled, e.g.:
Test, Tesät2 <test345@example.org>
-or-
"Test, Tesät2" <test345@example.org>
-numbering mixed up-

Actual results:

Composer window opens

Cases #1 / A & B: One correct "To"-recipient prefilled
Cases #2 & 3 / A: One correct "To"-recipient prefilled

Cases #2 & 3 / B: Two "To"-recipient prefilled, e.g.:
Test
Tesät2 <test345@example.org>




Expected results:

Cases #1 / A & B: are totally fine.
Cases #2 & 3 / A: are totally fine.

Cases #2 & 3 / B: One correct "To"-recipient prefilled, e.g.:
Test, Tesät2 <test345@example.org>
-or-
"Test, Tesät2" <test345@example.org>
This might be related to bug 1071476 (I'd call it a duplicate of that).

We need to separate "answering" from "creating" a message here, since the behaviour is different.

Case A above (answering by reply button, if address was RFC2047-encoded) earlier had a bug, which has been fixed meanwhile (bug 254519).

Case B (creating a new message by context menu) seems to be a new bug in MailNews and also appears in SeaMonkey. It is not related to RFC2047 encoding, it just needs a comma in the realname part. The same bug happens when creating a new message by other means, not by context menu, and on selecting mail addresses with commas in the realname part from the address book.

For more details on this new bug, see bug 1071476.
I also see this problem, in the latest version of TB 40 (trunk) and, more worryingly, in TB 38b (which will be the next ESR). See this on Windows 7 and OS X 10.10. Because of this bug one email I was sending to my colleges was send with the sender name "=?UTF-8?Q?H=c3=a4name?=", without any email address. Therefore it was sorted out into the spam folder.


STR: Go to "View settings for this account" -> And change the "Your name:".

Your name: Haeberle, Hugo
-->
Seen in TB send folder as: "From Haeberle <>, Me <email@address.com>"
Received in webmail client as ""Haeberle" (without any email address, no respond possible)"

Your name: Häberle, Hugo
-->
Seen in TB send folder as:  "From Häberle <>, Me <email@address.com>"
Received in webmail client  as ""=?UTF-8?Q?H=c3=a4berle?=" (without any email adress, no respond possible)"
 
Your name: Hugo Häberle
-->
Seen in TB send folder as: "From Me <email@address.com>"
Received in webmail client as ""Hugo Häberle" <email@address.com>"

"Häberle, Hugo" was set from autoconfig. So I think some more people could have a "Your name" with a comma.
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows 7 → All
Hardware: x86_64 → All
I see nothing specific in the error console.
Is there any reason why the TB version is set to 24? I expect this is new in TB 38, right?

Can someone please confirm that this is seen in TB 38 betas, and not seen in TB 31?
See Also: → 1130248
nomis101 was testingtb38b2 when he encountered this bug
Version: 24 → 38
(In reply to Kent James (:rkent) from comment #5)
> Is there any reason why the TB version is set to 24? I expect this is new in
> TB 38, right?

Ayke, what version were you using when you reported this? 

(FWIW Ayke couldn't have been on 38 - the nightly build on 2014-02-24 was 36.0a1)
This seems similar to bug 1130248. In the "Compose message to" case the quotes (") are removed from the sender name (even if the original message contained them) and then the compose window parses the address wrongly. It is supposed this is caused by the cleanup of decoding the full address in the new jsmime.
JoeS1 is proposing that we not ship further betas until this is resolved. I doubt we can do that, but let me at least bump the seriousness to motivate us. Someone needs to pick this up.
Severity: normal → major
(In reply to Wayne Mery (:wsmwk, use Needinfo for questions) from comment #7)
> (In reply to Kent James (:rkent) from comment #5)
> > Is there any reason why the TB version is set to 24? I expect this is new in
> > TB 38, right?
> 
> Ayke, what version were you using when you reported this? 
> 
> (FWIW Ayke couldn't have been on 38 - the nightly build on 2014-02-24 was
> 36.0a1)

If it was me who set TB version to 24, then I was still using Thunderbird 24 - german version
It kind of fits with the release cycle:
https://mozorg.cdn.mozilla.net/media/img/firefox/organizations/release-overview-high-res.9808ec366a94.png
The bug might be in mail/base/content/msgHdrViewOverlay.js

function SendMailToNode(...)
{
	...
	fields.to = addressNode.getAttribute("fullAddress");
	...
	params.composeFields = fields;
	...
	MailServices.compose.OpenComposeWindowWithParams(null, params);
}

"fullAddress" is like
params.composeFields.to = Test, Tesät2 <test345@example.org>
params.composeFields.to = Test3 ä,ä test4 <test345@example.org>

Call in MsgComposeCommands.js prepares new composer
gMsgCompose = MailServices.compose.initCompose(params, window, editorElement.docShell);

To work correctly it has to be like this (the address book sets the params like this):
params.composeFields.to = "Test, Tesät2" <test345@example.org>
params.composeFields.to = "Test3 ä,ä test4" <test345@example.org>


A change in mail/base/content/msgHdrViewOverlay.js might fix the bug

function SendMailToNode(...)
{
	...
	fields.to = emailAddressNode.getAttribute("emailAddress");
	if (fields.to != addressNode.getAttribute("fullAddress")) {
		// only if a name exists
		// syntax of "fullAddress" is wrong, included displayName with comma breaks params.composeFields.to for MailServices.compose.initCompose()
		fields.to = '"' + emailAddressNode.getAttribute("displayName") + '" <' + fields.to + '>';
	}
	...
}


A better fix might be a new addressNode attribute like "fullMailToAddress" which has set the correct syntax.
My opinion is not to change "fullAddress" as this is also used for displaying the name/email in the frontend and the name looks nicer without the quotation marks.
Fix not tested!

in mail/base/content/msgHdrViewOverlay.js:

function SendMailToNode(addressNode)
{
	...
	fields.to = addressNode.getAttribute("emailAddress");
	if (fields.to != addressNode.getAttribute("fullAddress")) {
		// only if a name exists
		// syntax of "fullAddress" is wrong, included displayName with comma breaks params.composeFields.to for MailServices.compose.initCompose()
		fields.to = '"' + addressNode.getAttribute("displayName") + '" <' + fields.to + '>';
	}
	...
}
Yes, that is what I am saying in bug 1130248. But we probably should not hardcode adding the quotes like that. Either we should put properly quoted address in the "fullAddress" in the first place (from the decoding functions). Or if the decoding functions are correct (per specification), then we need to fix the compose code to parse this unquoted address correctly.
Joshua, this is one of our likely-jsmime blockers. Could you give some comments?
Flags: needinfo?(Pidgeot18)
https://bugzilla.mozilla.org/show_bug.cgi?id=254519
Seems to have fixed this before the jsmime landing
So I see these ways to handle correctly name+address containing a comma for the use case "Compose message to":
- Add quotations in Mailbox.toString
- change params.composeFields.to to handle an array with seperated name & address
- integrate another addressNode-attribute like "fullMailToAddress" which has the correct RFC-encoding for passing on to MailServices.compose.OpenComposeWindowWithParams(...)



(In reply to :aceman from comment #14)
> Or if the decoding functions are correct (per specification),
> then we need to fix the compose code to parse this unquoted address
> correctly.

The decoding functions work perfect - name and addresses are displayed correctly in the message viewer.

But the encoding functions can't differentiate between a single address and multiple addresses in the string given as argument.
If the name with comma is enclosed by quotation marks it works.

A draw-back of quoted "fullAddress" is, that it does not look that nice when this is displayed somewhere. Therefore I would prefer a new attribute e.g. "fullMailToAddress".
(In reply to :aceman from comment #14)
> properly quoted address in the "fullAddress" in the first place

thunderbird-38.0b2
In mimeJSComponents.js repolace lines 209-214 by:

  // These are prototypes for nsIMsgHeaderParser implementation
  var Mailbox = {
    toString: function () {
      if (this.name) {
        if (this.name.indexOf(',')) {
          // Makes the handling work for names containing comma 
          return '"' + this.name + '" <' + this.email + '>';
        } else {
          return this.name + '" <' + this.email + '>';
        }
      } else {
        return this.email;
      }
    }
  };
Re-encoding the decoded "fullAddress" with the functions we have is hard to implement. These functions handle a string with multiple addresses as well as a string containing a single address.
Maybe pass-through a flag for the state of the address-string: single-address or multiple-addresses

So the function parseAddressingHeader(header, doRFC2047) in jsmime.js needs to call the function getHeaderTokens with the correct delimiters:
for single-address: ":;<>@"
for multiple-addresses: ":,;<>@" (in this case single addresses with comma will make trouble too)
The actual fix for this bug is dead simple:
This line:
const qpForbidden = "=?_()\"";

should be:
const qpForbidden = "=?_()\",";

Now I need to write a testcase for this change.
Flags: needinfo?(Pidgeot18)
When can you get this done? We really need this for the Thunderbird 38 betas.
Flags: needinfo?(Pidgeot18)
I've been looking at putting in place the tests for this. The hard part seems to be to make changes without having the encoding switch to ?B which is hard to interpret.
Assignee: nobody → rkent
I've implemented the suggestion in comment 20, and adjusted tests accordingly. I've got a try run going, I'll post the patch if that looks good. But I do have a concern.

Although I am by no means an expert on email encoding, it seems to me that the root issue here is that jsmime is not currently attempting to enforce character restrictions for rfc 2047 encoding that allow the encoded string to be treated as an atom. Specifically, RFC 5322 specifies the allowable characters in an atom as:

   atext           =   ALPHA / DIGIT /    ; Printable US-ASCII
                       "!" / "#" /        ;  characters not including
                       "$" / "%" /        ;  specials.  Used for atoms.
                       "&" / "'" /
                       "*" / "+" /
                       "-" / "/" /
                       "=" / "?" /
                       "^" / "_" /
                       "`" / "{" /
                       "|" / "}" /
                       "~"

Also in rfc 2047 section 5 (3) we have (which is even more restrictive):

(3) As a replacement for a 'word' entity within a 'phrase', for example,
    one that precedes an address in a From, To, or Cc header.  The ABNF
    definition for 'phrase' from RFC 822 thus becomes:

    phrase = 1*( encoded-word / word )

    In this case the set of characters that may be used in a "Q"-encoded
    'encoded-word' is restricted to: <upper and lower case ASCII
    letters, decimal digits, "!", "*", "+", "-", "/", "=", and "_"
    (underscore, ASCII 95.)>.  

We are not currently limiting the encoded characters per that requirement, but perhaps we should. The standard implies that the allowable characters depends on the context, but currently we do not allow different encoding in different contexts. So perhaps we should use the most restrictive set of characters possible?

I think for now we just need to go with the simpler patch, but how are these issues addressed in jsmime?
Sorry, I forgot to post the patch.
Attachment #8601711 - Flags: review?(Pidgeot18)
Status: NEW → ASSIGNED
I was testing this patch in a 38 build. It was fixing the ""=?UTF-8?Q?H=c3=a4berle?=" issue very nicely for me. But it seems to have no effect to the duplicated and empty "To" addresses. Do I need to apply an additional patch to make this working?
(In reply to Nomis101 from comment #25)
> I was testing this patch in a 38 build. It was fixing the
> ""=?UTF-8?Q?H=c3=a4berle?=" issue very nicely for me. But it seems to have
> no effect to the duplicated and empty "To" addresses. Do I need to apply an
> additional patch to make this working?

Nomis, That split doesn't seem to bother Gmail, 
Is your webmail client still broken after this patch ?
(In reply to Joe Sabash [:JoeS1] from comment #26)
> (In reply to Nomis101 from comment #25)
> > I was testing this patch in a 38 build. It was fixing the
> > ""=?UTF-8?Q?H=c3=a4berle?=" issue very nicely for me. But it seems to have
> > no effect to the duplicated and empty "To" addresses. Do I need to apply an
> > additional patch to make this working?
> 
> Nomis, That split doesn't seem to bother Gmail, 
> Is your webmail client still broken after this patch ?

For me it is still the same like in #3, but with the ""=?UTF-8?Q?H=c3=a4berle?=" issue fixed.
Nomis101: Which 38 build are you using exactly? Your picture says TB40, so a daily, or did you compile it yourself?
The processing of "=?UTF-8?Q?H=c3=a4berle?=" has been fixed in bug 1141446.
Can you please post the full header in question here (View > Message Source).
The number of quotes in comment #27 does not match: ""=?UTF-8?Q?H=c3=a4berle?=",
there is a closing quote and perhaps something else missing somewhere.
Attached file source code (obsolete) —
(In reply to Jorg K from comment #28)
> Nomis101: Which 38 build are you using exactly? Your picture says TB40, so a
> daily, or did you compile it yourself?
> The processing of "=?UTF-8?Q?H=c3=a4berle?=" has been fixed in bug 1141446.
> Can you please post the full header in question here (View > Message Source).
> The number of quotes in comment #27 does not match:
> ""=?UTF-8?Q?H=c3=a4berle?=",
> there is a closing quote and perhaps something else missing somewhere.

This is the full message source of all three cases. Interesting, that for the last case TB changed the content-type and content-encoding. Is this expected?
Because there is no build with the attached patch applied, I was building my own version of TB38, after applying this patch.
Thanks for pointing me to Bug 1141446. I did my tests with a build before this checked in. I think it is now worth to try again.
Your headers are:
From: =?UTF-8?Q?Hugo_H=c3=a4berle?= <hugo_haeberle@web.de>
From: Hugo , =?UTF-8?Q?Hugo_H=c3=a4berle?= <hugo_haeberle@web.de>
From: Hugo , Haeberle <hugo_haeberle@web.de>
There are no quotes anywhere, so it has nothing to do with bug 1141446.
Looking at the messages next. Next time, please include the EML files, now I have to cut up the text file and do it myself.
(In reply to Jorg K from comment #30)
> There are no quotes anywhere, so it has nothing to do with bug 1141446.
Yes, tested with a brand new build, still the same.
Attached file Three test messages
Test 1: All working, only the body shows Hugo H�berle, since it's not properly encoded.
Test 2 and Test 3: Yes, reproducing the problem but only because the headers are illegal. You're missing the quotes. I fixed them.
Test 3 now reads:
From: "Hugo , Haeberle" <hugo_haeberle@web.de> and works.
Test 2 now reads (with quotes, now you need bug 1141446):
From: "Hugo , =?UTF-8?Q?Hugo_H=c3=a4berle?=" <hugo_haeberle@web.de> and also works.

In conclusion: It will all be working if you add the quotes and use the fix here and the fix to bug 1141446.
Attachment #8603868 - Attachment is obsolete: true
Actually, I correct myself. Add the quotes and the fix to bug 1141446.
This bug here has nothing to do with your test cases. The bug here is about the comma in the RFC2047 encoded string, something like =?UTF-8?B?IlRlc3QzIMOkLMOkIHRlc3Q0Ig==?= (look in attachment 8596604 [details] for more). You can decode online, e.g. at www.jorgk.com/decode
(In reply to Jorg K from comment #30)
> Looking at the messages next. Next time, please include the EML files, now I
> have to cut up the text file and do it myself.

You told me to do so:
(In reply to Jorg K from comment #28)
> Can you please post the full header in question here (View > Message Source).


(In reply to Jorg K from comment #32)
> Test 2 and Test 3: Yes, reproducing the problem but only because the headers
> are illegal. You're missing the quotes. I fixed them.
I did not remove any quotes. It is just copy and paste.
Comment on attachment 8601711 [details] [diff] [review]
Include comma with forbidden

Philipp, if Joshua does not get to this could you take a look? Thanks.
Attachment #8601711 - Flags: review?(philipp)
This issue still exists and is a blocker. The review should be fairly easy, this just implements the solution proposed by Joshua and adds some tests as requested.

Fallen or Magnus, could one of you look at this and review if Joshua does not get to it?  Thanks.
Flags: needinfo?(philipp)
Flags: needinfo?(mkmelin+mozilla)
Comment on attachment 8601711 [details] [diff] [review]
Include comma with forbidden

Review of attachment 8601711 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/mime/jsmime/jsmime.js
@@ +2832,5 @@
>  /// The beginnings of RFC 2047 encoded-word
>  const b64Prelude = "=?UTF-8?B?", qpPrelude = "=?UTF-8?Q?";
>  
>  /// A list of ASCII characters forbidden in RFC 2047 encoded-words
> +const qpForbidden = "=?_()\",";

I wonder if , is all that we should forbid. I looked at RFC2047, and the only case where comma is not allowed is section 5 part 3, where nothing other than upper and lower case ASCII letters, decimal digits, "!", "*", "+", "-", "/", "=", and "_" is allowed.

If this is not here because of RFC2047, I'd suggest adding a comment why the comma is added.
Attachment #8601711 - Flags: review?(philipp)
Attachment #8601711 - Flags: review?(Pidgeot18)
Attachment #8601711 - Flags: review+
Leaving needinfo for Joshua regarding the extra characters, but I think we can push this to beta even without.
Flags: needinfo?(philipp)
Flags: needinfo?(mkmelin+mozilla)
(In reply to Philipp Kewisch [:Fallen] from comment #37)
> > +const qpForbidden = "=?_()\",";
> 
> I wonder if , is all that we should forbid. I looked at RFC2047, and the
> only case where comma is not allowed is section 5 part 3, where nothing
> other than upper and lower case ASCII letters, decimal digits, "!", "*",
> "+", "-", "/", "=", and "_" is allowed.

Right, I also wondered about that. It seems like there is a definition of a limited character set that we are only partly implementing. I'm just reluctant to make any change that is not absolutely necessary this late in the game.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Flags: needinfo?(Pidgeot18)
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 41.0
You need to log in before you can comment on or make changes to this bug.