Closed Bug 1658361 Opened 2 years ago Closed 2 years ago

"Edit as new message" not working of addressing header contains raw UTF-8, was: Non-ASCII characters in e-mail address don't survive round trip of "save draft" and "edit draft"

Categories

(Thunderbird :: Message Compose Window, defect, P2)

Tracking

(thunderbird_esr78 fixed, thunderbird80 fixed)

RESOLVED FIXED
81 Branch
Tracking Status
thunderbird_esr78 --- fixed
thunderbird80 --- fixed

People

(Reporter: jorgk-bmo, Assigned: jorgk-bmo)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(1 file, 2 obsolete files)

Gene detected this in bug 1563891 and tracked it down to this regression range:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f83f2771414cf5938a46f44e1b11cdcd5181ea0f&tochange=192e0e33eb597e8d923eb89f6d49bf42654e9d11
See bug 1563891 comment #35

STR:
Compose a message to
a) öö@öö.com
b) jøran@blåbærsyltetøy.gulbrandsen.priv.nǿ
Save draft, edit draft.
a) Recipient lost
b) Recipient corrupted: j<?>ran@bl<?>b<?>rsyltet<?>y.gulbrandsen.priv.n<?> (where <?> is the UTF-8 replacement character that BMO doesn't display)

You will notice that in case a) there is no To: header at all, in case b) we have To: jøran@blåbærsyltetøy.gulbrandsen.priv.nǿ in the message encoded in UTF-8.

Gene tracked it down to a truncation that happens from a "high bit" UTF-16 character like ǿ which is U+01FF, down to U+00FF in the JS engine, so that smells like something running through an IDL-defined API of type string. Apparently moving some of those APIs in attachment 9169034 [details] [diff] [review] of the said bug doesn't help. I noticed that the recipients are all AString in nsIMsgCompFields.idl, so hard to tell where the information loss happens.

The round trip was working in TB 68. Whilst this is not a terrible issue for TB 78.x, it's blocking further development in bug 1563891.

Flags: needinfo?(alessandro)

(In reply to Jorg K (CEST = GMT+2) from comment #0)

Gene detected this in bug 1563891 and tracked it down to this regression range:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f83f2771414cf5938a46f44e1b11cdcd5181ea0f&tochange=192e0e33eb597e8d923eb89f6d49bf42654e9d11
See bug 1563891 comment #35

Actually, the reporter of bug 1563891 Alessandro V. originally found the problem.

STR:
Compose a message to
a) öö@öö.com
b) jøran@blåbærsyltetøy.gulbrandsen.priv.nǿ
Save draft, edit draft.
a) Recipient lost
b) Recipient corrupted: j<?>ran@bl<?>b<?>rsyltet<?>y.gulbrandsen.priv.n<?> (where <?> is the UTF-8 replacement character that BMO doesn't display)

You will notice that in case a) there is no To: header at all, in case b) we have To: jøran@blåbærsyltetøy.gulbrandsen.priv.nǿ in the message encoded in UTF-8.

I tried with both a) and b) as two (2) TO fields and see them both on edit after saving the draft. Same with just a single TO of a) or b). Of course the non-ascii chars are replaced with <?> as you show. This is with about Jul 26 trunk and with your patch attachment 9169034 [details] [diff] [review] so that might matter.

Gene tracked it down to a truncation that happens from a "high bit" UTF-16 character like ǿ which is U+01FF, down to U+00FF in the JS engine, so that smells like something running through an IDL-defined API of type string. Apparently moving some of those APIs in attachment 9169034 [details] [diff] [review] of the said bug doesn't help. I noticed that the recipients are all AString in nsIMsgCompFields.idl, so hard to tell where the information loss happens.

AFAICT, the problem is not exclusively caused by the high-byte getting chopped which only affect chars with code point U+01FF and higher. I see this getting called twice: https://searchfox.org/comm-central/rev/085526b22865ea90738d779fa14bf711d46f69d3/mailnews/mime/jsmime/jsmime.js#684. The first call returns properly encoded UTF-16 (converted from UTF-8). The 2nd call receives UTF-16 as input and fails when trying to convert any non-ascii from UTF-8 to UTF-16 resulting in the <?> chars displayed. The 2nd (redundant?) call is not a problem when the address to convert to UTF-16 is pure ascii.

The round trip was working in TB 68. Whilst this is not a terrible issue for TB 78.x, it's blocking further development in bug 1563891.

Yes, the fix for bug 1563891 (support SMTPUTF8) allows the recipient address to be non-ascii which reveals the problem.

Gene, maybe you're still around today. I took a look here:
https://hg.mozilla.org/comm-central/rev/019eaf0f9cf72621a436ce06b2607bc5a3532fae#l6.264
Can you revert this to the original, like restore the, for example, let msgReplyTo = msgCompFields.replyTo. I can't see why the parseEncodedHeader() call is necessary when it wasn't necessary before. It will lead to decoding raw UTF-8 from a header (encoded header) to a UTF-16 JS string. But I assume msgCompFields.replyTo is already a UTF-16 JS string, so why decode and parse it again? However, should parsing be required at that point, try parseDecodedHeader() instead since the JS string already has a decoded header and no longer an UTF-8 header in it.

So two options: Original code, or parseDecodedHeader().

I can try it in the morning unless you get there first.

Looking at the context, it looks like parsing is necessary if you have multiple To: or Cc/Bcc: recipients, see the old code here:
https://hg.mozilla.org/comm-central/rev/019eaf0f9cf72621a436ce06b2607bc5a3532fae#l6.294
So I guess, switching to parseDecodedHeader() is the go here. There are intricacies of transporting UTF-8 strings as single bytes in UTF-16 JS strings and it's important to use the right parse function. There's actually not much difference between them until you reach one of those UTF-8/UTF-16 points where you need to know exactly what the string contains since interpreting a JS string (UTF-16) as raw UTF-8 is fatal. So it looks like the "2nd call" you've described is the culprit and I bet you'll find that it's gone after the change I suggested.

For more confusing reading I suggest bug 1390337 ;-)

Attached patch fixes-reflected-addresses.diff (obsolete) — Splinter Review

Yes, I think you found the problem! Work fine now with just one call to the convert8BitHeader(). One of my earlier attempt to fix the problem was to change this https://searchfox.org/comm-central/rev/45c9a2810df38a7d8b4a9a396e244a9b438f3729/mailnews/mime/src/mimeParser.jsm#235 so that the call to convert8BitHeader() is skipped when the flag indicates an address. But that's what using the calls to parseDecodedHeader() instead of parseEncodedHeader() does so it looks like the correct fix. (And my fix failed under some conditions.)

Also, with this change not seeing the chopped off upper byte for chars like U+01FF. Didn't test with all the possible field types but OK with TO (including multiple TOs) and BCC.

Assignee: nobody → gds
Assignee: gds → nobody

Comment on attachment 9169356 [details] [diff] [review]
fixes-reflected-addresses.diff

Thanks, Gene.

Flags: needinfo?(alessandro)
Attachment #9169356 - Flags: review+
Assignee: nobody → gds
Status: NEW → ASSIGNED
Target Milestone: --- → 81 Branch

Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/345cbe13bb71
Correct use of header parsing function in CompFields2Recipients() after bug 440377. r=jorgk DONTBUILD

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Attached patch 1658361-fix-header-parsing.patch (obsolete) — Splinter Review

[Approval Request Comment]
Regression caused by (bug #): Bug 440377
User impact if declined: "Edit Draft" or "Edit as new message" will result in garbled recipients if recipients field contains raw UTF-8 according to https://tools.ietf.org/html/rfc6532.
Testing completed (on c-c, etc.): Yes.
Risk to taking this patch (and alternatives if risky): This patch corrects a coding error which slipped through the review of bug 440377. Having this fix is also a pre-condition of getting bug 1563891 implemented.

Attachment #9169356 - Attachment is obsolete: true
Attachment #9169375 - Flags: review+
Attachment #9169375 - Flags: approval-comm-esr78?
Attachment #9169375 - Flags: approval-comm-beta?
Summary: Non-ASCII characters in e-mail address don't survive round trip of "save draft" and "edit draft" → "Edit as new message" not working of addressing header contains raw UTF-8, was: Non-ASCII characters in e-mail address don't survive round trip of "save draft" and "edit draft"
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #9169375 - Flags: approval-comm-esr78?
Attachment #9169375 - Flags: approval-comm-beta?

Sorry about that. I did take a look, but obviously I missed it (in a sea of orange) :-( - I can't see it on debug runs.

OK, so the issue is more difficult, parseEncodedHeader() is the wrong function since it will consider the input as raw UTF-8 string and go through here https://searchfox.org/comm-central/rev/51b7d1ea0a96a8c4d52812ac3e27360b2da4b91c/mailnews/mime/src/mimeParser.jsm#238 which is undesired. But the test shows that we want RFC 2047 decoding. I'll have to think of something.

Assignee: gds → jorgk-bmo

Actually, I see it now. Gene, instead of using parseEncodedHeader() use parseEncodedHeaderW() - W for wide string, so we pass in a wide UTF-16 string. That will work.

Comment on attachment 9169523 [details] [diff] [review]
1658361-fix-header-parsing.patch

Looks good. I tested against the original problem and still fixes it.

Attachment #9169523 - Flags: feedback?(gds) → feedback+

Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/27513002dc77
Correct use of header parsing function in CompFields2Recipients() after bug 440377. r=jorgk

Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED

Comment on attachment 9169523 [details] [diff] [review]
1658361-fix-header-parsing.patch

[Approval Request Comment]
Regression caused by (bug #): Bug 440377
User impact if declined: "Edit Draft" or "Edit as new message" will result in garbled recipients if recipients field contains raw UTF-8 according to https://tools.ietf.org/html/rfc6532.
Testing completed (on c-c, etc.): Yes.
Risk to taking this patch (and alternatives if risky): This patch corrects a coding error which slipped through the review of bug 440377. Having this fix is also a pre-condition of getting bug 1563891 implemented.

Attachment #9169523 - Flags: approval-comm-esr78?
Attachment #9169523 - Flags: approval-comm-beta?

Comment on attachment 9169523 [details] [diff] [review]
1658361-fix-header-parsing.patch

[Triage Comment]
Approved for beta

Attachment #9169523 - Flags: approval-comm-beta? → approval-comm-beta+

Comment on attachment 9169523 [details] [diff] [review]
1658361-fix-header-parsing.patch

[Triage Comment]
Approved for esr78 (hasn't gone through beta, but simple enough)

Attachment #9169523 - Flags: approval-comm-esr78? → approval-comm-esr78+
You need to log in before you can comment on or make changes to this bug.