Correctly fix Bug 1023285 and utf8 addressing header display for jsmime.

RESOLVED FIXED in Thunderbird 67.0

Status

defect
RESOLVED FIXED
4 months ago
3 months ago

People

(Reporter: alta88, Assigned: alta88)

Tracking

(Blocks 1 bug)

unspecified
Thunderbird 67.0
Dependency tree / graph

Thunderbird Tracking Flags

(thunderbird_esr6067+ fixed, thunderbird66 fixed, thunderbird67 fixed)

Details

Attachments

(3 attachments, 8 obsolete attachments)

Assignee

Description

4 months ago

Bug 1023285 fixed the display of not RFC2047 encoded utf8 in addressing headers in message pane, but subsequently broke gloda results/facet display and new message tooltip summary (hovering a folder tooltip). Gloda wasn't evident, since messages with RFC6532 utf8 headers (only stored by feeds, so far), were broken due to Bug 1522156 and fixed in Bug 1423487.

The correct fix is to ensure jsmime via callers of parseHeadersWithArray() gets the binary string it wants. The only place it isn't getting that is from the headers streamed by libmime into the header sink eventually processed by OutputEmailAddresses(). The prior fix needs to be reverted to allow the RAW option again.

Assignee

Comment 1

4 months ago
Posted patch encoding.patch (obsolete) — Splinter Review

This is the lowest risk fix, but possibly the better thing to do is to ensure libmime doesn't stream utf8 for pretty display but rather the binary string so the header parsing can be done upstream, in parseHeadersWithArray() and jsmime.

Assignee: nobody → alta88
Attachment #9044425 - Flags: feedback?(jorgk)
Assignee

Updated

4 months ago
Blocks: 1023285
Assignee

Updated

4 months ago
Blocks: 1526985

Comment 2

4 months ago

Hmm, I'm being thrown in here at the deep end :-(

Can you expand your comments a bit, where is libmime streaming anything? And what do you mean by "binary string"? It's all binary ;-)

Looking at the proposed change, you're replacing

-    let value = MimeParser.parseHeaderField(aHeader || "",
-                                            MimeParser.HEADER_ADDRESS |
-                                            MimeParser.HEADER_OPTION_DECODE_2231 |
-                                            MimeParser.HEADER_OPTION_DECODE_2047,
-                                            undefined);
-    let allAddresses = fixArray(value, false);

with a call to parseEncodedHeader which is essentially the same thing but with other flags:

    let value = MimeParser.parseHeaderField(aHeader,
      MimeParser.HEADER_ADDRESS | MimeParser.HEADER_OPTION_ALL_I18N, aCharset);  // aCharset would be undefined.
    return fixArray(value, aPreserveGroups, count);

So essentially your just adding HEADER_OPTION_ALLOW_RAW to the call, right?

Assignee

Comment 3

4 months ago

(In reply to Jorg K (GMT+1) from comment #2)

Hmm, I'm being thrown in here at the deep end :-(

Can you expand your comments a bit, where is libmime streaming anything? And what do you mean by "binary string"? It's all binary ;-)

Looking at the proposed change, you're replacing

-    let value = MimeParser.parseHeaderField(aHeader || "",
-                                            MimeParser.HEADER_ADDRESS |
-                                            MimeParser.HEADER_OPTION_DECODE_2231 |
-                                            MimeParser.HEADER_OPTION_DECODE_2047,
-                                            undefined);
-    let allAddresses = fixArray(value, false);

with a call to parseEncodedHeader which is essentially the same thing but with other flags:

    let value = MimeParser.parseHeaderField(aHeader,
      MimeParser.HEADER_ADDRESS | MimeParser.HEADER_OPTION_ALL_I18N, aCharset);  // aCharset would be undefined.
    return fixArray(value, aPreserveGroups, count);

So essentially your just adding HEADER_OPTION_ALLOW_RAW to the call, right?

Right, this is just a reversion of the (your) change in Bug 1023285. What is required for jsmime is RAW meaning it will decode a raw octet binary string, but if it gets utf8, it will mojibake.
https://searchfox.org/comm-central/rev/b1306c2db4010cae30c8b0eec3f94e1ae5c3e272/mailnews/mime/jsmime/jsmime.js#1460

libmime streams headers for display to messageHeaderSink.processHeaders here
https://searchfox.org/comm-central/rev/b1306c2db4010cae30c8b0eec3f94e1ae5c3e272/mail/base/content/msgHdrView.js#461
in this function:
https://searchfox.org/comm-central/rev/b1306c2db4010cae30c8b0eec3f94e1ae5c3e272/mailnews/mime/emitters/nsMimeHtmlEmitter.cpp#161

With the current RAW flag removed, there is mojibake in a gloda search results From field and new messages folder tooltip. Since jsmime doc says it expects an octet string for RAW we should ensure that's what it gets. The only place it doesn't is messagepane display headers.

Anyway, rather than digging into all that, I'll attach a test file and you can see. Bottom line is that the libmime emitted values are already set for display and are therefore not RAW, so they need to be converted to RAW if we're going to use the parseHeadersWithArray() function, which we need to do currently in several places.

Assignee

Comment 4

4 months ago

So to subscribe to this file, first enable file:// urls. In console issue:

FeedUtils._validSchemes.push("file");

then just subscribe to it using a file url (win) like

file:///C/.../feedFromTestAtom.xml

It used to be and maybe still is possible, on win, to just dnd from file explorer onto a feed account in folderpane.. You can easily see what happens with/without the patch; the debug will show the difference. To change the strings and test, also change the <id> tag value (to not be a duplicate) and get new messages.

Comment 5

4 months ago
Comment on attachment 9044425 [details] [diff] [review]
encoding.patch

So if you're reverting my change from bug 1023285, you'll regress that bug (which I had forgotten about), no? Why would that be OK? The sample e-mail in that bug uses "raw" UTF-8 in the e-mail headers which is OK according to https://tools.ietf.org/html/rfc6532#section-3.2 as you pointed out yourself over in that bug.

Well, strangely enough, I applied the patch and lo and behold, the Greek stuff still displays OK? How come? Since in a later bug we switched libmime to use jsmime for headers as well? (https://hg.mozilla.org/comm-central/rev/32170e6fb298#l1.38) - Or how do you explain that the regression doesn't happen?

I'm still lost in the terminology here. What is "a raw octet binary string"? I think in bytes and encodings, so the way a byte string is interpreted as characters of a charset based on an encoding.

In the test you added, clearly the source file is encoded in UTF-8, so "David Håsäther" is a byte-sequence which represent the UTF-8 encoding of those characters. You can even more clearly see it in the Greek example, Π is CE A0 in UTF-8.

In summary: I'm OK with backing out my incorrect fix which happened at TB 45(!!) if we can understand why it doesn't cause a problem today.
Attachment #9044425 - Flags: feedback?(jorgk) → feedback+

Comment 6

4 months ago

No, not related to rev/32170e6fb298#l1.38 I quoted above.

Assignee

Comment 7

4 months ago

(In reply to Jorg K (GMT+1) from comment #5)

Comment on attachment 9044425 [details] [diff] [review]
encoding.patch

So if you're reverting my change from bug 1023285, you'll regress that bug
(which I had forgotten about), no? Why would that be OK? The sample e-mail
in that bug uses "raw" UTF-8 in the e-mail headers which is OK according to
https://tools.ietf.org/html/rfc6532#section-3.2 as you pointed out yourself
over in that bug.

Well, strangely enough, I applied the patch and lo and behold, the Greek
stuff still displays OK? How come? Since in a later bug we switched libmime
to use jsmime for headers as well?
(https://hg.mozilla.org/comm-central/rev/32170e6fb298#l1.38) - Or how do you
explain that the regression doesn't happen?

The backout would cause a regression for messagepane, but is fixed with

  • emailAddresses = unescape(encodeURIComponent(emailAddresses || ""));

The backout fixes the regression to gloda search display; it already sends the right input to parseHeadersWithArray().

I'm still lost in the terminology here. What is "a raw octet binary string"?
I think in bytes and encodings, so the way a byte string is interpreted as
characters of a charset based on an encoding.

In the test you added, clearly the source file is encoded in UTF-8, so
"David Håsäther" is a byte-sequence which represent the UTF-8 encoding of
those characters. You can even more clearly see it in the Greek example, Π
is CE A0 in UTF-8.

In summary: I'm OK with backing out my incorrect fix which happened at TB
45(!!) if we can understand why it doesn't cause a problem today.

If you look at the debug, you'll see the utf8 first, then the encoding is done, then the raw octet string.
parseHeadersWithArray: aHeader1 - <«ΠΟΛΙΤΗΣ»>
parseHeadersWithArray: aHeader2 - <«Π���Τ�Σ»>
or in binary strings:
console.info: "parseHeadersWithArray: aHeader1 - <\xAB\u03A0\u039F\u039B\u0399\u03A4\u0397\u03A3\xBB>"
console.info: "parseHeadersWithArray: aHeader2 - <\xC2\xAB\xCE\xA0\xCE\x9F\xCE\x9B\xCE\x99\xCE\xA4\xCE\x97\xCE\xA3\xC2\xBB>"

parseHeadersWithArray() wants the second encoding, \xCE\xA and not \u03A0 (the utf8 for Π).

If this is clear enough, feel free to remove debug and land..

Comment 8

4 months ago

Sure, the "raw" UTF-8 encoding is \xCE\xA, \u03A0 is UTF-16 used internally in strings.

The backout would cause a regression for messagepane, but is fixed with emailAddresses = unescape(encodeURIComponent(emailAddresses || ""));

Damn, I didn't see that code between the two debug lines :-(

It's clear enough now. Why didn't you come forward with this solution back in bug 1023285?

Assignee

Comment 11

4 months ago

I didn't make the comment, it would require tracing the callers and determining what the input is to verify that it's only dealing with already decoded headers ready for display and just need final prettifying. I would assume, if the arg is utf8, it's not encoded to the raw string.

Comment 12

4 months ago
Posted patch encoding.patch (JK 1) (obsolete) — Splinter Review

I've looked at this for a while.

I don't understand how unescape(encodeURIComponent(emailAddresses || "")); works. encodeURIComponent percent-encodes all the non-ASCII characters, and unescape then reverts that. Its documentation says to use decodeURIComponent which would be the exact reversal, I assume. I can only imagine that unescape returns UTF-8, but that wouldn't make any sense in the JS world.

The problems seems to be that we have an e-mail address stored in a JS string which is already decoded, so running it through parseHeadersWithArray which takes encoded headers, incl. raw UTF-8 is unfortunate.

So we can either convert back to UTF-8 or use the correct function. Here's my first version that encoded into UTF-8.

Attachment #9045186 - Flags: feedback?(alta88)

Comment 13

4 months ago

I have to correct my previous comment. At that call site we have a potentially RFC2047 encoded header bu tin UTF-16 already, so the decoding from "raw" UTF-8 has already happened elsewhere.

The "correct" function to use doesn't exist apparently, so I tried to create one. But that still doesn't work, I get mojibake in the header pane.

So the only way forward I see is Alta's original patch or my alternative which is a little clearer. Or maybe I have missed something and the "correct" function can be created.

Comment 14

4 months ago
Posted patch encoding.patch (JK 1b) (obsolete) — Splinter Review

OK, while we're here, we can remove use of a deprecated function with a clumsy API.

Attachment #9045186 - Attachment is obsolete: true
Attachment #9045186 - Flags: feedback?(alta88)
Attachment #9045195 - Flags: feedback?(alta88)

Comment 15

4 months ago
Posted patch encoding.patch (JK 1c) (obsolete) — Splinter Review

Comment tweak.

Attachment #9045195 - Attachment is obsolete: true
Attachment #9045195 - Flags: feedback?(alta88)
Attachment #9045196 - Flags: feedback?(alta88)

Comment 16

4 months ago
Posted patch encoding.patch (JK 2b) (obsolete) — Splinter Review

OK, I got the second version working as well. This creates the "correct" function to use and avoids re-encoding in UTF-8.

Attachment #9045190 - Attachment is obsolete: true
Attachment #9045240 - Flags: feedback?(alta88)

Comment 17

4 months ago

This passes:
mach xpcshell-test comm/mailnews/mime/test/unit/
mach xpcshell-test comm/mailnews/mime/jsmime/test/
with v2, didn't try v1, but that should pass, too.

Assignee

Comment 19

4 months ago
Comment on attachment 9045240 [details] [diff] [review]
encoding.patch (JK 2b)

Ok, so the benefit of (all) this as opposed to the one liner is that it avoids an unnecessary conversion since utf8 is already sent up from libmime. But this is also why I said in comment 1 that the better place to look would be libmime, and output the 2047 encoded and raw string to its consumer (jsmime), thus offloading conversion to utf8 from libmime to jsmime. Something obviously never checked when implementing jsmime. 

Your version adds an extra function for one caller. Is that idl necessary?  On the plus side, it does get rid of one deprecated function (though there are several remaining uses), but this is tangential to the central issue and probably should be done separately with all the others.

You're confused by the output of unescape(encodeURIComponent()), a very standard and very hardened func, because devtools console. It shows like in comment 7 in stdout if you start from a shell command line. I think I'd almost rather go with the basically meaningless double conversion of these very short strings in a single message display situation than cargo cult yet another jsmime hack and create more idl.

Also:
https://developer.mozilla.org/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding#The_Unicode_Problem

But if you really want to go this way, f+/r+.

Also, if you're touching OutputEmailAddresses() to remove the deprecated function, perhaps better to use
```
for (let addr of addresses}
  ...
  address.emailAddress = addr.email;
  ...
```
instead of
```
while (index < numAddresses)
```
Attachment #9045240 - Flags: feedback?(alta88) → feedback+
Assignee

Updated

4 months ago
Attachment #9045240 - Flags: feedback+
Assignee

Updated

4 months ago
Attachment #9045196 - Flags: feedback?(alta88)
Assignee

Updated

4 months ago
Attachment #9045240 - Flags: feedback+

Comment 20

4 months ago
Posted patch encoding.patch (JK 2c) (obsolete) — Splinter Review

Thanks, let's go with this. Tweaked the comments a bit and changed the name of the new function.

Watch the next patch where I will replace EncodedHeader(NS_ConvertUTF16toUTF8(xxx));.

There really is no need to crazy back and forward conversions.

Attachment #9044425 - Attachment is obsolete: true
Attachment #9045196 - Attachment is obsolete: true
Attachment #9045240 - Attachment is obsolete: true
Attachment #9045379 - Flags: review+

Comment 21

4 months ago

Here's part two which makes use of the new goodness ;-)

Attachment #9045384 - Flags: review?(alta88)
Assignee

Comment 22

4 months ago
Comment on attachment 9045384 [details] [diff] [review]
1528496-EncodedHeaderW.patch

nice, doesn't feel as bad if use is made of that idl.. it feels like there would be many more instances of this multiple encoding/decoding in libmime and elsewhere. anyway, this sort of thing should have a try, no?  r+ with that.
Attachment #9045384 - Flags: review?(alta88) → review+

Comment 23

4 months ago

https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=56cd72c675dd09b3f9731e33ff38c885a22ec113

I'll buy you a beer if this fails. All we cut out it this expensive conversion back to UTF-16 where we started:
https://searchfox.org/comm-central/rev/508fdc93f1f9612d56587ad653acde51d67fe81e/mailnews/mime/src/mimeParser.jsm#234
https://searchfox.org/comm-central/rev/508fdc93f1f9612d56587ad653acde51d67fe81e/mailnews/mime/jsmime/jsmime.js#574

Converting between UTF-16 and UTF-8 in C++/RS code is fairly cheap, but this looks expensive in JS and is totally unnecessary, especially converting it from UFT-16 to UTF-8 in JS and then back in JS which we killed in the first part.

Comment 24

4 months ago

Hmm, the Xpcshell was green, but MozMill got hit by some bustage. Here we go again:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=5f446150e4d51be774c0759773010fdf7befa775

Comment 25

4 months ago

OK, I owe you a beer now ;-) - MozMill completely obliterated with a variety of very strange errors, like:
EXCEPTION: this._treeElement is undefined
EXCEPTION: this._rowMap is null
EXCEPTION: parent.messengerBundle is null
EXCEPTION: aController.messageDisplay is null

Really strange.

Comment 26

4 months ago

Aha, reviewer didn't catch this one ;-)
JavaScript error: chrome://messenger/content/msgHdrView.js, line 1203: SyntaxError: missing ) after condition

Comment 28

4 months ago
Posted patch encoding.patch (JK 2d) (obsolete) — Splinter Review

Here with the while changed to for and unused index removed. Sigh, we should run the linter locally.

https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=f913ea40abd7eedc70154c99d37f1469aceb04d4

Attachment #9045379 - Attachment is obsolete: true

Updated

4 months ago
Attachment #9045589 - Flags: review+

Comment 30

4 months ago

Fourth time lucky ;-)

Assignee

Comment 31

4 months ago

(In reply to Jorg K (GMT+1) from comment #26)

Aha, reviewer didn't catch this one ;-)
JavaScript error: chrome://messenger/content/msgHdrView.js, line 1203: SyntaxError: missing ) after condition

sorry, but the onus is on the author to compile and at least start the thing and select a message before submitting for review, and the reviewer should expect that has been done. plus, it was conditional on a try anyway, since i detected a bit of gunslinging.. ;D

(In reply to Jorg K (GMT+1) from comment #30)

Fourth time lucky ;-)

first time lucky is called skill. ;)

Comment 32

4 months ago

Well, part 2 was right first time, but then, I had a compiler at my disposal. Apparently linting helps in JS, but I didn't do it. Someone stole my landing spot, so I'll land that tonight.

Comment 33

4 months ago

Oh, what does "gunslinging" mean in this context?

Assignee

Comment 34

4 months ago

https://en.wikipedia.org/wiki/Gunfighter

these days, basically someone who shoots first, then runs a try.

Assignee

Updated

4 months ago
Blocks: 574548

Comment 35

4 months ago

Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/99ea0ca93b9b
Back out bug 1023285 (rev 1ecabc4781e3) and fix handling of raw UTF-8 in headers properly. r=jorgk
https://hg.mozilla.org/comm-central/rev/1a4da3cb06f1
Introduce EncodedHeaderW() using ParseEncodedHeaderW(). r=alta88

Status: NEW → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED

Comment 36

4 months ago

Should we backport this? To ESR?

Target Milestone: --- → Thunderbird 67.0

Comment 37

4 months ago
Comment on attachment 9045603 [details] [diff] [review]
encoding.patch (JK 2e)

Surely a beta backport won't hurt here.
Attachment #9045603 - Flags: review+
Attachment #9045603 - Flags: approval-comm-beta+

Updated

3 months ago
Attachment #9045603 - Flags: approval-comm-esr60?

Updated

3 months ago
Attachment #9045603 - Flags: approval-comm-esr60? → approval-comm-esr60+
You need to log in before you can comment on or make changes to this bug.