Correctly fix Bug 1023285 and utf8 addressing header display for jsmime.
Categories
(MailNews Core :: MIME, defect)
Tracking
(thunderbird_esr6067+ fixed, thunderbird66 fixed, thunderbird67 fixed)
People
(Reporter: alta88, Assigned: alta88)
References
(Blocks 1 open bug)
Details
Attachments
(3 files, 8 obsolete files)
|
812 bytes,
text/plain
|
Details | |
|
5.67 KB,
patch
|
alta88
:
review+
|
Details | Diff | Splinter Review |
|
9.36 KB,
patch
|
jorgk-bmo
:
review+
jorgk-bmo
:
approval-comm-beta+
jorgk-bmo
:
approval-comm-esr60+
|
Details | Diff | Splinter Review |
Bug 1023285 fixed the display of not RFC2047 encoded utf8 in addressing headers in message pane, but subsequently broke gloda results/facet display and new message tooltip summary (hovering a folder tooltip). Gloda wasn't evident, since messages with RFC6532 utf8 headers (only stored by feeds, so far), were broken due to Bug 1522156 and fixed in Bug 1423487.
The correct fix is to ensure jsmime via callers of parseHeadersWithArray() gets the binary string it wants. The only place it isn't getting that is from the headers streamed by libmime into the header sink eventually processed by OutputEmailAddresses(). The prior fix needs to be reverted to allow the RAW option again.
This is the lowest risk fix, but possibly the better thing to do is to ensure libmime doesn't stream utf8 for pretty display but rather the binary string so the header parsing can be done upstream, in parseHeadersWithArray() and jsmime.
Comment 2•7 years ago
|
||
Hmm, I'm being thrown in here at the deep end :-(
Can you expand your comments a bit, where is libmime streaming anything? And what do you mean by "binary string"? It's all binary ;-)
Looking at the proposed change, you're replacing
- let value = MimeParser.parseHeaderField(aHeader || "",
- MimeParser.HEADER_ADDRESS |
- MimeParser.HEADER_OPTION_DECODE_2231 |
- MimeParser.HEADER_OPTION_DECODE_2047,
- undefined);
- let allAddresses = fixArray(value, false);
with a call to parseEncodedHeader which is essentially the same thing but with other flags:
let value = MimeParser.parseHeaderField(aHeader,
MimeParser.HEADER_ADDRESS | MimeParser.HEADER_OPTION_ALL_I18N, aCharset); // aCharset would be undefined.
return fixArray(value, aPreserveGroups, count);
So essentially your just adding HEADER_OPTION_ALLOW_RAW to the call, right?
(In reply to Jorg K (GMT+1) from comment #2)
Hmm, I'm being thrown in here at the deep end :-(
Can you expand your comments a bit, where is libmime streaming anything? And what do you mean by "binary string"? It's all binary ;-)
Looking at the proposed change, you're replacing
- let value = MimeParser.parseHeaderField(aHeader || "", - MimeParser.HEADER_ADDRESS | - MimeParser.HEADER_OPTION_DECODE_2231 | - MimeParser.HEADER_OPTION_DECODE_2047, - undefined); - let allAddresses = fixArray(value, false);with a call to
parseEncodedHeaderwhich is essentially the same thing but with other flags:let value = MimeParser.parseHeaderField(aHeader, MimeParser.HEADER_ADDRESS | MimeParser.HEADER_OPTION_ALL_I18N, aCharset); // aCharset would be undefined. return fixArray(value, aPreserveGroups, count);So essentially your just adding HEADER_OPTION_ALLOW_RAW to the call, right?
Right, this is just a reversion of the (your) change in Bug 1023285. What is required for jsmime is RAW meaning it will decode a raw octet binary string, but if it gets utf8, it will mojibake.
https://searchfox.org/comm-central/rev/b1306c2db4010cae30c8b0eec3f94e1ae5c3e272/mailnews/mime/jsmime/jsmime.js#1460
libmime streams headers for display to messageHeaderSink.processHeaders here
https://searchfox.org/comm-central/rev/b1306c2db4010cae30c8b0eec3f94e1ae5c3e272/mail/base/content/msgHdrView.js#461
in this function:
https://searchfox.org/comm-central/rev/b1306c2db4010cae30c8b0eec3f94e1ae5c3e272/mailnews/mime/emitters/nsMimeHtmlEmitter.cpp#161
With the current RAW flag removed, there is mojibake in a gloda search results From field and new messages folder tooltip. Since jsmime doc says it expects an octet string for RAW we should ensure that's what it gets. The only place it doesn't is messagepane display headers.
Anyway, rather than digging into all that, I'll attach a test file and you can see. Bottom line is that the libmime emitted values are already set for display and are therefore not RAW, so they need to be converted to RAW if we're going to use the parseHeadersWithArray() function, which we need to do currently in several places.
So to subscribe to this file, first enable file:// urls. In console issue:
FeedUtils._validSchemes.push("file");
then just subscribe to it using a file url (win) like
file:///C/.../feedFromTestAtom.xml
It used to be and maybe still is possible, on win, to just dnd from file explorer onto a feed account in folderpane.. You can easily see what happens with/without the patch; the debug will show the difference. To change the strings and test, also change the <id> tag value (to not be a duplicate) and get new messages.
Comment 5•7 years ago
|
||
Comment 6•7 years ago
|
||
No, not related to rev/32170e6fb298#l1.38 I quoted above.
(In reply to Jorg K (GMT+1) from comment #5)
Comment on attachment 9044425 [details] [diff] [review]
encoding.patchSo if you're reverting my change from bug 1023285, you'll regress that bug
(which I had forgotten about), no? Why would that be OK? The sample e-mail
in that bug uses "raw" UTF-8 in the e-mail headers which is OK according to
https://tools.ietf.org/html/rfc6532#section-3.2 as you pointed out yourself
over in that bug.Well, strangely enough, I applied the patch and lo and behold, the Greek
stuff still displays OK? How come? Since in a later bug we switched libmime
to use jsmime for headers as well?
(https://hg.mozilla.org/comm-central/rev/32170e6fb298#l1.38) - Or how do you
explain that the regression doesn't happen?
The backout would cause a regression for messagepane, but is fixed with
- emailAddresses = unescape(encodeURIComponent(emailAddresses || ""));
The backout fixes the regression to gloda search display; it already sends the right input to parseHeadersWithArray().
I'm still lost in the terminology here. What is "a raw octet binary string"?
I think in bytes and encodings, so the way a byte string is interpreted as
characters of a charset based on an encoding.In the test you added, clearly the source file is encoded in UTF-8, so
"David Håsäther" is a byte-sequence which represent the UTF-8 encoding of
those characters. You can even more clearly see it in the Greek example, Π
is CE A0 in UTF-8.In summary: I'm OK with backing out my incorrect fix which happened at TB
45(!!) if we can understand why it doesn't cause a problem today.
If you look at the debug, you'll see the utf8 first, then the encoding is done, then the raw octet string.
parseHeadersWithArray: aHeader1 - <«ΠΟΛΙΤΗΣ»>
parseHeadersWithArray: aHeader2 - <«ΠÎÎÎΤÎΣ»>
or in binary strings:
console.info: "parseHeadersWithArray: aHeader1 - <\xAB\u03A0\u039F\u039B\u0399\u03A4\u0397\u03A3\xBB>"
console.info: "parseHeadersWithArray: aHeader2 - <\xC2\xAB\xCE\xA0\xCE\x9F\xCE\x9B\xCE\x99\xCE\xA4\xCE\x97\xCE\xA3\xC2\xBB>"
parseHeadersWithArray() wants the second encoding, \xCE\xA and not \u03A0 (the utf8 for Π).
If this is clear enough, feel free to remove debug and land..
Comment 8•7 years ago
|
||
Sure, the "raw" UTF-8 encoding is \xCE\xA, \u03A0 is UTF-16 used internally in strings.
The backout would cause a regression for messagepane, but is fixed with
emailAddresses = unescape(encodeURIComponent(emailAddresses || ""));
Damn, I didn't see that code between the two debug lines :-(
It's clear enough now. Why didn't you come forward with this solution back in bug 1023285?
Comment 9•7 years ago
|
||
We should also revert the comment, right?
https://searchfox.org/comm-central/rev/b1306c2db4010cae30c8b0eec3f94e1ae5c3e272/mailnews/mime/public/nsIMsgHeaderParser.idl#182
Comment 10•7 years ago
|
||
| Assignee | ||
Comment 11•7 years ago
|
||
I didn't make the comment, it would require tracing the callers and determining what the input is to verify that it's only dealing with already decoded headers ready for display and just need final prettifying. I would assume, if the arg is utf8, it's not encoded to the raw string.
Comment 12•7 years ago
|
||
I've looked at this for a while.
I don't understand how unescape(encodeURIComponent(emailAddresses || "")); works. encodeURIComponent percent-encodes all the non-ASCII characters, and unescape then reverts that. Its documentation says to use decodeURIComponent which would be the exact reversal, I assume. I can only imagine that unescape returns UTF-8, but that wouldn't make any sense in the JS world.
The problems seems to be that we have an e-mail address stored in a JS string which is already decoded, so running it through parseHeadersWithArray which takes encoded headers, incl. raw UTF-8 is unfortunate.
So we can either convert back to UTF-8 or use the correct function. Here's my first version that encoded into UTF-8.
Comment 13•7 years ago
|
||
I have to correct my previous comment. At that call site we have a potentially RFC2047 encoded header bu tin UTF-16 already, so the decoding from "raw" UTF-8 has already happened elsewhere.
The "correct" function to use doesn't exist apparently, so I tried to create one. But that still doesn't work, I get mojibake in the header pane.
So the only way forward I see is Alta's original patch or my alternative which is a little clearer. Or maybe I have missed something and the "correct" function can be created.
Comment 14•7 years ago
|
||
OK, while we're here, we can remove use of a deprecated function with a clumsy API.
Comment 15•7 years ago
|
||
Comment tweak.
Comment 16•7 years ago
|
||
OK, I got the second version working as well. This creates the "correct" function to use and avoids re-encoding in UTF-8.
Comment 17•7 years ago
|
||
This passes:
mach xpcshell-test comm/mailnews/mime/test/unit/
mach xpcshell-test comm/mailnews/mime/jsmime/test/
with v2, didn't try v1, but that should pass, too.
Comment 18•7 years ago
|
||
| Assignee | ||
Comment 19•7 years ago
|
||
Comment 20•7 years ago
|
||
Thanks, let's go with this. Tweaked the comments a bit and changed the name of the new function.
Watch the next patch where I will replace EncodedHeader(NS_ConvertUTF16toUTF8(xxx));.
There really is no need to crazy back and forward conversions.
Comment 21•7 years ago
|
||
Here's part two which makes use of the new goodness ;-)
| Assignee | ||
Comment 22•7 years ago
|
||
Comment 23•7 years ago
|
||
I'll buy you a beer if this fails. All we cut out it this expensive conversion back to UTF-16 where we started:
https://searchfox.org/comm-central/rev/508fdc93f1f9612d56587ad653acde51d67fe81e/mailnews/mime/src/mimeParser.jsm#234
https://searchfox.org/comm-central/rev/508fdc93f1f9612d56587ad653acde51d67fe81e/mailnews/mime/jsmime/jsmime.js#574
Converting between UTF-16 and UTF-8 in C++/RS code is fairly cheap, but this looks expensive in JS and is totally unnecessary, especially converting it from UFT-16 to UTF-8 in JS and then back in JS which we killed in the first part.
Comment 24•7 years ago
|
||
Hmm, the Xpcshell was green, but MozMill got hit by some bustage. Here we go again:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=5f446150e4d51be774c0759773010fdf7befa775
Comment 25•7 years ago
|
||
OK, I owe you a beer now ;-) - MozMill completely obliterated with a variety of very strange errors, like:
EXCEPTION: this._treeElement is undefined
EXCEPTION: this._rowMap is null
EXCEPTION: parent.messengerBundle is null
EXCEPTION: aController.messageDisplay is null
Really strange.
Comment 26•7 years ago
|
||
Aha, reviewer didn't catch this one ;-)
JavaScript error: chrome://messenger/content/msgHdrView.js, line 1203: SyntaxError: missing ) after condition
Comment 27•7 years ago
|
||
Oops, needed to s/while/for/ :-(
Comment 28•7 years ago
|
||
Here with the while changed to for and unused index removed. Sigh, we should run the linter locally.
Updated•7 years ago
|
Comment 29•7 years ago
|
||
Mixed up displayName/fullAddress.
Comment 30•7 years ago
|
||
Fourth time lucky ;-)
| Assignee | ||
Comment 31•7 years ago
|
||
(In reply to Jorg K (GMT+1) from comment #26)
Aha, reviewer didn't catch this one ;-)
JavaScript error: chrome://messenger/content/msgHdrView.js, line 1203: SyntaxError: missing ) after condition
sorry, but the onus is on the author to compile and at least start the thing and select a message before submitting for review, and the reviewer should expect that has been done. plus, it was conditional on a try anyway, since i detected a bit of gunslinging.. ;D
(In reply to Jorg K (GMT+1) from comment #30)
Fourth time lucky ;-)
first time lucky is called skill. ;)
Comment 32•7 years ago
|
||
Well, part 2 was right first time, but then, I had a compiler at my disposal. Apparently linting helps in JS, but I didn't do it. Someone stole my landing spot, so I'll land that tonight.
Comment 33•7 years ago
|
||
Oh, what does "gunslinging" mean in this context?
| Assignee | ||
Comment 34•7 years ago
|
||
https://en.wikipedia.org/wiki/Gunfighter
these days, basically someone who shoots first, then runs a try.
Comment 35•7 years ago
|
||
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/99ea0ca93b9b
Back out bug 1023285 (rev 1ecabc4781e3) and fix handling of raw UTF-8 in headers properly. r=jorgk
https://hg.mozilla.org/comm-central/rev/1a4da3cb06f1
Introduce EncodedHeaderW() using ParseEncodedHeaderW(). r=alta88
Comment 37•7 years ago
|
||
Comment 38•7 years ago
|
||
TB 66 beta 3:
https://hg.mozilla.org/releases/comm-beta/rev/071ea9f37f83dc9bac2e603cff7003044c4ea84d
https://hg.mozilla.org/releases/comm-beta/rev/8eed720334644dd96ce6dd0dd788a0b2176b08c4
Updated•7 years ago
|
Comment 39•7 years ago
|
||
TB 60.6.1 or later:
https://hg.mozilla.org/releases/comm-esr60/rev/f2555e37393955a18fff44e96e5a1e31d49cbe88
https://hg.mozilla.org/releases/comm-esr60/rev/a0eed606d9fff31dfd0c0dc74fa4461c4d03739e
Updated•7 years ago
|
Description
•