Closed Bug 1424359 Opened 7 years ago Closed 6 years ago

Simplify handling of TextDecoder in JS Mime after bug 1401528

Categories

(MailNews Core :: MIME, defect)

defect
Not set
minor

Tracking

(thunderbird_esr60 fixed, thunderbird60 fixed, thunderbird61 wontfix, thunderbird62 fixed)

RESOLVED FIXED
Thunderbird 62.0
Tracking Status
thunderbird_esr60 --- fixed
thunderbird60 --- fixed
thunderbird61 --- wontfix
thunderbird62 --- fixed

People

(Reporter: aceman, Assigned: jorgk-bmo)

References

Details

Attachments

(3 files, 5 obsolete files)

+++ This bug was initially created as a clone of Bug #1401528 comment 143 +++ In bug 1401528 we fixed handling the textdecoder variable in jsmime.jsm. I propose to rename the variable to something local to the file. This renaming would make me feel better about which decoder is actually used, because in jsmime.jsm the file jsmime.js (which uses the decoder) is loaded sooner than we redefine the decoder at the end of the file. It is the same as attachment 8923209 [details] [diff] [review], just the test includes jsmime.jsm (which includes jsmime.js) instead of jsmime.js. So then no duplicate definition of MimeTextDecoder is needed, which I objected to.
Attached patch 1424359.patch (obsolete) — Splinter Review
Attachment #8935876 - Flags: review?(Pidgeot18)
Attached patch 1424359.patch (v2) (obsolete) — Splinter Review
It turned out that nsIScriptableUnicodeConverter::convertFromByteArray() was removed from M-C. Also, TextDecoder now does exactly the same as the scriptable Unicode converter, so Joshua's magic can mostly go. See bug 1469499 comment #12 for further reading.
Attachment #8935876 - Attachment is obsolete: true
Attachment #8935876 - Flags: review?(Pidgeot18)
Attachment #8986641 - Flags: review?(acelists)
"Green" try. Check https://hg.mozilla.org/try-comm-central/graph/1c2f8c1c9325fdb7aa36ce51e36e7ff56250b2d7 to see that the changes here are included ;-)
Comment on attachment 8986641 [details] [diff] [review] 1424359.patch (v2) Review of attachment 8986641 [details] [diff] [review]: ----------------------------------------------------------------- ::: mailnews/mime/src/jsmime.jsm @@ +28,5 @@ > + let manager = Cc["@mozilla.org/charset-converter-manager;1"] > + .createInstance(Ci.nsICharsetConverterManager); > + this.charset = manager.getCharsetAlias(label); > + if (this.charset.toLowerCase() != "utf-7") > + throw new Error("UTF7TextDecoder: This code should never be reached for " + this.charset); throw new Error("UTF7TextDecoder can't be used for charset=" + label); @@ +36,4 @@ > decode: function (input, options = {}) { > let more = 'stream' in options ? options.stream : false; > + if (this.charset.toLowerCase() != "utf-7") > + throw new Error("UTF7TextDecoder: This code should never be reached for " + this.charset); You would never have this as the constructor checks. @@ +53,3 @@ > } catch (e) { > + // If TextDecoder fails, it must be UTF-7. > + return new UTF7TextDecoder(charset, options); Wouldn't it be better to check if it's UTF-7 before trying TextDecoder?
(In reply to Magnus Melin from comment #5) > Wouldn't it be better to check if it's UTF-7 before trying TextDecoder? Thanks for the comments. We'll, Joshua's original magic went via try/catch, that's all I can say. He wanted to ship all charsets that the "standard" TextDecoder couldn't handle to nsIScriptableUnicodeConverter which had been configured to handle additional charsets. I assume you read bug 1469499 comment #12 for further information. Sure, we could check for UTF-7 and it's aliases earlier, that would mean to move the > + let manager = Cc["@mozilla.org/charset-converter-manager;1"] > + .createInstance(Ci.nsICharsetConverterManager); > + this.charset = manager.getCharsetAlias(label); code there. In effect for performance, it might be simpler to just leave that part "as is" in the patch (and only remove the second throw).
True.
Attached patch 1424359.patch (v3) (obsolete) — Splinter Review
Since Magnus showed interest, maybe he wants to r+ as well ;-)
Attachment #8986641 - Attachment is obsolete: true
Attachment #8986641 - Flags: review?(acelists)
Attachment #8986694 - Flags: review?(mkmelin+mozilla)
Attachment #8986694 - Flags: review?(acelists)
Comment on attachment 8986694 [details] [diff] [review] 1424359.patch (v3) Review of attachment 8986694 [details] [diff] [review]: ----------------------------------------------------------------- ::: mailnews/mime/jsmime/test/head_xpcshell_glue.js @@ +43,5 @@ > .then(contents => callback(undefined, contents), callback); > }, > }; > requireCache.set("fs", fs); > +Components.utils.import("resource:///modules/jsmime.jsm"); How did the path change?
(In reply to :aceman from comment #9) > How did the path change? Please ask the original author of that hunk who would be, let me see, Aceman ;-) All I can see is that |Components.utils.import("resource:///modules/jsmime.jsm");| is used elsewhere and it's working.
(In reply to Jorg K (GMT+2) from comment #10) > All I can see is that > |Components.utils.import("resource:///modules/jsmime.jsm");| is used > elsewhere and it's working. I see now, it is a different file ;)
Henri, quick question: Before the arrival of encoding_rs the scriptable Unicode converter could be "enhanced" to do additional charsets. If enhanced, it could handle more charsets than the standard TextDecoder. Correct? With encoding_rs, the scriptable Unicode converter and the TextDecoder do exactly the same, so we need to treat UTF-7 on a different code path. I'm asking because in JS Mime we have three code paths: 1) Using TextDecoder, 2) not using TextDecoder and doing UTF-7 and 3) not using TextDecoder and not doing UTF-7. That 3rd code path stopped working with the removal of nsIScriptableUnicodeConverter::convertFromByteArray(). We never noticed since that 3rd code path is dead code. So here I'm proposing to remove it.
Flags: needinfo?(hsivonen)
Comment on attachment 8986694 [details] [diff] [review] 1424359.patch (v3) On an invalid charset, like in this header: From: =?ZUDF-8?Q?J=c3=b6rg_Knobloch?= <xx@xx.com> the existing software seems to default to UTF-8, despite this test: https://dxr.mozilla.org/comm-central/rev/51678c5b0b3edd827c57a70f1c5baf02d44acead/mailnews/mime/jsmime/test/test_header.js#555 So RFC2047 decoding doesn't substitute, but that seems to be done further up the callstack. Needs more investigation and potentially another test.
Attachment #8986694 - Flags: review?(mkmelin+mozilla)
Attachment #8986694 - Flags: review?(acelists)
(In reply to Jorg K (GMT+2) from comment #13) > So RFC2047 decoding doesn't substitute, but that seems to be done further up > the callstack. I tricked myself. My test case had an address that was in the address book, so the display name got replaced. Bad charsets are positively not defaulted. But maybe the code still needs a tweak.
Attached file UTF-7.eml
Test message with UTF-7 subject. Do this: Import this message twice, select both to get the conversations view. Result: Nothing, it doesn't work. Now try again with the patch that's coming.
Assignee: acelists → jorgk
Attached patch 1424359.patch (v4) (obsolete) — Splinter Review
OK, we'll talk about the differences about v3 and v4 in the next comment.
Attachment #8986694 - Attachment is obsolete: true
Attachment #8987219 - Flags: review?(acelists)
This is what I've added: https://bugzilla.mozilla.org/attachment.cgi?oldid=8986694&action=interdiff&newid=8987219&headers=1 + try { + this.charset = manager.getCharsetAlias(label); + } catch (ex) { + // Unknown charset, nothing we can do. + throw ex; + } So when we pass in an unrecognised charset, the getCharsetAlias() throws already. Now we prepare for this case and catch it so it doesn't work by accident like before. + // There are cases where this is called without input. + if (!input) + return ""; His is what fixes the multiple selection view. In that case, the decode() was called with undefined input leading to a JS error in function bytesToString(buffer) { var string = ''; for (var i = 0; i < buffer.length; i++) <=== JS error here. with unexpected results. With the extra check, all is well.
Comment on attachment 8987219 [details] [diff] [review] 1424359.patch (v4) Review of attachment 8987219 [details] [diff] [review]: ----------------------------------------------------------------- Thanks. Do we have a test checking the utf7 support? ::: mailnews/mime/src/jsmime.jsm @@ +48,5 @@ > + if (more) > + return ""; > + let manager = Cc["@mozilla.org/charset-converter-manager;1"] > + .getService(Ci.nsICharsetConverterManager); > + return manager.utf7ToUnicode(this.collectInput); Can you please see if this code is run for every line of message? You already fetch the manager in the constructor so maybe caching it would make sense if this is called often.
Attachment #8987219 - Flags: review?(acelists) → review+
(In reply to :aceman from comment #18) > Do we have a test checking the utf7 support? Yes we do: For bodies: mail/test/mozmill/composition/test-charset-edit.js 135 bodyPart: new SyntheticPartLeaf(UTF7, {charset: "utf-7"}) And for headers: mailnews/mime/test/unit/test_jsmime_charset.js 11 ["=?UTF-7?Q?+AKM-1?=", "\u00A31"], and a few more: https://dxr.mozilla.org/comm-central/search?q=utf-7+file%3A*.js&redirect=false for aliases. You didn't review the "bring back UTF-7" bug 1402813, so you didn't see those being re-established. Of course there isn't a test for what I mentioned in comment #15: Two messages with UTF-7 headers selected in the thread pane. > Can you please see if this code is run for every line of message? You > already fetch the manager in the constructor so maybe caching it would make > sense if this is called often. There is this misconception that JS Mime actually decodes bodies. It doesn't. But you're right, this should be moved into the constructor. I'll do that.
Attached patch 1424359.patch (v4b) (obsolete) — Splinter Review
Only get the charset-converter-manager in the constructor.
Attachment #8987219 - Attachment is obsolete: true
Attachment #8987342 - Flags: review+
'charset' is only used in the constructor and can be a local variable.
Attachment #8987342 - Attachment is obsolete: true
Attachment #8987345 - Flags: review+
Pushed by mozilla@jorgk.com: https://hg.mozilla.org/comm-central/rev/954cc5890cd1 Use MimeTextDecoder() to return actual text decoder used in jsmime.js. r=aceman,jorgk
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Comment on attachment 8987345 [details] [diff] [review] 1424359.patch (v4c) This contains fixes to the UTF-7 decoding, hence we need uplift.
Attachment #8987345 - Flags: approval-comm-esr60+
Attachment #8987345 - Flags: approval-comm-beta+
Target Milestone: --- → Thunderbird 62.0
Before landing the patch I had my doubts whether Joshua would approve. Basically jsmime.js is "pure" stand-alone code that only uses pure JS and no Mozilla-isms. jsmime.jsm is a wrapper around it doing the dirty work, for example adding extra charsets to the decoder. Now we've broken that concept since jsmime.js is no longer stand-alone as it relies on MimeTextDecoder(). However, the magic of assigning a new value to TextDecoder, so inside jsmime.js TextDecoder actually wasn't the original TextDecoder, has just cost us too much headache in the past, so I went ahead with the simplification. After all, this is code actively used in Thunderbird, so it must suit its needs first and foremost.
Comment on attachment 8987345 [details] [diff] [review] 1424359.patch (v4c) Review of attachment 8987345 [details] [diff] [review]: ----------------------------------------------------------------- ::: mailnews/mime/src/jsmime.jsm @@ +31,5 @@ > + try { > + charset = this.manager.getCharsetAlias(label); > + } catch (ex) { > + // Unknown charset, nothing we can do. > + throw ex; Catching just to throw it makes no sense ;)
Joshua, please see comment 24, any input?
(In reply to Magnus Melin from comment #26) > > + try { > > + charset = this.manager.getCharsetAlias(label); > > + } catch (ex) { > > + // Unknown charset, nothing we can do. > > + throw ex; > Catching just to throw it makes no sense ;) OK, so you document what's going on with a comment?
(In reply to Jorg K (GMT+2) from comment #12) > Henri, quick question: Before the arrival of encoding_rs the scriptable > Unicode converter could be "enhanced" to do additional charsets. If > enhanced, it could handle more charsets than the standard TextDecoder. > Correct? Correct. Neither TextDecoder nor nsIScriptableUnicodeConverter can be extended via XPCOM these days, but it was possible to extend nsIScriptableUnicodeConverter via XPCOM previously. nsIScriptableUnicodeConverter is going away (more slowly than I'd like).
Flags: needinfo?(hsivonen)
(In reply to Jorg K (GMT+2) from comment #28) > OK, so you document what's going on with a comment? Yes you can add a comment like "Will throw if the charset is unknown."
Addressing Magnus' comment.
Attachment #8987441 - Flags: review+
Pushed by mozilla@jorgk.com: https://hg.mozilla.org/comm-central/rev/b9c66fd1654a Follow-up: Replace re-throw with comment. r=mkmelin
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: