Closed Bug 1402813 Opened 7 years ago Closed 6 years ago

Re-establish UTF-7 support in JS Mime for headers and C++ MIME for bodies. Re-establish UTF-7 tests.

Categories

(MailNews Core :: MIME, enhancement)

enhancement
Not set
normal

Tracking

(thunderbird60+ fixed, thunderbird61 fixed)

RESOLVED FIXED
Thunderbird 61.0
Tracking Status
thunderbird60 + fixed
thunderbird61 --- fixed

People

(Reporter: jorgk-bmo, Assigned: jorgk-bmo)

Details

Attachments

(2 files, 6 obsolete files)

UTF-7 decoding was removed in bug 1363281 by switching to the new Rust-based encoding_rs. Tests were also removed here:

https://hg.mozilla.org/comm-central/rev/322986aa2dbabda0ba031d76ba2dc959c7cc9f7a#l3.12
 var tests = [
-  ["=?UTF-7?Q?+AKM-1?=", "\u00A31"],
-  ["=?UTF-7?Q?+AK?= =?UTF-7?Q?M-1?=", "\u00A31"],

Three things are needed to re-establish UTF-7 support in JS Mime:

Implement a scriptable encoder/decoder like nsICharsetConverterManager.mutf7ToUnicode() and nsICharsetConverterManager.unicodeToMutf7().

Integrate this into the FakeTextDecoder in jsmime.jsm.

Re-establish the tests.
Attached patch 1402813-utf-7.patch - (v1) (obsolete) — Splinter Review
This does the trick, it's basically a copy/paste job of what we have for MUTF-7.

mozilla/mach xpcshell-test mailnews/mime/test/unit/test_jsmime_charset.js
passes.

I'm not sure how you want to handle the integration in jsmime.jsm. I think we could make the fake decoder exclusively for UTF-7. We could even take the try/catch out and just use the fake decoder if UTF-7 in encountered.

For the time being I fiddled it into the existing code, but since a) the scriptable unicode converter is on the way out (but we might have to port it to C-C) and b) it doesn't do anything else that the "real" text decoder doesn't do, we might as well get rid of it.

Please let me know.
Attachment #8912092 - Flags: review?(Pidgeot18)
Comment on attachment 8912092 [details] [diff] [review]
1402813-utf-7.patch - (v1)

Review of attachment 8912092 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/mime/src/jsmime.jsm
@@ +43,5 @@
>        "@mozilla.org/charset-converter-manager;1"]
>        .createInstance(Components.interfaces.nsICharsetConverterManager);
> +    this.charset = manager.getCharsetAlias(label);
> +    if (this.charset.toLowerCase() != "utf-7")
> +      this._encoder.charset = manager.getCharsetAlias(label);

Oops, use this.charset.
And of course we only need UTF-7 to unicode, so I could take the other direction out again.
Here is the rebased patch. Joshua said on IRC in September 2017(!!) (2017-09-30T22:39:26.000Z)

jorgk: jcranmer: You can have utf-7 back, just do the review ;-)
jcranmer: jorgk: I don't like that patch for other reasons, but I might be able to get to it tonight to fix it

Sadly, more than six months later I haven't heard anything.

Apparently we need UTF-7 support form e-mail from Windows Live Mail which is still supported under Windows 10.

The C++ parts of the patch are copied from MUTF-7 (used for IMAP folder names) so there's nothing not to like. Test just re-establishes what was previously removed, again, nothing not to like.

So the only thing that Joshua might not like is in jsmime.jsm. If anyone can second-guess his intentions, please step forward.

The test in question is:
mach xpcshell-text comm/mailnews/mime/test/unit/test_jsmime_charset.js
(with M-C as top source dir).
Attachment #8912092 - Attachment is obsolete: true
Attachment #8912092 - Flags: review?(Pidgeot18)
Attachment #8970372 - Flags: review?(mkmelin+mozilla)
Attachment #8970372 - Flags: review?(acelists)
Attachment #8970372 - Flags: review?(Pidgeot18)
Minor change according to comment #2.
Attachment #8970372 - Attachment is obsolete: true
Attachment #8970372 - Flags: review?(mkmelin+mozilla)
Attachment #8970372 - Flags: review?(acelists)
Attachment #8970372 - Flags: review?(Pidgeot18)
Attachment #8970375 - Flags: review?(mkmelin+mozilla)
Attachment #8970375 - Flags: review?(acelists)
Attachment #8970375 - Flags: review?(Pidgeot18)
My sitting on this for months is inexcusable, but there is a reason for it.

Last September, I had lots of time owing to my previous job ending in August and the new one starting in October. When in the new job, I lost the ability to keep one eye on bugmail throughout the working day, with the predictable mess that I garnered a large backlog of email which meant most of my open source time consisted of working through email backlogs. And when it came to emails about this bug... I forgot what I objected to. The first thought that always crossed my mind was "I don't want progressive decoding," but rereading the bug, it doesn't do it so it can't be it.


Actually, in writing this up, I finally remembered what the problem was. One of the original things that prompted UTF-7 support was messages being sent with UTF-7 bodies. This patch, AIUI, doesn't support UTF-7 bodies in messages, only UTF-7 in RFC2047 support.
Comment on attachment 8970375 [details] [diff] [review]
1402813-utf-7.patch - (v1b) rebased

Review of attachment 8970375 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/intl/nsICharsetConverterManager.idl
@@ +63,5 @@
>      /**
> +     * Support for UTF-7 used by JS Mime.
> +     */
> +    AString utf7ToUnicode(in string aMutf7);
> +    ACString unicodeToUtf7(in wstring aUnicode);

Using string and wstring in IDL is frowned upon. You can replace string with ACString, as I'm pretty sure we don't have any warnings anymore about high-bits being set in these strings when crossing XPConnect.

(I'd say octet array, since UInt8Array can be passed to an octet array argument, but you need to concatenate the arrays anyways, so the benefit here).

unicodeToUtf7 is unused and should be deleted. Let's not pretend that we want to support encoding anything to UTF-7.
Comment on attachment 8970375 [details] [diff] [review]
1402813-utf-7.patch - (v1b) rebased

OK, I'll address Joshua's comments and do some special casing in nsMsgI18N.cpp to cater for UTF-7 bodies.
Attachment #8970375 - Flags: review?(mkmelin+mozilla)
Attachment #8970375 - Flags: review?(acelists)
Attachment #8970375 - Flags: review?(Pidgeot18)
Attached patch 1402813-utf-7.patch - (v2) (obsolete) — Splinter Review
Removed the UTF-16 to UTF-7 encoding.
Added special case to decode UTF-7 in nsMsgI18NConvertToUnicode() for bodies.
Switched string/wstring uses to nsACString/nsAString, also for pre-existing MUTF-7.
Assignee: nobody → jorgk
Attachment #8970375 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8970840 - Flags: review?(Pidgeot18)
Comment on attachment 8970840 [details] [diff] [review]
1402813-utf-7.patch - (v2)

I've just noticed that we should enable test_decode_utf-7.js again and also restore the testcase with UTF-7 in test-charset-edit.js by reverting this:
https://hg.mozilla.org/comm-central/rev/c744963d757b#l1.16
Attachment #8970840 - Flags: review?(Pidgeot18)
Attached patch 1402813-utf-7.patch - (v3) WIP (obsolete) — Splinter Review
UTF-7 bodies now display correctly. The Xpcshell test works again. The Mozmill test almost works, somehow getMsgHeaders(msg, true).get("") returns an empty body since some part of JS Mime still can't handle UTF-7. That's strange since the tricky fake decoder should handle it. Needs more investigation.

Joshua, any hints?
Attachment #8970840 - Attachment is obsolete: true
Flags: needinfo?(Pidgeot18)
I added some debug and for the Xpcshell test test_jsmime_charset.js fake decoders are created and I can see that the utf-7 string is actually decoded, debug is:
 0:01.09 pid:5656 === decoding utf-7 of +AKM-1
 0:01.09 pid:5656 == JS stack> resource:///modules/jsmime.jsm (60)
 0:01.09 pid:5656 == JS stack> resource:///modules/jsmime.jsm -> resource:///modules/jsmime/jsmime.js (695)
 0:01.09 pid:5656 == JS stack> resource:///modules/jsmime.jsm -> resource:///modules/jsmime/jsmime.js (713)
 0:01.09 pid:5656 == JS stack> resource:///modules/jsmime.jsm -> resource:///modules/jsmime/jsmime.js (211)
 0:01.09 pid:5656 == JS stack> resource:///modules/jsmime.jsm -> resource:///modules/jsmime/jsmime.js (1404)
 0:01.09 pid:5656 == JS stack> c:/mozilla-source/comm-central/obj-x86_64-pc-mingw32/_tests/xpcshell/comm/mailnews/mime/test/unit/test_jsmime_charset.js (26)

Not so for the Mozmill test. Again, a fake decoder is created for utf-7, but nothing ever gets decoded. JS stack is:

=== creating fake decoder for utf-7
== JS stack> resource:///modules/jsmime.jsm (93)
== JS stack> resource:///modules/jsmime.jsm -> resource:///modules/jsmime/jsmime.js (2428)
== JS stack> resource:///modules/jsmime.jsm -> resource:///modules/jsmime/jsmime.js (2149)
== JS stack> resource:///modules/jsmime.jsm -> resource:///modules/jsmime/jsmime.js (1979)
== JS stack> resource:///modules/mimeParser.jsm (121)
== JS stack> chrome://mozmill/content/modules/utils.js (437)
== JS stack> chrome://mozmill/content/modules/frame.js -> file:///c:/mozilla-source/comm-central/comm/mail/test/mozmill/composition/test-charset-edit.js (81)
== JS stack> chrome://mozmill/content/modules/frame.js -> file:///c:/mozilla-source/comm-central/comm/mail/test/mozmill/composition/test-charset-edit.js (146)

test-charset-edit.js (146) is
  dump("=== getMsgHeaders(msg, true).get() = |"+getMsgHeaders(msg, true).get("")+"\n").

jsmime.js (2428) is:
  this._decoder = new TextDecoder(this._charset);
in function Parser_startBody()

So the right decoder is created, but not called on the utf-7 body.

===

Finally I found the problem:
Body decode does this:

    if (this._decoder)
      return this._decoder.decode(buffer, {stream: more});

And my simple decoder does this:
      if (more)
        return "";
Flags: needinfo?(Pidgeot18)
Attached patch 1402813-utf-7.patch - (v4) (obsolete) — Splinter Review
Found and fixed the bug in JS Mime. Can't all the encoder with 'true' on the last call.
Attachment #8970935 - Attachment is obsolete: true
Attachment #8971089 - Flags: review?(Pidgeot18)
Comment on attachment 8971089 [details] [diff] [review]
1402813-utf-7.patch - (v4)

Green try.
Attachment #8971089 - Flags: review?(mkmelin+mozilla)
Attachment #8971089 - Flags: review?(acelists)
Attachment #8971089 - Flags: review?(acelists)
Comment on attachment 8971089 [details] [diff] [review]
1402813-utf-7.patch - (v4)

Review of attachment 8971089 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/base/util/nsMsgI18N.cpp
@@ +110,5 @@
>  {
>    #define IMAP_UTF7_BUF_LENGTH 100
>    nsUnicodeToMUTF7 converter;
>    static char buffer[IMAP_UTF7_BUF_LENGTH];
> +  char16_t *in = (char16_t *)aSrc.BeginReading();

I wonder why this needs the cast.

::: mailnews/mime/src/jsmime.jsm
@@ +16,5 @@
>  
>  var EXPORTED_SYMBOLS = ["jsmime"];
>  
> +function bytesToString(buffer) {
> +  var string = '';

Double quotes please.

@@ +18,5 @@
>  
> +function bytesToString(buffer) {
> +  var string = '';
> +  for (var i = 0; i < buffer.length; i++)
> +    string += String.fromCharCode(buffer[i]);

I think we require {} for loops.

::: mailnews/mime/src/mimemoz2.cpp
@@ +814,5 @@
> +    nsAutoString utf16;
> +    rv = CopyUTF7toUTF16(nsDependentCString(stringToUse, inLength), utf16);
> +    if (NS_FAILED(rv))
> +      return -1;
> +    CopyUTF16toUTF8(utf16, outString);

Does the Copy* return a success value? Can we check it?
Like return NS_SUCCEEDED(CopyUTF16toUTF8(utf16, outString))? 0 : -1);
(In reply to :aceman from comment #16)
> I wonder why this needs the cast.
I can try and get rid of it.

> Double quotes please.
No, that's not the style in JS Mime.

> I think we require {} for loops.
No, that's not the style in JS Mime.

> > +    CopyUTF16toUTF8(utf16, outString);
> Does the Copy* return a success value? Can we check it?
Infallible:
https://searchfox.org/mozilla-central/rev/08df4e6e11284186d477d7e5b0ae48483ecc979c/xpcom/string/nsReadableUtils.cpp#125
How could converting UTF-16 to UTF-8 fail?

So is that an r+ then modulo the cast ;-)
Removed the cast. Sorry about the noise.
Attachment #8971089 - Attachment is obsolete: true
Attachment #8971089 - Flags: review?(mkmelin+mozilla)
Attachment #8971089 - Flags: review?(acelists)
Attachment #8971089 - Flags: review?(Pidgeot18)
Attachment #8971965 - Flags: review?(mkmelin+mozilla)
Attachment #8971965 - Flags: review?(acelists)
Attachment #8971965 - Flags: review?(Pidgeot18)
Is JSMime allowed to have its own style? :)
The linked code seems to make a fallible allocation. Just that the whole function returns 'void'.
I was misled by the other CopyUTF7toUTF16 functions that do return a value.

I'd think converting utf-16 to utf-8 can fail on invalid bytes easily, unless we already checked that the utf-16 string is valid. OK, maybe the function would then be called CopyRandombytestreamToUTF8().
Allocation in Mozilla is equally infallible ;-) - That means that control does not return to the caller in case of failure.
Summary: Re-establish UTF-7 support in JS Mime → Re-establish UTF-7 support in JS Mime for headers and C++ MIME for bodies. Re-establish UTF-7 tests.
Comment on attachment 8971965 [details] [diff] [review]
1402813-utf-7.patch - (v4b)

Review of attachment 8971965 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me, r=mkmelin
Attachment #8971965 - Flags: review?(mkmelin+mozilla) → review+
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/f622e6a7a742
Re-establish UTF-7 support in message headers and bodies. r=mkmelin
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 61.0
Comment on attachment 8971965 [details] [diff] [review]
1402813-utf-7.patch - (v4b)

[Triage Comment]

Maybe Joshua still want so look at it.
Attachment #8971965 - Flags: review?(acelists) → approval-comm-beta+
Attachment #8971965 - Flags: review?(Pidgeot18)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: