<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Assignee

Comment 5

•

8 years ago

Oops, I forgot the link: https://tools.ietf.org/html/rfc3501#section-5.1.3

Assignee

Comment 6

•

8 years ago

Looks like mailnews had its own UTF-7 decoder. And that relied on the removed nsUCSupport.h with classes like nsEncoderSupport.

Assignee

Comment 7

•

8 years ago

Attached patch WIP: 1363281-uconv.patch (v4). (obsolete) — Details — Splinter Review

This compiles all the way through (well, so far). Work left is marked MORE WORK in: - nsMsgCompose.cpp - mimemoz2.cpp - mimetext.cpp (fight over mozilla::Decoder* or UniquePtr<mozilla::Decoder>) Also, compilation of nsUnicodeToUTF7.cpp and nsUTF7ToUnicode.cpp disabled. That could be a bucket of fun since the underlying framework as gone away.

Attachment #8872556 - Attachment is obsolete: true

Attachment #8872656 - Attachment is obsolete: true

Assignee

Comment 8

•

8 years ago

I looked through my e-mail database for UTF-7. So its needed for IMAP and by the looks of it we've already handled it in the source code I stopped compiling for now. So how can that be adapted to the new scheme? In fact, the compile didn't go though, choking on: https://dxr.mozilla.org/comm-central/rev/62db49c50e60514a4178651227629a9625381cb4/mailnews/build/nsMailModule.cpp#1333 https://dxr.mozilla.org/comm-central/rev/62db49c50e60514a4178651227629a9625381cb4/mailnews/build/nsMailModule.cpp#1371 So I think what I've done so far, some little tweaks here and there to use the new encoder/decoder, is nothing compared to integrating UTF-7 support into the new scheme.

Assignee

Comment 9

•

8 years ago

Attached patch WIP: 1363281-uconv.patch (v4b). (obsolete) — Details — Splinter Review

nsMailModule.cpp forced to compile by removing the UTF-7 stuff. Now it really compiles all the way to the end. It even starts and runs and shows some messages.

Attachment #8872689 - Attachment is obsolete: true

Magnus Melin [:mkmelin]

Comment 10

•

8 years ago

Didn't investigate the feasibility, but maybe we can bite the bullet and move to a JavaScript utf-7 encoder. https://github.com/kkaefer/utf7 is apparently what gaia email ended up using for that.

Magnus Melin [:mkmelin]

Comment 11

•

8 years ago

And yes, we need utf-7 for email since the IMAP spec used that :( https://tools.ietf.org/html/rfc3501#section-5.1.3

Assignee

Comment 12

•

8 years ago

Attached patch WIP: 1363281-uconv.patch (v4c). (obsolete) — Details — Splinter Review

OK, of the open issues I fixed the pointer business in MIME. So issues left are: - nsMsgCompose.cpp - mimemoz2.cpp - UTF-7 stuff. The first two are easy, I'll do them in the next 24 hours.

Attachment #8872710 - Attachment is obsolete: true

Assignee

Comment 13

•

8 years ago

Well, I renamed an IMAP folder from HUHU to hühü and saw no problem.

Assignee

Comment 14

•

8 years ago

(In reply to Magnus Melin from comment #10) > Didn't investigate the feasibility, but maybe we can bite the bullet and > move to a JavaScript utf-7 encoder. We have a functioning UTF-7 encoder/decoder. The problem is that previously it was registered with Gecko as a "custom" encoder and it shared a common interface with other decoders, see nsEncoderSupport. So if the set of encoders is now "closed" and can't be extended, which I don't know, then all we can do is something like: if (charset == "utf-7") call mailnews' own utf-7 encoder else call the gecko decoder. That would be very cumbersome since encoder objects are retrieved once for a given charset and then passed around, previously as nsIUnicodeEncoder*, now as mozilla::Encoder*. I hope Henri will advise us on the issue.

Comment 15

•

8 years ago

IMAP modified UTF-7 ("x-imap4-modified-utf7" in old code) and email UTF-7 ("UTF-7" in old code) are different. I suggest handling the IMAP modified UTF-7 case outside the converter framework: refactoring the existing code for usage directly from the IMAP code without the converter participating in the converter framework used for headers and message content at all. In https://groups.google.com/d/msg/mozilla.dev.apps.thunderbird/8R8HU3gvif4/NJzZ9QkNBAAJ I said of email UTF-7: "I suggest introducing a wrapper type mozilla::EmailEncoding that wraps a pointer to mozilla::Encoding or nullptr. If [the wrapped pointer is] nullptr, the type represents UTF-7, otherwise it represents a Web encoding and delegates to the wrapped encoding." Dropping UTF-7 support for message body and headers would be a fine outcome if it's feasible to do so. IIRC, jcranmer has studied the feasibility and, as noted above, the Gaia email app ended up implementing UTF-7 decoding. It's sad that in 2017 we are still discussing UTF-7, when http://web.archive.org/web/19981202094534/http://www.imc.org/imcr-010.html from *1998* said "Fortunately, very few vendors implemented UTF-7, and its use is strongly discouraged in Internet mail."

Flags: needinfo?(hsivonen) → needinfo?(Pidgeot18)

Assignee

Comment 16

•

8 years ago

Attached patch WIP: 1363281-uconv.patch (v5). (obsolete) — Details — Splinter Review

Fixed nsMsgCompose.cpp. No idea why there was such complicated code to convert UTF-8 to UTF-16, hmm. One left to go: mimemoz2.cpp.

Attachment #8872770 - Attachment is obsolete: true

Assignee

Comment 17

•

8 years ago

Attached patch WIP: 1363281-uconv.patch (v6) (obsolete) — Details — Splinter Review

So this compiles and identifies all the call sites that need to be changed. Henri, would you be able to look this over and point all the errors I made. Most likely there are many and some stuff is clumsy :-( - I also left questions in comments in the code. I've basically looked at https://hg.mozilla.org/try/rev/6767a326ba28 while doing the conversion. I didn't read all the documentation in Encoding.h, I just wanted to get a feeling for the size of the job. I wasn't able to work out how to set the replacement character. Also, I copied the code for inserting '?', but I'm asking myself whether '?' is valid to be inserted into the encoded stream. Also please keep in mind that Thunderbird code is old and ugly, particularly the MIME code from the year 2000 which started as C code. Oh, in MIME, class MimeDisplayOptions, I'm using mozilla::Encoder* instead of UniquePtr<mozilla::Encoder> as members simply since those objects are assigned in mimeeobj.cpp |MimeDisplayOptions newopt = *obj->options;|, and that caused a compile error (asking for a cast). Most likely that's also wrong. Finally keep in mind that we Thunderbird people are "jacks of all trades, masters of none". We have to interface with all areas of Mozilla core, DOM, editor, storage, JS, Intl, just to name a few and we really can't know all that stuff as well as the highly qualified Mozilla module owners and peers. Lastly: (In reply to Henri Sivonen (:hsivonen) from comment #15) > I suggest handling the IMAP modified UTF-7 case outside the converter > framework: refactoring the existing code for usage directly from the IMAP > code without the converter participating in the converter framework used for > headers and message content at all. Agreed. I have yet to investigate where this is used in the IMAP code. So the UTF-7 stuff will be covered in a second patch.

Attachment #8872950 - Attachment is obsolete: true

Attachment #8873187 - Flags: feedback?(hsivonen)

Comment 18

•

8 years ago

(In reply to Jorg K (GMT+2) from comment #17) > I wasn't able to work out how to set the replacement character. There isn't a way to set it, by design. For decode, there should be no legitimate reason to use a replacement other than U+FFFD. For encode, there is only one replacement approach--decimal NCRs--that's relevant to the two cases where legacy (i.e. non-UTF-8) encode is exposed in the Web Platform. For cases other than URL query string parsing and form submission, output should be UTF-8 anyway, in which case there's no need for replacement. If you want a different replacement, you need to implement it yourself on top of the lower-level API (the *WithoutReplacement entry points). > Also, I > copied the code for inserting '?', but I'm asking myself whether '?' is > valid to be inserted into the encoded stream. It's valid in the sense that it's an ASCII byte that's not an ISO-2022-JP escape or a byte whose meaning differs between ISO-2022-JP ASCII and Roman modes. The easy way to avoid having to consider this issue is to make Thunderbird always and only use UTF-8 for outgoing email.

Comment 19

•

8 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #18) > There isn't a way to set it, by design. See https://docs.rs/encoding_rs/0.6.10/encoding_rs/#no-convenience-api-for-custom-replacements for design philosophy.

Assignee

Comment 20

•

8 years ago

Thanks for the comments so far, I hope you can give me some feedback on the code as well. (In reply to Henri Sivonen (:hsivonen) from comment #18) > The easy way to avoid having to consider this issue is to make Thunderbird > always and only use UTF-8 for outgoing email. I know you've been advocating this for years ;-) - bug 862292. I'm not up-to-speed with this discussion, but I understand that conservative Japanese forces what to adhere to ISO-2022-JP, CTE 7bit and even plain text.

Comment 21

•

8 years ago

Comment on attachment 8873187 [details] [diff] [review] WIP: 1363281-uconv.patch (v6) Review of attachment 8873187 [details] [diff] [review]: ----------------------------------------------------------------- ::: mailnews/base/util/nsMsgI18N.cpp @@ +50,5 @@ > } > else if (!PL_strcasecmp(aCharset, "UTF-8")) { > CopyUTF16toUTF8(inString, outString); > return NS_OK; > } The above optimization is built into mozilla::Encoding, so the above is now unnecessary. @@ +63,2 @@ > bool mappingFailure = false; > outString.Truncate(); Changing the length of a buffer wrapped as a span without rewrapping it is not OK. @@ +68,5 @@ > + size_t read; > + size_t written; > + mozilla::Tie(result, read, written) = > + encoder->EncodeFromUTF16WithoutReplacement(src, dst, false); > + if (result != mozilla::kInputEmpty && result != mozilla::kOutputFull) { It seems that upon kOutputFull, there's nothing that would grow outString and rewrap it as a dst span. @@ +75,5 @@ > + // an unmappable character, because otherwise > + // we'd have gotten `kOutputFull`. > + dst[written++] = '?'; > + } else { > + mappingFailure = true; Seems odd that this case doesn't add the replacement but doesn't break out early, either. @@ +119,1 @@ > outString.Truncate(); Truncating here seems unnecessary. @@ +219,5 @@ > + return false; > + auto encoder = encoding->NewEncoder(); > + > + uint8_t buffer[512]; > + auto src = mozilla::MakeSpan(nsDependentString(inString)); You can use MakeCStringSpan and avoid the intermediate nsDependentString. ::: mailnews/compose/src/nsMsgCompose.cpp @@ +3056,5 @@ > + // Should I use DecodeToUTF16() (with replacement) like in MIME which processes > + // potentially bad input data. Hmm. > + if (!inStr.IsEmpty()) { > + nsAutoString tmp; > + CopyUTF8toUTF16(inStr, tmp); CopyUTF8toUTF16 has undesirable behavior if the input is invalid. Unless you know it to be valid UTF-8, UTF_8_ENCODING->DecodeToUTF16WithoutBOMHandling() makes more sense. ::: mailnews/import/outlook/src/MapiMessage.cpp @@ +578,5 @@ > > + uint8_t buffer[512]; > + auto src = mozilla::MakeSpan(m_body); > + auto bufferSpan = mozilla::MakeSpan(buffer); > + auto dst = bufferSpan.To(bufferSpan.Length()); What's the purpose of this line and having dst distinct from bufferSpan? ::: mailnews/mime/src/mimetext.h @@ +65,5 @@ > char *cbuffer; /* Buffer used for charset conversion. */ > int32_t cbuffer_size; > > + mozilla::Decoder *inputDecoder; > + mozilla::Encoder *utf8Encoder; There is no need to pivot through UTF-16 to convert to UTF-8. All this code should become a direct decode to UTF-8. (Without holding the decoder as a raw pointer. If not streaming, there's no need to hold a decoder at all and the methods on mozilla::Encoding should suffice.)

Attachment #8873187 - Flags: feedback?(hsivonen) → feedback-

Assignee

Comment 22

•

8 years ago

Attached patch WIP: 1363281-uconv.patch (v7) (obsolete) — Details — Splinter Review

Thank you for you feedback, Henri. I hope you don't mind another round. I've addressed all your feedback. I cleaned up the MIME code and I'm now using methods on mozilla::Encoding where possible. The MIME function ConvertToUTF8() is still a mess. There doesn't appear to be a method on mozilla::Encoding to do the direct decoding to UTF-8. Also, I don't see a way to calculate a good buffer size to start with. There is also no example using this function. So your suggestions would be greatly appreciated.

Attachment #8873187 - Attachment is obsolete: true

Attachment #8873658 - Flags: feedback?(hsivonen)

Assignee

Comment 23

•

8 years ago

Comment on attachment 8873658 [details] [diff] [review] WIP: 1363281-uconv.patch (v7) Review of attachment 8873658 [details] [diff] [review]: ----------------------------------------------------------------- ::: mailnews/base/util/nsMsgI18N.cpp @@ +51,5 @@ > + > + auto encoding = mozilla::Encoding::ForLabelNoReplacement(nsDependentCString(aCharset)); > + if (!encoding) > + return NS_ERROR_UCONV_NOCONV; > + auto encoder = encoding->NewEncoder(); Unused variable.

Comment 24

•

8 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #15) > Dropping UTF-7 support for message body and headers would be a fine outcome > if it's feasible to do so. IIRC, jcranmer has studied the feasibility and, > as noted above, the Gaia email app ended up implementing UTF-7 decoding. I definitely seem to recall that a major client was liable to end up using UTF-7 in some configuration. (Looking back at bug 414064, apparently DSNs from Exchange were UTF-7, blech). Surprisingly, searching for fixed UTF-7 bugs in bugzilla turned up bug 614617, where one of our engineers thought it a good idea to send out UTF-7 emails in 2010? I have a test that we pull in UTF-7 for RFC 2047-decoding, but I'm not sure about body decoding tests for UTF-7. The test for x-mac-croatian can be ditched (I'm assuming encoding-rs drops support for those, the x-mac-croatian stuff is mostly about testing fallout of dumb Thunderbird defaults we changed a long while ago). Unfortunately, we've got some decoding being done in C++ code and some other decoding being done in JS code, which is not ideal for creating new encoding data structures. Looking on DXR, the imap4-modified-utf7 converter is only being called via the two nsMsgI18N.h helper functions. Switching those to using a hardcoded one-shot converter should be feasible if necessary (we definitely don't need streaming). The downside is that it makes a modicum of sense to reuse the UTF-7 converter with the imap4-modified-utf7 encoder which is, ugh. The other option is switching it all to a non-streaming variant and making the JSMime fake converter just pass everything off to the decoder at the end of the stream rather than trying to build a streaming decoder (UTF-7 should be rare enough that I'm fine making it pay a steep performance penalty).

Flags: needinfo?(Pidgeot18)

Comment 25

•

8 years ago

Comment on attachment 8873658 [details] [diff] [review] WIP: 1363281-uconv.patch (v7) Review of attachment 8873658 [details] [diff] [review]: ----------------------------------------------------------------- ::: mailnews/mime/src/mimemoz2.cpp @@ +816,5 @@ > > + auto encoding = mozilla::Encoding::ForLabelNoReplacement(nsDependentCString(input_charset)); > + if (!encoding) { > + // Assume input is UTF-8. > + encoding = UTF_8_ENCODING; This means that XSS-dangerous encodings like HZ and ISO-2022-KR are decoded as UTF-8 rather than being decoded as the replacement encoding. Is that the desired outcome? That is, is it certain that the output of this function will never be parsed as HTML with scripting enabled? Parsing it as HTML at all matters in terms of loading external images. (That is, some other process filtering external images before the message reaches Thunderbird and Thunderbird would disagree on the presence of external images.) In general the `ForLabelNoReplacement` variant should only be used if the action taken for `nullptr` return value is to stop and report an error. If the action is to use some other encoding as fallback, in general, the `ForLabel` variant should be used instead to avoid XSS issues. (I'll add this to the API docs.) @@ +821,4 @@ > } > + // Is there no method on Encoding to do the conversion to UTF-8? > + // With buffer allocation, etc. > + auto decoder = encoding->NewDecoder(); As noted in the other comment, the streaming semantics of this conversion are unclear to me. If it's non-streaming, it would be nice for the outparam to be nsACString& in which case you could use a convenience method on mozilla::Encoding. It it's streaming, then the decoder should survive over multiple calls to this method. If the desired semantics are non-streaming but the output has to be a raw buffer (not an XPCOM string), then instantiating a decoder here makes sense. @@ +825,2 @@ > > + // XXX Is it possbile to get a good start length?? mozilla::Decoder::MaxUTF8BufferLength() if you want simple. If you want complicated but more memory-efficient, you can start with MaxUTF8BufferLengthWithoutReplacement() first and then realloc the buffer to MaxUTF8BufferLength() if you get kOutputFull. The non-streaming convenience methods that require nsACString output take care of this. @@ +831,2 @@ > > + auto src = mozilla::AsBytes(mozilla::MakeSpan(nsDependentCString(stringToUse, inLength))); No need for the intermediate nsDependentCString. ::: mailnews/mime/src/modlmime.h @@ +222,5 @@ > into HTML. (This should return bytes which may appear in an HTML file, > ie, we must be able to scan through the string to search for "<" and > turn it in to "<", and so on.) > > `input' is a non-NULL-terminated string of a single line from the message. The streaming semantics of this function are unclear to me. The comment suggests that the input always ends in a line break or was immediately followed by one. In that case, for encodings other than ISO-2022-JP, UTF-16LE, UTF-16BE and replacement it doesn't matter whether the conversion is done in a streaming manner (same converter instance for all lines) or not (new converter for each line). For ISO-2022-JP streaming vs. non-streaming conversion doesn't matter if input conforms to the extra email requirement of shifting back to the ASCII state at the end of each line. Can input be trusted to conform to that requirement, though? Also, is there a guarantee that this method is never used with UTF-16LE/UTF-16BE input? Or if it is that the caller is able to locate non-ASCII-compatible line breaks?

Attachment #8873658 - Flags: feedback?(hsivonen) → feedback-

Assignee

Comment 26

•

8 years ago

Thanks for your comments. Just a brief reply for now. (In reply to Henri Sivonen (:hsivonen) from comment #25) > > + if (!encoding) { > > + // Assume input is UTF-8. > > + encoding = UTF_8_ENCODING; > This means that XSS-dangerous encodings like HZ and ISO-2022-KR are decoded > as UTF-8 rather than being decoded as the replacement encoding. That was motivated by the current code: https://dxr.mozilla.org/comm-central/rev/62db49c50e60514a4178651227629a9625381cb4/mailnews/mime/src/mimetext.cpp#369 MIME_get_unicode_decoder(text->charset, getter_AddRefs(text->inputDecoder)); // If no decoder found, use ""UTF-8"", that will map most non-US-ASCII chars as invalid // A pure-ASCII only decoder would be better, but there is none if (text->inputDecoder == nullptr) MIME_get_unicode_decoder("UTF-8", getter_AddRefs(text->inputDecoder)); > As noted in the other comment, the streaming semantics of this conversion > are unclear to me. If it's non-streaming, it would be nice for the outparam > to be nsACString& in which case you could use a convenience method on > mozilla::Encoding. It it's streaming, then the decoder should survive over > multiple calls to this method. [snip] > The non-streaming convenience methods that require nsACString output take > care of this. For all I can tell, non-streaming, converting a line at a time, there was even the assumptions that lines are 72 characters long, but in fact, according to RCF 821 they can be 1000 characters long. https://dxr.mozilla.org/comm-central/rev/62db49c50e60514a4178651227629a9625381cb4/mailnews/mime/src/mimemoz2.cpp#816 I have considered changing the output parameter to an nsACString&, but I'd have to see how far that ripples up. There is not a single ns*String used in mimetext.cpp (well, one use to get a preference value). The "convenience function" appears to be DecodeWithoutBOMHandlingAndWithoutReplacement() which has two interfaces converting to both UTF-16 and UTF-8.

Comment 27

•

8 years ago

(In reply to Jorg K (GMT+2) from comment #26) > The "convenience function" appears to be > DecodeWithoutBOMHandlingAndWithoutReplacement() which has two interfaces > converting to both UTF-16 and UTF-8. DecodeWithoutBOMHandling(), rather, since the old code did replacement.

Assignee

Comment 28

•

8 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #25) > > + // XXX Is it possbile to get a good start length?? > mozilla::Decoder::MaxUTF8BufferLength() if you want simple. [snip] > The non-streaming convenience methods that require nsACString output take > care of this. I was tempted to use the convenience method, but looking at |inline std::tuple<std::string, bool> decode_without_bom_handling| I see that it uses |size_t needed = decoder->max_utf8_buffer_length(bytes.size());|. So I might as well go with the simple MaxUTF8BufferLength() and won't be worse.

Assignee

Comment 29

•

8 years ago

Attached patch 1363281-uconv.patch (v8) (obsolete) — Details — Splinter Review

OK, this takes the last feedback on board. Maybe use interdiff to see the few changes. Sorry I can't answer all questions about the legacy MIME code.

Attachment #8873658 - Attachment is obsolete: true

Attachment #8873914 - Flags: feedback?(hsivonen)

Assignee

Comment 30

•

8 years ago

Attached patch 1363281-uconv.patch (v8b) (obsolete) — Details — Splinter Review

I did something really silly and ran the wrong binary for testing (old 32bit build instead of new 64bit build due to wrong shortcut). Switching to the right binary I noticed some things that didn't work. So with this patch, the application starts, folders and message are displayed. I mildly tested replying to a message and sending a message (exercises code in nsMsgCompose.cpp and nsMsgI18N.cpp). However, IMAP folder names look really funny since there is currently no UTF-7 support. Obviously my comment #13 was wrong. Henri, for further feedback, please interdiff v7 and v8b.

Attachment #8873914 - Attachment is obsolete: true

Attachment #8873914 - Flags: feedback?(hsivonen)

Attachment #8874190 - Flags: feedback?(hsivonen)

Assignee

Comment 31

•

8 years ago

To look at the UTF-7 issue, I've added some debug and I see: === nsMsgI18NConvertToUnicode: x-imap4-modified-utf7 |INBOX.&xUjFSMVIxUjFSMVIxUg-| === nsMsgI18NConvertFromUnicode: x-imap4-modified-utf7 |Trash| So as expected, the IMAP folder names pass though nsMsgI18N.cpp. They should not pass through the compose code or the MIME code (or the MAPI code). So I will add a special case there to get IMAP going again. That shouldn't be too hard with the existing code in nsUTF7ToUnicode.cpp and nsUnicodeToUTF7.cpp.

Assignee

Comment 32

•

8 years ago

Attached patch WIP: 1363281-utf7-imap.patch (v1) (obsolete) — Details — Splinter Review

This hacks MUTF-7 to Unicode into existence using the existing code. So is this the approach we're going to take? It's basically what Joshua called: > Switching those [calls in nsMsgI18N.cpp] to using a hard-coded one-shot converter should be feasible if necessary. With this patch, IMAP folders look good again. The patch is WIP and the Unicode to MUTF-7 part is still missing but not hard to do.

Assignee

Comment 33

•

8 years ago

Attached patch 1363281-utf7-imap.patch (v2) (obsolete) — Details — Splinter Review

OK, this does the UFT-7 encoding to and from Unicode using the existing code. Tested by renaming an IMAP folder called 안a안b안c안d안e안f안gAAABBBCCC to 안a안b안c안d안e안f안gAAABBBDDD (while having the buffer size set to 10 to exercise the loop). So the two patches together move TB into the new age of Rust.

Attachment #8874205 - Attachment is obsolete: true

Attachment #8874213 - Flags: feedback?(mkmelin+mozilla)

Attachment #8874213 - Flags: feedback?(Pidgeot18)

Comment 34

•

8 years ago

Comment on attachment 8874213 [details] [diff] [review] 1363281-utf7-imap.patch (v2) Review of attachment 8874213 [details] [diff] [review]: ----------------------------------------------------------------- So, ideally, I'd like to see the x-imap4-modified-utf7 removed from the catch-all functions and just have the code be inlined directly into CopyUTF16ToMUTF7 and its inverse (which means moving their implementation from an inline function in nsMsgI18N.h to nsMsgI18N.cpp). In particular, our current implementation makes it possible to say Content-Type: text/plain; charset="x-imap4-modified-utf7" and trigger using MUTF7, which I don't think is desirable. On the other hand, we are going to need to directly call the UTF-7 decoder from somewhere for jsmime and libmime, and keeping the similarity of code for these pieces may make some sense. I don't exactly see where you're triggering the UTF-7 decoder in such a way that we can decode UTF-7 text/ parts or UTF-7 in RFC 2047 encoded-words, so I can't comment on what work is necessary. ::: mailnews/base/util/nsMsgI18N.cpp @@ +52,5 @@ > LossyCopyUTF16toASCII(inString, outString); > return NS_OK; > } > + else if (!PL_strcasecmp(aCharset, "x-imap4-modified-utf7")) { > + nsUnicodeToMUTF7 *converter = new nsUnicodeToMUTF7(); You don't need to heap-allocate here. Just say nsUnicodeToMUTF7 converter;, and then you don't need to delete it later. @@ +53,5 @@ > return NS_OK; > } > + else if (!PL_strcasecmp(aCharset, "x-imap4-modified-utf7")) { > + nsUnicodeToMUTF7 *converter = new nsUnicodeToMUTF7(); > + char buffer[IMAP_UTF7_BUF_LENGTH]; Please don't use a stack buffer. Use at least static instead. @@ +57,5 @@ > + char buffer[IMAP_UTF7_BUF_LENGTH]; > + char16_t *in = (char16_t *)inString.get(); > + int32_t inLen = inString.Length(); > + outString.Truncate(); > + while (inLen) { while (inLen > 0) is better. @@ +118,5 @@ > return NS_ERROR_UNEXPECTED; > } > + else if (!PL_strcasecmp(aCharset, "x-imap4-modified-utf7")) { > + nsMUTF7ToUnicode *converter = new nsMUTF7ToUnicode(); > + char16_t buffer[IMAP_UTF7_BUF_LENGTH]; So, for converting UTF7 to unicode, it's the case that we have a good upper bound on the size of the string we need--namely, the converted string can't be larger than the input string (it doesn't quite hold true for UTF-8--a span of 3*n UCS-2 characters use 8*n + 2 characters in UTF-7 while they could use 9*n characters in UTF-8). So rather than allocating a temporary buffer and the copying elements in, just ensure that the output string has sufficient space, convert into that buffer. Something like (untested): // Add a comment about UTF-7 being strictly larger in encoding size than UTF-16. nsMUTF7ToUnicode converter; int32_t outLen = inString.Length(); outString.SetCapacity(outLen); int32_t inLen = outLen; converter->ConvertNoBuff(in, &inLen, outString.BeginWriting(), &outLen); MOZ_ASSERT(inLen == inString.Length(), "UTF-7 should not produce a longer output"); outString.SetLength(outLen);

Attachment #8874213 - Flags: feedback?(Pidgeot18) → feedback+

Assignee

Comment 35

•

8 years ago

I tried to follow Joshua's advice to move the UTF-7 stuff into CopyUTF16toMUTF7() and its friend. Having done this, I started TB (after having renamed my test folder on a different machine/profile) and I was greeted by: === The folder 'h&APw-h&APw-' could not be found, so filter(s) associated with this folder will be disabled. Verify that the folder exists, and that filters point to a valid destination folder. === So some other parts of the system run the catch-all functions. So all I can take on board are the other suggestions. Updated patch coming.

Assignee

Comment 36

•

8 years ago

Damn, as much as I tried, I could reproduce this condition any more. So let's attach both versions.

Assignee

Comment 37

•

8 years ago

Attached patch 1363281-utf7-imap.patch (v3) (obsolete) — Details — Splinter Review

Version with the UTF-7 stuff moved out of the catch-all function.

Assignee

Comment 38

•

8 years ago

Attached patch 1363281-utf7-imap-JK.patch (v3-alt) (obsolete) — Details — Splinter Review

Processing still in the catch-all function, but Joshua's remaining comments addressed. I noticed that I can't rename a folder to "Hö" with a trailing Umlaut, so something's wrong here.

Attachment #8874213 - Attachment is obsolete: true

Attachment #8874213 - Flags: feedback?(mkmelin+mozilla)

Assignee

Comment 39

•

8 years ago

Attached patch 1363281-utf7-imap.patch (v4) (obsolete) — Details — Splinter Review

OK, fixed the "can't rename to 'Hö'" problem, there was a "finish" call missing. This is following all of Joshua's suggestions now, but I'm afraid we'll see a funny error message as per comment #35. Hmm, no idea how to trigger it.

Attachment #8874262 - Attachment is obsolete: true

Attachment #8874263 - Attachment is obsolete: true

Attachment #8874265 - Flags: feedback?(Pidgeot18)

Assignee

Comment 40

•

8 years ago

For the record: This converter.charset = "x-imap4-modified-utf7"; https://dxr.mozilla.org/comm-central/rev/62db49c50e60514a4178651227629a9625381cb4/mailnews/test/fakeserver/imapd.js#307 will of course not work any more.

Comment 41

•

8 years ago

(In reply to Jorg K (GMT+2) from comment #28) > (In reply to Henri Sivonen (:hsivonen) from comment #25) > > > + // XXX Is it possbile to get a good start length?? > > mozilla::Decoder::MaxUTF8BufferLength() if you want simple. > [snip] > > The non-streaming convenience methods that require nsACString output take > > care of this. > I was tempted to use the convenience method, but looking at |inline > std::tuple<std::string, bool> decode_without_bom_handling| I see that it > uses |size_t needed = decoder->max_utf8_buffer_length(bytes.size());|. So I > might as well go with the simple MaxUTF8BufferLength() and won't be worse. It seems that you are reading the sample code that uses the C++ standard library. That code isn't used in Gecko with XPCOM/MFBT types. The Gecko-relevant bit is: https://hg.mozilla.org/try/file/6767a326ba28/intl/encoding_glue/src/lib.rs#l316 which tries to use decoder.max_utf8_buffer_length_without_replacement() first and stretches to decoder.max_utf8_buffer_length() if applicable. Using the convenience method would also avoid heap-allocating a new decoder for each converted line. (Since the convenience-method is Rust-backed, it can stack-allocate the decoder. Stack allocation isn't available to C++ due to there being no stable ABI for returning a Decoder by value to C. Recycling a heap-allocation is available to C++, though, in the form of NewDecoderWithoutBOMHandlingInto().)

Assignee

Comment 42

•

8 years ago

Yep, I was reading in |inline std::tuple<std::string, bool> decode_without_bom_handling(| in the first part of the patch. It didn't look like Rust to me ;-) - Sorry. I'll look into using NewDecoderWithoutBOMHandlingInto() or would you prefer to drill open that call stack and use nsACString? That can be done, too, and then I don't have to worry about the fine print.

Comment 43

•

8 years ago

Comment on attachment 8874190 [details] [diff] [review] 1363281-uconv.patch (v8b) Review of attachment 8874190 [details] [diff] [review]: ----------------------------------------------------------------- ::: mailnews/base/util/nsMsgI18N.cpp @@ +46,5 @@ > !PL_strcasecmp(aCharset, "us-ascii") || > !PL_strcasecmp(aCharset, "ISO-8859-1"))) { > LossyCopyUTF16toASCII(inString, outString); > return NS_OK; > } us-ascii and ISO-8859-1 are no longer canonical names and the above behavior differs from letting encoding_rs handle them (as labels for windows-1252). Is the special-casing of us-ascii and ISO-8859-1 above actually desirable? @@ +214,5 @@ > + // Didn't use all the input but the outout isn't full, hence > + // there was an unencodable character. > + result = false; > + break; > + } It seems that the loop should be broken out of (without setting result = false) on kInputEmpty to avoid and infinite loop. ::: mailnews/import/outlook/src/MapiMessage.cpp @@ +586,5 @@ > + if (result != mozilla::kInputEmpty && result != mozilla::kOutputFull) { > + // Didn't use all the input but the outout isn't full, hence > + // there was an unencodable character. > + return false; > + } This loop, too, needs breaking out of (or just "return true;") on kInputEmpty.

Attachment #8874190 - Flags: feedback?(hsivonen) → feedback-

Comment 44

•

8 years ago

(In reply to Jorg K (GMT+2) from comment #42) > Yep, I was reading in |inline std::tuple<std::string, bool> > decode_without_bom_handling(| in the first part of the patch. It didn't look > like Rust to me ;-) - Sorry. > > I'll look into using NewDecoderWithoutBOMHandlingInto() or would you prefer > to drill open that call stack and use nsACString? That can be done, too, and > then I don't have to worry about the fine print. I recommend switching the outparam to nsACString unless that's particularly messy. Note that if the input is in an nsACString, too, the conversion step becomes a mere nsStringBuffer refcount bump when possible.

Assignee

Comment 45

•

8 years ago

Thanks for the comments. Today is a public holiday in Germany, but I'll get to it soon.

Assignee

Comment 46

•

8 years ago

Henri, you questioned this code: https://dxr.mozilla.org/comm-central/rev/62db49c50e60514a4178651227629a9625381cb4/mailnews/base/util/nsMsgI18N.cpp#45 else if (!aReportUencNoMapping && (!*aCharset || !PL_strcasecmp(aCharset, "us-ascii") || !PL_strcasecmp(aCharset, "ISO-8859-1"))) { LossyCopyUTF16toASCII(inString, outString); return NS_OK; } How about this: https://dxr.mozilla.org/comm-central/rev/62db49c50e60514a4178651227629a9625381cb4/mailnews/base/util/nsMsgI18N.cpp#127 else if (!*aCharset || !PL_strcasecmp(aCharset, "us-ascii") || !PL_strcasecmp(aCharset, "ISO-8859-1")) { // Despite its name, it also works for Latin-1. CopyASCIItoUTF16(inString, outString); return NS_OK; } else if (!PL_strcasecmp(aCharset, "UTF-8")) { if (MsgIsUTF8(inString)) { nsAutoString tmp; CopyUTF8toUTF16(inString, tmp); if (!tmp.IsEmpty() && tmp.First() == char16_t(0xFEFF)) tmp.Cut(0, 1); outString.Assign(tmp); return NS_OK; } NS_WARNING("Invalid UTF-8 string"); return NS_ERROR_UNEXPECTED; }

Comment 47

•

8 years ago

(In reply to Jorg K (GMT+2) from comment #46) > Henri, you questioned this code: > > https://dxr.mozilla.org/comm-central/rev/ > 62db49c50e60514a4178651227629a9625381cb4/mailnews/base/util/nsMsgI18N.cpp#45 > else if (!aReportUencNoMapping && (!*aCharset || > !PL_strcasecmp(aCharset, "us-ascii") || > !PL_strcasecmp(aCharset, "ISO-8859-1"))) { > LossyCopyUTF16toASCII(inString, outString); > return NS_OK; > } > > How about this: > https://dxr.mozilla.org/comm-central/rev/ > 62db49c50e60514a4178651227629a9625381cb4/mailnews/base/util/nsMsgI18N.cpp#127 > else if (!*aCharset || !PL_strcasecmp(aCharset, "us-ascii") || > !PL_strcasecmp(aCharset, "ISO-8859-1")) { > // Despite its name, it also works for Latin-1. > CopyASCIItoUTF16(inString, outString); > return NS_OK; > } Same issue but the other way round. > else if (!PL_strcasecmp(aCharset, "UTF-8")) { > if (MsgIsUTF8(inString)) { > nsAutoString tmp; > CopyUTF8toUTF16(inString, tmp); > if (!tmp.IsEmpty() && tmp.First() == char16_t(0xFEFF)) > tmp.Cut(0, 1); > outString.Assign(tmp); > return NS_OK; > } > NS_WARNING("Invalid UTF-8 string"); > return NS_ERROR_UNEXPECTED; > } This doesn't handle invalid UTF-8 as gracefully as encoding_rs doing replacement.

Assignee

Comment 48

•

8 years ago

Attached patch 1363281-uconv.patch (v9) (obsolete) — Details — Splinter Review

OK, I switched to nsACString. It wasn't messy, I had looked at it before but I wasn't aware of the benefits of using the "convenience function". You may give more feedback now or later. I haven't removed the questionable code quoted in comment #46 yet.

Attachment #8874190 - Attachment is obsolete: true

Attachment #8874375 - Flags: feedback?(hsivonen)

Assignee

Comment 49

•

8 years ago

Interdiff v8b to v9 works here: https://bugzilla.mozilla.org/attachment.cgi?oldid=8874190&action=interdiff&newid=8874375&headers=1 Oops, I didn't notice comment #47 while busily attaching patches. So remove those three special cases, right? Cutting 0xFEFF is the BOM-handling, right? So should I not use the DecodeWithoutBOMHandlingAndWithoutReplacement() and instead use the one with BOM handling?

Comment 50

•

8 years ago

Comment on attachment 8874375 [details] [diff] [review] 1363281-uconv.patch (v9) Review of attachment 8874375 [details] [diff] [review]: ----------------------------------------------------------------- ::: mailnews/base/util/nsMsgI18N.cpp @@ +212,5 @@ > + encoder->EncodeFromUTF16WithoutReplacement(src, dst, false); > + if (result == mozilla::kInputEmpty) { > + // All converted successfully. > + break; > + } else if (result != mozilla::kInputEmpty && result != mozilla::kOutputFull) { No need for "else" or "result != mozilla::kInputEmpty" on this line anymore. ::: mailnews/import/outlook/src/MapiMessage.cpp @@ +585,5 @@ > + encoder->EncodeFromUTF16WithoutReplacement(src, dst, false); > + if (result == mozilla::kInputEmpty) { > + // All converted successfully. > + break; > + } else if (result != mozilla::kInputEmpty && result != mozilla::kOutputFull) { No need for "else" or "result != mozilla::kInputEmpty" on this line anymore. ::: mailnews/mime/src/mimemoz2.cpp @@ +821,3 @@ > } > > + rv = encoding->DecodeWithoutBOMHandling(nsDependentCString(stringToUse, inLength), outString); nsDependentString doesn't give you the potential zero-copy benefit. Propagating the XPCOM stringness of the input further up the call stack might. The zero-copy case can work only when the input string's buffer is a heap-allocated nsStringBuffer.

Attachment #8874375 - Flags: feedback?(hsivonen) → feedback+

Comment 51

•

8 years ago

(In reply to Jorg K (GMT+2) from comment #49) > Interdiff v8b to v9 works here: > https://bugzilla.mozilla.org/attachment. > cgi?oldid=8874190&action=interdiff&newid=8874375&headers=1 > > Oops, I didn't notice comment #47 while busily attaching patches. So remove > those three special cases, right? > > Cutting 0xFEFF is the BOM-handling, right? So should I not use the > DecodeWithoutBOMHandlingAndWithoutReplacement() and instead use the one with > BOM handling? "BOM removal" means removing the BOM for this encoding if one is there (i.e. if this encoding is UTF-8, removing the UTF-8 BOM but treating other BOMs as garbage input). "Without BOM handling" means leaving the BOM there in the output. The third default unqualified option means potentially switching encoding according to the BOM and removing the BOM. Since this is converting line-by-line and it would be inappropriate to BOM-sniff line-by-line, this should be WithoutBOMHandling.

Assignee

Comment 52

•

8 years ago

Henri, thanks for all your help, advice and feedback. Before I make the final changes addressing comment #50, let me ask/confirm again: - I'll remove the legacy us-ascii/ISO-8859-1 special casing (comment #47). - This code |CopyUTF8toUTF16(inString, tmp); if (!tmp.IsEmpty() && tmp.First() == char16_t(0xFEFF))| removes the BOM from the UTF-16 string. Surely, as you pointed out, this is inappropriate for line-by-line processing but it may be needed on the first line. So what do I do with this code? Will WithoutBOMHandling remove it? Sorry about the NI, I don't know how closely you follow bugs.

Flags: needinfo?(hsivonen)

Comment 53

•

8 years ago

(In reply to Jorg K (GMT+2) from comment #52) > - This code |CopyUTF8toUTF16(inString, tmp); if (!tmp.IsEmpty() && > tmp.First() == char16_t(0xFEFF))| > removes the BOM from the UTF-16 string. > Surely, as you pointed out, this is inappropriate for line-by-line > processing but it may > be needed on the first line. So what do I do with this code? Will > WithoutBOMHandling remove it? DecodeWithoutBOMHandling won't remove it. UTF_8_ENCODING->DecodeWithBOMRemoval() will remove the UTF-8 BOM but will treat UTF-16LE and UTF-16BE BOMs as garbage entering the decoder like any other garbage. UTF_8_ENCODING->Decode() will remove any of the three BOMs and perform a UTF-16LE or UTF-16BE decode instead of UTF-8 decode if the BOM was the BOM for UTF-16LE or UTF-16BE, respectively, and the second item in the return tuple indicates which encoding was actually used. So to replace the above-quoted code, you want UTF_8_ENCODING->DecodeWithBOMRemoval().

Flags: needinfo?(hsivonen)

Assignee

Comment 54

•

8 years ago

Attached patch 1363281-uconv.patch (v10) (obsolete) — Details — Splinter Review

I hope this is the last iteration. Removed doubtful code and used UTF_8_ENCODING->DecodeWithBOMRemoval(). Thanks again for all your help.

Attachment #8874375 - Attachment is obsolete: true

Attachment #8874907 - Flags: feedback?(hsivonen)

Assignee

Comment 55

•

8 years ago

Attached patch 1363281-utf7-imap.patch (v4) - refreshed — Details — Splinter Review

Attachment #8874265 - Attachment is obsolete: true

Attachment #8874265 - Flags: feedback?(Pidgeot18)

Attachment #8874911 - Flags: feedback?(Pidgeot18)

Comment 56

•

8 years ago

Comment on attachment 8874907 [details] [diff] [review] 1363281-uconv.patch (v10) Review of attachment 8874907 [details] [diff] [review]: ----------------------------------------------------------------- ::: mailnews/base/util/nsMsgI18N.cpp @@ +41,5 @@ > } > + > + auto encoding = mozilla::Encoding::ForLabelNoReplacement(nsDependentCString(aCharset)); > + if (!encoding) > + return NS_ERROR_UCONV_NOCONV; Sorry about not mentioning this before, but, to be safe, you might check if you got UTF_16LE_ENCODING or UTF_16BE_ENCODING and error out in those cases, too. (If you don't, those actually produce UTF-8 output to match URL parsing and form submission behavior.) Or assert, since, AFAIK, UTF-16 blocking is supposed to happen elsewhere. @@ +71,2 @@ > else if (!PL_strcasecmp(aCharset, "UTF-8")) { > + return UTF_8_ENCODING->DecodeWithBOMRemoval(inString, outString); This special case means that the BOM is removed only if aCharset is "UTF-8", but "UTF-16BE", "UTF-16LE" and "utf-8" don't get the BOM removed below. I'm not sure if that matters, but your use of mozilla::Encoding::ForLabelNoReplacement instead of mozilla::Encoding::ForName below implies that you don't trust callers to use "UTF-8" consistently and never to use "utf-8".

Attachment #8874907 - Flags: feedback?(hsivonen) → feedback+

Assignee

Comment 57

•

8 years ago

Thanks, I'll look into it. Also, I forgot to address your comment #50 comment completely: > No need for "else" or "result != mozilla::kInputEmpty" on this line anymore.

Assignee

Comment 58

•

8 years ago

Attached patch 1363281-uconv.patch (v11) — Details — Splinter Review

I've removed the unnecessary "result != mozilla::kInputEmpty" and added a check for the UTF-16 (outgoing) encoding. Sadly I don't understand the other comment: > > else if (!PL_strcasecmp(aCharset, "UTF-8")) { > > + return UTF_8_ENCODING->DecodeWithBOMRemoval(inString, outString); > This special case means that the BOM is removed only if aCharset is "UTF-8", > but "UTF-16BE", "UTF-16LE" and "utf-8" don't get the BOM removed below. Well, "utf-8" in lower case is covered by the case-insensitive compare, no? What am I missing?

Attachment #8874907 - Attachment is obsolete: true

Comment 59

•

8 years ago

(In reply to Jorg K (GMT+2) from comment #58) > Sadly I don't understand the other comment: > > > else if (!PL_strcasecmp(aCharset, "UTF-8")) { > > > + return UTF_8_ENCODING->DecodeWithBOMRemoval(inString, outString); > > This special case means that the BOM is removed only if aCharset is "UTF-8", > > but "UTF-16BE", "UTF-16LE" and "utf-8" don't get the BOM removed below. > Well, "utf-8" in lower case is covered by the case-insensitive compare, no? > What am I missing? My mistake. Sorry. I didn't notice the comparison was case-insensitive. Then the difference is only with surrounding whitespace, which I guess isn't an issue in mailnews, and the two UTF-16 variants, which may or may not be an issue.

Assignee

Comment 60

•

8 years ago

Comment on attachment 8875618 [details] [diff] [review] 1363281-uconv.patch (v11) If you could kindly stick another f+ onto it, I think I'm done here for now. When bug 1261841 lands, I imaging on mozilla56 after the next branch date, I will land this and we'll have a least a working TB. We'll see what tests fail and I'll fix them after the event. I little hard to push M-C changes to C-C try, especially with large patches like in this case.

Attachment #8875618 - Flags: review?(mkmelin+mozilla)

Attachment #8875618 - Flags: feedback?(hsivonen)

Assignee

Comment 61

•

8 years ago

Comment on attachment 8874911 [details] [diff] [review] 1363281-utf7-imap.patch (v4) - refreshed Whoever comes first ;-) I will have to land this as bustage fix if the review doesn't come on time.

Attachment #8874911 - Flags: review?(mkmelin+mozilla)

Attachment #8874911 - Flags: review?(Pidgeot18)

Attachment #8874911 - Flags: feedback?(Pidgeot18)

Updated

•

8 years ago

Attachment #8875618 - Flags: feedback?(hsivonen) → feedback+

Updated

•

8 years ago

Attachment #8874911 - Flags: review?(Pidgeot18) → review+

Assignee

Comment 62

•

8 years ago

Comment on attachment 8875618 [details] [diff] [review] 1363281-uconv.patch (v11) Joshua, maybe you want to look this over, too. Or you could rs it. I went through many iterations with Henri until he was finally satisfied ;-) - well, he asked some questions in the process that I couldn't answer.

Attachment #8875618 - Flags: review?(Pidgeot18)

Assignee

Updated

•

8 years ago

Attachment #8874911 - Flags: review?(mkmelin+mozilla)

Magnus Melin [:mkmelin]

Comment 63

•

8 years ago

Comment on attachment 8875618 [details] [diff] [review] 1363281-uconv.patch (v11) Review of attachment 8875618 [details] [diff] [review]: ----------------------------------------------------------------- Skimmed through it, rs=mkmelin ::: mailnews/base/util/nsMsgI18N.cpp @@ +57,5 @@ > + if (aReportUencNoMapping) { > + rv = NS_ERROR_UENC_NOMAPPING; > + } else { > + rv = NS_OK; > + } could just be rv = (aReportUencNoMapping) ? NS_ERROR_UENC_NOMAPPING : NS_OK; @@ +182,5 @@ > + > + uint8_t buffer[512]; > + auto src = mozilla::MakeStringSpan(inString); > + auto dst = mozilla::MakeSpan(buffer); > + for (;;) { while(true) ? ::: mailnews/import/outlook/src/MapiMessage.cpp @@ +576,5 @@ > > + uint8_t buffer[512]; > + auto src = mozilla::MakeSpan(m_body); > + auto dst = mozilla::MakeSpan(buffer); > + for (;;) { while (true)

Attachment #8875618 - Flags: review?(mkmelin+mozilla) → review+

Assignee

Comment 64

•

8 years ago

Thanks, |for(;;)| seems to be Henri's style which I've just copied. I'll fix the nits before landing it.

Assignee

Comment 65

•

8 years ago

https://hg.mozilla.org/comm-central/rev/f558febc1ead57c76ba5eb5af4443a67680f7fa4 https://hg.mozilla.org/comm-central/rev/ac30eed3d7e54d44f6824d9da864d3a89579e76c https://hg.mozilla.org/comm-central/rev/6e8cb90569125b04aadbb9f34c2ecc560c07837b There will be test failures as indicated in comment #40. This left-over https://hg.mozilla.org/comm-central/rev/f558febc1ead57c76ba5eb5af4443a67680f7fa4#l5.21 will need to be removed/

Keywords: leave-open

Target Milestone: --- → Thunderbird 56.0

Assignee

Comment 66

•

8 years ago

https://hg.mozilla.org/comm-central/rev/8af60715cdafb3bff5ee7ef2522e73ea88a2489f Follow-up to fix compile error and clean up left-over comment.

Assignee

Comment 67

•

8 years ago

https://hg.mozilla.org/comm-central/rev/a3b550af3c582739249505897fab03c3b51b08e3 Follow-up to fix compile errors on Mac.

Assignee

Comment 68

•

8 years ago

Attached patch 1363281-follow-up3.patch - Remove UTF-7 decoding from IMAP Daemon — Details — Splinter Review

https://hg.mozilla.org/comm-central/rev/f2532fa2989b44536653261e53c179edf27c5400 Bustage fix to reduce the massive Xpcshell test failure. If decoding from UTF-7 is really needed during the tests, we'll need to revisit this later. I ran three of the tests that were failing, and they passed with this patch.

Assignee

Comment 69

•

8 years ago

There are also many Mozmill test failures reported. I ran a few manually and they didn't fail: mozmake SOLO_TEST=composition/test-image-display.js mozmill-one mozmake SOLO_TEST=composition/test-reply-format-flowed.js mozmill-one mozmake SOLO_TEST=composition/test-send-format.js mozmill-one mozmake SOLO_TEST=composition/test-drafts.js mozmill-one So it looks like preceding test fail and don't clean up properly, so subsequent ones fail. One that fails for real is: mozmake SOLO_TEST=composition/test-multipart-related.js mozmill-one which gives: SUMMARY-UNEXPECTED-FAIL | c:\mozilla-source\comm-central\mail\test\mozmill\composition\test-multipart-related.js | test-multipart-related.js::test_basic_multipart_related EXCEPTION: Expected HTML to refer to cid part1.A82A7414.A4F11358@foo.invalid at: test-multipart-related.js line 117 mozmake SOLO_TEST=composition/test-charset-edit.js mozmill-one fails with: SUMMARY-UNEXPECTED-FAIL | c:\mozilla-source\comm-central\mail\test\mozmill\composition\test-charset-edit.js | test-charset-edit.js::test_wrong_reply_charset EXCEPTION: a != b: 'UTF-8' != 'windows-1252'. at: test-folder-display-helpers.js line 2890 assert_true test-folder-display-helpers.js:2890 11 assert_equals test-folder-display-helpers.js:2877 3 test_wrong_reply_charset test-charset-edit.js:108 3 and SUMMARY-UNEXPECTED-FAIL | c:\mozilla-source\comm-central\mail\test\mozmill\composition\test-charset-edit.js | test-charset-edit.js::test_no_mojibake EXCEPTION: a != b: 'x-mac-croatian' != 'utf-7'. at: test-folder-display-helpers.js line 2890 assert_true test-folder-display-helpers.js:2890 11 assert_equals test-folder-display-helpers.js:2877 3 test_no_mojibake test-charset-edit.js:141 3 mozmake SOLO_TEST=composition/test-charset-upgrade.js mozmill-one fails with: SUMMARY-UNEXPECTED-FAIL | c:\mozilla-source\comm-central\mail\test\mozmill\composition\test-charset-upgrade.js | test-charset-upgrade.js::test_encoding_upgrade_html_compose EXCEPTION: a != b: 'UTF-8' != 'windows-1252'. at: test-folder-display-helpers.js line 2890 assert_true test-folder-display-helpers.js:2890 11 assert_equals test-folder-display-helpers.js:2877 3 test_encoding_upgrade_html_compose test-charset-upgrade.js:65 3 and SUMMARY-UNEXPECTED-FAIL | c:\mozilla-source\comm-central\mail\test\mozmill\composition\test-charset-upgrade.js | test-charset-upgrade.js::test_encoding_upgrade_plaintext_compose EXCEPTION: a != b: 'UTF-8' != 'windows-1252'. at: test-folder-display-helpers.js line 2890 assert_true test-folder-display-helpers.js:2890 11 assert_equals test-folder-display-helpers.js:2877 3 test_encoding_upgrade_plaintext_compose test-charset-upgrade.js:147 3 Also spotted: JavaScript error: resource:///modules/jsmime.jsm, line 39: : Component returned failure code: 0x80500001 [nsIScriptableUnicodeConverter.charset] Xpcshell results from the latest push have just arrived, nine real failures after the the landing in comment #68: TEST-UNEXPECTED-FAIL | mailnews/base/test/unit/test_getMsgTextFromStream.js | run_test/< - [run_test/< : 77] "" == "This message has no content type. This should be assumed to be text/plain." TEST-UNEXPECTED-FAIL | mailnews/compose/test/unit/test_attachment.js | testInput0 - [testInput0 : 92] "Content-Disposition: attachment;\\r\\n =?UTF-8?Q?_!=22#$%&'=28=29*+=2c-./0123456789:;<=3d>=3f@ABCDEFGHIJKLM?==?UTF-8?Q?NOPQRSTUVW == "Content-Disposition: attachment;\\r\\n filename*0*=UTF-8''%20%21%22%23%24%25%26%27%28%29%2A%2B%2C%2D%2E%2F%30%31;\\r\\n filename*1* TEST-UNEXPECTED-FAIL | mailnews/compose/test/unit/test_messageHeaders.js | testContentHeaders - [testContentHeaders : 71] "8bit" == "7bit" TEST-UNEXPECTED-FAIL | mailnews/compose/test/unit/test_longLines.js | testBodyWithLongLine - [testBodyWithLongLine : 67] "8bit" == "base64" TEST-UNEXPECTED-FAIL | mailnews/imap/test/unit/test_mailboxes.js | checkDiscovery - [checkDiscovery : 38] false == true TEST-UNEXPECTED-FAIL | mailnews/imap/test/unit/test_mailboxes.js | checkDiscovery - [checkDiscovery : 38] false == true TEST-UNEXPECTED-FAIL | mailnews/intl/test/unit/test_decode_utf-7_internal.js | checkDecode - [checkDecode : 31] "+LGI--+ITIipSIp- +AocCeQ-oddns +Ad0CjQ- s+ATECZQKH- p+AlAB3QJ5- u+AlACVA- no+Ao4- +Al8-I" == "Ɫ-Ⅎ⊥∩ ʇɹoddns ǝʍ sıɥʇ pɐǝɹ uɐɔ noʎ ɟI" TEST-UNEXPECTED-FAIL | mailnews/intl/test/unit/test_encode_utf-7_internal.js | checkEncode - [checkEncode : 47] "?-??? ??oddns ?? s??? p??? u?? no? ?I" == "+LGI--+ITIipSIp- +AocCeQ-oddns +Ad0CjQ- s+ATECZQKH- p+AlAB3QJ5- u+AlACVA- no+Ao4- +Al8-I" TEST-UNEXPECTED-FAIL | mailnews/mime/test/unit/test_jsmime_charset.js | run_test - [run_test : 27] "=?UTF-7?Q?+AKM-1?=" == "£1" So I guess I'll be here for a while :-(

Assignee

Comment 70

•

8 years ago

Attached patch 1363281-follow-up4.patch - remove/disable tests related to UTF-7 — Details — Splinter Review

https://hg.mozilla.org/comm-central/rev/322986aa2dbabda0ba031d76ba2dc959c7cc9f7a For the time being, we can't decode UTF-7 in JS since we only have a C++ interface for it. Previously the nsIScriptableUnicodeConverter could be used for this purpose. I'll file another bug to reinstate testing removed here. This will fix/silence test_jsmime_charset.js that won't be restored and test_encode_utf-7_internal.js, test_decode_utf-7_internal.js which may be restored and test_mailboxes.js which should be restored. The following tests still need investigation: test_getMsgTextFromStream.js, test_attachment.js, test_messageHeaders.js, test_longLines.js since there the behaviour changed in an unexpected way. Also the Mozmill need to be investigated.

Assignee

Updated

•

8 years ago

Blocks: 1372899

Assignee

Comment 71

•

8 years ago

Filed bug 1372899 for UTF-7 follow up work so the slashing can continue here ;-)

Assignee

Comment 72

•

8 years ago

Attached patch 1363281-follow-up5.patch - WIP. (obsolete) — Details — Splinter Review

test_getMsgTextFromStream.js can be fixed by restoring this code: https://hg.mozilla.org/comm-central/rev/f558febc1ead57c76ba5eb5af4443a67680f7fa4#l3.117 since in this test an empty charset is passed.

Assignee

Comment 73

•

8 years ago

I noticed that a message written with UTF-8 encoding is stored as an empty message. So that's likely to cause a few test failures ;-)

Assignee

Comment 74

•

8 years ago

Attached patch 1363281-follow-up5.patch - fix encoding that doesn't work at all — Details — Splinter Review

https://hg.mozilla.org/comm-central/rev/542d84bf54f1b21ce9979fe82bbb1c46a78f41f5 There was a silly typo/cut&paste bug that disabled encoding message content into the target charset, that is, all saved drafts and sent messages were empty, not real good for an e-mail program. Also I reinstated the option to pass an empty charset. Three of the four failing Xpcshell tests now pass, there is still trouble with test_longLines.js which fails when testing ISO-2022-JP in plain text mode. Somehow the converter flags a replacement and hence the content is encoded in UTF-8 instead which is undesired/unexpected. We'll see how the Mozmill tests come out with this latest patch. Since encoding works now, they should be whole lot better, maybe still failing on x-mac-croation. We'll see.

Attachment #8877746 - Attachment is obsolete: true

Assignee

Comment 75

•

8 years ago

Henri, I'm having trouble with encoding in ISO-2022-JP. My test encodes this message htmlMessage = "<html><head>" + "<meta http-equiv=\"content-type\" content=\"text/html; charset=ISO-2022-JP\">" + "</head><body>" + longMultibyteLineJapanese + "</body></html>"; where let longMultibyteLineJapanese = "語".repeat(450); https://dxr.mozilla.org/comm-central/rev/322986aa2dbabda0ba031d76ba2dc959c7cc9f7a/mailnews/compose/test/unit/test_longLines.js#165 That test passes. The next test encodes the same message, but forced to plain text. Basically, all the tags are removed and only |longMultibyteLineJapanese| is encoded. This time I get a NS_OK_HAD_REPLACEMENTS, so I abort using ISO-2022-JP and fall back to UTF-8, which makes the test fail. I'm a little puzzled by this. How can a Japanese string embedded in HTML tags be encoded correctly, yet, the string without the tags has replacements? Weird. I tried to reproduce scenario manually, that is compose a message with that string, send as plain text. This works, the message is encoded as ISO-2022-JP. So my questions: - Any hints how to debug this? A little hard to dump out the input string at time of encoding. - Under which circumstances would encoding to ISO-2022-JP return "had replacements" if the text consists of 語 and maybe a linefeed? - Is there the remote chance of a bug in the encoder that the status is not reliable?

Flags: needinfo?(hsivonen)

Assignee

Comment 76

•

8 years ago

Attached patch 1363281-follow-up6.patch - fix test-charset-edit.js — Details — Splinter Review

https://hg.mozilla.org/comm-central/rev/c744963d757b8843f5cc4724cfe384038562d154 Aceman, please do a post-landing review here. test-charset-edit.js failed since it was still using UTF-7 which is no longer supported as body charset. Sadly, it threw before cleaning up the draft it created which made all subsequent tests fail. I've fix this and I am using UFT-16 instead now, which is not perfect, but at least maintains the test. With this landing, Mozmill should be green. The only failure left should be xpcshell test_longLines.js as per comment #75. Summarising landing so far: Comment #65: https://hg.mozilla.org/comm-central/rev/f558febc1ead57c76ba5eb5af4443a67680f7fa4 - main code https://hg.mozilla.org/comm-central/rev/ac30eed3d7e54d44f6824d9da864d3a89579e76c - UTF-7 support https://hg.mozilla.org/comm-central/rev/6e8cb90569125b04aadbb9f34c2ecc560c07837b - mozconfig Comment #66 and comment #67: https://hg.mozilla.org/comm-central/rev/8af60715cdafb3bff5ee7ef2522e73ea88a2489f - compile error, follow-up 2 https://hg.mozilla.org/comm-central/rev/a3b550af3c582739249505897fab03c3b51b08e3 - compile error, follow-up 3 Comment #70: https://hg.mozilla.org/comm-central/rev/322986aa2dbabda0ba031d76ba2dc959c7cc9f7a - remove utf-7-based tests, follow-up 4, see follow-up bug 1372899. Comment #74: https://hg.mozilla.org/comm-central/rev/542d84bf54f1b21ce9979fe82bbb1c46a78f41f5 - corrected encoding, follow-up 5 This comment: https://hg.mozilla.org/comm-central/rev/c744963d757b8843f5cc4724cfe384038562d154 - fix test-charset-edit.js, follow-up 6. Not a nice way to work, but after the initial landing in comment #65 we were basically in bustage-fix mode.

Attachment #8877952 - Flags: review?(acelists)

Comment 77

•

8 years ago

(In reply to Jorg K (GMT+2) from comment #75) > So my questions: > - Any hints how to debug this? A little hard to dump out the input string at > time of encoding. Trying to dump the string as close to the encode step as possible seems to be the best bet apart from gdb. To debug in gdb, the code of interest starts at https://dxr.mozilla.org/comm-central/source/mozilla/third_party/rust/encoding_rs/src/iso_2022_jp.rs#471 However, to debug it, you need to build Thunderbird with -Zdebug-macros in RUSTFLAGS. In the Firefox case, RUSTFLAGS is set in config/rules.mk. With that, setting a breakpoint by line number should work. > - Under which circumstances would encoding to ISO-2022-JP return "had > replacements" if the text consists of > 語 and maybe a linefeed? There shouldn't be any such circumstances. I just wrote a test case in Rust feeding 語, 語, LF and 語 to the encoder individually and it didn't misbehave. > - Is there the remote chance of a bug in the encoder that the status is not > reliable? *Remote* chance exists, of course, but per my testing above, I don't think that's the issue here.

Flags: needinfo?(hsivonen)

Comment 78

•

8 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #77) > Trying to dump the string as close to the encode step as possible seems to > be the best bet apart from gdb. Dumping the encoder output would be useful, too.

Assignee

Comment 79

•

8 years ago

Assignee

Comment 80

•

8 years ago

Attached file 1363281.log - log of test (obsolete) — Details

OK, here we have the log. The previous test with the HTML message succeeds: 0:02.52 TEST_STATUS: Thread-1 testBodyWithLongLine PASS [testBodyWithLongLine : 65] 1 == 1 0:02.52 TEST_STATUS: Thread-1 testBodyWithLongLine PASS [testBodyWithLongLine : 67] "text/html; charset=ISO-2022-JP" == "text/html; charset=ISO-2022-JP" 0:02.53 TEST_STATUS: Thread-1 testBodyWithLongLine PASS [testBodyWithLongLine : 65] 1 == 1 0:02.53 TEST_STATUS: Thread-1 testBodyWithLongLine PASS [testBodyWithLongLine : 67] "base64" == "base64" 0:02.53 TEST_STATUS: Thread-1 testBodyWithLongLine PASS [testBodyWithLongLine : 73] true == true 0:02.54 TEST_STATUS: Thread-1 testBodyWithLongLine PASS [testBodyWithLongLine : 48] "<html><head><meta http-equiv=\\"content-type\\" content=\\"text/html; charset=ISO-2022-JP\\"></head><body>Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬× == "<html><head><meta http-equiv=\\"content-type\\" content=\\"text/html; charset=ISO-2022-JP\\"></head><body>Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬×Þ¬× For the failing test, I added this code: if (rv == NS_OK_HAD_REPLACEMENTS) { printf("=== %s\n", aCharset); for (uint32_t i=0; i<inString.Length();i++) printf("=== %lx\n", inString.CharAt(i)); printf("=== Out\n"); for (uint32_t j=0; j<outString.Length();j++) printf("=== %x\n", outString.CharAt(j)); rv = aReportUencNoMapping ? NS_ERROR_UENC_NOMAPPING : NS_OK; } and you can see the result in the log. Maybe I should be dumping it out some other way. I'm surprised that the input doesn't appear to be printing more than 0xFF. Also, what would be the replacement character in the output?

Flags: needinfo?(hsivonen)

Assignee

Updated

•

8 years ago

Attachment #8877960 - Attachment mime type: text/x-log → text/plain

Assignee

Comment 81

•

8 years ago

Hmm, never mind, something is wrong here, the input that shows the replacement is: "=== 3c <" "=== 68 h" "=== 74 t" "=== 6d m" "=== 6c l" "=== 3e >" "=== 3c <" "=== 68 h" "=== 65 e" "=== 61 a" "=== 64 d" "=== 3e >" etc. In the plain text test case only the raw text without the tags should have been presented for conversion. I'll have check what's going on there.

Flags: needinfo?(hsivonen)

Assignee

Comment 82

•

8 years ago

Status: ASSIGNED → RESOLVED

Closed: 8 years ago

Keywords: leave-open

Resolution: --- → FIXED

Assignee

Comment 83

•

8 years ago

Attached patch 1363281-fix-test.patch (follow up 7) - fix test_longLines.js — Details — Splinter Review

Attachment #8877960 - Attachment is obsolete: true

Assignee

Comment 84

•

8 years ago

Looks like I'm not quite done here: TEST-UNEXPECTED-FAIL | mailnews/base/test/unit/test_bug366491.js | xpcshell return code: 1 [log…] PROCESS-CRASH | mailnews/base/test/unit/test_bug366491.js | application crashed [@ mozalloc_abort(char const*)] [log…] TEST-UNEXPECTED-FAIL | mailnews/db/gloda/test/unit/test_mime_attachments_size.js | xpcshell return code: 1 [log…] PROCESS-CRASH | mailnews/db/gloda/test/unit/test_mime_attachments_size.js | application crashed [@ mozalloc_abort(char const*)] [log…] TEST-UNEXPECTED-FAIL | mailnews/extensions/bayesian-spam-filter/test/unit/test_bug228675.js | xpcshell return code: 1 [log…] PROCESS-CRASH | mailnews/extensions/bayesian-spam-filter/test/unit/test_bug228675.js | application crashed [@ mozalloc_abort(char const*)] [log…] TEST-UNEXPECTED-FAIL | mailnews/extensions/bayesian-spam-filter/test/unit/test_customTokenization.js | xpcshell return code: 1 [log…] PROCESS-CRASH | mailnews/extensions/bayesian-spam-filter/test/unit/test_customTokenization.js | application crashed [@ mozalloc_abort(char const*)] [log…] TEST-UNEXPECTED-FAIL | mailnews/extensions/bayesian-spam-filter/test/unit/test_junkAsTraits.js | xpcshell return code: 1 [log…] PROCESS-CRASH | mailnews/extensions/bayesian-spam-filter/test/unit/test_junkAsTraits.js | application crashed [@ mozalloc_abort(char const*)] [log…] TEST-UNEXPECTED-FAIL | mailnews/extensions/bayesian-spam-filter/test/unit/test_traits.js | xpcshell return code: 1 [log…] PROCESS-CRASH | mailnews/extensions/bayesian-spam-filter/test/unit/test_traits.js | application crashed [@ mozalloc_abort(char const*)] [log…] TEST-UNEXPECTED-FAIL | mailnews/imap/test/unit/test_partsOnDemand.js | xpcshell return code: 1 [log…] PROCESS-CRASH | mailnews/imap/test/unit/test_partsOnDemand.js | application crashed [@ mozalloc_abort(char const*)] [log…] TEST-UNEXPECTED-FAIL | mailnews/imap/test/unit/test_partsOnDemand.js | xpcshell return code: 1 [log…] PROCESS-CRASH | mailnews/imap/test/unit/test_partsOnDemand.js | application crashed [@ mozalloc_abort(char const*)] [log…] TEST-UNEXPECTED-FAIL | mailnews/mime/test/unit/test_attachment_size.js | xpcshell return code: 1 [log…] PROCESS-CRASH | mailnews/mime/test/unit/test_attachment_size.js | application crashed [@ mozalloc_abort(char const*)] [log…] https://archive.mozilla.org/pub/thunderbird/tinderbox-builds/comm-central-macosx64-debug/1497556368/comm-central_yosemite_r7-debug_test-xpcshell-bm106-tests1-macosx-build30.txt.gz says: 14:50:45 INFO - PID 7304 | [7304] ###!!! ASSERTION: nsTDependentString must wrap only null-terminated strings. You are probably looking for nsTDependentSubstring.: 'mData[mLength] == 0', file /builds/slave/tb-c-cen-m64-d-000000000000000/build/objdir-tb/dist/include/nsTString.h, line 460 14:50:45 INFO - PID 7304 | Hit MOZ_CRASH() at /builds/slave/tb-c-cen-m64-d-000000000000000/build/mozilla/memory/mozalloc/mozalloc_abort.cpp:33 14:50:45 INFO - <<<<<<< 14:50:45 INFO - mozcrash Copy/paste: /builds/slave/test/build/macosx64-minidump_stackwalk /var/folders/93/f4rdtxvx5xg5dzcxcm29xd4c00000w/T/xpc-other-Zr_IEs/D63F1B09-4E1B-49F1-85C2-C7C4EB663C4B.dmp /builds/slave/test/build/symbols 14:50:59 INFO - mozcrash Saved minidump as /builds/slave/test/build/blobber_upload_dir/D63F1B09-4E1B-49F1-85C2-C7C4EB663C4B.dmp 14:50:59 INFO - mozcrash Saved app info as /builds/slave/test/build/blobber_upload_dir/D63F1B09-4E1B-49F1-85C2-C7C4EB663C4B.extra 14:50:59 WARNING - PROCESS-CRASH | mailnews/extensions/bayesian-spam-filter/test/unit/test_bug228675.js | application crashed [@ mozalloc_abort(char const*)] 14:50:59 INFO - Crash dump filename: /var/folders/93/f4rdtxvx5xg5dzcxcm29xd4c00000w/T/xpc-other-Zr_IEs/D63F1B09-4E1B-49F1-85C2-C7C4EB663C4B.dmp 14:50:59 INFO - Operating system: Mac OS X 14:50:59 INFO - 10.10.5 14F27 14:50:59 INFO - CPU: amd64 14:50:59 INFO - family 6 model 69 stepping 1 14:50:59 INFO - 4 CPUs 14:50:59 INFO - GPU: UNKNOWN 14:50:59 INFO - Crash reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS 14:50:59 INFO - Crash address: 0x0 14:50:59 INFO - Process uptime: 0 seconds 14:50:59 INFO - Thread 0 (crashed) 14:50:59 INFO - 0 libmozglue.dylib!mozalloc_abort(char const*) [mozalloc_abort.cpp:75be6742abb9 : 33 + 0x0] 14:50:59 INFO - rax = 0x0000000000000000 rdx = 0x00007fff759f11f8 14:50:59 INFO - rcx = 0x0000000000000000 rbx = 0x00007fff759f1c50 14:50:59 INFO - rsi = 0x0000350000003500 rdi = 0x0000340000003503 14:50:59 INFO - rbp = 0x00007fff57592690 rsp = 0x00007fff57592680 14:50:59 INFO - r8 = 0x00007fff57592630 r9 = 0x00007fff76274300 14:50:59 INFO - r10 = 0x0000000000000000 r11 = 0x0000000000000246 14:50:59 INFO - r12 = 0x0000000000000001 r13 = 0x000000010e1e2fc8 14:50:59 INFO - r14 = 0x00007fff57592700 r15 = 0x00007fff57595b51 14:50:59 INFO - rip = 0x000000010fecba99 14:50:59 INFO - Found by: given as instruction pointer in context 14:50:59 INFO - 1 XUL!Abort [nsDebugImpl.cpp:75be6742abb9 : 451 + 0x5] 14:50:59 INFO - rbx = 0x00007fff759f1c50 rbp = 0x00007fff575926a0 14:50:59 INFO - rsp = 0x00007fff575926a0 r12 = 0x0000000000000001 14:50:59 INFO - r13 = 0x000000010e1e2fc8 r14 = 0x00007fff57592700 14:50:59 INFO - r15 = 0x00007fff57595b51 rip = 0x0000000108e4e769 14:50:59 INFO - Found by: call frame info 14:50:59 INFO - 2 XUL!NS_DebugBreak [nsDebugImpl.cpp:75be6742abb9 : 438 + 0x8] 14:50:59 INFO - rbx = 0x00007fff759f1c50 rbp = 0x00007fff57592b30 14:50:59 INFO - rsp = 0x00007fff575926b0 r12 = 0x0000000000000001 14:50:59 INFO - r13 = 0x000000010e1e2fc8 r14 = 0x00007fff57592700 14:50:59 INFO - r15 = 0x00007fff57595b51 rip = 0x0000000108e4e540 14:50:59 INFO - Found by: call frame info 14:50:59 INFO - 3 XUL!ConvertToUTF8(char const*, int, char const*, nsACString&) [nsTString.h:75be6742abb9 : 458 + 0x25] 14:50:59 INFO - rbx = 0x000000010fb19010 rbp = 0x00007fff57592bf0 14:50:59 INFO - rsp = 0x00007fff57592b40 r12 = 0x000000000000003a 14:50:59 INFO - r13 = 0x000000011a031000 r14 = 0x00007fff57592c00 14:50:59 INFO - r15 = 0x00007fff57592b50 rip = 0x0000000108d8f7e5 So something wrong in the new ConvertToUTF8().

https://hg.mozilla.org/comm-central/rev/108327ec86f2e7f4a75701a0a5bbe7fbb3d0bb74

Assignee

Comment 85

•

8 years ago

Attached patch 1363281-follow-up8.patch - fix debug MOZ_ASSERTs. — Details — Splinter Review

Updated

•

8 years ago

Attachment #8875618 - Flags: review?(Pidgeot18)

:aceman

Comment 86

•

8 years ago

Comment on attachment 8877952 [details] [diff] [review] 1363281-follow-up6.patch - fix test-charset-edit.js Review of attachment 8877952 [details] [diff] [review]: ----------------------------------------------------------------- Looks good to me, but what was the reason to also drop the japanese text ? What is the encoding of the 'nonASCII' string? ::: mail/test/mozmill/composition/test-charset-edit.js @@ +140,5 @@ > let msg = select_click_row(0); > assert_selected_and_displayed(mc, msg); > + assert_equals(getMsgHeaders(msg).get("").charset, "UTF-16LE"); > + // We need to check for includes() here since the message > + // also contains CRLF, which is interepreted as invalid UTF-16. Typo.

Attachment #8877952 - Flags: review?(acelists) → review+

Assignee

Comment 87

•

8 years ago

Sorry about the typo. The Japanese text was fine when I had UTF-7 as decodable but invalid charset since UTF-7 is pure ASCII. Now I was looking for a non-ASCII string with an obvious UFT-16 encoding and found the € sign. The encoding is UTF-16LE, so the 0x20 and 0xAC get swapped around. UFT-16 is quite complicated, so this was my best bet: https://en.wikipedia.org/wiki/UTF-16#Examples

Alice0775 White

Updated

•

7 years ago

Depends on: 1511950