in the TXT->HTML libmime converters, in the parse_line function, we do nsString lineSourceStr; lineSourceStr.AssignWithConversion(line, length); but line is UTF8, not? Shouldn't that be nsString lineSourceStr(NS_ConvertUTF8ToUCS2(line, length)); or similar?
mailnews/mime/src/mimetp(la|fl).cpp is the source...
That is right. Does TXT->HTML currently handle characters above 255? I think there is other bug filed for this code for a performance issue (because of the nsString allocation). In future, I would like to change to pass PRUnichar* to the function and delay the UTF-8 conversion after the TXT->HTML conversion.
> Does TXT->HTML currently handle characters above 255? Yes, mozTXTToHTMLConv works with nsString / PRUnichar internally and is written with unicode in mind. (In fact, it will probably break, if you do not use UCS2.) > In future, I would like to change to pass PRUnichar* to the function and delay > the UTF-8 conversion after the TXT->HTML conversion. ? In parse_line, we get char* from libmime and that's what we have to output, too.
>In parse_line, we get char* from libmime and that's what we have to output, too. What I meant was change that char* to PRUnichar* and convert to UTF-8 after the TXT->HTML conversion. Anyway, 331 nsString lineSourceStr; 332 lineSourceStr.AssignWithConversion(line, length); corresponds to 483 nsCAutoString lineResultCStr; 484 CopyUCS2toASCII(lineResultUnichar, lineResultCStr); so both need the change. Ignoring about the UTF-8 conversion, line 331 can change to use nsAutoString for performance. My testing result for MimeInlineTextPlain_parse_line() using Quantify shows that, nsString::nsString takes 9.06% of the function while nsCAutoString::nsCAutoString 0.88%.
Created attachment 70910 [details] [diff] [review] Proposed Fix, Version 1 Done. Also fixed one unnecessary string copy. nhotta, please review and assign to me (if you don't mind). I tried some I18N testcases I got from Rich and I see no difference. All works fine before (! - no idea why) and after the change. What do we do with the following in the SaveAs codepath? Are the comments still true? newcstr = ToNewCString(lineSource); // lineSource uses nsString but the string is NOT unicode [...} lineResult.AssignWithConversion(cstr.get()); // create nsString which contains NON unicode // as the following code expecting it
>I tried some I18N testcases I got from Rich and I see no difference. All works >fine before (! - no idea why) and after the change. I think this is because the parser does not parse non ASCII characters. I am going to check performance with the patch because that is important here.
The result of using two mails (bugzilla mail and the same body re-sent as flowed). columns label: num of calls, function time(microseconds), F+D time, F time(%), D time(%), Avg F time, Min F time, Max F time 1) without the patch MimeInlineTextPlain_parse_line 13 5.06 5559.86 0.00 0.00 0.39 0.35 0.58 MimeInlineTextPlainFlowed_parse_line 20 9.77 15263.19 0.00 0.00 0.49 0.48 0.60 2) with the patch MimeInlineTextPlain_parse_line 13 4.85 5333.32 0.00 0.00 0.37 0.33 0.56 MimeInlineTextPlainFlowed_parse_line 20 9.29 7681.36 0.00 0.00 0.46 0.45 0.57 With the patch, F+D time (function and its descendants time) decreased. Cc to bienvenu since he was looking at the performance issue before.
>What do we do with the following in the SaveAs codepath? >Are the comments still true? No, that part has to change because the input string in case of the SaveAs case is not in UTF-8 but encoded in mail charset. So the patch would break non ASCII charactgers for SaveAs. We can remove that special code by doing either, 1) convert the input string from mail charset to UTF-8 in advance (by the caller) for SaveAs 2) skip ScanTXT for SaveAs Ben, could you check who is calling the parse functions for SaveAs?
Comment on attachment 70910 [details] [diff] [review] Proposed Fix, Version 1 in mimeptfl.cpp, I don't think so it's a good idea to use an nsAutoString as it's very likely that at least one of the line of the message exceed 64 characters. It's better to keep the actual code which preset the buffer size to 100 characters
That's right. The i18n special case inflate/deflate for itself in mimeptfl.cpp, so not affected by the patch.