Closed
Bug 50413
Opened 25 years ago
Closed 25 years ago
In saving JPN msgs into HTML format, characters which contain 0x3C are misconverted into <
Categories
(MailNews Core :: Internationalization, defect, P2)
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: momoi, Assigned: nhottanscp)
Details
(Keywords: crash, Whiteboard: [nsbeta3+]fix in hand, reviewed)
Attachments
(3 files)
|
6.37 KB,
patch
|
Details | Diff | Splinter Review | |
|
5.89 KB,
text/plain
|
Details | |
|
789 bytes,
patch
|
Details | Diff | Splinter Review |
** Observed with 8/25/2000 Win32 build **
Take the Intl Smoketest mail folder and select the 2nd message from the top.
Now save this message using HTML format -- File | Save as | File menu item.
Give it a name and supply teh extension. .html and then save it.
Now open this file and view it with Browser under Japanese (ISO-2022-JP) encoding.
The subject and the body contain exactly the same data but the body text is
partially corrupted displays as non-Japanese.
Switch to ISO-8859-1 encoding and view hte page source and compare the Subject
and Body lines that are in ISO-2022-JP.
$B$3$l$OF|K\8l$N%a!<%k$G$9!#(B
$B$3$l$OF|K\8l$N%a!<%k$G$9!#(B --> Note the misconversion of "<" into "<".
Initially sending this to Naoki.
| Reporter | ||
Comment 1•25 years ago
|
||
This is a familiar porblem. It will probably affect other languages
which uses multi-byte encodings (e.g. Chinese) in addition
to Japanese.
| Assignee | ||
Comment 2•25 years ago
|
||
HTML escape problem, it should be escaped when the data is in unicode before
the charset conversion.
Status: NEW → ASSIGNED
| Assignee | ||
Comment 6•25 years ago
|
||
< is generated by ScanTXT() which is used for plain text display and save as
file.
So I think the bug only happens when the original message is plain text and
it's saved as HTML.
Comment 7•25 years ago
|
||
But the ScanTXT function pass parameter by using PRUnichar* and nsString. I
think the problem is the caller convert char* into PRUnichar* .
One of the problem is here. The problem is this routine convert char* into
nsString by using ASCII to PRUnichar* rule.
http://lxr.mozilla.org/seamonkey/source/mailnews/mime/src/mimetpfl.cpp#340
255
256 static int
257 MimeInlineTextPlainFlowed_parse_line (char *line, PRInt32 length, MimeObject
*obj)
258 {
259 int status;
260 PRBool quoting = ( obj->options
261 && ( obj->options->format_out == nsMimeOutput::nsMimeMessageQuoting ||
262 obj->options->format_out == nsMimeOutput::nsMimeMessageBodyQuoting
263 ) ); // see above
264 PRBool plainHTML = quoting || (obj->options &&
265 obj->options->format_out == nsMimeOutput::nsMimeMessageSaveAs);
266 // see above
267
268 struct MimeInlineTextPlainFlowedExData *exdata;
269 exdata = MimeInlineTextPlainFlowedExDataList;
270 while(exdata && (exdata->ownerobj != obj)) {
271 exdata = exdata->next;
272 }
273
274 NS_ASSERTION(exdata, "The extra data has disappeared!");
275
276 NS_ASSERTION(length > 0, "zero length");
277 if (length <= 0) return 0;
278
279 uint32 linequotelevel = 0;
280 char *linep = line;
281 // Space stuffed?
282 if(' ' == *linep) {
283 linep++;
284 } else {
285 // count '>':s before the first non-'>'
286 while('>' == *linep) {
287 linep++;
288 linequotelevel++;
289 }
290 // Space stuffed?
291 if(' ' == *linep) {
292 linep++;
293 }
294 }
295
296 // Look if the last character (after stripping ending end
297 // of lines and quoting stuff) is a SPACE. If it is, we are looking at a
298 // flowed line. Normally we assume that the last two chars
299 // are CR and LF as said in RFC822, but that doesn't seem to
300 // be the case always.
301 PRBool flowed = PR_FALSE;
302 PRInt32 index = length-1;
303 while(index >= 0 && ('\r' == line[index] || '\n' == line[index])) {
304 index--;
305 }
306 if (index > linep - line && ' ' == line[index])
307 /* Ignore space stuffing, i.e. lines with just
308 (quote marks and) a space count as empty */
309 flowed = PR_TRUE;
310
311 mozITXTToHTMLConv *conv = GetTextConverter(obj->options);
312
313 PRBool skipConversion = !conv ||
314 (obj->options &&
obj->options->force_user_charset);
315
316 nsString lineSource;
317 nsString lineResult;
318 lineSource.SetCapacity(kInitialBufferSize);
319 lineResult.SetCapacity(kInitialBufferSize);
320
321 if (!skipConversion)
322 {
323 lineSource.AssignWithConversion(linep, (length - (linep - line)) );
324 PRUnichar* wresult = nsnull;
325 nsresult rv = NS_OK;
326 PRBool whattodo = obj->options->whattodo;
327 if (plainHTML)
328 {
329 if (quoting)
330 whattodo = 0;
331 else
332 whattodo = whattodo & ~mozITXTToHTMLConv::kGlyphSubstitution;
333 /* Do recognition for the case, the result is viewed in
334 Mozilla, but not GlyphSubstitution, because other UAs
335 might not be able to display the glyphs. */
336 }
337
338 /* This is the main TXT to HTML conversion:
339 escaping (very important), eventually recognizing etc. */
340 rv = conv->ScanTXT(lineSource.GetUnicode(), whattodo, &wresult);
341 if (NS_FAILED(rv)) return -1;
342
343 lineResult = wresult;
344 Recycle(wresult);
345 }
346 else
347 {
348 lineResult.AssignWithConversion(line, length);
349 status = NS_OK;
350 }
351
352 nsCAutoString preface;
353
354 /* Correct number of blockquotes */
355 int32 quoteleveldiff=linequotelevel - exdata->quotelevel;
356 if((quoteleveldiff != 0) && flowed && exdata->inflow) {
357 // From RFC 2646 4.5
358 // The receiver SHOULD handle this error by using the 'quote-depth-wins'
rule,
359 // which is to ignore the flowed indicator and treat the line as fixed.
That
360 // is, the change in quote depth ends the paragraph.
361
362 // We get that behaviour by just going on.
363 }
364 while(quoteleveldiff>0) {
365 quoteleveldiff--;
366 preface += "<blockquote type=cite";
367 // This is to have us observe the user pref settings for citations
368 MimeInlineTextPlainFlowed *tObj = (MimeInlineTextPlainFlowed *) obj;
369 char *style = MimeTextBuildPrefixCSS(tObj->mQuotedSizeSetting,
370 tObj->mQuotedStyleSetting,
371 tObj->mCitationColor);
372 if (!plainHTML && style && strlen(style))
373 {
374 preface += " style=\"";
375 preface += style;
376 preface += '"';
377 PR_FREEIF(style);
378 }
379 preface += '>';
380 }
381 while(quoteleveldiff<0) {
382 quoteleveldiff++;
383 preface += "</blockquote>";
384 }
385 exdata->quotelevel = linequotelevel;
386
387 nsString lineResult2;
388 lineResult2.SetCapacity(kInitialBufferSize);
389
390 if(flowed) {
391 // Check RFC 2646 "4.3. Usenet Signature Convention": "-- "+CRLF is
392 // not a flowed line
393 if
394 ( // is "-- "LINEBREAK
395 lineSource.Length() >= 4
396 && lineSource[0] == '-'
397 &&
398 (
399 lineSource.EqualsWithConversion("-- \r", PR_FALSE, 4) ||
400 lineSource.EqualsWithConversion("-- \n", PR_FALSE, 4)
401 )
402 )
403 {
404 if (linequotelevel > 0 || exdata->isSig)
405 {
406 preface += "-- <br>";
407 } else {
408 exdata->isSig = PR_TRUE;
409 preface +=
410 "<div class=\"txt-sig\"><span
class=\"txt-tag\">-- <br></span>";
411 }
412 } else {
413 Line_convert_whitespace(lineResult, PR_FALSE /* Allow wraps */,
414 lineResult2);
415 }
416
417 exdata->inflow=PR_TRUE;
418 } else {
419 // Fixed paragraph.
420 Line_convert_whitespace(lineResult,
421 !plainHTML && !obj->options->wrap_long_lines_p
422 /* If wrap, convert all spaces but the last in
423 a row into nbsp, otherwise all. */,
424 lineResult2);
425 lineResult2 += NS_LITERAL_STRING("<br>");
426 exdata->inflow = PR_FALSE;
427 } // End Fixed line
428
429 if (!(exdata->isSig && quoting))
430 {
431 char* tmp = preface.ToNewCString();
432 status = MimeObject_write(obj, tmp, preface.Length(), PR_TRUE);
433 Recycle(tmp);
434 tmp = lineResult2.ToNewCString();
435 status = MimeObject_write(obj, tmp, lineResult2.Length(), PR_TRUE);
436 Recycle(tmp);
437 return status;
438 }
439 else
440 return NS_OK;
441 }
Looks like the same kind of problem in
http://lxr.mozilla.org/seamonkey/source/mailnews/mime/src/mimetpla.cpp#398
and
http://lxr.mozilla.org/seamonkey/source/mailnews/mime/src/mimetpla.cpp#428
I think the function ScanTXT itself have no problem. The problem is we apply
some expansion code which should apply to the PRUnichar* data after it have
already convert to char*.
Comment 8•25 years ago
|
||
add rhp (and if someone can, please add BenB's correct email address ) to the cc
list since these code (Those
code which convert char* to nsString and then pass to scanTXT() ) are check in
by them.
Comment 9•25 years ago
|
||
Adding Ben.
Comment 10•25 years ago
|
||
> I think the problem is the caller convert char* into PRUnichar* .
> One of the problem is here. The problem is this routine convert char* into
> nsString by using ASCII to PRUnichar* rule.
You mean |lineSource.AssignWithConversion(linep, ...);|? What is wrong with
that? What should I use instead?
Is this a new problem? IIRC, this conversion has always (i.e. since end of last
year) been done this way. Or did I break it with my last checkin?
(I just noticed that we should propably do the quote handling ("...286 while('>'
== *linep)...") with nsString's instead of char*s, too. But that is really an
old problem. Adding Daniel Bratell.)
| Assignee | ||
Comment 11•25 years ago
|
||
I think the usage of nsString is not correct here but that's not the reason
which causes this bug. This part is shared by message display, quoting, save as,
maybe more. So, I wouldn't correct that at this time.
The problem here is that the string is usually converted to UTF-8 but for saveas
case it is in mail charset (e.g. ISO-2022-JP). Since ScanTXT does not expect
ISO-2022-JP as an input, it does not escape correctly.
So I was thinking to do a special case for saveas. It is ugly but minimize
possible side effects. The change would be for save as only convert from mail
charset to unicode before ScanTXT then convert back to mail charset.
Comment 12•25 years ago
|
||
> So I was thinking to do a special case for saveas. It is ugly but minimize
> possible side effects.
It's not that these classes were pretty right now :).
> convert from mail
> charset to unicode before ScanTXT then convert back to mail charset.
I like this solution - this is what I thought it would work.
| Assignee | ||
Comment 13•25 years ago
|
||
Updated•25 years ago
|
Whiteboard: [nsbeta3+] → [nsbeta3+]fix in hand, need review
Comment 14•25 years ago
|
||
nhotta said he have a patch in hand, need code review and test.
He today crash before he can hit that code, with or without his patch.
nhotta- please attach your patch here. Thanks
Comment 15•25 years ago
|
||
I did not review, but looked at the file. Suggestions/Corrections:
In mimetpla.cpp:
- |lineSourceStr| is just the nsString versionof |line|. Use that instead of
|newcstr|.
- In the charset-magic else block, ScanTXT is ran over the whole source line.
This is wrong, you need to skip the first |logicalLineStart| chars.
This is propably just the tip of the iceberg - I (and Daniel, too, as it seems)
wrote this stuff in the assumption that I operate with Unicode only. We should
leave this open or file a new bug about that.
Also, independant from that, I need some education about I18N in general and in
Mozilla, especially with |ns*String|. I'll look at the website, when I have
time, but pointers to references, tutorials etc. appreciated.
| Assignee | ||
Comment 16•25 years ago
|
||
> - |lineSourceStr| is just the nsString versionof |line|. Use that instead of
> |newcstr|.
The problem is that is using nsString but the actual string is UTF-8 because of
the incorrect use of nsString.
> - In the charset-magic else block, ScanTXT is ran over the whole source line.
> This is wrong, you need to skip the first |logicalLineStart| chars.
I ignore that because I thought that's only needed for quoting and the else
block is only for save as. But do I also need that for saving quoted text?
Comment 17•25 years ago
|
||
Hi Naoki,
Looks reasonable to me. r: rhp
- rhp
Comment 18•25 years ago
|
||
> > - |lineSourceStr| is just the nsString versionof |line|. Use that instead of
> > |newcstr|.
> The problem is that is using nsString but the actual string is UTF-8 because
> of the incorrect use of nsString.
Sorry, i was not clear. |line| is a cstr, and |lineSourceStr| holds the nsString
equivalence. No point in converting from nsString back to cstr, use the original
(|line|).
> I ignore that because I thought that's only needed for quoting and the else
> block is only for save as. But do I also need that for saving quoted text?
This is for quotes in the recived msg. E.g. the previous paragraph would count
as quote, and |logicalLineStart| would be 2. The ">"s are added at another place
(to |prefaceResultStr|).
| Assignee | ||
Comment 19•25 years ago
|
||
Comment 20•25 years ago
|
||
You reused |line| in mimemtpfl.cpp, too. However, in *this* class, |lineSource|
!= |line|. In this class, |lineSource| does *not* contain the ">"s anymore,
while |line| does. (That's propably my fault - misleading variable names.) Take
the patch for mimetpfl.cpp from teh first attachment, and the patch for
mimetpla.cpp from the second attachment, and you should be set.
| Assignee | ||
Comment 21•25 years ago
|
||
| Assignee | ||
Updated•25 years ago
|
Whiteboard: [nsbeta3+]fix in hand, need review → [nsbeta3+]fix in hand, reviewed
Comment 22•25 years ago
|
||
also crashing problem. change the priority to P2 because of crash
Keywords: crash
Priority: P4 → P2
Comment 23•25 years ago
|
||
woa :-(
Do you happen to know, where it crashes? (Many this helps preventing similar
crashs.) If you don't, that's OK, too.
| Assignee | ||
Comment 24•25 years ago
|
||
Fix check in (both the crasher and "<" problem).
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Comment 26•25 years ago
|
||
Verified with win32 2000092005 and linux 2000092008 build. It's fixed.
Status: RESOLVED → VERIFIED
Comment 27•24 years ago
|
||
jgmyers wants to remove the saveashtml functinality altogether: bug 68982.
Updated•21 years ago
|
Product: MailNews → Core
Updated•17 years ago
|
Product: Core → MailNews Core
You need to log in
before you can comment on or make changes to this bug.
Description
•