Closed Bug 59679 Opened 24 years ago Closed 23 years ago

Mac: A Japanese character in blockquote is ill-converted if Saved as ISO-2022-JP

Categories

(Core :: Internationalization, defect, P2)

PowerPC
Mac System 9.x
defect

Tracking

VERIFIED FIXED
mozilla0.9

People

(Reporter: tarahim, Assigned: nhottanscp)

Details

(Keywords: intl)

Attachments

(2 files)

In HTML composer, the Japanese character "〜" (0x2141 in JIS, U+301C in Unicode)
inside a blockquote is converted to three garbage characters when the file is
saved as ISO-2022-JP.
2000110808 MTrunk.
I can reproduce this. Outside a blockquote it is fine, so it does not seem to be
a converter problem.
It looks like the character was converted to an NCR (numeric character reference).

<blockquote>$B$"$$$($*(B<br> $B$F$9$H&#12316;%F%9%H(B<br> </blockquote>
Status: NEW → ASSIGNED
On Mac, I get this problem whether it is in a blockquote
or in normal HTML text.
I think the problem has to do with our decision to map
Shift_JIS 0x8160 to U+FF5E (following the MS conversion map), whereas Mac
maps it to U+301C. We discussed this problem elsewhere
and decided to go with FF5E, knowing that there would be an incompatibility
problem on Mac.
&#12316; is 301C in hex.
Did we fix our Unicode <-> ISO-2022-JP table to be consistent
with our Unicode <-> Shift_JIS table?
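
To make the code points in this comment concrete, here is a small sketch (not Mozilla code). It applies the usual JIS X 0208 to Shift_JIS byte arithmetic and records the two competing Unicode values for this one character. The names `JisToSjis`, `kWaveDash`, and `kFullwidthTilde` are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// The two Unicode candidates for this one JIS character, as named in the thread.
constexpr std::uint16_t kWaveDash = 0x301C;        // Mac mapping
constexpr std::uint16_t kFullwidthTilde = 0xFF5E;  // MS conversion map
static_assert(kWaveDash != kFullwidthTilde, "the two tables disagree");

// The NCR seen in the saved file is the decimal form of U+301C.
static_assert(12316 == kWaveDash, "&#12316; is U+301C");

// Convert a two-byte JIS X 0208 code (each byte 0x21..0x7E) to Shift_JIS.
// Hypothetical helper for illustration; not taken from the Mozilla tree.
std::uint16_t JisToSjis(std::uint16_t jis) {
  const int j1 = jis >> 8;
  const int j2 = jis & 0xFF;
  assert(j1 >= 0x21 && j1 <= 0x7E && j2 >= 0x21 && j2 <= 0x7E);

  // Lead byte: two JIS rows share one Shift_JIS lead byte.
  const int s1 = ((j1 + 1) >> 1) + (j1 < 0x5F ? 0x70 : 0xB0);

  // Trail byte: odd rows start at 0x40 and skip 0x7F, even rows start at 0x9F.
  int s2;
  if (j1 & 1) {
    s2 = j2 + 0x1F;
    if (s2 >= 0x7F) ++s2;
  } else {
    s2 = j2 + 0x7E;
  }
  return static_cast<std::uint16_t>((s1 << 8) | s2);
}
```

With this, `JisToSjis(0x2141)` gives `0x8160`, so the JIS code from the report and the Shift_JIS code in this comment name the same character. Only the Unicode side of the table differs.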
I tried again and I can reproduce it without a blockquote, so it happens
generally on Macintosh. I get the charset warning on Mac when sending a mail with
that character.
I was not involved with the old issue; probably Frank changed something. Cc'ing
cata.
If we can find the bug number of the old problem, I might be able to find
his check-in for it.

http://bugzilla.mozilla.org/show_bug.cgi?id=35166

This is the bug where we changed mapping.
Is this working fine on Unix?
I don't see the problem on a linux build. (110807 Ja build)
This does not happen on Win95J. 
Keywords: intl
Now I can reassign to Frank.
Assignee: nhotta → ftang
Status: ASSIGNED → NEW
Summary: A Japanese character in blockquote is ill-converted if Saved as ISO-2022-JP → Mac: A Japanese character in blockquote is ill-converted if Saved as ISO-2022-JP
This looks similar to bug 63841
><blockquote>$B$"$$$($*(B<br> $B$F$9$H&#12316;%F%9%H(B<br> </blockquote>
Notice that there is no esc + "(B" before &#12316; and no esc + "$B" after it.
Reassign this back to nhotta to debug.
It looks like a dup of 63841.
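
To illustrate the missing escapes, here is a minimal sketch of a scanner that reports whether a given byte would be read in two-byte mode. It tracks only the two escape sequences mentioned above, and `InTwoByteModeAt` is a hypothetical helper, not part of any Mozilla API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if the byte at `pos` would be read in two-byte (JIS X 0208)
// mode by an ISO-2022-JP decoder. Only ESC $ B (enter JIS X 0208) and
// ESC ( B (back to ASCII) are tracked. Illustration only.
bool InTwoByteModeAt(const std::string& bytes, std::size_t pos) {
  assert(pos <= bytes.size());
  bool twoByte = false;
  std::size_t i = 0;
  while (i < pos) {
    if (bytes.compare(i, 3, "\x1b$B") == 0) {
      twoByte = true;
      i += 3;
    } else if (bytes.compare(i, 3, "\x1b(B") == 0) {
      twoByte = false;
      i += 3;
    } else {
      i += twoByte ? 2 : 1;  // JIS X 0208 characters take two bytes each
    }
  }
  return twoByte;
}
```

For a stream shaped like the one quoted here, the `&` of `&#12316;` falls in two-byte mode, so a decoder pairs up the entity's ASCII bytes as JIS characters instead of reading them as markup.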
Assignee: ftang → nhotta
Keywords: nsbeta1
Priority: P3 → P2
Target Milestone: --- → mozilla0.8
Mark it as P2 nsbeta1.
There could be a problem in the encoder client (e.g. not calling Finish() in the
unmapped-character error case), which would cause the incorrect escape sequences.
But the mapping problem is a separate issue and should be corrected by the
converter.
I will take a look at the client side first.
I can reproduce this on Windows 2000 when I input \u301C
using the "Character Map" utility.
Status: NEW → ASSIGNED
Somehow nsISaveAsCharset is not used any more; charset conversion is now done in
layout (nsDocumentEncoder.cpp, rev=1.35).
I think it does not call Finish() in case of a conversion error.
Adding jst to cc. I have another bug, 65324, caused by that change.
Reassign to jst: nsIUnicodeEncoder::Finish() has to be called in case of
NS_ERROR_UENC_NOMAPPING.

nsDocumentEncoder.cpp

    // rev 1.35 (jst), starting at line 431
    if (convert_rv == NS_ERROR_UENC_NOMAPPING) {
      nsCAutoString entString("&#");
      entString.AppendInt(unicodeBuf[unicodeLength - 1]);
      entString.Append(';');

      rv = aStream->Write(entString.GetBuffer(), entString.Length(),
                          &written);
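
A rough sketch of the proposed fix, under stated assumptions: `ToyEncoder` is a made-up stand-in for a stateful ISO-2022-JP encoder, not the real nsIUnicodeEncoder, and its three-entry mapping table exists only for the demo. The point is the order of operations on an unmapped character: flush the escape back to ASCII first, then write the entity.

```cpp
#include <string>

// Toy stand-in for a stateful ISO-2022-JP encoder. NOT the real
// nsIUnicodeEncoder; names and the tiny mapping table are assumptions.
class ToyEncoder {
 public:
  // Encodes one UTF-16 code unit, appending bytes to `out`.
  // Returns false when the character has no mapping; state is left as-is.
  bool Convert(char16_t ch, std::string& out) {
    const char* jis = Lookup(ch);
    if (!jis) return false;
    if (!twoByte_) {
      out += "\x1b$B";  // switch to JIS X 0208
      twoByte_ = true;
    }
    out += jis;
    return true;
  }

  // Writes the escape back to ASCII if one is pending, and resets the state.
  void Finish(std::string& out) {
    if (twoByte_) {
      out += "\x1b(B";
      twoByte_ = false;
    }
  }

 private:
  static const char* Lookup(char16_t ch) {
    switch (ch) {
      case 0x30C6: return "%F";  // katakana TE
      case 0x30B9: return "%9";  // katakana SU
      case 0x30C8: return "%H";  // katakana TO
      default: return nullptr;   // everything else is unmapped in this toy
    }
  }
  bool twoByte_ = false;
};

// Encodes `text`, replacing unmapped characters with decimal NCRs.
// `callFinish` toggles the fix: Finish() before writing the entity.
std::string Encode(const std::u16string& text, bool callFinish) {
  ToyEncoder enc;
  std::string out;
  for (char16_t ch : text) {
    if (!enc.Convert(ch, out)) {
      if (callFinish) enc.Finish(out);  // return to ASCII before the entity
      out += "&#" + std::to_string(static_cast<unsigned>(ch)) + ";";
    }
  }
  enc.Finish(out);
  return out;
}
```

With `callFinish` set to false the entity lands inside the two-byte run, which matches the shape of the broken output in this bug. With it set to true the entity is bracketed by esc + "(B" and esc + "$B".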
Assignee: nhotta → jst
Status: ASSIGNED → NEW
Reassigning to anthonyd.
Assignee: jst → anthonyd
Filed JIS encoder problem for \u301C as bug 65991.
moving this to moz0.9
Target Milestone: mozilla0.8 → mozilla0.9
As near as I can tell (though I have no way to reproduce this), there is some
character that isn't being converted correctly on the Mac. The
solution, as stated in the bug, is to call:
nsIUnicodeEncoder::Finish(...)

BUT this method has never been implemented at all. Can someone please explain
to me what is supposed to be done here, and why implementing and then calling
this method will fix this bug?

anthonyd
Please look at the header file for info on nsIUnicodeEncoder::Finish.
http://lxr.mozilla.org/seamonkey/source/intl/uconv/public/nsIUnicodeEncoder.h#128

See below for the implementation.
http://lxr.mozilla.org/seamonkey/source/intl/uconv/ucvja/nsUCvJaSupport.cpp#543

Reassign to nhotta.
Assignee: anthonyd → nhotta
jst, could you do a review for the patch?
Status: NEW → ASSIGNED
The one thing that concerns me about the patch is:

  char finish_buf[32];

I don't see any code that guarantees that we won't write past the bounds of this
buffer. Is Finish guaranteed to never write more than 31 characters into
the output buffer?

If that's guaranteed to be ok, then r=jst.
I forgot to set a length before calling Finish(), I will attach a patch.
If we call Finish() to get the converter to write out an escape sequence for the
character, do we still need to write out a numerical character entity for it as
well? We're also presuming that the result of the call to Finish() relates to
just the character that triggered the error. Is this a safe assumption? Could
the call to Finish() actually require more space than the stack-based buffer and
return an NS_OK_UENC_MOREOUTPUT code?
The entity needs to be written out after the escape sequence since the character
was not mapped to the target charset.
NS_ERROR_UENC_NOMAPPING is returned when a character could not be mapped from
Unicode. In that case, we always need to call Finish(). Even when it is not
needed, calling Finish() just causes extra escape sequences to be written out
and does not corrupt the output.
As for the buffer size, we could loop and increase the buffer, but the output is
small (3 bytes for ISO-2022-JP), so in practice it is not a problem.
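
For reference, the "loop and increase the buffer" option could look roughly like the sketch below. It is not the checked-in patch. `Status`, `FinishFn`, and `FlushAll` are hypothetical names, and the sketch assumes a Finish-like call that takes the buffer capacity in `*len`, reports the bytes written through it, and can be called again after signalling that more output is pending.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical status codes standing in for NS_OK / NS_OK_UENC_MOREOUTPUT.
enum class Status { Ok, MoreOutput };

// A Finish-like callback: on entry *len is the buffer capacity, on exit the
// number of bytes written. Returns MoreOutput if output is still pending.
using FinishFn = std::function<Status(char* buf, std::int32_t* len)>;

// Calls `finish` repeatedly, growing the buffer whenever a call makes no
// progress, until everything has been flushed.
std::string FlushAll(const FinishFn& finish, std::int32_t initialSize = 32) {
  assert(initialSize > 0);
  std::string out;
  std::vector<char> buf(static_cast<std::size_t>(initialSize));
  for (;;) {
    std::int32_t len = static_cast<std::int32_t>(buf.size());
    const Status s = finish(buf.data(), &len);
    out.append(buf.data(), static_cast<std::size_t>(len));
    if (s == Status::Ok) return out;
    if (len == 0) buf.resize(buf.size() * 2);  // no progress: grow and retry
  }
}
```

Setting the length before each call is the part that matters for the fixed-size stack buffer too. The growth step only comes into play if a converter ever needs more than the initial capacity.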
Vidur, did my comment make sense, do you have other comments?
Blocks: 63841
Keywords: review
It was reviewed and checked in as a part of bug 65324.
JIS encoder problem for \u301C is bug 65991.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Verified as fixed in 3-01 build.
Status: RESOLVED → VERIFIED