Closed Bug 24807 Opened 26 years ago Closed 26 years ago

Incorrect conversion of a Zenkaku long dash char in Japanese

Categories

(MailNews Core :: Internationalization, defect, P3)

x86
Windows 98

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: momoi, Assigned: ftang)

Details

(Whiteboard: [PDT+] reviewed fix in tree, wait tree open to check in.)

Attachments

(3 files)

** Observed with 1/21/00 Win32 build **
Japanese has 2 types of Zenkaku long dash characters. One is used when elongating a Kana vowel; the other is used when the preceding entry is Zenkaku Roman. The latter is more like a Zenkaku dash or minus sign. The Japanese IME normally takes care of this switch automatically, figuring out which one to use from the input context when the dash character key is pressed.
The problem occurs when we try to send out the Roman dash or minus sign. It arrives as some other characters -- it actually looks like 2 unrelated characters instead of one dash/minus character. Clearly we have a conversion problem somewhere -- possibly in the conversion table itself.
The Zenkaku dash/minus is: 0x815C (Shift_JIS) \u2015
The Zenkaku vowel elongator is: 0x815B (Shift_JIS) \u30FC
What is interesting is that when you send a Japanese msg containing these characters, the msg arrives with the following changes:
1. Zenkaku dash/minus is turned into some other characters (2 chars)
2. Zenkaku vowel elongator seems to be turned into Zenkaku dash/minus.
I have 2 images illustrating this problem.
Image 1 -- When composing the word UTF-8 with 2 different dashes.
Image 2 -- When the msg arrives in the mailbox and looks converted.
Can this be seen also with HTML editor?
Yes, also with HTML editor. The result looks a little different from the Plain Text editor case, but it is also converted incorrectly.
I was able to send those two characters (\u2015 and \u30FC). Do I need to combine those with other characters in order to reproduce?
I don't think so, but you might want to begin with Zenkaku Roman when you input. This normally induces the input of the problem character. Under Kana mode, press the Shift key and then start inputting UTF-8. The dash in that word will cause the problem.
Accepting, I can reproduce.
Status: NEW → ASSIGNED
Target Milestone: M14
When converting the Unicode string to ISO-2022-JP, I get NS_ERROR_UENC_NOMAPPING for the character \uFF0D. The character is FULLWIDTH HYPHEN-MINUS and should be converted to JIS code 0x213D. In the code, it falls back to an NCR in the text/html case, as below.
ESC$B#U#T#F-$B#8ESC(B (ESC is 0x1B)
The expected conversion is from Unicode 0xFF0D to JIS 0x213D. Reassign to cata.
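The fallback described in the comment above can be sketched roughly as follows. This is a minimal illustration, not the actual Mozilla converter: the function name `encode_iso2022jp` and the four-entry `TABLE` are hypothetical, and the JIS code 0x215D used for \uFF0D in the second case is the value this bug eventually settles on in a later comment.

```python
ESC_JIS = b"\x1b$B"    # ESC $ B: switch to JIS X 0208
ESC_ASCII = b"\x1b(B"  # ESC ( B: switch back to ASCII

# Tiny illustrative Unicode -> JIS X 0208 table (hypothetical, not the real one).
TABLE = {
    "\uFF35": 0x2355,  # FULLWIDTH LATIN CAPITAL LETTER U
    "\uFF34": 0x2354,  # FULLWIDTH LATIN CAPITAL LETTER T
    "\uFF26": 0x2346,  # FULLWIDTH LATIN CAPITAL LETTER F
    "\uFF18": 0x2338,  # FULLWIDTH DIGIT EIGHT
}


def encode_iso2022jp(text, table):
    """Encode text, writing a decimal NCR for characters with no mapping."""
    out = bytearray()
    in_jis = False
    for ch in text:
        code = table.get(ch)
        if code is not None:
            if not in_jis:
                out += ESC_JIS
                in_jis = True
            out += bytes([code >> 8, code & 0xFF])
            continue
        # ASCII and unmapped characters are both written in ASCII mode.
        if in_jis:
            out += ESC_ASCII
            in_jis = False
        if ord(ch) < 0x80:
            out += ch.encode("ascii")
        else:
            out += ("&#%d;" % ord(ch)).encode("ascii")
    if in_jis:
        out += ESC_ASCII
    return bytes(out)


word = "\uFF35\uFF34\uFF26\uFF0D\uFF18"  # full-width "UTF-8"
broken = encode_iso2022jp(word, TABLE)

fixed_table = dict(TABLE)
fixed_table["\uFF0D"] = 0x215D  # FULLWIDTH HYPHEN-MINUS
fixed = encode_iso2022jp(word, fixed_table)
```

With the entry missing, the dash leaves JIS mode and is written as `&#65293;`, which is what a plain-text reader sees as unrelated characters. With the entry present, the whole word stays inside one ESC $ B ... ESC ( B run.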
Assignee: nhotta → cata
Status: ASSIGNED → NEW
This is a commonly used character in Japanese. Cata, is this just a 1-line fix in a table?
Keywords: beta1
Putting on PDT+ radar for beta1.
Whiteboard: [PDT+]
reassign to ftang
Assignee: cata → ftang
Status: NEW → ASSIGNED
Fixed and checked in.
Status: ASSIGNED → RESOLVED
Closed: 26 years ago
Resolution: --- → FIXED
** Checked with 2/7/2000 Win32 build ** I don't think this is the correct mapping. Instead of a corrupt character, I now see a dash which is twice as long as it should be. In this particular context, the dash should be a short one. Notepad and 4.72, for example, display the short dash for this particular input.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Here's a bit more info on this: The current fix maps the dash which gets inserted after Zenkaku alphanumeric characters "automatically" by the Japanese IME to
JIS: 0x213D = Shift_JIS: 0x815C = Unicode: \u2015
Notepad, 4.72 and other applications map this type of dash to
JIS: 0x215D = Shift_JIS: 0x817C = Unicode: \uFF0D
I think we should be doing the latter.
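The JIS and Shift_JIS values quoted above are related by the standard byte arithmetic between JIS X 0208 and Shift_JIS, so they can be cross-checked without any conversion table. A small sketch (the helper name `jis_to_sjis` is made up for illustration; the Unicode side is not covered, since that does need a table):

```python
def jis_to_sjis(jis):
    """Convert a two-byte JIS X 0208 code (e.g. 0x213D) to Shift_JIS."""
    j1, j2 = jis >> 8, jis & 0xFF

    # Two JIS rows share one Shift_JIS lead byte.
    s1 = ((j1 + 1) >> 1) + 0x70
    if s1 > 0x9F:
        s1 += 0x40  # skip the 0xA0-0xDF half-width Katakana range

    if j1 & 1:
        # Odd row: first half of the trail-byte range, skipping 0x7F.
        s2 = j2 + 0x1F
        if s2 >= 0x7F:
            s2 += 1
    else:
        # Even row: second half of the trail-byte range.
        s2 = j2 + 0x7E

    return (s1 << 8) | s2


for jis in (0x213C, 0x213D, 0x215D):
    print("JIS 0x%04X = Shift_JIS 0x%04X" % (jis, jis_to_sjis(jis)))
```

This gives 0x815B for the vowel elongator, 0x815C for the horizontal bar and 0x817C for the full-width hyphen-minus, matching the values in the comments.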
I am confused. I thought nhotta said we should map to JIS 0x213D.
Status: REOPENED → ASSIGNED
Frank, I think I made a mistake originally on 1/22/2000 when I said: "The Zenkaku dash/minus is: 0x815C (Shift_JIS) \u2015" (=JIS 0x213D) I've misled Naoki about this. When I saw the real result, I realized that what I wanted is JIS 0x215D. JIS 0x213D/\u2015 is called "Horizontal Bar", while JIS 0x215D/\uFF0D is called "Fullwidth Hyphen-Minus" (full-width ASCII variant). This latter is what 4.7 uses. I'm sorry about this confusion. I'll send you and Naoki an example of this JIS 0x215D using 4.7x. Please take a look. Again I apologize for my misleading statement.
I sent you two the example of "Fullwidth Hyphen-Minus", which gets generated by the Japanese IME when the Zenkaku Alphanumeric mode is chosen under Japanese Windows and when you use the following key sequence under 4.7x. "Shift+U" + "Shift+T" + "Shift+F" + "- (without Shift)" + "8"
So the remaining issue is to map \uFF0D to JIS 0x215D instead of 0x213D?
How about Unicode: \u2015? Is that currently mapped to JIS: 0x213D = Shift_JIS: 0x815C ?
I am totally confused. What should the following map to?
U+FF0D
U+2015
Here is the mapping from the Unicode ftp site.
ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/JIS/JIS0208.TXT
0x815C 0x213D 0x2015 # HORIZONTAL BAR
ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/JIS/SHIFTJIS.TXT
0x815C 0x2015 # HORIZONTAL BAR
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
0x815C 0x2015 #HORIZONTAL BAR
0x817C 0xFF0D #FULLWIDTH HYPHEN-MINUS
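To make the comparison in the comment above concrete, here is a rough sketch that parses only the mapping lines quoted there and reports which of them mention a given code point. The names `MAPPING_LINES`, `unicode_to_sjis` and `tables_covering` are invented for this example. Because only the quoted lines are loaded, "not listed" here means "not among the quoted lines", and says nothing about the rest of each file.

```python
# Only the lines quoted in the comment above, keyed by file name.
MAPPING_LINES = {
    "JIS0208.TXT": ["0x815C 0x213D 0x2015 # HORIZONTAL BAR"],
    "SHIFTJIS.TXT": ["0x815C 0x2015 # HORIZONTAL BAR"],
    "CP932.TXT": [
        "0x815C 0x2015 #HORIZONTAL BAR",
        "0x817C 0xFF0D #FULLWIDTH HYPHEN-MINUS",
    ],
}


def unicode_to_sjis(lines):
    """Build {Unicode code point: Shift_JIS code} from mapping-file lines."""
    table = {}
    for line in lines:
        fields = line.split("#", 1)[0].split()  # drop the trailing comment
        if len(fields) < 2:
            continue
        # The first column is Shift_JIS and the last column is Unicode;
        # JIS0208.TXT has an extra JIS column in between.
        table[int(fields[-1], 16)] = int(fields[0], 16)
    return table


def tables_covering(code_point):
    """Names of the quoted tables that list the given Unicode code point."""
    return sorted(
        name
        for name, lines in MAPPING_LINES.items()
        if code_point in unicode_to_sjis(lines)
    )


for cp in (0x2015, 0xFF0D):
    print("U+%04X listed in: %s" % (cp, ", ".join(tables_covering(cp))))
```

Among the quoted lines, U+2015 appears in all three tables, while U+FF0D appears only in the CP932 one.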
Right. This is why on Windows, it looks funny. Frank, can we look at what we did in 4.x and do the same -- whatever that was.
Whiteboard: [PDT+] → [PDT+] reviewed fix in tree, wait tree open to check in.
Fixed and checked in. Now we map to 0x215D instead.
Status: ASSIGNED → RESOLVED
Closed: 26 years ago
Resolution: --- → FIXED
** Checked with 2/11/2000 Win32 build ** It seems that there might be a side effect of this fix. It could be a font issue, but this is a strange problem. (CC'ing erik just in case...) I don't see the character which corresponds to JIS 0x215D (\uFF0D) at all when I try to input it in the following text fields: Mail "To:", Mail Subject, Nav Location field, Nav form field, etc. Instead I get a space inserted. When I receive mail with this input, I see the space also. But I do see the intended character when I do the same key strokes in the Editor/Mail Composer text edit field. See the image below for an illustration of this problem. By the way, this problem does not exist with the 2/10/2000 Win32 build.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
momoi-san, please do not recycle bugs. Please close this bug if the conversion problem is fixed, and open a new bug for the different issue. Marking it fixed.
Status: REOPENED → RESOLVED
Closed: 26 years ago
Resolution: --- → FIXED
** Checked with 2/23/2000 Win32 build ** In mail, the Fullwidth Hyphen-Minus is sent correctly with the above build. This is the desired result. Marking it verified/fixed.
Status: RESOLVED → VERIFIED
Product: MailNews → Core
Product: Core → MailNews Core