Closed
Bug 24807
Opened 26 years ago
Closed 26 years ago
Incorrect conversion of a Zenkaku long dash char in Japanese
Categories
(MailNews Core :: Internationalization, defect, P3)
Tracking
(Not tracked)
VERIFIED
FIXED
M14
People
(Reporter: momoi, Assigned: ftang)
Details
(Whiteboard: [PDT+] reviewed fix in tree, wait tree open to check in.)
Attachments
(3 files)
** Observed with 1/21/00 Win32 build **
Japanese has 2 types of Zenkaku long dash characters.
One is used when elongating a Kana vowel, the other is used
when a preceding entry is Zenkaku Roman. This latter is more
like a Zenkaku dash or minus sign. Japanese IME normally
takes care of this switch automatically, figuring out which one
to use from context of input when the dash character key is
pressed.
The problem occurs as we try to send out the Roman dash or
minus sign. It arrives as some other characters -- actually looks
like 2 unrelated characters instead of one dash/minus character.
Clearly we have a conversion problem somewhere -- possibly in
the conversion table itself.
The Zenkaku dash/minus is: 0x815C (Shift_JIS) \u2015
The Zenkaku vowel elongator is: 0x815B (Shift_JIS) \u30FC
What is interesting is that when you send a Japanese msg containing
these characters, the msg arrives with the following changes:
1. Zenkaku dash/minus is turned into some other characters (2 chars)
2. Zenkaku vowel elongator seems to be turned into Zenkaku dash/minus.
I have 2 images illustrating this problem.
Image 1 -- When composing the word UTF-8 with 2 different dashes.
Image 2 -- when the msg arrives in the mailbox and looks converted.
| Reporter | ||
Comment 1•26 years ago
|
||
| Reporter | ||
Comment 2•26 years ago
|
||
Comment 3•26 years ago
|
||
Can this be seen also with HTML editor?
| Reporter | ||
Comment 4•26 years ago
|
||
Yes, also with HTML editor. The result looks a little different from the Plain Text editor case
but is also converted wrongly.
Comment 5•26 years ago
|
||
I was able to send those two characters (\u2015 and \u30FC).
Do I need to combine those with other characters in order to reproduce?
| Reporter | ||
Comment 6•26 years ago
|
||
I don't think so, but you might want to begin with Zenkaku Roman when you input. This normally
induces the input of the problem character. Under Kana mode, press Shift key and then start inputting
UTF-8. The dash in that character will cause the problem.
Comment 8•26 years ago
|
||
When converting the Unicode string to ISO-2022-JP, I get NS_ERROR_UENC_NOMAPPING
for the character \uFF0D. The character is FULLWIDTH HYPHEN-MINUS and should be
converted to JIS code 0x213D. The code falls back to an NCR in the case
of text/html, as below.
ESC$B#U#T#F-$B#8ESC(B (ESC is 0x1B)
The expected conversion is from unicode 0xFF0D to JIS 0x213D.
Reassign to cata.
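
The fallback described in this comment can be sketched in Python. This is a hypothetical, hand-rolled encoder, not the Mozilla converter: the function name `encode_iso2022jp` and the tiny mapping table are assumptions made for illustration, and the table covers only the characters in the test string. It shows how a missing table entry for \uFF0D produces an NCR in the middle of the JIS run, and how adding the entry (here with the target code named in this comment, 0x213D) keeps the character inside the escape sequence.

```python
ESC_JISX0208 = b"\x1b$B"  # switch to JIS X 0208
ESC_ASCII = b"\x1b(B"     # switch back to ASCII


def encode_iso2022jp(text, table):
    """Encode text using a {char: JIS code} table; unmapped non-ASCII becomes an NCR."""
    out = bytearray()
    in_jis = False
    for ch in text:
        jis = table.get(ch)
        if jis is not None:
            if not in_jis:
                out += ESC_JISX0208
                in_jis = True
            out += bytes([jis >> 8, jis & 0xFF])
            continue
        if in_jis:
            out += ESC_ASCII
            in_jis = False
        if ord(ch) < 0x80:
            out.append(ord(ch))
        else:
            out += b"&#%d;" % ord(ch)  # numeric character reference fallback
    if in_jis:
        out += ESC_ASCII
    return bytes(out)


# Hypothetical table: fullwidth U, T, F and 8 only.
table = {"\uff35": 0x2355, "\uff34": 0x2354, "\uff26": 0x2346, "\uff18": 0x2338}
text = "\uff35\uff34\uff26\uff0d\uff18"  # fullwidth "UTF-8"

print(encode_iso2022jp(text, table))                          # NCR fallback
print(encode_iso2022jp(text, {**table, "\uff0d": 0x213D}))    # entry present
```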
Assignee: nhotta → cata
Status: ASSIGNED → NEW
Commonly used character in Japanese.
Cata, Is this just a 1-line fix in a table?
Keywords: beta1
| Assignee | ||
Comment 11•26 years ago
|
||
reassign to ftang
| Assignee | ||
Updated•26 years ago
|
Status: NEW → ASSIGNED
| Assignee | ||
Comment 13•26 years ago
|
||
fix and check in.
Status: ASSIGNED → RESOLVED
Closed: 26 years ago
Resolution: --- → FIXED
| Reporter | ||
Comment 14•26 years ago
|
||
** Checked with 2/7/2000 Win32 build **
I don't think this is the correct mapping. Instead of a corrupt
character, now I see a dash which is twice as long as it should be.
In this particular context, the dash should be a short one. Notepad and 4.72,
for example, display the short dash for this particular input.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
| Reporter | ||
Comment 15•26 years ago
|
||
Here's a bit more info on this:
The current fix maps the dash which gets inserted after Zenkaku alphanumeric characters
"automatically" by Japanese IME to
JIS: 0x213D = Shift_JIS: 0x815C = Unicode: \u2015
NotePad, 4.72 and other applications map this type of dash to:
JIS: 0x215D = Shift_JIS: 0x817C = Unicode: \uFF0D
I think we should be doing the latter.
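
The JIS and Shift_JIS values quoted above are two byte representations of the same JIS X 0208 cell, related by fixed arithmetic. A minimal Python sketch of that conversion, to make the pairs easy to check (the function name `jis_to_sjis` is illustrative and does not come from the Mozilla source):

```python
def jis_to_sjis(jis):
    """Convert a 2-byte JIS X 0208 code (e.g. 0x213D) to its Shift_JIS code."""
    j1, j2 = jis >> 8, jis & 0xFF
    # Two JIS rows share one Shift_JIS lead byte.
    s1 = ((j1 + 1) >> 1) + 0x70
    if s1 > 0x9F:
        s1 += 0x40  # skip the 0xA0-0xDF half-width katakana range
    if j1 & 1:
        # Odd row: trail bytes start at 0x40 and skip 0x7F.
        s2 = j2 + 0x1F
        if s2 >= 0x7F:
            s2 += 1
    else:
        # Even row: trail bytes start at 0x9F.
        s2 = j2 + 0x7E
    return (s1 << 8) | s2


for jis in (0x213C, 0x213D, 0x215D):
    print(f"JIS 0x{jis:04X} -> Shift_JIS 0x{jis_to_sjis(jis):04X}")
```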
| Assignee | ||
Comment 16•26 years ago
|
||
I am confused. I thought nhotta said we should map to JIS 0x213D.
Status: REOPENED → ASSIGNED
| Reporter | ||
Comment 17•26 years ago
|
||
Frank, I think I made a mistake originally on 1/22/2000 when I said:
"The Zenkaku dash/minus is: 0x815C (Shift_JIS) \u2015" (=JIS 0x213D)
I've misled Naoki about this. When I saw the real result, I realized that what I wanted is
JIS 0x215D. JIS 0x213D/\u2015 is called "Horizontal Bar", while JIS 0x215D/\uFF0D is called
"Fullwidth Hyphen-Minus" (full-width ASCII variant). This latter is what 4.7 uses.
I'm sorry about this confusion. I'll send you and Naoki an example of this JIS 0x215D
using 4.7x. Please take a look. Again I apologize for my misleading statement.
| Reporter | ||
Comment 18•26 years ago
|
||
I sent you two the example of "Fullwidth Hyphen-Minus"
which gets generated by JIME when the Zenkaku Alphanumeric
mode is chosen under Japanese Windows and when you use the
following key sequence under 4.7x.
"Shift+U" + "Shift+T" + "Shift+F" + "- (without Shift)" + "8"
Comment 19•26 years ago
|
||
So the remaining issue is to map \uFF0D to JIS 0x215D instead of 0x213D?
| Reporter | ||
Comment 20•26 years ago
|
||
How about Unicode: \u2015? Is that currently mapped to
JIS: 0x213D = Shift_JIS: 0x815C ?
| Assignee | ||
Comment 21•26 years ago
|
||
I am totally confused.
what should the following map to?
U+FF0D
U+2015
Here is the mapping from the unicode ftp site.
ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/JIS/JIS0208.TXT
0x815C 0x213D 0x2015 # HORIZONTAL BAR
ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/JIS/SHIFTJIS.TXT
0x815C 0x2015 # HORIZONTAL BAR
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
0x815C 0x2015 #HORIZONTAL BAR
0x817C 0xFF0D #FULLWIDTH HYPHEN-MINUS
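
For anyone who wants to check the CP932 rows above locally, here is a small Python sketch. It relies on Python's built-in `cp932` codec, which is assumed here to follow the Microsoft table cited above; the helper name `cp932_code` is made up for this example.

```python
import unicodedata


def cp932_code(codepoint):
    """Return the CP932 byte sequence for a Unicode code point as an integer."""
    return int.from_bytes(chr(codepoint).encode("cp932"), "big")


for cp in (0x2015, 0xFF0D, 0x30FC):
    name = unicodedata.name(chr(cp))
    print(f"U+{cp:04X} {name} -> 0x{cp932_code(cp):04X}")
```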
| Reporter | ||
Comment 22•26 years ago
|
||
Right. This is why on Windows, it looks funny.
Frank, can we look at what we did in 4.x and
do the same -- whatever that was.
| Assignee | ||
Updated•26 years ago
|
Whiteboard: [PDT+] → [PDT+] reviewed fix in tree, wait tree open to check in.
| Assignee | ||
Comment 23•26 years ago
|
||
Fixed and checked in. We now map to JIS 0x215D instead.
Status: ASSIGNED → RESOLVED
Closed: 26 years ago → 26 years ago
Resolution: --- → FIXED
| Reporter | ||
Comment 24•26 years ago
|
||
** Checked with 2/11/2000 Win32 build **
It seems that there might be a side-effect of this fix.
It could be a font issue, but this is a strange problem.
(CC'ing erik just in case...)
I don't see the character which corresponds to JIS 0x215D (\uFF0D)
at all when I try to input it in the following text fields:
1. Mail "To:", Mail Subject, Nav Location field, Nav form field, etc.
Instead I get a space inserted. When I receive mail with this
input, I see the space also.
But I do see the intended character when I do the same key strokes
in Editor/Mail Composer Text edit field.
See the image below for illustration of this problem.
By the way, this problem does not exist with the 2/10/2000 Win32 build.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
| Reporter | ||
Comment 25•26 years ago
|
||
| Assignee | ||
Comment 26•26 years ago
|
||
momoi-san, please do not recycle bug. Please close this bug if the conversion
problem is fixed. Please open new bug for different issue. Mark it fixed.
Status: REOPENED → RESOLVED
Closed: 26 years ago → 26 years ago
Resolution: --- → FIXED
| Reporter | ||
Comment 27•25 years ago
|
||
** Checked with 2/23/2000 Win32 build **
In mail, the Full-width Hyphen-minus is sent correctly
with the above build. This is the desired result.
Marking it verified/fixed.
Status: RESOLVED → VERIFIED
Updated•21 years ago
|
Product: MailNews → Core
Updated•17 years ago
|
Product: Core → MailNews Core