Closed Bug 108136 Opened 19 years ago Closed 18 years ago

Shift_JIS conversion problem on MacOS9, OS/2

Categories

(Core :: Internationalization, defect, P2)

Tracking

()

VERIFIED FIXED
mozilla1.2beta

People

(Reporter: shom, Assigned: smontagu)

References

Details

(Keywords: intl)

Attachments

(7 files, 2 obsolete files)

The internal mapping table for Japanese is now based entirely on CP932 (bug 54135).
MacOS9 and OS/2 use different mapping tables, so some characters are converted
incorrectly when Mozilla passes its internal UCS2 codes to OS-native functions
that handle UCS2.

PROBLEM:

testpage: http://rh.vinelinux.org/~shom/sjisprob.html

a problem on MacOS9
 http://bugzilla.mozilla.gr.jp/showattachment.cgi?attach_id=364

a problem on OS/2
 http://bugzilla.mozilla.gr.jp/showattachment.cgi?attach_id=367

RELATED BUGS : bug 35166, bug 58637, bug 33162, bug 65991

SOLUTIONs:

i) Convert the internal UCS2 codes to the OS-native compatible codes whenever
we call an OS function that handles UCS2. SO HARD?

ii) Implement a dual mapping method in the conversion tables. VERY HARD, I think.

iii) Make separate tables for the Shift_JIS variants. Currently the Japanese:UCS2
conversion table is generated from CP932.txt with mkjpconv.pl (bug 54135). Since
this tool can generate other mapping tables (e.g. from APPLE_JAPANESE.txt), it is
easy to make Shift_JIS(MacOS9) and Shift_JIS(OS2), or Shift_JIS(IBM943). This
solution has another advantage: it can handle platform-dependent characters
without Unicode sequences (surrogate pairs?).
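For illustration, solution (i) could look roughly like the sketch below. It assumes the seven CP932/Apple code-point pairs given in the patch description later in this bug. RemapCharForMac and RemapStringForMac are hypothetical names, not existing Mozilla functions, and an OS/2 version would need its own pairs.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: remap a CP932-style UCS2 code unit to the code point
// that the Apple table assigns to the same Shift_JIS character.
inline char16_t RemapCharForMac(char16_t c) {
  switch (c) {
    case 0x2015: return 0x2014;  // SJIS 0x815C
    case 0xFF5E: return 0x301C;  // SJIS 0x8160
    case 0x2225: return 0x2016;  // SJIS 0x8161
    case 0xFF0D: return 0x2212;  // SJIS 0x817C
    case 0xFFE0: return 0x00A2;  // SJIS 0x8191
    case 0xFFE1: return 0x00A3;  // SJIS 0x8192
    case 0xFFE2: return 0x00AC;  // SJIS 0x81CA
    default:     return c;       // everything else passes through unchanged
  }
}

// Hypothetical helper: remap a whole UCS2 string before an OS call.
inline std::u16string RemapStringForMac(const std::u16string& in) {
  std::u16string out;
  out.reserve(in.size());
  for (char16_t c : in) out.push_back(RemapCharForMac(c));
  assert(out.size() == in.size());  // one code unit in, one code unit out
  return out;
}
```

Every call site that hands UCS2 to the OS would need this step, which is what makes (i) laborious.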
teruko: can you confirm.
->nhotta
Assignee: yokoyama → nhotta
*** This bug has been confirmed by popular vote. ***
Status: UNCONFIRMED → NEW
Ever confirmed: true
Reassign to ftang.

Assignee: nhotta → ftang
Status: NEW → ASSIGNED
what will happen if we don't fix this?
Priority: -- → P4
Mozilla cannot handle many vendor-specific Shift_JIS kanji chars (I know NC4 can).
# CP932 contains MS-specific kanji chars, so Windows can handle them :b

Also, legal chars in JIS X 0208 have conversion problems.

[reported in bugzilla-jp <http://bugzilla.mozilla.gr.jp/show_bug.cgi?id=868>]

testpage : http://rh.vinelinux.org/~shom/sjisprob2.html

* OS/2 

Four SJIS chars (0x815c,0x8160,0x8161,0x817c) have problems.

screen shot 
  http://bugzilla.mozilla.gr.jp/showattachment.cgi?attach_id=538
screen shot after re-entering the '?' chars and submitting
  http://bugzilla.mozilla.gr.jp/showattachment.cgi?attach_id=539

 - display problem
   (0x815c,0x8160,0x8161,0x817c) are displayed as '?'
   in the page body, bookmark titles, tabs and JavaScript alerts.
   In the titlebar they are displayed as ' '.

 - query send problem
   When one of (0x815c,0x8160,0x8161,0x817c) is entered in an INPUT type=text /
   TEXTAREA, the chars following it are truncated.
   (http://bugzilla.mozilla.gr.jp/showattachment.cgi?attach_id=539)

 - compose problem
   (0x815c,0x8160,0x8161,0x817c) become &#8212; &#12316; &#8214; &#8722;
   in the saved page.

 - mail/news send problem
   (0x815c,0x8160,0x8161,0x817c) are treated as illegal, so the message cannot
   be sent. If the alert is ignored, 0x815c becomes '--' and the others '?'.


* Mac OS 9 (and probably Mac OS X)

 (0x815c,0x8160,0x8161,0x817c,0x8191,0x8192,0x81ca) have problems.

 - query send problem
   When one of (0x815c,0x8160,0x8161,0x817c,0x8191,0x8192,0x81ca) is entered
   in an INPUT type=text / TEXTAREA, the chars following it are truncated.

 - mail/news send problem
   (0x815c,0x8160,0x8161,0x817c,0x8191,0x8192,0x81ca) are treated as illegal,
   so the message cannot be sent.
   If the alert is ignored, 0x815c becomes '--' and the others '?'.

 - bookmark problem
   Bookmark titles containing (0x815c,0x8160,0x8161,0x817c,0x8191,0x8192,0x81ca)
   are displayed as blank in the OS menubar.
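
A note on the OS/2 compose problem above: the four references are the decimal forms of U+2014, U+301C, U+2016 and U+2212. In the patch description in this bug these are the IBM943 code points for the four characters, not the CP932 ones. A small sketch to check the arithmetic, where ToDecimalNCR is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Format a Unicode code point as a decimal numeric character reference,
// the form that shows up in the saved page.
inline std::string ToDecimalNCR(std::uint32_t codePoint) {
  assert(codePoint <= 0x10FFFF);  // anything larger is not a Unicode code point
  return "&#" + std::to_string(codePoint) + ";";
}
```

For example, ToDecimalNCR(0x301C) gives the second reference in the list, &#12316;.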

Blocks: 157673
One of the top problems the Mozilla Japanese group reports. Not sure how to solve
it yet. May need to break it down into different tasks.
Keywords: intl, nsbeta1+
Priority: P4 → P2
Target Milestone: --- → mozilla1.2beta
Kohei Ichioka has made a patch for this bug.

http://www5a.biglobe.ne.jp/~expf/ucvja.tar.gz

This file contains readme.txt which explains how to apply the patch.

And chado has made a Mac build based on this patch.
ftp://download.sourceforge.jp/wazilla/996/Wazilla-mac-1.1-2156c.sea.bin
Adding mkaply to Cc.

Tarball in comment 8 contains the patch for OS/2, but Kohei Ichioka
hasn't tested it. He doesn't have OS/2. Can you review the patch and
test it?
Severity: normal → critical
In the original report,
>MacOS9 and OS/2 have another mapping table
Does the problem exist on MacOSX, or is this specific to MacOS9?
Can anyone attach a patch using cvs diff -u to this bug?
Attached file: gzipped patch (obsolete)
* change Japanese to Unicode conversion rule
 pref("intl.jis0208.map", "Apple") uses the MacJapanese conversion rule.
 pref("intl.jis0208.map", "IBM943") uses the IBM943 conversion rule.

* dual mapping for Unicode to Japanese conversion rule
 CP932 ,Apple ,IBM943    SJIS  (JIS)
 U+2015,U+2014,U+2014 -> 0x815C(01-29)
 U+FF5E,U+301C,U+301C -> 0x8160(01-33)
 U+2225,U+2016,U+2016 -> 0x8161(01-34)
 U+FF0D,U+2212,U+2212 -> 0x817C(01-61)
 U+FFE0,U+00A2,U+FFE0 -> 0x8191(01-81)
 U+FFE1,U+00A3,U+FFE1 -> 0x8192(01-82)
 U+FFE2,U+00AC,U+FFE2 -> 0x81CA(02-44)
 U+FFE4,U+FFE4,U+00A6 -> 0xEEFA(92-92)
 U+FFE4,U+FFE4,U+00A6 -> 0xFA55
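
A minimal sketch of what dual mapping means for the first seven rows of the table: both the CP932 code point and the Apple/IBM943 code point encode to the same Shift_JIS code. DualMapEntry, kDualMap and UnicodeToSjis are hypothetical names and do not reflect how the patch stores its tables. The two U+FFE4/U+00A6 rows are left out because the table does not say which of 0xEEFA and 0xFA55 the encoder picks.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

struct DualMapEntry {
  char16_t unicode;
  std::uint16_t sjis;
};

// Each Shift_JIS code is listed twice: once for the CP932 code point and
// once for the Apple/IBM943 code point taken from the table above.
constexpr DualMapEntry kDualMap[] = {
  {0x2015, 0x815C}, {0x2014, 0x815C},
  {0xFF5E, 0x8160}, {0x301C, 0x8160},
  {0x2225, 0x8161}, {0x2016, 0x8161},
  {0xFF0D, 0x817C}, {0x2212, 0x817C},
  {0xFFE0, 0x8191}, {0x00A2, 0x8191},
  {0xFFE1, 0x8192}, {0x00A3, 0x8192},
  {0xFFE2, 0x81CA}, {0x00AC, 0x81CA},
};

// Linear lookup; returns std::nullopt for characters outside this table.
inline std::optional<std::uint16_t> UnicodeToSjis(char16_t c) {
  for (const DualMapEntry& e : kDualMap) {
    if (e.unicode == c) return e.sjis;
  }
  return std::nullopt;
}

// Usage example: both variants of the wave dash encode identically.
inline void DualMapExample() {
  assert(UnicodeToSjis(0xFF5E) == UnicodeToSjis(0x301C));
}
```

The sketch needs C++17 for std::optional.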

mozilla/intl/uconv/tools/jamap.pl creates maps.
mozilla/intl/uconv/ucvja/japanese.map is the map for Japanese to Unicode.
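
The (JIS) column in the table above is the row-cell position of each Shift_JIS code. The sketch below shows the usual arithmetic for deriving it. SjisToKuten is a hypothetical name, and the function simply reports nothing for lead bytes above 0xEF, which is where 0xFA55 lives and why that row has no (JIS) value.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Convert a double-byte Shift_JIS code to its (row, cell) position in the
// 94x94 grid. Returns std::nullopt for codes outside that grid.
inline std::optional<std::pair<int, int>> SjisToKuten(std::uint16_t sjis) {
  const int lead = sjis >> 8;
  const int trail = sjis & 0xFF;

  const bool leadOk =
      (lead >= 0x81 && lead <= 0x9F) || (lead >= 0xE0 && lead <= 0xEF);
  const bool trailOk = trail >= 0x40 && trail <= 0xFC && trail != 0x7F;
  if (!leadOk || !trailOk) return std::nullopt;

  // Each lead byte covers two rows; lead bytes 0xA0-0xDF are skipped.
  const int pairIndex = (lead <= 0x9F) ? lead - 0x81 : lead - 0xC1;
  int row = pairIndex * 2 + 1;
  int cell = 0;
  if (trail >= 0x9F) {         // second (even) row of the pair
    row += 1;
    cell = trail - 0x9F + 1;
  } else if (trail >= 0x80) {  // first row, after the 0x7F gap
    cell = trail - 0x80 + 64;
  } else {                     // first row, before the 0x7F gap
    cell = trail - 0x40 + 1;
  }
  assert(row >= 1 && row <= 94 && cell >= 1 && cell <= 94);
  return std::make_pair(row, cell);
}
```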
Matsumoto san,
Does the problem exist on MacOSX, or is this specific to MacOS9?
I don't know.
But I think MacOSX uses the same conversion rule as MacOS9 for backward
compatibility.
Could you give us a patch instead of an application/x-gzip?
Attached patch: patch #1/3
Attached patch: patch #2/3
Attached patch: patch #3/3
Attached patch: patch #1/4
Attached patch: patch #2/4
Attached patch: patch #3/4
Attached patch: patch #4/4 (obsolete)
The patches id=98510-98512 are incomplete.
id=102147-102150 are the actual patches.
Attachment #102147 - Flags: review+
Attachment #102148 - Flags: review+
Attachment #102149 - Flags: review+
Attachment #102150 - Flags: review+
Attachment #98078 - Attachment is obsolete: true
Changed QA contact to ylong@netscape.com.
QA Contact: teruko → ylong
Comment on attachment 102147
patch #1/4

sr=alecf
Comment on attachment 102148
patch #2/4

sr=alecf
Attachment #102148 - Flags: superreview+
Comment on attachment 102149
patch #3/4

sr=alecf
Attachment #102149 - Flags: superreview+
Comment on attachment 102150
patch #4/4

what does this notation mean?
+ const PRUint16 (*mMapIndex)[128];

this seems a little confusing, how about

const PRUint16* mMapIndex[128]?

Though actually are you storing a pointer to a 128 bit array? I think this is a
misuse of this type and what you might really want is PRUint16**  mMapIndex?

Also, storing the per-platform value in prefs seems unnecessary... I mean, the
value is never going to change, right? Why not just #ifdef the code?

Prefs should only be used when the value is going to be changed... the
per-platform pref stuff is when you want the DEFAULT value of the pref to vary
based on the platform, but you still expect the user to change it later.
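
The #ifdef approach suggested here might look like the sketch below. It assumes XP_MAC and XP_OS2 are the build's platform macros for Mac OS 9 and OS/2. Jis0208Map, kPlatformMap and MapName are hypothetical names, and "CP932" is used only as a label for the default table.

```cpp
#include <cassert>
#include <string>

enum class Jis0208Map { CP932, Apple, IBM943 };

// The map is fixed at compile time; no pref is read at run time.
// Assumption: XP_MAC / XP_OS2 are defined by the build on those platforms.
#if defined(XP_MAC)
constexpr Jis0208Map kPlatformMap = Jis0208Map::Apple;
#elif defined(XP_OS2)
constexpr Jis0208Map kPlatformMap = Jis0208Map::IBM943;
#else
constexpr Jis0208Map kPlatformMap = Jis0208Map::CP932;
#endif

// Human-readable name of a map, e.g. for logging.
inline std::string MapName(Jis0208Map map) {
  switch (map) {
    case Jis0208Map::CP932:  return "CP932";
    case Jis0208Map::Apple:  return "Apple";
    case Jis0208Map::IBM943: return "IBM943";
  }
  assert(false && "unhandled Jis0208Map value");
  return std::string();
}
```

The trade-off is the one described above: a compile-time choice cannot be changed by the user later, which is only acceptable if the value really never needs to change.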
Attachment #102150 - Attachment is obsolete: true
mMapIndex is actually a pointer to an array of 128 PRUint16 values.
It points to the first item of gIndex, gCP932Index, or gIBM943Index.

const PRUint16 gIndex[2][128];
const PRUint16 gCP932Index[2][128];
const PRUint16 gIBM943Index[2][128];
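
To make the difference between the two declarations concrete, here is a small sketch. PRUint16 is aliased to std::uint16_t as a stand-in for the NSPR typedef, the table contents are placeholders, and Converter, RowPointer, PointerArray and Lookup are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

using PRUint16 = std::uint16_t;  // stand-in for the NSPR typedef

// Placeholder contents; the real tables come from the patch.
const PRUint16 gIndex[2][128] = {{1}, {2}};
const PRUint16 gCP932Index[2][128] = {{3}, {4}};

// Pointer to an array of 128 PRUint16 values (one row of a table).
using RowPointer = const PRUint16 (*)[128];
// Array of 128 pointers to PRUint16: a different and much larger type.
using PointerArray = const PRUint16* [128];

static_assert(!std::is_same<RowPointer, PointerArray>::value,
              "the two declarations name different types");
static_assert(sizeof(PointerArray) == 128 * sizeof(const PRUint16*),
              "the second form stores 128 pointers");

struct Converter {
  RowPointer mMapIndex;  // same type as: const PRUint16 (*mMapIndex)[128];
};

// A [2][128] array decays to a pointer to its first row, so two-level
// indexing works directly through mMapIndex.
inline PRUint16 Lookup(const Converter& conv, int row, int col) {
  assert(row >= 0 && row < 2 && col >= 0 && col < 128);
  return conv.mMapIndex[row][col];
}
```

With this type, switching tables is a single assignment such as conv.mMapIndex = gCP932Index, with no extra per-row pointer variables.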

If I use PRUint16** mMapIndex, I must use extra variables.
const PRUint16 *const gIndex[2] = { gIndex1, gIndex2 };
const PRUint16 gInde