Closed Bug 54135 Opened 24 years ago Closed 23 years ago

conversion (fromU/toU) problem- Sjis code x'81ca' becomes x'fa54'

Categories

(Core :: DOM: Editor, defect, P3)

x86
Windows 2000
defect

Tracking


VERIFIED FIXED
mozilla0.9.6

People

(Reporter: hobbit_mak, Assigned: ftang)

Details

(Keywords: intl)

Attachments

(20 files)

4.69 KB, patch
1.09 KB, text/plain
4.75 KB, patch
820 bytes, patch
16.51 KB, application/octet-stream
35.80 KB, patch
313.27 KB, text/plain
479.19 KB, patch
8.00 KB, text/plain
145.66 KB, application/octet-stream
928.55 KB, patch
35.80 KB, text/plain
5.20 KB, text/plain
483.29 KB, text/html
490.65 KB, text/html
483.38 KB, text/html
483.38 KB, text/html
62.78 KB, patch
164.38 KB, application/octet-stream (ftang: review+)
146.12 KB, application/octet-stream
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; m18) Gecko/20000924
BuildID:    2000092408

If you edit a Shift JIS page and save it, the proper character x'81ca' becomes x'fa54'.

Reproducible: Always
Steps to Reproduce:
1. Edit the page
http://homepage1.nifty.com/hobbit/html/utf8.html
2. Save it to a local file.


Actual Results:  x'81ca' (proper code) changed to x'fa54' (Windows code).

Expected Results:  x'81ca' is retained.

Maybe related to bug 35166.

http://bugzilla.mozilla.org/show_bug.cgi?id=35166
assigning to ftang for initial debug
Assignee: beppe → ftang
Minor issue. Marking it as assigned.
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Target Milestone: --- → Future
x'fa54'(Windows code) cannot be displayed by Mozilla itself. (Build 2000112704) 
It is reported that the Linux build also has this problem.

http://bugzilla.mozilla.gr.jp/show_bug.cgi?id=474
Also, SJIS code 0x81E0 becomes 0x8790,
and SJIS code 0x81E6 becomes 0xFA5B.

The patch above was verified in a Windows 2000 environment.
The attached list is in UTF-8 encoding.
remove Future from the target milestone.
Keywords: intl
Target Milestone: Future → ---
This problem is fixed in Build 2001011720.
Sorry, I had tested with modified modules.
This problem is still reproduced on Build ID 2001012304.
Because bug 44374 (http://bugzilla.mozilla.org/show_bug.cgi?id=44374) was fixed, the following codes also change:

 81BE becomes 879C.
 81BF becomes 879B.
 81DA becomes 8797.
 81DB becomes 8796.
 81DF becomes 8791.
 81E3 becomes 8795.
 81E7 becomes 8792.

Patch is also updated.
Summary: Sjis code x'81ca' becomes x'fa54' → conversion problem- Sjis code x'81ca' becomes x'fa54'
 hobbit.makoto@nifty.ne.jp:
How did you generate these patches? Did you change the source table and use the ufrom
and uto tools to generate them? If so, can you give us the changes to the source
table?
Summary: conversion problem- Sjis code x'81ca' becomes x'fa54' → conversion (fromU/toU) problem- Sjis code x'81ca' becomes x'fa54'
I could not figure out how to use the tool, so I changed both the commented source and the
object data.
Mozilla converts U+FFE2 to 7C7B (ISO-2022-JP).
It must be 224C (ISO-2022-JP).
How can I change the source table and use the ufrom and uto tool to generate it?

I could not find these tools in the source tree.
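The two values above are JIS X 0208 codes as used in ISO-2022-JP. The conventional arithmetic between JIS X 0208 and Shift_JIS shows where each one lands: 0x224C is SJIS 0x81CA, while 0x7C7B is SJIS 0xEEF9, a code in the NEC-selected IBM extension region. A minimal Python sketch of that arithmetic (the function names are hypothetical, not part of the Mozilla converters); EUC-JP is the same JIS code with the high bit of each byte set.

```python
def jis_to_sjis(jis: int) -> int:
    """Convert a JIS X 0208 code (0x2121-0x7E7E) to a Shift_JIS code."""
    j1, j2 = jis >> 8, jis & 0xFF
    # Two JIS rows share one SJIS lead byte; rows above 0x5E continue at 0xE0.
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 <= 0x5E else 0xB0)
    if j1 & 1:
        # Odd row: trail bytes 0x40-0x9E, skipping 0x7F.
        s2 = j2 + 0x1F
        if s2 >= 0x7F:
            s2 += 1
    else:
        # Even row: trail bytes 0x9F-0xFC.
        s2 = j2 + 0x7E
    return (s1 << 8) | s2


def sjis_to_jis(sjis: int) -> int:
    """Inverse of jis_to_sjis for double-byte Shift_JIS codes."""
    s1, s2 = sjis >> 8, sjis & 0xFF
    j1 = (s1 - (0x70 if s1 <= 0x9F else 0xB0)) * 2
    if s2 >= 0x9F:
        j2 = s2 - 0x7E      # even row
    else:
        j1 -= 1             # odd row
        j2 = s2 - (0x20 if s2 >= 0x80 else 0x1F)
    return (j1 << 8) | j2


def jis_to_euc(jis: int) -> int:
    """EUC-JP encodes JIS X 0208 by setting the high bit of both bytes."""
    return jis | 0x8080


for jis in (0x224C, 0x7C7B):
    print(f"JIS {jis:04X} -> SJIS {jis_to_sjis(jis):04X}, EUC-JP {jis_to_euc(jis):04X}")
# JIS 224C -> SJIS 81CA, EUC-JP A2CC
# JIS 7C7B -> SJIS EEF9, EUC-JP FCFB
```

Because the same JIS code maps to all three encodings by fixed arithmetic, one JIS-indexed table can in principle serve SJIS, EUC-JP and ISO-2022-JP.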
The tool is at mozilla/intl/uconv/tools/umaptable.c.

nhotta, can you help drive this? I am overloaded.
Assignee: ftang → nhotta
Status: ASSIGNED → NEW
hobbit.makoto@nifty.ne.jp, 
could you summarize the current remaining problem?
The problems left in build 2001050804 are:

- Ten characters change if you edit a Shift JIS source and save it as Shift JIS.

 0x81be becomes 0x879c
 0x81bf becomes 0x879b
 0x81ca becomes 0xfa54
 0x81da becomes 0x8797
 0x81db becomes 0x8796
 0x81df becomes 0x8791
 0x81e0 becomes 0x8790
 0x81e3 becomes 0x8795
 0x81e6 becomes 0xfa5b
 0x81e7 becomes 0x8792

The ISO-2022-JP problem was fixed.
  I have not been able to download the latest source yet, so I could not use the tool yet.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.1
 hobbit.makoto@nifty.ne.jp:
Please try the attached file and update your patch, thanks.
 I downloaded mozilla/intl/uconv/tools/.
 But I could not find how you made sjis.ut and sjis.uf.

 I went to mozilla/intl/uconv/tools/.
 I ran nmake on make.win and got umaptable.exe.

 Maybe you made sjis.ut and sjis.uf with umaptable and the original conversion table.
 But I could not find where and how to make sjis.ut and sjis.uf.
Let me ask Frank and I will update.
To convert from SJIS into Unicode,
I run /intl/uconv/tools/cp932tojdx.pl against
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
and it generates source/intl/uconv/ucvja/jis0208.ump.
This table is shared by the SJIS, EUC and ISO-2022-JP to Unicode conversions.
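CP932.TXT uses the plain Unicode mapping-file layout: one entry per line, a source code and a Unicode value in hex separated by a tab, followed by a `#` comment. A minimal Python sketch of reading such lines into a Shift_JIS → Unicode dictionary; it is not cp932tojdx.pl, and the function name and sample lines are illustrative assumptions. Lines that carry a code but no Unicode value are skipped.

```python
def parse_mapping(lines):
    """Parse 'source<TAB>unicode<TAB>#comment' lines into a {source: unicode} dict."""
    table = {}
    for line in lines:
        # Drop the trailing comment, then split the data fields on whitespace.
        data = line.split("#", 1)[0].split()
        if len(data) < 2:
            continue  # blank line, comment-only line, or code with no mapping
        src, uni = int(data[0], 16), int(data[1], 16)  # int() accepts the 0x prefix
        table[src] = uni
    return table


# Sample lines in the CP932.TXT layout (illustrative excerpt).
sample = [
    "#   sample header comment",
    "0x8140\t0x3000\t#IDEOGRAPHIC SPACE",
    "0x81CA\t0xFFE2\t#FULLWIDTH NOT SIGN",
    "0x80\t\t#UNDEFINED",
]
sjis_to_ucs = parse_mapping(sample)
print({f"{k:04X}": f"U+{v:04X}" for k, v in sjis_to_ucs.items()})
# {'8140': 'U+3000', '81CA': 'U+FFE2'}
```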

To convert from Unicode into ShiftJIS,
I run intl/uconv/tools/jis0208fromcp932.pl against
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
It generates a file, which I then pipe into umaptable -uf > 0208.uf
to generate jis0208.uf.

 I got cp932.txt from unicode.org and made sjis.uf from it.
 But some characters were mapped to two SJIS positions.
 So I commented out the SJIS locations that are not proper in JIS X 0208 and JIS X 0212.
 I attached the diff list and sjis.uf, and confirmed that this sjis.uf solves the
problems.
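The duplicates mentioned above are the N:1 entries of CP932.TXT: several SJIS codes decode to the same Unicode code point, so a Unicode → SJIS table has to choose one of them. A minimal sketch for listing such groups from a parsed SJIS → Unicode dictionary; the function name is hypothetical and the excerpt is limited to U+FFE2 (NOT SIGN), which CP932.TXT assigns to SJIS 0x81CA, 0xEEF9 and 0xFA54, plus one 1:1 entry.

```python
from collections import defaultdict


def find_duplicates(sjis_to_ucs):
    """Return {unicode: [sjis codes]} for code points reached from two or more SJIS codes."""
    by_ucs = defaultdict(list)
    for sjis, ucs in sjis_to_ucs.items():
        by_ucs[ucs].append(sjis)
    return {ucs: sorted(codes) for ucs, codes in by_ucs.items() if len(codes) > 1}


# Excerpt of CP932.TXT entries (illustrative, not the whole table).
excerpt = {0x8140: 0x3000, 0x81CA: 0xFFE2, 0xEEF9: 0xFFE2, 0xFA54: 0xFFE2}
for ucs, codes in find_duplicates(excerpt).items():
    print(f"U+{ucs:04X}: " + ", ".join(f"{c:04X}" for c in codes))
# U+FFE2: 81CA, EEF9, FA54
```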
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Target Milestone: mozilla0.9.2 → mozilla0.9.1
I attached a diff for sjis.uf; it is very big.
I expected something similar to the patch of 02/14/01 06:13.
hobbit.makoto@nifty.ne.jp, do you have any idea why the diff is so large? What
characters did you actually change? Please list the character codes of the changed
characters.
I suppose that the original table was not derived from CP932.txt.
I would like to know the original table also, but I could not find it.
I am going to ask Frank.

The characters you changed are the same as listed in your comment 2001-05-08 18:07?
No, the characters I changed from cp932.txt are listed in my comment of 05/15/01 07:30.
None of the characters in the 2001-05-08 18:07 comment were changed; they are the same as in
cp932.txt.
It is strongly recommended to record which tool and which table or other resource
each source was created from, preferably in the source file itself.

Maybe this is why this bug has been so difficult to solve.

In

http://bugzilla.mozilla.org/show_bug.cgi?id=35166

you concluded that you use cp932 for Unicode to SJIS conversion.
Bug 67374 - sources and tools to build unicode converters not in tree.
Depends on: 67374
Whiteboard: ftang to provide a source file for the current sjis.uf
TM to 0.9.2 per PDT triage (it's OK to check it in by Friday or after 0.9.1
branch is made).
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Reassign to ftang.
Assignee: nhotta → ftang
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
pdt+ based on the 6/11 PDT meeting.
Whiteboard: ftang to provide a source file for the current sjis.uf → [PDT+]ftang to provide a source file for the current sjis.uf
I don't think we have time to address this problem by moz0.9.2. Push to moz0.9.3 
Target Milestone: mozilla0.9.2 → mozilla0.9.3
remove PDT+
Whiteboard: [PDT+]ftang to provide a source file for the current sjis.uf → ftang to provide a source file for the current sjis.uf
mark as nsbranch
Keywords: nsBranch
Whiteboard: ftang to provide a source file for the current sjis.uf → no progress yet. ftang to provide a source file for the current sjis.uf
I read part of the program for Japanese-Unicode conversion,
but I could not identify the sources and methods used to generate some mapping tables.

So, I made a tool to generate jis0201.uf, jis0208.uf, jis0208.ump, jis0208ext.uf
and sjis.uf from CP932.TXT and SHIFTJIS.TXT.
*.uf are generated with 'umaptable'.

Diffs are so large because the original mapping policy for codes where
SJIS:UCS2 = N:1 is to use the HIGHER SJIS code. That is not a good idea. They should
be mapped to the LOWER SJIS code (without IBM ext codes: bug 82678).
See http://support.microsoft.com/support/kb/articles/Q170/5/59.ASP.
testpage : http://rh.vinelinux.org/~shom/sjis-cp932.html
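The two policies differ only in which SJIS code is kept when several codes decode to the same Unicode code point. A minimal Python sketch, assuming a parsed SJIS → Unicode dictionary (the function name and excerpt are illustrative): with the HIGHER rule U+FFE2 is written back as 0xFA54, which is exactly the x'81ca' → x'fa54' symptom reported in this bug, while the LOWER rule keeps 0x81CA. The sketch does not model the IBM extension exception mentioned for bug 82678.

```python
def build_reverse(sjis_to_ucs, policy="lower"):
    """Build a Unicode -> SJIS table; for N:1 entries keep the lower or the higher SJIS code."""
    pick = min if policy == "lower" else max
    reverse = {}
    for sjis, ucs in sjis_to_ucs.items():
        reverse[ucs] = pick(reverse[ucs], sjis) if ucs in reverse else sjis
    return reverse


# SJIS codes that all decode to U+FFE2 in CP932.TXT.
excerpt = {0x81CA: 0xFFE2, 0xEEF9: 0xFFE2, 0xFA54: 0xFFE2}
print(f"{build_reverse(excerpt, 'higher')[0xFFE2]:04X}")  # FA54: the reported behaviour
print(f"{build_reverse(excerpt, 'lower')[0xFFE2]:04X}")   # 81CA: the proposed policy
```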

----------

In addition, this tool can generate tables from APPLE_JAPANESE.TXT. 
# ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE/JAPANESE.TXT
If it is possible to add "Shift_JIS (Macintosh)", some problems will be resolved:
1) SJIS in/out problems (bookmark import, saving mail drafts, compose, etc.)
     on Mac OS 8 and 9
   SJIS 815C (U+2014) EM DASH
        8160 (U+301C) WAVE DASH
        8161 (U+2016) DOUBLE VERTICAL LINE
        817C (U+2212) MINUS SIGN
        8191 (U+00A2) CENT SIGN (questionable : U+FFE0?)
        8192 (U+00A3) POUND SIGN (questionable : U+FFE1?)
        81CA (U+00AC) NOT SIGN (questionable : U+FFE2?)
2) Apple extended ShiftJIS codes (SJIS 8540-886D,EB41-ED96)
     # Only partly, because Apple defined some codes as Unicode sequences,
     # which Mozilla cannot process.
testpage : http://rh.vinelinux.org/~shom/sjis-mac.html
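Supporting a separate "Shift_JIS (Macintosh)" charset amounts to using a different SJIS → Unicode table for the same byte codes. A minimal sketch that lists the codes whose Unicode value differs between two parsed tables; the function names are hypothetical, the Apple values are the ones listed above, and the CP932 values (U+2015, U+FF5E, U+FFE2) are those in CP932.TXT. Codes present in only one table are reported with "-" on the other side.

```python
def diff_tables(a, b):
    """List (code, a_value, b_value) for codes where two code->Unicode tables disagree."""
    rows = []
    for code in sorted(a.keys() | b.keys()):
        if a.get(code) != b.get(code):
            rows.append((code, a.get(code), b.get(code)))
    return rows


def fmt(ucs):
    return "-" if ucs is None else f"U+{ucs:04X}"


# Small excerpts of the two vendor tables.
cp932 = {0x815C: 0x2015, 0x8160: 0xFF5E, 0x81CA: 0xFFE2}
apple = {0x815C: 0x2014, 0x8160: 0x301C, 0x81CA: 0x00AC}
for code, x, y in diff_tables(cp932, apple):
    print(f"{code:04X}  CP932 {fmt(x)}  Apple {fmt(y)}")
```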
Attached file mkjpconv.pl
usage: mkjpconv.pl SHIFTJIS.TXT CP932.TXT
(or mkjpconv.pl SHIFTJIS.TXT APPLE_JAPANESE.TXT
    APPLE_JAPANESE.TXT is generated (CR->LF) from APPLE/JAPANESE.TXT)

SHIFTJIS.TXT is:
ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/JIS/SHIFTJIS.TXT
Missed 0.9.3.
Target Milestone: mozilla0.9.3 → mozilla0.9.4
Matsumoto san, could you attach the sjis.uf generated by your tool?
I think the current problem is that it is hard to identify modifications.
For example, if we want to change the mapping for Shift_JIS 0x81ca, we want to
identify that change in sjis.uf. Then we can make sure the change won't affect
other characters.
There is a very large amount of diffs, but I can see all the glyphs defined in
SHIFTJIS.TXT on http://rh.vinelinux.org/~shom/sjis-cp932.html.

I think the current mapping table has many (hidden) problems, especially with dual-mapped
codes in CP932.TXT.

Do you have a tool (or method) to generate SJIS->UCS2, UCS2->SJIS, JIS->UCS2,
UCS2->JIS mapping tables ?
I made a tool to check all codes in CP932.TXT.

# to generate Shift JIS encoded HTML page
perl mksjistest.pl CP932.TXT > sjis-cp932.html

# to generate UTF-8 encoded HTML page
perl mksjistest.pl CP932.TXT UTF-8 > sjis-cp932-utf8.html
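A test page of this kind only needs every double-byte code written as raw Shift_JIS bytes next to its hex label, so the glyph can be checked on screen and the saved file compared byte by byte. A minimal Python sketch of the Shift_JIS variant; it is not mksjistest.pl, and the function name and page layout are assumptions.

```python
def make_sjis_test_page(codes):
    """Return a Shift_JIS-encoded HTML page listing each double-byte code and its glyph."""
    out = [b'<html><head><meta http-equiv="Content-Type" '
           b'content="text/html; charset=Shift_JIS"></head><body>\n']
    for code in codes:
        label = f"{code:04X} ".encode("ascii")
        # The glyph itself is just the two raw SJIS bytes.
        out.append(label + code.to_bytes(2, "big") + b"<br>\n")
    out.append(b"</body></html>\n")
    return b"".join(out)


page = make_sjis_test_page([0x81BE, 0x81CA, 0x81E6])
print(len(page), "bytes")
```

To use it, write `page` to a file opened in binary mode ("wb"), so no re-encoding happens on the way out.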
I edited sjis-cp932-utf8.html with 0.9.2 and with 0.9.2 + the generated maps, and used 'Save
As Charset' with Shift_JIS.
(I am using Linux, so please check on Windows.)

Diffs are as follows (SRC = original SJIS, ORG = saved by 0.9.2, NEW = saved with the new maps):

SRC  ORG  NEW
------------------ JIS defined region
81BE 879C 81BE
81BF 879B 81BF
81CA FA54 81CA
81DA 8797 81DA
81DB 8796 81DB
81DF 8791 81DF
81E0 8790 81E0
81E3 8795 81E3
81E6 FA5B 81E6 
81E7 8792 81E7
------------------ NEC specific codes
8754 FA4A 8754 
8755 FA4B 8755
 :     :    :
875D FA53 875D
8782 FA59 8782
8784 FA5A 8784
878A FA58 878A
8790 8790 81E0
8791 8791 81DF
8792 8792 81E7
8795 8795 81E3
8796 8796 81DB
8797 8797 81DA
879A FA5B 81E6
879B 879B 81BF
879C 879C 81BE
----------------- NEC selected IBM ext region
ED40 FA5C ED40
  :    :    :
EEF8 FA49 EEF8
EEF9 FA54 81CA
EEFA FA55 EEFA
EEFB FA56 EEFB
EEFC FA57 EEFC
------------------ IBM ext region
FA40 FA40 EEFA
  :    :    :
FA49 FA49 EEF8
FA4A FA4A 8754
  :    :    :
FA53 FA53 875D
FA54 FA54 81CA
FA55 FA55 EEFA
  :    :    :
FA57 FA57 EEFC
FA58 FA58 878A
FA59 FA59 8782
FA5B FA5B 81E6
FA5C FA5C ED40
  :    :    :
FC4B FC4B EEEC
------------------------- 
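The rows of the table above fall into the regions named in its separator lines. A minimal Python sketch that classifies a double-byte code into those regions, assuming the usual CP932 boundaries (NEC row 13 at 0x8740-0x879C, NEC-selected IBM extensions at 0xED40-0xEEFC, IBM extensions at 0xFA40-0xFC4B); everything else is lumped together as the JIS-defined region, which is a simplification. The function name is hypothetical. In each sample row the NEW code sits in the same or an earlier region than SRC, which is what the LOWER-code rule produces for these duplicate groups.

```python
def sjis_region(code):
    """Classify a double-byte CP932 code into the regions used in the table above."""
    if 0x8740 <= code <= 0x879C:
        return "NEC specific"
    if 0xED40 <= code <= 0xEEFC:
        return "NEC selected IBM ext"
    if 0xFA40 <= code <= 0xFC4B:
        return "IBM ext"
    return "JIS defined (or other)"


# A few (SRC, NEW) pairs copied from the table above.
rows = [(0x81CA, 0x81CA), (0x879A, 0x81E6), (0xEEF9, 0x81CA), (0xFA5C, 0xED40)]
for src, new in rows:
    print(f"{src:04X} [{sjis_region(src)}] -> {new:04X} [{sjis_region(new)}]")
```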
I think the new mapping policy is the same as OE's.
(I heard that OE maps codes in the IBM ext region to the NEC-selected region.)
Attached file mksjistest.pl
Attached file sjis-cp932.html
roy yokoyama, can you help check in the changes?
shoji-san, which diffs should we pick?
Assignee: ftang → yokoyama
Status: ASSIGNED → NEW
accepting for 0.9.4 milestone. 
Status: NEW → ASSIGNED
Please use the *.uf and *.ump files in the next attachment (the old newmap.zip does not include
jisx0208ext.uf, sorry) or create them with mkjpconv.pl (from SHIFTJIS.TXT and
CP932.TXT).
'jisx0201gl.uf' is obsolete (not used by any source).

And if these are acceptable (I'll make testcases), add mkjpconv.pl into
intl/uconv/tools.
# cp932tojdx.pl and jis0208fromcp932.pl will be obsolete.

I don't know where the source of jis0212.{uf,ump} is.
I want to change mkjpconv.pl to make jis0212.{uf, ump}.
m0.9.5
Target Milestone: mozilla0.9.4 → mozilla0.9.5
Blocks: 99171
nsbranch- since Frank moved it to 0.9.5
Keywords: nsbranch → nsbranch-
shoji-san: what is the status of this bug?
Are we waiting for ftang to provide the sjis.uf source as stated in the whiteboard?

Note: I'd appreciate if you can change the status of patches which are already obsolete.
=== cc'ing ftang
Please test new maps on Windows, Mac and OS/2.

testcases.zip has SJIS encoded texts to test.
1. display

 ALL chars in raw.txt must be shown.
 On Windows, ALL chars in rawext.txt, rawibmext.txt must be shown.

2. compose (round trip)

 1) edit raw{,ext,ibmext}.txt.html on composer
 2) save as with ShiftJIS
 3) rawdump.pl <saved html>

 "<ORG>:<NEW>:DIFF" are not round tripped codes.
 New codes must be "SJIS lower" in 
    http://bugzilla.mozilla.org/attachment.cgi?id=44509&action=view

 (see http://support.microsoft.com/support/kb/articles/Q170/5/59.ASP)

3. mail

 1) compose new mail
 2) CUT & PASTE all chars in raw.txt
 3) send
 ALL chars in the mail with raw.txt must be shown.
 on Windows, ALL chars in the mail with raw{ext,ibmext}.txt must be shown.
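The round-trip check in step 2 compares the original and the saved Shift_JIS text character by character. A minimal Python sketch of that comparison; it is not rawdump.pl, the function names are hypothetical, it assumes both files contain the same characters in the same order (as when a page is only re-saved), and it borrows the "<ORG>:<NEW>:DIFF" wording above as its output format. The sample bytes mirror the 0.9.2 behaviour reported in this bug.

```python
def sjis_units(data):
    """Split Shift_JIS bytes into character codes (1 or 2 bytes each)."""
    units, i = [], 0
    while i < len(data):
        b = data[i]
        # Lead bytes of double-byte characters: 0x81-0x9F and 0xE0-0xFC.
        if (0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC) and i + 1 < len(data):
            units.append((b << 8) | data[i + 1])
            i += 2
        else:
            units.append(b)  # ASCII, half-width katakana, or a stray byte
            i += 1
    return units


def round_trip_diffs(original, saved):
    """Return '<ORG>:<NEW>:DIFF' lines for characters that changed on save."""
    org, new = sjis_units(original), sjis_units(saved)
    if len(org) != len(new):
        raise ValueError(f"character counts differ: {len(org)} vs {len(new)}")
    return [f"{a:04X}:{b:04X}:DIFF" for a, b in zip(org, new) if a != b]


print(round_trip_diffs(b"A\x81\xca\x81\xe6", b"A\xfa\x54\xfa\x5b"))
# ['81CA:FA54:DIFF', '81E6:FA5B:DIFF']
```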
------
If any problem occurs on Mac or OS/2, especially with the 9 characters in
http://rh.vinelinux.org/~shom/sjisprob.html, it should not be corrected by
changing the mapping tables.
nhotta is back from sabbatical. Assigning back to him.
Assignee: yokoyama → nhotta
Status: ASSIGNED → NEW
move to 0.9.6
Status: NEW → ASSIGNED
Target Milestone: mozilla0.9.5 → mozilla0.9.6
I think the tool has to be reviewed first.
Frank, please review mkjpconv.pl included in the attachment of 08/08/01 03:17.
Assignee: nhotta → ftang
Status: ASSIGNED → NEW
Whiteboard: no progress yet. ftang to provide a source file for the current sjis.uf
Status: NEW → ASSIGNED
Blocks: 104056
Viewing the following diff with 4.x, we can see that Mozilla is generating codes
which 4.x cannot show, so I added the 4xp keyword.
diff between ...-sjis-0.9.2.html and ..--sjis-new.html
http://bugzilla.mozilla.org/attachment.cgi?id=44534&action=view
Keywords: 4xp
Comment on attachment 45060 [details]
newmap.zip (mkjpconv.pl, jis0208.uf, jis0208ext.uf, jis0201.uf, sjis.uf, IBMNEC.map )

rs=ftang.
Attachment #45060 - Flags: review+
Please check them in.
give back to nhotta for check in.
Assignee: ftang → nhotta
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
rs=blizzard
should someone from international QA be the qa_contact for this bug ?
Change QA contact to myself.
QA Contact: sujay → ylong
Checked in to the trunk.
The tool still needs to be checked in. Frank, please review the tool.
http://bugzilla.mozilla.org/attachment.cgi?id=51199&action=view
Assignee: nhotta → ftang
Status: ASSIGNED → NEW
Keywords: 4xp
The tool issue to be handled by bug 67374. Mark this as FIXED.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Changed QA contact to teruko@netscape.com.
QA Contact: ylong → teruko
No longer blocks: 104056
Verified as fixed in 2001-10-26 trunk build.
Status: RESOLVED → VERIFIED