Closed Bug 162364 Opened 22 years ago Closed 8 years ago

update cns11643 mapping to unicode3.2

Categories

(Core :: Internationalization, defect)

x86
All
defect
Not set
normal

Tracking

RESOLVED WONTFIX
Future

People

(Reporter: ftang, Unassigned)

Details

(Keywords: intl)

Attachments

(15 files)

Our current CNS 11643 mapping is based on a beta version of the Unicode 3.0 Unihan
database. Now that we have Unicode 3.2, we should update our table to it.
The Unicode 3.2 version of the Unihan database is at
http://www.unicode.org/Public/3.2-Update/Unihan-3.2.0.txt
Since the table now includes characters beyond the BMP (Unicode planes 1-16), we need to change our tool.
Here is what I have done to the tool. Originally it generated these files:
cns1986p1.txt
cns1986p14.txt
cns1986p2.txt
cns1992p1.txt
cns1992p2.txt
cns1992p3.txt
cnsIRGTp1.txt
cnsIRGTp15.txt
cnsIRGTp2.txt
cnsIRGTp3.txt
cnsIRGTp4.txt
cnsIRGTp5.txt
cnsIRGTp6.txt
cnsIRGTp7.txt

Now it will generate 6 more files:
cnsIRGTp15ExtB.txt
cnsIRGTp3ExtB.txt
cnsIRGTp4ExtB.txt
cnsIRGTp5ExtB.txt
cnsIRGTp6ExtB.txt
cnsIRGTp7ExtB.txt

The cnsIRGTp1-7.txt files will include only the BMP part.
The newly generated cnsIRGTp*ExtB.txt files hold the mappings to Extension B (plane 2),
stored with 0x20000 subtracted. For example, a mapping to U+22345 is written as 0x2345 in the file.
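A minimal Python sketch of the BMP / Extension B split described above; it is not the actual unihan2cns.pl logic. It assumes each relevant Unihan line looks like "U+2F802<TAB>kIRG_TSource<TAB>6-2121" (an optional leading "T" on the value is also accepted, and plane "F" is read as 15). The function names split_tsource and format_line are hypothetical.

```python
import re
from collections import defaultdict

# ASSUMED Unihan line shape: "U+XXXX[X]<TAB>kIRG_TSource<TAB>[T]P-XXXX",
# where P is the CNS plane as one hex digit (1-7, or F for plane 15).
LINE_RE = re.compile(
    r"^U\+([0-9A-F]{4,6})\tkIRG_TSource\tT?([1-9A-F])-([0-9A-F]{4})$")

EXTB_OFFSET = 0x20000  # plane-2 code points are stored minus this value


def split_tsource(lines):
    """Group T-source mappings into per-plane BMP and ExtB file buckets.

    Returns {file_name: [(cns_code, file_value), ...]}.
    """
    out = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.rstrip("\r\n"))
        if not m:
            continue  # comments and other Unihan fields are ignored
        ucs = int(m.group(1), 16)
        plane = int(m.group(2), 16)
        cns = int(m.group(3), 16)
        if ucs <= 0xFFFF:
            out[f"cnsIRGTp{plane}.txt"].append((cns, ucs))
        elif EXTB_OFFSET <= ucs <= 0x2FFFF:
            out[f"cnsIRGTp{plane}ExtB.txt"].append((cns, ucs - EXTB_OFFSET))
        # code points in other supplementary planes are not expected; skipped
    return out


def format_line(cns, value):
    """Render one mapping line in the "0xCNS<TAB>0xUCS<TAB># <CJK>" style."""
    return f"0x{cns:04X}\t0x{value:04X}\t# <CJK>"
```

Writing each bucket out sorted by CNS code would give files in the shape of the cnsIRGTp*.txt outputs listed above.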

How to run the tool:
1. Apply the patch so it can handle Unihan 3.2.
2. Download http://www.unicode.org/Public/3.2-Update/Unihan-3.2.0.txt and put it
into mozilla/intl/uconv/tools.
3. Run the tool:
perl unihan2cns.pl < Unihan-3.2.0.txt
It will generate the files listed above.
Those files can be piped into umaptable to generate .uf or .ut files.
Attachment #94982 - Attachment description: generated → generated cnsIRGTp5.txt
Attachment #94989 - Attachment description: generated nsIRGTp4ExtB.txt → generated cnsIRGTp4ExtB.txt
Keywords: intl
QA Contact: ruixu → ylong
I found that the current nonhan.txt is not good; here is a new one.
Adding brian.yuan@sun.com to CC.
Brian, could you start comparing the cnsIRGTp3-7.txt, cnsIRGTp15.txt,
cnsIRGTp3-7ExtB.txt, and cnsIRGTp15ExtB.txt files with the CNS mapping table you have?
Remember that 0x20000 must be added to the Unicode values in the ExtB files.
For example, the first line in cnsIRGTp6ExtB.txt is:

0x2121	0xF802	# <CJK>

This means CNS 11643 plane 6 0x2121 maps to Unicode U+2F802 (0xF802 + 0x20000).
Here are the new file sizes:
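For the comparison, a rough Python sketch of reading a generated file (adding 0x20000 back for ExtB files) and listing the CNS codes that disagree with a reference table. It assumes whitespace-separated "0xCNS 0xUCS # comment" lines like the one above and that the reference table is already loaded as a dict; read_mapping and diff_tables are hypothetical names.

```python
EXTB_OFFSET = 0x20000  # ExtB files store plane-2 code points minus 0x20000


def read_mapping(lines, ext_b):
    """Parse "0xCNS <ws> 0xUCS <ws> # comment" lines into {cns: code_point}."""
    table = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        cns_s, ucs_s = line.split()[:2]
        ucs = int(ucs_s, 16)  # int() accepts the "0x" prefix with base 16
        if ext_b:
            ucs += EXTB_OFFSET  # restore the full plane-2 code point
        table[int(cns_s, 16)] = ucs
    return table


def diff_tables(generated, reference):
    """Return sorted CNS codes that are missing on one side or map differently."""
    return sorted(code for code in generated.keys() | reference.keys()
                  if generated.get(code) != reference.get(code))
```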

==> cns15.uf <==
/*      End of table Total Length = 0x0363 * 2 */

==> cns15extb.uf <==
/*      End of table Total Length = 0x3817 * 2 */

==> cns3.uf <==
/*      End of table Total Length = 0x5377 * 2 */

==> cns3extb.uf <==
/*      End of table Total Length = 0x0131 * 2 */

==> cns4.uf <==
/*      End of table Total Length = 0x27A3 * 2 */

==> cns4extb.uf <==
/*      End of table Total Length = 0x355B * 2 */

==> cns5.uf <==
/*      End of table Total Length = 0x07CE * 2 */

==> cns5extb.uf <==
/*      End of table Total Length = 0x6342 * 2 */

==> cns6.uf <==
/*      End of table Total Length = 0x03B0 * 2 */

==> cns6extb.uf <==
/*      End of table Total Length = 0x4376 * 2 */

==> cns7.uf <==
/*      End of table Total Length = 0x0278 * 2 */

==> cns7extb.uf <==
/*      End of table Total Length = 0x46B6 * 2 */

==> cnsIRGTp1.uf <==
/*      End of table Total Length = 0x43BD * 2 */

==> cns_1.uf <==
/*      End of table Total Length = 0x43BD * 2 */

==> cns_2.uf <==
/*      End of table Total Length = 0x4747 * 2 */

==> cns15.ut <==
/*      End of table Total Length = 0x0417 * 2 */

==> cns15extb.ut <==
/*      End of table Total Length = 0x1D20 * 2 */

==> cns3.ut <==
/*      End of table Total Length = 0x19E4 * 2 */

==> cns3extb.ut <==
/*      End of table Total Length = 0x0110 * 2 */

==> cns4.ut <==
/*      End of table Total Length = 0x1CDD * 2 */

==> cns4extb.ut <==
/*      End of table Total Length = 0x1C9D * 2 */

==> cns5.ut <==
/*      End of table Total Length = 0x07D4 * 2 */

==> cns5extb.ut <==
/*      End of table Total Length = 0x22C2 * 2 */

==> cns6.ut <==
/*      End of table Total Length = 0x03B5 * 2 */

==> cns6extb.ut <==
/*      End of table Total Length = 0x19D5 * 2 */

==> cns7.ut <==
/*      End of table Total Length = 0x023B * 2 */

==> cns7extb.ut <==
/*      End of table Total Length = 0x1A72 * 2 */

==> cnsIRGTp1.ut <==
/*      End of table Total Length = 0x1724 * 2 */

==> cns_1.ut <==
/*      End of table Total Length = 0x1724 * 2 */

==> cns_2.ut <==
/*      End of table Total Length = 0x1EF1 * 2 */

Here are the current file sizes:
==> cns3.uf <==
/*      End of table Total Length = 0x39F1 * 2 */

==> cns4.uf <==
/*      End of table Total Length = 0x0F3F * 2 */

==> cns5.uf <==
/*      End of table Total Length = 0x00F1 * 2 */

==> cns6.uf <==
/*      End of table Total Length = 0x0174 * 2 */

==> cns7.uf <==
/*      End of table Total Length = 0x0044 * 2 */

==> cns_1.uf <==
/*      End of table Total Length = 0x43BD * 2 */

==> cns_2.uf <==
/*      End of table Total Length = 0x4747 * 2 */

==> cns3.ut <==
/*      End of table Total Length = 0x1892 * 2 */

==> cns4.ut <==
/*      End of table Total Length = 0x0E59 * 2 */

==> cns5.ut <==
/*      End of table Total Length = 0x00F0 * 2 */

==> cns6.ut <==
/*      End of table Total Length = 0x017B * 2 */

==> cns7.ut <==
/*      End of table Total Length = 0x004B * 2 */

==> cns_1.ut <==
/*      End of table Total Length = 0x1724 * 2 */

==> cns_2.ut <==
/*      End of table Total Length = 0x1EF1 * 2 */
The previous comment shows plane 1 twice (cnsIRGTp1.uf/.ut have the same lengths as
cns_1.uf/.ut), so make sure you don't count cnsIRGTp1.uf and cnsIRGTp1.ut.
It looks like the old tables need about 155 K and the updated tables need about 450 K,
so the binary grows by about 300 K.
Here is the output of wc -l cnsIRGT*.txt, which gives the number of characters in each file:
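The size estimate can be reproduced from the table trailers: each "Total Length = 0xNNNN * 2" is a count of 16-bit words, so a table's size in bytes is that value times 2. A small Python sketch (hypothetical helper names) that parses the "==> name <==" / "Total Length" pairs and sums them, with an option to skip the duplicated cnsIRGTp1 tables:

```python
import re

# Header printed by tail/head when given several files: "==> cns15.uf <=="
HEADER_RE = re.compile(r"^==> (\S+) <==$")
# Table trailer: "/*  End of table Total Length = 0x0363 * 2 */"
LENGTH_RE = re.compile(r"Total Length = 0x([0-9A-Fa-f]+) \* 2")


def parse_table_lengths(text):
    """Map each table file name to its length in 16-bit words."""
    lengths = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            continue
        m = LENGTH_RE.search(line)
        if m and current is not None:
            lengths[current] = int(m.group(1), 16)
            current = None
    return lengths


def total_bytes(lengths, exclude=()):
    """Sum table sizes in bytes (2 bytes per word), skipping excluded names."""
    return sum(2 * n for name, n in lengths.items() if name not in exclude)
```

Summing the listed lengths this way gives 159,014 bytes (about 155 KB) for the current tables and 460,830 bytes (about 450 KB) for the new ones once cnsIRGTp1.uf and cnsIRGTp1.ut are excluded.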
   6088 cnsIRGTp1.txt
    245 cnsIRGTp15.txt
   6476 cnsIRGTp15ExtB.txt
   7650 cnsIRGTp2.txt
   6323 cnsIRGTp3.txt
     71 cnsIRGTp3ExtB.txt
   3810 cnsIRGTp4.txt
   3476 cnsIRGTp4ExtB.txt
    458 cnsIRGTp5.txt
   8143 cnsIRGTp5ExtB.txt
    227 cnsIRGTp6.txt
   6159 cnsIRGTp6ExtB.txt
    149 cnsIRGTp7.txt
   6388 cnsIRGTp7ExtB.txt
  55663 total
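To see how each CNS plane is split between its BMP and ExtB files, the wc -l output can be regrouped per plane. A short Python sketch, assuming the cnsIRGTp<plane>[ExtB].txt naming above and one mapping per line (wc -l counts lines, not characters); counts_per_plane is a hypothetical name:

```python
import re
from collections import Counter

# ASSUMED naming from the tool output: cnsIRGTp<plane>.txt / cnsIRGTp<plane>ExtB.txt
NAME_RE = re.compile(r"cnsIRGTp(\d+)(ExtB)?\.txt$")


def counts_per_plane(wc_output):
    """Sum BMP + ExtB line counts for each CNS plane from `wc -l` output."""
    per_plane = Counter()
    for line in wc_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        count, name = parts
        m = NAME_RE.search(name)
        if m:  # the trailing "total" line does not match and is skipped
            per_plane[int(m.group(1))] += int(count)
    return dict(per_plane)
```

For the figures above this gives, for example, 8601 characters for plane 5 (458 BMP + 8143 ExtB), and the per-plane values add up to the same 55663 total.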

Adding Ervin.Yan@Sun.com to the interest list.
Also see bug 162431, which covers adding surrogate (Unicode planes 1-16)
support to nsIUnicodeDecodeHelper and nsIUnicodeEncodeHelper.
Over to ftang. Estimated work: 5-10 days.
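Decoders that emit plane-2 characters such as U+2F802 in UTF-16 have to produce surrogate pairs. The standard UTF-16 arithmetic is sketched below in Python for illustration only; it is not the nsIUnicodeDecodeHelper/nsIUnicodeEncodeHelper API, and the function names are hypothetical.

```python
def to_surrogates(cp):
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a supplementary code point")
    v = cp - 0x10000  # 20-bit value
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def from_surrogates(high, low):
    """Combine a high/low surrogate pair back into a code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

For example, U+2F802 (CNS plane 6 0x2121 above) becomes the pair 0xD87E 0xDC02.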
Assignee: yokoyama → ftang
ftang future
Status: NEW → ASSIGNED
Target Milestone: --- → Future
What a hack. I have not touched Mozilla code for 2 years, and I have not read these bugs
for 2 years. They are still here, so I am closing them as won't fix to clean up.
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → WONTFIX
Mass reopen of Frank Tang's won't-fix debacle. Spam is his responsibility, not my own.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Mass re-assigning Frank Tang's old bugs that he closed as won't fix and that had to be
reopened. Spam is his fault, not my own.
Assignee: ftang → nobody
Status: REOPENED → NEW
QA Contact: amyy → i18n
This code is gone.
Status: NEW → RESOLVED
Closed: 19 years ago → 8 years ago
Resolution: --- → WONTFIX