Closed Bug 162364 Opened 23 years ago Closed 10 years ago

update cns11643 mapping to unicode3.2

Categories

(Core :: Internationalization, defect)

x86
All
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX
Future

People

(Reporter: ftang, Unassigned)

Details

(Keywords: intl)

Attachments

(15 files)

Our current CNS 11643 mapping is based on a beta version of the Unicode 3.0 Unihan database. Now that we have Unicode 3.2, we should update our tables to it.
The Unicode 3.2 version of the Unihan database is at http://www.unicode.org/Public/3.2-Update/Unihan-3.2.0.txt
Since the table now includes Unicode planes 1-16, we need to change our tool. Here is what I have done to the tool.
Originally it generated:
cns1986p1.txt cns1986p14.txt cns1986p2.txt
cns1992p1.txt cns1992p2.txt cns1992p3.txt
cnsIRGTp1.txt cnsIRGTp15.txt cnsIRGTp2.txt cnsIRGTp3.txt cnsIRGTp4.txt cnsIRGTp5.txt cnsIRGTp6.txt cnsIRGTp7.txt
Now it also generates 6 more files:
cnsIRGTp15ExtB.txt cnsIRGTp3ExtB.txt cnsIRGTp4ExtB.txt cnsIRGTp5ExtB.txt cnsIRGTp6ExtB.txt cnsIRGTp7ExtB.txt
The cnsIRGTp1-7 files include only the BMP part. The newly generated cnsIRGTp*ExtB.txt files hold the mappings to Extension B, written as the code point minus 0x20000. For example, a mapping to U+22345 is written as 0x2345 in the file.
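A minimal sketch of the BMP / ExtB split described above, written in Python rather than the Perl of the actual tool. The function name table_entry and its error handling are hypothetical; only the file naming pattern and the 0x20000 offset come from this bug.

```python
EXT_B_BASE = 0x20000  # offset subtracted for the ExtB files, per the comment above


def table_entry(plane, codepoint):
    """Return (file name, 16-bit value) for a CNS plane and a Unicode code point.

    Hypothetical helper: BMP code points stay in cnsIRGTp<plane>.txt unchanged,
    plane 2 code points go to cnsIRGTp<plane>ExtB.txt minus 0x20000.
    """
    if codepoint < 0x10000:
        return f"cnsIRGTp{plane}.txt", codepoint
    value = codepoint - EXT_B_BASE
    if not 0 <= value <= 0xFFFF:
        # Assumption: anything outside U+20000..U+2FFFF cannot be stored this way.
        raise ValueError(f"U+{codepoint:04X} does not fit a 16-bit ExtB entry")
    return f"cnsIRGTp{plane}ExtB.txt", value


name, value = table_entry(6, 0x2F802)
print(name, f"0x{value:04X}")
```

The point of the offset is that every entry stays 16 bits wide, so the existing table format can be reused without change.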
How to run the tool:
1. Apply the patch so that it can handle Unihan 3.2.
2. Download http://www.unicode.org/Public/3.2-Update/Unihan-3.2.0.txt and put it into mozilla/intl/uconv/tools.
3. Run the tool:
   perl unihan2cns.pl < Unihan-3.2.0.txt
   It will generate the files.
Those files can be piped into umaptable to generate .uf or .ut files.
Attachment #94982 - Attachment description: generated → generated cnsIRGTp5.txt
Attachment #94989 - Attachment description: generated nsIRGTp4ExtB.txt → generated cnsIRGTp4ExtB.txt
Keywords: intl
QA Contact: ruixu → ylong
I found that the current nonhan.txt is not good. Here is the new one.
Brian, why don't you start comparing the cnsIRGTp3-7.txt, cnsIRGTp15.txt, cnsIRGTp3-7ExtB.txt and cnsIRGTp15ExtB.txt files with the CNS mapping table you have? Remember that the numbers in those ExtB files need 0x20000 added to them. For example, the first line in cnsIRGTp6ExtB.txt is
0x2121 0xF802 # <CJK>
This means that CNS 11643 plane 6 0x2121 maps to Unicode U+2F802 (0xF802 + 0x20000).
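For the comparison, a small Python sketch of reading one ExtB line back into a full code point may help. The function name extb_line_to_unicode is hypothetical, and the line layout is assumed from the single sample line quoted above.

```python
EXT_B_BASE = 0x20000  # value to add back, as described above


def extb_line_to_unicode(line):
    """Parse 'CNS UNICODE # comment' and return (cns_code, full_code_point)."""
    body = line.split("#", 1)[0]          # drop the trailing comment
    cns_hex, uni_hex = body.split()[:2]   # two hex columns, e.g. 0x2121 0xF802
    return int(cns_hex, 16), int(uni_hex, 16) + EXT_B_BASE


cns, ucs = extb_line_to_unicode("0x2121 0xF802 # <CJK>")
print(f"0x{cns:04X} -> U+{ucs:04X}")
```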
Here are the new file sizes (from the "End of table Total Length" comment in each file):

file             Total Length
cns15.uf         0x0363 * 2
cns15extb.uf     0x3817 * 2
cns3.uf          0x5377 * 2
cns3extb.uf      0x0131 * 2
cns4.uf          0x27A3 * 2
cns4extb.uf      0x355B * 2
cns5.uf          0x07CE * 2
cns5extb.uf      0x6342 * 2
cns6.uf          0x03B0 * 2
cns6extb.uf      0x4376 * 2
cns7.uf          0x0278 * 2
cns7extb.uf      0x46B6 * 2
cnsIRGTp1.uf     0x43BD * 2
cns_1.uf         0x43BD * 2
cns_2.uf         0x4747 * 2
cns15.ut         0x0417 * 2
cns15extb.ut     0x1D20 * 2
cns3.ut          0x19E4 * 2
cns3extb.ut      0x0110 * 2
cns4.ut          0x1CDD * 2
cns4extb.ut      0x1C9D * 2
cns5.ut          0x07D4 * 2
cns5extb.ut      0x22C2 * 2
cns6.ut          0x03B5 * 2
cns6extb.ut      0x19D5 * 2
cns7.ut          0x023B * 2
cns7extb.ut      0x1A72 * 2
cnsIRGTp1.ut     0x1724 * 2
cns_1.ut         0x1724 * 2
cns_2.ut         0x1EF1 * 2

Here are the current file sizes:

file             Total Length
cns3.uf          0x39F1 * 2
cns4.uf          0x0F3F * 2
cns5.uf          0x00F1 * 2
cns6.uf          0x0174 * 2
cns7.uf          0x0044 * 2
cns_1.uf         0x43BD * 2
cns_2.uf         0x4747 * 2
cns3.ut          0x1892 * 2
cns4.ut          0x0E59 * 2
cns5.ut          0x00F0 * 2
cns6.ut          0x017B * 2
cns7.ut          0x004B * 2
cns_1.ut         0x1724 * 2
cns_2.ut         0x1EF1 * 2
The previous comment shows plane 1 twice, so make sure you don't count cnsIRGTp1.uf and cnsIRGTp1.ut. It looks like the old tables need 155 K and the updated tables need 450 K, so that is about a 300 K increase in the binary.
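The estimate can be reproduced from the Total Length values listed earlier. The Python sketch below assumes that "Total Length = N * 2" means N entries of 2 bytes each and that 1 K is 1024 bytes; cnsIRGTp1.uf and cnsIRGTp1.ut are left out because they duplicate cns_1.

```python
# Total Length values copied from the listings in this bug (assumed 2-byte units).
OLD = {
    "cns3.uf": 0x39F1, "cns4.uf": 0x0F3F, "cns5.uf": 0x00F1, "cns6.uf": 0x0174,
    "cns7.uf": 0x0044, "cns_1.uf": 0x43BD, "cns_2.uf": 0x4747,
    "cns3.ut": 0x1892, "cns4.ut": 0x0E59, "cns5.ut": 0x00F0, "cns6.ut": 0x017B,
    "cns7.ut": 0x004B, "cns_1.ut": 0x1724, "cns_2.ut": 0x1EF1,
}
NEW = {
    "cns15.uf": 0x0363, "cns15extb.uf": 0x3817, "cns3.uf": 0x5377,
    "cns3extb.uf": 0x0131, "cns4.uf": 0x27A3, "cns4extb.uf": 0x355B,
    "cns5.uf": 0x07CE, "cns5extb.uf": 0x6342, "cns6.uf": 0x03B0,
    "cns6extb.uf": 0x4376, "cns7.uf": 0x0278, "cns7extb.uf": 0x46B6,
    "cns_1.uf": 0x43BD, "cns_2.uf": 0x4747,
    "cns15.ut": 0x0417, "cns15extb.ut": 0x1D20, "cns3.ut": 0x19E4,
    "cns3extb.ut": 0x0110, "cns4.ut": 0x1CDD, "cns4extb.ut": 0x1C9D,
    "cns5.ut": 0x07D4, "cns5extb.ut": 0x22C2, "cns6.ut": 0x03B5,
    "cns6extb.ut": 0x19D5, "cns7.ut": 0x023B, "cns7extb.ut": 0x1A72,
    "cns_1.ut": 0x1724, "cns_2.ut": 0x1EF1,
}


def table_bytes(lengths):
    """Sum the table lengths and convert to bytes (2 bytes per entry, assumed)."""
    return 2 * sum(lengths.values())


old_bytes = table_bytes(OLD)
new_bytes = table_bytes(NEW)
print(f"old: {old_bytes / 1024:.0f} K")
print(f"new: {new_bytes / 1024:.0f} K")
print(f"increase: {(new_bytes - old_bytes) / 1024:.0f} K")
```

Under those assumptions the totals land close to the 155 K and 450 K figures quoted above.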
Here is the output of wc -l cnsIRGT*.txt, which represents the number of characters in CNS:

 6088 cnsIRGTp1.txt
  245 cnsIRGTp15.txt
 6476 cnsIRGTp15ExtB.txt
 7650 cnsIRGTp2.txt
 6323 cnsIRGTp3.txt
   71 cnsIRGTp3ExtB.txt
 3810 cnsIRGTp4.txt
 3476 cnsIRGTp4ExtB.txt
  458 cnsIRGTp5.txt
 8143 cnsIRGTp5ExtB.txt
  227 cnsIRGTp6.txt
 6159 cnsIRGTp6ExtB.txt
  149 cnsIRGTp7.txt
 6388 cnsIRGTp7ExtB.txt
55663 total
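As a quick cross-check of those counts, the Python sketch below re-adds the per-file line counts and splits them into BMP and ExtB files. The grouping by the "ExtB" suffix in the file name is only an illustration; the numbers are the ones from the wc -l output.

```python
# Line counts copied from the wc -l output in this bug.
COUNTS = {
    "cnsIRGTp1.txt": 6088, "cnsIRGTp15.txt": 245, "cnsIRGTp15ExtB.txt": 6476,
    "cnsIRGTp2.txt": 7650, "cnsIRGTp3.txt": 6323, "cnsIRGTp3ExtB.txt": 71,
    "cnsIRGTp4.txt": 3810, "cnsIRGTp4ExtB.txt": 3476, "cnsIRGTp5.txt": 458,
    "cnsIRGTp5ExtB.txt": 8143, "cnsIRGTp6.txt": 227, "cnsIRGTp6ExtB.txt": 6159,
    "cnsIRGTp7.txt": 149, "cnsIRGTp7ExtB.txt": 6388,
}

# Split the counts by whether the file holds Extension B mappings.
extb = sum(n for name, n in COUNTS.items() if "ExtB" in name)
bmp = sum(n for name, n in COUNTS.items() if "ExtB" not in name)
total = bmp + extb

print(f"BMP: {bmp}, ExtB: {extb}, total: {total}")
```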
Add Ervin.Yan@Sun.com to the interest list
Also see bug 162431, which talks about adding surrogate / Unicode plane 1-16 support to nsIUnicodeDecodeHelper and nsIUnicodeEncodeHelper.
Over to ftang. Estimated work: 5-10 days.
Assignee: yokoyama → ftang
ftang future
Status: NEW → ASSIGNED
Target Milestone: --- → Future
What a hack. I have not touched Mozilla code for 2 years. I didn't read these bugs for 2 years. And they are still there. Just closing them as won't fix to clean up.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → WONTFIX
Mass re-open of Frank Tang's won't-fix debacle. Spam is his responsibility, not my own.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Mass re-assigning Frank Tang's old bugs that he closed as won't fix and that had to be re-opened. Spam is his fault, not my own.
Assignee: ftang → nobody
Status: REOPENED → NEW
QA Contact: amyy → i18n
This code is gone.
Status: NEW → RESOLVED
Closed: 21 years ago → 10 years ago
Resolution: --- → WONTFIX