Closed
Bug 181725
Opened 22 years ago
Closed 21 years ago
nsBIG5ToUnicode.cpp: Big5 code range is incorrect ...
Categories
(Core :: Internationalization, defect)
Tracking
Status: RESOLVED FIXED
People
(Reporter: s793016, Assigned: mcsmurf)
Attachments (1 file)
477 bytes, patch (smontagu: review+, smontagu: superreview+)
User-Agent: Mozilla/5.0 (Windows; U; Win 9x 4.90; zh-TW; rv:1.0.0) Gecko/20020530
Build Identifier: Mozilla/5.0 (Windows; U; Win 9x 4.90; zh-TW; rv:1.0.0) Gecko/20020530

Sorry, my English is very, very poor... ;d

In nsBIG5ToUnicode.cpp, line 66:

static const uRange g_BIG5Ranges[] = { { 0x00, 0x7E }, { 0x81, 0xFC } };

should be changed to:

static const uRange g_BIG5Ranges[] = { { 0x00, 0x7E }, { 0x81, 0xFE } };

That's all.

Reproducible: Always

Reference: http://umunhum.stanford.edu/~lee/chicomp/DBencoding.html
==
2. Generic Mapping
The specific Big5 code space contains 127x188 = 23876 code points, where the valid code range of the first byte is 0x80--0xFE, and that of the second byte is 0x21--0x7E and 0xA1--0xFE ...
==
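The following minimal sketch makes the effect of the one-character change concrete. It models the lead-byte table as a list of inclusive { min, max } pairs. `ByteRange` and `InAnyRange` are hypothetical stand-ins, not the real uconv `uRange` type or its scanning code. The model also assumes, as a simplification, that a lead byte is handled only if it falls inside one of the listed ranges.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the { min, max } byte pairs in g_BIG5Ranges;
// the real uconv uRange type may be declared differently.
struct ByteRange {
  std::uint8_t min;
  std::uint8_t max;
};

// Lead-byte ranges before and after the proposed fix, copied from the report.
constexpr ByteRange kOldBig5Ranges[] = {{0x00, 0x7E}, {0x81, 0xFC}};
constexpr ByteRange kNewBig5Ranges[] = {{0x00, 0x7E}, {0x81, 0xFE}};

// Returns true if byte `b` lies inside any of the N inclusive ranges.
template <std::size_t N>
constexpr bool InAnyRange(const ByteRange (&ranges)[N], std::uint8_t b) {
  for (std::size_t i = 0; i < N; ++i) {
    if (b >= ranges[i].min && b <= ranges[i].max) return true;
  }
  return false;
}
```

With the old table, 0xFD and 0xFE fall outside every range. With the new table they are accepted, while 0x80 and 0xFF remain excluded.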
Comment 1•22 years ago
-> intl
Assignee: asa → smontagu
Component: Browser-General → Internationalization
QA Contact: asa → ylong
Comment 2•22 years ago (Assignee)
But it also says "0x80--0xFE". Shouldn't the 0x81 then be changed to 0x80?

Comment 3•22 years ago
Sorry... my English is very, very poor... :( Sorry again, please forget that web link... ^^;

Now, let's take a look at nsBIG5HKSCSToUnicode.cpp, line 75:
==
static const uRange g_BIG5HKSCSRanges[] = { { 0x00, 0x7E }, { 0x81, 0xA0 }, { 0xA1, 0xC6 }, { 0xC6, 0xC8 }, { 0xC9, 0xF9 }, { 0xF9, 0xFE } };
==
...and at nsBIG5ToUnicode.cpp, line 66:
==
static const uRange g_BIG5Ranges[] = { { 0x00, 0x7E }, { 0x81, 0xFC } };
==
See? The Big5-HKSCS code range is "0x00 - 0x7E, 0x81 - 0xFE", but the Big5 code range is "0x00 - 0x7E, 0x81 - 0xFC"... As far as I know, Big5-HKSCS is an extension of Big5, so the code ranges of the two must be the same!

http://www.info.gov.hk/digital21/eng/hkscs/download/e_sect2.pdf
Code Allocation of the HKSCS-2001 in Big-5 (2.2MB)

So the Big5 code range in nsBIG5ToUnicode.cpp is incorrect!
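You can check the reporter's reading of the HKSCS table by merging its overlapping and adjacent ranges. The sketch below uses a hypothetical `ByteRange` struct and `MergeSorted` helper, neither of which is Mozilla code. It assumes the input is already sorted by `min`, which holds for the six entries quoted from g_BIG5HKSCSRanges.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a uRange-style { min, max } pair.
struct ByteRange {
  std::uint8_t min;
  std::uint8_t max;
};

// The six lead-byte ranges quoted from g_BIG5HKSCSRanges.
constexpr ByteRange kHkscsRanges[] = {
    {0x00, 0x7E}, {0x81, 0xA0}, {0xA1, 0xC6},
    {0xC6, 0xC8}, {0xC9, 0xF9}, {0xF9, 0xFE}};

// Up to N disjoint ranges, of which the first `count` are used.
template <std::size_t N>
struct MergedRanges {
  ByteRange items[N];
  std::size_t count;
};

// Merges ranges that overlap or touch (next.min <= current.max + 1).
// Assumes `in` is sorted by `min`.
template <std::size_t N>
constexpr MergedRanges<N> MergeSorted(const ByteRange (&in)[N]) {
  MergedRanges<N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    if (out.count > 0 && in[i].min <= out.items[out.count - 1].max + 1) {
      // Extend the current merged range if this one reaches further.
      if (in[i].max > out.items[out.count - 1].max) {
        out.items[out.count - 1].max = in[i].max;
      }
    } else {
      // Gap before this range: start a new merged range.
      out.items[out.count++] = in[i];
    }
  }
  return out;
}
```

Merging yields exactly two ranges, { 0x00, 0x7E } and { 0x81, 0xFE }. That is the same lead-byte span the patch gives g_BIG5Ranges.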
Comment 4•22 years ago (Assignee)
Updated•22 years ago (Assignee)
Attachment #107366 - Flags: review?(ftang)
Comment 6•22 years ago (Assignee)
. (sorry for spam)
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Comment 7•22 years ago
Hello,

(Shanjian, I hope you don't mind my adding you to the Cc: list. :-)

Yes, I can confirm that Frank Wein's proposed fix is correct. The Big5 encodings (the original Big-5, the newer Big-5E and Big5-HKSCS) share the same range:

* 1st byte: 0x81-0xFE
* 2nd byte: 0x40-0x7E, 0xA1-0xFE

The current big5.uf and big5.ut do not contain mappings for 1st bytes 0x81-0xA0 and 0xFD-0xFE, but they should. These are part of Big5's User-Defined Areas (UDAs), which are mapped to the Private Use Area (PUA) in Unicode. (I'll open a new bug report once I'm sure the new big5.{uf,ut} work.)

Cheers,
Anthony
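The byte ranges listed above translate directly into a validity check for a two-byte code. The function names below are hypothetical illustrations, not part of the uconv API. The check covers byte ranges only. It does not say whether big5.uf/big5.ut actually contain a mapping for the pair.

```cpp
#include <cstdint>

// Lead (1st) byte: 0x81-0xFE, shared by Big-5, Big-5E and Big5-HKSCS.
constexpr bool IsBig5LeadByte(std::uint8_t b) {
  return b >= 0x81 && b <= 0xFE;
}

// Trail (2nd) byte: 0x40-0x7E or 0xA1-0xFE; 0x7F-0xA0 is a gap.
constexpr bool IsBig5TrailByte(std::uint8_t b) {
  return (b >= 0x40 && b <= 0x7E) || (b >= 0xA1 && b <= 0xFE);
}

// A two-byte code is in range only if both of its bytes are.
constexpr bool IsBig5Pair(std::uint8_t lead, std::uint8_t trail) {
  return IsBig5LeadByte(lead) && IsBig5TrailByte(trail);
}
```

For example, 0xFE 0xFE (inside a UDA) passes the check. 0xA4 0x7F fails because 0x7F falls in the trail-byte gap.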
Comment 8•22 years ago (Assignee)
AFAIK ftang wanted to take a closer look at the PDF (to review the patch).
Updated•22 years ago (Assignee)
Attachment #107366 - Flags: review?(ftang) → review?(tao)
Comment on attachment 107366
Patch

ftang is the real guru in this area :-)
Attachment #107366 - Flags: review?(tao) → review?(ftang)
Comment 10•22 years ago
If there is still any doubt about the valid range of Big-5, here is the definition of the original Big-5 standard as given in the HKSCS-2001 standard:

Big-5 is a fixed 2-octet coding scheme which consists of 13,053 traditional Chinese characters. It is a de facto industrial standard commonly used in Taiwan and the HKSAR. The ranges of the octets are as follows:

First Octet  : 0x81 - 0xFE
Second Octet : 0x40 - 0x7E, 0xA1 - 0xFE

The current coding assignment and architecture of Big-5 is shown as follows:

Range          Name of Block                        Total code points
8140 - 8DFE    User-Defined Area 3 (UDA3)           2,041
8E40 - A0FE    User-Defined Area 2 (UDA2)           2,983
A140 - A3FE    Big-5 Symbols and Control Codes      471
A440 - C67E    Big-5 Primary Character Set          5,401
C6A1 - C8FE    Vendor-Defined Area (VDA)            408
C940 - F9D5    Big-5 Secondary Character Set        7,652
F9D6 - F9FE    Vendor-Defined Area (VDA)            41
FA40 - FEFE    User-Defined Area 1 (UDA1)           785

Taiwanese i18n/L10n developer Pofeng Lee also kindly pointed me to a few scanned pages from the ETen Chinese System manual:
http://m2000.idv.tw/informer/big5/big5-eten/
especially this one:
http://m2000.idv.tw/informer/big5/big5-eten/p6.jpg
There, B5+FA40 to B5+FEFE is User-Defined Area #1 (UDA1). It is important that Mozilla support these UDAs (EUDC ranges) too. :-)

Another piece of "evidence" is Microsoft's http://www.microsoft.com/typography/unicode/950.txt, which maps B5+FA40-FEFE (among other ranges) to Unicode's Private Use Area.

Of course, extending the range from FC to FE requires corresponding updates to big5.uf and big5.ut, which are already provided in:
http://bugzilla.mozilla.org/show_bug.cgi?id=9686

I took the liberty of making this bug depend on Bug #9686. :-)

Many thanks,
Anthony
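Every block in the table is a contiguous run of codes whose lead bytes are 0x81-0xFE and whose trail bytes are 0x40-0x7E or 0xA1-0xFE. That means the quoted code-point counts can be re-derived arithmetically. The sketch below uses hypothetical helper names. It treats a code as (lead << 8) | trail and counts the valid codes between two boundaries one lead byte at a time, without enumerating every code.

```cpp
// Number of integers in the intersection of [lo, hi] and [a, b].
constexpr int Overlap(int lo, int hi, int a, int b) {
  int from = lo > a ? lo : a;
  int to = hi < b ? hi : b;
  return to >= from ? to - from + 1 : 0;
}

// Valid trail bytes (0x40-0x7E, 0xA1-0xFE) within [lo, hi].
constexpr int TrailsIn(int lo, int hi) {
  return Overlap(lo, hi, 0x40, 0x7E) + Overlap(lo, hi, 0xA1, 0xFE);
}

// Counts valid Big-5 codes from `first` to `last` inclusive, where each
// code is (lead << 8) | trail and the lead byte must be 0x81-0xFE.
constexpr int CountBig5Codes(int first, int last) {
  int n = 0;
  for (int lead = first >> 8; lead <= (last >> 8); ++lead) {
    if (lead < 0x81 || lead > 0xFE) continue;
    // Only the first and last lead bytes can have a partial trail range.
    int tlo = lead == (first >> 8) ? (first & 0xFF) : 0x00;
    int thi = lead == (last >> 8) ? (last & 0xFF) : 0xFF;
    n += TrailsIn(tlo, thi);
  }
  return n;
}
```

Under these assumptions every count matches the table. For example, CountBig5Codes(0xA440, 0xC67E) gives 5,401 and CountBig5Codes(0xC940, 0xF9D5) gives 7,652, which sum to the 13,053 characters in the definition. The whole 0x8140-0xFEFE space gives 126 x 157 = 19,782.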
Depends on: 9686
Comment 11•22 years ago (Assignee)
I already requested a review from him, but that was a month ago and I'm still waiting...
Updated•21 years ago (Assignee)
Attachment #107366 - Flags: superreview?(smontagu)
Attachment #107366 - Flags: review?(smontagu)
Attachment #107366 - Flags: review?(ftang)
Comment 12•21 years ago
If I understand correctly, this is more than dependent on bug 9686, since neither this patch nor the group of patches there will work without the other. Am I right?
Comment 13•21 years ago
Hello Simon,

Hmm... yes, something like that. Bug 9686 and Bug 181725 are mutually dependent, so please fix both at the same time by applying the patches from both bug reports.

Many thanks,
Anthony
Comment 14•21 years ago
Comment on attachment 107366
Patch

r=smontagu, and this is covered by the blanket rs=roc+moz from bug 199143.
Attachment #107366 - Flags: superreview?(smontagu) → superreview+
Attachment #107366 - Flags: review?(smontagu) → review+
Comment 15•21 years ago
Checked in.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED