Closed Bug 181725 Opened 22 years ago Closed 21 years ago

nsBIG5ToUnicode.cpp: Big5 code range is incorrect ...

Categories

(Core :: Internationalization, defect)

x86
Windows 98
defect
Not set
normal

RESOLVED FIXED

People

(Reporter: s793016, Assigned: mcsmurf)

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (Windows; U; Win 9x 4.90; zh-TW; rv:1.0.0) Gecko/20020530
Build Identifier: Mozilla/5.0 (Windows; U; Win 9x 4.90; zh-TW; rv:1.0.0) Gecko/20020530

Sorry, my English is very, very poor ... ;d

inside nsBIG5ToUnicode.cpp, line 66:

static const uRange g_BIG5Ranges[] = {
  { 0x00, 0x7E },
  { 0x81, 0xFC }
};

Should be changed to:
static const uRange g_BIG5Ranges[] = {
  { 0x00, 0x7E },
  { 0x81, 0xFE }
};

... That's All ...

Reproducible: Always


http://umunhum.stanford.edu/~lee/chicomp/DBencoding.html
==
2. Generic Mapping 

The specific Big5 code space contains 127x188 = 23876 code points, where the
valid code range of the first byte is 0x80--0xFE, and that of the second byte is
0x21--0x7E and 0xA1--0xFE ...
==
-> intl
Assignee: asa → smontagu
Component: Browser-General → Internationalization
QA Contact: asa → ylong
But it also says "0x80--0xFE". Shouldn't the 0x81 then be changed to 0x80?
Sorry ... my English is very very poor ... :(

Sorry again, please forget that web link ... ^^;

Now, let's take a look at nsBIG5HKSCSToUnicode.cpp, line 75:
==
static const uRange g_BIG5HKSCSRanges[] = {
  { 0x00, 0x7E },
  { 0x81, 0xA0 },
  { 0xA1, 0xC6 },
  { 0xC6, 0xC8 },
  { 0xC9, 0xF9 },
  { 0xF9, 0xFE }
};
==

... and take a look at nsBIG5ToUnicode.cpp, line 66:
==
static const uRange g_BIG5Ranges[] = {
  { 0x00, 0x7E },
  { 0x81, 0xFC }
};
==

See??  

The Big5-hkscs code range is "0x00 - 0x7E, 0x81 - 0xFE", and the big5 code range
is "0x00 - 0x7E, 0x81 - 0xFC" ... 

As far as I know, Big5-HKSCS is an extension of Big5, so the code range of the
two must be the same!

http://www.info.gov.hk/digital21/eng/hkscs/download/e_sect2.pdf
Code Allocation of the HKSCS-2001 in Big-5 (2.2MB)

So, big5 code range inside nsBIG5ToUnicode.cpp is incorrect!
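
To make the difference between the two tables concrete, here is a minimal sketch of a lead-byte range check. `ByteRange`, `kOldBig5Ranges`, `kNewBig5Ranges` and `InRanges` are hypothetical names used only for illustration; the real uRange definition and the way the converter consults the table are not shown in this bug, so treat this as an assumption rather than the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a range-table entry: an inclusive [min, max]
// span of lead bytes. The layout of the real uRange is an assumption here.
struct ByteRange {
  std::uint8_t min;
  std::uint8_t max;
};

// The Big5 table as quoted in this report, before and after the proposed fix.
static const ByteRange kOldBig5Ranges[] = {{0x00, 0x7E}, {0x81, 0xFC}};
static const ByteRange kNewBig5Ranges[] = {{0x00, 0x7E}, {0x81, 0xFE}};

// Returns true if `lead` falls inside any inclusive range of the table.
template <std::size_t N>
inline bool InRanges(const ByteRange (&ranges)[N], std::uint8_t lead) {
  for (std::size_t i = 0; i < N; ++i) {
    if (lead >= ranges[i].min && lead <= ranges[i].max) {
      return true;
    }
  }
  return false;
}
```

With the old table, lead bytes 0xFD and 0xFE are rejected; with the proposed table they are accepted, while 0x7F and 0x80 stay outside both.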
Attached patch: Patch
Attachment #107366 - Flags: review?(ftang)
-->me
Assignee: smontagu → mcsmurf
. (sorry for spam)
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Hello,

(Shanjian, I hope you don't mind me adding you to the Cc: list.  :-)

Yes, I can confirm that Frank Wein's proposed fix is correct.  Big5 encodings
(the original Big-5, the new Big-5E and Big5-HKSCS) have the same range:

  * 1st byte: 0x81-0xFE 
  * 2nd byte: 0x40-0x7E, 0xA1-0xFE

The current big5.uf and big5.ut do not contain mappings in 1st-byte 0x81-0xA0
and 0xFD-0xFE, but they should: These are part of Big5's User-Defined Areas
(UDAs), and they are mapped to the Private Use Area (PUA) in Unicode.  (I'll
open a new bug report when I'm sure that the new big5.{uf,ut} work.)

Cheers,

Anthony
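
The byte ranges listed in the comment above (1st byte 0x81-0xFE, 2nd byte 0x40-0x7E or 0xA1-0xFE) can be sketched as a small validity check. The function names `IsBig5Lead`, `IsBig5Trail` and `IsBig5Pair` are hypothetical and are not taken from the Mozilla sources; this only checks that a byte pair lies in the Big5 code space, not that a mapping to Unicode exists for it.

```cpp
#include <cassert>
#include <cstdint>

// First byte of a two-byte Big5 sequence: 0x81-0xFE.
inline bool IsBig5Lead(std::uint8_t b) {
  return b >= 0x81 && b <= 0xFE;
}

// Second byte of a two-byte Big5 sequence: 0x40-0x7E or 0xA1-0xFE.
inline bool IsBig5Trail(std::uint8_t b) {
  return (b >= 0x40 && b <= 0x7E) || (b >= 0xA1 && b <= 0xFE);
}

// True if (lead, trail) lies inside the two-byte Big5 code space.
// This says nothing about whether the code point is actually assigned.
inline bool IsBig5Pair(std::uint8_t lead, std::uint8_t trail) {
  return IsBig5Lead(lead) && IsBig5Trail(trail);
}
```

For example, the pair (0xFD, 0x40) is inside the code space even though the old range table in nsBIG5ToUnicode.cpp stopped at lead byte 0xFC.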
AFAIK ftang wanted to take a closer look at the PDF (to review the patch).
Attachment #107366 - Flags: review?(ftang) → review?(tao)
Comment on attachment 107366 (Patch)

ftang is the real guru in this area :-)
Attachment #107366 - Flags: review?(tao) → review?(ftang)
If there is yet any doubt as to the valid range of Big-5, the following is a
definition of the original Big-5 standard as explained in the HKSCS-2001 standard:

     Big-5 is a fixed 2-octet coding scheme which consists of 13,053 traditional
     Chinese characters. It is a de facto industrial standard commonly used in
     Taiwan and the HKSAR.  The ranges of the octets are as follows:

        First Octet :   0x81 - 0xFE
        Second Octet :  0x40 - 0x7E , 0xA1 - 0xFE

     The current coding assignment and architecture of Big-5 is shown as
     follows:

        Range           Name of Block                   (Total code points)

        8140 - 8DFE     User-Defined Area 3 (UDA3)      (2,041 code points)
        8E40 - A0FE     User-Defined Area 2 (UDA2)      (2,983 code points)
        A140 - A3FE     Big-5 Symbols and Control Codes (471 code points)
        A440 - C67E     Big-5 Primary Character Set     (5,401 code points)
        C6A1 - C8FE     Vendor-Defined Area (VDA)       (408 code points)
        C940 - F9D5     Big-5 Secondary Character Set   (7,652 code points)
        F9D6 - F9FE     Vendor-Defined Area (VDA)       (41 code points)
        FA40 - FEFE     User-Defined Area 1 (UDA1)      (785 code points)
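
The code-point totals in the table above follow directly from the octet ranges: each first octet contributes 63 + 94 = 157 valid second octets. As a rough cross-check, the sketch below counts the valid two-byte codes between two Big5 codes. `CountCodePoints` is a hypothetical helper written for this illustration and is not part of any converter.

```cpp
#include <cassert>
#include <cstdint>

// Counts the valid Big5 codes in the inclusive range [first, last], where a
// code is valid if its first octet is 0x81-0xFE and its second octet is
// 0x40-0x7E or 0xA1-0xFE.
inline int CountCodePoints(std::uint16_t first, std::uint16_t last) {
  int count = 0;
  // Use a wider loop variable so the increment cannot wrap around at 0xFFFF.
  for (std::uint32_t code = first; code <= last; ++code) {
    const std::uint8_t lead = static_cast<std::uint8_t>(code >> 8);
    const std::uint8_t trail = static_cast<std::uint8_t>(code & 0xFF);
    const bool lead_ok = lead >= 0x81 && lead <= 0xFE;
    const bool trail_ok =
        (trail >= 0x40 && trail <= 0x7E) || (trail >= 0xA1 && trail <= 0xFE);
    if (lead_ok && trail_ok) {
      ++count;
    }
  }
  return count;
}
```

For instance, `CountCodePoints(0xFA40, 0xFEFE)` gives 5 x 157 = 785, matching the UDA1 row, and the whole space 0x8140 to 0xFEFE gives 126 x 157 = 19,782, which is the sum of all eight rows.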

Taiwan Chinese i18n/L10n developer Pofeng Lee also kindly pointed me to a few
scanned pages from the ETen Chinese System manual:

    http://m2000.idv.tw/informer/big5/big5-eten/

especially this one:

    http://m2000.idv.tw/informer/big5/big5-eten/p6.jpg

where B5+FA40 to B5+FEFE is User-Defined Area #1 (UDA1), and it is important
that Mozilla support these UDAs or EUDC ranges too.  :-)

Another piece of "evidence" is Microsoft's
http://www.microsoft.com/typography/unicode/950.txt .  It has B5+FA40-FEFE
(among other ranges) mapped to Unicode's Private Use Area.

Of course, extending from FC to FE requires corresponding updates to big5.uf and
big5.ut, already provided in:

    http://bugzilla.mozilla.org/show_bug.cgi?id=9686

I just took the liberty of making this bug depend on Bug#9686.  :-)

Many thanks,

Anthony
Depends on: 9686
I already requested review from him, but that was a month ago and I'm still
waiting...
Attachment #107366 - Flags: superreview?(smontagu)
Attachment #107366 - Flags: review?(smontagu)
Attachment #107366 - Flags: review?(ftang)
If I understand correctly, this is more than dependent on bug 9686, since
neither this patch nor the group of patches there will work without the other.
Am I right? 
Hello Simon,

Hmm... Yes, something like that.  Bug 9686 and Bug 181725 are mutually dependent
on each other, so please fix both at the same time by applying the patches from
both Bug reports.

Many thanks,

Anthony
Comment on attachment 107366 (Patch)

r=smontagu and this is covered by blanket rs=roc+moz from bug 199143.
Attachment #107366 - Flags: superreview?(smontagu)
Attachment #107366 - Flags: superreview+
Attachment #107366 - Flags: review?(smontagu)
Attachment #107366 - Flags: review+
Checked in.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED