Closed Bug 9686 Opened 25 years ago Closed 21 years ago

[converter]BIG-5's UDC not mapped to Unicode's PUA

Categories

(Core :: Internationalization, defect, P3)

x86
Windows 95
defect

RESOLVED FIXED
Future

People

(Reporter: sammylaw, Assigned: ftang)

Details

(Keywords: intl)

Attachments

(5 files)

BIG-5, a popular Internet encoding for Traditional Chinese,
has a region from 0xFA40 to 0xFEFE known as the User-Defined
Area (UDA).

The codes inside the UDA are not mapped to Unicode's
PUA (Private Use Area), which starts at 0xE000.

As a result, characters in the UDA are not shown properly.
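
To make the expected behaviour concrete, here is a minimal Python sketch of a linear UDA-to-PUA mapping. It assumes the UDA rows are laid out contiguously from U+E000 in lead-byte/trail-byte order, which is the layout I understand Windows CP950 to use for this range. The function name and the layout are assumptions for illustration only, not the converter's actual tables.

```python
# Valid Big5 trail bytes: 0x40-0x7E (63 values) and 0xA1-0xFE (94 values).
TRAIL_BYTES = list(range(0x40, 0x7F)) + list(range(0xA1, 0xFF))

UDA_FIRST_LEAD = 0xFA   # first lead byte of the UDA described above
UDA_LAST_LEAD = 0xFE    # last lead byte of the UDA described above
PUA_BASE = 0xE000       # assumption: the UDA maps contiguously from U+E000


def big5_uda_to_pua(code):
    """Map a two-byte Big5 UDA code (e.g. 0xFA40) to a PUA code point.

    Hypothetical helper: row-major linear mapping, 157 cells per lead byte.
    """
    lead, trail = code >> 8, code & 0xFF
    if not (UDA_FIRST_LEAD <= lead <= UDA_LAST_LEAD):
        raise ValueError("lead byte 0x%02X is outside the UDA" % lead)
    if trail not in TRAIL_BYTES:
        raise ValueError("0x%02X is not a valid Big5 trail byte" % trail)
    row = lead - UDA_FIRST_LEAD
    column = TRAIL_BYTES.index(trail)
    return PUA_BASE + row * len(TRAIL_BYTES) + column


if __name__ == "__main__":
    for code in (0xFA40, 0xFAA1, 0xFB40, 0xFEFE):
        print("0x%04X -> U+%04X" % (code, big5_uda_to_pua(code)))
```

With 5 lead bytes of 157 cells each, the whole UDA occupies 785 consecutive PUA code points under this assumption.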

regards,
Sammy
Status: NEW → ASSIGNED
Target Milestone: M11
We should do this at the same time as we rewrite the Big5 converter for
performance.
Summary: BIG-5's UDC not mapped to Unicode's PUA → [converter]BIG-5's UDC not mapped to Unicode's PUA
*** Bug 7963 has been marked as a duplicate of this bug. ***
Target Milestone: M11 → M10
Target Milestone: M10 → M14
Changed to M14 since this is post-beta.
Target Milestone: M14 → M16
I don't think this is a beta stopper. Moving to M16.
move it to M17
Target Milestone: M16 → M17
yueheng.xu@intel.com - can you also take care of the Big5 converter in the ucvtw
directory and change it the same way you did for GBK? You should also remember
to add these UDC characters to both the Big5 and GB converters.
Assignee: ftang → yueheng.xu
Status: ASSIGNED → NEW
Target Milestone: M17 → M20
I will take a look at this after the printing bugs and GBK converter bugs I
own are resolved.
Status: NEW → ASSIGNED
re-assigned to ftang
Assignee: yueheng.xu → ftang
Status: ASSIGNED → NEW
cata - you should probably also keep this in mind when you add the HKSCS
work.
Assignee: ftang → cata
Status: NEW → ASSIGNED
Target Milestone: M20 → Future
Keywords: intl
Moving all of cata's bugs to ftang.
Assignee: cata → ftang
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
Hello all,

(Shanjian, hope you don't mind me adding you to the Cc: list again.  :-)
(And Francis, I added you to the Cc: list too because your Mozilla zh_TW L10n
page reminded me of the problem.)

This is a continuation of what I did for hkscs.uf and hkscs.ut.  See

    http://bugzilla.mozilla.org/show_bug.cgi?id=182089

(Since I have the scripts already, generating big5.uf and big5.ut is easy too.  :-)
The UDA/PUA mappings are now filled in, along with some minor changes, e.g. the
Japanese hiragana and katakana are added.  They were claimed to have been added in
Bug#21209, but I could find no trace of the Japanese hiragana/katakana mappings
in big5.uf and big5.ut.  Taiwan users were somewhat inconvenienced, as they
had to manually switch to the Big5-HKSCS locale before they could see the
hiragana/katakana, which they shouldn't need to do.  See:

    http://www.csie.ntu.edu.tw/~b7506051/mozilla/faq.html

and search for the string "BIG5-HK".  AFAIK, with these new big5.uf and big5.ut,
they should display correctly with modern CJK fonts, i.e. the Japanese
hiragana/katakana should be in the U+3000 area.  On Linux, the Arphic "AR PL
Mingti2L Big5" TrueType fonts only have them in the PUA, but those are old fonts
that will need to be revised.  (We intend to revise the CMap of the font itself
once the OpenI18N-big5 subgroup's new TW-BIG5 is finalized.)  MingLiU.ttc has
these in U+3000, not in the PUA, and I assume it is the same on MacOS, so putting
them in U+3000 is the right way to go, IMHO.

I'll upload the relevant files to this bug report.  If you feel that it is
better to open another bug report, please let me know.  :-)

Cheers,

Anthony
This is a slightly revised version of

   http://bugzilla.mozilla.org/attachment.cgi?id=108882&action=view

Attachment #108882 is used for generating hkscs-{uf,ut}.txt, whereas this one
is for big5-{uf,ut}.txt, with $hkscs_mode = 0 and $strict_tw_big5 = 0.  :-)

As for this series of attachments, all I can say is, WORKSFORME on GNU/Linux
with the appropriate font installed.  :-)  Francis (piaip), if you have time,
please try this out on your system (MS Windows?).  You'll need to apply 3
patches:

  1. http://bugzilla.mozilla.org/attachment.cgi?id=108964
  2. http://bugzilla.mozilla.org/attachment.cgi?id=108965
  3. http://bugzilla.mozilla.org/attachment.cgi?id=107366

(1. big5.uf; 2. big5.ut; 3. nsBIG5ToUnicode.cpp; see
http://bugzilla.mozilla.org/show_bug.cgi?id=181725 for more info on the Big5 range
in nsBIG5ToUnicode.cpp.)

If all is successful, you should be able to see the Japanese hiragana/katakana
characters _without_ switching to the Big5-HKSCS encoding.  (By the way, it is
"HKSCS" for "Hong Kong Supplementary Character Set", not "HKCS".  ;-)

Cheers,

Anthony
Thanks Anthony for all your effort. That really helps.
BTW I want to make some notes here.
Refer to openi18n-big5:
 http://i18n.linux.org.tw/openi18n/big5/index.html.en
Because currently we have so many different tweaked
versions of Big5 charset standard, openi18n-big5 is trying to
make a better and more compatible Big5 standard and it has
a big chance to be the 'official' standard in the future.
I strongly suggest Mozilla to use their charset standard.
The differences between the Big5 charsets are listed here:
http://i18n.linux.org.tw/openi18n/big5/big5-diff.html
Column 2 (TW-Big5) in the table above is the new standard.
It is still in draft now, and I will post here again if
it becomes the national standard of Taiwan.
Blocks: 181725
The patch has existed for several months.
Can we have it committed in the next release, e.g. 1.4?
smontagu, could you take a look at the patches here and review them?
Blocks: 212128
I would like to review these patches, but I don't feel I know enough about Big5
or Chinese in general. Jungshik, can you look at them also?

Is there an up-to-date Big5 table somewhere in the same format as
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT ? If so,
I have a script which will generate a test page. (I could use attachment 108967,
but it seems a bit circular to test the patch against one of the files which was
used to generate it.)
Simon, I think I just have to defer to Anthony's expertise on Big5/Big5-HKSCS.
He contributed Big5/Big5-HKSCS support to Perl and glibc (and other projects)
with attachment 108971 (or similar).

ICU may have a separate table, but I strongly doubt that we can regard theirs as
more authoritative than Anthony's. 

re comment #17:

> versions of Big5 charset standard, openi18n-big5 is trying to
> make a better and more compatible Big5 standard and it has
> a big chance to be the 'official' standard in the future.
> I strongly suggest Mozilla to use their charset standard.

  Is Anthony's patch for this charset standard? I can't access either of the two
links you gave.
I've read up on the differences between Big5 and Big5-HKSCS and tested with the
cp950 table at www.unicode.org and what I could find at ICU. r=smontagu on the
three patches listed in comment 16; checked in under blanket rs=roc+moz from bug
199143.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
The official version of Big5 standard is "Big5-2003", which is the successor of
TW-Big5.

Official introduction page of Big5-2003: (this page is only in Chinese now, sorry)
http://www.cns11643.gov.tw/web/big5/

A BIG5 <-> UNICODE mapping table in the same format as Unicode.org's BIG5.TXT can be
found here:
http://moztw.org/docs/big5/big5-2003.txt
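
For anyone who wants to work with such a table, here is a minimal Python sketch of a parser for the Unicode.org mapping format as I understand it: one mapping per line, a hex Big5 code, a hex Unicode code point, then an optional `#` comment. The function name and the sample lines are illustrative assumptions and are not copied from big5-2003.txt.

```python
def parse_mapping_table(text):
    """Parse 'BIG5.TXT'-style lines into a {big5_code: unicode_code_point} dict.

    Blank lines, comment-only lines, and entries with no Unicode column
    (undefined codes) are skipped.
    """
    table = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()   # drop trailing comment
        if not data:
            continue
        fields = data.split()
        if len(fields) < 2:
            continue                           # code listed but left unmapped
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table


# Illustrative sample in the assumed format.
SAMPLE = """\
# Big5 to Unicode sample
0xA140\t0x3000\t# IDEOGRAPHIC SPACE
0xA440\t0x4E00\t# <CJK>
0xF9FF\t\t# undefined in this sample
"""

if __name__ == "__main__":
    for big5, uni in sorted(parse_mapping_table(SAMPLE).items()):
        print("0x%04X -> U+%04X" % (big5, uni))
```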
Since Microsoft Windows is still using CP950 (the Windows Big5 table),
we found a scenario that is bothering users.

First I have to explain that there are many Big5 variants (or extensions) currently
in use. Windows has its own table named "CP950", which is widely used but lacks
some Unicode mappings, such as Japanese hiragana/katakana, which are included in
other Big5 variants and already used in many files/webpages/documents.

The most important Big5 variants are: (ordered by number of mappings, from fewest
to most)
- CP950      (Used by Windows)
- Big5-2003  (Which is the official standard now)
- UAO        (Unicode-At-On, an unofficial variant trying to add most CJK Unihan)
  P.S.: UAO is installed by many people in Taiwan. It's almost compatible with
        Big5-2003, although the latest version is slightly incompatible with
        Big5-2003 and Big5-HKSCS.

The table currently used by Mozilla* is very similar to Big5-2003.

The problem is this: if a user browsing non-Big5 pages (e.g., sjis or utf8)
copies some characters not in CP950 (e.g., Japanese hiragana/katakana) and pastes
them into Big5 websites, then other users with a pure CP950 environment (e.g., a
Japanese user on Japanese Windows and Internet Explorer) cannot see these characters
correctly. They will mostly get a blank display.

So I'd like to suggest the following changes:
(1) Unicode -> Big5 should use the original CP950 table for maximum compatibility.
(2) Big5 -> Unicode can use Big5-2003, or even UAO.
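
The two suggestions amount to an asymmetric converter: decode with a wide table, encode with a narrow one. Here is a minimal Python sketch of that idea. The helper name `make_converter` and the two toy tables are assumptions for illustration; the entries stand in for real CP950 and Big5-2003 data and should be checked against the actual tables.

```python
def make_converter(decode_table, encode_source_table):
    """Build (decode, encode) functions from two {big5: unicode} tables.

    decode uses decode_table (the wide table, e.g. Big5-2003 or UAO);
    encode uses the inverse of encode_source_table (the narrow table,
    e.g. CP950), so it never emits codes a CP950-only reader lacks.
    """
    to_unicode = dict(decode_table)
    from_unicode = {}
    # If several Big5 codes share one code point, keep the lowest code.
    for big5, uni in sorted(encode_source_table.items()):
        from_unicode.setdefault(uni, big5)

    def decode(big5_code):
        return to_unicode.get(big5_code)      # None when unmapped

    def encode(code_point):
        return from_unicode.get(code_point)   # None when not in narrow table

    return decode, encode


# Toy tables (illustrative only): the narrow one lacks the kana entry.
NARROW = {0xA440: 0x4E00}
WIDE = {0xA440: 0x4E00, 0xC6E7: 0x3041}

if __name__ == "__main__":
    decode, encode = make_converter(WIDE, NARROW)
    print(decode(0xC6E7), encode(0x3041), encode(0x4E00))
```

A caller would still have to decide what to do when encode returns None, for example emitting a numeric character reference or a replacement character.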

I can provide Big5<->Unicode tables (see previous comment for Big5-2003. A CP950
table can be found from Unicode.org:
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT) 
but I don't know where to get "fromu" and "tou" so I can't make patches right now. 
Can anyone help? And should we reopen this bug, or file a new one?
Please file this as a new bug.