Bug 343129 - Big5-HKSCS 2004 <==> Unicode Table Update
Status: RESOLVED FIXED
Keywords: fixed1.8.1
Product: Core
Classification: Components
Component: Internationalization
Version: Trunk
Platform: All All
Importance: -- normal with 11 votes
Target Milestone: mozilla1.8.1beta2
Assigned To: Simon Montagu :smontagu
QA Contact: Yuying Long
Triage Owner: Makoto Kato [:m_kato]
Mentors:
Depends on:
Blocks:
Reported: 2006-06-29 08:27 PDT by Ho Fung Wong
Modified: 2006-08-20 20:53 PDT
mbeltzner: blocking1.8.1+


Attachments
Patch (597.86 KB, patch)
2006-08-06 08:26 PDT, Simon Montagu :smontagu
diff of the intermediate file generated by the perl script (8.53 KB, patch)
2006-08-08 03:01 PDT, Simon Montagu :smontagu
diff of hkscs.uf and hkscs.ut for checkin (612.66 KB, patch)
2006-08-08 06:26 PDT, Simon Montagu :smontagu
jshin1987: review+
mtschrep: approval1.8.1+

Description Ho Fung Wong 2006-06-29 08:27:39 PDT
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4

After the release of Big5-HKSCS 2001, the Hong Kong government updated the Big5-HKSCS table in 2004 and added many new Chinese characters. The new table is publicly available for download from the official website of the Hong Kong government.

As a result, the Big5-HKSCS table that Mozilla uses is outdated, which causes trouble for Chinese communities because many characters cannot be displayed properly.

I hope Mozilla can update this table ASAP so that Chinese users can view web pages written in Big5-HKSCS 2004 correctly.

Here is the new BIG5-HKSCS table released by the Hong Kong Government:
http://www.info.gov.hk/digital21/chi/hkscs/download/hkscs-2004-big5-iso.txt

For more information about the update, please go to
http://www.info.gov.hk/digital21/eng/hkscs/mapping_table.html

Reproducible: Always
Comment 1 K' 2006-07-04 02:04:16 PDT
I hope Mozilla can update this table.
Comment 2 coolstar1980 2006-07-04 08:55:14 PDT
Update, update, update.
Comment 3 Ho Fung Wong 2006-08-02 11:32:25 PDT
I wonder whether Mozilla is going to do anything about this issue?
This bug is causing a lot of trouble for people in Hong Kong...
I hope the Big5-HKSCS 2004 Unicode table can be updated ASAP.
Comment 4 Simon Montagu :smontagu 2006-08-03 04:50:30 PDT
http://www.microsoft.com/typography/unicode/950.txt, used by intl/uconv/tools/gen-big5hkscs-2001-mozilla.pl, doesn't seem to exist any more. There is http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT, but I don't know if it has the same format. I may have to do some reverse engineering.
Comment 5 Ho Fung Wong 2006-08-03 11:32:28 PDT
It should be the same CP950 table used by Microsoft.
And it doesn't matter if it's the same or not...
Why don't we just use the new Big5-HKSCS table released by the Hong Kong government?
http://www.info.gov.hk/digital21/chi/hkscs/download/hkscs-2004-big5-iso.txt 
Comment 6 Simon Montagu :smontagu 2006-08-03 11:48:15 PDT
The last version of the Big5-HKSCS conversion tables was generated from three files:
http://www.microsoft.com/typography/unicode/950.txt
http://www.info.gov.hk/digital21/chi/hkscs/download/big5-iso.txt
http://www.info.gov.hk/digital21/chi/hkscs/download/big5cmp.txt

If the Hong Kong government files are sufficient, I'll adjust the generation script to use them.
Comment 7 Roy Tam 2006-08-03 18:42:05 PDT
(In reply to comment #6)
> The last version of the Big5-HKSCS conversion tables was generated from three
> files:
> http://www.microsoft.com/typography/unicode/950.txt
> http://www.info.gov.hk/digital21/chi/hkscs/download/big5-iso.txt
> http://www.info.gov.hk/digital21/chi/hkscs/download/big5cmp.txt
> 
> If the Hong Kong government files are sufficient, I'll adjust the generation
> script to use them.
> 

"hkscs-2004-big5-iso.txt" plays the same role as "big5-iso.txt".
That means generating the whole table still requires "CP950.TXT" or "950.txt".
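
To illustrate why both inputs are needed, here is a minimal sketch of layering an HKSCS supplement over a base CP950 table. It is not the actual gen-big5hkscs-2001-mozilla.pl logic. It assumes both inputs have already been reduced to two-column "0xBIG5 0xUNICODE" lines with optional "#" comments; the function names and the sample code points are hypothetical.

```python
def parse_table(text):
    """Parse '0xBIG5 0xUNICODE' lines into a dict (assumed two-column format)."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line or undefined entry with no Unicode value
        mapping[int(fields[0], 16)] = int(fields[1], 16)
    return mapping


def merge_tables(base, supplement):
    """Return the base table with supplement entries added or overriding."""
    merged = dict(base)
    merged.update(supplement)
    return merged


# Illustrative data only, not copied from the real tables.
BASE_TEXT = """
0xA440\t0x4E00\t# base entry
0xFA40\t0xE000\t# base entry pointing at the Private Use Area
0x80\t\t# undefined
"""
SUPPLEMENT_TEXT = """
0xFA40 0x3400  # hypothetical supplement entry overriding the base
0x8840 0x3401  # hypothetical supplement-only entry
"""

merged = merge_tables(parse_table(BASE_TEXT), parse_table(SUPPLEMENT_TEXT))
```

The supplement alone would contain only two of the three entries in `merged`, which is the point made above: the base table is still required to generate the whole table.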
Comment 9 Ho Fung Wong 2006-08-06 18:05:33 PDT
A page in BIG5-HKSCS:
http://input.foruto.com/jptxt/arti003.htm
Comment 10 Ho Fung Wong 2006-08-06 18:15:04 PDT
One more:
http://cs-people.bu.edu/butta1/personal/hkscs/hkscs-oct.html
Comment 11 Ho Fung Wong 2006-08-06 18:18:35 PDT
A site with the latest BIG5-HKSCS characters
http://code.web.idv.hk/h2u/h2u.php
Comment 12 Simon Montagu :smontagu 2006-08-08 03:01:01 PDT
Created attachment 232685
diff of the intermediate file generated by the perl script

It's probably more informative to see a diff of the files from which hkscs.ut and hkscs.uf are generated.

Things to notice: there are no new entries in the .ut file (from Big5 to Unicode). All the new characters were already mapped to the PUA. These mappings have been changed to the mappings in the new BIG5-HKSCS table, except in the case of mappings to Unicode Plane 2, which still use the old PUA mappings (we can't change that until bug 162431 is fixed).

In the .uf file (Unicode to Big5), I've removed the additional mappings from the "Kangxi Radicals" area mentioned in bug 182089 comment 23, since they don't seem to be in the HKSCS-2004 table.
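
The remapping rule for the Big5-to-Unicode direction (adopt the new HKSCS-2004 mapping, but keep the old PUA mapping when the new target lies in Unicode Plane 2) can be sketched roughly as follows. This is a simplified illustration, not the Perl script's code; the function names and the sample values are hypothetical.

```python
def is_plane2(code_point):
    """True for code points in Unicode Plane 2 (U+20000..U+2FFFF)."""
    return (code_point >> 16) == 2


def update_big5_to_unicode(old, new):
    """Apply new mappings to existing entries, keeping old ones for Plane 2 targets."""
    updated = dict(old)
    for big5, code_point in new.items():
        if big5 not in old:
            continue  # the comment above notes there are no new entries
        if is_plane2(code_point):
            continue  # keep the old PUA mapping for now
        updated[big5] = code_point
    return updated


# Illustrative values only, not taken from the real tables.
old_table = {0x8840: 0xE000, 0x8841: 0xE001}
new_table = {0x8840: 0x3400, 0x8841: 0x20021}
result = update_big5_to_unicode(old_table, new_table)
```

Here the first entry moves from the PUA to its new code point, while the second keeps its PUA value because its new target is in Plane 2.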
Comment 13 Ho Fung Wong 2006-08-08 06:07:37 PDT
Thanks for the patch!
So when can this patch be checked in?
Comment 14 Simon Montagu :smontagu 2006-08-08 06:26:38 PDT
Created attachment 232712
diff of hkscs.uf and hkscs.ut for checkin
Comment 15 Jungshik Shin 2006-08-08 21:37:15 PDT
Comment on attachment 232712
diff of hkscs.uf and hkscs.ut for checkin

r=jshin
Comment 16 Simon Montagu :smontagu 2006-08-09 02:49:40 PDT
Checked in.
Comment 17 Simon Montagu :smontagu 2006-08-09 03:11:29 PDT
(In reply to comment #16)
> Checked in.

Actually not, I'm having problems with CVS.
Comment 18 Simon Montagu :smontagu 2006-08-09 03:50:19 PDT
Really checked in.
Comment 19 Ian Macfarlane 2006-08-18 07:59:52 PDT
Just thought I'd put this on the radar, though it may be too late (or maybe not, as it's only a data file update). It would be an awfully long wait for Firefox 3 before people in Hong Kong (and others around the world) can read their own language properly :-/

(if it's too late for Firefox 2, perhaps it can be considered for the first point release afterwards)
Comment 20 Mike Beltzner [:beltzner, not reading bugmail] 2006-08-18 10:42:48 PDT
Marking blocking the final release. Didn't we take a big Unicode 5.0 update? Does this add on to that?
Comment 21 Simon Montagu :smontagu 2006-08-19 12:54:54 PDT
(In reply to comment #20)
> Marking blocking the final release. Didn't we take a big Unicode 5.0 update?
> Does this add on to that?

This is orthogonal to that. These data tables are for conversion between the Big5-HKSCS legacy code page and Unicode.
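
As a rough illustration of what such conversion tables do, the sketch below round-trips text between Big5-HKSCS bytes and Unicode. It uses Python's built-in `big5hkscs` codec as a stand-in, not Mozilla's hkscs.ut and hkscs.uf tables, and the helper names are hypothetical.

```python
def to_unicode(raw: bytes) -> str:
    """Legacy bytes -> Unicode (the direction a .ut-style table serves)."""
    return raw.decode("big5hkscs")


def from_unicode(text: str) -> bytes:
    """Unicode -> legacy bytes (the direction a .uf-style table serves)."""
    return text.encode("big5hkscs")


sample_bytes = b"\xa4\xa4\xa4\xe5"  # two characters from the plain Big5 range
sample_text = to_unicode(sample_bytes)
round_tripped = from_unicode(sample_text)
```

The Unicode version a product supports and the contents of these code-page tables are separate concerns: updating one does not update the other.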
Comment 22 Simon Montagu :smontagu 2006-08-19 12:55:40 PDT
Comment on attachment 232712
diff of hkscs.uf and hkscs.ut for checkin

Asking approval for this data-file only patch.
Comment 23 Mike Schroepfer 2006-08-20 19:07:46 PDT
Comment on attachment 232712
diff of hkscs.uf and hkscs.ut for checkin

a=schrep for drivers
Comment 24 :Gavin Sharp [email: gavin@gavinsharp.com] 2006-08-20 20:53:37 PDT
I checked this in on the branch so that it could make the b2 candidate builds.
mozilla/intl/uconv/ucvtw/hkscs.ut 	1.3.92.1
mozilla/intl/uconv/ucvtw/hkscs.uf 	1.3.92.1
