Closed
Bug 310299
Opened 19 years ago
Closed 19 years ago
Big5 Unicode Mapping Table Update
Categories
(Core :: Internationalization, defect)
Tracking
()
RESOLVED
FIXED
mozilla1.8.1beta1
People
(Reporter: piaip, Assigned: smontagu)
Details
(Keywords: fixed1.8.1)
Attachments
(4 files)
272.21 KB,
text/plain
|
Details | |
277.98 KB,
text/plain
|
Details | |
446.54 KB,
patch
|
smontagu
:
review+
jshin1987
:
review+
mconnor
:
approval1.8.1+
|
Details | Diff | Splinter Review |
426.61 KB,
patch
|
smontagu
:
review+
jshin1987
:
review+
mconnor
:
approval1.8.1+
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-TW; rv:1.8b5) Gecko/20050921 Firefox/1.4
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-TW; rv:1.8b5) Gecko/20050921 Firefox/1.4
The Big5 (the most popular charset for Traditional Chinese) to Unicode mapping
table used in the Mozilla source tree was last touched in bug #9686.
However, the table should be updated again, for the following reasons.
First, please allow me to explain the brief history of #9686 and Big5 variants.
There are many Big5 variants (or extensions) currently in use.
Windows has its own table named "CP950", which is widely used, but it lacks
some Unicode mappings, such as Japanese hiragana/katakana, which are included in
other Big5 variants and already used in many files/webpages/documents.
Mozilla's Big5 table was similar to CP950 before #9686.
So that's mainly what we did in bug #9686: add these mappings and correct some
wrong mappings.
The most important Big5 variants are: (ordered by number of mappings, from least
to most)
- CP950 (Used by Windows)
- Big5-2003 (Which is the official standard by Taiwan government now)
- UAO (Unicode-At-On, an un-official variant trying to add most CJK Unihan)
P.S.: UAO is installed by many people in Taiwan. It was almost compatible with
Big5-2003, although the latest version is slightly incompatible with
Big5-2003 and Big5-HKSCS.
A comparison table of Big5 variants and their code pages can be found on
Big5-2003's introduction page: http://www.cns11643.gov.tw/web/big5/ (Chinese,
sorry)
The table currently used by Mozilla* is very similar to Big5-2003.
The problem is this: if a user browsing non-Big5 pages (e.g., SJIS or UTF-8)
copies some characters not in CP950 (e.g., Japanese hiragana) and pastes them
into a Big5 website, then other users with a pure CP950 environment (e.g., a
Japanese user on Japanese Windows and Internet Explorer) cannot see these
characters correctly. They will mostly get a blank display. But if we use the
real CP950 table, the characters will be encoded in HTML entity form so that
everybody (even with original CP950+IE) can read them correctly.
So I'd like to suggest the following changes:
(1) Unicode -> Big5 should use the original CP950 table for maximum compatibility.
(2) Big5 -> Unicode can use Big5-2003, or even UAO.
P.S.: does anyone know where to get "fromu" and "tou", which are required to
generate a new Mozilla Big5 table?
Reproducible: Always
Steps to Reproduce:
1. Browse a SJIS or UTF-8 web page and copy Japanese hiragana/katakana characters
2. Find a Big5 website with text area forms (e.g., a php-BB forum), paste and submit
3. Browse the result page with non-Mozilla browsers (e.g., IE or Opera) on
a non-Big5-2003 system (e.g., Windows, or unpatched Linux)
Actual Results:
Non-Mozilla browsers see blank characters
Expected Results:
Should be Japanese hiragana/katakana characters (in &#12345; HTML entity form)
CP950 Unicode Mapping Table (from Unicode.org):
http://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT
Big5-2003 Unicode Mapping Table:
http://moztw.org/docs/big5/big5-2003.txt
Comment 1•19 years ago
|
||
It's good to know that Big5 has been standardized by Taiwanese government.
Something similar to what you suggested is done for a couple of encodings
Mozilla supports. Before going further, let me ask you a question. Is the
character repertoire of CP950 a subset of that of Big5-2003? Moreover, do
characters in the intersection of two have exactly the same code point
assignments in CP950 and Big5-2003? Well, I can check them out myself, but I'm
being lazy here thinking you'll be able to answer them more quickly..
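Both questions (is one repertoire a subset of the other, and do the shared codes agree?) can be checked mechanically once the tables are loaded. A minimal sketch, assuming each table is a dict from Big5 code to Unicode code point; compare_tables is a hypothetical helper, and the sample dicts hold only toy entries, not the real tables.

```python
def compare_tables(small: dict, large: dict):
    """Return (missing, conflicts) for two Big5 -> Unicode tables.

    missing:   Big5 codes present in `small` but absent from `large`
    conflicts: Big5 codes present in both but mapped to different code points
    `small` is a clean subset of `large` only when both lists are empty.
    """
    missing = sorted(code for code in small if code not in large)
    conflicts = sorted(
        code for code in small if code in large and small[code] != large[code]
    )
    return missing, conflicts


# Toy samples only: 0xA140 agrees in both tables, 0xA156 differs.
cp950_sample = {0xA140: 0x3000, 0xA156: 0x2013}
big5_2003_sample = {0xA140: 0x3000, 0xA156: 0x2015}

missing, conflicts = compare_tables(cp950_sample, big5_2003_sample)
print([hex(c) for c in missing], [hex(c) for c in conflicts])
```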
OS: Windows XP → All
Hardware: PC → All
Reporter | ||
Comment 2•19 years ago
|
||
We (Mozilla Taiwan) are currently making new tables and asking members to
test them. We'll try our best to complete these in a few days and
we do hope they can land on the Mozilla 1.8 branch.
(In reply to comment #1)
> Is the character repertoire of CP950 a subset of that of Big5-2003?
> Do characters in the intersection of two have exactly the same code point
> assignments in CP950 and Big5-2003?
I'm afraid that the answer may be "No".
Big5-2003 is a superset of CP950 in most cases, but there are differences in
the Symbols section.
9 characters in this section look the same but have different Unicode values.
I mean, they look almost the same, like: (you may check these at Unicode.org
http://www.unicode.org/charts/unihan.html)
(Big5=0xA156) U+2015 U+2013
So we will need to put both 2015/2013 in the "fromu" table.
Other different symbols are:
(Big5    Big5-2003  CP950)
0xA1C2   U+203E     U+00AF
0xA2A4   U+2501     U+2550
0xA2A5   U+251D     U+255E
0xA2A6   U+253F     U+256A
0xA2A7   U+2525     U+2561
0xA2CC   U+3038     U+5341
0xA2CD   U+3039     U+5344
0xA2CE   U+303A     U+5345
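The idea of putting both Unicode values into the "fromu" table can be sketched as follows. The nine pairs are copied from the list above; the dict layout and the build_from_u helper are illustrative assumptions, not the actual big5.uf generator.

```python
# Big5 code -> (Big5-2003 code point, CP950 code point), from the list above.
SYMBOL_CONFLICTS = {
    0xA156: (0x2015, 0x2013),
    0xA1C2: (0x203E, 0x00AF),
    0xA2A4: (0x2501, 0x2550),
    0xA2A5: (0x251D, 0x255E),
    0xA2A6: (0x253F, 0x256A),
    0xA2A7: (0x2525, 0x2561),
    0xA2CC: (0x3038, 0x5341),
    0xA2CD: (0x3039, 0x5344),
    0xA2CE: (0x303A, 0x5345),
}


def build_from_u(conflicts: dict) -> dict:
    """Map both Unicode variants of each symbol to the same Big5 code."""
    from_u = {}
    for big5_code, (u_big5_2003, u_cp950) in conflicts.items():
        from_u[u_big5_2003] = big5_code
        from_u[u_cp950] = big5_code
    return from_u


from_u = build_from_u(SYMBOL_CONFLICTS)
print(hex(from_u[0x2015]), hex(from_u[0x2013]))
```

Whichever of the two look-alike code points a page contains, encoding to Big5 then yields the same byte pair.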
BTW, UAO uses only the unused (user private area) part of CP950, so CP950
IS exactly a subset of UAO. UAO is also designed to be compatible with Big5-2003
(but since it's a superset of CP950, it has the same problem in the Symbols
section), so our plan now is to make a "tou" (Big5 to Unicode) table based on
Big5-2003 plus compatible UAO mappings.
Reporter | ||
Comment 3•19 years ago
|
||
Big5<->Unicode Mapping Tables
(all presented in [big5-value unicode-value] format)
(b2u=toU=big5->unicode, u2b=fromU=unicode->big5)
CP950
http://moztw.org/docs/big5/table/cp950-b2u.txt
http://moztw.org/docs/big5/table/cp950-u2b.txt
Big5-2003
http://moztw.org/docs/big5/table/big5_2003-b2u.txt
http://moztw.org/docs/big5/table/big5_2003-u2b.txt
UAO2.41
http://moztw.org/docs/big5/table/uao241-b2u.txt
http://moztw.org/docs/big5/table/uao241-u2b.txt
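A loader for files in the [big5-value unicode-value] layout might look like the sketch below. The exact syntax of the moztw.org files is not shown in this bug, so the details are assumptions: two whitespace-separated hexadecimal fields per line, an optional 0x prefix, and "#" starting a comment. parse_mapping is a hypothetical name.

```python
def parse_mapping(text: str) -> dict:
    """Parse "big5-value unicode-value" lines into a dict of ints.

    Assumed format: two hex fields per line (0x prefix optional),
    with "#" starting a comment. Blank lines are skipped.
    """
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        first, second = line.split()[:2]
        table[int(first, 16)] = int(second, 16)
    return table


sample = """\
# big5   unicode
0xA156   0x2015
A1C2     203E
"""
print(parse_mapping(sample))
```

For a b2u file the keys are Big5 codes; for a u2b file the same loader applies, with whichever value comes first on the line used as the key.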
Reporter | ||
Comment 4•19 years ago
|
||
The draft version of the result table is:
http://moztw.org/docs/big5/table/moz18-b2u.txt
http://moztw.org/docs/big5/table/moz18-u2b.txt
I'll attach big5.ut and big5.uf after we have completed and
verified several tests.
Reporter | ||
Comment 5•19 years ago
|
||
It seems that the new table works fine for most people. The only special
case is Hong Kong users (Hong Kong uses Big5 but has its own
extension named Big5-HKSCS, which is also supported by Mozilla as a
different charset).
Although Mozilla has a "BIG5-HKSCS" charset, IE has no "Big5-HKSCS"
(only Big5), so many web pages still describe themselves as "Big5" only.
For all non-HK users, the only way to see HKSCS on Mozilla is to set the charset
to Big5-HKSCS, so they won't be bothered by the new table. This also applies
to HK users who installed Big5 extensions that do not change the system font.
So exactly who will be affected? Those who installed Microsoft HKSCS (which
changes both the system NLS table and the system font) and browse Big5-HKSCS
pages (which use only "Big5" in their content type meta directive) without
setting the charset to Big5-HKSCS. Because MS HKSCS changes the system font, it
puts HK character glyphs in the font's user private area (by the mappings of
original Big5). So whether or not the program converts multibyte to the correct
Unicode, the user can always "see" the correct glyphs ("see" only, because they
are actually different Unicode values if copied/pasted/written to disk).
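One way to notice text that only "looks" right because of a patched font is to check for Private Use Area code points after decoding. A small sketch using Python's unicodedata module; the helper name private_use_chars is hypothetical.

```python
import unicodedata


def private_use_chars(text: str) -> list:
    """Return the characters of `text` in a Unicode private-use category.

    Such characters have no standard meaning; what the user sees depends
    entirely on the glyph the installed font places at that code point.
    """
    return [ch for ch in text if unicodedata.category(ch) == "Co"]


decoded = "a\ue000"  # U+E000 is the first code point of the BMP Private Use Area
print([hex(ord(ch)) for ch in private_use_chars(decoded)])
```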
This may be the only issue with the new table. If we want to be fully compatible,
we can change UAO in the user private area back to Big5-2003. However, since
there is still Big5-HKSCS, maybe this is not necessary...
supports correct Unicode mapping or not
Reporter | ||
Comment 6•19 years ago
|
||
We've decided that it should be O.K. to apply the UAO extension table. Here are
the reasons:
1. Mozilla DOES have a Big5-HKSCS charset.
2. Many websites that support both ANSI text and HTML mode (e.g., a website
providing telnet/SSH services and newsgroup service) already use the UAO charset.
A user can always successfully browse Big5-HKSCS pages with Mozilla without
the HKSCS extension installed on his PC, but a user cannot browse UAO pages even
with the UAO extension installed.
Because the conflict comes from wrong meta information (charset=Big5) for those
Big5-HKSCS pages, we believe a better solution to this issue is to provide a
preference to determine "how to select which Big5 uconv to use", or an extension
that converts all charset=big5 meta requests to big5-hkscs.
Reporter | ||
Comment 7•19 years ago
|
||
Reporter | ||
Comment 8•19 years ago
|
||
Reporter | ||
Comment 9•19 years ago
|
||
The final version of diff file for new Big5 table
Attachment #198205 -
Flags: review?(smontagu)
Reporter | ||
Comment 10•19 years ago
|
||
The final version of new table [with Big5-2003+UAO] of big5.ut
Attachment #198206 -
Flags: review?(smontagu)
Reporter | ||
Comment 11•19 years ago
|
||
Please use attachments 198205 and 198206 to patch the new Big5 table.
They have already been tested in several unofficial builds of Firefox.
The big5.uf (unicode->big5) table is based on strict CP950. All mappings
to the user private area and buggy areas are eliminated and follow CP950.
The big5.ut (big5->unicode) table is based on CP950 plus Big5-2003 (i.e.,
mappings that conflict between Big5-2003 and CP950 still follow CP950 for
compatibility, to make it a complete superset of CP950). For the user private
area, the mappings follow Big5-2003 and are overridden by the UAO2.41 extension.
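The layering described for big5.ut can be sketched as a three-step merge. This is a simplified illustration under stated assumptions, not the script used to produce the attachment: the lead-byte ranges in in_user_defined_area (0x81-0xA0 and 0xFA-0xFE) are an assumption about where the Big5 user-defined area lies, both function names are hypothetical, and UAO entries outside that area are ignored here.

```python
def in_user_defined_area(big5_code: int) -> bool:
    """Assumed user-defined ranges: lead bytes 0x81-0xA0 and 0xFA-0xFE."""
    lead = big5_code >> 8
    return 0x81 <= lead <= 0xA0 or 0xFA <= lead <= 0xFE


def build_to_u(cp950: dict, big5_2003: dict, uao: dict) -> dict:
    """Merge three Big5 -> Unicode tables in the priority order described."""
    merged = dict(big5_2003)  # 1. start from Big5-2003
    for code, cp in cp950.items():
        if not in_user_defined_area(code):
            merged[code] = cp  # 2. CP950 wins any conflict in the standard area
    for code, cp in uao.items():
        if in_user_defined_area(code):
            merged[code] = cp  # 3. UAO overrides the user-defined area
    return merged


# Toy data: 0xA156 is from this bug; the 0xFA40 mappings are made up.
cp950 = {0xA156: 0x2013}
big5_2003 = {0xA156: 0x2015, 0xFA40: 0xE000}
uao = {0xFA40: 0x4E00}
print({hex(k): hex(v) for k, v in build_to_u(cp950, big5_2003, uao).items()})
```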
Reporter | ||
Updated•19 years ago
|
Attachment #198205 -
Flags: review?(smontagu) → review?
Reporter | ||
Updated•19 years ago
|
Attachment #198206 -
Flags: review?(smontagu) → review?
Reporter | ||
Comment 12•19 years ago
|
||
One more comment. If you worry about compatibility, please at least
commit big5.uf (attachment 198205) as soon as possible, because it's bugging
more and more users recently and we really hope it is committed before the
upcoming Fx1.5. Is this possible?
big5.ut (b->u) is more of an "improvement" that changes a lot,
while big5.uf (u->b) is basically original Big5/CP950, so it's almost harmless
in any respect and is a real "bug fix". However, we still wish big5.ut
to be committed at the same time.
The files have been tested by several volunteers for a while and should be OK
for most users.
Reporter | ||
Comment 13•19 years ago
|
||
(In reply to comment #6)
> Because the conflict comes from wrong meta information (charset=Big5) for those
> Big5-HKSCS pages, we believe a better solution to this issue is to provide a
> preference to determine "how to select which Big5 uconv to use", or an extension
> that converts all charset=big5 meta requests to big5-hkscs.
This can be solved by writing big5=BIG5-HKSCS in res/charsetalias.properties.
Maybe we can split Big5-UAO off as an independent locale (because it does not
have an official name in IANA yet), but it seems good enough for now.
For an HKSCS user in the situation mentioned in comment #5, a solution is to
modify res/charsetalias.properties (this may be achieved by an XPI).
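The effect of such an alias can be imitated outside Mozilla with a small lookup in front of the decoder. A sketch only: CHARSET_ALIASES, resolve_charset and decode_page are hypothetical names, and Python's "big5hkscs" codec stands in for Mozilla's Big5-HKSCS converter.

```python
# Mirrors the effect of "big5=BIG5-HKSCS": pages labelled Big5 are
# decoded with the HKSCS table instead.
CHARSET_ALIASES = {"big5": "big5hkscs"}


def resolve_charset(label: str) -> str:
    """Normalize a charset label and apply the alias table."""
    key = label.strip().lower()
    return CHARSET_ALIASES.get(key, key)


def decode_page(data: bytes, label: str) -> str:
    """Decode page bytes using the aliased charset."""
    return data.decode(resolve_charset(label), errors="replace")


print(resolve_charset("Big5"))
print(decode_page(b"\xa4\xa4", "Big5"))
```

Because the alias applies to every page labelled Big5, it suits only users who mostly read HKSCS content, which is why it is suggested as an optional XPI rather than a default.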
Reporter | ||
Comment 14•19 years ago
|
||
(In reply to comment #13)
> This can be solved by writing big5=BIG5-HKSCS in res/charsetalias.properties
> For an HKSCS user in the situation mentioned in comment #5, a solution is to
> modify res/charsetalias.properties (this may be achieved by an XPI).
A sample XPI to demonstrate this solution can be found from
http://moztw.org/dls/xpi/hkscs.xpi
Reporter | ||
Updated•19 years ago
|
Attachment #198205 -
Attachment description: cvs diff for /intl/uconv/ucvtw/big5.uf [fromu] → (patchset) cvs diff for /intl/uconv/ucvtw/big5.uf [fromu]
Reporter | ||
Updated•19 years ago
|
Attachment #198206 -
Attachment description: cvs diff for /intl/uconv/ucvtw/big5.ut [tou], Big5-2003+UAO → (patchset) cvs diff for /intl/uconv/ucvtw/big5.ut [tou], Big5-2003+UAO
Reporter | ||
Comment 15•19 years ago
|
||
The patches have been tested by Taiwan users for a while (in unofficial
community builds), so they should be stable enough to be committed for 1.8 and trunk.
Flags: blocking1.8rc1?
Comment 16•19 years ago
|
||
Too late in the game to block on non-critical changes.
Flags: blocking1.8rc1? → blocking1.8rc1-
Reporter | ||
Updated•19 years ago
|
Flags: blocking1.9a1?
Flags: blocking1.8.1?
Updated•19 years ago
|
Flags: blocking1.8.1?
Comment 17•19 years ago
|
||
I wonder if Mozilla can apply the Big5-2003 + UAO patch to Firefox 2.0?
Leaving this problem unsolved will just continue to inconvenience Chinese users.
Attachment #198205 -
Flags: review? → review?(smontagu)
Attachment #198206 -
Flags: review? → review?(smontagu)
Flags: blocking1.8.1? → blocking1.8.1+
Assignee | ||
Comment 18•19 years ago
|
||
Comment on attachment 198205 [details] [diff] [review]
(patchset) cvs diff for /intl/uconv/ucvtw/big5.uf [fromu]
I can't assess these patches codepoint by codepoint, but I am happy to accept them based on comments 12 and 15. Auto-generated table patches in intl don't need super-review, but I'd like jshin's approval before checking in.
Attachment #198205 -
Flags: review?(smontagu)
Attachment #198205 -
Flags: review?(jshin1987)
Attachment #198205 -
Flags: review+
Assignee | ||
Updated•19 years ago
|
Attachment #198206 -
Flags: review?(smontagu)
Attachment #198206 -
Flags: review?(jshin1987)
Attachment #198206 -
Flags: review+
Comment 19•19 years ago
|
||
Thanks a lot smontagu!
Hope these patches can be committed before the official release of Firefox 2.0
Comment 20•19 years ago
|
||
Comment on attachment 198205 [details] [diff] [review]
(patchset) cvs diff for /intl/uconv/ucvtw/big5.uf [fromu]
Sorry for the long delay.
I'll edit big5.uf and big5.ut to add the urls of conversion tables you used.
lxr will point back at this bug so that we can do without that, but still it is nice to have that.
Attachment #198205 -
Flags: review?(jshin1987) → review+
Comment 21•19 years ago
|
||
Comment on attachment 198206 [details] [diff] [review]
(patchset) cvs diff for /intl/uconv/ucvtw/big5.ut [tou], Big5-2003+UAO
r=jshin
Attachment #198206 -
Flags: review?(jshin1987) → review+
Comment 22•19 years ago
|
||
Thank you jshin!
BTW, apart from Big5-2003, there is a bug about the Big5-HKSCS table...
The one that Mozilla uses is too old.
The Hong Kong government updated the Big5-HKSCS table in 2004 on its official site...
I hope Mozilla can fix this bug as well.
Here is the table released by the HK government:
http://www.info.gov.hk/digital21/chi/hkscs/download/hkscs-2004-big5-iso.txt
For more information about the update, please go to
http://www.info.gov.hk/digital21/eng/hkscs/mapping_table.html
Updated•19 years ago
|
Whiteboard: [checkin needed]
Target Milestone: --- → mozilla1.8.1beta1
Updated•19 years ago
|
Attachment #198205 -
Flags: approval1.8.1+
Updated•19 years ago
|
Attachment #198206 -
Flags: approval1.8.1+
Assignee | ||
Comment 23•19 years ago
|
||
Checked in to trunk
Assignee | ||
Comment 24•19 years ago
|
||
Checked in to MOZILLA_1_8_BRANCH. BTW, I added links to the conversion tables as suggested in comment 20 to all checkins.
Status: NEW → RESOLVED
Closed: 19 years ago
Keywords: fixed1.8.1
Resolution: --- → FIXED
Whiteboard: [checkin needed]