Closed
Bug 310299
Opened 19 years ago
Closed 18 years ago
Big5 Unicode Mapping Table Update
Categories
(Core :: Internationalization, defect)
Tracking
()
RESOLVED
FIXED
mozilla1.8.1beta1
People
(Reporter: piaip, Assigned: smontagu)
Details
(Keywords: fixed1.8.1)
Attachments
(4 files)
272.21 KB, text/plain
277.98 KB, text/plain
446.54 KB, patch (smontagu: review+, jshin1987: review+, mconnor: approval1.8.1+)
426.61 KB, patch (smontagu: review+, jshin1987: review+, mconnor: approval1.8.1+)
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-TW; rv:1.8b5) Gecko/20050921 Firefox/1.4
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-TW; rv:1.8b5) Gecko/20050921 Firefox/1.4

The Big5 (the most popular charset for Traditional Chinese) to Unicode mapping table in the Mozilla source tree was last touched in bug #9686. However, the table should be updated again, for the following reasons.

First, please allow me to explain the brief history of #9686 and the Big5 variants. There are many Big5 variants (or extensions) currently in use. Windows has its own table named "CP950", which is widely used but lacks some Unicode mappings, such as Japanese hiragana/katakana, that are included in other Big5 variants and already used in many files, web pages, and documents. Mozilla's Big5 table was similar to CP950 before #9686. So that is mainly what we did in bug #9686: add these mappings and correct some wrong ones.

The most important Big5 variants are (ordered by number of mappings, from least to most):
- CP950 (used by Windows)
- Big5-2003 (now the official standard of the Taiwan government)
- UAO (Unicode-At-On, an unofficial variant that tries to add most CJK Unihan characters)

P.S.: UAO is installed by many people in Taiwan. It was almost compatible with Big5-2003, although the latest version is slightly incompatible with Big5-2003 and Big5-HKSCS.

A comparison table for the Big5 variants and their code pages can be found on Big5-2003's introduction page: http://www.cns11643.gov.tw/web/big5/ (Chinese, sorry)

The table currently used by Mozilla* is very similar to Big5-2003. The problem is this: if a user browsing non-Big5 pages (e.g., SJIS or UTF-8) copies some characters that are not in CP950 (e.g., Japanese hiragana) and pastes them into a Big5 website, other users with a pure CP950 environment (e.g., a Japanese user on Japanese Windows and Internet Explorer) cannot see these characters correctly. They will mostly get a blank display. But if we use the real CP950 table, the characters will be encoded in HTML entity form, so that everybody (even with the original CP950 + IE) can read them correctly.

So I'd like to suggest the following changes:
(1) Unicode -> Big5 should use the original CP950 table for the best compatibility.
(2) Big5 -> Unicode can use Big5-2003, or even UAO.

P.S.: does anyone know where to get "fromu" and "tou", which are required to generate a new Mozilla Big5 table?

Reproducible: Always

Steps to Reproduce:
1. Browse an SJIS or UTF-8 web page and copy Japanese hiragana/katakana characters.
2. Find a Big5 website with text area forms (e.g., a php-BB forum), paste, and submit.
3. Browse the result page with a non-Mozilla browser (e.g., IE or Opera) on a non-Big5-2003 system (e.g., Windows, or unpatched Linux).

Actual Results: Non-Mozilla browsers see blank characters.

Expected Results: Should be Japanese hiragana/katakana characters (in &#12345; HTML entity form).

CP950 Unicode Mapping Table (from Unicode.org): http://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT
Big5-2003 Unicode Mapping Table: http://moztw.org/docs/big5/big5-2003.txt
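The fallback requested in this report can be sketched in a few lines of Python. This is only an illustration, not Mozilla's uconv code: `U2B` is a hypothetical one-entry stand-in for a strict CP950-style Unicode -> Big5 table, and `encode_for_form` is a made-up helper name. Characters missing from the table are written as decimal numeric character references rather than forced into a Big5 code that CP950-only readers cannot display.

```python
# Hypothetical one-entry stand-in for a strict CP950-style Unicode -> Big5 table.
U2B = {0x4E2D: 0xA4A4}  # U+4E2D (中) -> Big5 0xA4A4


def encode_for_form(text, table=U2B):
    """Encode text as Big5 bytes, using &#N; references for unmapped characters."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)  # ASCII passes through unchanged
        elif cp in table:
            out += table[cp].to_bytes(2, "big")  # lead byte, then trail byte
        else:
            out += f"&#{cp};".encode("ascii")  # decimal numeric character reference
    return bytes(out)
```

With this toy table, `encode_for_form("a中あ")` gives `a`, the two Big5 bytes 0xA4 0xA4, and then `&#12354;` for the hiragana あ (U+3042).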
Comment 1•19 years ago
|
||
It's good to know that Big5 has been standardized by the Taiwanese government. Something similar to what you suggested is done for a couple of encodings Mozilla supports. Before going further, let me ask you a question. Is the character repertoire of CP950 a subset of that of Big5-2003? Moreover, do characters in the intersection of the two have exactly the same code point assignments in CP950 and Big5-2003? Well, I can check this myself, but I'm being lazy here, thinking you'll be able to answer more quickly.
OS: Windows XP → All
Hardware: PC → All
| Reporter | ||
Comment 2•19 years ago
|
||
We (Mozilla Taiwan) are currently making new tables and asking members to test them. We'll try our best to complete this in a few days, and we do hope it can land on the Mozilla 1.8 branch.

(In reply to comment #1)
> Is the character repertoire of CP950 a subset of that of Big5-2003?
> Do characters in the intersection of two have exactly the same code point
> assignments in CP950 and Big5-2003?

I'm afraid the answer may be "No". Big5-2003 is a superset of CP950 in most cases, but there are differences in the Symbols section. 9 characters in this section look almost the same but have different Unicode values (you may check them on Unicode.org: http://www.unicode.org/charts/unihan.html). For example:

Big5    B5-2003  CP950
0xA156  U+2015   U+2013

So we will need to put both 2015/2013 in the "fromu" table. The other differing symbols are:

Big5    B5-2003  CP950
0xA1C2  U+203E   U+00AF
0xA2A4  U+2501   U+2550
0xA2A5  U+251D   U+255E
0xA2A6  U+253F   U+256A
0xA2A7  U+2525   U+2561
0xA2CC  U+3038   U+5341
0xA2CD  U+3039   U+5344
0xA2CE  U+303A   U+5345

BTW, UAO uses only the unused (user private area) part of CP950, so CP950 IS exactly a subset of UAO. UAO is also designed to be compatible with Big5-2003 (but since it is a superset of CP950, it has the same problem in the Symbols section), so our plan now is to make a "tou" (Big5 to Unicode) table based on Big5-2003 plus the compatible UAO mappings.
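As a rough sketch of what "putting both in the fromu table" could look like, the nine rows from this comment can be written down as data and expanded so that either Unicode value encodes to the same Big5 code. The names `SYMBOL_DIFFS` and `build_fromu` are made up for this illustration. The comment only states this treatment explicitly for 0xA156; applying it to every row is an assumption, and a real table would first have to check that a code point (for instance a CJK ideograph on the CP950 side) does not already map to another Big5 code.

```python
# (Big5 code, Big5-2003 code point, CP950 code point), as listed in this comment.
SYMBOL_DIFFS = [
    (0xA156, 0x2015, 0x2013),
    (0xA1C2, 0x203E, 0x00AF),
    (0xA2A4, 0x2501, 0x2550),
    (0xA2A5, 0x251D, 0x255E),
    (0xA2A6, 0x253F, 0x256A),
    (0xA2A7, 0x2525, 0x2561),
    (0xA2CC, 0x3038, 0x5341),
    (0xA2CD, 0x3039, 0x5344),
    (0xA2CE, 0x303A, 0x5345),
]


def build_fromu(diffs):
    """Map both Unicode candidates of each row to the same Big5 code."""
    fromu = {}
    for big5, b5_2003, cp950 in diffs:
        fromu[b5_2003] = big5
        fromu[cp950] = big5
    return fromu


FROMU = build_fromu(SYMBOL_DIFFS)
```

Here both `FROMU[0x2015]` and `FROMU[0x2013]` resolve to 0xA156, so text containing either dash would encode to the same Big5 bytes.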
| Reporter | ||
Comment 3•19 years ago
|
||
Big5<->Unicode Mapping Tables (all presented in [big5-value unicode-value] format)
(b2u=toU=big5->unicode, u2b=fromU=unicode->big5)

CP950
http://moztw.org/docs/big5/table/cp950-b2u.txt
http://moztw.org/docs/big5/table/cp950-u2b.txt

Big5-2003
http://moztw.org/docs/big5/table/big5_2003-b2u.txt
http://moztw.org/docs/big5/table/big5_2003-u2b.txt

UAO2.41
http://moztw.org/docs/big5/table/uao241-b2u.txt
http://moztw.org/docs/big5/table/uao241-u2b.txt
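For anyone who wants to compare these tables, a small reader might look like the sketch below. The exact layout of the files is an assumption here: the code expects one whitespace-separated pair of hexadecimal values per line (Big5 value first, with or without a `0x` prefix) and treats `#` as the start of a comment. `parse_mapping` and `invert` are hypothetical helper names.

```python
def parse_mapping(text):
    """Parse 'big5 unicode' hex pairs, one per line, into a dict."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        big5_hex, unicode_hex = line.split()[:2]
        # int(..., 16) accepts an optional 0x prefix
        table[int(big5_hex, 16)] = int(unicode_hex, 16)
    return table


def invert(table):
    """Turn a b2u dict into a u2b dict; the lowest Big5 code for a code point wins."""
    u2b = {}
    for big5, cp in sorted(table.items()):
        u2b.setdefault(cp, big5)
    return u2b
```

Keeping b2u and u2b as separate dicts mirrors the split between the toU and fromU files, which do not have to be exact inverses of each other.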
| Reporter | ||
Comment 4•19 years ago
|
||
The draft version of the resulting table is:
http://moztw.org/docs/big5/table/moz18-b2u.txt
http://moztw.org/docs/big5/table/moz18-u2b.txt
I'll attach big5.ut and big5.uf after we have completed and verified several tests.
| Reporter | ||
Comment 5•19 years ago
|
||
It seems that the new table works fine for most people. The only special
case is Hong Kong users (Hong Kong uses Big5, but with its own
extension named Big5-HKSCS, which Mozilla also supports as a
different charset).
Although Mozilla has a "Big5-HKSCS" charset, IE has none
(only Big5), so many web pages still describe themselves as "Big5" only.
For all non-HK users, the only way to see HKSCS in Mozilla is to set the charset
to Big5-HKSCS, so they won't be bothered by the new table. This also applies
to HK users who installed Big5 extensions that do not change the system font.
So exactly who will be affected? Those who installed Microsoft HKSCS (which changes
both the system NLS table and the system font) and browse Big5-HKSCS pages (which declare
only "Big5" in their content-type meta directive) without setting the charset to
Big5-HKSCS. Because MS HKSCS changes the system font, it puts the HK character glyphs
in the font's user private area (following the mappings of the original Big5). So whether
or not the program converts the multibyte text to the correct Unicode, the user can always "see"
the correct glyphs ("see" only, because they are actually different Unicode values
when copied, pasted, or written to disk).
This may be the only issue with the new table. If we want to be fully compatible,
we can change UAO in the user private area back to Big5-2003. However, since there
is still Big5-HKSCS, maybe this is not necessary...
| Reporter | ||
Comment 6•19 years ago
|
||
We've decided that it should be OK to apply the UAO extension table. Here are the reasons:
1. Mozilla DOES have a Big5-HKSCS charset.
2. Many web pages that support both ANSI text and HTML mode (e.g., a website providing telnet/SSH services and newsgroup service) already use the UAO charset.
A user can always successfully browse Big5-HKSCS pages with Mozilla without the HKSCS extension installed on his PC, but a user cannot browse UAO pages even with the UAO extension installed.
Because the conflict comes from wrong meta information (charset=Big5) on those Big5-HKSCS pages, we believe a better solution to this issue is to provide a preference to determine "how to select which Big5 uconv to use", or an extension that converts all charset=big5 meta requests to big5-hkscs.
| Reporter | ||
Comment 7•19 years ago
|
||
| Reporter | ||
Comment 8•19 years ago
|
||
| Reporter | ||
Comment 9•19 years ago
|
||
The final version of the diff file for the new Big5 table
Attachment #198205 -
Flags: review?(smontagu)
| Reporter | ||
Comment 10•19 years ago
|
||
The final version of the new big5.ut table [with Big5-2003+UAO]
Attachment #198206 -
Flags: review?(smontagu)
| Reporter | ||
Comment 11•19 years ago
|
||
Please use attachment 198205 and attachment 198206 to patch in the new Big5 table. They have already been tested in several unofficial builds of Firefox.
The big5.uf (Unicode->Big5) table is based on strict CP950. All mappings to the user private area and buggy areas are eliminated and follow CP950.
The big5.ut (Big5->Unicode) table is based on CP950 plus Big5-2003 (i.e., mappings that conflict between Big5-2003 and CP950 still follow CP950 for compatibility, to make it a complete superset of CP950). For the user private area, the mappings follow Big5-2003 and are overridden by the UAO 2.41 extension.
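The layering described for big5.ut can be sketched roughly as follows. This is a guess at the precedence rules from the wording of this comment, not the script that generated the attachment. In particular, treating "user private area" as "any Big5 code that CP950 leaves unmapped or maps into the Unicode Private Use Area (U+E000..U+F8FF)" is an assumption, and `build_tou` is a hypothetical name.

```python
def is_pua(code_point):
    """True for code points in the BMP Private Use Area."""
    return 0xE000 <= code_point <= 0xF8FF


def build_tou(cp950, big5_2003, uao):
    """Merge three Big5 -> Unicode dicts into one toU table.

    Precedence, per Big5 code:
      1. an ordinary (non-PUA) CP950 mapping always wins;
      2. otherwise UAO, if it defines the code;
      3. otherwise Big5-2003, if it defines the code;
      4. otherwise whatever CP950 had (a PUA mapping).
    """
    merged = {}
    for code in set(cp950) | set(big5_2003) | set(uao):
        base = cp950.get(code)
        if base is not None and not is_pua(base):
            merged[code] = base
        elif code in uao:
            merged[code] = uao[code]
        elif code in big5_2003:
            merged[code] = big5_2003[code]
        elif base is not None:
            merged[code] = base
    return merged
```

Under these rules a conflicting symbol such as 0xA156 keeps its CP950 value, while codes that CP950 leaves to the user-defined area pick up Big5-2003 or UAO assignments.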
| Reporter | ||
Updated•19 years ago
|
Attachment #198205 -
Flags: review?(smontagu) → review?
| Reporter | ||
Updated•19 years ago
|
Attachment #198206 -
Flags: review?(smontagu) → review?
| Reporter | ||
Comment 12•19 years ago
|
||
One more comment. If you are worried about compatibility, please at least commit big5.uf (attachment 198205) as soon as possible, because this bug has been bothering more and more users recently and we really hope the fix is committed before the upcoming Fx1.5. Is this possible?
big5.ut (b->u) is more of an "improvement" that changes a lot, while big5.uf (u->b) is basically the original Big5/CP950, so it is almost harmless in every respect and is a real "bug fix". However, we still wish big5.ut to be committed at the same time. The files have been tested by several volunteers for a while and should be OK for most users.
| Reporter | ||
Comment 13•19 years ago
|
||
(In reply to comment #6)
> Because the conflict comes from wrong meta information (charset=Big5) for those
> Big5-HKSCS pages, we believe a better solution to this issue is to provide an
> preference to determine "how to select which Big5 uconv to use", or an extension
> that converts all charset=big5 meta request to big5-hkscs.

This can be solved by writing big5=BIG5-HKSCS in res/charsetalias.properties.

Maybe we can split Big5-UAO off as an independent locale (because it does not have an official name in IANA yet), but it seems good enough for now. For an HKSCS user in the situation mentioned in comment #5, a solution is to modify res/charsetalias.properties (this may be achieved by an XPI).
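To illustrate why a single alias line is enough, here is a minimal sketch of alias resolution. It is not Mozilla's actual loader: `load_aliases` and `resolve_charset` are hypothetical names, the parsing handles only plain `key=value` lines, and the only entry taken from this bug is `big5=BIG5-HKSCS`.

```python
def load_aliases(properties_text):
    """Parse simple key=value lines; lines starting with '#' are comments."""
    aliases = {}
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        aliases[key.strip().lower()] = value.strip()  # labels compared case-insensitively
    return aliases


def resolve_charset(label, aliases):
    """Return the charset a label is aliased to, or the label itself if unlisted."""
    return aliases.get(label.strip().lower(), label)
```

With the alias in place, a page that declares charset=Big5 would be handed to the Big5-HKSCS converter, while labels without an alias pass through unchanged.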
| Reporter | ||
Comment 14•19 years ago
|
||
(In reply to comment #13)
> This can be solved by writing big5=BIG5-HKSCS in res/charsetalias.properties
> For a HKSCS user in the situation mentioned in comment #5, a solution is to
> modify res/charsetalias.properties. (this may be achievd by an XPI.)

A sample XPI that demonstrates this solution can be found at http://moztw.org/dls/xpi/hkscs.xpi
| Reporter | ||
Updated•19 years ago
|
Attachment #198205 -
Attachment description: cvs diff for /intl/uconv/ucvtw/big5.uf [fromu] → (patchset) cvs diff for /intl/uconv/ucvtw/big5.uf [fromu]
| Reporter | ||
Updated•19 years ago
|
Attachment #198206 -
Attachment description: cvs diff for /intl/uconv/ucvtw/big5.ut [tou], Big5-2003+UAO → (patchset) cvs diff for /intl/uconv/ucvtw/big5.ut [tou], Big5-2003+UAO
| Reporter | ||
Comment 15•19 years ago
|
||
The patches have been tested by Taiwan users for a while (in unofficial community builds), so they should be stable enough to be committed to 1.8 and trunk.
Flags: blocking1.8rc1?
Comment 16•19 years ago
|
||
Too late in the game to block on non-critical changes.
Flags: blocking1.8rc1? → blocking1.8rc1-
| Reporter | ||
Updated•19 years ago
|
Flags: blocking1.9a1?
Flags: blocking1.8.1?
Updated•19 years ago
|
Flags: blocking1.8.1?
Comment 17•19 years ago
|
||
I wonder if Mozilla can apply the Big5-2003 + UAO patch to Firefox 2.0? Leaving this problem unsolved will just continue to inconvenience Chinese users.
Attachment #198205 -
Flags: review? → review?(smontagu)
Attachment #198206 -
Flags: review? → review?(smontagu)
Flags: blocking1.8.1? → blocking1.8.1+
| Assignee | ||
Comment 18•19 years ago
|
||
Comment on attachment 198205
(patchset) cvs diff for /intl/uconv/ucvtw/big5.uf [fromu]
I can't assess these patches codepoint by codepoint, but I am happy to accept them based on comments 12 and 15. Auto-generated table patches in intl don't need super-review, but I'd like jshin's approval before checking in.
Attachment #198205 -
Flags: review?(smontagu)
Attachment #198205 -
Flags: review?(jshin1987)
Attachment #198205 -
Flags: review+
| Assignee | ||
Updated•19 years ago
|
Attachment #198206 -
Flags: review?(smontagu)
Attachment #198206 -
Flags: review?(jshin1987)
Attachment #198206 -
Flags: review+
Comment 19•19 years ago
|
||
Thanks a lot, smontagu! I hope these patches can be committed before the official release of Firefox 2.0.
Comment 20•19 years ago
|
||
Comment on attachment 198205
(patchset) cvs diff for /intl/uconv/ucvtw/big5.uf [fromu]
Sorry for the long delay. I'll edit big5.uf and big5.ut to add the URLs of the conversion tables you used. lxr will point back at this bug, so we could do without that, but it is still nice to have.
Attachment #198205 -
Flags: review?(jshin1987) → review+
Comment 21•19 years ago
|
||
Comment on attachment 198206
(patchset) cvs diff for /intl/uconv/ucvtw/big5.ut [tou], Big5-2003+UAO
r=jshin
Attachment #198206 -
Flags: review?(jshin1987) → review+
Comment 22•19 years ago
|
||
Thank you, jshin! BTW, apart from Big5-2003, there is also a bug in the Big5-HKSCS table: the one that Mozilla uses is too old. The Hong Kong government updated the Big5-HKSCS table in 2004 on its official site. I hope Mozilla can fix this bug as well.
Here is the table released by the HK government: http://www.info.gov.hk/digital21/chi/hkscs/download/hkscs-2004-big5-iso.txt
For more information about the update, please go to http://www.info.gov.hk/digital21/eng/hkscs/mapping_table.html
Updated•18 years ago
|
Whiteboard: [checkin needed]
Target Milestone: --- → mozilla1.8.1beta1
Updated•18 years ago
|
Attachment #198205 -
Flags: approval1.8.1+
Updated•18 years ago
|
Attachment #198206 -
Flags: approval1.8.1+
| Assignee | ||
Comment 24•18 years ago
|
||
Checked in to MOZILLA_1_8_BRANCH. BTW, I added links to the conversion tables as suggested in comment 20 to all checkins.
Status: NEW → RESOLVED
Closed: 18 years ago
Keywords: fixed1.8.1
Resolution: --- → FIXED
Whiteboard: [checkin needed]