Closed Bug 403564 Opened 17 years ago Closed 9 years ago

Incorrect Big5-HKSCS mapping output

Categories

(Core :: Internationalization, defect)

Type: defect
Priority: Not set
Severity: normal

RESOLVED WONTFIX

People

(Reporter: hfwong1, Assigned: smontagu)

Attachments

(2 files)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-TW; rv:1.8.1.8) Gecko/20071021 Firefox/2.0.0.8 (pigfoot)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-TW; rv:1.8.1.8) Gecko/20071021 Firefox/2.0.0.8 (pigfoot)

This is a bug related to the PUA area of Big5. If I switch the page encoding to Big5-HKSCS and then send some Unicode text in an email or on a forum, the Unicode text ends up as Big5-HKSCS PUA text. I know this problem is solved in Big5 (UAO 2003): output text that is not in the Big5 character set is sent as Unicode text instead of PUA text. I hope you can change the output table of Big5-HKSCS so that users no longer post PUA text on the internet that other users cannot see.

Reproducible: Always

Steps to Reproduce:
1. Switch to Big5-HKSCS encoding.
2. Type some Unicode Chinese characters (e.g. 嘅, 着) that are in the Big5-HKSCS character set but not in the Big5 character set, and submit them on the web (a forum or blog).
3. View that website in another browser (IE7, Safari). The Unicode Chinese characters display as empty squares, which implies that the Unicode text became HKSCS PUA text.

Actual Results:
The Unicode Chinese characters display as empty squares, which implies that the Unicode text became HKSCS PUA text.

Expected Results:
The Unicode Chinese characters should display using their Unicode mapping instead of the HKSCS mapping that not everyone can see.
Assignee: nobody → smontagu
Status: UNCONFIRMED → NEW
Component: Build Config → Internationalization
Depends on: 162431
Ever confirmed: true
OS: Windows XP → All
Product: Firefox → Core
QA Contact: build.config → i18n
Hardware: PC → All
Except for characters in Unicode Plane 2, this was supposed to have been fixed by bug 343129. Following the "steps to reproduce" right here in bugzilla:
My attempt to reproduce in comment 1 failed because bugzilla cut off the comment at the characters that I copied and pasted from comment 0. I still don't really understand this bug report. The characters in comment 0 aren't in the PUA area of hkscs, either in hkscs-2001 or hkscs-2004. If the character displays as an empty square, doesn't it just mean that the other browser is using Big5 without the hkscs extensions and not recognizing the codepoints?
I don't know whether the characters I listed above are in the HKSCS PUA or not. The problem is that whenever I post Unicode text in forums using the Big5-HKSCS encoding, some non-Big5 characters are always converted to Big5-HKSCS PUA code points instead of Unicode code points. This is a serious problem, because people who do not have Big5-HKSCS installed cannot view those characters properly in Internet Explorer. It has also created a lot of PUA text in Big5-HKSCS forums that others may not be able to view properly. As an example, here is a link to a thread with PUA text created by Firefox: http://www28.discuss.com.hk/viewthread.php?tid=8409425&page=3#pid182291730

To reproduce the problem, first switch the page to Big5-HKSCS (not the default Big5 encoding). Second, reply to the thread with some non-Big5 HKSCS characters such as 嘅. After replying, the reply displays properly with the Big5-HKSCS page encoding. However, if you switch back to Big5, those non-Big5 characters turn into Big5-UAO characters instead, and if you view the page in Internet Explorer, those non-Big5 (HKSCS) characters are shown blank. This shows that there is a problem with Firefox's Big5-HKSCS tables.
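Whether a post already contains such characters can be checked mechanically: code points in the Unicode Private Use Areas have general category "Co", and without a font that happens to cover them they render as empty squares. A minimal Python sketch, assuming the post text is already available as a decoded string; the helper name find_pua_chars is hypothetical and not part of Firefox or the forum software.

```python
import unicodedata


def find_pua_chars(text: str) -> list[tuple[int, str]]:
    """Return (index, "U+XXXX") for every Private Use character in text.

    Unicode general category "Co" covers the BMP Private Use Area
    (U+E000..U+F8FF) as well as the supplementary planes 15 and 16.
    """
    hits = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) == "Co":
            hits.append((i, f"U+{ord(ch):04X}"))
    return hits


if __name__ == "__main__":
    sample = "\u5605\ue000abc"  # 嘅, one PUA code point, then ASCII
    print(find_pua_chars(sample))  # [(1, 'U+E000')]
```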
The PUA codepoint rendered as 嗰 in Big5-HKSCS.
The PUA codepoint rendered as 鰟 in Big5-HKSCS.
Looking at the pictures above, there is clearly a problem with the Big5-HKSCS table that Firefox is currently using. This should be fixed as soon as possible, because it can create a lot of PUA text in the Chinese internet community. I think Big5-HKSCS should just use the same conversion table that Big5-2003 (UAO) currently uses. In my testing, no PUA characters were created and no rendering problems were found when non-Big5 characters were posted to forums with the page encoding set to Big5.
Sorry, there is a typo in comment #5: the PUA character pointed to by the red arrow is rendered as 鰟 in the Big5 encoding, not in Big5-HKSCS.
Sorry, my English is very poor. I think Mr. Ho means the following: the HKSCS encoding table should exist only for "B2U" (Big5 to Unicode); "U2B" (Unicode to Big5) should use the same table as Big5 (UAO). That means HKSCS conversion would go only one way, Big5-HKSCS => Unicode. When converting Unicode => Big5 (HKSCS), characters would be sent as Unicode escapes (&#xxx;).
Sorry, I typed some of that too quickly and it came out with the wrong meaning.
>That's means HKSCS only go one way conver - Big5-HKSCS => unicode.
>When Unicode => Big5 (HKSCS), it will use unicode escape char (&#xxx;).
Corrected: That means HKSCS PUA characters would go only one way: Big5-HKSCS => Unicode. When those PUA characters are converted Unicode => Big5 (HKSCS), they would go back to native Unicode escapes (&#xxx;), giving maximum compatibility with browsers that NEVER support HKSCS.
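The one-way scheme described above (decode HKSCS, but encode only what plain Big5 can hold and fall back to numeric character references for everything else) can be sketched in Python. This is an illustration under stated assumptions, not Firefox's converter: Python's "cp950" codec stands in for the Big5 (UAO) output table, which it is not identical to, and the helper names encode_for_submission and decode_submission are hypothetical. Python's "xmlcharrefreplace" error handler emits decimal references such as &#131072;.

```python
import re


def encode_for_submission(text: str) -> bytes:
    """Unicode -> Big5 with a numeric-reference fallback (hypothetical helper).

    Characters the "cp950" codec can encode become Big5 bytes; anything else
    (HKSCS-only characters, Plane 2 characters, ...) becomes a decimal numeric
    character reference such as b"&#131072;" instead of a PUA byte pair.
    """
    return text.encode("cp950", errors="xmlcharrefreplace")


def decode_submission(data: bytes) -> str:
    """Big5 bytes -> Unicode, expanding every &#NNN; reference in the text."""
    text = data.decode("cp950")
    return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), text)


if __name__ == "__main__":
    encoded = encode_for_submission("\u4e2d\U00020000")  # 中 plus a Plane 2 character
    print(encoded)  # b'\xa4\xa4&#131072;'
    print(decode_submission(encoded) == "\u4e2d\U00020000")  # True
```

Because the fallback is plain ASCII, any Big5-only reader still receives something it can display. The cost is the usual ambiguity of numeric references in form data: decode_submission cannot tell a fallback reference from "&#...;" that a user typed literally.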
I've written a test page for the encoding/decoding tables: http://share.timc.idv.tw/encodetest.php

I can see that the problem here is exactly what happened to Big5. Therefore I agree with the comments above on a solution based on the idea in bug 310229: replace the u2h table with strict CP950 for maximum compatibility. The best thing about this solution is that the data is already prepared; overwriting hkscs.uf with big5.uf does the trick.

As for the h2u table, given that bug 162431 has not been fixed and likely never will be (so we never correctly implement HKSCS-2004, leaving many Ext B characters in the Unicode PUA block as HKSCS-2001 specified), I suggest we leave it for now and address it with some workarounds:

* https://addons.mozilla.org/en-US/firefox/addon/9294, written by the person who fixed bug 310229, which overwrites the charset metadata of all Big5 pages to Big5-HKSCS, so Fx won't process HKSCS bytes as Big5-2003/UAO.
* Another new extension that converts all Unicode PUA characters to their correct positions, so Fx can render them properly on any system with a proper Ext B font (instead of squares indicating a PUA character). This extension could convert HKSCS characters on Big5-HKSCS pages that map (incorrectly) to the Unicode PUA block, and could also convert PUA code points on UTF-8 pages. For the latter, I have written a proof-of-concept GreaseMonkey script. I'll post it here once I feel comfortable doing so.

Simon, please tell me what Bugzilla's rule is on providing such a partial solution: can something be checked in for a bug without the bug being considered fixed?
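The second workaround proposed above amounts to a table lookup over PUA code points. A minimal Python sketch, assuming a mapping from HKSCS-2001 PUA code points to their standard Unicode code points (many of them Extension B) is available from elsewhere; remap_pua and its table argument are hypothetical, and the table entry in the example is a dummy, not real HKSCS data.

```python
import unicodedata


def remap_pua(text: str, table: dict[int, int]) -> tuple[str, list[str]]:
    """Move PUA code points to the code points given in `table`.

    `table` is assumed to map HKSCS-2001 PUA code points to standard Unicode
    code points; no real entries are included here. Returns the new text plus
    the PUA code points that had no entry, so they can be reported.
    """
    out: list[str] = []
    unmapped: list[str] = []
    for ch in text:
        cp = ord(ch)
        if unicodedata.category(ch) != "Co":
            out.append(ch)  # not private use: leave untouched
        elif cp in table:
            out.append(chr(table[cp]))  # known PUA character: move it
        else:
            out.append(ch)  # unknown PUA character: keep it, but report it
            unmapped.append(f"U+{cp:04X}")
    return "".join(out), unmapped


if __name__ == "__main__":
    # Dummy entry for illustration only; it is NOT a real HKSCS mapping.
    dummy_table = {0xE000: 0x20000}
    fixed, missing = remap_pua("a\ue000\ue001", dummy_table)
    print(missing)  # ['U+E001']
```

Reporting unmapped code points, rather than silently leaving them, makes gaps in whatever table is plugged in easy to spot.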
http://forum.moztw.org/viewtopic.php?t=26693 Here is a patch for this problem.
(In reply to Tim Guan-tin Chien [:timdream] (MoCo-TPE) (please ni?) from comment #10) > http://share.timc.idv.tw/encodetest.php This page is now http://timc.idv.tw/encodetest/
For those still watching this bug, bug 912470 is the alternative proposal to merge big5-hkscs and big5 with a new asymmetrical algorithm (see spec). It might fix this bug altogether.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX