Closed Bug 61422 Opened 25 years ago Closed 24 years ago

Traditional Chinese(EUC) characters can not be displayed

Categories

(Core :: Internationalization, defect, P1)

Sun
Solaris
defect

Tracking

()

VERIFIED FIXED
mozilla0.9

People

(Reporter: masaki.katakai, Assigned: bstell)

References

Details

(Keywords: intl, Whiteboard: converter problem)

Attachments

(15 files)

152.63 KB, image/jpeg
150.79 KB, text/plain
129.66 KB, image/jpeg
98.10 KB, text/plain
293.29 KB, text/plain
2.07 KB, text/plain
98.10 KB, text/plain
293.35 KB, text/plain
98.19 KB, text/plain
128.83 KB, text/plain
92.53 KB, application/octet-stream
157.43 KB, text/plain
1.68 KB, text/plain
176.05 KB, image/jpeg
2.15 KB, text/plain
Some Traditional Chinese (EUC) characters are displayed as '?' in the Solaris zh_TW.EUC locale. Please try browsing the following files. The ranges 0xa3cf - 0xa4a1, 0xa7a0 - 0xa9b9 and 0xc2a1 - 0xc2c1 cannot be displayed. http://village.infoweb.ne.jp/~katakai/mozilla/zh_TW_EUC_1.txt http://village.infoweb.ne.jp/~katakai/mozilla/zh_TW_EUC_2.txt Snapshots are at http://village.infoweb.ne.jp/~katakai/mozilla/zh_TW_EUC_NS6.gif http://village.infoweb.ne.jp/~katakai/mozilla/zh_TW_EUC_dtterm.gif Can those characters be displayed correctly on the Linux platform?
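For anyone who wants to regenerate test data for the three reported ranges, here is a minimal sketch. It assumes plain two-byte EUC codes whose lead and trail bytes both lie in 0xA1-0xFE; the names `euc_two_byte_range`, `REPORTED_RANGES` and `build_test_data` are made up for this illustration and are not part of Mozilla or of the reporter's test files.

```python
def euc_two_byte_range(start, end):
    """Yield two-byte EUC sequences from start to end (inclusive).

    Only structurally valid pairs are produced: both bytes must lie in
    0xA1-0xFE, so values such as 0xA7A0 (trail byte 0xA0) are skipped.
    """
    for code in range(start, end + 1):
        lead, trail = code >> 8, code & 0xFF
        if 0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE:
            yield bytes([lead, trail])


# The three ranges named in the report.
REPORTED_RANGES = [(0xA3CF, 0xA4A1), (0xA7A0, 0xA9B9), (0xC2A1, 0xC2C1)]


def build_test_data():
    """Return raw EUC bytes, one line per reported range."""
    lines = [b"".join(euc_two_byte_range(s, e)) for s, e in REPORTED_RANGES]
    return b"\n".join(lines)
```

Writing the result of `build_test_data()` to a file and opening it with the charset forced to EUC-TW should exercise the same areas as the test pages above.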
Cc to Xianglan, could you try this on Linux?
Same problem on linux. Please see the attached screen shot.
Attached image A screen shot on linux.
Erik, do you think this is a font problem or converter?
Blocks: 60916
I ran N6 on Linux but sent the display to an IRIX machine that has Traditional Chinese font support. N6 was not able to render characters in that range correctly, while 4.x (Communicator) can. Erik looked at the mapping between EUC-TW and Unicode and suggested that this might be a converter problem. Reassigning to ftang, who will return in the middle of December.
Assignee: nhotta → ftang
Whiteboard: converter problem
Added keywords intl, nsbeta1. This bug is a showstopper for Sun's Traditional Chinese release (see attachment). Note that Sun classifies it as a potential showstopper, but I think it's a showstopper.
Keywords: intl, nsbeta1
I think the following file does the conversion. Shanjian, do you see any problem there? http://lxr.mozilla.org/seamonkey/source/intl/uconv/ucvtw2/nsEUCTWToUnicode.cpp
Please double check with ISO registry 171 or http://kanji.zinbun.kyoto-u.ac.jp/~yasuoka/CJK/cns11643-1992.1.gif 0xA3CF-0xA4A1 are not defined in CNS 11643 plane 1, therefore there are no characters there. Displaying them as ? should be fine. 0xA7A1-0xA9B9 are radical characters. We should fix the display. 0xc2a1 - 0xc2c1 are telecommunication symbols. We should also fix these. All we need to change is cns_1.ut (and probably also cns_1.uf) under intl/uconv/ucvtw2/ I am surprised that we didn't catch this.
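Code charts for 94x94 sets such as CNS 11643 are normally indexed by row and cell rather than by EUC byte values. A small sketch of the usual conversion (subtract 0xA0 from each byte) may help when checking the ranges above against the chart. The function name `euc_to_row_cell` is hypothetical and is not taken from the uconv sources.

```python
def euc_to_row_cell(code):
    """Convert a two-byte EUC code such as 0xA7A1 to a (row, cell) pair.

    Row and cell are 1-based and run from 1 to 94, matching the layout
    of a 94x94 code chart.
    """
    lead, trail = code >> 8, code & 0xFF
    if not (0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE):
        raise ValueError(f"0x{code:04X} is not a valid two-byte EUC code")
    return lead - 0xA0, trail - 0xA0


print(euc_to_row_cell(0xA7A1))  # (7, 1)
print(euc_to_row_cell(0xA9B9))  # (9, 25)
print(euc_to_row_cell(0xC2A1))  # (34, 1)
```

So the radical range runs from row 7, cell 1 to row 9, cell 25, and the 0xc2a1 - 0xc2c1 range sits in row 34.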
reassign to bstell and mark this as P2 nsbeta1, moz0.8 bstell- talk to me about how to fix this. We need to update the conversion table.
Assignee: ftang → bstell
Priority: P3 → P2
Target Milestone: --- → mozilla0.8
Status: NEW → ASSIGNED
Katakai-san, is the 0xc2a1 - 0xc2c1 range displaying '?' or blank glyphs?
Katakai-san, Would you copy the cns_1.ut and cns_1.uf into the intl/uconv/ucvtw2/ dir and rebuild? This should rebuild libucvtw2.so with the KangXi radicals enabled (0xa7a0 - 0xa9b9). Would you also run the patch on (a fresh copy of) gfx/src/gtk/nsFontMetricsGTK.cpp? This should make the font preferences recognize the cns11643 fonts. Do you know if any of the fonts you sent me have glyphs for the control characters (0xc2a1 - 0xc2c1)?
Brian, I have two questions. - There are still some missing characters (displayed as ?), e.g. 0xa1ba-0xa1bd. Why? - On Communicator 4.x, we can set fonts for cns11643-1 and -2, but Mozilla has only one field. Which font (-1 or -2) should I set?
> - On Communicator 4.x, we can set fonts for cns11643-1 and -2, but Mozilla has only one field. Which font (-1 or -2) should I set?
It should not matter which font you set, CNS-1 or CNS-2. Mozilla will find the glyph. It is not easy on Unix to set the desired fallback font with the current UI. That is true. A bug was filed on this problem. See http://bugzilla.mozilla.org/show_bug.cgi?id=50363
Frank, Do we have Unicode converters for cns planes 0, 8-16?
+ { "cns11643-0", &Unknown },
+ { "cns11643-1", &CNS116431 },
+ { "cns11643-2", &CNS116432 },
+ { "cns11643-3", &CNS116433 },
+ { "cns11643-4", &CNS116434 },
+ { "cns11643-5", &CNS116435 },
+ { "cns11643-6", &CNS116436 },
+ { "cns11643-7", &CNS116437 },
+ { "cns11643-8", &Unknown },
+ { "cns11643-9", &Unknown },
+ { "cns11643-10", &Unknown },
+ { "cns11643-11", &Unknown },
+ { "cns11643-12", &Unknown },
+ { "cns11643-13", &Unknown },
+ { "cns11643-14", &Unknown },
+ { "cns11643-15", &Unknown },
+ { "cns11643-16", &Unknown },
Katakai-san, I added positions a1ba-a1bd, a2a4, a2a6 to these new cns_1.uf and cns_1.ut files. Let me know if you see any other missing code points. I found that sun-ming size 16 shows the control characters c2a1-c2c1. You may need to edit your prefs.js by hand to select this font. There seems to be a problem with prefs right now.
Erik, Would you review the font preference details with me?
Brian, two characters a4be and a4c0 are displayed as ?. I could not find out how to specify the fonts in prefs.js, but by View->Larger fonts, I could see c2a1-c2c1.
Katakai-san, The new cns_1.uf/cns_1.ut have the a4be and a4c0 code points.
Katakai-san, I simplified the font preference to only list one plane for each of the cns11643 fonts. Could you try the patch and let me know how it works? thanks
Target Milestone: mozilla0.8 → mozilla0.9
Setting Priority to P1. This is a show-stopper for Sun.
Priority: P2 → P1
This problem can be reproduced on a Chinese Windows 2000 system, except that characters between 0xc2a1 - 0xc2c1 are displayed as English letters instead of question marks. I'll attach a snapshot later. Please compare it with the one that Katakai-san attached on 02/01/01 18:58.
Changed QA contact to ji@netscape.com. Xianglan, I changed QA contact to you since you have already tested this.
QA Contact: teruko → ji
Xianglan, The 0xc2a1 - 0xc2c1 chars are actually control characters and the glyphs show the abbreviation of each control character. I believe the reason that they show up under Windows is that the only font available under Unix (without the patches) did not have those glyphs, thus the font system substituted question marks.
Brian, the two attachments you made on 2/5 both look like cns_1.ut, judging from the file size. Will you please attach both files again to be sure? BTW, I've reached the same results as Masaki got. Other than 0xa4be and 0xa4c0, all other characters of cns11643-1 look fine to me. [A final confirmation will be given by our l10n QA folks, though.]
Tajami-san, I am sorry for the confusion. This bug has way too many attachments. Please use these attachments: http://bugzilla.mozilla.org/showattachment.cgi?attach_id=24471 has the latest cns_1 files. The patches to the source files (nsFontMetricsGTK.cpp, charsetData.properties) are in http://bugzilla.mozilla.org/showattachment.cgi?attach_id=24485 For reference only: the reference table that the converters were built from is in http://bugzilla.mozilla.org/showattachment.cgi?attach_id=24484 This is the table from the Unicode web site with modifications to add the 8 points. All the other attachments should be ignored.
While awaiting the new, correct cns_1.ut and cns_1.uf files for displaying 0xa4be and 0xa4c0, we've done some more testing with the previous ones filed on 02/02/01, and found some other undisplayed characters, listed below. They are all displayed as blanks. cns11643-1: 0xd8c8, ddee, f3f8. cns11643-2: 0x8ea2a1e2, 8ea2a1f0, 8ea2a1fe, 8ea2a2b3, all characters between 8ea2dbae and 8ea2dce3, and all characters after 0x8ea2f0f8.
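The codes in this comment mix the two-byte form (plane 1) with the four-byte EUC-TW form, where a 0x8E prefix is followed by a plane byte (0xA1 for plane 1 through 0xB0 for plane 16) and then the usual two bytes. A rough sketch of how such a sequence breaks down is below; `parse_euc_tw` is an illustrative name only and does not correspond to any function in nsEUCTWToUnicode.cpp.

```python
def parse_euc_tw(seq):
    """Return (plane, row, cell) for one EUC-TW character given as bytes.

    Two-byte sequences are plane 1. Four-byte sequences start with 0x8E,
    followed by a plane byte in 0xA1-0xB0 (planes 1-16).
    """
    if len(seq) == 2:
        plane, lead, trail = 1, seq[0], seq[1]
    elif len(seq) == 4 and seq[0] == 0x8E and 0xA1 <= seq[1] <= 0xB0:
        plane, lead, trail = seq[1] - 0xA0, seq[2], seq[3]
    else:
        raise ValueError(f"not an EUC-TW sequence: {seq.hex()}")
    if not (0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE):
        raise ValueError(f"byte out of range in: {seq.hex()}")
    return plane, lead - 0xA0, trail - 0xA0


print(parse_euc_tw(bytes.fromhex("d8c8")))      # (1, 56, 40)
print(parse_euc_tw(bytes.fromhex("8ea2a1e2")))  # (2, 1, 66)
```

This makes it easier to see that the 8ea2... codes all belong to plane 2, which is why they end up being tracked separately from the plane 1 table fixes.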
After getting the right patch installed, I confirmed that 0xa4be and 0xa4c0, both of which used to be "?", are now displayed properly. We'll continue to check the other characters reported in my previous comments.
With the latest patches, there are still a lot of characters that cannot be displayed: d1d6 d8c8 ddee f3f8 e5a2 e7c8 e7cd ebaf ebcf ecb0 f5f7 and a lot of other characters. I don't think this bug can be fixed by adding those characters one by one; there might be some serious problems in either the original tables or the algorithm that generates the final data. BTW, there are similar problems in GBK: a lot of GBK characters are displayed as blanks. I don't list them because there are tooooooo many :).
For GBK: 1) are you displaying gbk data? If so is this related to http://bugzilla.mozilla.org/show_bug.cgi?id=60826 2) are you using a gb2312 font? If so is this related to http://bugzilla.mozilla.org/show_bug.cgi?id=66744
Regarding the original zh_TW.EUC locale problem, there are two more findings: 1) If we reduce the X font paths so that neither jisx0208 nor jisx0212 can be found, then the characters 0xd1d6, d8c8, ddee, f3f8, e5a2, e7c8, e7cd, ebaf, ebcf, ecb0, f5f7 are displayed properly. It seems that jisx0208/jisx0212 fonts always take precedence over cns11643 fonts, regardless of the user's locale or the choice of charset encoding. We'd like to propose that the font preference be enhanced so that users can change the priority order of Han-unified character sets. 2) These blank-displayed zh_TW.EUC encoded characters are converted into 0x8f???? in ja_JP.EUC encoding by converting twice: zh_TW.EUC => UTF16 => ja_JP.EUC. We found that even in the ja_JP.EUC locale, the same set of characters is displayed with blank glyphs. We suspect it could be a code conversion problem for the jisx0212 character set.
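The 0x8f prefix mentioned in finding 2 is the EUC-JP marker for JIS X 0212 characters. The sketch below shows only the second leg of that round trip (Unicode to EUC-JP) using Python's built-in `euc_jp` codec; as far as I know the Python standard library has no EUC-TW codec, so the first leg is not reproduced. The sample character U+4E02 is my own pick of a JIS X 0212 character and is not one of the code points listed in this bug, and `is_jisx0212_in_euc_jp` is a hypothetical helper name.

```python
def is_jisx0212_in_euc_jp(ch):
    """Return True if ch encodes to a three-byte 0x8F sequence in EUC-JP.

    In EUC-JP, JIS X 0208 characters take two bytes, while JIS X 0212
    characters take three bytes starting with 0x8F.
    """
    try:
        encoded = ch.encode("euc_jp")
    except UnicodeEncodeError:
        # The character is not representable in EUC-JP at all.
        return False
    return len(encoded) == 3 and encoded[0] == 0x8F


print(is_jisx0212_in_euc_jp("\u4e02"))  # sample JIS X 0212 character
print(is_jisx0212_in_euc_jp("\u65e5"))  # a common JIS X 0208 character
```

A check like this, applied to the Unicode values of the blank characters, would be one way to confirm that they all land in the JIS X 0212 set.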
I don't know whether the fixes for 60826 and 66744 have been "integrated" in the latest version that Toshi is working on, but that version still shows the problem. I think it's the same as this bug; both are caused by the wrong font files being used by Mozilla. As Toshi said, Mozilla always uses Japanese fonts as the first choice whenever possible, even in Simplified Chinese or Traditional Chinese locales (it's very unfortunate that the Japanese fonts are always available in Solaris in every locale). If CNS11643-1 fonts are used in the zh_TW.EUC locale, this bug should be fixed; if GBK fonts are used for GBK pages, those two GBK bugs will be fixed, I guess. Thanks. Brian.
Brian's code checkin for 60826 was not in my local build when we tested yesterday. We'll test again after checking out and rebuilding.
We already have a bug open to look for other parts of the font if it has been subsetted / plane'd: http://bugzilla.mozilla.org/show_bug.cgi?id=67732 I opened a new bug to request that the font fallback first look in the langgroup fonts before looking through all fonts (i.e. at random): http://bugzilla.mozilla.org/show_bug.cgi?id=69139
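The fallback order requested in bug 69139 can be pictured with a small sketch. This is not how nsFontMetricsGTK.cpp is written; it only illustrates the idea of trying fonts from the page's language group before any others. The function name, the (name, charset) tuples and the font names are all made up for the example.

```python
def order_fallback_fonts(fonts, lang_group_charsets):
    """Return fonts with those matching the language group first.

    fonts is a list of (font_name, charset) tuples in discovery order.
    sorted() is stable, so the discovery order is kept within each group.
    """
    return sorted(fonts, key=lambda font: font[1] not in lang_group_charsets)


# Hypothetical discovery order in which Japanese fonts come first.
available = [
    ("mincho", "jisx0208.1983-0"),
    ("ming", "cns11643-1"),
    ("gothic", "jisx0212.1990-0"),
    ("kai", "cns11643-2"),
]
zh_tw_charsets = {"cns11643-1", "cns11643-2"}

for name, charset in order_fallback_fonts(available, zh_tw_charsets):
    print(name, charset)
```

With this ordering, a Han character present in both a cns11643 font and a jisx0212 font would be drawn from the cns11643 font on a zh-TW page.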
r=ftang
I tried the latest patches for this bug and the patches for http://bugzilla.mozilla.org/show_bug.cgi?id=61108. It seems that characters of cns11643-1 can be displayed properly. I cannot see any Japanese glyphs. The character size is correct. My environment is the Solaris 8 zh_TW.EUC locale, with fonts set like this:
user_pref("font.name.monospace.zh-CN", "dt-interface user-gb2312.1980-0");
user_pref("font.name.monospace.zh-TW", "dt-interface user-cns11643-1");
user_pref("font.name.sans-serif.zh-CN", "dt-interface user-gb2312.1980-0");
user_pref("font.name.sans-serif.zh-TW", "dt-interface user-cns11643-1");
user_pref("font.name.serif.zh-CN", "dt-interface user-gb2312.1980-0");
user_pref("font.name.serif.zh-TW", "dt-interface user-cns11643-1");
I'll provide my environment to Toshi and Brian of Sun for evaluation. So I think it is ready for check-in for cns11643-1. For cns11643-2, we already filed a separate bug 67732.
checked in
the fix was checked in (02/28)
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Brian, with 03/07 build, 0xc2a1 - 0xc2c1 range is displaying blank glyphs. With the cns fonts you sent to me, these should NOT be blank. right?
Ji, These are communication characters and aren't really intended for display. Blank is okay if the font has blank glyphs. If there are question marks then there is no font or the converter does not recognize those points. Try setting your monospace font preference for Traditional Chinese to sun-ming 24 point and see if the glyphs appear.
Yes, after I set sun-ming 24 for monospace font, I can see the glyphs in 0xc2a1 - 0xc2c1 range. Marked it as verified.
Status: RESOLVED → VERIFIED
There are still 8 characters displayed as blanks in CNS11643: 0xa3c0 - 0xa3c7. These characters are blank in the BIG5 encoding too: 0xa27e, 0xa2a1 - 0xa2a7.