Closed
Bug 61422
Opened 25 years ago
Closed 24 years ago
Traditional Chinese(EUC) characters can not be displayed
Categories
(Core :: Internationalization, defect, P1)
VERIFIED
FIXED
mozilla0.9
People
(Reporter: masaki.katakai, Assigned: bstell)
References
Details
(Keywords: intl, Whiteboard: converter problem)
Attachments
(15 files)
152.63 KB, image/jpeg
150.79 KB, text/plain
129.66 KB, image/jpeg
98.10 KB, text/plain
293.29 KB, text/plain
2.07 KB, text/plain
98.10 KB, text/plain
293.35 KB, text/plain
98.19 KB, text/plain
128.83 KB, text/plain
92.53 KB, application/octet-stream
157.43 KB, text/plain
1.68 KB, text/plain
176.05 KB, image/jpeg
2.15 KB, text/plain
Some Traditional Chinese (EUC) characters are displayed as '?'
in the Solaris zh_TW.EUC locale. Please try browsing the following files.
The ranges 0xa3cf - 0xa4a1, 0xa7a0 - 0xa9b9 and 0xc2a1 - 0xc2c1
could not be displayed.
http://village.infoweb.ne.jp/~katakai/mozilla/zh_TW_EUC_1.txt
http://village.infoweb.ne.jp/~katakai/mozilla/zh_TW_EUC_2.txt
Snapshots:
http://village.infoweb.ne.jp/~katakai/mozilla/zh_TW_EUC_NS6.gif
http://village.infoweb.ne.jp/~katakai/mozilla/zh_TW_EUC_dtterm.gif
Can those characters be displayed correctly on the Linux platform?
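For anyone who wants to regenerate test data covering the reported ranges, here is a minimal sketch in Python. It assumes the reported values are plain two-byte EUC codes (lead byte, then trail byte) and that only bytes 0xA1 - 0xFE are valid in either position. The function name is made up for illustration and is not part of Mozilla.

```python
def euc_two_byte_range(start, end):
    """Yield the two-byte EUC codes from start to end (inclusive).

    Values whose lead or trail byte falls outside 0xA1-0xFE are skipped,
    since they are not valid two-byte EUC sequences.
    """
    for code in range(start, end + 1):
        lead, trail = code >> 8, code & 0xFF
        if 0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE:
            yield bytes([lead, trail])


# The three ranges reported in this bug.
REPORTED_RANGES = [(0xA3CF, 0xA4A1), (0xA7A0, 0xA9B9), (0xC2A1, 0xC2C1)]

# Raw bytes that could be saved to a file and opened with the EUC-TW charset.
test_data = b"".join(
    b"".join(euc_two_byte_range(start, end)) + b"\n"
    for start, end in REPORTED_RANGES
)
```

Note that 0xa7a0 itself is skipped, because 0xA0 is not a valid trail byte; the first code emitted for that range is 0xa7a1.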
Comment 1•25 years ago
Cc to Xianglan, could you try this on Linux?
Comment 4•25 years ago
Erik, do you think this is a font problem or a converter problem?
I ran N6 on Linux but displayed it on an IRIX machine that has Traditional
Chinese font support. N6 was not able to render characters in that range correctly,
while 4.x (Communicator) can.
Erik looked at the mapping between EUC-TW and Unicode and suggested that this
might be a converter problem.
Reassign to ftang, who will return in the middle of Dec.
Assignee: nhotta → ftang
Comment 6•25 years ago
Added keywords intl, nsbeta1.
This bug is a showstopper for Sun's Traditional Chinese release (see attachment).
Note that Sun classifies it as a potential showstopper, but I think it's a
showstopper.
Comment 7•25 years ago
I think the following file does the conversion.
Shanjian, do you see any problem there?
http://lxr.mozilla.org/seamonkey/source/intl/uconv/ucvtw2/nsEUCTWToUnicode.cpp
Comment 8•25 years ago
Please double check with ISO registry 171 or
http://kanji.zinbun.kyoto-u.ac.jp/~yasuoka/CJK/cns11643-1992.1.gif
0xA3CF-0xA4A1 are not defined in CNS 11643 plane 1, therefore there are no
characters there. Displaying them as ? should be fine.
0xA7A1-0xA9B9 are radical characters. We should fix the display.
0xc2a1 - 0xc2c1 are telecommunication symbols. We should also fix these.
All we need to change is cns_1.ut (and probably also cns_1.uf) under
intl/uconv/ucvtw2/
I am surprised that we didn't catch this.
Comment 9•25 years ago
reassign to bstell and mark this as P2 nsbeta1, moz0.8
bstell- talk to me about how to fix this. We need to update the conversion table.
Assignee: ftang → bstell
Priority: P3 → P2
Target Milestone: --- → mozilla0.8
Updated•25 years ago (Assignee)
Status: NEW → ASSIGNED
Comment 10•25 years ago (Assignee)
Katakai-san,
is the 0xc2a1 - 0xc2c1 range displaying '?' or blank glyphs?
Comment 11•25 years ago (Assignee)
Comment 12•25 years ago (Reporter)
Comment 13•25 years ago (Assignee)
Comment 14•25 years ago (Assignee)
Comment 15•25 years ago (Assignee)
Comment 16•25 years ago (Assignee)
Katakai-san,
Would you copy the cns_1.uf and cns_1.ut files into the intl/uconv/ucvtw2/ dir and
rebuild? This should rebuild libucvtw2.so with the KangXi radicals enabled
(0xa7a0 - 0xa9b9).
Would you also run the patch on (a fresh copy of)
gfx/src/gtk/nsFontMetricsGTK.cpp ? This should make the font preferences
recognize the cns11643 fonts.
Do you know if any of the fonts you sent me have glyphs for the control
characters (0xc2a1 - 0xc2c1)?
Comment 17•25 years ago (Reporter)
Brian, I have two questions,
- there are still some missing characters (displayed as ?) e.g. 0xa1ba-0xa1bd
why?
- On Communicator 4.x, we can set fonts for cns11643-1 and -2, but Mozilla
has only one field. Which font (-1 or -2) should I set?
Comment 18•25 years ago
> - On Communicator 4.x, we can set fonts for cns11643-1 and -2, but Mozilla
> has only one field. Which font (-1 or -2) should I set?
It should not matter which font you set, CNS-1 or CNS-2. Mozilla will
find the glyph.
It is not easy on Unix to set the desired fallback font with the
current UI. That is true. A bug was filed on this problem. See
http://bugzilla.mozilla.org/show_bug.cgi?id=50363
Comment 19•25 years ago (Assignee)
Frank,
Do we have unicode converters for cns planes 0, 8-16?
+ { "cns11643-0", &Unknown },
+ { "cns11643-1", &CNS116431 },
+ { "cns11643-2", &CNS116432 },
+ { "cns11643-3", &CNS116433 },
+ { "cns11643-4", &CNS116434 },
+ { "cns11643-5", &CNS116435 },
+ { "cns11643-6", &CNS116436 },
+ { "cns11643-7", &CNS116437 },
+ { "cns11643-8", &Unknown },
+ { "cns11643-9", &Unknown },
+ { "cns11643-10", &Unknown },
+ { "cns11643-11", &Unknown },
+ { "cns11643-12", &Unknown },
+ { "cns11643-13", &Unknown },
+ { "cns11643-14", &Unknown },
+ { "cns11643-15", &Unknown },
+ { "cns11643-16", &Unknown },
Comment 20•25 years ago (Assignee)
Comment 21•25 years ago (Assignee)
Comment 22•25 years ago (Assignee)
Katakai-san,
I added positions a1ba-a1bd, a2a4, a2a6 to these new cns_1.uf and cns_1.ut files.
Let me know if you see any other missing code points.
I found that sun-ming size 16 shows the control characters c2a1-c2c1.
You may need to edit your prefs.js by hand to select this font. There seems to be
a problem with prefs right now.
Comment 23•25 years ago (Assignee)
Erik,
Would you review the font preference details with me?
Comment 24•25 years ago (Reporter)
Brian, two characters a4be and a4c0 are displayed as ?.
I could not find out how to specify the fonts in prefs.js,
but by View->Larger fonts, I could see c2a1-c2c1.
Comment 25•25 years ago (Assignee)
bug 67716 is somewhat related to this bug
http://bugzilla.mozilla.org/show_bug.cgi?id=67716
Comment 26•25 years ago (Assignee)
Comment 27•25 years ago (Assignee)
Comment 28•25 years ago (Assignee)
Comment 29•25 years ago (Assignee)
Comment 30•25 years ago (Assignee)
Katakai-san,
The new cns_1.uf/cns_1.ut have the a4be and a4c0 code points.
Comment 31•25 years ago (Assignee)
Comment 32•25 years ago (Assignee)
Katakai-san,
I simplified the font preference to only list one plane for each of the cns11643
fonts.
Could you try the patch and let me know how it works?
thanks
Updated•25 years ago (Assignee)
Target Milestone: mozilla0.8 → mozilla0.9
Comment 34•25 years ago
This problem can be reproduced on a Chinese Windows 2000 system, except that characters between
0xc2a1 - 0xc2c1 are displayed as English letters instead of question marks. I'll attach a snapshot
later. Please compare it with the one that Katakai-san attached on 02/01/01 18:58.
Comment 35•25 years ago
Comment 36•25 years ago
Changed QA contact to ji@netscape.com. Xianglan, I changed QA contact to you
since you have already tested this.
QA Contact: teruko → ji
Comment 37•25 years ago (Assignee)
Xianglan,
The 0xc2a1 - 0xc2c1 chars are actually control characters, and the glyphs
have the abbreviations of each control character. I believe the reason that they
show up under Windows is that the only font available under Unix (without the
patches) did not have those glyphs, thus the font system substituted question
marks.
Comment 38•24 years ago
Brian, the two attachments you made on 2/5 both look like cns_1.ut, judging from
the file size. Would you please attach both files again to be sure?
BTW, I've reached the same results as Masaki got. Other than 0xa4be and
0xa4c0, all other characters of cns11643-1 look fine to me.
[a final confirmation will be given by our l10n QA folks, though.]
Comment 39•24 years ago (Assignee)
Tajami-san,
I am sorry for the confusion. This bug has way too many attachments.
Please use these attachments:
The latest cns_1 files:
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=24471
Patches to the source files (nsFontMetricsGTK.cpp, charsetData.properties):
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=24485
For reference only:
The reference table that the converters were built from:
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=24484
This is the table from the Unicode web site with modifications to
add the 8 points.
All the other attachments should be ignored.
Comment 40•24 years ago
While awaiting the new, correct cns_1.ut and cns_1.uf files for displaying
0xa4be and 0xa4c0, we've done some more testing with the previous ones
filed on 02/02/01, and found some other undisplayed characters,
listed below. They are all displayed as blanks.
cns11643-1:
0xd8c8, ddee, f3f8
cns11643-2:
0x8ea2a1e2, 8ea2a1f0, 8ea2a1fe, 8ea2a2b3,
all characters between 8ea2dbae and 8ea2dce3
all characters after 0x8ea2f0f8
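For readers unfamiliar with the four-byte values in this list, the sketch below splits an EUC-TW code into CNS 11643 plane, row and cell. It assumes the usual EUC-TW layout: a bare two-byte code is plane 1, and a four-byte code starts with 0x8E followed by a plane byte (0xA1 for plane 1 up to 0xB0 for plane 16). The function name is hypothetical and this is not the code in intl/uconv.

```python
def parse_euc_tw(code: bytes):
    """Split one EUC-TW code into (plane, row, cell), each counted from 1."""
    if len(code) == 2:
        # Two-byte form: always plane 1.
        plane, hi, lo = 1, code[0], code[1]
    elif len(code) == 4 and code[0] == 0x8E and 0xA1 <= code[1] <= 0xB0:
        # Four-byte form: 0x8E, plane byte, then the two position bytes.
        plane, hi, lo = code[1] - 0xA0, code[2], code[3]
    else:
        raise ValueError(f"not an EUC-TW code: {code.hex()}")
    if not (0xA1 <= hi <= 0xFE and 0xA1 <= lo <= 0xFE):
        raise ValueError(f"position bytes out of range: {code.hex()}")
    return plane, hi - 0xA0, lo - 0xA0


plane1_example = parse_euc_tw(bytes.fromhex("d8c8"))      # from the cns11643-1 list
plane2_example = parse_euc_tw(bytes.fromhex("8ea2a1e2"))  # from the cns11643-2 list
```

Under these assumptions, every 8ea2... value above is a plane 2 character, which is why those are tracked separately from the plane 1 table (cns_1.ut / cns_1.uf).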
Comment 41•24 years ago
After getting the right patch installed, I confirmed that 0xa4be and 0xa4c0, both of which
used to be "?", get displayed properly.
We'll continue to check the other characters reported in my previous comments.
Comment 42•24 years ago
With the latest patches, there are still a lot of characters that cannot be
displayed: d1d6 d8c8 ddee f3f8 e5a2 e7c8 e7cd ebaf ebcf ecb0 f5f7 and a lot of
other characters. I don't think this bug can be fixed by adding those characters
one by one; there might be some serious problems in either the original tables
or the algorithm used to generate the final data.
BTW, there are similar problems in GBK. A lot of GBK characters are displayed as
blanks; I don't list them because there are tooooooo many :).
Comment 43•24 years ago (Assignee)
For GBK:
1) are you displaying gbk data? If so is this related to
http://bugzilla.mozilla.org/show_bug.cgi?id=60826
2) are you using a gb2312 font? If so is this related to
http://bugzilla.mozilla.org/show_bug.cgi?id=66744
Comment 44•24 years ago
Regarding the original zh_TW.EUC locale problem, there are two more
findings:
1) If we reduce the X font paths so that neither jisx0208 nor jisx0212 fonts can be
found, then the characters 0xd1d6, d8c8 ddee f3f8 e5a2 e7c8 e7cd ebaf ebcf
ecb0 f5f7 are displayed properly.
It seems that jisx0208/jisx0212 fonts always take precedence over cns11643
fonts, regardless of the user's locale or choice of charset encoding.
We'd like to propose that the font preference be enhanced so that users can
change the priority order of Han-unified character sets.
2) These blank-displayed zh_TW.EUC encoded characters are converted into
0x8f???? in ja_JP.EUC encoding by converting twice:
zh_TW.EUC => UTF16 => ja_JP.EUC. We found that even in the ja_JP.EUC locale,
the same set of characters is displayed with blank glyphs.
We suspect it could be a code conversion problem for the jisx0212 character
set.
Comment 45•24 years ago
I don't know whether the fixes for 60826 and 66744 have been "integrated" in the
latest version that Toshi is working on, but that version still shows the
problem. I think it's the same as this bug: both are caused by the wrong font
files being used by Mozilla. As Toshi said, Mozilla always uses Japanese fonts
as the first choice whenever possible, even in Simplified Chinese or Traditional
Chinese locales (it's very unfortunate that the Japanese fonts are always
available in Solaris in every locale). If CNS11643-1 fonts are used in the zh_TW.EUC
locale, this bug should be fixed; if GBK fonts are used for GBK pages, those two
GBK bugs will be fixed, I guess.
Thanks.
Brian.
Comment 46•24 years ago
Brian's code checkin for 60826 was not in my local build when we tested
yesterday. We'll test again after checking out and rebuilding.
Comment 47•24 years ago (Assignee)
We already have a bug open to look for other parts of the font if it has been
subsetted / plane'd:
http://bugzilla.mozilla.org/show_bug.cgi?id=67732
I opened a new bug to request that the font fallback first look in the langgroup fonts
before looking thru all fonts (ie: random):
http://bugzilla.mozilla.org/show_bug.cgi?id=69139
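To illustrate the fallback order requested in bug 69139, here is a rough sketch in Python. It is not how nsFontMetricsGTK.cpp is written; the font records, the "langgroup" and "glyphs" fields, and the function name are all invented for the example. The only point is the ordering: fonts that belong to the document's langgroup are tried before any others.

```python
def pick_font(char, fonts, langgroup):
    """Return the name of the first font that has a glyph for char.

    Fonts belonging to the requested langgroup are tried first; the
    remaining fonts are tried afterwards in their original order.
    """
    preferred = [f for f in fonts if f["langgroup"] == langgroup]
    others = [f for f in fonts if f["langgroup"] != langgroup]
    for font in preferred + others:
        if char in font["glyphs"]:
            return font["name"]
    return None  # no font has the glyph; the caller would draw '?'


# Hypothetical font list in which the Japanese font happens to come first,
# as described for Solaris in the comments above.
sample_fonts = [
    {"name": "jisx0208", "langgroup": "ja", "glyphs": {"\u4e00"}},
    {"name": "cns11643-1", "langgroup": "zh-TW", "glyphs": {"\u4e00"}},
]
chosen = pick_font("\u4e00", sample_fonts, "zh-TW")
```

With this ordering, a Han-unified character that exists in both fonts is drawn with the cns11643-1 font on a zh-TW page, even though the Japanese font appears earlier in the list.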
Comment 48•24 years ago
Comment 49•24 years ago
r=ftang
Comment 50•24 years ago (Reporter)
I tried the latest patches for this bug and the patches
for http://bugzilla.mozilla.org/show_bug.cgi?id=61108.
It seems that characters of cns11643-1 can be displayed properly.
I cannot see any Japanese glyphs. The character size is correct.
My environment is the Solaris 8 zh_TW.EUC locale, with fonts set like this:
user_pref("font.name.monospace.zh-CN", "dt-interface user-gb2312.1980-0");
user_pref("font.name.monospace.zh-TW", "dt-interface user-cns11643-1");
user_pref("font.name.sans-serif.zh-CN", "dt-interface user-gb2312.1980-0");
user_pref("font.name.sans-serif.zh-TW", "dt-interface user-cns11643-1");
user_pref("font.name.serif.zh-CN", "dt-interface user-gb2312.1980-0");
user_pref("font.name.serif.zh-TW", "dt-interface user-cns11643-1");
I'll provide my environment to Toshi and Brian of Sun for evaluation.
So I think it is ready for check-in for cns11643-1. For cns11643-2,
we already filed a separate bug 67732.
Comment 51•24 years ago
Comment 52•24 years ago (Assignee)
checked in
Comment 53•24 years ago (Assignee)
the fix was checked in (02/28)
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Comment 54•24 years ago
Brian, with the 03/07 build, the 0xc2a1 - 0xc2c1 range is displaying blank glyphs.
With the cns fonts you sent to me, these should NOT be blank, right?
Comment 55•24 years ago (Assignee)
Ji,
These are communication characters and aren't really intended for display.
Blank is okay if the font has blank glyphs. If there are question marks then
there is no font or the converter does not recognize those points.
Try setting your monospace font preference for Traditional Chinese to
sun-ming 24 point and see if the glyphs appear.
Comment 56•24 years ago
Yes, after I set sun-ming 24 for monospace font, I can see the glyphs in 0xc2a1
- 0xc2c1 range. Marked it as verified.
Status: RESOLVED → VERIFIED
Comment 57•24 years ago
There are still 8 characters displayed as blanks in CNS11643:
0xa3c0 - 0xa3c7
These characters are blanks in BIG5 encoding too:
0xa27e, 0xa2a1 - 0xa2a7