Closed Bug 70550 Opened 25 years ago Closed 25 years ago

Can not display Korean hangul characters properly on my test page

Categories

(Core :: Internationalization, defect)

x86
Linux
defect
Not set
normal

VERIFIED INVALID

People

(Reporter: masaki.katakai, Assigned: bstell)

Details

(Keywords: intl)

Attachments

(3 files)

I'm not sure why this happens on my test page, which contains all UTF-8 characters for testing purposes. Please visit http://village.infoweb.ne.jp/~katakai/mozilla/UTF-8_all.html and you will see a result like http://village.infoweb.ne.jp/~katakai/mozilla/UTF-8_all.jpg, which shows that Korean Hangul is not displayed properly.

*However*, when I use another example that contains Japanese and Korean characters in UTF-8, it works fine: http://village.infoweb.ne.jp/~katakai/mozilla/ja_ko_utf.html http://village.infoweb.ne.jp/~katakai/mozilla/ja_ko_utf.jpg

I use Mozilla primarily on Solaris in the Japanese locale, but I'm seeing the same result on Linux. On Windows, the Hangul characters on UTF-8_all.html are displayed properly. I'm not sure what the difference is between UTF-8_all.html and ja_ko_utf.html.
Have found the difference: Mozilla doesn't support all of the Hangul included in Unicode (more than 11,000); it only supports ksc5601.1987-0, which contains only 2,350 precomposed Hangul. Since this problem occurs only on Linux and Solaris, I think it is a showstopper, but it's better to ask a native Korean speaker whether this is acceptable or not. BTW, Solaris UTF-8 locales, including ko_KR.UTF-8, support all of the more than 11,000 Korean Hangul.
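The coverage gap described above can be checked with a short script. This is only a sketch: it assumes that Python's built-in `euc_kr` codec assigns a plain two-byte code exactly to the characters in the KS C 5601 table, and the helper name `in_ksc5601` is made up for illustration. It has nothing to do with Mozilla's own converters.

```python
# Sketch: count the precomposed Hangul syllables that have a plain
# two-byte KS C 5601 code, using Python's euc_kr codec as the table.
FIRST, LAST = 0xAC00, 0xD7A3  # Unicode Hangul Syllables block


def in_ksc5601(ch: str) -> bool:
    """True if ch encodes to a single two-byte EUC-KR code.

    Anything else (an encoding error, or a longer byte sequence)
    is treated as "not in the KS C 5601 table".
    """
    try:
        return len(ch.encode("euc_kr")) == 2
    except UnicodeEncodeError:
        return False


syllables = [chr(cp) for cp in range(FIRST, LAST + 1)]
covered = sum(in_ksc5601(ch) for ch in syllables)
print(len(syllables), covered, len(syllables) - covered)
```

The total is 19 initial consonants x 21 vowels x 28 finals (including "no final") = 11,172 syllables. If the codec's table matches KS C 5601, the covered count should be the 2,350 quoted above.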
Brian@Sun, can you ask someone in MPK whether this is acceptable or not? How about Ienup? But please note this happens in UTF-8, not native EUC-KR. If it is not popular, I don't think we should escalate this.
Adding keywords intl and nsbeta1.
Keywords: intl, nsbeta1
The pages http://village.infoweb.ne.jp/~katakai/mozilla/UTF-8_all.html and http://village.infoweb.ne.jp/~katakai/mozilla/ja_ko_utf.html are not identical. Although both pages are encoded in UTF-8, http://village.infoweb.ne.jp/~katakai/mozilla/ja_ko_utf.html contains only Hangul characters that can be encoded in KSC 5601. On the other hand, http://village.infoweb.ne.jp/~katakai/mozilla/UTF-8_all.html has every possible character in Unicode. Can you display http://village.infoweb.ne.jp/~katakai/mozilla/UTF-8_all.html in your xterm under your Korean UTF-8 locale? If so, how? Which font are you using? I don't think you can use a KSC 5601 font, since those fonts do not encode these glyphs. The Windows version can display them because the Windows TrueType fonts encode these glyphs.
Have added a snapshot which shows the Hangul characters in the zh_CN.UTF-8 locale. BTW, all of the UTF-8 locales in Solaris 2.6/7/8/9 support all of the 11,172 Hangul characters. The fonts we use are *ksc5601.1992-3; both TrueType and bitmap fonts have been available since Solaris 2.6.
Let me begin by saying that there's *nothing* wrong with your screenshot. It's absolutely normal. Now here come the details.

The rendering of a web page in Mozilla is completely independent of the locale (this has been the case since Netscape 3.x!). That is, as far as rendering of the web page is concerned, the behavior should be the same under whatever locale it's run in (ll_CC.utf8, POSIX, ll_CC.eee), provided that an identical set of fonts is available in all cases (i.e. the output of 'xlsfonts' is identical regardless of the locale). If you're in doubt, try it in any locale and you'll get exactly the same result. What does depend on the current locale is the input method and the interaction with other clients (e.g. the window manager) via ICCCM.

What Mozilla is supposed to do, and has been doing, when it cannot find a glyph for a given precomposed Hangul syllable is to convert it into one anchor character (0xa4d4 in EUC-KR) followed by the three Hangul jamos (alphabets) making up the syllable (ref. intl/uconv/ucvko/nsUnicodeToEUCKR.cpp, intl/uconv/src/uscan.c). A quick test shows that it's behaving as expected:

A. When the fonts for Korean are set to those of KS C 5601 GL encoding, 2350 syllables are rendered as precomposed and 8822 syllables (for which glyphs are not available in ksc5601.1987-0 encoded fonts) are rendered as a 4-character sequence (one hollow box for 0xa4d4 followed by glyphs for 3 Hangul alphabets). Encoding the 8822 syllables using 0xa4d4 (in EUC-KR) and three jamos (alphabets) is specified in KS X 1001:1997 (KS C 5601-1992) annotation 3.3. Other than Mozilla, Hanterm (Korean xterm) is the only program I'm aware of that implements this. Microsoft should have done this instead of introducing CP949.

B. When I switch the fonts for Korean to 'Johab' (as used by Hanterm), Mozilla composes glyphs for all the syllables on the fly (see intl/uconv/ucvko/nsUnicodeToX11Johab.cpp).

C. When I set the fonts for Korean to those encoded in iso10646-1 with glyphs for all 11,172 syllables, Mozilla renders all of them using the glyphs available from the iso10646-1 encoded fonts. On Linux, with X-TT installed (which comes with XFree86 4.0 or can be installed along with XFree86 3.3.x), any Korean TrueType font can be presented to clients in either ksc5601.1987-0 encoding or iso10646-1 encoding.

Anyway, what you're experiencing is absolutely normal!! :-) It's not a bug but a *feature* (see case B above). Instead of rendering Hangul syllables that are not available in the fonts on the system (that is, when the only fonts available on the system are ksc5601.1987-0 or ksc5601.1987-1) as '?' (or a hollow box, or whatever Mozilla falls back to in such a case), Mozilla (thanks to Frank) falls back to rendering them as an enumerated sequence of Hangul alphabets (jamos). That way, Mozilla conveys the information (there's no loss of information) instead of making users wonder what syllables they're missing (as they would if these were rendered as '?'). One desirable improvement is to remove the leading 0xa4d4 for rendering purposes, while keeping 0xa4d4 intact for the exchange of information (when it's put on the wire).

You may wonder why Mozilla doesn't make use of ksc5601.1992-3 encoded fonts with glyphs for all 11,172 syllables. Well, that's because Mozilla doesn't know anything about that encoding. I wrote about it a long time ago in the I18N newsgroup when Frank asked me whether it was necessary to implement a JOHAB converter. I replied that it might be necessary because Sun uses the encoding name ksc5601.1992-3 to mean a 'SANG-YONG JOHAB' encoded font. The 'SANG-YONG JOHAB' encoding is specified in annex 3 of KS X 1001:1997 (KS C 5601-1992). Please note that it is different from the X11-Johab used by Hanterm (with the encoding name 'johab[sh]-1'; those fonts are available at <http://elf.kaist.ac.kr/hanterm>; see B above).

I don't know why Sun came up with the idea of offering X11 fonts with the full repertoire of 11,172 Hangul syllables in this encoding instead of just encoding those fonts in ISO10646-1. It should be trivial to recode them into ISO10646-1. Anyway, Mozilla needs a 'SANG-YONG JOHAB' <-> Unicode converter to make use of Sun's ksc5601.1992-3 fonts. The converter should be very easy to write. For Hangul syllables the conversion is algorithmic, and for the rest (Hanja and symbols) shifted-translated tables for EUC-KR can be used (refer to my implementation of JOHAB <-> UCS-4 conversion for iconv() in glibc 2.1.x or later, or the LGPLed libiconv by Bruno Haible). Perhaps it's time to overhaul the intl/uconv/ucvko directory in a way similar to what was done for ucvcn.

Jungshik
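The fallback described in the comment above (0xa4d4 followed by three jamos) can be sketched as follows. This is an illustration only, not Mozilla's code from nsUnicodeToEUCKR.cpp. The names `decompose` and `makeup_sequence` are hypothetical, the decomposition uses the standard Unicode index arithmetic, and it is an assumption here that a syllable with no final consonant uses the filler (0xa4d4) in the third jamo position.

```python
# Sketch: build the EUC-KR "anchor + three jamos" byte sequence for a
# precomposed Hangul syllable. Jamos are the compatibility jamo found in
# row 4 of KS C 5601, so each one encodes to two bytes in EUC-KR.
FILLER = "\u3164"  # HANGUL FILLER, 0xA4D4 in EUC-KR

CONSONANTS = range(0x3131, 0x314F)  # 30 compatibility consonants
# Consonant clusters only occur as finals, never as initials.
CLUSTERS = {0x3133, 0x3135, 0x3136, *range(0x313A, 0x3141), 0x3144}
# These three doubled consonants only occur as initials, never as finals.
INITIAL_ONLY = {0x3138, 0x3143, 0x3149}

LEADS = [chr(c) for c in CONSONANTS if c not in CLUSTERS]        # 19
VOWELS = [chr(c) for c in range(0x314F, 0x3164)]                 # 21
TAILS = [FILLER] + [chr(c) for c in CONSONANTS if c not in INITIAL_ONLY]  # 28


def decompose(syllable: str) -> tuple[str, str, str]:
    """Split a precomposed syllable into (initial, vowel, final) jamo."""
    index = ord(syllable) - 0xAC00
    if not 0 <= index < 19 * 21 * 28:
        raise ValueError("not a precomposed Hangul syllable")
    lead, rest = divmod(index, 21 * 28)
    vowel, tail = divmod(rest, 28)
    return LEADS[lead], VOWELS[vowel], TAILS[tail]


def makeup_sequence(syllable: str) -> bytes:
    """Return the 8-byte sequence: filler, initial, vowel, final."""
    return (FILLER + "".join(decompose(syllable))).encode("euc_kr")


# U+BDC1 is one of the 8822 syllables without a KS C 5601 code.
print(makeup_sequence("\ubdc1").hex())  # a4d4a4b2a4cea4aa
```

Removing the leading anchor for display, as suggested in the comment, would amount to rendering only the last three two-byte codes of this sequence.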
I suggest that the summary line be changed to 'JOHAB <-> Unicode converter has to be implemented' for the Korean locale to support ksc5601.1992-3 encoded fonts in Solaris. In addition, I believe this should be assigned the priority of 'enhancement'. As for rewriting the existing converters and adding new ones for the Korean locale (JOHAB <-> Unicode, Windows-949 <-> Unicode, supporting the 8-byte sequence mentioned in my previous note in the EUC-KR -> Unicode direction, removing the leading 0xa4d4 for rendering purposes in X11), a la those for the simplified Chinese locales, I might be able to do it sometime in May. I think I now know what's going on in intl/uconv/ucv* well enough. Jungshik
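To illustrate why the Hangul part of a JOHAB <-> Unicode converter is algorithmic, here is a minimal sketch of the Unicode -> Johab direction for precomposed syllables. It is not the converter proposed in this bug: the function name `syllable_to_johab` is hypothetical, the 5-bit field tables are written from the Johab layout as I understand it, and whether Sun's ksc5601.1992-3 fonts index glyphs by exactly these codes is an assumption. Hanja and symbols, which need tables, are left out.

```python
# Sketch: a Johab Hangul code is 16 bits: a leading 1 bit followed by
# three 5-bit fields for the initial consonant, the vowel and the final.
INITIAL = list(range(2, 21))                     # 19 initials -> codes 2..20
MEDIAL = [3, 4, 5, 6, 7,                         # 21 vowels; the gaps in the
          10, 11, 12, 13, 14, 15,                # numbering are part of the
          18, 19, 20, 21, 22, 23,                # encoding
          26, 27, 28, 29]
FINAL = [1, *range(2, 18), *range(19, 30)]       # 1 = no final; 18 is unused


def syllable_to_johab(syllable: str) -> bytes:
    """Convert one precomposed Hangul syllable to its 2-byte Johab code."""
    index = ord(syllable) - 0xAC00
    if not 0 <= index < 19 * 21 * 28:
        raise ValueError("not a precomposed Hangul syllable")
    lead, rest = divmod(index, 21 * 28)
    vowel, tail = divmod(rest, 28)
    code = 0x8000 | (INITIAL[lead] << 10) | (MEDIAL[vowel] << 5) | FINAL[tail]
    return code.to_bytes(2, "big")


print(syllable_to_johab("\uac00").hex())  # 8861
```

The result can be cross-checked against Python's built-in `johab` codec, for example `"\uac00".encode("johab")`; the reverse direction unpacks the three fields and inverts the same tables.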
Hmm, I can't help wondering what the intent of creating the last two attachments is. They're mostly irrelevant to this bug, which is NOT a bug BUT a feature, as I wrote yesterday. Moreover, HANGUL.TXT (presumably copied from ftp.unicode.com) has very MISLEADING information. It mixes up the coded character set and the character set encoding scheme (see <http://pantheon.yale.edu/~jshin/faq/qa8.html> and the references therein; also refer to 'the' excellent reference 'CJKV Information Processing' by Ken Lunde). When you're talking about KS C 5601-1987, you'd better talk in terms of the row and column of the character set table given in the standard instead of the code point of a particular encoding (e.g. EUC-KR). HANGUL.TXT was made by Microsoft people who are ignorant of this distinction. On top of that, it's simply wrong to refer to KS C 5601-1992 as JOHAB. Basically, KS C 5601-1987 and KS C 5601-1992 (later renamed KS X 1001:1997) define exactly the same 94x94 coded character set (with 2350 precomposed Hangul syllables, 4xxx Hanjas and 1xxx symbols). JOHAB is mentioned ONLY in the annex of KS C 5601-1992 as a supplementary encoding. Jungshik Shin
Jungshik, sorry that I didn't read your comments before adding the attachments. The reason I added the first attachment was to reply to the following e-mail from Brian@Netscape; I should have read all of the e-mails before updating Bugzilla. Thanks for the information about HANGUL.TXT too. Brian@Sun

Date: Thu, 08 Mar 2001 19:14:19 -0800
From: bstell@netscape.com (Brian Stell)
X-Accept-Language: en
MIME-Version: 1.0
To: Brian.Yuan@Sun.COM
Subject: Re: [Bug 70550] Changed - Can not display Korean hangul characters properly on my test page
Content-Transfer-Encoding: 7bit

Brian, Could you make a page with just a few of the problematic characters and encoded in ksc5601? Thanks
Since this is a feature, not a bug, I'm marking this INVALID, and I've opened bug 71489: "RFE: JOHAB <-> Unicode converter for Korean locale" http://bugzilla.mozilla.org/show_bug.cgi?id=71489 If something more needs to be done, please reopen this bug.
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → INVALID
Setting QA Contact to ylong@netscape.com. Yuying, can you verify/close this bug as invalid?
QA Contact: andreasb → ylong
Mark as verified.
Status: RESOLVED → VERIFIED