Bug 70550: Can not display Korean hangul characters properly on my test page
Status: Closed (VERIFIED INVALID)
Opened 25 years ago; closed 25 years ago
Category: Core :: Internationalization (defect)
Reporter: masaki.katakai
Assignee: bstell
Keywords: intl
Attachments: 3 files
I'm not sure why this happens on my test page, which contains all UTF-8
characters for testing purposes.
Please visit
http://village.infoweb.ne.jp/~katakai/mozilla/UTF-8_all.html
and you will see results like
http://village.infoweb.ne.jp/~katakai/mozilla/UTF-8_all.jpg
which show that Korean Hangul cannot be displayed properly.
*However*, when I use another example that contains Japanese and
Korean characters in UTF-8, it works fine:
http://village.infoweb.ne.jp/~katakai/mozilla/ja_ko_utf.html
http://village.infoweb.ne.jp/~katakai/mozilla/ja_ko_utf.jpg
I use Mozilla primarily in a Japanese locale on Solaris, but I see
the same result on Linux. On Windows, the Hangul characters
on UTF-8_all.html are displayed properly.
I'm not sure what the difference is between UTF-8_all.html and
ja_ko_utf.html.
Comment 1•25 years ago
I have found the difference: Mozilla doesn't support all of the Hangul syllables
in Unicode (more than 11,000); it supports only ksc5601.1987-0, which contains
just 2,350 precomposed Hangul syllables.
Since this problem occurs only on Linux and Solaris, I think it is a showstopper,
but it would be better to ask a native Korean speaker whether this is acceptable.
BTW, Solaris UTF-8 locales, including ko_KR.UTF-8, support all of the roughly
11,000 Korean Hangul syllables.
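The 2,350 versus 11,000-plus gap can be checked with a short Python sketch (an illustration, not Mozilla code). It assumes Python's built-in `euc_kr` and `cp949` codecs and counts a syllable as precomposed when the codec encodes it to a single 2-byte code. If `euc_kr` follows plain KS X 1001, the first count should be 2,350, leaving 8,822 syllables without a precomposed code; Microsoft's CP949 should cover all of them.

```python
# Count how many of the 11,172 modern Hangul syllables (U+AC00..U+D7A3)
# a Python codec can encode as one 2-byte precomposed code.

SBASE = 0xAC00
SCOUNT = 19 * 21 * 28  # initials x medials x (finals + "no final") = 11,172


def precomposed_coverage(codec: str) -> tuple[int, int]:
    """Return (precomposed, missing) syllable counts for `codec`."""
    precomposed = 0
    for cp in range(SBASE, SBASE + SCOUNT):
        try:
            encoded = chr(cp).encode(codec)
        except UnicodeEncodeError:
            continue  # the codec has no code for this syllable
        if len(encoded) == 2:
            precomposed += 1
    return precomposed, SCOUNT - precomposed


if __name__ == "__main__":
    for name in ("euc_kr", "cp949"):
        have, missing = precomposed_coverage(name)
        print(f"{name}: {have} precomposed, {missing} without a 2-byte code")
```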
Comment 2•25 years ago (Reporter)
Brian@Sun,
Can you ask someone in MPK whether this is acceptable or
not? How about Ienup? But please note that this happens in UTF-8,
not in native EUC-KR. If it is not popular, I don't think we
should escalate this.
Comment 4•25 years ago
The pages http://village.infoweb.ne.jp/~katakai/mozilla/UTF-8_all.html and
http://village.infoweb.ne.jp/~katakai/mozilla/ja_ko_utf.html are not identical.
Although both pages are encoded in UTF-8,
http://village.infoweb.ne.jp/~katakai/mozilla/ja_ko_utf.html contains only
Hangul characters that can be encoded in KSC 5601. On the other hand,
http://village.infoweb.ne.jp/~katakai/mozilla/UTF-8_all.html contains every
possible character in Unicode.
Can you display http://village.infoweb.ne.jp/~katakai/mozilla/UTF-8_all.html in
your xterm under your Korean UTF-8 locale? If so, how? Which font are you using?
I don't think you can use a KSC 5601 font, since such fonts do not encode those
glyphs. The Windows version can display them because the Windows TrueType
fonts encode these glyphs.
Comment 5•25 years ago
Comment 6•25 years ago
I have added a snapshot that shows the Hangul characters in the zh_CN.UTF-8
locale. BTW, all of the UTF-8 locales in Solaris 2.6/7/8/9 support all 11,172
Hangul characters; the fonts we use are *ksc5601.1992-3, and both TrueType and
bitmap fonts have been available since Solaris 2.6.
Comment 7•25 years ago (Assignee)
Adding to cc: blee@netscape.com, jshin@pantheon.yale.edu
Comment 8•25 years ago
Let me begin by saying that there's *nothing* wrong with your screenshot.
It's absolutely normal. Now here are the details.
The rendering of web pages in Mozilla is completely independent
of the locale (this has been the case since Netscape 3.x!!). That is, as far
as web page rendering is concerned, the behavior should be the same
under whatever locale Mozilla runs (ll_CC.utf8, POSIX, ll_CC.eee),
provided that the identical set of fonts is available in all cases
(i.e. the output of 'xlsfonts' is identical regardless of the locale).
If you're in doubt, try it in any locale and you'll get exactly
the same result (as far as web page rendering is concerned). What
depends on the current locale is the input method and the interaction
with other clients (e.g. the window manager) via ICCCM.
What Mozilla is supposed to do, and has been doing, when it cannot
find glyphs for a given (precomposed) Hangul syllable is to convert
it into the Hangul filler character (0xa4d4 in EUC-KR) followed by the
three Hangul jamos (alphabets) making up the syllable.
(ref. intl/uconv/ucvko/nsUnicodeToEUCKR.cpp, intl/uconv/src/uscan.c)
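A minimal Python sketch of the fallback just described, for illustration only (it is not the logic of nsUnicodeToEUCKR.cpp). It assumes the standard Unicode Hangul decomposition (syllable index = (L × 21 + V) × 28 + T) and that the jamo are the KS X 1001 row-4 codes 0xA4A1..0xA4D3, with 0xA4D4 as the filler. Writing an absent final consonant as a second filler is an additional assumption about the sequence layout.

```python
# Build the 8-byte EUC-KR "make-up" sequence for one precomposed syllable:
# filler (0xA4D4) + initial + medial + final, each a 2-byte row-4 code.

SBASE, SCOUNT, NCOUNT, TCOUNT = 0xAC00, 11172, 588, 28
FILLER = b"\xa4\xd4"  # HANGUL FILLER in EUC-KR

# Compatibility jamo code points (U+31xx) in Unicode L order and T order.
CHOSEONG = [0x3131, 0x3132, 0x3134, 0x3137, 0x3138, 0x3139, 0x3141,
            0x3142, 0x3143, 0x3145, 0x3146, 0x3147, 0x3148, 0x3149,
            0x314A, 0x314B, 0x314C, 0x314D, 0x314E]
JONGSEONG = [0x3131, 0x3132, 0x3133, 0x3134, 0x3135, 0x3136, 0x3137,
             0x3139, 0x313A, 0x313B, 0x313C, 0x313D, 0x313E, 0x313F,
             0x3140, 0x3141, 0x3142, 0x3144, 0x3145, 0x3146, 0x3147,
             0x3148, 0x314A, 0x314B, 0x314C, 0x314D, 0x314E]


def jamo_to_euckr(cp: int) -> bytes:
    """Map compatibility jamo U+3131..U+3163 to EUC-KR 0xA4A1..0xA4D3."""
    if not 0x3131 <= cp <= 0x3163:
        raise ValueError(f"not a KS X 1001 jamo: U+{cp:04X}")
    return bytes([0xA4, 0xA1 + (cp - 0x3131)])


def makeup_sequence(syllable: str) -> bytes:
    """Return the 8-byte make-up sequence for one Hangul syllable."""
    s_index = ord(syllable) - SBASE
    if not 0 <= s_index < SCOUNT:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    l, rest = divmod(s_index, NCOUNT)
    v, t = divmod(rest, TCOUNT)
    initial = jamo_to_euckr(CHOSEONG[l])
    medial = jamo_to_euckr(0x314F + v)  # the 21 vowels are contiguous
    # Assumption: no final consonant (t == 0) is written as another filler.
    final = FILLER if t == 0 else jamo_to_euckr(JONGSEONG[t - 1])
    return FILLER + initial + medial + final


if __name__ == "__main__":
    print(makeup_sequence("\ub620").hex())  # U+B620, not in KS X 1001
```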
A quick test shows that it's behaving as expected:
A. When the fonts for Korean are set to those in KS C 5601 GL encoding,
2,350 syllables are rendered precomposed, and 8,822 syllables
(for which glyphs are not available in ksc5601.1987-0 encoded fonts)
are rendered as a 4-character sequence (one hollow box for 0xa4d4
followed by glyphs for 3 Hangul alphabets). Encoding these
8,822 syllables using 0xa4d4 (in EUC-KR) and three jamos (alphabets)
is specified in KS X 1001:1997 (KS C 5601-1992) annotation
3.3. Other than Mozilla, Hanterm (a Korean xterm) is the only
program I'm aware of that implements this. Microsoft
should have done this instead of introducing CP949.
B. When I switch the fonts for Korean to 'Johab' (as used by Hanterm),
Mozilla composes glyphs for all the syllables on the fly
(see intl/uconv/ucvko/nsUnicodeToX11Johab.cpp).
C. When I set the fonts for Korean to those encoded in iso10646-1
with glyphs for all 11,172 syllables, Mozilla
renders all of them using the glyphs available from the
iso10646-1 encoded fonts. On Linux, with X-TT
installed (it comes with XFree86 4.0 and can be installed
alongside XFree86 3.3.x), any Korean TrueType font can be presented
to clients in either ksc5601.1987-0 or iso10646-1 encoding.
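Cases A to C amount to a choice keyed on the X font's charset registry and encoding, the last two fields of an XLFD font name. The Python sketch below is hypothetical: the strategy labels and sample font name are made up for illustration, it only accepts full 14-field XLFD names, and the ksc5601.1992-3 branch reflects that Mozilla has no converter for Sun's fonts (comment 6).

```python
# Hypothetical helper: choose how Hangul would be drawn, given an X font name.

def xlfd_charset(font_name: str) -> str:
    """Return 'REGISTRY-ENCODING', the last two fields of a full XLFD name."""
    parts = font_name.split("-")
    # A full XLFD name starts with '-' and has 14 fields, i.e. 15 parts here.
    if len(parts) != 15 or parts[0] != "":
        raise ValueError(f"not a full XLFD name: {font_name!r}")
    return f"{parts[13]}-{parts[14]}".lower()


def hangul_strategy(font_name: str) -> str:
    charset = xlfd_charset(font_name)
    if charset in ("ksc5601.1987-0", "ksc5601.1987-1"):
        return "2350-precomposed+jamo-fallback"  # case A
    if charset in ("johab-1", "johabsh-1"):
        return "compose-on-the-fly"              # case B (X11-Johab)
    if charset == "iso10646-1":
        return "direct-unicode"                  # case C
    if charset == "ksc5601.1992-3":
        return "unsupported-encoding"            # Sun's fonts: no converter
    return "not-a-korean-font"


if __name__ == "__main__":
    name = "-misc-fixed-medium-r-normal--16-150-75-75-c-160-ksc5601.1987-0"
    print(hangul_strategy(name))
```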
Anyway, what you're experiencing is absolutely normal!! :-). It's
not a bug but a *feature* (see case A above). Instead of rendering
Hangul syllables that are not available in the system's fonts (that is,
when the only fonts available on the system are ksc5601.1987-0 or
ksc5601.1987-1) as '?' (or a hollow box, or whatever Mozilla falls back to
in such a case), Mozilla (thanks to Frank) falls back to rendering them as an
enumerated sequence of Hangul alphabets (jamos). That way, Mozilla conveys
the information without loss, instead of making users wonder which
syllables they're missing (as they would if the syllables were rendered as '?').
One desired improvement is to remove the leading 0xa4d4 for
rendering purposes, while keeping 0xa4d4 intact for the exchange
of information (when the text is put on the wire).
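The suggested improvement can be sketched as a render-time pass over EUC-KR bytes that drops the filler opening each make-up sequence and leaves the stored or transmitted bytes untouched. The function names are hypothetical; the sketch assumes well-formed EUC-KR and treats a make-up sequence as the filler followed by three row-4 (0xA4xx) codes. A filler standing for an absent final consonant is kept.

```python
FILLER = b"\xa4\xd4"  # HANGUL FILLER, 0xA4D4 in EUC-KR


def is_row4_code(pair: bytes) -> bool:
    """True for a 2-byte KS X 1001 row-4 code (0xA4A1..0xA4FE)."""
    return len(pair) == 2 and pair[0] == 0xA4 and 0xA1 <= pair[1] <= 0xFE


def strip_leading_fillers(euckr: bytes) -> bytes:
    """Return a display copy with the filler opening each make-up sequence removed."""
    out = bytearray()
    i = 0
    while i < len(euckr):
        if euckr[i] < 0x80:  # ASCII byte: copy as-is
            out.append(euckr[i])
            i += 1
        elif euckr[i:i + 2] == FILLER and all(
            is_row4_code(euckr[j:j + 2]) for j in (i + 2, i + 4, i + 6)
        ):
            out += euckr[i + 2:i + 8]  # keep initial, medial and final
            i += 8
        else:
            out += euckr[i:i + 2]      # any other 2-byte code, unchanged
            i += 2
    return bytes(out)
```

Only the copy handed to the renderer changes; the original bytes, filler included, are what would go on the wire.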
You may wonder why Mozilla doesn't make use of ksc5601.1992-3
encoded fonts with the glyphs for all 11,172 syllables. Well,
that's because Mozilla doesn't know anything about that encoding.
I wrote about it a long time ago in the I18N newsgroup when Frank asked
me whether it was necessary to implement a JOHAB converter. I replied
that it might be necessary, because Sun uses the encoding name
ksc5601.1992-3 to mean a 'SANG-YONG JOHAB' encoded font. The 'SANG-YONG JOHAB'
encoding is specified in annex 3 of KS X 1001:1997 (KS C 5601-1992).
Please note that it is different from the X11-Johab used by Hanterm (with
the encoding name 'johab[sh]-1'; those fonts are available at
<http://elf.kaist.ac.kr/hanterm>; see B above). I don't know why Sun
came up with the idea of offering X11 fonts with the full repertoire of
11,172 Hangul syllables in this encoding instead of just making those
fonts ISO10646-1 encoded. It should be trivial to recode them
into ISO10646-1 encoding.
Anyway, Mozilla needs a 'SANG-YONG JOHAB' <-> Unicode converter
to make use of Sun's ksc5601.1992-3 fonts. The converter should be very
easy to write. For Hangul syllables the conversion is algorithmic, and for
the rest (Hanja and symbols), shifted translation tables for EUC-KR can be
used (refer to my implementation of JOHAB <-> UCS-4 conversion for iconv()
in glibc 2.1.x or later, or the LGPLed libiconv by Bruno Haible). Perhaps
it's time to overhaul the intl/uconv/ucvko directory along the lines
of ucvcn.
Jungshik
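To illustrate how algorithmic the Hangul part of a JOHAB <-> Unicode converter is, here is a Python sketch (not the glibc or libiconv code). It assumes the usual Johab layout of a set top bit followed by three 5-bit fields (initial, medial, final) and the standard Johab index tables; Hanja and symbols, which need tables, are left out. Python's own `johab` codec can serve as a cross-check.

```python
# Unicode <-> Johab for the 11,172 precomposed Hangul syllables only.
SBASE, SCOUNT, NCOUNT, TCOUNT = 0xAC00, 11172, 588, 28

# 5-bit Johab values indexed by the Unicode L, V and T indices.
JOHAB_L = list(range(2, 21))                        # 19 initials: 2..20
JOHAB_V = [3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15,
           18, 19, 20, 21, 22, 23, 26, 27, 28, 29]  # 21 medials
JOHAB_T = [1] + list(range(2, 18)) + list(range(19, 30))  # 1 means "no final"


def syllable_to_johab(syllable: str) -> int:
    """Map one precomposed syllable to its 16-bit Johab code."""
    s_index = ord(syllable) - SBASE
    if not 0 <= s_index < SCOUNT:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    l, rest = divmod(s_index, NCOUNT)
    v, t = divmod(rest, TCOUNT)
    return 0x8000 | (JOHAB_L[l] << 10) | (JOHAB_V[v] << 5) | JOHAB_T[t]


def johab_to_syllable(code: int) -> str:
    """Inverse mapping; raises ValueError for codes that are not full syllables."""
    if not code & 0x8000:
        raise ValueError(f"top bit not set: {code:#06x}")
    l = JOHAB_L.index((code >> 10) & 0x1F)
    v = JOHAB_V.index((code >> 5) & 0x1F)
    t = JOHAB_T.index(code & 0x1F)
    return chr(SBASE + (l * 21 + v) * TCOUNT + t)


if __name__ == "__main__":
    code = syllable_to_johab("\uac00")
    print(hex(code), code.to_bytes(2, "big") == "\uac00".encode("johab"))
```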
Comment 9•25 years ago
I suggest that the summary line be changed to
'JOHAB <-> Unicode converter has to be implemented' for the Korean locale
to support ksc5601.1992-3 encoded fonts in Solaris.
In addition, I believe this should be assigned the priority of 'enhancement'.
As for rewriting the existing converters and adding new ones for the Korean
locale a la those for the simplified Chinese locales (JOHAB <-> Unicode,
Windows-949 <-> Unicode, supporting the 8-byte sequence mentioned in my
previous note in the EUC-KR -> Unicode direction, and removing the leading
0xa4d4 for rendering purposes in X11), I might be able to do it sometime in
May. I think I now know what's going on in intl/uconv/ucv* well enough.
Jungshik
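For the EUC-KR -> Unicode direction mentioned above, decoding one 8-byte make-up sequence back to a precomposed syllable could look like this Python sketch. The function name is hypothetical; it assumes the sequence is the filler 0xA4D4 followed by initial, medial and final jamo from KS X 1001 row 4, with a second filler meaning "no final consonant".

```python
FILLER = 0xA4D4  # HANGUL FILLER in EUC-KR

# Compatibility jamo code points (U+31xx) in Unicode L order and T order.
CHOSEONG = [0x3131, 0x3132, 0x3134, 0x3137, 0x3138, 0x3139, 0x3141,
            0x3142, 0x3143, 0x3145, 0x3146, 0x3147, 0x3148, 0x3149,
            0x314A, 0x314B, 0x314C, 0x314D, 0x314E]
JONGSEONG = [0x3131, 0x3132, 0x3133, 0x3134, 0x3135, 0x3136, 0x3137,
             0x3139, 0x313A, 0x313B, 0x313C, 0x313D, 0x313E, 0x313F,
             0x3140, 0x3141, 0x3142, 0x3144, 0x3145, 0x3146, 0x3147,
             0x3148, 0x314A, 0x314B, 0x314C, 0x314D, 0x314E]


def row4_to_jamo(code: int) -> int:
    """Row-4 code 0xA4A1..0xA4D3 -> compatibility jamo U+3131..U+3163."""
    return 0x3131 + (code & 0xFF) - 0xA1


def decode_makeup(seq: bytes) -> str:
    """Decode filler + initial + medial + final (8 bytes) to one syllable."""
    if len(seq) != 8:
        raise ValueError("a make-up sequence is exactly 8 bytes")
    codes = [int.from_bytes(seq[i:i + 2], "big") for i in range(0, 8, 2)]
    if codes[0] != FILLER or any(c >> 8 != 0xA4 for c in codes):
        raise ValueError("not a KS X 1001 make-up sequence")
    l = CHOSEONG.index(row4_to_jamo(codes[1]))  # ValueError if not an initial
    v = row4_to_jamo(codes[2]) - 0x314F         # vowels U+314F..U+3163
    if not 0 <= v < 21:
        raise ValueError("medial is not a vowel")
    t = 0 if codes[3] == FILLER else JONGSEONG.index(row4_to_jamo(codes[3])) + 1
    return chr(0xAC00 + (l * 21 + v) * 28 + t)
```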
Comment 10•25 years ago
Comment 11•25 years ago
Comment 12•25 years ago
Hmm, I can't help wondering what the intent of creating the last two
attachments was. They're mostly irrelevant to this bug, which is NOT
a bug BUT a feature, as I wrote yesterday.
Moreover, HANGUL.TXT (presumably copied from ftp.unicode.com)
contains very MISLEADING information. It mixes up the
coded character set and the character encoding scheme (see
<http://pantheon.yale.edu/~jshin/faq/qa8.html> and the references
therein; also refer to 'the' excellent reference 'CJKV Information Processing'
by Ken Lunde). When you're talking about KS C 5601-1987, you'd better
talk in terms of the row and column of the character set table given in the
standard, instead of the code point in a particular
encoding (e.g. EUC-KR). HANGUL.TXT was made by Microsoft people who
are ignorant of this distinction.
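The row/column point can be made concrete with a small Python sketch (function names are hypothetical). One KS X 1001 character has a single (row, column) position but different bytes in EUC-KR (0xA0 + row, 0xA0 + column) and in the GL form (0x20 + row, 0x20 + column) used with ksc5601.1987-0 fonts; the syllable GA (U+AC00), for instance, is row 16, column 1.

```python
# KS X 1001 is a 94x94 table: a character is identified by (row, column),
# each 1..94. Encodings differ only in how that pair becomes bytes.

def euckr_to_rowcol(code: bytes) -> tuple[int, int]:
    """EUC-KR uses the GR range: each byte is 0xA0 + row / column."""
    if len(code) != 2 or not all(0xA1 <= b <= 0xFE for b in code):
        raise ValueError(f"not a 2-byte EUC-KR code: {code!r}")
    return code[0] - 0xA0, code[1] - 0xA0


def rowcol_to_gl(row: int, col: int) -> bytes:
    """The GL form (0x21..0x7E), as used with ksc5601.1987-0 X fonts."""
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must be in 1..94")
    return bytes([0x20 + row, 0x20 + col])


if __name__ == "__main__":
    euc = "\uac00".encode("euc_kr")  # the syllable GA
    row, col = euckr_to_rowcol(euc)
    print(euc.hex(), (row, col), rowcol_to_gl(row, col).hex())
```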
On top of that, it's simply wrong to refer to KS C 5601-1992 as JOHAB.
Basically, KS C 5601-1987 and KS C 5601-1992 (later renamed
KS X 1001:1997) define exactly the same 94x94 coded character set (with
2,350 precomposed Hangul syllables, 4xxx Hanjas and 1xxx symbols). JOHAB
is mentioned ONLY in the annex of KS C 5601-1992, as a supplementary
encoding.
Jungshik Shin
Comment 13•25 years ago
Jungshik,
Sorry that I didn't read your comments before adding the attachments. The
reason I added the first attachment was to reply to the following e-mail
from Brian@Netscape; I should have read all of the e-mails before updating
Bugzilla.
Thanks for the information about HANGUL.TXT too.
Brian@Sun
Date: Thu, 08 Mar 2001 19:14:19 -0800
From: bstell@netscape.com (Brian Stell)
To: Brian.Yuan@Sun.COM
Subject: Re: [Bug 70550] Changed - Can not display Korean hangul characters properly on my test page
Brian,
Could you make a page with just a few of the problematic
characters, encoded in ksc5601?
Thanks
Comment 14•25 years ago (Assignee)
Since this is a feature, not a bug, I'm marking this INVALID, and
I've opened bug 71489:
"RFE: JOHAB <-> Unicode converter for Korean locale"
http://bugzilla.mozilla.org/show_bug.cgi?id=71489
If something more needs to be done, please reopen this bug.
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → INVALID
Comment 15•25 years ago
Setting QA Contact to ylong@netscape.com. Yuying, can you verify/close this bug
as invalid?
QA Contact: andreasb → ylong