Closed Bug 157602 Opened 19 years ago Closed 18 years ago

Non-ascii (in ISO-8859-1) title not displayed correctly on JA linux

Categories

(MailNews Core :: Internationalization, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 150131

People

(Reporter: jeesun, Assigned: nhottanscp)

Attachments

(7 files)

Non-ascii title not displayed correctly on JA linux
Platform: JA linux 7.1

Steps:
1. Send yourself a mail that has accented characters (I used êtes-vous sû) in the
subject and body. Choose ISO-8859-1 encoding, then receive the message. (Or subscribe
to news.mozilla.org/netscape.public.mozilla.qa.i18n and look at any message with
accented letters.)

2. Notice that the title is not displayed correctly.
QA Contact: marina → jeesun
8-bit Latin-1 characters are not displayed under the JA locale; this is a known problem.
I'll search for a dup.
dup of bug 150131?
On Linux, the address book card has the same problem. For entries that contain
Latin-1 chars, the window title doesn't display the name correctly.
Jungshik, do you think this is a dup of bug 150131?
The first case (a French mail subject not rendered correctly
on the window title bar under non-Latin-1 locale) is
for sure a dup. of 150131.

As for the second screenshot (attachment 91854), it seems like
there's another issue. If this were also a dup of bug 150131,
letters with accents should be rendered as ?'s instead of Kanji.
However, they're not, which means
that a translation between encodings is missing somewhere
in the implementation of the address book code.

To be sure, I have to know how the French name in the
screenshot was entered in the first place. Jeesun,
could you tell me how you entered it? 
Hmm, something is wrong with either my copies of Mozilla
(my own build and 1.1alpha) or my window manager. An address book
card window does not have a window title bar, so I can't
test it on my machine. I'll reboot to Win2k and test it there.

In the meantime, I tried to make some sense out of the screenshot.
The name used in the screenshot
(décelées Après) is represented by 

64 e9 63 65 6c e9 65 73 20 41 70 72 e8 73  in ISO-8859-1

and 

64 c3 a9 63 65 6c c3 a9 65 73 20 41 70 72 c3  a8 73  in UTF-8
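
The two byte sequences above can be reproduced with a short sketch. Python is used here purely for illustration and is not part of the original report; the helper name hex_bytes is hypothetical.

```python
# Encode the name from the screenshot and print each byte in hex.
name = "d\u00e9cel\u00e9es Apr\u00e8s"  # "décelées Après"


def hex_bytes(text, encoding):
    """Return the encoded bytes as space-separated two-digit hex."""
    return " ".join(f"{b:02x}" for b in text.encode(encoding))


latin1_hex = hex_bytes(name, "iso-8859-1")
utf8_hex = hex_bytes(name, "utf-8")

print(latin1_hex)
print(utf8_hex)
```

Each accented letter is a single octet in ISO-8859-1 and a two-octet sequence in UTF-8, which is why the second listing is three bytes longer.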


I've just checked the abook.mab file and found that it stores strings in
'a kind of UTF-8'. Therefore, a translation step is missing somewhere
between abook.mab and the window title bar.

Three Kanjis in the shot are (set your encoding to UTF-8 to read the
following)
  
             EUC-JP    SJIS  
  宴 U+5BB4  0xB1E3     0x8983
  怨 U+6028  0xB1E5     0x8985
  回 U+56DE  0xB2F3     0x89F1
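
The code points and byte values in this table can be cross-checked with Python's built-in euc_jp and shift_jis codecs. This is only an illustrative sketch; it says nothing about how Mozilla's own encoders are implemented.

```python
# The three Kanji seen in the screenshot: 宴, 怨, 回.
kanji = ["\u5bb4", "\u6028", "\u56de"]


def codes(ch):
    """Return (code point, EUC-JP hex, Shift_JIS hex) for one character."""
    return (
        f"U+{ord(ch):04X}",
        ch.encode("euc_jp").hex().upper(),
        ch.encode("shift_jis").hex().upper(),
    )


table = [codes(ch) for ch in kanji]
for row in table:
    print(*row)
```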

Since I don't know how EUC-JP(or JIS X 0208) encoder is
implemented in Mozilla, I can't make much sense out of this.


Just in case, Jeesun, could you show me the values of
$LC_* and $LANG?

  $ env | egrep '(LC_|LANG)'
  
would do it in a Bourne-like shell.
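
For anyone without a Bourne-like shell at hand, roughly the same filter can be sketched in Python. The function name locale_settings and the sample dictionary are hypothetical; the sample values mirror the ones reported later in this bug.

```python
import os


def locale_settings(environ):
    """Keep entries whose NAME=value line contains 'LC_' or 'LANG',
    which is what the egrep pattern above matches."""
    return {
        name: value
        for name, value in environ.items()
        if "LC_" in f"{name}={value}" or "LANG" in f"{name}={value}"
    }


# Hypothetical sample environment for illustration.
sample = {"LANG": "ja_JP.eucJP", "GDM_LANG": "ja_JP", "HOME": "/home/user"}
found = locale_settings(sample)
print(found)

# The same filter applied to the real environment of this process.
print(locale_settings(dict(os.environ)))
```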

I'd also like to know what window manager you're using and
what you have in /etc/sysconfig/i18n (if it's RedHat
or Mandrake) and ~/.i18n.

I conducted some experiments under Win2k (both KO and EN locales)
and Linux (both ko_KR.UTF-8 and ko_KR.eucKR locales). Judging from
the results of these experiments, this bug is almost certainly
a dup of bug 150131. I'm not yet certain why some Kanji
characters appeared in attachment 91854 instead of ?'s.
However, my test results under both Win2k and Linux
(in the latter case, I used 'xprop' to confirm that the
WM_NAME and _NET_WM_NAME properties are set as they should
be for the addressbook window and the mail/news display window) clearly
indicate that the two cases I thought of as separate in comment #6 and comment #7
are not separate.

Before resolving this bug as a dup, I'd like to see the result
of 'xprop' on the window in attachment 91854.
Jeesun, after opening up that window in Mozilla, you can
run 'xprop' from an xterm, move the cross-mouse-pointer
over the window and press the left button. In the xterm, 
you'll get a screenful of output like the following:

-----------
snip...
_NET_WM_NAME(UTF8_STRING) = 0x70, 0xc3, 0xbc, 0x6b, 0xc3, 0xbc, 0x20, 0x2d,
0x20, 0x6e, 0x65, 0x74, 0x73, 0x63, 0x61, 0x70, 0x65, 0x2e, 0x70, 0x75, 0x62,
0x6c, 0x69, 0x63, 0x2e, 0x6d, 0x6f, 0x7a, 0x69, 0x6c, 0x6c, 0x61, 0x2e, 0x71,
0x61, 0x2e, 0x69, 0x31, 0x38, 0x6e, 0x20, 0x6f, 0x6e, 0x20, 0x6e, 0x65, 0x77,
0x73, 0x2e, 0x6d, 0x6f, 0x7a, 0x69, 0x6c, 0x6c, 0x61, 0x2e, 0x6f, 0x72, 0x67,
0x20, 0x2d, 0x20, 0x4d, 0x6f, 0x7a, 0x69, 0x6c, 0x6c, 0x61
snip....

WM_LOCALE_NAME(STRING) = "ko_KR.eucKR"
WM_NAME(STRING) = "p?k? - netscape.public.mozilla.qa.i18n on news.mozilla.org -
Mozilla"

snip...
--------

I'm interested in three of them - _NET_WM_NAME, WM_LOCALE_NAME and 
WM_NAME. 
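
To read the sample output above, the _NET_WM_NAME octets can be decoded as UTF-8 and then pushed through the locale's charset to see where the ?'s in WM_NAME come from. A minimal sketch, assuming Python's euc_kr codec is a fair stand-in for the ko_KR.eucKR conversion; the helper name xprop_bytes is hypothetical.

```python
# Octets copied from the _NET_WM_NAME(UTF8_STRING) sample above.
XPROP_VALUE = """
0x70, 0xc3, 0xbc, 0x6b, 0xc3, 0xbc, 0x20, 0x2d,
0x20, 0x6e, 0x65, 0x74, 0x73, 0x63, 0x61, 0x70, 0x65, 0x2e, 0x70, 0x75, 0x62,
0x6c, 0x69, 0x63, 0x2e, 0x6d, 0x6f, 0x7a, 0x69, 0x6c, 0x6c, 0x61, 0x2e, 0x71,
0x61, 0x2e, 0x69, 0x31, 0x38, 0x6e, 0x20, 0x6f, 0x6e, 0x20, 0x6e, 0x65, 0x77,
0x73, 0x2e, 0x6d, 0x6f, 0x7a, 0x69, 0x6c, 0x6c, 0x61, 0x2e, 0x6f, 0x72, 0x67,
0x20, 0x2d, 0x20, 0x4d, 0x6f, 0x7a, 0x69, 0x6c, 0x6c, 0x61
"""


def xprop_bytes(text):
    """Turn xprop's comma-separated '0xNN' list into a bytes object."""
    return bytes(int(token, 16) for token in text.replace(",", " ").split())


title = xprop_bytes(XPROP_VALUE).decode("utf-8")
print(title)

# Characters the target charset cannot represent are replaced with '?'.
fallback = title.encode("euc_kr", errors="replace").decode("euc_kr")
print(fallback)
```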


BTW, you may also try http://jshin.net/moztest/frenchname.html
and http://jshin.net/moztest/frenchname.utf8.html 
Jungshik, please notice that even the first case is showing Kanji instead of
?'s on the window's title bar
>To be sure, I have to know how the French name in the screenshot was
>entered in the first place.
I copied the letters from the http://home.netscape.com/fr site and pasted them
into the mail compose window.


>Just in case, Jeesun, could you show me the values of
>$LC_* and $LANG?
> $ env | egrep '(LC_|LANG)'
LANG=ja_JP.eucJP
GDM_LANG=ja_JP
There are no $LC env variables

>I also like to know what window manager
I don't know. How can I tell?

>you're using and what you have in /etc/sysconfig/i18n
LANG="ja_JP.eucJP"
SUPPORTED="zh_TW.euctw:zh_TW:zh:en_US:en:ja_JP.eucJP:ja_JP:ja:ko_KR.euckr:ko_KR:ko"
SYSFONT="lat0-16"
SYSFONTACM="iso01"

Attached file The result of xprop
>Before resolving this bug as a dup, I'd like to see the result of
>'xprop' on the window in attachment 91854.
See the attached text file.
Jeesun,
Thank you for testing. Can you also try 
http://jshin.net/moztest/frenchname.html (ISO-8859-1)
and http://jshin.net/moztest/frenchname.utf8.html (UTF-8)
to see whether the window title bar has the same problem?
I think it does.

As for the result of xprop, I'm sorry, what I wrote in comment #8
may have been confusing. I'd like you to run 'xprop' over
a _real_ *Mozilla* window with Kanji in the titlebar
(instead of an image display program showing the screenshot).
Sorry I misunderstood your comment. Here's a new xprop result
> I'd like you run 'xprop' over
> a _real_ *Mozilla* window with Kanji's in the titlebar

 I ran 'xprop WM_NAME 8x' and 'xprop _NET_WM_NAME 8x'
over a Mozilla window displaying
http://jshin.net/moztest/frenchname.html under both ja_JP.eucJP
locale and ko_KR.eucKR locale. The result is interesting although
I don't think there's anything Mozilla is doing wrong here.

Under both locales, Mozilla sets _NET_WM_NAME (UTF8_STRING)
correctly (i.e. the UTF-8 string of the title of the page
above with French name). This is thanks to a patch
committed last July for bug 9449.

In case of WM_NAME, Mozilla sets WM_NAME(STRING) to
'd?cel?es Apr?s' under ko_KR.eucKR locale. I expected
the same under ja_JP.eucJP.
However, under ja_JP.eucJP locale, for some unknown reason,
it sets WM_NAME(COMPOUND_TEXT) to the following:

WM_NAME(COMPOUND_TEXT) = "d\033$(B\017+1c\033(Bel\033$(B\017+1e\033(Bs
Apr\033$(B\017+2s

where '\033' denotes ESC (0x1b, decimal 27) and '\017' denotes Shift-In (0x0f).
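
For reference, the octal escapes printed by xprop map to control characters as sketched below. The second half is only an arithmetic observation, offered as an assumption and not as something established in this bug: clearing the high bit of the EUC-JP octets 0x8f 0xab 0xb1 (the sequence quoted later in this report for U+00E9) yields exactly the '\017+1' seen in the string above.

```python
# Octal escapes as xprop prints them.
ESC = b"\033"  # octal 033 == 0x1b == decimal 27
SI = b"\017"   # octal 017 == 0x0f (Shift-In)

print(ESC.hex(), SI.hex())


def strip_high_bit(data):
    """Clear bit 7 of every octet (hypothetical helper for illustration)."""
    return bytes(b & 0x7F for b in data)


# EUC-JP octets for U+00E9 and U+00E8 as quoted in this report.
e_acute = strip_high_bit(b"\x8f\xab\xb1")
e_grave = strip_high_bit(b"\x8f\xab\xb2")
print(e_acute, e_grave)
```

Whether the '\017' in WM_NAME really comes from such a high-bit strip is unverified; the sketch only shows that the numbers line up.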


"ESC $ ( B" is not a sequence  defined in ISO-2022 although
it could well be used synonymously with "ESC $ B" for
designating JIS C 6226:1983 as G0. More mysterious is
why 'Shift-In' is there. This may have exposed a bug
in XFree86's compound_text/string handling code or
in gtk.

Anyway, it seems like there's very little Mozilla can do here
other than what we already know about this (see bug 9449
and bug 150131). As I wrote in bug 9449 and bug 150131,
we've already taken care of the window title bar
issue under Linux with _NET_WM_NAME patch.

Nonetheless, it still bothers me that
Mozilla's behavior under ko_KR.eucKR is different
from that under ja_JP.eucJP.

Naoki, can you think of any place where Japanese and
Korean are treated differently? Although it's not
likely, there's a possibility that for ja_JP.eucJP
a translation step is missing somewhere while it's
not for ko_KR.eucKR.

  I'm adding Katakai-san to CC so that he can take a look
at this.

Jeesun,
thank you for the screenshots.

It turns out that U+00E8 and U+00E9 are representable in
EUC-JP because they're covered by JIS X 0212. That's why
I got a sequence like '0x8f 0xab 0xb1' in the hexdump
below (note that ix86 is little-endian, so the two octets in each
16-bit word are reversed). '0x8f' declares
that the following two octets represent a char in JIS X 0212, and
'0xab 0xb1' is indeed U+00E9 in JIS X 0212 (invoked on GR).
(My understanding of the X11 Compound Text encoding
was wrong; there must be something more in it than
ISO 2022. I have the X11 C_T spec somewhere, but I didn't
bother to dig it up.)

(xprop was run under ja_JP.eucJP)
$ xprop | egrep '^WM_NAME' | sed 's/WM_NAME(COMPOUND_TEXT) = "//' \
   | hexdump

0000000 8f64 b1ab 335c 3334 6c65 ab8f 5cb1 3433
0000010 7335 4120 7270 ab8f 5cb2 3633 2033 202d
0000020 6f4d 697a 6c6c 2061 427b 6975 646c 4920
0000030 3a44 3220 3030 3032 3136 3031 7d38 0a22

Then, why the stray Kanji in the title bar? I think
that's because RH JA 7.1 is misconfigured.
/etc/gtk/gtkrc.ja on my RH (En) 7.1 does not
list any JIS X 0212 fonts. Adding a JIS X 0212
font there may or may not solve the problem.
Some window managers require a separate fontset
specification.

Anyway, this is a duplicate of bug 150131
(and 9449) and I suggest this be marked as such.

Marking this as a dup of bug 150131, as Jungshik suggested.

*** This bug has been marked as a duplicate of 150131 ***
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → DUPLICATE
Product: MailNews → Core
Product: Core → MailNews Core