Closed
Bug 121193
Opened 23 years ago
Closed 3 years ago
Need a way to let user specify language for unicode encoded webpages.
Categories
(Core :: Internationalization, defect)
Tracking
()
RESOLVED
INVALID
Future
People
(Reporter: u32858, Unassigned)
References
()
Details
(Keywords: intl)
Attachments
(1 file)
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.7) Gecko/20011226
BuildID: 2001122617
Messenger displays Japanese UTF-8 emails fine when using Helvetica fonts.
However, in Mozilla, when I use Babelfish to translate some words I don't know,
there is a problem with the punctuation characters "." and ",": the full stop
looks like a small circle and the comma is angled forward, not back.
The formatting displays them as if they were ', i.e. at the top of the line. I have
seen this in other programs; the problem is that Japanese can go vertically down
as well as left -> right. The other time I saw this error, the text was rotated
90 degrees to the left as well.
So, to sum up, please check that punctuation is supported in vertical mode, and that
the horizontal left -> right formatting is fixed.
Possibly this is a problem with X, but as Messenger works fine I suspect it's
Mozilla.
If you would like to test, check this.
BEGIN しく。 また、 END of Japanese; it means nothing, if you are trying to work
it out.. :)
Reproducible: Always
Steps to Reproduce:
1. Type some text with Japanese full stops and commas
2. Paste it into babelfish.altavista.com
3.
Actual Results: comma and full stop at top of line
Expected Results: comma should be at the bottom, and so should the full stop,
unless in the vertical Japanese mode; please don't stop this.
Comment 1•23 years ago
Changing component from browser-general to internationalization.
Component: Browser-General → Internationalization
Comment 3•23 years ago
I failed to see the problem on my W2K-Ja
I pasted the given Japanese text
BEGIN しく。 また、 END (note: the text between BEGIN and END)
into babelfish.altavista.com and got
BEGIN It does, the く. In addition, END
reporter: I think I need a screen shot. Can you attach the image?
ruixu: can you verify this? is this Linux only?
The comma and full stop are displayed as if they were in vertical text, but the
text is left->right, so they should rest at the bottom, on the line.
Comment 5•23 years ago
jg: thanks for the screen shot. It looks like this happens only on the Linux platform. I didn't
see the problem on my Win machine.
shanjian: I am not sure about the Linux fonts, so I am assigning this to you. Have
you seen a similar problem before? It looks familiar to me.....
Assignee: yokoyama → shanjian
Reporter:
Could you please provide us more detailed repro steps and more information
about your system environment, e.g. your Linux version and language, your
working locale, etc.? Do you still see the same problem on the latest build?
Thank you.
Comment 7•23 years ago
The glyphs for Unicode U+3002 (CJK full stop) and U+3001 (CJK comma) in Korean fonts look like
this. For Unicode-encoded web pages, we cannot determine the language just based on the
encoding. The font search list is somewhat random. If a Korean font is tried first, these
2 characters will be rendered this way.
Web page authors can specify the language through the use of the "lang" attribute. We really
need a mechanism for the end user to fill in this piece of information when necessary. But
before bug 115121 is fixed, we can't do much about it.
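To make the point above concrete, here is a minimal Python sketch. The first part only prints facts about the two code points; the second part is a toy model of a language-aware font search. The font names, the `FONTS` table, and `font_search_order` are hypothetical and are not Mozilla's actual font code.

```python
import unicodedata

# The two characters discussed in this bug.
CHARS = ["\u3002", "\u3001"]

for ch in CHARS:
    # The UTF-8 bytes identify the character, not the language of the text.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} utf-8={ch.encode('utf-8').hex()}")

# Hypothetical table of installed fonts and the language each is designed for.
FONTS = [("korean-font", "ko"), ("japanese-font", "ja"), ("chinese-font", "zh")]


def font_search_order(fonts, lang_hint=None):
    """Return font names, trying fonts that match lang_hint first.

    sorted() is stable, so without a hint the original order is kept.
    """
    ordered = sorted(fonts, key=lambda font: font[1] != lang_hint)
    return [name for name, _lang in ordered]


print(font_search_order(FONTS))        # no hint: the Korean font is tried first
print(font_search_order(FONTS, "ja"))  # with a hint: the Japanese font is tried first
```

In this toy model the language hint could come from a "lang" attribute or from a user setting, which is the mechanism this bug asks for.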
Status: UNCONFIRMED → ASSIGNED
Depends on: 115121
Ever confirmed: true
Summary: UTF-8 Japanese incorrectly displayed → Need a way to let user specify language for unicode encoded webpages.
Target Milestone: --- → Future
Well, it looks like we are waiting on
http://bugzilla.mozilla.org/show_bug.cgi?id=115121
This is my current system setup. It has changed since I reported the bug, but the
problem is still the same, so I will list the differences. I wanted to get kinput2
working, so I made some modifications.
$ locale
LANG=ja_JP
LC_CTYPE=ja
LC_NUMERIC=en_GB
LC_TIME=en_GB
LC_COLLATE=en_GB
LC_MONETARY=en_GB
LC_MESSAGES=en_GB
LC_PAPER="ja_JP"
LC_NAME="ja_JP"
LC_ADDRESS="ja_JP"
LC_TELEPHONE="ja_JP"
LC_MEASUREMENT="ja_JP"
LC_IDENTIFICATION="ja_JP"
LC_ALL=
What was different before was LC_CTYPE=en_GB:en, if I remember correctly.
Please note this bug is only present in Mozilla, not Messenger, which must be
doing something correctly; perhaps you can compare the UTF-8 handling?
My OS is Linux Mandrake 8.1, English install.
Other things I have modified when trying to get Japanese to work are
http://www.mandrakeforum.org/article.php?sid=1420&lang=en
and an English guide for Japanese as a second language
http://www.math.wisc.edu/~stefanss/japanese/index.html
It still does not work after the modifications.
The Unicode fonts tested in Mozilla are wadalab-gothic-jisx0208, 1983-0
and helvetica; it still displays incorrectly though.
JG
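For reference, the locale output above can be reduced to a single language with the usual POSIX precedence (LC_ALL, then LC_CTYPE, then LANG). The sketch below is only an illustration; `effective_ctype_language` is a hypothetical helper, and nothing in this report confirms that Mozilla derives its language guess this way.

```python
def effective_ctype_language(env):
    """Return the language part of the effective LC_CTYPE locale, or None.

    POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    Empty values count as unset.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if value:
            # Strip modifier, codeset and territory: "ja_JP.UTF-8@x" -> "ja"
            for sep in ("@", ".", "_"):
                value = value.split(sep)[0]
            return value
    return None


# Values taken from the locale output in this comment.
reported = {"LANG": "ja_JP", "LC_CTYPE": "ja", "LC_ALL": ""}
print(effective_ctype_language(reported))

# The earlier setting the reporter remembers.
earlier = {"LC_CTYPE": "en_GB:en"}
print(effective_ctype_language(earlier))
```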
JG, Thank you for the detailed information.
We are able to reproduce this problem with recent builds on RedHat JA Linux 7.1.
When copying your test text between BEGIN and END to the site
http://babelfish.altavista.com, the problem seems to happen only if the encoding
is set to Korean in the Mozilla browser. We also tried the UTF-8, Japanese and
Chinese encodings, and the text is displayed correctly.
Could you please check what your browser encoding is? Thank you.
Keywords: intl
Reporter
Comment 10•23 years ago
Hello
I just did some more tests.
I have never visited a Korean site to my knowledge, and I have not set it to
Korean in the View->Character coding.. menu.
I tested the following encodings (the Babelfish site was only an example of a
site with a box to type into; I did these tests on this Mozilla bug report page!)
UTF-8: Bug present
Shift_JIS: NOT
EUC_JP: NOT
ISO-2022-JP: NOT (is this some Windows format?)
So it appears that it is a UTF-8-only bug, with Mozilla thinking the text is Korean, as someone suggested,
but as Unicode has separate punctuation for each language (I think), I don't know
why it would display a Korean full stop and comma.
I believe it is the same bug I experienced on Windows with some English text
editors. I had all text rotated 90 degrees anti-clockwise, thus the full stop
was at the top; this is due to Japanese going top->bottom (then moving left
across the page) as well as left->right.
I don't know if my idea is correct.
Today Mozilla locked up when writing a Japanese email; the whole thing crashed.
Just when writing this it locked up for 10 secs before coming back. I won't
report a bug as I doubt I can replicate it.
I don't know who did the UTF-8 code for top->bottom text, but could someone add
them to the CC list?
JG
Comment 11•23 years ago
JG, Thanks a lot for the updates!
Here is the summary:
1. On JA RedHat Linux 7.1, JA MacOS 9.1 and EN MacOS X:
When the encoding is set to Korean and a JA IME is used, the problem appears,
but this combination is unlikely to happen with real users.
2. On EN Linux Mandrake 8.1:
When the encoding is set to UTF-8 and a JA IME is used, the problem appears; it
is really bad in this case.
It can even be reproduced using the "Additional Comments" box in the bug report.
QA Contact: ruixu → ylong
Reporter
Comment 12•23 years ago
Hello,
Updates are fine :) Thank you for checking this bug, and hopefully fixing it!
My browser is set to UTF-8 by default, so it's any site with a text box in Mozilla.
Note Messenger is fine; what is it doing differently?
JG
Comment 13•23 years ago
I need to clarify something here.
1) this problem should only exist with unicode encoding (like UTF-8).
2) Changing encoding from UTF8 to something else will cause garbled display,
unless the character is represented as an NCR (like &#8212;).
3) We can't do much before 115121 is fixed. For Unicode encodings, our best
guess is the locale language. That is to say, you should see the 2 chars rendered in
a Japanese font if you are running in a Japanese locale. (We may still have
some minor issues.) If you see a Korean font used under a Japanese locale
inside the browser, and you do have a Japanese font, please file a bug.
I will keep this bug for the issue mentioned in summary.
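Point 2 can be illustrated with a short Python sketch. It shows the ideographic full stop written as a numeric character reference (NCR), which survives in any ASCII-compatible encoding, and the same character's UTF-8 bytes being misread under another encoding. ISO-8859-1 is chosen here only as an arbitrary example of "something else".

```python
import html

full_stop = "\u3002"  # IDEOGRAPHIC FULL STOP

# As an NCR the character is plain ASCII, so the page encoding does not matter.
ncr = full_stop.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(ncr)
print(html.unescape(ncr) == full_stop)

# Raw UTF-8 bytes interpreted under a different encoding come out garbled:
# one character turns into three unrelated ones.
utf8_bytes = full_stop.encode("utf-8")
misread = utf8_bytes.decode("latin-1")
print(len(misread), repr(misread))
```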
Comment 14•23 years ago
> this problem should only exist with unicode encoding (like UTF-8).
Per our discussion, there is only one case of this kind:
UTF-8 on EN Linux Mandrake 8.1
but it is not reproducible with UTF-8 on Linux RedHat 7.1
Comment 15•23 years ago
I could reproduce it on RH7.2. It depends on font availability and
resolving order. Even if you don't see it on a certain platform, the problem
still exists. It will happen in a different scenario.
Comment 16•23 years ago
> I could reproduce it on RH7.2. It depends on font availability and
> resolving order. Even if you don't see it on a certain platform, the problem
> still exists. It will happen in a different scenario.
I agree. I was able to reproduce it on my RH 7.1. I launched Mozilla
under a Japanese locale, but still saw Korean glyphs take precedence
over Japanese glyphs for the two characters in question (with Character coding
set to UTF-8).
BTW, these two characters (ideographic full stop and ideographic comma)
should NOT have been unified with the full stop for vertical writing
and comma for vertical writing that are ONLY present in KS X 1001 (at 1-2 and 1-3,
respectively) in Unicode/ISO 10646. Apparently, the annotations to the two
characters at 1-2 and 1-3 in KS X 1001 were overlooked and they were falsely
identified with the ideographic full stop and ideographic comma in JIS
(and perhaps in the corresponding PRC and ROC standards). I'll raise the issue
on the Unicode mailing list.
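The unification described above can be observed with Python's standard codecs: cells 1-2 and 1-3 decode to the same Unicode code points whether they are read as KS X 1001 (via EUC-KR) or as JIS X 0208 (via EUC-JP). This is only a sketch that prints the mappings; it says nothing about which glyph shape a given font draws for them.

```python
import unicodedata

# In both EUC-KR and EUC-JP, row 1 / cell N is encoded as bytes 0xA1, 0xA0 + N.
for cell, raw in (("1-2", b"\xa1\xa2"), ("1-3", b"\xa1\xa3")):
    from_ks = raw.decode("euc_kr")   # KS X 1001 interpretation
    from_jis = raw.decode("euc_jp")  # JIS X 0208 interpretation
    print(cell, f"U+{ord(from_ks):04X}", f"U+{ord(from_jis):04X}",
          unicodedata.name(from_ks))
```

Because both standards map to the same code points, a UTF-8 page carries no trace of which national convention the author had in mind.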
Reporter
Comment 17•23 years ago
Any progress on resolving this bug? I can see it now in 2002 03 11 15
Comment 18•22 years ago
Related bug (the opposite case) is bug 122779 (closed as of now, but
to be reopened).
Reporter
Comment 19•22 years ago
Confirming still present in
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2b) Gecko/20021029, build 2002102908
Could a priority or milestone be set on this one? I think it's rather more
important than "future".
Regards
JG
Updated•22 years ago
Keywords: mozilla1.3
Comment 20•22 years ago
We can't do much before xml:lang is resolved. After that, there is still a lot
of work to do, so I don't see this problem being resolved in the near future. The
workaround is to use the "lang" attribute in the HTML web page.
Depends on: 41978
Comment 21•22 years ago
bug 41978 (xml:lang) was fixed, but there is
still a lot of work to do to fix this (including
adding langObserver similar to charsetObserver)
as Shanjian wrote.
IMHO, it should be evangelized that every
Unicode web page specifies lang pseudo-element
for html and xml:lang for xml. Even with that,
there's a bug (bug 204586) to fix (which is easier to fix
than this) with languages/scripts that should
benefit most from using Unicode (that is,
those scripts/languages for which Unicode is the
first and only widely accepted character set).
Updated•21 years ago
Comment 22•20 years ago
(In reply to comment #21)
> IMHO, it should be evangelized that every
> Unicode web page specifies lang pseudo-element
> for html
You mean lang attribute? I was just thinking the same thing myself. But in the
meantime...
(In reply to comment #7)
> Web page authors can specify the language throgh the use of "lang" attribute.
> We really need a mechanism for end user to fill in this piece of information
> when necessary.
Yes please! This bites me all the time. I need to set LANG (or LC_CTYPE and
LC_MESSAGES) to ja_JP in order to enable Japanese text input, but then all
Unicode pages lacking language tags are rendered using a Japanese font, which
looks terrible for English text (because it's fixed-width). I haven't been able
to find a way to make the default language English without disabling Japanese
input. Having separate control over the default language for various purposes
(UI input, UI output, page rendering) would be very helpful. Being able to
change the language in a single window would be even better. Maybe via a menu,
analogous to the menu that allows the charset to be overridden for one window.
By the way, this is with Firefox 0.9.3 on Linux.
Comment 23•20 years ago
Not that I can work on this at the moment (it requires a lot of changes), but
I have a slightly higher chance of being able to work on this than shanjian
.. and it helps me track this one better.
Assignee: shanjian → jshin
Status: ASSIGNED → NEW
Updated•15 years ago
QA Contact: amyy → i18n
Comment 24•3 years ago
The bug assignee didn't login in Bugzilla in the last 7 months.
:m_kato, could you have a look please?
For more information, please visit auto_nag documentation.
Assignee: jshin1987 → nobody
Flags: needinfo?(m_kato)
Comment 25•3 years ago
Oh man, babelfish.altavista.com! I have fond memories. This bug is really old and invalid now.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → INVALID
Comment 26•3 years ago
This seems to be a mailnews issue. Please file a new issue for Thunderbird.
Flags: needinfo?(m_kato)