Closed
Bug 107217
Opened 23 years ago
Closed 15 years ago
Cyrillic is rendered with a double-width font in UTF-8.
Categories
(Core :: Internationalization, defect)
Tracking
()
RESOLVED
WORKSFORME
mozilla1.2alpha
People
(Reporter: mikhailian, Assigned: shanjian)
References
(Depends on 1 open bug)
Details
(Keywords: intl)
Attachments
(1 file)
9.41 KB, application/octet-stream
Cyrillic text is rendered with a double-width font in UTF-8,
as if it were Chinese. Apparently, there is no way to change
this setting in the user configs.
See http://bellinux.sourceforge.net/mikhailian/
for an example of such a text.
Mozilla version is 0.9.5, built with the following
options: ./configure '--enable-optimizations=-O4\ -finline\
-fno-omit-frame-pointer\ -march=pentiumpro\ -mcpu=pentiumpro' --disable-debug
--enable-svg --enable-mathml --prefix=/usr/local/mozilla-9.5
Updated•23 years ago
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 1•23 years ago
A font metrics change may have caused this problem. This looks OK on my Windows machine.
Teruko: can you verify on Windows?
-> Assigning to bstell for Linux
Assignee: yokoyama → bstell
Updated•23 years ago
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.7
Reporter
Comment 2•23 years ago
I forgot to add that the problem appears only on Linux.
Another, more visible example of an incorrectly
rendered page is http://bellinux.sf.net/lite/
I am also attaching the output of xlsfonts. The XFree
version is 4.1.0-7 from the Debian distro.
Reporter
Comment 3•23 years ago
Comment 5•23 years ago
If you make the user-specified font for the x-unicode lang group
effective (see bug 91190; it's turned off now, so
you have to modify the source and rebuild Mozilla),
Cyrillic letters in UTF-8
pages will be rendered with glyphs from 'NON-CJK' fonts
most of the time.
Why 'most of the time'? Because some iso10646-1 fonts in XF86 4.x
have exactly the same FFRE (foundry, family, repertoire, encoding)
but different values in the 'additional style' field,
and it's not possible to differentiate
them from one another at the moment.
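To make the FFRE collision concrete, here is a minimal sketch that splits XLFD font names into their 14 fields and compares the FFRE key. It reads 'repertoire' as the XLFD charset registry field, and the two font names are illustrative examples rather than names taken from the reporter's xlsfonts output.

```python
# XLFD field names in order; a fully specified name has 14 dash-separated fields.
XLFD_FIELDS = (
    "foundry", "family", "weight", "slant", "setwidth", "add_style",
    "pixel_size", "point_size", "res_x", "res_y", "spacing",
    "avg_width", "registry", "encoding",
)


def parse_xlfd(name):
    """Split a fully specified XLFD name into a dict of its 14 fields."""
    parts = name.split("-")
    # A well-formed name starts with "-", so parts[0] is the empty string.
    if len(parts) != 15 or parts[0] != "":
        raise ValueError("not a fully specified XLFD name: %r" % name)
    return dict(zip(XLFD_FIELDS, parts[1:]))


def ffre(name):
    """Return the (foundry, family, registry, encoding) key of a font name."""
    f = parse_xlfd(name)
    return (f["foundry"], f["family"], f["registry"], f["encoding"])


# Illustrative names (assumption): same FFRE, different additional-style field.
NARROW = "-misc-fixed-medium-r-normal--18-120-100-100-c-90-iso10646-1"
WIDE = "-misc-fixed-medium-r-normal-ja-18-120-100-100-c-180-iso10646-1"

print(ffre(NARROW) == ffre(WIDE))
print(repr(parse_xlfd(NARROW)["add_style"]), repr(parse_xlfd(WIDE)["add_style"]))
```

A lookup keyed only on FFRE treats these two names as the same font, even though one of them carries double-width glyphs.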
There are so many complicated things Mozilla has to deal
with in picking appropriate fonts and glyphs on X11.
On the font front, MS Windows makes life much easier for
developers than X11 does. Hopefully, X Render
(a new extension introduced in XF86 4.x and
already used by KDE 2.x) will make
things much simpler for X11 developers, including
Mozilla developers.
Depends on: 91190
Comment 7•23 years ago
Well, Cyrillic (and in particular Bulgarian) has been rendered as double-width in most
(all?) UTF-8 e-mail messages for a long time (>Mozilla 0.9.3).
My system: Win98SP2 Japanese distribution, Mozilla 0.9.5
The two URLs mentioned are also double-width on my Win98.
Assignee
Updated•23 years ago
Target Milestone: mozilla0.9.7 → mozilla0.9.8
Assignee
Comment 8•23 years ago
accepting
Status: NEW → ASSIGNED
Target Milestone: mozilla0.9.8 → mozilla0.9.9
Comment 10•23 years ago
I'm not sure if this is the same thing or a different bug. Both Greek and
Cyrillic appear to have incorrect letter-spacing on this page (in Linux):
http://www.unicode.org/unicode/standard/WhatIsUnicode.html. I think this is
actually a very good page to test Unicode capabilities on (the ZWSP bug shows up
here too).
Reporter
Comment 11•23 years ago
>I'm not sure if this is the same thing or a different bug.
It is the same, I think. Jungshik Shin already explained why
this happens. The same bug can occur for any non-CJK language
whose characters also have glyphs in CJK (double-width) fonts.
Comment 12•23 years ago
I can confirm this problem for Mozilla 1.0rc2 for Linux. Test page:
http://www.cl.cam.ac.uk/~mgk25/ucs/wgl4.txt (line 0410, etc.)
Most likely cause of the problem: Mozilla takes a glyph from a CJK font with a
higher priority than from an ISO10646-1 font, because CJK fonts usually provide
full coverage (so Mozilla knows what is in them), whereas all ISO10646-1
fonts are necessarily just subset fonts, and testing a font for the presence of
each glyph can be inefficient (unless implemented properly).
Suggested solution: When Mozilla finds any ISO 10646-1 font, then it should take
from any higher-priority CJK fonts only those glyphs that fall into one of the
following Unicode ranges:
2E80..D7AF,F900..FAFF,FE30..FE6F,FF01..FF5E,FFE0..FFE6,20000..2FFFF.
For comparison, the corresponding Unicode blocks are:
2E80..2EFF; CJK Radicals Supplement
2F00..2FDF; Kangxi Radicals
2FF0..2FFF; Ideographic Description Characters
3000..303F; CJK Symbols and Punctuation
3040..309F; Hiragana
30A0..30FF; Katakana
3100..312F; Bopomofo
3130..318F; Hangul Compatibility Jamo
3190..319F; Kanbun
31A0..31BF; Bopomofo Extended
31F0..31FF; Katakana Phonetic Extensions
3200..32FF; Enclosed CJK Letters and Months
3300..33FF; CJK Compatibility
3400..4DBF; CJK Unified Ideographs Extension A
4E00..9FFF; CJK Unified Ideographs
A000..A48F; Yi Syllables
A490..A4CF; Yi Radicals
AC00..D7AF; Hangul Syllables
F900..FAFF; CJK Compatibility Ideographs
FE30..FE4F; CJK Compatibility Forms
FE50..FE6F; Small Form Variants
FF00..FFEF; Halfwidth and Fullwidth Forms
20000..2A6DF; CJK Unified Ideographs Extension B
2F800..2FA1F; CJK Compatibility Ideographs Supplement
Such a restricted mapping of a CJK font could easily be used with a
higher priority than an ISO 10646-1 font, without troubling European
users with double-width Greek, Cyrillic and block-graphics glyphs.
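The suggestion can be sketched as a filter applied while walking the font priority list. This is only an illustration under stated assumptions: the font tuples and coverage sets are hypothetical, it is not how Mozilla's font code is structured, and the first range is read as starting at U+2E80, the first block in the comparison list.

```python
# Ranges for which a higher-priority CJK font may still supply the glyph.
# Assumption: the first range starts at U+2E80 (CJK Radicals Supplement).
CJK_ONLY_RANGES = [
    (0x2E80, 0xD7AF),
    (0xF900, 0xFAFF),
    (0xFE30, 0xFE6F),
    (0xFF01, 0xFF5E),
    (0xFFE0, 0xFFE6),
    (0x20000, 0x2FFFF),
]


def allowed_from_cjk_font(codepoint):
    """True if the code point falls into one of the CJK-only ranges."""
    return any(lo <= codepoint <= hi for lo, hi in CJK_ONLY_RANGES)


def pick_glyph_font(codepoint, fonts):
    """Pick a font for one code point.

    fonts is a priority-ordered list of (name, is_cjk, coverage) tuples,
    where coverage is a set of code points (hypothetical data model).
    """
    # The restriction only applies when some ISO 10646-1 font is available.
    have_unicode_font = any(not is_cjk for _, is_cjk, _ in fonts)
    for name, is_cjk, coverage in fonts:
        if codepoint not in coverage:
            continue
        if is_cjk and have_unicode_font and not allowed_from_cjk_font(codepoint):
            continue  # leave Greek, Cyrillic, block graphics to the Unicode font
        return name
    return None


fonts = [
    ("cjk-font", True, {0x0410, 0x4E2D}),   # hypothetical coverage
    ("unicode-font", False, {0x0410}),
]
print(pick_glyph_font(0x0410, fonts))  # Cyrillic capital A
print(pick_glyph_font(0x4E2D, fonts))  # a CJK ideograph
```

With only the CJK font installed, the filter is inactive and the Cyrillic glyph still comes from that font, so nothing becomes unrenderable.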
Comment 13•23 years ago
Does the document indicate a language?
If a document just has an encoding tag of Unicode, how should an app tell whether
the user wants CJK glyphs or western glyphs? Japanese users could reasonably
argue that Japanese-width glyphs should be used. Cyrillic users could
reasonably argue that Cyrillic-width chars should be used.
For a while Mozilla ignored a variety of chars in CJK fonts such as smart
quotes. Moz did this because the width of a CJK smart quote was far too big for
a western document. However, CJK users then complained that they could not
access the right width smart quotes for CJK documents; the western smart quotes
were too narrow. Mozilla no longer ignores these chars but instead first tries
to find a font in the language group, i.e., a western font for western documents,
a Japanese font for Japanese docs, etc.
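The language-group-first search described here can be sketched as a two-pass lookup. The font tuples, group names and coverage sets below are hypothetical, and the real search in Mozilla is certainly more involved than this.

```python
def pick_font_by_lang_group(codepoint, doc_lang_group, fonts):
    """Pick a font for one code point, preferring the document's language group.

    fonts is a priority-ordered list of (name, lang_group, coverage) tuples,
    where coverage is a set of code points (hypothetical data model).
    """
    # Pass 1: only fonts that belong to the document's language group.
    for name, group, coverage in fonts:
        if group == doc_lang_group and codepoint in coverage:
            return name
    # Pass 2: fall back to any font that has the glyph.
    for name, group, coverage in fonts:
        if codepoint in coverage:
            return name
    return None


LEFT_DOUBLE_QUOTE = 0x201C
fonts = [
    ("ja-font", "ja", {LEFT_DOUBLE_QUOTE, 0x3042}),
    ("western-font", "x-western", {LEFT_DOUBLE_QUOTE, 0x0041}),
]
print(pick_font_by_lang_group(LEFT_DOUBLE_QUOTE, "x-western", fonts))
print(pick_font_by_lang_group(LEFT_DOUBLE_QUOTE, "ja", fonts))
print(pick_font_by_lang_group(0x3042, "x-western", fonts))
```

The same smart quote resolves to a narrow glyph in a western document and a wide one in a Japanese document, which is the behaviour both groups of users asked for.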
Yes, Mozilla avoids iso10646 fonts because it is so expensive to find out what
is in them. The problem is that an XLFD registry-encoding of iso10646-1 only
says Unicode but gives no clue about which chars a font has. To find out, Moz
needs to call XLoadQueryFont for *every* font it looks at until it finds one.
This is extremely expensive and avoided whenever possible.
> testing a font for the presence of each glyph can be inefficient (unless
> implemented properly).
Could you describe an efficient method?
> Suggested solution: When Mozilla finds any ISO 10646-1 font, then it should
> take from any higher-priority CJK fonts only those glyphs that fall into one
> of the following Unicode ranges:
How would this work for Japanese users that want wider chars?
Comment 14•23 years ago
> If a document just has an encoding tag of Unicode, how should an app tell whether
> the user wants CJK glyphs or western glyphs? Japanese users could reasonably
> argue that Japanese-width glyphs should be used. Cyrillic users could
> reasonably argue that Cyrillic-width chars should be used.
Mozilla 1.0rc2 always uses double-width Cyrillic and block graphics characters
from CJK fonts, even if the source is an HTML 4.01 file with LANG=en or LANG=ru
as an attribute of the HTML element. As far as I can tell, language tagging does
not influence Mozilla's choice of glyphs at the moment.
Example file:
http://www.cl.cam.ac.uk/~mgk25/ucs/wgl4.html
It would be desirable if the non-ideographic characters are taken from CJK
fonts *only* if the HTML language tag (or in its absence as a fallback perhaps
the URL/DNS country code) suggests that the document is in a CJK language. By
default, Mozilla should follow the same width convention as xterm, which is
documented in
http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
The old CJK terminal emulator habit of rendering every character from a
double-byte encoded font as a double-width character has always been an
accidental typographic menace and must by no means be carried over and
generalized for non-CJK languages into the Unicode world.
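As a rough illustration of that width convention, the sketch below derives a cell width from the Unicode East Asian Width property using Python's unicodedata module. It is a simplified approximation and not a port of wcwidth.c: combining and control characters, which need separate handling, are ignored here. The cjk flag models the legacy habit of treating ambiguous-width characters as wide.

```python
import unicodedata


def cell_width(ch, cjk=False):
    """Return 1 or 2 terminal cells for a single character (simplified)."""
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        return 2  # Wide and Fullwidth characters always take two cells
    if cjk and eaw == "A":
        return 2  # legacy CJK convention: ambiguous characters are wide
    return 1


cyrillic_a = "\u0410"   # CYRILLIC CAPITAL LETTER A
ideograph = "\u4e2d"    # a CJK unified ideograph

print(cell_width(cyrillic_a), cell_width(ideograph))
print(cell_width(cyrillic_a, cjk=True))
```

Cyrillic letters carry the ambiguous width class, so they come out single-width by default and double-width only under the legacy CJK convention, which matches the rendering complained about in this bug.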
Reporter
Comment 15•23 years ago
>It would be desirable if the non-ideographic characters are taken from CJK
>fonts *only* if the HTML language tag (or in its absence as a fallback perhaps
>the URL/DNS country code) suggests that the document is in a CJK language.
Before falling back (if this is ever implemented), Mozilla could also
check the Accept-Language header of the HTTP request.
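The fallback order discussed in the last two comments (HTML language tag, then Accept-Language, then the country code of the host name) could look roughly like this. The function name, the language and TLD sets, and the choice to look only at the first Accept-Language entry are all assumptions made for the sketch.

```python
CJK_LANGS = {"zh", "ja", "ko"}
CJK_TLDS = {"cn", "tw", "hk", "jp", "kr"}  # assumption: an illustrative subset


def primary_subtag(tag):
    """Return the lowercase primary subtag of a language tag, e.g. 'ja-JP' -> 'ja'."""
    return tag.strip().split("-")[0].lower()


def is_cjk_document(html_lang=None, accept_language=None, hostname=None):
    """Guess whether a document should get CJK-width glyphs (hypothetical helper)."""
    # 1. An explicit language tag on the document wins.
    if html_lang:
        return primary_subtag(html_lang) in CJK_LANGS
    # 2. Otherwise use the first language listed in Accept-Language.
    #    Simplification: q-values are ignored and list order is trusted.
    if accept_language:
        first = accept_language.split(",")[0].split(";")[0]
        return primary_subtag(first) in CJK_LANGS
    # 3. Last resort: the country-code TLD of the host name.
    if hostname:
        return hostname.rsplit(".", 1)[-1].lower() in CJK_TLDS
    return False


print(is_cjk_document(html_lang="ru"))
print(is_cjk_document(accept_language="ja,en;q=0.5"))
print(is_cjk_document(hostname="example.jp"))
```

Returning False when nothing is known keeps the non-CJK width as the default, in line with the xterm convention mentioned in comment 14.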
From my day-to-day experience, I can tell that non-CJK users tend to remove
all the CJK fonts from the system to work around the problem. This is not much of
a loss because occasional CJK glyphs can be displayed with a Unicode font as
well. What causes more harm is that many users cannot figure out how to deal
with this double-sized vs. `normal' issue at all. Thus, proper documentation
and/or a configuration parameter would also help to solve the issue.
Comment 16•23 years ago
> As far as I can tell, language tagging does not influence Mozilla's choice
> of glyphs at the moment.
Are you seeing no language effect or are you saying you would like iso10646
fonts to be considered equal with non-iso10646 fonts?
Erik and I have struggled with the iso10646 problem for years now and, lacking
some way to solve it, we have had no choice but to use these fonts only as a last
resort. Making *all* page layout performance suffer so that iso10646 fonts can
be used is not attractive.
Does this system have Cyrillic fonts other than iso10646? If non-iso10646
Cyrillic fonts are available, then the Cyrillic font searching code needs work.
> It would be desirable if the non-ideographic characters are taken from CJK
> fonts *only* if the HTML language tag (or in its absence as a fallback perhaps
> the URL/DNS country code) suggests that the document is in a CJK language.
I think there is agreement that Mozilla should use glyphs appropriate for the
document's language group. However, until there is a reasonable way to find
out what is in an iso10646 font, those fonts will only be used as a last resort.
Comment 17•19 years ago
dup of bug 163754, perhaps?
Updated•15 years ago
QA Contact: teruko → i18n
Comment 18•15 years ago
WORKSFORME for some time now.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → WORKSFORME