Closed — Bug 122436
Opened 23 years ago; closed 22 years ago
Unicode (UTF-8) pages use Western font preference
Categories: Core :: Internationalization, defect
Tracking: Target Milestone: Future
People: Reporter: liblit; Assigned: shanjian
Keywords: intl
Attachments: 2 files
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.7) Gecko/20020104
BuildID: 20020104
The URL given above contains XHTML encoded using UTF-8. However, Mozilla
renders it using my "Western" serif font rather than the "Unicode" font. This
leads to incorrect display of Unicode characters that are not represented in the
(typically ISO-8859-*) Western font. For example, curly quotes and apostrophes
instead appear as straight vertical marks.
The easiest way to see that something is going wrong is to select wildly
different Western and Unicode fonts, then visit a Unicode page. You will see
that the Western font is used.
Reproducible: Always
Steps to Reproduce:
1. Select a "Western" font that will be easily recognized as follows:
1.1. Bring up the Preferences dialog.
1.2. Select Appearance -> Fonts.
1.3. In the "Fonts for:" menu, select "Western".
1.4. In the "Proportional:" menu, select "Serif".
1.5. In the "Serif:" menu, select something distinctive, such as
"urw-zaph chancery-iso8859-1".
1.6. Verify that the "Unicode" serif font is set to something
reasonable and different from the "Western" font just selected.
2. Visit <http://www.cs.berkeley.edu/~liblit/>.
3. Under View -> Character Coding, verify that Mozilla has correctly
selected Unicode (UTF-8).
4. Observe that the distinctive "Western" font is being used, in spite
of the fact that the page is UTF-8 encoded and therefore should be
using the "Unicode" font.
Actual Results: The page is rendered using the distinctive "Western" font.
Expected Results: The page should have been rendered using the selected
"Unicode" font.
View -> Character Coding shows "Unicode (UTF-8)" selected. So Mozilla does know
how the page is encoded. It's just not picking the right set of fonts for that
encoding.
I suggested picking visually distinctive "Western" and "Unicode" fonts to make
the problem easier to see. This issue shows up in normal usage as well, though
it's more subtle. I normally have "adobe-times-iso8859-1" selected as my
Western serif font, and "adobe-times-iso10646-1" selected as my Unicode serif
font. There are a couple of Unicode characters at the URL given above: for
example, the apostrophe in "Ben's" should be a curly right apostrophe (’),
but it is rendered as a vertical apostrophe (') because the wrong font is being
used. There are a couple more apostrophes as well as some double quotes
(“...”) further down on the page that have the same problem.
Mozilla's font selectors prevent one from choosing an iso10646-1 font for the
Western encoding. Galeon, which uses Mozilla's rendering engine, applies no
such restrictions. If I tell Galeon to use an iso10646-1 font for Western
encodings, the Mozilla engine happily goes ahead and uses it when I visit a
Unicode page, and Unicode characters appear as they should (e.g., curly quotes
really are curly). So at some level the rendering engine *does* know how to use
Unicode fonts, if those are the fonts it's asked to use.
Comment 1•23 years ago
To intl.
Assignee: attinasi → yokoyama
Status: UNCONFIRMED → NEW
Component: Layout → Internationalization
Ever confirmed: true
QA Contact: petersen → ruixu
Comment 2•23 years ago
Shanjian,
I believe we can use NS_FONT_DEBUG to find out what lang group the font code
thinks the document is in and where in the font search path the font is
found.
Could you explain to them how to do this?
Once we know what is happening we can try to determine what can/should
be done.
Thanks.
Assignee
Comment 3•23 years ago
For UTF-8 and other Unicode encodings, we currently use the user's locale charset
to figure out the language. This is because we lack a mechanism to specify/recognize
language in XUL (and probably other XML files). Until that is fixed, we cannot do
much about this bug.
Target Milestone: --- → Future
This happens on Windows too. Changed the platform to ALL.
On my Simplified Chinese Windows XP, UTF-8 pages are displayed using Simplified
Chinese fonts.
Hardware: PC → All
Comment 6•23 years ago
I'm not sure I understand what the exact problem is. I have multiple languages
inside a UTF-8 page, and Mozilla auto-senses the region and displays the
appropriate fonts for the appropriate language. Mozilla seems to use
multiple fonts in one page.
URL: http://www.realmspace.com/unicode/ut/h/utf8.html
Reporter
Comment 7•23 years ago
Joaquin Menchaca has one example of a multilingual page that works, but one
working example doesn't mean the code is correct in general. In my original
report I gave quite exhaustive instructions on how to reproduce the problem.
For Unicode characters not present in non-Unicode fonts, Mozilla is clearly and
unambiguously doing the wrong thing.
Joaquin, please surf over to <http://www.cs.berkeley.edu/~liblit/>, and look at
the first word in the title: "Ben's". Do you see a vertical apostrophe, or do
you see a curved single right quote? If you see a vertical apostrophe, then
Mozilla is doing the wrong thing.
Comment 8•23 years ago
I see 2 issues here:
1) Having Unicode in the list of font language groups in the font prefs seems
inappropriate. The rest of the entries are language groups (excluding
"User Defined", which is there in the hope that it will allow people to
trick the browser into working for unsupported languages). I suspect that
the Unicode entry is a leftover from the NS 4.x days, when the code did not
support Unicode.
2) The font system tries to avoid iso10646 fonts because it is so expensive to
determine which chars they support, and we do not have any good way to tell
which language group they are appropriate for.
It would be great if we could tell what chars are in iso10646 fonts, but we
cannot without doing an XLoadQueryFont (or XQueryFont), which is very expensive.
When I added the TrueType support I ended up writing 4000+ lines of code to
address this issue of getting the list of supported chars in a font. I was
able to cache the info because I had access to the TrueType font file
timestamps and could tell if the files had changed. Unfortunately, the X font
API provides no way to tell if the fonts have changed, so if we were to cache
which chars an X iso10646 font had, we would never be able to tell if that cache
was stale. If the info is stale we would get complaints that we did the wrong thing.
Until we have a reasonable way to get the list of chars in iso10646 fonts,
we either have to choose to be very inefficient when searching for glyphs
(all languages) or to have less than perfect Unicode support.
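To make the cost concrete, here is a minimal stand-alone probe (a sketch for illustration, not Gecko code): it loads an iso10646-1 font with XLoadQueryFont, which makes the server deliver per-character metrics for the whole font, and then applies the usual "all metrics zero" heuristic to decide whether a code point is present. The XLFD pattern is only an example.

#include <X11/Xlib.h>
#include <cstdio>

// Decide whether code point 'cp' has real metrics in a (possibly two-byte)
// core X font.  A cell whose metrics are all zero is the usual sign of a
// missing glyph.
static bool HasGlyph(const XFontStruct* fs, unsigned int cp) {
    unsigned int byte1 = cp >> 8, byte2 = cp & 0xff;
    if (byte1 < fs->min_byte1 || byte1 > fs->max_byte1 ||
        byte2 < fs->min_char_or_byte2 || byte2 > fs->max_char_or_byte2)
        return false;
    if (!fs->per_char)                        // font reports every cell as present
        return true;
    unsigned int cols = fs->max_char_or_byte2 - fs->min_char_or_byte2 + 1;
    const XCharStruct& cs = fs->per_char[(byte1 - fs->min_byte1) * cols +
                                         (byte2 - fs->min_char_or_byte2)];
    return cs.width || cs.lbearing || cs.rbearing || cs.ascent || cs.descent;
}

int main() {
    Display* dpy = XOpenDisplay(nullptr);
    if (!dpy) return 1;
    // Any iso10646-1 XLFD will do; this pattern is only an example.
    XFontStruct* fs = XLoadQueryFont(dpy,
        "-misc-fixed-medium-r-normal--13-*-*-*-*-*-iso10646-1");
    if (fs) {
        std::printf("U+2019 (right single quotation mark) present: %s\n",
                    HasGlyph(fs, 0x2019) ? "yes" : "no");
        XFreeFont(dpy, fs);
    }
    XCloseDisplay(dpy);
    return 0;
}

This is the per-font round trip and metric download that is too expensive to do routinely for every candidate font.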
Assignee
Comment 9•23 years ago
Let me restate the problem to make it clearer. When choosing fonts, we use
language group information to guide the font search. This info can be
provided through the HTML "lang" attribute. When it is missing from the document,
in most cases we can figure it out from the document's encoding, i.e. its charset.
For Unicode encodings this approach does not work, and we can only mark the document
as Unicode. Since all of Mozilla's XUL files (the UI implementation) are in a Unicode
encoding, and we don't want to use a Unicode font in that situation, we put in a hack:
the current locale's language replaces Unicode. If you run Mozilla in a Western
locale, the Western language group will be used, and thus a Western font will be selected.
I do plan to fix this, but my effort is blocked by XML's inability to handle
"lang". Keeping XUL files working well is a priority. Anyway, a bug has
been filed and I am waiting for it.
For characters like ’ “ ”, their glyphs cannot be found in a
Western font. If we choose to use an Asian font, the glyphs will be too wide. A Unicode
font is too expensive (as bstell explained in his last comment) and we always try
to avoid it. So the current approach is: if we can't find them in the Western font,
we transliterate them and use a substitute glyph found in the Western font instead.
I have been thinking about whether we should try a 10646 font or not. There are some
other bugs filed against those problems, and they should not be the concern of this
one.
To Brian:
the current Unicode language group is really misleading and practically does not
work at all. It might be a good idea to eliminate it for now, but in the future I
guess it might be useful in certain situations. I have no strong opinion about
this issue.
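A rough sketch of the resolution order just described (hypothetical names, not the actual Gecko code) looks like this:

#include <string>

// Hypothetical helpers illustrating the language-group resolution described
// above; this is not the real Gecko implementation.
static std::string LangGroupFromCharset(const std::string& charset) {
    // Real code consults a charset -> language-group table; one example entry:
    if (charset == "ISO-8859-1") return "x-western";
    return "x-unicode";
}

std::string ResolveLangGroup(const std::string& htmlLangAttr,
                             const std::string& charset,
                             const std::string& localeLangGroup) {
    if (!htmlLangAttr.empty())
        return htmlLangAttr;                  // 1) an explicit lang="" attribute wins
    if (charset != "UTF-8" && charset != "UTF-16")
        return LangGroupFromCharset(charset); // 2) a non-Unicode charset implies a group
    // 3) Unicode charsets carry no language info, so the current hack
    //    substitutes the current locale's language group.
    return localeLangGroup;
}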
Comment 10•23 years ago
Ben: in the CSS, could you try adding "adobe-times-iso10646-1" to the font list?
Reporter
Comment 11•23 years ago
Per Brian's request, I tried adding the following CSS rule:
html { font-family: adobe-times-iso10646-1 }
With this change, the various curly Unicode quotes do appear as intended.
I'm not sure if Brian was trying to debug things or was suggesting a workaround.
I wouldn't really consider this to be a viable workaround, because it has an
additional unwanted side effect (selecting a Times font regardless of the user's
defaults).
If there were a way to specify the "iso10646-1" part without the "adobe-times"
part, that might be a reasonable workaround.
Reporter
Comment 12•23 years ago
Is this a duplicate of bug #91190? A blocker of it? Dependent upon it? I
think both reports are basically talking about the same issue.
Reporter
Comment 13•23 years ago
If I understand things correctly, when a character is not mapped to a specific
non-Unicode language group, Mozilla falls back on the language group associated
with the current locale. If the character is not actually defined in that
locale's fonts, then Mozilla performs a reasonable best-effort substitution.
What about adding one more stage to this logic? Before doing the best-effort
substitution, check to see if that character is defined in the iso10646 font.
If it is, then use it. If it's missing from there too, then fall back on the
best-effort substitution.
That should fix the sort of problems I'm seeing without changing the behavior of
anything that was already working correctly. Can this be done efficiently, given
Brian Stell's concerns about the cost of XQueryFont() and the like?
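In pseudocode-ish C++ (hypothetical helper names, not real Gecko APIs), the proposed lookup order would be something like:

#include <optional>

// Hypothetical sketch of the lookup order proposed above.  The helper
// functions are stand-ins, not real Gecko APIs.
struct GlyphRef { int fontId; unsigned int glyphIndex; };

std::optional<GlyphRef> LookupInLangGroupFonts(char32_t ch);   // today's search
std::optional<GlyphRef> LookupInIso10646Fonts(char32_t ch);    // the proposed extra step
GlyphRef TransliterateAndSubstitute(char32_t ch);              // existing best-effort fallback

GlyphRef FindGlyph(char32_t ch) {
    if (auto g = LookupInLangGroupFonts(ch))
        return *g;                            // found in the language-group fonts
    if (auto g = LookupInIso10646Fonts(ch))
        return *g;                            // new: check iso10646 fonts before giving up
    return TransliterateAndSubstitute(ch);    // last resort, unchanged
}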
Comment 14•23 years ago
this looks like a dup of bug 91190
Comment 15•23 years ago
> Before doing the best-effort substitution, check to see if that character is
> defined in the iso10646 font. ... Can this be done in a way which is
> efficient relative to Brian Stell's concerns about XQueryFont() inefficiency
The problem *is* that checking whether an iso10646 font has the char is very
expensive. That's why we only do it when we are desperate (such as when
transliteration fails).
This is a problem with trying to use X's XLFD for iso10646 (Unicode) fonts.
All other encodings (mostly) fill in all possible chars; Unicode does not.
Thus we are stuck needing to get the list of chars via XLoadQueryFont (or
XQueryFont).
For a long time now we have talked about caching the data, but without a way
to check whether the cached data is stale this is not safe to do.
For the TrueType fonts I was able to check for stale data because I have access
to the font file timestamps (if the timestamp is not the same as when the data was
generated, then the data is stale).
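A minimal sketch of that timestamp check (an illustration, not the actual Gecko code):

#include <sys/stat.h>
#include <ctime>
#include <string>

// Cached per-font data is regenerated only when the font file's modification
// time no longer matches the one recorded when the cache was built.
bool CachedFontDataIsStale(const std::string& fontFilePath, time_t cachedMtime) {
    struct stat st;
    if (stat(fontFilePath.c_str(), &st) != 0)
        return true;                       // file missing/unreadable: treat as stale
    return st.st_mtime != cachedMtime;     // file changed on disk since caching
}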
Comment 16•23 years ago
bstell wrote:
> This is a problem with trying to use X's XLFD for iso10646 (Unicode) fonts.
> All other encoding (mostly) fill in all possible chars. Unicode does not.
> Thus we are stuck needing to get the list of chars via XLoadQueryFont (or
> XQueryFont).
Actually, the XLFD standard _allows_ peeking at whether a char is available in the font
or not...
For example:
'-misc-fixed-medium-r-normal--0-0-0-0-c-0-iso8859-1[65 70 80_92]' tells the font
source (Xserver or xfs) that the client is interested only in characters 65, 70,
and 80-92.
The question is whether major vendors like XFree86 implement that correctly...
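A tiny stand-alone test of that idea (a sketch, nothing more) is to hand such a subsetted XLFD to XLoadQueryFont and see what comes back; whether the server or xfs actually restricts itself to the listed characters is exactly the open question raised above.

#include <X11/Xlib.h>
#include <cstdio>

int main() {
    Display* dpy = XOpenDisplay(nullptr);
    if (!dpy) return 1;
    // The bracketed range list is the XLFD charset-subsetting syntax from the
    // comment above; the font name itself is just the example given there.
    XFontStruct* fs = XLoadQueryFont(dpy,
        "-misc-fixed-medium-r-normal--0-0-0-0-c-0-iso8859-1[65 70 80_92]");
    std::printf("subsetted font %s\n", fs ? "loaded" : "not loaded");
    if (fs) XFreeFont(dpy, fs);
    XCloseDisplay(dpy);
    return 0;
}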
Comment 17•23 years ago
Peeking like this implies a round trip to the X server per font which is also
expensive.
Perhaps for local X servers we could detect that the font info cache is stale
by checking the X font path and the files on that path. If the path or the
files on the path change we could update the cached font info.
I have very limited time and I am working on TrueType printing. If someone
would care to volunteer to work on caching the X font info, I think I can guide
them. I'd guess that it would take only about a week to get working code and
another 2-3 weeks to bring it up to production grade.
Reporter
Comment 19•23 years ago
If the only problematic issue here is when to invalidate the cache, why not
invalidate at Mozilla exit? I.e., cache for the lifetime of the process. Fonts
don't change all *that* often, so it seems reasonable to require a quit/restart
cycle to pick up changes. Or flush the cache whenever font prefs change.
Anything more sophisticated, such as monitoring the font search path, is bonus
work that shouldn't prevent us from getting something simple up and running that
will do the job for most people in most common usage scenarios.
Comment 20•23 years ago
> If the only problematic issue here is when to invalidate the cache, why not
> invalidate at Mozilla exit?
Generating the data is extremely expensive (in the multiple minute range).
Thus we cannot regenerate it every startup (unless we want a multiple minute
delay on startup).
Because of the huge time cost, to be useful we would need to generate it only
once and then just check whether the data needs to be updated (as I do for the
TrueType fonts).
Reporter
Comment 21•23 years ago
Egad. I knew it was bad, but I didn't know it was *that* bad. Thanks for the info.
Reporter
Comment 22•22 years ago
I just went back and revisited the cited URL
(<http://www.cs.berkeley.edu/~liblit/>) using Mozilla 1.0, and the curvy quotes
show up correctly.
The Western font preference is still used for the majority of text on the page,
but a proper Unicode font is being used for the Unicode-only characters (quotes,
in this case).
Is this bug now fixed? Or has it merely changed in some curious way?
Assignee
Comment 23•22 years ago
That is probably because of the FreeType support.
Reporter
Comment 24•22 years ago
No, I don't think this is because of FreeType support: I'm using the prebuilt
Red Hat RPMs, which supposedly do not include FreeType support. Perhaps the
addition of conditional FreeType support affected font handling elsewhere,
though, causing this change even without FreeType support in my binary.
Comment 25•22 years ago
Actually, the mozilla.org Mozilla (non-Red Hat) has direct FreeType2 (TrueType) support,
and I believe the Red Hat RPMs have FreeType2 via Xft (there was/is a long
discussion on whether Xft was/is ready for Mozilla), so you might have TrueType
working.
You could use 'xmag' to capture/enlarge the pixels and see if they have "grey"
pixels on the edges (while the direct FreeType2 code does use the TrueType
embedded bitmaps if available, I believe that the Xft version cannot).
Reporter
Comment 26•22 years ago
I'm using Ximian's RPMs installed via Red Carpet. "xmag" shows no grey-edged
antialiasing. "lsof" reports that "mozilla-bin" has neither the Xft nor the
FreeType2 libraries open.
{shrug}
Comment 27•22 years ago
Comment 28•22 years ago
I wanted to attach two files to the same comment, but that is not allowed I
guess. This bug is still present in Mozilla 1.0.1 (the browser used to submit
this) and Mozilla 1.2.1. Personally this bug drives me up the wall, especially
since support is so close to working.
While this attachment does show a working example, asking everyone on the
planet with UTF-8 HTML to add a font selection in a style sheet doesn't seem
like a viable workaround.
Comment 29•22 years ago
*** This bug has been marked as a duplicate of 91190 ***
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → DUPLICATE