Closed Bug 105199 Opened 23 years ago Closed 23 years ago

glyph search order is NOT lang dependent

Categories

(Core :: Internationalization, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla0.9.7

People

(Reporter: jshin, Assigned: shanjian)

References

()

Details

(Keywords: intl, Whiteboard: need r/sr)

Attachments

(3 files, 3 obsolete files)

When CSS specifies 'lang' attribute, Mozilla does not refer to it while searching for glyphs to render CJK (and other ) text. I have the following font specifications in my prefs.js: font.monospace.ko=daewoo-gothic-ksc5601.1987-0 font.monospace.zh-CN=isas-song ti-gb2312.1980-0 font.monospace.ja=kappa-mincho-jisx0208.1990-0 Excerpt from my comment to bug 102623: ---------- Finally, we definitely have to look into and fix the font selection mechanism so that 'lang' attribute is taken into account. It's clear from my screenshot(attachment 53479 [details]) of Mozilla's rendering attachment 53478 [details] that 'lang' attribute is not honored by the font selection routine. About half of Hanzis (covered by KS C 5601) in zh-CN section is rendered with '-daewoo-gothic-ksc5601.1987-0'(green) while the other half of Hanzis (NOT covered by KS C 5601) are rendered with '-isas-song ti-gb2312.1980-0' (red). In ja section, most Kanjis are rendered with -daewoo-gothic, but one Kanji (not covered by KS C 5601/KS X 1001 but covered by both GB 2312-1980 and JIS X 0208) is rendered with 'isas-song ti'. And, one Kanji (only covered by JIS X 0208) is rendered with 'kappa-mincho-jisx0208.1990-0' (blue). I have little idea why this order of font search order (1. ko, 2. zh-CN, 3. ja) is used by Mozilla. I changed my language preference in Pref|Navigator|Language (moving down ko to the bottom and moving ja or zh-CN to the top), but this didn't make any difference. I thought it might be foundray names sorted in alphabetical order. 'daewoo' comes first and 'isas' and 'kappa' follow it. However, this turned out not to be the case, either. When I changed font.monospace.ko to 'sun-gothic', it (alphabetically the last) just took precedence over 'isas' (zh-CN) and 'kappa'(ja). I guess I have to look into nsFontMetricsGTK.cpp as nhotta suggested. ----------------
Blocks: 102623
->ftang
Keywords: intl
I've just attached a screenshot of the URL above. This was rendered by Mozilla when it was launched under ja_JP.EUC-JP locale. The attachment 53479 [details] was made under ko_KR.EUC-KR locale. Under ja_JP.EUC-JP locale, Japanese font takes precedence over C and K fonts. Comparing two screenshots, I realized that glyph search order is locale-dependent. I also had some 'ddd/gdb' session to confirm this. It seems like methods in nsFontMetricsGTK.cpp do what they're told by 'callers'. Therefore, what has to be done is to make explicit specification of 'lang' attribute in CSS override the locale-dependent default glyph search order. I think this has to be done at an earlier stage (than nsFontMetricsGTK or similar parts for other OS) where CSS is handled.
Bug 91190 is about locale font is used for unicode document.
shanjian- can you take a look at this ?
Assignee: ftang → shanjian
Another example why this is a 'must'. http://www.unicode.org/iuc10/x-utf8.html MS IE 5.x honors 'lang' attribute in selecting fonts (for CJK), but Mozilla does not.
We definitely need to fix this sooner. But I am afraid I can't do it in 0.9.6. Set it to 0.9.7.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.7
david, It seems we have some problem in dealing with "lang" attribute. Refer following document for "lang" attribute's specification, http://www.w3.org/TR/html4/struct/dirlang.html#adef-lang I set some break points in nsGenericHTMLElement.cpp. I saw "lang" attribute was parsed and set to html's style sheet. But the callback function "PostResolveCallback" set in RuleData was never called. So could you share some thought about where might be the problem? I guess the invocation of the callback function should be somewhere within nsRuleNode.cpp.
Blocks: 110885
nsbeta1, needed by bug 102623
Keywords: nsbeta1
nsbeta1+
Keywords: nsbeta1nsbeta1+
Attached patch patch (obsolete) — Splinter Review
david, could you review my code? thanks!
Whiteboard: need r/sr
Shanjian, It worked beautifully in my Linux build, but somehow Mozilla showed up with a very strange layout in my Windows build. Have you tried your patch under MS-Windows? Perhaps, it might be transitory due to some bad 'interaction' with other patches in the source. (I pulled out the source Tuesday for MS-Windows build). I may update my source and try again.
Yes, I tried it on windows.
Shanjian, Thanks for the confirmation and sorry for bothering you with a 'broken' build of mine. Now that I pulled the source again and rebuilt it with your patch, it's working great.
CC to couple of CSS guys. HI, could one of the CSS expert give me r/sr? thanks.
I haven't looked at the patch in any detail yet, but a few immediate comments: * you shouldn't be adding a CSS property to the nsCSSPropList.h if it's not a real property defined by a CSS spec, unless you use the -moz- prefix. But is there any need for this pseudo-property to be parsed? * In nsRuleNode, it would be easier to just release the language atom service in a function called from the content module destructor rather than constantly counting construction/destruction of rule nodes (of which we have many). * is there any overlap with the work done in bug 35768?
>> you shouldn't be adding a CSS property to the nsCSSPropList.h if it's not a real property defined by a CSS spec, unless you use the -moz- prefix. That can be easily done. >>But is there any need for this pseudo-property to be parsed? Yes, definitely. Whenever we are selecting any font, we always need to know what language we are dealing with. That allow us to choose the right font for that language. Currently this 'lang' attribute is passed as an html attribute, and a post callback function is used to get the value. However, that does not work at all. Jungshik's testcase is a good example of the problem. The newly developed opentype font allow user to pass language inforation directly, otherwise you might get wrong glyph. >> In nsRuleNode, it would be easier to just release the language atom service >>in a function called from the content module destructor rather than constantly >>counting construction/destruction of rule nodes (of which we have many). This can be done once we agree my approach is sound. >> is there any overlap with the work done in bug 35768? That's a hard question. I didn't realize the existence of bug 35768. A quick browse tell me that we seems to handle this issue for different purpose. "lang" property is not pseudo to me at all. I will make a comment in 35768.
>>>But is there any need for this pseudo-property to be parsed? >Yes, definitely. Whenever we are selecting any font, we always need to know what >language we are dealing with. That allow us to choose the right font for that >language. So you're proposing a new CSS property because we don't use the CSS2 font selection algorithm correctly so things are broken? If we use the CSS2 font selection algorithm correctly it should be able to solve most of these problems by specifying fonts in the correct order. Or do I not understand the problem correctly? Could you explain very clearly what the problem that you're trying to fix is, and why you need to add a new CSS property that you'd expect all authors to add to their stylesheets to fix it?
>So you're proposing a new CSS property because we don't use the CSS2 font >selection algorithm correctly so things are broken? If we use the CSS2 font >selection algorithm correctly it should be able to solve most of these problems >by specifying fonts in the correct order. Or do I not understand the problem >correctly? Let's make it clearly when you are talking about CSS2 font selection algorithm. Do you mean "15.3.2 Descriptors for Selecting a Font: 'font-family', 'font-style', 'font-variant', 'font-weight', 'font-stretch' and 'font-size'"? If so, we have nothing broken. It is just because CSS2 font selection algorithm itself could not lead to satisfactory result, or this is just a implementation detail CSS2 don't care about. >Could you explain very clearly what the problem that you're trying to fix is, Do I need to expain why we need language information in selecting fonts? Whenever a piece of text needs to be display and a font need to be selected, unless a physical font on system is specified as font-family, CSS need to resolve font using generic font family like 'serif', 'sans-serif', 'cursive', 'fantasy', and 'monospace' specified or default font if generic is unavailable. But those generic fonts and default fonts are specific to languages. Certain fonts are design for certain languages, only those fonts could lead to good result when displaying text for that language. Currently, mozilla allow users to set this configuration in preference->appearance->fonts. This configuation is done for each language/ language group separately. We need language information to choose the right configuration. To make things even more complicated, the newly developed open-type fonts allow a single font return different glyphs base on language for single code point. Han-unification happend in unicode standard encode semantically the same character using a single code point. The glyphs for one character might be different in simplified chinese, traditional chinese, japanese, and korean. Without language information, we just can't rendering the text correctly. >and why you need to add a new CSS property that you'd expect all authors to add >to their stylesheets to fix it? No, I didn't add a new CSS property. (Either a new CSS property is needed for CSS is another question.) I just use the existing CSS mechanism to propagate an existing html attribute. You might want to ask, if this piece of information is so important, why it is missing? In fact, it is only partially missing, as you can see in my patch. We did parsed 'lang' attribute, but current post callback mechanism does not work in propagating language information in CSS tree. All other piece of codes (like font selecting using langGroup, map langGroup from charset when 'lang' is missing) are there. But we do have bunch of bugs blocking by this problem.
No, I don't mean the descriptors -- I mean just the font selection algorithm ignoring descriptors, with prioritized font-family lists. If the CSS2 standard is incorrect, we need to try to fix the standard, but I'm getting tired of seeing every internationalization problem related to font selection being solved by moving farther and farther from the standard rather than using the standard to our own benefit (e.g., bug 33162, bug 4760, and I think another bug that I can't find right now, which I believe could all have been fixed using the method advocated by the standard -- by using a default prioritized font list beginning with a Latin1 font for CJK locales). If there's something wrong with the standard, we should be involved with the W3C's internationalization and CSS groups to try to fix it. I'm having trouble telling whether this patch fits along those same lines (or is a fix to those fixes that broke sometime recently due to other changes) or whether it's a different issue that I need to understand. >>and why you need to add a new CSS property that you'd expect all authors to >>add to their stylesheets to fix it? > No, I didn't add a new CSS property. (Either a new CSS property is needed for > CSS is another question.) I just use the existing CSS mechanism to propagate > an existing html attribute. Your patch adds another CSS property to the stylesheet data structs (not just the computed data structs), and I don't think it needs to do that -- it just adds bloat to the system that you don't need. If you're not intending that the property be parsed from a CSS stylesheet, you don't need any of the changes to nsCSSDeclaration.cpp, nsCSSPropList.h, nsICSSDeclaration.h, or nsCSSStyleRule.cpp. If your patch doesn't need any of those changes, then it might be a lot easier for me to understand what the patch is trying to change.
First of all, in my bug report and comments I misused the term CSS (property) when refering to 'lang' attribute for html tags like div,span,p and so forth. This misuse of mine may have something to do with what you don't like (adding pseudo-css property?) in Shanjian's way of fixing this bug. DB>But is there any need for this pseudo-property to be parsed? SJ>Yes, definitely. Whenever we are selecting any font, we always SJ>need to know what SJ>language we are dealing with. That allow us to choose the right font SJ> for that language. DB>So you're proposing a new CSS property because we don't use the CSS2 font DB>selection algorithm correctly so things are broken? If we use the CSS2 font DB>selection algorithm correctly it should be able to solve most of DB> these problems by specifying fonts in the correct order. Did you mean that CSS2 font selection algorithm is currently broken in Mozilla or that it's not broken but Shanjian's patch assumed that it's broken and bypassed it? If the latter is the case, building a *lang-dependent* virtual font-family list may be a solution. (I suspect that Mozilla builds these 'internal' 'virtual' lists for langs or character codings it support at the start-up for a couple of Mozilla's internal classes like '-moz-fixed' and what Shanjian's patch does is to activate one of them according to the value of 'lang') :lang(zh-TW) { font-family: trad-chinese-font, x-western-font,.....} :lang(zh-CN) { font-family: simp-chinese-font, x-western-font,.....} :lang(ja) { font-family: ja-font, x-western-font,.....} :lang(ko) { font-family: ko-font, x-western-font,.....} However, there's a problem with this in X11 unless this font names in font-family list include not only font-face but also foundry, character set and encoding (the first two and the last two elements of XLFD. See the footnote 1 below) DB>Or do I not understand the problem correctly? I'm not sure whether Shanjian's way of fixing this bug is most desirable (whether that is the best or not, it works beautifully). You're certainly a much better judge in CSS related issues than I'm. However, I think you probably agree that HTML 'lang' attribute specification should be utilized to help font selection in *absence* of sufficient information provided via CSS. What would you say Mozilla should do when rendering a document like http://www.unicode.org/iuc10/x-utf8.html ? Whether that's right or wrong, there are documents with 'lang' attribute specified but without font specified via CSS. I believe when 'font-family' is not specified in CSS, Mozilla kinda build a virtual font-family list to use. The question here is what fonts go there in what order and what considerations have to be given when building it. One of factors to consider is 'lang' attribute, isn't it? The locale under which Mozilla is launched and the encoding (character coding) of the currrent document are considered but 'lang' attribute is out of the equation. The result of not taking into account 'lang' attribute leads to a very ugly rendering as shown in attachment 53479 [details]. DB>Could you explain very clearly what the problem that you're trying DB> to fix is, Given a html snippett like this (with no font specification in CSS), what does Mozilla have to do? An example of this (misnamed!!) is given at the URL in the URL field and another is at <http://www.unicode.org/iuc/iuc10/x-utf8.html>. <div lang="zh-CN"> Simplified Chinese </div> <div lang="ja"> Japanese</div> <div lang="zh-TW">Traditional Chinese</div> <div lang="ko">Korean</div> When 'font-family' is not specified, I believe 'lang' attribute has to be refered to to build a 'virtual' font-family list picking fonts from user pref. files. For <div lang="zh-CN">, the first font in this virtual font-family list should be a font specified for Chinese(Simplified) and for <div lang="ja">, it should be a font for Japanese. Instead of doing this, Mozilla without Shanjian's patch depends on the character coding(encoding) of the document and the locale under which Mozilla is launched. If the character coding(encoding) of the document is not 'unicode', that determines what fonts go into the 'virtual' font list in what order. In case it's unicode, the locale is used to pick fonts. If Mozilla is launched under zh-TW locale, for all four <div>'s above, the first font in the virtual font-family list is always a font for traditional Chinese regardless of 'lang'. It's a font for Korean if Mozilla is launched under ko locale. attachment 53479 [details] clearly shows this problem. DB> why you need to add a new CSS property that you'd expect all authors DB> to add to their stylesheets to fix it? Not all authors are expected to add 'pseudo-css' lang property to their stylesheets. To facilate lang-dependent font selection, they *only* have to add 'lang' attribute to html tags such as <div>, <p>, and <span>. footnote 1: In Unix/X11, it's not uncommon that a single font-face/font-family name is used for multiple fonts with sometimes drastically different looks. In addition, there may be multiple gothic fonts with different character set and encoding. In an ideal world, those fonts should all have identical look and feel and be interchangeable, but the reality is not that simple. (see my comments for bug 102623 and bug 94327)
> :lang(zh-TW) { font-family: trad-chinese-font, x-western-font,.....} > :lang(zh-CN) { font-family: simp-chinese-font, x-western-font,.....} > :lang(ja) { font-family: ja-font, x-western-font,.....} > :lang(ko) { font-family: ko-font, x-western-font,.....} > > However, there's a problem with this in X11 unless > this font names in font-family list include not only font-face but also > foundry, character set and encoding (the first two and the last two > elements of XLFD. See the footnote 1 below) What what what ???? No, I totally disagree with that. The UA has to look for X fonts including the wanted characteristics whatever is the foundry. And encoding is most of the time directly available from the charset name. I mean that if the X11 font is an hebraic font, it *is* an iso8859-8 font, right ? I think that the interpretation above is a complete misunderstanding of the concept of WebFonts and its corresponding font description and selection mechanisms. I am with David Baron here and not necessarily understand what is proposed and why new additions are required. I am definitely with him about one point : if you can prove the CSS font selection is false/incomplete/whatever, this *must* be raised to the I18N and CSS Working Groups.
I would like to clarify several issues: 1)Treating HTML attribute as CSS property is not my invention. In file nsGenericHTMLElement.cpp, funtion "MapCommonAttributesInto" is for this purpose. The mechanism is already there. Otherwise my patch will be even more complicated. "dir" is an attribute in HTML. (It also has a CSS counter part 'direction'). 2)This language information is needed whenever font selection is needed, CSS rule resolving mechanism is the best way IMO. 3)It is rather strange to see language is not mentioned at all in CSS's font selection algorithm. It acts like a CSS property in many aspects in the browser. 4)Probably it doesn't add bloat to the system. Because we are passing lang directly, "nsIHTMLMappedAttributes* mAttributes; " was eliminated from nsNode structure. "mLanguage" is part of the "nsStyleVisibility" long before my change. 5)It might be true in one or 2 places in those files that my change is unnecessary. But most of them are needed even though 'lang' is not from CSS2. Those are simple changes and shouldn't be a concern.
> not necessarily understand what is proposed and > why new additions are required. Please, read all the comments in this bug,bug 102633 and bug 94327 before making any hasty judgement on the *reality*. I'm the last person who wants to break the standard. I do want Mozilla to stick to all the relevant standards as much as possible.. > What what what ???? No, I totally disagree with that. The UA has to look for > X fonts including the wanted characteristics whatever is the foundry. And Mozilla does that now with bug 94327 fixed. > encoding is most of the time directly available from the charset name. Hmm.... do you know what the last two fields of XLFD are? > I mean > that if the X11 font is an hebraic font, it *is* an iso8859-8 font, right ? No, it could be iso10646-1 as well. Do you know how many different 'gothic' fonts I have on my Linux box with different character sets, encodings, glyph coverage, look and feel for Korean alone? > I think that the interpretation above is a complete misunderstanding of the > concept of WebFonts and its corresponding font description and selection > mechanisms. It's based on the assumption that font-family would be a pretty good means of rather 'uniquely' specifying font, but unfortunately, the reality of X11 font system (as opposed on Windows and MacOS) is not that simple. Of course, it's moving in that direction with Xft, but we're not there yet.
>What what what ???? No, I totally disagree with that. The UA has to look for >X fonts including the wanted characteristics whatever is the foundry. And >encoding is most of the time directly available from the charset name. I mean >that if the X11 font is an hebraic font, it *is* an iso8859-8 font, right ? >I think that the interpretation above is a complete misunderstanding of the >concept of WebFonts and its corresponding font description and selection >mechanisms. Please explain what is "the concept of WebFonts" and so on. I don't want to limit this discussion to X fonts. It applies to all platforms. "Encoding" and "language" are 2 different concept. Yes, in most situations, language is deducible from encoding (charset). But what happen when encoding is one of the transformation of unicode (like utf8). As the time goes one, more and more webpage will be encoded in unicode. Language has to be provided somehow. Let's leave along the question about amending CSS specification. In http://www.w3.org/TR/html4/struct/dirlang.html#adef-lang there are following paragraph for attribute 'lang' Attribute definitions lang = language-code [CI] This attribute specifies the base language of an element's attribute values and text content. The default value of this attribute is unknown. Language information specified via the lang attribute may be used by a user agent to control rendering in a variety of ways. Some situations \ where author-supplied language information may be helpful include: Assisting search engines Assisting speech synthesizers Helping a user agent select glyph variants for high quality typography Helping a user agent choose a set of quotation marks Helping a user agent make decisions about hyphenation, ligatures, and spacing Assisting spell checkers and grammar checkers The lang attribute specifies the language of element content and attribute values; whether it is relevant for a given attribute depends on the syntax and semantics of the attribute and the operation involved. The intent of the lang attribute is to allow user agents to render content more meaningfully based on accepted cultural practice for a given language. This does not imply that user agents should render characters that are atypical for a particular language in less meaningful ways; user agents must make a best attempt to render all characters, regardless of the value specified by lang. For instance, if characters from the Greek alphabet appear in the midst of English text: <P><Q lang="en">Her super-powers were the result of &gamma;-radiation,</Q> he explained.</P> a user agent (1) should try to render the English content in an appropriate manner (e.g., in its handling the quotation marks) and (2) must make a best attempt to render &gamma; even though it is not an English character. AND 8.1.2 Inheritance of language codes An element inherits language code information according to the following order of precedence (highest to lowest): The lang attribute set for the element itself. The closest parent element that has the lang attribute set (i.e., the lang attribute is inherited). The HTTP "Content-Language" header (which may be configured in a server). User agent default values and user preferences. Does those statement looks similar to CSS property? Is it a natural choice to implement this attribute using existing CSS rule resolving mechanism? >I am with David Baron here and not necessarily understand what is proposed and >why new additions are required. I am definitely with him about one point : if >you can prove the CSS font selection is false/incomplete/whatever, this *must* >be raised to the I18N and CSS Working Groups. This is not an invention of a CSS property. It does look strange to me that lang is not a CSS property in CSS2. Since I am not a CSS expert, I assume there is certain reason not to put it in CSS. (One of my guess is, language is the description of content instead of appearance.) But we definitely need to honor this attribute. for HTML.
From earlier comments, it seems the bug is because nsStyleVisibility::mLanguage isn't computed properly by the Style System. http://lxr.mozilla.org/seamonkey/source/content/shared/public/nsStyleStruct.h#62 9 (Later on, the font engine uses that to pick the right subset in fonts, for example to distinguish "Times New Roman"-Western-subset from "Times New Roman"-Arabic-subset. Strictly speaking, it isn't the CSS selection algorithm that is at fault.) Looks like the patch was an attempt to get nsStyleVisibility::mLanguage to be filled properly. But the issue might just be to get PostResolveCallback() to work. @@ -3301,10 +3269,10 @@ nsHTMLValue value; aAttributes->GetAttribute(nsHTMLAtoms::lang, value); if (value.GetUnit() == eHTMLUnit_String) { - // Register a post-resolve callback for filling in the language atom - // over in the computed style data. - aData->mAttributes = (nsIHTMLMappedAttributes*)aAttributes; - aData->mPostResolveCallback = &PostResolveCallback; + nsAutoString lang; + value.GetStringValue(lang); + aData->mDisplayData->mLang.SetStringValue(lang, + eCSSUnit_String);
Comments on the backend changes: > 4)Probably it doesn't add bloat to the system. Because we are passing > lang directly, "nsIHTMLMappedAttributes* mAttributes; " was eliminated > from nsNode structure. "mLanguage" is part of the "nsStyleVisibility" > long before my change. The nsCSSDeclaration data structures (of which nsCSSDisplay is one) are one of our biggest bloat problems. It most definitely does add bloat to the system, and for a bad reason (see next point). > 5)It might be true in one or 2 places in those files that my change is > unnecessary. But most of them are needed even though 'lang' is not from > CSS2. Those are simple changes and shouldn't be a concern. Causing a new CSS property to be recognized and parsed is not a simple change that should not be of concern. The CSS working group has agreed that, since the CSS property namespace is very crowded, we will not add custom properties without prefixing them. Furthermore, adding a new CSS property, in the current CSS declaration implementation (which we're hoping to change), adds a good bit of bloat to the system. Finally, I don't see a reason that 'lang' (or '-moz-lang') should be a CSS property. The language that content is written in should be part of the markup, not part of the style. You do not need to add a new CSS property to be parsed to allow the value to be in the style structs -- the backend style rule structs and the frontend style data structs are different structures, and you don't need anything in the backend structs since there is no CSS property -- you only need the information in the style data. You should remove the changes to the 4 files I mentioned above and attach a new patch for review so that we can then discuss the actual changes that you want to make.
Comments on font selection: The basic idea of the CSS font selection algorithm is that a list of fonts should be sufficient when styling an element -- if a glyph is not found in the first font, then one proceeds to the second font to look for the glyph, etc. For documents containing multiple languages of text, there may need to be different default font styles that act like selectors with [lang="..."] in the UA stylesheet. To solve problems like the poor appearance of western characters in CJK fonts, the default font-family list for text in Japanese would *begin* with a Western font that contains good glyphs for western characters, and it would then have the preferred fonts used to display the Japanese characters. This leads to Western characters being displayed using Western fonts and Japanese characters being displayed using Japanese fonts. This is a simple way of doing things, although it generally requires more work in the style code and less work in the font code. The preferred approach up to this point has been to hack lots of complexity into the platform-specific font code (e.g., bug 33162 and bug 4760) and keep everything out of the style code. I've been disappointed that the approach advocated by the standard (which would have involved moving a bit more of the handling of these issues into the cross-platform style code rather than handling them in the platform-specific font code) was not even tried before nonstandard approaches were taken, and I'm wondering how these changes fit in to our overall approach to font selection.
Yep, that's how it is supposed to work (and in fact that's how it precisely works in GfxWin). The default fonts can be anything and the language information is used in situations such as "element {font-family: serif}". There are lists for serif, sans-serif, etc fonts. The lists can be anything and the element's language (which comes from nsStyleVisibility::mLanguage) helps to pick the correct one for a language group (people filling the default prefs can use whatever fonts in order suitable for a particular language group). I am not sure if the weirdness on Linux is due to the general weirdness of Linux fonts or to the fact that lists of multiple defaults are not yet supported there. There is no choice other than sticking with a single default font per generic family there at the moment. I had hoped GfxGTK would have caught up with GfxWin, but... [Err, disclaimer... the last two comments are a disgression from getting nsStyleVisibility::mLanguage filled properly which is the root cause of this bug.]
This may not be directly related to this bug, but since the font selection issue came up here ....... I have nothing against doing as much as possible at the level of platform-independent CSS2 and I think I18N team members agree with you on this, but I'm afraid there's only as much we can do that way in presence of unavoidable differences between platforms. > The basic idea of the CSS font selection algorithm is that a list of fonts > should be sufficient when styling an element -- if a glyph is not found in the> first font, then one proceeds to the second font to look for the glyph, etc. This works perfectly well under MS-Windows (and MacOS) where specifying font-family narrows down the set of matched fonts reasonably well. Most of time it's narrowed down to a single font. Even if it's not (which is rare. This may only happen if there are multiple fonts with a single name in different formats, TTF, OTF and bitmap), it's safe to assume that all of multiple matched fonts have identical looks and feels so that picking any one of them (with some additional criteria if there's any) wouldn't lead to any problem. Under Unix/X11 this model breaks down especially for CJK fonts. although for Western fonts this model still holds up pretty well. See how many 'gothic' fonts I have. Unlike in MS-Windows, font-family in X11 doesn't include foundry name and that results in degeneracies to lift by other means. Korean: -daewoo-gothic-medium-r-normal--0-0-100-100-c-0-ksc5601.1987-0 -misc-gothic-medium-r-normal--0-0-75-75-c-0-ksc5601.1987-0 -sun-gothic-medium-r-normal--0-0-75-75-c-0-ksc5601.1987-0 -kaist-gothic-medium-r-normal--0-0-75-75-c-0-johab-1 -kaist-gothic-medium-r-normal--0-0-75-75-c-0-johabs-1 -kaist-gothic-medium-r-normal--0-0-75-75-c-0-johabsh-1 -munhwa-gothic-medium-r-normal--0-0-75-75-c-0-ksc5601.1992-3 Japanese: -wadalab-gothic-medium-r-normal--0-0-0-0-c-0-jisx0208.1983-0 -misc-gothic-medium-r-normal--0-0-75-75-c-0-jisx0208.1983-0 Western Europe(Latin1) -misc-gothic-medium-r-normal--0-0-75-75-c-0-iso8859-1 -wadalab-gothic-medium-r-normal--0-0-0-0-c-0-iso8859-1 That's why in prefs.js for Unix/X11, font.name.style.lang has four fields (the first two fields and the last two fields from XLFD) as shown below: user_pref("font.name.monospace.ko", "kaist-gothic-johabsh-1") as opposed to what's done under MS-Windows: user_pref("font.name.monospace.ko", "gulimche") If we completely rely on CSS alone without any x11/gtk specific measures taken, we're left with 'gothic' losing 'kaist, johabsh, 1' which are all important in uniquely identifying the font a user wants Mozilla to use. Mozilla may pick a font *different* from what a user specified based on what it's left with. As though this is not complicated enough, a recent XFree86 has two fonts with identical first two and last two fields in XLFD but with different add-style field: -misc-fixed-medium-r-normal-ja-18-120-100-100-c-180-iso10646-1 -misc-fixed-medium-r-normal-ko-18-120-100-100-c-180-iso10646-1 Character repertoires of two fonts are not identical although they're both iso10646-1. > For documents containing multiple languages of text, there may need to be > different default font styles that act like selectors with [lang="..."] in the> UA stylesheet. Not just for multilingual documents but also for monolingual documents, this is necessary if it's in utf-8 or any other unicode transformation format. For instance, Mozilla-mail converts all email messages to utf-8 before rendering it and it needs to specify 'lang' to make up for the information loss in converting from locale-specifing MIME charset to locale-neutral UTF-8. See bug 102633 in which mainly distinguishing between CJK cases has been talked about. However, it also applies to breaking up the tie between Russian and Greek fonts and CJK fonts which include a subset of Cyrillic and Greek letters. The reason this hasn't been noticed by Russian/Greek speakers is that they run Mozilla under Russian/Greek locale to read Russian/Greek emails. If Mozilla is launched under one of CJK locales to read Russian emails, it would have been apparent to them. Related is bug 107217 (Russian html docs are rendered with CJK fonts if they're in UTF-8 which doesn't have inherent/default lang value unlike other lang/locale specific encodings.) Anyway, this would work perfectly on MacOS and MS-Windows, but not under Unix/X11 at the moment (without fixing this bug ) as I wrote in comment #24. Unix/X11 is moving in this direction with upcoming Xft/X Render, but it will take long to complete the transition. In the meantime, some platform specific measures (perhaps 'hacks' to you) are necessary. (I think that's why nsFontMetricksGTK.cpp is so long and complex: font banning, glyph exclusion, transliteration, single width vs double width, and so forth). More Off-topic (although related) > To solve problems like the poor appearance of western characters in CJK > fonts, the default font-family list for text in Japanese would *begin* > with a Western font that contains good glyphs for western characters, > and it would then have > the preferred fonts used to display the Japanese characters. This leads > to Western characters being displayed using Western fonts and Japanese > characters being displayed using Japanese fonts. This would not solve the problem. In Japanese text, those Western characters are, more often than not, expected to be rendered with glyphs drawn from Japanese fonts (NOT Western fonts). Japanese fonts contain *good* glyphs for those characters that go along well with neighboring Japanese characters. To make things more complicated, _sometimes_ this is not desired even by Japanese, in which case I think the author has to enclose Russian portion with <span lang="ru">....</span>. Anyway, this part of the problem can be at least partly (as far as just specifying font-family works in Unix/X11) solved with lang-specific default font-familiy lists. (For Japanese, the list begins with 'Japanese' fonts) A harder part of the problem the patch for bug 33162 tries to solve is that characters NOT covered by ISO-8859-1 (but covered by Windows-1252) in Western European text are ofen found in Japanese fonts but NOT found at all in Western fonts (because many X11 installations have only iso8859-1 fonts instead of windows-1252 fonts). Therefore, even if Western fonts come before Japanese fonts in the default font-family list for Western European, those characters missing in Western fonts end up getting rendered with Japanese glyphs that don't go along well with neighboring European characters if CSS2 is abided by to letter. To work around this, the patch uses 'transliteration' for Western European texts. This is not the case under MS-Windows and only happens under Unix/X11. CSS2 level solution would be nice, but I'm afraid for a while we have to live with platform-specific measures until Xft/X Render finally catches up in Unix/X11.
Since style resolution is critical, it will be impeded if the handling of the default font lists is moved there (which unfortunately will require the UA CSS to distinguish lists for each generic font family per language). Another advantage in doing this work in GFX is that the same code is handling two cases: "element { font-family: generic}" and "element { font-family: font1, font2, generic }". If the "generic" re-mapping is done upfront, it could end up that this wasn't necessary because the element is not rendered, or because it may happen that glyph lookup doesn't reach the generic font in the font-family list (this is done lazily in the font code at the moment). So while the situation on Windows is more uniform that Linux, shielding the font code from knowing the language will bring other issues to consider too. Now, back to Linux...
I'd certainly be willing to review a patch that corrects the computation of the language stored in the style data struct (although I think we should discuss the larger issues) -- in fact, I'd love to see the number of places where we need PostResolveCallbacks reduced, but I still object to the addition of a new CSS property (and incorrectly, too).
Attachment #59394 - Attachment is obsolete: true
That patch removes the parsing but it doesn't remove the changes to the other three files that add bloat, and now that you've removed the parsing, do *absolutely* nothing (other than overwrite the lang when visibility is set in some cases). See comment #23, where I omitted mentioning the one file where you did remove the changes (nsCSSParser.cpp).
+ static nsILanguageAtomService *mLangService; This should be gLangService rather than mLangService. Also, the patch doesn't contain the changes to the module destructor.
dbaron, Don't know how that can be done. nsRuleData has to contains 'lang' info (as it does in my patch) or contains the source of 'lang' info (as it does in original code, mAttributes) so that 'lang' info can be computed. Please be more specific about how to remove the bloatness.
answer for #38 mLanguageService is now declared as static member of nsRuleNode. nsRuleNode::Shutdown is called is nsStyleSet.cpp. I put it there to reduce the affect code scope.
Oh, I see. It's all this silly thing about how we use the same structs on the heap in the declaration structs and on the stack when walking the rule tree. So maybe you'd be better off defining an nsCSSStackDisplay that inherits from nsCSSDisplay but also has an mLang member? (At some point soon we should stop using those structures on the heap...) You still don't need the changes to nsCSSStyleRule.cpp. And probably it's better to call nsRuleNode::Shutdown from the module destructor (in content/build/nsContentModule.cpp) -- it will be less confusing later when I try to move that code around.
Attachment #60546 - Attachment is obsolete: true
> So maybe you'd be better off defining an nsCSSStackDisplay that inherits from > nsCSSDisplay but also has an mLang member?
I didn't miss that part, but I thought it was optional. I am not sure to complicate the code with another almost identical structure just to save a field is a good idea. I will investigate this a little bit of further tomorrow. BTW, what controls the number of nsCSSDisplay structures used by system, including both heap and stack?
There are only about 300 nsCSSDisplay to bring up a browser window -- I thought there were a bit more than that. Any style rule in a stylesheet that has one of the properties stored in that struct causes one to be created. So I guess this seems OK to me, since we won't be using these structures anymore once we fix bug 107270. However, you should put a highly visible comment above the line you're adding in nsICSSDeclaration.h saying that the mLang member variable is there not because in needs to be stored in nsCSSDeclaration objects but because it's needed on the stack when the struct is used in WalkRuleTree. On the other hand, it shouldn't be difficult to do what I suggested, I don't think... Could you also change the name of the mLangService that you added to nsRuleNode.h and .cpp to gLangService? The + nsServiceManager::GetService(NS_LANGUAGEATOMSERVICE_CONTRACTID, + NS_GET_IID(nsILanguageAtomService), (nsISupports**) &mLangService); much more cleanly written as + CallGetService(NS_LANGUAGEATOMSERVICE_CONTRACTID, &mLangService); (CallGetService is a function template that uses the type of the template parameter to get the right IID.) Finally, there's no need for these 4 lines: + } + else if (eCSSUnit_Inherit == displayData.mLang.GetUnit()) { + inherited = PR_TRUE; + visibility->mLanguage = parentVisibility->mLanguage; since displayData.mLang isn't a real property. The "lang: string, inherit" comment should probably also mention that it's not a real property. Other than that, the patch looks fine, although I'd like to see the bloat issue fixed if there's not some obstacle that I'm not thinking of.
Any info on why PostResolveCallback() is broken here? I had thought that it was a ad-hoc meant to be used with caution, but to rescue cases like this.
This patch renamed mLangService to gLangService, and removed 4 lines in nsRuleNode.cpp as suggested by dbaron. (I though those 4 lines was used to make inheritance work, but it seems ok after removing those 4 lines.) I didnot create another structure for stack version of nsCSSDisplay. I took a look at bug 107270. The main idea of the currect approach is to parse raw css declaration to an intermediate form (property id, value pair) in CSSDeclarationImpl. The mapping to various CSS data structures are delayed. So name "nsCSSDisplay" should be kept for stack version. Creation of another structure need to be removed when fixing 107270. So I found enough reason for me to excuse myself from doing extra work. :-) (Did you ever think of compressing stack version of css data structure? Speed might be a major concern here.)
Attachment #60569 - Attachment is obsolete: true
Comment on attachment 60716 [details] [diff] [review] yet another update sr=attinasi
Attachment #60716 - Flags: superreview+
fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Jungshik Shin, could you verify this?
QA Contact: teruko → jshin
Yes, this bug was fixed. I've just checked with 2001-12-27 and it worked fine as early December build did.
Status: RESOLVED → VERIFIED
*** Bug 122779 has been marked as a duplicate of this bug. ***
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: