Last Comment Bug 105199 - glyph search order is NOT lang dependent
: glyph search order is NOT lang dependent
Status: VERIFIED FIXED
need r/sr
: intl
Product: Core
Classification: Components
Component: Internationalization (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla0.9.7
Assigned To: Shanjian Li
: Jungshik Shin
Mentors:
http://jshin.net/moztest/css2.html
Depends on:
Blocks: 102623 110885
  Show dependency treegraph
 
Reported: 2001-10-16 21:21 PDT by Jungshik Shin
Modified: 2002-04-24 16:45 PDT (History)
9 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
utf-8 page screenshot with CJK lang tag (96.11 KB, image/jpeg)
2001-10-17 13:45 PDT, Jungshik Shin
no flags Details
patch (12.93 KB, patch)
2001-11-27 14:32 PST, Shanjian Li
no flags Details | Diff | Splinter Review
screen shot after the patch. (97.30 KB, image/jpeg)
2001-12-04 17:13 PST, Shanjian Li
no flags Details
new patch for dbaron's suggestion (9.29 KB, patch)
2001-12-05 15:09 PST, Shanjian Li
no flags Details | Diff | Splinter Review
another update address comment #41 (8.70 KB, patch)
2001-12-05 17:15 PST, Shanjian Li
no flags Details | Diff | Splinter Review
yet another update (8.75 KB, patch)
2001-12-06 15:35 PST, Shanjian Li
dbaron: review+
attinasi: superreview+
Details | Diff | Splinter Review

Description Jungshik Shin 2001-10-16 21:21:37 PDT
When CSS specifies 'lang' attribute, Mozilla does not refer to it
while searching for glyphs to render CJK (and other ) text. 
I have the following font specifications in my prefs.js:

font.monospace.ko=daewoo-gothic-ksc5601.1987-0
font.monospace.zh-CN=isas-song ti-gb2312.1980-0
font.monospace.ja=kappa-mincho-jisx0208.1990-0

Excerpt from my comment to bug 102623:

----------
Finally, we definitely have to look into
and fix the font selection mechanism so that
'lang' attribute is taken into account. It's clear from
my screenshot(attachment 53479 [details]) of Mozilla's
rendering attachment 53478 [details] that 'lang' attribute
is not honored by the font selection routine. 


About half of Hanzis (covered by KS C 5601) in zh-CN section 
is rendered with
'-daewoo-gothic-ksc5601.1987-0'(green) while the other half
of Hanzis (NOT covered by KS C 5601)
are rendered with '-isas-song ti-gb2312.1980-0' (red).
In ja section, most Kanjis are rendered with -daewoo-gothic,
but one Kanji (not covered by KS C 5601/KS X 1001 but
covered by both GB 2312-1980 and JIS X 0208) is 
rendered with 'isas-song ti'. And, one Kanji
(only covered by JIS X 0208) is rendered with
'kappa-mincho-jisx0208.1990-0' (blue).

I have little idea why this order of font search order
(1. ko, 2. zh-CN, 3. ja) is used
by Mozilla. I changed my language preference
in Pref|Navigator|Language (moving down ko to the
bottom and moving ja or zh-CN to the top), but
this didn't make any difference.

I thought it might be foundray names sorted in
alphabetical order. 'daewoo' comes
first and 'isas' and 'kappa' follow it.
However, this turned out not to be the case, either.
When I changed font.monospace.ko to 'sun-gothic',
it (alphabetically the last) just took precedence over 'isas' (zh-CN)
and 'kappa'(ja). 

I guess I have to look into nsFontMetricsGTK.cpp as
nhotta suggested. 
----------------
Comment 1 Roy Yokoyama 2001-10-17 09:54:13 PDT
->ftang
Comment 2 Jungshik Shin 2001-10-17 13:45:41 PDT
Created attachment 53956 [details]
utf-8 page screenshot with CJK lang tag
Comment 3 Jungshik Shin 2001-10-17 13:53:14 PDT
I've just attached a screenshot of the URL above.
This was rendered by Mozilla when it was launched
under ja_JP.EUC-JP locale. The attachment 53479 [details]
was made under ko_KR.EUC-KR locale. Under ja_JP.EUC-JP
locale, Japanese font takes precedence over C and K
fonts. Comparing two screenshots, I realized that
glyph search order is locale-dependent. I also
had some 'ddd/gdb' session to confirm this. 
It seems like methods in nsFontMetricsGTK.cpp
do what they're told by 'callers'. 

Therefore, what has to be done is to make  
explicit specification of 'lang' attribute in CSS
override the locale-dependent default glyph search
order. I think this has to be done at an earlier
stage (than nsFontMetricsGTK or similar parts
for other OS) where CSS is handled.   
Comment 4 nhottanscp 2001-10-17 18:07:14 PDT
Bug 91190 is about locale font is used for unicode document.
Comment 5 Frank Tang 2001-10-18 13:21:43 PDT
shanjian- can you take a look at this ?
Comment 6 Jungshik Shin 2001-10-19 09:11:27 PDT
Another example why this is a 'must'.
 
http://www.unicode.org/iuc10/x-utf8.html

MS IE 5.x honors 'lang' attribute in selecting fonts
(for CJK), but Mozilla does not. 
Comment 7 Jungshik Shin 2001-10-19 09:12:35 PDT
Ooops... It's http://www.unicode.org/iuc/iuc10/x-utf8.html


Comment 8 Shanjian Li 2001-10-29 19:08:13 PST
We definitely need to fix this sooner. But I am afraid I can't do it in 0.9.6. 
Set it to 0.9.7.
Comment 9 Shanjian Li 2001-11-15 16:16:59 PST
david, 
It seems we have some problem in dealing with "lang" attribute. Refer following 
document for "lang" attribute's specification,
http://www.w3.org/TR/html4/struct/dirlang.html#adef-lang

I set some break points in nsGenericHTMLElement.cpp. I saw "lang" attribute was 
parsed and set to html's style sheet. But the callback function 
"PostResolveCallback" set in RuleData was never called. So could you share some 
thought about where might be the problem? I guess the invocation of the callback 
function should be somewhere within nsRuleNode.cpp. 

Comment 10 nhottanscp 2001-11-26 11:41:01 PST
nsbeta1, needed by bug 102623
Comment 11 nhottanscp 2001-11-26 11:46:13 PST
nsbeta1+
Comment 12 Shanjian Li 2001-11-27 14:32:39 PST
Created attachment 59394 [details] [diff] [review]
patch
Comment 13 Shanjian Li 2001-11-27 14:34:01 PST
david, could you review my code? thanks!
Comment 14 Jungshik Shin 2001-11-29 15:42:52 PST
Shanjian,

It worked beautifully in my Linux build, but somehow
Mozilla showed up with a very strange layout in my
Windows build. Have you tried your patch under 
MS-Windows? Perhaps, it might be transitory due
to some bad 'interaction' with other patches 
in the source. (I pulled out the source Tuesday
for MS-Windows build). I may update my source
and try again. 

Comment 15 Shanjian Li 2001-11-29 16:48:57 PST
Yes, I tried it on windows. 
Comment 16 Jungshik Shin 2001-11-29 20:09:25 PST
Shanjian,

Thanks for the confirmation and sorry for bothering you with a 'broken' build
of mine. Now that I pulled the source again and rebuilt it with your patch,
it's working great. 
Comment 17 Shanjian Li 2001-12-04 09:48:43 PST
CC to couple of CSS guys.

HI, could one of the CSS expert give me r/sr? thanks.
Comment 18 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2001-12-04 12:59:32 PST
I haven't looked at the patch in any detail yet, but a few immediate comments:
 * you shouldn't be adding a CSS property to the nsCSSPropList.h if it's not a
real property defined by a CSS spec, unless you use the -moz- prefix.  But is
there any need for this pseudo-property to be parsed?
 * In nsRuleNode, it would be easier to just release the language atom service
in a function called from the content module destructor rather than constantly
counting construction/destruction of rule nodes (of which we have many).
 * is there any overlap with the work done in bug 35768?
Comment 19 Shanjian Li 2001-12-04 15:51:16 PST
>> you shouldn't be adding a CSS property to the nsCSSPropList.h if it's not a
real property defined by a CSS spec, unless you use the -moz- prefix.  
That can be easily done. 

>>But is there any need for this pseudo-property to be parsed?
Yes, definitely. Whenever we are selecting any font, we always need to know what 
language we are dealing with. That allow us to choose the right font for that 
language. Currently this 'lang' attribute is passed as an html attribute, and 
a post callback function is used to get the value. However, that does not work 
at all. Jungshik's testcase is a good example of the problem. The newly developed 
opentype font allow user to pass language inforation directly, otherwise you 
might get wrong glyph. 

>> In nsRuleNode, it would be easier to just release the language atom service
>>in a function called from the content module destructor rather than constantly
>>counting construction/destruction of rule nodes (of which we have many).
This can be done once we agree my approach is sound.

>> is there any overlap with the work done in bug 35768?
That's a hard question. I didn't realize the existence of bug 35768. A quick browse
tell me that we seems to handle this issue for different purpose. "lang" property 
is not pseudo to me at all. I will make a comment in 35768.

Comment 20 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2001-12-04 16:01:21 PST
>>>But is there any need for this pseudo-property to be parsed?
>Yes, definitely. Whenever we are selecting any font, we always need to know what 
>language we are dealing with. That allow us to choose the right font for that 
>language.

So you're proposing a new CSS property because we don't use the CSS2 font
selection algorithm correctly so things are broken?  If we use the CSS2 font
selection algorithm correctly it should be able to solve most of these problems
by specifying fonts in the correct order.  Or do I not understand the problem
correctly?

Could you explain very clearly what the problem that you're trying to fix is,
and why you need to add a new CSS property that you'd expect all authors to add
to their stylesheets to fix it?
Comment 21 Shanjian Li 2001-12-04 17:05:43 PST
>So you're proposing a new CSS property because we don't use the CSS2 font
>selection algorithm correctly so things are broken?  If we use the CSS2 font
>selection algorithm correctly it should be able to solve most of these problems
>by specifying fonts in the correct order.  Or do I not understand the problem
>correctly?
Let's make it clearly when you are talking about CSS2 font selection algorithm. 
Do you mean "15.3.2 Descriptors for Selecting a Font: 'font-family', 'font-style', 
'font-variant', 'font-weight', 'font-stretch' and 'font-size'"? If so, we have 
nothing broken. It is just because CSS2 font selection algorithm itself could not 
lead to satisfactory result, or this is just a implementation detail CSS2 don't 
care about. 

>Could you explain very clearly what the problem that you're trying to fix is,
Do I need to expain why we need language information in selecting fonts? 
Whenever a piece of text needs to be display and a font need to be selected,
unless a physical font on system is specified as font-family, CSS need to resolve 
font using generic font family like 'serif', 'sans-serif', 'cursive', 'fantasy', 
and 'monospace' specified or default font if generic is unavailable. But those generic
fonts and default fonts are specific to languages. Certain fonts are design for 
certain languages, only those fonts could lead to good result when displaying 
text for that language. Currently, mozilla allow users to set this configuration 
in preference->appearance->fonts. This configuation is done for each language/
language group separately. We need language information to choose the right 
configuration. 

To make things even more complicated, the newly developed open-type fonts 
allow a single font return different glyphs base on language for single code point.
Han-unification happend in unicode standard encode semantically the same character
using a single code point. The glyphs for one character might be different in simplified
chinese, traditional chinese, japanese, and korean. Without language information,
we just can't rendering the text correctly. 

>and why you need to add a new CSS property that you'd expect all authors to add
>to their stylesheets to fix it?
No, I didn't add a new CSS property. (Either a new CSS property is needed for CSS
is another question.) I just use the existing CSS mechanism to propagate an existing 
html attribute.

You might want to ask, if this piece of information is so important, why it is 
missing? In fact, it is only partially missing, as you can see in my patch. We did 
parsed 'lang' attribute, but current post callback mechanism does not work in 
propagating language information in CSS tree. All other piece of codes (like font 
selecting using langGroup, map langGroup from charset when 'lang' is missing) are there.
But we do have bunch of bugs blocking by this problem.
Comment 22 Shanjian Li 2001-12-04 17:13:01 PST
Created attachment 60419 [details]
screen shot after the patch.
Comment 23 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2001-12-04 17:52:36 PST
No, I don't mean the descriptors -- I mean just the font selection algorithm
ignoring descriptors, with prioritized font-family lists.  If the CSS2 standard
is incorrect, we need to try to fix the standard, but I'm getting tired of
seeing every internationalization problem related to font selection being solved
by moving farther and farther from the standard rather than using the standard
to our own benefit (e.g., bug 33162, bug 4760, and I think another bug that I
can't find right now, which I believe could all have been fixed using the method
advocated by the standard -- by using a default prioritized font list beginning
with a Latin1 font for CJK locales).  If there's something wrong with the
standard, we should be involved with the W3C's internationalization and CSS
groups to try to fix it.  I'm having trouble telling whether this patch fits
along those same lines (or is a fix to those fixes that broke sometime recently
due to other changes) or whether it's a different issue that I need to understand.

>>and why you need to add a new CSS property that you'd expect all authors to
>>add to their stylesheets to fix it?
> No, I didn't add a new CSS property. (Either a new CSS property is needed for
> CSS is another question.) I just use the existing CSS mechanism to propagate
> an existing html attribute.

Your patch adds another CSS property to the stylesheet data structs (not just
the computed data structs), and I don't think it needs to do that -- it just
adds bloat to the system that you don't need.  If you're not intending that the
property be parsed from a CSS stylesheet, you don't need any of the changes to
nsCSSDeclaration.cpp, nsCSSPropList.h, nsICSSDeclaration.h, or
nsCSSStyleRule.cpp.  If your patch doesn't need any of those changes, then it
might be a lot easier for me to understand what the patch is trying to change.
Comment 24 Jungshik Shin 2001-12-04 18:43:36 PST
 First of all, in my bug report and comments  I misused the term
CSS (property) when refering to 'lang' attribute for html tags like
div,span,p and so forth. This misuse of mine may have something to do
with what you don't like (adding pseudo-css property?) in Shanjian's
way of fixing this bug.

DB>But is there any need for this pseudo-property to be parsed?
SJ>Yes, definitely. Whenever we are selecting any font, we always 
SJ>need to know what 
SJ>language we are dealing with. That allow us to choose the right font
SJ> for that language.

DB>So you're proposing a new CSS property because we don't use the CSS2 font
DB>selection algorithm correctly so things are broken?  If we use the CSS2 font
DB>selection algorithm correctly it should be able to solve most of
DB> these problems by specifying fonts in the correct order.  

  Did you mean that CSS2 font selection algorithm is currently broken
in Mozilla or that it's not broken but  Shanjian's patch assumed that
it's broken and bypassed it?  If the latter is the case, building
a *lang-dependent* virtual font-family list  may be a solution.
(I suspect that Mozilla builds these 'internal' 'virtual' lists for
langs or character codings it support at the start-up  for a couple of
Mozilla's internal classes like '-moz-fixed' and what Shanjian's patch
does is to activate one of them according to the value of 'lang')


 :lang(zh-TW) { font-family: trad-chinese-font, x-western-font,.....}
 :lang(zh-CN) { font-family: simp-chinese-font, x-western-font,.....}
 :lang(ja) { font-family: ja-font, x-western-font,.....}
 :lang(ko) { font-family: ko-font, x-western-font,.....}

However, there's a problem with this in X11 unless
this font names in font-family list include not only font-face but also
foundry, character set and encoding (the first two and the last two
elements of XLFD. See the footnote 1 below)

DB>Or do I not understand the problem correctly?

  I'm not sure  whether Shanjian's way of fixing this bug is most
desirable (whether that is the best or not, it works beautifully).  
You're certainly a much better judge in CSS related issues
than I'm.  However, I think you probably agree  that HTML 'lang' attribute
specification should be utilized to help font selection  in *absence*
of sufficient information provided via CSS.

What would you say Mozilla should do when rendering a document like
http://www.unicode.org/iuc10/x-utf8.html ?  Whether that's right or
wrong, there are documents with 'lang' attribute specified but without
font specified via CSS. I believe when 'font-family' is not specified
in CSS, Mozilla kinda build a virtual font-family list to use. 

  The question here is what fonts go there in what order and what
considerations have to be given when building it. One of factors to
consider is 'lang' attribute, isn't it?  The locale under which Mozilla
is launched and the encoding (character coding) of the currrent document
are considered but 'lang' attribute is out of the equation. 
The result of not taking into account 'lang' attribute leads
to a very ugly rendering as shown in attachment 53479 [details]. 

DB>Could you explain very clearly what the problem that you're trying
DB> to fix is,

 Given a html snippett like this (with no font specification
in CSS), what does Mozilla have to do?  An example of this 
(misnamed!!) is given at the URL in the URL field and
another is at <http://www.unicode.org/iuc/iuc10/x-utf8.html>.

<div lang="zh-CN"> Simplified Chinese </div>
<div lang="ja"> Japanese</div>
<div lang="zh-TW">Traditional Chinese</div>
<div lang="ko">Korean</div>

When 'font-family' is not specified, I believe 'lang' attribute has to
be refered to to build a 'virtual' font-family list picking fonts
from user pref. files. For <div lang="zh-CN">, the first font
in this virtual font-family list should be a font specified for
Chinese(Simplified)  and for <div lang="ja">, it should
be a font for Japanese. 

Instead of doing this, Mozilla without Shanjian's patch depends on the
character coding(encoding) of the document and the locale under which
Mozilla is launched. If the character coding(encoding) of the document is
not 'unicode', that determines what fonts go into the 'virtual' font list
in what order. In case it's unicode, the locale is used to pick fonts.
If Mozilla is launched under zh-TW locale, for all four <div>'s above,
the first font in the virtual font-family list is always a font for
traditional Chinese regardless of 'lang'. It's a font for Korean if
Mozilla is launched under ko locale. attachment 53479 [details] clearly shows
this problem.
 

DB> why you need to add a new CSS property that you'd expect all authors
DB> to add to their stylesheets to fix it?

   Not all authors are expected to add 'pseudo-css' lang property to their
stylesheets.  To facilate lang-dependent font selection, they *only* have
to add 'lang' attribute to html tags such as <div>, <p>, and <span>.

  
footnote 1:
In Unix/X11, it's not uncommon that a single font-face/font-family name
is used for multiple fonts with sometimes drastically different looks. In
addition, there may be multiple gothic fonts with different character set
and encoding. In an ideal world, those fonts should all have identical
look and feel and be interchangeable, but the reality is not that simple.
(see my comments for bug 102623 and bug 94327)

Comment 25 Daniel Glazman (:glazou) 2001-12-04 19:14:53 PST
>  :lang(zh-TW) { font-family: trad-chinese-font, x-western-font,.....}
>  :lang(zh-CN) { font-family: simp-chinese-font, x-western-font,.....}
>  :lang(ja) { font-family: ja-font, x-western-font,.....}
>  :lang(ko) { font-family: ko-font, x-western-font,.....}
>
> However, there's a problem with this in X11 unless
> this font names in font-family list include not only font-face but also
> foundry, character set and encoding (the first two and the last two
> elements of XLFD. See the footnote 1 below)

What what what ???? No, I totally disagree with that. The UA has to look for
X fonts including the wanted characteristics whatever is the foundry. And 
encoding is most of the time directly available from the charset name. I mean
that if the X11 font is an hebraic font, it *is* an iso8859-8 font, right ?
I think that the interpretation above is a complete misunderstanding of the
concept of WebFonts and its corresponding font description and selection
mechanisms.

I am with David Baron here and not necessarily understand what is proposed and
why new additions are required. I am definitely with him about one point : if
you can prove the CSS font selection is false/incomplete/whatever, this *must*
be raised to the I18N and CSS Working Groups.
Comment 26 Shanjian Li 2001-12-04 19:55:43 PST
I would like to clarify several issues:
1)Treating HTML attribute as CSS property is not my invention. In file 
  nsGenericHTMLElement.cpp,   funtion "MapCommonAttributesInto" is 
  for this purpose. The mechanism is already there. Otherwise my patch
  will be even more complicated. "dir" is an attribute in HTML. (It also
  has a CSS counter part 'direction'). 
2)This language information is needed whenever font selection is needed,
  CSS rule resolving mechanism is the best way IMO. 
3)It is rather strange to see language is not mentioned at all in CSS's 
  font selection algorithm. It acts like a CSS property in many aspects 
  in the browser. 
4)Probably it doesn't add bloat to the system. Because we are passing 
  lang directly, "nsIHTMLMappedAttributes* mAttributes; " was eliminated 
  from nsNode structure. "mLanguage" is part of the "nsStyleVisibility"
  long before my change. 
5)It might be true in one or 2 places in those files that my change is 
  unnecessary. But most of them are needed even though 'lang' is not from
  CSS2. Those are simple changes and shouldn't be a concern. 
  

Comment 27 Jungshik Shin 2001-12-04 20:16:06 PST
  
> not necessarily understand what is proposed and
> why new additions are required.


Please, read all the comments in this bug,bug 102633 and bug 94327
before making any hasty judgement on the *reality*. I'm the last person
who wants to break the standard. I do want Mozilla to stick to all the
relevant standards as much as possible.. 

> What what what ???? No, I totally disagree with that. The UA has to look for
> X fonts including the wanted characteristics whatever is the foundry. And

  Mozilla does that now with bug 94327 fixed.
 
> encoding is most of the time directly available from the charset name. 
 
 Hmm.... do you know what the last two fields of XLFD are?   

> I mean
> that if the X11 font is an hebraic font, it *is* an iso8859-8 font, right ?

  No, it could be iso10646-1 as well.

  Do you know how many different 'gothic' fonts I have on my 
Linux box with different character sets, encodings, glyph coverage,
look and feel for Korean alone?  


> I think that the interpretation above is a complete misunderstanding of the
> concept of WebFonts and its corresponding font description and selection
> mechanisms.

  It's based on the assumption that font-family would be a pretty
good means of rather 'uniquely' specifying font, but unfortunately,
the reality of X11 font system (as opposed on Windows and MacOS)
is not that simple. Of course, it's moving in that direction with
Xft, but we're not there yet. 

Comment 28 Shanjian Li 2001-12-04 20:21:31 PST
>What what what ???? No, I totally disagree with that. The UA has to look for
>X fonts including the wanted characteristics whatever is the foundry. And 
>encoding is most of the time directly available from the charset name. I mean
>that if the X11 font is an hebraic font, it *is* an iso8859-8 font, right ?
>I think that the interpretation above is a complete misunderstanding of the
>concept of WebFonts and its corresponding font description and selection
>mechanisms.
Please explain what is "the concept of WebFonts" and so on. I don't want to 
limit this discussion to X fonts. It applies to all platforms. "Encoding" and 
"language" are 2 different concept. Yes, in most situations, language is deducible
from encoding (charset). But what happen when encoding is one of the transformation
of unicode (like utf8). As the time goes one, more and more webpage will be encoded
in unicode. Language has to be provided somehow. 

Let's leave along the question about amending CSS specification. In
http://www.w3.org/TR/html4/struct/dirlang.html#adef-lang

there are following paragraph for attribute 'lang'

Attribute definitions 
 lang = language-code [CI] 
This attribute specifies the base language of an element's attribute 
values and text content. The default value of this attribute is unknown. 

Language information specified via the lang attribute may be used by 
a user agent to control rendering in a variety of ways. Some situations \
where author-supplied language information may be helpful include:

              Assisting search engines 
              Assisting speech synthesizers 
              Helping a user agent select glyph variants for high quality typography 
              Helping a user agent choose a set of quotation marks 
              Helping a user agent make decisions about hyphenation, ligatures, and spacing 
              Assisting spell checkers and grammar checkers 

 The lang attribute specifies the language of element content and attribute 
 values; whether it is relevant for a given attribute depends on the syntax and
 semantics of the attribute and the operation involved.

 The intent of the lang attribute is to allow user agents to render content 
 more meaningfully based on accepted cultural practice for a given language.
 This does not imply that user agents should render characters that are 
 atypical for a particular language in less meaningful ways; user agents must make a
 best attempt to render all characters, regardless of the value specified by lang.

 For instance, if characters from the Greek alphabet appear in the midst of English text:

      <P><Q lang="en">Her super-powers were the result of
      &gamma;-radiation,</Q> he explained.</P>

  a user agent (1) should try to render the English content in an appropriate 
  manner (e.g., in its handling the quotation marks) and (2) must make a
  best attempt to render &gamma; even though it is not an English character.

AND

8.1.2 Inheritance of language codes

 An element inherits language code information according to the following order of precedence (highest to lowest):

      The lang attribute set for the element itself. 
      The closest parent element that has the lang attribute set (i.e., the lang attribute is inherited). 
      The HTTP "Content-Language" header (which may be configured in a server).
      User agent default values and user preferences. 

Does those statement looks similar to CSS property?
Is it a natural choice to implement this attribute using existing CSS rule resolving 
mechanism? 

>I am with David Baron here and not necessarily understand what is proposed and
>why new additions are required. I am definitely with him about one point : if
>you can prove the CSS font selection is false/incomplete/whatever, this *must*
>be raised to the I18N and CSS Working Groups.
This is not an invention of a CSS property. It does look strange to me that lang 
is not a CSS property in CSS2. Since I am not a CSS expert, I assume there is 
certain reason not to put it in CSS. (One of my guess is, language is the description
of content instead of appearance.) But we definitely need to honor this attribute.


for HTML. 
Comment 29 rbs 2001-12-04 20:25:00 PST
From earlier comments, it seems the bug is because nsStyleVisibility::mLanguage 
isn't computed properly by the Style System. 
http://lxr.mozilla.org/seamonkey/source/content/shared/public/nsStyleStruct.h#62
9

(Later on, the font engine uses that to pick the right subset in fonts, for 
example to distinguish "Times New Roman"-Western-subset from "Times New 
Roman"-Arabic-subset. Strictly speaking, it isn't the CSS selection algorithm 
that is at fault.)

Looks like the patch was an attempt to get nsStyleVisibility::mLanguage to be 
filled properly. But the issue might just be to get PostResolveCallback() to 
work.

@@ -3301,10 +3269,10 @@
     nsHTMLValue value;
     aAttributes->GetAttribute(nsHTMLAtoms::lang, value);
     if (value.GetUnit() == eHTMLUnit_String) {
-      // Register a post-resolve callback for filling in the language atom
-      // over in the computed style data.
-      aData->mAttributes = (nsIHTMLMappedAttributes*)aAttributes;
-      aData->mPostResolveCallback = &PostResolveCallback;
+      nsAutoString lang;
+      value.GetStringValue(lang);
+      aData->mDisplayData->mLang.SetStringValue(lang,
+                                                eCSSUnit_String);
Comment 30 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2001-12-04 20:51:30 PST
Comments on the backend changes:

> 4)Probably it doesn't add bloat to the system. Because we are passing 
>   lang directly, "nsIHTMLMappedAttributes* mAttributes; " was eliminated 
>   from nsNode structure. "mLanguage" is part of the "nsStyleVisibility"
>   long before my change.

The nsCSSDeclaration data structures (of which nsCSSDisplay is one) are one of
our biggest bloat problems.  It most definitely does add bloat to the system,
and for a bad reason (see next point).

> 5)It might be true in one or 2 places in those files that my change is 
>   unnecessary. But most of them are needed even though 'lang' is not from
>   CSS2. Those are simple changes and shouldn't be a concern. 

Causing a new CSS property to be recognized and parsed is not a simple change
that should not be of concern.  The CSS working group has agreed that, since the
CSS property namespace is very crowded, we will not add custom properties
without prefixing them.  Furthermore, adding a new CSS property, in the current
CSS declaration implementation (which we're hoping to change), adds a good bit
of bloat to the system.

Finally, I don't see a reason that 'lang' (or '-moz-lang') should be a CSS
property.  The language that content is written in should be part of the markup,
not part of the style.

You do not need to add a new CSS property to be parsed to allow the value to be
in the style structs -- the backend style rule structs and the frontend style
data structs are different structures, and you don't need anything in the
backend structs since there is no CSS property -- you only need the information
in the style data.  You should remove the changes to the 4 files I mentioned
above and attach a new patch for review so that we can then discuss the actual
changes that you want to make.
Comment 31 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2001-12-04 20:58:48 PST
Comments on font selection:

The basic idea of the CSS font selection algorithm is that a list of fonts
should be sufficient when styling an element -- if a glyph is not found in the
first font, then one proceeds to the second font to look for the glyph, etc. 
For documents containing multiple languages of text, there may need to be
different default font styles that act like selectors with [lang="..."] in the
UA stylesheet.

To solve problems like the poor appearance of western characters in CJK fonts,
the default font-family list for text in Japanese would *begin* with a Western
font that contains good glyphs for western characters, and it would then have
the preferred fonts used to display the Japanese characters.  This leads to
Western characters being displayed using Western fonts and Japanese characters
being displayed using Japanese fonts.

This is a simple way of doing things, although it generally requires more work
in the style code and less work in the font code.  The preferred approach up to
this point has been to hack lots of complexity into the platform-specific font
code (e.g., bug 33162 and bug 4760) and keep everything out of the style code.
I've been disappointed that the approach advocated by the standard (which would
have involved moving a bit more of the handling of these issues into the
cross-platform style code rather than handling them in the platform-specific
font code) was not even tried before nonstandard approaches were taken, and I'm
wondering how these changes fit in to our overall approach to font selection.
Comment 32 rbs 2001-12-04 21:49:48 PST
Yep, that's how it is supposed to work (and in fact that's how it precisely 
works in GfxWin). The default fonts can be anything and the language information 
is used in situations such as "element {font-family: serif}". There are lists 
for serif, sans-serif, etc fonts. The lists can be anything and the element's 
language (which comes from nsStyleVisibility::mLanguage) helps to pick the 
correct one for a language group (people filling the default prefs can use 
whatever fonts in order suitable for a particular language group). I am not sure 
if the weirdness on Linux is due to the general weirdness of Linux fonts or to 
the fact that lists of multiple defaults are not yet supported there. There is 
no choice other than sticking with a single default font per generic family 
there at the moment. I had hoped GfxGTK would have caught up with GfxWin, but...
[Err, disclaimer... the last two comments are a disgression from getting 
nsStyleVisibility::mLanguage filled properly which is the root cause of this 
bug.]
Comment 33 Jungshik Shin 2001-12-05 07:05:10 PST
 This may not be directly related to this bug, but since the font selection
issue  came up here .......

 I have nothing against doing as much as possible at the level
of platform-independent CSS2 and I think I18N team members agree with
you on this, but I'm afraid there's only as much we can do that way in
presence of unavoidable differences between platforms.


> The basic idea of the CSS font selection algorithm is that a list of fonts
> should be sufficient when styling an element -- if a glyph is not found in
the> first font, then one proceeds to the second font to look for the glyph, etc. 

  This works perfectly well under MS-Windows (and MacOS) where specifying
font-family narrows down the set of matched fonts reasonably well. Most
of time it's narrowed down to a single font.  Even if it's not (which
is rare. This may only happen if there are multiple fonts with a single
name in different formats, TTF, OTF and bitmap), it's safe to assume that
all of multiple matched fonts have identical looks and feels so that
picking any one of them (with some additional criteria if there's any)
wouldn't lead to any problem. 


  Under Unix/X11 this model breaks down especially for CJK fonts.
although for Western fonts this model still holds up pretty well. See
how many 'gothic' fonts I have. Unlike in MS-Windows, font-family in
X11  doesn't include foundry name and that results in degeneracies to
lift by other means.

Korean:
-daewoo-gothic-medium-r-normal--0-0-100-100-c-0-ksc5601.1987-0
-misc-gothic-medium-r-normal--0-0-75-75-c-0-ksc5601.1987-0
-sun-gothic-medium-r-normal--0-0-75-75-c-0-ksc5601.1987-0

-kaist-gothic-medium-r-normal--0-0-75-75-c-0-johab-1
-kaist-gothic-medium-r-normal--0-0-75-75-c-0-johabs-1
-kaist-gothic-medium-r-normal--0-0-75-75-c-0-johabsh-1

-munhwa-gothic-medium-r-normal--0-0-75-75-c-0-ksc5601.1992-3

Japanese:
-wadalab-gothic-medium-r-normal--0-0-0-0-c-0-jisx0208.1983-0
-misc-gothic-medium-r-normal--0-0-75-75-c-0-jisx0208.1983-0

Western Europe(Latin1)
-misc-gothic-medium-r-normal--0-0-75-75-c-0-iso8859-1
-wadalab-gothic-medium-r-normal--0-0-0-0-c-0-iso8859-1


That's why in prefs.js for Unix/X11, font.name.style.lang
has four fields (the first two fields and the last two
fields from XLFD) as shown below:

user_pref("font.name.monospace.ko", "kaist-gothic-johabsh-1")

as opposed to what's done under MS-Windows:

user_pref("font.name.monospace.ko", "gulimche")


 If we completely rely on CSS alone without any x11/gtk
specific measures taken, we're left with
'gothic' losing 'kaist, johabsh, 1' which are all important
in uniquely identifying the font a user wants Mozilla to use.
Mozilla may pick a font *different* from what a user specified
based on what it's left with.

 As though this is not complicated enough, a recent XFree86 has
two fonts with identical first two and last two fields in XLFD
but with different add-style field:

-misc-fixed-medium-r-normal-ja-18-120-100-100-c-180-iso10646-1
-misc-fixed-medium-r-normal-ko-18-120-100-100-c-180-iso10646-1

Character repertoires of two fonts are not identical although
they're both iso10646-1. 


> For documents containing multiple languages of text, there may need to be
> different default font styles that act like selectors with [lang="..."] in
the> UA stylesheet.

  Not just for multilingual documents but also for monolingual documents,
this is necessary if it's in utf-8 or any other unicode transformation
format. For instance, Mozilla-mail converts all email messages to utf-8
before rendering it and it needs to specify 'lang' to make up for the
information loss in converting from locale-specifing MIME charset to
locale-neutral UTF-8. See bug 102633 in which mainly distinguishing
between CJK cases has been talked about. However, it also applies
to breaking up the tie between Russian and Greek fonts and CJK fonts
which include a subset of  Cyrillic and Greek letters.  The reason this
hasn't been noticed by Russian/Greek speakers is that they run Mozilla
under Russian/Greek locale to read Russian/Greek emails. If Mozilla
is launched under one of CJK locales to read Russian emails, it would
have been apparent to them.  Related is bug 107217 (Russian html docs
are rendered with CJK fonts if they're in UTF-8 which doesn't have
inherent/default lang value unlike other lang/locale specific encodings.)


Anyway, this would work perfectly on MacOS and MS-Windows, but not
under Unix/X11 at the moment (without fixing this bug ) as I wrote in
comment #24.  Unix/X11 is moving in this direction with upcoming Xft/X
Render, but it will take long to complete the transition.

In the meantime, some platform specific measures (perhaps 'hacks' to you)
are necessary. (I think that's why nsFontMetricksGTK.cpp is so long and
complex: font banning, glyph exclusion, transliteration, single width
vs double width, and so forth).


More Off-topic (although related)
> To solve problems like the poor appearance of western characters in CJK
> fonts, the default font-family list for text in Japanese would *begin*
> with a Western font that contains good glyphs for western characters,
> and it would then have
> the preferred fonts used to display the Japanese characters.  This leads
> to Western characters being displayed using Western fonts and Japanese
> characters being displayed using Japanese fonts.


  This would not solve the problem.
In Japanese text, those Western characters are, more often than not,
expected to be rendered with glyphs drawn from Japanese fonts (NOT Western
fonts). Japanese fonts contain *good* glyphs for those characters that
go along well with neighboring Japanese characters.  To make things
more complicated, _sometimes_ this is not desired even by Japanese,
in which  case I think the author has to enclose Russian portion with
<span lang="ru">....</span>. Anyway, this part of the problem can be at
least partly (as far as just specifying font-family works in Unix/X11)
solved with lang-specific default font-familiy lists. (For Japanese,
the list begins with 'Japanese' fonts)

A harder part of the problem the patch for bug 33162 tries to solve is
that  characters NOT covered by ISO-8859-1 (but covered by Windows-1252)
in Western European text are ofen found in Japanese fonts but NOT found at
all in Western fonts (because many X11 installations have only iso8859-1
fonts instead of windows-1252 fonts). Therefore, even if Western fonts
come before Japanese fonts in the default font-family list for Western
European, those characters missing in Western fonts end up getting
rendered with Japanese glyphs that don't go along well with neighboring
European characters if CSS2 is abided by to letter. To work around this,
the patch uses 'transliteration' for Western European texts.

This is not the case under MS-Windows and only happens under Unix/X11.
CSS2 level solution would be nice, but I'm afraid for a while we have to
live with platform-specific measures until Xft/X Render finally catches
up in Unix/X11.
Comment 34 rbs 2001-12-05 11:41:53 PST
Since style resolution is critical, it will be impeded if the handling of the
default font lists is moved there (which unfortunately will require the UA CSS
to distinguish lists for each generic font family per language). Another
advantage in doing this work in GFX is that the same code is handling two cases:
"element { font-family: generic}" and "element { font-family: font1, font2,
generic }". If the "generic" re-mapping is done upfront, it could end up that
this wasn't necessary because the element is not rendered, or because it may
happen that glyph lookup doesn't reach the generic font in the font-family list
(this is done lazily in the font code at the moment). So while the situation on
Windows is more uniform that Linux, shielding the font code from knowing the
language will bring other issues to consider too. Now, back to Linux...
Comment 35 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2001-12-05 11:51:44 PST
I'd certainly be willing to review a patch that corrects the computation of the
language stored in the style data struct (although I think we should discuss the
larger issues) -- in fact, I'd love to see the number of places where we need
PostResolveCallbacks reduced, but I still object to the addition of a new CSS
property (and incorrectly, too).
Comment 36 Shanjian Li 2001-12-05 15:09:30 PST
Created attachment 60546 [details] [diff] [review]
new patch for dbaron's suggestion
Comment 37 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2001-12-05 15:17:20 PST
That patch removes the parsing but it doesn't remove the changes to the other
three files that add bloat, and now that you've removed the parsing, do
*absolutely* nothing (other than overwrite the lang when visibility is set in
some cases).  See comment #23, where I omitted mentioning the one file where you
did remove the changes (nsCSSParser.cpp).
Comment 38 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2001-12-05 15:24:39 PST
+  static nsILanguageAtomService *mLangService;

This should be gLangService rather than mLangService.

Also, the patch doesn't contain the changes to the module destructor.
Comment 39 Shanjian Li 2001-12-05 15:40:22 PST
dbaron,
Don't know how that can be done. nsRuleData has to contains 'lang' info (as it does 
in my patch) or contains the source of 'lang' info (as it does in original code, 
mAttributes) so that 'lang' info can be computed. Please be more specific about 
how to remove the bloatness. 
Comment 40 Shanjian Li 2001-12-05 15:47:58 PST
answer for #38
mLanguageService is now declared as static member of nsRuleNode. 
nsRuleNode::Shutdown is called is nsStyleSet.cpp. I put it there 
to reduce the affect code scope. 
Comment 41 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2001-12-05 16:21:28 PST
Oh, I see.  It's all this silly thing about how we use the same structs on the
heap in the declaration structs and on the stack when walking the rule tree.  
So maybe you'd be better off defining an nsCSSStackDisplay that inherits from
nsCSSDisplay but also has an mLang member?  (At some point soon we should stop
using those structures on the heap...)  You still don't need the changes to
nsCSSStyleRule.cpp.

And probably it's better to call nsRuleNode::Shutdown from the module destructor
(in content/build/nsContentModule.cpp) -- it will be less confusing later when I
try to move that code around.
Comment 42 Shanjian Li 2001-12-05 17:15:24 PST
Created attachment 60569 [details] [diff] [review]
another update address comment #41
Comment 43 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2001-12-05 18:00:22 PST
> So maybe you'd be better off defining an nsCSSStackDisplay that inherits from
> nsCSSDisplay but also has an mLang member?
Comment 44 Shanjian Li 2001-12-05 19:02:45 PST
I didn't miss that part, but I thought it was optional. I am not sure to 
complicate the code with another almost identical structure just to save 
a field is a good idea. I will investigate this a little bit of further 
tomorrow. BTW, what controls the number of nsCSSDisplay structures used 
by system, including both heap and stack? 
Comment 45 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2001-12-05 19:31:22 PST
There are only about 300 nsCSSDisplay to bring up a browser window -- I thought
there were a bit more than that.  Any style rule in a stylesheet that has one of
the properties stored in that struct causes one to be created.

So I guess this seems OK to me, since we won't be using these structures anymore
once we fix bug 107270.

However, you should put a highly visible comment above the line you're adding in
nsICSSDeclaration.h saying that the mLang member variable is there not because
in needs to be stored in nsCSSDeclaration objects but because it's needed on the
stack when the struct is used in WalkRuleTree.

On the other hand, it shouldn't be difficult to do what I suggested, I don't
think...

Could you also change the name of the mLangService that you added to
nsRuleNode.h and .cpp to gLangService?

The 

+      nsServiceManager::GetService(NS_LANGUAGEATOMSERVICE_CONTRACTID,
+        NS_GET_IID(nsILanguageAtomService), (nsISupports**) &mLangService);

much more cleanly written as

+      CallGetService(NS_LANGUAGEATOMSERVICE_CONTRACTID, &mLangService);

(CallGetService is a function template that uses the type of the template
parameter to get the right IID.)

Finally, there's no need for these 4 lines:

+  } 
+  else if (eCSSUnit_Inherit == displayData.mLang.GetUnit()) {
+    inherited = PR_TRUE;
+    visibility->mLanguage = parentVisibility->mLanguage;

since displayData.mLang isn't a real property.  The "lang: string, inherit"
comment should probably also mention that it's not a real property.

Other than that, the patch looks fine, although I'd like to see the bloat issue
fixed if there's not some obstacle that I'm not thinking of.
Comment 46 rbs 2001-12-05 20:00:44 PST
Any info on why PostResolveCallback() is broken here? I had thought that it was
a ad-hoc meant to be used with caution, but to rescue cases like this.
Comment 47 Shanjian Li 2001-12-06 15:35:06 PST
Created attachment 60716 [details] [diff] [review]
yet another update

This patch renamed mLangService to gLangService, and removed 4 lines 
in nsRuleNode.cpp as suggested by dbaron. (I though those 4 lines was used 
to make inheritance work, but it seems ok after removing those 4 lines.)

I didnot create another structure for stack version of nsCSSDisplay. 
I took a look at bug 107270. The main idea of the currect approach is to 
parse raw css declaration to an intermediate form (property id, value pair) 
in CSSDeclarationImpl. The mapping to various CSS data structures are delayed.
So name "nsCSSDisplay" should be kept for stack version. Creation of another 
structure need to be removed when fixing 107270. So I found enough reason 
for me to excuse myself from doing extra work. :-)
(Did you ever think of compressing stack version of css data structure? Speed
might be a major concern here.)
Comment 48 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2001-12-07 10:07:58 PST
Comment on attachment 60716 [details] [diff] [review]
yet another update

r=dbaron
Comment 49 Marc Attinasi 2001-12-10 11:05:33 PST
Comment on attachment 60716 [details] [diff] [review]
yet another update

sr=attinasi
Comment 50 Shanjian Li 2001-12-10 20:22:24 PST
fix checked in. 
Comment 51 Teruko Kobayashi 2001-12-28 13:36:12 PST
Jungshik Shin, could you verify this?
Comment 52 Jungshik Shin 2001-12-29 13:01:07 PST
Yes, this bug was fixed. I've just checked with
2001-12-27 and it worked fine as early December
build did.
Comment 53 Shanjian Li 2002-04-24 16:45:33 PDT
*** Bug 122779 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.