Bug 122779 - Use fonts according to Content-Language/lang attribute in Unicode page
Status: RESOLVED DUPLICATE of bug 547267
Keywords: intl
Product: Core
Classification: Components
Component: Internationalization
Version: Trunk
Platform: All All
Importance: -- enhancement
Target Milestone: Future
Assigned To: Jungshik Shin
Reported: 2002-01-31 07:42 PST by dzy
Modified: 2012-02-23 06:57 PST


Attachments
v1 patch (1.51 KB, patch)
2002-08-15 22:50 PDT, Jungshik Shin
v2 patch w/ some tightening (2.17 KB, patch)
2002-08-16 09:43 PDT, Jungshik Shin
a new patch (2.99 KB, patch)
2002-08-17 15:46 PDT, Jungshik Shin
the same patch with missing nsLanguageAtomService.cpp (4.10 KB, patch)
2002-08-17 15:51 PDT, Jungshik Shin

Description dzy 2002-01-31 07:42:18 PST
Mozilla uses only one set of fonts to render the text in Unicode pages. If some
glyphs are not available in the chosen font, Mozilla or the host OS tries to
render them with other fonts.

I think it would be better if we could set the font to use for each language and
have Mozilla choose the font according to the Content-Language header or
lang attribute in Unicode pages.
Comment 1 Frank Tang 2002-02-14 13:09:47 PST
push to future. 
Comment 2 Roy Yokoyama 2002-02-28 11:26:21 PST
->shanjian
Comment 3 Shanjian Li 2002-03-13 17:43:12 PST
We now honor lang attribute. Could you verify?
Comment 4 Shanjian Li 2002-04-24 16:45:28 PDT
This has been fixed. 

*** This bug has been marked as a duplicate of 105199 ***
Comment 5 Yuying Long 2002-04-25 13:22:26 PDT
Verified as dup. Please reopen if you disagree.
Comment 6 Jungshik Shin 2002-08-14 23:42:35 PDT
The 'lang' attribute in an HTML document is honored in font selection
(as Shanjian wrote, it's fixed in bug 105199),
but the 'Content-Language' specification in the HTTP header
doesn't seem to be honored by Mozilla.

Try the following three pages under a non-SC locale
(e.g. JA, TC or KO) and compare the results.

  http://jshin.net/moztest/zh-CN.utf8.html 
  http://jshin.net/moztest/zh-CN2.utf8.html
  http://jshin.net/moztest/zh-CN3.utf8.html

I set up an .htaccess file on my server in such a way
that the web server emits 'Content-Language: zh-CN'
in the HTTP header for the first file.
The second file uses a meta tag to specify Content-Language.
In the third one, the 'lang' attribute is explicitly used.

Only the third one is rendered with an SC font,
while the first two pages are rendered with
multiple fonts (I tried them under a KO locale,
so KO and SC fonts were mixed and
the result looked like a 'ransom note').

I suggest that this bug be reopened with
the following summary line: 'Content-Language
has to be referred to in font selection for a UTF-8
page'.

The C-L HTTP header field is specified in HTTP 1.1, section 14.12
(http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html)
Comment 7 Jungshik Shin 2002-08-15 22:50:00 PDT
Created attachment 95528
v1 patch

With this patch, the first test case works as intended:
it's rendered with a single SC font when Mozilla
is run under a Korean locale. The second test still
doesn't work. I thought C-L specified in a meta tag
gets parsed, but I may have been wrong.
The third test case works as it should (i.e. it's
rendered with a single SC font).
However, with this patch, the default font size got smaller
for the third case.

The patch for bug 98929 laid the foundation for
this fix, but the 'lang' set by the patch for 98929
gets 'sort of' masked/shadowed by nsPresContext::UpdateCharset().
My patch reads nsDocument::mLanguage (set by
the patch for 98929 from the Content-Language HTTP header)
and uses it to set nsPresContext::mLanguage and langGroup
for Unicode-encoded pages (UTF-8 and other UTFs).

My patch doesn't change the behavior of Mozilla for
HTML docs encoded in non-Unicode legacy encodings
(ISO-8859-x, EUC-JP/KR, GB2312, Big5, KOI8-R/U, CP12xx, etc.).
For those documents, langGroup is still derived from
the encoding (charset).

UpdateCharset() may not be the best place to do this,
and I'm open to suggestions for a better place.
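
The selection rule described in this comment can be sketched in C++ as
follows. This is an illustration only, not the attached patch: the
function names (LangGroupForDocument, LangGroupFromCharset,
IsUnicodeCharset) and the tiny lookup tables are hypothetical, and the
sketch assumes charset names arrive already normalized to upper case.
Only the rule itself comes from the comment: use Content-Language for
UTF-* pages, and keep deriving langGroup from the charset otherwise.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: true for the Unicode encodings the patch targets.
bool IsUnicodeCharset(const std::string& charset) {
  return charset.rfind("UTF-", 0) == 0;  // prefix test: "UTF-8", "UTF-16", ...
}

// Hypothetical, deliberately small charset -> langGroup table.
std::string LangGroupFromCharset(const std::string& charset) {
  static const std::map<std::string, std::string> kCharsetToGroup = {
      {"GB2312", "zh-CN"}, {"BIG5", "zh-TW"}, {"EUC-KR", "ko"},
      {"EUC-JP", "ja"},    {"ISO-8859-1", "x-western"}};
  auto it = kCharsetToGroup.find(charset);
  if (it != kCharsetToGroup.end()) return it->second;
  return IsUnicodeCharset(charset) ? "x-unicode" : "x-western";
}

// Content-Language is consulted only for Unicode pages; legacy
// encodings keep the charset-derived langGroup.
std::string LangGroupForDocument(const std::string& charset,
                                 const std::string& contentLanguage) {
  if (IsUnicodeCharset(charset)) {
    // Hypothetical language -> langGroup table.
    static const std::map<std::string, std::string> kLangToGroup = {
        {"zh-CN", "zh-CN"}, {"zh-TW", "zh-TW"}, {"ja", "ja"},
        {"ko", "ko"},       {"en-US", "x-western"}};
    auto it = kLangToGroup.find(contentLanguage);
    if (it != kLangToGroup.end()) return it->second;
  }
  return LangGroupFromCharset(charset);
}
```

With these assumptions, a UTF-8 page served with 'Content-Language: zh-CN'
resolves to the zh-CN group, while a GB2312 page resolves to zh-CN from its
charset regardless of the header.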
Comment 8 Jungshik Shin 2002-08-16 09:43:36 PDT
Created attachment 95594
v2 patch w/ some tightening

I tightened up some loose ends. When intl.accept.languages
was missing in prefs.js (and the C-L header field was absent
from the HTTP header), mContentLanguage was not
set explicitly and had a 'random' value. Now it's set to
a NULL string explicitly.

The reason the third case got rendered with a smaller
font with my patch than without it turned out
to be that I didn't have 'intl.accept.languages'
in prefs.js. That resulted in langGroup
being set to 'x-western', which has a smaller
default font size than 'zh-CN' in my preferences.

As for the second case, C-L in a meta tag is not
recognized yet. It has to be filed as a separate
bug.

In summary, this patch does all it can do for the moment.
Comment 9 Shanjian Li 2002-08-16 09:59:12 PDT
Reopening the bug for the Content-Language problem, as suggested by jshin.
Comment 10 Shanjian Li 2002-08-16 10:21:55 PDT
That's a very good job. Your patch certainly makes sense. I have 2 questions.
1, What about the priority between document charset and C-L? For example, a
document encoded in GB2312, but C-L specifies Japanese?
2, When C-L is misspelled, should the default language be used?

I suggest trying Content-Language in all situations and using it if one is found,
otherwise falling back to the existing code.

jshin, can I reassign the bug to you, or do I have to act like a proxy?
Comment 11 Jungshik Shin 2002-08-16 14:09:29 PDT
Shanjian,
Thank you for your comment. I'm glad that you like it.
 
> I have 2 questions.
> 1, What about the priority between document charset and C-L? 
> For example, a
> document encoded in GB2312, but C-L specifies Japanese?

  I thought about it and decided to leave those edge
cases alone and to work only on UTF-* cases.
Currently, mContentLanguage is obtained from
two different sources, the C-L header and intl.accept.languages.
If mContentLanguage is from the C-L header (or a meta tag: not
yet implemented), I think Mozilla should respect the author's
intent in cases where the langGroup deduced from the encoding
is different from that specified in C-L. However, if it's
obtained from intl.accept.languages, I'm afraid we'd better
stick to the one deduced from the encoding (charset).
For instance, I have 'ko,en-US' in intl.accept.languages.
It's fine to use 'ko' for a UTF-8 page without a C-L header.
However, it doesn't make sense to use 'ko' for GB2312-encoded
pages without a C-L header.

  One way to work around this issue is to add mContentLangSource
(a la mCharacterSetSource) to the nsDocument class so that
we can differentiate between mContentLanguage obtained
from the C-L header (and a meta tag when implemented;
this has to be done in the nsHTMLDocument class, though) and
from intl.accept.languages.

  Do you think it's worth pursuing? 
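
The mContentLangSource idea can be illustrated with a small C++ sketch.
Everything named here (ContentLangSource, DocLanguage, IsAuthorDeclared,
EffectiveLanguage) is hypothetical and only models the priority rule
proposed above: a language declared by the author (HTTP header or meta
tag) wins over the language implied by the charset, while a language
that merely comes from the accept-languages preference does not.

```cpp
#include <cassert>
#include <string>

// Hypothetical analogue of mCharacterSetSource: where the value came from.
enum class ContentLangSource { None, AcceptLanguagesPref, MetaTag, HttpHeader };

struct DocLanguage {
  std::string lang;          // e.g. "ko", "zh-CN"; empty when unknown
  ContentLangSource source;  // origin of `lang`
};

// True when the page author, not the user's preference, supplied the value.
bool IsAuthorDeclared(ContentLangSource source) {
  return source == ContentLangSource::HttpHeader ||
         source == ContentLangSource::MetaTag;
}

// charsetLang is the language implied by the encoding, or "" when the
// encoding (UTF-8, for example) implies no single language.
std::string EffectiveLanguage(const DocLanguage& doc,
                              const std::string& charsetLang) {
  if (!doc.lang.empty() && IsAuthorDeclared(doc.source)) {
    return doc.lang;     // respect the author's stated intent
  }
  if (!charsetLang.empty()) {
    return charsetLang;  // legacy encoding: trust the charset
  }
  return doc.lang;       // Unicode page: the pref value is the best guess
}
```

Under this rule, 'ko' from the preference is used for a UTF-8 page with no
C-L header but ignored for a GB2312 page, which matches the example given
in the comment.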

> 2, When C-L is misspelled, should default language be used?

  For a misspelled C-L, langGroup is set to x-western by
nsLangAtom::LookupLanguage(). That appears harmless for
UTF-* documents, although it is not the most desirable. For documents
encoded in non-Unicode encodings, it can do some damage,
but currently my patch doesn't deal with them, as I wrote
in my answer to your first question. Do you think
that it's necessary to modify LookupLanguage() to accept
an _optional_ argument (charset) and set langGroup
to 'x-unicode' instead of 'x-western' if the charset is
one of the UTF-*'s and the aLanguage argument is unknown/misspelled?
We could go even further and make LookupLanguage()
set a different default langGroup for each
charset (of course, this should be optional).

  An alternative is to do some check in the caller, but ...



> I suggest trying Content-Language in all situations and using it
> if one is found, otherwise falling back to the existing code.

  I explained some issues with doing this above.
Can you tell me what you think of them?
   

> jshin, can I reassign the bug to you, or do I have to act like a proxy?

  Yes, you can reassign it to me.

BTW, here's a different problem. When I took up this bug,
I tried to solve it by making Mozilla behave
as if there were a 'lang' attribute on <body> or <html>
whose value is obtained from the C-L HTTP header,
as below:

<html lang="zh-TW">
<head>....</head>
<body>
...

or 

<html> ... 
<body lang="zh-TW">

That is, I wanted to set lang (pseudo-class) at the
very root of style resolution, but I couldn't figure
out how and came up with modifying UpdateCharset()
instead. I'd like to hear your opinion on this
approach compared with my present patch.

Another BTW: in my patch (attachment 95594), there's
a mistake: 'end-1' is used where just 'end' should be used
in calling Substring().
Comment 12 Shanjian Li 2002-08-16 14:40:48 PDT
>   I thought about it and decided to leave those edge
> cases alone and to work only on UTF-* cases.
> Currently, mContentLanguage is obtained from
> two different sources, the C-L header and intl.accept.languages.
> If mContentLanguage is from the C-L header (or a meta tag: not
> yet implemented), I think Mozilla should respect the author's
> intent in cases where the langGroup deduced from the encoding
> is different from that specified in C-L. However, if it's
> obtained from intl.accept.languages, I'm afraid we'd better
> stick to the one deduced from the encoding (charset).
Agreed. I didn't realize that mContentLanguage can originate from
accept languages. That makes things complicated.

>  One way to work around this issue is to add mContentLangSource
> (a la mCharacterSetSource) to the nsDocument class so that
> we can differentiate between mContentLanguage obtained
> from the C-L header (and a meta tag when implemented;
> this has to be done in the nsHTMLDocument class, though) and
> from intl.accept.languages.
Who will use mContentLanguage besides what you are doing here? Is it
possible to let mContentLanguage originate from only one source (i.e. the C-L
header), or to take the charset into consideration when deciding mContentLanguage?
Using your example (accept-lang = ko, charset = gb2312), I don't think setting
mContentLanguage to ko can lead to any reasonable result anywhere.

>  For a misspelled C-L, langGroup is set to x-western by
> nsLangAtom::LookupLanguage(). That appears harmless for
> UTF-* documents, although it is not the most desirable. For documents
> encoded in non-Unicode encodings, it can do some damage,
> but currently my patch doesn't deal with them, as I wrote
> in my answer to your first question. Do you think
> that it's necessary to modify LookupLanguage() to accept
> an _optional_ argument (charset) and set langGroup
> to 'x-unicode' instead of 'x-western' if the charset is
> one of the UTF-*'s and the aLanguage argument is unknown/misspelled?
> We could go even further and make LookupLanguage()
> set a different default langGroup for each
> charset (of course, this should be optional).
I suggest treating a misspelled C-L as no C-L, i.e. falling back to the charset.

> That is, I wanted to set lang (pseudo-class) at the
> very root of style resolution, but I couldn't figure
> out how and came up with modifying UpdateCharset()
> instead. I'd like to hear your opinion on this
> approach compared with my present patch.

I strongly favor your current approach. The lang attribute from tags can still
override the default one.
Comment 13 Shanjian Li 2002-08-16 14:42:43 PDT
give it to jshin.
Comment 14 Jungshik Shin 2002-08-17 15:41:19 PDT
> Who will use mContentLanguage besides what you are doing here?

  It's used in content/html/style/src/nsCSSStyleSheet.cpp
to select a lang-based selector(??). The idea of referencing
intl.accept.languages probably arose to handle cases
where the charset/encoding can't be mapped to a single unique language.
That includes UTF-8 (x-unicode) and ISO-8859-1 (x-western).
Try http://jshin.net/moztest/lang.latin1.html (with intl.accept.languages
= "de", "fr", "fr,de", and "de,ko,en-US"). The way it's used
in nsCSSStyleSheet.cpp is different from the way it's used
in nsPresContext.cpp, though. In the former case, if there are
multiple elements in intl.accept.languages and multiple
lang-based selectors in CSS, it seems that the last lang-based
selector in CSS that matches one of the languages specified in
intl.accept.languages takes effect (that is, the order in which languages
are specified in intl.accept.languages does not matter).

However, there should be
very few documents with something like 'q:lang(de)' in CSS but without
explicit use of the 'lang' attribute in HTML elements (here it's 'q').
Given this, along with the not-so-intuitive way of choosing a
lang-based selector when multiple langs are present in intl.accept.languages
(as described above), I have some reservations about the usefulness of obtaining
mContentLanguage from intl.accept.languages. It also has to be
noted, though, that the C-L HTTP header can have multiple languages listed
(however strange it may sound).
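
Since a Content-Language value may list several language tags separated
by commas, any consumer has to split it before use. The following C++
sketch shows one way to do that; ParseContentLanguage is a hypothetical
helper, not a Mozilla function, and the sketch does not validate the tags
or decide which of several languages should drive font selection.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split a Content-Language value such as "zh-CN, en" into its tags,
// trimming spaces and tabs and skipping empty list elements.
std::vector<std::string> ParseContentLanguage(const std::string& value) {
  std::vector<std::string> tags;
  std::string::size_type pos = 0;
  while (pos <= value.size()) {
    // The current element ends at the next comma or at the end of input.
    std::string::size_type comma = value.find(',', pos);
    if (comma == std::string::npos) comma = value.size();

    // Locate the first non-blank character inside the element, if any.
    std::string::size_type first = value.find_first_not_of(" \t", pos);
    if (first != std::string::npos && first < comma) {
      std::string::size_type last = value.find_last_not_of(" \t", comma - 1);
      tags.push_back(value.substr(first, last - first + 1));
    }
    pos = comma + 1;
  }
  return tags;
}
```

A caller that needs exactly one language could take the first element, but
that choice is an assumption of this note rather than something settled in
the discussion.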

> Is it
> possible to let mContentLanguage originate from only one source (i.e. the C-L
> header)

  Yes, it's possible. It's easy (I just have to take out
a part of the patch for bug 98929: attachment 48982),
but the question is whether or not to do it. I am inclined to
take that out for the reason given above, but I'd like to hear
from Ulrich, who added it in his patch for bug 98929,
before going ahead.

  Another aspect that may make things complicated in the future
is that once bug 121193 is fixed, we may have yet another way
to obtain the value for mContentLanguage. With this,
mContentLanguage becomes almost like mCharSet in terms
of the number of sources its value can come from:
the C-L HTTP header, a meta tag, a user setting via the UI
(like the character coding menu), and the user pref value
in intl.accept.languages (settable via Pref|Language).
Of course, when bug 121193 is fixed, we probably
have to remove the last (intl.accept.languages).

>> Do you think
>> that it's necessary to modify LookupLanguage() to accept
>> an _optional_ argument (charset) and set langGroup
>> to 'x-unicode' instead of 'x-western' if the charset is
>> one of the UTF-*'s and the aLanguage argument is unknown/misspelled?
>> We could go even further and make LookupLanguage()
>> set a different default langGroup for each
>> charset (of course, this should be optional).

> I suggest treating a misspelled C-L as no C-L, i.e. falling back to the charset.

  Now it does. Instead of making nsLookupLanguage()
take an optional third argument (I don't know how
to specify an optional argument with a default
value in XPCOM IDL), I modified it to return
NS_ERROR_LANGATOM_UNKNOWN_LANG (its severity bit
is still 0 so that NS_SUCCEEDED() results in 'true')
instead of NS_OK when mContentLanguage has
an unknown/misspelled language. In UpdateCharSet()
in nsPresContext.cpp, the return value is checked
and acted upon accordingly.
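
The return-code trick described here relies on the nsresult convention
that the top bit is the severity bit, so a code with that bit clear still
passes a success test while being distinguishable from NS_OK. The C++
sketch below models this with stand-in names: Result, kOk, kUnknownLang,
Succeeded, LookupLangGroup and ChooseLangGroup are all hypothetical, and
the numeric value of kUnknownLang is invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

using Result = std::uint32_t;

// Top bit is the severity bit: 0 means success, 1 means failure.
constexpr Result kSeverityBit = 0x80000000u;
constexpr Result kOk = 0;
// Invented value; what matters is that the severity bit is clear.
constexpr Result kUnknownLang = 0x00000001u;

// Stand-in for the NS_SUCCEEDED() test.
bool Succeeded(Result rv) { return (rv & kSeverityBit) == 0; }

// Stand-in lookup: always fills in a group, but reports when it had to
// substitute 'x-western' for an unknown or misspelled language.
Result LookupLangGroup(const std::string& lang, std::string* group) {
  if (lang == "zh-CN" || lang == "zh-TW" || lang == "ja" || lang == "ko") {
    *group = lang;
    return kOk;
  }
  *group = "x-western";
  return kUnknownLang;
}

// Caller-side check: an unknown Content-Language is treated as absent,
// so the charset-derived group is used instead.
std::string ChooseLangGroup(const std::string& contentLanguage,
                            const std::string& charsetGroup) {
  std::string group;
  Result rv = LookupLangGroup(contentLanguage, &group);
  if (rv == kOk) return group;
  return charsetGroup;
}
```

Existing callers that only test for success keep working unchanged, and
only a caller that cares about the distinction needs to compare against
the specific code.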

> I strongly favor your current approach. The lang attribute
> from tags can still override the default one.
 
  All right. I agree with you that the lang attribute can
override the default determined from C-L.

Comment 15 Jungshik Shin 2002-08-17 15:46:21 PDT
Created attachment 95718
a new patch 

Addressing some of Shanjian's concerns.
I haven't yet taken out the code to obtain mC-L from intl.accept.languages.
mC-L is still only checked for UTF-* cases, but it's easy
to check mC-L for all charsets. However, that is contingent on
what we decide to do with intl.accept.languages as a source
of mC-L.
Comment 16 Jungshik Shin 2002-08-17 15:51:34 PDT
Created attachment 95720
the same patch with missing nsLanguageAtomService.cpp 

Sorry for the spam. I forgot to include nsLanguageAtomService.cpp.
Comment 17 Jungshik Shin 2002-08-17 17:39:29 PDT
> I modified nsLookupLanguage() to return 
> NS_ERROR_LANGATOM_UNKNOWN_LANG (its severity bit
> is still 0 so that NS_SUCCEEDED() results in 'true')
> instead of NS_OK when mContentLanguage has
> an unknown/misspelled language.

  Because of caching in nsLookupLanguage(), the second time
it's handed an invalid/misspelled value in aLanguage,
it sets aResult to 'x-western' and returns NS_OK
instead of NS_ERROR_LANGATOM_UNKNOWN_LANG. To work around
this, we have to record in the cache the
fact that the language was set to 'x-western' because
of an invalid/misspelled aLanguage.
How? One way is to add a third field
to nsILanguageAtom, but I have little idea
whether that's allowed/desirable, because it involves
changes in xpcom(?) and the use for that value
is pretty rare(?).
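
The caching problem and the 'third field' workaround can be sketched in
C++ as below. LangCache, CachedLang and isFallback are hypothetical names
chosen for this note; the sketch shows only the idea of storing the
fallback flag with the cached entry so that repeated lookups report the
same outcome as the first one.

```cpp
#include <cassert>
#include <map>
#include <string>

// Cached entry: the resolved group plus the extra field recording that
// 'x-western' was substituted for an unknown or misspelled language.
struct CachedLang {
  std::string langGroup;
  bool isFallback;
};

class LangCache {
 public:
  // Returns true when `lang` is known, false when the fallback group was
  // used. The answer is identical on the first and on later calls.
  bool Lookup(const std::string& lang, std::string* group) {
    auto it = cache_.find(lang);
    if (it == cache_.end()) {
      it = cache_.emplace(lang, Resolve(lang)).first;  // first sighting
    }
    *group = it->second.langGroup;
    return !it->second.isFallback;
  }

 private:
  // Stand-in resolver with a tiny list of known languages.
  static CachedLang Resolve(const std::string& lang) {
    if (lang == "zh-CN" || lang == "ja" || lang == "ko") {
      return {lang, false};
    }
    return {"x-western", true};
  }

  std::map<std::string, CachedLang> cache_;
};
```

Without the flag, the second lookup of a misspelled value would find a
perfectly ordinary 'x-western' entry in the cache and could not tell it
apart from a genuine match.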

See also bug 163271 which led me to discover this problem. 

BTW, I don't know why I can't accept this bug. Bugzilla doesn't
show an 'accept' button. I've accepted some bugs in the past...
Comment 18 Katsuhiko Momoi 2003-06-28 04:18:18 PDT
http://www.faqs.org/rfcs/rfc3282.html

Adding an RFC link for Content-Language. 
Comment 19 :aceman 2011-05-11 01:48:51 PDT
What is the status of this bug? Was the patch merged or is it now outdated? Is the bug still valid?
Comment 20 Masatoshi Kimura [:emk] 2012-02-22 20:00:34 PST
Fixed by bug 547267.

*** This bug has been marked as a duplicate of bug 547267 ***
Comment 21 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2012-02-23 06:57:38 PST
Did bug 416581 help as well?
