122779 - Use fonts according to Content-Language/lang attribute in Unicode page

Reporter

Description

23 years ago

Mozilla uses only one set of fonts to render text in Unicode pages. If some
glyphs are not available in the chosen font, Mozilla or the host OS tries to
render them with other fonts.

I think it would be better if we could set the font to use for each language, and
Mozilla chose the font for rendering according to the Content-Language header or
lang attribute in Unicode pages.

Rui Xu

Updated

23 years ago

QA Contact: ruixu → ylong

Frank Tang

Comment 1

23 years ago

push to future.

Status: UNCONFIRMED → NEW

Ever confirmed: true

Target Milestone: --- → Future

Rui Xu

Updated

23 years ago

Keywords: intl

Roy Yokoyama

Comment 2

23 years ago

->shanjian

Assignee: yokoyama → shanjian

Shanjian Li

Comment 3

23 years ago

We now honor lang attribute. Could you verify?

Status: NEW → ASSIGNED

Shanjian Li

Comment 4

23 years ago

This has been fixed. 

*** This bug has been marked as a duplicate of 105199 ***

Status: ASSIGNED → RESOLVED

Closed: 23 years ago

Resolution: --- → DUPLICATE

Yuying Long

Comment 5

23 years ago

Verified as dup. Please reopen if you disagree.

Status: RESOLVED → VERIFIED

Jungshik Shin

Assignee

Comment 6

22 years ago

'lang' attribute in html doc is honored in font selection
(as Shanjian wrote, it's fixed in bug 105199), 
but 'Content-Language' specification in HTTP header
doesn't seem to be honored by Mozilla. 

Try the following three pages under non-SC locale
(e.g. JA, TC or KO) and compare the results.

  http://jshin.net/moztest/zh-CN.utf8.html 
  http://jshin.net/moztest/zh-CN2.utf8.html
  http://jshin.net/moztest/zh-CN3.utf8.html

I set up an .htaccess file on my server in such a way
that the web server emits 'Content-Language: zh-CN'
in the HTTP header for the first file.
The second file has a meta-tag to specify Content-Language.
In the third one, the 'lang' attribute is explicitly used.

Only the third one is rendered with a single SC font,
while the first two pages are rendered with
multiple fonts (I tried them under the KO locale,
so KO and SC fonts were mixed and
the result looked like a 'ransom' note).

I suggest that this bug be reopened with
the following summary line: 'Content-Language
has to be referred to in font selection for a UTF-8
page'.

The C-L HTTP header field is specified in HTTP 1.1 (RFC 2616), section 14.12
(http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html)

Jungshik Shin

Assignee

Comment 7

22 years ago

Attached patch v1 patch (obsolete)

With this patch, the first test case works as intended.
It's rendered with a single SC font when Mozilla
is run under the Korean locale. The second test still
doesn't work. I thought C-L specified in a meta-tag
got parsed, but I may have been wrong.
The third test case works as it should (i.e.
it is rendered with a single SC font).
However, with this patch, the default font size got smaller
for the third case.

The patch for bug 98929 laid the foundation for
this fix, but the 'lang' set by the patch for 98929
gets 'sort of' masked/shadowed by nsPresContext::UpdateCharset().
My patch reads nsDocument::mLanguage (set by
the patch for 98929 from the Content-Language HTTP header)
and uses it to set nsPresContext::mLanguage and langGroup
for Unicode-encoded pages (UTF-8 and other UTFs).

My patch doesn't change the behavior of Mozilla for
HTML docs encoded in non-Unicode legacy encodings
(ISO-8859-x, EUC-JP/KR, GB2312, Big5, KOI8-R/U, CP12xx, etc.).
For those documents, langGroup is still derived from
the encoding (charset).
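
The selection rule described above can be sketched roughly as follows. This is only an illustration, not the code from the attached patch: the helper names (IsUnicodeCharset, LangGroupFromCharset, ChooseLangGroup) and the tiny charset table are assumptions made up for the example, and the language tag is passed through as the langGroup instead of going through the language atom service.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: true for labels such as "UTF-8" or "utf-16".
// Labels without the hyphen (e.g. "UTF8") are not handled in this sketch.
bool IsUnicodeCharset(const std::string& charset) {
    std::string upper(charset);
    std::transform(upper.begin(), upper.end(), upper.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return upper.rfind("UTF-", 0) == 0;  // starts with "UTF-"
}

// Hypothetical, heavily simplified charset -> langGroup table.
std::string LangGroupFromCharset(const std::string& charset) {
    if (charset == "GB2312") return "zh-CN";
    if (charset == "Big5") return "zh-TW";
    if (charset == "EUC-JP") return "ja";
    if (charset == "EUC-KR") return "ko";
    if (IsUnicodeCharset(charset)) return "x-unicode";
    return "x-western";
}

// Unicode pages take the langGroup from the content language when one is
// known; legacy encodings keep deriving it from the charset.
std::string ChooseLangGroup(const std::string& charset,
                            const std::string& contentLanguage) {
    if (IsUnicodeCharset(charset) && !contentLanguage.empty()) {
        return contentLanguage;  // simplification: tag used as langGroup
    }
    return LangGroupFromCharset(charset);
}

// Usage example for the two cases discussed in this comment.
void DemoChooseLangGroup() {
    assert(ChooseLangGroup("UTF-8", "zh-CN") == "zh-CN");
    assert(ChooseLangGroup("GB2312", "ja") == "zh-CN");
}
```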

UpdateCharset() may not be the best place to do this
and I'm open to suggestions for a better place.

Jungshik Shin

Assignee

Comment 8

22 years ago

Attached patch v2 patch w/ some tightening (obsolete)

I tightened up some loose ends. When intl.accept.languages
was missing from prefs.js (and the C-L header field was absent
from the HTTP header), mContentLanguage was not
set explicitly and had a 'random' value. Now it's set to
a NULL string explicitly.

The reason the third case got rendered with a smaller
size font with my patch than otherwise turned out 
to be that I didn't have 'intl.accept.languages' 
in prefs.js and that resulted in langGroup 
being set to 'x-western', which has a smaller 
default font size than 'zh-CN' in my preference.

As for the second case, C-L in metatag is not
recognized yet. It has to be filed as a separate
bug. 

In summary, this patch does all it can do for the moment.

Shanjian Li

Comment 9

22 years ago

Reopening the bug for the Content-Language problem, as suggested by jshin.

Status: VERIFIED → REOPENED

Resolution: DUPLICATE → ---

Shanjian Li

Comment 10

22 years ago

That's a very good job. Your patch certainly makes sense. I have 2 questions.
1. What about the priority between document charset and C-L? For example, a
document encoded in GB2312 whose C-L specifies Japanese?
2. When C-L is misspelled, should the default language be used?

I suggest trying the content language in all situations and using it if one is found,
otherwise falling back to the existing code.

jshin, can I reassign the bug to you, or do I have to act as a proxy?

Status: REOPENED → ASSIGNED

Jungshik Shin

Assignee

Comment 11

22 years ago

Shanjian,
Thank you for your comment; I'm glad that you like it.
 
> I have 2 questions.
> 1, What about the priority between document charset and C-L? 
> For example, a
> document encoded in GB2312, but C-L specifies japanese?

  I thought about it and decided to leave those edge
cases alone and to work only on UTF-* cases.
Currently, mContentLanguage is obtained from
two different sources: the C-L header and intl.accept.languages.
If mContentLanguage is from the C-L header (or meta-tag: not
yet implemented), I think Mozilla should respect the author's
intent in cases where the langGroup deduced from the encoding
is different from that specified in C-L. However, if it's
obtained from intl.accept.languages, I'm afraid we'd better
stick to the one deduced from the encoding (charset).
For instance, I have 'ko,en-US' in intl.accept.languages.
It's fine to use 'ko' for a UTF-8 page without a C-L header.
However, it doesn't make sense to use 'ko' for GB2312-encoded
pages without a C-L header.

  One way to work around this issue is to add mContentLangSource
(a la mCharacterSetSource) to the nsDocument class so that
we can differentiate between mContentLanguage obtained
from the C-L header (and the meta-tag when implemented;
this has to be done in the nsHTMLDocument class, though) and
mContentLanguage obtained from intl.accept.languages.

  Do you think it's worth pursuing? 
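
To make the mContentLangSource idea concrete, here is a rough sketch under my own assumptions. The enum, the struct and ResolveLanguage() are hypothetical names invented for illustration (nothing like them exists in the patch); the rules are just the ones described above: an author-declared language wins, a language taken from the user pref is used only for UTF-* pages, and legacy encodings otherwise keep the charset-derived value.

```cpp
#include <cassert>
#include <string>

// Hypothetical analogue of mCharacterSetSource: records where the
// content language came from.
enum class ContentLangSource {
    None,
    AcceptLanguagesPref,  // user's intl.accept.languages
    MetaTag,              // meta-tag (not implemented at the time of writing)
    HttpHeader            // Content-Language response header
};

struct ContentLanguage {
    std::string value;
    ContentLangSource source;
};

// True when the document author declared the language.
bool IsAuthorDeclared(const ContentLanguage& lang) {
    return lang.source == ContentLangSource::MetaTag ||
           lang.source == ContentLangSource::HttpHeader;
}

// charsetIsUnicode and charsetLang stand for values the caller would
// derive from the document charset.
std::string ResolveLanguage(const ContentLanguage& lang,
                            bool charsetIsUnicode,
                            const std::string& charsetLang) {
    if (lang.value.empty()) return charsetLang;
    if (IsAuthorDeclared(lang)) return lang.value;  // respect author's intent
    if (charsetIsUnicode) return lang.value;        // pref is fine for UTF-*
    return charsetLang;                             // legacy: trust the charset
}

// Usage example: 'ko' from the pref must not override a GB2312 page.
void DemoResolveLanguage() {
    ContentLanguage fromPref{"ko", ContentLangSource::AcceptLanguagesPref};
    assert(ResolveLanguage(fromPref, false, "zh-CN") == "zh-CN");
    assert(ResolveLanguage(fromPref, true, "x-unicode") == "ko");
}
```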

> 2, When C-L is misspelled, should default language be used?

  For a misspelled C-L, langGroup is set to x-western by
nsLangAtom::LookupLanguage(). It appears harmless for
UTF-* documents, although not most desirable. For documents
encoded in non-Unicode encodings, it can do some damage,
but currently my patch doesn't deal with them, as I wrote
in my answer to your first question. Do you think
that it's necessary to modify LookupLanguage() to accept
an _optional_ argument (charset) and set langGroup
to 'x-unicode' (instead of 'x-western') if the charset is
one of the UTF-*'s and the aLanguage argument is unknown/misspelled?
We can go even further and make LookupLanguage()
set a different default langGroup for each
charset (of course, this should be optional).

  An alternative is to do some check in the caller, but ...



> I suggest to try content language in all situation and use it 
> if one is found, otherwise fallback to existing code.

  I explained some issues with doing this above.
Can you tell me what you think of them?
   

> jshin, can I reassign the bug to you or I have to act like a proxy? 

  Yes, you can reassign it to me.

BTW, here's a different problem. When I took up this bug,
I tried to solve it by making Mozilla behave
as if there were a 'lang' attribute in <body> or <html>
whose value is obtained from the C-L HTTP header,
as below:

<html lang="zh-TW">
<head>....</head>
<body>
...

or 

<html> ... 
<body lang="zh-TW">

That is, I wanted to set lang(pseudo-class) in the
very root of style resolution, but I couldn't figure
out how and came up with modifying UpdateCharset(),
instead.  I'd like to hear your opinion on this
approach compared with my present patch. 

Another BTW: in my patch (attachment 95594), there's
a mistake: it uses 'end-1' where just 'end' should be used
in calling Substring().

Shanjian Li

Comment 12

22 years ago

>   I thought about it and decided to leave those a bit edge
> cases alone and to work only on UTF-* cases.
>  Currently, mContentLanguage is obtained from
> two different sources, C-L header and intl.accept.languages.
> If mContentLanguage is from C-L header (or meta-tag: not
> yet implemented), I think Mozilla should respect the author's
> intent for cases where langGroup deduced from the encoding
> is different from that specified in C-L. However, if it's
> obtained from intl.accept.languages, I'm afraid we'd better
> stick to the one deduced from the encoding(charset). 
Agreed. I didn't realize that mContentLanguage can originate from the
accept languages. That makes things complicated.

>  One way to work around this issue is add mContentLangSource
> (a la mCharacterSetSource) to nsDocument class so that
> we can differentiate between mContentLanguage obtained
> from C-L header(and meta-tag when implemented.
> this has to be done at nsHTMLDocument class, though) and 
> intl.accept.languages.
Who will use mContentLanguage besides what you are doing here? Is it
possible to let mContentLanguage originate from only one source (i.e. the C-L
header), or to take the charset into consideration when deciding mContentLanguage?
Using your example (accept-lang = ko, charset = gb2312), I don't think setting
mContentLanguage to ko can lead to any reasonable result anywhere.

>  For misspelled C-L, langGroup is set to x-western by
> nsLangAtom::LookupLanguage(). It appears harmless for
> UTF-* documents although not most desirable. For documents
> encoded in non-Unicode encodings, it can do some damages,
> but currently my patch doesn't deal with them as I wrote
> in my answer to your first question. Do you think 
> that it's necessary to modify LookupLanguage() to accept
> an _optional_ argument (charset) and set langGroup
> to 'x-unicode' instead of ('x-western') if charset is 
> one of UTF-*'s and aLanguage argument is unknown/misspelled? 
> We can go even further and make LookupLanguage()
> to set different default langGroup for different
> charset (of course, this should be optional.)
I suggest treating a misspelled C-L as no C-L, i.e. falling back to the charset.

> That is, I wanted to set lang(pseudo-class) in the
> very root of style resolution, but I couldn't figure
> out how and came up with modifying UpdateCharset(),
> instead.  I'd like to hear your opinion on this
> approach compared with my present patch. 

I strongly favor your current approach. lang attribute from tags can still
override the default one.

Shanjian Li

Comment 13

22 years ago

give it to jshin.

Assignee: shanjian → jshin

Status: ASSIGNED → NEW

Jungshik Shin

Assignee

Comment 14

22 years ago

> Who will use the mContentLanguage besides what you are doing here? 

  It's used in content/html/style/src/nsCSSStyleSheet.cpp
to match lang-based selectors(??). The idea of referencing
intl.accept.languages probably arose to handle cases
where the charset/encoding can't be mapped to a single unique language.
That includes UTF-8 (x-unicode) and ISO-8859-1 (x-western).
Try http://jshin.net/moztest/lang.latin1.html (with intl.accept.languages
= "de", "fr", "fr,de" and "de,ko,en-US"). The way it's used
in nsCSSStyleSheet.cpp is different from the way it's used
in nsPresContext.cpp, though. In the former case, if there are
multiple elements in intl.accept.languages and multiple
lang-based selectors in CSS, it seems that the last lang-based
selector in CSS that matches one of the languages specified in
intl.accept.languages takes effect (that is, the order in which languages
are specified in intl.accept.languages does not matter).

However, there should be
very few documents with something like 'q:lang(de)' in CSS but without
explicit use of the 'lang' attribute in HTML elements (here it's 'q').
Given this, along with the not-so-intuitive way of choosing the
lang-based selector when multiple langs are present in intl.accept.languages
(as described above), I have some reservations about the usefulness of obtaining
mContentLanguage from intl.accept.languages. It also has to be
noted, though, that the C-L HTTP header can have multiple languages listed
(however strange it may sound).
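
For the multi-language case, a C-L value such as "mi, en" is a comma-separated list of language tags. A minimal sketch of splitting such a value is below; ParseContentLanguage() is a hypothetical helper written for this comment, not an existing Mozilla function, and which of the listed languages should drive font selection is left open here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits a Content-Language field value into its language tags.
// Spaces and tabs are dropped; empty list elements are skipped.
std::vector<std::string> ParseContentLanguage(const std::string& headerValue) {
    std::vector<std::string> tags;
    std::string current;
    for (char c : headerValue) {
        if (c == ',') {
            if (!current.empty()) tags.push_back(current);
            current.clear();
        } else if (c != ' ' && c != '\t') {
            current.push_back(c);
        }
    }
    if (!current.empty()) tags.push_back(current);
    return tags;
}

// Usage example with a two-language header value.
void DemoParseContentLanguage() {
    const std::vector<std::string> tags = ParseContentLanguage("mi, en");
    assert(tags.size() == 2);
    assert(tags[0] == "mi");
    assert(tags[1] == "en");
}
```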

> Is it
> possible to let mContentLanguage be originated from only one source (ie, C-L
> header) 

  Yes, it's possible. It's easy (I just have to take out
a part of the patch for bug 98929: attachment 48982),
but the question is whether or not to do it. I am inclined to
take that out for the reason given above, but I'd like to hear
from Ulrich, who added it in his patch for bug 98929,
before going ahead.

  Another aspect that may make things complicated in the future
is that once bug 121193 is fixed, we may have yet another way
to obtain the value for mContentLanguage. With this,
mContentLanguage becomes almost like mCharSet in terms
of the number of sources its value can come from:
the C-L HTTP header, the meta-tag, a user setting via the UI
(like the character coding menu) and the user pref value
in intl.accept.languages (settable via Pref|Language).
Of course, when bug 121193 is fixed, we probably
have to remove the last one (intl.accept.languages).

>>Do you think 
>> that it's necessary to modify LookupLanguage() to accept
>> an _optional_ argument (charset) and set langGroup
>> to 'x-unicode' instead of ('x-western') if charset is 
>> one of UTF-*'s and aLanguage argument is unknown/misspelled? 
>> We can go even further and make LookupLanguage()
>> to set different default langGroup for different
>> charset (of course, this should be optional.)

> I suggest treat misspelled C-L as no C-L, ie. fall back to charset.

  Now it does. Instead of making nsLookupLanguage()
take an optional third argument (I don't know how
to specify an optional argument with a default
value in XPCOM IDL), I modified it to return
NS_ERROR_LANGATOM_UNKNOWN_LANG (its severity bit
is still 0, so NS_SUCCEEDED() still results in 'true')
instead of NS_OK when mContentLanguage has
an unknown/misspelled language. In UpdateCharset()
in nsPresContext.cpp, the return value is checked
and acted upon accordingly.
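
The return-code trick can be illustrated with stand-in types. In the sketch below, Result, kOk, kUnknownLang, Succeeded() and the tiny language list are all assumptions made for the example (the numeric value chosen for kUnknownLang in particular is arbitrary); the only point is that a nonzero code with the severity bit clear still counts as success, while the caller can compare against it and fall back to the charset.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

using Result = std::uint32_t;            // stand-in for nsresult
const Result kOk = 0;                    // stand-in for NS_OK
const Result kUnknownLang = 0x00000001;  // hypothetical value, severity bit 0

// Same idea as NS_SUCCEEDED(): only the top (severity) bit matters.
inline bool Succeeded(Result rv) { return (rv & 0x80000000u) == 0; }

// Hypothetical lookup: unknown tags still get 'x-western', but the
// return code tells the caller that the tag was not recognized.
Result LookupLanguage(const std::string& lang, std::string& langGroup) {
    if (lang == "zh-CN" || lang == "zh-TW" || lang == "ja" || lang == "ko") {
        langGroup = lang;
        return kOk;
    }
    langGroup = "x-western";
    return kUnknownLang;
}

// Caller side: treat an unknown/misspelled C-L as no C-L at all.
std::string LangGroupFor(const std::string& contentLanguage,
                         const std::string& charsetLangGroup) {
    std::string group;
    const Result rv = LookupLanguage(contentLanguage, group);
    if (!Succeeded(rv) || rv == kUnknownLang) return charsetLangGroup;
    return group;
}

// Usage example: a misspelled tag falls back to the charset-derived group.
void DemoLangGroupFor() {
    assert(Succeeded(kUnknownLang));
    assert(LangGroupFor("zh-CM", "x-unicode") == "x-unicode");
}
```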

> I strongly favor your current approach. lang attribute 
> from tags can still override the default one.
 
  All right. I agree with you that the lang attribute can
override the default determined from C-L.

Jungshik Shin

Assignee

Comment 15

22 years ago

Attached patch a new patch (obsolete)

Addressing some of Shanjian's concerns.
I haven't yet taken out the code to obtain mC-L from intl.accept.languages.
mC-L is still only checked for UTF-* cases, but it's easy
to check mC-L for all charsets. However, that is contingent on
what we decide to do with intl.accept.languages as a source
of mC-L.

Jungshik Shin

Assignee

Comment 16

22 years ago

Attached patch the same patch with missing nsLanguageAtomService.cpp

Sorry for spamming. I forgot to include nsLanguageAtomService.cpp.

Attachment #95528 - Attachment is obsolete: true

Attachment #95594 - Attachment is obsolete: true

Attachment #95718 - Attachment is obsolete: true

Jungshik Shin

Assignee

Comment 17

22 years ago

> I modified nsLookupLanguage() to return 
> NS_ERROR_LANGATOM_UNKNOWN_LANG (its severity bit
> is still 0 so that NS_SUCCEEDED() results in 'true')
> instead of NS_OK when mContentLanguage has
> an unknown/misspelled language.

  Because of caching in nsLookupLanguage(), the second time
it's handed an invalid/misspelled value in aLanguage,
it sets aResult to 'x-western' and returns NS_OK
instead of NS_ERROR_LANGATOM_UNKNOWN_LANG. To work around
this, we have to record in the cache the
fact that the language was set to 'x-western' because
of an invalid/misspelled aLanguage.
How? One way is to add a third field
to nsILanguageAtom, but I have little idea
whether that's allowed/desirable, because it involves
changes in xpcom(?) and the use for that value
is pretty rare(?).
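
To illustrate the 'third field' idea, here is a small sketch with a made-up cache class. LangCache, CachedLang and the wasUnknown flag are hypothetical and unrelated to the real nsILanguageAtom interface; the sketch only shows that storing the flag next to the cached langGroup lets a repeated lookup report the same 'unknown' result as the first one.

```cpp
#include <cassert>
#include <map>
#include <string>

// Cached entry: the langGroup plus the hypothetical extra field.
struct CachedLang {
    std::string langGroup;
    bool wasUnknown;  // true if langGroup is only the 'x-western' fallback
};

class LangCache {
public:
    // Returns true if the language was recognized, false if it fell back.
    // The answer is the same on the first and on every later lookup.
    bool Lookup(const std::string& lang, std::string& langGroup) {
        auto it = mCache.find(lang);
        if (it == mCache.end()) {
            it = mCache.emplace(lang, Resolve(lang)).first;
        }
        langGroup = it->second.langGroup;
        return !it->second.wasUnknown;
    }

private:
    // Hypothetical resolver with a tiny list of known languages.
    static CachedLang Resolve(const std::string& lang) {
        if (lang == "zh-CN" || lang == "ja" || lang == "ko") return {lang, false};
        return {"x-western", true};
    }

    std::map<std::string, CachedLang> mCache;
};

// Usage example: looks the same tag up twice and reports whether both
// lookups flagged it as unknown.
bool UnknownSurvivesCaching(const std::string& lang) {
    LangCache cache;
    std::string group;
    const bool first = cache.Lookup(lang, group);
    const bool second = cache.Lookup(lang, group);
    assert(first == second);
    return !first && !second;
}
```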

See also bug 163271 which led me to discover this problem. 

BTW, I don't know why I can't accept this bug. Bugzilla doesn't
show the 'accept' button. I've accepted some bugs in the past...

Andrew Hagen

Updated

22 years ago

Keywords: mozilla1.3, patch

Jungshik Shin

Assignee

Updated

22 years ago

Status: NEW → ASSIGNED

Katsuhiko Momoi

Comment 18

21 years ago

http://www.faqs.org/rfcs/rfc3282.html

Adding an RFC link for Content-Language.

Phil Ringnalda (:philor)

Updated

15 years ago

QA Contact: amyy → i18n

:aceman

Comment 19

13 years ago

What is the status of this bug? Was the patch merged or is it now outdated? Is the bug still valid?

Masatoshi Kimura [:emk]

Comment 20

13 years ago

Fixed by bug 547267.

Status: ASSIGNED → RESOLVED

Closed: 23 years ago → 13 years ago

Resolution: --- → DUPLICATE

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 21

13 years ago

Did bug 416581 help as well?

v1 patch, 22 years ago, Jungshik Shin, 1.51 KB, patch
v2 patch w/ some tightening, 22 years ago, Jungshik Shin, 2.17 KB, patch
a new patch, 22 years ago, Jungshik Shin, 2.99 KB, patch
the same patch with missing nsLanguageAtomService.cpp, 22 years ago, Jungshik Shin, 4.10 KB, patch