Closed Bug 122779 Opened 23 years ago Closed 13 years ago

Use fonts according to Content-Language/lang attribute in Unicode page

Categories

(Core :: Internationalization, enhancement)

enhancement
Not set
normal

Tracking

RESOLVED DUPLICATE of bug 547267
Future

People

(Reporter: dzy, Assigned: jshin1987)

Details

(Keywords: intl)

Attachments

(1 file, 3 obsolete files)

Mozilla uses only one set of fonts to render the text in Unicode pages; if some glyphs are not available in the chosen font, Mozilla or the host OS tries to render them with other fonts. I think it would be better if we could set the font to use for each language, and have Mozilla choose the font according to the Content-Language header or lang attribute in Unicode pages.
QA Contact: ruixu → ylong
push to future.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Target Milestone: --- → Future
Keywords: intl
->shanjian
Assignee: yokoyama → shanjian
We now honor lang attribute. Could you verify?
Status: NEW → ASSIGNED
This has been fixed. *** This bug has been marked as a duplicate of 105199 ***
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → DUPLICATE
Verified as dup. Please re-open if you disagree.
Status: RESOLVED → VERIFIED
The 'lang' attribute in an HTML doc is honored in font selection (as Shanjian wrote, that was fixed in bug 105199), but the 'Content-Language' specification in the HTTP header doesn't seem to be honored by Mozilla. Try the following three pages under a non-SC locale (e.g. JA, TC or KO) and compare the results.
http://jshin.net/moztest/zh-CN.utf8.html
http://jshin.net/moztest/zh-CN2.utf8.html
http://jshin.net/moztest/zh-CN3.utf8.html
I set up the htaccess file on my server in such a way that my web server emits 'Content-Language: zh-CN' in the HTTP header for the first file. The second file has a meta-tag to specify Content-Language. In the third one, the 'lang' attribute is explicitly used. Only the third one is rendered with an SC font, while the first two pages are rendered with multiple fonts (I tried them under the KO locale, so KO and SC fonts were mixed and the result looked like a 'ransom' note).
I suggest that this bug be reopened with the following summary line: 'Content-Language has to be referred to in font selection for a UTF-8 page'. The C-L HTTP header field is specified in HTTP 1.1 section 14.12 (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html)
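To make the header format concrete: a Content-Language value is a comma-separated list of language tags, and the tags are case-insensitive. The following is a minimal Python sketch of splitting such a value, not Mozilla's actual parsing code; the function name `parse_content_language` is made up for illustration.

```python
def parse_content_language(header_value):
    """Split a Content-Language header value into lowercase language tags.

    Hypothetical helper for illustration only; it does no validation of
    the individual tags.
    """
    tags = []
    for part in header_value.split(","):
        tag = part.strip().lower()
        if tag:  # skip empty entries such as a trailing comma
            tags.append(tag)
    return tags


print(parse_content_language("zh-CN"))
print(parse_content_language("de, fr"))
```

The first test page above would correspond to the single-tag case; a header listing several languages leaves open which tag should drive font selection.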
Attached patch v1 patch (obsolete)
With this patch, the first test case works as intended. It's rendered with a single SC font when Mozilla is run under the Korean locale. The second test still doesn't work. I thought C-L specified in a meta-tag gets parsed, but I may have been wrong. The third test case works as it should (i.e. it is rendered with a single SC font). However, with this patch, the default font size got smaller for the third case.
The patch for bug 98929 laid the foundation for this fix, but the 'lang' set by the patch for 98929 gets 'sort of' masked/shadowed by nsPresContext::UpdateCharset(). My patch reads nsDocument::mLanguage (set by the patch for 98929 from the Content-Language HTTP header) and uses it to set nsPresContext::mLanguage and langGroup for Unicode-encoded pages (UTF-8 and other UTFs). My patch doesn't change the behavior of Mozilla for HTML docs encoded in non-Unicode legacy encodings (ISO-8859-x, EUC-JP/KR, GB2312, Big5, KOI8-R/U, CP12xx, etc). For those documents, langGroup is still derived from the encoding (charset). UpdateCharset() may not be the best place to do this, and I'm open to suggestions for a better place.
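The selection rule described in this comment can be sketched as follows. This is a rough Python illustration, not the patch itself (which is C++ inside nsPresContext::UpdateCharset()). The two lookup tables are hypothetical and heavily abbreviated, and the function name `pick_lang_group` is invented for the example.

```python
# Hypothetical, abbreviated tables; Mozilla's real mappings are much larger.
CHARSET_TO_LANG_GROUP = {
    "utf-8": "x-unicode",
    "utf-16": "x-unicode",
    "gb2312": "zh-CN",
    "big5": "zh-TW",
    "euc-kr": "ko",
    "euc-jp": "ja",
    "iso-8859-1": "x-western",
}
LANGUAGE_TO_LANG_GROUP = {
    "zh-cn": "zh-CN",
    "zh-tw": "zh-TW",
    "ko": "ko",
    "ja": "ja",
    "en": "x-western",
}


def pick_lang_group(charset, content_language):
    """Return a langGroup: from Content-Language for Unicode pages,
    from the charset for everything else."""
    charset_group = CHARSET_TO_LANG_GROUP.get(charset.lower(), "x-western")
    # Legacy encodings keep the charset-derived group, as in the comment.
    if charset_group != "x-unicode" or not content_language:
        return charset_group
    tag = content_language.lower()
    primary = tag.split("-")[0]
    # Unknown languages end up as x-western, mirroring the behavior
    # attributed to LookupLanguage() later in this thread.
    return LANGUAGE_TO_LANG_GROUP.get(
        tag, LANGUAGE_TO_LANG_GROUP.get(primary, "x-western")
    )


print(pick_lang_group("UTF-8", "zh-CN"))
print(pick_lang_group("GB2312", "ja"))
```

Note that in this sketch a GB2312 page with a conflicting Content-Language keeps its charset-derived group, which matches the stated scope of the v1 patch.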
Attached patch v2 patch w/ some tightening (obsolete)
I tightened up some loose ends. When intl.accept.languages was missing in prefs.js (and the C-L header field was absent from the HTTP header), mContentLanguage was not set explicitly and had a 'random' value. Now it's set to a NULL string explicitly. The reason the third case got rendered with a smaller font with my patch than without it turned out to be that I didn't have 'intl.accept.languages' in prefs.js, which resulted in langGroup being set to 'x-western', which has a smaller default font size than 'zh-CN' in my preferences. As for the second case, C-L in a meta-tag is not recognized yet. That has to be filed as a separate bug. In summary, this patch does all it can do for the moment.
Reopening the bug for the Content-Language problem, as suggested by jshin.
Status: VERIFIED → REOPENED
Resolution: DUPLICATE → ---
That's a very good job. Your patch certainly makes sense. I have 2 questions.
1. What about the priority between document charset and C-L? For example, a document is encoded in GB2312, but C-L specifies Japanese.
2. When C-L is misspelled, should the default language be used?
I suggest trying the content language in all situations and using it if one is found, otherwise falling back to the existing code. jshin, can I reassign the bug to you, or do I have to act as a proxy?
Status: REOPENED → ASSIGNED
Shanjian, thank you for your comment; I'm glad that you like it.
> I have 2 questions.
> 1, What about the priority between document charset and C-L? For example, a
> document encoded in GB2312, but C-L specifies japanese?
I thought about it and decided to leave those edge cases alone and to work only on the UTF-* cases. Currently, mContentLanguage is obtained from two different sources: the C-L header and intl.accept.languages. If mContentLanguage comes from the C-L header (or meta-tag: not yet implemented), I think Mozilla should respect the author's intent in cases where the langGroup deduced from the encoding differs from the one specified in C-L. However, if it's obtained from intl.accept.languages, I'm afraid we'd better stick to the one deduced from the encoding (charset). For instance, I have 'ko,en-US' in intl.accept.languages. It's fine to use 'ko' for a UTF-8 page without a C-L header. However, it doesn't make sense to use 'ko' for GB2312-encoded pages without a C-L header. One way to work around this issue is to add mContentLangSource (a la mCharacterSetSource) to the nsDocument class so that we can differentiate between mContentLanguage obtained from the C-L header (and the meta-tag when implemented; this has to be done in the nsHTMLDocument class, though) and from intl.accept.languages. Do you think it's worth pursuing?
> 2, When C-L is misspelled, should default language be used?
For a misspelled C-L, langGroup is set to x-western by nsLangAtom::LookupLanguage(). That appears harmless for UTF-* documents, although it is not the most desirable outcome. For documents in non-Unicode encodings it can do some damage, but currently my patch doesn't deal with them, as I wrote in my answer to your first question. Do you think it's necessary to modify LookupLanguage() to accept an _optional_ argument (charset) and set langGroup to 'x-unicode' instead of 'x-western' if the charset is one of the UTF-*'s and the aLanguage argument is unknown/misspelled? We can go even further and make LookupLanguage() set a different default langGroup for each charset (of course, this should be optional). The alternative is to do some check in the caller, but ...
> I suggest to try content language in all situation and use it
> if one is found, otherwise fallback to existing code.
I explained some issues with doing this above. Can you tell me what you think of them?
> jshin, can I reassign the bug to you or I have to act like a proxy?
Yes, you can reassign it to me.
BTW, here's a different problem. When I took up this bug, I tried to solve it by making Mozilla behave as if there were a 'lang' attribute on <body> or <html> whose value is obtained from the C-L HTTP header, as below:
  <html lang="zh-TW"> <head>....</head> <body> ...
or
  <html> ... <body lang="zh-TW">
That is, I wanted to set lang (pseudo-class) at the very root of style resolution, but I couldn't figure out how and came up with modifying UpdateCharset() instead. I'd like to hear your opinion on this approach compared with my present patch.
Another BTW: in my patch (attachment 95594), there's a mistake: I used 'end-1' where just 'end' should be used in the call to Substring().
> I thought about it and decided to leave those a bit edge
> cases alone and to work only on UTF-* cases.
> Currently, mContentLanguage is obtained from
> two different sources, C-L header and intl.accept.languages.
> If mContentLanguage is from C-L header (or meta-tag: not
> yet implemented), I think Mozilla should respect the author's
> intent for cases where langGroup deduced from the encoding
> is different from that specified in C-L. However, if it's
> obtained from intl.accept.languages, I'm afraid we'd better
> stick to the one deduced from the encoding(charset).
Agreed. I didn't realize that mContentLanguage can originate from the accept languages. That makes things complicated.
> One way to work around this issue is add mContentLangSource
> (a la mCharacterSetSource) to nsDocument class so that
> we can differentiate between mContentLanguage obtained
> from C-L header(and meta-tag when implemented.
> this has to be done at nsHTMLDocument class, though) and
> intl.accept.languges.
Who will use mContentLanguage besides what you are doing here? Is it possible to let mContentLanguage originate from only one source (i.e., the C-L header), or to take the charset into consideration when deciding mContentLanguage? Using your example (accept-lang = ko, charset = gb2312), I don't think setting mContentLanguage to ko can lead to any reasonable result anywhere.
> For misspelled C-L, langGroup is set to x-western by
> nsLangAtom::LookupLanguage(). It appears harmless for
> UTF-* documents although not most desirable. For documents
> encoded in non-Unicode encodings, it can do some damages,
> but currently my patch doesn't deal with them as I wrote
> in my answer to your first question. Do you think
> that it's necessary to modify LookupLanguage() to accept
> an _optional_ argument (charset) and set langGroup
> to 'x-unicode' instead of ('x-western') if charset is
> one of UTF-*'s and aLanguage argument is unknown/misspelled?
> We can go even further and make LookupLanguage()
> to set different default langGroup for different
> charset (of course, this should be optional.)
I suggest treating a misspelled C-L as no C-L, i.e. falling back to the charset.
> That is, I wanted to set lang(pseudo-class) in the
> very root of style resolution, but I couldn't figure
> out how and came up with modifying UpdateCharset(),
> instead. I'd like to hear your opinion on this
> approach compared with my present patch.
I strongly favor your current approach. A lang attribute from tags can still override the default one.
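One possible reading of the policy discussed in the last few comments, written as a Python sketch: track where the content language came from (the mContentLangSource idea), trust an HTTP-header value for any charset, trust an accept-languages value only for Unicode pages, and treat an unknown or misspelled language as if none were given. Nothing here is taken from an actual patch; `LangSource`, `KNOWN_LANG_GROUPS`, and `resolve_lang_group` are hypothetical names, and the table is abbreviated.

```python
from enum import Enum


class LangSource(Enum):
    """Hypothetical analogue of the proposed mContentLangSource."""
    NONE = 0
    ACCEPT_LANGUAGES = 1
    HTTP_HEADER = 2


# Hypothetical, abbreviated language -> langGroup table.
KNOWN_LANG_GROUPS = {"zh-cn": "zh-CN", "zh-tw": "zh-TW", "ko": "ko", "ja": "ja"}


def resolve_lang_group(charset_group, language, source):
    """Choose between the charset-derived group and the content language."""
    group = KNOWN_LANG_GROUPS.get((language or "").lower())
    if group is None:
        # Misspelled or missing C-L: behave as if there were no C-L.
        return charset_group
    if source is LangSource.HTTP_HEADER:
        # Author's explicit intent wins, even over a legacy charset.
        return group
    if source is LangSource.ACCEPT_LANGUAGES and charset_group == "x-unicode":
        # A user preference is only a hint for charset-neutral pages.
        return group
    return charset_group


# GB2312 page, 'ko' only from the user's accept-languages: keep zh-CN.
print(resolve_lang_group("zh-CN", "ko", LangSource.ACCEPT_LANGUAGES))
```

Whether accept-languages should remain a source at all is still an open question at this point in the thread, so the second branch is the most speculative part of the sketch.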
give it to jshin.
Assignee: shanjian → jshin
Status: ASSIGNED → NEW
> Who will use the mContentLanguage besides what you are doing here?
It's used in content/html/style/src/nsCSSStyleSheet.cpp to select lang-based selectors(??). The idea of referencing intl.accept.languages probably arose to handle cases where the charset/encoding can't be mapped to a single unique language. That includes UTF-8 (x-unicode) and ISO-8859-1 (x-western). Try http://jshin.net/moztest/lang.latin1.html (with intl.accept.languages = "de", "fr", "fr,de" and "de,ko,en-US"). The way it's used in nsCSSStyleSheet.cpp is different from the way it's used in nsPresContext.cpp, though. In the former case, if there are multiple elements in intl.accept.languages and multiple lang-based selectors in the CSS, it seems that the last lang-based selector in the CSS that matches one of the languages specified in intl.accept.languages takes effect (that is, the order in which languages are listed in intl.accept.languages does not matter). However, there should be very few documents with something like 'q:lang(de)' in CSS but without explicit use of the 'lang' attribute on HTML elements (here, 'q'). Given this, along with the not-so-intuitive way of choosing a lang-based selector when multiple langs are present in intl.accept.languages (as described above), I have some reservations about the usefulness of obtaining mContentLanguage from intl.accept.languages. It also has to be noted, though, that the C-L HTTP header can list multiple languages (however strange that may sound).
> Is it
> possible to let mContentLanguage be originated from only one source (ie, C-L
> header)
Yes, it's possible. It's easy (I just have to take out a part of the patch for bug 98929: attachment 48982), but the question is whether or not to do it. I am inclined to take that out for the reason given above, but I'd like to hear from Ulrich, who added it in his patch for bug 98929, before going ahead.
Another aspect that may complicate things in the future is that once bug 121193 is fixed, we may have yet another way to obtain the value for mContentLanguage. With this, mContentLanguage becomes almost like mCharSet in terms of the number of sources its value can come from: the C-L HTTP header, the meta-tag, a user setting via the UI (like the character coding menu) and the user pref value in intl.accept.languages (settable via Pref|Language). Of course, when bug 121193 is fixed, we probably have to remove the last one (intl.accept.languages).
>>Do you think
>> that it's necessary to modify LookupLanguage() to accept
>> an _optional_ argument (charset) and set langGroup
>> to 'x-unicode' instead of ('x-western') if charset is
>> one of UTF-*'s and aLanguage argument is unknown/misspelled?
>> We can go even further and make LookupLanguage()
>> to set different default langGroup for different
>> charset (of course, this should be optional.)
> I suggest treat misspelled C-L as no C-L, ie. fall back to charset.
Now it does. Instead of making nsLookupLanguage() take an optional third argument (I don't know how to specify an optional argument with a default value in XPCOM IDL), I modified it to return NS_ERROR_LANGATOM_UNKNOWN_LANG (its severity bit is still 0, so NS_SUCCEEDED() still yields 'true') instead of NS_OK when mContentLanguage holds an unknown/misspelled language. In UpdateCharset() in nsPresContext.cpp, the return value is checked and acted upon accordingly.
> I strongly favor your current approach. lang attribute
> from tags can still override the default one.
All right. I agree with you that the lang attribute can override the default determined from C-L.
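The return-code trick mentioned above relies on how nsresult values are tested: NS_SUCCEEDED() only looks at the severity (top) bit, so a non-zero code with that bit clear still counts as success while remaining distinguishable from NS_OK. Below is a small Python model of that idea. The numeric value given to `NS_ERROR_LANGATOM_UNKNOWN_LANG` is invented for the example, and `lookup_language` and `lang_group_for` are hypothetical stand-ins, not the real C++ functions.

```python
SEVERITY_BIT = 0x80000000
NS_OK = 0x00000000
# Hypothetical value: non-zero, but with the severity bit clear.
NS_ERROR_LANGATOM_UNKNOWN_LANG = 0x00590001

# Hypothetical, abbreviated language -> langGroup table.
KNOWN_LANG_GROUPS = {"zh-cn": "zh-CN", "ko": "ko", "ja": "ja"}


def ns_succeeded(rv):
    """Model of NS_SUCCEEDED(): success means the severity bit is 0."""
    return (rv & SEVERITY_BIT) == 0


def lookup_language(language):
    """Return (status, langGroup); unknown languages get x-western plus
    a distinct success code instead of NS_OK."""
    group = KNOWN_LANG_GROUPS.get(language.lower())
    if group is None:
        return NS_ERROR_LANGATOM_UNKNOWN_LANG, "x-western"
    return NS_OK, group


def lang_group_for(language, charset_group):
    """Caller side: fall back to the charset group for unknown languages."""
    rv, group = lookup_language(language)
    if not ns_succeeded(rv):
        raise RuntimeError("lookup failed")
    return group if rv == NS_OK else charset_group


print(lang_group_for("zh-CN", "x-unicode"))
print(lang_group_for("zh-CM", "x-unicode"))
```

Existing callers that only test for success keep working unchanged, which is presumably why this was preferred over changing the function signature.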
Attached patch a new patch (obsolete)
Addressing some of Shanjian's concerns. I haven't yet taken out the code to obtain mC-L from intl.accept.languages. mC-L is still only checked for UTF-* cases, but it's easy to check mC-L for all charsets. However, that is contingent on what we decide to do with intl.accept.language as a source of mC-L.
sorry for spamming. I forgot to include nsLanguageAtomService.cpp
Attachment #95528 - Attachment is obsolete: true
Attachment #95594 - Attachment is obsolete: true
Attachment #95718 - Attachment is obsolete: true
> I modified nsLookupLanguage() to return
> NS_ERROR_LANGATOM_UNKNOWN_LANG (its severity bit
> is still 0 so that NS_SUCCEEDED() results in 'true')
> instead of NS_OK when mContentLanguage has
> an unknown/misspelled language.
Because of caching in nsLookupLanguage(), the second time it's handed an invalid/misspelled value in aLanguage, it sets aResult to 'x-western' and returns NS_OK instead of NS_ERROR_LANGATOM_UNKNOWN_LANG. To work around this, the cache has to record the fact that the language group was set to 'x-western' because aLanguage was invalid/misspelled. How? One way is to add a third field to nsILanguageAtom, but I have little idea whether that's allowed/desirable, because it involves changes in xpcom(?) and the use for that value is pretty rare(?). See also bug 163271, which led me to discover this problem. BTW, I don't know why I can't accept this bug. Bugzilla doesn't show the 'accept' button. I've accepted some bugs in the past...
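The caching problem and the proposed workaround can be illustrated with a toy cache that stores, next to the language group, a flag saying the group was only a fallback. This is a Python sketch under stated assumptions: the class `LanguageCache`, its table argument, and the tuple layout are all hypothetical and do not reflect the real nsILanguageAtom interface.

```python
class LanguageCache:
    """Toy lookup cache that remembers whether a result was a fallback."""

    def __init__(self, known_groups):
        self._known = known_groups  # language tag -> langGroup
        self._cache = {}            # language tag -> (langGroup, was_fallback)

    def lookup(self, language):
        key = language.lower()
        if key not in self._cache:
            group = self._known.get(key)
            # Storing the flag alongside the group is the "third field":
            # without it, a cached 'x-western' looks like a genuine match.
            self._cache[key] = (group or "x-western", group is None)
        return self._cache[key]


cache = LanguageCache({"zh-cn": "zh-CN", "ko": "ko"})
first = cache.lookup("zh-CM")   # misspelled tag, computed
second = cache.lookup("zh-CM")  # same tag, served from the cache
print(first == second)
```

With the flag cached, the first and the second lookup of a misspelled tag report the same thing, so the caller can fall back to the charset both times.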
Keywords: mozilla1.3, patch
Status: NEW → ASSIGNED
http://www.faqs.org/rfcs/rfc3282.html Adding an RFC link for Content-Language.
QA Contact: amyy → i18n
What is the status of this bug? Was the patch merged or is it now outdated? Is the bug still valid?
Fixed by bug 547267.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago → 13 years ago
Resolution: --- → DUPLICATE