122779 - Use fonts according to Content-Language/lang attribute in Unicode page

Reporter

Description

•

23 years ago

Mozilla uses only one set of fonts to render the text in Unicode pages; if some glyphs are not available in the chosen font, Mozilla or the host OS tries to render them with other fonts. I think it would be better if we could set the font to use for each language, and have Mozilla choose the font according to the Content-Language header or the lang attribute in Unicode pages.

Rui Xu

Updated

•

23 years ago

QA Contact: ruixu → ylong

Frank Tang

Comment 1

•

23 years ago

push to future.

Status: UNCONFIRMED → NEW

Ever confirmed: true

Target Milestone: --- → Future

Rui Xu

Updated

•

23 years ago

Keywords: intl

Roy Yokoyama

Comment 2

•

23 years ago

->shanjian

Assignee: yokoyama → shanjian

Shanjian Li

Comment 3

•

23 years ago

We now honor lang attribute. Could you verify?

Status: NEW → ASSIGNED

Shanjian Li

Comment 4

•

23 years ago

This has been fixed. *** This bug has been marked as a duplicate of 105199 ***

Status: ASSIGNED → RESOLVED

Closed: 23 years ago

Resolution: --- → DUPLICATE

Yuying Long

Comment 5

•

23 years ago

Verified as dup. Please re-open if you disagree.

Status: RESOLVED → VERIFIED

Jungshik Shin

Assignee

Comment 6

•

23 years ago

The 'lang' attribute in an HTML document is honored in font selection (as Shanjian wrote, it's fixed in bug 105199), but the 'Content-Language' specification in the HTTP header doesn't seem to be honored by Mozilla. Try the following three pages under a non-SC locale (e.g. JA, TC or KO) and compare the results.

http://jshin.net/moztest/zh-CN.utf8.html
http://jshin.net/moztest/zh-CN2.utf8.html
http://jshin.net/moztest/zh-CN3.utf8.html

I set up the htaccess file on my server in such a way that the web server emits 'Content-Language: zh-CN' in the HTTP header for the first file. The second file has a meta tag to specify Content-Language. In the third one, the 'lang' attribute is explicitly used. Only the third one is rendered with a SC font, while the first two pages are rendered with multiple fonts (I tried them under a KO locale, so KO and SC fonts were mixed and the result looked like a 'ransom' note).

I suggest that this bug be reopened with the following summary line: 'Content-Language has to be referred to in font selection for a UTF-8 page'.

The C-L HTTP header field is specified in HTTP 1.1 section 14.12 (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html)
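For illustration only, here is a minimal Python sketch of how a Content-Language field value such as the one emitted for the first test page could be split into language tags. The function names are hypothetical and this is not Mozilla code. It relies only on the field value being a comma-separated list of language tags, as described in the HTTP 1.1 section cited above. Picking the first listed tag as the "primary" language is an assumption made for this sketch.

```python
def parse_content_language(header_value):
    """Split a Content-Language field value into lower-cased language tags.

    Hypothetical helper: tags are compared case-insensitively, so they are
    normalized to lower case here.
    """
    tags = []
    for part in header_value.split(","):
        tag = part.strip().lower()
        if tag:  # skip empty items such as those left by a trailing comma
            tags.append(tag)
    return tags


def primary_language(header_value):
    """Return the first listed tag, or None when nothing is listed.

    Using the first tag is an assumption of this sketch, not a rule
    taken from the bug discussion.
    """
    tags = parse_content_language(header_value)
    return tags[0] if tags else None
```

With these helpers, a value of "zh-CN" yields the single tag "zh-cn", and a header listing several languages yields them in order.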

Jungshik Shin

Assignee

Comment 7

•

23 years ago

Attached patch v1 patch (obsolete) — Details — Splinter Review

With this patch, the first test case works as intended. It's rendered with a single SC font when Mozilla is run under a Korean locale. The second test still doesn't work. I thought C-L specified in a meta tag gets parsed, but I may have been wrong. The third test case works as it should (i.e. it is rendered with a single SC font). However, with this patch, the default font size got smaller for the third case.

The patch for bug 98929 laid the foundation for this fix, but the 'lang' set by the patch for 98929 gets 'sort of' masked/shadowed by nsPresContext::UpdateCharset(). My patch reads nsDocument::mLanguage (set by the patch for 98929 from the Content-Language HTTP header) and uses it to set nsPresContext::mLanguage and langGroup for Unicode-encoded pages (UTF-8 and other UTFs).

My patch doesn't change the behavior of Mozilla for HTML docs encoded in non-Unicode legacy encodings (ISO-8859-x, EUC-JP/KR, GB2312, Big5, KOI8-R/U, CP12xx, etc.). For those documents, langGroup is still derived from the encoding (charset).

UpdateCharset() may not be the best place to do this, and I'm open to suggestions for a better place.
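The decision described in this comment can be sketched roughly as follows. This is a hypothetical Python illustration, not the attached patch. The function name and both lookup tables are assumptions and cover only a handful of encodings and languages. The 'x-unicode' and 'x-western' group names come from this thread. Using 'x-western' as the default for encodings missing from the table is an assumption of the sketch.

```python
# Illustrative subset only; the real mappings live in Mozilla's own tables.
UNICODE_CHARSETS = {"utf-8", "utf-16", "utf-16be", "utf-16le", "utf-32"}

CHARSET_TO_LANG_GROUP = {
    "gb2312": "zh-CN",
    "big5": "zh-TW",
    "euc-jp": "ja",
    "euc-kr": "ko",
    "iso-8859-1": "x-western",
}

LANGUAGE_TO_LANG_GROUP = {
    "zh-cn": "zh-CN",
    "zh-tw": "zh-TW",
    "ja": "ja",
    "ko": "ko",
}


def lang_group_for(charset, content_language=None):
    """Pick a langGroup the way this comment describes the v1 behavior."""
    charset = charset.lower()
    if charset in UNICODE_CHARSETS:
        if content_language:
            # Unicode page with a content language: use the language.
            # Unknown tags fall back to 'x-western' in this sketch.
            return LANGUAGE_TO_LANG_GROUP.get(content_language.lower(), "x-western")
        return "x-unicode"
    # Legacy encodings: langGroup is still derived from the charset alone.
    # Defaulting to 'x-western' here is an assumption of the sketch.
    return CHARSET_TO_LANG_GROUP.get(charset, "x-western")
```

Under these assumptions, a UTF-8 page with 'zh-CN' as its content language gets the 'zh-CN' group, while a GB2312 page keeps 'zh-CN' no matter what the content language says.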

Jungshik Shin

Assignee

Comment 8

•

23 years ago

Attached patch v2 patch w/ some tightening (obsolete) — Details — Splinter Review

I tightened up some loose ends. When intl.accept.languages was missing in prefs.js (and the C-L header field was absent from the HTTP header), mContentLanguage was not set explicitly and had a 'random' value. Now it's set to a NULL string explicitly.

The reason the third case got rendered with a smaller font with my patch than without it turned out to be that I didn't have 'intl.accept.languages' in prefs.js, which resulted in langGroup being set to 'x-western', which has a smaller default font size than 'zh-CN' in my preferences.

As for the second case, C-L in a meta tag is not recognized yet. It has to be filed as a separate bug. In summary, this patch does all it can do for the moment.

Shanjian Li

Comment 9

•

23 years ago

Reopening the bug for the Content-Language problem, as suggested by jshin.

Status: VERIFIED → REOPENED

Resolution: DUPLICATE → ---

Shanjian Li

Comment 10

•

23 years ago

That's a very good job. Your patch certainly makes sense. I have 2 questions.

1. What about the priority between document charset and C-L? For example, a document encoded in GB2312 whose C-L specifies Japanese?
2. When C-L is misspelled, should the default language be used?

I suggest trying the content language in all situations and using it if one is found, otherwise falling back to the existing code.

jshin, can I reassign the bug to you, or do I have to act as a proxy?

Status: REOPENED → ASSIGNED

Jungshik Shin

Assignee

Comment 11

•

23 years ago

Shanjian,

Thank you for your comment; I'm glad that you like it.

> I have 2 questions.
> 1, What about the priority between document charset and C-L?
> For example, a document encoded in GB2312, but C-L specifies japanese?

I thought about it and decided to leave those edge cases alone and to work only on UTF-* cases. Currently, mContentLanguage is obtained from two different sources: the C-L header and intl.accept.languages. If mContentLanguage is from the C-L header (or a meta tag: not yet implemented), I think Mozilla should respect the author's intent in cases where the langGroup deduced from the encoding is different from that specified in C-L. However, if it's obtained from intl.accept.languages, I'm afraid we'd better stick to the one deduced from the encoding (charset). For instance, I have 'ko,en-US' in intl.accept.languages. It's fine to use 'ko' for a UTF-8 page without a C-L header. However, it doesn't make sense to use 'ko' for GB2312-encoded pages without a C-L header.

One way to work around this issue is to add mContentLangSource (a la mCharacterSetSource) to the nsDocument class so that we can differentiate between mContentLanguage obtained from the C-L header (and the meta tag when implemented; this has to be done in the nsHTMLDocument class, though) and from intl.accept.languages. Do you think it's worth pursuing?

> 2, When C-L is misspelled, should default language be used?

For a misspelled C-L, langGroup is set to x-western by nsLangAtom::LookupLanguage(). It appears harmless for UTF-* documents, although not the most desirable outcome. For documents encoded in non-Unicode encodings it can do some damage, but currently my patch doesn't deal with them, as I wrote in my answer to your first question. Do you think it's necessary to modify LookupLanguage() to accept an _optional_ argument (charset) and set langGroup to 'x-unicode' instead of 'x-western' if the charset is one of the UTF-*'s and the aLanguage argument is unknown/misspelled? We can go even further and make LookupLanguage() set a different default langGroup for different charsets (of course, this should be optional). The alternative is to do some check in the caller, but ...

> I suggest to try content language in all situation and use it
> if one is found, otherwise fallback to existing code.

I explained some issues with doing this above. Can you tell me what you think of them?

> jshin, can I reassign the bug to you or I have to act like a proxy?

Yes, you can reassign it to me.

BTW, here's a different problem. When I took up this bug, I tried to solve it by making Mozilla behave as if there were a 'lang' attribute in <body> or <html> whose value is obtained from the C-L HTTP header, as below:

<html lang="zh-TW"> <head>....</head> <body> ...

or

<html> ... <body lang="zh-TW">

That is, I wanted to set lang (pseudo-class) at the very root of style resolution, but I couldn't figure out how and came up with modifying UpdateCharset() instead. I'd like to hear your opinion on this approach compared with my present patch.

Another BTW: in my patch (attachment 95594), there's a mistake: it uses 'end-1' where just 'end' should be used in the call to Substring().
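The mContentLangSource idea raised in this comment can be illustrated with a small Python sketch. It is hypothetical: the enum, its member names and the function are made up for illustration and only model the policy argued for above. A language from the C-L header always wins, a language from intl.accept.languages is used only for Unicode pages, and everything else falls back to what the charset implies.

```python
from enum import Enum


class LangSource(Enum):
    """Where the document's content language came from (hypothetical)."""
    NONE = 0
    ACCEPT_LANGUAGES = 1  # user pref intl.accept.languages
    HTTP_HEADER = 2       # Content-Language header (or meta tag, once implemented)


# Illustrative subset of Unicode encodings.
UNICODE_CHARSETS = {"utf-8", "utf-16", "utf-16be", "utf-16le", "utf-32"}


def effective_language(charset, content_language, source, charset_language):
    """Return the language to use for font selection.

    charset_language is whatever would be deduced from the encoding alone,
    for example 'zh-CN' for GB2312 or 'x-unicode' for UTF-8.
    """
    if content_language and source is LangSource.HTTP_HEADER:
        # The author stated the language explicitly: respect that intent.
        return content_language
    if (content_language and source is LangSource.ACCEPT_LANGUAGES
            and charset.lower() in UNICODE_CHARSETS):
        # A user pref is only a reasonable guess when the charset says nothing.
        return content_language
    return charset_language
```

With this policy, 'ko' taken from intl.accept.languages applies to a UTF-8 page without a C-L header but not to a GB2312 page, which matches the example given in this comment.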

Shanjian Li

Comment 12

•

23 years ago

> I thought about it and decided to leave those a bit edge
> cases alone and to work only on UTF-* cases.
> Currently, mContentLanguage is obtained from
> two different sources, C-L header and intl.accept.languages.
> If mContentLanguage is from C-L header (or meta-tag: not
> yet implemented), I think Mozilla should respect the author's
> intent for cases where langGroup deduced from the encoding
> is different from that specified in C-L. However, if it's
> obtained from intl.accept.languages, I'm afraid we'd better
> stick to the one deduced from the encoding(charset).

Agreed. I didn't realize that mContentLanguage can originate from the accept languages. That makes things complicated.

> One way to work around this issue is add mContentLangSource
> (a la mCharacterSetSource) to nsDocument class so that
> we can differentiate between mContentLanguage obtained
> from C-L header(and meta-tag when implemented.
> this has to be done at nsHTMLDocument class, though) and
> intl.accept.languges.

Who will use mContentLanguage besides what you are doing here? Is it possible to let mContentLanguage originate from only one source (i.e., the C-L header), or to take the charset into consideration when deciding mContentLanguage? Using your example (accept-lang = ko, charset = gb2312), I don't think setting mContentLanguage to ko can lead to any reasonable result anywhere.

> For misspelled C-L, langGroup is set to x-western by
> nsLangAtom::LookupLanguage(). It appears harmless for
> UTF-* documents although not most desirable. For documents
> encoded in non-Unicode encodings, it can do some damages,
> but currently my patch doesn't deal with them as I wrote
> in my answer to your first question. Do you think
> that it's necessary to modify LookupLanguage() to accept
> an _optional_ argument (charset) and set langGroup
> to 'x-unicode' instead of ('x-western') if charset is
> one of UTF-*'s and aLanguage argument is unknown/misspelled?
> We can go even further and make LookupLanguage()
> to set different default langGroup for different
> charset (of course, this should be optional.)

I suggest treating a misspelled C-L as no C-L, i.e. falling back to the charset.

> That is, I wanted to set lang(pseudo-class) in the
> very root of style resolution, but I couldn't figure
> out how and came up with modifying UpdateCharset(),
> instead. I'd like to hear your opinion on this
> approach compared with my present patch.

I strongly favor your current approach. The lang attribute from tags can still override the default one.

Shanjian Li

Comment 13

•

23 years ago

give it to jshin.

Assignee: shanjian → jshin

Status: ASSIGNED → NEW

Jungshik Shin

Assignee

Comment 14

•

23 years ago

> Who will use the mContentLanguage besides what you are doing here?

It's used in content/html/style/src/nsCSSStyleSheet.cpp to select lang-based selectors(??). The idea of referencing intl.accept.languages probably arose to handle cases where the charset/encoding can't be mapped to a single unique language. That includes UTF-8 (x-unicode) and ISO-8859-1 (x-western). Try http://jshin.net/moztest/lang.latin1.html (with intl.accept.languages = "de", "fr" and "fr,de", "de,ko,en-US").

The way it's used in nsCSSStyleSheet.cpp is different from the way it's used in nsPresContext.cpp, though. In the former case, if there are multiple elements in intl.accept.languages and multiple lang-based selectors in the CSS, it seems that the last lang-based selector in the CSS that matches one of the languages specified in intl.accept.languages takes effect (that is, the order in which languages are specified in intl.accept.languages does not matter). However, there should be very few documents with something like 'q:lang(de)' in CSS but without explicit use of the 'lang' attribute in HTML elements (here it's 'q'). Given this, along with the not-so-intuitive way of choosing the lang-based selector when multiple languages are present in intl.accept.languages (as described above), I have some reservations about the usefulness of obtaining mContentLanguage from intl.accept.languages. It also has to be noted, though, that the C-L HTTP header can have multiple languages listed (however strange it may sound).

> Is it
> possible to let mContentLanguage be originated from only one source (ie, C-L
> header)

Yes, it's possible. It's easy (I just have to take out a part of the patch for bug 98929: attachment 48982), but the question is whether or not to do it. I am inclined to take that out for the reason given above, but I'd like to hear from Ulrich, who added it in his patch for bug 98929, before going ahead.

Another aspect that may make things complicated in the future is that once bug 121193 is fixed, we may have yet another way to obtain the value for mContentLanguage. With this, mContentLanguage becomes almost like mCharSet in terms of the number of sources its value can come from: the C-L HTTP header, the meta tag, a user setting via the UI (like the character coding menu), and the user pref value in intl.accept.languages (settable via Pref|Language). Of course, when bug 121193 is fixed, we probably have to remove the last one (intl.accept.languages).

>>Do you think
>> that it's necessary to modify LookupLanguage() to accept
>> an _optional_ argument (charset) and set langGroup
>> to 'x-unicode' instead of ('x-western') if charset is
>> one of UTF-*'s and aLanguage argument is unknown/misspelled?
>> We can go even further and make LookupLanguage()
>> to set different default langGroup for different
>> charset (of course, this should be optional.)

> I suggest treat misspelled C-L as no C-L, ie. fall back to charset.

Now it does. Instead of making nsLookupLanguage() take an optional third argument (I don't know how to specify an optional argument with a default value in XPCOM IDL), I modified it to return NS_ERROR_LANGATOM_UNKNOWN_LANG (its severity bit is still 0, so NS_SUCCEEDED() results in 'true') instead of NS_OK when mContentLanguage has an unknown/misspelled language. In UpdateCharSet() in nsPresContext.cpp, the return value is checked and acted upon accordingly.

> I strongly favor your current approach. lang attribute
> from tags can still override the default one.

All right. I agree with you that the lang attribute can override the default determined from C-L.
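The "treat a misspelled C-L as no C-L" behavior described in this comment can be sketched in Python as follows. This is a hypothetical illustration, not the patch. The status strings stand in for NS_OK and NS_ERROR_LANGATOM_UNKNOWN_LANG, and the function names and the table of known languages are assumptions made for the sketch.

```python
OK = "ok"                      # stands in for NS_OK
UNKNOWN_LANG = "unknown-lang"  # stands in for NS_ERROR_LANGATOM_UNKNOWN_LANG

# Illustrative subset of recognized language tags.
KNOWN_LANGUAGES = {"zh-cn": "zh-CN", "zh-tw": "zh-TW", "ja": "ja", "ko": "ko"}


def lookup_language(language):
    """Return (status, lang_group).

    As in the comment, an unknown tag still produces 'x-western', but the
    status lets the caller tell that the value is only a fallback.
    """
    group = KNOWN_LANGUAGES.get(language.lower())
    if group is None:
        return UNKNOWN_LANG, "x-western"
    return OK, group


def lang_group_with_fallback(content_language, charset_lang_group):
    """Use the content language if it is recognized, else the charset's group."""
    if not content_language:
        return charset_lang_group
    status, group = lookup_language(content_language)
    if status == UNKNOWN_LANG:
        # Misspelled C-L: behave as if no C-L had been given at all.
        return charset_lang_group
    return group
```

The caller checks the status rather than the returned group, since 'x-western' can also be a legitimate result for a correctly spelled language.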

Jungshik Shin

Assignee

Comment 15

•

23 years ago

Attached patch a new patch (obsolete) — Details — Splinter Review

Addressing some of Shanjian's concerns. I haven't yet taken out the code to obtain mC-L from intl.accept.languages. mC-L is still only checked for UTF-* cases, but it's easy to check mC-L for all charsets. However, that is contingent on what we decide to do with intl.accept.languages as a source of mC-L.

Jungshik Shin

Assignee

Comment 16

•

23 years ago

Attached patch the same patch with missing nsLanguageAtomService.cpp — Details — Splinter Review

Sorry for the spam. I forgot to include nsLanguageAtomService.cpp.

Attachment #95528 - Attachment is obsolete: true

Attachment #95594 - Attachment is obsolete: true

Attachment #95718 - Attachment is obsolete: true

Jungshik Shin

Assignee

Comment 17

•

23 years ago

> I modified nsLookupLanguage() to return
> NS_ERROR_LANGATOM_UNKNOWN_LANG (its severity bit
> is still 0 so that NS_SUCCEEDED() results in 'true')
> instead of NS_OK when mContentLanguage has
> an unknown/misspelled language.

Because of caching in nsLookupLanguage(), the second time it's handed an invalid/misspelled value in aLanguage, it sets aResult to 'x-western' and returns NS_OK instead of NS_ERROR_LANGATOM_UNKNOWN_LANG. To work around this, we have to record in the cache the fact that the language was set to 'x-western' because of an invalid/misspelled aLanguage. How? One way is to add a third field to nsILanguageAtom, but I have little idea whether that's allowed or desirable, because it involves changes in xpcom(?) and the use for that value is pretty rare(?). See also bug 163271, which led me to discover this problem.

BTW, I don't know why I can't accept this bug. Bugzilla doesn't show the 'accept' button. I've accepted some bugs in the past.
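The caching problem described in this comment, and the "record it in the cache" workaround, can be illustrated with a short Python sketch. It is hypothetical: the class, its method and the status strings are stand-ins made up for illustration and do not reflect the actual nsILanguageAtom interface. The point is only that each cache entry keeps a flag saying whether 'x-western' was a fallback, so repeated lookups report the same status as the first one.

```python
class LanguageCache:
    """Caches lookups and remembers whether a result was only a fallback."""

    def __init__(self, known_languages):
        # known_languages maps lower-cased tags to langGroups (assumed input).
        self._known = known_languages
        self._cache = {}  # tag -> (lang_group, was_unknown)

    def lookup(self, language):
        """Return (status, lang_group), consistently across repeated calls."""
        key = language.lower()
        if key not in self._cache:
            group = self._known.get(key)
            # Store the fallback flag next to the group. Without it, a cache
            # hit could not be told apart from a genuine 'x-western' result.
            self._cache[key] = (group or "x-western", group is None)
        group, was_unknown = self._cache[key]
        status = "unknown-lang" if was_unknown else "ok"
        return status, group
```

Looking up the same misspelled tag twice now returns the "unknown" status both times, instead of silently turning into a plain success on the second call.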

Andrew Hagen

Updated

•

23 years ago

Keywords: mozilla1.3, patch

Jungshik Shin

Assignee

Updated

•

22 years ago

Status: NEW → ASSIGNED

Katsuhiko Momoi

Comment 18

•

22 years ago

http://www.faqs.org/rfcs/rfc3282.html Adding an RFC link for Content-Language.

Phil Ringnalda (:philor)

Updated

•

16 years ago

QA Contact: amyy → i18n

:aceman

Comment 19

•

14 years ago

What is the status of this bug? Was the patch merged or is it now outdated? Is the bug still valid?

Masatoshi Kimura [:emk]

Comment 20

•

13 years ago

Fixed by bug 547267.

Status: ASSIGNED → RESOLVED

Closed: 23 years ago → 13 years ago

Resolution: --- → DUPLICATE

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 21

•

13 years ago

Did bug 416581 help as well?

v1 patch, 23 years ago, Jungshik Shin, 1.51 KB, patch
v2 patch w/ some tightening, 23 years ago, Jungshik Shin, 2.17 KB, patch
a new patch, 23 years ago, Jungshik Shin, 2.99 KB, patch
the same patch with missing nsLanguageAtomService.cpp, 23 years ago, Jungshik Shin, 4.10 KB, patch