Pages entirely in us-ascii with, e.g., <span lang="ja">manga</span> will result in a font download dialog for Japanese fonts if they are not installed. See the post referenced in the URL field and related material linked therefrom. We should not try to download fonts unless the page's character encoding actually exceeds the available font repertoire, right?
reassign to Frank.
> We should not try to download fonts unless the page's character encoding actually
> exceeds the available font repertoire, right?
Why not. Invalid bug. It does not really cause a problem. It is by design, to improve text layout performance, so that we don't need to do per-character checking.
Mark as verified per previous comment.
Fonts really should be associated with character repertoires, not languages... these are two very different things. As the original reporter noted, lang="ja" denotes Japanese language, not any particular writing system, and can be used on a Romanized representation of something in Japanese (e.g., "manga"). Assuming that a Japanese font is needed in the absence of encountering a specific character that requires such a font is not very reasonable.
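To illustrate the repertoire-based approach described in this comment, here is a minimal sketch that decides whether Japanese glyphs are needed by inspecting the characters themselves and ignoring the lang attribute. The function name needs_japanese_font and the JAPANESE_RANGES table are hypothetical, the range list is deliberately partial, and nothing here reflects how the browser's font code is actually organised.

```python
# Hypothetical sketch: character-repertoire check, independent of lang="...".
# The ranges are Unicode blocks; the list is intentionally incomplete
# (e.g. half-width Katakana and CJK extensions are left out).
JAPANESE_RANGES = (
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
)


def needs_japanese_font(text: str) -> bool:
    """Return True only if some character falls inside one of the listed ranges."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in JAPANESE_RANGES)


print(needs_japanese_font("manga"))         # False: pure ASCII, whatever the lang tag says
print(needs_japanese_font("\u6f2b\u753b"))  # True: two CJK ideographs
```

Under this sketch, <span lang="ja">manga</span> would never trigger a font prompt, because no character in it lies outside the Latin repertoire.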
Just a quick note to second comment #5: typefaces relate to a character repertoire, not a specific language. This bug also specifically affects the accessibility of pages that use the lang attribute to aid aural browsers (such as IBM's Home Page Reader). By prompting the standard user to download large font sets that they don't need, you are actively discouraging authors from using the lang attribute. This then affects accessibility by not making language information available to aural browsers. This is important because these browsers need to know the language in order to pronounce the word correctly. For example, a word that is marked up as "anglicised" Japanese, e.g. konichiwa, will then be pronounced according to Japanese pronunciation rules rather than English rules. This can make a huge difference to the comprehension of the spoken text. I think this bug should be reopened; retaining the status quo has a strong negative effect on the usefulness of the lang attribute.
smontagu noticed that language codes like ja-Latn are registered to indicate Japanese written in Latin script. Page authors should mark up such text as <span lang="ja-Latn">manga</span> and we should display it in the font selected for Western, rather than Japanese. Reopening.
handing to smontagu
Created attachment 135079 [details] Testcase with ja-Latn
It seems the *-Latn convention is still controversial. I am still reading up the archives at http://eikenes.alvestrand.no/pipermail/ietf-languages/ to discover if there is a consensus and if so, what.
ja-Latn is not listed at http://www.iana.org/assignments/lang-tags/
No, it isn't, but it could be :-). Currently registered are:
az-Latn  Azerbaijani in Latin script
sr-Latn  Serbian in Latin script
uz-Latn  Uzbek in Latin script
yi-latn  Yiddish, in Latin script
It seems possible that a future revision of RFC 3066 will formalize the use of ISO 15924 script tags as part of language identifiers, making any other combination, e.g. ja-Latn, valid without special registration at IANA.
(In reply to comment #12)
> It seems possible that a future revision of RFC 3066 will formalize the use of
> ISO 15924 script tags as part of language identifiers, making any other
> combination, e.g. ja-Latn, valid without special registration at IANA.
This is now RFCs 4646 and 4647
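Since RFC 4646 makes the script subtag a regular part of a language tag, pulling it out of a lang attribute value can be sketched as below. This is a simplified reading of the RFC 4646 langtag syntax (primary language, up to three 3-letter extended language subtags, then an optional 4-letter script subtag); the function name script_subtag is hypothetical, and grandfathered and private-use tags are not handled beyond a basic guard.

```python
from typing import Optional


def script_subtag(tag: str) -> Optional[str]:
    """Return the script subtag of a language tag in title case, or None.

    Simplified: assumes a well-formed tag and ignores grandfathered tags.
    """
    subtags = tag.split("-")
    # A single-letter first subtag ("x", "i") is private-use or grandfathered.
    if len(subtags[0]) < 2:
        return None

    def is_alpha(s: str) -> bool:
        return s.isascii() and s.isalpha()

    i = 1  # subtags[0] is the primary language subtag
    # Skip extended language subtags: exactly three letters each, at most three.
    while i < len(subtags) and i <= 3 and len(subtags[i]) == 3 and is_alpha(subtags[i]):
        i += 1
    # A script subtag is exactly four letters and comes before any region or variant.
    if i < len(subtags) and len(subtags[i]) == 4 and is_alpha(subtags[i]):
        return subtags[i].title()
    return None


print(script_subtag("ja-Latn"))  # Latn
print(script_subtag("yi-latn"))  # Latn (subtags are case-insensitive)
print(script_subtag("ja"))       # None
```

Matching is case-insensitive, so the registered yi-latn and a hypothetical ja-Latn are treated the same way.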
It's marked as "Platform: x86 Windows 2000", but I see this bug on Mac OS X as well -- I suspect it's present on all platforms. Here's a testcase: <span>normal</span> <span lang="en-Latn">en-Latn</span> <span lang="sa-Latn">sa-Latn</span> <span lang="ar-Latn">ar-Latn</span> <span lang="el-Latn">el-Latn</span> <span lang="ru-Latn">ru-Latn</span>. I see at least three fonts there.
This should really be fixed. This issue is a problem on Wikipedia atm, where we are now left with the choice of forcing a latin compatible font on the user, or removing lang tags for transliterated text. This is rather suboptimal.
Due to the apparent lack of progress on this issue in the past 7 years, and an increasing number of complaints from readers, I have disabled the generation of lang= attributes for transliterated text in Wikipedia. http://en.wikipedia.org/w/index.php?title=Template%3ATransl&action=historysubmit&diff=377349407&oldid=242769116
(In reply to comment #17)
> Due to apparent lack of progress on this issue in the past 7 years, and an
> increasing amount of complaints from readers, I have disabled the generation
> of lang= attributes for transliterated text in Wikipedia.
>
> http://en.wikipedia.org/w/index.php?title=Template%3ATransl&action=historysubmit&diff=377349407&oldid=242769116
This is now on our radar in our plans to implement BCP 47.
Unduping: bug 756022: the fix for that was narrower in scope than this bug and didn't address the issue of script subtags in language tags in content.
Created attachment 8659117 [details] A screenshot showing -Latn text
I've attached a screenshot of how -Latn language text currently renders on FF 40.0.3, Mac OS X 10.10.5. The line height and differing font usage are clearly still problematic.
I'm skeptical that Bug 556237 actually blocks this; it's about a whole new system for treating language and font negotiation. It does not require such a system to fix the problem reported here in 192636. For anything tagged *-Latn, just a) use the current font, if latin; or b) use the default latin font, if the current font is something else (Chinese, etc.). The end. If and when 556237's bigger-better-faster idea is implemented ("don't think it should be a priority", they say, and depends in turn on at least three other bugs), in 10 years or whatever, it can supersede what we're resolving here. But this problem should be resolved now, not later.
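The two-step rule proposed in this comment can be sketched as follows. Everything here is illustrative: choose_font, its parameters, and the font names in the usage lines are hypothetical and do not correspond to any real font-selection API, and the check for a Latn subtag is a crude case-insensitive membership test, not a full language-tag parser.

```python
def choose_font(lang_tag: str, current_font: str, current_font_is_latin: bool,
                default_latin_font: str) -> str:
    """Pick a font for text tagged with lang_tag, following the *-Latn rule above."""
    subtags = lang_tag.lower().split("-")
    # Crude test: any non-initial subtag equal to "latn" counts as a Latin script tag.
    tagged_latin = "latn" in subtags[1:]
    if not tagged_latin:
        # Outside the scope of the rule: leave the existing selection untouched.
        return current_font
    # a) keep the current font if it is Latin; b) otherwise use the default Latin font.
    return current_font if current_font_is_latin else default_latin_font


# Font names are placeholders chosen for the example.
print(choose_font("ja-Latn", "Hiragino Mincho", False, "Times"))  # Times
print(choose_font("ja-Latn", "Georgia", True, "Times"))           # Georgia
print(choose_font("ja", "Hiragino Mincho", False, "Times"))       # Hiragino Mincho
```

The sketch only covers *-Latn tags, which matches the narrow scope argued for here; any broader language and font negotiation is left as it is.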