Closed Bug 232487 Opened 21 years ago Closed 21 years ago

Text identified as lang="sa" (Sanskrit) uses Western fonts, not Devanagari fonts

Categories

(Core :: Internationalization, defect)

defect
Not set
normal

RESOLVED FIXED

People

(Reporter: jamie, Assigned: jshin1987)

Details

(Keywords: fixed1.7, intl, l12y)

Attachments

(5 files, 2 obsolete files)

User-Agent:
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031225 Firebird/0.7

In the example page, the first piece of Devanagari text (marked as lang="hi") is displayed by Mozilla (and Mozilla Firebird) using whatever font the user has set for Devanagari. The second piece of Devanagari text (marked as lang="sa") is displayed by those browsers using whatever font the user has set for Western.

This is problematic because some fonts which include Devanagari characters do not handle conjoint characters properly (e.g., Arial Unicode MS), but Mozilla may use characters from such a font if it cannot find them in the specified Western font. This makes texts marked up as Sanskrit display very poorly (and is indeed how I noticed the problem).

Reproducible: Always

Steps to Reproduce:
1. Set Devanagari font to Raghindi (or other font suitable for displaying Devanagari)
2. Set Western font to Code2000 (or other font different from the first, but suitable for displaying Devanagari)
3. Visit the supplied URL

Actual Results:
The two pieces of Devanagari were displayed using different fonts.

Expected Results:
Both pieces of Devanagari displayed using the same font (the one specified for Devanagari).
Good catch. "sa", "mr" (Marathi), "ne" (Nepali) and probably other languages should all be treated as in the Devanagari langGroup, by which we really mean a script group. There is a list of languages written in Devanagari script at http://omniglot.com/writing/devanagari.htm.
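To make the idea concrete, the grouping amounts to a lookup from language code to script group. The following is only a sketch: the table, the function name, and the spelling `x-devanagari` are assumptions for illustration (the thread only speaks of "the Devanagari langGroup"), and the `x-western` default merely stands in for the fallback the reporter observed.

```python
# Hypothetical excerpt of a language -> langGroup (script group) table.
# Key and value spellings are assumptions, not copied from Mozilla's files.
LANG_GROUPS = {
    "hi": "x-devanagari",  # Hindi
    "sa": "x-devanagari",  # Sanskrit
    "mr": "x-devanagari",  # Marathi
    "ne": "x-devanagari",  # Nepali
}


def lang_group(lang, default="x-western"):
    """Return the script group for a language tag.

    Only the primary subtag is used, compared case-insensitively.
    The default is an assumed stand-in for the browser's fallback.
    """
    primary = lang.lower().split("-")[0]
    return LANG_GROUPS.get(primary, default)


print(lang_group("hi"))
print(lang_group("sa"))
```

With entries like these in place, lang="hi" and lang="sa" text would resolve to the same group and hence the same user font preference.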
Status: UNCONFIRMED → NEW
Component: Layout: Fonts and Text → Internationalization
Ever confirmed: true
Indeed, a good catch. Currently, there are 170 languages listed in intl/locale/src/language.properties while there are only 86 languages in langGroup.properties. We may have to go through two lists and add missing ones to langGroup.properties.
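Going through the two lists can be done mechanically. A rough sketch, assuming both files are Java-style .properties text keyed by the bare language code (the real key format may differ); the sample entries and the helper name `property_keys` are made up for illustration.

```python
def property_keys(text):
    """Collect keys from .properties-style text, skipping blanks and comments."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        keys.add(line.split("=", 1)[0].strip())
    return keys


# Hypothetical sample data; the real files are much longer.
language_sample = """\
# languages known to the browser (format assumed)
hi = true
sa = true
mr = false
"""

lang_group_sample = """\
hi=x-devanagari
"""

# Languages that are listed but have no langGroup mapping yet.
missing = sorted(property_keys(language_sample) - property_keys(lang_group_sample))
print(missing)  # ['mr', 'sa']
```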
Assignee: nobody → jshin
Sorry. Actually, langGroup.properties has only 47 languages mapped to langGroups (script groups). Simon, what do you think we have to do for languages written in multiple scripts (I mean not languages like Japanese but languages like Azeri and Mongolian)? Would adding script 'identifiers' like '-latn' work?
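As a rough illustration of the script 'identifier' idea, a tag such as 'mn-cyrl' could be split into language and script, with the script taking priority when choosing a group. This is a sketch only: the function names, the four-letter heuristic, and the single `cyrl` entry are assumptions, not existing Mozilla behaviour.

```python
# Hypothetical script -> langGroup table; only one entry is shown.
SCRIPT_GROUPS = {"cyrl": "x-cyrillic"}


def split_script(tag):
    """Split 'mn-cyrl' into ('mn', 'cyrl'); script is None when absent."""
    parts = tag.lower().split("-")
    lang = parts[0]
    script = None
    # Treat a four-letter alphabetic second subtag as a script identifier.
    if len(parts) > 1 and len(parts[1]) == 4 and parts[1].isalpha():
        script = parts[1]
    return lang, script


def group_for(tag, by_language):
    """Prefer the script subtag; otherwise fall back to the per-language table."""
    lang, script = split_script(tag)
    if script in SCRIPT_GROUPS:
        return SCRIPT_GROUPS[script]
    return by_language.get(lang)


print(group_for("mn-cyrl", {}))
print(group_for("ru", {"ru": "x-cyrillic"}))
```

A two-letter second subtag (a country code, as in 'en-US') is left alone by the heuristic, so existing tags keep resolving by language.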
Keywords: intl
OS: Linux → All
Hardware: PC → All
I would dearly like to move towards including ISO 15924 script codes in language tags, but we should probably wait for the successor of RFC 3066. The latest draft is at http://www.ietf.org/internet-drafts/draft-phillips-langtags-00.txt and discussion takes place in the ietf-languages list archived at http://eikenes.alvestrand.no/pipermail/ietf-languages/
I've gone through langGroups.properties and language.properties and filled in a few gaps in the former among languages listed in the latter. Patch to follow. Which files determine which languages are recognised by Mozilla? It would be nice to add recognition/support for, say, Maori (which uses the Latin alphabet).
I went about half way through the list the other day. I'll finish that up sometime soon.
Status: NEW → ASSIGNED
Keywords: l12y
Attached patch patch (obsolete)
I built a new patch upon Jamie's patch.
Attachment #140226 - Attachment is obsolete: true
Attached patch update
same as before except that I cleaned up a little bit.
Attachment #143095 - Attachment is obsolete: true
Comment on attachment 143097 [details] [diff] [review] update asking for r/sr. smontagu, if you happen to read this email, please feel free to chime in.
Attachment #143097 - Flags: superreview?(blizzard)
Attachment #143097 - Flags: review?(momoi)
Just a heads-up. This will take a few days to review. Looking at the first 12 or so entries, there are already some problematic cases where recent language policy changes don't match what you find in Ethnologue. In such cases we should probably honor the official language policy of the country affected rather than what Ethnologue describes.
I guess you're concerned about some entries mapped to 'x-cyrillic'. Anyway, please take your time and I'll be happy to incorporate your findings.
Comment on attachment 143097 [details] [diff] [review] update Clearing, waiting for momoi.
Attachment #143097 - Flags: superreview?(blizzard)
I'm OK with most of what you have added but I want you to address some issues before checking these in.

For langGroups.properties file:

1. There are those languages in the list that are added but commented out. You have certain script categories that seem undefined in the code officially, e.g. Ethiopic. What are your plans? Can we do something like "x-ethiopic-u" to refer to any category that is defined in Unicode Standard and have all "..-u" entries map to "x-unicode" for the purpose of font selection?

2. For those commented out, you can simply take note of my comments and add comments as needed. But for those not commented out, i.e. ce, gd, gl, and om, please indicate your agreement or disagreement on my suggestions.

For language.properties file:

3. I went through most of them carefully but since it takes a lot of time to figure out why certain languages are "false" and should not appear on the Accept Language list, I've decided to focus my energy on the ones that you added to appear on the list. There was only one entry that concerned me under the "true" category. This is "ve (Venda)". See my comment on this. In general, I did not get a good sense of why certain langs should not appear. (See my comments on 6 languages with "false" value.) But as long as they don't appear, there is no practical harm. We may simply wait until someone alerts us about a language that's not there and then decide to see if we should change from "false" to "true".
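The "..-u" suggestion in item 1 could look roughly like this. It is a sketch of the proposal only, not of anything in the patch; the function name is hypothetical.

```python
def font_group(lang_group):
    """Map any '...-u' script category to 'x-unicode' for font selection.

    Groups without the '-u' suffix are passed through unchanged.
    """
    return "x-unicode" if lang_group.endswith("-u") else lang_group


print(font_group("x-ethiopic-u"))
print(font_group("x-western"))
```

The script category stays visible in the mapping file while font selection keeps using the single catch-all group.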
I stated before that Maori (mi) should be x-western, without remembering that it requires macronised vowels and therefore falls outside the scope of ISO-8859-1. Those ten characters from Latin Extended-A are the only non ISO-8859-1 characters used.
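This is easy to check. A small sketch (the variable and helper names are made up) that tests the ten macronised vowels against ISO-8859-1 and against the Latin Extended-A block, U+0100 to U+017F:

```python
# a, e, i, o, u with macron, lower case then upper case (precomposed forms).
MAORI_MACRONS = "\u0101\u0113\u012b\u014d\u016b\u0100\u0112\u012a\u014c\u016a"


def in_latin1(ch):
    """True if the character can be encoded in ISO-8859-1."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


outside = [ch for ch in MAORI_MACRONS if not in_latin1(ch)]
all_in_ext_a = all(0x0100 <= ord(ch) <= 0x017F for ch in MAORI_MACRONS)

print(len(outside))   # 10
print(all_in_ext_a)   # True
```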
Thanks, Jamie. I've gone back and looked at those classified as x-western and found that a number of them use characters from Latin Extended A, B, and/or Latin Extended Additional. They are listed in this attachment. I still think the list may contain some inaccuracies but these corrections should improve the situation considerably.
(In reply to comment #14)
Thanks for your thorough review.

> For langGroups.properties file:
>
> 1. There are those languages in the list that are added but commented out. You
> have certain script categories that seem undefined in the code officially, e.g.
> Ethiopic. What are your plans? Can we do something like "x-ethiopic-u" to refer
> to any category that is defined in Unicode Standard and have all "..-u" entries
> map to "x-unicode" for the purpose of font selection?

I think the current langGroup approach has to be overhauled eventually (when?), and the 'Unicode' langGroup (which is at best a hack) has to be removed when we do that (see also bug 91190). It doesn't scale very well, as you found out reviewing my changes. For instance, Maori is not fully covered by x-western. Is there any alternative? There may be, but it would not be easy for speakers of languages like Maori or African languages (that use the Latin alphabet) to figure out which langGroup their languages belong to. We have to move on to using Unicode code ranges (or something similar). The only platform where that doesn't work well is X11corefont builds. We have to support that platform (on Linux we may not have to, but for other commercial Unix we do), so we have to come up with a way to support it when we make changes.

In the meantime, we may add 'Ethiopic', 'Georgian', 'Armenian' and some others. I'd love to add all Indic scripts as well, but our support of Indic scripts is not uniform across platforms and there's no way to add them selectively in a platform-dependent manner.

> 2. But for those not commented out, i.e. ce, gd, gl, and om,
> please indicate your agreement or disagreement on my suggestions.

I'll check them out and comment later.

> For language.properties file:
>
> 3. I went through most of them carefully but since it takes a lot of time to
> figure out why certain languages are "false" and should not appear on the

I gave up rationalizing my choices there :-) At the beginning, I may have had some criteria, but soon enough it became quite arbitrary, as you found out (my choice is likely to be biased toward European minority languages). I'm sorry for forcing you to spend your time figuring out my 'rationale'.

> Accept Language list, I've decided to focus my energy on the ones that you
> added to appear on the list. There was only one entry that concerned me under
> the "true" category. This is "ve (Venda)". See my comment on this.

... as I'll do what you suggested about this.

> long as they don't appear, there is no practical harm. We may simply wait
> until someone alerts us about a language that's not there and then decide
> to see if we should change from "false" to "true".

I agree with you on this point.
I propose to move forward on this after making the changes suggested in my 2 attachments. With the changes suggested there, the 2 lists will be much better than before. Though we can't deny that there might still be some inaccuracy left in there, the ones that will appear on the Accept-Language list ("true") have been examined carefully.
Comment on attachment 143466 [details] A list of those currently classified as x-western but use non-Latin 1 characters.

> The ones below are classified currently as x-western but use characters from Latin Extended A/B and/or Latin Extended Additional
> +eo=x-western (Esperanto uses non Latin 1 characters from Latin Extended A.)

x-western is not only Latin-1 but also includes Latin-3, which covers Esperanto rather well. So, we should be fine here. Do you have any alternative for the others? The font selection dialog in Mozilla is not exactly about coverage (that is only important on X11corefont builds and doesn't matter much in other builds: Windows, Xft on Linux and MacOS X). If you're an Upper Sorbian/Welsh/Cornish speaker, what langGroup would you use to specify your font preferences? It's likely to be 'Western'. For Fijian, Maori and Yoruba, there's no obvious choice....
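The Latin-3 point about Esperanto can be checked with Python's standard codecs. A minimal sketch (the names are illustrative); it only tests character coverage of ISO-8859-3 versus ISO-8859-1 and says nothing about how the x-western group is defined in the code.

```python
# The six Esperanto letters with diacritics, upper and lower case:
# C, G, H, J, S with circumflex and U with breve.
ESPERANTO = (
    "\u0108\u0109\u011c\u011d\u0124\u0125"
    "\u0134\u0135\u015c\u015d\u016c\u016d"
)


def encodable(text, codec):
    """True if every character of text can be encoded with the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


print(encodable(ESPERANTO, "iso8859_3"))  # True  (Latin-3)
print(encodable(ESPERANTO, "latin_1"))    # False (Latin-1)
```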
Attachment #143097 - Flags: review?(momoi)
I addressed Kat's comments. I added comments to languages in attachment 143466 [details] while keeping the assignment because I can't think of a better alternative at the moment. We may later add 'x-celtic'.
Comment on attachment 143866 [details] [diff] [review] update addressing Kat's comments asking for review.
Attachment #143866 - Flags: review?(momoi)
Comment on attachment 143866 [details] [diff] [review] update addressing Kat's comments r=momoi. All my concerns have been addressed in this new patch. Hopefully we can overhaul the lang group issues addressed in my and jungshik's comments in the future.
Attachment #143866 - Flags: review?(momoi) → review+
Comment on attachment 143866 [details] [diff] [review] update addressing Kat's comments thanks for r. asking for sr.
Attachment #143866 - Flags: superreview?(blizzard)
Comment on attachment 143866 [details] [diff] [review] update addressing Kat's comments I'm not terribly happy about all the commented-out stuff without explanations in the various files, but I can live with it.
Attachment #143866 - Flags: superreview?(blizzard) → superreview+
Comment on attachment 143866 [details] [diff] [review] update addressing Kat's comments

thanks for r/sr. asking for a1.7

This is to add ~10 mappings from languages to langGroups and ~20 languages to the list of our supported languages that don't require any special handling in Mozilla (that is, they've been supported for a long time, but haven't been in the list).

risk: almost zero if not none.
affected users: speakers/users of those languages added
affected platforms: all
Attachment #143866 - Flags: approval1.7?
Comment on attachment 143866 [details] [diff] [review] update addressing Kat's comments a=chofmann for 1.7
Attachment #143866 - Flags: approval1.7? → approval1.7+
Keywords: fixed1.7
sorry I forgot to mark this as fixed.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Attachment #152111 - Flags: superreview?(blizzard)
Attachment #152111 - Flags: review?(jshin)
Attachment #152111 - Flags: superreview?(blizzard) → superreview+
Comment on attachment 152111 [details] [diff] [review] Fix duplicate line r=jshin thanks for catching it.
Attachment #152111 - Flags: review?(jshin) → review+