556237 - Implement font and encoding negotiation based on BCP 47

Reporter

Description

15 years ago

Our current "language groups" are a leftover from the pre-Unicode world of multiple codepages and codepage-specific fonts, and they no longer serve our needs well. They are used as the key for specifying default fonts, but the distinctions they make are no longer helpful. For example, "Baltic", "Turkish" and "Western" languages are all written in Latin script and should normally share font preferences. The split dates back to old codepages, where no single codepage provided all the necessary accented characters, and it is obsolete in the Unicode world. The meaning of "Other Languages" and "User Defined" (does that mean x-unicode internally?) is also far from clear in the font preferences.

At the same time, the langGroups do not give users the flexibility they want. Hence the periodic requests to create new langGroup values such as Tibetan, Persian, Macedonian, etc., whenever people want to set specific default fonts for a language that is not currently exposed as its own "group".

To improve this situation and allow better user control, I propose that we replace the langGroup-based font prefs with a system based on BCP 47, "Tags for Identifying Languages" (http://tools.ietf.org/html/bcp47). BCP 47 provides a standard model for tags that can incorporate language, script, and region, as well as rules for matching specific and generic tags. I envisage that font preferences will primarily be expressed in terms of script, with the language subtag normally being a "wildcard" and the trailing region subtag omitted. For example, the default Latin font would be listed under "*-Latn", the default Arabic font under "*-Arab", and so on. However, there will be the flexibility to create preferences for specific languages, so that if Persian users want different default fonts from the Arabic one, these can be provided as "fa-Arab".
Or a different font preference for West African Arabic might be specified as "*-Arab-011" (where 011 is the IANA-registered region subtag for Western Africa).

Fonts will then be resolved using the script of the text, combined with the language (where available) and the user's locale (if not overridden by an extended lang tag), by finding the most specific match among the available preferences. So Arabic-script text tagged as "fa" would use the "fa-Arab" fonts if they are defined, and would otherwise fall back to the "*-Arab" fonts. This gives us a consistent, extensible model in which localizers or users can specify additional font preferences as needed and have them used automatically in the right contexts, rather than being constrained by the fixed (and artificial) set of defined langGroups.

The font preferences UI will need a corresponding redesign; with care, we should be able to make it both clearer and more useful.
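The "most specific match" rule can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual Gecko implementation. The preference table, the placeholder font names, and the `parse_key`/`resolve_fonts` helpers are all hypothetical. Keys are assumed to take the "language-Script[-Region]" form used in the examples, with "*" as the wildcard language. When both kinds of key could apply, a matching language is assumed to outweigh a matching region. Region subtags are compared literally, so the sketch does not know that a country code such as "NG" lies within "011".

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical preference table; keys follow "language-Script[-Region]",
# with "*" as a wildcard language. Font names are placeholders.
FONT_PREFS: Dict[str, List[str]] = {
    "*-Latn": ["Example Latin Sans"],
    "*-Arab": ["Example Arabic Naskh"],
    "fa-Arab": ["Example Persian Nastaliq"],
    "*-Arab-011": ["Example West African Arabic"],
}


def parse_key(key: str) -> Tuple[str, str, Optional[str]]:
    """Split a preference key into (language, script, region-or-None)."""
    parts = key.split("-")
    region = parts[2] if len(parts) > 2 else None
    return parts[0], parts[1], region


def resolve_fonts(script: str,
                  language: Optional[str] = None,
                  region: Optional[str] = None,
                  prefs: Dict[str, List[str]] = FONT_PREFS) -> List[str]:
    """Return the font list of the most specific key matching the text.

    BCP 47 subtags are case-insensitive, so comparisons ignore case.
    Returns an empty list if no key matches the script.
    """
    best_key: Optional[str] = None
    best_score = -1
    for key in prefs:
        k_lang, k_script, k_region = parse_key(key)
        if k_script.lower() != script.lower():
            continue  # the script subtag must always match
        if k_lang != "*" and (language is None
                              or k_lang.lower() != language.lower()):
            continue  # a non-wildcard language must match exactly
        if k_region is not None and (region is None
                                     or k_region.upper() != region.upper()):
            continue  # a region subtag, if present, must match exactly
        # Assumed weighting: specific language (2) beats specific region (1).
        score = (2 if k_lang != "*" else 0) + (1 if k_region else 0)
        if score > best_score:  # on ties, the first key seen wins
            best_key, best_score = key, score
    return prefs[best_key] if best_key is not None else []
```

For untagged text, a caller would pass the language from the user's locale, as the proposal describes. With this table, `resolve_fonts("Arab", "fa")` selects the "fa-Arab" entry, while `resolve_fonts("Arab", "ar")` falls back to "*-Arab".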

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 1

15 years ago

This sounds good to me, but I don't think it should be a priority.