NS_FindFCLangGroup in gfxFontconfigUtils.cpp and GetPangoLanguage each use a different mapping array in a half-hearted attempt to recover the language information discarded by nsLanguageAtomService::LookupLanguage and langGroups.properties. MozPangoLangGroups is missing some entries that MozGtkLangGroups has, and it also has some entries that add no information. We can at least unify these two mappings into one. Even once we retain the lang information from documents that provide it, we will still need to guess a language when the langGroup is inferred from the document.
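A minimal sketch of what a single unified mapping could look like: one table keyed by langGroup that yields a sample language, replacing the two parallel arrays. The struct, the function name SampleLangForGroup and the table entries are hypothetical and illustrative. Only the x-ethi → am pairing comes from this bug; the real arrays contain more entries and use fontconfig types.

```cpp
#include <cstring>

// Hypothetical unified table: one entry per Mozilla langGroup, giving the
// language tag to use as a sample when no better information exists.
// Entries are illustrative, not copied from MozGtkLangGroups/MozPangoLangGroups.
struct LangGroupSample {
    const char *langGroup;   // Mozilla langGroup, e.g. "x-ethi"
    const char *sampleLang;  // language tag handed to fontconfig/Pango
};

static const LangGroupSample kLangGroupSamples[] = {
    { "x-western",  "en" },
    { "x-cyrillic", "ru" },
    { "ja",         "ja" },
    { "x-ethi",     "am" },
};

// Linear lookup; returns nullptr when aLangGroup is not in the table.
const char *SampleLangForGroup(const char *aLangGroup)
{
    for (const LangGroupSample &entry : kLangGroupSamples) {
        if (std::strcmp(entry.langGroup, aLangGroup) == 0)
            return entry.sampleLang;
    }
    return nullptr;
}
```

With a single table, both callers consult the same data, so an entry added for one code path cannot silently be missing from the other.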
Created attachment 340072: gfxFontconfigUtils::GetSampleLangForGroup

"am" (Amharic) for x-ethi seems more consistent with langGroups.properties than "et" (Estonian), which was in MozPangoLangGroups. pango_language_from_string does the necessary conversion to lowercase. The patch also chooses the user's preferred language, rather than always using the language expected to get the biggest vote.
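A rough sketch of the "prefer the user's language" idea: walk the user's preferred languages in order, return the first one that belongs to the requested langGroup, and otherwise fall back to the group's default sample language. LangGroupForLanguage is a hypothetical toy stand-in for a real language-to-langGroup lookup (Mozilla derives this from langGroups.properties). ChooseSampleLang is also a hypothetical name and not the function in the attachment.

```cpp
#include <string>
#include <vector>

// Hypothetical toy lookup from a language tag to a Mozilla langGroup.
// Only the primary subtag (before '-' or '_') is consulted.
std::string LangGroupForLanguage(const std::string &aLang)
{
    std::string primary = aLang.substr(0, aLang.find_first_of("-_"));
    if (primary == "am" || primary == "ti") return "x-ethi";
    if (primary == "ru" || primary == "uk") return "x-cyrillic";
    if (primary == "en" || primary == "de" || primary == "fr") return "x-western";
    return "";
}

// Returns the first preferred language that belongs to aLangGroup,
// or aDefaultSample if none of them does.
std::string ChooseSampleLang(const std::vector<std::string> &aPreferred,
                             const std::string &aLangGroup,
                             const std::string &aDefaultSample)
{
    for (const std::string &lang : aPreferred) {
        if (LangGroupForLanguage(lang) == aLangGroup)
            return lang;
    }
    return aDefaultSample;
}
```

For example, a Ukrainian user viewing an x-cyrillic page would get "uk" rather than the default "ru". The result can be passed to pango_language_from_string, which handles the lowercasing.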
In GetSampleLangForGroup, don't we have some kind of string tokenizer object you can use instead of tokenizing the environment variable yourself?

+ FcPatternAddString(aPattern, FC_LANG, (FcChar8 *)lang.get());

Use const_cast.

Other than that, looks fine.
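On replacing the C-style (FcChar8 *) cast with C++ casts: fontconfig defines FcChar8 as unsigned char. Assuming FcPatternAddString's string parameter is const FcChar8 *, as in current fontconfig headers, the cast only needs to change char to unsigned char. const_cast cannot do that by itself, because it only adds or removes qualifiers. The sketch below uses a hypothetical helper, ToFcChar8, and a local typedef in place of <fontconfig/fontconfig.h> so that it compiles on its own.

```cpp
// Local stand-in for fontconfig's typedef so the sketch is self-contained.
typedef unsigned char FcChar8;

// Hypothetical helper: reinterpret a char string as fontconfig's FcChar8.
// char and unsigned char are distinct types, so reinterpret_cast is needed;
// const_cast would only be needed if the callee took a non-const pointer.
inline const FcChar8 *ToFcChar8(const char *aStr)
{
    return reinterpret_cast<const FcChar8 *>(aStr);
}
```

A call site would then read FcPatternAddString(aPattern, FC_LANG, ToFcChar8(lang.get())), or it could use the reinterpret_cast inline.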
(In reply to comment #2)
> In GetSampleLangForGroup, don't we have some kind of string tokenizer object
> you can use instead of tokenizing the environment variable yourself?

I considered these options:

nsCWhitespaceTokenizer
  Perfect, except that it only tokenizes on whitespace.

htmlparser/src/nsScanner.h, htmlparser/public/nsScannerString.h
  Only support PRUnichar strings.

nsCRT::strtok
  Provides NUL-terminated tokens and so writes to the source. Environment variables are writable AFAIK, but it feels evil, and we don't need NUL-terminated tokens.

strchr and PRInt32 nsACString::FindChar(char_type, index_type offset = 0)
  Neither returns the end of the string when it fails to find a separator, so both require special-casing the last token.

PRBool FindCharInReadable(PRUnichar aChar, nsAString::const_iterator& aSearchStart, const nsAString::const_iterator& aSearchEnd);
  Moves aSearchStart nicely, but uses deprecated iterators.
  http://hg.mozilla.org/mozilla-central/annotate/ad2eb162ecfc/xpcom/string/public/nsTSubstring.h#l125

#define _GNU_SOURCE
char *strchrnul(const char *s, int c);
  Good, but I'm not sure whether GNU is a standard we require.
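A portable way to get strchrnul's useful property without depending on _GNU_SOURCE is to write the loop directly. The loop returns the end of the string instead of NULL when no separator is found, so the last token needs no special case, and the source is never written to (unlike strtok). FindSepOrEnd and SplitList are hypothetical names. The ':' separator in the usage below assumes a colon-separated list such as the GNU gettext LANGUAGE variable.

```cpp
#include <string>
#include <vector>

// Portable equivalent of GNU strchrnul(): pointer to the first aSep in aStr,
// or to the terminating NUL if aSep does not occur.
const char *FindSepOrEnd(const char *aStr, char aSep)
{
    while (*aStr != '\0' && *aStr != aSep)
        ++aStr;
    return aStr;
}

// Splits aList on aSep without modifying it, skipping empty tokens.
// The last token is handled by the same code path as the others.
std::vector<std::string> SplitList(const char *aList, char aSep)
{
    std::vector<std::string> tokens;
    const char *start = aList;
    for (;;) {
        const char *end = FindSepOrEnd(start, aSep);
        if (end != start)
            tokens.emplace_back(start, end);
        if (*end == '\0')
            break;
        start = end + 1;
    }
    return tokens;
}
```

For example, SplitList("de_DE:en::fr", ':') yields "de_DE", "en" and "fr".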
Created attachment 340264: gfxFontconfigUtils::GetSampleLangForGroup v1.1
Comment on attachment 340264: gfxFontconfigUtils::GetSampleLangForGroup v1.1

Uses C++ casts and changes GetSampleLangForGroup to iterate through the environment variable only once.
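A sketch of the single-pass approach, under stated assumptions. It scans the colon-separated list once, tests each token as soon as its end is found, and stops at the first language that belongs to the langGroup, so no intermediate token list is built. LangInGroup is a hypothetical toy predicate, and PreferredLangForGroup is a hypothetical name. The list would typically come from getenv on the language-preference variable, which this sketch assumes is colon-separated. This is an illustration, not the code in attachment 340264.

```cpp
#include <string>

// Hypothetical toy predicate: does the language tag [aStart, aEnd) belong
// to aLangGroup? Only the primary subtag is consulted.
static bool LangInGroup(const char *aStart, const char *aEnd,
                        const std::string &aLangGroup)
{
    std::string lang(aStart, aEnd);
    std::string primary = lang.substr(0, lang.find_first_of("-_"));
    if (aLangGroup == "x-ethi") return primary == "am" || primary == "ti";
    if (aLangGroup == "x-cyrillic") return primary == "ru" || primary == "uk";
    return false;
}

// One pass over aList: returns the first token in aLangGroup, else aDefault.
// A null aList (e.g. an unset environment variable) also yields aDefault.
std::string PreferredLangForGroup(const char *aList,
                                  const std::string &aLangGroup,
                                  const std::string &aDefault)
{
    if (!aList)
        return aDefault;
    const char *start = aList;
    for (;;) {
        const char *end = start;
        while (*end != '\0' && *end != ':')
            ++end;
        if (end != start && LangInGroup(start, end, aLangGroup))
            return std::string(start, end);
        if (*end == '\0')
            return aDefault;
        start = end + 1;
    }
}
```

Stopping at the first match preserves the user's order of preference and avoids re-scanning the string.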