Closed Bug 61108 Opened 24 years ago Closed 24 years ago

prob converting locale to language

Categories

(Core :: Internationalization, defect, P1)

Sun
Solaris
defect

Tracking

()

VERIFIED FIXED
mozilla0.9

People

(Reporter: masaki.katakai, Assigned: bstell)

References

Details

(Keywords: intl)

Attachments

(9 files)

In Solaris chinese locales (e.g. zh_CN.EUC), the same UI font isn't used for UI characters. Some glyphs are small than the others. I'll attach the snapshot. We need a reasonable workaround or the exact fix for Sun'S OEM release of Netscape 6. It seems that unicode encoding is used for UI glyphs, which doesn't use the same font for whole glyphs of UI. I defined the following fonts for zh-CN, but which couldn't solve the problem, user_pref("font.name.monospace.zh-CN", "sun-song-gb2312.1980-0"); user_pref("font.name.sans-serif.zh-CN", "sun-song-gb2312.1980-0"); user_pref("font.name.serif.zh-CN", "sun-song-gb2312.1980-0"); However, when I defined gb2312 for Unicode encoding, the UI glyphs can be displayed with the same font I specified, like, user_pref("font.name.monospace.x-unicode", "sun-song-gb2312.1980-0"); user_pref("font.name.sans-serif.x-unicode", "sun-song-gb2312.1980-0"); user_pref("font.name.serif.x-unicode", "sun-song-gb2312.1980-0"); I don't think this can be a reasonable workaround. I'll investigate more detail, but if you have any suggestion, please le me know.
Attached image snapshot
Blocks: 60916
Summary: font size problem for Chinese UI characters → font size problem for Chinese UI characters
Reassign to erik.
Assignee: nhotta → erik
I think Mozilla should be using the fonts set for the language group of the locale when the document is in a Unicode-based encoding. Maybe the locale name (zh_CN.EUC) is not being recognized as a Simplified Chinese locale? (I thought I saw a related bug report today...)
Thanks for evaluation, Erik. Yes, you're right. x-western seems to be passed to GFX. I found a problem that nsLanguageAtomService.cpp can not get 'zh-CN' langGroup by 'zh-cn' key. In the nsLanguageAtomService class, nsIPersistentProperties is used for mLangGroups but it seems that the nsIPersistentProperties can not handle '-' character. So even when zh-cn=zh-CN exists in langGroups.properties, zh-CN can not be returned. res = mLangGroups->GetStringProperty(lowered, langGroupStr); if (NS_FAILED(res)) { PRInt32 hyphen = lowered.FindChar('-'); When the first query fails, the key 'zh-cn' will be changed to 'zh', then trying to get langGroup again. zh= entry doesn't exist also this should contain cn or tw to distinguish the locales. So, I believe we should change the codes to handle '-'. What do you think? How about using nsURLProperties class of uconv/ for this purpose? For testing, I added 'zh=zh-CN' entry and changed font definitions of zh-CN for Solaris, it worked fine.
Assignee: erik → katakai
Hi, Katakai-san: Erik will be out for a while. Would you please provide a patch so we can proceed to the review phase? thx
If PersistentProperties cannot handle the '-' character, then that should be fixed (instead of changing LanguageAtomService). Are you sure that zh-cn is being looked up? Maybe it is actually zh_cn because of a bug in the locale name parsing code.
Sorry for confusion. I found my comment is wrong. The langgroup is retrieved by the exact locale name, e.g. "zh", "zh-gb.gbk", "zh-tw.euc". Those are not defined in langGroups.properties file. When the first retrieval fails, LanguageAtomService will cut the word after "-" and will try again, e.g. zh-tw and zh-cn will be "zh". "zh" is not defined in langGroups.properties. For example, "ja_JP.PCK" will work because "ja-jp.pck" fails, but "ja" is defined. We can not define "zh=zh-CN" or "zh=zh-TW" in langGroups.properties because it's for zh-cn and zh-tw. How about the following scenario? Current implementation uses only '-' when it fails, but I believe we should check '.' first, then use '-' for retrieval. 0) add the entry below to langGroups.properties zh=zh-CN 1) Start Mozilla in zh_TW.EUC locale in Solaris 2) zh_TW.EUC -> zh-tw.euc 3) "zh-tw.euc" isn't defined, so it fails 4) cut after '.', try "zh-tw" => zh-tw=zh-TW is defined in langGroups.properties 1) Start Mozilla in zh locale 2) zh -> zh zh=zh-CN is defined in langGroups.properties 1) in zh.GBK locale 2) zh.gbk fails (Assumed 60954 is fixed) 3) try zh zh=zh-CN is defined This seems to work for me. I'll attach the patch if ready.
I've attached the proposal patch. Please review. Should I ask Shanjian also?
I do not approve of this patch. There should not be any dot (.) in the locale name returned by the locale module. It should return names like en, en-US, zh-CN, etc. The locale name parser is bad, and should be fixed. We should not change nsLanguageAtomService to work around the bug in the locale name parser.
Thanks for comments. Do you mean UNIX's nsPosixLocale::GetXPLocale() is wrong? Actually this method returns ".EUC" as extra, not only "zh-TW". However, I think this ".EUC" should be there in UNIX platform because we can't find zh-TW.BIG5 and zh-TW.EUC is different. For example, nsDateTimeFormatUnix.cpp uses "NSILOCALE_TIME" and it stores "zh-TW.EUC", and tries to get charset by using the key. (at GetDefaultCharsetForLocale()). However, when we change it to only "zh-TW", we will not be able to get the correct charset from unixcharset.properties file. For japanese, ja-JP.PCK (SJIS in Solaris) will be considered as "ja-JP", and unixcharset.properties will return EUC-JP by mistake. Thus, in UNIX platform, I'm thinking .??? field is needed in nsLocaleSerice.
Adding bstell to cc.
Added intl and nsbeta1 keywords. This bug is a showstopper Sun's Chinese releases (see snapshot)
Keywords: intl, nsbeta1
*** Bug 60179 has been marked as a duplicate of this bug. ***
reassigning to bstell
Assignee: katakai → bstell
Target Milestone: --- → mozilla0.8
Status: NEW → ASSIGNED
Erik indicates that we should be referencing RFC 1766 for correct language tags.
Katakai-san, If possible I like to work on this in the html area instead of the menus. Can you recommend a page that displays this behavior in the html area?
Hi Brian, I understand that this problem happens because x-western langgroup is passed to GFX even when we start Mozilla in chinese locale on Solaris. When Mozilla is started in chinese locale, zh-cn or zh-tw should be passed as langgroup. So I think the following page can be used for testing. http://village.infoweb.ne.jp/~katakai/mozilla/UTF-8.txt When we start Mozilla in zh_TW locale and see the page, the page should be rendered as zh-TW langgroup, but now it seems that x-western langgroup so same characters are smaller than others.
Katakai-san, I am still waiting for a review from Erik of this patch. However it should fix the font size problem. Would you please try it?
It seems that "" (blank) is returned for LangGroup when I start Mozilla in zh and zh.GBK locale on Solaris. (Let's debug nsLanguageAtomService:: LookupLanguage()). So I'd like to suggest to appen zh=zh-CN entry into langGroups.properties. See my comments of 2000-12-07 00:13. I don't have enough environment for testing because our zh-TW.jar and zh-CN.jar are too old for the nightly build and Mozilla won't start.
It seems that the patch causes other problem. When I start Mozilla in ja_JP.UTF-8 locale, japanese characters of Month and Day become garbage. Because Mozilla seems to use EUC-JP for default charset of the locale, which should be UTF-8. Even in ja_JP.UTF-8 locale, is only "ja-JP" used as posixlocale? ja-JP is used to retrieve default charset in nsDateTimeFormatUnix.cpp but "UTF-8" is also needed because we should know EUC/Shift_JIS/UTF-8 for the date format. See my comments of 2000-12-08 00:33 for detail. - use new nsPosixLocale - change nsDateTimeFormatUnix.cpp to use system locale (ja_JP.UTF-8) to retrieve default charset (UTF-8) or - do not modify nsPosixLocale and use my patch of nsLanguageAtomService.cpp
Katakai-san, Can you give me an URL that causes the nsDateTimeFormatUnix code to be called?
nsDateTimeFormatUnix will be used when you open FilePicker (File->Open) also Manage Bookmark dialog (date of bookmarks).
Katakai-san, could you apply these patches to nsLocaleService.cpp and nsPosixLocale.cpp and let me know what happens?
Brian, langGroup is still "" and I'm sure you need to modify nsLanguageAtomService:: LookupLanguage() as well because the method assumes "-" for separator and ignores ".", now "_" and "." can be passed to the method. For example, If the method gets ja_jp.pck (zh_cn.gbk), replace "_" with "-". retrieve langgroup by ja-jp.pck (zh-cn.gbk) if not exist, ignore .pck, then search by ja-jp (zh-cn) if not exist, ignore -jp, then search by ja (zh) Also you need to add "zh=zh-cn" entry into langGroup.properties to handle zh and zh.GBK, zh.UTF-8 locale of Solaris.
Katakai-san, would you kindly give me a complete description of your setup and what you are doing when this issue is encountered? We seem to be solving different problems.
I changed the summary since the issue here is not a font problem. The language is not determined correctly and this causes the X font system to use fallback fonts which look funny. was: font size problem for Chinese UI characters
Summary: font size problem for Chinese UI characters → prob converting locale to language
here is what I believe is happening: there are several ways to format a locale string la_CO.EX the unix exteneded (raw) version la-CO the XP version la-CO the XP extended version la the language only version la = language CO = country EX = extra (eg: UTF8, EURO, ...)( To make Unix collation/datetime work we changed the unix version of GetXPLocale to return the XP extended version which effected languageAtomService (this bug). I propose that we: modify GetXPLocale to return the XP version modify the unix version of the nsLocaleService constructor to to also save the raw version so it can be used by the unix collation/date code modify the unix collation/date code to use the raw version This way: 1) The XP API will remain the same and the XP code will see the same format for Win/Mac/Unix. 2) The Unix code can access raw version as necessary.
Target Milestone: mozilla0.8 → mozilla0.9
Setting Priority to P1. This is a show-stopper for Sun.
Priority: P3 → P1
Katakai-san, This attachment, 24863: modifies nsLocaleService.cpp to remember the original (raw) locale info modifies nsCollationUnix.cpp and nsDateTimeFormatUnix.cpp to use the orig locale info has the previous changes to sPosixLocale.cpp to parse the xp locale Would you try this and let me know what does work and what still needs more work? thanks
Personally, I prefer katakai's approach. I do not see a reason why we could not modify nsLanguageAtomService. RFC1766 sometimes is not enought to identify charset, (ie. .extension). To keep 2 copies make the logic rather complicated. However, katakai is fix should be modified this way: try to lookup the locale_name, if succeed, return. if there is dot part, remove it and try again. If succeed, return. if there is - part, remove it and try again. If succeed, return fall back to western. In his original diff, '-' stripping in embedded inside '.' stripping, that's will cause problem. On win or mac, because it has no '.' part, I do not see any side-effect there.
Changed QA contact to katakai@japan.sun.com.
QA Contact: teruko → katakai
I still prefer to have it fixed in the locale code. nsLanguageAtomService has a very clear purpose, and I want to keep it that way. We use Unicode internally, in the XP code. We convert to and from Unicode in the platform-specific code. Similarly, we should use RFC 1766 codes in nsLanguageAtomService, which is XP code. We can handle Unix-specific locale name issues in the Unix-specific code, without polluting nsLanguageAtomService.
Shanjian, It is important that the unix version produce the same XP (cross platform) data as the windows and mac versions. We could change nsLanguageAtomService to work around the current unix-only locale string and it would work but we would then be required to put work arounds in all future XP uses of locale as well. Modifying XP code to work around a platform specific problem is counter to the design goals of Mozilla/Netscape (this rule was written by unix developers who were tired of windows developers changing XP code to make windows work but break unix). We need to either follow the XP standard or convince the windows/mac teams to change the XP standard and to find and put any needed fixes in their code. It seems simpler and less risky to change the unix code to store the XP locale as "la-CO" not "la-CO.extra" (where la=language, CO=country). Storing the XP version of locale makes the XP code work but is not enough information for the Unix locale sensitive code to work. This is why I propose to store the Unix specific data under a separate key. If you have an alternate way to meet both the XP and platform requirements those suggestions are welcome.
Hi Brian, I have tested your patch of http://bugzilla.mozilla.org/showattachment.cgi?attach_id=24863 and it looks good. zh-CN and zh-TW langgroup is passed properly. However, please don't miss zh=zh-CN entry is also needed in langGroups.properties, which will be needed in Solaris zh, zh.GBK and zh.UTF-8 locale names. Thanks.
Katakai-san, Could you attach a patch for the zh=zh-CN entry? Thanks
Frank, I updated this per the code reviewed we did together. Would you kindly add the r=ftang? Thanks Erik, Would you please super review this? The latest changes are in: http://bugzilla.mozilla.org/showattachment.cgi?attach_id=26106 Thanks
r=ftang
String-fu: + category_platform.AssignWithConversion(LocaleList[i]); + category_platform.AppendWithConversion("_PLATFORM"); The first is an unavoidable conversion from ASCII to the two-byte-per-char, UTF-16-like codes that Mozilla stores in PRUnichar arrays in ns{Auto,}Strings, but the second should be + category_platform.Append(NS_LITERAL_STRING("_PLATFORM")); same for + platformLocale.AssignWithConversion("en_US"); and for + aCategory.AssignWithConversion("NSILOCALE_COLLATE_PLATFORM"); and so on, later. Silly question: are the various LocaleList[i] names guaranteed never to be of the form COLLATE_PLATFORM or CTYPE_PLATFORM? I know, these are settled names and no one would add a _PLATFORM suffix -- but just in case, maybe your mangled extension should be a bit more mangled? Maybe use ".PLATFORM" or "#PLATFORM" or some such name that would never arise (because of the funky separator char) in the relevant POSIX or whatever standards. + PR_snprintf(posix_locale,128,"%s-%s",lang_code,country_code); Since these buffer sizes just changed, please do: + PR_snprintf(posix_locale, sizeof posix_locale, "%s-%s", lang_code, country_code); etc. + dest = language; + dest_space = 3; // buf_len - 1 (for null terminator) + while ((*src) && (isalpha(*src)) && (dest_space--)) { + *dest++ = tolower(*src++); + } + *dest = '\0'; + len = dest - language; + if ((len != 2) && (len != 3)) { + NS_ASSERTION((len == 2) || (len == 3), "language code too short"); + NS_ASSERTION(len < 3, "reminder: verify we can handle 3+ chararacter language code"); + *language = '\0'; + return(PR_FALSE); + } typo in that last assertion, and lots of over-parenthesized sub-expressions, but the thing that really bugged me here was the unchecked stores into dest. This isn't an XPCOM compliant method, to be sure -- in fact it's protected: and inline(!) in nsPosixLocale.h. Grumble. Ok, I buy it -- hackers need to build DEBUG and watch those assertions and buffer sizes. Maybe use #defines for the magic 3 and 65 numbers, and size the arrays by COUNTRY_MAXLEN + 1, etc.? + // + // parse the country part + // + if ((*src != '_') || (*src != '-')) { Shouldn't these != operators be ==? /me is confused. /be
To be consistent with later new code that uses MAX_LOCALE_LEN+1 to dimension char arrays, why not use +1 here: - char newLocale[128]; - (void) setlocale(LC_COLLATE, mLocale.ToCString(newLocale, 128)); + char newLocale[MAX_LOCALE_LEN]; + (void) setlocale(LC_COLLATE, mLocale.ToCString(newLocale, sizeof(newLocale))); } } inline void nsCollationUnix::DoRestoreLocale() { if (!mSavedLocale.EqualsIgnoreCase(mLocale)) { - char oldLocale[128]; - (void) setlocale(LC_COLLATE, mSavedLocale.ToCString(oldLocale, 128)); + char oldLocale[MAX_LOCALE_LEN]; + (void) setlocale(LC_COLLATE, mSavedLocale.ToCString(oldLocale, sizeof(oldLocale))); Sure, the old code used 128 for maximum capacity, or buffer length, or size (as in sizeof -- meaning including room for the '\0' terminator), but later, in GetPlatformLocale and GetXPLocale, you use MAX_LOCALE_LEN+1 to size arrays, and rightly so. Do the same here, and sr=brendan@mozilla.org. /be
fixed checked in (02/28)
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Thank you, Brian. I'll check your check-in when nightly build is ready.
Verified on 2001031110 build. zh-CN is being passed properly on zh.GBK and zh locale on Solaris environment. Also TimeFormatUnix works properly. I could not see any garbage character on file picker on zh, zh.GBK, ja_JP.PCK, ja_JP.UTF-8 locale.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: