Closed
Bug 61108
Opened 24 years ago
Closed 24 years ago
prob converting locale to language
Categories
(Core :: Internationalization, defect, P1)
Tracking
()
VERIFIED
FIXED
mozilla0.9
People
(Reporter: masaki.katakai, Assigned: bstell)
References
Details
(Keywords: intl)
Attachments
(9 files)
50.63 KB,
image/jpeg
|
Details | |
1.74 KB,
patch
|
Details | Diff | Splinter Review | |
4.08 KB,
patch
|
Details | Diff | Splinter Review | |
6.41 KB,
patch
|
Details | Diff | Splinter Review | |
patches to nsLocaleService.cpp, nsCollationUnix.cpp, nsDateTimeFormatUnix.cpp, and nsPosixLocale.cpp
7.99 KB,
patch
|
Details | Diff | Splinter Review | |
463 bytes,
patch
|
Details | Diff | Splinter Review | |
10.81 KB,
patch
|
Details | Diff | Splinter Review | |
13.09 KB,
patch
|
Details | Diff | Splinter Review | |
13.40 KB,
patch
|
Details | Diff | Splinter Review |
In Solaris chinese locales (e.g. zh_CN.EUC), the same UI font
isn't used for UI characters. Some glyphs are small than the others.
I'll attach the snapshot. We need a reasonable workaround or
the exact fix for Sun'S OEM release of Netscape 6.
It seems that unicode encoding is used for UI glyphs, which doesn't
use the same font for whole glyphs of UI.
I defined the following fonts for zh-CN, but which couldn't solve
the problem,
user_pref("font.name.monospace.zh-CN", "sun-song-gb2312.1980-0");
user_pref("font.name.sans-serif.zh-CN", "sun-song-gb2312.1980-0");
user_pref("font.name.serif.zh-CN", "sun-song-gb2312.1980-0");
However, when I defined gb2312 for Unicode encoding, the UI glyphs
can be displayed with the same font I specified, like,
user_pref("font.name.monospace.x-unicode", "sun-song-gb2312.1980-0");
user_pref("font.name.sans-serif.x-unicode", "sun-song-gb2312.1980-0");
user_pref("font.name.serif.x-unicode", "sun-song-gb2312.1980-0");
I don't think this can be a reasonable workaround.
I'll investigate more detail, but if you have any suggestion, please
le me know.
Reporter | ||
Comment 1•24 years ago
|
||
Reporter | ||
Updated•24 years ago
|
Blocks: 60916
Summary: font size problem for Chinese UI characters → font size problem for Chinese UI characters
Comment 3•24 years ago
|
||
I think Mozilla should be using the fonts set for the language group of the
locale when the document is in a Unicode-based encoding. Maybe the locale name
(zh_CN.EUC) is not being recognized as a Simplified Chinese locale? (I thought
I saw a related bug report today...)
Reporter | ||
Comment 4•24 years ago
|
||
Thanks for evaluation, Erik. Yes, you're right. x-western seems to be passed
to GFX. I found a problem that nsLanguageAtomService.cpp can not get
'zh-CN' langGroup by 'zh-cn' key. In the nsLanguageAtomService class,
nsIPersistentProperties is used for mLangGroups but it seems that the
nsIPersistentProperties can not handle '-' character. So even when
zh-cn=zh-CN exists in langGroups.properties, zh-CN can not be returned.
res = mLangGroups->GetStringProperty(lowered, langGroupStr);
if (NS_FAILED(res)) {
PRInt32 hyphen = lowered.FindChar('-');
When the first query fails, the key 'zh-cn' will be changed to
'zh', then trying to get langGroup again. zh= entry doesn't exist
also this should contain cn or tw to distinguish the locales.
So, I believe we should change the codes to handle '-'. What do
you think? How about using nsURLProperties class of uconv/ for this purpose?
For testing, I added 'zh=zh-CN' entry and changed
font definitions of zh-CN for Solaris, it worked fine.
Hi, Katakai-san:
Erik will be out for a while. Would you please provide a patch so we can
proceed to the review phase?
thx
Comment 6•24 years ago
|
||
If PersistentProperties cannot handle the '-' character, then that should be
fixed (instead of changing LanguageAtomService). Are you sure that zh-cn is
being looked up? Maybe it is actually zh_cn because of a bug in the locale name
parsing code.
Reporter | ||
Comment 7•24 years ago
|
||
Sorry for confusion. I found my comment is wrong. The langgroup is retrieved
by the exact locale name, e.g. "zh", "zh-gb.gbk", "zh-tw.euc". Those are not
defined in langGroups.properties file.
When the first retrieval fails, LanguageAtomService will cut the word after "-"
and will try again, e.g. zh-tw and zh-cn will be "zh". "zh" is not defined in
langGroups.properties. For example, "ja_JP.PCK" will work because "ja-jp.pck"
fails, but "ja" is defined.
We can not define "zh=zh-CN" or "zh=zh-TW" in langGroups.properties because
it's for zh-cn and zh-tw. How about the following scenario? Current
implementation uses only '-' when it fails, but I believe we should check
'.' first, then use '-' for retrieval.
0) add the entry below to langGroups.properties
zh=zh-CN
1) Start Mozilla in zh_TW.EUC locale in Solaris
2) zh_TW.EUC -> zh-tw.euc
3) "zh-tw.euc" isn't defined, so it fails
4) cut after '.', try "zh-tw"
=> zh-tw=zh-TW is defined in langGroups.properties
1) Start Mozilla in zh locale
2) zh -> zh
zh=zh-CN is defined in langGroups.properties
1) in zh.GBK locale
2) zh.gbk fails (Assumed 60954 is fixed)
3) try zh
zh=zh-CN is defined
This seems to work for me. I'll attach the patch if ready.
Reporter | ||
Comment 8•24 years ago
|
||
Reporter | ||
Comment 9•24 years ago
|
||
I've attached the proposal patch. Please review. Should I ask Shanjian also?
Comment 10•24 years ago
|
||
I do not approve of this patch. There should not be any dot (.) in the locale
name returned by the locale module. It should return names like en, en-US,
zh-CN, etc. The locale name parser is bad, and should be fixed. We should not
change nsLanguageAtomService to work around the bug in the locale name parser.
Reporter | ||
Comment 11•24 years ago
|
||
Thanks for comments.
Do you mean UNIX's nsPosixLocale::GetXPLocale() is wrong? Actually this method
returns
".EUC" as extra, not only "zh-TW". However, I think this ".EUC" should be there
in UNIX
platform because we can't find zh-TW.BIG5 and zh-TW.EUC is different. For example,
nsDateTimeFormatUnix.cpp uses "NSILOCALE_TIME" and it stores "zh-TW.EUC", and
tries to get charset by using the key. (at GetDefaultCharsetForLocale()).
However, when we change it to only "zh-TW", we will not be able to get the correct
charset from unixcharset.properties file. For japanese, ja-JP.PCK (SJIS in Solaris)
will be considered as "ja-JP", and unixcharset.properties will return EUC-JP by
mistake.
Thus, in UNIX platform, I'm thinking .??? field is needed in nsLocaleSerice.
Comment 12•24 years ago
|
||
Adding bstell to cc.
Comment 13•24 years ago
|
||
Added intl and nsbeta1 keywords.
This bug is a showstopper Sun's Chinese releases (see snapshot)
Comment 14•24 years ago
|
||
*** Bug 60179 has been marked as a duplicate of this bug. ***
Assignee | ||
Updated•24 years ago
|
Target Milestone: --- → mozilla0.8
Assignee | ||
Updated•24 years ago
|
Status: NEW → ASSIGNED
Assignee | ||
Comment 16•24 years ago
|
||
Erik indicates that we should be referencing RFC 1766 for correct language tags.
Assignee | ||
Comment 17•24 years ago
|
||
Katakai-san,
If possible I like to work on this in the html area instead of the menus.
Can you recommend a page that displays this behavior in the html area?
Reporter | ||
Comment 18•24 years ago
|
||
Hi Brian,
I understand that this problem happens because x-western langgroup is
passed to GFX even when we start Mozilla in chinese locale on Solaris.
When Mozilla is started in chinese locale, zh-cn or zh-tw should be
passed as langgroup. So I think the following page can be used for
testing.
http://village.infoweb.ne.jp/~katakai/mozilla/UTF-8.txt
When we start Mozilla in zh_TW locale and see the page, the page
should be rendered as zh-TW langgroup, but now it seems that
x-western langgroup so same characters are smaller than others.
Assignee | ||
Comment 19•24 years ago
|
||
Assignee | ||
Comment 20•24 years ago
|
||
Katakai-san,
I am still waiting for a review from Erik of this patch. However it should fix
the font size problem.
Would you please try it?
Reporter | ||
Comment 21•24 years ago
|
||
It seems that "" (blank) is returned for LangGroup when I start Mozilla
in zh and zh.GBK locale on Solaris. (Let's debug nsLanguageAtomService::
LookupLanguage()). So I'd like to suggest to appen
zh=zh-CN
entry into langGroups.properties. See my comments of 2000-12-07 00:13.
I don't have enough environment for testing because our zh-TW.jar and
zh-CN.jar are too old for the nightly build and Mozilla won't start.
Reporter | ||
Comment 22•24 years ago
|
||
It seems that the patch causes other problem. When I start Mozilla in
ja_JP.UTF-8 locale, japanese characters of Month and Day become garbage.
Because Mozilla seems to use EUC-JP for default charset of the locale, which
should be UTF-8.
Even in ja_JP.UTF-8 locale, is only "ja-JP" used as posixlocale? ja-JP is used
to retrieve default charset in nsDateTimeFormatUnix.cpp but "UTF-8" is
also needed because we should know EUC/Shift_JIS/UTF-8 for the date format.
See my comments of 2000-12-08 00:33 for detail.
- use new nsPosixLocale
- change nsDateTimeFormatUnix.cpp to use system locale (ja_JP.UTF-8) to
retrieve default charset (UTF-8)
or
- do not modify nsPosixLocale and use my patch of nsLanguageAtomService.cpp
Assignee | ||
Comment 23•24 years ago
|
||
Katakai-san,
Can you give me an URL that causes the nsDateTimeFormatUnix code to be called?
Reporter | ||
Comment 24•24 years ago
|
||
nsDateTimeFormatUnix will be used when you open FilePicker (File->Open)
also Manage Bookmark dialog (date of bookmarks).
Assignee | ||
Comment 25•24 years ago
|
||
Assignee | ||
Comment 26•24 years ago
|
||
Katakai-san,
could you apply these patches to nsLocaleService.cpp and nsPosixLocale.cpp and
let me know what happens?
Reporter | ||
Comment 27•24 years ago
|
||
Brian,
langGroup is still "" and I'm sure you need to modify nsLanguageAtomService::
LookupLanguage() as well because the method assumes "-" for separator
and ignores ".", now "_" and "." can be passed to the method.
For example, If the method gets ja_jp.pck (zh_cn.gbk), replace "_" with "-".
retrieve langgroup by ja-jp.pck (zh-cn.gbk)
if not exist, ignore .pck, then search by ja-jp (zh-cn)
if not exist, ignore -jp, then search by ja (zh)
Also you need to add "zh=zh-cn" entry into langGroup.properties to handle
zh and zh.GBK, zh.UTF-8 locale of Solaris.
Assignee | ||
Comment 28•24 years ago
|
||
Katakai-san,
would you kindly give me a complete description of your setup and what you are
doing when this issue is encountered? We seem to be solving different problems.
Assignee | ||
Comment 29•24 years ago
|
||
I changed the summary since the issue here is not a font problem. The language
is not determined correctly and this causes the X font system to use fallback
fonts which look funny.
was: font size problem for Chinese UI characters
Summary: font size problem for Chinese UI characters → prob converting locale to language
Assignee | ||
Comment 30•24 years ago
|
||
here is what I believe is happening:
there are several ways to format a locale string
la_CO.EX the unix exteneded (raw) version
la-CO the XP version
la-CO the XP extended version
la the language only version
la = language
CO = country
EX = extra (eg: UTF8, EURO, ...)(
To make Unix collation/datetime work we changed the unix version of GetXPLocale
to return the XP extended version which effected languageAtomService (this bug).
I propose that we:
modify GetXPLocale to return the XP version
modify the unix version of the nsLocaleService constructor to
to also save the raw version so it can be used by the unix
collation/date code
modify the unix collation/date code to use the raw version
This way:
1) The XP API will remain the same and the XP code will see the same format for
Win/Mac/Unix.
2) The Unix code can access raw version as necessary.
Assignee | ||
Updated•24 years ago
|
Target Milestone: mozilla0.8 → mozilla0.9
Assignee | ||
Comment 32•24 years ago
|
||
Assignee | ||
Comment 33•24 years ago
|
||
Katakai-san,
This attachment, 24863:
modifies nsLocaleService.cpp to remember the original (raw) locale info
modifies nsCollationUnix.cpp and nsDateTimeFormatUnix.cpp to use the orig
locale info
has the previous changes to sPosixLocale.cpp to parse the xp locale
Would you try this and let me know what does work and what still needs more
work?
thanks
Comment 34•24 years ago
|
||
Personally, I prefer katakai's approach. I do not see a reason why we could not
modify nsLanguageAtomService. RFC1766 sometimes is not enought to identify charset,
(ie. .extension). To keep 2 copies make the logic rather complicated. However, katakai
is fix should be modified this way:
try to lookup the locale_name, if succeed, return.
if there is dot part, remove it and try again. If succeed, return.
if there is - part, remove it and try again. If succeed, return
fall back to western.
In his original diff, '-' stripping in embedded inside '.' stripping, that's
will cause problem.
On win or mac, because it has no '.' part, I do not see any side-effect there.
Comment 36•24 years ago
|
||
I still prefer to have it fixed in the locale code. nsLanguageAtomService has
a very clear purpose, and I want to keep it that way.
We use Unicode internally, in the XP code. We convert to and from Unicode in the
platform-specific code.
Similarly, we should use RFC 1766 codes in nsLanguageAtomService, which is XP
code. We can handle Unix-specific locale name issues in the Unix-specific code,
without polluting nsLanguageAtomService.
Assignee | ||
Comment 37•24 years ago
|
||
Shanjian,
It is important that the unix version produce the same XP (cross
platform) data as the windows and mac versions. We could change
nsLanguageAtomService to work around the current unix-only locale
string and it would work but we would then be required to put
work arounds in all future XP uses of locale as well. Modifying
XP code to work around a platform specific problem is counter to
the design goals of Mozilla/Netscape (this rule was written by
unix developers who were tired of windows developers changing XP
code to make windows work but break unix).
We need to either follow the XP standard or convince the
windows/mac teams to change the XP standard and to find and put
any needed fixes in their code.
It seems simpler and less risky to change the unix code to
store the XP locale as "la-CO" not "la-CO.extra" (where
la=language, CO=country).
Storing the XP version of locale makes the XP code work but is
not enough information for the Unix locale sensitive code to work.
This is why I propose to store the Unix specific data under a
separate key.
If you have an alternate way to meet both the XP and platform
requirements those suggestions are welcome.
Reporter | ||
Comment 38•24 years ago
|
||
Hi Brian, I have tested your patch of
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=24863
and it looks good. zh-CN and zh-TW langgroup is passed properly.
However, please don't miss zh=zh-CN entry is also needed in
langGroups.properties, which will be needed in Solaris zh, zh.GBK
and zh.UTF-8 locale names. Thanks.
Assignee | ||
Comment 39•24 years ago
|
||
Katakai-san,
Could you attach a patch for the zh=zh-CN entry?
Thanks
Reporter | ||
Comment 40•24 years ago
|
||
Assignee | ||
Comment 41•24 years ago
|
||
Assignee | ||
Comment 42•24 years ago
|
||
Frank,
I updated this per the code reviewed we did together.
Would you kindly add the r=ftang?
Thanks
Erik,
Would you please super review this? The latest changes are in:
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=26106
Thanks
Comment 43•24 years ago
|
||
r=ftang
Comment 44•24 years ago
|
||
String-fu:
+ category_platform.AssignWithConversion(LocaleList[i]);
+ category_platform.AppendWithConversion("_PLATFORM");
The first is an unavoidable conversion from ASCII to the two-byte-per-char,
UTF-16-like codes that Mozilla stores in PRUnichar arrays in ns{Auto,}Strings,
but the second should be
+ category_platform.Append(NS_LITERAL_STRING("_PLATFORM"));
same for
+ platformLocale.AssignWithConversion("en_US");
and for
+ aCategory.AssignWithConversion("NSILOCALE_COLLATE_PLATFORM");
and so on, later.
Silly question: are the various LocaleList[i] names guaranteed never to be of
the form COLLATE_PLATFORM or CTYPE_PLATFORM? I know, these are settled names
and no one would add a _PLATFORM suffix -- but just in case, maybe your mangled
extension should be a bit more mangled? Maybe use ".PLATFORM" or "#PLATFORM" or
some such name that would never arise (because of the funky separator char) in
the relevant POSIX or whatever standards.
+ PR_snprintf(posix_locale,128,"%s-%s",lang_code,country_code);
Since these buffer sizes just changed, please do:
+ PR_snprintf(posix_locale, sizeof posix_locale, "%s-%s", lang_code,
country_code);
etc.
+ dest = language;
+ dest_space = 3; // buf_len - 1 (for null terminator)
+ while ((*src) && (isalpha(*src)) && (dest_space--)) {
+ *dest++ = tolower(*src++);
+ }
+ *dest = '\0';
+ len = dest - language;
+ if ((len != 2) && (len != 3)) {
+ NS_ASSERTION((len == 2) || (len == 3), "language code too short");
+ NS_ASSERTION(len < 3, "reminder: verify we can handle 3+ chararacter
language code");
+ *language = '\0';
+ return(PR_FALSE);
+ }
typo in that last assertion, and lots of over-parenthesized sub-expressions, but
the thing that really bugged me here was the unchecked stores into dest. This
isn't an XPCOM compliant method, to be sure -- in fact it's protected: and
inline(!) in nsPosixLocale.h. Grumble. Ok, I buy it -- hackers need to build
DEBUG and watch those assertions and buffer sizes.
Maybe use #defines for the magic 3 and 65 numbers, and size the arrays by
COUNTRY_MAXLEN + 1, etc.?
+ //
+ // parse the country part
+ //
+ if ((*src != '_') || (*src != '-')) {
Shouldn't these != operators be ==? /me is confused.
/be
Assignee | ||
Comment 45•24 years ago
|
||
Comment 46•24 years ago
|
||
To be consistent with later new code that uses MAX_LOCALE_LEN+1 to dimension
char arrays, why not use +1 here:
- char newLocale[128];
- (void) setlocale(LC_COLLATE, mLocale.ToCString(newLocale, 128));
+ char newLocale[MAX_LOCALE_LEN];
+ (void) setlocale(LC_COLLATE, mLocale.ToCString(newLocale, sizeof(newLocale)));
}
}
inline void nsCollationUnix::DoRestoreLocale()
{
if (!mSavedLocale.EqualsIgnoreCase(mLocale)) {
- char oldLocale[128];
- (void) setlocale(LC_COLLATE, mSavedLocale.ToCString(oldLocale, 128));
+ char oldLocale[MAX_LOCALE_LEN];
+ (void) setlocale(LC_COLLATE, mSavedLocale.ToCString(oldLocale,
sizeof(oldLocale)));
Sure, the old code used 128 for maximum capacity, or buffer length, or size (as
in sizeof -- meaning including room for the '\0' terminator), but later, in
GetPlatformLocale and GetXPLocale, you use MAX_LOCALE_LEN+1 to size arrays, and
rightly so. Do the same here, and sr=brendan@mozilla.org.
/be
Assignee | ||
Comment 47•24 years ago
|
||
Assignee | ||
Comment 48•24 years ago
|
||
fixed checked in (02/28)
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 49•24 years ago
|
||
Thank you, Brian. I'll check your check-in when nightly build
is ready.
Reporter | ||
Comment 50•24 years ago
|
||
Verified on 2001031110 build. zh-CN is being passed properly on zh.GBK and zh
locale on Solaris environment. Also TimeFormatUnix works properly. I could not
see any garbage character on file picker on zh, zh.GBK, ja_JP.PCK, ja_JP.UTF-8
locale.
Status: RESOLVED → VERIFIED
You need to log in
before you can comment on or make changes to this bug.
Description
•