Closed Bug 61108 Opened 21 years ago Closed 21 years ago

prob converting locale to language

Categories

(Core :: Internationalization, defect, P1)

Sun
Solaris
defect

Tracking

()

VERIFIED FIXED
mozilla0.9

People

(Reporter: masaki.katakai, Assigned: bstell)

References

Details

(Keywords: intl)

Attachments

(9 files)

In Solaris chinese locales (e.g. zh_CN.EUC), the same UI font
isn't used for UI characters. Some glyphs are small than the others.
I'll attach the snapshot. We need a reasonable workaround or
the exact fix for Sun'S OEM release of Netscape 6.

It seems that unicode encoding is used for UI glyphs, which doesn't
use the same font for whole glyphs of UI.

I defined the following fonts for zh-CN, but which couldn't solve
the problem,

user_pref("font.name.monospace.zh-CN", "sun-song-gb2312.1980-0");
user_pref("font.name.sans-serif.zh-CN", "sun-song-gb2312.1980-0");
user_pref("font.name.serif.zh-CN", "sun-song-gb2312.1980-0");

However, when I defined gb2312 for Unicode encoding, the UI glyphs
can be displayed with the same font I specified, like,

user_pref("font.name.monospace.x-unicode", "sun-song-gb2312.1980-0");
user_pref("font.name.sans-serif.x-unicode", "sun-song-gb2312.1980-0");
user_pref("font.name.serif.x-unicode", "sun-song-gb2312.1980-0");

I don't think this can be a reasonable workaround.

I'll investigate more detail, but if you have any suggestion, please
le me know.
Attached image snapshot
Blocks: 60916
Summary: font size problem for Chinese UI characters → font size problem for Chinese UI characters
Reassign to erik.
Assignee: nhotta → erik
I think Mozilla should be using the fonts set for the language group of the
locale when the document is in a Unicode-based encoding. Maybe the locale name
(zh_CN.EUC) is not being recognized as a Simplified Chinese locale? (I thought
I saw a related bug report today...)
Thanks for evaluation, Erik. Yes, you're right. x-western seems to be passed
to GFX. I found a problem that nsLanguageAtomService.cpp can not get
'zh-CN' langGroup by 'zh-cn' key. In the nsLanguageAtomService class,
nsIPersistentProperties is used for mLangGroups but it seems that the
nsIPersistentProperties can not handle '-' character. So even when
zh-cn=zh-CN exists in langGroups.properties, zh-CN can not be returned.

    res = mLangGroups->GetStringProperty(lowered, langGroupStr);
    if (NS_FAILED(res)) {
      PRInt32 hyphen = lowered.FindChar('-');

When the first query fails, the key 'zh-cn' will be changed to
'zh', then trying to get langGroup again. zh= entry doesn't exist
also this should contain cn or tw to distinguish the locales.

So, I believe we should change the codes to handle '-'. What do
you think? How about using nsURLProperties class of uconv/ for this purpose?

For testing, I added 'zh=zh-CN' entry and changed
font definitions of zh-CN for Solaris, it worked fine.
Assignee: erik → katakai
Hi, Katakai-san:


Erik will be out for a while. Would you please provide a patch so we can
proceed to the review phase?


thx
If PersistentProperties cannot handle the '-' character, then that should be
fixed (instead of changing LanguageAtomService). Are you sure that zh-cn is
being looked up? Maybe it is actually zh_cn because of a bug in the locale name
parsing code.
Sorry for confusion. I found my comment is wrong. The langgroup is retrieved
by the exact locale name, e.g. "zh", "zh-gb.gbk", "zh-tw.euc". Those are not
defined in langGroups.properties file.

When the first retrieval fails, LanguageAtomService will cut the word after "-"
and will try again, e.g. zh-tw and zh-cn will be "zh". "zh" is not defined in
langGroups.properties. For example, "ja_JP.PCK" will work because "ja-jp.pck"
fails, but "ja" is defined.

We can not define "zh=zh-CN" or "zh=zh-TW" in langGroups.properties because
it's for zh-cn and zh-tw. How about the following scenario? Current
implementation uses only '-' when it fails, but I believe we should check
'.' first, then use '-' for retrieval.

0) add the entry below to langGroups.properties
	zh=zh-CN

1) Start Mozilla in zh_TW.EUC locale in Solaris
2) zh_TW.EUC -> zh-tw.euc
3) "zh-tw.euc" isn't defined, so it fails
4) cut after '.', try "zh-tw"
   => zh-tw=zh-TW is defined in langGroups.properties

1) Start Mozilla in zh locale
2) zh -> zh
   zh=zh-CN is defined in langGroups.properties

1) in zh.GBK locale
2) zh.gbk fails (Assumed 60954 is fixed)
3) try zh
   zh=zh-CN is defined

This seems to work for me. I'll attach the patch if ready.
I've attached the proposal patch. Please review. Should I ask Shanjian also?
I do not approve of this patch. There should not be any dot (.) in the locale
name returned by the locale module. It should return names like en, en-US,
zh-CN, etc. The locale name parser is bad, and should be fixed. We should not
change nsLanguageAtomService to work around the bug in the locale name parser.
Thanks for comments.

Do you mean UNIX's nsPosixLocale::GetXPLocale() is wrong? Actually this method
returns
".EUC" as extra, not only "zh-TW". However, I think this ".EUC" should be there
in UNIX
platform because we can't find zh-TW.BIG5 and zh-TW.EUC is different. For example,
nsDateTimeFormatUnix.cpp uses "NSILOCALE_TIME" and it stores "zh-TW.EUC", and
tries to get charset by using the key. (at GetDefaultCharsetForLocale()).

However, when we change it to only "zh-TW", we will not be able to get the correct
charset from unixcharset.properties file. For japanese, ja-JP.PCK (SJIS in Solaris)
will be considered as "ja-JP", and unixcharset.properties will return EUC-JP by
mistake.

Thus, in UNIX platform, I'm thinking .??? field is needed in nsLocaleSerice.
Adding bstell to cc.
Added intl and nsbeta1 keywords.

This bug is a showstopper Sun's Chinese releases (see snapshot)
Keywords: intl, nsbeta1
*** Bug 60179 has been marked as a duplicate of this bug. ***
reassigning to bstell
Assignee: katakai → bstell
Target Milestone: --- → mozilla0.8
Status: NEW → ASSIGNED
Erik indicates that we should be referencing RFC 1766 for correct language tags.
Katakai-san,

If possible I like to work on this in the html area instead of the menus.

Can you recommend a page that displays this behavior in the html area?
Hi Brian,

I understand that this problem happens because x-western langgroup is
passed to GFX even when we start Mozilla in chinese locale on Solaris.
When Mozilla is started in chinese locale, zh-cn or zh-tw should be
passed as langgroup. So I think the following page can be used for
testing.

http://village.infoweb.ne.jp/~katakai/mozilla/UTF-8.txt

When we start Mozilla in zh_TW locale and see the page, the page
should be rendered as zh-TW langgroup, but now it seems that
x-western langgroup so same characters are smaller than others.
Katakai-san,

I am still waiting for a review from Erik of this patch. However it should fix 
the font size problem.

Would you please try it?

It seems that "" (blank) is returned for LangGroup when I start Mozilla
in zh and zh.GBK locale on Solaris. (Let's debug nsLanguageAtomService::
LookupLanguage()). So I'd like to suggest to appen

	zh=zh-CN

entry into langGroups.properties. See my comments of 2000-12-07 00:13.

I don't have enough environment for testing because our zh-TW.jar and
zh-CN.jar are too old for the nightly build and Mozilla won't start.
It seems that the patch causes other problem. When I start Mozilla in
ja_JP.UTF-8 locale, japanese characters of Month and Day become garbage.
Because Mozilla seems to use EUC-JP for default charset of the locale, which
should be UTF-8.

Even in ja_JP.UTF-8 locale, is only "ja-JP" used as posixlocale? ja-JP is used
to retrieve default charset in nsDateTimeFormatUnix.cpp but "UTF-8" is
also needed because we should know EUC/Shift_JIS/UTF-8 for the date format.
See my comments of 2000-12-08 00:33 for detail.

 - use new nsPosixLocale
 - change nsDateTimeFormatUnix.cpp to use system locale (ja_JP.UTF-8) to
   retrieve default charset (UTF-8)

or

 - do not modify nsPosixLocale and use my patch of nsLanguageAtomService.cpp
Katakai-san,

Can you give me an URL that causes the nsDateTimeFormatUnix code to be called?
nsDateTimeFormatUnix will be used when you open FilePicker (File->Open)
also Manage Bookmark dialog (date of bookmarks).
Katakai-san,

could you apply these patches to nsLocaleService.cpp and nsPosixLocale.cpp and 
let me know what happens?

Brian,

langGroup is still "" and I'm sure you need to modify nsLanguageAtomService::
LookupLanguage() as well because the method assumes "-" for separator
and ignores ".", now "_" and "." can be passed to the method.

For example, If the method gets ja_jp.pck (zh_cn.gbk), replace "_" with "-".

retrieve langgroup by ja-jp.pck (zh-cn.gbk)

if not exist, ignore .pck, then search by ja-jp (zh-cn)

if not exist, ignore -jp, then search by ja (zh)

Also you need to add "zh=zh-cn" entry into langGroup.properties to handle
zh and zh.GBK, zh.UTF-8 locale of Solaris.

Katakai-san,

would you kindly give me a complete description of your setup and what you are 
doing when this issue is encountered? We seem to be solving different problems.

I changed the summary since the issue here is not a font problem. The language 
is not determined correctly and this causes the X font system to use fallback 
fonts which look funny.

was: font size problem for Chinese UI characters
Summary: font size problem for Chinese UI characters → prob converting locale to language
here is what I believe is happening:

there are several ways to format a locale string

    la_CO.EX    the unix exteneded (raw) version
    la-CO       the XP version
    la-CO       the XP extended version
    la          the language only version

la = language
CO = country
EX = extra (eg: UTF8, EURO, ...)(

To make Unix collation/datetime work we changed the unix version of GetXPLocale 
to return the XP extended version which effected languageAtomService (this bug).

I propose that we:

    modify GetXPLocale to return the XP version
    modify the unix version of the nsLocaleService constructor to
       to also save the raw version so it can be used by the unix 
       collation/date code
    modify the unix collation/date code to use the raw version

This way:

1) The XP API will remain the same and the XP code will see the same format for 
Win/Mac/Unix.

2) The Unix code can access raw version as necessary.

Target Milestone: mozilla0.8 → mozilla0.9
Setting Priority to P1.  This is a show-stopper for Sun.
Priority: P3 → P1
Katakai-san,

This attachment, 24863:

  modifies nsLocaleService.cpp to remember the original (raw) locale info
  modifies nsCollationUnix.cpp and nsDateTimeFormatUnix.cpp to use the orig 
locale info
  has the previous changes to sPosixLocale.cpp to parse the xp locale

Would you try this and let me know what does work and what still needs more 
work?

thanks
Personally, I prefer katakai's approach. I do not see a reason why we could not 
modify nsLanguageAtomService. RFC1766 sometimes is not enought to identify charset,
(ie. .extension). To keep 2 copies make the logic rather complicated. However, katakai
is fix should be modified this way:
  try to lookup the locale_name, if succeed, return.
  if there is dot part, remove it and try again. If succeed, return.
  if there is - part, remove it and try again. If succeed, return
  fall back to western. 

In his original diff, '-' stripping in embedded inside '.' stripping, that's 
will cause problem. 

On win or mac, because it has no '.' part, I do not see any side-effect there.
  
Changed QA contact to katakai@japan.sun.com.
QA Contact: teruko → katakai
I still prefer to have it fixed in the locale code. nsLanguageAtomService has
a very clear purpose, and I want to keep it that way.

We use Unicode internally, in the XP code. We convert to and from Unicode in the
platform-specific code.

Similarly, we should use RFC 1766 codes in nsLanguageAtomService, which is XP
code. We can handle Unix-specific locale name issues in the Unix-specific code,
without polluting nsLanguageAtomService.
Shanjian,

It is important that the unix version produce the same XP (cross 
platform) data as the windows and mac versions. We could change
nsLanguageAtomService to work around the current unix-only locale
string and it would work but we would then be required to put 
work arounds in all future XP uses of locale as well. Modifying 
XP code to work around a platform specific problem is counter to 
the design goals of Mozilla/Netscape (this rule was written by 
unix developers who were tired of windows developers changing XP 
code to make windows work but break unix).

We need to either follow the XP standard or convince the 
windows/mac teams to change the XP standard and to find and put 
any needed fixes in their code.

It seems simpler and less risky to change the unix code to 
store the XP locale as "la-CO" not "la-CO.extra" (where 
la=language, CO=country).

Storing the XP version of locale makes the XP code work but is 
not enough information for the Unix locale sensitive code to work. 
This is why I propose to store the Unix specific data under a 
separate key.

If you have an alternate way to meet both the XP and platform 
requirements those suggestions are welcome.
Hi Brian, I have tested your patch of

	http://bugzilla.mozilla.org/showattachment.cgi?attach_id=24863

and it looks good. zh-CN and zh-TW langgroup is passed properly.

However, please don't miss zh=zh-CN entry is also needed in
langGroups.properties, which will be needed in Solaris zh, zh.GBK
and zh.UTF-8 locale names. Thanks.
Katakai-san,

Could you attach a patch for the zh=zh-CN entry?

Thanks
Frank, 

I updated this per the code reviewed we did together.
Would you kindly add the r=ftang?

Thanks

Erik,

Would you please super review this? The latest changes are in:

  http://bugzilla.mozilla.org/showattachment.cgi?attach_id=26106

Thanks
r=ftang
String-fu:

+            category_platform.AssignWithConversion(LocaleList[i]);
+            category_platform.AppendWithConversion("_PLATFORM");

The first is an unavoidable conversion from ASCII to the two-byte-per-char,
UTF-16-like codes that Mozilla stores in PRUnichar arrays in ns{Auto,}Strings,
but the second should be

+            category_platform.Append(NS_LITERAL_STRING("_PLATFORM"));

same for

+                    platformLocale.AssignWithConversion("en_US");

and for

+  aCategory.AssignWithConversion("NSILOCALE_COLLATE_PLATFORM");

and so on, later.

Silly question: are the various LocaleList[i] names guaranteed never to be of
the form COLLATE_PLATFORM or CTYPE_PLATFORM?  I know, these are settled names
and no one would add a _PLATFORM suffix -- but just in case, maybe your mangled
extension should be a bit more mangled?  Maybe use ".PLATFORM" or "#PLATFORM" or
some such name that would never arise (because of the funky separator char) in
the relevant POSIX or whatever standards.

+      PR_snprintf(posix_locale,128,"%s-%s",lang_code,country_code);

Since these buffer sizes just changed, please do:

+      PR_snprintf(posix_locale, sizeof posix_locale, "%s-%s", lang_code,
country_code);

etc.

+  dest = language;
+  dest_space = 3; // buf_len - 1 (for null terminator)
+  while ((*src) && (isalpha(*src)) && (dest_space--)) {
+    *dest++ = tolower(*src++);
+  }
+  *dest = '\0';
+  len = dest - language;
+  if ((len != 2) && (len != 3)) {
+    NS_ASSERTION((len == 2) || (len == 3), "language code too short");
+    NS_ASSERTION(len < 3, "reminder: verify we can handle 3+ chararacter
language code");
+    *language = '\0';
+    return(PR_FALSE);
+  }

typo in that last assertion, and lots of over-parenthesized sub-expressions, but
the thing that really bugged me here was the unchecked stores into dest.  This
isn't an XPCOM compliant method, to be sure -- in fact it's protected: and
inline(!) in nsPosixLocale.h.  Grumble.  Ok, I buy it -- hackers need to build
DEBUG and watch those assertions and buffer sizes.

Maybe use #defines for the magic 3 and 65 numbers, and size the arrays by
COUNTRY_MAXLEN + 1, etc.?

+  //
+  // parse the country part
+  //
+  if ((*src != '_') || (*src != '-')) { 

Shouldn't these != operators be ==?  /me is confused.

/be
To be consistent with later new code that uses MAX_LOCALE_LEN+1 to dimension
char arrays, why not use +1 here:

-    char newLocale[128];
-    (void) setlocale(LC_COLLATE, mLocale.ToCString(newLocale, 128));
+    char newLocale[MAX_LOCALE_LEN];
+    (void) setlocale(LC_COLLATE, mLocale.ToCString(newLocale, sizeof(newLocale)));
   }
 }
 
 inline void nsCollationUnix::DoRestoreLocale()
 {
   if (!mSavedLocale.EqualsIgnoreCase(mLocale)) { 
-    char oldLocale[128];
-    (void) setlocale(LC_COLLATE, mSavedLocale.ToCString(oldLocale, 128));
+    char oldLocale[MAX_LOCALE_LEN];
+    (void) setlocale(LC_COLLATE, mSavedLocale.ToCString(oldLocale,
sizeof(oldLocale)));

Sure, the old code used 128 for maximum capacity, or buffer length, or size (as
in sizeof -- meaning including room for the '\0' terminator), but later, in
GetPlatformLocale and GetXPLocale, you use MAX_LOCALE_LEN+1 to size arrays, and
rightly so.  Do the same here, and sr=brendan@mozilla.org.

/be
fixed checked in (02/28)
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Thank you, Brian. I'll check your check-in when nightly build
is ready.
Verified on 2001031110 build. zh-CN is being passed properly on zh.GBK and zh
locale on Solaris environment. Also TimeFormatUnix works properly. I could not
see any garbage character on file picker on zh, zh.GBK, ja_JP.PCK, ja_JP.UTF-8
locale.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.