prob converting locale to language

VERIFIED FIXED in mozilla0.9

Status

()

Core
Internationalization
P1
major
VERIFIED FIXED
17 years ago
17 years ago

People

(Reporter: Masaki Katakai, Assigned: kill this account)

Tracking

({intl})

Trunk
mozilla0.9
Sun
Solaris
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(9 attachments)

(Reporter)

Description

17 years ago
In Solaris chinese locales (e.g. zh_CN.EUC), the same UI font
isn't used for UI characters. Some glyphs are small than the others.
I'll attach the snapshot. We need a reasonable workaround or
the exact fix for Sun'S OEM release of Netscape 6.

It seems that unicode encoding is used for UI glyphs, which doesn't
use the same font for whole glyphs of UI.

I defined the following fonts for zh-CN, but which couldn't solve
the problem,

user_pref("font.name.monospace.zh-CN", "sun-song-gb2312.1980-0");
user_pref("font.name.sans-serif.zh-CN", "sun-song-gb2312.1980-0");
user_pref("font.name.serif.zh-CN", "sun-song-gb2312.1980-0");

However, when I defined gb2312 for Unicode encoding, the UI glyphs
can be displayed with the same font I specified, like,

user_pref("font.name.monospace.x-unicode", "sun-song-gb2312.1980-0");
user_pref("font.name.sans-serif.x-unicode", "sun-song-gb2312.1980-0");
user_pref("font.name.serif.x-unicode", "sun-song-gb2312.1980-0");

I don't think this can be a reasonable workaround.

I'll investigate more detail, but if you have any suggestion, please
le me know.
(Reporter)

Comment 1

17 years ago
Created attachment 19668 [details]
snapshot
(Reporter)

Updated

17 years ago
Blocks: 60916
Summary: font size problem for Chinese UI characters → font size problem for Chinese UI characters

Comment 2

17 years ago
Reassign to erik.
Assignee: nhotta → erik

Comment 3

17 years ago
I think Mozilla should be using the fonts set for the language group of the
locale when the document is in a Unicode-based encoding. Maybe the locale name
(zh_CN.EUC) is not being recognized as a Simplified Chinese locale? (I thought
I saw a related bug report today...)
(Reporter)

Comment 4

17 years ago
Thanks for evaluation, Erik. Yes, you're right. x-western seems to be passed
to GFX. I found a problem that nsLanguageAtomService.cpp can not get
'zh-CN' langGroup by 'zh-cn' key. In the nsLanguageAtomService class,
nsIPersistentProperties is used for mLangGroups but it seems that the
nsIPersistentProperties can not handle '-' character. So even when
zh-cn=zh-CN exists in langGroups.properties, zh-CN can not be returned.

    res = mLangGroups->GetStringProperty(lowered, langGroupStr);
    if (NS_FAILED(res)) {
      PRInt32 hyphen = lowered.FindChar('-');

When the first query fails, the key 'zh-cn' will be changed to
'zh', then trying to get langGroup again. zh= entry doesn't exist
also this should contain cn or tw to distinguish the locales.

So, I believe we should change the codes to handle '-'. What do
you think? How about using nsURLProperties class of uconv/ for this purpose?

For testing, I added 'zh=zh-CN' entry and changed
font definitions of zh-CN for Solaris, it worked fine.

Updated

17 years ago
Assignee: erik → katakai

Comment 5

17 years ago
Hi, Katakai-san:


Erik will be out for a while. Would you please provide a patch so we can
proceed to the review phase?


thx

Comment 6

17 years ago
If PersistentProperties cannot handle the '-' character, then that should be
fixed (instead of changing LanguageAtomService). Are you sure that zh-cn is
being looked up? Maybe it is actually zh_cn because of a bug in the locale name
parsing code.
(Reporter)

Comment 7

17 years ago
Sorry for confusion. I found my comment is wrong. The langgroup is retrieved
by the exact locale name, e.g. "zh", "zh-gb.gbk", "zh-tw.euc". Those are not
defined in langGroups.properties file.

When the first retrieval fails, LanguageAtomService will cut the word after "-"
and will try again, e.g. zh-tw and zh-cn will be "zh". "zh" is not defined in
langGroups.properties. For example, "ja_JP.PCK" will work because "ja-jp.pck"
fails, but "ja" is defined.

We can not define "zh=zh-CN" or "zh=zh-TW" in langGroups.properties because
it's for zh-cn and zh-tw. How about the following scenario? Current
implementation uses only '-' when it fails, but I believe we should check
'.' first, then use '-' for retrieval.

0) add the entry below to langGroups.properties
	zh=zh-CN

1) Start Mozilla in zh_TW.EUC locale in Solaris
2) zh_TW.EUC -> zh-tw.euc
3) "zh-tw.euc" isn't defined, so it fails
4) cut after '.', try "zh-tw"
   => zh-tw=zh-TW is defined in langGroups.properties

1) Start Mozilla in zh locale
2) zh -> zh
   zh=zh-CN is defined in langGroups.properties

1) in zh.GBK locale
2) zh.gbk fails (Assumed 60954 is fixed)
3) try zh
   zh=zh-CN is defined

This seems to work for me. I'll attach the patch if ready.
(Reporter)

Comment 8

17 years ago
Created attachment 20329 [details] [diff] [review]
patch for nsLanguageAtomService.cpp
(Reporter)

Comment 9

17 years ago
I've attached the proposal patch. Please review. Should I ask Shanjian also?

Comment 10

17 years ago
I do not approve of this patch. There should not be any dot (.) in the locale
name returned by the locale module. It should return names like en, en-US,
zh-CN, etc. The locale name parser is bad, and should be fixed. We should not
change nsLanguageAtomService to work around the bug in the locale name parser.
(Reporter)

Comment 11

17 years ago
Thanks for comments.

Do you mean UNIX's nsPosixLocale::GetXPLocale() is wrong? Actually this method
returns
".EUC" as extra, not only "zh-TW". However, I think this ".EUC" should be there
in UNIX
platform because we can't find zh-TW.BIG5 and zh-TW.EUC is different. For example,
nsDateTimeFormatUnix.cpp uses "NSILOCALE_TIME" and it stores "zh-TW.EUC", and
tries to get charset by using the key. (at GetDefaultCharsetForLocale()).

However, when we change it to only "zh-TW", we will not be able to get the correct
charset from unixcharset.properties file. For japanese, ja-JP.PCK (SJIS in Solaris)
will be considered as "ja-JP", and unixcharset.properties will return EUC-JP by
mistake.

Thus, in UNIX platform, I'm thinking .??? field is needed in nsLocaleSerice.

Comment 12

17 years ago
Adding bstell to cc.

Comment 13

17 years ago
Added intl and nsbeta1 keywords.

This bug is a showstopper Sun's Chinese releases (see snapshot)
Keywords: intl, nsbeta1

Comment 14

17 years ago
*** Bug 60179 has been marked as a duplicate of this bug. ***
(Assignee)

Comment 15

17 years ago
reassigning to bstell
Assignee: katakai → bstell
(Assignee)

Updated

17 years ago
Target Milestone: --- → mozilla0.8
(Assignee)

Updated

17 years ago
Status: NEW → ASSIGNED
(Assignee)

Comment 16

17 years ago
Erik indicates that we should be referencing RFC 1766 for correct language tags.
(Assignee)

Comment 17

17 years ago
Katakai-san,

If possible I like to work on this in the html area instead of the menus.

Can you recommend a page that displays this behavior in the html area?
(Reporter)

Comment 18

17 years ago
Hi Brian,

I understand that this problem happens because x-western langgroup is
passed to GFX even when we start Mozilla in chinese locale on Solaris.
When Mozilla is started in chinese locale, zh-cn or zh-tw should be
passed as langgroup. So I think the following page can be used for
testing.

http://village.infoweb.ne.jp/~katakai/mozilla/UTF-8.txt

When we start Mozilla in zh_TW locale and see the page, the page
should be rendered as zh-TW langgroup, but now it seems that
x-western langgroup so same characters are smaller than others.
(Assignee)

Comment 19

17 years ago
Created attachment 24191 [details] [diff] [review]
patch to nsPosixLocale.cpp to fix encoding parsing
(Assignee)

Comment 20

17 years ago
Katakai-san,

I am still waiting for a review from Erik of this patch. However it should fix 
the font size problem.

Would you please try it?

(Reporter)

Comment 21

17 years ago
It seems that "" (blank) is returned for LangGroup when I start Mozilla
in zh and zh.GBK locale on Solaris. (Let's debug nsLanguageAtomService::
LookupLanguage()). So I'd like to suggest to appen

	zh=zh-CN

entry into langGroups.properties. See my comments of 2000-12-07 00:13.

I don't have enough environment for testing because our zh-TW.jar and
zh-CN.jar are too old for the nightly build and Mozilla won't start.
(Reporter)

Comment 22

17 years ago
It seems that the patch causes other problem. When I start Mozilla in
ja_JP.UTF-8 locale, japanese characters of Month and Day become garbage.
Because Mozilla seems to use EUC-JP for default charset of the locale, which
should be UTF-8.

Even in ja_JP.UTF-8 locale, is only "ja-JP" used as posixlocale? ja-JP is used
to retrieve default charset in nsDateTimeFormatUnix.cpp but "UTF-8" is
also needed because we should know EUC/Shift_JIS/UTF-8 for the date format.
See my comments of 2000-12-08 00:33 for detail.

 - use new nsPosixLocale
 - change nsDateTimeFormatUnix.cpp to use system locale (ja_JP.UTF-8) to
   retrieve default charset (UTF-8)

or

 - do not modify nsPosixLocale and use my patch of nsLanguageAtomService.cpp
(Assignee)

Comment 23

17 years ago
Katakai-san,

Can you give me an URL that causes the nsDateTimeFormatUnix code to be called?
(Reporter)

Comment 24

17 years ago
nsDateTimeFormatUnix will be used when you open FilePicker (File->Open)
also Manage Bookmark dialog (date of bookmarks).
(Assignee)

Comment 25

17 years ago
Created attachment 24336 [details] [diff] [review]
patch to nsLocaleService.cpp and nsPosixLocale.cpp
(Assignee)

Comment 26

17 years ago
Katakai-san,

could you apply these patches to nsLocaleService.cpp and nsPosixLocale.cpp and 
let me know what happens?

(Reporter)

Comment 27

17 years ago
Brian,

langGroup is still "" and I'm sure you need to modify nsLanguageAtomService::
LookupLanguage() as well because the method assumes "-" for separator
and ignores ".", now "_" and "." can be passed to the method.

For example, If the method gets ja_jp.pck (zh_cn.gbk), replace "_" with "-".

retrieve langgroup by ja-jp.pck (zh-cn.gbk)

if not exist, ignore .pck, then search by ja-jp (zh-cn)

if not exist, ignore -jp, then search by ja (zh)

Also you need to add "zh=zh-cn" entry into langGroup.properties to handle
zh and zh.GBK, zh.UTF-8 locale of Solaris.

(Assignee)

Comment 28

17 years ago
Katakai-san,

would you kindly give me a complete description of your setup and what you are 
doing when this issue is encountered? We seem to be solving different problems.

(Assignee)

Comment 29

17 years ago
I changed the summary since the issue here is not a font problem. The language 
is not determined correctly and this causes the X font system to use fallback 
fonts which look funny.

was: font size problem for Chinese UI characters
Summary: font size problem for Chinese UI characters → prob converting locale to language
(Assignee)

Comment 30

17 years ago
here is what I believe is happening:

there are several ways to format a locale string

    la_CO.EX    the unix exteneded (raw) version
    la-CO       the XP version
    la-CO       the XP extended version
    la          the language only version

la = language
CO = country
EX = extra (eg: UTF8, EURO, ...)(

To make Unix collation/datetime work we changed the unix version of GetXPLocale 
to return the XP extended version which effected languageAtomService (this bug).

I propose that we:

    modify GetXPLocale to return the XP version
    modify the unix version of the nsLocaleService constructor to
       to also save the raw version so it can be used by the unix 
       collation/date code
    modify the unix collation/date code to use the raw version

This way:

1) The XP API will remain the same and the XP code will see the same format for 
Win/Mac/Unix.

2) The Unix code can access raw version as necessary.

(Assignee)

Updated

17 years ago
Target Milestone: mozilla0.8 → mozilla0.9

Comment 31

17 years ago
Setting Priority to P1.  This is a show-stopper for Sun.
Priority: P3 → P1
(Assignee)

Comment 32

17 years ago
Created attachment 24863 [details] [diff] [review]
patches to nsLocaleService.cpp, nsCollationUnix.cpp, nsDateTimeFormatUnix.cpp, and nsPosixLocale.cpp
(Assignee)

Comment 33

17 years ago
Katakai-san,

This attachment, 24863:

  modifies nsLocaleService.cpp to remember the original (raw) locale info
  modifies nsCollationUnix.cpp and nsDateTimeFormatUnix.cpp to use the orig 
locale info
  has the previous changes to sPosixLocale.cpp to parse the xp locale

Would you try this and let me know what does work and what still needs more 
work?

thanks

Comment 34

17 years ago
Personally, I prefer katakai's approach. I do not see a reason why we could not 
modify nsLanguageAtomService. RFC1766 sometimes is not enought to identify charset,
(ie. .extension). To keep 2 copies make the logic rather complicated. However, katakai
is fix should be modified this way:
  try to lookup the locale_name, if succeed, return.
  if there is dot part, remove it and try again. If succeed, return.
  if there is - part, remove it and try again. If succeed, return
  fall back to western. 

In his original diff, '-' stripping in embedded inside '.' stripping, that's 
will cause problem. 

On win or mac, because it has no '.' part, I do not see any side-effect there.
  

Comment 35

17 years ago
Changed QA contact to katakai@japan.sun.com.
QA Contact: teruko → katakai

Comment 36

17 years ago
I still prefer to have it fixed in the locale code. nsLanguageAtomService has
a very clear purpose, and I want to keep it that way.

We use Unicode internally, in the XP code. We convert to and from Unicode in the
platform-specific code.

Similarly, we should use RFC 1766 codes in nsLanguageAtomService, which is XP
code. We can handle Unix-specific locale name issues in the Unix-specific code,
without polluting nsLanguageAtomService.
(Assignee)

Comment 37

17 years ago
Shanjian,

It is important that the unix version produce the same XP (cross 
platform) data as the windows and mac versions. We could change
nsLanguageAtomService to work around the current unix-only locale
string and it would work but we would then be required to put 
work arounds in all future XP uses of locale as well. Modifying 
XP code to work around a platform specific problem is counter to 
the design goals of Mozilla/Netscape (this rule was written by 
unix developers who were tired of windows developers changing XP 
code to make windows work but break unix).

We need to either follow the XP standard or convince the 
windows/mac teams to change the XP standard and to find and put 
any needed fixes in their code.

It seems simpler and less risky to change the unix code to 
store the XP locale as "la-CO" not "la-CO.extra" (where 
la=language, CO=country).

Storing the XP version of locale makes the XP code work but is 
not enough information for the Unix locale sensitive code to work. 
This is why I propose to store the Unix specific data under a 
separate key.

If you have an alternate way to meet both the XP and platform 
requirements those suggestions are welcome.
(Reporter)

Comment 38

17 years ago
Hi Brian, I have tested your patch of

	http://bugzilla.mozilla.org/showattachment.cgi?attach_id=24863

and it looks good. zh-CN and zh-TW langgroup is passed properly.

However, please don't miss zh=zh-CN entry is also needed in
langGroups.properties, which will be needed in Solaris zh, zh.GBK
and zh.UTF-8 locale names. Thanks.
(Assignee)

Comment 39

17 years ago
Katakai-san,

Could you attach a patch for the zh=zh-CN entry?

Thanks
(Reporter)

Comment 40

17 years ago
Created attachment 26083 [details] [diff] [review]
Adding zn-CN entry for zh
(Assignee)

Comment 41

17 years ago
Created attachment 26106 [details] [diff] [review]
patches to langGroups.properties, nsLocaleService.cpp, nsCollationUnix.cpp, nsDateTimeFormatUnix.cpp, nsPosixLocale.cpp
(Assignee)

Comment 42

17 years ago
Frank, 

I updated this per the code reviewed we did together.
Would you kindly add the r=ftang?

Thanks

Erik,

Would you please super review this? The latest changes are in:

  http://bugzilla.mozilla.org/showattachment.cgi?attach_id=26106

Thanks

Comment 43

17 years ago
r=ftang
String-fu:

+            category_platform.AssignWithConversion(LocaleList[i]);
+            category_platform.AppendWithConversion("_PLATFORM");

The first is an unavoidable conversion from ASCII to the two-byte-per-char,
UTF-16-like codes that Mozilla stores in PRUnichar arrays in ns{Auto,}Strings,
but the second should be

+            category_platform.Append(NS_LITERAL_STRING("_PLATFORM"));

same for

+                    platformLocale.AssignWithConversion("en_US");

and for

+  aCategory.AssignWithConversion("NSILOCALE_COLLATE_PLATFORM");

and so on, later.

Silly question: are the various LocaleList[i] names guaranteed never to be of
the form COLLATE_PLATFORM or CTYPE_PLATFORM?  I know, these are settled names
and no one would add a _PLATFORM suffix -- but just in case, maybe your mangled
extension should be a bit more mangled?  Maybe use ".PLATFORM" or "#PLATFORM" or
some such name that would never arise (because of the funky separator char) in
the relevant POSIX or whatever standards.

+      PR_snprintf(posix_locale,128,"%s-%s",lang_code,country_code);

Since these buffer sizes just changed, please do:

+      PR_snprintf(posix_locale, sizeof posix_locale, "%s-%s", lang_code,
country_code);

etc.

+  dest = language;
+  dest_space = 3; // buf_len - 1 (for null terminator)
+  while ((*src) && (isalpha(*src)) && (dest_space--)) {
+    *dest++ = tolower(*src++);
+  }
+  *dest = '\0';
+  len = dest - language;
+  if ((len != 2) && (len != 3)) {
+    NS_ASSERTION((len == 2) || (len == 3), "language code too short");
+    NS_ASSERTION(len < 3, "reminder: verify we can handle 3+ chararacter
language code");
+    *language = '\0';
+    return(PR_FALSE);
+  }

typo in that last assertion, and lots of over-parenthesized sub-expressions, but
the thing that really bugged me here was the unchecked stores into dest.  This
isn't an XPCOM compliant method, to be sure -- in fact it's protected: and
inline(!) in nsPosixLocale.h.  Grumble.  Ok, I buy it -- hackers need to build
DEBUG and watch those assertions and buffer sizes.

Maybe use #defines for the magic 3 and 65 numbers, and size the arrays by
COUNTRY_MAXLEN + 1, etc.?

+  //
+  // parse the country part
+  //
+  if ((*src != '_') || (*src != '-')) { 

Shouldn't these != operators be ==?  /me is confused.

/be
(Assignee)

Comment 45

17 years ago
Created attachment 26363 [details] [diff] [review]
revised patches to nsIPosixLocale.h, nsLocaleService.cpp, nsCollationUnix.cpp, nsDateTimeFormatUnix.cpp, nsPosixLocale.cpp, and langGroups.properties
To be consistent with later new code that uses MAX_LOCALE_LEN+1 to dimension
char arrays, why not use +1 here:

-    char newLocale[128];
-    (void) setlocale(LC_COLLATE, mLocale.ToCString(newLocale, 128));
+    char newLocale[MAX_LOCALE_LEN];
+    (void) setlocale(LC_COLLATE, mLocale.ToCString(newLocale, sizeof(newLocale)));
   }
 }
 
 inline void nsCollationUnix::DoRestoreLocale()
 {
   if (!mSavedLocale.EqualsIgnoreCase(mLocale)) { 
-    char oldLocale[128];
-    (void) setlocale(LC_COLLATE, mSavedLocale.ToCString(oldLocale, 128));
+    char oldLocale[MAX_LOCALE_LEN];
+    (void) setlocale(LC_COLLATE, mSavedLocale.ToCString(oldLocale,
sizeof(oldLocale)));

Sure, the old code used 128 for maximum capacity, or buffer length, or size (as
in sizeof -- meaning including room for the '\0' terminator), but later, in
GetPlatformLocale and GetXPLocale, you use MAX_LOCALE_LEN+1 to size arrays, and
rightly so.  Do the same here, and sr=brendan@mozilla.org.

/be
(Assignee)

Comment 47

17 years ago
Created attachment 26432 [details] [diff] [review]
revised patch with the "MAX_LOCALE_LEN+1" change
(Assignee)

Comment 48

17 years ago
fixed checked in (02/28)
Status: ASSIGNED → RESOLVED
Last Resolved: 17 years ago
Resolution: --- → FIXED
(Reporter)

Comment 49

17 years ago
Thank you, Brian. I'll check your check-in when nightly build
is ready.
(Reporter)

Comment 50

17 years ago
Verified on 2001031110 build. zh-CN is being passed properly on zh.GBK and zh
locale on Solaris environment. Also TimeFormatUnix works properly. I could not
see any garbage character on file picker on zh, zh.GBK, ja_JP.PCK, ja_JP.UTF-8
locale.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.