Closed Bug 74670 Opened 24 years ago Closed 24 years ago

Need some entries in unixcharset.properties for Solaris new Asian locales.

Categories

(Core :: Internationalization, defect, P2)

Sun
Solaris
defect

Tracking

VERIFIED FIXED
mozilla0.9.1

People

(Reporter: eyan, Assigned: ftang)

Attachments

(1 file)

Several new Asian locales are integrated in Solaris 9, so unixcharset.properties needs entries for them. After checking the unixcharset.properties file, we think entries are needed for the following Solaris 9 locales:

1. zh_HK.BIG5HK
2. zh_HK.UTF-8
3. zh_CN.GB18030
4. th_TH.ISO8859-11
5. hi_IN.UTF-8
6. zh_TW.UTF-8

The entries in unixcharset.properties should look like this:

locale.all.zh_HK.BIG5HK=Big5-HKSCS
locale.all.zh_CN.GB18030=x-gb18030 or gb18030
locale.all.th_TH.ISO8859-11=ISO-8859-11
locale.all.hi_IN.UTF-8=UTF-8
locale.all.zh_HK.UTF-8=UTF-8
locale.all.zh_TW.UTF-8=UTF-8

Notes: neither x-gb18030 nor gb18030 is defined yet because bug 72525 is still open. Big5-HKSCS may also be undefined.
Blocks: 60916
Status: UNCONFIRMED → NEW
Ever confirmed: true
Reassign to bstell, cc to tao.
Assignee: nhotta → bstell
The fix for bug 54000 covers most cases for our new locales. We don't have to change unixcharset.properties for *.UTF-8 and *.BIG5HK, but th_TH.ISO8859-11 should be defined:

locale.all.th_TH.ISO8859-11=TIS-620

I could not find any entry for ISO-8859-11. Is this correct?

For GB18030: Brian@Sun, Ervin, could you try nl_langinfo(CODESET) in the GB18030 locale and tell me the returned string?
The following are the nl_langinfo(CODESET) values for the locales in Solaris:

ko --- 5601
zh (zh_CN.EUC) --- gb2312
zh_TW (zh_TW.EUC) --- cns11643
zh.GBK (zh_CN.GBK) --- GBK
zh_CN.GB18030 --- GB18030
zh_TW.BIG5 --- BIG5
zh_HK.BIG5HK --- Big5-HKSCS
th (th_TH, th_TH.TIS610, th_TH.ISO-8859-11) --- TIS620.2533

Thanks.
Brian.
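Values like the ones above can be collected with a small probe program. The following is only a minimal sketch, assuming a POSIX system with <langinfo.h>; codesetForLocale is a hypothetical helper name, not a function in the Mozilla tree, and the strings it returns are platform-specific.

```cpp
#include <cassert>
#include <clocale>
#include <string>
#include <langinfo.h>

// Hypothetical helper: returns nl_langinfo(CODESET) for the given locale
// name, or an empty string if the locale cannot be selected (for example,
// because it is not installed). The previous LC_CTYPE setting is restored.
std::string codesetForLocale(const char* localeName) {
  assert(localeName != nullptr);

  // Save the current LC_CTYPE locale; copy it, since the buffer returned
  // by setlocale may be overwritten by later calls.
  const char* current = std::setlocale(LC_CTYPE, nullptr);
  std::string saved = current ? current : "C";

  if (!std::setlocale(LC_CTYPE, localeName)) {
    return "";  // locale not available on this system
  }

  const char* codeset = nl_langinfo(CODESET);
  std::string result = codeset ? codeset : "";

  std::setlocale(LC_CTYPE, saved.c_str());
  return result;
}
```

Calling codesetForLocale("zh_CN.GB18030") on a Solaris 9 machine would be expected to give the GB18030 string reported above; on a system without that locale it returns an empty string.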
Brian and Frank,

I have checked the current implementation and here are my questions. I don't understand which file needs changes: charsetalias.properties, unixcharset.properties, or both. I'd like to know the policy. The current lookup is:

1. get nl_langinfo(CODESET)
2. check the entry in charsetalias.properties; if found, return the charset
3. check the entry in unixcharset.properties; if found, return the charset
4. fall back to ISO8859-1

For example, there is no zh_HK.BIG5HK entry in unixcharset.properties now, but the lookup returns Big5-HKSCS via steps 1 and 2. That seems OK on Solaris, but what will happen on a system that does not have nl_langinfo()? If we consider such systems, should we define the entry in unixcharset.properties?

Also, for th_TH.ISO8859-11, nl_langinfo(CODESET) returns TIS620.2533. There is no entry for it in charsetalias.properties or in unixcharset.properties. If we add the following to charsetalias.properties, the charset would be returned by step 2:

tis620.2533=TIS-620

However, if we consider systems that do not have nl_langinfo(), I think we will need to add the following to unixcharset.properties:

locale.all.th_TH.ISO8859-11=TIS-620

Alternatively, it would work with the entry above in unixcharset.properties alone.

So my question is: which file should we modify, unixcharset.properties, charsetalias.properties, or both?
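The four lookup steps described above can be sketched as follows. This is an illustration only, not the code in nsUNIXCharset.cpp: resolveCharset, Table, and toLower are hypothetical names, the two maps stand in for the parsed charsetalias.properties and unixcharset.properties files, and the lower-casing of the alias key is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Stand-in for a parsed .properties file (key=value pairs).
using Table = std::map<std::string, std::string>;

// Lower-case an ASCII string. Assumption: alias keys are stored lower-case.
static std::string toLower(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// Hypothetical sketch of the lookup order. `codeset` is the result of
// step 1, nl_langinfo(CODESET), or empty if that call is unavailable.
std::string resolveCharset(const std::string& codeset,
                           const std::string& localeName,
                           const Table& aliases,
                           const Table& localeTable) {
  assert(!localeName.empty());

  // Step 2: map the codeset name through the alias table.
  if (!codeset.empty()) {
    auto alias = aliases.find(toLower(codeset));
    if (alias != aliases.end()) return alias->second;
  }

  // Step 3: look up the locale name itself.
  auto byLocale = localeTable.find("locale.all." + localeName);
  if (byLocale != localeTable.end()) return byLocale->second;

  // Step 4: nothing matched, so fall back to Latin-1.
  return "ISO-8859-1";
}
```

With an alias table containing tis620.2533=TIS-620, a call with codeset "TIS620.2533" resolves at step 2. With an empty codeset, only a locale.all.th_TH.ISO8859-11 entry in the locale table keeps the lookup from falling through to the Latin-1 default, which is the case the question about systems without nl_langinfo() is concerned with.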
Katakai,

In your debug build can you add a printf to the locale file to output the charset? This way you can see if the correct encoding is being used.

http://lxr.mozilla.org/seamonkey/source/intl/uconv/src/nsUNIXCharset.cpp#239

  #if HAVE_NL_LANGINFO && defined(CODESET)
    nl_langinfo_codeset = nl_langinfo(CODESET);
    NS_ASSERTION(nl_langinfo_codeset, "cannot get nl_langinfo(CODESET)");
+   if (nl_langinfo_codeset)
+     printf("nl_langinfo(CODESET) = %s\n", nl_langinfo_codeset);
+   else
+     printf("nl_langinfo(CODESET) returned NULL\n");
re: what will happen when it runs on system which does not have nl_langinfo()?
==============================================================================

The decision to use nl_langinfo is made at compile time on a per-OS basis. I know that Linux, Solaris, HPUX, and AIX will all use it.

If nl_langinfo returns an alternate name for the encoding and it is reasonable to add that name to charsetalias.properties, we should. If it is not reasonable to add it to charsetalias.properties, then we will need to create a unixcharset.<OSARCH>.properties file to remap it to a usable value. Only if a system's nl_langinfo is incomplete and does not return any value for a locale would we put it in the deprecated unixcharset.properties.
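The rule above can be written down as a small decision function. This is a sketch for illustration only: fileForNewMapping and its parameters are hypothetical names, and whether an alias is "reasonable" remains a human judgment passed in as a flag.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: names the properties file that should receive a
// new mapping, following the policy stated above.
//   codesetReturned   - nl_langinfo(CODESET) returns a value for the locale
//   reasonableAsAlias - that value is sensible as a general charset alias
//   osArch            - platform tag used in unixcharset.<OSARCH>.properties
std::string fileForNewMapping(bool codesetReturned,
                              bool reasonableAsAlias,
                              const std::string& osArch) {
  assert(!osArch.empty());

  // nl_langinfo gives nothing for this locale: only then use the
  // deprecated locale-name table.
  if (!codesetReturned) return "unixcharset.properties";

  // The returned name is a reasonable alias: add it to the alias table.
  if (reasonableAsAlias) return "charsetalias.properties";

  // Otherwise remap it in a per-platform file.
  return "unixcharset." + osArch + ".properties";
}
```

For TIS620.2533 and GB18030, nl_langinfo returns a value and the name is a plausible alias, so the function picks charsetalias.properties.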
Thanks, Brian. So, as I understand it, the GB18030 and TIS620.2533 definitions should go into charsetalias.properties. Any problem?
Mark this as moz0.9.2 P2
Priority: -- → P2
Target Milestone: --- → mozilla0.9.2
Attached the patch for charsetalias.properties, not unixcharset.properties. Frank, Brian, can you take a look at the patch?
Changing QA contact to katakai@japan.sun.com.
QA Contact: andreasb → katakai
Please do not add gb18030=GB18030. We should use lower case here, since I have already added some other code for that. Reassign this bug to me and I will land both. The TIS one is OK.
Assignee: bstell → ftang
Fixed and checked in.
Status: NEW → ASSIGNED
Changing QA contact to Ervin. I believe the latest nightly has the fix for TIS. Can you try it?
QA Contact: katakai → eyan
mark it as fixed
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Target Milestone: mozilla0.9.2 → mozilla0.9.1
The TIS charset now displays correctly in Mozilla nightly build 2001051310.
Status: RESOLVED → VERIFIED