Closed Bug 910169 Opened 6 years ago Closed 6 years ago

Marathi localization has a bogus value for intl.charset.default; should be windows-1252

Categories

(Mozilla Localizations :: mr / Marathi, defect)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: hsivonen, Unassigned)

Details

(Whiteboard: [fixed by bug 910192])

Telemetry data indicates that it's unusually common for users of the mr locale to invoke the character encoding override, which suggests a bad value for the intl.charset.default preference. https://bug906032.bugzilla.mozilla.org/attachment.cgi?id=796536

Indeed, the mr localization sets intl.charset.default to "UTF-8, ISO-8859-1" in https://mxr.mozilla.org/l10n-central/source/mr/toolkit/chrome/global/intl.properties

This is not a legal value for the preference. The preference takes a single encoding name.

In accordance with the guidance given in https://developer.mozilla.org/en-US/docs/Localizations_and_character_encodings and for consistency with other Devanagari localizations, the value should be windows-1252 (ISO-8859-1 is treated as an alias for windows-1252). Please delete the UTF-8 bit.
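
To illustrate the two points above (the preference takes a single label, and ISO-8859-1 is treated as an alias for windows-1252), here is a minimal Python sketch. The `LABELS` table and the `resolve_single_label` helper are hypothetical and cover only the handful of labels mentioned in this bug; this is not how Gecko reads the preference.

```python
# Hypothetical, deliberately tiny label table for illustration only.
# Keys are lowercase labels; values are the encoding each label resolves to.
LABELS = {
    "utf-8": "UTF-8",
    "iso-8859-1": "windows-1252",   # treated as an alias, as noted above
    "windows-1252": "windows-1252",
    "windows-1251": "windows-1251",
}


def resolve_single_label(value):
    """Return the encoding for a single label, or None if the value is not one label."""
    key = value.strip().lower()
    # A comma-separated list such as "UTF-8, ISO-8859-1" is not a key in the
    # table, so it resolves to None, i.e. it is not a legal value.
    return LABELS.get(key)


print(resolve_single_label("UTF-8, ISO-8859-1"))
print(resolve_single_label("ISO-8859-1"))
```

Under these assumptions, the current mr value resolves to nothing, while either `ISO-8859-1` or `windows-1252` resolves to windows-1252.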
Is it possible to provide a use case for reproducing this bug?


Thanks.
-Sandeep
What sort of use case do you mean? The preference is designed to take one label as the value. By code inspection, the localization tries to give it two labels as a value. That's clearly wrong.
Currently, mr localization [https://mxr.mozilla.org/l10n-mozilla-aurora/source/mr/toolkit/chrome/global/intl.properties] sets intl.charset.default to "UTF-8, ISO-8859-1". 

If intl.charset.default accepts only 'one' value, then I agree that the present value for intl.charset.default is bogus.

'UTF-8' is the dominant character encoding for the Devanagari script (code points U+0900 to U+097F) in the India region.


'intl.charset.default=UTF-8' should suffice to resolve this bug.

The fix will be available at http://hg.mozilla.org/releases/l10n/mozilla-aurora/mr/ .

Let me know your concerns...


Thanks.
-Sandeep
As per http://msdn.microsoft.com/en-us/goglobal/cc305145.aspx, windows-1252 doesn't support Indic scripts.

Example:
Devanagari code points start at 'U+0900' and end at 'U+097F'.
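
The claim about coverage can be checked with a short Python sketch. It assumes that Python's `cp1252` codec is a reasonable stand-in for windows-1252; the `encodable` helper is hypothetical and exists only for this illustration.

```python
def encodable(ch, encoding):
    """Return True if the single character ch can be encoded in the given encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Every code point in the Devanagari block, U+0900 through U+097F.
devanagari = [chr(cp) for cp in range(0x0900, 0x0980)]

in_1252 = [c for c in devanagari if encodable(c, "cp1252")]
in_utf8 = [c for c in devanagari if encodable(c, "utf-8")]

print(len(in_1252), "of", len(devanagari), "encodable in windows-1252")
print(len(in_utf8), "of", len(devanagari), "encodable in UTF-8")
```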
It's not about what encoding supports what scripts. It's about what encoding is the best fallback encoding. Does Internet Explorer in your region default to utf-8? Does Chrome? I'm doubtful.
hi-IN Firefox uses ISO-8859-1 (aka. windows-1252) as the fallback.

It's not about what's the dominant character encoding for Devanagari. It's about what's the most successful fallback for unlabeled (i.e. misauthored) content (which might well be in English).
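
A minimal Python sketch of what "fallback for unlabeled content" means in practice. It assumes Python's `cp1252` codec approximates a browser's windows-1252 decoder; the sample bytes and the `decode_with_fallback` helper are made up for illustration.

```python
# Bytes as an unlabeled legacy page might send them: "café" saved as windows-1252.
legacy_bytes = "café".encode("cp1252")  # cp1252 is Python's name for windows-1252


def decode_with_fallback(data, fallback):
    """Decode unlabeled bytes with the given fallback, replacing undecodable bytes."""
    return data.decode(fallback, errors="replace")


as_utf8 = decode_with_fallback(legacy_bytes, "utf-8")
as_1252 = decode_with_fallback(legacy_bytes, "cp1252")

print(repr(as_utf8))   # contains U+FFFD where the byte 0xE9 could not be decoded
print(repr(as_1252))   # the text as the author intended it
```

With a UTF-8 fallback the non-ASCII byte is lost to a replacement character; with a windows-1252 fallback the same bytes decode as written. Content that correctly declares UTF-8 never reaches the fallback at all.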
(In reply to Henri Sivonen (:hsivonen) from comment #6)
> It's not about what's the dominant character encoding for Devanagari.

That is, this setting has nothing to do with properly-authored Devanagari content that declares its encoding as UTF-8.
From Comment 5,


+ Chrome 29.0.1547.59 @Android 4.1.1 and Android 4.1.2
\ + fallback charset is: ISO-8859-1

+ Internet Explorer 10.0.9200.16660 @Windows 8 Pro
\ + fallback charset is: windows-1252
+ http://mxr.mozilla.org/l10n-mozilla-aurora/search?string=intl.charset.default=ISO-8859-1&find=global/intl.properties
\ + Found 65 matching lines in 65 files

+ http://mxr.mozilla.org/l10n-mozilla-aurora/search?string=intl.charset.default=UTF-\8&find=global/intl.properties
\ + Found 20 matching lines in 20 files
\ + Discard results for 'mr' and 'te' (their values are bogus and need correction)

+ http://mxr.mozilla.org/l10n-mozilla-aurora/search?string=intl.charset.default=windows-1251&find=global/intl.properties
\ + Found 2 matching lines in 2 files

+ http://mxr.mozilla.org/l10n-mozilla-aurora/search?string=intl.charset.default=windows-1252&find=global/intl.properties
\ + No matching files


From the above, locales overwhelmingly favour the following as the default fallback charset:
+ ISO-8859-1 (alias for 'windows-1252')
+ UTF-8


I am strongly in favor of 'UTF-8' and of making it the default wherever possible. Does this help?


Looking forward...


Thanks.
-Sandeep
No, utf-8 is a terrible fallback for the web as it exists. Please align us with Chrome and Internet Explorer.
For Indic languages we need UTF-8 as the default encoding. Otherwise I sometimes have to switch the encoding to UTF-8 manually to get text to render correctly in Firefox.
Is that true for Internet Explorer and Chrome? That seems weird.
CC'ing Axel and Arky for comments on this bug.
The default charset is not for reading Marathi content; it's for reading legacy content in India. I suspect that the majority of that is actually going to be in some encoding that's good for English.
intl.charset.default is no more.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Whiteboard: [fixed by bug 910192]