910169 - Marathi localization has a bogus value for intl.charset.default; should be windows-1252

Reporter

Description

12 years ago

Telemetry data indicates that it's unusually common for users of the mr locale to invoke the character encoding override, which suggests a bad value for the intl.charset.default preference. https://bug906032.bugzilla.mozilla.org/attachment.cgi?id=796536 Indeed, the mr localization sets intl.charset.default to "UTF-8, ISO-8859-1" in https://mxr.mozilla.org/l10n-central/source/mr/toolkit/chrome/global/intl.properties This is not a legal value for the preference. The preference takes a single encoding name. In accordance with the guidance given in https://developer.mozilla.org/en-US/docs/Localizations_and_character_encodings and for consistency with other Devanagari localizations, the value should be windows-1252 (ISO-8859-1 is treated as an alias for windows-1252). Please delete the UTF-8 bit.
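The problem with the mr value can be illustrated with a minimal sketch of a check that an intl.properties line names exactly one encoding. The function name `parse_default_charset` and the alias table are hypothetical. The table is only a small subset of the labels that the WHATWG Encoding Standard maps to windows-1252. It is not Gecko's actual lookup code.

```python
# Hypothetical subset of labels that the WHATWG Encoding Standard maps to
# windows-1252 (not the full list).
WINDOWS_1252_LABELS = {
    "windows-1252", "cp1252", "iso-8859-1", "iso8859-1",
    "latin1", "l1", "us-ascii", "ascii",
}


def parse_default_charset(line: str) -> str:
    """Return the encoding named by an intl.charset.default properties line.

    Raises ValueError if the line is not an intl.charset.default entry or if
    its value lists more than one encoding label. Labels outside the table
    above are returned lowercased, without further canonicalization.
    """
    key, sep, value = line.partition("=")
    if not sep or key.strip() != "intl.charset.default":
        raise ValueError("not an intl.charset.default line")
    # A comma-separated value such as "UTF-8, ISO-8859-1" yields two labels.
    labels = [part.strip() for part in value.split(",") if part.strip()]
    if len(labels) != 1:
        raise ValueError(f"expected one encoding label, got {labels!r}")
    label = labels[0].lower()  # labels are matched case-insensitively
    if label in WINDOWS_1252_LABELS:
        return "windows-1252"  # e.g. ISO-8859-1 resolves to windows-1252
    return label
```

Under these assumptions, `parse_default_charset("intl.charset.default=ISO-8859-1")` returns `"windows-1252"`, while the mr value `"UTF-8, ISO-8859-1"` is rejected with a ValueError.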

sandeep shedmake

Comment 1

12 years ago

Is it possible to provide a use case for reproducing this bug? Thanks. -Sandeep

Henri Sivonen (:hsivonen)

Reporter

Comment 2

12 years ago

What sort of use case do you mean? The preference is designed to take one label as the value. By code inspection, the localization tries to give it two labels as a value. That's clearly wrong.

sandeep shedmake

Comment 3

12 years ago

Currently, the mr localization [https://mxr.mozilla.org/l10n-mozilla-aurora/source/mr/toolkit/chrome/global/intl.properties] sets intl.charset.default to "UTF-8, ISO-8859-1". If intl.charset.default accepts only one value, then I agree that the present value for intl.charset.default is bogus. UTF-8 is the dominant character encoding for the Devanagari script (code points U+0900 to U+097F) in the India region. Setting 'intl.charset.default=UTF-8' should suffice to resolve this bug. The fix would be available at http://hg.mozilla.org/releases/l10n/mozilla-aurora/mr/ . Let me know your concerns... Thanks. -Sandeep

sandeep shedmake

Comment 4

12 years ago

As per http://msdn.microsoft.com/en-us/goglobal/cc305145.aspx, windows-1252 does not support Indic scripts. For example, Devanagari code points start at U+0900 and end at U+097F.
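The claim in Comment 4 can be checked with a short sketch: characters in the Devanagari block (U+0900 to U+097F) have no windows-1252 representation. The sketch uses Python's built-in `cp1252` codec. The helper names and the sample word are illustrative assumptions. This only shows what windows-1252 can encode. It does not settle which fallback is best for unlabeled pages.

```python
DEVANAGARI_START, DEVANAGARI_END = 0x0900, 0x097F


def is_devanagari(ch: str) -> bool:
    """True if the character lies in the Devanagari Unicode block."""
    return DEVANAGARI_START <= ord(ch) <= DEVANAGARI_END


def encodable_in_cp1252(text: str) -> bool:
    """True if every character of text has a windows-1252 byte."""
    try:
        text.encode("cp1252")
    except UnicodeEncodeError:
        return False
    return True


word = "मराठी"  # "Marathi" written in Devanagari
```

Every character of `word` falls in the Devanagari block, and `encodable_in_cp1252(word)` is false, while plain English text such as `"Marathi"` encodes without error.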

Anne (:annevk)

Comment 5

12 years ago

It's not about what encoding supports what scripts. It's about what encoding is the best fallback encoding. Does Internet Explorer in your region default to utf-8? Does Chrome? I'm doubtful.

Henri Sivonen (:hsivonen)

Reporter

Comment 6

12 years ago

hi-IN Firefox uses ISO-8859-1 (a.k.a. windows-1252) as the fallback. It's not about what's the dominant character encoding for Devanagari. It's about what's the most successful fallback for unlabeled (i.e. misauthored) content (which might well be in English).

Henri Sivonen (:hsivonen)

Reporter

Comment 7

12 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #6)
> It's not about what's the dominant character encoding for Devanagari.

That is, this setting has nothing to do with properly-authored Devanagari content that declares its encoding as UTF-8.
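The distinction in Comments 6 and 7 can be sketched in a simplified model. The fallback is consulted only when a page declares no encoding, so correctly labeled UTF-8 Devanagari pages are unaffected, while unlabeled legacy pages are decoded with the fallback. The function `decode_page` is hypothetical. Real browsers also consider BOMs, HTTP headers, `<meta>` prescans and more. The byte 0x92 is used as an example of a windows-1252 curly apostrophe, as commonly found in legacy English text.

```python
from typing import Optional


def decode_page(body: bytes, declared: Optional[str], fallback: str) -> str:
    """Decode a page body, using the fallback only when nothing is declared.

    Simplified model: ignores BOM sniffing, HTTP headers and <meta> prescans.
    Undecodable bytes become U+FFFD, the replacement character.
    """
    encoding = declared if declared else fallback
    return body.decode(encoding, errors="replace")


# Unlabeled legacy English page; 0x92 is a right single quotation mark in windows-1252.
legacy_english = b"It\x92s unlabeled"
# Properly labeled Marathi page, declared as UTF-8.
labeled_marathi = "मराठी".encode("utf-8")
```

With a cp1252 (windows-1252) fallback, the unlabeled page decodes to "It’s unlabeled". With a UTF-8 fallback, the apostrophe becomes a replacement character, because 0x92 is not a valid UTF-8 lead byte. The labeled Marathi page decodes identically under either fallback.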

sandeep shedmake

Comment 8

12 years ago

From Comment 5:
+ Chrome 29.0.1547.59 @ Android 4.1.1 and Android 4.1.2
  + fallback charset is: ISO-8859-1
+ Internet Explorer 10.0.9200.16660 @ Windows 8 Pro
  + fallback charset is: windows-1252

sandeep shedmake

Comment 9

12 years ago

+ http://mxr.mozilla.org/l10n-mozilla-aurora/search?string=intl.charset.default=ISO-8859-1&find=global/intl.properties
  + Found 65 matching lines in 65 files
+ http://mxr.mozilla.org/l10n-mozilla-aurora/search?string=intl.charset.default=UTF-8&find=global/intl.properties
  + Found 20 matching lines in 20 files
  + Discard the results for 'mr' and 'te' (their values are bogus and need correction)
+ http://mxr.mozilla.org/l10n-mozilla-aurora/search?string=intl.charset.default=windows-1251&find=global/intl.properties
  + Found 2 matching lines in 2 files
+ http://mxr.mozilla.org/l10n-mozilla-aurora/search?string=intl.charset.default=windows-1252&find=global/intl.properties
  + No matching files

From the above, locales have overwhelmingly favored the following as the default fallback charset:
+ ISO-8859-1 (alias for 'windows-1252')
+ UTF-8

I am highly in favor of 'UTF-8' and of making it the default wherever possible. Helpful? Looking forward to your comments... Thanks. -Sandeep

Anne (:annevk)

Comment 10

12 years ago

No, utf-8 is a terrible fallback for the web as it exists. Please align us with Chrome and Internet Explorer.

Parag Nemade

Comment 11

12 years ago

For Indic languages we need UTF-8 as the default encoding. Otherwise I sometimes have to manually switch the encoding to UTF-8 to get text to render correctly in the Firefox browser.

Anne (:annevk)

Comment 12

12 years ago

Is that true for Internet Explorer and Chrome? That seems weird.

sandeep shedmake

Comment 13

12 years ago

CC'ing Axel and Arky for comments on this bug.

Axel Hecht [:Pike]

Comment 14

12 years ago

The default charset is not to read Marathi content, it's for reading legacy content in India. I suspect that the majority of that is actually going to be in some encoding that's good for English.

Henri Sivonen (:hsivonen)

Reporter

Comment 15

11 years ago

intl.charset.default is no more.

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Whiteboard: [fixed by bug 910192]

Categories

(Mozilla Localizations :: mr / Marathi, defect)

Tracking

(Not tracked)

People

(Reporter: hsivonen, Unassigned)

References

Details

(Whiteboard: [fixed by bug 910192])

Security

(public)