Closed Bug 844042 Opened 12 years ago Closed 11 years ago

intl.properties should provide better localization advice for charset-related settings

Categories

(Core :: Internationalization: Localization, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla32

People

(Reporter: hsivonen, Assigned: hsivonen)

References

Details

(Whiteboard: [fixed by bug 943252])

Attachments

(1 file, 2 obsolete files)

Currently, the localization notes in intl.properties don't explain the semantics of some of the preferences properly and don't give proper advice on what to do with them. This is especially a problem for the detector setting and the fallback encoding.
Attached patch Better advice (obsolete) — Splinter Review
Attachment #717055 - Flags: review?(l10n)
Summary: intl.properties should provide better localization advice → intl.properties should provide better localization advice for charset-related settings
Just giving things a quick first look, I'm not sure that specific locales should be called out in the localization note. That has a tendency to stick around for a long time, so it should be reserved for things that are not likely to change. (To me, anything locale-specific is likely to change; only things globally inherent to the setting should be in the localization note, IMO.)
Non-empty detector defaults make sense for 4 locales: Russian, Ukrainian, Japanese and maybe Traditional Chinese, to the extent that its users also read a lot of unlabeled Simplified Chinese. This changes so rarely that there's no reason not to say this in a localization note. I'll revise the patch with more obvious text.
Comment on attachment 717055 [details] [diff] [review]
Better advice

>+# declare its encoding as UTF-8. If in doubt, specify ISO-8859-1.
> intl.charset.default=ISO-8859-1
> intl.charsetmenu.browser.static=ISO-8859-1, UTF-8

windows-1252.
Blocks: 805374
(In reply to Masatoshi Kimura [:emk] from comment #4)
> Comment on attachment 717055 [details] [diff] [review]
> Better advice
>
> >+# declare its encoding as UTF-8. If in doubt, specify ISO-8859-1.
> > intl.charset.default=ISO-8859-1
> > intl.charsetmenu.browser.static=ISO-8859-1, UTF-8
>
> windows-1252.

I think we shouldn't ask localizations to use the windows-1252 label before we migrate en-US to windows-1252.
Attachment #717055 - Attachment is obsolete: true
Attachment #717055 - Flags: review?(l10n)
Attachment #717829 - Flags: review?(l10n)
Comment on attachment 717829 [details] [diff] [review]
Better advice, assumes that bug 844776 lands first

Review of attachment 717829 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/locales/en-US/chrome/global/intl.properties
@@ +47,5 @@
> +# ru: ruprob
> +# uk: ukprob
> +# ja: ja_parallel_state_machine
> +# ja-JP-Mac: ja_parallel_state_machine
> +# zh-TW: zh_parallel_state_machine

I still don't think this is a good idea. If you already know what the values should be in those locales, why not just edit those locales straightaway? And if no new locale will need to (or shouldn't) set this, perhaps intl.properties is no longer the right place for this?

@@ +54,5 @@
> +# LOCALIZATION NOTE (intl.charset.default):
> +# This preference controls the fallback encoding used for decoding text/html
> +# and text/plain content that does not specify its encoding. This preference
> +# should be set to the encoding that the users of the locale are most likely
> +# to encounter as the encoding of *unlabeled* *legacy* content *on the Web*.

Do localization notes support formatting like this? If not, perhaps it would be better to rephrase such that the formatting isn't necessary to get across the importance of this information.

@@ +55,5 @@
> +# This preference controls the fallback encoding used for decoding text/html
> +# and text/plain content that does not specify its encoding. This preference
> +# should be set to the encoding that the users of the locale are most likely
> +# to encounter as the encoding of *unlabeled* *legacy* content *on the Web*.
> +# This is most likely the fallback encoding that that Internet Explorer uses

Double "that".

@@ +62,5 @@
> +# is meant for legacy content, specifying UTF-8 is most likely wrong, since
> +# newly-authored UTF-8 content is supposed to declare its encoding as UTF-8.
> +# If in doubt, specify windows-1252.
> +# The value must be a canonical name. Canonical names are the ones that
> +# occur to the right of the = sign in

I don't like how this is phrased. It's either "=" or "equals sign". Also, I don't feel like URLs should be treated as just another word in the sentence. That's why I used a colon to introduce it in the original; this should probably do the same.

@@ +71,5 @@
> +# This preference controls which encodings are most easily available from
> +# the character encoding menu. Include at least the value of
> +# intl.charset.default and UTF-8 here.
> +# The names must be canonical names. Canonical names are the ones that
> +# occur to the right of the = sign in

Same as above.
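Editorial sketch: the proposed note text quoted in this review states two checkable rules. The menu preference should include at least the value of intl.charset.default and UTF-8, and UTF-8 is most likely wrong as the fallback. A minimal Python sketch of such a check follows. The function names, the simplified .properties parsing, and the case-insensitive comparison are assumptions made for illustration; they are not part of any Mozilla tool.

```python
def parse_properties(text):
    """Parse a minimal subset of .properties syntax: key=value lines and # comments."""
    prefs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        prefs[key.strip()] = value.strip()
    return prefs


def check_charset_prefs(prefs):
    """Return a list of human-readable problems; an empty list means no problem found."""
    problems = []
    fallback = prefs.get("intl.charset.default", "")
    menu_value = prefs.get("intl.charsetmenu.browser.static", "")
    # Assumption: labels are compared case-insensitively.
    menu = {name.strip().lower() for name in menu_value.split(",") if name.strip()}

    if not fallback:
        problems.append("intl.charset.default is missing or empty")
    elif fallback.lower() == "utf-8":
        problems.append("intl.charset.default is UTF-8, which is most likely wrong for legacy content")
    if fallback and fallback.lower() not in menu:
        problems.append("the menu does not include the value of intl.charset.default")
    if "utf-8" not in menu:
        problems.append("the menu does not include UTF-8")
    return problems


sample = """\
# LOCALIZATION NOTE (intl.charset.default): see the discussion in this bug
intl.charset.default=windows-1252
intl.charsetmenu.browser.static=windows-1252, UTF-8
"""
print(check_charset_prefs(parse_properties(sample)))
```

The sketch does not attempt to validate canonical names, because the list the note refers to is not reproduced in this bug.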
Yes, we should eventually move this out of preferences, but let's do that in baby steps.
Comment on attachment 717829 [details] [diff] [review]
Better advice, assumes that bug 844776 lands first

I've just run across a counter-example, 'sah' is a minority language in Russia. You get the idea.

I think the comments in the code here fail in concept, though. Many tools show only a small portion of screen for comments, and the storylines here just outgrow that.

I'd suggest to actually create documentation for this on MDN instead, and reference that from the file. I'm all for baby-steps, but having 100 people read the comment here isn't a baby step, but instead a rather massive undertaking.

Also, I think we should be smarter on the intl.charset.detector all around. Things we could do: Inspect accept-lang for any languages for which we recommend a charset detector, and use them. That probably wants a rewrite on how we do charset detectors. I do think that it's much closer to what we should do, though.

We may also want to back up decisions here on telemetry data, so that we get current data on what legacy content we're actually dealing with.

http://mxr.mozilla.org/l10n-mozilla-aurora/search?string=intl.charset.det&find=intl.properties$ shows that we're currently only having a single locale with an actual problem, AFAICT.
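Editorial sketch: the accept-lang idea in the comment above could look roughly like the following Python. The detector names are the ones listed in the patch quoted earlier in this bug. The function name, the choice to honor list order rather than q-values, and the fallback from a full tag to its primary subtag are assumptions made for illustration, not how Gecko selects detectors.

```python
# Detector names as listed in the patch quoted earlier in this bug.
RECOMMENDED_DETECTORS = {
    "ru": "ruprob",
    "uk": "ukprob",
    "ja": "ja_parallel_state_machine",
    "zh-tw": "zh_parallel_state_machine",
}


def detector_for_accept_languages(accept_languages):
    """Return the detector for the first listed language that has one, else ''.

    Assumption: entries are considered in list order and q-values are ignored.
    """
    for entry in accept_languages.split(","):
        # Drop any ";q=..." parameter and normalize case.
        tag = entry.split(";", 1)[0].strip().lower()
        if not tag:
            continue
        # Try the full tag first (e.g. zh-tw), then the primary subtag (e.g. ja).
        for candidate in (tag, tag.split("-", 1)[0]):
            if candidate in RECOMMENDED_DETECTORS:
                return RECOMMENDED_DETECTORS[candidate]
    return ""


print(detector_for_accept_languages("ru-RU,ru;q=0.8,en-US;q=0.5"))
```

Under these assumptions a user whose list contains Russian gets the Russian detector regardless of UI locale, which would also cover the minority-language case raised above.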
Attachment #717829 - Flags: review?(l10n) → review-
(In reply to Axel Hecht [:Pike] from comment #9)
> Comment on attachment 717829 [details] [diff] [review]
> Better advice, assumes that bug 844776 lands first
>
> I've just run across a counter-example, 'sah' is a minority language in
> Russia. You get the idea.

What's this a counter-example of? Do you mean a sah localization should or should not turn on the Russian detector?

> I'd suggest to actually create documentation for this on MDN instead, and
> reference that from the file.

OK.

> Also, I think we should be smarter on the intl.charset.detector all around.

What do you mean? (The best way to be smart on this front is to remove the "universal" detector, which isn't really universal, from the codebase so that it stops attracting attention. Turning on a "universal" solution seems attractive on its face. Too bad it's neither universal nor a good solution. Bug 844115.)

> We may also want to back up decisions here on telemetry data, so that we get
> current data on what legacy content we're actually dealing with.

I wanted to get telemetry data, but I got privacy pushback, because the user population for some localizations is so small that results may become personally identifying. A reasonable heuristic is: Does IE have more market share than Firefox in this locale? If yes, what fallback does IE use? (Sites are likely to rely on IE.)

> http://mxr.mozilla.org/l10n-mozilla-aurora/search?string=intl.charset.det&find=intl.properties$
> shows that we're currently only having a single locale with an actual
> problem, AFAICT.

We currently have a detector problem only with zh-TW. We previously had one with Swedish (really!), too, but not anymore, because I got it fixed. I'd prefer to give localizers sensible advice instead of hunting down errors many releases after the fact.
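Editorial sketch: the market-share heuristic from the comment above, written out in Python. The function name and parameters are hypothetical, and the share figures would have to come from whatever data a localizer has available. Falling back to windows-1252 when the heuristic gives no answer follows the "If in doubt, specify windows-1252" wording of the patch quoted in this bug; applying it to the case where Firefox leads is an assumption.

```python
def suggest_fallback_encoding(ie_share, firefox_share, ie_fallback=None):
    """Suggest a value for intl.charset.default for one locale.

    ie_share and firefox_share are market shares in the locale (any consistent unit).
    ie_fallback is the fallback encoding IE uses for that locale, if known.
    """
    # If IE leads in the locale, sites are likely to rely on IE's fallback.
    if ie_share > firefox_share and ie_fallback:
        return ie_fallback
    # Assumption: otherwise use the "if in doubt" default from the quoted patch.
    return "windows-1252"


# Hypothetical numbers for illustration only.
print(suggest_fallback_encoding(0.6, 0.2, "windows-1251"))
```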
Comment on attachment 718917 [details] [diff] [review]
Point to MDN, assumes that bug 844776 lands first

Review of attachment 718917 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/locales/en-US/chrome/global/intl.properties
@@ +49,4 @@
> intl.charset.default=windows-1252
> +
> +# LOCALIZATION NOTE (intl.charsetmenu.browser.static):
> +# Please see https://developer.mozilla.org/en-US/docs/Localizations_and_character_encodings

Since these three l10n notes are now all the same, you should be able to maintain the single combined note, as in the current version.

@@ +53,4 @@
> intl.charsetmenu.browser.static=windows-1252, UTF-8
> +
> +# LOCALIZATION NOTE (intl.charsetmenu.mailedit):
> +# Don't localize.

Does this mean "don't change"? Or just "don't translate"?
(In reply to Gordon P. Hemsley [:gphemsley] from comment #12)
> Comment on attachment 718917 [details] [diff] [review]
> Point to MDN, assumes that bug 844776 lands first
>
> Review of attachment 718917 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: toolkit/locales/en-US/chrome/global/intl.properties
> @@ +49,4 @@
> > intl.charset.default=windows-1252
> > +
> > +# LOCALIZATION NOTE (intl.charsetmenu.browser.static):
> > +# Please see https://developer.mozilla.org/en-US/docs/Localizations_and_character_encodings
>
> Since these three l10n notes are now all the same, you should be able to
> maintain the single combined note, as in the current version.

Do the localization tools guarantee that a combined note is shown in the context of each localizable string? Are such guarantees so certain that it's not worthwhile to have per-key localization notes?

> @@ +53,4 @@
> > intl.charsetmenu.browser.static=windows-1252, UTF-8
> > +
> > +# LOCALIZATION NOTE (intl.charsetmenu.mailedit):
> > +# Don't localize.
>
> Does this mean "don't change"? Or just "don't translate"?

Don't change. Bug 844045.
Comment on attachment 718917 [details] [diff] [review]
Point to MDN, assumes that bug 844776 lands first

Review of attachment 718917 [details] [diff] [review]:
-----------------------------------------------------------------

This has been obsoleted by bug 943252, sorry for the lag.

Given how the comment looks today, I think that only the minority language piece is missing?
Attachment #718917 - Flags: review?(l10n)
(In reply to Axel Hecht [:Pike] from comment #14)
> Given how the comment looks today, I think that only the minority language
> piece is missing?

Yes. Let's call this close enough to FIXED.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [fixed by bug 943252]
Target Milestone: --- → mozilla32