Bug 723609 - Improve localization notes in intl.properties
Status: RESOLVED FIXED
[bcp47]
Product: Core
Classification: Components
Component: Localization
Version: unspecified
Platform: All All
Importance: -- normal
Target Milestone: mozilla16
Assigned To: Gordon P. Hemsley [:GPHemsley]
Mentors:
http://hg.mozilla.org/mozilla-central...
Depends on:
Blocks:
Reported: 2012-02-02 10:26 PST by Gordon P. Hemsley [:GPHemsley]
Modified: 2012-07-03 16:07 PDT
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Clarify instructions for localizing intl.accept_languages (v1) (3.92 KB, patch)
2012-06-15 09:19 PDT, Gordon P. Hemsley [:GPHemsley]
no flags
Clarify instructions for localizing intl.accept_languages (v2) (3.93 KB, patch)
2012-06-15 09:23 PDT, Gordon P. Hemsley [:GPHemsley]
no flags
Clarify instructions for localizing intl.accept_languages (v3) (3.88 KB, patch)
2012-06-20 09:47 PDT, Gordon P. Hemsley [:GPHemsley]
jbeatty: review+
l10n: review-
Improve localization notes in intl.properties (v1) (5.57 KB, patch)
2012-06-20 11:59 PDT, Gordon P. Hemsley [:GPHemsley]
l10n: review+
jbeatty: feedback+

Description Gordon P. Hemsley [:GPHemsley] 2012-02-02 10:26:26 PST
The localization note for intl.accept_languages currently says:
# Localization Note: Add the code for your language at the front of this entry,
# leaving "en-us, en" for fallback. It's recommended to use the same form, e.g.
# "ja-jp, ja, en-us, en"

These instructions are misleading and have caused many localizers to localize the value of intl.accept_languages incorrectly.

I recommend changing it to something more along the lines of this:
# Localization Note:
# This is a comma-separated list of valid BCP 47 language tags.
#
# It should begin with the code representing your locale. (If your locale code
# does not include a region subtag, the language tag representing your
# language should not include one, either.) Following that, you should include
# language tags for other languages that users of your locale might be
# expected to speak, so that their browsing experience degrades gracefully if
# content is not available in their native language.
#
# For backwards compatibility, it is recommended that "en-US, en" be included
# at the end of your list. However, if you know that users of your locale
# might prefer a different variety of English, or they are not likely to
# understand English at all, you may opt to include a different English
# fallback or exclude English altogether.
#
# For example, the Breton [br] locale might consider including French and
# British English as a fallback, since those languages are commonly spoken in
# the same area:
# "br, fr-FR, fr, en-GB, en"

We can argue the specifics of the wording, but I think the gist of my proposal is clear: These instructions need to be much more verbose in describing what is expected of the localizer.
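
As a rough illustration of the two mechanical rules in the proposed note (the list begins with the locale code, and no tag is repeated), here is a minimal Python sketch. The function name `check_accept_languages` is hypothetical and is not part of any Mozilla tooling; it also assumes the locale code is itself usable as the first language tag, which is not true for every locale.

```python
def check_accept_languages(value, locale_code):
    """Return a list of problems found in a proposed intl.accept_languages value.

    Hypothetical helper for illustration; not part of any Mozilla tooling.
    """
    # Split the comma-separated value and ignore surrounding whitespace.
    tags = [tag.strip() for tag in value.split(",") if tag.strip()]
    if not tags:
        return ["the list is empty"]

    problems = []
    # The first tag should be the locale code itself, compared case-insensitively.
    # This also catches a region subtag added to a locale code that has none.
    if tags[0].lower() != locale_code.lower():
        problems.append(
            f"the list should begin with {locale_code!r}, but begins with {tags[0]!r}"
        )

    # Report any tag that appears more than once.
    seen = set()
    for tag in tags:
        key = tag.lower()
        if key in seen:
            problems.append(f"{tag!r} is listed more than once")
        seen.add(key)
    return problems
```

With this sketch, the Breton example passes for locale code "br", while the value suggested by the old note ("ja-jp, ja, en-us, en") is flagged for locale code "ja" because it begins with a tag that adds a region subtag.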
Comment 1 Axel Hecht [:Pike] 2012-02-04 09:06:11 PST
I'm not convinced about the "might be expected to speak" myself, I think this may want to be a bit more majority-focused. 80/20 rule, IMHO.

CCing Jeff for input on the text, and possibly just adding an MDN doc on this and adding a link?
Comment 2 Gordon P. Hemsley [:GPHemsley] 2012-02-05 09:35:17 PST
(In reply to Axel Hecht [:Pike] from comment #1)
> I'm not convinced about the "might be expected to speak" myself, I think
> this may want to be a bit more majority-focused. 80/20 rule, IMHO.

Well, it's intended to be referring to other languages that the *majority* of the users of the locale might be expected to speak. It is up to each locale (with guidance from me, I suppose) to determine what language tags satisfy that requirement for that locale. Perhaps the text could make that more clear.

With regard to the example, speakers of Breton generally live in the Brittany area of France. Outside of Breton, the next most common language they are likely to come across is French, and specifically the French spoken in France. So it's likely that they would at least have an exposure to French, if they are not already fluent in it. So, fr-FR > fr. After that, if they have contact with English, it's most likely going to be the English as spoken in the United Kingdom, rather than the United States or anywhere else. Beyond that, if British English isn't available, they could probably get by with any other dialect of English, so there's the generic 'en' as the last fallback.
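
To make the ordering concrete, here is a small Python sketch of how an ordered list like this could be turned into an Accept-Language request header with descending quality values. The evenly spaced weights and the function name `to_accept_language_header` are assumptions made for illustration; this is not a statement of how Gecko computes the header.

```python
def to_accept_language_header(value):
    """Build an Accept-Language header from a comma-separated preference list.

    Assumption for illustration: weights are spaced evenly from 1.0 downward,
    so earlier tags are always preferred over later ones.
    """
    tags = [tag.strip() for tag in value.split(",") if tag.strip()]
    count = len(tags)
    parts = []
    for index, tag in enumerate(tags):
        if index == 0:
            # The first tag carries the implicit default weight of 1.
            parts.append(tag)
        else:
            # HTTP allows at most three decimal places in a quality value.
            weight = round((count - index) / count, 3)
            parts.append(f"{tag};q={weight:g}")
    return ",".join(parts)
```

For the Breton list "br, fr-FR, fr, en-GB, en", this sketch yields "br,fr-FR;q=0.8,fr;q=0.6,en-GB;q=0.4,en;q=0.2", which preserves the fr-FR > fr > en-GB > en ordering described above.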

> CCing Jeff for input on the text, and possibly just adding an MDN doc on
> this and adding a link?

I am definitely not against having an MDN doc. In fact, we should probably have one regardless of what text winds up in the file itself.
Comment 3 Axel Hecht [:Pike] 2012-02-05 09:41:29 PST
Gordon, I think your example here is good. I disagree with your definition of accept-language, though. Quoting from bug 662746 comment 18, you suggested my, ybd, rki, kac, ksw, kjp, shn, rhg, prk. That's not a reasonable default value for any but a handful of people speaking Burmese.
Comment 4 Gordon P. Hemsley [:GPHemsley] 2012-02-05 10:58:40 PST
(In reply to Axel Hecht [:Pike] from comment #3)
> Gordon, I think your example here is good. I disagree with your definition
> of accept-language, though. Quoting from bug 662746 comment 18, you
> suggested my, ybd, rki, kac, ksw, kjp, shn, rhg, prk. That's not a
> reasonable default value for any but a handful of people speaking Burmese.

Well, my suggestions are never intended to be the end-all, be-all value. They are proposals based on an assessment of the available linguistic data. They are supposed to begin discussions with the localizers, who actually know their target audience, about what the real world usage might be.

My analysis is based on a number of factors, including the genetic relationship (i.e. similarity) of various languages. The idea is to offer up an order of fallbacks such that content comprehension would degrade gracefully for users of that locale. It is by no means intended to assert that a user of the Burmese locale is a fluent speaker of every language listed in Accept-Language. It is supposed to help a content provider say "well, we don't have content in your preferred language, but we do have content in a similar language that you might be able to understand".
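
The content provider's side of this can be sketched in a few lines of Python. This is a simplified lookup, loosely modeled on the idea of trying each preferred tag and then progressively shorter prefixes of it; the function name `pick_language` is hypothetical, and real servers may match differently.

```python
def pick_language(preferred, available):
    """Return the first available language that satisfies the preference list.

    preferred: language tags in order of preference (most preferred first).
    available: language tags the content provider can actually serve.
    Returns None when nothing in the preference list can be satisfied.
    """
    available_by_lower = {tag.lower(): tag for tag in available}
    for tag in preferred:
        candidate = tag.lower()
        while candidate:
            if candidate in available_by_lower:
                return available_by_lower[candidate]
            # Drop the last subtag ("fr-fr" becomes "fr", "fr" becomes "").
            candidate = candidate.rpartition("-")[0]
    return None
```

Given the Breton preference list and a site that only has German, French, and English content, this sketch falls through "br" and "fr-FR" and settles on "fr", which is the graceful degradation the list is meant to enable.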
Comment 5 Jeff Beatty [:gueroJeff] 2012-02-22 11:54:36 PST
I don't think this needs a wiki page. A localization note should suffice. 

I also think that note is too wordy and contains some terminology that some localizers may not be familiar with in English. Below is my input:

# Localization Note:
# This is a comma-separated list of valid BCP 47 language tags.
#
# The list begins with your locale code and region subtag. If your locale code
# does not include a region subtag, do not add one. After your locale code,
# you can include language tags for other popular languages that your
# locale's users may speak. This way their browsing experience degrades
# gracefully if content is unavailable in their native language.
#
# We recommend that "en-US, en" be included at the end of your list as a 
# default language setting. If your locale's users prefer a different variety of 
# English, or do not understand English at all, you may include a different 
# locale code.
#
# For example, the Breton [br] locale might consider including French and
# British English, since those languages are commonly spoken in the region:
# "br, fr-FR, fr, en-GB, en"
Comment 6 Gordon P. Hemsley [:GPHemsley] 2012-02-22 13:27:04 PST
(In reply to jbeatty from comment #5)
> I don't think this needs a wiki page. A localization note should suffice. 

I like the idea of a wiki page to expand this in more detail, should localizers need more guidance. However, I agree that a localization note should suffice for most situations.

> I also think that note is too wordy and contains some terminology that some
> localizers may not be familiar with in English. Below is my input:

Ah, I see. Didn't care for "fallback", eh? :)

> # Localization Note:
> # This is a comma-separated list of valid BCP 47 language tags.
> #
> # The list begins with your locale code and region subtag. If your locale
> # code does not include a region subtag, do not add one. After your locale
> # code, you can include language tags for other popular languages that your
> # locale's users may speak. This way their browsing experience degrades
> # gracefully if content is unavailable in their native language.

To be clear on the terminology:

* "locale code" means whatever code Mozilla uses to represent the language. This is usually a full language tag, but sometimes it isn't (e.g. 'ja-JP-mac'). I used this term so as to not confuse "language tag" with "language subtag".
* "language tag" is the full representation of a language/dialect/variety/whatever, as specified by BCP 47.
* "language subtag" is specifically the 2- or 3-char part of the language tag that comes first in any non-private-use language tag (roughly equivalent to an ISO 639 code), as specified by BCP 47.
* "region subtag" is specifically the 2-letter or 3-digit code that narrows the use of the language subtag to a specific region. It usually comes immediately after the language subtag, but the two may optionally be separated by a script subtag.
* "script subtag" is the 4-char code that specifies what script is being used to write the given language. It is not yet supported by any Gecko mechanism, AFAIK, and should probably not be mentioned here. (Not that I'm saying it is.)
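
To illustrate how these subtags fit together, here is a minimal Python sketch that splits a tag into language, script, and region subtags. The pattern is deliberately simplified and is an assumption for illustration: it covers only the three subtag kinds defined above and ignores the rest of BCP 47 (variants, extensions, private use, grandfathered tags). The name `split_tag` is hypothetical.

```python
import re

# Simplified pattern: language, then optional script, then optional region.
TAG_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"            # 2- or 3-letter language subtag
    r"(?:-(?P<script>[A-Za-z]{4}))?"           # optional 4-letter script subtag
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"  # optional 2-letter or 3-digit region
)


def split_tag(tag):
    """Return a dict of subtags, or None if the tag does not fit the pattern."""
    match = TAG_RE.match(tag)
    if match is None:
        return None
    return match.groupdict()
```

Under this simplified pattern, "fr-FR" splits into language "fr" and region "FR", "es-419" has a 3-digit region, and a Mozilla locale code such as "ja-JP-mac" is rejected, which matches the point that a locale code is not always a full language tag.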

> # We recommend that "en-US, en" be included at the end of your list as a
> # default language setting. If your locale's users prefer a different
> # variety of English, or do not understand English at all, you may include
> # a different locale code.

I don't think that last part should be included. This section is talking about the less-relevant fallback option of English or a reasonable facsimile thereof. They should have already listed their "different locale codes" by the time they get to making the decision about whether to heed our recommendation.

The decision they're making at this step is whether to:
(1) use the recommended "en-US, en"
(2) use something like "en-GB, en" (or even simply "en")
(3) exclude English altogether.

This rewording doesn't reflect that three-way choice, IMO.

> # For example, the Breton [br] locale might consider including French and
> # British English, since those languages are commonly spoken in the region:
> # "br, fr-FR, fr, en-GB, en"

WFM.
Comment 7 Jeff Beatty [:gueroJeff] 2012-02-28 10:26:37 PST
(In reply to Gordon P. Hemsley [:gphemsley] from comment #6)
> (In reply to jbeatty from comment #5)
> > I don't think this needs a wiki page. A localization note should suffice. 
> 
> I like the idea of a wiki page to expand this in more detail, should
> localizers need more guidance. However, I agree that a localization note
> should suffice for most situations.

Maybe instead of a dedicated wiki page, it can be included as part of a troubleshooting page?
> 
> > I also think that note is too wordy and contains some terminology that some
> > localizers may not be familiar with in English. Below is my input:
> 
> Ah, I see. Didn't care for "fallback", eh? :)

Haha, just figured that "default" was a term most ESL speakers would be familiar with :-)
> 
# Localization Note:
# This is a comma-separated list of valid BCP 47 language tags.
#
# The list begins with your locale code with region subtag. If your locale code
# does not include a region subtag, do not add one. After creating a language
# tag for your locale code, you can include tags for other popular languages
# that your locale's users may speak. This way their browsing experience
# degrades gracefully if content is unavailable in their native language.
 
> To be clear on the terminology:
> 
> * "locale code" means whatever code Mozilla uses to represent the language.
> This is usually a full language tag, but sometimes it isn't (e.g.
> 'ja-JP-mac'). I used this term so as to not confuse "language tag" with
> "language subtag".
> * "language tag" is the full representation of a
> language/dialect/variety/whatever, as specified by BCP 47.
> * "language subtag" is specifically the 2- or 3-char part of the language
> tag that comes first in any non-private-use language tag (roughly equivalent
> to an ISO 639 code), as specified by BCP 47.
> * "region subtag" is specifically the 2-letter or 3-digit code that narrows
> the use of the language subtag to a specific region. It usually comes
> immediately after the language subtag, but the two may optionally be
> separated by a script subtag.
> * "script subtag" is the 4-char code that specifies what script is being
> used to write the given language. It is not yet supported by any Gecko
> mechanism, AFAIK, and should probably not be mentioned here. (Not that I'm
> saying it is.)
> 
# We recommend that the "en-US, en" tag be included at the end of your list as a 
# last resource language setting. If your locale's users prefer a different variety 
# of English, or do not understand English at all, you may include a different 
# locale code as the last resource instead.
 
> I don't think that last part should be included. This section is talking
> about the less-relevant fallback option of English or a reasonable facsimile
> thereof. They should have already listed their "different locale codes" by
> the time they get to making the decision about whether to heed our
> recommendation.
> 
> The decision they're making at this step is whether to:
> (1) use the recommended "en-US, en"
> (2) use something like "en-GB, en" (or even simply "en")
> (3) exclude English altogether.
> 
> This rewording doesn't reflect that three-way choice, IMO.

The rewording references the recommended option as well as the option to choose another variety of English (like "en-GB,en"), or a different language tag. Aren't those the three options? I added another word or two. Does that make the three-way choice clearer?

> 
# For example, the Breton [br] locale might consider including French and
# British English, since those languages are commonly spoken in the region:
# "br, fr-FR, fr, en-GB, en"
> 
> WFM.
Comment 8 Gordon P. Hemsley [:GPHemsley] 2012-06-15 09:19:33 PDT
Created attachment 633553
Clarify instructions for localizing intl.accept_languages (v1)

I just realized that nothing ever became of this because there was never a patch to check in.

I've taken into account the discussion that we've had, but I've used my original proposal text as a template, as I feel the language I used is more exact. If you still feel that the language I've used might be unclear to non-native English speakers, I think that is all the more reason to create a wiki page with more information. (However, I don't think it should be demoted to a mere "troubleshooting" page, as I don't think having more information is a bad thing.)
Comment 9 Gordon P. Hemsley [:GPHemsley] 2012-06-15 09:23:04 PDT
Created attachment 633556
Clarify instructions for localizing intl.accept_languages (v2)

Added "most" to clarify that this is intended to address the majority of users of a locale. (Apologies for the churn.)
Comment 10 Jeff Beatty [:gueroJeff] 2012-06-19 14:28:54 PDT
(In reply to Gordon P. Hemsley [:gphemsley] from comment #8)
> Created attachment 633553
> Clarify instructions for localizing intl.accept_languages (v1)
> 
> I just realized that nothing ever became of this because there was never a
> patch to check in.
> 
> I've taken into account the discussion that we've had, but I've used my
> original proposal text as a template, as I feel the language I used is more
> exact. If you still feel that the language I've used might be unclear to
> non-native English speakers, I think that is all the more reason to create a
> wiki page with more information. (However, I don't think it should be
> demoted to a mere "troubleshooting" page, as I don't think having more
> information is a bad thing.)

I still think it's too wordy and jargon-y for non-native English speakers, which only adds to the already high learning curve set for these contributors. 

The idea of more information equals better information isn't always true. Describing instructions in a simple and brief manner is usually more beneficial for audiences than the opposite. Especially when those audiences have varying levels of English language proficiency. This is why I feel like this instruction should be a simplified, brief l10n note and not another page clogging up the wiki.

At the very least, replacing all instances of "you should" with imperative language, replacing all passive voice with active voice, and removing the term "backwards compatibility" will result in a simpler, more brief, and understandable note.
Comment 11 Gordon P. Hemsley [:GPHemsley] 2012-06-20 09:47:47 PDT
Created attachment 634954
Clarify instructions for localizing intl.accept_languages (v3)

(In reply to jbeatty from comment #10)
> I still think it's too wordy and jargon-y for non-native English speakers,
> which only adds to the already high learning curve set for these
> contributors. 
> 
> The idea of more information equals better information isn't always true.
> Describing instructions in a simple and brief manner is usually more
> beneficial for audiences than the opposite. Especially when those audiences
> have varying levels of English language proficiency. This is why I feel like
> this instruction should be a simplified, brief l10n note and not another
> page clogging up the wiki.

I think this note has already been simplified to the bare minimum of information necessary to properly set the default value of the 'intl.accept_languages' preference. Removing information at this point would not do enough to improve the status quo.

I understand that we have to consider the localizers who have a low proficiency in English, but I don't think we should do so at the expense of localizers who have a high proficiency. (Without seeing hard stats, I believe there are a great number of localizers who are even native speakers of English.) Those who cannot understand the instructions will simply ignore them—which would be no worse than the current situation where they are given little instruction at all.

As an aside, I don't see why you consider adding another page to the wiki as "clogging" it. IMO, there should be wiki pages describing all of the preferences used by the codebase; it may be that many don't require more than a few sentences of explanation, but I don't see why it would be such a big deal if there were preferences that required pages unto themselves. Understand there are at least three different standards that come into play when setting the 'intl.accept_languages' preference. I don't see why it would be a bad thing to reduce those three standards to a single page to explain only the details pertinent to the value of the preference.

> At the very least, replacing all instances of "you should" with imperative
> language, replacing all passive voice with active voice, and removing the
> term "backwards compatibility" will result in a simpler, more brief, and
> understandable note.

With all that being said, I have attempted to satisfy your desire for succinctness. Despite "you should" being a form of imperative, and most of the passive agents being the localizer themselves, I have made the changes you have requested. (One caveat: I left in "it is recommended" because I didn't feel comfortable saying "we recommend".)
Comment 12 Jeff Beatty [:gueroJeff] 2012-06-20 09:57:21 PDT
Comment on attachment 634954
Clarify instructions for localizing intl.accept_languages (v3)

Gordon, you're totally right, what I meant to say was replace it with a more direct form of imperative.

I'm very pleased. Thank you for taking the time and taking my comments into consideration :-)

I'm very happy to discuss target audiences and the information models for l10n on the wikis with you further, but not on this bug. Shoot me an email sometime and we'll talk :-)
Comment 13 Axel Hecht [:Pike] 2012-06-20 10:36:21 PDT
Comment on attachment 634954
Clarify instructions for localizing intl.accept_languages (v3)

Review of attachment 634954:
-----------------------------------------------------------------

Technicalities make this an r- from me.

This patch changes lines that are factually wrong, even if just whitespace.

We can either make this bug cover "make comments in intl.properties useful", or file a follow up, but then your patch shouldn't change lines.

"Useful" would be:

Group all charset entries together, make the localization note reference the entities (for most tooling stability).
Update the charset comment to reference the mxr url, http://mxr.mozilla.org/mozilla-central/source/intl/locale/src/charsetalias.properties, and clarify that the charsets you should stick to are actually the values in that file, and not the keys. At least that's how I guess this is meant.

Group locale code entries together, and also add a comment for general.useragent.locale (I've seen that being translated).

Add a line of whitespace between those entries that belong together.

Remove "all.js".

Again, either un-change the whitespace in the bad comment and follow-up bug, or just fix all the file to be easier to deal with throughout.

::: toolkit/locales/en-US/chrome/global/intl.properties
@@ +5,5 @@
>  # all.js
>  #
> +# Localization Note: Cases of charset names must be preserved. If you're
> +# adding charsets to your localized version, please refer to
> +# intl/uconv/src/charsetalias.properties file for the list of canonical

Please don't change lines you're not changing. Just use an editor that doesn't outsmart humans.

Also, if it does outsmart humans, it should recognize that this comment is wrong. The file is at http://mxr.mozilla.org/mozilla-central/source/intl/locale/src/charsetalias.properties (s/uconv/locale/).
Comment 14 Gordon P. Hemsley [:GPHemsley] 2012-06-20 11:59:49 PDT
Created attachment 634998
Improve localization notes in intl.properties (v1)

I think this improves the localization notes and structure of the file as you requested.
Comment 15 Jeff Beatty [:gueroJeff] 2012-07-02 06:35:49 PDT
Comment on attachment 634998
Improve localization notes in intl.properties (v1)

Sorry for the delay. Looks good to me. Thank you!
Comment 16 Axel Hecht [:Pike] 2012-07-03 06:46:02 PDT
Comment on attachment 634998
Improve localization notes in intl.properties (v1)

Review of attachment 634998:
-----------------------------------------------------------------

Sweet, thanks, r=me.
Comment 17 Daniel Holbert [:dholbert] (largely AFK until June 28) 2012-07-03 09:04:25 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/1757e2d83cd1
Comment 18 Ryan VanderMeulen [:RyanVM] 2012-07-03 16:07:30 PDT
https://hg.mozilla.org/mozilla-central/rev/1757e2d83cd1
