Closed Bug 178491 Opened 22 years ago Closed 18 years ago

Bring languageNames.properties up to date with IANA registry

Categories

(Core :: Internationalization, defect)

defect
Not set
normal

RESOLVED FIXED

People

(Reporter: smontagu, Assigned: smontagu)

Details

(Keywords: intl)

Attachments

(2 files, 2 obsolete files)

This is split off from bug 167908.
Patch by Malcolm Rowe (bugzilla2@farside.demon.co.uk). Comments copied from bug
167908 comment 2:

This patch updates languageNames.properties to be consistent with the latest
updates to ISO 639-2, and also places the file back into language-code order.

To enable Frisian in the dialog, we will also have to add an entry to
intl/locale/src/language.properties, once we know what country/countries it
should be placed in.

The new file is from ISO 639-2, taken from
1. http://www.loc.gov/standards/iso639-2/langcodes.html, plus
2. The addition of the two extra codes (ast, x-kok) which were already in
place, and the renaming of Greek, Modern to Greek (as was already done), plus
3. The re-addition of the following deprecated codes:
   in = Indonesian (deprecated 1989 in favour of id)
   ji = Yiddish (deprecated 1989 in favour of yi)
   sh = Serbo-Croatian (deprecated 2000)
(see http://www.loc.gov/standards/iso639-2/codechanges.html)
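Re-adding these codes means two keys can name the same language. As a minimal Python sketch (the names `DEPRECATED_CODES` and `normalize_code` are hypothetical, not part of Mozilla's code), a lookup could fold the deprecated codes into their current replacements. "sh" has no single successor, so it maps to None:

```python
from typing import Dict, Optional

# Hypothetical table of the deprecated ISO 639 codes listed above.
# "sh" (Serbo-Croatian) was deprecated without a single successor,
# so it maps to None rather than to a replacement code.
DEPRECATED_CODES: Dict[str, Optional[str]] = {
    "in": "id",  # Indonesian, deprecated 1989
    "ji": "yi",  # Yiddish, deprecated 1989
    "sh": None,  # Serbo-Croatian, deprecated 2000
}


def normalize_code(code: str) -> Optional[str]:
    """Return the current code for a deprecated one, else the code itself.

    Returns None when the code is deprecated with no one-to-one replacement.
    """
    code = code.lower()
    if code in DEPRECATED_CODES:
        return DEPRECATED_CODES[code]
    return code
```

With this, normalize_code("in") gives "id", while an undeprecated code such as "de" is returned unchanged.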

Changes from our current languageNames.properties:
0. In language-code order.
1. Many spelling/name changes, notably the following names:
   Bhutani -> Dzongkha
   Farsi -> Persian
   Scots Gaelic -> Gaelic
   Cambodian -> Khmer
   Greenlandic -> Kalaallisut
2. Addition of many codes, including Frisian.
3. Javanese changed from jw to jv, correcting a known erratum in ISO 639:1988.
4. Removal of:
   sb = Sorbian
   sx = Sutu

I can find no reference to sb or sx ever being valid ISO 639 codes.
Blocks: 167908
Keywords: intl
>  sb = Sorbian
>  sx = Sutu

These two are used in Microsoft products:

http://msdn.microsoft.com/workshop/author/dhtml/reference/language_codes.asp

It seems that at the time MS adopted these, there were no ISO 639-1
or ISO 639-2 codes for these languages. Sorbian (Upper & Lower) now
has a 3-letter code (wen). Sutu still does not have any representation
in ISO 639-1/2. Sutu is a variant name for one of the Southern Sotho
languages:

http://www.ethnologue.com/show_language.asp?code=SSO

Apparently MS thought it important to use this 2-letter
abbreviation for Sutu until an official code is established. There are
a few precedents of Netscape doing something similar.

Question: Is this a proposal to add to the current visible
          list (through the UI dialog) **all** the ISO-639-1/2
          languages?
          Or are we completing the list but only turning on the
          flag for the ones which are needed?

We have been taking the latter approach up to now because including the
entire list would make it too long.

I agree that we should continue with the latter approach. The patch does not
include any new three letter language codes, but includes all two letter codes
not in the current list. Adding all the three letter codes with no two letter
equivalent would make the list much longer, and I suggest we continue only
adding them when someone specifically requests them.
My comments on the patch:

There is now one more new two letter code in
http://www.loc.gov/standards/iso639-2/codechanges.html: ii Sichuan Yi

The name for "ho" should be Hiri Motu.

Why do we want to retain the deprecated codes "sh" "ji" and "in"?

Konkani has a standard code "kok", which we should probably use instead of "x-kok".

We should try to investigate whether the name changes are acceptable in the
field. As Malcolm points out in the original bug, we have already rejected the
change from "Galician" to "Gallegan", see bug 127946 comment 7. 
We can add back: 

wen = sorbian

if we are willing to start adding 3-letter codes from ISO 639-2.
Currently I don't see any use of 3-letter codes, but our code for
handling accept-language headers is designed to take these as well,
so it should present no problem in that regard.
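A minimal Python sketch of that point (the function name is hypothetical, and this is not Mozilla's accept-language code): extracting primary subtags from an Accept-Language value needs only a 2-to-3-letter pattern, so a 3-letter code such as wen passes through the same path as a 2-letter one.

```python
import re
from typing import List

# Primary language subtag: 2 letters (ISO 639-1) or 3 letters (ISO 639-2).
# Longer reserved/registered forms and "x-" private-use tags are not matched.
PRIMARY_SUBTAG = re.compile(r"[A-Za-z]{2,3}")


def primary_languages(accept_language: str) -> List[str]:
    """Return the primary subtags of an Accept-Language value.

    Subtags are kept in the order they appear; quality values (";q=...")
    and region subtags are dropped, and entries whose primary subtag is
    not 2 or 3 letters (e.g. "*" or "x-kok") are skipped.
    """
    result = []
    for item in accept_language.split(","):
        tag = item.split(";", 1)[0].strip()
        primary = tag.split("-", 1)[0]
        if PRIMARY_SUBTAG.fullmatch(primary):
            result.append(primary.lower())
    return result
```

The sketch does not reorder entries by q-value; it only shows that the 2- and 3-letter cases need no separate handling.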

As for Sutu, we can split the original reference into:

nso = Sotho, Northern
st = Sotho, Southern

each having more than 3.5 million speakers in South Africa.
I have posted in the netscape.public.mozilla.i18n newsgroup requesting feedback
on the name changes. I have already received a comment offlist that Punjabi is
correct, not Panjabi.

The names in the standard derive from the Library of Congress Subject Headings,
and we should not expect our needs and priorities to be identical with those of
the Library of Congress.
Summary: Bring language.properties up to date with iso639-2 → Bring languageNames.properties up to date with iso639-2
With regard to comment #7, English names for languages very often
have variants. Both Panjabi and Punjabi are known variant names for
the same language. In our list, we normally list only one name, and
that should be the "preferred" name. The United Nations, for example,
recognizes both names:

http://www.unhchr.ch/udhr/navigate/alpha.htm#P

(Was the person who wrote to smontagu Pakistani or Indian? That could
also be a factor in the preferred name.)
Given this state of affairs, I don't mind keeping Punjabi rather than
changing it to Panjabi. It was there in the code before, and we need
not change it unless there is a compelling reason to.
 
BTW, I don't believe ISO-639-1/2 lang names are based on LofC 
names. They are based on submissions from requesters with reasons
provided for preferring one name over others if variants are
submitted.
Status: NEW → ASSIGNED
My correspondent was from India, and "Punjabi" does seem to be the
transliteration used by Punjabis. The official government sites of the Punjab in
India and Pakistan are http://www.punjab.gov.in/ and http://www.punjab.gov.pk/
*** Bug 209591 has been marked as a duplicate of this bug. ***
Ok, here's another attempt at a patch. This is pretty much the same idea as
described in comment 1, except that where we had an existing description, I've
kept it (with two exceptions, see below).

The changes from our current version are as follows:
1. It's sorted in code order (this just makes it easier to compare with the
official list).
2. I've added all the missing codes.
3. Removals:
     in (Indonesian): deprecated 1989 in favour of id
     ji (Yiddish): deprecated 1989 in favour of yi
     sh (Serbo-Croatian): deprecated 2000
4. Code changes:
     jw (Javanese) to jv: known erratum in ISO 639:1988
     sb (Sorbian) to wen: non-standard, now using correct ISO 639-2 code.
     x-kok (Konkani) to kok: x-code, now using correct ISO 639-2 code.
5. Sotho/Sutu:
     From: st (Sesotho) and sx [not standard] (Sutu)
     To: st (Sotho, Southern), and nso (Sotho, Northern).
     (see comment 2, comment 6).
6. Description change:
     vo (Volapuk) changed to vo (Volap\u00fck)
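The Volap\u00fck entry relies on the Java-style \uXXXX escape that .properties files use for non-ASCII text. A minimal Python sketch of how such a line decodes (the function name is hypothetical; it handles only '=' separators and \uXXXX escapes, not the full .properties grammar):

```python
import re
from typing import Optional, Tuple

# A Java-style \uXXXX escape, as used for non-ASCII text in .properties files.
UNICODE_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def parse_properties_line(line: str) -> Optional[Tuple[str, str]]:
    """Parse one 'key = value' line, decoding \\uXXXX escapes in the value.

    Minimal sketch: blank lines, comments and lines without '=' give None.
    ':' separators, continuation lines and other escapes are not handled.
    """
    stripped = line.strip()
    if not stripped or stripped[0] in "#!" or "=" not in stripped:
        return None
    key, value = stripped.split("=", 1)
    value = UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), value.strip())
    return key.strip(), value
```

Keeping the escape in the file keeps it pure ASCII while the displayed name still reads "Volapük".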
Attachment #105202 - Attachment is obsolete: true
Attachment #126544 - Flags: review?(smontagu)
Comment on attachment 126544 [details] [diff] [review]
Update languageNames.properties (non-controversial changes only)

Please add back the name change from Farsi to Persian (bug 204767 comment 2).

With that, r=smontagu.
Blocks: 204767
As for attachment 126544 [details] [diff] [review], but includes the name change from Farsi to Persian.
Attachment #126544 - Attachment is obsolete: true
Attachment #126587 - Flags: superreview?(alecf)
Attachment #126587 - Flags: review?(smontagu)
Comment on attachment 126587 [details] [diff] [review]
v3 Update languageNames.properties (non-controversial changes only)

r=smontagu
Attachment #126587 - Flags: review?(smontagu) → review+
Attachment #126544 - Flags: review?(smontagu)
Comment on attachment 126587 [details] [diff] [review]
v3 Update languageNames.properties (non-controversial changes only)

sr=alecf
Attachment #126587 - Flags: superreview?(alecf) → superreview+
Checking in xpfe/global/resources/locale/en-US/languageNames.properties;
/cvsroot/mozilla/xpfe/global/resources/locale/en-US/languageNames.properties,v
<--  languageNames.properties
new revision: 1.11; previous revision: 1.10
done
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
biesi checked in the 'non-controversial' patch, but I'd like to reopen this to 
document the differences between the current version and the standard, so that 
we can decide what else (if anything) we'd like to change.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Ok, here's a list of the remaining differences.

Code  We say          ISO639-2 says
dz    Bhutani         Dzongkha
fo    Faeroese        Faroese
gd    Scots Gaelic    Gaelic /or/ Scottish Gaelic
gl    Galician        Gallegan                     [wontfix, see below]
ik    Inupiak         Inupiaq
km    Cambodian       Khmer
lo    Laothian        Lao
pa    Punjabi         Panjabi
ps    Pashto          Pushto
rm    Rhaeto-Romanic  Raeto-Romance
rn    Kirundi         Rundi
sg    Sangro          Sango
si    Singhalese      Sinhalese
ss    Siswati         Swati
su    Sudanese        Sundanese

Codes (el, ia, oc, to) technically also differ from the official descriptions,
but only because we truncate the description - 'Greek' rather than 'Greek, 
Modern (1453-)', for example.

I'm not suggesting that we should change all of the above. In fact, we have 
already decided /not/ to change at least one of them (gl - bug 127946 comment 
7).

I do wonder whether any of the differences are caused because we're using the 
foreign-language name of the language rather than the English-language name 
(like using 'Deutsch' instead of 'German'). See bug 208295 for an example of 
this.

I'm hoping we should be able to classify the remaining differences into one of 
four categories: 1. spelled wrong (will fix), 2. foreign-language name rather 
than English-language name (will fix), 3. wrong for another reason (will fix), 
4. wontfix (for whatever reason).

Alternatively, we could just decide which ones to fix, but this topic seems 
particularly contentious (no surprise), so we should document why we are or
aren't changing things.
One on that list that *does* look wrong to me is Sudanese / Sundanese.  From 
what I can see, 'Sudanese' refers to the people of Sudan, in Africa, and is not 
the name of a language (the Sudanese primarily speak Arabic), while 'Sundanese' 
appears to be the language spoken by the Sundanese in Indonesia.
I thought I had commented about "Sudanese" earlier. It's certainly a typo for
"Sundanese" and should be corrected.
Hi, this bug is important for me. We won't be able to translate Google into
Aragonese until IE or Mozilla include this language (their rules). Thanks.
Mozilla already includes Aragonese (language code 'an'), it's just not visible 
in the dialog by default, though you can still enter it manually.

If you want it to be visible in the dialog, please file a separate bug.
Hi,

I'm the localization contributor for Sorbian. I accidentally found this bug and
noticed that since 2003-09-01 the language code wen that I have used until now
has been changed (and split) into dsb (Lower Sorbian) and hsb (Upper Sorbian).
Also, with Mozilla 1.6b I have a problem: I can't create a profile directly for
Sorbian. I have to switch via
Edit-->Preferences-->Appearance-->Languages/Content. Since Mozilla 1.6b there
is no longer an entry in the file res/languages.properties.
Up to Mozilla 1.6a there was an entry "sb.accept=true". Is this missing line the
reason that I can't create a Sorbian profile directly? Or maybe there is a problem
in my language pack (http://www.sorbzilla.de/lanwende.xpi).
Michael,
'sb' was removed in bug 224546 because I couldn't find any trace of it having
been ever defined in the official ISO 639 site
(http://www.loc.gov/standards/iso639-2/). 'hsb', 'dsb' and 'wen' are defined for
High Sorbian, Low Sorbian and 'Sorbian Languages' in ISO 639-2, but I couldn't
find 'sb' at
http://www.loc.gov/standards/iso639-2/codechanges.html

So, I guess the fix is to add 'wen.accept=true' line unless there are two
separate language packs for High Sorbian and Low Sorbian. Can you file a bug on
that (that is off-topic here) and assign it to me? 
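As a rough illustration of what such an entry controls, here is a Python sketch (a hypothetical helper, not Mozilla's loader) that collects the codes flagged with `.accept=true`, the kind of line discussed above (`sb.accept=true`, `wen.accept=true`). A code without that line simply does not appear in the result:

```python
from typing import Iterable, Set


def accepted_codes(lines: Iterable[str]) -> Set[str]:
    """Collect codes whose '<code>.accept' key is set to 'true'.

    Minimal sketch: assumes one 'key=value' pair per line and skips
    comments and any other keys.
    """
    codes = set()
    for line in lines:
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.endswith(".accept") and value.lower() == "true":
            codes.add(key[: -len(".accept")])
    return codes
```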

Punjabi is now the preferred spelling of Punjabi/Panjabi. Panjabi is
actually the correct transliteration (if you take the inherent vowel as being an
'a'), but because of the way it is pronounced in Punjabi, the vowel used is
actually somewhere in between 'a', 'e' and 'u'. :D Thus, for English speakers,
the letter 'u' is the most appropriate.
see bug 318161 for inclusion of Friulian
Blocks: 318161
Blocks: 307755
Blocks: 341860
*** Bug 353278 has been marked as a duplicate of this bug. ***
> I'm hoping we should be able to classify the remaining differences into one of 
> four categories: 1. spelled wrong (will fix), 2. foreign-language name rather 
> than English-language name (will fix), 3. wrong for another reason (will fix), 
> 4. wontfix (for whatever reason).
> 
> Alternatively, we could just decide which ones to fix, but this topic seems 
> particularly contentious (no surprise), so we should document why we are or
> aren't changing things.

OK, let's have a whack at least at category 1. Where possible I'll use English-language sources from the sites of government agencies or language committees.

I'm also changing the summary and URL to reflect that the IANA registry is now the normative source for language codes (per RFC 4646).
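The registry that RFC 4646 points to is a plain-text "record-jar" file: records separated by `%%` lines, `Name: value` fields, and continuation lines that start with whitespace. A minimal Python sketch (the function name is ours, and it reads only the fields needed here) that extracts the language descriptions so they can be compared with languageNames.properties:

```python
from typing import Dict, List, Tuple


def parse_registry(text: str) -> Dict[str, List[str]]:
    """Map each 'Type: language' subtag to its list of Description values.

    Minimal sketch of the record-jar layout: records are separated by '%%'
    lines, fields are 'Name: value' pairs, and a line starting with
    whitespace continues the previous field. Other record types are ignored.
    """
    records: List[List[Tuple[str, str]]] = [[]]
    for line in text.splitlines():
        if line.strip() == "%%":
            records.append([])  # start a new record
        elif line[:1].isspace() and records[-1]:
            name, value = records[-1][-1]  # fold continuation into last field
            records[-1][-1] = (name, value + " " + line.strip())
        elif ":" in line:
            name, value = line.split(":", 1)
            records[-1].append((name.strip(), value.strip()))

    result: Dict[str, List[str]] = {}
    for fields in records:
        if ("Type", "language") not in fields:
            continue
        subtags = [v for n, v in fields if n == "Subtag"]
        if subtags:
            result[subtags[0]] = [v for n, v in fields if n == "Description"]
    return result
```

Descriptions are kept as a list because a subtag can carry several (for example both "Gaelic" and "Scottish Gaelic" for gd).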

Code  We say     IANA says  Source
fo    Faeroese   Faroese    http://www.fmn.fo/malnevndin/about.htm
ik    Inupiak    Inupiaq    http://www.uaf.edu/anlc/langs/i.html
sg    Sangro     Sango      http://www.ethnologue.com/14/show_iso639.asp?code=sg
su    Sudanese   Sundanese  http://www.ethnologue.com/14/show_iso639.asp?code=su

I think that only sg and su are actually "spelled wrong" in that list. The others are alternative spellings where the IANA spelling seems more normative.
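Difference tables like the one above can be produced mechanically. A minimal sketch, assuming our names and the registry descriptions have already been loaded into dictionaries (hypothetical inputs; the function name is also ours):

```python
from typing import Dict, List, Tuple


def name_differences(ours: Dict[str, str],
                     registry: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """List (code, our name, registry name) where our name is not one of
    the registry's descriptions for that code.

    Codes missing from the registry are skipped; only the first registry
    description is reported. Results are sorted by code.
    """
    diffs = []
    for code in sorted(ours):
        descriptions = registry.get(code)
        if descriptions and ours[code] not in descriptions:
            diffs.append((code, ours[code], descriptions[0]))
    return diffs
```

Accepting any of the registry's descriptions keeps entries such as gd = Gaelic out of the list, so only genuine disagreements remain to be classified by hand.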

For the following, I can't find a source to prefer either of the two alternatives:

Code  We say          IANA says
lo    Laothian        Lao
ps    Pashto          Pushto
rn    Kirundi         Rundi
si    Singhalese      Sinhalese
ss    Siswati         Swati
Summary: Bring languageNames.properties up to date with iso639-2 → Bring languageNames.properties up to date with IANA registry
(In reply to comment #28)
> For the following, I can't find a source to prefer either of the two
> alternatives:

Add to this list:

Code  We say          IANA says
rm    Rhaeto-Romanic  Raeto-Romance

Category 2 (fix):

Code  We say          IANA says
dz    Bhutani         Dzongkha
km    Cambodian       Khmer

These both seem to be the other way round from Persian/Farsi: IANA is using a native name and we are using an English name. In both cases as far as I can tell the native name is used by native speakers when writing in English. See http://www.education.gov.bt/Departments/DDA/DDA.htm and http://www.mot.gov.kh/learn_khmer.asp
Comment on attachment 242623 [details] [diff] [review]
Patch with the changes from the last few comments

r=jshin
sorry for the delay
Attachment #242623 - Flags: review?(jshin1987) → review+
Checked in and closing bug. Future work will be done in bug 356038.
Status: REOPENED → RESOLVED
Closed: 21 years ago → 18 years ago
Resolution: --- → FIXED
Flags: in-testsuite-