178491 - Bring languageNames.properties up to date with IANA registry

Assignee

Description

•

22 years ago

This is split off from bug 167908.

Simon Montagu :smontagu

Assignee

Comment 1

•

22 years ago

Attached patch: Patch to languageNames.properties (obsolete)

Patch by Malcolm Rowe (bugzilla2@farside.demon.co.uk). Comments copied from bug
167908 comment 2:

This patch updates languageNames.properties to be consistent with the latest
updates to ISO 639-2, and also places the file back into language-code order.

To enable Frisian in the dialog, we will also have to add an entry to
intl/locale/src/language.properties, once we know what country/countries it
should be placed in.

The new file is from ISO 639-2, taken from
1. http://www.loc.gov/standards/iso639-2/langcodes.html, plus
2. The addition of the two extra codes (ast, x-kok) which were already in
place, and the renaming of Greek, Modern to Greek (as was already done), plus
3. The re-addition of the following deprecated codes:
   in = Indonesian (deprecated 1989 in favour of id)
   ji = Yiddish (deprecated 1989 in favour of yi)
   sh = Serbo-Croatian (deprecated 2000)
(see http://www.loc.gov/standards/iso639-2/codechanges.html)

Changes from our current languageNames.properties:
0. In language-code order.
1. Many spelling/name changes, notably the following names:
   Bhutani -> Dzongkha
   Farsi -> Persian
   Scots Gaelic -> Gaelic
   Cambodian -> Khmer
   Greenlandic -> Kalaallisut
2. Addition of many codes, including Frisian.
3. Javanese changed from jw to jv - a known erratum in ISO 639:1988.
4. Removal of:
   sb = Sorbian
   sx = Sutu

I can find no reference to sb or sx ever being valid ISO 639 codes.
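
To illustrate the two mechanical parts of the patch (code-order sorting and the handling of deprecated codes), here is a minimal Python sketch. It is not taken from the patch itself: the helper names `canonical_code` and `sort_by_code` are hypothetical, and the only data used is the list of deprecated codes quoted above.

```python
# Deprecated codes and their replacements, as listed in the comment above.
# "sh" has no single replacement named there, so it maps to None.
DEPRECATED = {"in": "id", "ji": "yi", "sh": None}


def canonical_code(code):
    """Return the current code for a possibly deprecated one (hypothetical helper)."""
    code = code.lower()
    replacement = DEPRECATED.get(code)
    return replacement if replacement else code


def sort_by_code(entries):
    """Return (code, name) pairs in language-code order."""
    return sorted(entries.items())
```

Keeping the deprecated codes in a separate table like this is only one option; the patch described above instead re-adds them as ordinary entries in the file.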

Simon Montagu :smontagu

Assignee

Updated

•

22 years ago

Blocks: 167908

Yuying Long

Updated

•

22 years ago

Keywords: intl

Katsuhiko Momoi

Comment 2

•

22 years ago

>  sb = Sorbian
>  sx = Sutu

These two are used in Microsoft products:

http://msdn.microsoft.com/workshop/author/dhtml/reference/language_codes.asp

It seems that at the time when MS adopted these, there were neither ISO-639-1
abbreviations nor ISO-639-2 ones. Sorbian (Upper & Lower) now
has a 3-letter code (wen). Sutu still does not have any representation
in ISO-639-1/2. Sutu is a variant name for one of the Southern Sotho
languages:

http://www.ethnologue.com/show_language.asp?code=SSO

Apparently MS thought it important to use this 2-letter 
abbreviation for Sutu until it is established. There are
a few precedents of Netscape doing something similar before.

Katsuhiko Momoi

Comment 3

•

22 years ago

Question: Is this a proposal to add to the current visible
          list (through the UI dialog) **all** the ISO-639-1/2
          languages?
          Or are we completing the list but only turning on the
          flag for the ones which are needed?

We have been taking the latter approach up to now because showing the entire
list would make the list too long.

Simon Montagu :smontagu

Assignee

Comment 4

•

22 years ago

I agree that we should continue with the latter approach. The patch does not
include any new three letter language codes, but includes all two letter codes
not in the current list. Adding all the three letter codes with no two letter
equivalent would make the list much longer, and I suggest we continue only
adding them when someone specifically requests them.

Simon Montagu :smontagu

Assignee

Comment 5

•

22 years ago

My comments on the patch:

There is now one more new two letter code in
http://www.loc.gov/standards/iso639-2/codechanges.html: ii Sichuan Yi

The name for "ho" should be Hiri Motu.

Why do we want to retain the deprecated codes "sh" "ji" and "in"?

Konkani has a standard code "kok", which we should probably use instead of "x-kok".

We should try to investigate whether the name changes are acceptable in the
field. As Malcolm points out in the original bug, we have already rejected the
change from "Galician" to "Gallegan", see bug 127946 comment 7.

Katsuhiko Momoi

Comment 6

•

22 years ago

We can add back: 

wen = sorbian

if we are willing to start adding 3-letter codes from ISO-639-2.
Currently I don't see any use of 3-letter codes, but our code for
handling accept-language headers is designed to take these as well,
and so it should present no problem in that regard.

As for sutu, we can split the original reference into:

nso = Sotho, Northern 
st = Sotho, Southern

each having more than 3.5 million speakers in South Africa.
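
As a rough illustration of why 3-letter codes such as nso or wen need no special treatment in an Accept-Language header, here is a small Python sketch. It is not Mozilla's implementation: the function name `parse_accept_language` and the simplified tag pattern are assumptions made for this example only.

```python
import re

# Primary language subtag of 2 or 3 letters (e.g. "st", "nso"),
# optionally followed by further subtags such as "-US".
_TAG = re.compile(r"^[A-Za-z]{2,3}(-[A-Za-z0-9]{1,8})*$")


def parse_accept_language(header):
    """Return the language tags of an Accept-Language value, highest q first."""
    parsed = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0]
        if not _TAG.match(tag):
            continue  # this sketch skips "*" and malformed tags
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        # Sort by descending q, then by original position in the header.
        parsed.append((-q, position, tag.lower()))
    return [tag for _, _, tag in sorted(parsed)]
```

The pattern accepts 2-letter and 3-letter primary subtags alike, so "nso" and "st" go through the same path.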

Simon Montagu :smontagu

Assignee

Comment 7

•

22 years ago

I have posted in the netscape.public.mozilla.i18n newsgroup requesting feedback
on the name changes. I have already received a comment offlist that Punjabi is
correct, not Panjabi.

The names in the standard derive from the Library of Congress Subject Headings,
and we should not expect our needs and priorities to be identical with those of
the Library of Congress.

Summary: Bring language.properties up to date with iso639-2 → Bring languageNames.properties up to date with iso639-2

Katsuhiko Momoi

Comment 8

•

22 years ago

With regard to comment #7, very often, English names for languages 
have variants. Both Panjabi and Punjabi are known variant names for 
the same language. In our list, we normally list only one name and 
that should be the "preferred" name. The United Nations, for example,
recognizes both names:

http://www.unhchr.ch/udhr/navigate/alpha.htm#P

(Was the person who wrote to smontagu a Pakistani or Indian? That could
also be a factor in the preferred name.)
I don't mind not changing this name to Panjabi given this state of
affairs -- it was there in the code before and we may not change
it unless there is a compelling reason to.
 
BTW, I don't believe ISO-639-1/2 lang names are based on LofC 
names. They are based on submissions from requesters with reasons
provided for preferring one name over others if variants are
submitted.

Status: NEW → ASSIGNED

Simon Montagu :smontagu

Assignee

Comment 9

•

22 years ago

My correspondent was from India, and "Punjabi" does seem to be the
transliteration used by Punjabis. The official government sites of the Punjab in
India and Pakistan are http://www.punjab.gov.in/ and http://www.punjab.gov.pk/

Simon Montagu :smontagu

Assignee

Comment 10

•

21 years ago

*** Bug 209591 has been marked as a duplicate of this bug. ***

Malcolm Rowe

Comment 11

•

21 years ago

Attached patch: Update languageNames.properties (non-controversial changes only) (obsolete)

Ok, here's another attempt at a patch. This is pretty much the same idea as
described in comment 1, except that where we had an existing description, I've
kept it (with two exceptions, see below).

The changes from our current version are as follows:
1. It's sorted in code order (this just makes it easier to compare with the
official list).
2. I've added all the missing codes.
3. Removals:
     in (Indonesian): deprecated 1989 in favour of id
     ji (Yiddish): deprecated 1989 in favour of yi
     sh (Serbo-Croatian): deprecated 2000
4. Code changes:
     jw (Javanese) to jv: known erratum in ISO 639:1988
     sb (Sorbian) to wen: non-standard, now using correct ISO 639-2 code.
     x-kok (Konkani) to kok: x-code, now using correct ISO 639-2 code.
5. Sotho/Sutu:
     From: st (Sesotho) and sx [not standard] (Sutu)
     To: st (Sotho, Southern), and nso (Sotho, Northern).
     (see comment 2, comment 6).
6. Description change:
     vo (Volapuk) changed to vo (Volap\u00fck)
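
For readers who want to see the removals, code changes and the \u00fc escape in concrete terms, here is a minimal Python sketch under stated assumptions. The function names are hypothetical, the tables only repeat what is listed above, and the Sotho/Sutu split is left out because it is not a one-to-one rename.

```python
# Code changes and removals as listed in the comment above.
RENAMED_CODES = {"jw": "jv", "sb": "wen", "x-kok": "kok"}
REMOVED_CODES = {"in", "ji", "sh"}


def update_codes(names):
    """Return a new code -> name dict with removals and renames applied, in code order."""
    updated = {}
    for code, name in names.items():
        if code in REMOVED_CODES:
            continue
        updated[RENAMED_CODES.get(code, code)] = name
    return dict(sorted(updated.items()))


def decode_properties_value(value):
    """Decode \\uXXXX escapes in a .properties value (ASCII-only input assumed)."""
    return value.encode("ascii").decode("unicode_escape")
```

The second helper shows what the escaped description in the last item stands for: the six characters \u00fc in the file represent the single letter ü.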

Attachment #105202 - Attachment is obsolete: true

Malcolm Rowe

Updated

•

21 years ago

Attachment #126544 - Flags: review?(smontagu)

Simon Montagu :smontagu

Assignee

Comment 12

•

21 years ago

Comment on attachment 126544
Update languageNames.properties (non-controversial changes only)

Please add back the name change from Farsi to Persian (bug 204767 comment 2).

With that, r=smontagu.

Simon Montagu :smontagu

Assignee

Updated

•

21 years ago

Blocks: 204767

Malcolm Rowe

Comment 13

•

21 years ago

Attached patch: v3 Update languageNames.properties (non-controversial changes only)

As for attachment 126544, but includes the name change from Farsi to Persian.

Malcolm Rowe

Updated

•

21 years ago

Attachment #126544 - Attachment is obsolete: true

Malcolm Rowe

Updated

•

21 years ago

Attachment #126587 - Flags: superreview?(alecf)

Attachment #126587 - Flags: review?(smontagu)

Simon Montagu :smontagu

Assignee

Comment 14

•

21 years ago

Comment on attachment 126587
v3 Update languageNames.properties (non-controversial changes only)

r=smontagu

Attachment #126587 - Flags: review?(smontagu) → review+

Simon Montagu :smontagu

Assignee

Updated

•

21 years ago

Attachment #126544 - Flags: review?(smontagu)

Alec Flett

Comment 15

•

21 years ago

Comment on attachment 126587
v3 Update languageNames.properties (non-controversial changes only)

sr=alecf

Attachment #126587 - Flags: superreview?(alecf) → superreview+

Christian :Biesinger (don't email me, ping me on IRC)

Comment 16

•

21 years ago

Checking in xpfe/global/resources/locale/en-US/languageNames.properties;
/cvsroot/mozilla/xpfe/global/resources/locale/en-US/languageNames.properties,v
<--  languageNames.properties
new revision: 1.11; previous revision: 1.10
done

Status: ASSIGNED → RESOLVED

Closed: 21 years ago

Resolution: --- → FIXED

Malcolm Rowe

Comment 17

•

21 years ago

biesi checked in the 'non-controversial' patch, but I'd like to reopen this to 
document the differences between the current version and the standard, so that 
we can decide what else (if anything) we'd like to change.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Malcolm Rowe

Comment 18

•

21 years ago

Ok, here's a list of the remaining differences.

Code  We say          ISO639-2 says
dz    Bhutani         Dzongkha
fo    Faeroese        Faroese
gd    Scots Gaelic    Gaelic /or/ Scottish Gaelic
gl    Galician        Gallegan                     [wontfix, see below]
ik    Inupiak         Inupiaq
km    Cambodian       Khmer
lo    Laothian        Lao
pa    Punjabi         Panjabi
ps    Pashto          Pushto
rm    Rhaeto-Romanic  Raeto-Romance
rn    Kirundi         Rundi
sg    Sangro          Sango
si    Singhalese      Sinhalese
ss    Siswati         Swati
su    Sudanese        Sundanese

Codes (el, ia, oc, to) technically also differ from the official descriptions, 
but only because we truncate the description - 'Greek' rather than 'Greek, 
Modern (1453-)', for example.

I'm not suggesting that we should change all of the above. In fact, we have 
already decided /not/ to change at least one of them (gl - bug 127946 comment 
7).

I do wonder whether any of the differences are caused because we're using the 
foreign-language name of the language rather than the English-language name 
(like using 'Deutsch' instead of 'German'). See bug 208295 for an example of 
this.

I'm hoping we should be able to classify the remaining differences into one of 
four categories: 1. spelled wrong (will fix), 2. foreign-language name rather 
than English-language name (will fix), 3. wrong for another reason (will fix), 
4. wontfix (for whatever reason).

Alternatively, we could just decide which ones to fix, but this topic seems 
particularly contentious (no surprise), so we should document why we are or 
aren't changing things.
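
A list like the one above can be produced mechanically. The following Python sketch shows one way to do it, assuming both name tables are already available as code -> name dictionaries; the function name and the tiny sample data are illustrative only, with the names taken from the table above.

```python
def name_differences(ours, standard, wontfix=()):
    """List (code, our_name, standard_name) where the names differ, in code order."""
    rows = []
    for code in sorted(ours.keys() & standard.keys()):
        if code in wontfix:
            continue  # differences we have decided not to change
        if ours[code] != standard[code]:
            rows.append((code, ours[code], standard[code]))
    return rows


ours = {"dz": "Bhutani", "gl": "Galician", "su": "Sudanese", "fr": "French"}
standard = {"dz": "Dzongkha", "gl": "Gallegan", "su": "Sundanese", "fr": "French"}

for code, we_say, iso_says in name_differences(ours, standard, wontfix={"gl"}):
    print(f"{code:<6}{we_say:<16}{iso_says}")
```

Passing the wontfix codes explicitly keeps the decision visible in one place, which fits the goal of documenting why a difference is or is not being changed.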

Malcolm Rowe

Comment 19

•

21 years ago

One on that list that *does* look wrong to me is Sudanese / Sundanese.  From 
what I can see, 'Sudanese' refers to the people of Sudan, in Africa, and is not 
the name of a language (the Sudanese primarily speak Arabic), while 'Sundanese' 
appears to be the language spoken by the Sundanese in Indonesia.

Simon Montagu :smontagu

Assignee

Comment 20

•

21 years ago

I thought I had commented about "Sudanese" earlier. It's certainly a typo for
"Sundanese" and should be corrected.

panxut

Comment 21

•

21 years ago

Hi, this bug is important for me. We won't be able to translate Google into 
Aragonese until IE or Mozilla includes this language (their rules). Thx.

Malcolm Rowe

Comment 22

•

21 years ago

Mozilla already includes Aragonese (language code 'an'), it's just not visible 
in the dialog by default, though you can still enter it manually.

If you want it to be visible in the dialog, please file a separate bug.

Michael Wolf

Comment 23

•

21 years ago

Hi,

I'm the localization contributor for Sorbian. I accidentally found this bug and
noticed that since 2003-09-01 the language code wen that I had used until now has
been changed (and split) into dsb (for Lower Sorbian) and hsb (Upper Sorbian). And
with Mozilla 1.6b I've got a problem. I can't create a profile directly for
Sorbian. I have to switch via
Edit-->Preferences-->Appearance-->Languages/Content. Since Mozilla 1.6b there
is no longer an entry in the file res/languages.properties.
Up to Mozilla 1.6a there was an entry "sb.accept=true". Is this missing line the
reason that I can't create a Sorbian profile directly? Maybe there is a problem
in my language pack (http://www.sorbzilla.de/lanwende.xpi).

Jungshik Shin

Comment 24

•

21 years ago

Michael,
'sb' was removed in bug 224546 because I couldn't find any trace of it having
been ever defined in the official ISO 639 site
(http://www.loc.gov/standards/iso639-2/). 'hsb', 'dsb' and 'wen' are defined for
High Sorbian, Low Sorbian and 'Sorbian Languages' in ISO 639-2, but I couldn't
find 'sb' at
http://www.loc.gov/standards/iso639-2/codechanges.html

So, I guess the fix is to add 'wen.accept=true' line unless there are two
separate language packs for High Sorbian and Low Sorbian. Can you file a bug on
that (that is off-topic here) and assign it to me?
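
To make the role of the 'xx.accept=true' lines concrete, here is a small Python sketch of how such flags could decide which names are shown. It is an assumption-laden illustration, not the actual Mozilla code: the simple key=value parsing and both function names are hypothetical, and only the '.accept=true' key shape comes from this thread.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        entries[key.strip()] = value.strip()
    return entries


def visible_languages(names, flags):
    """Return code -> name for codes whose '<code>.accept' flag is 'true'."""
    return {
        code: name
        for code, name in sorted(names.items())
        if flags.get(code + ".accept", "").lower() == "true"
    }
```

With names for both an and wen but only a wen.accept=true flag, just the wen entry would be visible, while an stays known but hidden.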

Sukh

Comment 25

•

20 years ago

Punjabi is now the preferred method of writing Punjabi/Panjabi.  Panjabi is
actually the correct transliteration (if you take the inherent vowel as being an
'a') but because of the way it is pronounced in Punjabi, the vowel used is
actually somewhere in between 'a', 'e' and 'u'. :D  Thus, for English speakers,
the letter 'u' is the most appropriate.

Jo Hermans

Comment 26

•

19 years ago

see bug 318161 for inclusion of Friulian

Blocks: 318161

Boris Zbarsky [:bzbarsky]

Updated

•

19 years ago

Blocks: 307755

Jo Hermans

Updated

•

18 years ago

Blocks: 341860

Jo Hermans

Comment 27

•

18 years ago

*** Bug 353278 has been marked as a duplicate of this bug. ***

Simon Montagu :smontagu

Assignee

Comment 28

•

18 years ago

> I'm hoping we should be able to classify the remaining differences into one of 
> four categories: 1. spelled wrong (will fix), 2. foreign-language name rather 
> than English-language name (will fix), 3. wrong for another reason (will fix), 
> 4. wontfix (for whatever reason).
> 
> Alternatively, we could just decide which ones to fix, but this topic seems 
> particularly contentious (no surprise), so we should document why we are or 
> aren't changing things.

OK, let's have a whack at least at category 1. Where possible I'll use English-language sources from the sites of government agencies or language committees.

I'm also changing the summary and URL to reflect that the IANA registry is now the normative source for language codes (per RFC 4646).

Code  We say     IANA says  Source
fo    Faeroese   Faroese    http://www.fmn.fo/malnevndin/about.htm
ik    Inupiak    Inupiaq    http://www.uaf.edu/anlc/langs/i.html
sg    Sangro     Sango      http://www.ethnologue.com/14/show_iso639.asp?code=sg
su    Sudanese   Sundanese  http://www.ethnologue.com/14/show_iso639.asp?code=su

I think that only sg and su are actually "spelled wrong" in that list. The others are alternative spellings where the IANA spelling seems more normative.

For the following, I can't find a source to prefer either of the two alternatives:

Code  We say          IANA says
lo    Laothian        Lao
ps    Pashto          Pushto
rn    Kirundi         Rundi
si    Singhalese      Sinhalese
ss    Siswati         Swati

URL: http://www.loc.gov/standards/iso639-2... → http://www.iana.org/assignments/langu...

Summary: Bring languageNames.properties up to date with iso639-2 → Bring languageNames.properties up to date with IANA registry

Simon Montagu :smontagu

Assignee

Comment 29

•

18 years ago

(In reply to comment #28)
> For the following, I can't find a source to prefer either of the two
> alternatives:

Add to this list:

Code  We say          IANA says
rm    Rhaeto-Romanic  Raeto-Romance

Category 2 (fix):

Code  We say          IANA says
dz    Bhutani         Dzongkha
km    Cambodian       Khmer

These both seem to be the other way round from Persian/Farsi: IANA is using a native name and we are using an English name. In both cases as far as I can tell the native name is used by native speakers when writing in English. See http://www.education.gov.bt/Departments/DDA/DDA.htm and http://www.mot.gov.kh/learn_khmer.asp

Simon Montagu :smontagu

Assignee

Comment 30

•

18 years ago

Attached patch: Patch with the changes from the last few comments

Attachment #242623 - Flags: review?(jshin1987)

Jungshik Shin

Comment 31

•

18 years ago

Comment on attachment 242623
Patch with the changes from the last few comments

r=jshin
sorry for the delay

Attachment #242623 - Flags: review?(jshin1987) → review+

Simon Montagu :smontagu

Assignee

Comment 32

•

18 years ago

Checked in and closing bug. Future work will be done in bug 356038.

Status: REOPENED → RESOLVED

Closed: 21 years ago → 18 years ago

Resolution: --- → FIXED

Benjamin Smedberg

Updated

•

18 years ago

Flags: in-testsuite-

Patch to languageNames.properties, 22 years ago, Simon Montagu :smontagu, 3.35 KB, patch
Update languageNames.properties (non-controversial changes only), 21 years ago, Malcolm Rowe, 3.21 KB, patch
v3 Update languageNames.properties (non-controversial changes only), 21 years ago, Malcolm Rowe, 3.39 KB, patch, smontagu: review+, alecf: superreview+
Patch with the changes from the last few comments, 18 years ago, Simon Montagu :smontagu, 1.30 KB, patch, jshin1987: review+