Closed Bug 716321 Opened 13 years ago Closed 3 years ago

Update existing list of language subtags to reflect more modern usage

Tracking

Status:

RESOLVED INACTIVE

People

(Reporter: GPHemsley, Assigned: GPHemsley)

References

(Blocks 3 open bugs, URL)

Details

(Whiteboard: [bcp47])

Attachments

(1 file, 4 obsolete files)

Update languageNames.properties 13 years ago Gordon P. Hemsley [:GPHemsley] 5.82 KB, patch
Update languageNames.properties (v2) 12 years ago Gordon P. Hemsley [:GPHemsley] 6.25 KB, patch
Update languageNames.properties (v3) 12 years ago Gordon P. Hemsley [:GPHemsley] 6.29 KB, patch
Update regionNames.properties (v1) 12 years ago Gordon P. Hemsley [:GPHemsley] 12.53 KB, patch (Pike: review-)
Update languageNames.properties (v4) 12 years ago Gordon P. Hemsley [:GPHemsley] 9.00 KB, patch

Gordon P. Hemsley [:GPHemsley]

Assignee

Description

13 years ago

I'm spinning this off from bug 666662, because that bug requires a logistical discussion that should not block updating the existing list of language subtags, which is used by the existing language preference interface and by spellcheck extension authors.

There are numerous bugs open requesting various changes to the existing lists, and this should supersede all of those. (In fact, if it doesn't, this bug should be updated.)

The updated list is based on various sources that use language subtags, including Kevin's list of spellcheckers, as well as available localizations of Google, Wikipedia, and mozilla-aurora. (See the URL for the makefile that obtains this data.)

Gordon P. Hemsley [:GPHemsley]

Assignee

Comment 1

13 years ago

Attached patch Update languageNames.properties (obsolete)

This updates the list of language names to the most recent available information.

It sorts the 3-char subtags below the 2-char subtags, which explains some of the apparent deletions.
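The ordering described above can be sketched in Python, the language of the `get_subtags.py` script linked later in this bug. This is a minimal illustration, not that script's actual code: it assumes each entry is a `(subtag, name)` pair and that subtags are sorted alphabetically within each length group. The function name `sort_language_entries` is hypothetical, and the sample names are taken from languages mentioned elsewhere in this bug.

```python
def sort_language_entries(entries):
    """Order (subtag, name) pairs: 2-char subtags first, then 3-char ones.

    Within each length group, subtags are sorted alphabetically.
    """
    return sorted(entries, key=lambda entry: (len(entry[0]), entry[0]))


if __name__ == "__main__":
    # Hypothetical sample drawn from languages named in this bug.
    sample = [
        ("haw", "Hawaiian"),
        ("gd", "Scottish Gaelic"),
        ("csb", "Kashubian"),
        ("ht", "Haitian Creole"),
    ]
    for subtag, name in sort_language_entries(sample):
        print(f"{subtag} = {name}")
```

With the sample data this prints `gd`, `ht`, `csb`, `haw` in that order. A 3-char subtag that used to sit alphabetically among the 2-char ones moves to the bottom, which shows up in a diff as a deletion plus an addition.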

Using bug 399667 as precedent, it also excludes deprecated subtags, though they are sometimes used by the sources. The full list of deprecated subtags is available here: https://github.com/GPHemsley/BCP47/blob/master/languageDeprecated.properties
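A rough sketch of the exclusion step follows. It assumes `languageDeprecated.properties` uses the same `subtag = name` layout as `languageNames.properties`; that is an assumption, since the file's format isn't shown here. `load_properties_keys` and `exclude_deprecated` are hypothetical helpers. The sample uses the deprecated subtags `in` and `iw` (superseded by `id` and `he`) purely as illustrations.

```python
def load_properties_keys(text):
    """Return the set of keys in a simple `key = value` .properties text.

    Blank lines and comment lines (starting with '#' or '!') are skipped.
    Escapes and line continuations are not handled in this sketch.
    """
    keys = set()
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, _value = line.partition("=")
        if sep:
            keys.add(key.strip())
    return keys


def exclude_deprecated(entries, deprecated_keys):
    """Drop (subtag, name) pairs whose subtag appears in deprecated_keys."""
    return [(subtag, name) for subtag, name in entries if subtag not in deprecated_keys]


if __name__ == "__main__":
    # Illustrative stand-in for languageDeprecated.properties.
    deprecated_text = "# deprecated subtags\nin = Indonesian\niw = Hebrew\n"
    entries = [("id", "Indonesian"), ("in", "Indonesian"), ("he", "Hebrew"), ("iw", "Hebrew")]
    print(exclude_deprecated(entries, load_properties_keys(deprecated_text)))
```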

Attachment #586758 - Flags: review?(l10n)

Gordon P. Hemsley [:GPHemsley]

Assignee

Updated

13 years ago

Blocks: 724594

Gordon P. Hemsley [:GPHemsley]

Assignee

Updated

12 years ago

Blocks: 489404

Gordon P. Hemsley [:GPHemsley]

Assignee

Comment 2

12 years ago

Attached patch Update languageNames.properties (v2) (obsolete)

Add some additional languages and remove parentheticals (which included various disambiguators) from language names.

Attachment #586758 - Attachment is obsolete: true

Attachment #586758 - Flags: review?(l10n)

Attachment #610346 - Flags: review?(l10n)

Gordon P. Hemsley [:GPHemsley]

Assignee

Comment 3

12 years ago

Attached patch Update languageNames.properties (v3) (obsolete)

Add a couple more languages that have spellcheckers available.

Attachment #610346 - Attachment is obsolete: true

Attachment #610346 - Flags: review?(l10n)

Attachment #610440 - Flags: review?(l10n)

Gordon P. Hemsley [:GPHemsley]

Assignee

Comment 4

12 years ago

Try server builds and (basic) test results available:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/gphemsley@gmail.com-34db65aca0e5/
https://tbpl.mozilla.org/?tree=Try&rev=34db65aca0e5

Gordon P. Hemsley [:GPHemsley]

Assignee

Updated

12 years ago

Blocks: 741842

Gordon P. Hemsley [:GPHemsley]

Assignee

Comment 5

12 years ago

Attached patch Update regionNames.properties (v1) (obsolete)

This patch updates the list of region subtags to reflect more modern usage (addition of South Sudan, numeric region subtags, etc.). It also enables the test for numeric region subtags.

Assignee: smontagu → gphemsley

Attachment #612048 - Flags: review?(l10n)

Gordon P. Hemsley [:GPHemsley]

Assignee

Updated

12 years ago

Summary: Update existing list of language subtags to reflect more modern usage → Update existing list of language and region subtags to reflect more modern usage

Gordon P. Hemsley [:GPHemsley]

Assignee

Updated

12 years ago

Blocks: 705542

Gordon P. Hemsley [:GPHemsley]

Assignee

Comment 6

12 years ago

Axel, I don't have too much time to devote to this in the next few weeks, but it would be nice to be able to land it before the uplift of 14 to Aurora next week, especially given that the changes from bug 730209 are already present there. What's left to do here?

Axel Hecht [:Pike]

Comment 7

12 years ago

Comment on attachment 610440
Update languageNames.properties (v3)

Review of attachment 610440:
-----------------------------------------------------------------

Canceling the review; I can't r+ or r- this without understanding why this is the dataset.

Attachment #610440 - Flags: review?(l10n)

Axel Hecht [:Pike]

Comment 8

12 years ago

Comment on attachment 612048
Update regionNames.properties (v1)

Review of attachment 612048:
-----------------------------------------------------------------

The regionNames pose the same question: what's the data set, and why?

I found a footnote at http://de.wikipedia.org/wiki/ISO-3166-1-Kodierliste#cite_note-anm1-0 which claims that EA etc. shouldn't be included, for example. I can't find a corresponding note in English, sorry.

Technically, I'd prefer if you didn't change whitespace. If you have to, don't align the '=', but just go consistently for ' = '. r- for the technical nit.

Attachment #612048 - Flags: review?(l10n) → review-

Gordon P. Hemsley [:GPHemsley]

Assignee

Comment 9

12 years ago

(In reply to Axel Hecht [:Pike] from comment #8)
> Comment on attachment 612048
> Update regionNames.properties (v1)
> 
> Review of attachment 612048:
> -----------------------------------------------------------------
> 
> The regionNames pose the same question: what's the data set, and why?
> 
> I found a footnote at
> http://de.wikipedia.org/wiki/ISO-3166-1-Kodierliste#cite_note-anm1-0 which
> claims that EA etc. shouldn't be included, for example. I can't find a
> corresponding note in English, sorry.

Well, first off, I should remind you that we're implementing BCP 47, not ISO 3166. It is up to the curators of the IANA Language Subtag Registry to determine whether a particular ISO 3166 code is appropriate for use in a language tag. They have determined that certain reserved codes are appropriate and certain ones are not. (You'll note, for example, that the reserved code 'UK' is not in this list, because 'GB' is the code that should be used.)

With that being said, this list is generated from the region subtags listed in the IANA Language Subtag Registry, with the deprecated and private use subtags removed. (It is actually debatable whether we want to exclude the deprecated subtags, but we made the decision to do so.)
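The selection rule described above (region records only, minus deprecated and private-use entries) can be sketched as follows. This is a hypothetical reimplementation, not the logic of `get_subtags.py`. It assumes the registry's record-jar layout, in which records are separated by `%%` lines and an indented line continues the previous field. It treats any record with a `Deprecated` field or a `Private use` description as excluded. The sample text is an abridged excerpt with the file header and `Added` dates omitted.

```python
SAMPLE_REGISTRY = """\
Type: region
Subtag: DE
Description: Germany
%%
Type: region
Subtag: DD
Description: German Democratic Republic
Deprecated: 1990-10-30
Preferred-Value: DE
%%
Type: region
Subtag: QM..QZ
Description: Private use
%%
Type: region
Subtag: 419
Description: Latin America and the Caribbean
%%
Type: language
Subtag: haw
Description: Hawaiian
"""


def parse_registry(text):
    """Split registry text into records mapping field name -> list of values."""
    records = []
    for chunk in text.split("\n%%\n"):
        record = {}
        last_field = None
        for line in chunk.splitlines():
            if not line.strip():
                continue
            if line[0].isspace() and last_field:
                # Continuation line: append to the previous field's last value.
                record[last_field][-1] += " " + line.strip()
                continue
            field, _, value = line.partition(":")
            last_field = field.strip()
            record.setdefault(last_field, []).append(value.strip())
        if record:
            records.append(record)
    return records


def usable_region_subtags(records):
    """Map region subtag -> first Description, skipping deprecated and private-use records."""
    regions = {}
    for record in records:
        if record.get("Type") != ["region"]:
            continue
        if "Deprecated" in record:
            continue
        descriptions = record.get("Description", [])
        if "Private use" in descriptions:
            continue
        regions[record["Subtag"][0]] = descriptions[0]
    return regions


if __name__ == "__main__":
    print(usable_region_subtags(parse_registry(SAMPLE_REGISTRY)))
```

On the sample this keeps `DE` and `419` and drops `DD` (deprecated), `QM..QZ` (private use), and the `haw` language record.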

The files involved in generating this patch are here:
https://github.com/GPHemsley/BCP47/blob/master/get_subtags.py
https://github.com/GPHemsley/BCP47/blob/master/region.txt
https://github.com/GPHemsley/BCP47/blob/master/regionNames.properties
https://github.com/GPHemsley/BCP47/blob/master/regionDeprecated.properties
https://github.com/GPHemsley/BCP47/blob/master/makefile#L67
https://github.com/GPHemsley/BCP47/blob/master/regionNames-l10n.properties

> Technically, I'd prefer if you didn't change whitespace. If you have to,
> don't align the '=', but just go consistently for ' = '. r- for the
> technical nit.

Per BCP 47, a region subtag is either 2 letters or 3 digits. As such, I readjusted the whitespace to match the maximum possible length of a region subtag (instead of the seemingly arbitrary width that currently exists in the file).

If you'd like me to change it to a single space on either side, that's fine by me. Just know that the numeric entries won't be aligned with the alphabetic entries.
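Both layouts under discussion, plus the subtag shape they depend on, can be sketched like this. The regular expression follows the BCP 47 region production (2 letters or 3 digits). The helper names and the padding width of 3 are assumptions based on the description above, not the code that produced the patch.

```python
import re

# BCP 47 (RFC 5646) region subtag: exactly 2 letters or exactly 3 digits.
REGION_SUBTAG = re.compile(r"[A-Za-z]{2}|[0-9]{3}")
MAX_REGION_LEN = 3  # a 3-digit code is the longest possible region subtag


def is_region_subtag(subtag):
    return REGION_SUBTAG.fullmatch(subtag) is not None


def format_aligned(subtag, name):
    """Pad the key to the longest possible region subtag so every '=' lines up."""
    return f"{subtag.ljust(MAX_REGION_LEN)} = {name}"


def format_single_space(subtag, name):
    """Use exactly ' = ' regardless of key length (the reviewer's preference)."""
    return f"{subtag} = {name}"


if __name__ == "__main__":
    for subtag, name in [("ss", "South Sudan"), ("419", "Latin America and the Caribbean")]:
        assert is_region_subtag(subtag)
        print(format_aligned(subtag, name))
        print(format_single_space(subtag, name))
```

With `format_aligned`, `ss` becomes `ss  = South Sudan`, so its `=` sits in the same column as the one in `419 = ...`. With `format_single_space`, it becomes `ss = South Sudan`, one column to the left of the numeric entries. That one-column offset is the trade-off described above.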

Gordon P. Hemsley [:GPHemsley]

Assignee

Comment 10

12 years ago

(In reply to Gordon P. Hemsley [:gphemsley] from comment #9)
> With that being said, this list is generated from the region subtags listed
> in the IANA Language Subtag Registry, with the deprecated and private use
> subtags removed. (It is actually debatable whether we want to exclude the
> deprecated subtags, but we made the decision to do so.)

I should also note that some of the English names have been overridden from the names that are listed in the registry.

The regions in question are here:
https://github.com/GPHemsley/BCP47/blob/master/get_subtags.py#L213

My original justification for these choices is here:
http://groups.google.com/group/mozilla.dev.l10n/browse_thread/thread/97d2dddb8db97248/1231aceeaf2cfc06

(Note: Some of the "renames" I justify in that thread merely involve reverting to the name used in the registry. get_subtags.py lists the manual overrides relative to the registry, not to the existing names in the Mozilla source.)
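The override mechanism described above amounts to layering a small manual table over the registry's names. What follows is a minimal sketch. `apply_name_overrides` is a hypothetical helper, and the placeholder data below is not taken from `get_subtags.py`. Rejecting overrides for subtags that aren't in the registry is a design choice made here to catch typos; the original script is not known to do this.

```python
def apply_name_overrides(registry_names, overrides):
    """Return registry_names with manual English-name overrides applied.

    Overrides are keyed relative to the registry's names, not to the names
    already in the Mozilla source. Unknown subtags raise KeyError so that a
    typo in the override table fails loudly instead of adding a new entry.
    """
    unknown = sorted(set(overrides) - set(registry_names))
    if unknown:
        raise KeyError(f"overrides for subtags not in the registry: {unknown}")
    names = dict(registry_names)
    names.update(overrides)
    return names


if __name__ == "__main__":
    # Placeholder data only; these are not the actual overrides in get_subtags.py.
    registry = {"aa": "Registry Name A", "bb": "Registry Name B"}
    overrides = {"bb": "Preferred Name B"}
    print(apply_name_overrides(registry, overrides))
```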

Gordon P. Hemsley [:GPHemsley]

Assignee

Comment 11

12 years ago

Returning the discussion about region subtags to bug 705542. Axel, please respond there.

No longer blocks: 705542

Summary: Update existing list of language and region subtags to reflect more modern usage → Update existing list of language subtags to reflect more modern usage

Gordon P. Hemsley [:GPHemsley]

Assignee

Updated

12 years ago

Attachment #612048 - Attachment is obsolete: true

Gordon P. Hemsley [:GPHemsley]

Assignee

Updated

12 years ago

Blocks: 709930

Gordon P. Hemsley [:GPHemsley]

Assignee

Updated

12 years ago

No longer blocks: 741842

Kevin Scannell

Comment 12

12 years ago

(In reply to Axel Hecht [:Pike] from comment #7)
> Comment on attachment 610440
> Update languageNames.properties (v3)
> 
> Review of attachment 610440:
> -----------------------------------------------------------------
> 
> Canceling the review, I can't r+ or r- this without understanding why this
> is the dataset.

Even though Gordon is doing all the hard work on this stuff, I thought I'd jump in and at least explain the choice of languages in the patch.

The list consists of:
(1) currently localized language names
(2) languages for which there are existing Mozilla l10n efforts (== landed on mozilla-aurora)
(3) languages for which there are existing open source spell checkers
(4) languages with Wikipedias
(5) languages for which Google's search interface is localized

My original, conservative proposal was (1)-(3). (3) is personally important to me, since I've worked on several spell checkers for languages not on the list, and the experience for users of these add-ons is broken. In particular, AMO reviewers have been unwilling to grant full reviews to these add-ons.

We added (4) and (5) based on suggestions on the dev-l10n list; here's that thread:

https://groups.google.com/forum/?fromgroups#!msg/mozilla.dev.l10n/L4KF6mNTwRA/m7EQML0FlkUJ

Were we to go even bigger, the next natural set of languages to include would be the ones in CLDR, but that's another ~300 to add, and we decided that would be an unnecessary burden on localizers.

Hope this helps!

Gordon P. Hemsley [:GPHemsley]

Assignee

Comment 13

12 years ago

Attached patch Update languageNames.properties (v4)

Rebase patch and include additional languages.

Attachment #610440 - Attachment is obsolete: true

Gordon P. Hemsley [:GPHemsley]

Assignee

Updated

12 years ago

Status: NEW → ASSIGNED

Gordon P. Hemsley [:GPHemsley]

Assignee

Updated

12 years ago

No longer blocks: 489404

Gordon P. Hemsley [:GPHemsley]

Assignee

Comment 14

12 years ago

To expedite the process for existing bugs on file, I've created individual patches for the following bugs:

* Bug 535422: Add support for Lower Sorbian [dsb].
* Bug 586085: Add support for Kashubian [csb], Hawaiian [haw], and Hiligaynon [hil]. 
* Bug 531849: Rename "Haitian" to "Haitian Creole" [ht].
* Bug 724594: Rename "Scots Gaelic" to "Scottish Gaelic" [gd].

I've also filed bug 788178 to remove a bunch of trailing whitespace from language.properties (the file that dictates what is displayed in the Languages preferences dialog list).

For these patches to apply the most cleanly, they should be applied in the order they were in my patch queue: whitespace patch first, then the rest of the bugs in the order they appear above (which, incidentally, is based on how long they've been on file, prioritizing new additions over renamings).

No longer blocks: 709930

Gordon P. Hemsley [:GPHemsley]

Assignee

Updated

12 years ago

No longer blocks: 586085

Gordon P. Hemsley [:GPHemsley]

Assignee

Updated

12 years ago

No longer blocks: 535422, 724594

Gordon P. Hemsley [:GPHemsley]

Assignee

Updated

11 years ago

Blocks: 829658

Alexander L. Slovesnik

Updated

11 years ago

Blocks: 835074

Dan Minor [:dminor]

Updated

3 years ago

Status: ASSIGNED → RESOLVED

Closed: 3 years ago

Resolution: --- → INACTIVE