[tracking] Verify language and region names in CLDR

Status: NEW, enhancement
Opened 2 years ago, updated 3 months ago

People

(Reporter: flod, Assigned: flod)

Tracking

(Depends on 4 bugs, Blocks 1 bug)

Details

Attachments

(1 attachment, 1 obsolete attachment)

(Assignee)

Description

2 years ago
In mozilla-central we have a list of language names
https://searchfox.org/mozilla-central/source/toolkit/locales/en-US/chrome/global/languageNames.properties

We should:
a) Verify whether we support more language names than CLDR, and request that CLDR add the missing ones
b) Upstream translations where possible
c) Open bugs to discuss divergent translations, and file tickets with CLDR when necessary.
(Assignee)

Comment 1

a year ago
Adding region names to the scope
https://searchfox.org/mozilla-central/source/toolkit/locales/en-US/chrome/global/regionNames.properties
Summary: [tracking] Verify language names in CLDR → [tracking] Verify language and region names in CLDR
(Assignee)

Comment 2

a year ago
@zibi
I'm looking at https://github.com/unicode-cldr/cldr-localenames-full/tree/master/main/en (languages, territories). Is that the right place?

There are two cases:
* Data is available in Mozilla but not in CLDR.
* Values diverge between Mozilla and CLDR.

Note that region names were changed quite recently (bug 1203171).
Flags: needinfo?(gandalf)
(Assignee)

Comment 3

a year ago
Posted file Full differences for en-US (obsolete)
Missing languages in CLDR

bh: Bihari
I can't find this
http://www-01.sil.org/iso639-3/documentation.asp?id=bh
https://www.ethnologue.com/language/bh

son: Songhay
http://www-01.sil.org/iso639-3/documentation.asp?id=son

wen: Sorbian
http://www-01.sil.org/iso639-3/documentation.asp?id=wen

Both son and wen are collective languages; I wonder if that's why they're missing from CLDR.

Missing regions in CLDR:
QM: Midway Islands
QS: Bassas da India
QU: Juan de Nova Island
QW: Wake Island
QX: Glorioso Islands
QZ: Akrotiri
XA: Ashmore and Cartier Islands
XB: Baker Island
XC: Coral Sea Islands
XD: Dhekelia
XE: Europa Island
XG: Gaza Strip
XH: Howland Island
XJ: Jan Mayen
XL: Palmyra Atoll
XM: Kingman Reef
XP: Paracel Islands
XQ: Jarvis Island
XR: Svalbard
XS: Spratly Islands
XT: Tromelin Island
XU: Johnston Atoll
XV: Navassa Island
XW: West Bank

Note about regions
https://wiki.mozilla.org/Lists_of_Countries_and_Regions
https://groups.google.com/forum/#!searchin/mozilla.governance/territory%7Csort:date/mozilla.governance/CZ_He_ul63s/bVu6kktqYDEJ
(Assignee)

Comment 4

a year ago
(In reply to Francesco Lodolo [:flod] from comment #3)
> bh: Bihari
> I can't find this
> http://www-01.sil.org/iso639-3/documentation.asp?id=bh
> https://www.ethnologue.com/language/bh

Weird, it's part of the ISO 639-1 list
http://www.loc.gov/standards/iso639-2/php/code_list.php

http://www-01.sil.org/iso639-3/documentation.asp?id=bih

It's another "collective" locale.
Yeah, that's data up to date with CLDR 32.0
Flags: needinfo?(gandalf)
(Assignee)

Comment 6

a year ago
Posted file data_analysis.txt
Explanation of the file content.

> Missing language names in CLDR

Compares the content of languageNames.properties to
https://github.com/unicode-cldr/cldr-localenames-full/blob/master/main/en/languages.json
and reports language codes that are not available in CLDR.

> Different values

Compare the actual English name for each language between Mozilla and CLDR.

> Missing region names in CLDR

Compares the content of regionNames.properties to
https://github.com/unicode-cldr/cldr-localenames-full/blob/master/main/en/territories.json

> Different values

Compare the actual English name for each region/territory between Mozilla and CLDR.
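
For reference, the "missing" and "different values" comparisons described above could be sketched roughly as follows. This is not the script that produced data_analysis.txt: the function names are made up for illustration, and the sketch assumes that the CLDR JSON nests names under main > en > localeDisplayNames > languages (or territories) and that codes can be matched case-insensitively.

```python
import json


def parse_properties(text):
    """Parse 'key = value' lines from a .properties file, skipping comments."""
    names = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        names[key.strip()] = value.strip()
    return names


def cldr_names(json_text, locale, section):
    """Extract the code -> name mapping from a CLDR JSON file.

    Assumed layout: main -> <locale> -> localeDisplayNames -> <section>,
    where section is "languages" or "territories".
    """
    data = json.loads(json_text)
    return data["main"][locale]["localeDisplayNames"][section]


def compare_names(mozilla, cldr, normalize=str.lower):
    """Return (missing, different) for Mozilla names checked against CLDR.

    missing:   codes present in Mozilla but absent from CLDR
    different: codes present in both, mapped to (mozilla_name, cldr_name)
    """
    # Assumption: codes only differ by case between the two sources.
    cldr_norm = {normalize(code): name for code, name in cldr.items()}
    missing = {}
    different = {}
    for code, name in sorted(mozilla.items()):
        key = normalize(code)
        if key not in cldr_norm:
            missing[code] = name
        elif cldr_norm[key] != name:
            different[code] = (name, cldr_norm[key])
    return missing, different
```

The same compare_names() call would work for both languageNames.properties against languages.json and regionNames.properties against territories.json, changing only the section passed to cldr_names().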

> Locales not supported by CLDR

Compare the current list of hg repositories (not just shipping locales, so it also includes obsolete languages and brand new ones) with the list of languages available in CLDR. To determine the latter, I look at the folders available in
https://github.com/unicode-cldr/cldr-localenames-full/tree/master/main

At first I tried looking at
https://github.com/unicode-cldr/cldr-core/blob/master/supplemental/languageData.json#L8
But it turns out it contains data for locales that are actually not available (e.g. xh).

I failed to find a list of seed locales, beyond getting the list of folders from
https://www.unicode.org/repos/cldr/trunk/seed/main/

If a locale is missing from CLDR but available as seed, it's noted.
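
As a rough illustration of this check, assuming the three lists (hg repositories, folders in CLDR main, folders in CLDR seed) have already been collected by hand or by a separate step. The function name and the wording of the notes are hypothetical, and locale codes are compared by exact match only.

```python
def unsupported_locales(repo_locales, cldr_main, cldr_seed):
    """Return {locale: note} for repository locales with no folder in CLDR main."""
    main = set(cldr_main)
    seed = set(cldr_seed)
    report = {}
    for locale in sorted(set(repo_locales)):
        if locale in main:
            # Fully supported by CLDR, nothing to report.
            continue
        # Missing from main: note whether it at least exists as a seed locale.
        report[locale] = "available as seed" if locale in seed else "not in CLDR"
    return report
```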

> Missing locales in languageNames.properties

Compare the current list of hg repositories (as before) to the locale codes available in languageNames.properties.
This is basically to avoid issues like bug 1263437.

It's OK for that list not to be empty: we have plenty of locales that start and never make it to builds.
Attachment #8941767 - Attachment is obsolete: true
(Assignee)

Updated

a year ago
Depends on: 1434854
(Assignee)

Updated

a year ago
Depends on: 1434886
(Assignee)

Updated

a year ago
Depends on: 1443098
(Assignee)

Updated

a year ago
Depends on: 1443817
(Assignee)

Updated

a year ago
Depends on: 1443818