Closed
Bug 1433694
Opened 7 years ago
Closed 5 years ago
Create a script to extract language/region names FTL file out of CLDR
Categories
(Core :: Internationalization, enhancement, P3)
RESOLVED
WONTFIX
People
(Reporter: zbraniecki, Assigned: kekoariggin)
I imagine the script following logic similar to these two:
- for timezones: https://searchfox.org/mozilla-central/source/intl/update-tzdata.sh
- for ICU use: https://searchfox.org/mozilla-central/source/intl/update-icu.sh
The script would iterate over a list of locales, read a CLDR file like [0], and produce a file like [1] or [2], but using FTL syntax.
For FTL we would use python-fluent [3].
[0] http://bugs.icu-project.org/trac/browser/trunk/icu4c/source/data/lang/pl.txt
[1] https://searchfox.org/mozilla-central/source/toolkit/locales/en-US/chrome/global/languageNames.properties
[2] https://searchfox.org/mozilla-central/source/toolkit/locales/en-US/chrome/global/regionNames.properties
[3] https://github.com/projectfluent/python-fluent
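The flow described above could be sketched roughly as follows. This is a minimal sketch, not the script this bug asks for. It assumes JSON input shaped like the CLDR locale-names data (`main` → locale → `localeDisplayNames` → `languages`) rather than the ICU `.txt` format in [0], uses a small inline sample instead of real CLDR files, invents the `language-name-` message ID prefix, and writes FTL lines by hand instead of going through python-fluent.

```python
import re

# Hypothetical sample shaped like CLDR JSON locale-names data; a real script
# would load one such document per locale from the CLDR sources.
SAMPLE_CLDR = {
    "pl": {
        "main": {
            "pl": {
                "localeDisplayNames": {
                    "languages": {"de": "niemiecki", "fr": "francuski", "pl": "polski"}
                }
            }
        }
    },
}

# FTL identifiers start with a letter and contain only letters, digits, "_" and "-".
FTL_ID = re.compile(r"^[a-zA-Z][a-zA-Z0-9_-]*$")


def names_to_ftl(names, prefix="language-name-"):
    """Render a {code: display name} mapping as FTL source text."""
    lines = []
    for code in sorted(names):
        message_id = prefix + code
        if not FTL_ID.match(message_id):
            continue  # skip codes that cannot form a valid identifier
        lines.append(f"{message_id} = {names[code]}")
    return "\n".join(lines) + "\n"


def convert_locales(cldr_by_locale):
    """Iterate over locales and return {locale: FTL text}."""
    result = {}
    for locale, document in cldr_by_locale.items():
        names = document["main"][locale]["localeDisplayNames"]["languages"]
        result[locale] = names_to_ftl(names)
    return result


ftl_by_locale = convert_locales(SAMPLE_CLDR)
print(ftl_by_locale["pl"])
```

The sketch does not handle display names that contain FTL syntax characters such as `{`; a real script would need to deal with those.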
Updated • 7 years ago (Reporter)
Priority: -- → P3
Comment 1 • 7 years ago (Reporter)
Answering questions from bug 1431324 comment 5:
> - What are the files that need to be processed and where can I find them?
You can read them from http://bugs.icu-project.org/trac/browser/trunk/icu4c/source/data/lang/ (which uses its own format), but there's also JSON at https://github.com/unicode-cldr, which may make it much easier (read from JSON, write to FTL).
> - What output files are we hoping to have? Am I understanding correctly that we need one file per locale per data file?
Fluent. You can find a parser/ast/serializer in https://github.com/projectfluent/python-fluent
I assume you'll read the source (CLDR format or JSON), produce an AST for Fluent, and then use the serializer to write the file.
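As a rough illustration of the "produce AST, then serialize" step, the sketch below builds a Fluent resource with `fluent.syntax` from python-fluent. The `region-name-` prefix and the sample names are hypothetical, and the plain-string fallback exists only so the sketch still runs where python-fluent is not installed.

```python
try:
    from fluent.syntax import ast, FluentSerializer
except ImportError:  # python-fluent is not installed
    ast = None


def build_ftl(names, prefix="region-name-"):
    """Serialize a {code: name} mapping to FTL, via a Fluent AST when possible."""
    items = sorted(names.items())
    if ast is None:
        # Fallback so the sketch still runs without python-fluent.
        return "".join(f"{prefix}{code} = {name}\n" for code, name in items)
    body = [
        ast.Message(
            id=ast.Identifier(prefix + code),
            value=ast.Pattern([ast.TextElement(name)]),
        )
        for code, name in items
    ]
    return FluentSerializer().serialize(ast.Resource(body))


# Hypothetical sample; real input would come from the CLDR JSON data.
ftl_text = build_ftl({"PL": "Polska", "DE": "Niemcy"})
print(ftl_text)
```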
Assignee: nobody → kekoariggin
Comment 2 • 7 years ago
We should consult more widely about the decision to upstream our choice of region names. Back in bug 1203171, Gerv changed our region names to be more or less what GENC uses.
Gerv, any concerns moving that over to CLDR?
Flags: needinfo?(gerv)
Comment 3 • 7 years ago
Bug 1416148 has a lot more data about the differences we have, for both language and region names. I was planning to NI Gerv there, not sure about the relation these two bugs should have.
Comment 4 • 7 years ago (Reporter)
I think they are correctly scoped. This bug is about writing a script. That bug is about analyzing differences in terminology and scope. They don't depend on one another. Maybe "see also"?
Comment 5 • 7 years ago
We chose GENC for human-readable region names very carefully, after a lot of thought about the pros and cons of each option. It took some months to get that change approved, so there would need to be a very good reason to reopen that discussion. Are you suggesting there is one?
It is not surprising that there are a number of differences between GENC and other sources; if all the sources were the same, there would be no need for a discussion on which to use :-) But those differences are significant, and we have decided that a) making decisions on a per-region-name basis is a really bad idea; and b) the GENC list is the best list, considering all factors.
Does that help?
Gerv
Flags: needinfo?(gerv)
Comment 6 • 7 years ago
Note about languages: the CLDR data is currently not usable for some languages (I've only checked French and Italian).
Mozilla capitalizes language names (e.g. "Italiano" for Italian), while CLDR uses lowercase ("italiano"). The lowercase form works in the middle of a sentence, but not stand-alone (the current use in Mozilla) or at the beginning of a sentence.
That means we'd need not just to import the data, but also to apply a transformation to those language names, and potentially a per-locale rule for the kind of capitalization to apply.
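A per-locale capitalization step of the kind described here might look like the sketch below. The rule table is hypothetical (it is not taken from CLDR or Mozilla data), and `str.upper()` is not locale-aware, so this would be wrong for locales with special casing such as Turkish.

```python
# Hypothetical per-locale rules: which locales need the first letter
# uppercased for stand-alone display. Not taken from CLDR or Mozilla data.
CAPITALIZE_STANDALONE = {"fr": True, "it": True}


def capitalize_first(name):
    """Uppercase only the first character, leaving the rest untouched."""
    return name[:1].upper() + name[1:]


def standalone_name(locale, name):
    """Return the form of `name` to show as a stand-alone label in `locale`."""
    if CAPITALIZE_STANDALONE.get(locale, False):
        return capitalize_first(name)
    return name


print(standalone_name("it", "italiano"))
print(standalone_name("fr", "français"))
```

`capitalize_first` is used instead of `str.capitalize()` because the latter also lowercases the rest of the string, which would damage names that contain other capital letters.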
Comment 7 • 7 years ago (Reporter)
I'd prefer to talk about the unification around CLDR for L10n data in bug 1431324, and leave this bug just for the implementation if we decide to do this. I responded to Gerv in bug 1431324 comment 7.
Comment 8 • 7 years ago
I've found that CLDR includes rules for capitalizing language names in context, e.g.
https://github.com/unicode-cldr/cldr-misc-full/blob/master/main/it/contextTransforms.json
There are casing rules for territories, but I can't find them in the GitHub repository, only in SVN:
https://www.unicode.org/repos/cldr/tags/release-33/common/casing/it.xml
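Applying such context rules could be sketched as below. The nesting of the sample (`main` → locale → `contextTransforms` → usage → context) and its values are assumptions made for illustration, not a verified copy of `contextTransforms.json`, and the `titlecase-firstword` transform is approximated by uppercasing the first character.

```python
# Hypothetical excerpt shaped like a contextTransforms.json document;
# the nesting and values here are assumptions for illustration.
SAMPLE = {
    "main": {
        "it": {
            "contextTransforms": {
                "languages": {"uiListOrMenu": "titlecase-firstword"}
            }
        }
    }
}


def transform_for(document, locale, usage, context):
    """Look up the context transform name, defaulting to "no-change"."""
    transforms = document["main"][locale].get("contextTransforms", {})
    return transforms.get(usage, {}).get(context, "no-change")


def apply_transform(name, transform):
    """Apply a transform; only "titlecase-firstword" changes the text here."""
    if transform == "titlecase-firstword":
        return name[:1].upper() + name[1:]
    return name


rule = transform_for(SAMPLE, "it", "languages", "uiListOrMenu")
print(apply_transform("italiano", rule))
```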
Comment 9 • 5 years ago (Reporter)
This is basically obsoleted by Intl.DisplayNames
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX