1433694 - Create a script to extract language/region names FTL file out of CLDR

Answering questions from bug 1431324 comment 5:

> - What are the files that need to be processed and where can I find them?

You can read them from: http://bugs.icu-project.org/trac/browser/trunk/icu4c/source/data/lang/ (which uses its own format), but there's also a JSON version at https://github.com/unicode-cldr that may make it much easier (read from JSON, write to FTL).

> - What output files are we hoping to have? Am I understanding correctly that we need one file per locale per data file?

Fluent. You can find a parser/AST/serializer in https://github.com/projectfluent/python-fluent

I assume you'll read the source (CLDR format or JSON), produce an AST for Fluent, and then use the serializer to write the file.
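
For illustration only, here is a minimal sketch of the "read from JSON, write to FTL" flow. It uses plain string formatting from the Python standard library rather than the python-fluent AST and serializer, so it does not handle escaping of special characters in values. The sample data is modeled on the layout of the CLDR JSON `languages.json` files (an assumption, with illustrative values), and the `language-name-` prefix and the function name are hypothetical.

```python
import json

# Sample shaped like a CLDR JSON languages.json file (assumed layout,
# illustrative values; a real script would load the file from disk).
SAMPLE = json.loads("""
{"main": {"it": {"localeDisplayNames": {"languages": {
  "fr": "francese",
  "it": "italiano",
  "zh-Hant": "cinese tradizionale"
}}}}}
""")


def cldr_names_to_ftl(data, locale, section="languages",
                      prefix="language-name-"):
    """Return FTL text with one message per code in the given section.

    The message id prefix is a hypothetical naming scheme.
    """
    names = data["main"][locale]["localeDisplayNames"][section]
    lines = []
    for code in sorted(names):
        # Skip alternate forms such as "xx-alt-short"; handling them
        # is out of scope for this sketch.
        if "-alt-" in code:
            continue
        lines.append(f"{prefix}{code.lower()} = {names[code]}")
    return "\n".join(lines) + "\n"


FTL_TEXT = cldr_names_to_ftl(SAMPLE, "it")
print(FTL_TEXT)
```

The same function could be pointed at region names by passing `section="territories"` and a different prefix, assuming the territories file follows the same layout.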

Assignee: nobody → kekoariggin

Axel Hecht [:Pike]

Comment 2

7 years ago

We should cast a wider net around the decision to take our region names from upstream. Back in bug 1203171, Gerv changed our region names to be more or less what GENC uses. Gerv, any concerns about moving that over to CLDR?

Flags: needinfo?(gerv)

Francesco Lodolo [:flod]

Comment 3

7 years ago

Bug 1416148 has a lot more data about the differences we have, for both language and region names. I was planning to NI Gerv there, not sure about the relation these two bugs should have.

Zibi Braniecki [:zbraniecki][:gandalf]

Reporter

Comment 4

7 years ago

I think they are correctly scoped. This bug is about writing a script. That bug is about analyzing differences in terminology and scope. They don't depend on one another. Maybe "See Also"?

Gervase Markham [:gerv]

Comment 5

7 years ago

We chose GENC for human-readable region names very carefully, after a lot of thought about the pros and cons of each option. It took some months to get that change approved, and so there would need to be a very good reason to reopen that discussion. Are you suggesting there is one?

It is not surprising that there are a number of differences between GENC and other sources; if all the sources were the same, there would be no need for a discussion on which to use :-) But those differences are significant, and we have decided that a) making decisions on a per-region-name basis is a really bad idea; and b) the GENC list is the best list, considering all factors.

Does that help?

Gerv

Flags: needinfo?(gerv)

Francesco Lodolo [:flod]

Comment 6

7 years ago

Note about languages: the current CLDR data is not usable as-is for some languages (and I've only checked French and Italian). Mozilla capitalizes language names (e.g. "Italiano" for Italian), while CLDR uses lowercase ("italiano"). The lowercase form works in the middle of a sentence, but not stand-alone (the current use in Mozilla) or at the beginning of a sentence. That means we'd need not just to import the data, but also to apply a transformation to those language names, and potentially a per-locale rule for the kind of capitalization to apply.

Zibi Braniecki [:zbraniecki][:gandalf]

Reporter

Comment 7

7 years ago

I'd prefer to talk about the unification around CLDR for L10n data in bug 1431324, and leave this bug just for the implementation if we decide to do this. I responded to Gerv in bug 1431324 comment 7.

Francesco Lodolo [:flod]

Comment 8

7 years ago

I've found that CLDR includes rules for capitalizing language names in context, e.g. https://github.com/unicode-cldr/cldr-misc-full/blob/master/main/it/contextTransforms.json

There are casing rules for territories too, but I can't find them in the GitHub repository, only in SVN: https://www.unicode.org/repos/cldr/tags/release-33/common/casing/it.xml
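
As a rough sketch of how such a rule could be applied to imported names: the sample below is modeled on the layout of `contextTransforms.json` (an assumption, with illustrative values), and the function names are hypothetical. It only handles the `titlecase-firstword` transform, and it uppercases the first character with `str.upper()`, which is not locale-aware (Turkish dotted/dotless i, for example, would need special handling).

```python
# Sample shaped like a CLDR JSON contextTransforms.json file
# (assumed layout, illustrative values).
SAMPLE_TRANSFORMS = {
    "main": {
        "it": {
            "contextTransforms": {
                "languages": {
                    "stand-alone": "titlecase-firstword",
                    "uiListOrMenu": "titlecase-firstword",
                }
            }
        }
    }
}


def titlecase_firstword(name):
    """Uppercase only the first character, leaving the rest untouched.

    str.capitalize() is avoided because it lowercases everything after
    the first character, which would damage names containing capitals.
    """
    return name[:1].upper() + name[1:]


def apply_context_transform(name, transforms, locale,
                            category="languages", usage="stand-alone"):
    """Apply the locale's transform for this category/usage, if any."""
    rule = (
        transforms.get("main", {})
        .get(locale, {})
        .get("contextTransforms", {})
        .get(category, {})
        .get(usage)
    )
    if rule == "titlecase-firstword":
        return titlecase_firstword(name)
    # No rule, or a rule this sketch does not know: leave the name as-is.
    return name


print(apply_context_transform("italiano", SAMPLE_TRANSFORMS, "it"))
```

Locales with no entry in the data fall through unchanged, which gives the per-locale behaviour discussed in comment 6 without hard-coding a list of locales.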

Zibi Braniecki [:zbraniecki][:gandalf]

Reporter

Comment 9

5 years ago

This is basically obsoleted by Intl.DisplayNames

Status: NEW → RESOLVED

Closed: 5 years ago

Resolution: --- → WONTFIX

Categories

(Core :: Internationalization, enhancement, P3)

People

(Reporter: zbraniecki, Assigned: kekoariggin)

Security

(public)
