Closed
Bug 1451082
Opened 7 years ago
Closed 7 years ago
Update IANA Language Subtag Registry data to version 2018-03-30
Categories
(Core :: JavaScript: Internationalization API, enhancement)
Core
JavaScript: Internationalization API
Tracking
()
RESOLVED
FIXED
mozilla61
Tracking | Status | |
---|---|---|
firefox61 | --- | fixed |
People
(Reporter: anba, Assigned: anba)
References
Details
Attachments
(1 file, 3 obsolete files)
55.95 KB,
patch
|
anba
:
review+
|
Details | Diff | Splinter Review |
https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
---
%%
Type: variant
Subtag: arevela
Description: Eastern Armenian
Added: 2006-09-18
Deprecated: 2018-03-24
Preferred-Value: hy
Prefix: hy
%%
Type: variant
Subtag: arevmda
Description: Western Armenian
Added: 2006-09-18
Deprecated: 2018-03-24
Preferred-Value: hyw
Prefix: hy
%%
---
-> https://searchfox.org/mozilla-central/rev/f5fb323246bf22a3a3b4185882a1c5d8a2c02996/js/src/builtin/intl/make_intl_data.py#148-150
:-(((((((((((
Assignee | ||
Comment 1•7 years ago
|
||
The latest version of language tag registry added preferred subtags for the variants "arevela" and "arevmda".
So we need to support:
js> Intl.getCanonicalLocales("hy-arevmda")
["hyw"]
And it doesn't seem too unreasonable to me we also want to support this replacement when script, region, or other subtags are present:
js> Intl.getCanonicalLocales("hy-Armn-arevmda")
["hyw-Armn"]
But the later is not possible with the current |langTagMappings| mapping table, because that table can only map complete tags. :-(
I assume it is correct to canonicalize "hy-Armn-arevmda" to "hyw-Armn" based on the wording in <https://tools.ietf.org/html/rfc5646#section-3.1.2>:
> Prefix's field-body contains a valid language tag that is
> RECOMMENDED as one possible prefix to this record's subtag.
This patch replaces |langTagMappings| with |grandfatheredMappings| for grandfathered tags and a new function called |updateLangTagMappings| which performs an in place update for complete tags.
As a side-effect of these changes, we're now also changing the canonicalization for some redundant language tags. For example "zh-wuu-u-ca-chinese" is now canonicalized to "wuu-u-ca-chinese". I didn't see anything in RFC 5646 which disallows this canonicalization, so I assume it's okay to do. (I'd pretty much prefer these tags were just remove from the registry instead of keeping them as a "historical curiosity" <https://tools.ietf.org/html/rfc5646#section-2.2.8>):
> These redundant tags are maintained in the registry as records of type 'redundant', mostly as a matter of historical curiosity.
|updateLangTags| in make_intl_data.py was changed to use |io.open| instead of |codecs.open|, because the latter is discouraged in newer Python code and because using |io.open| is more consistent with the rest of make_intl_data.py.
This patch has at least one positive aspect: Removing the "locale" property from the language tag object of non-grandfathered tags, having a separate |grandfatheredMappings| table, and correcting the case for script and region subtags directly in |parseLanguageTag| makes Intl.Locale (bug 1433303) a bit easier to implement. :-)
Attachment #8964740 -
Flags: review?(jwalden+bmo)
Assignee | ||
Comment 2•7 years ago
|
||
(In reply to André Bargull [:anba] from comment #1)
> As a side-effect of these changes, we're now also changing the
> canonicalization for some redundant language tags. For example
> "zh-wuu-u-ca-chinese" is now canonicalized to "wuu-u-ca-chinese".
Err, "zh-wuu" is a bad example, because it's also covered by extlang mappings. A better one: "sgn-US-u-ca-gregory" is now canonicalized to "ase-u-ca-gregory".
Assignee | ||
Comment 3•7 years ago
|
||
Updated patch to remove extra mappings for redundant language tags when they're also covered by equivalent extlang mappings, cf. comment #2.
Attachment #8964740 -
Attachment is obsolete: true
Attachment #8964740 -
Flags: review?(jwalden+bmo)
Attachment #8964898 -
Flags: review?(jwalden+bmo)
Assignee | ||
Comment 4•7 years ago
|
||
Hmm, the new mapping for "sgn-US-u-ca-gregory" to "ase-u-ca-gregory" could be wrong, at least when only considering the text in <https://tools.ietf.org/html/rfc5646#section-4.5>:
> Redundant or grandfathered tags are replaced by their 'Preferred-Value', if there is one.
Where |tag| means the complete language tag, so it's presumably not allowed to perform some form of "Extended Filtering" (RFC 4647) for these tags. :-)
Comment 5•7 years ago
|
||
Comment on attachment 8964898 [details] [diff] [review]
bug1451082.patch
Review of attachment 8964898 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/builtin/intl/CommonFunctions.js
@@ +610,1 @@
> canonical += "-" + script;
Given we're now implicitly depending on |parseLanguageTag| to give this a particular casing, let's add
assert(script ==
callFunction(std_String_toUpperCase, script[0]) +
callFunction(std_String_toLowerCase, Substring(script, 1, script.length - 1)),
"script must be [A-Z][a-z]{3}");
@@ +614,5 @@
> // "BU" -> "MM"
> if (hasOwn(region, regionMappings))
> region = regionMappings[region];
>
> canonical += "-" + region;
Same sort of assertion here.
@@ +698,5 @@
>
> // The language subtag is canonicalized to lower case.
> locale = callFunction(std_String_toLowerCase, locale);
>
> + // updateLangTagMappings doesn't modify tags which contain only
s/which contain/containing/
::: js/src/builtin/intl/make_intl_data.py
@@ +219,2 @@
>
> +def writeMappingsFunction(println, mapping, extlangMappings, name, description, fileDate, url):
Can we s/mapping/langTagMappings/ since that's what this is and it's clearer in uses below precisely what's in this if we do?
@@ +264,5 @@
> + (kind, subtag) = next(splitSubtags(tag))
> + assert kind == Subtag.Language
> + return subtag
> +
> + def emitCompare(key, value, index):
"key" and "value" are not very helpful names here. subtag/preferred?
@@ +276,5 @@
> + lastVariant = None
> + for (kind, subtag) in splitSubtags(key):
> + if kind == Subtag.Language:
> + pass
> + elif kind == Subtag.ExtLang:
Kinda prefer s/elif/if/ here, same sort of no-else-after-return as in C++.
@@ +277,5 @@
> + for (kind, subtag) in splitSubtags(key):
> + if kind == Subtag.Language:
> + pass
> + elif kind == Subtag.ExtLang:
> + assert extlangIndex in [1, 2, 3]
, "Language-Tag permits no more than three extlang subtags"
@@ +312,5 @@
> +
> + # Update |tag| in place to use the preferred values.
> + key_it = splitSubtags(key)
> + value_it = splitSubtags(value)
> + (key_kind, key_subtag) = next(key_it, dummy)
Maybe worth a
def next_subtag():
(key_kind, key_subtag) = next(key_it, dummy)
helper function to use everywhere. Same for value too.
@@ +333,5 @@
> + println3(u"tag.extlang{} = undefined;".format(extlangIndex))
> + extlangIndex += 1
> + (key_kind, key_subtag) = next(key_it, dummy)
> +
> + # Update the script subtag.
These script/region update sections are exactly identical except for kind-value and field-name. I don't think you need to common them up, but it does seem at least worth deciding *not* to common them up, before having mostly-duplicate code.
@@ +362,5 @@
> +
> + # Update variant subtags.
> + if key_kind == Subtag.Variant or value_kind == Subtag.Variant:
> + # JS doesn't provide an easy way to remove elements from an array,
> + # instead we need to create a new array and copy all elements.
We could use ArrayFilter for this, but that does SpeciesCreate goo that's vulnerable to user-defined interference. Alas. (Possibly worth noting that problem for why we don't use it here.)
@@ +371,5 @@
> + # Copy all variant subtags, ignoring those which should be removed.
> + println3(u"for (var i = 0; i < tag.variants.length; i++) {")
> + println3(u" var variant = tag.variants[i];")
> + while key_kind == Subtag.Variant:
> + println3(u' if (variant === "{}") continue;'.format(key_subtag))
Put the continue on a new line with the usual formatting, please.
@@ +384,5 @@
> + (value_kind, value_subtag) = next(value_it, dummy)
> +
> + # Update the property.
> + println3(u"tag.variants = newVariants;")
> +
Would be nice to assert both iterators are exhausted here, if possible in some clean manner.
@@ +415,5 @@
> + # Return true if the mapping is for an extlang language and the extlang
> + # mapping table contains an equivalent entry and any trailing elements,
> + # if present, are the same.
> + return key_kind == Subtag.ExtLang and \
> + (key_extlang, {"preferred":value_lang, "prefix":key_lang}) in extlangMappings.items() and \
Does Python style really demand no spaces between key: and value? Ugh. Please don't if possible.
@@ +418,5 @@
> + return key_kind == Subtag.ExtLang and \
> + (key_extlang, {"preferred":value_lang, "prefix":key_lang}) in extlangMappings.items() and \
> + list(key_it) == list(value_it)
> +
> + mapping = {key: value for (key, value) in mapping.items() if not hasExtlangMapping(key, value)}
Mm, okay, so |mapping| isn't the same as langTagMappings now. langTagMappingsMinimized or something.
@@ +434,5 @@
> + for lang in sorted(set(language(key) for key in mapping)):
> + println(u' case "{}":'.format(lang))
> + index = 0
> + for key in sorted(key for key in mapping if language(key) == lang):
> + assert isinstance(mapping[key], basestring), "Only supports complete language tags"
Generally I think assertion messages don't start with caps.
Attachment #8964898 -
Flags: review?(jwalden+bmo) → review+
Assignee | ||
Comment 6•7 years ago
|
||
(In reply to Jeff Walden [:Waldo] from comment #5)
> ::: js/src/builtin/intl/make_intl_data.py
> @@ +264,5 @@
> > + (kind, subtag) = next(splitSubtags(tag))
> > + assert kind == Subtag.Language
> > + return subtag
> > +
> > + def emitCompare(key, value, index):
>
> "key" and "value" are not very helpful names here. subtag/preferred?
I've went with "tag" and "preferred", because the input can be a complete language tag.
>
> @@ +276,5 @@
> > + lastVariant = None
> > + for (kind, subtag) in splitSubtags(key):
> > + if kind == Subtag.Language:
> > + pass
> > + elif kind == Subtag.ExtLang:
>
> Kinda prefer s/elif/if/ here, same sort of no-else-after-return as in C++.
Hmm, that only works with s/pass/continue/. Is this what you had in mind?
> @@ +312,5 @@
> > +
> > + # Update |tag| in place to use the preferred values.
> > + key_it = splitSubtags(key)
> > + value_it = splitSubtags(value)
> > + (key_kind, key_subtag) = next(key_it, dummy)
>
> Maybe worth a
>
> def next_subtag():
> (key_kind, key_subtag) = next(key_it, dummy)
>
> helper function to use everywhere. Same for value too.
That'll require |nonlocal| which in turn requires Python 3. :-)
>
> @@ +333,5 @@
> > + println3(u"tag.extlang{} = undefined;".format(extlangIndex))
> > + extlangIndex += 1
> > + (key_kind, key_subtag) = next(key_it, dummy)
> > +
> > + # Update the script subtag.
>
> These script/region update sections are exactly identical except for
> kind-value and field-name. I don't think you need to common them up, but it
> does seem at least worth deciding *not* to common them up, before having
> mostly-duplicate code.
It's just as easy to just merge them and merging them won't hurt readability.
Assignee | ||
Comment 7•7 years ago
|
||
Updated per review comments in comment #6. Also includes the change mentioned in comment #5 to only consider complete tags for redundant language tags. And replaces the backslashes in the original patch with parenthesised expressions where possible, because PEP 8 or so.
Attachment #8964898 -
Attachment is obsolete: true
Attachment #8970311 -
Flags: review+
Comment 8•7 years ago
|
||
(In reply to André Bargull [:anba] from comment #6)
> Hmm, that only works with s/pass/continue/. Is this what you had in mind?
Er, right. Yes, I think continue is better than pass in terms of simplifying the code in the loop, and in understanding it.
> That'll require |nonlocal| which in turn requires Python 3. :-)
Bah, I had forgotten that Python's scoping was cracktastic. And somehow I have never heard of nonlocal before, which is kinda shocking.
Assignee | ||
Comment 9•7 years ago
|
||
Replace |pass| with |continue| per comment #8.
Attachment #8970311 -
Attachment is obsolete: true
Attachment #8970986 -
Flags: review+
Assignee | ||
Comment 10•7 years ago
|
||
Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=2ebdb48d52da55bfea32c787848b46ab047c7391
Keywords: checkin-needed
Comment 11•7 years ago
|
||
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/69a75b74af93
Update IANA language subtag registry data to version 2018-03-30. r=Waldo
Keywords: checkin-needed
Comment 12•7 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
You need to log in
before you can comment on or make changes to this bug.
Description
•