Closed Bug 1451082 Opened 6 years ago Closed 6 years ago

Update IANA Language Subtag Registry data to version 2018-03-30

Categories

(Core :: JavaScript: Internationalization API, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox61 --- fixed

People

(Reporter: anba, Assigned: anba)

References

Details

Attachments

(1 file, 3 obsolete files)

https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
---
%%
Type: variant
Subtag: arevela
Description: Eastern Armenian
Added: 2006-09-18
Deprecated: 2018-03-24
Preferred-Value: hy
Prefix: hy
%%
Type: variant
Subtag: arevmda
Description: Western Armenian
Added: 2006-09-18
Deprecated: 2018-03-24
Preferred-Value: hyw
Prefix: hy
%%
---

-> https://searchfox.org/mozilla-central/rev/f5fb323246bf22a3a3b4185882a1c5d8a2c02996/js/src/builtin/intl/make_intl_data.py#148-150

:-(((((((((((
Attached patch bug1451082.patch (obsolete) — Splinter Review
The latest version of language tag registry added preferred subtags for the variants "arevela" and "arevmda". 

So we need to support:

js> Intl.getCanonicalLocales("hy-arevmda")
["hyw"]


And it doesn't seem too unreasonable to me we also want to support this replacement when script, region, or other subtags are present:

js> Intl.getCanonicalLocales("hy-Armn-arevmda")
["hyw-Armn"]


But the later is not possible with the current |langTagMappings| mapping table, because that table can only map complete tags. :-(

I assume it is correct to canonicalize "hy-Armn-arevmda" to "hyw-Armn" based on the wording in <https://tools.ietf.org/html/rfc5646#section-3.1.2>:

> Prefix's field-body contains a valid language tag that is
> RECOMMENDED as one possible prefix to this record's subtag.


This patch replaces |langTagMappings| with |grandfatheredMappings| for grandfathered tags and a new function called |updateLangTagMappings| which performs an in place update for complete tags. 

As a side-effect of these changes, we're now also changing the canonicalization for some redundant language tags. For example "zh-wuu-u-ca-chinese" is now canonicalized to "wuu-u-ca-chinese". I didn't see anything in RFC 5646 which disallows this canonicalization, so I assume it's okay to do. (I'd pretty much prefer these tags were just remove from the registry instead of keeping them as a "historical curiosity" <https://tools.ietf.org/html/rfc5646#section-2.2.8>):

> These redundant tags are maintained in the registry as records of type 'redundant', mostly as a matter of historical curiosity.


|updateLangTags| in make_intl_data.py was changed to use |io.open| instead of |codecs.open|, because the latter is discouraged in newer Python code and because using |io.open| is more consistent with the rest of make_intl_data.py.

This patch has at least one positive aspect: Removing the "locale" property from the language tag object of non-grandfathered tags, having a separate |grandfatheredMappings| table, and correcting the case for script and region subtags directly in |parseLanguageTag| makes Intl.Locale (bug 1433303) a bit easier to implement. :-)
Attachment #8964740 - Flags: review?(jwalden+bmo)
Blocks: 1433303
(In reply to André Bargull [:anba] from comment #1)
> As a side-effect of these changes, we're now also changing the
> canonicalization for some redundant language tags. For example
> "zh-wuu-u-ca-chinese" is now canonicalized to "wuu-u-ca-chinese".

Err, "zh-wuu" is a bad example, because it's also covered by extlang mappings. A better one: "sgn-US-u-ca-gregory" is now canonicalized to "ase-u-ca-gregory".
Attached patch bug1451082.patch (obsolete) — Splinter Review
Updated patch to remove extra mappings for redundant language tags when they're also covered by equivalent extlang mappings, cf. comment #2.
Attachment #8964740 - Attachment is obsolete: true
Attachment #8964740 - Flags: review?(jwalden+bmo)
Attachment #8964898 - Flags: review?(jwalden+bmo)
Hmm, the new mapping for "sgn-US-u-ca-gregory" to "ase-u-ca-gregory" could be wrong, at least when only considering the text in <https://tools.ietf.org/html/rfc5646#section-4.5>:
> Redundant or grandfathered tags are replaced by their 'Preferred-Value', if there is one.

Where |tag| means the complete language tag, so it's presumably not allowed to perform some form of "Extended Filtering" (RFC 4647) for these tags. :-)
Comment on attachment 8964898 [details] [diff] [review]
bug1451082.patch

Review of attachment 8964898 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/intl/CommonFunctions.js
@@ +610,1 @@
>          canonical += "-" + script;

Given we're now implicitly depending on |parseLanguageTag| to give this a particular casing, let's add

  assert(script ==
         callFunction(std_String_toUpperCase, script[0]) +
         callFunction(std_String_toLowerCase, Substring(script, 1, script.length - 1)),
         "script must be [A-Z][a-z]{3}");

@@ +614,5 @@
>          // "BU" -> "MM"
>          if (hasOwn(region, regionMappings))
>              region = regionMappings[region];
>  
>          canonical += "-" + region;

Same sort of assertion here.

@@ +698,5 @@
>  
>          // The language subtag is canonicalized to lower case.
>          locale = callFunction(std_String_toLowerCase, locale);
>  
> +        // updateLangTagMappings doesn't modify tags which contain only

s/which contain/containing/

::: js/src/builtin/intl/make_intl_data.py
@@ +219,2 @@
>  
> +def writeMappingsFunction(println, mapping, extlangMappings, name, description, fileDate, url):

Can we s/mapping/langTagMappings/ since that's what this is and it's clearer in uses below precisely what's in this if we do?

@@ +264,5 @@
> +        (kind, subtag) = next(splitSubtags(tag))
> +        assert kind == Subtag.Language
> +        return subtag
> +
> +    def emitCompare(key, value, index):

"key" and "value" are not very helpful names here.  subtag/preferred?

@@ +276,5 @@
> +        lastVariant = None
> +        for (kind, subtag) in splitSubtags(key):
> +            if kind == Subtag.Language:
> +                pass
> +            elif kind == Subtag.ExtLang:

Kinda prefer s/elif/if/ here, same sort of no-else-after-return as in C++.

@@ +277,5 @@
> +        for (kind, subtag) in splitSubtags(key):
> +            if kind == Subtag.Language:
> +                pass
> +            elif kind == Subtag.ExtLang:
> +                assert extlangIndex in [1, 2, 3]

, "Language-Tag permits no more than three extlang subtags"

@@ +312,5 @@
> +
> +        # Update |tag| in place to use the preferred values.
> +        key_it = splitSubtags(key)
> +        value_it = splitSubtags(value)
> +        (key_kind, key_subtag) = next(key_it, dummy)

Maybe worth a

  def next_subtag():
    (key_kind, key_subtag) = next(key_it, dummy)

helper function to use everywhere.  Same for value too.

@@ +333,5 @@
> +            println3(u"tag.extlang{} = undefined;".format(extlangIndex))
> +            extlangIndex += 1
> +            (key_kind, key_subtag) = next(key_it, dummy)
> +
> +        # Update the script subtag.

These script/region update sections are exactly identical except for kind-value and field-name.  I don't think you need to common them up, but it does seem at least worth deciding *not* to common them up, before having mostly-duplicate code.

@@ +362,5 @@
> +
> +        # Update variant subtags.
> +        if key_kind == Subtag.Variant or value_kind == Subtag.Variant:
> +            # JS doesn't provide an easy way to remove elements from an array,
> +            # instead we need to create a new array and copy all elements.

We could use ArrayFilter for this, but that does SpeciesCreate goo that's vulnerable to user-defined interference.  Alas.  (Possibly worth noting that problem for why we don't use it here.)

@@ +371,5 @@
> +            # Copy all variant subtags, ignoring those which should be removed.
> +            println3(u"for (var i = 0; i < tag.variants.length; i++) {")
> +            println3(u"    var variant = tag.variants[i];")
> +            while key_kind == Subtag.Variant:
> +                println3(u'    if (variant === "{}") continue;'.format(key_subtag))

Put the continue on a new line with the usual formatting, please.

@@ +384,5 @@
> +                (value_kind, value_subtag) = next(value_it, dummy)
> +
> +            # Update the property.
> +            println3(u"tag.variants = newVariants;")
> +

Would be nice to assert both iterators are exhausted here, if possible in some clean manner.

@@ +415,5 @@
> +        # Return true if the mapping is for an extlang language and the extlang
> +        # mapping table contains an equivalent entry and any trailing elements,
> +        # if present, are the same.
> +        return key_kind == Subtag.ExtLang and \
> +               (key_extlang, {"preferred":value_lang, "prefix":key_lang}) in extlangMappings.items() and \

Does Python style really demand no spaces between key: and value?  Ugh.  Please don't if possible.

@@ +418,5 @@
> +        return key_kind == Subtag.ExtLang and \
> +               (key_extlang, {"preferred":value_lang, "prefix":key_lang}) in extlangMappings.items() and \
> +               list(key_it) == list(value_it)
> +
> +    mapping = {key: value for (key, value) in mapping.items() if not hasExtlangMapping(key, value)}

Mm, okay, so |mapping| isn't the same as langTagMappings now.  langTagMappingsMinimized or something.

@@ +434,5 @@
> +    for lang in sorted(set(language(key) for key in mapping)):
> +        println(u'      case "{}":'.format(lang))
> +        index = 0
> +        for key in sorted(key for key in mapping if language(key) == lang):
> +            assert isinstance(mapping[key], basestring), "Only supports complete language tags"

Generally I think assertion messages don't start with caps.
Attachment #8964898 - Flags: review?(jwalden+bmo) → review+
(In reply to Jeff Walden [:Waldo] from comment #5)
> ::: js/src/builtin/intl/make_intl_data.py
> @@ +264,5 @@
> > +        (kind, subtag) = next(splitSubtags(tag))
> > +        assert kind == Subtag.Language
> > +        return subtag
> > +
> > +    def emitCompare(key, value, index):
> 
> "key" and "value" are not very helpful names here.  subtag/preferred?

I've went with "tag" and "preferred", because the input can be a complete language tag.


> 
> @@ +276,5 @@
> > +        lastVariant = None
> > +        for (kind, subtag) in splitSubtags(key):
> > +            if kind == Subtag.Language:
> > +                pass
> > +            elif kind == Subtag.ExtLang:
> 
> Kinda prefer s/elif/if/ here, same sort of no-else-after-return as in C++.

Hmm, that only works with s/pass/continue/. Is this what you had in mind?


> @@ +312,5 @@
> > +
> > +        # Update |tag| in place to use the preferred values.
> > +        key_it = splitSubtags(key)
> > +        value_it = splitSubtags(value)
> > +        (key_kind, key_subtag) = next(key_it, dummy)
> 
> Maybe worth a
> 
>   def next_subtag():
>     (key_kind, key_subtag) = next(key_it, dummy)
> 
> helper function to use everywhere.  Same for value too.

That'll require |nonlocal| which in turn requires Python 3. :-)


> 
> @@ +333,5 @@
> > +            println3(u"tag.extlang{} = undefined;".format(extlangIndex))
> > +            extlangIndex += 1
> > +            (key_kind, key_subtag) = next(key_it, dummy)
> > +
> > +        # Update the script subtag.
> 
> These script/region update sections are exactly identical except for
> kind-value and field-name.  I don't think you need to common them up, but it
> does seem at least worth deciding *not* to common them up, before having
> mostly-duplicate code.

It's just as easy to just merge them and merging them won't hurt readability.
Attached patch bug1451082.patch (obsolete) — Splinter Review
Updated per review comments in comment #6. Also includes the change mentioned in comment #5 to only consider complete tags for redundant language tags. And replaces the backslashes in the original patch with parenthesised expressions where possible, because PEP 8 or so.
Attachment #8964898 - Attachment is obsolete: true
Attachment #8970311 - Flags: review+
(In reply to André Bargull [:anba] from comment #6)
> Hmm, that only works with s/pass/continue/. Is this what you had in mind?

Er, right.  Yes, I think continue is better than pass in terms of simplifying the code in the loop, and in understanding it.

> That'll require |nonlocal| which in turn requires Python 3. :-)

Bah, I had forgotten that Python's scoping was cracktastic.  And somehow I have never heard of nonlocal before, which is kinda shocking.
Attached patch bug1451082.patchSplinter Review
Replace |pass| with |continue| per comment #8.
Attachment #8970311 - Attachment is obsolete: true
Attachment #8970986 - Flags: review+
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/69a75b74af93
Update IANA language subtag registry data to version 2018-03-30. r=Waldo
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/69a75b74af93
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
You need to log in before you can comment on or make changes to this bug.