Closed Bug 1451082 Opened 7 years ago Closed 7 years ago

Update IANA Language Subtag Registry data to version 2018-03-30

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla61

Tracking Flags:

Tracking

Status

firefox61

---

fixed

People

(Reporter: anba, Assigned: anba)

References

Details

Attachments

(1 file, 3 obsolete files)

bug1451082.patch 7 years ago André Bargull [:anba] 45.33 KB, patch		Details \| Diff \| Splinter Review
bug1451082.patch 7 years ago André Bargull [:anba] 45.51 KB, patch	Waldo : review+	Details \| Diff \| Splinter Review
bug1451082.patch 7 years ago André Bargull [:anba] 55.95 KB, patch	anba : review+	Details \| Diff \| Splinter Review
bug1451082.patch 7 years ago André Bargull [:anba] 55.95 KB, patch	anba : review+	Details \| Diff \| Splinter Review

André Bargull [:anba]

Assignee

Description

•

7 years ago

https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry --- %% Type: variant Subtag: arevela Description: Eastern Armenian Added: 2006-09-18 Deprecated: 2018-03-24 Preferred-Value: hy Prefix: hy %% Type: variant Subtag: arevmda Description: Western Armenian Added: 2006-09-18 Deprecated: 2018-03-24 Preferred-Value: hyw Prefix: hy %% --- -> https://searchfox.org/mozilla-central/rev/f5fb323246bf22a3a3b4185882a1c5d8a2c02996/js/src/builtin/intl/make_intl_data.py#148-150 :-(((((((((((

André Bargull [:anba]

Assignee

Comment 1

•

7 years ago

Attached patch bug1451082.patch (obsolete) — Details — Splinter Review

The latest version of language tag registry added preferred subtags for the variants "arevela" and "arevmda". So we need to support: js> Intl.getCanonicalLocales("hy-arevmda") ["hyw"] And it doesn't seem too unreasonable to me we also want to support this replacement when script, region, or other subtags are present: js> Intl.getCanonicalLocales("hy-Armn-arevmda") ["hyw-Armn"] But the later is not possible with the current |langTagMappings| mapping table, because that table can only map complete tags. :-( I assume it is correct to canonicalize "hy-Armn-arevmda" to "hyw-Armn" based on the wording in <https://tools.ietf.org/html/rfc5646#section-3.1.2>: > Prefix's field-body contains a valid language tag that is > RECOMMENDED as one possible prefix to this record's subtag. This patch replaces |langTagMappings| with |grandfatheredMappings| for grandfathered tags and a new function called |updateLangTagMappings| which performs an in place update for complete tags. As a side-effect of these changes, we're now also changing the canonicalization for some redundant language tags. For example "zh-wuu-u-ca-chinese" is now canonicalized to "wuu-u-ca-chinese". I didn't see anything in RFC 5646 which disallows this canonicalization, so I assume it's okay to do. (I'd pretty much prefer these tags were just remove from the registry instead of keeping them as a "historical curiosity" <https://tools.ietf.org/html/rfc5646#section-2.2.8>): > These redundant tags are maintained in the registry as records of type 'redundant', mostly as a matter of historical curiosity. |updateLangTags| in make_intl_data.py was changed to use |io.open| instead of |codecs.open|, because the latter is discouraged in newer Python code and because using |io.open| is more consistent with the rest of make_intl_data.py. This patch has at least one positive aspect: Removing the "locale" property from the language tag object of non-grandfathered tags, having a separate |grandfatheredMappings| table, and correcting the case for script and region subtags directly in |parseLanguageTag| makes Intl.Locale (bug 1433303) a bit easier to implement. :-)

Attachment #8964740 - Flags: review?(jwalden+bmo)

André Bargull [:anba]

Assignee

Updated

•

7 years ago

Blocks: 1433303

André Bargull [:anba]

Assignee

Comment 2

•

7 years ago

(In reply to André Bargull [:anba] from comment #1) > As a side-effect of these changes, we're now also changing the > canonicalization for some redundant language tags. For example > "zh-wuu-u-ca-chinese" is now canonicalized to "wuu-u-ca-chinese". Err, "zh-wuu" is a bad example, because it's also covered by extlang mappings. A better one: "sgn-US-u-ca-gregory" is now canonicalized to "ase-u-ca-gregory".

André Bargull [:anba]

Assignee

Comment 3

•

7 years ago

Attached patch bug1451082.patch (obsolete) — Details — Splinter Review

Updated patch to remove extra mappings for redundant language tags when they're also covered by equivalent extlang mappings, cf. comment #2.

Attachment #8964740 - Attachment is obsolete: true

Attachment #8964740 - Flags: review?(jwalden+bmo)

Attachment #8964898 - Flags: review?(jwalden+bmo)

André Bargull [:anba]

Assignee

Comment 4

•

7 years ago

Hmm, the new mapping for "sgn-US-u-ca-gregory" to "ase-u-ca-gregory" could be wrong, at least when only considering the text in <https://tools.ietf.org/html/rfc5646#section-4.5>: > Redundant or grandfathered tags are replaced by their 'Preferred-Value', if there is one. Where |tag| means the complete language tag, so it's presumably not allowed to perform some form of "Extended Filtering" (RFC 4647) for these tags. :-)

Jeff Walden [:Waldo]

Comment 5

•

7 years ago

Comment on attachment 8964898 [details] [diff] [review] bug1451082.patch Review of attachment 8964898 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/builtin/intl/CommonFunctions.js @@ +610,1 @@ > canonical += "-" + script; Given we're now implicitly depending on |parseLanguageTag| to give this a particular casing, let's add assert(script == callFunction(std_String_toUpperCase, script[0]) + callFunction(std_String_toLowerCase, Substring(script, 1, script.length - 1)), "script must be [A-Z][a-z]{3}"); @@ +614,5 @@ > // "BU" -> "MM" > if (hasOwn(region, regionMappings)) > region = regionMappings[region]; > > canonical += "-" + region; Same sort of assertion here. @@ +698,5 @@ > > // The language subtag is canonicalized to lower case. > locale = callFunction(std_String_toLowerCase, locale); > > + // updateLangTagMappings doesn't modify tags which contain only s/which contain/containing/ ::: js/src/builtin/intl/make_intl_data.py @@ +219,2 @@ > > +def writeMappingsFunction(println, mapping, extlangMappings, name, description, fileDate, url): Can we s/mapping/langTagMappings/ since that's what this is and it's clearer in uses below precisely what's in this if we do? @@ +264,5 @@ > + (kind, subtag) = next(splitSubtags(tag)) > + assert kind == Subtag.Language > + return subtag > + > + def emitCompare(key, value, index): "key" and "value" are not very helpful names here. subtag/preferred? @@ +276,5 @@ > + lastVariant = None > + for (kind, subtag) in splitSubtags(key): > + if kind == Subtag.Language: > + pass > + elif kind == Subtag.ExtLang: Kinda prefer s/elif/if/ here, same sort of no-else-after-return as in C++. @@ +277,5 @@ > + for (kind, subtag) in splitSubtags(key): > + if kind == Subtag.Language: > + pass > + elif kind == Subtag.ExtLang: > + assert extlangIndex in [1, 2, 3] , "Language-Tag permits no more than three extlang subtags" @@ +312,5 @@ > + > + # Update |tag| in place to use the preferred values. > + key_it = splitSubtags(key) > + value_it = splitSubtags(value) > + (key_kind, key_subtag) = next(key_it, dummy) Maybe worth a def next_subtag(): (key_kind, key_subtag) = next(key_it, dummy) helper function to use everywhere. Same for value too. @@ +333,5 @@ > + println3(u"tag.extlang{} = undefined;".format(extlangIndex)) > + extlangIndex += 1 > + (key_kind, key_subtag) = next(key_it, dummy) > + > + # Update the script subtag. These script/region update sections are exactly identical except for kind-value and field-name. I don't think you need to common them up, but it does seem at least worth deciding *not* to common them up, before having mostly-duplicate code. @@ +362,5 @@ > + > + # Update variant subtags. > + if key_kind == Subtag.Variant or value_kind == Subtag.Variant: > + # JS doesn't provide an easy way to remove elements from an array, > + # instead we need to create a new array and copy all elements. We could use ArrayFilter for this, but that does SpeciesCreate goo that's vulnerable to user-defined interference. Alas. (Possibly worth noting that problem for why we don't use it here.) @@ +371,5 @@ > + # Copy all variant subtags, ignoring those which should be removed. > + println3(u"for (var i = 0; i < tag.variants.length; i++) {") > + println3(u" var variant = tag.variants[i];") > + while key_kind == Subtag.Variant: > + println3(u' if (variant === "{}") continue;'.format(key_subtag)) Put the continue on a new line with the usual formatting, please. @@ +384,5 @@ > + (value_kind, value_subtag) = next(value_it, dummy) > + > + # Update the property. > + println3(u"tag.variants = newVariants;") > + Would be nice to assert both iterators are exhausted here, if possible in some clean manner. @@ +415,5 @@ > + # Return true if the mapping is for an extlang language and the extlang > + # mapping table contains an equivalent entry and any trailing elements, > + # if present, are the same. > + return key_kind == Subtag.ExtLang and \ > + (key_extlang, {"preferred":value_lang, "prefix":key_lang}) in extlangMappings.items() and \ Does Python style really demand no spaces between key: and value? Ugh. Please don't if possible. @@ +418,5 @@ > + return key_kind == Subtag.ExtLang and \ > + (key_extlang, {"preferred":value_lang, "prefix":key_lang}) in extlangMappings.items() and \ > + list(key_it) == list(value_it) > + > + mapping = {key: value for (key, value) in mapping.items() if not hasExtlangMapping(key, value)} Mm, okay, so |mapping| isn't the same as langTagMappings now. langTagMappingsMinimized or something. @@ +434,5 @@ > + for lang in sorted(set(language(key) for key in mapping)): > + println(u' case "{}":'.format(lang)) > + index = 0 > + for key in sorted(key for key in mapping if language(key) == lang): > + assert isinstance(mapping[key], basestring), "Only supports complete language tags" Generally I think assertion messages don't start with caps.

Attachment #8964898 - Flags: review?(jwalden+bmo) → review+

André Bargull [:anba]

Assignee

Comment 6

•

7 years ago

(In reply to Jeff Walden [:Waldo] from comment #5) > ::: js/src/builtin/intl/make_intl_data.py > @@ +264,5 @@ > > + (kind, subtag) = next(splitSubtags(tag)) > > + assert kind == Subtag.Language > > + return subtag > > + > > + def emitCompare(key, value, index): > > "key" and "value" are not very helpful names here. subtag/preferred? I've went with "tag" and "preferred", because the input can be a complete language tag. > > @@ +276,5 @@ > > + lastVariant = None > > + for (kind, subtag) in splitSubtags(key): > > + if kind == Subtag.Language: > > + pass > > + elif kind == Subtag.ExtLang: > > Kinda prefer s/elif/if/ here, same sort of no-else-after-return as in C++. Hmm, that only works with s/pass/continue/. Is this what you had in mind? > @@ +312,5 @@ > > + > > + # Update |tag| in place to use the preferred values. > > + key_it = splitSubtags(key) > > + value_it = splitSubtags(value) > > + (key_kind, key_subtag) = next(key_it, dummy) > > Maybe worth a > > def next_subtag(): > (key_kind, key_subtag) = next(key_it, dummy) > > helper function to use everywhere. Same for value too. That'll require |nonlocal| which in turn requires Python 3. :-) > > @@ +333,5 @@ > > + println3(u"tag.extlang{} = undefined;".format(extlangIndex)) > > + extlangIndex += 1 > > + (key_kind, key_subtag) = next(key_it, dummy) > > + > > + # Update the script subtag. > > These script/region update sections are exactly identical except for > kind-value and field-name. I don't think you need to common them up, but it > does seem at least worth deciding *not* to common them up, before having > mostly-duplicate code. It's just as easy to just merge them and merging them won't hurt readability.

André Bargull [:anba]

Assignee

Comment 7

•

7 years ago

Attached patch bug1451082.patch (obsolete) — Details — Splinter Review

Updated per review comments in comment #6. Also includes the change mentioned in comment #5 to only consider complete tags for redundant language tags. And replaces the backslashes in the original patch with parenthesised expressions where possible, because PEP 8 or so.

Attachment #8964898 - Attachment is obsolete: true

Attachment #8970311 - Flags: review+

Jeff Walden [:Waldo]

Comment 8

•

7 years ago

(In reply to André Bargull [:anba] from comment #6) > Hmm, that only works with s/pass/continue/. Is this what you had in mind? Er, right. Yes, I think continue is better than pass in terms of simplifying the code in the loop, and in understanding it. > That'll require |nonlocal| which in turn requires Python 3. :-) Bah, I had forgotten that Python's scoping was cracktastic. And somehow I have never heard of nonlocal before, which is kinda shocking.

André Bargull [:anba]

Assignee

Comment 9

•

7 years ago

Attached patch bug1451082.patch — Details — Splinter Review

Replace |pass| with |continue| per comment #8.

Attachment #8970311 - Attachment is obsolete: true

Attachment #8970986 - Flags: review+

André Bargull [:anba]

Assignee

Comment 10

•

7 years ago

Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=2ebdb48d52da55bfea32c787848b46ab047c7391

Keywords: checkin-needed

Pulsebot

Comment 11

•

7 years ago

Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/69a75b74af93 Update IANA language subtag registry data to version 2018-03-30. r=Waldo

Keywords: checkin-needed

Andreea Pavel [:apavel]

Comment 12

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/69a75b74af93

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

status-firefox61: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla61

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Update IANA Language Subtag Registry data to version 2018-03-30

Categories

(Core :: JavaScript: Internationalization API, enhancement)

Tracking

()

People

(Reporter: anba, Assigned: anba)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file, 3 obsolete files)

Description

Comment 1

Updated

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Attachment

General

Description

File Name

Content Type