Closed Bug 1157277 Opened 4 years ago Closed 3 years ago

Update String.prototype.toLowerCase, toUpperCase to work on code points

Categories

(Core :: JavaScript: Standard Library, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla50
Tracking Status
firefox40 --- affected
firefox50 --- fixed

People

(Reporter: anba, Assigned: arai)

References

(Blocks 1 open bug)

Details

Attachments

(3 files)

ES2015 and ECMA402v2 changed String.prototype.{toLowerCase, toUpperCase, toLocaleLowerCase, toLocaleUpperCase} to work on code points instead of code units.


String.prototype.to[Locale]LowerCase(): "\uD801\uDC00".toLowerCase() (and "\uD801\uDC00".toLocaleLowerCase())

Expected: Returns "\uD801\uDC28"
Actual: Returns "\uD801\uDC00"


String.prototype.to[Locale]UpperCase(): "\uD801\uDC28".toUpperCase() (and "\uD801\uDC28".toLocaleUpperCase())

Expected: Returns "\uD801\uDC00"
Actual: Returns "\uD801\uDC28"
Assignee: nobody → arai.unmht
currently case folding information in non-BMP area is stored in FOR_EACH_NON_BMP_CASE_FOLDING macro, that is written manually.

this patch changes make_unicode.py to generate the macro automatically, and also lower/upper case information too.

current FOR_EACH_NON_BMP_CASE_FOLDING macro contains both `code ==> CaseFolding(case)` and `CaseFolding(case) ==> code` information, but this patch separates them into 2 macros: FOR_EACH_NON_BMP_CASE_FOLDING and FOR_EACH_NON_BMP_REV_CASE_FOLDING.
Attachment #8768191 - Flags: review?(till)
This patch adds non-BMP variant for CanUpperCase/CanLowerCase and ToUpperCase/ToLowerCase functions.

CanUpperCase/CanLowerCase receives surrogate pair and returns if it can be upper/lower case.

ToUpperCase/ToLowerCase receives surrogate pair and returns trail surrogate for upper/lower case, as currently lead surrogate is same between upper/lower cases.
Attachment #8768192 - Flags: review?(till)
This patch adds surrogate pair handling to toLowerCase/toUpperCase (and ToLocale* fallback too), and adds testcase for them.
Attachment #8768194 - Flags: review?(till)
String.prototype.{toLocaleLowerCase,toLocaleUpperCase} on browser already works on code points, so we don't touch them here.

Green on try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8aa2a69127b9
Comment on attachment 8768191 [details] [diff] [review]
Part 1: Generate macros for non-BMP lowercase/uppercase/folding with make_unicode.py.

Review of attachment 8768191 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay, looks good.
Attachment #8768191 - Flags: review?(till) → review+
Comment on attachment 8768192 [details] [diff] [review]
Part 2: Add functions to handle non-BMP ToLowerCase/ToUpperCase.

Review of attachment 8768192 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with comment addressed.

::: js/src/vm/Unicode.h
@@ +237,5 @@
>  
> +inline bool
> +CanUpperCaseNonBMP(char16_t lead, char16_t trail)
> +{
> +#define CHECK_RANGE(FROM, TO, LEAD, TRAIL_FROM, TRAIL_TO, DIFF) \

Any reason you can't define CHECK_RANGE and CALC_TRAIL once and use them at both call sites? If that is indeed possible, please change to that.
Attachment #8768192 - Flags: review?(till) → review+
Comment on attachment 8768194 [details] [diff] [review]
Part 3: Update String.prototype.{toLowerCase,toUpperCase,toLocaleLowerCase,toLocaleUpperCase} to work on code points.

Review of attachment 8768194 [details] [diff] [review]:
-----------------------------------------------------------------

I'm somewhat unhappy about the multiple occurrences of almost-identical code here, but yeah, hoisting that out would be somewhat annoying. So, let's keep this as-is.

::: js/src/vm/make_unicode.py
@@ +612,5 @@
>      generate_unicode_stuff(unicode_data, case_folding,
>          open('Unicode.cpp', 'w'),
>          open('UnicodeNonBMP.h', 'w'),
>          open('../tests/ecma_5/String/string-upper-lower-mapping.js', 'w'),
> +        open('../tests/ecma_6/String/string-upper-lower-mapping.js', 'w'),

Can you name this file something like "string-code-point-upper-lower-mapping.js"? Having it named identical to the ES5 one seems confusing.
Attachment #8768194 - Flags: review?(till) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/0a822e7f78cbb140a6172b134ba05b4c0a83e6e4
Bug 1157277 - Part 1: Generate macros for non-BMP lowercase/uppercase/folding with make_unicode.py. r=till

https://hg.mozilla.org/integration/mozilla-inbound/rev/de1cf380b1d55c91ce5bd7c07f917510fa98a55e
Bug 1157277 - Part 2: Add functions to handle non-BMP ToLowerCase/ToUpperCase. r=till

https://hg.mozilla.org/integration/mozilla-inbound/rev/4e952735e78954dffc362bc6b8e5b3b4f6147313
Bug 1157277 - Part 3: Update String.prototype.{toLowerCase,toUpperCase,toLocaleLowerCase,toLocaleUpperCase} to work on code points. r=till
You need to log in before you can comment on or make changes to this bug.