Closed Bug 1318403 Opened 9 years ago Closed 9 years ago

Add support for locale argument to String.prototype.toLocale{Lower,Upper}Case

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla55

Tracking Flags:

Tracking

Status

firefox55

---

fixed

People

(Reporter: anba, Assigned: anba)

References

(Blocks 1 open bug)

Details

(Keywords: dev-doc-complete)

Attachments

(12 files, 17 obsolete files)

bug1318403.patch 9 years ago André Bargull [:anba] 6.23 MB, patch		Details \| Diff \| Splinter Review
bug1318403.patch 9 years ago André Bargull [:anba] 6.23 MB, patch		Details \| Diff \| Splinter Review
bug1318403.patch 9 years ago André Bargull [:anba] 6.23 MB, patch		Details \| Diff \| Splinter Review
bug1318403-part1-reduce-code-duplication.patch 9 years ago André Bargull [:anba] 9.71 KB, patch	Waldo : review+	Details \| Diff \| Splinter Review
bug1318403-part2-process-special-casing.patch 9 years ago André Bargull [:anba] 24.73 KB, patch		Details \| Diff \| Splinter Review
bug1318403-part3-recreate-files.patch 9 years ago André Bargull [:anba] 6.04 MB, patch		Details \| Diff \| Splinter Review
bug1318403-part4-string-special-casing.patch 9 years ago André Bargull [:anba] 29.06 KB, patch	Waldo : review-	Details \| Diff \| Splinter Review
bug1318403-part5-intl-case-conversion.patch 9 years ago André Bargull [:anba] 26.93 KB, patch	Waldo : review+	Details \| Diff \| Splinter Review
bug1318403-part6-update-exclusion.patch 9 years ago André Bargull [:anba] 4.50 KB, patch	Waldo : review+	Details \| Diff \| Splinter Review
bug1318403-part2-process-special-casing.patch 9 years ago André Bargull [:anba] 22.53 KB, patch	Waldo : review+	Details \| Diff \| Splinter Review
bug1318403-part3-recreate-files.patch 9 years ago André Bargull [:anba] 6.03 MB, patch	Waldo : review+	Details \| Diff \| Splinter Review
bug1318403-part1.1-reduce-code-duplication.patch 9 years ago André Bargull [:anba] 9.72 KB, patch	anba : review+	Details \| Diff \| Splinter Review
bug1318403-part1.2-generate-files.patch 9 years ago André Bargull [:anba] 6.03 MB, patch	anba : review+	Details \| Diff \| Splinter Review
bug1318403-part2-process-special-casing.patch 9 years ago André Bargull [:anba] 23.22 KB, patch	anba : review+	Details \| Diff \| Splinter Review
bug1318403-part3-recreate-files-special-casing.patch 9 years ago André Bargull [:anba] 67.92 KB, patch	anba : review+	Details \| Diff \| Splinter Review
bug1318403-part2-process-special-casing.patch 9 years ago André Bargull [:anba] 23.21 KB, patch	anba : review+	Details \| Diff \| Splinter Review
bug1318403-part4.1-string-special-casing.patch 9 years ago André Bargull [:anba] 33.68 KB, patch	Waldo : review+	Details \| Diff \| Splinter Review
bug1318403-part4.2-use-realloc.patch 9 years ago André Bargull [:anba] 4.33 KB, patch	Waldo : review+	Details \| Diff \| Splinter Review
bug1318403-part4.3-toupper-duplicate-code.patch 9 years ago André Bargull [:anba] 4.31 KB, patch	Waldo : review+	Details \| Diff \| Splinter Review
bug1318403-part5-intl-case-conversion.patch 9 years ago André Bargull [:anba] 29.77 KB, patch	anba : review+	Details \| Diff \| Splinter Review
bug1318403-part6-update-exclusion.patch 9 years ago André Bargull [:anba] 4.47 KB, patch	anba : review+	Details \| Diff \| Splinter Review
bug1318403-part7-codepoint-names-for-generated.patch 9 years ago André Bargull [:anba] 30.13 KB, patch	Waldo : review+	Details \| Diff \| Splinter Review
bug1318403-part8-generate-with-codepoint-names.patch 9 years ago André Bargull [:anba] 555.89 KB, patch	Waldo : review+	Details \| Diff \| Splinter Review
bug1318403-part4.1-string-special-casing.patch 9 years ago André Bargull [:anba] 37.67 KB, patch	anba : review+	Details \| Diff \| Splinter Review
bug1318403-part4.2-use-realloc.patch 9 years ago André Bargull [:anba] 4.33 KB, patch	anba : review+	Details \| Diff \| Splinter Review
bug1318403-part4.3-toupper-duplicate-code.patch 9 years ago André Bargull [:anba] 4.29 KB, patch	anba : review+	Details \| Diff \| Splinter Review
bug1318403-part5-intl-case-conversion.patch 9 years ago André Bargull [:anba] 29.77 KB, patch	anba : review+	Details \| Diff \| Splinter Review
bug1318403-part6-update-exclusion.patch 9 years ago André Bargull [:anba] 4.61 KB, patch	anba : review+	Details \| Diff \| Splinter Review
bug1318403-part9-suppress-gc-analysis.patch 9 years ago André Bargull [:anba] 1.26 KB, patch	anba : review+	Details \| Diff \| Splinter Review

André Bargull [:anba]

Assignee

Description

•

9 years ago

Implement [1,2] from the ES Intl spec. [1] https://tc39.github.io/ecma402/#sup-string.prototype.tolocalelowercase [2] https://tc39.github.io/ecma402/#sup-string.prototype.tolocaleuppercase

Florian Scholz (Open Web Docs)

Updated

•

9 years ago

Keywords: dev-doc-needed

André Bargull [:anba]

Assignee

Updated

•

9 years ago

Blocks: test262

André Bargull [:anba]

Assignee

Updated

•

9 years ago

Blocks: 672042

André Bargull [:anba]

Assignee

Comment 1

•

9 years ago

Attached patch bug1318403.patch (obsolete) — Details — Splinter Review

Unfortunately it wasn't possible to simply switch to ICU without causing noticeable performance regressions, so this patch got a bit longer than expected. :-| builtin/String.js: Support the locale argument for String.prototype.toLocale{Lower,Upper}Case() per [1,2]. Language tag validation and canonicalization isn't exactly fast, so we need fast paths to handle the common cases when zero or one arguments are provided. [1] http://tc39.github.io/ecma402/#sup-string.prototype.tolocalelowercase [2] http://tc39.github.io/ecma402/#sup-string.prototype.tolocaleuppercase builtin/Intl.js: Again another fast path to handle language tags which contain only a language subtag. builtin/Intl.cpp: Call to ICU when language dependent case mapping is required, otherwise use the normal String.prototype.to{Lower,Upper}String() implementation. We can't use ICU for all case mappings, because it's 0.3 to 4 times slower when compared to the optimized case conversion methods from this patch. The language check is done in C++ instead of self-hosted code, because it's faster to do it in C++, at least when measured with µ-benchmarks. jsstr.cpp: The case mapping methods in jsstr.cpp only supported simple case mappings, but ICU always applies full case mappings (that means the special case mappings from Unicode's SpecialCasing.txt file are used). This forces us to implement full case mapping to avoid inconsistent results when String.prototype.to{Lower,Upper}String() is invoked. Yay! Full case mapping has the interesting property that it can change the string length for upper or lower case forms. For example "ß" (German sharp S) is upper cased to two "S" characters. But since most strings won't contain any characters which require special casing, we first try to case map the string into a new string of the same length. Only if that fails we inspect the whole string to compute the actual length for the case mapped string and then restart the case mapping process. Some special casing rules are inlined in jsstr.cpp, for example the lower case mapping for U+0130. This shouldn't be a forward compatibility problem with new Unicode versions, because SpecialCasing.txt doesn't get updated regularly (last update was in 2002 for Unicode 3.2). Nevertheless I've also added validations for the inlined cases to vm/make_unicode.py to make sure we're notified if it was updated. vm/make_unicode.py: Added new methods to read and process the SpecialCasing.txt file. To avoid checking every special casing character one after another, the generated code for js::unicode::CanUpperCaseSpecialCasing() is a bit more complex than a simple switch-statement, but hopefully the added complexity will pay off in practice (it certainly does in µ-benchmarks).

Attachment #8812872 - Flags: review?(jwalden+bmo)

André Bargull [:anba]

Assignee

Comment 2

•

9 years ago

Attached patch bug1318403.patch (obsolete) — Details — Splinter Review

Updated patch to apply cleanly on inbound/central.

Attachment #8812872 - Attachment is obsolete: true

Attachment #8812872 - Flags: review?(jwalden+bmo)

Attachment #8813291 - Flags: review?(jwalden+bmo)

André Bargull [:anba]

Assignee

Comment 3

•

9 years ago

Attached patch bug1318403.patch (obsolete) — Details — Splinter Review

Updated patch to apply cleanly on inbound.

Attachment #8813291 - Attachment is obsolete: true

Attachment #8813291 - Flags: review?(jwalden+bmo)

Attachment #8822705 - Flags: review?(jwalden+bmo)

Jeff Walden [:Waldo]

Updated

•

9 years ago

Summary: Add support for locale argument to String.prototype.toLocale{Lower,Uppper}Case → Add support for locale argument to String.prototype.toLocale{Lower,Upper}Case

André Bargull [:anba]

Assignee

Comment 4

•

9 years ago

Attached patch bug1318403-part1-reduce-code-duplication.patch (obsolete) — Details — Splinter Review

Reduce code duplication and prepare make_bmp_mapping_test for modifications in part 2.

Attachment #8822705 - Attachment is obsolete: true

Attachment #8822705 - Flags: review?(jwalden+bmo)

Attachment #8840173 - Flags: review?(jwalden+bmo)

André Bargull [:anba]

Assignee

Comment 5

•

9 years ago

Attached patch bug1318403-part2-process-special-casing.patch (obsolete) — Details — Splinter Review

Added new methods to read and process the Unicode SpecialCasing.txt file. To avoid checking every special casing character one after another, the generated code for js::unicode::CanUpperCaseSpecialCasing() is a bit more complex than a simple switch-statement, but hopefully the added complexity will pay off in practice (it certainly does in µ-benchmarks).

Attachment #8840175 - Flags: review?(jwalden+bmo)

André Bargull [:anba]

Assignee

Comment 6

•

9 years ago

Attached patch bug1318403-part3-recreate-files.patch (obsolete) — Details — Splinter Review

Rerun make_unicode.py to pick up the changes from part 2.

Attachment #8840177 - Flags: review?(jwalden+bmo)

André Bargull [:anba]

Assignee

Comment 7

•

9 years ago

Attached patch bug1318403-part4-string-special-casing.patch (obsolete) — Details — Splinter Review

Change String.prototype.to{Lower,Upper}Case() to handle Unicode special casing characters, so we finally apply full case mapping instead of just simple case mapping. Full case mapping has the interesting property that it can change the string length for upper or lower case forms. For example "ß" (German sharp S) is upper cased to two "S" characters. But since most strings won't contain any characters which require special casing, we first try to case map the string into a new string of the same length. Only if that fails we inspect the whole string to compute the actual length for the case mapped string and then restart the case mapping process. Some special casing rules are inlined in jsstr.cpp, for example the lower case mapping for U+0130. This shouldn't be a forward compatibility problem with new Unicode versions, because SpecialCasing.txt doesn't get updated regularly (last update was in 2002 for Unicode 3.2). Nevertheless I've also added validations for the inlined cases to vm/make_unicode.py to make sure we're notified if it was updated.

Attachment #8840178 - Flags: review?(jwalden+bmo)

André Bargull [:anba]

Assignee

Comment 8

•

9 years ago

Attached patch bug1318403-part5-intl-case-conversion.patch (obsolete) — Details — Splinter Review

builtin/String.js: Support the locale argument for String.prototype.toLocale{Lower,Upper}Case() per [1,2]. Language tag validation and canonicalization isn't exactly fast, so we need fast paths to handle the common cases when zero or one arguments are provided. [1] http://tc39.github.io/ecma402/#sup-string.prototype.tolocalelowercase [2] http://tc39.github.io/ecma402/#sup-string.prototype.tolocaleuppercase builtin/Intl.js: Again another fast path to handle language tags which contain only a language subtag. builtin/Intl.cpp: Call to ICU when language dependent case mapping is required, otherwise use the normal String.prototype.to{Lower,Upper}String() implementation. We can't use ICU for all case mappings, because it's 0.3 to 4 times slower when compared to the optimized case conversion methods from this patch. The language check is done in C++ instead of self-hosted code, because it's faster to do it in C++, at least when measured with µ-benchmarks.

Attachment #8840179 - Flags: review?(jwalden+bmo)

André Bargull [:anba]

Assignee

Comment 9

•

9 years ago

Attached patch bug1318403-part6-update-exclusion.patch (obsolete) — Details — Splinter Review

Update the test262 exclusions list.

Attachment #8840180 - Flags: review?(jwalden+bmo)

Jeff Walden [:Waldo]

Comment 10

•

9 years ago

Comment on attachment 8840173 [details] [diff] [review] bug1318403-part1-reduce-code-duplication.patch Review of attachment 8840173 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/vm/make_unicode.py @@ +726,5 @@ > + > + def write_table(data_type, name, tbl, idx1_name, idx1, idx2_name, idx2, println): > + println('const {} unicode::{}[] = {{'.format(data_type, name)) > + for d in tbl: > + println(' {{{}}},'.format(', '.join(str(e) for e in d))) {{ {} }} , extra spacing making this more readable.

Attachment #8840173 - Flags: review?(jwalden+bmo) → review+

André Bargull [:anba]

Assignee

Comment 11

•

9 years ago

Attached patch bug1318403-part2-process-special-casing.patch (obsolete) — Details — Splinter Review

Now with 100% less uncurry. :-)

Attachment #8840175 - Attachment is obsolete: true

Attachment #8840175 - Flags: review?(jwalden+bmo)

Attachment #8844524 - Flags: review?(jwalden+bmo)

André Bargull [:anba]

Assignee

Comment 12

•

9 years ago

Attached patch bug1318403-part3-recreate-files.patch (obsolete) — Details — Splinter Review

Fixed the indentation issue.

Attachment #8840177 - Attachment is obsolete: true

Attachment #8840177 - Flags: review?(jwalden+bmo)

Attachment #8844527 - Flags: review?(jwalden+bmo)

Jeff Walden [:Waldo]

Comment 13

•

9 years ago

Comment on attachment 8844524 [details] [diff] [review] bug1318403-part2-process-special-casing.patch Review of attachment 8844524 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/vm/make_unicode.py @@ +163,3 @@ > def int_ranges(ints): > """ Yields consecutive ranges (inclusive) from integer values. """ > from itertools import tee, izip_longest If we're importing izip_longest at top level, it could be removed here maybe. @@ +602,5 @@ > + # Special casing for U+03A3 (GREEK CAPITAL LETTER SIGMA). > + assert lowerCase(0x03A3) == 0x03C3 and conditional_tolower[0x03A3] == ([0x03C2], 'Final_Sigma'); > + > + return ( > + unconditional_tolower, unconditional_toupper All on one line? @@ +706,5 @@ > + println(indent, 'if (ch < {})'.format(hexlit(min_child))) > + println(indent, ' return false;') > + > + # If there's no successor block, we can omit the |input <= max_child| check, > + # because it was already check when we emitted the parent range test. checked @@ +782,5 @@ > + println(' switch(ch) {') > + for (code, converted) in sorted(code_map.iteritems(), key=itemgetter(0)): > + println(' case {}: return {};'.format(hexlit(code), len(converted))) > + println(' }') > + println(' MOZ_ASSERT_UNREACHABLE("Bad character input.");') I'd prefer a println('') before this. @@ +805,5 @@ > + println(' return;') > + > + println('}') > + > + write_has_method('CanUpperCaseSpecialCasing', unconditional_toupper) I'd prefer if the function names were just inlined in all these functions, as there's only one caller of each of these functions. And maybe rename each function to write_CanUpperCaseSpecialCasing and so on? "has"/"length"/"append" as names aren't as clear as those would be. @@ +832,5 @@ > (upper, lower, name, alias) = entry > + upper = unconditional_toupper[code] if code in unconditional_toupper else [upper] > + lower = unconditional_tolower[code] if code in unconditional_tolower else [lower] > + println(' ["{}", "{}"], /* {}{} */'.format("".join(imap(unicodeEsc, upper)), > + "".join(imap(unicodeEsc, lower)), It would have been nice if converting this to hexints were done in a separate patch, so the *semantic* changes to the generated test, reflecting the new casing logic, could actually be reviewed. But I'm not going to hold you to it if it's at all difficult.

Attachment #8844524 - Flags: review?(jwalden+bmo) → review+

Jeff Walden [:Waldo]

Comment 14

•

9 years ago

Comment on attachment 8844527 [details] [diff] [review] bug1318403-part3-recreate-files.patch Review of attachment 8844527 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/vm/Unicode.cpp @@ +2685,5 @@ > + case 0x1F88: return 2; > + case 0x1F89: return 2; > + case 0x1F8A: return 2; > + case 0x1F8B: return 2; > + case 0x1F8C: return 2; A part of me sort of wishes these had the codepoint names in a comment next to them, like they are in the test. But this is followup-land at best.

Attachment #8844527 - Flags: review?(jwalden+bmo) → review+

Jeff Walden [:Waldo]

Updated

•

9 years ago

Attachment #8840180 - Flags: review?(jwalden+bmo) → review+

André Bargull [:anba]

Assignee

Comment 15

•

9 years ago

Attached patch bug1318403-part1.1-reduce-code-duplication.patch — Details — Splinter Review

Updated part 1 per review comments, carrying r+.

Attachment #8840173 - Attachment is obsolete: true

Attachment #8845057 - Flags: review+

André Bargull [:anba]

Assignee

Comment 16

•

9 years ago

Attached patch bug1318403-part1.2-generate-files.patch — Details — Splinter Review

Recreated files, split from part 3. Carrying r+ for part 3.

Attachment #8845058 - Flags: review+

André Bargull [:anba]

Assignee

Comment 17

•

9 years ago

Attached patch bug1318403-part2-process-special-casing.patch (obsolete) — Details — Splinter Review

Updated part 2 per review comments, carrying r+.

Attachment #8844524 - Attachment is obsolete: true

Attachment #8845059 - Flags: review+

André Bargull [:anba]

Assignee

Comment 18

•

9 years ago

Attached patch bug1318403-part3-recreate-files-special-casing.patch — Details — Splinter Review

Updated part 3 so it only includes special casing changes, carrying r+.

Attachment #8844527 - Attachment is obsolete: true

Attachment #8845060 - Flags: review+

André Bargull [:anba]

Assignee

Comment 19

•

9 years ago

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #13) > @@ +832,5 @@ > > (upper, lower, name, alias) = entry > > + upper = unconditional_toupper[code] if code in unconditional_toupper else [upper] > > + lower = unconditional_tolower[code] if code in unconditional_tolower else [lower] > > + println(' ["{}", "{}"], /* {}{} */'.format("".join(imap(unicodeEsc, upper)), > > + "".join(imap(unicodeEsc, lower)), > > It would have been nice if converting this to hexints were done in a > separate patch, so the *semantic* changes to the generated test, reflecting > the new casing logic, could actually be reviewed. But I'm not going to hold > you to it if it's at all difficult. Split part 3 into part 1.2 (only reformatting) and new part 3 (special casing changes).

André Bargull [:anba]

Assignee

Comment 20

•

9 years ago

Attached patch bug1318403-part2-process-special-casing.patch — Details — Splinter Review

Missed to address one review comment.

Attachment #8845059 - Attachment is obsolete: true

Attachment #8845065 - Flags: review+

André Bargull [:anba]

Assignee

Comment 21

•

9 years ago

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #14) > ::: js/src/vm/Unicode.cpp > @@ +2685,5 @@ > > + case 0x1F88: return 2; > > + case 0x1F89: return 2; > > + case 0x1F8A: return 2; > > + case 0x1F8B: return 2; > > + case 0x1F8C: return 2; > > A part of me sort of wishes these had the codepoint names in a comment next > to them, like they are in the test. But this is followup-land at best. Sure, I can do that. Do you also want the names added to the AppendUpperCaseSpecialCasing() and the existing IsIdentifierStartNonBMP() and IsIdentifierPartNonBMP() methods?

Jeff Walden [:Waldo]

Comment 22

•

9 years ago

Comment on attachment 8840178 [details] [diff] [review] bug1318403-part4-string-special-casing.patch Review of attachment 8840178 [details] [diff] [review]: ----------------------------------------------------------------- Looks broadly fine, but this is too many comments piling up for me to r+, even tho I really want to. ::: js/src/jsstr.cpp @@ +617,5 @@ > + } > + } > + // Ignore any characters with the property Case_Ignorable. > + // NB: We need to skip over all Case_Ignorable characters, even when > + // they also have the Cased binary property. There's a tests of this behavior somewhere, right? @@ +658,5 @@ > + * U+03A3 (GREEK CAPITAL LETTER SIGMA) has two different lower case mappings > + * depending on its context: > + * When it's preceded by a cased character and not followed by another cased > + * character, its lower case form is U+03C2 (GREEK SMALL LETTER FINAL SIGMA). > + * Otherwise its lower case mapping is U+03C3 (GREEK SMALL LETTER SIGMA). Mild preference for not parenthesizing the codepoint name because convention is not to. @@ +671,5 @@ > + MOZ_ASSERT(unicode::ToLowerCase(0x3A3) == 0x3C3); > + > +#if ENABLE_INTL_API > + if (Final_Sigma_PrecededByCased(chars, length, index) && > + !Final_Sigma_FollowedByCased(chars, length, index)) As these two functions are both only used in this function, could you just inline them both, including with early-returns? As long as there are good comments indicating what they're trying to detect, I don't think we need separate functions for the two detections performed. @@ +710,5 @@ > + } > + } > + > + // Special case for U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE). > + if (c == 0x130) { Looks like vm/Unicode.h could stand to have a section of const char16_t GREEK_SMALL_LETTER_SIGMA = 0x03C3; const char16_t LATIN_CAPITAL_LETTER_I_WITH_DOT_ABOVE = 0x0130; ... with all the various constants that keep piling up. (And ideally that have piled up in other files in one-offs, but we can deal with them later.) This, the codepoints expanded below, the sigmas, &c. @@ +712,5 @@ > + > + // Special case for U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE). > + if (c == 0x130) { > + // Return if the output buffer is too small. > + if (srcLength == destLength) Prefer >= as safer if something goes awry. Also in similar other places. @@ +743,5 @@ > +template <typename CharT> > +static size_t > +ToLowerCaseLength(const CharT* chars, size_t startIndex, size_t length) > +{ > + size_t lowerLength = length; Maybe add // U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE is the only codepoint that // lowercases to more than one codepoint. if (mozilla::IsSame<Latin1Char, CharT>::value) return lowerLength; so we don't have to rely on the compiler recognizing the loop never adds 1 in the Latin1Char case. @@ +767,1 @@ > size_t length = str->length(); I think it would be slightly helpful to readers (or at least this reader) to make this |const|, given we have two length-y variables floating around now. @@ +773,5 @@ > + // We don't need extra special casing checks in the loop below, because > + // U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE) and U+03A3 (GREEK > + // CAPITAL LETTER SIGMA) already have simple lower case mappings. > + MOZ_ASSERT(unicode::CanLowerCase(0x130), "U+0130 has a simple lower case mapping"); > + MOZ_ASSERT(unicode::CanLowerCase(0x3A3), "U+03A3 has a simple lower case mapping"); It'd be nice if CanLowerCase were constexpr so we could static_assert this, but I imagine that's not readably writable now, without C++14's generalized constexpr support. @@ +779,1 @@ > // Look for the first upper case character. This comment is literally correct, but it's not really the right way to describe it -- it operates at too low a level to make intuitive sense, IMO. It really should be: """ Look for the first character that changes when lowercased. """ It's true that that by definition means "is uppercase", but "uppercase" only has that meaning if you remember that this function performs a lowercasing operation. And when there's both uppercasing and lowercasing in the conceptual space, having to remember that is cognitive overhead. My phrasing, on the other hand, has its precise meaning without having to keep in mind context. @@ +795,5 @@ > if (unicode::CanLowerCase(c)) > break; > } > > // If all characters are lower case, return the input string. Possibly this could be "If no character needs to change, return the input string." @@ +818,5 @@ > + PodCopy(buf.get(), newChars.get(), readChars); > + newChars = Move(buf); > + > + MOZ_ALWAYS_TRUE((length == > + ToLowerCaseImpl(newChars.get(), chars, readChars, length, resultLength))); Remove the apparent extra parens here. @@ +872,5 @@ > /* > * Forcefully ignore the first (or any) argument and return toLowerCase(), > * ECMA has reserved that argument, presumably for defining the locale. > */ > if (cx->runtime()->localeCallbacks && cx->runtime()->localeCallbacks->localeToLowerCase) { I guess after the next patch (if it doesn't do so), please guard this block by #if EXPOSE_INTL_API, and add "// not used #if EXPOSE_INTL_API" by localeToLowerCase's field definition in jsapi.h. Ditto for the uppercase equivalents. @@ +922,5 @@ > + > +static inline size_t > +LengthUpperCaseSpecialCasing(char16_t charCode) > +{ > + return unicode::LengthUpperCaseSpecialCasing(charCode); MOZ_ASSERT(CanUpperCaseSpecialCasing(charCode)); @@ +990,5 @@ > +} > + > +static bool > +ToUpperCaseImpl(Latin1Char* destChars, const char16_t* srcChars, size_t startIndex, > + size_t srcLength, size_t destLength) First, is this function necessary so that the template function above is never instantiated such that it would fail the src/dest-types static_assert? Possibly could use a comment to that effect. Second, is this overload actually called in any code? If it's not called, using = delete instead seems better. @@ +1040,5 @@ > { > AutoCheckCannotGC nogc; > const CharT* chars = str->chars<CharT>(nogc); > > // Look for the first lower case character. Same comment-adjustment as above. @@ +1064,3 @@ > } > > // If all characters are upper case, return the input string. Same adjustment as above. @@ +1067,5 @@ > if (i == length) > return str; > > // If the string is Latin1, check if it contains the MICRO SIGN (0xb5) > // or SMALL LETTER Y WITH DIAERESIS (0xff) character. The corresponding """ The string changes when uppercased, so we must create a new string. Can it be Latin-1? If the original string is Latin-1, it can -- unless the string contains U+00B5 MICRO SIGN or U+00FF SMALL LETTER Y WITH DIAERESIS, the only Latin-1 codepoints that don't uppercase within Latin-1. Search for those codepoints to decide whether the new string can be Latin-1. If the original string is a two-byte string, its uppercase form is so rarely Latin-1 that we don't even consider creating a new Latin-1 string. """ @@ +1088,5 @@ > } > > if (resultIsLatin1) { > + resultLength = length; > + Latin1CharPtr buf = cx->make_pod_array<Latin1Char>(resultLength + 1); If we make_pod_array here but find something inflated, we make_pod_array *again* and copy into it. But when the new allocation is only slightly bigger, which it almost always will be, we could just realloc and possibly avoid a fresh allocation and copy. That seems much better to me. So let's do something along those lines. Maybe tempalate<typename CharT> static bool ReallocChars(JSContext* cx, typename JS::UniquePtr<CharT[], JS::FreePolicy>* chars, size_t oldLength, size_t newLength) { CharT* oldChars = chars->release(); CharT* newChars = cx->maybe_pod_realloc(oldChars, oldLength, newLength); if (!newChars) return false; chars->reset(newChars); return true; } or so. @@ +1106,5 @@ > + CopyChars(buf2.get(), buf.get(), readChars); > + buf = Move(buf2); > + > + MOZ_ALWAYS_TRUE((length == > + ToUpperCaseImpl(buf.get(), chars, readChars, length, resultLength))); Extra parens here and in the two-byte block below. @@ +1112,1 @@ > newChars.construct<Latin1CharPtr>(Move(buf)); This whole block is exactly duplicated in the not-Latin-1 block. In a followup, could you make this one template function that's separately called in the two blocks? ::: js/src/vm/Unicode.h @@ +239,5 @@ > return CharInfo(ch).isSpace(); > } > > +/* > + * Returns the simple upper case mapping of the given UTF-16 code unit. "simple upper case mapping" (see CanUpperCaseSpecialCasing for details) and make the same change for ToLowerCase. @@ +342,5 @@ > + * Unicode defines two case mapping modes: > + * 1. "simple case mappings" for one-to-one mappings which are independent of > + * context and language (defined in UnicodeData.txt). > + * 2. "special case mappings" for mappings which can increase or decrease the > + * string length or are dependent of context or language (defined in dependent on (or independent of? but I don't think that's what was meant). @@ +343,5 @@ > + * 1. "simple case mappings" for one-to-one mappings which are independent of > + * context and language (defined in UnicodeData.txt). > + * 2. "special case mappings" for mappings which can increase or decrease the > + * string length or are dependent of context or language (defined in > + * SpecialCasing.txt) Maybe end #1 with a comma or semicolon, then end this with a period. @@ +349,5 @@ > + * The CanUpperCase() method defined above only supports simple case mappings. > + * In order to support the full case mappings of all Unicode characters, > + * callers need to check this method in addition to CanUpperCase(). > + * > + * NOTE: All special upper case mappings are unconditional in Unicode 9. Maybe "permanently unconditional as of Unicode 5", which I think is right but I could be mistaken about if I misunderstand other readings I've done. @@ +357,5 @@ > + > +/* > + * Returns the length of the upper case mapping of |ch|. > + * > + * An assertion is raised if |ch| doesn't have a special upper case mapping. This function asserts if and below as well.

Attachment #8840178 - Flags: review?(jwalden+bmo) → review-

Jeff Walden [:Waldo]

Comment 23

•

9 years ago

(In reply to André Bargull from comment #21) > (In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #14) > > ::: js/src/vm/Unicode.cpp > > @@ +2685,5 @@ > > > + case 0x1F88: return 2; > > > + case 0x1F89: return 2; > > > + case 0x1F8A: return 2; > > > + case 0x1F8B: return 2; > > > + case 0x1F8C: return 2; > > > > A part of me sort of wishes these had the codepoint names in a comment next > > to them, like they are in the test. But this is followup-land at best. > > Do you also want the names added to the > AppendUpperCaseSpecialCasing() and the existing IsIdentifierStartNonBMP() > and IsIdentifierPartNonBMP() methods? Yeah, I think so.

Jeff Walden [:Waldo]

Comment 24

•

9 years ago

Comment on attachment 8840179 [details] [diff] [review] bug1318403-part5-intl-case-conversion.patch Review of attachment 8840179 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/builtin/Intl.cpp @@ +798,5 @@ > > } // anonymous namespace > > +int32_t u_strToLower(UChar* dest, int32_t destCapacity, const UChar* src, int32_t srcLength, > + const char* locale, UErrorCode* pErrorCode) int32_t u_strToLower(...) { and same below. @@ +3641,5 @@ > > +/******************** String ********************/ > + > +// Lithuanian, Turkish, and Azeri have language dependent case mappings. > +const char* const languagesWithSpecialCasing[] = { "lt", "tr", "az" }; Might as well do static const char languagesWC[][3] to guarantee these strings are stored compactly/adjacently in memory. Also, if these are all only used in one function, put the array *in* that function. @@ +3657,5 @@ > + if (equal(locale, language, 2)) > + return language; > + } > + } > + return ""; // ICU root locale Blank line before this. ::: js/src/builtin/Intl.js @@ +510,5 @@ > + > + // Replace deprecated subtags with their preferred values. > + locale = callFunction(std_Object_hasOwnProperty, langSubtagMappings, locale) > + ? langSubtagMappings[locale] > + : locale; Please add an assertion to make_intl_data.py somewhere that |langTagMappings| doesn't contain any 2*3ALPHA mappings. Then add a backstop assertion here that there's no such mapping. ::: js/src/builtin/String.js @@ +627,5 @@ > + // Step 3. > + var requestedLocales = CanonicalizeLocaleList(locales); > + > + // Steps 4-6. > + requestedLocale = requestedLocales.length > 0 ? requestedLocales[0] : undefined; This use-just-the-first-element behavior is to spec, but an interesting deviation from how most of the Intl stuff works. Odd. (Ish?) ::: js/src/tests/Intl/String/toLocaleUpperCase.js @@ +31,5 @@ > +assertEq("a".toLocaleUpperCase(undefined), "A"); > +assertEq("a".toLocaleUpperCase([]), "A"); > +assertEq("a".toLocaleUpperCase({}), "A"); > +assertEq("a".toLocaleUpperCase({length: 0}), "A"); > +assertEq("a".toLocaleUpperCase({length: -1}), "A"); Seems like some verification with the "x-x" language tag would be desirable. (And also in the lowercasing file.) ::: js/src/vm/SelfHosting.cpp @@ +2354,5 @@ > #else > JS_FN("std_String_normalize", str_normalize, 0,0), > #endif > JS_FN("std_String_concat", str_concat, 1,0), > Mind removing the trailing WS while you're here?

Attachment #8840179 - Flags: review?(jwalden+bmo) → review+

André Bargull [:anba]

Assignee

Comment 25

•

9 years ago

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #22) > ::: js/src/jsstr.cpp > @@ +617,5 @@ > > + } > > + } > > + // Ignore any characters with the property Case_Ignorable. > > + // NB: We need to skip over all Case_Ignorable characters, even when > > + // they also have the Cased binary property. > > There's a tests of this behavior somewhere, right? Yes, this is covered in test262 (U+0345 COMBINING GREEK YPOGEGRAMMENI has the properties Cased and Case_Ignorable): https://github.com/tc39/test262/blob/master/test/built-ins/String/prototype/toLowerCase/special_casing_conditional.js (This is the test I mentioned on IRC when I said I first messed up the Final_Sigma check. :-) > @@ +712,5 @@ > > + > > + // Special case for U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE). > > + if (c == 0x130) { > > + // Return if the output buffer is too small. > > + if (srcLength == destLength) > > Prefer >= as safer if something goes awry. Also in similar other places. I think using >= could be confusing given that the function starts with the assertion |MOZ_ASSERT(srcLength <= destLength)|. Maybe I should have added a comment to ToLowerCaseImpl explaining the two valid ways this function can be called: --- If srcLength == destLength is true, the output buffer was optimistically allocated with the same size as the input buffer. When we append characters which have special casing mappings, we test `srcLength == destLength` to decide if we need to back out and reallocate a sufficiently large output buffer. Otherwise the output buffer was allocated with the correct size to hold all lower case mapped characters, i.e. destLength == ToLowerCaseLength(chars, 0, len(chars)) is true. --- > > @@ +743,5 @@ > > +template <typename CharT> > > +static size_t > > +ToLowerCaseLength(const CharT* chars, size_t startIndex, size_t length) > > +{ > > + size_t lowerLength = length; > > Maybe add > > // U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE is the only codepoint > that > // lowercases to more than one codepoint. > if (mozilla::IsSame<Latin1Char, CharT>::value) > return lowerLength; > > so we don't have to rely on the compiler recognizing the loop never adds 1 > in the Latin1Char case. This function should never be called with Latin1 characters, because the first call to ToLowerCaseImpl will always succeed. (This is one of the reasons I wished if-constexpr was already a thing, because then I could guard the statements which are only necessary for Latin1 resp. don't apply for Latin1 characters.) > @@ +773,5 @@ > > + // We don't need extra special casing checks in the loop below, because > > + // U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE) and U+03A3 (GREEK > > + // CAPITAL LETTER SIGMA) already have simple lower case mappings. > > + MOZ_ASSERT(unicode::CanLowerCase(0x130), "U+0130 has a simple lower case mapping"); > > + MOZ_ASSERT(unicode::CanLowerCase(0x3A3), "U+03A3 has a simple lower case mapping"); > > It'd be nice if CanLowerCase were constexpr so we could static_assert this, > but I imagine that's not readably writable now, without C++14's generalized > constexpr support. Agreed. > > @@ +990,5 @@ > > +} > > + > > +static bool > > +ToUpperCaseImpl(Latin1Char* destChars, const char16_t* srcChars, size_t startIndex, > > + size_t srcLength, size_t destLength) > > First, is this function necessary so that the template function above is > never instantiated such that it would fail the src/dest-types static_assert? > Possibly could use a comment to that effect. > > Second, is this overload actually called in any code? If it's not called, > using = delete instead seems better. Yes, this definition was only necessary for the static_assert. > > @@ +1088,5 @@ > > } > > > > if (resultIsLatin1) { > > + resultLength = length; > > + Latin1CharPtr buf = cx->make_pod_array<Latin1Char>(resultLength + 1); > > If we make_pod_array here but find something inflated, we make_pod_array > *again* and copy into it. But when the new allocation is only slightly > bigger, which it almost always will be, we could just realloc and possibly > avoid a fresh allocation and copy. That seems much better to me. So let's > do something along those lines. Maybe > > tempalate<typename CharT> > static bool > ReallocChars(JSContext* cx, typename JS::UniquePtr<CharT[], JS::FreePolicy>* > chars, > size_t oldLength, size_t newLength) > { > CharT* oldChars = chars->release(); > CharT* newChars = cx->maybe_pod_realloc(oldChars, oldLength, newLength); > if (!newChars) > return false; > > chars->reset(newChars); > return true; > } > > or so. Okay, I can try to do this, but I'll split it into a separate patch, so it's easier for me to test and if I made any mistakes, we only need to back out the realloc part. :-) And |pod_realloc| instead of |maybe_pod_realloc|, right?

André Bargull [:anba]

Assignee

Comment 26

•

9 years ago

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #22) > @@ +349,5 @@ > > + * The CanUpperCase() method defined above only supports simple case mappings. > > + * In order to support the full case mappings of all Unicode characters, > > + * callers need to check this method in addition to CanUpperCase(). > > + * > > + * NOTE: All special upper case mappings are unconditional in Unicode 9. > > Maybe "permanently unconditional as of Unicode 5", which I think is right > but I could be mistaken about if I misunderstand other readings I've done. I didn't find any note in Unicode 9 that the mappings are permanently unconditional. 5.18 Case Mappings, "Complications for Case Mapping" section, only has this note: --- Because only a few context-sensitive case mappings exist, and because they involve only a very few characters, implementations may choose to hard-code the treatment of these characters for casing operations rather than using data-driven code based on the Unicode Character Database. However, if this approach is taken, each time the implementation is upgraded to a new version of the Unicode Standard, hard-coded casing operations should be checked for consistency with the updated data. See SpecialCasing.txt in the Unicode Character Database for details of context-sensitive case mappings. ---

André Bargull [:anba]

Assignee

Comment 27

•

9 years ago

Attached patch bug1318403-part4.1-string-special-casing.patch (obsolete) — Details — Splinter Review

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #22) > Looks broadly fine, but this is too many comments piling up for me to r+, > even tho I really want to. The updated patch should cover all review comments, except for this one: > @@ +712,5 @@ > > + > > + // Special case for U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE). > > + if (c == 0x130) { > > + // Return if the output buffer is too small. > > + if (srcLength == destLength) > > Prefer >= as safer if something goes awry. Also in similar other places. I didn't change it to >=, but instead added a comment to explain the two possible values for |destLength|. > @@ +743,5 @@ > > +template <typename CharT> > > +static size_t > > +ToLowerCaseLength(const CharT* chars, size_t startIndex, size_t length) > > +{ > > + size_t lowerLength = length; > > Maybe add > > // U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE is the only codepoint > that > // lowercases to more than one codepoint. > if (mozilla::IsSame<Latin1Char, CharT>::value) > return lowerLength; > > so we don't have to rely on the compiler recognizing the loop never adds 1 > in the Latin1Char case. Added a Latin1Char version of ToLowerCaseLength which has `MOZ_ASSERT_UNREACHABLE("Never called for Latin1 strings");`. > > @@ +990,5 @@ > > +} > > + > > +static bool > > +ToUpperCaseImpl(Latin1Char* destChars, const char16_t* srcChars, size_t startIndex, > > + size_t srcLength, size_t destLength) > > First, is this function necessary so that the template function above is > never instantiated such that it would fail the src/dest-types static_assert? > Possibly could use a comment to that effect. Added a comment. > Second, is this overload actually called in any code? If it's not called, > using = delete instead seems better. It's never called at runtime, but the compiler is unable to see this at compile-time. > > @@ +1088,5 @@ > > } > > > > if (resultIsLatin1) { > > + resultLength = length; > > + Latin1CharPtr buf = cx->make_pod_array<Latin1Char>(resultLength + 1); > > If we make_pod_array here but find something inflated, we make_pod_array > *again* and copy into it. But when the new allocation is only slightly > bigger, which it almost always will be, we could just realloc and possibly > avoid a fresh allocation and copy. That seems much better to me. So let's > do something along those lines. Maybe > Addressed in part 4.2. > > @@ +1112,1 @@ > > newChars.construct<Latin1CharPtr>(Move(buf)); > > This whole block is exactly duplicated in the not-Latin-1 block. In a > followup, could you make this one template function that's separately called > in the two blocks? > Addressed in part 4.3.

Attachment #8840178 - Attachment is obsolete: true

Attachment #8846697 - Flags: review?(jwalden+bmo)

Jeff Walden [:Waldo]

Comment 28

•

9 years ago

Ah, right. I was thinking of the Case Folding Stability Policy, but that doesn't have the meaning I thought it had. http://www.unicode.org/policies/stability_policy.html#Case_Folding

André Bargull [:anba]

Assignee

Comment 29

•

9 years ago

Attached patch bug1318403-part4.2-use-realloc.patch (obsolete) — Details — Splinter Review

Changed malloc to realloc per comment #22. I didn't like the pointer of UniquePtr approach, so I used normal UniquePtrs and Moves.

Attachment #8846698 - Flags: review?(jwalden+bmo)

André Bargull [:anba]

Assignee

Comment 30

•

9 years ago

Attached patch bug1318403-part4.3-toupper-duplicate-code.patch (obsolete) — Details — Splinter Review

From comment #22: Moves the duplicated code out of ToUpperCase into a new helper method.

Attachment #8846699 - Flags: review?(jwalden+bmo)

André Bargull [:anba]

Assignee

Comment 31

•

9 years ago

Attached patch bug1318403-part5-intl-case-conversion.patch (obsolete) — Details — Splinter Review

Updated part 5 per review comments, carrying r+ from Waldo. And additionally changed the JSAutoByteString to a normal JSLinearString*, so we don't need to copy the language tag string needlessly.

Attachment #8840179 - Attachment is obsolete: true

Attachment #8846701 - Flags: review+

André Bargull [:anba]

Assignee

Updated

•

9 years ago

Attachment #8846697 - Attachment filename: bug1318403-part4-string-special-casing.patch → bug1318403-part4.1-string-special-casing.patch

André Bargull [:anba]

Assignee

Updated

•

9 years ago

Attachment #8846697 - Attachment description: bug1318403-part4-string-special-casing.patch → bug1318403-part4.1-string-special-casing.patch

Attachment #8846697 - Attachment filename: bug1318403-part4.1-string-special-casing.patch → bug1318403-part4-string-special-casing.patch

André Bargull [:anba]

Assignee

Comment 32

•

9 years ago

Attached patch bug1318403-part6-update-exclusion.patch (obsolete) — Details — Splinter Review

Updated part 6 to apply cleanly on inbound. Carrying r+.

Attachment #8840180 - Attachment is obsolete: true

Attachment #8846703 - Flags: review+

André Bargull [:anba]

Assignee

Comment 33

•

9 years ago

Attached patch bug1318403-part7-codepoint-names-for-generated.patch — Details — Splinter Review

Updates make_unicode.py to print the character names in all generated files. I've used the existing |test_table|, renamed it to |codepoint_table|, and changed it to also save the details of non-bmp characters. Drive-by changes: Moved some lines in process_unicode_data(...) which are only needed for bmp code-points. And removed the for_each_non_bmp_group function, because int_ranges provides the same functionality.

Attachment #8846706 - Flags: review?(jwalden+bmo)

André Bargull [:anba]

Assignee

Comment 34

•

9 years ago

Attached patch bug1318403-part8-generate-with-codepoint-names.patch — Details — Splinter Review

Regenerated all Unicode files after part 7.

Attachment #8846708 - Flags: review?(jwalden+bmo)

Jeff Walden [:Waldo]

Comment 35

•

9 years ago

Comment on attachment 8846698 [details] [diff] [review] bug1318403-part4.2-use-realloc.patch Review of attachment 8846698 [details] [diff] [review]: ----------------------------------------------------------------- I preferred rewriting the UniquePtr in place, but eh.

Attachment #8846698 - Flags: review?(jwalden+bmo) → review+

Jeff Walden [:Waldo]

Comment 36

•

9 years ago

Comment on attachment 8846697 [details] [diff] [review] bug1318403-part4.1-string-special-casing.patch Review of attachment 8846697 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jsstr.cpp @@ +710,5 @@ > + continue; > + } > + } > + > + // Special case for U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE. The named constant means this is unnecessary. But really, it makes more sense to combine it with the "is lwoercased to" comment below, so: """ Special case: U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE lowers to <U+0069 U+0307>. """ or somesuch. @@ +722,5 @@ > + destChars[j++] = CharT(unicode::COMBINING_DOT_ABOVE); > + continue; > + } > + > + // Special case for U+03A3 GREEK CAPITAL LETTER SIGMA. """ Greek uppercase sigma lowercases to one of two codepoints depending on context. """ @@ +757,5 @@ > + > +static size_t > +ToLowerCaseLength(const Latin1Char* chars, size_t startIndex, size_t length) > +{ > + MOZ_ASSERT_UNREACHABLE("Never called for Latin1 strings"); Preference for lowercase at the start of assertion messages. @@ +766,5 @@ > static JSString* > ToLowerCase(JSContext* cx, JSLinearString* str) > { > + // Unlike toUpperCase, toLowerCase has the nice invariant that if the > + // input is a Latin1 string, the output is also a Latin1 string. Latin-1, and elsewhere.

Attachment #8846697 - Flags: review?(jwalden+bmo) → review+

Jeff Walden [:Waldo]

Comment 37

•

9 years ago

Comment on attachment 8846699 [details] [diff] [review] bug1318403-part4.3-toupper-duplicate-code.patch Review of attachment 8846699 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jsstr.cpp @@ +1070,5 @@ > + > + *resultLength = length; > + DestCharPtr buf = cx->make_pod_array<DestChar>(length + 1); > + if (!buf) > + return DestCharPtr(); Could return buf.

Attachment #8846699 - Flags: review?(jwalden+bmo) → review+

Jeff Walden [:Waldo]

Updated

•

9 years ago

Attachment #8846706 - Flags: review?(jwalden+bmo) → review+

Jeff Walden [:Waldo]

Updated

•

9 years ago

Attachment #8846708 - Flags: review?(jwalden+bmo) → review+

André Bargull [:anba]

Assignee

Comment 38

•

9 years ago

Attached patch bug1318403-part4.1-string-special-casing.patch — Details — Splinter Review

Update part 4.1 per review comments, carrying r+ from Waldo.

Attachment #8846697 - Attachment is obsolete: true

Attachment #8855266 - Flags: review+

André Bargull [:anba]

Assignee

Comment 39

•

9 years ago

Attached patch bug1318403-part4.2-use-realloc.patch — Details — Splinter Review

Update part 4.2 to apply on changes from 4.1, carrying r+.

Attachment #8846698 - Attachment is obsolete: true

Attachment #8855267 - Flags: review+

André Bargull [:anba]

Assignee

Comment 40

•

9 years ago

Attached patch bug1318403-part4.3-toupper-duplicate-code.patch — Details — Splinter Review

Update part 4.3 per review comments, carrying r+.

Attachment #8846699 - Attachment is obsolete: true

Attachment #8855268 - Flags: review+

André Bargull [:anba]

Assignee

Comment 41

•

9 years ago

Attached patch bug1318403-part5-intl-case-conversion.patch — Details — Splinter Review

Update part 5 to apply on changes from part 4, carrying r+.

Attachment #8846701 - Attachment is obsolete: true

Attachment #8855269 - Flags: review+

André Bargull [:anba]

Assignee

Comment 42

•

9 years ago

Attached patch bug1318403-part6-update-exclusion.patch — Details — Splinter Review

Update part 6 to apply cleanly on inbound, carrying r+.

Attachment #8846703 - Attachment is obsolete: true

Attachment #8855270 - Flags: review+

André Bargull [:anba]

Assignee

Comment 43

•

9 years ago

Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=81c13cd4fcae6f8b2137a8828cf706df43d6b58d

Keywords: checkin-needed

Pulsebot

Comment 44

•

9 years ago

Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/9a91cb4b12e5 Part 1.1: Reduce code duplication in make_unicode. r=Waldo https://hg.mozilla.org/integration/mozilla-inbound/rev/3a6ac8fe8434 Part 1.2: Recreate files created through make_unicode. r=Waldo https://hg.mozilla.org/integration/mozilla-inbound/rev/e32eeab1e231 Part 2: Update make_unicode.py to process SpecialCasing.txt. r=Waldo https://hg.mozilla.org/integration/mozilla-inbound/rev/0a8e8e545554 Part 3: Recreate files created through make_unicode to include special casing info. r=Waldo https://hg.mozilla.org/integration/mozilla-inbound/rev/c34970cf36e9 Part 4.1: Handle special casing characters in String.prototype.to(Lower|Upper)Case. r=Waldo https://hg.mozilla.org/integration/mozilla-inbound/rev/23da062eab54 Part 4.2: Use realloc instead of malloc when resizing a newly created string buffer. r=Waldo https://hg.mozilla.org/integration/mozilla-inbound/rev/26c080929f59 Part 4.3: Move duplicate code in ToUpperCase to new helper method. r=Waldo https://hg.mozilla.org/integration/mozilla-inbound/rev/a9b7a65ca73a Part 5: Reimplement String.prototype.toLocale{Lower,Upper}Case per ECMAScript Intl specification. r=Waldo https://hg.mozilla.org/integration/mozilla-inbound/rev/5b06ed0f1b18 Part 6: Update test262 exclusion list. r=Waldo https://hg.mozilla.org/integration/mozilla-inbound/rev/3043922c81c7 Part 7: Add code point names in generated Unicode sources. r=Waldo https://hg.mozilla.org/integration/mozilla-inbound/rev/c008acb08872 Part 8: Generate Unicode resources to include code point names. r=Waldo

Keywords: checkin-needed

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 45

•

9 years ago

Flags: needinfo?(andrebargull)

André Bargull [:anba]

Assignee

Comment 46

•

9 years ago

This looks like a false-positive by the rooting analysis. It claims ToLowerCaseImpl (in jsstr.cpp) can GC, but I don't see how this is possible. Maybe the call to ICU's u_hasBinaryProperty in Final_Sigma (which is called from ToLowerCaseImpl) confuses the rooting analysis? Adding JS::AutoSuppressGCAnalysis to Final_Sigma seems to fix the rooting hazard: https://treeherder.mozilla.org/#/jobs?repo=try&revision=255b27c8f6b51fbb11f38af5621624333ef960bf @sfink: What is the correct way to let the rooting analysis know that a certain function cannot gc?

Flags: needinfo?(andrebargull) → needinfo?(sphink)

Steve Fink [:sfink] [:s:]

Comment 47

•

9 years ago

It would be much better to look at https://queue.taskcluster.net/v1/task/NHLSaCnQQj6_mU2ciVr5Rg/runs/0/artifacts/public/build/hazards.txt.gz since it gives the GC stack. In this case, Final_Sigma calls u_hasBinaryProperty_58 calls a function pointer named 'contains' in a BinaryProperty type, and it assumes that can do anything or everything, including GC. Your patch looks good to me. It's suppressing the checks in a function that doesn't touch anything related to GC, other than taking in a pointer that could theoretically point into a JSString. But (1) it looks like you only pass in a malloced buffer, and (2) this really *can't* GC. Except (1) doesn't matter, because it turns out GC *would* be fatal here, which is why there is an AutoCheckCannotGC that the hazard analysis tripped over. Anyway, the choices would be (1) use AutoSuppressGCAnalysis exactly as you did; (2) add u_hasBinaryProperty_58 to annotations.js to whitelist it, but that'd be gross because of the macro-added _58 prefix; or (3) whitelist BinaryProperty.contains in annotations.js. The choice of (1) vs (3) doesn't matter much unless we start hitting more hazards, and we'd want something that gets rid of all of them at once. The only thing I'd change is the comment. // u_hasBinaryProperty cannot gc, but the analysis doesn't know about it. The analysis *does* know about it, with its macro-mangled name even. It just sees that it calls through a function pointer that it has no way of knowing what is. So the comment should refer to BinaryProperty.contains, called by u_hasBinaryProperty.

Flags: needinfo?(sphink)

Steve Fink [:sfink] [:s:]

Comment 48

•

9 years ago

(r=me with the comment fix)

André Bargull [:anba]

Assignee

Comment 49

•

9 years ago

(In reply to Steve Fink [:sfink] [:s:] from comment #47) > It would be much better to look at > https://queue.taskcluster.net/v1/task/NHLSaCnQQj6_mU2ciVr5Rg/runs/0/ > artifacts/public/build/hazards.txt.gz since it gives the GC stack. Thanks for the tip about "hazards.txt.gz"! > Anyway, the choices would be (1) use AutoSuppressGCAnalysis exactly as you > did; (2) add u_hasBinaryProperty_58 to annotations.js to whitelist it, but > that'd be gross because of the macro-added _58 prefix; or (3) whitelist > BinaryProperty.contains in annotations.js. The choice of (1) vs (3) doesn't > matter much unless we start hitting more hazards, and we'd want something > that gets rid of all of them at once. Hmm, maybe it's better (or rather more future-proof) to use (3), given that we eventually want to support https://github.com/tc39/proposal-regexp-unicode-property-escapes, which will also need to call u_hasBinaryProperty?

André Bargull [:anba]

Assignee

Comment 50

•

9 years ago

(In reply to André Bargull from comment #49) > Hmm, maybe it's better (or rather more future-proof) to use (3), given that > we eventually want to support > https://github.com/tc39/proposal-regexp-unicode-property-escapes, which will > also need to call u_hasBinaryProperty? On second thought we can always switch to (3) later, so we don't need to guess now how we will implement Unicode property escapes then. :-)

André Bargull [:anba]

Assignee

Comment 51

•

9 years ago

Attached patch bug1318403-part9-suppress-gc-analysis.patch — Details — Splinter Review

Changed the comment to: // Tell the analysis the BinaryProperty.contains function pointer called by // u_hasBinaryProperty cannot GC. r+ by sfink in comment #48.

Attachment #8856249 - Flags: review+

André Bargull [:anba]

Assignee

Comment 52

•

9 years ago

Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8db624e361451696628a81d14d362fb4a59c467e

Keywords: checkin-needed

Pulsebot

Comment 53

•

9 years ago

Pushed by ihsiao@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/503d276ad3bb Part 1.1: Reduce code duplication in make_unicode. r=Waldo https://hg.mozilla.org/integration/mozilla-inbound/rev/6890eedfe3be Part 1.2: Recreate files created through make_unicode. r=Waldo https://hg.mozilla.org/integration/mozilla-inbound/rev/8424a6a068da Part 2: Update make_unicode.py to process SpecialCasing.txt. r=Waldo https://hg.mozilla.org/integration/mozilla-inbound/rev/70abb1ca3cba Part 3: Recreate files created through make_unicode to include special casing info. r=Waldo https://hg.mozilla.org/integration/mozilla-inbound/rev/d539219ef53a Part 4.1: Handle special casing characters in String.prototype.to(Lower|Upper)Case. r=Waldo https://hg.mozilla.org/integration/mozilla-inbound/rev/3ebf41460510 Part 4.2: Use realloc instead of malloc when resizing a newly created string buffer. r=Waldo https://hg.mozilla.org/integration/mozilla-inbound/rev/41df7f41e185 Part 4.3: Move duplicate code in ToUpperCase to new helper method. r=Waldo https://hg.mozilla.org/integration/mozilla-inbound/rev/0e77e970cac1 Part 5: Reimplement String.prototype.toLocale{Lower,Upper}Case per ECMAScript Intl specification. r=Waldo https://hg.mozilla.org/integration/mozilla-inbound/rev/85b3106777fa Part 6: Update test262 exclusion list. r=Waldo https://hg.mozilla.org/integration/mozilla-inbound/rev/946ad1048730 Part 7: Add code point names in generated Unicode sources. r=Waldo https://hg.mozilla.org/integration/mozilla-inbound/rev/d0c6a4e68b34 Part 8: Generate Unicode resources to include code point names. r=Waldo https://hg.mozilla.org/integration/mozilla-inbound/rev/85adabbcb617 Part 9: Suppress rooting analysis when calling u_hasBinaryProperty. r=sfink

Keywords: checkin-needed

Carsten Book [:Tomcat]

Comment 54

•

9 years ago

bugherder

Status: ASSIGNED → RESOLVED

Closed: 9 years ago

status-firefox55: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla55

Julien Cristau [:jcristau]

Updated

•

9 years ago

status-firefox53: affected → ---

Florian Scholz (Open Web Docs)

Comment 56

•

8 years ago

https://developer.mozilla.org/en-US/Firefox/Releases/55#JavaScript https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/toLocaleLowerCase https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/toLocaleUpperCase

Keywords: dev-doc-needed → dev-doc-complete

You need to log in before you can comment on or make changes to this bug.