Closed Bug 1317397 Opened 8 years ago Closed 7 years ago

Implement RegExp.lastIndex changes from ES2017

Categories

(Core :: JavaScript: Standard Library, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: anba, Assigned: anba)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 1 obsolete file)

lastIndex is no longer read resp. written to when the RegExp object is neither global nor sticky.


Spec changes:
https://github.com/tc39/ecma262/pull/627/files
The PR was accepted, but it seems to be erroneous and hence not implementable for us. Consider the following case:

```
var re = /(.)\1/gu;
re.lastIndex = {
  valueOf() {
    // Recompile without Unicode-flag.
    re.compile(re.source, "g");
    return 0;
  }
};
var r = re.exec("\u{D83D}\u{DCA3}\u{DCA3}");
print(r ? r[0] : null);
```

With the new semantics, it seems that the expected result value is `null`. And here's why:

In RegExpBuiltinExec, the [[OriginalFlags]] is retrieved and stored, followed by calling ToLength(Get(R, "lastIndex")). Side-effects in ToLength can recompile the RegExp object! When the [[RegExpMatcher]] tries to find a match at string index 0, a failure is returned, because the recompiled RegExp works on code units and the sequence <U+D83D, U+DCA3> doesn't match "(.)\1". AdvanceStringIndex is called to advance to the next string index, but AdvanceStringIndex uses fullUnicode=true, so it skips over the complete code point U+1F4A3 and returns 2. The matcher also won't be able to find a match at string index 2, and therefore RegExpBuiltinExec returns `null`.

I don't think we want to support these semantics, so back to TC39 to get this corrected.
partially reverted:
https://github.com/tc39/ecma262/pull/798/files

remaining changes:

https://tc39.github.io/ecma262/#sec-regexpbuiltinexec
> 21.2.5.2.2 Runtime Semantics: RegExpBuiltinExec ( R, S )
> ...
> 12. Repeat, while matchSucceeded is false
>   a. If lastIndex > length, then
>     i. If global is true or sticky is true, then
>       1. Perform ? Set(R, "lastIndex", 0, true).
> ...

https://tc39.github.io/ecma262/#sec-regexp.prototype-@@search
> 21.2.5.9 RegExp.prototype [ @@search ] ( string )
> ...
> 5. If SameValue(previousLastIndex, 0) is false, then
>   a. Perform ? Set(rx, "lastIndex", 0, true).
> ...
> 7. Let currentLastIndex be ? Get(rx, "lastIndex").
> 8. If SameValue(currentLastIndex, previousLastIndex) is false, then
>   a. Perform ? Set(rx, "lastIndex", previousLastIndex, true).
Depends on: 1343375
Attached patch bug1317397.patch (obsolete) — Splinter Review
The changes for the optimized path in RegExpSearch are bit more tricky, please let me know if you think it needs more explanations.
Assignee: nobody → andrebargull
Status: NEW → ASSIGNED
Attachment #8843468 - Flags: review?(arai.unmht)
Comment on attachment 8843468 [details] [diff] [review]
bug1317397.patch

Review of attachment 8843468 [details] [diff] [review]:
-----------------------------------------------------------------

Thank you :)
great testcases!

the comment for RegExpSearch opt path should be sufficient.
Attachment #8843468 - Flags: review?(arai.unmht) → review+
Attached patch bug1317397.patchSplinter Review
Updated the patch to:
- fix an eslint style warning
- use correct git hashes for ES2017 draft references
- and to apply cleanly on inbound.

Carrying r+ from arai.
Attachment #8843468 - Attachment is obsolete: true
Attachment #8844060 - Flags: review+
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/21743640e4d1
Only set lastIndex for global or sticky RegExps in RegExpBuiltinExec per ES2017. r=arai
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/21743640e4d1
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: